Recap Low Level Vision –Input: pixel values from the imaging device –Data structure: 2D array, homogeneous –Processing: 2D neighborhood operations Histogram based operation Image enhancements Feature extraction –Edges –Regions
Recap Mid Level Vision –Input: features from low level processing –Data structures: lists, arrays, heterogeneous –Processing: Pixel/feature grouping operations Model based operations Object descriptions –Lines – orientation, location –Regions – central moments –Relationships amongst objects
What’s Left To Do? High Level Vision –Input: Symbolic objects from mid level processing Models of the objects to be recognized (a priori knowledge) –Data structures: lists, arrays, heterogeneous –Processing: Object correspondence –Local correspondences represent individual object (or component) recognitions –Global correspondences represent recognition of an object in the context of the scene Search problem –Typified by the “NP-Complete” question
High Level Vision Goal is to interpret the 2D structure (image pixels) as a 3D scene Humans do this very well –This is a problem for computer vision researchers –Our competition is fierce and unrelenting To achieve our goal we don’t necessarily need to mimic the biological system –We’re not trying to explain how human vision works, we’re just trying to achieve comparable results
High Level Vision The 3D scene is made up of –Objects –Illumination due to light sources The appearance of boundaries between object surfaces is dependent on their orientation relative to the light source (and surface material, sensing device…) –This is where we get edges from
Labeling Edges In a 3D scene, edges have very specific meanings –They separate objects from one another Occlusion –They demark sudden changes in surface orientation within a single object –The demark sudden changed in surface albedo (light reflectance) –A shadow cast by an object
Labeling Edges Edge detectors can provide some information regarding the meaning
Labeling Edges But additional information must be provided through logic and reasoning Under some constrained “worlds” we can identify all possible edge and vertex types –“Blocks World” assumption Toy blocks Trihedral vertices –Sounds simple but much has been learned from studying such worlds
Labeling Edges Blade edges –One continuous surface occludes another –Surface normal changes smoothly –Curved surfaces –Single arrowhead
Labeling Edges Limb edges –One continuous surface occludes another –Surface normal changes smoothly and becomes perpendicular to the viewing angle –Surface ultimately occludes itself –Curved surfaces –Double arrowhead
Labeling Edges Mark edges –Change of albedo (reflectance) on the surface –A marking on the surface –No occlusion is involved –Any shape surfaces –M–M M
Labeling Edges Crease edges –Change in surface due to the joining of two surfaces –Can be convex or concave –No occlusion is involved –Abrupt changes – not smooth, curved surfaces –Convex (+), Concave (-)
Labeling Edges Illumination edges –Change in illumination –Typically a shadow –No surface changes –S–S S
Labeling Edges Jump edges –A blade or limb with a large depth discontinuity across the boundary
Labeling Edges This edge labeling scheme is proposed by a few researchers There are variations You don’t have to do it this way if it doesn’t suit the needs of your application Choose whatever scheme you want –Just make sure you are consistent
Vertices A Vertex is the place where two or more lines intersect Observations regarding the types of vertices possible when mapping 3D objects into a 2D space have been made Assumes trihedral corners –Restricted to a “blocks world” but may be useful elsewhere
Vertices L junctions ++--
Vertices Arrow junctions
Vertices Fork junctions
Vertices T junctions + -
Vertex Labeling Assume the shape is sitting on a table S S ? ? is special edge called a “crack” May be labeled as an “S”
Edge/Vertex Labeling To do such labeling programmatically one would employ a Relaxation algorithm –Essentially, you start by assigning all labels to all edges –Eliminate inconsistent labels based on the vertex type, working on all edges in parallel –Repeat until no more changes occur
Perceptual Objects Outside of the blocks world –Long, thin ribbon-like objects Made of [nearly] parallel lines Straight or Curved –Region objects Homogeneous intensity Homogeneous texture Bounded by well defined edges
Perceptual Objects
Perceptual Organization
Model Graph Representation R1 R2 R3 G1 Intersects(70) Intersects(40) Bounds Rx – Ribbon Structure Gx – Region Structure
Model Graph Representation Each node may have additional information –Ribbon nodes may have Width Orientation Intensity –Region nodes may have Moments Intensity
Model Contents The Model may be a 3D representation If camera information such as –Orientation –Distance –Lens –etc. is available, then… This information can be used to transform the model into the image space –Create a 2D rendering of the 3D model –The book refers to this a Pose Consistency –The reverse problem can also be estimated
Model Matching Matching can proceed after feature extraction –Extract features from the image –Create the scene graph –Match the scene graph to the model graph using graph theoretic methods Matching can proceed during feature extraction –Guide the areas of concentration for the feature detectors
Model Matching Matching that proceeds after feature extraction –Depth first tree search –Can be a very, very slow process –Heuristics may help Anchor to the most important/likely objects Matching that proceeds during feature extraction –Set up processing windows within the image –System may hallucinate (see things that aren’t really there)
Model Matching Difficult to make the system… –Orientation dependent –Illumination dependent “Difficult” doesn’t mean “impossible” –Just means it’ll take more time to perform the search
Relaxation Labeling Formally stated –An iterative process that attempts to assign labels to objects based on local constraints (based on an object’s description) and global constraints (how the assignment affects the labeling of other objects) The technique has been used for many matching applications –Object labeling –Stereo correspondence –Motion correspondence –Model matching
Perceptual Grouping How many rectangles are in the picture?
Perceptual Grouping How many rectangles are in the picture? –One possible answer
Perceptual Grouping It depends on what the picture represents –What is the underlying model? Toy blocks? Projective aerial view of a building complex? Rectangles drawn on a flat piece of paper? –Was the image sensor noisy? (long lines got broken up) –Depending on the answer, you may solve the problem with Relaxation labeling Graph matching Neural network based training/learning
Summary High level vision is probably the least understood –It requires more than just an understanding of detectors –It requires understanding of the data structures used to represent the objects and the logic structures for reasoning about the objects This is where computer vision moves from the highly mathematical field of image processing into the symbolic field of artificial intelligence
Things To Do Reading for Next Week –Multi-Frame Processing Chapter 10 – The Geometry of Multiple Views Chapter 11 – Stereopsis
Final Exam See handout