Last Week Recognized that a 2D image is a representation of a 3D scene and thus contains a consistent interpretation –Labeled edges –Labeled vertices Matching techniques for object recognition –Graph theoretic –Relaxation –Perceptual organization (neural networks)
This Week Look at direct measurement of 3D attributes via stereo cameras Look at other uses of matching –Stereo correspondence –Motion correspondence
Stereo Vision Goal is to extract scene depth via multiple monocular images with a passive sensor –Note that this can be done by other “active” means such as LIDAR (LIght Detection And Ranging)
Stereo Vision Humans do it well from a single image and very, very well through stereo images Not well understood what the mechanism is –We understand the biological design, but not the exact algorithm Goal of computer vision is not to mimic the mechanics of the biological system, but to mimic the functionality of the system
Stereo Vision Depth information will be used to… –Differentiate objects from background –Differentiate objects from one another –Expose camouflaged objects Basic method is to take advantage of the lateral displacement of the image of a 3D object in two cameras with different, but overlapping views –Lateral displacement is also known as disparity
Stereo Vision Two sub-problems –Correspondence problem The problem of measuring the disparity of each point in the two eye (camera) projections –Interpretation problem The use of disparity information to recover the orientation and distance of surfaces in the scene
Stereo Algorithmic Steps Basic steps to be performed in any stereo imaging system –Image Acquisition –Camera Modeling –Feature Extraction –Image Matching –Depth Determination –Depth Interpolation
Image Acquisition Just as the name implies Capturing two images with a very specific camera geometry
Camera Modeling Related to Image Acquisition For accurate depth results the camera parameters must be known Also, the relationship between the two cameras must be known
Stereo Imaging Geometry [Figure: left and right cameras, each with focal length f and its own camera axis, separated by stereo baseline B, viewing a common scene and producing a left image and a right image] The result is two images that are slightly different
Feature Extraction These are the image objects that will be matched between the left and right images –Gray level pixel based –Edge based –Line based –Region based –Hybrid approaches All techniques have been tried –All provide some degree of success –All have drawbacks
Image Matching By far the most difficult part of the stereo problem Also called the “stereo correspondence problem” When people “study” stereo imaging, this is generally what they are looking at The question is: Which parts (pixels, edges, lines, etc.) of the left image correspond to which parts of the right image?
Image Matching Gray level based –Take a section of one image and use it as a convolution mask over the other Edge based –Extract edges, then take a section of one edge image and use it as a convolution mask over the other Line based –Extract edges, form line segments, then match using a relaxation technique Region based –Extract regions, then match using a relaxation technique Hybrid approach –Use matched regions (or lines) as guides to further pixel level matches
Image Matching Issues Density of depth map –Would like to have a depth measurement at every image pixel This means a correspondence between every pixel in each image must be made –Clearly difficult (if not impossible) to do Gray level matching is the only real hope All other approaches will not provide a dense map, especially the region based approach Thus the study of hybrid algorithms
Depth Map CSC508
Image Matching Issues Photometric variation –The two cameras image the scene from two different viewpoints, by definition –Thus the lighting on the scene differs for the two cameras Shadows, reflectance, etc. –Affects all matching and feature extraction techniques
Image Matching Issues Occlusion –When the image of one object is blocked by another in one of the two cameras It’s a 3D scene so this will happen! –Some features will show up in one image and not the other thus making matching impossible –Affects all matching and feature extraction techniques
Image Matching Issues Repetitive texture –e.g., a brick wall (or any other regular, repeated pattern texture) –Makes the matching process very difficult, although some sort of relaxation algorithm may address the issue –Region based matching may be used to address this issue
Image Matching Issues Lack of texture –e.g., smooth, featureless objects –If there are no features, there is no way to match –Region based matching may be used to address this issue
Depth Determination It’s all math! –And relatively simple math at that.
Depth Determination [Figure: stereo geometry showing world point P_w(X_w, Y_w, Z_w), its projections P_l(X_l, Y_l) in the left image and P_r(X_r, Y_r) in the right image, the points (X_w, 0, Z_w) and (X_w, 0, 0), focal length f, stereo baseline B, and the left and right camera axes]
Depth Determination Depth (distance of a pixel location to the baseline) can be determined through simple algebraic and geometric relationships –With parallel camera axes, similar triangles give Z_w = f B / (X_l - X_r) –The quantity (X_l - X_r) is referred to as the stereo disparity, i.e. the difference in how the two cameras saw an object
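The depth computation can be sketched in a few lines. This is a minimal illustration, assuming the standard parallel-axis geometry (both cameras share focal length f and are separated by baseline B) so that Z = f B / (X_l - X_r). The function name and the sample numbers are hypothetical, not values from the lecture.

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Return depth Z for one matched point pair under parallel-axis stereo.

    Assumes x_left and x_right are horizontal image coordinates expressed in
    the same units as focal_length; the result is in the units of baseline.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero disparity means the point is at infinity; a negative value
        # indicates a bad match under this assumed geometry.
        return float("inf")
    return focal_length * baseline / disparity


if __name__ == "__main__":
    # Hypothetical numbers: f = 800 pixels, B = 0.1 m, disparity = 16 pixels.
    print(depth_from_disparity(416.0, 400.0, 800.0, 0.1))
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small disparity) than for near ones.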
Depth Interpolation We want to describe surfaces, not individual points In the event that we don’t get a dense depth map (and we rarely do) we must interpolate the missing points –What we get is called a sparse depth map
Depth Interpolation Three basic methods –Relaxation – surface fitting with constraints Similar in nature to the relaxation labeling –Analytic – surface fitting to a specified model (equation) –Heuristic – use of local neighborhoods and predetermined rules Use of “educated guesses” and “higher level scene knowledge” – AI technique
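A relaxation-style interpolation can be sketched as repeated neighbor averaging. This is only one simple possibility, assuming the sparse depth map is a 2D list with None marking missing values and that smoothness is the only constraint; the function name and iteration count are hypothetical.

```python
def interpolate_depth(sparse, iterations=200):
    """Fill missing depths (None) in a 2D grid by iterative neighbor averaging.

    Measured depths are held fixed as constraints; unknown cells relax toward
    the mean of their 4-connected neighbors, favoring a smooth surface.
    """
    rows, cols = len(sparse), len(sparse[0])
    known = [v for row in sparse for v in row if v is not None]
    if not known:
        raise ValueError("need at least one measured depth")
    start = sum(known) / len(known)  # initial guess for every unknown cell
    depth = [[start if v is None else float(v) for v in row] for row in sparse]
    for _ in range(iterations):
        updated = [row[:] for row in depth]
        for r in range(rows):
            for c in range(cols):
                if sparse[r][c] is not None:
                    continue  # measured value: keep as a hard constraint
                neighbors = [depth[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c),
                                            (r, c - 1), (r, c + 1))
                             if 0 <= rr < rows and 0 <= cc < cols]
                updated[r][c] = sum(neighbors) / len(neighbors)
        depth = updated
    return depth


if __name__ == "__main__":
    print(interpolate_depth([[0.0, None, None, 3.0]]))
```

A fixed iteration count keeps the sketch short; a practical version would more likely stop when the largest per-cell change falls below a tolerance.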
Assumptions To Make Life Easier From psychological studies… –In light of ambiguities in the matching problem, matches which preserve “figural continuity” are to be preferred –That is, we prefer smooth surfaces over sharp changes –This isn’t really a problem since the sharp changes [in all likelihood] won’t result in ambiguities
Assumptions To Make Life Easier Epipolarity (epipolar lines) –The camera geometry can be defined such that a point feature in one image must lie on a specific line in the other image –This constrains the search to multiple 1D problems
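The 1D search that the epipolar constraint allows can be sketched as follows. This is a minimal illustration, assuming rectified images (epipolar lines coincide with image rows), grayscale rows stored as Python lists, and a sum-of-absolute-differences cost; the function name, window size, and disparity range are hypothetical.

```python
def best_disparity(left_row, right_row, x, half_window=1, max_disparity=4):
    """Find the disparity of pixel x in left_row by searching right_row.

    Under the rectified assumption the match for left pixel x lies at
    x - d in the right row for some d >= 0, so the search is one-dimensional.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        for offset in range(-half_window, half_window + 1):
            xl = x + offset
            xr = x + offset - d
            if xl < 0 or xl >= len(left_row) or xr < 0 or xr >= len(right_row):
                cost = float("inf")  # window falls off the row: reject d
                break
            cost += abs(left_row[xl] - right_row[xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


if __name__ == "__main__":
    right = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
    left = [10, 10, 10, 10, 50, 90, 50, 10, 10, 10]  # same pattern, shifted by 2
    print(best_disparity(left, right, x=5))
```

Without the epipolar constraint the inner loop would have to range over candidate rows as well as columns, turning each match into a 2D search.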
Epipolar Lines [Figure: stereo geometry showing world point P_w(X_w, Y_w, Z_w), its projections P_l(X_l, Y_l) in the left image and P_r(X_r, Y_r) in the right image, focal length f, stereo baseline B, and the left and right camera axes]
Stereo Pair Images Left Camera Right Camera
Depth Map Rendering
Gray Level Rendering
Final Thoughts Yes, it can be done with more than two cameras –This improves the accuracy of (removes ambiguity from) the match Yes, it can be done with one camera –Simply move the camera along the baseline snapping pictures as it goes
Motion Processing Whereas stereo processing worked on two (or more) frames taken at the same time, motion processing works on two (or more) frames taken at different times
Motion Processing Uses for motion processing –Scene segmentation –Motion detection (is something moving?) Security applications –Motion estimation (how is the object moving?) MPEG uses this to predict future frames –3D structure determination Multiple views of an object as it moves –Object tracking Defense industry makes great use of this –Separate camera motion from object motion Camera stabilization
Motion Processing Approaches range from simple… –Frame-to-frame subtraction to intermediate… –Frame-to-frame correspondence to difficult… –Statistical based processing for tracking
Correspondence The frame-to-frame correspondence problem is essentially the same as that for stereo processing –But, it may be more difficult since… objects may be moving towards the camera (they get larger) objects may be moving away from the camera (they get smaller) objects may be rotating (they change shape)
Frame Subtraction Avoids the correspondence operation altogether Problems arise in that objects lacking texture do not get detected We also must address the threshold selection problem Assumes that the scene changes will be small due to the short time duration between frames Variations include learning the background (static scene) and subtracting it from the live (dynamic) scene
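Frame subtraction and the learned-background variation can be sketched as below. This is a minimal illustration, assuming frames are equally sized 2D lists of gray levels; the function names, the threshold of 20, and the running-average update rule are assumptions rather than the lecture's method.

```python
def changed_pixels(frame_a, frame_b, threshold=20):
    """Return a binary mask marking pixels whose gray level changed.

    A pixel is flagged (1) when the absolute difference between the two
    frames exceeds the threshold, and left at 0 otherwise.
    """
    return [[1 if abs(a - b) > threshold else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def update_background(background, frame, rate=0.05):
    """Blend the new frame into a learned background (running average)."""
    return [[(1 - rate) * bg + rate * px for bg, px in zip(row_bg, row_px)]
            for row_bg, row_px in zip(background, frame)]


if __name__ == "__main__":
    frame_n = [[10, 10, 10], [10, 200, 10]]
    frame_n1 = [[10, 10, 10], [10, 12, 205]]  # bright object moved one pixel
    print(changed_pixels(frame_n, frame_n1))
```

The threshold trades false alarms from sensor noise against missed low-contrast motion, which is the threshold selection problem. A uniformly colored object that moves only slightly changes few pixels, which is why textureless objects can go undetected.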
Frame Subtraction [Figure panels: Frame(n); Frame(n + 1); Frame(n) - Frame(n + 1), enhanced]
Optical Flow Apparent motion of the brightness patterns within an image You end up with pictures as shown –In this case the camera was moving towards the object
Another Example
Optical Flow It's basically frame-to-frame subtraction with a lot more information From the optical flow field various parameters can be measured –Object shape –Object segmentation –Camera motion –Multiple object motions
Motion Estimation in MPEG Select an image block from frame f_n Select a larger image block from frame f_{n+1} Center the f_n block on the f_{n+1} block Compute correlation between the two blocks Spiral the f_n block outward on the f_{n+1} block until the correlation yields a suitable response [Figure: image block from frame f_n; image block from frame f_{n+1}]
Motion Estimation in MPEG The basic scheme using gray level correlation (matching) works because the premise is that there will be very small motions between frames In the event of large motions or illumination changes (or any other “drastic” changes) the system reinitializes and doesn’t try to use any motion information
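The block search can be sketched as follows. This is a simplified illustration, not the MPEG standard's procedure: it swaps in a sum of absolute differences (SAD) for the correlation score and approximates the outward spiral by visiting candidate displacements in order of increasing distance from the center. The function name and the acceptance threshold are hypothetical.

```python
def find_motion_vector(block, search_area, accept_sad=0):
    """Locate block inside search_area, trying small displacements first.

    block is a b x b list of gray levels from frame n; search_area is a larger
    s x s region from frame n+1 centered on the block's original position.
    Returns (dy, dx, sad) relative to the centered position.
    """
    b, s = len(block), len(search_area)
    margin = (s - b) // 2  # largest displacement that keeps the block inside
    offsets = [(dy, dx) for dy in range(-margin, margin + 1)
               for dx in range(-margin, margin + 1)]
    # Visit candidates in order of increasing distance from the center, a
    # simple stand-in for the outward spiral.
    offsets.sort(key=lambda o: (max(abs(o[0]), abs(o[1])),
                                abs(o[0]) + abs(o[1])))
    best = None
    for dy, dx in offsets:
        sad = sum(abs(block[r][c] - search_area[margin + dy + r][margin + dx + c])
                  for r in range(b) for c in range(b))
        if best is None or sad < best[2]:
            best = (dy, dx, sad)
        if sad <= accept_sad:
            break  # good enough: stop searching outward
    return best


if __name__ == "__main__":
    block = [[9, 8], [7, 6]]
    area = [[0] * 6 for _ in range(6)]
    for r in range(2):
        for c in range(2):
            area[3 + r][1 + c] = block[r][c]  # block moved down 1, left 1
    print(find_motion_vector(block, area))
```

Searching near the center first reflects the small-motion premise: when motion between frames is small, an acceptable match is usually found after only a few candidates.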
Object Tracking This is essentially motion prediction After observing a moving object can we predict where it will appear in the next frame?
Object Tracking Can be as simple as a low pass filter –A weighted average of the object’s position in previous frames –Heavily weight the newest frames Can be a complex statistical model taking into account noisy measurements –Kalman Filter As your confidence in the prediction increases the window in which you must perform the correspondence decreases in size –Basically, you’re trying to reduce the time to search
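The low pass filter option can be sketched as an exponentially weighted average of observed positions. This is a minimal illustration under assumed names and weights; it is not a Kalman filter and it models no velocity.

```python
def smoothed_position(positions, newest_weight=0.6):
    """Exponentially weighted average of (x, y) positions, newest weighted most.

    positions is ordered oldest to newest; newest_weight in (0, 1] controls
    how strongly the latest observation dominates the estimate.
    """
    if not positions:
        raise ValueError("need at least one observed position")
    est_x, est_y = positions[0]
    for x, y in positions[1:]:
        est_x = newest_weight * x + (1 - newest_weight) * est_x
        est_y = newest_weight * y + (1 - newest_weight) * est_y
    return est_x, est_y


if __name__ == "__main__":
    track = [(0, 0), (2, 1), (4, 2), (6, 3)]  # hypothetical object centroids
    print(smoothed_position(track))
```

Because the estimate is an average of past positions, it lags behind an object that moves steadily in one direction. That lag is one motivation for models that also estimate velocity and measurement noise, such as the Kalman filter.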
Summary We have merely touched on the basics of Computer Vision There is much, much more Hopefully, with this introduction you will be able to pursue other topic areas on your own
Things To Do Final Exam due next week Course evaluation this week (online)