Computational Vision CSCI 363, Fall 2012 Lecture 17 Stereopsis II
Random Dot Stereogram
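
Crossed and uncrossed disparity
[Figure: two points relative to the plane of fixation. Point 1, beyond the plane of fixation, has uncrossed (negative) disparity. Point 2, in front of the plane of fixation, has crossed (positive) disparity.]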
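
Angular disparity
[Figure: point A lies on the plane of fixation at distance D and subtends angle a at the two eyes; point B subtends angle b; i is the separation between the eyes. A point nearer than the plane of fixation has crossed (positive) disparity.]
If fixating on A, the disparity of B is given by:
d = b - a
d > 0 if B is closer than A
d < 0 if B is farther than A
The angle a is set by the fixation distance D and the eye separation i:
a = 2 tan^-1(i / (2D))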
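
The disparity formula above can be checked numerically. The following is a minimal sketch, assuming both points lie straight ahead on the midline between the eyes (so that a = 2 tan^-1(i / (2D)) applies to each) and assuming an eye separation of 6.5 cm, a value the slides do not give. The function names are illustrative only.

```python
import math

def vergence_angle(i, D):
    """Angle (radians) subtended at the two eyes by a midline point at
    distance D, for eye separation i: a = 2 * atan(i / (2 * D))."""
    return 2.0 * math.atan(i / (2.0 * D))

def angular_disparity(i, D_fix, D_point):
    """Disparity d = b - a of a point at D_point while fixating at D_fix."""
    a = vergence_angle(i, D_fix)    # angle for the fixated point A
    b = vergence_angle(i, D_point)  # angle for the other point B
    return b - a

EYE_SEPARATION = 0.065  # metres; an assumed value, not from the slides

d_near = angular_disparity(EYE_SEPARATION, 1.0, 0.8)  # B closer than A
d_far = angular_disparity(EYE_SEPARATION, 1.0, 1.5)   # B farther than A
print("closer point:  d > 0 ?", d_near > 0)
print("farther point: d < 0 ?", d_far < 0)
```

The sign convention matches the slide: a closer point gives crossed (positive) disparity, a farther point gives uncrossed (negative) disparity.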

Discrimination for near and far distances
The angle a changes rapidly with D when D is small.
The angle a changes slowly with D when D is large.
Humans can discriminate surfaces that differ in disparity by about 5 seconds of arc.
At D = 0.5 m, 5" of arc corresponds to 0.015 cm in distance.
At D = 5 m, 5" of arc corresponds to 2 cm in distance.
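
To see why the same 5" of arc covers a much larger depth interval at far distances, one can invert a = 2 tan^-1(i / (2D)) and ask how far a midline point must move toward the observer to raise the angle by 5". This is a rough sketch: the slides do not state the eye separation behind their figures, so the 6.5 cm used here is an assumption and the printed values need not match the slide exactly. What the sketch does show is the scaling, roughly with the square of D.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one second of arc in radians

def vergence_angle(i, D):
    """a = 2 * atan(i / (2 * D)) for a midline point at distance D."""
    return 2.0 * math.atan(i / (2.0 * D))

def depth_step(i, D, delta):
    """Distance a point at D must move closer so that its vergence
    angle increases by delta radians."""
    a = vergence_angle(i, D)
    D_near = i / (2.0 * math.tan((a + delta) / 2.0))  # invert the formula
    return D - D_near

EYE_SEPARATION = 0.065  # metres; assumed, not given in the slides

step_near = depth_step(EYE_SEPARATION, 0.5, 5 * ARCSEC)
step_far = depth_step(EYE_SEPARATION, 5.0, 5 * ARCSEC)
print(f"at 0.5 m: {step_near * 100:.4f} cm")
print(f"at 5 m:   {step_far * 100:.4f} cm")
print(f"ratio:    {step_far / step_near:.1f}")
```

A tenfold increase in viewing distance makes the smallest discriminable depth difference about a hundred times larger.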

Stereo processing
To determine depth from stereo disparity:
1. Extract the "features" from the left and right images.
2. For each feature in the left image, find the corresponding feature in the right image.
3. Measure the disparity between the two images of the feature.
4. Use the disparity to compute the 3D location of the feature.

The Correspondence problem
How do you determine which features from one image match features in the other image? This is known as the correspondence problem.
Matching is straightforward if each image has well-defined shapes or colors that can be matched.
Problem: random dot stereograms. Each image is only a field of dots, with no distinctive shapes or colors to match.
[Figure: making a stereogram, showing the left image and the right image.]
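
The slide's figure on making a stereogram is not recoverable, so the following is a minimal sketch of the usual construction, not necessarily the one shown in lecture: copy a random dot image, shift a central square sideways in the copy, and fill the uncovered strip with new random dots. The function name, image size, and square size are illustrative assumptions. Arrays are indexed [y, x], and a shift of d means a dot at x in the left image appears at x + d in the right image.

```python
import numpy as np

def make_stereogram(size=64, square=24, shift=2, seed=0):
    """Return (left, right) binary images whose central square is
    displaced horizontally by `shift` pixels in the right image."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(size, size))
    right = left.copy()
    lo = (size - square) // 2
    hi = lo + square
    # Paste the central square at its shifted position in the right image.
    right[lo:hi, lo + shift:hi + shift] = left[lo:hi, lo:hi]
    # Fill the strip uncovered by the shift with fresh random dots.
    if shift > 0:
        right[lo:hi, lo:lo + shift] = rng.integers(0, 2, size=(square, shift))
    elif shift < 0:
        right[lo:hi, hi + shift:hi] = rng.integers(0, 2, size=(square, -shift))
    return left, right

left, right = make_stereogram()
print(left.shape, right.shape)
```

Neither image on its own contains a visible square. The square exists only as a region of consistent disparity between the two images, which is why these stimuli make the correspondence problem hard.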

Using Constraints to Solve the Problem
To solve the correspondence problem, we need to make some assumptions (constraints) about how the matching is accomplished.
Constraints used by many computer vision stereo algorithms:
Uniqueness: Each point has at most one match in the other image.
Similarity: Each feature matches a similar feature in the other image (i.e. you cannot match a white dot with a black dot).
Continuity: Disparity tends to vary slowly across a surface. (Note: this is violated at depth edges.)
Epipolar constraint: Given a point in the image of one eye, the matching point in the image for the other eye must lie along a single line.

The Marr-Poggio algorithm
The Marr-Poggio algorithm uses the 4 constraints to find good matches for a stereo pair.
The input is a pair of images, left and right, consisting of 1's and 0's (like a random dot stereogram, representing white and black dots).

Output of Marr-Poggio
The output of the algorithm is a 3D array, C(x, y, d), where d is the disparity (limited to +/- 3 pixels). Each cell holds 0 or 1.
A 1 is placed in a given position if there is evidence for that disparity at that position.
A 0 indicates there is little or no evidence for that disparity at that position.
Initially a 1 is placed in all possible matched disparity positions.

Using Constraints to solve the problem
Our goal is to use an iterative process to change the state of the cells in the array until it reaches a final state with the correct disparities. We will use constraints to change the support for different matches.
Two constraints are already incorporated in the initial state:
Similarity: We only match 1's with 1's.
Epipolar: We only match 1's with other 1's in the same horizontal row.
Initially:
C(x, y, d) = 1 if L(x, y) = R(x + d, y)
C(x, y, d) = 0 otherwise
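
The initial state can be sketched in a few lines of NumPy. This is an illustrative sketch, not the lecture's own code: the function name, the [y, x, k] array layout (with k = d + max_d), and the `ones_only` flag are all assumptions. The flag exists because the slide's formula marks a match wherever L(x, y) = R(x + d, y), while its prose says only 1's are matched with 1's; the default follows the prose.

```python
import numpy as np

def initial_state(L, R, max_d=3, ones_only=True):
    """Build C[y, x, k] for disparities d = -max_d..max_d (k = d + max_d).
    A cell is 1 where L(x, y) equals R(x + d, y) on the same row."""
    h, w = L.shape
    C = np.zeros((h, w, 2 * max_d + 1), dtype=np.uint8)
    for k, d in enumerate(range(-max_d, max_d + 1)):
        # Left columns x for which x + d still falls inside the right image.
        x0 = max(0, -d)
        x1 = min(w, w - d)
        match = L[:, x0:x1] == R[:, x0 + d:x1 + d]
        if ones_only:
            match &= L[:, x0:x1] == 1  # match 1's only with 1's
        C[:, x0:x1, k] = match
    return C

# One white dot at x = 1 on the left and at x = 3 on the right: disparity +2.
L = np.array([[0, 1, 0, 0, 0, 0]])
R = np.array([[0, 0, 0, 1, 0, 0]])
C = initial_state(L, R)
print(np.argwhere(C == 1))  # indices are (y, x, k), with d = k - 3
```

Restricting the comparison to the same row is what builds in the epipolar constraint; comparing pixel values is what builds in similarity.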

Examining the 3D array
We can examine the 3D array by looking at 2D slices. These give us evidence for and against a given match.
Consider the possible match at (4, 4, 2), that is, x = 4, y = 4, disparity d = +2.

Using the uniqueness constraint
[Figure: x-d slice for y = 4.]
We cannot have more than one disparity match for each position, so the number of other matches in a given column of the slice decreases support for a given match.
Here the uniqueness constraint is violated: matches at (4, 4, -1) and (4, 4, -3) provide evidence against the match at (4, 4, 2).

Continuity
[Figure: x-y plane at d = +2.]
The continuity constraint implies that we expect the neighbors of a true match to have similar disparities.
We have many 1's in neighboring positions at disparity +2. This provides support for the match at (4, 4, 2).

Other Possible matches for the right point
If the point at (4, 4) in the left image has a disparity of +2, it matches the point at (6, 4) in the right image.
If there are other points in the left image that can match (6, 4) in the right image, this will count against a disparity of +2 for (4, 4).

Another uniqueness measure
[Figure: x-d slice through y = 4.]
If C(4, 4, 2) is correct, then the dot in the left image at (4, 4) matches the dot in the right image at (6, 4). If there is strong evidence for a different match for the right image dot, this is evidence against the match at (4, 4, 2).
1's along the diagonal reflect other possible matches for the right image dot. These decrease support for the match at (4, 4, 2).

Changing the states
We incorporate the uniqueness and continuity constraints by computing a new state for each disparity at each location over numerous iterations.
We compute each new state by taking into account the number of nearby neighbors with the same disparity (this provides support).
We also take into account the number of other possible disparity matches that are supported at that location (this decreases support).

Putting it all together
S = support
C_t(x, y, d) = state at time t
E = positive evidence
I = negative evidence
e = weighting factor
T = threshold

S = C_t(x, y, d) + E - e I
C_{t+1}(x, y, d) = 1 if S >= T
C_{t+1}(x, y, d) = 0 if S < T
Iterate until values don't change much with each iteration.

Example: with E = 10, I = 3, e = 2.0 and T = 3.0,
S(4, 4, 2) = 1 + 10 - (2.0)(3) = 5
Since 5 >= 3, C_{t+1}(4, 4, 2) = 1.
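
The update rule can be sketched as a direct loop over the array. This is a minimal illustration under stated assumptions, not the lecture's implementation: E is counted as the number of 1's at the same disparity in a square x-y neighborhood (the `radius` is an assumed parameter, since the slides do not give a neighborhood size), and I is counted as the number of other 1's in the same column of the x-d slice plus those on the diagonal that share the same right image position x + d. The array layout C[y, x, k] with d = k - max_d and all function names are illustrative.

```python
import numpy as np

def support(c, E, I, eps):
    """S = C_t + E - eps * I, as on the slide."""
    return c + E - eps * I

def update(C, eps=2.0, T=3.0, radius=1):
    """One iteration: return C_{t+1} computed from C_t."""
    h, w, nd = C.shape
    max_d = nd // 2
    new = np.zeros_like(C)
    for y in range(h):
        for x in range(w):
            for k in range(nd):
                c = int(C[y, x, k])
                # E: neighbors in the x-y plane with the same disparity.
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                E = int(C[y0:y1, x0:x1, k].sum()) - c
                # I, part 1: other disparities at the same left position.
                I = int(C[y, x, :].sum()) - c
                # I, part 2: other left dots matching the same right dot.
                x_right = x + (k - max_d)
                for k2 in range(nd):
                    x2 = x_right - (k2 - max_d)
                    if k2 != k and 0 <= x2 < w:
                        I += int(C[y, x2, k2])
                new[y, x, k] = 1 if support(c, E, I, eps) >= T else 0
    return new

def run(C, max_iters=20, **kwargs):
    """Iterate until the state stops changing or max_iters is reached."""
    for _ in range(max_iters):
        new = update(C, **kwargs)
        if np.array_equal(new, C):
            break
        C = new
    return C

# The slide's worked example: 1 + 10 - 2.0 * 3 = 5, which is >= T = 3.
print(support(1, 10, 3, 2.0))

# A 5x5 surface at d = +2 (k = 5) plus one stray match at d = -2 (k = 1).
C0 = np.zeros((5, 5, 7), dtype=np.uint8)
C0[:, :, 5] = 1
C0[2, 2, 1] = 1
C1 = update(C0)
print(C1[2, 2, 5], C1[2, 2, 1])
```

In the small example, the match at the center of the surface keeps its 1 because its neighbors support it, while the isolated stray match has no neighbors and one competitor, so it drops to 0.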

Demonstration of how it works
[Figure: an input stereogram and the state of the network over time. Brightness indicates disparity.]

Summary of Marr-Poggio
Horizontal lines: constant right eye position.
Vertical lines: constant left eye position.
Diagonal lines: constant disparity.

Limitations of the Marr-Poggio Algorithm
Problems with the Marr-Poggio algorithm:
Does not specify what features are being matched.
Does not make use of zero crossings or multiple spatial scales.
Does not make use of vergence eye movements.

Importance of Vergence Eye Movements:
Humans can only fuse images with disparities of about +/- 10 min of arc. This range is called Panum's fusional area.
If we need to fuse the images of an object that is nearer or farther than this, we make vergence eye movements to change the plane of fixation.
Humans make many vergence eye movements as we scan the environment.