Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted.

Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted from Lana Lazebnik, Silvio Saverse, who in turn adapted slides from Steve Seitz, Rick Szeliski, Martial Hebert, Mark Pollefeys, and others

Last class Estimating 3D points and depth – Triangulation from corresponding points – Dense stereo Projective structure from motion

This class Factorization method for structure from motion Feature tracking Optical flow (dense tracking)

Structure from motion under orthographic projection 3D Reconstruction of a Rotating Ping-Pong Ball C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.Shape and motion from image streams under orthography: A factorization method. Reasonable choice when Change in depth of points in scene is much smaller than distance to camera Cameras do not move towards or away from the scene

Start with an affine camera model Affine projection is a linear mapping + translation in inhomogeneous coordinates x X a1a1 a2a2 Projection of world origin Does this ever happen?

Get rid of b by shifting to centroid 2d normalized point (observed) 3d normalized point Linear (affine) mapping

Suppose we know 3D points and affine camera parameters … then, we can compute the observed 2d positions of each point Camera Parameters (2mx3) 3D Points (3xn) 2D Image Points (2mxn) What rank is the matrix of 2D points?

Affine structure from motion Centering: subtract the centroid of the image points For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points After centering, each normalized point x ij is related to the 3D point X i by

Affine structure from motion Given: m images of n fixed 3D points: x ij = A i X j + b i, i = 1,…, m, j = 1, …, n Problem: use the mn correspondences x ij to estimate m projection matrices A i and translation vectors b i, and n points X j The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom): We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity) Thus, we must have 2mn >= 8m + 3n – 12 For two views, we need four point correspondences

What if we instead observe corresponding 2d image points? Can we recover the camera parameters and 3d points? cameras (2 m) points (n)

Affine structure from motion Let’s create a 2m × n data (measurement) matrix: cameras (2 m × 3) points (3 × n) The measurement matrix D = MS must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.Shape and motion from image streams under orthography: A factorization method.

Factorizing the measurement matrix Source: M. Hebert AX

Factorizing the measurement matrix Source: M. Hebert Singular value decomposition of D:

Factorizing the measurement matrix Source: M. Hebert Obtaining a factorization from SVD:

Affine ambiguity The decomposition is not unique. We get the same D by using any 3×3 matrix C and applying the transformations A → AC, X →C -1 X That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example) Source: M. Hebert

Orthographic: image axes are perpendicular and of unit length Eliminating the affine ambiguity x X a1a1 a2a2 a 1 · a 2 = 0 |a 1 | 2 = |a 2 | 2 = 1 Source: M. Hebert

Solve for orthographic constraints Solve for L = CC T Recover C from L by Cholesky decomposition: L = CC T Update A and X: A = AC, X = C -1 X where ~ ~ Three equations for each image i

Algorithm summary Given: m images and n tracked features x ij For each image i, center the feature coordinates Construct a 2m × n measurement matrix D: – Column j contains the projection of point j in all views – Row i contains one coordinate of the projections of all the n points in image i Factorize D: – Compute SVD: D = U W V T – Create U 3 by taking the first 3 columns of U – Create V 3 by taking the first 3 columns of V – Create W 3 by taking the upper left 3 × 3 block of W Create the motion and shape matrices: – M = U 3 W 3 ½ and S = W 3 ½ V 3 T (or M = U 3 and S = W 3 V 3 T ) Eliminate affine ambiguity Source: M. Hebert

Dealing with missing data So far, we have assumed that all points are visible in all views In reality, the measurement matrix typically looks something like this: One solution: – solve using a dense submatrix of visible points (as in last lecture) – Iteratively add new cameras cameras points

Reconstruction results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.Shape and motion from image streams under orthography: A factorization method.

Dealing with missing data Possible solution: decompose matrix into dense sub- blocks, factorize each sub-block, and fuse the results – Finding dense maximal sub-blocks of the matrix is NP- complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement (1)Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) (3) Solve for a new camera that sees at least three known 3D points (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects.

Recovering motion Feature-tracking – Extract visual features (corners, textured areas) and “track” them over multiple frames Optical flow – Recover image motion at each pixel from spatio-temporal image brightness variations (optical flow) B. Lucas and T. Kanade. An iterative image registration technique with an application toAn iterative image registration technique with an application to stereo vision.stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981. Two problems, one registration method

Feature tracking Last problem required corresponding points in images If motion is small, tracking is an easy way to get them

Feature tracking Challenges – Need good features to track – Points may appear or disappear: need to be able to add/delete tracked points – Some points may change appearance over time (e.g., due to rotation, moving into shadows, etc.) – Drift: small errors can accumulate if appearance model is updated

Feature tracking Given two subsequent frames, estimate the point translation Key assumptions of Lucas-Kanade Tracker Brightness constancy: projection of the same point looks the same in every frame Small motion: points do not move very far Spatial coherence: points move like their neighbors I(x,y,t–1)I(x,y,t)I(x,y,t)

Brightness Constancy Equation: Linearizing right hand side using Taylor expansion: The brightness constancy constraint I(x,y,t–1)I(x,y,t)I(x,y,t) Hence, Image derivative along x

The brightness constancy constraint How many equations and unknowns per pixel? The component of the motion perpendicular to the gradient (i.e., parallel to the edge) cannot be measured edge (u,v)(u,v) (u’,v’) gradient (u+u’,v+v’) If (u, v ) satisfies the equation, so does (u+u’, v+v’ ) if One equation (this is a scalar equation!), two unknowns (u,v) Can we use this equation to recover image motion (u,v) at each pixel?

The aperture problem Actual motion

The aperture problem Perceived motion

The barber pole illusion http://en.wikipedia.org/wiki/Barberpole_illusion

Solving the ambiguity… How to get more equations for a pixel? Spatial coherence constraint Assume the pixel’s neighbors have the same (u,v) – If we use a 5x5 window, that gives us 25 equations per pixel B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Least squares problem: Solving the ambiguity…

Matching patches across images Overconstrained linear system The summations are over all pixels in the K x K window Least squares solution for d given by

Conditions for solvability – Optimal (u, v) satisfies Lucas-Kanade equation Does this remind you of anything? When is This Solvable? A T A should be invertible A T A should not be too small due to noise –eigenvalues 1 and 2 of A T A should not be too small A T A should be well-conditioned – 1 / 2 should not be too large ( 1 = larger eigenvalue) Criteria for Harris corner detector

Eigenvectors and eigenvalues of A T A relate to edge direction and magnitude The eigenvector associated with the larger eigenvalue points in the direction of fastest intensity change The other eigenvector is orthogonal to it M = A T A is the second moment matrix ! (Harris corner detector…)

Interpreting the eigenvalues 1 2 “Corner” 1 and 2 are large, 1 ~ 2 1 and 2 are small “Edge” 1 >> 2 “Edge” 2 >> 1 “Flat” region Classification of image points using eigenvalues of the second moment matrix:

Edge – gradients very large or very small – large  1, small 2

Low-texture region – gradients have small magnitude – small  1, small 2

High-texture region – gradients are different, large magnitudes – large  1, large 2

The aperture problem resolved Actual motion

The aperture problem resolved Perceived motion

Dealing with larger movements: Iterative refinement 1.Initialize (u,v) = (0,0) 2.Compute (u,v) by 3.Shift window by (u, v) 4.Repeat steps 2-3 until small change Only I t changes per iteration 2 nd moment matrix for feature patch in first image displacement I t = I(x, y, t-1) - I(x+u, y+v, t)

image I image J Gaussian pyramid of image 1 (t)Gaussian pyramid of image 2 (t+1) image 2 image 1 Dealing with larger movements: coarse-to- fine registration run iterative L-K upsample......

Shi-Tomasi feature tracker Find good features using eigenvalues of second- moment matrix – Key idea: “good” features to track are the ones whose motion can be estimated reliably From frame to frame, track with Lucas-Kanade – This amounts to assuming a translation model for frame-to-frame feature movement Check consistency of tracks by affine registration to the first observed instance of the feature – Affine model is more accurate for larger displacements – Comparing to the first frame helps to minimize drift J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.Good Features to Track

Tracking example J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.Good Features to Track

Summary of KLT tracking Find a good point to track (harris corner) Use intensity second moment matrix and difference across frames to find displacement Iterate and use coarse-to-fine search to deal with larger movements When creating long tracks, check appearance of registered patch against appearance of initial patch to find points that have drifted

Picture courtesy of Selim Temizer - Learning and Intelligent Systems (LIS) Group, MIT Optical flow Vector field function of the spatio-temporal image brightness variations

Motion and flow Many slides adapted from S. Seitz, R. Szeliski, M. Pollefeys, S. Saverese, L. Lazebnik

Motion and perceptual organization Sometimes, motion is the only cue

Motion and perceptual organization Even “impoverished” motion data can evoke a strong percept G. Johansson, “Visual Perception of Biological Motion and a Model For Its Analysis", Perception and Psychophysics 14, 201-211, 1973.

Uses of motion Estimating 3D structure Segmenting objects based on motion cues Learning and tracking dynamical models Recognizing events and activities Improving video quality (motion stabilization)

Motion field The motion field is the projection of the 3D scene motion into the image What would the motion field of a non-rotating ball moving towards the camera look like?

Optical flow Definition: optical flow is the apparent motion of brightness patterns in the image Ideally, optical flow would be the same as the motion field Have to be careful: apparent motion can be caused by lighting changes without any actual motion – Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination

Lucas-Kanade Optical Flow Same as Lucas-Kanade feature tracking, but for each pixel – As we saw, works better for textured pixels Operations can be done one frame at a time, rather than pixel by pixel – Efficient

Multi-resolution Lucas Kanade Algorithm

60 Iterative Refinement Iterative Lukas-Kanade Algorithm 1.Estimate velocity at each pixel by solving Lucas- Kanade equations 2.Warp I(t-1) towards I(t) using the estimated flow field - use image warping techniques 3.Repeat until convergence * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

image I image J Gaussian pyramid of image 1 (t)Gaussian pyramid of image 2 (t+1) image 2 image 1 Coarse-to-fine optical flow estimation run iterative L-K warp & upsample......

image I image H Gaussian pyramid of image 1Gaussian pyramid of image 2 image 2 image 1 u=10 pixels u=5 pixels u=2.5 pixels u=1.25 pixels Coarse-to-fine optical flow estimation

Example * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Multi-resolution registration * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Optical Flow Results * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Errors in Lucas-Kanade The motion is large – Possible Fix: Descriptor matching A point does not move like its neighbors – Possible Fix: Motion segmentation Brightness constancy does not hold – Possible Fix: Gradient constancy

State-of-the-art optical flow Start with something similar to Lucas-Kanade + gradient constancy + energy minimization with smoothing term + region matching + descriptor matching (long-range) Large displacement optical flowLarge displacement optical flow, Brox et al., CVPR 2009

Stereo vs. Optical Flow Similar dense matching procedures Why don’t we typically use epipolar constraints for optical flow? B. Lucas and T. Kanade. An iterative image registration technique with an application toAn iterative image registration technique with an application to stereo vision.stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Summary Major contributions from Lucas, Tomasi, Kanade – Structure from motion – Tracking feature points – Optical flow Key ideas – Factorization for special case of SFM – By assuming brightness constancy, truncated Taylor expansion leads to simple and fast patch matching across frames – Coarse-to-fine registration

Next class Kalman filter tracking (with David)

Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted.

Similar presentations

Presentation on theme: "Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted.

Similar presentations

Presentation on theme: "Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted."— Presentation transcript:

Similar presentations

About project

Feedback