The Measurement of Visual Motion P. Anandan Microsoft Research
Direct Methods: Methods for motion and/or shape estimation, which recover the unknown parameters directly from measurable image quantities at each pixel in the image. Minimization step: Direct methods: Error measure based on dense measurable image quantities. Feature-based methods: Error measure based on distances of destinct features.
WHY BOTHER Visual Motion can be annoying –Camera instabilities, jitter –Measure it. Remove it. Visual Motion indicates dynamics in the scene –Moving objects, behavior –Track objects and analyze trajectories Visual Motion reveals spatial layout of the scene –Motion parallax
Video Enhancement Visual Motion can be annoying –Camera instabilities, jitter –Measure it. Remove it.
Temporal Information Visual Motion indicates dynamics in the scene –Moving objects, behavior –Track objects and analyze trajectories
Spatial Layout Visual Motion reveals spatial layout of the scene –Motion parallax Sprite Viewer Demo
Classes of Techniques Feature-based methods –Extract salient visual features (corners, textured areas) and track them over multiple frames –Analyze the global pattern of motion vectors of these features –Sparse motion fields, but possibly robust tracking –Suitable especially when image motion is large (10-s of pixels) Direct-methods –Directly recover image motion from spatio temporal image brightness variations –Global motion parameters directly recovered without an intermediate feature motion calculation –Dense motion fields, but more sensitive to appearance variations –Suitable for video and when image motion is small (< 10 pixels) Our Focus Today
Brightness Constancy Equation: The Brightness Constraint Or, better still, Minimize : Linearizing (assuming small (u,v)):
Gradient Constraint (or the Optical Flow Constraint) Minimizing: The gradient constraint – only one constraint for each pixel In general Hence,
Aperture Problem and Normal Flow
The gradient constraint: Defines a line in the (u,v) space u v Normal Flow:
Local Patch Analysis
Combining Local Constraints u v etc.
Patch Translation Minimizing Assume a single velocity for all pixels within an image patch On the LHS: sum of the 2x2 outer product tensor of the gradient vector
Singularities and the Aperture Problem Let Algorithm: At each pixel compute by solving M is singular if all gradient vectors point in the same direction -- e.g., along an edge -- of course, trivially singular if the summation is over a single pixel -- i.e., only normal flow is available (aperture problem) Corners and textured areas are OK and
Iterative Refinement Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation Warp one image toward the other using the estimated flow field (easier said than done) Refine estimate by repeating the process
Motion and Gradients Consider 1-d signal; assume linear function of x x I t=0 t=1 t
Iterative refinement x t x t t x BUT!!
Limits of the gradient method 1.Fails when intensity structure within window is poor 2.Fails when the displacement is large (typical operating range is motion of 1 pixel) –Linearization of brightness is suitable only for small displacements Also, brightness is not strictly constant in images –actually less problematic than it appears, since we can pre-filter images to make them look similar
Pyramids Pyramids were introduced as a multi-resolution image computation paradigm in the early 80s. The most popular pyramid is the Burt pyramid, which foreshadows wavelets Two kinds of pyramids: Low pass or “Gaussian pyramid” Band-pass or “Laplacian pyramid”
Gaussian Pyramid Convolve image with a small Gaussian kernel –Typically 5x5 Subsample (decimate by 2) to get lower resolution image Repeat for more levels A sequence of low-pass filtered images
Laplacian pyramid Laplacian as Difference of Gaussian Band-pass filtered images Highlights edges at different spatial scales For matching, this is less sensitive to image illumination changes But more noisy than using Gaussians
image I image J JwJw warp refine + Pyramid of image JPyramid of image I image I image J Coarse-to-Fine Estimation u=10 pixels u=5 pixels u=2.5 pixels u=1.25 pixels ==> small u and v...
Global Motion Models 2D Models: Affine Quadratic Planar projective transform (Homography) 3D Models: Instantaneous camera motion models Homography+epipole Plane+Parallax
Example: Affine Motion Substituting into the B.C. Equation: Each pixel provides 1 linear constraint in 6 global unknowns (minimum 6 pixels necessary) Least Square Minimization (over all pixels):
Quadratic – instantaneous approximation to planar motion Other 2D Motion Models Projective – exact planar motion
3D Motion Models Local Parameter: Instantaneous camera motion: Global parameters: Homography+Epipole Local Parameter: Residual Planar Parallax Motion Global parameters: Local Parameter:
Correlation and SSD For larger displacements, do template matching –Define a small area around a pixel as the template –Match the template against each pixel within a search area in next image. –Use a match measure such as correlation, normalized correlation, or sum-of-squares difference –Choose the maximum (or minimum) as the match –Sub-pixel interpolation also possible
SSD Surface – Textured area
SSD Surface -- Edge
SSD Surface – homogeneous area
Discrete Search vs. Gradient Based Estimation Consider image I translated by The discrete search method simply searches for the best estimate. The gradient method linearizes the intensity function and solves for the estimate
Consider image I translated by Uncertainty in Local Estimation Now, This assumes uniform priors on the velocity field
Quadratic Approximation When are small After some fiddling around, we can show
Posterior uncertainty At edges is singular, but just take pseudo-inverse Note that the error is always convex, since is positive semi-definite i.e., even for occluded points and other false matches, this is the case… seems a bit odd!
Match plus confidence Numerically compute error for various Search for the peak Numerically fit a qudratic to around the peak Find sub-pixel estimate for and covariance If the matrix is negative, it is false match Or even better, if you can afford it, simply maintain a discrete sampling of and
Quadratic Approximation and Covariance Estimation When where
Choosing the Correlation Window Size Small windows lead to more false matches Large windows are better this way, but… –Neighboring flow vectors will be more correlated (since the template windows have more in common) –Flow resolution also lower (same reason) –More expensive to compute Another way to look at this: Small windows are good for local search but more precise and less smooth Large windows good for global search but less precise and more smooth method
Robust Estimation Standard Least Squares Estimation allows too much influence for outlying points
Robust Estimation Robust gradient constraint Robust SSD
Influence functions and Redescening Estimators GO TO THE BOARD
Reweighted Least-Squares Robust Minimization (over all pixels): An Outlier Measure Residual normal flow
Outlier rejection Original Outliers Synthesized
Locking Property ORIGINAL ENHNACED
Original sequencePlane-aligned sequenceRecovered shape “block sequence” [Kumar-Anandan-Hanna’94] Dense 3D Reconstruction (Plane+Parallax)
Original sequence Plane-aligned sequence Recovered shape
Dense 3D Reconstruction Brightness Constancy constraint The intersection of the two line constraints uniquely defines the displacement. Epipolar line epipole p
Limitations Limited search range (up to 10% of the image size). Brightness constancy.
Multi-sensor Alignment Originals: IR and EO images
Direct Correlation Based Alignment
Example: Affine Motion Substituting into the B.C. Equation: Each pixel provides 1 linear constraint in 6 global unknowns yIxIIyIxII yyyxxx (minimum 6 pixels necessary)
The Brightness Constraint Brightness Constancy Equation: Linearizing J (assume small (u,v)): In practice (although not necessary) one assumes
Motion Models Motion vector (u,v) has two unknowns (at each pixel) Brightness Constraint provides one equation One approach is to use a regularizer (e.g., Horn and Schunk) Alternative: Global Motion Model Constraint –e.g., Affine, Quadratic, Projective (planar) transform (2D Models) –Instantaneous camera motion, homography+epipole, Plane+Parallax, etc. (3D Models)
J JwJw I warp refine + J JwJw I warp refine + J pyramid construction J JwJw I warp refine + I pyramid construction Coarse-to-Fine Estimation
Affine Motion Estimation Each pixel provides one constraint Least square estimation; Minimize:
Coarse-to-Fine Estimation Problem: Brightness linearization assumes small (u,v). Solution: Iterative refinement Coarse-to-fine estimation within a pyramid