Structure from Motion, Feature Tracking, and Optical Flow Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/15/10 Many slides adapted from Lana Lazebnik, Silvio Savarese, who in turn adapted slides from Steve Seitz, Rick Szeliski, Martial Hebert, Mark Pollefeys, and others

Last class Estimating 3D points and depth – Triangulation from corresponding points – Dense stereo Projective structure from motion

This class Factorization method for structure from motion Feature tracking Optical flow (dense tracking)

Structure from motion under orthographic projection: 3D Reconstruction of a Rotating Ping-Pong Ball. C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. Orthographic projection is a reasonable choice when – the change in depth of points in the scene is much smaller than the distance to the camera – the cameras do not move towards or away from the scene

Start with an affine camera model. Affine projection is a linear mapping plus a translation in inhomogeneous coordinates: x = AX + b, where the rows of A are a1 and a2 and b is the projection of the world origin. Does this ever happen?

Get rid of b by shifting to the centroid: each normalized 2D point x̂_ij (observed) is related to the normalized 3D point X̂_j by a purely linear (affine) mapping, x̂_ij = A_i X̂_j.

Suppose we know the 3D points and affine camera parameters … then we can compute the observed 2D position of each point: [camera parameters (2m×3)] × [3D points (3×n)] = [2D image points (2m×n)]. What rank is the matrix of 2D points?

Affine structure from motion. Centering: subtract the centroid of the image points. For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. After centering, each normalized point x̂_ij is related to the 3D point X_j by x̂_ij = A_i X_j.

Affine structure from motion. Given: m images of n fixed 3D points, x_ij = A_i X_j + b_i, i = 1, …, m, j = 1, …, n. Problem: use the mn correspondences x_ij to estimate the m projection matrices A_i and translation vectors b_i, and the n points X_j. The reconstruction is defined only up to an arbitrary affine transformation Q (12 degrees of freedom). We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for the affine ambiguity), so we must have 2mn >= 8m + 3n – 12. For two views (m = 2), this gives 4n >= 3n + 4, so we need at least four point correspondences.

What if we instead observe only the corresponding 2D image points (2m rows from the cameras × n point columns)? Can we recover the camera parameters and the 3D points?

Affine structure from motion. Let's create a 2m × n data (measurement) matrix D by stacking all the observations. D factors into the cameras (a 2m × 3 matrix) times the points (a 3 × n matrix), so the measurement matrix D = MS must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Factorizing the measurement matrix: we want to factor D into a motion (camera) matrix times a shape (point) matrix, D = MS. Source: M. Hebert

Factorizing the measurement matrix. Singular value decomposition of D: D = U W V^T, with the singular values on the diagonal of W. Source: M. Hebert

Factorizing the measurement matrix. Obtaining a factorization from SVD: since D has rank 3, keep only the three largest singular values and the corresponding columns of U and V, giving D ≈ U3 W3 V3^T; the motion and shape matrices then come from splitting this product, e.g. M = U3 W3^(1/2) and S = W3^(1/2) V3^T. Source: M. Hebert

Affine ambiguity. The decomposition is not unique: we get the same D by using any invertible 3×3 matrix C and applying the transformations A → AC, X → C^-1 X. That is because we have only an affine reconstruction and have not enforced any Euclidean constraints (such as forcing the image axes to be perpendicular, for example). Source: M. Hebert

Eliminating the affine ambiguity. Orthographic: image axes are perpendicular and of unit length, so for each camera the two rows a1, a2 of A satisfy a1 · a2 = 0 and |a1|^2 = |a2|^2 = 1. Source: M. Hebert

Solve for orthographic constraints. For each image i, the rows a_i1, a_i2 of the estimated camera matrix give three equations: a_i1^T L a_i1 = 1, a_i2^T L a_i2 = 1, a_i1^T L a_i2 = 0, where L = CC^T. Solve for L (linear least squares), recover C from L by Cholesky decomposition (L = CC^T), then update A and X: Ã = AC, X̃ = C^-1 X.
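A minimal NumPy sketch of this metric-upgrade step (the function name and variable layout are my own; it assumes M is the 2m×3 motion matrix and S the 3×n shape matrix from the factorization, with the two rows of each camera stored consecutively):

```python
import numpy as np

def metric_upgrade(M, S):
    """Enforce a1.a2 = 0 and |a1| = |a2| = 1 per camera by solving for L = C C^T."""
    m = M.shape[0] // 2          # number of cameras

    def row(a, b):
        # coefficients of a^T L b in terms of the 6 unique entries of symmetric L
        return [a[0]*b[0], a[0]*b[1] + a[1]*b[0], a[0]*b[2] + a[2]*b[0],
                a[1]*b[1], a[1]*b[2] + a[2]*b[1], a[2]*b[2]]

    G, c = [], []
    for i in range(m):
        a1, a2 = M[2*i], M[2*i + 1]
        G += [row(a1, a1), row(a2, a2), row(a1, a2)]   # three equations per image
        c += [1.0, 1.0, 0.0]

    l = np.linalg.lstsq(np.array(G), np.array(c), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    C = np.linalg.cholesky(L)    # L = C C^T (requires L to be positive definite)
    return M @ C, np.linalg.inv(C) @ S
```

With noisy tracks the estimated L can fail to be positive definite, in which case the Cholesky step breaks down; projecting L onto the positive semidefinite cone first is a common workaround.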

Algorithm summary. Given: m images and n tracked features x_ij. For each image i, center the feature coordinates. Construct a 2m × n measurement matrix D: – column j contains the projection of point j in all views – row i contains one coordinate of the projections of all n points in image i. Factorize D: – compute the SVD D = U W V^T – create U3 by taking the first 3 columns of U – create V3 by taking the first 3 columns of V – create W3 by taking the upper left 3 × 3 block of W. Create the motion and shape matrices: – M = U3 W3^(1/2) and S = W3^(1/2) V3^T (or M = U3 and S = W3 V3^T). Eliminate the affine ambiguity (see the sketch below). Source: M. Hebert
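The factorization itself fits in a few lines of NumPy. The sketch below assumes a complete 2m×n measurement matrix D with x-coordinates in even rows and y-coordinates in odd rows (a simplification; real data needs the missing-data handling discussed next):

```python
import numpy as np

def factorize(D):
    """Tomasi-Kanade affine factorization of a complete 2m x n measurement matrix D."""
    # Step 1: center each row (subtract the centroid of the image points)
    D = D - D.mean(axis=1, keepdims=True)

    # Step 2: rank-3 approximation via SVD
    U, W, Vt = np.linalg.svd(D, full_matrices=False)
    U3, W3, Vt3 = U[:, :3], np.diag(W[:3]), Vt[:3, :]

    # Step 3: motion (cameras) and shape (3D points), defined up to an affine ambiguity
    M = U3 @ np.sqrt(W3)          # 2m x 3
    S = np.sqrt(W3) @ Vt3         # 3 x n
    return M, S                   # pass to metric_upgrade() to remove the ambiguity
```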

Dealing with missing data. So far, we have assumed that all points are visible in all views. In reality, the measurement matrix (cameras × points) typically has large blocks of missing entries, since each point is visible in only some of the views. One solution: – solve using a dense submatrix of visible points (as in last lecture) – iteratively add new cameras

Reconstruction results. C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

Dealing with missing data. Possible solution: decompose the matrix into dense sub-blocks, factorize each sub-block, and fuse the results – finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph). Incremental bilinear refinement: (1) perform factorization on a dense sub-block; (2) solve for a new 3D point visible in at least two known cameras (linear least squares); (3) solve for a new camera that sees at least three known 3D points (linear least squares). F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

Recovering motion. Feature tracking – extract visual features (corners, textured areas) and "track" them over multiple frames. Optical flow – recover image motion at each pixel from spatio-temporal image brightness variations. Two problems, one registration method: B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Feature tracking Last problem required corresponding points in images If motion is small, tracking is an easy way to get them

Feature tracking Challenges – Need good features to track – Points may appear or disappear: need to be able to add/delete tracked points – Some points may change appearance over time (e.g., due to rotation, moving into shadows, etc.) – Drift: small errors can accumulate if appearance model is updated

Feature tracking. Given two subsequent frames I(x,y,t–1) and I(x,y,t), estimate the point translation. Key assumptions of the Lucas-Kanade tracker: Brightness constancy: the projection of the same point looks the same in every frame. Small motion: points do not move very far. Spatial coherence: points move like their neighbors.

The brightness constancy constraint. Brightness constancy equation: I(x, y, t–1) = I(x + u, y + v, t). Linearizing the right hand side using a Taylor expansion: I(x + u, y + v, t) ≈ I(x, y, t–1) + Ix·u + Iy·v + It, where Ix is the image derivative along x (and similarly Iy along y and It along time). Hence Ix·u + Iy·v + It ≈ 0, i.e. ∇I·[u v]^T + It = 0.

The brightness constancy constraint. How many equations and unknowns per pixel? One equation (this is a scalar equation!), two unknowns (u, v). Can we use this equation to recover image motion (u, v) at each pixel? No: the component of the motion perpendicular to the gradient (i.e., parallel to the edge) cannot be measured. If (u, v) satisfies the equation, so does (u + u', v + v') for any (u', v') with ∇I·[u' v']^T = 0.

The aperture problem Actual motion

The aperture problem Perceived motion

The barber pole illusion

The barber pole illusion

Solving the ambiguity… How to get more equations for a pixel? Spatial coherence constraint Assume the pixel’s neighbors have the same (u,v) – If we use a 5x5 window, that gives us 25 equations per pixel B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Solving the ambiguity… Least squares problem: stack the constraint ∇I(p_i)·[u v]^T = –It(p_i) for every pixel p_i in the window into an overconstrained linear system A d = b, where d = (u, v).

Matching patches across images. The overconstrained linear system A d = b leads to the normal equations A^T A d = A^T b: [ΣIxIx ΣIxIy; ΣIxIy ΣIyIy][u; v] = –[ΣIxIt; ΣIyIt], where the summations are over all pixels in the K × K window. The least squares solution for d is d = (A^T A)^-1 A^T b.
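As a concrete illustration, here is a minimal NumPy version of this least-squares step for a single K×K patch (the function name, integer patch center, and sign conventions are assumptions; different implementations define the temporal derivative with the opposite sign):

```python
import numpy as np

def lk_patch_displacement(I0, I1, x, y, K=15):
    """Estimate (u, v) for the KxK patch centered at integer (x, y) between frames I0 and I1."""
    h = K // 2
    p0 = I0[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    p1 = I1[y - h:y + h + 1, x - h:x + h + 1].astype(float)

    Iy, Ix = np.gradient(p0)             # spatial derivatives of the first patch
    It = p1 - p0                         # temporal derivative

    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # (K*K) x 2
    b = -It.ravel()

    ATA = A.T @ A                        # 2x2 second moment matrix
    evals = np.linalg.eigvalsh(ATA)
    if evals.min() < 1e-2 or evals.max() / max(evals.min(), 1e-12) > 1e4:
        return None                      # flat or edge-like patch: estimate unreliable
    return np.linalg.solve(ATA, A.T @ b) # least-squares (u, v)
```

The eigenvalue test at the end is exactly the solvability condition discussed on the next slide.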

Conditions for solvability. The optimal (u, v) satisfies the Lucas-Kanade equation. When is this solvable? A^T A should be invertible; A^T A should not be too small due to noise – the eigenvalues λ1 and λ2 of A^T A should not be too small; A^T A should be well-conditioned – λ1/λ2 should not be too large (λ1 = larger eigenvalue). Does this remind you of anything? These are the criteria for the Harris corner detector.

Eigenvectors and eigenvalues of A^T A relate to edge direction and magnitude. The eigenvector associated with the larger eigenvalue points in the direction of fastest intensity change; the other eigenvector is orthogonal to it. M = A^T A is the second moment matrix! (Harris corner detector…)

Interpreting the eigenvalues. Classification of image points using the eigenvalues of the second moment matrix: "corner" – λ1 and λ2 are large, λ1 ~ λ2; "edge" – λ1 >> λ2 or λ2 >> λ1; "flat" region – λ1 and λ2 are small.

Edge – gradients very large or very small – large λ1, small λ2

Low-texture region – gradients have small magnitude – small λ1, small λ2

High-texture region – gradients are different, with large magnitudes – large λ1, large λ2

The aperture problem resolved Actual motion

The aperture problem resolved Perceived motion

Dealing with larger movements: iterative refinement. 1. Initialize (u, v) = (0, 0). 2. Compute (u, v) by solving the Lucas-Kanade equations, using the 2nd moment matrix of the feature patch in the first image. 3. Shift the window by (u, v). 4. Repeat steps 2-3 until the change is small. Only It = I(x, y, t–1) – I(x+u, y+v, t) changes per iteration; the second moment matrix stays fixed.
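A sketch of this loop, building on the single-step solver above (assumptions: bilinear interpolation via scipy to "shift" the window by fractional amounts, my own function and parameter names):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def iterative_lk(I0, I1, x, y, K=15, n_iters=10, tol=1e-3):
    """Iteratively refine (u, v) for the patch at (x, y), re-sampling I1 each step."""
    h = K // 2
    yy, xx = np.mgrid[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    p0 = I0[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    Iy, Ix = np.gradient(p0)                       # fixed: only It changes per iteration
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    ATA = A.T @ A

    u = v = 0.0
    for _ in range(n_iters):
        p1 = map_coordinates(I1.astype(float), [yy + v, xx + u], order=1)
        It = p1 - p0
        du, dv = np.linalg.solve(ATA, A.T @ (-It.ravel()))
        u, v = u + du, v + dv                      # shift the window by the update
        if abs(du) < tol and abs(dv) < tol:
            break
    return u, v
```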

Dealing with larger movements: coarse-to-fine registration. Build Gaussian pyramids of image 1 (t) and image 2 (t+1); run iterative L-K at the coarsest level, then upsample the estimate and run iterative L-K again at each finer level.

Shi-Tomasi feature tracker. Find good features using the eigenvalues of the second-moment matrix – key idea: "good" features to track are the ones whose motion can be estimated reliably. From frame to frame, track with Lucas-Kanade – this amounts to assuming a translation model for frame-to-frame feature movement. Check consistency of tracks by affine registration to the first observed instance of the feature – the affine model is more accurate for larger displacements, and comparing to the first frame helps to minimize drift. J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.

Tracking example. J. Shi and C. Tomasi. Good Features to Track. CVPR 1994.

Summary of KLT tracking. Find a good point to track (Harris corner). Use the intensity second moment matrix and the difference across frames to find the displacement. Iterate, and use coarse-to-fine search to deal with larger movements. When creating long tracks, check the appearance of the registered patch against the appearance of the initial patch to find points that have drifted.
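In practice this pipeline is available off the shelf. A hedged sketch using OpenCV's detector and pyramidal tracker (the file name and parameter values are placeholders, not anything prescribed by the slides, and the drift check against the initial patch is omitted):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")                      # hypothetical input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners: "good features to track"
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade tracking of the corners into the new frame
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                    winSize=(15, 15), maxLevel=3)
    pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)  # keep successfully tracked points
    prev_gray = gray
```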

Optical flow: a vector field function of the spatio-temporal image brightness variations. Picture courtesy of Selim Temizer – Learning and Intelligent Systems (LIS) Group, MIT

Motion and flow Many slides adapted from S. Seitz, R. Szeliski, M. Pollefeys, S. Savarese, L. Lazebnik

Motion and perceptual organization Sometimes, motion is the only cue

Motion and perceptual organization. Even "impoverished" motion data can evoke a strong percept. G. Johansson, "Visual Perception of Biological Motion and a Model For Its Analysis", Perception and Psychophysics 14, 201–211, 1973.

Uses of motion Estimating 3D structure Segmenting objects based on motion cues Learning and tracking dynamical models Recognizing events and activities Improving video quality (motion stabilization)

Motion field The motion field is the projection of the 3D scene motion into the image What would the motion field of a non-rotating ball moving towards the camera look like?

Optical flow Definition: optical flow is the apparent motion of brightness patterns in the image Ideally, optical flow would be the same as the motion field Have to be careful: apparent motion can be caused by lighting changes without any actual motion – Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination

Lucas-Kanade Optical Flow Same as Lucas-Kanade feature tracking, but for each pixel – As we saw, works better for textured pixels Operations can be done one frame at a time, rather than pixel by pixel – Efficient
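A dense, per-pixel sketch of the same computation (a simplification: one single-level LK solve per pixel using window averages computed by filtering; the small eps regularizer is my addition to keep flat regions from blowing up):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lucas_kanade_dense(I0, I1, window=15, eps=1e-3):
    """Per-pixel optical flow between frames I0 and I1 (single level, no iteration)."""
    I0 = I0.astype(float); I1 = I1.astype(float)
    Iy, Ix = np.gradient(I0)
    It = I1 - I0

    # Windowed averages of gradient products: proportional to the entries of A^T A and A^T b at every pixel
    Sxx = uniform_filter(Ix * Ix, window)
    Sxy = uniform_filter(Ix * Iy, window)
    Syy = uniform_filter(Iy * Iy, window)
    Sxt = uniform_filter(Ix * It, window)
    Syt = uniform_filter(Iy * It, window)

    det = Sxx * Syy - Sxy ** 2 + eps          # eps keeps flat regions finite
    u = (-Syy * Sxt + Sxy * Syt) / det        # closed-form solution of the 2x2 system
    v = ( Sxy * Sxt - Sxx * Syt) / det
    return u, v
```

Doing the solve with filtered sums of gradient products is what makes the dense version "one frame at a time, rather than pixel by pixel."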

Multi-resolution Lucas-Kanade algorithm

Iterative refinement: iterative Lucas-Kanade algorithm. 1. Estimate the velocity at each pixel by solving the Lucas-Kanade equations. 2. Warp I(t–1) towards I(t) using the estimated flow field (use image warping techniques). 3. Repeat until convergence. * From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003

Coarse-to-fine optical flow estimation. Build Gaussian pyramids of image 1 (t) and image 2 (t+1); run iterative L-K at the coarsest level, then warp & upsample and run iterative L-K again at each finer level.
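A coarse-to-fine wrapper around the dense solver sketched earlier (assumptions: the lucas_kanade_dense function from the previous sketch, a simple blur-and-subsample pyramid, and bilinear zoom for upsampling the flow; pyramid shapes are assumed to divide evenly):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, zoom

def pyramid(img, levels):
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])  # blur, then subsample
    return pyr[::-1]                                               # coarsest level first

def coarse_to_fine_flow(I0, I1, levels=4):
    p0, p1 = pyramid(I0, levels), pyramid(I1, levels)
    u = np.zeros_like(p0[0]); v = np.zeros_like(p0[0])
    for a, b in zip(p0, p1):
        if u.shape != a.shape:                 # moving to a finer level:
            sy = a.shape[0] / u.shape[0]       #   upsample the flow field and
            sx = a.shape[1] / u.shape[1]       #   scale it to the new resolution
            u = zoom(u, (sy, sx), order=1) * sx
            v = zoom(v, (sy, sx), order=1) * sy
        yy, xx = np.mgrid[:a.shape[0], :a.shape[1]].astype(float)
        warped = map_coordinates(b, [yy + v, xx + u], order=1)     # warp image 2 by the current flow
        du, dv = lucas_kanade_dense(a, warped)                     # residual flow at this level
        u, v = u + du, v + dv
    return u, v
```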

Coarse-to-fine optical flow estimation. A displacement of u = 10 pixels at full resolution becomes u = 5, 2.5, and 1.25 pixels at successively coarser pyramid levels, so the small-motion assumption holds at the top of the pyramid.

Example * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Multi-resolution registration * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Optical Flow Results * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Optical Flow Results * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Errors in Lucas-Kanade The motion is large – Possible Fix: Descriptor matching A point does not move like its neighbors – Possible Fix: Motion segmentation Brightness constancy does not hold – Possible Fix: Gradient constancy

State-of-the-art optical flow. Start with something similar to Lucas-Kanade + gradient constancy + energy minimization with a smoothing term + region matching + descriptor matching (long-range). Large displacement optical flow, Brox et al., CVPR 2009

Stereo vs. Optical Flow. Similar dense matching procedures. Why don't we typically use epipolar constraints for optical flow? B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

Summary Major contributions from Lucas, Tomasi, Kanade – Structure from motion – Tracking feature points – Optical flow Key ideas – Factorization for special case of SFM – By assuming brightness constancy, truncated Taylor expansion leads to simple and fast patch matching across frames – Coarse-to-fine registration

Next class Kalman filter tracking (with David)