A Unified Algebraic Approach to 2D and 3D Motion Segmentation. René Vidal, Center for Imaging Science, Johns Hopkins University
Motion segmentation: 2 views. A static scene gives multiple 2D motion models; a dynamic scene gives multiple 3D motion models. Given an image sequence, determine: the number of motion models (affine, Euclidean, etc.); the motion model, affine (2D) or Euclidean (3D); and the segmentation, i.e., the model to which each pixel belongs.
Computer Vision: structure from motion and 3D reconstruction. Input: corresponding points in multiple images. Output: camera motion and Euclidean scene structure. Theory: multiview geometry (multiple view matrix, multiple view normalized epipolar constraint, linear self-calibration). Algorithms: multiple view matrix factorization algorithm; multiple view factorization for planar motions.
Multibody Multiview Geometry. Given an image sequence, determine: the number of motion models (affine, Euclidean, etc.); motion estimation, affine (2D) or Euclidean (3D); and data segmentation, i.e., the model associated with each pixel. Prior work on 3D multibody multiple view geometry: points in a line; points in a conic; coplanar points moving linearly at constant speed; points in multiple planes. Multibody structure from motion: 3D motion segmentation (multibody epipolar constraint) and affine motion segmentation (multibody brightness constancy and affine constraints).
Multibody Multiview Geometry. Multibody structure from motion: 3D motion segmentation (multibody epipolar constraint); affine motion segmentation (multibody brightness constancy and affine constraints). Generalized PCA: segmentation of mixtures of subspaces; image/video segmentation.
Motion-based image segmentation: two motions, the camera panning to the right and a car translating to the right. http://www.cs.otago.ac.nz/research/vision/Research/
Segmentation of linear motions: multiple objects translating in 3D.
Piecewise Bilinear Data: multibody structure from motion.
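Motion Segmentation: bilinear data. Each rigid motion has a rotation $R \in SO(3)$ and a translation $T \in \mathbb{R}^3$, and its correspondences satisfy the epipolar constraint $x_2^\top F x_1 = 0$ (for calibrated cameras, $F = \widehat{T} R$). With multiple motions, every correspondence satisfies $\prod_{i=1}^n (x_2^\top F_i x_1) = 0$. Write $\mathcal{F} = \mathrm{Sym}(F_1, \ldots, F_n)$ for the multibody fundamental matrix. Multibody epipolar constraint: $\nu_n(x_2)^\top \mathcal{F}\, \nu_n(x_1) = 0$, where $\nu_n$ is the degree-$n$ Veronese map (all monomials of degree $n$).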
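The two ingredients of the multibody epipolar constraint, a single-motion epipolar constraint and the Veronese lifting $\nu_n$, can be checked numerically. The sketch below is an illustration under assumptions, not code from this work: it uses NumPy, the calibrated form $F = \widehat{T}R$, a hypothetical synthetic 3D point, and a lexicographic ordering of monomials (any fixed ordering would do).

```python
import itertools
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation(w):
    """Rodrigues' formula: rotation by angle |w| about the axis w / |w|."""
    theta = np.linalg.norm(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def veronese(x, n):
    """Degree-n Veronese map: all degree-n monomials of the entries of x."""
    exps = [e for e in itertools.product(range(n + 1), repeat=len(x)) if sum(e) == n]
    return np.prod(np.asarray(x, float) ** np.array(exps), axis=1)

# One rigid motion (R, T) and its epipolar constraint x2^T F x1 = 0.
R = rotation(np.array([0.1, -0.2, 0.3]))
T = np.array([1.0, 0.5, -0.2])
F = hat(T) @ R                    # calibrated case (assumption): F = hat(T) R
X1 = np.array([0.4, -0.3, 5.0])   # hypothetical 3D point in the first camera frame
X2 = R @ X1 + T                   # the same point in the second camera frame
x1, x2 = X1 / X1[2], X2 / X2[2]   # image points in homogeneous coordinates
residual = x2 @ F @ x1            # zero up to rounding error

# The lifted vectors nu_n(x) have M_n = (n + 1)(n + 2) / 2 entries.
print(residual, len(veronese(x1, 2)), len(veronese(x1, 3)))
```

The lifting turns a product of $n$ bilinear constraints into a single bilinear constraint on longer vectors, which is what makes the multibody problem linear in $\mathcal{F}$.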
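Estimation of fundamental matrices. Lifting each correspondence through the Veronese embedding $\nu_n$ makes the multibody epipolar constraint linear in the entries of $\mathcal{F}$: $(\nu_n(x_1) \otimes \nu_n(x_2))^\top \operatorname{vec}(\mathcal{F}) = 0$, with $\operatorname{vec}$ stacking columns. Stacking one such row per correspondence gives a matrix $L_n$; given enough correspondences, the rank condition $\operatorname{rank}(L_n) = M_n^2 - 1$, with $M_n = (n+1)(n+2)/2$, determines the number of motions $n$, and $\mathcal{F}$ is the solution of the linear system $L_n \operatorname{vec}(\mathcal{F}) = 0$. Theorem: Multibody structure from motion [Vidal et al.]. Factorization of bilinear forms can be reduced to factorization of linear forms, so estimation of fundamental matrices can be reduced to GPCA.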
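A minimal sketch of the linear estimation step, assuming NumPy, calibrated synthetic data and two motions: each correspondence contributes one row $\nu_n(x_2) \otimes \nu_n(x_1)$, and the multibody fundamental matrix is read off the null vector of the stacked rows. `sample_pair` and `estimate_multibody_F` are hypothetical helpers; here the vectorization is row-major, so the null vector reshapes directly into the matrix used in $\nu_n(x_2)^\top \mathcal{F}\, \nu_n(x_1)$.

```python
import itertools
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def rotation(w):
    """Rodrigues' formula for a rotation by |w| about w / |w|."""
    theta = np.linalg.norm(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def veronese(x, n):
    """All degree-n monomials of the entries of x (fixed lexicographic order)."""
    exps = [e for e in itertools.product(range(n + 1), repeat=len(x)) if sum(e) == n]
    return np.prod(np.asarray(x, float) ** np.array(exps), axis=1)

def sample_pair(F, rng):
    """Hypothetical synthetic generator: a unit-norm pair with x2^T F x1 = 0."""
    x1 = np.append(rng.normal(size=2), 1.0)
    x2 = np.cross(F @ x1, rng.normal(size=3))   # a point on the epipolar line F x1
    return x1 / np.linalg.norm(x1), x2 / np.linalg.norm(x2)

def estimate_multibody_F(pairs, n):
    """Each pair gives one row kron(nu_n(x2), nu_n(x1)) of the embedded data
    matrix; the multibody fundamental matrix is its right null vector (via SVD)."""
    L = np.array([np.kron(veronese(x2, n), veronese(x1, n)) for x1, x2 in pairs])
    _, s, Vt = np.linalg.svd(L)
    M = len(veronese(np.ones(3), n))
    return Vt[-1].reshape(M, M), s    # row-major: nu(x2)^T F_mb nu(x1)

rng = np.random.default_rng(1)
Fs = [hat(np.array([1.0, 0.2, 0.1])) @ rotation(np.array([0.0, 0.2, 0.0])),
      hat(np.array([-0.3, 1.0, 0.4])) @ rotation(np.array([0.1, 0.0, -0.3]))]
pairs = [sample_pair(Fs[k % 2], rng) for k in range(200)]   # two motions, mixed
F_mb, s = estimate_multibody_F(pairs, n=2)

# Fresh correspondences from either motion satisfy the multibody constraint.
fresh = [sample_pair(Fs[k % 2], rng) for k in range(20)]
worst = max(abs(veronese(x2, 2) @ F_mb @ veronese(x1, 2)) for x1, x2 in fresh)
print(F_mb.shape, worst)
```

With noise-free correspondences, the smallest singular value is essentially zero while the next one is not, which is the rank condition that identifies $n = 2$; with noisy data one would instead have to threshold the singular values.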
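3D Motions: Multibody epipolar transfer. Once the number of motions is known, lift the point $x_1$ to $\nu_n(x_1)$. The multibody epipolar line $\mathcal{F}\,\nu_n(x_1)$ holds the coefficients of the polynomial $\nu_n(x_2)^\top \mathcal{F}\, \nu_n(x_1) = \prod_{i=1}^n (\ell_i^\top x_2)$, where $\ell_i = F_i x_1$, and polynomial factorization recovers the individual epipolar lines $\ell_i$.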
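Polynomial factorization of a multibody epipolar line can be sketched by differentiation, in the spirit of GPCA. This is an assumed substitute, not necessarily the factorization used in the slides: restrict the polynomial to a random line in the image, find its $n$ roots, and take the gradient at each root, which is parallel to the one linear factor that vanishes there. The helper names and the synthetic coefficient vector are hypothetical; NumPy is assumed.

```python
import itertools
import numpy as np

def exponents(n, D=3):
    """Exponent vectors of all degree-n monomials in D variables."""
    return np.array([e for e in itertools.product(range(n + 1), repeat=D) if sum(e) == n])

def poly_eval(c, x, n):
    """q(x) = c^T nu_n(x) for a coefficient vector c in the Veronese basis."""
    return c @ np.prod(np.asarray(x, float) ** exponents(n), axis=1)

def poly_grad(c, x, n):
    """Analytic gradient of q(x) = c^T nu_n(x)."""
    E, x = exponents(n), np.asarray(x, float)
    g = np.zeros(3)
    for j in range(3):
        Ej = E.copy()
        Ej[:, j] = np.maximum(Ej[:, j] - 1, 0)   # d/dx_j lowers the power of x_j
        g[j] = c @ (E[:, j] * np.prod(x ** Ej, axis=1))
    return g

def factor_lines(c, n, rng):
    """Recover l_1..l_n (unit norm, up to sign) from c^T nu_n(x) = prod_i l_i^T x.
    On the line x = a + t b, each root t_i lies on exactly one factor l_i,
    and there grad q is parallel to l_i."""
    a, b = rng.normal(size=3), rng.normal(size=3)
    ts = np.linspace(-1.0, 1.0, n + 1)
    coeffs = np.polyfit(ts, [poly_eval(c, a + t * b, n) for t in ts], n)
    lines = [poly_grad(c, a + t * b, n) for t in np.roots(coeffs).real]
    return np.array([l / np.linalg.norm(l) for l in lines])

# Stand-in for a multibody epipolar line F_mb nu_n(x1): the coefficients of a
# product of two known lines, fitted by least squares (hypothetical test data).
rng = np.random.default_rng(3)
true_lines = rng.normal(size=(2, 3))
X = rng.normal(size=(50, 3))
V = np.prod(X[:, None, :] ** exponents(2)[None, :, :], axis=2)   # rows nu_2(x)
c, *_ = np.linalg.lstsq(V, (X @ true_lines[0]) * (X @ true_lines[1]), rcond=None)
recovered = factor_lines(c, 2, rng)
```

The lines come back only up to scale and sign, which is all the epipolar geometry needs at this stage.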
3D Motions: Multibody epipole. After lifting, the multibody epipole is the solution of a linear system. The number of distinct epipoles can be determined, and the epipoles themselves are obtained using polynomial factorization.
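All epipolar lines of one motion pass through that motion's epipole, so once the lines of a group are known, the epipole is their least-squares intersection. This is a single-motion sketch assuming NumPy and synthetic lines; it does not implement the multibody linear system itself, and `epipole_from_lines` is a hypothetical helper.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def epipole_from_lines(lines):
    """Common point of a set of epipolar lines: the unit vector e minimizing
    sum_k (l_k^T e)^2, i.e. the last right singular vector of the stacked lines."""
    _, _, Vt = np.linalg.svd(np.asarray(lines))
    return Vt[-1]

rng = np.random.default_rng(4)
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # rotation about z
T = np.array([0.4, -1.0, 0.3])
F = hat(T) @ R
# Epipolar lines F x1 of one motion, e.g. as returned by a factorization step.
lines = [F @ np.append(rng.normal(size=2), 1.0) for _ in range(10)]
e = epipole_from_lines(lines)
```

For $F = \widehat{T}R$ every line $F x_1$ is orthogonal to $T$, so the recovered epipole is $T$ up to scale and satisfies $e^\top F = 0$.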
3D Motions: Fundamental matrices. The columns of each $F_i$ are epipolar lines, so polynomial factorization computes them up to scale; the scales can then be computed linearly.
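The linear computation of the scales can be sketched as follows, assuming the columns of one fundamental matrix are known only up to individual scales $\lambda_k$ and that a few correspondences of that motion are available. Each correspondence gives one linear equation in $(\lambda_1, \lambda_2, \lambda_3)$, so the scales are recovered up to a global factor from a null vector. NumPy, the helper name `fix_column_scales` and the test data are assumptions.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def fix_column_scales(cols, pairs):
    """cols[:, k] equals F[:, k] only up to an unknown scale lam_k. Every pair gives
    x2^T (cols @ diag(lam)) x1 = sum_k lam_k (x2^T cols[:, k]) x1[k] = 0,
    which is linear in lam, so lam is the null vector of the stacked rows."""
    A = np.array([[(x2 @ cols[:, k]) * x1[k] for k in range(3)] for x1, x2 in pairs])
    _, _, Vt = np.linalg.svd(A)
    return cols * Vt[-1]        # multiply column k by lam_k

rng = np.random.default_rng(5)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])   # rotation about x
F = hat(np.array([1.0, 0.3, -0.5])) @ R
pairs = []
for _ in range(10):
    x1 = np.append(rng.normal(size=2), 1.0)
    pairs.append((x1, np.cross(F @ x1, rng.normal(size=3))))   # x2 on line F x1
cols = F / np.array([2.0, -0.5, 3.0])      # columns known only up to scale
F_rec = fix_column_scales(cols, pairs)
cosine = abs(np.sum(F_rec * F)) / (np.linalg.norm(F_rec) * np.linalg.norm(F))
```

The result matches $F$ up to one global scale, which is the only ambiguity a fundamental matrix has anyway.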
Optimal 3D motion segmentation. Assuming zero-mean Gaussian noise on the image points, motion segmentation becomes a constrained optimization problem. Optimal function for 1 motion; optimal function for n motions. The problem is solved using Riemannian gradient descent.
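The optimal functions are not spelled out here, so the following is only a plausible stand-in: a first-order (Sampson-style) approximation of the reprojection error for the product constraint $\prod_i x_2^\top F_i x_1 = 0$, assuming noise only in the two image coordinates of each point. For one motion it reduces to the usual Sampson error. The Riemannian gradient descent itself is not shown; NumPy and all names are assumptions.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def multibody_sampson(Fs, x1, x2):
    """First-order (Sampson-style) error of f = prod_i x2^T F_i x1, assuming noise
    only in the first two coordinates of each homogeneous image point:
    f^2 / (|grad_x1 f|^2 + |grad_x2 f|^2). For one motion this is the Sampson error."""
    g = np.array([x2 @ F @ x1 for F in Fs])
    f = np.prod(g)
    rest = [np.prod(np.delete(g, i)) for i in range(len(Fs))]   # prod over j != i
    d1 = sum(r * (F.T @ x2) for r, F in zip(rest, Fs))[:2]       # df/dx1, image part
    d2 = sum(r * (F @ x1) for r, F in zip(rest, Fs))[:2]         # df/dx2, image part
    return f ** 2 / (d1 @ d1 + d2 @ d2)

# Two hypothetical pure translations (R = I, so F_i = hat(T_i)).
Fs = [hat(np.array([1.0, 0.0, 0.2])), hat(np.array([0.0, 1.0, -0.3]))]
X1 = np.array([0.5, -0.4, 4.0])
X2 = X1 + np.array([1.0, 0.0, 0.2])      # the point moves with the first motion
x1, x2 = X1 / X1[2], X2 / X2[2]
exact = multibody_sampson(Fs, x1, x2)    # essentially 0: the pair fits motion 1
y2 = x2 + np.array([0.01, -0.02, 0.0])   # perturbed second image point
noisy = multibody_sampson(Fs, x1, y2)
```

Dividing by the gradient norm makes the cost approximate a distance in the image rather than an algebraic residual, which matters when the product of several constraints is minimized.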
Comparison of 1 and n bodies
3D Motion Segmentation Results
Affine Motion Segmentation Results
Conclusions. There is an algebraic/geometric solution to simultaneous model estimation and data segmentation for mixtures of subspaces (linear constraints) and for motion segmentation (bilinear constraints). The solution is based on polynomial factorization via linear algebra, and it is in closed form if the number of groups n ≤ 4, since univariate polynomials of degree at most 4 can be solved in closed form. Applications shown: image segmentation (intensity and texture) and video segmentation (affine and 3D motion segmentation).
Ongoing work and future directions. Machine learning and statistical geometry: robust GPCA; connections with learning methods (kernel PCA, etc.); model selection across different classes of models; estimating manifolds from sample data points. Applications of GPCA in computer vision: cue integration; multiple view geometry of dynamic scenes; shape recognition (faces). A geometric/statistical theory of segmentation?
Dynamic GPCA: recognition/synthesis of human motion and recognition/synthesis of dynamic textures. Given image data, estimate a mixture of linear dynamical models (linear hybrid systems). Use the models to segment/recognize human activity and dynamic textures, and to synthesize human motion and dynamic textures.
Thanks. Computer vision: Stefano Soatto (UCLA), Yi Ma (UIUC), Jana Kosecka (GMU), John Oliensis (NEC). Control/hybrid systems: John Lygeros (Cambridge), Shawn Schaffert. Research advisor: Shankar Sastry. Pursuit-evasion games: Jin Kim, David Shim. Vision-based landing: Omid Shakernia, Cory Sharp. Formation control: Noah Cowan.
Multiple View Geometry (MVG): obtain camera motion and scene structure from multiple images of a cloud of 3D feature points.
MVG: Anatomy of cases (state of the art). [Figure: existing results charted by feature type (point, line, curve, surface), camera model (Euclidean, affine, projective), number of views (2, 3, 4, m views), theory/algorithm/practice, and algebra/geometry/optimization.]
MVG: A need for unification. [Figure: the same chart of cases (features, camera models, numbers of views, theory/algorithm/practice, algebra/geometry/optimization), unified by the rank deficiency of the Multiple View Matrix.]
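MVG: The Multiple View Matrix. Relationship between the first and i-th views: $\lambda_i x_i = \lambda_1 R_i x_1 + T_i$. For point features, $M = \begin{bmatrix} \widehat{x_2} R_2 x_1 & \widehat{x_2} T_2 \\ \vdots & \vdots \\ \widehat{x_m} R_m x_1 & \widehat{x_m} T_m \end{bmatrix} \in \mathbb{R}^{3(m-1) \times 2}$. Theorem [Rank deficiency of Multiple View Matrix]: $\operatorname{rank}(M) \le 1$, with $\operatorname{rank}(M) = 1$ in the generic case and $\operatorname{rank}(M) = 0$ in the degenerate case. Theorem [Dependency of multilinear constraints]: constraints among more than three views are algebraically dependent (the quadrilinear constraints in particular).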
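The rank deficiency of the multiple view matrix for point features can be checked numerically. This sketch assumes NumPy, calibrated homogeneous image points, and motions $(R_i, T_i)$ from the first camera frame to the $i$-th; the helper names and the synthetic point are hypothetical.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix with hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def rot_z(a):
    """Rotation by angle a about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def multiple_view_matrix(xs, Rs, Ts):
    """Stack the 3x2 blocks [hat(x_i) R_i x_1, hat(x_i) T_i] for views i = 2..m."""
    x1 = xs[0]
    return np.vstack([np.column_stack([hat(xi) @ Ri @ x1, hat(xi) @ Ti])
                      for xi, Ri, Ti in zip(xs[1:], Rs, Ts)])

X = np.array([0.3, -0.2, 4.0])                    # point in the first camera frame
Rs = [rot_z(0.1 * i) for i in range(1, 4)]        # views 2, 3, 4
Ts = [np.array([1.0, 0.1 * i, 0.2]) for i in range(1, 4)]
Xs = [X] + [R @ X + T for R, T in zip(Rs, Ts)]
xs = [Xi / Xi[2] for Xi in Xs]                    # homogeneous image points
M = multiple_view_matrix(xs, Rs, Ts)
sv = np.linalg.svd(M, compute_uv=False)
null_vec = np.linalg.svd(M)[2][-1]
```

The null vector is proportional to $(\lambda_1, 1)$, where $\lambda_1$ is the depth of the point in the first view (4 here), because $\widehat{x_i}(\lambda_1 R_i x_1 + T_i) = \lambda_i \widehat{x_i} x_i = 0$; this is why the rank cannot exceed 1.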
Texture segmentation. Given a static image, determine the number of groups and the segmentation, i.e., which pixels have the same texture.
Generalized Principal Component Analysis (GPCA). Given data lying on a collection of subspaces, determine the number of subspaces, a model (basis) for each subspace, and the segmentation, i.e., the model to which each point belongs.
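A minimal sketch of algebraic GPCA for the simplest case, hyperplanes through the origin with noise-free data and a known number of groups $n$, assuming NumPy: fit one polynomial of degree $n$ that vanishes on all points, take its normalized gradient at each point as the normal of that point's subspace, and group points with parallel normals. The greedy grouping and its threshold `tol` are hypothetical simplifications; estimating $n$ and handling noise are not covered.

```python
import itertools
import numpy as np

def exponents(n, D):
    """Exponent vectors of all degree-n monomials in D variables."""
    return np.array([e for e in itertools.product(range(n + 1), repeat=D) if sum(e) == n])

def embed(X, n):
    """Rows nu_n(x) for every data point x (rows of X)."""
    return np.prod(X[:, None, :] ** exponents(n, X.shape[1])[None], axis=2)

def grad(c, x, n):
    """Gradient of p(x) = c^T nu_n(x)."""
    E = exponents(n, len(x))
    g = np.zeros(len(x))
    for j in range(len(x)):
        Ej = E.copy()
        Ej[:, j] = np.maximum(Ej[:, j] - 1, 0)   # d/dx_j lowers the power of x_j
        g[j] = c @ (E[:, j] * np.prod(x ** Ej, axis=1))
    return g

def gpca_hyperplanes(X, n, tol=0.99):
    """1) fit p(x) = c^T nu_n(x) = 0 on all points (null vector of embedded data);
    2) the normal at each point is the normalized gradient of p there;
    3) greedily group points whose normals are parallel."""
    c = np.linalg.svd(embed(X, n))[2][-1]
    normals, labels = [], []
    for x in X:
        g = grad(c, x, n)
        g /= np.linalg.norm(g)
        match = [k for k, b in enumerate(normals) if abs(g @ b) > tol]
        if not match:
            normals.append(g)
            match = [len(normals) - 1]
        labels.append(match[0])
    return np.array(labels), np.array(normals)

rng = np.random.default_rng(6)
B = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]) / np.array([[1.0], [np.sqrt(2.0)]])
chunks = []
for b in B:                                  # 50 points on each plane b^T x = 0
    P = rng.normal(size=(50, 3))
    chunks.append(P - np.outer(P @ b, b))
X = np.vstack(chunks)
labels, normals = gpca_hyperplanes(X, n=2)
```

On a point of plane $k$, the gradient of $p(x) = (b_1^\top x)(b_2^\top x)$ reduces to a multiple of $b_k$, so the normals, and hence the segmentation, come directly from the fitted polynomial without iterating between estimation and grouping.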
Vision-Based Formation Control: green follows red; blue follows green.