Part II Applications of GPCA in Computer Vision


Part II: Applications of GPCA in Computer Vision
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University

Title: Segmentation of Dynamic Scenes and Textures

Abstract: Dynamic scenes are video sequences containing multiple objects moving in front of dynamic backgrounds, e.g. a bird floating on water. One can model such scenes as the output of a collection of dynamical models exhibiting discontinuous behavior both in space, due to the presence of multiple moving objects, and in time, due to the appearance and disappearance of objects. Segmentation of dynamic scenes is then equivalent to the identification of this mixture of dynamical models from the image data. Unfortunately, although the identification of a single dynamical model is a well-understood problem, the identification of multiple hybrid dynamical models is not. Even in the case of static data, e.g. a set of points living in multiple subspaces, data segmentation is usually thought of as a "chicken-and-egg" problem: in order to estimate a mixture of models one needs to first segment the data, and in order to segment the data one needs to know the model parameters. Therefore, static data segmentation is usually solved by alternating data clustering and model fitting using, e.g., the Expectation Maximization (EM) algorithm. Our recent work on Generalized Principal Component Analysis (GPCA) has shown that the "chicken-and-egg" dilemma can be tackled using algebraic geometric techniques. In the case of data living in a collection of (static) subspaces, one can segment the data by fitting a set of polynomials to all data points (without first clustering the data) and then differentiating these polynomials to obtain the model parameters for each group. In this talk, we will present ongoing work on extending GPCA to time-series data living in a collection of multiple moving subspaces.
The approach combines classical GPCA with newly developed recursive hybrid system identification algorithms. We will also present applications of DGPCA in image/video segmentation, 3-D motion segmentation, dynamic texture segmentation, and heart motion analysis.

Part II: Applications in computer vision
Applications to motion and video segmentation (10.30-11.15): 2-D and 3-D motion segmentation; temporal video segmentation; dynamic texture segmentation.
Applications to image representation and segmentation (11.15-12.00): multi-scale hybrid linear models for sparse image representation; hybrid linear models for image segmentation.
Part I = 38 slides = 1 hour (Motivation: 7 slides; GPCA: 31 slides). Part II = 31 slides = 1 hour (Computer vision: 18 slides; Biomedical imaging: 6 slides; Control: 13 slides).

Applications to motion and video segmentation
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University

3-D motion segmentation problem. Given a set of point correspondences in multiple views, determine: the number of motion models; the motion model (affine, homography, fundamental matrix, trifocal tensor); the segmentation, i.e. the model to which each pixel belongs. The mathematics of the problem depends on: the number of frames (2, 3, multiple); the projection model (affine, perspective); the motion model (affine, translational, homography, fundamental matrix, etc.); the 3-D structure (planar or not).

Taxonomy of problems 2-D Layered representation Probabilistic approaches: Jepson-Black’93, Ayer-Sawhney’95, Darrel-Pentland’95, Weiss-Adelson’96, Weiss’97, Torr-Szeliski-Anandan’99 Variational approaches: Cremers-Soatto ICCV’03 Initialization: Wang-Adelson’94, Irani-Peleg’92, Shi-Malik‘98, Vidal-Singaraju’05-’06 Multiple rigid motions in two perspective views Probabilistic approaches: Feng-Perona’98, Torr’98 Particular cases: Izawa-Mase’92, Shashua-Levin’01, Sturm’02 Multibody fundamental matrix: Wolf-Shashua CVPR’01, Vidal et al. ECCV’02, CVPR’03, IJCV’06 Motions of different types: Vidal-Ma-ECCV’04, Rao-Ma-ICCV’05 Multiple rigid motions in three perspective views Multibody trifocal tensor: Hartley-Vidal-CVPR’04 Multiple rigid motions in multiple affine views Factorization-based: Costeira-Kanade’98, Gear’98, Wu et al.’01, Kanatani et al.’01-’02-’04 Algebraic: Yan-Pollefeys-ECCV’06, Vidal-Hartley-CVPR’04 Multiple rigid motions in multiple perspective views Schindler et al. ECCV’06, Li et al. CVPR’07

A unified approach to motion segmentation. Estimation of multiple motion models is equivalent to estimation of one multibody motion model, which sidesteps the chicken-and-egg problem. Eliminate feature clustering: multiplication. Estimate a single multibody motion model: polynomial fitting. Segment the multibody motion model: polynomial differentiation.

A unified approach to motion segmentation. Applies to most motion models in computer vision. All motion models can be segmented algebraically by: fitting the multibody model (a real or complex polynomial) to all the data; fitting each individual model by differentiating the polynomial at a data point.
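To make the fit-then-differentiate recipe concrete, here is a small sketch (illustrative data and variable names, not the tutorial's own code) for the simplest case: two lines through the origin in R^2. A single degree-2 polynomial is fitted to all points at once via the null space of the Veronese-embedded data, and its gradient at each point recovers the normal of that point's line, without ever clustering first.

```python
import numpy as np

# Two lines through the origin in R^2 with unit normals b1, b2 (toy data)
rng = np.random.default_rng(0)
b1, b2 = np.array([1.0, 0.0]), np.array([1.0, 1.0]) / np.sqrt(2)
d1, d2 = np.array([0.0, 1.0]), np.array([1.0, -1.0]) / np.sqrt(2)  # line directions
X = np.vstack([rng.normal(size=(50, 1)) * d1, rng.normal(size=(50, 1)) * d2])

# Degree-2 Veronese embedding: (x, y) -> (x^2, xy, y^2)
V = np.column_stack([X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2])

# Coefficients of the fitting polynomial span the null space of V
_, _, Vt = np.linalg.svd(V)
c = Vt[-1]                      # p(x, y) = c0 x^2 + c1 xy + c2 y^2

def grad(x):
    # gradient of p at x; on line i it is parallel to that line's normal
    return np.array([2 * c[0] * x[0] + c[1] * x[1],
                     c[1] * x[0] + 2 * c[2] * x[1]])

normals = np.array([grad(x) for x in X])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
# normals of the first 50 points match +/- b1, of the last 50 match +/- b2
```

With noiseless data the recovered normals agree with the true ones up to sign, so grouping points by gradient direction segments the two lines.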

Segmentation of 3-D translational motions. Multiple epipoles (translation). The epipolar constraint defines a plane in R^3 whose normal is the epipole; the data are the epipolar lines. Multibody epipolar constraint: the epipoles are the derivatives of the multibody epipolar constraint, evaluated at the epipolar lines.

Segmentation of 3-D translational motions
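The translational case can be checked numerically: under pure translation the epipolar constraint reduces to e^T (x2 × x1) = 0, so the epipolar lines x2 × x1 of one motion lie on a plane whose normal is the epipole. A single-motion sketch with synthetic, illustrative data (with several motions, the same plane-clustering step would be handled by GPCA):

```python
import numpy as np

rng = np.random.default_rng(1)
e_true = np.array([0.2, -0.1, 1.0])           # translation direction (epipole)
e_true /= np.linalg.norm(e_true)

# synthetic correspondences: calibrated camera, pure translation
X = rng.uniform(0.5, 2.0, size=(30, 3))       # 3-D points with positive depth
x1 = X / X[:, 2:]                             # image 1 (projected to Z = 1 plane)
X2 = X + e_true                               # translated points
x2 = X2 / X2[:, 2:]                           # image 2

# each epipolar line l = x2 x x1 satisfies e^T l = 0: a plane through the origin
L = np.cross(x2, x1)
_, _, Vt = np.linalg.svd(L)                   # plane normal = smallest singular vector
e_est = Vt[-1]
e_est *= np.sign(e_est @ e_true)              # fix the sign ambiguity
```

For noiseless data e_est matches the true epipole exactly (up to sign).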

Single-body factorization. Structure = 3-D surface; motion = camera position and orientation. Affine camera model (p = point index, f = frame index; P = #points, F = #frames). The motion of one rigid body lives in a 4-D subspace (Boult and Brown ’91, Tomasi and Kanade ’92).
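The rank-4 claim is easy to verify with synthetic data. The following sketch (illustrative, with random affine cameras) stacks the 2F × P measurement matrix of tracked image points and checks its rank:

```python
import numpy as np

rng = np.random.default_rng(2)
F, P = 10, 40                                  # frames, points

# homogeneous 3-D shape matrix (4 x P): random points plus a row of ones
S = np.vstack([rng.normal(size=(3, P)), np.ones((1, P))])

# one random 2x4 affine motion matrix per frame, stacked into 2F x 4
M = rng.normal(size=(2 * F, 4))

# measurement matrix: all image trajectories of the single rigid body
W = M @ S                                      # 2F x P

# its columns span a subspace of dimension at most 4
sv = np.linalg.svd(W, compute_uv=False)
rank = np.sum(sv > 1e-8 * sv[0])
```

The numerical rank comes out as 4, matching the Boult-Brown / Tomasi-Kanade observation that one rigid motion spans a 4-D subspace.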

Multi-body factorization. Given n rigid motions, motion segmentation is obtained from: the leading singular vector of the measurement matrix (Boult and Brown ’91); the shape interaction matrix (Costeira & Kanade ’95, Gear ’94). The number of motions can also be recovered (if the motions are fully dimensional). The motion subspaces need to be independent (Kanatani ’01).
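A sketch of the shape interaction matrix for two independent motions (synthetic data; names are illustrative). Q is built from the top right-singular vectors of the combined measurement matrix and, for independent motion subspaces, is block diagonal: entries linking points from different bodies vanish.

```python
import numpy as np

rng = np.random.default_rng(3)
F, P = 10, 20                                  # frames, points per body

def rigid_tracks(rng, F, P):
    # trajectories of one rigid body under a random affine camera (2F x P)
    S = np.vstack([rng.normal(size=(3, P)), np.ones((1, P))])
    M = rng.normal(size=(2 * F, 4))
    return M @ S

# combined measurement matrix for two independently moving bodies
W = np.hstack([rigid_tracks(rng, F, P), rigid_tracks(rng, F, P)])

# shape interaction matrix from the row space of W (rank r = 4n = 8 here)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = np.sum(s > 1e-8 * s[0])
Q = Vt[:r].T @ Vt[:r]

# Q_ij = 0 whenever i and j track different (independent) bodies
off_block = np.abs(Q[:P, P:]).max()
```

Thresholding or sorting Q therefore reveals the grouping of the feature points, which is exactly the Costeira-Kanade segmentation idea.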

Multi-body factorization: limitations.
Sensitive to noise: Kanatani (ICCV ’01) uses model selection to scale Q; Wu et al. (CVPR ’01) project the data onto the subspaces and iterate.
Fails with partially dependent motions: Zelnik-Manor and Irani (CVPR ’03) build a similarity matrix from normalized Q and apply spectral clustering to it; Yan and Pollefeys (ECCV ’06) combine local subspace estimation with spectral clustering; Kanatani (ECCV ’04) assumes the degeneracy is known (pure translation in the image) and segments the data by multi-stage optimization (multiple EM problems).
Cannot handle missing data: Gruber and Weiss (CVPR ’04) use Expectation Maximization.

PowerFactorization+GPCA: a motion segmentation algorithm that is provably correct with perfect data, handles both independent and degenerate motions, and handles both complete and incomplete data. Project the trajectories onto a 5-D subspace of R^(2F): for complete data use PCA or SVD; for incomplete data use PowerFactorization. Then cluster the projected subspaces using GPCA. The method is non-iterative, so it can be used to initialize EM.

Projection onto a 5-D subspace. The motion of one rigid body lives in a 4-D subspace of R^(2F). By projecting onto a 5-D subspace of R^(2F), the number and dimensions of the subspaces are preserved, so motion segmentation is equivalent to clustering subspaces of dimension 2, 3 or 4 in R^5. Minimum #frames = 3 (Costeira-Kanade needs a minimum of 2n frames for n motions). Outliers can be removed by robustly fitting the 5-D subspace using Robust SVD (De la Torre-Black). What projection to use? PCA: 5 principal components; RPCA: with outliers.

Projection onto a 5-D subspace. PowerFactorization algorithm: given W, factor it as W = A B^T. Iterate: given A, solve for B; orthonormalize B; given B, solve for A. Each step is a linear problem. Complete data: converges to the rank-r approximation. Incomplete data: diverges in some cases, but works well with up to 30% of missing data.
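A minimal sketch of the alternating scheme just described, assuming a simple masked least-squares formulation (the function name and details are illustrative, not the authors' implementation):

```python
import numpy as np

def power_factorization(W, mask, r, n_iter=200, seed=0):
    """Rank-r factorization W ~ A @ B.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    B = rng.normal(size=(n, r))
    A = np.zeros((m, r))
    for _ in range(n_iter):
        B, _ = np.linalg.qr(B)                 # orthonormalize B for stability
        for i in range(m):                     # given B, solve each row of A (linear)
            o = mask[i]
            A[i] = np.linalg.lstsq(B[o], W[i, o], rcond=None)[0]
        for j in range(n):                     # given A, solve each row of B (linear)
            o = mask[:, j]
            B[j] = np.linalg.lstsq(A[o], W[o, j], rcond=None)[0]
    return A, B

# exact rank-4 matrix with roughly 20% of the entries missing
rng = np.random.default_rng(4)
W = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 50))
mask = rng.random(W.shape) > 0.2
A, B = power_factorization(W, mask, r=4)
err = np.abs(A @ B.T - W).max()                # should be small for this easy case
```

On this noiseless, mildly incomplete example the iteration recovers the full matrix; as the slide notes, convergence is not guaranteed in general and can fail for harder missing-data patterns.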

Motion segmentation using GPCA: apply the polynomial embedding (Veronese map) to the 5-D points.
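A generic Veronese embedding can be written in a few lines; the sketch below (an illustrative helper, not the tutorial's code) enumerates all monomials of a given degree:

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Degree-n Veronese embedding of the rows of X: all monomials of degree n."""
    D = X.shape[1]
    cols = []
    for idx in combinations_with_replacement(range(D), n):
        m = np.ones(X.shape[0])
        for i in idx:
            m = m * X[:, i]                    # multiply the chosen coordinates
        cols.append(m)
    return np.column_stack(cols)

# example: degree-2 embedding of one 3-D point gives the 6 monomials
# x0^2, x0 x1, x0 x2, x1^2, x1 x2, x2^2
X = np.array([[1.0, 2.0, 3.0]])
V = veronese(X, 2)
```

For D-dimensional points and degree n the embedded dimension is C(n + D - 1, n); with the 5-D projected trajectories this grows quickly with the number of motions, which is why the projection step matters.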

Hopkins 155 motion segmentation database Collected 155 sequences 120 with 2 motions 35 with 3 motions Types of sequences Checkerboard sequences: mostly full dimensional and independent motions Traffic sequences: mostly degenerate (linear, planar) and partially dependent motions Articulated sequences: mostly full dimensional and partially dependent motions Point correspondences In few cases, provided by Kanatani & Pollefeys In most cases, extracted semi-automatically with OpenCV

Experimental results: Hopkins 155 database 2 motions, 120 sequences, 266 points, 30 frames

Experimental results: Hopkins 155 database 3 motions, 35 sequences, 398 points, 29 frames

Experimental results: missing data sequences There is no clear correlation between amount of missing data and percentage of misclassification This could be because convergence of PF depends more on “where” missing data is located than on “how much” missing data there is

Conclusions. For two motions: algebraic methods (GPCA and LSA) are more accurate than statistical methods (RANSAC and MSL); LSA performs better on full-dimensional and independent sequences, while GPCA performs better on degenerate and partially dependent ones; LSA is sensitive to the dimension of the projection (d = 4n is better than d = 5); MSL is very slow, RANSAC and GPCA are fast. For three motions: GPCA is not very accurate, but is very fast; MSL is the most accurate, but very slow; LSA is almost as accurate as MSL and almost as fast as GPCA.

Segmentation of Dynamic Textures
René Vidal, Center for Imaging Science, Institute for Computational Medicine, Johns Hopkins University

Modeling a dynamic texture: fixed boundary. Examples of dynamic textures. Model the temporal evolution as the output of a linear dynamical system (LDS), following Soatto et al. ’01: dynamics z_{t+1} = A z_t + v_t; appearance (images) y_t = C z_t + w_t.

Segmenting non-moving dynamic textures (e.g. water, steam). One dynamic texture z_{t+1} = A z_t + v_t, y_t = C z_t + w_t lives in the observability subspace of its model. Multiple textures live in multiple subspaces, so the data can be clustered using GPCA.
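The SVD-based identification of such a model can be sketched as follows (synthetic data and names are illustrative, in the spirit of the suboptimal procedure of Soatto et al. ’01 rather than an exact reproduction):

```python
import numpy as np

def learn_dynamic_texture(Y, r):
    """Suboptimal LDS identification: columns of Y are vectorized frames.

    Returns (A, C, Z) with y_t ~ C z_t and z_{t+1} ~ A z_t.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :r]                               # appearance (output) matrix
    Z = np.diag(s[:r]) @ Vt[:r]                # state trajectory, r x tau
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])   # dynamics by least squares
    return A, C, Z

# synthetic rank-r sequence: stable rotation-like dynamics, random appearance
rng = np.random.default_rng(5)
r, n_pix, tau = 3, 100, 50
A_true = 0.9 * np.linalg.qr(rng.normal(size=(r, r)))[0]   # eigenvalues of modulus 0.9
C_true = rng.normal(size=(n_pix, r))
z = rng.normal(size=r)
frames = []
for _ in range(tau):
    z = A_true @ z
    frames.append(C_true @ z)
Y = np.column_stack(frames)

A, C, Z = learn_dynamic_texture(Y, r)
recon_err = np.abs(C @ Z - Y).max()                       # exact for rank-r data
pred_err = np.abs(C @ (A @ Z[:, :-1]) - Y[:, 1:]).max()   # one-step prediction
```

For noiseless rank-r data both the reconstruction and the one-step prediction are exact (up to a change of state coordinates), which is what makes the columns of U a basis for the texture's observability subspace.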

Segmenting moving dynamic textures

Segmenting moving dynamic textures Ocean-bird

Level-set intensity-based segmentation: the Chan-Vese energy functional. Implicit methods: represent C as the zero level set of an implicit function φ, i.e. C = {(x, y) : φ(x, y) = 0}. Solution: gradient descent on the energy gives an evolution equation for φ, where c1 and c2 are the mean intensities inside and outside the contour C.
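A stripped-down numerical sketch of the Chan-Vese evolution (data terms only, with the curvature weight set to zero; all names, parameters and the toy image are illustrative):

```python
import numpy as np

def chan_vese(I, n_iter=200, dt=1.0, eps=1.0):
    """Minimal Chan-Vese gradient descent on phi (no curvature/length term)."""
    h, w = I.shape
    yy, xx = np.mgrid[:h, :w]
    phi = 10.0 - np.hypot(yy - h / 2, xx - w / 2)  # initialize as a centered circle
    for _ in range(n_iter):
        inside = phi > 0
        c1 = I[inside].mean() if inside.any() else 0.0    # mean intensity inside C
        c2 = I[~inside].mean() if (~inside).any() else 0.0  # mean intensity outside C
        delta = eps / (np.pi * (eps**2 + phi**2))         # smoothed Dirac delta
        # gradient descent on the two region terms of the Chan-Vese energy
        phi += dt * delta * (-(I - c1)**2 + (I - c2)**2)
    return phi

# toy image: bright square on a dark background
I = np.zeros((40, 40))
I[10:30, 10:30] = 1.0
phi = chan_vese(I)
seg = phi > 0            # recovered "inside" region
```

Even without the length term, the region competition drives φ positive on the bright square and negative elsewhere; the full method adds the curvature term to keep the contour smooth.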

Dynamics & intensity-based energy We represent the intensities of the pixels in the images as the output of a mixture of AR models of order p We propose the following spatial-temporal extension of the Chan-Vese energy functional where

Variational segmentation of dynamic textures Given the ARX parameters, we can solve for the implicit function φ by solving the PDE Given the implicit function φ, we can solve for the ARX parameters of the jth region by solving the linear system
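The per-region estimation of AR parameters is an ordinary least-squares problem; a single-pixel sketch with an illustrative helper (the real method solves one such system per region, pooling all its pixels):

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p) fit: y[t] ~ a[0] y[t-1] + ... + a[p-1] y[t-p]."""
    T = len(y)
    # regressor matrix: column k-1 holds y[t-k] for t = p..T-1
    Phi = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(Phi, y[p:], rcond=None)[0]

# synthetic pixel intensity driven by a known stable AR(2) model
a_true = np.array([1.5, -0.7])
y = np.zeros(100)
y[0], y[1] = 1.0, 0.5
for t in range(2, 100):
    y[t] = a_true[0] * y[t - 1] + a_true[1] * y[t - 2]

a_est = fit_ar(y, 2)     # recovers a_true exactly for noiseless data
```

Alternating this linear fit with the PDE update for φ is exactly the two-step scheme described above: each step is easy given the output of the other.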

Variational segmentation of dynamic textures Fixed boundary segmentation results and comparison Ocean-smoke Ocean-dynamics Ocean-appearance

Variational segmentation of dynamic textures Moving boundary segmentation results and comparison Ocean-fire

Variational segmentation of dynamic textures Results on a real sequence Raccoon on River

Conclusions. Many problems in computer vision can be posed as subspace clustering problems: temporal video segmentation; 2-D and 3-D motion segmentation; dynamic texture segmentation; nonrigid motion segmentation. These problems can be solved using GPCA, an algorithm for clustering subspaces that deals with unknown and possibly different dimensions and with arbitrary intersections among the subspaces. GPCA is based on projecting the data onto a low-dimensional subspace, recursively fitting polynomials to the projected subspaces, and differentiating the polynomials to obtain a basis.

In this paper, we have presented a new way of looking at segmentation problems that is based on algebraic geometric techniques. In essence, we showed that segmentation is equivalent to polynomial factorization, and presented an algebraic solution for the case of mixtures of subspaces, which is a natural generalization of the well-known Principal Component Analysis (PCA). Of course, as this theory is fairly new, there are still many open problems: for example, how to do polynomial factorization in the presence of noise, or how to deal with outliers. We are currently working on such problems and expect to report our results in the near future. We are also working on generalizing the ideas in this paper to subspaces of different dimensions and to other types of data. In fact, I cannot resist doing some advertising for my talk tomorrow, where I will discuss the segmentation of multiple rigid motions, which corresponds to the case of bilinear data. Finally, before taking your questions, I would like to say that I think it is time we put segmentation problems on a sound theoretical footing. This can be done by building a mathematical theory of segmentation that incorporates not only well-known statistical modeling techniques, but also the geometric constraints inherent in various applications in computer vision.
We hope that by combining the algebraic geometric approach to segmentation presented in this paper with more standard statistical approaches, we will be able to build such a mathematical theory of segmentation. It should find applications in various disciplines, including many segmentation and recognition problems in computer vision.

Thank You! For more information: Vision, Dynamics and Learning Lab @ Johns Hopkins University.