Projective Factorization of Multiple Rigid-Body Motions

Presentation on theme: "Projective Factorization of Multiple Rigid-Body Motions"— Presentation transcript:

1 Projective Factorization of Multiple Rigid-Body Motions
Ting Li, Vinutha Kallem, Dheeraj Singaraju, and René Vidal, Johns Hopkins University

Good afternoon. Today's talk is titled "Projective Factorization of Multiple Rigid-Body Motions". It is based on research at the Vision Lab at Johns Hopkins University under the supervision of Dr. René Vidal.

2 3-D motion segmentation problem
Given a set of point correspondences in multiple views, determine the model to which each correspondence belongs. The mathematics of the problem depends on:
- the number of frames (2, 3, multiple)
- the projection model (affine, perspective)
- the motion model (affine, translational, homography, fundamental matrix, etc.)
- the 3-D structure (planar or not)

Our work addresses the 3-D motion segmentation problem, an area of great interest in computer vision. As shown in the video, given the trajectories of points on different moving objects, we would like to separate the trajectories into groups such that each group has the same motion. The framework developed to solve this problem depends on the assumptions under which it is studied: factors that affect the nature of the solution include the number of frames being considered and the type of motion model being fit to the data. The projection model of the camera plays a particularly significant role.

3 Motion estimation: multiple affine views
Structure = 3-D surface. Motion = camera position and orientation.
Affine camera model: x_fp = A_f X_p, where f = frame and p = point; the relationship between the image points and the 3-D points is linear.
The motion of one rigid body lives in a 4-D subspace (Boult and Brown '91, Tomasi and Kanade '92): with P = #points and F = #frames, the 2F x P matrix of trajectories factors as W = M S, where M is 2F x 4 and S is 4 x P.

The motion segmentation problem has been extensively studied under the assumption of an affine camera, for which the projection relation is the equation above. An important result in this line of research is that if the trajectories are stacked into a matrix W, this matrix can be factorized into a matrix M that captures the motion and a matrix S that captures the structure of the 3-D scene. More importantly, since M has 4 columns and S has 4 rows, W has rank at most 4. Thereby we have the key result that the motion of a rigid body lives in a 4-D subspace.
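The rank-4 factorization can be sketched in a few lines of numpy (an illustrative sketch, not the authors' code; the matrix names follow the slide):

```python
import numpy as np

# Recover motion M and structure S for one rigid body under an affine camera:
# the 2F x P measurement matrix W = M S has rank at most 4, so a rank-4
# truncated SVD factors it (up to an invertible 4x4 gauge transformation).
rng = np.random.default_rng(0)
F, P = 10, 30                              # frames, points

M_true = rng.standard_normal((2 * F, 4))   # stacked affine camera matrices
S_true = rng.standard_normal((4, P))       # homogeneous 3-D structure
W = M_true @ S_true                        # ideal (noise-free) trajectories

U, s, Vt = np.linalg.svd(W, full_matrices=False)
M_hat = U[:, :4] * s[:4]                   # motion estimate
S_hat = Vt[:4, :]                          # structure estimate
```

Note the factorization is only defined up to an invertible 4x4 matrix H, since (M H)(H^{-1} S) = M S; a metric upgrade requires additional constraints.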

4 Motion segmentation: multiple affine views
Segmentation of multiple motions is equivalent to clustering subspaces of dimension 4.
- Multibody grouping: Gear '98, Costeira and Kanade '98, Kanatani '01, Kanatani and Matsunaga '02. Multiframe segmentation algorithms that use all the frames simultaneously, but cannot deal with partially dependent motions.
- Statistical methods: Sugaya and Kanatani '04, RANSAC. Robust to noise and outliers, but computationally intensive.
- Algebraic approaches: Vidal and Hartley '04, Yan and Pollefeys '06. Errors comparable to the statistical methods, with significantly reduced computation time.

This constraint was first used by the approaches that reduced the problem to one of multibody grouping, formulated either as bipartite graph matching or as thresholding the entries of a similarity matrix; these were also the first works to use all the frames simultaneously for segmentation. The second kind of method is statistical in nature and includes algorithms such as the Multi-Stage Learning algorithm and RANSAC-like methods; these are very robust to noise and outliers, but at the cost of high computational expense. Finally, the algebraic approaches proceed by fitting linear subspaces to the given data and segmenting the data using the similarities between these subspaces; they give errors comparable to the statistical ones at highly reduced run times.

5 Motion estimation: multiple perspective views
Perspective camera model: lambda_fp x_fp = M_f X_p, where the projective depths lambda_fp are unknown.
- Depths known: compute M and S by rank-4 projective factorization (Sturm and Triggs '96).
- Depths unknown: estimate them first, e.g. by setting all depths to 1 or by using epipolar geometry, then iterate.
Iterative extensions of the Sturm and Triggs algorithm: Triggs '96, Mahmud et al. '01, Oliensis and Hartley '06.

With perspective cameras the problem is more involved, because of the introduction of an unknown depth; the motion estimation problem is rendered non-linear. If we form the scaled measurement matrix W(lambda) and the depths are known, we can proceed as in the affine case, i.e. extract the structure and motion. When the depths are unknown, we make initial guesses for them, either by setting them to 1 or by using epipolar geometry. Given the depths, one can extract the motion and structure, and use them to refine the estimate of the depths. While the classical algorithm ends here, one can iterate between factorization and depth estimation to refine the solution. Without dwelling on the details, there have indeed been later works that address iterative extensions of the Sturm and Triggs algorithm and analyze their convergence.
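The key algebraic fact can be checked numerically (a sketch, not the paper's code): with the true projective depths, the scaled measurement matrix W(lambda) has rank 4, whereas with all depths set to 1 it generally does not.

```python
import numpy as np

# With the TRUE depths lambda_fp, the 3F x P matrix W(lambda) = [lambda_fp x_fp]
# equals [M_1; ...; M_F] @ X and therefore has rank 4; with depths set to 1 it
# is generally of higher rank, which is what the iteration must correct.
rng = np.random.default_rng(1)
F, P = 5, 20
cams = rng.standard_normal((F, 3, 4))        # perspective camera matrices M_f
X = rng.standard_normal((4, P))              # homogeneous 3-D points X_p

proj = np.stack([C @ X for C in cams])       # lambda_fp * x_fp, per frame
lam_true = proj[:, 2, :]                     # true depths: third coordinate
x = proj / lam_true[:, None, :]              # image points in the form (x, y, 1)

W_true = proj.reshape(3 * F, P)              # scaled with the true depths
W_ones = x.reshape(3 * F, P)                 # all depths set to 1
```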

6 Motion segmentation: multiple perspective views
Algebraic approaches: Vidal, Ma, Soatto and Sastry '06, Hartley and Vidal '04. Fit global models using two-view and three-view geometry to segment sequences with two and three frames, respectively.
Statistical approaches: Torr '98, Schindler, U and Wang '06. Use two-view geometry along with model selection and RANSAC to segment sequences with two frames only, or generate candidate models for pairs of views and combine the results across all frames using model selection.
Segmentation of multiple motions from multiple perspective views is a non-linear problem: the projection model is non-linear, as opposed to linear in the affine case. The first two methods use only 2-3 views; the last method links results across views.

The perspective case is not immediate to solve: as shown in the projection equation, it involves an extra unknown, the depth. One genre of algebraic algorithms proceeds by fitting global models to data in two or three views; these models can then be used to estimate the individual motions and thereby obtain the segmentation. Other work estimates candidate motions using two-view geometry and prunes the results across all the frames using model selection. However, there does not exist a multiframe algorithm for the perspective case. In general, a multiframe algorithm is preferable: if the data between any particular pair of views is noisy, the algorithms discussed above can perform below par, so one would prefer to use all frames simultaneously.

7 Paper contributions
A 3-D motion segmentation algorithm for multiple perspective views that uses all frames simultaneously:
- generalizes the subspace separation methods from the affine to the perspective case
- can deal with partially dependent and transparent motions
- achieves a good trade-off between speed and accuracy
- generalizes the Sturm and Triggs algorithm to the case of multiple motions

In this paper, we propose a multiframe algorithm for segmenting data in multiple perspective views. As seen earlier, subspace separation techniques such as LSA and GPCA give good run times and accuracy; moreover, they can deal with transparent and partially dependent motions. We use the Sturm and Triggs algorithm to generalize the subspace separation methods to the case of perspective views, so our work can also be viewed as a generalization of Sturm and Triggs to multiple motions. Before going into the details of the algorithm, let us look at the subspace separation techniques that will be employed.

8 Schematic of our proposed algorithm
Given the depths: do subspace separation. Given the segmentation: estimate the depths. The dimension of W is now 3F x P, unlike 2F x P in the affine case.

We now present a schematic of our proposed algorithm. Note that the dimension of W is 3F x P instead of 2F x P, because x is represented in homogeneous coordinates; I shall say more on this later. If we know the depths, we can do subspace separation using GPCA or LSA. If we know the segmentation, we can estimate the depths for each group in an iterative fashion. Therefore, given an initial estimate of either the depths or the segmentation, we can iterate between these two steps to segment the entire data.

9 Generalized Principal Component Analysis (GPCA)
Pipeline: dimensionality reduction, fit a global model, estimate the normals, segment by spectral clustering.

Since the data lies in a subspace of dimension at most 4, we reduce its dimensionality by projecting onto a subspace of dimension 5; in general this preserves the grouping. The method then proceeds by fitting a global model p(x) that is satisfied by all the data irrespective of its segmentation, where p(x) is a polynomial in the entries of x. It has the nice property that its derivative, evaluated at a data point, gives the normal to the subspace on which that point lies. One repeats this process for all the trajectories and forms a similarity matrix S based on the angles between pairs of normals. The data is then segmented by applying spectral clustering to this matrix.
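A toy illustration of the GPCA idea (illustrative only: two planes in R^3 with a degree-2 polynomial, and a simple angle test standing in for the spectral-clustering step):

```python
import numpy as np

# Two planes in R^3 with normals b1, b2. Every sample satisfies the single
# global polynomial p(x) = (b1.x)(b2.x); its gradient at a sample gives the
# normal of the plane that sample lies on.
rng = np.random.default_rng(2)
b1, b2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])

def sample_plane(b, n):
    pts = rng.standard_normal((n, 3))
    return pts - np.outer(pts @ b, b)        # project samples onto the plane

X = np.vstack([sample_plane(b1, 40), sample_plane(b2, 40)])

# Degree-2 Veronese embedding: p(x) = c . (x1^2, x1x2, x1x3, x2^2, x2x3, x3^2)
def veronese(x):
    x1, x2, x3 = x
    return np.array([x1*x1, x1*x2, x1*x3, x2*x2, x2*x3, x3*x3])

V = np.array([veronese(x) for x in X])
c = np.linalg.svd(V)[2][-1]                  # null vector = polynomial coefficients

def grad_p(x):                               # gradient of p at x
    x1, x2, x3 = x
    return np.array([2*c[0]*x1 + c[1]*x2 + c[2]*x3,
                     c[1]*x1 + 2*c[3]*x2 + c[4]*x3,
                     c[2]*x1 + c[4]*x2 + 2*c[5]*x3])

normals = np.array([grad_p(x) for x in X])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
# segment by comparing each normal with the first point's normal
labels = (np.abs(normals @ normals[0]) > 0.9).astype(int)
```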

10 Local Subspace Affinity (LSA)
Pipeline: dimensionality reduction, projection of each point onto the unit sphere, local subspace fitting, spectral clustering using the subspace angles as the similarity.

Here the data is projected onto a subspace of dimension D, where 5 <= D <= 4n. The data is then normalized to unit norm by projecting it onto the unit sphere; the idea is that this separates the different motion subspaces. The method then fits a subspace locally through each trajectory and its nearest neighbors. One can then form a similarity matrix S that depends on the angles between these local subspaces, and apply spectral clustering to segment the data.
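A small-scale sketch of the LSA steps (illustrative: 1-D subspaces in R^3 stand in for the 4-D motion subspaces, and the neighborhood size of 5 is an arbitrary choice):

```python
import numpy as np

# Two 1-D motion subspaces (lines) in R^3. Project the points onto the unit
# sphere, fit a local subspace through each point from its nearest neighbors,
# build an affinity from the angles between local subspaces, and cluster it
# spectrally via the sign of the Fiedler vector.
rng = np.random.default_rng(3)
d1 = np.array([1.0, 0.0, 0.0])
d2 = np.array([0.2, 1.0, 0.0]); d2 /= np.linalg.norm(d2)

pts = np.vstack([np.outer(rng.uniform(1, 2, 30), d1),
                 np.outer(rng.uniform(1, 2, 30), d2)])
pts += 0.02 * rng.standard_normal(pts.shape)            # mild noise
pts /= np.linalg.norm(pts, axis=1, keepdims=True)       # onto the unit sphere

# local 1-D subspace through each point and its 4 nearest neighbors
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
bases = np.array([np.linalg.svd(pts[np.argsort(D[i])[:5]])[2][0]
                  for i in range(len(pts))])

A = (bases @ bases.T) ** 2               # affinity: cos^2 of the subspace angle
L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian
evals, evecs = np.linalg.eigh(L)
labels = (evecs[:, 1] > 0).astype(int)   # split by sign of the Fiedler vector
```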

11 Estimation of depths for each group
Make an initial guess for the depths.
Rank-4 projection of W(lambda).
Re-estimate the depths.
Iterate until convergence.
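The loop can be sketched as follows (illustrative, not the released code; this plain alternation omits the balancing discussed on the next slide, which is needed in practice to avoid degenerate solutions). Since each step is an exact least-squares minimizer over its own variable, the residual between W(lambda) and its rank-4 projection is non-increasing:

```python
import numpy as np

# Depth-estimation loop for one group on synthetic perspective data.
rng = np.random.default_rng(4)
F, P = 5, 20
cams = rng.standard_normal((F, 3, 4))
Xs = rng.standard_normal((4, P))
proj = np.stack([C @ Xs for C in cams])
x = proj / proj[:, 2:3, :]                   # image points of one rigid body

lam = np.ones((F, P))                        # step 1: initial guess for depths
residuals = []
for _ in range(30):
    W = (lam[:, None, :] * x).reshape(3 * F, P)          # W(lambda)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W4 = (U[:, :4] * s[:4]) @ Vt[:4, :]                  # step 2: rank-4 projection
    residuals.append(np.linalg.norm(W - W4))
    blocks = W4.reshape(F, 3, P)
    # step 3: least-squares depth update, lambda_fp = <x_fp, w_fp> / <x_fp, x_fp>
    lam = (x * blocks).sum(axis=1) / (x ** 2).sum(axis=1)
```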

12 Implementation details
Normalization of the image coordinates so that they have zero mean and a standard deviation of sqrt(2) (Hartley '97).
Balancing of W(lambda) by enforcing its columns and rows to have norm 1 (Sturm and Triggs '96); balancing preserves the factorization.
Initialization of the iterative algorithm for estimating depths by setting all depths to 1.
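Both conditioning steps are easy to sketch (illustrative; the sqrt(2) scale follows Hartley-style normalization, and the balancing loop is a Sinkhorn-like alternation over columns and per-frame row triplets). Balancing preserves the factorization because rescaling column p by mu_p and frame triplet f by nu_f just replaces lambda_fp by nu_f mu_p lambda_fp, and the scales can be absorbed into M and S:

```python
import numpy as np

rng = np.random.default_rng(5)

# 1) Normalization: zero mean, standard deviation sqrt(2)
pix = rng.uniform(0, 640, (100, 2))                  # raw pixel coordinates
centered = pix - pix.mean(axis=0)
pts = centered * (np.sqrt(2) / centered.std())

# 2) Balancing: alternately rescale columns and per-frame row triplets of W
F, P = 4, 12
W = rng.uniform(0.1, 10.0, (3 * F, P))               # a positive W(lambda)
for _ in range(100):
    W = W / np.linalg.norm(W, axis=0, keepdims=True)             # unit columns
    blocks = W.reshape(F, 3, P)
    norms = np.linalg.norm(blocks, axis=(1, 2), keepdims=True)   # triplet norms
    W = (blocks / norms).reshape(3 * F, P)
```

At the fixed point each triplet has norm 1 and all column norms agree, so each column norm is sqrt(F/P).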

13 Summary of algorithm
Trajectories in multiple perspective views. Given an initial value of the depths, or given an initial segmentation: iterate between subspace separation (GPCA or LSA) and estimation of the depths. This yields 4 variations of our algorithm.

In a nutshell: given the trajectories of different rigid bodies in multiple perspective views, we iterate between subspace separation and estimation of depths to segment the motions. While I have talked about the iteration so far, the algorithm does need an initialization. This can be an initial segmentation, obtained in our case from the affine GPCA and LSA algorithms, or an initial estimate of the depths. Since we have the freedom to choose both the kind of initialization and the algorithm used for subspace separation, we explore the performance of 4 variations of our algorithm. The naming scheme is as follows: the first tag refers to the kind of initialization, so DepthInit means the algorithm is initialized with depths and SegInit means it is given an initial segmentation; the second tag refers to the method used for subspace separation. For example, DepthInit-GPCA means we initialize by setting all the depths to 1 and use GPCA for subspace separation, while SegInit-GPCA means we initialize with the segmentation given by the affine GPCA algorithm and then do the subspace separation using GPCA.

14 Description of database
Results: testing on the Hopkins155 database. Comparison with algorithms for segmenting affine views:
- Multi-Stage Learning (MSL): Sugaya and Kanatani '04
- Generalized Principal Component Analysis (GPCA): Vidal and Hartley '04
- Local Subspace Affinity (LSA): Yan and Pollefeys '06

We analyzed the performance of our method against the existing methods for the affine case on the Hopkins155 motion segmentation database. This database contains sequences with 2 and 3 motions, which can further be categorized into checkerboard scenes, traffic, and articulated motion.

15 Results: Statistics for affine segmentation
Tron and Vidal '07. Statistics for two motions and for three motions:
- LSA gives errors comparable to those of MSL
- GPCA performs well with 2 motions, but not with 3 motions
- LSA and GPCA are significantly faster than MSL

Looking at the statistics for GPCA and LSA, we see that they give errors comparable to Kanatani's statistically robust Multi-Stage Learning algorithm while giving low run times; this motivates our choice of these two methods for subspace separation. Note that GPCA does well for 2 motions but not for 3 motions. We then look at the performance of our algorithm in 4 variations: 2 arising from the choice of GPCA or LSA for subspace separation, and 2 from initializing the algorithm with knowledge of the depths or of the segmentation. In most cases, our perspective algorithm outperforms the existing affine methods while giving comparable run times.

16 Results: Statistics for 2 motions
The errors of the affine GPCA algorithm are reduced by our iterative algorithm. The errors of the affine LSA algorithm are reduced by our iterative algorithm when given an initial segmentation.

17 Results: Statistics for 3 motions
The errors of GPCA are reduced by our iterative algorithm. The errors of LSA are reduced by our iterative algorithm. Our best method improves on the best existing algorithm by 2%.

18 Conclusions
We propose the first multibody multiframe 3-D motion segmentation algorithm for perspective views that uses all the frames simultaneously. The algorithm generalizes subspace separation techniques such as LSA and GPCA to segment perspective views. Comprehensive testing on an extensive database shows that our method improves on the results of existing algorithms for affine segmentation.

19 Open Issues
Rigorous convergence analysis of the algorithm
Extension of the algorithm to deal with missing data and outliers

20 Acknowledgements
This work was supported by startup funds from Johns Hopkins University, and by grants NSF CAREER IIS , NSF EHS and ONR N. We would like to thank the following people for providing us with data and code: Dr. K. Kanatani, Dr. M. Pollefeys, and R. Tron.

21 QUESTIONS?

