Answering ‘Where am I?’ by Nonlinear Least Squares
Daniel Eaton
CPSC 542b
December 5th, 2005
Motivation (1/4)
- “Where am I?” - ICCV05 Programming Contest: given some images of a city taken from known locations, estimate the locations of other images
- My entry was competitive until optimization became a critical component for success
Image source: Rick Szeliski, ICCV ‘05
Motivation (2/4)
- My contest entry: dataset matches
- Find the two most similar labeled images
Motivation (3/4)
- Given multiple views of the same scene, one can do structure and motion reconstruction
- Input: 2+ views, with ‘interest points’ (visually discriminative parts of the image) matched across images
- Output: the 3D positions of the corresponding world points, plus the location and orientation of the cameras, all up to an overall scale factor
- This output is relative to a local coordinate system; with 3 views (2 of them labeled), the local frame can be rotated into world coordinates and the scale ambiguity resolved
Motivation (4/4)
Sample input (two views):
Formulation (1/1)
- Phrase structure and motion reconstruction as a nonlinear least squares problem
- Each point correspondence is associated with a 3D point, which is projected into the 2+ images
- The residual is the difference between the measured point location and the projected point location
- Want to minimize the summed residuals of all 3D points and projections, over the camera and structure parameters (see the sketch after this slide)
- Projection is a nonlinear function
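For concreteness, the objective described above can be written as follows. This is a sketch under my own notation (not taken from the slides), where x_{ij} is the measured location of point i in image j, pi is the nonlinear projection function, C_j holds the parameters of camera j, and X_i is the corresponding 3D point; plain least squares is shown here, before the robust cost from a later slide is substituted.

    % Sketch of the nonlinear least-squares objective (notation is mine, not the slides'):
    \min_{\{C_j\},\,\{\mathbf{X}_i\}} \;
      \sum_{i}\sum_{j}
      \bigl\| \mathbf{x}_{ij} - \pi\!\left(C_j, \mathbf{X}_i\right) \bigr\|_2^2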
Implementation (1/6)
- Used the Levenberg-Marquardt algorithm
- Why? It has already been applied successfully to many vision problems
- Easy to implement in Matlab (after deriving all the derivatives analytically)
- Adopted a heuristic for increasing/decreasing the trust region size (rather than the method introduced in class); a sketch of such a loop follows
- Used an elliptical trust region to speed up convergence (hand tuned for these parameters)
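For illustration, here is a minimal Matlab sketch of a Levenberg-Marquardt loop with a simple increase/decrease damping heuristic and Marquardt's diagonal scaling, which gives the elliptical trust region. The function handle resfun (returning the residual vector and Jacobian) and all constants are assumptions made for the sketch, not the original contest code.

    % Minimal Levenberg-Marquardt sketch (illustrative; not the original contest code).
    % resfun is assumed to return the residual vector r and Jacobian J at parameters p.
    function p = lm_sketch(resfun, p0)
        p = p0;
        lambda = 1e-3;                            % initial damping
        [r, J] = resfun(p);
        for iter = 1:100
            A = J' * J;
            D = diag(diag(A));                    % Marquardt scaling -> elliptical trust region
            dp = -(A + lambda * D) \ (J' * r);    % damped normal equations
            [r_new, J_new] = resfun(p + dp);
            if norm(r_new)^2 < norm(r)^2
                p = p + dp; r = r_new; J = J_new; % step accepted
                lambda = lambda / 10;             % grow trust region
            else
                lambda = lambda * 10;             % shrink trust region
            end
            if norm(dp) < 1e-8, break; end
        end
    end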
Implementation (2/6)
- The data is noisy and contains outliers, so a cost function more robust than L2 was needed
- The Huber function defines a threshold on the residual: if the residual is greater, the point is considered an outlier and the L1 error is used for it; if it is less, the L2 error is used (a sketch follows)
- It is once continuously differentiable
- With Huber, convergence is slowed (and computation time increased), but accuracy in the estimated camera parameters is greatly increased
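A minimal Matlab sketch of the Huber penalty described above; the threshold parameter k and the vectorized form are my assumptions.

    % Huber penalty sketch: quadratic (L2) for |r| <= k, linear (L1-like) beyond,
    % once continuously differentiable at the threshold |r| = k.
    function c = huber(r, k)
        a = abs(r);
        c = zeros(size(r));
        in = a <= k;                          % inlier region: quadratic
        c(in)  = 0.5 * r(in).^2;
        c(~in) = k * (a(~in) - 0.5 * k);      % outlier region: linear
    end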
Implementation (3/6)
Robust cost function comparison (error on synthetic tests):
Implementation (4/6)
- The inner loop involves solving the damped normal equations for the step p (D is the diagonal trust-region scaling):
  (J^T J + \lambda D^T D)\, p = -J^T r
- Tried Matlab's \ and pinv functions, but they will not scale to large numbers of input points (N)
- J is approx. 4N x 3N, and both of the above cost O(N^3)
- Solution: find the equivalent p minimizing
  \left\| \begin{bmatrix} J \\ \sqrt{\lambda}\, D \end{bmatrix} p + \begin{bmatrix} r \\ 0 \end{bmatrix} \right\|_2^2
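Why the two formulations above give the same step (a short derivation in my notation, not from the slides): forming the normal equations of the stacked least-squares problem recovers exactly the damped normal equations.

    % Normal equations of the stacked problem:
    \begin{bmatrix} J \\ \sqrt{\lambda}\,D \end{bmatrix}^{T}
    \begin{bmatrix} J \\ \sqrt{\lambda}\,D \end{bmatrix} p
    = -\begin{bmatrix} J \\ \sqrt{\lambda}\,D \end{bmatrix}^{T}
       \begin{bmatrix} r \\ 0 \end{bmatrix}
    \quad\Longleftrightarrow\quad
    \left(J^{T}J + \lambda\,D^{T}D\right) p = -J^{T} r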
Implementation (5/6)
- J is very sparse, so the minimization can be performed quickly with LSQR (a sketch follows)
[Figure: sparsity pattern of J, two-view case]
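A minimal Matlab sketch of solving the stacked system with a sparse J and lsqr. The variables J, r, lambda, and D are assumed to come from the current Levenberg-Marquardt iteration, and the solver tolerances are illustrative, not the original code.

    % Solve the stacked least-squares problem with LSQR, exploiting the sparsity of J.
    Js = sparse(J);                           % in practice J would be assembled directly as sparse
    A  = [Js; sqrt(lambda) * sparse(D)];      % stacked coefficient matrix
    b  = [-r; zeros(size(D, 1), 1)];          % stacked right-hand side
    tol = 1e-8; maxit = 500;                  % assumed solver settings
    dp  = lsqr(A, b, tol, maxit);             % step minimizing ||A*dp - b||_2

Each LSQR iteration only needs sparse matrix-vector products with A and A', which is what lets the inner loop scale to large N.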
Implementation (6/6)
Unexpected benefit of using LSQR:
Results (1/6)
Experiment 1: Mandarin orange box, two views
Input:

Results (2/6)
Experiment 1: Mandarin orange box, two views
Output:

Results (3/6)
Experiment 2: Mandarin orange box, three views (5am results)
Input:

Results (4/6)
Experiment 2: Mandarin orange box, three views (5am results)
Output:

Results (5/6)
What was I looking at?

Results (6/6)
Input:
Conclusion (1/1)
- Now have the knowledge and infrastructure to complete my contest entry
- Could have done much better in the contest with what I know now
Questions? Thanks for listening!