Geometry and Algebra of Multiple Views


Geometry and Algebra of Multiple Views René Vidal, Center for Imaging Science, BME, Johns Hopkins University. http://www.eecs.berkeley.edu/~rvidal Adapted for use in CS 294-6 by Sastry and Yang. Lecture 13, October 11th, 2006.

Two View Geometry From two views we can recover motion (8-point algorithm) and structure (triangulation). Why multiple views? A more robust estimate of structure: more data. No direct improvement for motion: more data, but also more unknowns.
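As a concrete illustration of the two-view step mentioned above, here is a minimal sketch of the 8-point algorithm for estimating the fundamental matrix from point correspondences. The function name and data layout are illustrative, not from the lecture; this is the standard linear formulation, assuming at least 8 noiseless matches.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate the fundamental matrix from n >= 8 correspondences.
    x1, x2: (n, 2) arrays of matched image coordinates."""
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u1, v1 = x1[i]
        u2, v2 = x2[i]
        # Each correspondence gives one row of the linear system A f = 0,
        # obtained by expanding the epipolar constraint x2^T F x1 = 0.
        A[i] = [u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A, reshaped
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0
    return U @ np.diag(S) @ Vt
```

With noise-free data the recovered F satisfies the epipolar constraint to machine precision; in practice one would add coordinate normalization before building A.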

Why Multiple Views? Cases that cannot be solved with two views: the uncalibrated case needs at least three views; line features need at least three views. Some practical problems with using two views: a small baseline gives good tracking but poor motion estimates; a wide baseline gives good motion estimates but poor correspondences. With multiple views one can track at a high frame rate (tracking is easier) and estimate motion at a low frame rate (throwing away data).

Problem Formulation Input: corresponding images (of “features”) in multiple images. Output: camera motion, camera calibration, object structure. Here is a picture illustrating the fundamental problem of multiple view geometry. The scenario is that we are given multiple pictures or images of some 3-D object; after establishing correspondences between certain geometric features such as points, lines or curves, we try to use this information to recover the relative locations of the cameras from which these images were taken, as well as the 3-D structure of the object. (Figure: measurements vs. unknowns.)

Multiframe SFM as an Optimization Problem Can we minimize the re-projection error? Number of unknowns = 3n + 6(m - 1) - 1. Number of equations = 2nm. Very likely to converge to a local minimum, so a good initial estimate is needed.
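The counting argument on this slide can be made concrete with a small helper (the function name is illustrative): 3 unknowns per 3-D point, 6 per camera beyond the reference view, minus 1 for the global scale, against 2 image measurements per point per view.

```python
def sfm_counts(n_points, m_views):
    """Degrees of freedom for multiframe structure from motion,
    counted as on the slide: 3 per point, 6 per extra camera, -1 scale."""
    unknowns = 3 * n_points + 6 * (m_views - 1) - 1
    equations = 2 * n_points * m_views
    return unknowns, equations
```

For example, two views of 8 points give 29 unknowns against 32 equations, which is why 8 points suffice for the two-view initialization.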

Motivating Examples Image courtesy of Paul Debevec

Motivating Examples Image courtesy of Kun Huang

Using Geometry to Tackle the Optimization Problem What are the basic relations among multiple images of a point/line? Geometry and algebra. How can we use all the images to reconstruct camera pose and scene structure? Algorithm. Examples: synthetic data; vision-based landing of unmanned aerial vehicles.

Projection: Point Features Homogeneous coordinates of a 3-D point. Homogeneous coordinates of its 2-D image. Projection of a 3-D point to an image plane. Now let me quickly go through the basic mathematical model for a camera system. Here is the notation. We will use a four-dimensional vector X for the homogeneous coordinates of a 3-D point p; its image on a pre-specified plane is also described in homogeneous coordinates, as a three-dimensional vector x. If everything is normalized, then w and z can be chosen to be 1. We use a 3x4 matrix Pi to denote the transformation from the world frame to the camera frame; R stands for rotation, T for translation. The image x and the world coordinates X of a point are then related through the equation lambda x = Pi X, where lambda is a scale associated with the depth of the 3-D point relative to the camera center o. In general, though, the matrix Pi can be any 3x4 matrix, because the camera may add an unknown linear transformation on the image plane, usually denoted by a 3x3 matrix A(t).
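The projection equation lambda x = Pi X described in the note can be sketched numerically as follows, assuming the calibrated case A(t) = I (the function name is illustrative):

```python
import numpy as np

def project(R, T, X):
    """Project homogeneous 3-D points X (4 x n) with Pi = [R | T].
    Returns homogeneous image points x (last row 1) and the depth
    scales lambda, assuming a calibrated camera (A(t) = I)."""
    Pi = np.hstack([R, T.reshape(3, 1)])   # 3x4 projection matrix
    y = Pi @ X                             # y = lambda * x
    lam = y[2]                             # depth of each point
    return y / lam, lam
```

For a camera at the world origin (R = I, T = 0), a point at depth z = 4 simply maps to its coordinates divided by 4.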

Multiple View Matrix for Point Features WLOG choose frame 1 as the reference. Rank deficiency of the Multiple View Matrix: rank(Mp) <= 1, with rank 1 in the generic case and rank 0 in the degenerate case.
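A numerical sketch of this rank condition, assuming the calibrated model and notation of the slides: for views i = 2, ..., m, Mp stacks the rows [hat(x_i) R_i x_1, hat(x_i) T_i], where hat(v) is the skew-symmetric cross-product matrix, and a valid correspondence forces rank(Mp) <= 1 with [lambda_1, 1] in its kernel. Function names are illustrative.

```python
import numpy as np

def hat(v):
    """Skew-symmetric matrix: hat(v) @ u == np.cross(v, u)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def multiple_view_matrix(x, R, T):
    """Stack [hat(x_i) R_i x_1 | hat(x_i) T_i] for views i = 2..m.
    x: list of homogeneous image points, x[0] in the reference view;
    R, T: per-view motions relative to view 1 (index 0 unused).
    For a valid correspondence, Mp @ [lambda_1, 1] = 0, so rank <= 1."""
    rows = [np.column_stack([hat(x[i]) @ R[i] @ x[0],
                             hat(x[i]) @ T[i]]) for i in range(1, len(x))]
    return np.vstack(rows)
```

The kernel vector [lambda_1, 1] is exactly the missing depth of the point in the reference view, which is the sense in which Mp "encodes the 3-D information missing in one image."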

Geometric Interpretation of Multiple View Matrix Entries of Mp are normals to epipolar planes Rank constraint says normals must be parallel

Multiple View Matrix for Point Features Mp encodes exactly the 3-D information missing in one image.

Rank Conditions vs. Multifocal Tensors Two views: epipolar constraint. Three views: trilinear constraints. Other relationships among four or more views, e.g. quadrilinear constraints, are algebraically dependent!

Reconstruction Algorithm for Point Features Given m images of n points. The third problem, also the most important one, is how to recover the camera configuration from m given images of a set of n points. Using the rank deficiency condition on M, the problem becomes looking for T2, R2, ..., Tm, Rm such that the two columns of M are linearly dependent; that is, there exist coefficients alpha^j such that the equations hold. I won’t get into the details of how to determine those coefficients alpha^j without knowing all the camera motions; for now, I only note that their values depend on a choice of coordinate frames, which could be Euclidean, affine or projective. Assuming these coefficients are known, finding the camera configuration simply becomes a problem of solving a linear equation. This equation has a unique solution if, in general, we have more than 6 points.

Reconstruction Algorithm for Point Features Given m images of n (>6) points. For the jth point: SVD. For the ith image: SVD. Iteration. First assume we have m images of n points, where n is at least 8 (we actually need fewer, but 8 is necessary for initialization using the two-view linear algorithm). For each point we have a rank condition on its m images. If we know all the motions Ri, Ti, then based on its m images we can find the kernel of the matrix and hence the depth (structure) of the point. On the other hand, for each view the images of all the points share the same camera motion, so the motion Ri, Ti can also be deduced using SVD if the structure lambda_i is known. With perfect, noise-free data, we can pick two views, use them to calculate the structure lambda_i for all the points, and then calculate the motion for each image. In the presence of noise, what do we do? Iteration: initialize the structure, calculate the motions, then re-calculate the structure and then the motions, until it converges.
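The "find the kernel and hence the depth" step described in the note can be sketched as follows, under the same calibrated model: with known motions (R_i, T_i), the constraints lambda_1 hat(x_i) R_i x_1 + hat(x_i) T_i = 0 are linear in the single scalar lambda_1, so the kernel reduces to a closed-form least-squares solution. Function names are illustrative.

```python
import numpy as np

def depth_from_views(x, R, T):
    """Recover the reference-view depth lambda_1 of one point from its
    images x[i] in m views with known motions (R[i], T[i]).
    Stacks the constraints lambda_1 * hat(x_i) R_i x_1 = -hat(x_i) T_i
    and solves the overdetermined system in the one unknown lambda_1."""
    def hat(v):
        return np.array([[0, -v[2], v[1]],
                         [v[2], 0, -v[0]],
                         [-v[1], v[0], 0]])
    A, b = [], []
    for i in range(1, len(x)):
        A.append(hat(x[i]) @ R[i] @ x[0])
        b.append(-hat(x[i]) @ T[i])
    A = np.concatenate(A)
    b = np.concatenate(b)
    return float(A @ b / (A @ A))   # least-squares scalar solution
```

In the full algorithm this structure step alternates with a motion step (an SVD over all points in each view) until convergence.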

Reconstruction Algorithm for Point Features Initialization:
1. Set k = 0; compute an initial motion estimate using the 8-point algorithm.
2. Compute and normalize the coefficients alpha^j.
3. Compute the structure as the null space of the corresponding matrix.
4. Compute the new motion as the null space of its matrix; normalize.
5. If the estimates have converged, stop; else set k = k + 1 and go to 2.

Reconstruction Algorithm for Point Features (results figure; reported values 90.840 and 89.820)

Reconstruction Algorithm for Point Features

Multiple View Matrix for Line Features Homogeneous representation of a 3-D line Homogeneous representation of its 2-D image Projection of a 3-D line to an image plane First let us talk about line features. To describe a line in 3-D, we need to specify a base point on the line and a vector indicating the direction of the line. On the image plane we can use a three dimensional vector l to describe the image of a line L. More specifically, if x is the image of a point on this line, its inner product with l is 0.
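The incidence relation in the note (l . x = 0 for the image x of any point on the line) can be sketched directly: in homogeneous coordinates the coimage l of an image line through two image points is their cross product. The function name is illustrative.

```python
import numpy as np

def image_line(x_a, x_b):
    """Coimage l of an image line from two homogeneous image points
    lying on it: l = x_a x x_b, so that l . x = 0 for every
    homogeneous image point x on the line."""
    return np.cross(x_a, x_b)
```

For example, the line through (0, 0) and (1, 1) in the image has coimage proportional to (-1, 1, 0), and any point with equal coordinates satisfies the incidence constraint.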

Multiple View Matrix for Line Features Point Features Line Features M encodes exactly the 3-D information missing in one image.

Multiple View Matrix for Line Features Each row is built from an image of a (different) line in 3-D: rank = 3 for arbitrary lines, rank = 2 for intersecting lines, rank = 1 for the same line. What is the essential meaning of the rank-2 case? Here is an example explaining it. Consider a family of lines in 3-D intersecting at one point p. In each view, randomly choose the image of any line in the family and form a multiple view matrix; this matrix in general has rank 2. We know from before that if all the images chosen happen to correspond to the same 3-D line, the rank of M is 1. Here you do not need exact correspondence among the lines, yet you still get non-trivial constraints among their images.

Multiple View Matrix for Line Features We continue with the constraints that the rank of the matrix M_l imposes on multiple images of a line feature. If rank(M_l) is 1, all the planes through each camera center and image line intersect in a unique line in 3-D. If the rank is 0, we are in the only degenerate case: all the planes coincide, so the 3-D line is determined only up to a plane on which all the camera centers must lie.

Reconstruction Algorithm for Line Features Initialization:
1. Set k = 0; compute initial motion estimates using the linear algorithm.
2. Compute and normalize the coefficients.
3. Compute the structure as the null space of the corresponding matrix.
4. Compute the new motion as the null space of its matrix; normalize.
5. If the estimates have converged, stop; else set k = k + 1 and go to 2.

Reconstruction Algorithm for Line Features Initialization: use the linear algorithm on 3 views to obtain initial motion estimates. Given the motion parameters, compute the structure (the equation of each line in 3-D). Given the structure, compute the motion. If the error is small, stop; else go to 2.

Universal Rank Constraint What if I have both point and line features? Traditionally, points and lines are treated separately, so joint incidence relations are not exploited. Can we express joint incidence relations for points passing through lines, or for families of intersecting lines?

Universal Rank Constraint The Universal Rank Condition for images of a point on a line Multi-nonlinear constraints among 3, 4-wise images. Multi-linear constraints among 2, 3-wise images.

Universal Rank Constraint: points and lines

Universal Rank Constraint: family of intersecting lines Each row can randomly take the image of any of the lines: nonlinear constraints among up to four views.

Universal Rank Constraint: multiple images of a cube Three edges intersect at each vertex.

Universal Rank Constraint: multiple images of a cube

Universal Rank Constraint: multiple images of a cube

Universal Rank Constraint: multiple images of a cube

Multiple View Matrix for Coplanar Point Features Homogeneous representation of a 3-D plane. Corollary [Coplanar Features]: the rank conditions on the new extended multiple view matrix remain exactly the same!

Multiple View Matrix for Coplanar Point Features Given that point and line features lie on a plane in 3-D space, in addition to the previous constraints, the rank condition simultaneously gives a homography.
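The homography mentioned here has a standard closed form for calibrated views, a sketch under the assumption that the plane is n.X = d in the reference frame (the function name is illustrative): points on the plane satisfy x2 ~ (R + T n^T / d) x1.

```python
import numpy as np

def planar_homography(R, T, n, d):
    """Homography induced by the plane n . X = d between two calibrated
    views with relative motion (R, T): for any point on the plane,
    x2 ~ (R + T n^T / d) x1 in homogeneous image coordinates."""
    return R + np.outer(T, n) / d
```

This follows from lambda_2 x2 = R lambda_1 x1 + T by substituting T = T (n . lambda_1 x1) / d, which is valid exactly when the point lies on the plane.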

Example: Vision-based Landing of a Helicopter

Example: Vision-based Landing of a Helicopter Landing on the ground. Tracking two-meter waves.

Example: Vision-based Landing of a Helicopter

General Rank Constraint for Dynamic Scenes For a fixed camera, assume the point moves with constant acceleration. Here is an example of a dynamic scene: points in the scene move with constant velocity or constant acceleration. The 3-D location X(t) of a point is a function of t with coefficients given by the initial location X0, the initial velocity v0 and the constant acceleration a. Equivalently, we can write a fixed 10-dimensional vector X-bar containing the initial location, velocity and acceleration, and a new projection matrix Pi-bar containing the time bases. The image of the point at each time instant t is then characterized by this equation, which is a projection from 10-D space to the 2-D image. Before, we had fixed points in 3-D space and a projection from 3-D to 2-D; now the projection is from n-D to 2-D. The key point is to decouple the time base from the constant dynamics and absorb the time base into the projection matrix. The fixed point in n-D space no longer represents a fixed point in 3-D space; it represents the trajectory of a point. How do we characterize the constraints in this new type of projection? As we will see, we can extend this to the most general case, where the projection is from n-D space to k-D image space, and study the algebraic constraints it contains.
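The lifting described in the note can be written explicitly (a sketch consistent with the note; the block structure of the time-base matrix is the assumption here):

```latex
X(t) = X_0 + v_0\, t + \tfrac{1}{2} a\, t^2,
\qquad
\bar{X} = \begin{pmatrix} X_0 \\ v_0 \\ a \\ 1 \end{pmatrix} \in \mathbb{R}^{10},
\qquad
\bar{\Pi}(t) = \Pi(t)
\begin{pmatrix} I & tI & \tfrac{t^2}{2} I & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\in \mathbb{R}^{3\times 10},
\qquad
\lambda(t)\, x(t) = \bar{\Pi}(t)\, \bar{X}.
```

The time-base matrix reproduces the homogeneous trajectory (X(t), 1) from the constant vector X-bar, so the usual projection equation lambda x = Pi X becomes a projection from 10-D to 2-D with a fixed "point" X-bar.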

General Rank Constraint for Dynamic Scenes Single hyperplane. Inclusion. Restriction to a hyperplane. Intersection. Unlike the classic 3-D to 2-D projection, in which the features can only be points or lines, features can now be any p-dimensional hyperplane in n-D space, and there are various relationships among multiple features, just as a point can belong to a line. We call these relationships among features incidence relations. There are four types: the simplest case, where only one hyperplane is observed in all the images, then inclusion, restriction to a hyperplane, and intersection.

General Rank Constraint for Dynamic Scenes Projection from n-dimensional space to k-dimensional space. We define the multiple view matrix, whose entries are the images and coimages of the hyperplanes.

General Rank Constraint for Dynamic Scenes Projection from n-dimensional to k-dimensional space: the single-hyperplane case.

Summary Incidence relations lead to rank conditions. Rank conditions lead to multiple-view factorization. Rank conditions imply all multifocal constraints. Rank conditions hold for points, lines, planes, and (symmetric) structures.

Incidence Relations among Features “Pre-images” are all incident at the corresponding features. Now consider multiple images of the simplest object, say a cube. All the constraints are incidence relations and are all of the same nature. Is there any way to express all the constraints in a unified way? Yes, there is.

Traditional Multifocal or Multilinear Constraints Given m corresponding images of n points, this set of equations is equivalent to multilinear constraints among 2, 3, and 4 views.

Multiview Formulation: The Multiple View Matrix Point Features Line Features