1
Computer vision: models, learning and inference. M Ahad. Multiple Cameras.
http://research.google.com/pubs/pub37112.html
http://grail.cs.washington.edu/rome/
http://grail.cs.washington.edu/projects/interior/
http://phototour.cs.washington.edu/
http://phototour.cs.washington.edu/PhotoTourismPreview-640x480.mov
BigBed: http://photosynth.net/view.aspx?cid=877fce1c-4aa9-405c-8024-4fa1dce6a84f
Trevi Fountain: http://photosynth.net/view.aspx?cid=8089d414-fa91-4828-b88f-df07173edee4
2
Structure from Motion (SfM)
Consider a single camera moving around a static object. The goal is to build a 3D model from the images taken by the camera. To do this, we must also simultaneously establish the properties of the camera and its position in each frame. This problem is widely known as structure from motion (although this is something of a misnomer, as both the 'structure' and the 'motion' are recovered simultaneously).
3
Structure from motion
Given an object that can be characterized by I 3D points and their projections into J images, find:
- the intrinsic matrix,
- the extrinsic matrix for each of the J images,
- the I 3D points.
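Written out, these unknowns are typically estimated by minimizing the reprojection error over all points and views. The LaTeX sketch below states this; the exact cost function is an assumption added here for illustration, and pinhole[...] follows Prince's notation for the pinhole projection of a 3D point with given intrinsics and pose.

```latex
% Sketch only: SfM stated as reprojection-error minimization (cost form assumed,
% pinhole[w, Lambda, Omega, tau] follows Prince's pinhole-projection notation).
\[
\hat{\Lambda},\ \{\hat{\Omega}_j,\hat{\tau}_j\}_{j=1}^{J},\ \{\hat{\mathbf{w}}_i\}_{i=1}^{I}
 \;=\; \operatorname*{argmin}_{\Lambda,\,\Omega_j,\,\tau_j,\,\mathbf{w}_i}\
 \sum_{i=1}^{I}\sum_{j=1}^{J}
 \bigl\|\mathbf{x}_{ij} - \operatorname{pinhole}\bigl[\mathbf{w}_i,\Lambda,\Omega_j,\tau_j\bigr]\bigr\|^{2}
\]
```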
4
Structure from motion
For simplicity, we will start with a simpler problem:
- just J = 2 images,
- known intrinsic matrix.
5
Structure
- Two-view geometry
- The essential and fundamental matrices
- Reconstruction pipeline
- Rectification
- Multi-view reconstruction
- Applications
6
There is a geometric relationship between corresponding points in two images of the same scene. This geometric relationship, the epipolar constraint, depends only on
- the intrinsic parameters of the two cameras, and
- the relative translation and rotation of the two cameras (determined by the extrinsic parameters).
7
Recap - Intrinsic parameters
Intrinsic (inherent) parameters:
- Focal length parameters: a different focal length parameter for the x and y dimensions.
- Skew parameter (gamma).
- Offset parameters (delta): pixel (0,0) is where the principal ray strikes the image plane (i.e., the center), and a shift/offset to the center is applied.
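Collected into a matrix, these parameters form the intrinsic (calibration) matrix. The layout below is the standard one from Prince's book, shown here for reference; the symbol names (phi for focal length, gamma for skew, delta for offset) are not quoted from the slides.

```latex
% Standard intrinsic matrix layout (Prince's notation): focal lengths phi_x, phi_y,
% skew gamma, offsets delta_x, delta_y.
\[
\Lambda =
\begin{bmatrix}
\phi_x & \gamma & \delta_x \\
0      & \phi_y & \delta_y \\
0      & 0      & 1
\end{bmatrix}
\]
```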
8
Epipolar lines
Consider a point x1 in the first image. The 3D point w that projected to x1 must lie somewhere along the ray that passes from the optical center of camera 1 through the position x1 in the image plane (shown as a dashed green line in the corresponding figure). However, we do not know where along that ray it lies (four possibilities are shown). It follows that x2, the projected position in camera 2, must lie somewhere on the projection of this ray. The projection of this ray is a line in image 2 and is referred to as an epipolar line.
9
Epipolar geometry
Typical use case for epipolar geometry: two cameras take a picture of the same scene from different points of view. The epipolar geometry then describes the relation between the two resulting views.
[Figure: a scene/object viewed by two cameras, and the resulting view/image from each camera.]
10
Epipolar geometry is the geometry of stereo vision. When two cameras view a 3D scene from two distinct positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. These relations are derived based on the assumption that the cameras can be approximated by the pinhole camera model.
11
12
Two pinhole cameras looking at a point X. In a real camera, the image plane is actually behind the center of projection and produces an image that is rotated 180 degrees (upside down). In epipolar geometry, however, the projection problem is simplified by placing a virtual image plane in front of the center of projection of each camera, producing an unrotated image.
- O_L and O_R: the centers of projection of the two cameras.
- X: the point of interest seen by both cameras.
- x_L and x_R: the projections of the point X onto the two image planes.
13
Each camera captures a 2D image of the 3D world. This conversion from 3D to 2D is referred to as a perspective projection and is described by the pinhole camera model. It is common to model this projection operation by rays that emanate from the camera, passing through its center of projection. Note that each emanating ray corresponds to a single point in the image.
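A minimal NumPy sketch of this projection operation follows; the function name, the matrix names (Lam, Omega, tau), and the example values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def pinhole_project(w, Lam, Omega, tau):
    """Project a 3D point w (3-vector) with intrinsics Lam and pose {Omega, tau}."""
    cam = Omega @ w + tau          # 3D point in camera coordinates
    x_hom = Lam @ cam              # homogeneous image coordinates
    return x_hom[:2] / x_hom[2]    # perspective division -> 2D image position

# Example: identity intrinsics/rotation, camera shifted 1 unit along the optical axis
w = np.array([0.2, -0.1, 1.0])
x = pinhole_project(w, np.eye(3), np.eye(3), np.array([0.0, 0.0, 1.0]))
print(x)   # approximately [0.1, -0.05]
```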
14
Epipole or epipolar point
Since the centers of projection of the cameras are distinct, each center of projection projects onto a distinct point in the other camera's image plane. These two image points are denoted by e_L and e_R and are called epipoles or epipolar points. Both epipoles e_L and e_R in their respective image planes and both centers of projection O_L and O_R lie on a single 3D line.
15
Epipolar line
The line O_L-X is seen by the left camera as a point, because it is directly in line with that camera's center of projection. The right camera, however, sees this line as a line in its image plane. That line (e_R-x_R) in the right camera is called an epipolar line. Symmetrically, the line O_R-X, seen by the right camera as a point, is seen as the epipolar line e_L-x_L by the left camera.
An epipolar line is a function of the 3D point X, i.e. there is a set of epipolar lines in both images if we allow X to vary over all 3D points. Since the 3D line O_L-X passes through the center of projection O_L, the corresponding epipolar line in the right image must pass through the epipole e_R (and correspondingly for epipolar lines in the left image). This means that all epipolar lines in one image must intersect the epipolar point of that image. In fact, any line which intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X.
Source: http://encyclopedia.thefreedictionary.com/Epipolar+geometry
16
Epipolar constraint
If the relative translation and rotation of the two cameras are known, the corresponding epipolar geometry leads to two important observations (the epipolar constraint below, and triangulation on a later slide):
If the projection point x_L is known, then the epipolar line e_R-x_R is known, and the point X projects into the right image at a point x_R which must lie on this particular epipolar line. This means that for each point observed in one image, the same point must be observed in the other image on a known epipolar line.
17
Epipolar constraint
This provides an epipolar constraint which corresponding image points must satisfy, and it means that it is possible to test whether two points really correspond to the same 3D point. Epipolar constraints can be described by the essential matrix or the fundamental matrix between the two cameras.
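As a small illustrative sketch (not from the slides): given the 3x3 matrix F relating the two views, whose construction is covered in the later slides, a candidate correspondence can be tested as follows. The function name and tolerance are assumptions.

```python
import numpy as np

def satisfies_epipolar_constraint(x1, x2, F, tol=1e-6):
    """Return True if 2D points x1 (image 1) and x2 (image 2) satisfy x2~^T F x1~ = 0."""
    x1_h = np.append(x1, 1.0)   # homogeneous coordinates of the point in image 1
    x2_h = np.append(x2, 1.0)   # homogeneous coordinates of the point in image 2
    return abs(x2_h @ F @ x1_h) < tol
```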
18
For any point in the first image, the corresponding point in the second image is constrained to lie on a line. This is known as the epipolar constraint. The particular line that it is constrained to lie on depends on
- the intrinsic parameters of the cameras, and
- the relative translation and rotation of the two cameras (determined by the extrinsic parameters).
19
Triangulation
If the points x_L and x_R are known, their projection lines (rays) are also known. If the two image points correspond to the same 3D point X, the projection lines must intersect precisely at X. This means that X can be calculated from the coordinates of the two image points, a process called triangulation.
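A common way to do this numerically is linear (DLT) triangulation. The NumPy sketch below is illustrative only; the 3x4 projection matrices P1, P2 and the function name are assumptions rather than notation from the slides.

```python
import numpy as np

def triangulate(x1, x2, P1, P2):
    """Linear (DLT) triangulation: recover the 3D point X from its two projections.
    x1, x2: 2D points in images 1 and 2; P1, P2: 3x4 camera projection matrices."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # homogeneous 3D point (last right singular vector)
    return X_h[:3] / X_h[3]      # back to Cartesian coordinates
```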
20
Epipole
Now consider a number of points in the first image. Each is associated with a ray in 3D space. Each ray projects to form an epipolar line in the second image. Since all the rays converge at the optical center of the first camera, the epipolar lines must converge at a single point in the second image plane; this point is the image in the second camera of the optical center of the first camera and is known as the epipole.
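Numerically, the epipoles can be recovered as the null vectors of the fundamental (or essential) matrix introduced in the later slides: F e1~ = 0 and e2~^T F = 0. The SVD-based sketch below is illustrative; the function name is an assumption.

```python
import numpy as np

def epipoles(F):
    """Epipoles of a two-view geometry from a 3x3 fundamental (or essential) matrix F.
    e1 satisfies F @ e1 = 0 (epipole in image 1); e2 satisfies e2 @ F = 0 (image 2)."""
    _, _, Vt = np.linalg.svd(F)
    e1 = Vt[-1]                  # right null vector of F
    _, _, Ut = np.linalg.svd(F.T)
    e2 = Ut[-1]                  # right null vector of F^T, i.e. left null vector of F
    # Normalize the homogeneous coordinates; if an epipole is at infinity
    # (pure translation parallel to the image plane, see the next slides),
    # the third coordinate is ~0 and this division should be skipped.
    return e1 / e1[2], e2 / e2[2]
```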
21
Special configurations
The epipoles are not necessarily within the observed images: the epipolar lines may converge to a point outside the visible area.
When the camera movement is a pure translation perpendicular to the optical axis (parallel to the image plane) and the cameras are oriented in the same direction (i.e., no relative rotation), the epipolar lines are parallel and the epipole (where they converge) is at infinity.
22
Special configurations (2)
When the camera movement is a pure translation along the optical axis, the epipoles are at the center of the image and the epipolar lines form a radial pattern.
23
To calculate depth information from a pair of images, we need to compute the epipolar geometry. In the calibrated environment we capture this geometric constraint in an algebraic representation known as the essential matrix. In the uncalibrated environment, it is captured in the fundamental matrix.
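For reference, the two matrices are related through the intrinsic matrices of the two cameras. This is a standard identity, stated here as a supplement to the slides rather than quoted from them.

```latex
% Standard relation between the fundamental matrix F and the essential matrix E
% (supplementary; Lambda_1, Lambda_2 are the intrinsic matrices of the two cameras).
\[
\tilde{\mathbf{x}}_2^{\mathsf T}\,\mathbf{F}\,\tilde{\mathbf{x}}_1 = 0,
\qquad
\mathbf{F} \;=\; \Lambda_2^{-\mathsf T}\,\mathbf{E}\,\Lambda_1^{-1}.
\]
```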
24
16.2 (Prince): The essential matrix
Assume that the world coordinate system is centered on the first camera, so that the extrinsic parameters (rotation and translation) of the first camera are {I, 0}. The second camera may be in any general position {Ω, τ}. We will further assume that the cameras are normalized, so that the intrinsic matrices are the identity: Λ1 = Λ2 = I.
25
16.2 (Prince): The essential matrix
The geometric relationship between the two cameras is captured by the essential matrix. Assume normalized cameras, with the first camera at the origin. In homogeneous coordinates, a 3D point w is projected into the first camera as

    λ1 ~x1 = [I, 0] ~w,

where ~x1 is the observed position in camera 1 and ~x2 is the observed position in camera 2. Because the first camera is at the origin and normalized, this simplifies to

    λ1 ~x1 = w.
26
(Prince, Section 14.3, p. 371) λ is an arbitrary scaling factor. This is a redundant representation, in that any scalar multiple λ represents the same 2D point. For example, the homogeneous vectors ~x = [2; 4; 2]^T and ~x = [3; 6; 3]^T both represent the Cartesian 2D point x = [1; 2]^T, where scaling factors λ = 2 and λ = 3 have been used, respectively.
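A one-line numerical check of this example (NumPy; a sketch only, the function name is illustrative):

```python
import numpy as np

def hom_to_cartesian(x_tilde):
    """Divide out the arbitrary scale lambda of a homogeneous 2D point."""
    return x_tilde[:2] / x_tilde[2]

print(hom_to_cartesian(np.array([2.0, 4.0, 2.0])))  # [1. 2.]
print(hom_to_cartesian(np.array([3.0, 6.0, 3.0])))  # [1. 2.]  same Cartesian point
```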
27
Similarly, for camera 2
By a similar process, the projection into the second camera can be written as

    λ2 ~x2 = [Ω, τ] ~w,   i.e.   λ2 ~x2 = Ω w + τ.
28
The essential matrix
First camera:  λ1 ~x1 = w
Second camera: λ2 ~x2 = Ω w + τ
Substituting the first relation into the second:

    λ2 ~x2 = λ1 Ω ~x1 + τ

This is a mathematical relationship between the points in the two images, but it is not in the most convenient form. It represents a constraint between the possible positions of corresponding points x1 and x2 in the two images. The constraint is parameterized by the rotation and translation {Ω, τ} of camera 2 relative to camera 1.
29
The essential matrix
Take the cross product of both sides with the translation vector τ (this removes the last term, as the cross product of any vector with itself is zero):

    λ2 τ × ~x2 = λ1 τ × Ω ~x1

Take the inner product of both sides with ~x2 (the left-hand side disappears, since τ × ~x2 must be perpendicular to ~x2):

    0 = ~x2^T (τ × Ω ~x1)
30
The essential matrix
The cross product term can be expressed as a matrix multiplication: τ × v = [τ]_× v, where [τ]_× is the 3x3 skew-symmetric matrix

    [τ]_× = [  0   -τ3   τ2
              τ3    0   -τ1
             -τ2   τ1    0 ]

Defining E = [τ]_× Ω, we now have the essential matrix relation

    ~x2^T E ~x1 = 0.

E is known as the essential matrix. It is a formulation of the mathematical constraint between the positions of corresponding points x1 and x2 in two normalized cameras.
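A short NumPy sketch that builds E = [τ]_× Ω for an arbitrary relative pose and verifies the relation on a synthetic correspondence. All names and numerical values are illustrative assumptions.

```python
import numpy as np

def skew(t):
    """3x3 skew-symmetric matrix such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Illustrative relative pose of camera 2: rotation about the y-axis plus a translation
theta = 0.1
Omega = np.array([[np.cos(theta), 0, np.sin(theta)],
                  [0, 1, 0],
                  [-np.sin(theta), 0, np.cos(theta)]])
tau = np.array([1.0, 0.2, 0.0])

E = skew(tau) @ Omega                      # essential matrix E = [tau]_x Omega

# Synthetic 3D point and its projections in the two normalized cameras
w = np.array([0.3, -0.2, 4.0])
x1 = w / w[2]                              # camera 1: lambda1 * x1~ = w
cam2 = Omega @ w + tau
x2 = cam2 / cam2[2]                        # camera 2: lambda2 * x2~ = Omega w + tau

print(x2 @ E @ x1)                         # ~0: the essential matrix relation holds
```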