The Pinhole Camera Model

[Figure: pinhole camera geometry showing the image plane I, the focal plane F, the optical center C, the principal point c, the focal length f, the camera axes X, Y, Z, the image axes x, y, and a world point M with its image m.]

The geometric model of a pinhole camera consists of an image plane I and an eyepoint C on the focal plane F. The fundamental property of perspective is that every image point m is collinear with C and its corresponding world point M. The point C is also called the optical center, or the focus. The line Cc, perpendicular to I and to F, is called the optical axis, and c is called the principal point.

6/2/2019 CS-236607 Visual Recognition
Equations for Perspective Projection

Let (C, X, Y, Z) be the camera coordinate system (c.s.) and (c, x, y) be the image c.s. It is clear that

$$\frac{x}{X} = \frac{y}{Y} = \frac{f}{Z}, \quad\text{i.e.}\quad x = f\,\frac{X}{Z},\quad y = f\,\frac{Y}{Z}. \tag{1}$$

From the geometric viewpoint, nothing changes if the image plane is replaced by a virtual image plane located on the other side of the focal plane. In this new c.s., an image point (x, y) has 3D coordinates (x, y, f).
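As a small illustration of the perspective equations, the sketch below assumes the virtual image plane convention x = f X / Z, y = f Y / Z, with the point given in the camera c.s. The function name `project_pinhole` is made up for this example.

```python
def project_pinhole(X, Y, Z, f=1.0):
    """Project a camera-frame point (X, Y, Z) onto the virtual image plane.

    Assumes the convention x = f * X / Z, y = f * Y / Z.
    """
    if Z == 0:
        # Points in the focal plane have no finite image coordinates.
        raise ValueError("point lies in the focal plane (Z = 0)")
    return f * X / Z, f * Y / Z


x, y = project_pinhole(2.0, 1.0, 4.0, f=2.0)
print(x, y)  # 1.0 0.5
```

Doubling the depth Z of the point halves both image coordinates, which is the nonlinearity that the later approximations try to avoid.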
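The Pinhole Camera Model

[Figure: pinhole camera with the virtual image plane I in front of the optical center C, showing the focal length f, the principal point c, the camera axes X, Y, Z, the image axes x, y, and the optical ray through C, the image point m, and the world point M.]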
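Perspective Projection Matrix

In projective geometry, any point along the ray going through the optical center projects to the same image point. So rescaling "homogeneous coordinates" makes no difference:

(X, Y, Z) ~ s(X, Y, Z) = (sX, sY, sZ)

It can be seen from (1):

$$x = f\,\frac{X}{Z} = f\,\frac{sX}{sZ}, \qquad y = f\,\frac{Y}{Z} = f\,\frac{sY}{sZ}.$$

Equations (1) can be rewritten linearly (s arbitrary):

$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2}$$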
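The scale invariance can be checked numerically. The sketch below assumes the linear form s [x, y, 1]^T = P [X, Y, Z, 1]^T with P = [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]]. The helper name `project_homogeneous` and the numbers are illustrative only.

```python
import numpy as np


def project_homogeneous(P, M):
    """Project a 3D point M with a 3x4 matrix P and divide out the scale s."""
    M_h = np.append(np.asarray(M, dtype=float), 1.0)  # augmented vector
    sx, sy, s = P @ M_h
    if s == 0:
        raise ValueError("s = 0: the point has no finite image")
    return np.array([sx / s, sy / s])


f = 2.0
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

M = np.array([2.0, 1.0, 4.0])
m1 = project_homogeneous(P, M)
m2 = project_homogeneous(P, 3.0 * M)  # another point on the same ray through C
print(m1, m2)  # both are [1.0, 0.5]
```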
Perspective Projection Matrix and Extrinsic Parameters

Given a vector x = [x, y, …]^T, we use $\tilde{x}$ to denote the augmented vector obtained by adding 1 as the last element. Given a 3D point M = [X, Y, Z]^T and its image m = [x, y]^T, (2) can be written in matrix form as (with arbitrary scalar s):

$$s\,\tilde{m} = P\,\tilde{M}, \qquad P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{3}$$

The 3x4 matrix P is called the camera perspective projection matrix.

For a real image point, s should not be 0. If s = 0, then Z = 0, the 3D point lies in the focal plane, and the image coordinates x and y are not defined. For all points in the focal plane except C, the corresponding points in the image plane are at infinity. For the optical center C, we have X = Y = Z = 0, so all three homogeneous image coordinates vanish (sx = sy = s = 0).

In practice, 3D points can be expressed in an arbitrary world c.s. (not only the camera c.s.). We go from the old c.s. centered at the optical center C to the new c.s. centered at point O (world c.s.) by a rotation R followed by a translation t = CO. The relation between the coordinates of a single point in the camera c.s., Mc, and in the world c.s., Mw, is:

Mc = R Mw + t

or more compactly

$$\tilde{M}_c = D\,\tilde{M}_w \tag{4}$$

where D is a Euclidean transformation of the 3D space:

$$D = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \tag{5}$$
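A minimal sketch of the change of coordinates Mc = R Mw + t, written with the 4x4 matrix D = [[R, t], [0^T, 1]] acting on augmented vectors. The rotation angle, the translation, and the helper name `euclidean_transform` are arbitrary choices for illustration.

```python
import numpy as np


def euclidean_transform(R, t):
    """Assemble the 4x4 matrix D = [[R, t], [0^T, 1]]."""
    D = np.eye(4)
    D[:3, :3] = R
    D[:3, 3] = t
    return D


theta = np.pi / 2  # 90 degree rotation about the Z axis (example value)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 5.0])
D = euclidean_transform(R, t)

M_w = np.array([1.0, 0.0, 0.0])          # point in the world c.s.
M_c = (D @ np.append(M_w, 1.0))[:3]      # same point in the camera c.s.
print(np.round(M_c, 6))
```

The result agrees with applying R and then adding t directly, which is all the compact form encodes.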
Perspective Projection Matrix and Extrinsic Parameters

The matrix R and the vector t describe the orientation and position of the camera with respect to the new world c.s. They are called the extrinsic parameters of the camera (3 rotations + 3 translations).

[Figure: camera c.s. (C, X, Y, Z) and world c.s. (O, Xw, Yw, Zw) related by (R, t), with the image plane I, the principal point c, and a world point M projecting to the image point m.]
Perspective Projection Matrix

From (3) and (4) we have:

$$s\,\tilde{m} = P\,\tilde{M}_c = P\,D\,\tilde{M}_w$$

Therefore the new perspective projection matrix is:

$$P_{new} = P\,D$$

In real images, the origin of the image c.s. is not the principal point and the scaling along each image axis is different, so the image coordinates undergo a further transformation described by some matrix K, and finally we have:

$$P_{new} = K\,P\,D$$
Intrinsic Parameters of the Camera

K is independent of the camera position. It contains the interior (or intrinsic) parameters of the camera. It is represented as an upper triangular matrix:

$$K = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $\alpha$ and $\beta$ stand for the scaling along the x and y axes of the image plane, $\gamma$ gives the skew (non-orthogonality) between the axes, and (u0, v0) are the coordinates of the principal point.

[Figure: image c.s. (c, x, y) and pixel c.s. (o, u, v), with the principal point c at (u0, v0), the angle θ between the pixel axes, and an image point m.]
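The sketch below builds an upper triangular K and applies it to an image point in homogeneous form. The parameter names `alpha`, `beta`, `gamma` (two axis scalings and a skew) and all numeric values are assumptions chosen for illustration. Only the upper triangular structure and the principal point (u0, v0) come from the text.

```python
import numpy as np


def intrinsic_K(alpha, beta, gamma, u0, v0):
    """Upper triangular K: axis scalings, skew, and principal point.

    The argument names are illustrative, not taken from the slides.
    """
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,   1.0]])


K = intrinsic_K(alpha=800.0, beta=820.0, gamma=0.0, u0=320.0, v0=240.0)

m_image = np.array([0.1, -0.05, 1.0])  # image point (x, y), augmented with 1
u, v, w = K @ m_image                  # pixel coordinates, w stays 1
print(u, v, w)
```

With zero skew, u depends only on x and v only on y. A nonzero skew term mixes y into u.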
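Intrinsic Parameters of the Camera

For a given point, let $m_{old} = [x, y]^T$ be its coordinates in the image c.s. and $m_{new} = [u, v]^T$ its pixel coordinates, so that $\tilde{m}_{new} = K\,\tilde{m}_{old}$. Since $s\,\tilde{m}_{old} = P_{old}\,\tilde{M}$, we have

$$s\,\tilde{m}_{new} = K\,P_{old}\,\tilde{M}$$

and thus

$$P_{new} = K\,P_{old}$$

The normalized coordinate system of the camera is a system where the image plane is located at a unit distance from the optical center (i.e. f = 1). The perspective projection matrix in such a c.s. is given by

$$P_N = \begin{bmatrix} I & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$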
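Intrinsic Parameters of the Camera

For a world point $\tilde{M} = [X, Y, Z, 1]^T$ (expressed in the camera c.s.), its coordinates in the normalized coordinate system are

$$\hat{x} = \frac{X}{Z}, \qquad \hat{y} = \frac{Y}{Z}$$

A matrix Pnew defined by (10) can be decomposed:

$$P_{new} = A\,P_N$$

where

$$A = K \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Matrix A contains only intrinsic parameters, and is called the camera intrinsic matrix.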
Intrinsic Parameters of the Camera

It is thus clear that the normalized image coordinates are given by

$$\begin{bmatrix} \hat{x} \\ \hat{y} \\ 1 \end{bmatrix} = A^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

Through this transformation from the available pixel image coordinates, [u, v]^T, to the imaginary normalized image coordinates, $[\hat{x}, \hat{y}]^T$, the projection from space onto the normalized image does not depend on the specific camera. This frees us from thinking about the characteristics of specific cameras and allows us to think in terms of ideal systems.
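A minimal sketch of the pixel-to-normalized conversion, assuming an upper triangular intrinsic matrix A with made-up values. It solves A [x̂, ŷ, 1]^T = [u, v, 1]^T rather than forming the inverse explicitly, which is the usual numerical practice. The helper name `normalize_pixel` is hypothetical.

```python
import numpy as np

# Example intrinsic matrix (values are illustrative assumptions).
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 820.0, 240.0],
              [  0.0,   0.0,   1.0]])


def normalize_pixel(A, u, v):
    """Map pixel coordinates (u, v) to normalized image coordinates."""
    x, y, w = np.linalg.solve(A, np.array([u, v, 1.0]))
    return np.array([x / w, y / w])


m_hat = normalize_pixel(A, 400.0, 199.0)
print(m_hat)  # approximately [0.1, -0.05]
```

Mapping the result back with A recovers the original pixel, so no information is lost by working in normalized coordinates.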
The General Form of the Perspective Projection Matrix

A camera can be considered as a system with intrinsic and extrinsic parameters. There are 5 intrinsic parameters: the scale factors along the two image axes, the coordinates u0, v0 of the principal point, and the angle between the two image axes. There are 6 extrinsic parameters, three for the rotation and three for the translation, which define the transformation from the world coordinate system to the standard coordinate system of the camera.

Combining (7) and (13) yields the general form of the perspective projection matrix of the camera:

$$P = A\,P_N\,D = A \begin{bmatrix} R & t \end{bmatrix}$$

The projection of 3D world coordinates to 2D pixel coordinates is then given by (s is an arbitrary scale factor)

$$s\,\tilde{m} = P\,\tilde{M}_w = A \begin{bmatrix} R & t \end{bmatrix} \tilde{M}_w \tag{17}$$
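The full chain from world coordinates to pixels can be sketched as follows, assuming P = A [R t] with an intrinsic matrix A and extrinsic parameters (R, t). All numeric values and the helper names `projection_matrix` and `project` are illustrative assumptions.

```python
import numpy as np


def projection_matrix(A, R, t):
    """General perspective projection matrix P = A [R | t] (3x4)."""
    return A @ np.hstack([R, np.reshape(t, (3, 1))])


def project(P, M_w):
    """Project a world point to pixel coordinates by dividing out s."""
    sm = P @ np.append(M_w, 1.0)
    return sm[:2] / sm[2]


A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 820.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                    # camera axes aligned with world axes
t = np.array([0.0, 0.0, 5.0])    # world origin is 5 units in front of C

P = projection_matrix(A, R, t)
m = project(P, np.array([1.0, -0.5, 5.0]))
print(m)  # approximately [400., 199.]
```

Here the point has camera coordinates (1, -0.5, 10), hence normalized coordinates (0.1, -0.05), which A maps to the pixel (400, 199).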
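The General Form of the Perspective Projection Matrix cont.

Matrix P has 3x4 = 12 elements, but has only 11 degrees of freedom. Why? Because s is arbitrary, P is defined only up to a nonzero scale factor.

Let $p_{ij}$ be the (i, j) entry of the matrix P. Eliminating the scalar s in (17) yields two nonlinear equations:

$$u = \frac{p_{11}X + p_{12}Y + p_{13}Z + p_{14}}{p_{31}X + p_{32}Y + p_{33}Z + p_{34}}$$

$$v = \frac{p_{21}X + p_{22}Y + p_{23}Z + p_{24}}{p_{31}X + p_{32}Y + p_{33}Z + p_{34}}$$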
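The loss of one degree of freedom can be seen numerically: multiplying P by any nonzero constant leaves the two ratios unchanged. The sketch below evaluates the ratios entry by entry. The matrix values and the name `project_entrywise` are assumptions for illustration.

```python
import numpy as np


def project_entrywise(P, X, Y, Z):
    """Evaluate u and v as ratios of the rows of P (s eliminated)."""
    den = P[2, 0] * X + P[2, 1] * Y + P[2, 2] * Z + P[2, 3]
    u = (P[0, 0] * X + P[0, 1] * Y + P[0, 2] * Z + P[0, 3]) / den
    v = (P[1, 0] * X + P[1, 1] * Y + P[1, 2] * Z + P[1, 3]) / den
    return u, v


# Example projection matrix (illustrative values).
P = np.array([[800.0,   0.0, 320.0, 1600.0],
              [  0.0, 820.0, 240.0, 1200.0],
              [  0.0,   0.0,   1.0,    5.0]])

uv = project_entrywise(P, 1.0, -0.5, 5.0)
uv_scaled = project_entrywise(7.0 * P, 1.0, -0.5, 5.0)  # same camera, rescaled P
print(uv, uv_scaled)
```

Both calls give the same pixel, so P and 7 P describe the same camera.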
The General Form of the Perspective Projection Matrix cont.

Problem 1. Given the perspective projection matrix P, find the coordinates of the optical center C of the camera in the world coordinate system.

Solution. Decompose the 3x4 matrix P as the concatenation of a 3x3 matrix B and a 3-vector b, i.e. P = [B b]. Assume that the rank of B is 3. Under the pinhole model, the optical center projects to [0 0 0]^T (i.e. s = 0). Therefore, the optical center can be obtained by solving

$$P \begin{bmatrix} C \\ 1 \end{bmatrix} = B\,C + b = 0$$

The solution is

$$C = -B^{-1}\,b$$
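A minimal sketch of Problem 1, assuming P = [B b] with B of rank 3. The function name `optical_center` and the example matrix are illustrative. The linear system B C = -b is solved directly instead of inverting B.

```python
import numpy as np


def optical_center(P):
    """Return C such that P [C; 1] = 0, i.e. C = -B^{-1} b."""
    B, b = P[:, :3], P[:, 3]
    return -np.linalg.solve(B, b)


# Example projection matrix (illustrative values).
P = np.array([[800.0,   0.0, 320.0, 1600.0],
              [  0.0, 820.0, 240.0, 1200.0],
              [  0.0,   0.0,   1.0,    5.0]])

C = optical_center(P)
print(C)                        # approximately [0, 0, -5]
print(P @ np.append(C, 1.0))    # approximately [0, 0, 0]
```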
The General Form of the Perspective Projection Matrix cont.

Problem 2. Given the matrix P and an image point m, find the optical ray going through this point.

Solution. The optical center C is on the optical ray. Any point on this ray is also projected onto m. Without loss of generality, we can choose the point D such that the scale factor s = 1, i.e.

$$\tilde{m} = B\,D + b$$

This gives

$$D = B^{-1}(\tilde{m} - b)$$

A point on the optical ray is thus given by

$$M = C + \lambda\,(D - C) = B^{-1}(-b + \lambda\,\tilde{m})$$

where $\lambda$ varies from 0 to $\infty$.
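Problem 2 can be sketched in a few lines, assuming P = [B b] with invertible B and the ray parametrization M(λ) = B^{-1}(λ m̃ - b). The function name `ray_point` and the numbers are illustrative assumptions.

```python
import numpy as np


def ray_point(P, m, lam):
    """Point on the optical ray of pixel m: M = B^{-1} (lam * m_tilde - b)."""
    B, b = P[:, :3], P[:, 3]
    m_tilde = np.append(m, 1.0)
    return np.linalg.solve(B, lam * m_tilde - b)


# Example projection matrix (illustrative values).
P = np.array([[800.0,   0.0, 320.0, 1600.0],
              [  0.0, 820.0, 240.0, 1200.0],
              [  0.0,   0.0,   1.0,    5.0]])
m = np.array([400.0, 199.0])

for lam in (1.0, 2.5, 10.0):
    M = ray_point(P, m, lam)
    sm = P @ np.append(M, 1.0)
    print(lam, M, sm[:2] / sm[2])  # every point reprojects to m
```

Setting λ = 0 returns the optical center itself, and λ equals the scale factor s of the reprojected point.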
Perspective Approximations

The perspective projection (2) is a nonlinear mapping, which makes many vision problems difficult to solve. It is also ill-conditioned when perspective effects are small. There are several linear mappings that approximate the perspective projection:

Orthographic Projection. It ignores the depth dimension:

$$x = X, \qquad y = Y$$

It can be used if distance and position effects can be ignored.
Orthographic and Weak Perspective Projection

[Figure: orthographic projection onto the image plane I, showing the optical center C, the principal point c, the camera axes X, Y, Z, and the image axes x, y.]
Weak Perspective Projection

A much more reasonable approximation is Weak Perspective Projection. When the object size is small enough with respect to the distance from the camera to the object, Z can be replaced by a common depth Zc. Then the equations (1) become linear:

$$x = \frac{X}{Z_c}, \qquad y = \frac{Y}{Z_c}$$

Here we assume that the focal length f is normalized to 1.
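To get a feel for the quality of the approximation, the sketch below compares full perspective (x = X/Z, y = Y/Z with f = 1) against weak perspective with the common depth Zc taken as the average depth of the points. The point set is made up for illustration: a shallow object about 100 units from the camera.

```python
def perspective(X, Y, Z):
    """Full perspective projection with f normalized to 1."""
    return X / Z, Y / Z


def weak_perspective(X, Y, Zc):
    """Weak perspective: every point shares the common depth Zc."""
    return X / Zc, Y / Zc


# A small object far from the camera (illustrative values).
points = [(1.0, 0.5, 99.0), (-1.0, 0.2, 101.0), (0.3, -0.4, 100.0)]
Zc = sum(Z for _, _, Z in points) / len(points)  # average depth

errors = []
for X, Y, Z in points:
    px, py = perspective(X, Y, Z)
    wx, wy = weak_perspective(X, Y, Zc)
    errors.append(max(abs(px - wx), abs(py - wy)))

max_err = max(errors)
print(Zc, max_err)
```

The error grows with the ratio of the depth variation to Zc, so the same object placed much closer to the camera would be approximated far less well.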
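Weak Perspective Projection

Two-step projection: an orthographic projection onto the average depth plane Z = Zc, followed by a perspective projection onto the image plane I.

[Figure: the image plane I and the average depth plane at Z = Zc, with the optical center C and the camera axes X, Y, Z.]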
Weak Perspective Projection

Let

$$P_{wp} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & Z_c \end{bmatrix}$$

Equation (12) can be written as

$$s\,\tilde{m} = P_{wp}\,\tilde{M}$$
Weak Perspective Projection

Taking into account the intrinsic and extrinsic parameters of the camera yields:

$$s\,\tilde{m} = A\,P_{wp}\,D\,\tilde{M}_w$$

where A is the intrinsic matrix (14), and D is the rigid transformation (5).
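A minimal sketch of the complete weak perspective camera, assuming the 3x4 matrix P_wp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, Zc]] composed with an intrinsic matrix A and a rigid transformation D = [[R, t], [0^T, 1]]. The numeric values and the helper name `weak_perspective_matrix` are illustrative assumptions.

```python
import numpy as np


def weak_perspective_matrix(A, R, t, Zc):
    """Compose A @ P_wp @ D for a weak perspective camera."""
    P_wp = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, Zc]])
    D = np.eye(4)
    D[:3, :3] = R
    D[:3, 3] = t
    return A @ P_wp @ D


A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 820.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
Zc = 10.0  # common depth in the camera c.s. (example value)

P = weak_perspective_matrix(A, R, t, Zc)
sm = P @ np.array([1.0, -0.5, 5.0, 1.0])   # world point, augmented
m = sm[:2] / sm[2]
print(P[2])  # last row is [0, 0, 0, Zc]
print(m)     # approximately [400., 199.]
```

The last row of the product is [0, 0, 0, Zc], so the scale s is the same for every point and the mapping from world coordinates to pixels is affine.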