Calibration Dorit Moshe
In today’s show: How do positions in the image relate to 3D positions in the world? We will use analytical geometry to quantify more precisely the relationship between a camera, the objects it observes, and the pictures of these objects. We start by briefly recalling elementary notions of analytical Euclidean geometry. We then introduce the various physical parameters that relate the world and camera coordinate frames, and present as an application several methods for estimating these parameters, a process known as geometric camera calibration. Along the way we also present a linear least-squares technique for parameter estimation.
Motivation: How do positions in the image relate to 3D positions in the world? Reconstructing the 3D scene from an image is not trivial: we have to recover the third coordinate!
Example: Rabbit or Man? The information lies within the 3rd coordinate. Markus Raetz, Metamorphose II, 1991-92
2D projections are not the “same” as the real objects we see every day!
But…
Introduction. Camera calibration: estimation of the unknown values in a camera model. Intrinsic parameters link the frame coordinates of an image point with its corresponding camera coordinates. Extrinsic parameters define the location and orientation of the camera coordinate system with respect to the world coordinate system.
Euclidean Geometry - reminder. An orthonormal coordinate frame (F) is defined by a point O in E3 and three unit vectors i, j and k, orthogonal to each other.
Transformations. FP is the coordinate vector of the point P in the frame F. Let us consider two frames: (A) = (OA, iA, jA, kA) and (B) = (OB, iB, jB, kB). How can we express BP as a function of AP? Let us first suppose that the basis vectors of both coordinate systems are parallel to each other, i.e., iA = iB, jA = jB and kA = kB, but the origins OA and OB are distinct. We say that the two coordinate systems are separated by a pure translation.
Pure translation: BP = AP + BOA, where BOA is the coordinate vector of the origin OA in frame B.
Pure rotation When the origins of the two frames coincide, i.e., OA = OB = O, we say that the frames are separated by a pure rotation.
We define the rotation matrix BAR as the 3x3 matrix whose columns are the coordinates of the basis vectors iA, jA and kA expressed in frame B.
It means that for a pure rotation: BP = BAR AP.
The inverse of a rotation matrix is equal to its transpose. Its determinant is equal to 1, so the transform preserves volume. Not every transformation that preserves volume also preserves orientation (sign): a reflection, for example, preserves volume but has determinant -1. This orthonormal transform preserves lengths and angles.
Example (Pure rotation): kA = kB = k. The vector iB is obtained by applying to the vector iA a counterclockwise rotation of angle θ about k. Since the coordinates of a fixed point rotate by -θ when the frame rotates by +θ, the rotation matrix is BAR = [cos θ, sin θ, 0; -sin θ, cos θ, 0; 0, 0, 1].
Translation and rotation – rigid transformation
As a single matrix equation: (BP, 1)ᵀ = [BAR, BOA; 0ᵀ, 1] (AP, 1)ᵀ, i.e. BP = BAR AP + BOA. The 4x4 block matrix combines the rotation and the translation into one rigid transformation.
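As an illustrative sketch (not part of the original slides), the 4x4 rigid transformation can be assembled and applied with NumPy; the function name rigid_transform and the sample values below are our own:

```python
import numpy as np

def rigid_transform(R, t):
    """Assemble the 4x4 homogeneous matrix [R t; 0 1] of a rigid motion."""
    T = np.eye(4)
    T[:3, :3] = R   # rotation block
    T[:3, 3] = t    # translation block
    return T

# Example: rotation by 30 degrees about k, plus a translation.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 0.5])

P_A = np.array([0.5, 0.0, 1.0, 1.0])   # point in frame A (homogeneous)
P_B = rigid_transform(R, t) @ P_A      # the same point in frame B
```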
Homogeneous coordinates. Add an extra coordinate and use an equivalence relation. For 3D, the equivalence relation is: k(X, Y, Z, T) is the same point as (X, Y, Z, T). Motivation: it will be possible to write the action of a perspective camera as a matrix.
Homogeneous/non-homogeneous transformation for a 3D point. Non-homogeneous to homogeneous: add “1” as the 4th coordinate. Homogeneous to non-homogeneous: divide the first 3 coordinates by the 4th.
Homogeneous/non-homogeneous transformation for a 2D point. Non-homogeneous to homogeneous: add “1” as the 3rd coordinate. Homogeneous to non-homogeneous: divide the first 2 coordinates by the 3rd.
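A minimal sketch of these two conversions in NumPy (the helper names to_homogeneous and from_homogeneous are our own):

```python
import numpy as np

def to_homogeneous(p):
    """(X, Y, Z) -> (X, Y, Z, 1), and likewise (x, y) -> (x, y, 1)."""
    return np.append(p, 1.0)

def from_homogeneous(p):
    """Divide the leading coordinates by the last one and drop it."""
    return p[:-1] / p[-1]

print(to_homogeneous(np.array([2.0, 4.0, 6.0])))    # [2. 4. 6. 1.]
print(from_homogeneous(np.array([2.0, 4.0, 2.0])))  # [1. 2.]
```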
Camera calibration Use the camera to tell you things about the world Relationship between coordinates in the world and coordinates in the image : geometric calibration (We will not discuss here the relationship between intensities in the world and intensities in the image : photometric camera calibration.)
Three coordinate systems involved Camera: perspective projection. Image: intrinsic/internal camera parameters World: extrinsic/external camera parameters
The camera perspective equation. The coordinates (x, y, z) of a scene point P observed by a pinhole camera are related to the coordinates (x’, y’) of its image P’ by the perspective equation. By similar triangles, (x, y, z) -> (f x/z, f y/z, f). Ignoring the third coordinate, we get (x, y, z) -> (f x/z, f y/z).
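A minimal sketch of this projection, assuming a virtual image plane at distance f in front of the pinhole (the function name pinhole_project and the sample values are our own):

```python
import numpy as np

def pinhole_project(P, f):
    """Perspective projection by similar triangles:
    (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = P
    return np.array([f * x / z, f * y / z])

# A point 2 m in front of a camera with a 50 mm focal length.
print(pinhole_project(np.array([0.2, 0.1, 2.0]), f=0.05))  # [0.005  0.0025]
```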
Intrinsic parameters. Relate the camera’s coordinate system to the idealized image coordinate system. We can associate with a camera two different image planes: the first one is a normalized plane located at a unit distance from the pinhole. We attach to this plane its own coordinate system with an origin located at the point where the optical axis pierces it. According to the perspective equation (with f = 1), the normalized image coordinates are û = x/z and v̂ = y/z.
Intrinsic parameters (cont). [Figure: the normalized image plane at distance 1 and the physical retina at distance f from the pinhole.]
Intrinsic parameters (cont). The second is the physical retina. It is located at a distance f ≠ 1 from the pinhole, and the image coordinates (u, v) are usually expressed in pixel units. Pixels are usually rectangular, so the camera has two additional scale parameters k and l (in pixels per unit length), and: u = k f x/z = k f û, v = l f y/z = l f v̂. f is a distance in meters. Define: α = k f and β = l f (in pixels).
Intrinsic parameters (cont). The actual origin of the camera coordinate system is at a corner C of the retina, and not at its center. This adds two parameters u0 and v0 that define the position (in pixel units) of the image center C0 in the retinal coordinate system: u = α û + u0, v = β v̂ + v0.
Intrinsic parameters (cont). The camera coordinate system may also be skewed, due to some manufacturing error, so the angle θ between the two image axes is not equal to 90°. Taking the skew into account: u = α û - α cot θ v̂ + u0, v = (β / sin θ) v̂ + v0.
Intrinsic parameters (cont). Using homogeneous coordinates: p = (1/z) M P, where p = (u, v, 1)ᵀ, P = (x, y, z, 1)ᵀ, and M = (K 0) is a 3x4 matrix built from the calibration matrix K = [α, -α cot θ, u0; 0, β / sin θ, v0; 0, 0, 1].
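A minimal sketch of this calibration matrix in NumPy; the function name calibration_matrix and the parameter values are our own illustrations:

```python
import numpy as np

def calibration_matrix(alpha, beta, theta, u0, v0):
    """K = [alpha, -alpha*cot(theta), u0; 0, beta/sin(theta), v0; 0, 0, 1]."""
    return np.array([
        [alpha, -alpha / np.tan(theta), u0],
        [0.0,    beta / np.sin(theta),  v0],
        [0.0,    0.0,                   1.0],
    ])

K = calibration_matrix(alpha=800.0, beta=810.0, theta=np.pi / 2, u0=320.0, v0=240.0)
P_cam = np.array([0.2, -0.1, 2.0])   # point in camera coordinates
p = K @ P_cam                        # homogeneous pixel coordinates
u, v = p[:2] / p[2]                  # divide by z to get pixel coordinates
```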
Intrinsic parameters(cont) The physical size of the pixels and the skew are always fixed for a given camera, and they can in principle be measured during manufacturing
Extrinsic parameters. Relate the camera’s coordinate system to a fixed world coordinate system and specify its position and orientation in space. We consider the case where the camera frame (C) is distinct from the world frame (W). In non-homogeneous coordinates: CP = R WP + t. In homogeneous coordinates: CP = [R, t; 0ᵀ, 1] WP.
Extrinsic parameters (cont). Substituting this rigid transformation into the projection equation yields p = (1/z) M WP, where M = K (R t) is now a 3x4 matrix that depends on both the intrinsic and the extrinsic parameters.
Combining extrinsic and intrinsic calibration parameters. M can be defined with 11 free coefficients: 5 are intrinsic parameters (α, β, u0, v0, θ) and 6 are extrinsic (the 3 angles defining R and the 3 coordinates of t). M is only defined up to scale in this setting!
Rewriting the equation: z (u, v, 1)ᵀ = M (X, Y, Z, 1)ᵀ, relating the pixel coordinates (u, v) to the world coordinates (X, Y, Z).
z is expressed in the camera coordinate system, but we can eliminate it: writing m1, m2, m3 for the three rows of M, we have z = m3 · P, and we get u = (m1 · P) / (m3 · P), v = (m2 · P) / (m3 · P). This is the relation between image positions (u, v) and 3D positions P (in homogeneous coordinates).
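A minimal sketch of this projection with a full 3x4 matrix M, using illustrative intrinsics and an identity pose (all values are our own):

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])       # example intrinsics, no skew
R, t = np.eye(3), np.zeros(3)         # example pose: camera at world origin
M = K @ np.hstack([R, t[:, None]])    # 3x4 projection matrix

def project(M, P_world):
    """u = (m1 . P)/(m3 . P), v = (m2 . P)/(m3 . P), with P homogeneous."""
    P = np.append(P_world, 1.0)
    p = M @ P
    return p[:2] / p[2]

print(project(M, np.array([0.2, -0.1, 2.0])))  # [400. 200.]
```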
Calibration methods: techniques for estimating the intrinsic and extrinsic parameters of a camera. Suppose that a camera observes n geometric features, such as points or lines, with known positions in some fixed world coordinate system. We will: compute the perspective projection matrix M associated with the camera in this coordinate system, then compute the intrinsic and extrinsic parameters of the camera from this matrix. Which features should we choose?
A linear approach to camera calibration. For each feature point Pi with image coordinates (ui, vi), we have: ui (m3 · Pi) - (m1 · Pi) = 0 and vi (m3 · Pi) - (m2 · Pi) = 0. For n features, we get 2n equations.
A linear approach to camera calibration (cont). Stacking the 2n equations gives a homogeneous linear system P m = 0, where m = (m1ᵀ, m2ᵀ, m3ᵀ)ᵀ is the 12-vector of the entries of M and P is the 2n x 12 coefficient matrix built from the points Pi and their image coordinates (ui, vi).
When n ≥ 6, the system is over-constrained, i.e. there is in general no non-zero vector m in R12 that satisfies these equations exactly. On the other hand, the zero vector is always a (trivial) solution. Following the linear least-squares method, we compute the unit vector m that minimizes |Pm|². Estimating the vector m then reduces to computing the eigenvectors and eigenvalues of the 12x12 matrix PᵀP.
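A minimal sketch of this linear calibration step in NumPy, assuming n ≥ 6 point correspondences (the function name calibrate is our own; np.linalg.eigh returns eigenvalues in ascending order, so the first eigenvector attains the minimum):

```python
import numpy as np

def calibrate(points_3d, points_2d):
    """Estimate the 3x4 projection matrix M (up to scale) from n >= 6
    correspondences, via the eigenvector of P^T P with smallest eigenvalue."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Pw = [X, Y, Z, 1.0]
        # (m1 . Pi) - ui*(m3 . Pi) = 0, and similarly for vi with m2.
        rows.append(Pw + [0.0] * 4 + [-u * c for c in Pw])
        rows.append([0.0] * 4 + Pw + [-v * c for c in Pw])
    P = np.array(rows)                 # the 2n x 12 coefficient matrix
    w, V = np.linalg.eigh(P.T @ P)     # eigenvalues in ascending order
    m = V[:, 0]                        # unit vector minimizing |Pm|^2
    return m.reshape(3, 4)
```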
Linear least-squares methods. Let us consider a homogeneous system of n equations in p unknowns, A x = 0, where A is an n x p matrix with coefficients aij and x = (x1, ..., xp)ᵀ. When n ≥ p, the system is in general over-constrained: a non-trivial solution exists only if the rank of A is smaller than p. We will therefore look for the vector x that minimizes the error measure E = |Ax|².
Linear least-squares methods (cont). We need to impose a constraint on x, since x = 0 trivially yields the minimum. Since E(λx) = λ²E(x), we use the constraint |x|² = 1. E = xᵀ(AᵀA)x, where AᵀA is a p x p positive semidefinite symmetric matrix. It can be diagonalized in an orthonormal basis of eigenvectors e1, ..., ep associated with eigenvalues 0 ≤ λ1 ≤ λ2 ≤ … ≤ λp. Writing x = μ1 e1 + … + μp ep with μ1² + … + μp² = 1, we get E(x) = λ1 μ1² + … + λp μp² ≥ λ1. Hence e1 minimizes the error E: it is the eigenvector associated with the minimum eigenvalue λ1 of AᵀA.
Recovering the intrinsic and extrinsic parameters. Once we have the matrix M, we can recover the intrinsic and extrinsic parameters by a simple mathematical process, described in Forsyth & Ponce, Section 6.3.1.
Camera Calibration with a Single Image. Sometimes more than one view of the same scene is used to estimate calibration parameters (for example, in stereo). But most camera parameters can be estimated from the measurements of a single image when sufficient geometric knowledge about the object is available. The object knowledge used in the approach described here consists of parallelism and perpendicularity assumptions about straight object edges. In buildings, parallel and perpendicular edges are usually abundant. Therefore, this method is often applicable to historic imagery of possibly demolished buildings taken with an unknown camera.
Targets for camera calibration
Projection of each point gives us two equations, and there are 11 unknowns, so 6 points in general position (12 equations) are sufficient for calibration.
The 6 anchor points clicked by the user are represented in green. If the user had clicked more accurately, they would lie exactly at the corners of the small white squares. Using these 6 points and the corresponding 3D anchor points, the program computes an initial estimate of the projection matrix. http://www-sop.inria.fr/robotvis/personnel/lucr/detecproj.html
We take as input a set of at least 6 non-coplanar 3D anchor points and their 2D images. The 2D coordinates do not need to be very accurate; they are typically obtained manually by a user who clicks their approximate positions.
Summary. We saw the goal of calibration. We recalled Euclidean geometry. We learned about internal/external camera parameters. We learned how to compute them from a given set of points. We saw an example of calibration from a single picture. We will see (in the stereo lecture) how to compute 3D coordinates from more than one picture (more than one view).