Venus
Classification
Faces – Different
Faces – Same
Lighting affects appearance
Three-point alignment
Object Alignment Given three model points P1, P2, P3 and three image points p1, p2, p3, there is a unique transformation (rotation R, translation d, scale s) that aligns the model with the image: p_i = Π(sR·P_i + d), i = 1, 2, 3, where Π is the projection to the image plane.
Alignment – comments The projection is orthographic (combined with scaling). The three points are required to be non-collinear. The transformation is determined up to a reflection of the points about the image plane and a translation in depth.
Proof of the 3-point alignment: The three 3-D points are P1, P2, P3; we can assume that they initially lie in the image plane. In the 2-D image we get q1, q2, q3. The correspondence P1 → q1, P2 → q2, P3 → q3 defines a unique linear transformation of the plane, L(x). We can easily recover this transformation: L is a 2×2 matrix; we fix the origin at P1 = q1, and the two remaining points give four linear equations for the elements of L.

We now choose two orthogonal vectors E1 and E2 of equal length in the original plane of P1, P2, P3 and compute E1' = L(E1), E2' = L(E2). We seek a scaling s and a rotation R such that the projection of sR(E1) is E1' and the projection of sR(E2) is E2'. Let V1 = sR(E1) and V2 = sR(E2) (without the projection). Then V1 is E1' plus a depth component, V1 = E1' + c1·z, where z is a unit vector in the z direction; similarly V2 = E2' + c2·z. We wish to recover c1 and c2; this determines the transformation (it is unique up to reflection, and it can be recovered explicitly). Since E1 and E2 are orthogonal and sR preserves angles, the scalar product V1·V2 = 0: (E1' + c1·z)·(E2' + c2·z) = 0, therefore c1·c2 = -(E1'·E2'). The quantity -(E1'·E2') is measurable in the image; call it c12, so c1·c2 = c12. Also |V1| = |V2|, therefore (E1' + c1·z)·(E1' + c1·z) = (E2' + c2·z)·(E2' + c2·z), which implies c1² - c2² = k12, where k12 = |E2'|² - |E1'|² is measurable in the image.

The two equations for c1 and c2 are c1·c2 = c12 and c1² - c2² = k12, and they have a unique solution up to sign. One way of seeing this is to set the complex number Z = c1 + i·c2; then Z² = k12 + 2i·c12, so Z² is measurable. Taking the square root gives Z, and therefore c1 and c2. There are exactly two roots, giving the two mirror-reflection solutions.
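The last step of the proof is constructive. Below is a minimal numerical sketch (Python/NumPy; the function name and argument names are mine, not from the slides), assuming E1' and E2' have already been obtained from the recovered 2×2 map L:

```python
import numpy as np

def recover_depth_components(E1p, E2p):
    """Recover the depth components c1, c2 of V1 = E1' + c1*z and
    V2 = E2' + c2*z from the image vectors E1', E2' (length-2 arrays).

    Constraints used (from the proof above):
        c1*c2       = -(E1' . E2')            (V1 and V2 are orthogonal)
        c1^2 - c2^2 = |E2'|^2 - |E1'|^2       (V1 and V2 have equal length)
    Solved via the complex number Z = c1 + i*c2, with Z^2 = k12 + 2i*c12.
    """
    c12 = -float(np.dot(E1p, E2p))
    k12 = float(np.dot(E2p, E2p) - np.dot(E1p, E1p))
    Z = np.sqrt(complex(k12, 2.0 * c12))   # principal complex square root
    c1, c2 = Z.real, Z.imag
    # The two roots +/-Z give the two mirror-reflection solutions.
    return (c1, c2), (-c1, -c2)
```

Given (c1, c2), the vectors V1 = E1' + c1·z and V2 = E2' + c2·z determine the scaled rotation sR, and the two returned solutions correspond to the two mirror reflections mentioned above.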
Car Recognition
Car Models
Alignment: Cars
Alignment: Unmatch
Face Alignment
Linear Combination of Views
O is a set of object points. I1, I2, I3 are three images of O from different views. N is a novel view of O. Then N is a linear combination of I1, I2, I3.
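A minimal sketch of how this property can be used (Python/NumPy; the function lc_predict, its arguments, and the choice of anchor points are mine, not from the slides). It assumes orthographic projection and known point correspondences across the views: the coordinates of the novel view are regressed, from a few anchor points, as a linear combination of the coordinate vectors of the three reference images, and then predicted for all points.

```python
import numpy as np

def lc_predict(views, novel_view, n_anchor=8):
    """Predict a novel view as a linear combination of three reference views.

    views      : list of three (N, 2) arrays -- corresponding points in I1, I2, I3
    novel_view : (N, 2) array -- the novel view; only the first n_anchor rows
                 are used to estimate the combination coefficients
    n_anchor should be at least the number of basis columns (7 here).
    """
    N = views[0].shape[0]
    # Basis: the x and y coordinate vectors of the three views, plus a
    # constant column to absorb image-plane translation.
    basis = np.column_stack([v[:, k] for v in views for k in (0, 1)] + [np.ones(N)])
    # Estimate the coefficients for the novel x and y coordinates from the anchors.
    ax, *_ = np.linalg.lstsq(basis[:n_anchor], novel_view[:n_anchor, 0], rcond=None)
    ay, *_ = np.linalg.lstsq(basis[:n_anchor], novel_view[:n_anchor, 1], rcond=None)
    # Apply the recovered coefficients to every point.
    return np.column_stack([basis @ ax, basis @ ay])
```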
Car Recognition
VW – SAAB
LC – Car Images
Linear Combination: Faces
Classification
Structural descriptions
RBC
Structural Description: a graph of parts G1, G2, G3, G4 linked by spatial relations (Above, Right-of, Left-of, Touch).
Fragment-based Representation
Mutual Information. Entropy of a binary variable: H(C) = -[ P(C=1)·log P(C=1) + P(C=0)·log P(C=0) ].
Mutual information: I(C;F) = H(C) - H(C|F), where H(C|F) averages H(C) given F=1 and H(C) given F=0, weighted by P(F=1) and P(F=0).
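A short sketch of these two quantities for a binary class C and a binary fragment detector F (Python/NumPy; function and argument names are mine):

```python
import numpy as np

def binary_entropy(p):
    """H(C) in bits for a binary variable with P(C=1) = p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)          # avoid log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def mutual_information(p_c, p_f_given_c, p_f_given_nc):
    """I(C;F) = H(C) - H(C|F) for binary class C and binary fragment F.

    p_c          : prior P(C=1)
    p_f_given_c  : P(F=1 | C=1), detection rate of the fragment inside the class
    p_f_given_nc : P(F=1 | C=0), detection rate outside the class
    """
    p_f = p_c * p_f_given_c + (1 - p_c) * p_f_given_nc        # P(F=1)
    # P(C=1 | F=1) and P(C=1 | F=0) by Bayes' rule
    p_c_given_f1 = p_c * p_f_given_c / p_f
    p_c_given_f0 = p_c * (1 - p_f_given_c) / (1 - p_f)
    h_c_given_f = (p_f * binary_entropy(p_c_given_f1)
                   + (1 - p_f) * binary_entropy(p_c_given_f0))
    return binary_entropy(p_c) - h_c_given_f
```

For example, mutual_information(0.5, 0.2, 0.01) measures how informative a fragment is that fires in 20% of class images and 1% of non-class images, under equal class priors.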
Selecting Fragments
Fragment Selection. For a set of training images:
– Generate candidate fragments
– Measure p(F|C), p(F|NC)
– Compute the mutual information
– Select the optimal fragment
After k fragments have been selected: choose the next fragment by maximizing the minimal addition in mutual information with respect to each of the first k fragments (see the sketch below).
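A sketch of this greedy max-min selection rule (Python; the function name and the callables mi and mi_pair, which stand for mutual-information estimates computed from the training images, are mine):

```python
def select_fragments(candidates, n_select, mi, mi_pair):
    """Greedy max-min fragment selection.

    candidates   : list of candidate fragment ids
    n_select     : number of fragments to pick
    mi(f)        : estimate of I(C; f)
    mi_pair(f,g) : estimate of the joint mutual information I(C; f, g)
    """
    # First fragment: the one with the highest individual mutual information.
    selected = [max(candidates, key=mi)]
    while len(selected) < n_select:
        remaining = [f for f in candidates if f not in selected]
        # Pick the fragment whose minimal gain over the already-selected
        # fragments is largest: max_F min_i [ I(C; F, F_i) - I(C; F_i) ].
        best = max(remaining,
                   key=lambda f: min(mi_pair(f, g) - mi(g) for g in selected))
        selected.append(best)
    return selected
```

The max-min criterion discourages picking a fragment that is redundant with any fragment already selected.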
Optimal Face Fragments
Face Fragments by Type (example fragments ranked 1st to 4th by merit and weight).
Low-resolution Car Fragments: Front – Middle – Back
Intermediate Complexity
Fragment ‘Weight’. Likelihood ratio: p(F|C) / p(F|NC). Weight of F: w_F = log [ p(F|C) / p(F|NC) ].
Combining fragments: fragment detectors D1, D2, …, Dk are combined with weights w1, w2, …, wk.
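A minimal sketch of the weight and the weighted combination (Python/NumPy; function names are mine, and the log-likelihood-ratio weight is written out on the assumption that this is the ratio referred to above):

```python
import numpy as np

def fragment_weight(p_f_given_c, p_f_given_nc, eps=1e-6):
    """Weight of a fragment F as the log-likelihood ratio
    w_F = log[ p(F|C) / p(F|NC) ]; eps guards against zero estimates."""
    return float(np.log((p_f_given_c + eps) / (p_f_given_nc + eps)))

def combined_score(detections, weights):
    """Combine binary fragment detections D1..Dk with weights w1..wk:
    score = sum_k w_k * D_k."""
    return float(np.dot(weights, detections))
```

The summed score Σ w_k·D_k is the quantity compared between classes when classifying novel images below.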
Non-optimal Fragments (same total area covered, 8× the object, on a regular grid):
Fragments   Size   Detection   F/A (false alarms)
Optimal     11%    95          0
Small        3%    97          30
Large       33%    39          0
Training & Test Images
– Frontal faces without distinctive features (K: 496, W: 385)
– Minimize background by cropping
– Training images for fragment extraction: 32 for each class
– Training images for evaluation: 100 for each class
– Test images: 253 for Western and 364 for Korean
Training – Fragment Extraction
Extracted Fragments: Western and Korean fragments, each with its score and weight.
Classifying novel images: detect fragments in the image, compare the summed weights for each class, and decide Westerner / Korean / Unknown.
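A decision-rule sketch (Python/NumPy; the function name, the margin parameter, and the margin-based "Unknown" rule are my assumptions, since the slide only states that summed weights are compared):

```python
import numpy as np

def classify_face(det_w, weights_w, det_k, weights_k, margin=0.0):
    """Compare the summed fragment weights of the two classes.

    det_w, det_k         : binary vectors of detected Western / Korean fragments
    weights_w, weights_k : the corresponding fragment weights
    margin               : if the two sums are within this margin, return Unknown
    """
    s_w = float(np.dot(weights_w, det_w))   # summed Western weights
    s_k = float(np.dot(weights_k, det_k))   # summed Korean weights
    if abs(s_w - s_k) <= margin:
        return "Unknown"
    return "Westerner" if s_w > s_k else "Korean"
```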
Effect of Number of Fragments: 7 fragments: 95%; 80 fragments: 100%. Inherent redundancy of the features; slight violation of the independence assumption.
Comparison with Humans: the algorithm outperformed humans on low-resolution images.
Class examples
Distinctive Features