Joint Estimation of Image Clusters and Image Transformations
Brendan J. Frey, Computer Science, University of Waterloo, Canada; Beckman Institute and ECE, University of Illinois at Urbana-Champaign
Nebojsa Jojic, Beckman Institute, University of Illinois at Urbana-Champaign
We'd like to cluster images, but the unknown subjects have unknown positions, unknown rotations, unknown scales, unknown levels of shearing...
One approach: normalization. Images → Normalization → Normalized images → Pattern Analysis. Drawback: normalization requires manual labor.
Another approach: apply every transformation to each image. Images → Huge data set → Pattern Analysis. Drawbacks: this assumes all transformations are equally likely, the noise gets copied, and the analysis is more complex.
Yet another approach: extract transformation-invariant features. Images → Transformation-invariant data → Pattern Analysis. Drawbacks: such data is difficult to work with and may hide useful features.
Our approach: joint normalization and pattern analysis, applied directly to the images.
What transforming an image does in the vector space of pixel intensities: a continuous transformation moves an image along a continuous curve. Our clustering algorithm should assign images near this nonlinear manifold to the same cluster.
Tractable approaches to modeling the transformation manifold:
– Linear approximation: good locally, bad globally
– Finite-set approximation: good globally, bad locally
Related work
Generative models
– Local invariance: PCA, Turk, Moghaddam, Pentland (96); factor analysis, Hinton, Revow, Dayan, Ghahramani (96); Frey, Colmenarez, Huang (98)
– Layered motion: Black, Jepson, Wang, Adelson, Weiss (93-98)
– Learning discrete representations of generative manifolds: generative topographic maps, Bishop, Svensen, Williams (98)
Discriminative models
– Local invariance: tangent distance, tangent prop, Simard, Le Cun, Denker, Victorri (92-93)
– Global invariance: convolutional neural networks, Le Cun, Bottou, Bengio, Haffner (98)
Generative density modeling
The goal is to find a probability model that
– reflects the structure we want to extract
– can randomly generate plausible images
– represents the data using parameters
ML estimation is used to find the parameters. We can use the class-conditional likelihoods p(image|class) for recognition, detection, ...
Mixture of Gaussians
– The probability that an image comes from cluster c = 1, 2, ... is P(c) = π_c
– The probability of the pixel intensities z, given that the image is from cluster c, is p(z|c) = N(z; μ_c, Φ_c)
– The parameters π_c, μ_c and Φ_c represent the data
– For an input z, the cluster responsibilities are P(c|z) = p(z|c)P(c) / Σ_c' p(z|c')P(c')
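The responsibilities follow directly from Bayes' rule. A minimal NumPy sketch (not the authors' MATLAB scripts; the diagonal form of Φ_c is an assumption carried over from the experiments later in the talk), computed in log space for numerical stability:

```python
import numpy as np

def log_gaussian_diag(z, mu, var):
    """log N(z; mu, diag(var)) for a flattened image z of length N."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

def responsibilities(z, pi, mu, var):
    """P(c|z) = p(z|c) P(c) / sum_c' p(z|c') P(c').

    pi:  (C,)   mixing proportions pi_c
    mu:  (C, N) cluster means mu_c
    var: (C, N) diagonal variances Phi_c
    """
    log_joint = np.array([np.log(pi[c]) + log_gaussian_diag(z, mu[c], var[c])
                          for c in range(len(pi))])
    log_joint -= log_joint.max()   # shift before exponentiating, for stability
    p = np.exp(log_joint)
    return p / p.sum()
```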
Example: Hand-crafted model
P(c) = π_c, p(z|c) = N(z; μ_c, Φ_c), with π_1 = 0.6, π_2 = 0.4 (the means μ_c and variances Φ_c are shown as images on the slide).
Example: Simulation
First sample a cluster from P(c) = π_c (here c = 1), then sample the pixel intensities from p(z|c) = N(z; μ_c, Φ_c) to obtain an image z. A second draw gives c = 2 and a different image z. (π_1 = 0.6, π_2 = 0.4; sampled images shown on the slides.)
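A minimal sketch of this ancestral sampling, again assuming diagonal variances:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mog(pi, mu, var):
    """Draw c ~ P(c) = pi_c, then z ~ N(mu_c, diag(var_c))."""
    c = rng.choice(len(pi), p=pi)
    z = mu[c] + np.sqrt(var[c]) * rng.standard_normal(mu.shape[1])
    return c, z
```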
Example: Inference
For images z from the data set, compute the responsibilities P(c|z): one image is assigned mostly to c = 1, another mostly to c = 2. (π_1 = 0.6, π_2 = 0.4)
Example: Learning - E step
Starting from π_1 = 0.5, π_2 = 0.5, compute the responsibilities P(c=1|z) and P(c=2|z) for each image z in the data set.
Example: Learning - M step (π_1 = 0.5, π_2 = 0.5)
Set μ_1 to the average of z weighted by P(c=1|z); set μ_2 to the average of z weighted by P(c=2|z).
Set Φ_1 to the average of diag((z - μ_1)(z - μ_1)^T) weighted by P(c=1|z); set Φ_2 to the average of diag((z - μ_2)(z - μ_2)^T) weighted by P(c=2|z).
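A minimal sketch of this M step, where R[n, c] = P(c|z_n) comes from the E step:

```python
import numpy as np

def m_step(Z, R):
    """Z: (T, N) images; R: (T, C) responsibilities. Returns pi, mu, var."""
    Nc = R.sum(axis=0)                      # effective count per cluster
    pi = Nc / Nc.sum()                      # updated mixing proportions
    mu = (R.T @ Z) / Nc[:, None]            # responsibility-weighted means
    # weighted average of squared deviations: the diagonal of (z-mu)(z-mu)^T
    var = np.stack([(R[:, c:c+1] * (Z - mu[c]) ** 2).sum(0) / Nc[c]
                    for c in range(R.shape[1])])
    return pi, mu, var + 1e-6               # small floor keeps variances positive
```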
Example: After iterating EM...
The estimates converge to π_1 = 0.6, π_2 = 0.4, with means and variances close to the hand-crafted model.
Adding "transformation" as a discrete latent variable
Say there are N pixels. We assume we are given a set of sparse N x N transformation-generating matrices G_1, ..., G_l, ..., G_L. These generate a set of transformed points from a single point.
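For illustration, a minimal sketch of one such matrix: a sparse permutation that shifts an image. The wrap-around border handling and the use of scipy.sparse are assumptions, not details from the talk.

```python
import numpy as np
from scipy.sparse import coo_matrix

def shift_matrix(W, H, dx, dy):
    """Sparse N x N permutation G_l that shifts a flattened H x W image
    by (dx, dy), with wrap-around at the borders (an assumption)."""
    idx = np.arange(W * H).reshape(H, W)
    # src[r, c] is the index of the input pixel whose value lands at (r, c)
    src = np.roll(np.roll(idx, dy, axis=0), dx, axis=1)
    N = W * H
    return coo_matrix((np.ones(N), (idx.ravel(), src.ravel())),
                      shape=(N, N)).tocsr()
```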
Transformed Mixture of Gaussians
– The probability that the image comes from cluster c = 1, 2, ... is P(c) = π_c
– The probability of the latent image z for cluster c is p(z|c) = N(z; μ_c, Φ_c)
– The probability of transformation l = 1, 2, ... is P(l) = ρ_l
– The probability of the observed image x is p(x|z,l) = N(x; G_l z, Ψ)
– Parameters ρ_l, π_c, μ_c and Φ_c represent the data; the cluster/transformation responsibilities P(c,l|x) are quite easy to compute
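To see why P(c,l|x) is easy: integrating out z gives p(x|c,l) = N(x; G_l μ_c, G_l Φ_c G_l^T + Ψ). A minimal sketch under the assumption that each G_l is a pixel permutation (like the shifts above) and Φ_c, Ψ are diagonal, so every covariance stays diagonal; perms[l] is the index array with (G_l z) = z[perms[l]]:

```python
import numpy as np

def joint_responsibilities(x, pi, rho, mu, var, psi, perms):
    """R[c, l] = P(c, l | x) for permutation transformations."""
    C, L = len(pi), len(rho)
    logR = np.empty((C, L))
    for c in range(C):
        for l in range(L):
            m = mu[c][perms[l]]             # G_l mu_c
            v = var[c][perms[l]] + psi      # G_l Phi_c G_l^T + Psi (diagonal)
            logR[c, l] = (np.log(pi[c]) + np.log(rho[l])
                          - 0.5 * np.sum(np.log(2 * np.pi * v)
                                         + (x - m) ** 2 / v))
    logR -= logR.max()                      # stability before exponentiating
    R = np.exp(logR)
    return R / R.sum()
```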
Example: Hand-crafted model
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
l = 1, 2, 3; π_1 = 0.6, π_2 = 0.4; ρ_1 = ρ_2 = ρ_3 = 0.33
Example: Simulation
Sample a cluster from P(c) (here c = 1), sample a latent image z from p(z|c), sample a transformation from P(l) (here l = 1), then sample the observed image x from N(x; G_l z, Ψ). A second draw gives c = 2 and l = 3, producing a differently shifted image. (Sampled images shown on the slides.)
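A minimal sketch of this generative process, using the same permutation representation of G_l as above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tmg(pi, rho, mu, var, psi, perms):
    """c ~ P(c); z ~ N(mu_c, Phi_c); l ~ P(l); x ~ N(G_l z, Psi)."""
    c = rng.choice(len(pi), p=pi)
    z = mu[c] + np.sqrt(var[c]) * rng.standard_normal(mu.shape[1])
    l = rng.choice(len(rho), p=rho)
    x = z[perms[l]] + np.sqrt(psi) * rng.standard_normal(mu.shape[1])
    return c, l, z, x
```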
ML estimation of a Transformed Mixture of Gaussians using EM
E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data
M step: set
– π_c = avg of P(c|x)
– ρ_l = avg of P(l|x)
– μ_c = avg mean of p(z|c,x)
– Φ_c = avg variance of p(z|c,x)
– Ψ = avg variance of p(x - G_l z | x)
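The E step needs the posterior over the latent image for each (c, l). A minimal sketch for the permutation/diagonal case, using standard conjugate-Gaussian algebra (the elementwise form is a consequence of those assumptions, not something stated on the slide):

```python
import numpy as np

def posterior_z(x, mu_c, var_c, psi, perm):
    """p(z | x, c, l) for (G_l z)[i] = z[perm[i]]. Returns (mean, variance)."""
    inv_perm = np.argsort(perm)                 # inverse permutation
    prec = 1.0 / var_c + (1.0 / psi)[inv_perm]  # posterior precision (diagonal)
    post_var = 1.0 / prec
    post_mean = post_var * (mu_c / var_c + (x / psi)[inv_perm])
    return post_mean, post_var
```

The M-step quantities on the slide are then responsibility-weighted averages of these per-(c, l) posterior means and variances over the training images.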
A Tough Toy Problem
– 4 different shapes
– 25 possible locations
– cluttered background
– fixed distraction
– 100 "clusters"
– 200 training cases
Mixture of Gaussians: mean and first 5 principal components.
Transformed Mixture of Gaussians: 5 horizontal shifts + 5 vertical shifts, 20 iterations of EM.
Face Clustering
Examples from a set of 400 outdoor images of 2 people (44 x 28 pixels).
Mixture of Gaussians: 15 iterations of EM (MATLAB takes 1 minute). Cluster means shown for c = 1, 2, 3, 4.
Transformed mixture of Gaussians
– 11 horizontal shifts; 11 vertical shifts
– 4 clusters
– each cluster has 1 mean and 1 variance for each latent pixel, plus 1 variance for each observed pixel
– training: 15 iterations of EM (MATLAB script takes 10 sec/image)
Transformed mixture of Gaussians: cluster means for c = 1, 2, 3, 4, shown at initialization and after 1-15, 20 and 30 iterations of EM.
Mixture of Gaussians, for comparison: cluster means for c = 1, 2, 3, 4 after 30 iterations of EM.
Modeling Written Digits
A TMG that Captures Writing Angle
P(l|x) identifies the writing angle in image x. (Figure: learned means laid out on a grid of clusters x transformations.)
Wrap-up
– MATLAB scripts available at ...
– Other domains: audio, bioinformatics, ...
– Other latent image models p(z): factor analysis (probabilistic PCA) (ICCV99); mixtures of factor analyzers (NIPS99); time series (CVPR00)
– Automatic video clustering
– Fast variational inference and learning