Joint Estimation of Image Clusters and Image Transformations
Brendan J. Frey, Computer Science, University of Waterloo, Canada; Beckman Institute and ECE, Univ of Illinois at Urbana
Nebojsa Jojic, Beckman Institute, University of Illinois at Urbana

We’d like to cluster images, but the unknown subjects have
– unknown positions
– unknown rotations
– unknown scales
– unknown levels of shearing
– ...

One approach
Images → Normalization → Normalized images → Pattern Analysis
Drawback: labor

Another approach
Images → Apply transformations to each image → Huge data set → Pattern Analysis
Drawbacks:
– assumes transformations are equally likely
– noise gets copied
– analysis is more complex

Yet another approach
Images → Extract transformation-invariant features → Transformation-invariant data → Pattern Analysis
Drawbacks:
– difficult to work with
– may hide useful features

Our approach
Images → Joint Normalization and Pattern Analysis

What transforming an image does in the vector space of pixel intensities
A continuous transformation moves an image along a continuous curve.
Our clustering algorithm should assign images near this nonlinear manifold to the same cluster.

Tractable approaches to modeling the transformation manifold
– Linear approximation: good locally, bad globally
– Finite-set approximation: good globally, bad locally

Related work
Generative models
– Local invariance: PCA, Turk, Moghaddam, Pentland (96); factor analysis, Hinton, Revow, Dayan, Ghahramani (96); Frey, Colmenarez, Huang (98)
– Layered motion: Black, Jepson, Wang, Adelson, Weiss (93-98)
– Learning discrete representations of generative manifolds: generative topographic maps, Bishop, Svensen, Williams (98)
Discriminative models
– Local invariance: tangent distance, tangent prop, Simard, Le Cun, Denker, Victorri (92-93)
– Global invariance: convolutional neural networks, Le Cun, Bottou, Bengio, Haffner (98)

Generative density modeling
The goal is to find a probability model that
– reflects the structure we want to extract,
– can randomly generate plausible images,
– represents the data using parameters.
ML estimation is used to find the parameters.
We can use class-conditional likelihoods, p(image|class), for recognition, detection, ...

Mixture of Gaussians
The probability that an image comes from cluster c = 1, 2, … is $P(c) = \pi_c$.
The probability of pixel intensities z given that the image is from cluster c is $p(z|c) = \mathcal{N}(z; \mu_c, \Phi_c)$.
Parameters $\pi_c$, $\mu_c$ and $\Phi_c$ represent the data.
For input z, the cluster responsibilities are $P(c|z) = p(z|c)P(c) / \sum_{c'} p(z|c')P(c')$.
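The responsibility formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB code: it assumes diagonal covariances (stored as vectors of per-pixel variances), and the 4-pixel "images" and all parameter values are made up for the example.

```python
import numpy as np

def log_gaussian_diag(z, mu, phi):
    """Log density of N(z; mu, diag(phi)) for a single vector z."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * phi) + (z - mu) ** 2 / phi)

def responsibilities(z, pi, mu, phi):
    """Return P(c|z) for every cluster c via Bayes' rule, computed in log space."""
    log_joint = np.array([
        np.log(pi[c]) + log_gaussian_diag(z, mu[c], phi[c])
        for c in range(len(pi))
    ])
    log_joint -= log_joint.max()   # subtract the max to avoid underflow
    joint = np.exp(log_joint)
    return joint / joint.sum()

# Toy 4-pixel "images"; every value below is an illustrative assumption.
pi = np.array([0.6, 0.4])
mu = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]])
phi = np.full((2, 4), 0.1)

z = np.array([0.9, 1.1, 0.1, -0.1])   # looks like cluster 1 (index 0)
post = responsibilities(z, pi, mu, phi)
print(post)
```

Working in log space matters for real images: with thousands of pixels the densities themselves underflow, while their logs stay well behaved.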

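Example: Hand-crafted model
Two clusters with $\pi_1 = 0.6$, $\pi_2 = 0.4$, where $P(c) = \pi_c$ and $p(z|c) = \mathcal{N}(z; \mu_c, \Phi_c)$.
[Figure: graphical model c → z; the images of the cluster parameters are not recoverable.]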

Example: Simulation
With $\pi_1 = 0.6$, $\pi_2 = 0.4$:
First draw: sample c = 1 from $P(c) = \pi_c$, then sample z from $p(z|c) = \mathcal{N}(z; \mu_c, \Phi_c)$.
Second draw: sample c = 2, then sample z from $p(z|c)$.
[Figure: the sampled images z.]

Example: Inference
With $\pi_1 = 0.6$, $\pi_2 = 0.4$, compute $P(c|z)$ for images from the data set.
First image: $P(c=1|z) = 0.99$, $P(c=2|z) = 0.01$.
Second image: $P(c=1|z) = 0.02$, $P(c=2|z) = 0.98$.
[Figure: the two input images z.]

Example: Learning - E step
Starting from $\pi_1 = 0.5$, $\pi_2 = 0.5$, compute $P(c|z)$ for images from the data set.
First image: $P(c=1|z) = 0.52$, $P(c=2|z) = 0.48$.
Second image: $P(c=1|z) = 0.51$, $P(c=2|z) = 0.49$.
[Figure: the two input images z.]

Example: Learning - M step
With $\pi_1 = 0.5$, $\pi_2 = 0.5$, update each cluster using responsibility-weighted averages over the data set (each average is normalized by the average of the corresponding $P(c|z)$):
Set $\mu_1$ to the average of $z\,P(c=1|z)$.
Set $\mu_2$ to the average of $z\,P(c=2|z)$.
Set $\Phi_1$ to the average of $\mathrm{diag}((z-\mu_1)^T(z-\mu_1))\,P(c=1|z)$.
Set $\Phi_2$ to the average of $\mathrm{diag}((z-\mu_2)^T(z-\mu_2))\,P(c=2|z)$.
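The E and M steps for a mixture of Gaussians can be sketched as follows. This is a hedged illustration rather than the presenters' implementation: it assumes diagonal covariances, normalizes each weighted average by the total responsibility of the cluster, also re-estimates the mixing proportions (a standard EM update not spelled out on the slide), and runs on synthetic 4-pixel data with the means initialized at two data points.

```python
import numpy as np

def e_step(Z, pi, mu, phi):
    """Responsibilities P(c|z) for each row of Z; returns an (n, C) array."""
    diff = Z[:, None, :] - mu[None, :, :]                       # (n, C, D)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * phi)[None] + diff ** 2 / phi[None], axis=2)
    log_joint = np.log(pi)[None, :] + log_lik
    log_joint -= log_joint.max(axis=1, keepdims=True)           # avoid underflow
    R = np.exp(log_joint)
    return R / R.sum(axis=1, keepdims=True)

def m_step(Z, R, min_var=1e-6):
    """Responsibility-weighted updates of pi, mu and the diagonal of Phi."""
    n_c = R.sum(axis=0)                                         # effective counts
    pi = n_c / len(Z)
    mu = (R.T @ Z) / n_c[:, None]
    diff2 = (Z[:, None, :] - mu[None, :, :]) ** 2
    phi = np.einsum('nc,ncd->cd', R, diff2) / n_c[:, None]
    return pi, mu, np.maximum(phi, min_var)                     # floor the variances

# Synthetic data (an assumption for illustration): 60 + 40 noisy 4-pixel images.
rng = np.random.default_rng(0)
true_mu = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
Z = np.vstack([true_mu[0] + 0.1 * rng.standard_normal((60, 4)),
               true_mu[1] + 0.1 * rng.standard_normal((40, 4))])

pi = np.array([0.5, 0.5])
mu = Z[[0, -1]].copy()                      # initialize means at two data points
phi = np.tile(Z.var(axis=0), (2, 1))        # start from the overall variance
for _ in range(20):
    R = e_step(Z, pi, mu, phi)
    pi, mu, phi = m_step(Z, R)
print(pi)
```

The variance floor `min_var` is a practical safeguard added here; it keeps a cluster from collapsing onto a single image.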

Example: After iterating EM...
$\pi_1 = 0.6$, $\pi_2 = 0.4$
[Figure: the images of the learned cluster parameters are not recoverable.]

Adding “transformation” as a discrete latent variable
Say there are N pixels.
We assume we are given a set of sparse N x N transformation generating matrices $G_1, \ldots, G_l, \ldots, G_L$.
These generate points [figure] from point [figure].
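One way to build such a matrix for a translation is as a permutation matrix acting on the flattened image. The sketch below is an assumption-laden illustration: `shift_matrix` is a hypothetical helper name, shifts wrap around cyclically at the image border (the slides do not say how borders are handled), and the matrix is stored densely for brevity even though it has only N nonzero entries.

```python
import numpy as np

def shift_matrix(height, width, dy, dx):
    """N x N permutation matrix G (N = height * width) such that
    G @ img.ravel() equals img cyclically shifted by dy rows and dx columns."""
    n = height * width
    src = np.arange(n).reshape(height, width)
    # dst[i, j] holds the flat index of the source pixel that lands at (i, j).
    dst = np.roll(src, shift=(dy, dx), axis=(0, 1))
    G = np.zeros((n, n))
    G[np.arange(n), dst.ravel()] = 1.0
    return G

# A 3 x 4 test image whose pixel values are just their flat indices.
img = np.arange(12.0).reshape(3, 4)
G_up_left = shift_matrix(3, 4, dy=-1, dx=-1)      # shift up and left by one pixel
shifted = (G_up_left @ img.ravel()).reshape(3, 4)
print(shifted)
```

Because each row contains a single 1, a sparse representation would keep the cost of applying a transformation linear in the number of pixels.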

Transformed Mixture of Gaussians
The probability that the image comes from cluster c = 1, 2, … is $P(c) = \pi_c$.
The probability of latent image z for cluster c is $p(z|c) = \mathcal{N}(z; \mu_c, \Phi_c)$.
The probability of transformation l = 1, 2, … is $P(l) = \rho_l$.
The probability of observed image x is $p(x|z,l) = \mathcal{N}(x; G_l z, \Psi)$.
$\rho_l$, $\pi_c$, $\mu_c$ and $\Phi_c$ represent the data.
The cluster/transformation responsibilities, $P(c,l|x)$, are quite easy to compute.
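A sketch of how $P(c,l|x)$ might be computed: integrating out the latent image z gives $p(x|c,l) = \mathcal{N}(x; G_l\mu_c, G_l\Phi_c G_l^T + \Psi)$, and Bayes' rule over all (c, l) pairs gives the responsibilities. The code is an illustration under stated assumptions, not the authors' implementation: $\Phi_c$ and $\Psi$ are diagonal, the signals are 4-pixel 1-D "images", the transformations are cyclic shifts, and the full covariance is formed explicitly, which is only practical for small N.

```python
import numpy as np

def log_gaussian_full(x, mean, cov):
    """Log density of N(x; mean, cov) with a full covariance matrix."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def tmg_responsibilities(x, pi, rho, mu, phi, psi, G):
    """P(c, l | x) as a (C, L) array; phi[c] and psi hold diagonal variances."""
    C, L = len(pi), len(rho)
    log_joint = np.empty((C, L))
    for c in range(C):
        for l in range(L):
            mean = G[l] @ mu[c]                                  # G_l mu_c
            cov = G[l] @ np.diag(phi[c]) @ G[l].T + np.diag(psi)  # G_l Phi_c G_l^T + Psi
            log_joint[c, l] = (np.log(pi[c]) + np.log(rho[l])
                               + log_gaussian_full(x, mean, cov))
    log_joint -= log_joint.max()                                 # avoid underflow
    joint = np.exp(log_joint)
    return joint / joint.sum()

# Illustrative setup: cyclic shifts by -1, 0, +1 of a 4-pixel signal.
G = [np.roll(np.eye(4), s, axis=0) for s in (-1, 0, 1)]
pi = np.array([0.6, 0.4])
rho = np.full(3, 1.0 / 3.0)
mu = np.array([[0.0, 1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0, 0.0]])
phi = np.full((2, 4), 0.05)
psi = np.full(4, 0.05)

# Roughly the second cluster mean shifted by +1, plus a small perturbation.
x = np.array([0.05, -0.05, 0.95, 1.05])
post = tmg_responsibilities(x, pi, rho, mu, phi, psi, G)
print(post.round(3))
```

Summing `post` over its second axis gives the cluster posterior P(c|x); summing over the first gives the transformation posterior P(l|x).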

Example: Hand-crafted model
$G_1$ = shift left and up, $G_2 = I$, $G_3$ = shift right and up; l = 1, 2, 3
$\pi_1 = 0.6$, $\pi_2 = 0.4$; $\rho_1 = \rho_2 = \rho_3 = 0.33$
[Figure: graphical model with nodes c, l, z, x; the images of the cluster parameters are not recoverable.]

Example: Simulation
With $G_1$ = shift left and up, $G_2 = I$, $G_3$ = shift right and up:
First draw: sample c = 1, then the latent image z, then l = 1, then the observed image x.
Second draw: sample c = 2, then the latent image z, then l = 3, then the observed image x.
[Figure: the sampled images z and x.]
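The simulation steps amount to ancestral sampling: draw c, then z given c, then l, then x given z and l. The sketch below is a minimal illustration under assumptions: `sample_tmg` is a hypothetical helper, the "images" are 4-pixel 1-D signals, the three transformations are cyclic shifts standing in for the shifts on the slide, and the transformation prior uses exactly 1/3 per entry (the rounded value 0.33 would not sum to one).

```python
import numpy as np

def sample_tmg(rng, pi, rho, mu, phi, psi, G):
    """Draw (c, l, z, x) from the transformed mixture of Gaussians.
    phi[c] and psi are vectors of diagonal variances."""
    c = rng.choice(len(pi), p=pi)                                   # cluster
    z = mu[c] + np.sqrt(phi[c]) * rng.standard_normal(mu.shape[1])  # latent image
    l = rng.choice(len(rho), p=rho)                                 # transformation
    x = G[l] @ z + np.sqrt(psi) * rng.standard_normal(len(psi))     # observed image
    return c, l, z, x

# Illustrative parameters (assumptions, not values from the slides' figures).
G = [np.roll(np.eye(4), s, axis=0) for s in (-1, 0, 1)]   # shift -1, identity, shift +1
pi = np.array([0.6, 0.4])
rho = np.full(3, 1.0 / 3.0)
mu = np.array([[0.0, 1.0, 0.0, 0.0],
               [0.0, 1.0, 1.0, 0.0]])
phi = np.full((2, 4), 1e-4)
psi = np.full(4, 1e-4)

rng = np.random.default_rng(0)
c, l, z, x = sample_tmg(rng, pi, rho, mu, phi, psi, G)
print(c, l, x.round(2))
```

With the small variances chosen here, each sample x is close to the shifted cluster mean `G[l] @ mu[c]`, which makes the roles of c and l easy to see.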

A Tough Toy Problem
– 4 different shapes
– 25 possible locations
– cluttered background
– fixed distraction
– 100 “clusters”
– 200 training cases

[Figure panels: Mixture of Gaussians; Mean and first 5 principal components; Transformed Mixture of Gaussians (5 horizontal shifts + 5 vertical shifts, 20 iterations of EM).]

Face Clustering
Examples from 400 outdoor images of 2 people (44 x 28 pixels)

Mixture of Gaussians
15 iterations of EM (MATLAB takes 1 minute)
[Figure: cluster means for c = 1, 2, 3, 4.]

Transformed mixture of Gaussians
– 11 horizontal shifts; 11 vertical shifts
– 4 clusters
– Each cluster has 1 mean and 1 variance for each latent pixel
– 1 variance for each observed pixel
– Training: 15 iterations of EM (MATLAB script takes 10 sec/image)

Transformed mixture of Gaussians
[Figures: cluster means for c = 1, 2, 3, 4 at initialization, after 1 iteration of EM, and after 2 iterations of EM.]

Mixture of Gaussians
30 iterations of EM
[Figure: cluster means for c = 1, 2, 3, 4.]

Modeling Written Digits

A TMG that Captures Writing Angle
$P(l|x)$ identifies the writing angle in image x.
[Figure: axes labeled CLUSTERS and TRANSFORMATIONS.]

Wrap-up
MATLAB scripts available at www.cs.uwaterloo.ca/~frey
Other domains: audio, bioinformatics, …
Other latent image models, p(z)
– factor analysis (prob PCA) (ICCV99)
– mixtures of factor analyzers (NIPS99)
– time series (CVPR00)
Automatic video clustering
Fast variational inference and learning
