Segmentation and Fitting Using Probabilistic Methods


Segmentation and Fitting Using Probabilistic Methods
Or: How Expectation-Maximization Can Cure Your Computer Vision System of Almost Anything (well… maybe)

Departure Point
Up to now, most of what we have done in the grouping and segmentation arena has been local. Now we want to model things globally, and in probabilistic terms: explain a large collection of tokens with a few parameters. (Hmmm… like the Hough transform?)

Missing Data Problems, Fitting, Segmentation
Often, if some parameters were known, the maximum likelihood problem would be easy:
- Fitting: if you know which line each token comes from, getting the line parameters is easy.
- Segmentation: if you know the segment each pixel comes from, the segment's parameters are easily determined.
- Fundamental matrix: if you know the correspondences…

Missing Data Problem
A missing data problem is one where…
- Some terms in a data vector are missing in some instances, but present in others, or
- An inference problem can be made simpler by rewriting it using some variables whose values are unknown.
Algorithm concept: take an expectation over the missing data.

Missing Data Problems
Strategy:
- Estimate values for the missing data
- Plug these in, now estimate the parameters
- Re-estimate values for the missing data
- Continue to convergence
For example (line fitting):
- Guess a mapping of points to lines
- Fit each line to its points
- Reallocate points to the fitted lines
- Loop to convergence
Reminiscent of K-means, is it not?

Refining the Strategy
The problem has parameters to be estimated, and missing variables (data).
Iterate to convergence:
- Replace the missing data with expected values, given fixed parameter values
- Fix the missing data, and do a maximum-likelihood estimate of the parameters, given that data

Refining the Example
- Allocate each point to a line with a weight equal to the probability of the point, given the line's parameters
- Refit the lines to the weighted set of points
- Converges to a local extremum (caution)
- Can be generalized…

Image Segmentation
- α_l: probability of choosing segment l at random (a priori), for l = 1, …, g; these are the mixing weights
- p(x|θ_l): conditional density of the feature vector x, given that it comes from segment l
- Model: p(x|θ_l) is Gaussian, with θ_l = (μ_l, Σ_l)
The total density for the feature vector of any pixel drawn at random is

  p(x) = Σ_{l=1…g} α_l p(x|θ_l)

(Figure: an image divided into segments 1 through 4, with parameters θ_1, …, θ_4.)
This is known as a Mixture Model.

Mixture Model: Generative View
To produce a pixel (feature vector):
- Pick an image segment l with prior probability α_l
- Draw a sample from p(x|θ_l)
The density in x space is a set of g Gaussian blobs, one per segment. We want to determine:
- The parameters of each blob (the μ_l and Σ_l values)
- The mixing weights (the α_l values)
- A mapping of pixels to components (the segmentation)
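To make the generative view concrete, here is a minimal Python sketch that samples pixels from a two-component Gaussian mixture. The parameter values (alphas, mus, covs) are made-up toy numbers, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-component mixture over 3-D feature vectors (e.g., RGB); values are illustrative only.
alphas = np.array([0.7, 0.3])                       # mixing weights alpha_l, sum to 1
mus = np.array([[0.2, 0.2, 0.8], [0.9, 0.1, 0.1]])  # one mean mu_l per component
covs = np.array([0.01 * np.eye(3), 0.02 * np.eye(3)])

def sample_pixel():
    """Generate one feature vector: pick a segment, then draw from its Gaussian."""
    l = rng.choice(len(alphas), p=alphas)            # pick segment l with prior alpha_l
    return rng.multivariate_normal(mus[l], covs[l])  # draw a sample from p(x|theta_l)

pixels = np.array([sample_pixel() for _ in range(1000)])
print(pixels.shape)  # (1000, 3)
```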

Package all these things into a parameter vector

  Θ = (α_1, …, α_g, θ_1, …, θ_g)

containing the mixing weights α_l and the blob parameters θ_l = (μ_l, Σ_l). The mixture model becomes

  p(x|Θ) = Σ_{l=1…g} α_l p(x|θ_l)

with each component a multivariate Gaussian:

  p(x|θ_l) = (2π)^(−d/2) |Σ_l|^(−1/2) exp( −(1/2) (x − μ_l)^T Σ_l^(−1) (x − μ_l) )
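A small sketch of evaluating this mixture density with SciPy's multivariate normal; mixture_density and the toy parameters are illustrative choices, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy parameters, illustrative only: 2 components over 3-D features.
alphas = np.array([0.7, 0.3])
mus = np.array([[0.2, 0.2, 0.8], [0.9, 0.1, 0.1]])
covs = np.array([0.01 * np.eye(3), 0.02 * np.eye(3)])

def mixture_density(x, alphas, mus, covs):
    """p(x|Theta) = sum over l of alpha_l * N(x; mu_l, Sigma_l)."""
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=c)
               for a, m, c in zip(alphas, mus, covs))

print(mixture_density(np.array([0.25, 0.2, 0.75]), alphas, mus, covs))
```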

The Chicken and the Egg
- If we knew which pixel belonged to which component, estimating Θ would be straightforward:
  - Use maximum-likelihood estimates for each θ_l
  - The fraction of the image in each component gives α_l
- If we knew Θ, then for each pixel we could assign it to its most likely blob.
Unfortunately, we know neither. That's where Expectation-Maximization (EM) comes in: iterate guesses until convergence.

Formal Statement of Missing Data Problems
X: the complete data space. Y: the incomplete data space. f: a map taking X to Y.
- Image segmentation: complete data = measurements at each pixel plus the set of variables matching pixels to mixture components; incomplete data = the measurements at each pixel alone.
- Line fitting: complete data = measurements at each token plus the mapping of tokens to lines; incomplete data = the measurements at each token alone.

Missing, Formally
U: the parameter space, containing the mixing weights and the parameters (mean, covariance) of each mixture component (or the parameters of each line).
We want to obtain a maximum-likelihood estimate of these parameters given incomplete data. If we had complete data, then we could use the joint density function for the complete data space, p_c(x; u), and maximize the complete-data log-likelihood:

  L_c(u) = Σ_j log p_c(x_j; u)

OK. We maximize this to estimate each segment's parameters (image segmentation), or the mixing weights and the parameters of the lines given the mapping of tokens to lines (the line-fitting example).
Problem: we don't have complete data. The density for the incomplete space is the marginal density of the complete space, where we've integrated out the missing variables:

  p(y; u) = ∫_{x : f(x) = y} p_c(x; u) dx

This is a pain in the neck… We don't know which of the many possible x values that could correspond to the y values we observe are correct. We've taken a projection (of some sort), and we cannot uniquely reconstruct the full joint density, so we have to average over all those possibilities to make our best guess. But all is not lost… We have the following strategy:
1. Obtain some estimate of the missing data using a guess at the parameters.
2. Form a maximum-likelihood estimate of the free parameters using the estimate of the missing data.
3. Iterate to (hopefully) convergence.

Strategy by Example
Image segmentation:
- Obtain an estimate of the component from which each pixel comes, using an estimate of the θ_l
- Update the θ_l and the mixing weights using this estimate
Tokens and lines:
- Obtain an estimate of the correspondence between tokens and lines, using a guess at the line parameters
- Revise the estimate of the line parameters using the estimated correspondences

Expectation-Maximization for Mixture Models
Assume the complete-data log-likelihood is linear in the missing variables. (This is common.)
Mixture model: the missing data indicate the mixture component from which a data item is drawn. Represent this by associating with each data point a bit vector z of g elements (one per component in the mixture).

About the z Vectors (as a Matrix)
Stack the z vectors into an n × g matrix: one row per data point (observation) j, one column per Gaussian mixture component l. Entry (j, l) is 1 if pixel (token) j was produced by mixture component l, and 0 otherwise. Its expectation is the probability of that event.
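A tiny illustration (toy numbers of my own) of the contrast between a hard z matrix and its expectation:

```python
import numpy as np

# Hard assignments: n = 4 data points, g = 3 components; exactly one 1 per row.
z_hard = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]])

# Expected (soft) assignments: entry (j, l) is P(point j came from component l);
# rows still sum to 1, but entries are no longer 0/1.
z_soft = np.array([[0.80, 0.15, 0.05],
                   [0.10, 0.20, 0.70],
                   [0.25, 0.60, 0.15],
                   [0.90, 0.05, 0.05]])

print(z_hard.sum(axis=1))  # [1 1 1 1]
print(z_soft.sum(axis=1))  # [1. 1. 1. 1.]
```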

So our complete information can be written as x_j = (y_j, z_j): the measurements plus the indicator vector. Write the mixture model (line example) as

  p_c(x; u) = Π_{l=1…g} [ α_l p(y|θ_l) ]^(z_l)

so the complete-data log-likelihood is

  L_c(u) = Σ_j Σ_{l=1…g} z_jl [ log α_l + log p(y_j|θ_l) ]

This is linear in the missing variables z_jl. Good news! How did we ensure that would happen? By raising each component to the power z_l, so that taking the log turns the indicators into linear coefficients. We will think of the entries in z as probabilities, i.e., as expectations.

EM: The Key Idea
Obtain working values for the missing data, and so for x, by substituting the expectation for each missing value. That is, fix the parameters, then compute each expectation E[z_jl], given y_j and the current parameter values. Plug the E[z_jl] into the complete-data log-likelihood and find the parameters that maximize it. The E[z_jl] have probably changed, so repeat.

More Formally
Given u^s, we form u^(s+1) by:
1. E-Step: compute the expected value of the complete data using the incomplete data and the current parameter estimates. We already have y_j (it is observed), so we only need the expected value of z_j for each j. Denote these values z̄_jl^s; the superscript s indicates that the expectation depends on the current parameter values at step s.
2. M-Step: maximize the complete-data log-likelihood with respect to u, using the expectations from the E-step.
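For reference, this E-step/M-step alternation is exactly what an off-the-shelf Gaussian mixture implementation iterates; a minimal sketch using scikit-learn (assuming it is available; the synthetic features are toy data, not from the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: 3-D feature vectors (e.g., RGB) drawn from two clusters, illustrative only.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal([0.2, 0.2, 0.8], 0.05, size=(500, 3)),
    rng.normal([0.9, 0.1, 0.1], 0.05, size=(300, 3)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(features)                   # runs EM: alternating E-steps and M-steps

print(gmm.weights_)                 # mixing weights alpha_l
print(gmm.means_)                   # blob means mu_l
resp = gmm.predict_proba(features)  # E-step expectations, an n x g array
labels = gmm.predict(features)      # MAP assignment of each pixel to a blob
```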

Image Segmentation in Practice
(Warning: your textbook is a typo minefield here.)
Set up an n × g array of indicators I, each row like a z vector: the (j, l) element of I is 1 if pixel j comes from blob l.
E-Step: replace each indicator with its expectation,

  E[I_jl] = Prob(pixel j comes from Gaussian blob l)
          = α_l p(y_j|θ_l) / Σ_{k=1…g} α_k p(y_j|θ_k)

Note: this is no longer a binary value!
(Figure: two Gaussian densities evaluated at a point x, with heights a and b; the expectation is the ratio b/(a+b).)
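A hand-rolled sketch of this E-step; e_step and the toy parameters are my own illustrative names and values, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(features, alphas, mus, covs):
    """Return the n x g array of expectations E[I_jl] (responsibilities)."""
    n, g = features.shape[0], len(alphas)
    resp = np.empty((n, g))
    for l in range(g):
        # alpha_l * p(y_j | theta_l) for every pixel j
        resp[:, l] = alphas[l] * multivariate_normal.pdf(features, mean=mus[l], cov=covs[l])
    resp /= resp.sum(axis=1, keepdims=True)  # normalize each row: the b/(a+b) ratio
    return resp

# Toy 2-component model and data, illustrative only.
alphas = np.array([0.5, 0.5])
mus = np.array([[0.2, 0.2, 0.8], [0.9, 0.1, 0.1]])
covs = np.array([0.02 * np.eye(3), 0.02 * np.eye(3)])
features = np.array([[0.25, 0.2, 0.75], [0.85, 0.15, 0.1]])
print(e_step(features, alphas, mus, covs))  # rows sum to 1
```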

Practice…
M-Step: now form a maximum-likelihood estimate of Θ^(s+1):

  α_l^(s+1) = (1/n) Σ_j E[I_jl]    … the average value in each column

  μ_l^(s+1) = ( Σ_j E[I_jl] y_j ) / ( Σ_j E[I_jl] )    … the weighted average feature vector for each column

  Σ_l^(s+1) = ( Σ_j E[I_jl] (y_j − μ_l^(s+1))(y_j − μ_l^(s+1))^T ) / ( Σ_j E[I_jl] )    … the weighted average covariance matrix for each column
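And a matching sketch of the M-step updates; again, m_step and the toy inputs are illustrative, not from the slides:

```python
import numpy as np

def m_step(features, resp):
    """Given features (n x d) and expectations E[I_jl] (n x g),
    return updated mixing weights, means, and covariances."""
    n, d = features.shape
    g = resp.shape[1]
    weights = resp.sum(axis=0)                    # column sums, shape (g,)
    alphas = weights / n                          # average value in each column
    mus = (resp.T @ features) / weights[:, None]  # weighted average feature vectors
    covs = np.empty((g, d, d))
    for l in range(g):
        diff = features - mus[l]                  # deviations from the new mean
        covs[l] = (resp[:, l, None] * diff).T @ diff / weights[l]
    return alphas, mus, covs

# Example with toy responsibilities for 3 pixels and 2 blobs.
features = np.array([[0.2, 0.2, 0.8], [0.9, 0.1, 0.1], [0.3, 0.25, 0.7]])
resp = np.array([[0.9, 0.1], [0.05, 0.95], [0.8, 0.2]])
alphas, mus, covs = m_step(features, resp)
print(alphas, mus, sep="\n")
```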

When It Converges…
- We can make a maximum a posteriori (MAP) decision by assigning each pixel to the Gaussian for which it has the highest E[I_jl].
- We can also keep the probabilities and work with them in, for instance, a probabilistic relaxation framework (coming attractions).
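The MAP decision is just a row-wise argmax over the converged expectations; a tiny sketch with a toy responsibility matrix:

```python
import numpy as np

# Converged expectations E[I_jl] for 4 pixels and 3 blobs (toy values).
resp = np.array([[0.80, 0.15, 0.05],
                 [0.10, 0.20, 0.70],
                 [0.25, 0.60, 0.15],
                 [0.90, 0.05, 0.05]])

labels = resp.argmax(axis=1)  # MAP segment label for each pixel
print(labels)                 # [0 2 1 0]
```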