1 On the Statistical Analysis of Dirty Pictures
Julian Besag

2 Image Processing
Required in a very wide range of practical problems:
- Computer vision
- Computed tomography
- Agriculture
- Many more…
Picture acquisition techniques are noisy.

3 Problem Statement
Given a noisy picture and two sources of information (assumptions):
- A multivariate record for each pixel
- Pixels close together tend to be alike
Reconstruct the true scene.

4 Notation
S – a 2-D region, partitioned into pixels numbered 1, …, n
x = (x_1, x_2, …, x_n) – a coloring of S
x* (a realization of X) – the true coloring of S
y = (y_1, y_2, …, y_n) (a realization of Y) – the observed record at each pixel

5 Assumption #1
Given a scene x, the random variables Y_1, Y_2, …, Y_n are conditionally independent, and each Y_i has the same known conditional density function f(y_i | x_i), dependent only on x_i.
Probability of correct acquisition
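Written out, Assumption #1 says that the record likelihood factorizes over pixels (a restatement of the conditional-independence assumption above):

```latex
l(y \mid x) = \prod_{i=1}^{n} f(y_i \mid x_i)
```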

6 Assumption #2
The true coloring x* is a realization of a locally dependent Markov random field with specified distribution {p(x)}.
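The slide does not write out a particular prior; one simple concrete choice, consistent with the neighbor-agreement parameter β used in the later examples, is a pairwise (Potts-type) field in which β > 0 rewards neighboring pixels that share a color. As an illustration only:

```latex
p(x) \;\propto\; \exp\!\Big( \beta \sum_{i \sim j} \mathbf{1}[x_i = x_j] \Big)
```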

7 Locally Dependent M.r.f.s
Generally, the conditional distribution of pixel i depends on all other pixels, S \ {i}.
We are only concerned with local dependencies.
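Local dependence means the full conditional at pixel i reduces to a conditional on its neighborhood ∂i (for example, the 8 surrounding pixels used in the later examples):

```latex
p(x_i \mid x_{S \setminus i}) = p(x_i \mid x_{\partial i})
```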

8 Previous Methodology
- Maximum probability estimation
- Classification by maximum marginal probabilities

9 Maximum Probability Estimation
Choose an estimate x such that it has the maximum probability given the record vector y.
In a Bayesian framework, x is the MAP estimate.
In decision theory, this corresponds to a 0-1 loss function.
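In symbols, writing x̂ for the estimate, the MAP rule maximizes the posterior, which by Bayes' rule is proportional to likelihood times prior:

```latex
\hat{x} = \arg\max_{x} P(x \mid y), \qquad P(x \mid y) \;\propto\; l(y \mid x)\, p(x)
```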

10 Maximum Probability Estimation
Iterate over each pixel: choose the color x_i at pixel i by sampling from a conditional probability controlled by a temperature T.
Slowly decreasing T will guarantee convergence.
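This step appears to describe simulated annealing in the style of Geman and Geman; in that standard form, the sampling distribution at pixel i is the conditional posterior raised to the power 1/T:

```latex
P_T(x_i \mid y, x_{S \setminus i}) \;\propto\; \big\{\, f(y_i \mid x_i)\, p(x_i \mid x_{\partial i}) \,\big\}^{1/T}
```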

11 Classification by Maximum Marginal Probabilities
Maximize the expected proportion of correctly classified pixels.
Note that P(x_i | y) depends on all of the records.
Another proposal: use a small neighborhood for maximization.
- Still computationally hard, because P is not available in closed form.
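This estimator classifies each pixel by its own marginal posterior, which is the optimal rule under a misclassification-rate loss:

```latex
\hat{x}_i = \arg\max_{x_i} P(x_i \mid y), \qquad i = 1, \dots, n
```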

12 Problems
- Large-scale effects: favors scenes of a single color
- Computationally expensive

13 Estimation by Iterated Conditional Modes
The previously discussed methods have enormous computational demands and undesirable large-scale properties.
We want a faster method with good large-scale properties.

14 Iterated Conditional Modes
When applied to each pixel in turn, this procedure defines a single cycle of an iterative algorithm for estimating x*.
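A minimal Python sketch of one such algorithm, assuming univariate Gaussian records with per-color means `mu` and variance `kappa`, and an 8-neighbor agreement prior of strength `beta` (all names and defaults here are illustrative, not taken from the paper):

```python
import numpy as np

def icm_restore(y, mu, kappa, beta, n_cycles=6):
    """Illustrative ICM for univariate Gaussian records and an
    8-neighbor agreement prior (a sketch, not the paper's exact code)."""
    H, W = y.shape
    c = len(mu)
    # beta = 0 case: initialize with the maximum likelihood classifier.
    x = np.argmin((y[..., None] - np.asarray(mu)) ** 2, axis=-1)

    for _ in range(n_cycles):
        for i in range(H):
            for j in range(W):
                best_k, best_score = x[i, j], -np.inf
                for k in range(c):
                    # Log-likelihood of the record under color k.
                    loglik = -(y[i, j] - mu[k]) ** 2 / (2.0 * kappa)
                    # Count the 8 neighbors currently colored k.
                    agree = 0
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            if (di or dj) and 0 <= i + di < H and 0 <= j + dj < W:
                                agree += int(x[i + di, j + dj] == k)
                    score = loglik + beta * agree
                    if score > best_score:
                        best_k, best_score = k, score
                x[i, j] = best_k
    return x

# Hypothetical usage for a two-color scene with record means 0 and 1:
# x_hat = icm_restore(y, mu=np.array([0.0, 1.0]), kappa=0.36, beta=1.5)
```

Raster-scanning the pixels in turn, as above, is one cycle; six to eight cycles are used in the examples on the following slides.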

15 Examples of ICM
Each example involves:
- c unordered colors
- A neighborhood of the 8 surrounding pixels
- A known scene x*
- At each pixel i, a record y_i generated from a Gaussian distribution whose mean depends on the true color and whose variance is κ.
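Writing μ(x_i*) for the color-dependent mean (notation introduced here for clarity), the record model of the examples is:

```latex
y_i \mid x_i^{*} \;\sim\; \mathcal{N}\big( \mu(x_i^{*}),\, \kappa \big)
```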

16 The hillclimbing update step
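Concretely, the ICM update at pixel i chooses the color that maximizes the conditional density of x_i given its record and the current colors of its neighbors:

```latex
\hat{x}_i \;\leftarrow\; \arg\max_{x_i} \; f(y_i \mid x_i)\; p\big(x_i \mid \hat{x}_{\partial i}\big)
```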

17 Extremes of β
β = 0 gives the maximum likelihood classifier, with which ICM is initialized.
β = ∞: x_i is determined by a majority vote of its neighbors, with the records y_i used only to break ties.

18 Example 1
Six cycles of ICM were applied, with β = 1.5.

19 Example 2
Hand-drawn to display a wide array of features.
Records y_i were generated by superimposing independent Gaussian noise, √κ = 0.6.
Eight cycles of ICM, with β increasing from 0.5 to 1.5 over the first six.

20 Models for the true scene
Most of the material here is speculative, a topic for future research.
There are many kinds of images possessing special structure in the true scene.
What we have seen so far in the examples are discrete, unordered colors.

21 Examples of special types of images
- Unordered colors: these are generally codes for some other attribute, such as crop identities.
- Excluded adjacencies: it may be known that certain colors cannot appear on neighboring pixels in the true scene.

22 More special cases…
- Grey-level scenes: colors may have a natural ordering, such as intensity. The author did not have the computing equipment to process, display, and experiment with 256 grey levels.
- Continuous intensities: {p(x)} is a Gaussian M.r.f. with zero mean.

23 More special cases…
- Special features, such as thin lines: the author had some success reproducing hedges and roads in radar images.
- Pixel overlap

24 Parameter Estimation
This may be computationally expensive, and it is often unnecessary.
We may need to estimate θ in l(y | x; θ): learning how records result from true scenes.
And we may need to estimate Φ in p(x; Φ): learning the probabilities of true scenes.

25 Parameter Estimation, cont.
- Estimation from training data
- Estimation during ICM
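As one illustration of estimation during ICM (a sketch of the idea, not the paper's exact recipe): after each cycle, θ can be re-estimated by maximum likelihood from the records and the current reconstruction x̂. For the Gaussian record model of the examples this gives, for instance,

```latex
\hat{\kappa} = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - \hat{\mu}(\hat{x}_i) \big)^{2}
```

while Φ (for example β) can be re-estimated from x̂ by maximizing a pseudolikelihood rather than the intractable {p(x)} itself.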

26 Example of Parameter Estimation
- Records produced with Gaussian noise, κ = 0.36
- Correct value of κ, with β gradually increasing: 1.2% error
- Estimating β = 1.83 and κ = 0.366: 1.2% error
- κ known but β = 1.8 estimated: 1.1% error

27 Block reconstruction
Suppose the blocks B form 2×2 squares of four pixels, with overlap between blocks.
At each stage, the block in question must be assigned one of c^4 colorings, based on its 4 records and 26 direct and diagonal adjacencies.
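A small sketch of the combinatorial step (illustrative only): each 2×2 block has c^4 candidate joint colorings, which would then be scored against the block's 4 records and the fixed colors of the 26 surrounding neighbors.

```python
from itertools import product

def block_colorings(c):
    """All c**4 candidate colorings of a 2x2 block (labels 0..c-1)."""
    return list(product(range(c), repeat=4))

# With c = 3 colors there are 3**4 = 81 candidate colorings per block.
assert len(block_colorings(3)) == 81
```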

28 Block reconstruction example
- Univariate Gaussian records with κ = 0.9105
- Basic ICM with β = 1.5: 9% error rate
- ICM with β = ∞: 5.7% error rate

29 Conclusion
We began by adopting a strict probabilistic formulation with regard to the true scene and the generated records.
We then abandoned these in favor of ICM, on grounds of computation and to avoid unwelcome large-scale effects.
There is a vast number of problems in image processing and pattern recognition to which statisticians might usefully contribute.