LECTURE 07: BAYESIAN ESTIMATION


LECTURE 07: BAYESIAN ESTIMATION
• Objectives: Bayesian Estimation; Example
• Resources: D.H.S.: Chapter 3 (Part 2); J.O.S.: Bayesian Parameter Estimation; A.K.: The Holy Trinity; A.E.: Bayesian Methods; J.H.: The Euro Coin

Introduction to Bayesian Parameter Estimation
• In Chapter 2, we learned how to design an optimal classifier if we knew the prior probabilities, P(ωi), and the class-conditional densities, p(x|ωi).
• The Bayesian approach treats the parameters as random variables having some known prior distribution. Observation of samples converts this prior into a posterior.
• Bayesian learning sharpens the a posteriori density, causing it to peak near the true value of the parameter.
• Supervised vs. unsupervised learning: do we know the class assignments of the training data?
• Bayesian estimation and ML estimation produce very similar results in many cases.
• The Bayesian approach reduces statistical inference (prior knowledge or beliefs about the world) to probabilities.

Class-Conditional Densities
• Posterior probabilities, P(ωi|x), are central to Bayesian classification. Bayes formula allows us to compute P(ωi|x) from the priors, P(ωi), and the likelihood, p(x|ωi).
• But what if the priors and class-conditional densities are unknown? The answer is that we can compute the posterior, P(ωi|x), using all of the information at our disposal (e.g., training data).
• For a training set, D, Bayes formula becomes:

  P(ωi|x, D) = p(x|ωi, D) P(ωi|D) / Σj p(x|ωj, D) P(ωj|D)

• We assume the priors are known: P(ωi|D) = P(ωi). Also, assume functional independence: the samples in Di have no influence on p(x|ωj, D) for j ≠ i, so each class can be treated separately. This gives (see the sketch below):

  P(ωi|x, D) = p(x|ωi, Di) P(ωi) / Σj p(x|ωj, Dj) P(ωj)
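
To make the formula concrete, here is a minimal sketch in Python, assuming each class-conditional density p(x|ωi, Di) is approximated by a plug-in Gaussian fit to that class's own samples (a simplification of the full Bayesian treatment); the data and class labels are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-class training sets D_i; by functional independence,
# each class density is estimated from that class's samples only.
D = {0: np.array([1.0, 1.2, 0.8, 1.1]),   # samples labeled omega_0
     1: np.array([3.0, 2.7, 3.3, 2.9])}   # samples labeled omega_1
priors = {0: 0.5, 1: 0.5}                 # assumed known: P(omega_i|D) = P(omega_i)

def posterior(x):
    """P(omega_i|x, D) via Bayes formula with plug-in Gaussian estimates."""
    lik = {i: norm.pdf(x, loc=d.mean(), scale=d.std(ddof=1)) for i, d in D.items()}
    evidence = sum(lik[i] * priors[i] for i in D)
    return {i: lik[i] * priors[i] / evidence for i in D}

print(posterior(2.0))  # posterior class probabilities at x = 2.0
```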

The Parameter Distribution
• Assume the parametric form of the density for x is known: p(x|θ). Any information we have about θ prior to collecting samples is contained in a known prior density p(θ).
• Observation of samples converts this to a posterior, p(θ|D), which we hope is peaked around the true value of θ.
• Our goal is to estimate the density given the data by integrating over the parameter vector:

  p(x|D) = ∫ p(x|θ) p(θ|D) dθ

• We can write the joint distribution of the samples as a product, because the samples are drawn independently:

  p(D|θ) = Πk=1..n p(xk|θ),  so that  p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ

• The first equation links the class-conditional density, p(x|D), to the posterior, p(θ|D). But numerical solutions are typically required! (A grid-based sketch follows below.)
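
As a taste of that numerical route, here is a minimal sketch, assuming a one-dimensional parameter θ (the mean of a Gaussian with known σ) and approximating both integrals on a grid; all values are illustrative.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0                              # assumed known spread of p(x|theta)
data = np.array([0.8, 1.3, 1.1, 0.9])    # hypothetical training set D
theta = np.linspace(-5.0, 5.0, 2001)     # grid over the unknown mean

prior = norm.pdf(theta, loc=0.0, scale=2.0)   # p(theta)
# p(D|theta) = prod_k p(x_k|theta), evaluated at every grid point
lik = np.prod(norm.pdf(data[:, None], loc=theta, scale=sigma), axis=0)
post = prior * lik
post /= np.trapz(post, theta)                 # normalize to get p(theta|D)

def p_x_given_D(x):
    """p(x|D) = integral of p(x|theta) p(theta|D) dtheta (grid approximation)."""
    return np.trapz(norm.pdf(x, loc=theta, scale=sigma) * post, theta)

print(p_x_given_D(1.0))
```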

Univariate Gaussian Case
• Case: only the mean is unknown: p(x|μ) ~ N(μ, σ²), with σ² known.
• Known prior density: p(μ) ~ N(μ0, σ0²), where μ0 is our best prior guess at the mean and σ0² expresses our uncertainty about that guess.
• Using Bayes formula:

  p(μ|D) = p(D|μ) p(μ) / ∫ p(D|μ) p(μ) dμ = α Πk=1..n p(xk|μ) p(μ)

• Rationale: once a value of μ is known, the density for x is completely known. α is a normalization factor that depends on the data, D.

Univariate Gaussian Case
• Applying our Gaussian assumptions:

  p(μ|D) = α Πk=1..n [ (1/√(2πσ²)) exp(−½((xk − μ)/σ)²) ] × (1/√(2πσ0²)) exp(−½((μ − μ0)/σ0)²)

Univariate Gaussian Case (Cont.)
• Now we need to work this into a simpler form. Collecting the exponents and absorbing all factors that do not depend on μ into the normalization constant:

  p(μ|D) = α′ exp{ −½ [ Σk=1..n ((xk − μ)/σ)² + ((μ − μ0)/σ0)² ] }
         = α″ exp{ −½ [ (n/σ² + 1/σ0²) μ² − 2 ((1/σ²) Σk xk + μ0/σ0²) μ ] }

Univariate Gaussian Case (Cont.)
• p(μ|D) is an exponential of a quadratic function of μ, which makes it a normal distribution. Because this is true for any n, it is referred to as a reproducing density, and p(μ) is referred to as a conjugate prior.
• Write p(μ|D) ~ N(μn, σn²):

  p(μ|D) = (1/√(2πσn²)) exp(−½((μ − μn)/σn)²)

• Expand the quadratic term:

  p(μ|D) = (1/√(2πσn²)) exp(−½(μ²/σn² − 2μnμ/σn² + μn²/σn²))

• Equate coefficients of our two functions: the coefficients of μ² and of μ must match.
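
The reproducing property is what makes recursive (incremental) Bayesian learning cheap: after each sample the posterior is again Gaussian, so only two numbers need updating. Here is a minimal sketch of that idea, using the one-sample version of the updates derived on the following slides; the data are simulated and all names are illustrative.

```python
import numpy as np

sigma2 = 1.0                # known variance of p(x|mu)
mu_n, var_n = 0.0, 4.0      # prior: p(mu) ~ N(mu0 = 0, sigma0^2 = 4)

rng = np.random.default_rng(0)
for x in rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=10):
    # One-sample update: the old posterior serves as the new prior.
    var_new = 1.0 / (1.0 / sigma2 + 1.0 / var_n)   # precisions add
    mu_n = var_new * (x / sigma2 + mu_n / var_n)   # precision-weighted mean
    var_n = var_new
    print(f"mu_n = {mu_n:6.3f}, sigma_n^2 = {var_n:6.4f}")  # variance shrinks
```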

Univariate Gaussian Case (Cont.)
• Rearrange terms so that the dependencies on μ are clear, and associate the terms related to μ² and to μ. Matching coefficients gives two equations:

  1/σn² = n/σ² + 1/σ0²
  μn/σn² = (n/σ²) x̄n + μ0/σ0²,  where x̄n = (1/n) Σk=1..n xk is the sample mean.

• There is actually a third equation, involving the terms not related to μ, but we can ignore it since it is not a function of μ and is a complicated equation to solve (it merely fixes the normalization constant).

Univariate Gaussian Case (Cont.)
• Two equations and two unknowns. Solve for μn and σn².
• First, solve for σn²:

  σn² = σ0²σ² / (nσ0² + σ²)

• Next, solve for μn:

  μn = (nσ0² / (nσ0² + σ²)) x̄n + (σ² / (nσ0² + σ²)) μ0

• Summarizing: μn is a weighted average of the sample mean and the prior mean, and σn² decreases toward zero as n grows, so the posterior sharpens around μn. (A quick numerical check follows below.)
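
Here is a minimal numerical check of these closed-form expressions, comparing them against a brute-force grid posterior; it assumes the same known-variance Gaussian model, and the data are illustrative.

```python
import numpy as np
from scipy.stats import norm

sigma2, mu0, s02 = 1.0, 0.0, 4.0            # known variance; prior N(mu0, s02)
data = np.array([1.7, 2.4, 2.1, 1.9, 2.3])  # hypothetical samples
n, xbar = len(data), data.mean()

# Closed-form posterior parameters from the slide
var_n = (s02 * sigma2) / (n * s02 + sigma2)
mu_n = (n * s02 * xbar + sigma2 * mu0) / (n * s02 + sigma2)

# Brute-force grid posterior for comparison
mu = np.linspace(-5.0, 8.0, 4001)
post = norm.pdf(mu, mu0, np.sqrt(s02)) * \
       np.prod(norm.pdf(data[:, None], mu, np.sqrt(sigma2)), axis=0)
post /= np.trapz(post, mu)

print(mu_n, np.trapz(mu * post, mu))                 # means should agree
print(var_n, np.trapz((mu - mu_n) ** 2 * post, mu))  # variances should agree
```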

Summary
• Introduction to Bayesian parameter estimation.
• The role of the class-conditional distribution in a Bayesian estimate.
• Estimation of the posterior and the probability density function, assuming the only unknown parameter is the mean and the conditional density of the “features” given the mean, p(x|θ), can be modeled as a Gaussian distribution.
• Bayesian estimates of the mean for the multivariate Gaussian case.
• General theory for Bayesian estimation.
• Comparison to maximum likelihood estimates.
• Recursive Bayesian incremental learning.
• Noninformative priors.
• Sufficient statistics.
• Kernel density.

“The Euro Coin” Getting ahead a bit, let’s see how we can put these ideas to work on a simple example due to David MacKay and explained by Jon Hamaker: given the outcomes of a sequence of coin spins, what should we believe about the coin’s bias?
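
As a preview of the machinery that example uses, here is a minimal sketch of Bayesian inference for a coin’s heads probability, assuming a Beta prior (the conjugate prior for Bernoulli flips); the counts below are illustrative, not MacKay’s actual data.

```python
from scipy.stats import beta

# A Beta(a, b) prior is conjugate to Bernoulli outcomes: after observing
# h heads and t tails, the posterior is simply Beta(a + h, b + t).
a, b = 1.0, 1.0          # uniform prior over the heads probability
h, t = 140, 110          # hypothetical spin outcomes

posterior = beta(a + h, b + t)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```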