Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013

Outline
- Probability distributions
- Maximum likelihood estimation
- Maximum a posteriori estimation
- Conjugate priors
- Conceptualizing models as collection of priors
- Noninformative priors
- Empirical Bayes

Probability distribution
Density estimation: to model the distribution p(x) of a random variable x given a finite set of observations x_1, …, x_N.
Nonparametric approach: histogram, kernel density estimation, nearest-neighbour approach.
Parametric approach: Gaussian distribution, Beta distribution, …
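As a rough illustration of the two approaches, the sketch below (Python, with made-up data) uses a histogram and a Gaussian kernel density estimate on the nonparametric side and a maximum-likelihood Gaussian fit on the parametric side; all settings are illustrative, not from the slides.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Hypothetical observations x_1, ..., x_N
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)

# Nonparametric: density-normalized histogram and kernel density estimate
counts, edges = np.histogram(x, bins=20, density=True)
kde = gaussian_kde(x)

# Parametric: fit a Gaussian by maximum likelihood
mu_hat, sigma_hat = norm.fit(x)

grid = np.linspace(x.min(), x.max(), 5)
print("KDE:     ", kde(grid))
print("Gaussian:", norm.pdf(grid, mu_hat, sigma_hat))
```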

The Exponential Family
Members include the Gaussian distribution, the Binomial distribution, the Beta distribution, etc.

Gaussian distribution
The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed.
(Figure: bean machine by Sir Francis Galton.)
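A minimal simulation of the bean machine, with illustrative numbers, shows the CLT at work: each ball takes many independent left/right steps, and the sum of those steps is approximately Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
n_balls, n_rows = 10_000, 12

# Each ball takes n_rows independent +/-1 steps; its final position is their sum.
steps = rng.choice([-1, 1], size=(n_balls, n_rows))
positions = steps.sum(axis=1)

# By the CLT the positions are approximately N(0, n_rows).
print("sample mean:", positions.mean())   # close to 0
print("sample var: ", positions.var())    # close to n_rows = 12
```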

Maximum likelihood estimation
The frequentist approach to estimating the parameters of a distribution from a set of observations is to maximize the likelihood:
θ_ML = argmax_θ p(X|θ) = argmax_θ ∏_n p(x_n|θ), assuming the data are i.i.d.
Since the logarithm is a monotonic transformation, it is equivalent (and more convenient) to maximize the log-likelihood ∑_n ln p(x_n|θ).
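As a sketch of the same recipe carried out numerically (the data and the Gaussian likelihood are assumptions made for illustration), one can minimize the negative log-likelihood with a generic optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=500)   # hypothetical i.i.d. observations

def neg_log_likelihood(params):
    mu, log_sigma = params                     # optimize log(sigma) to keep sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_ml, sigma_ml = result.x[0], np.exp(result.x[1])
print(mu_ml, sigma_ml)   # close to the true 5.0 and 2.0
```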

MLE for Gaussian distribution
Maximizing the log-likelihood of a Gaussian gives μ_ML = (1/N) ∑_n x_n — the simple average of the observations — and σ²_ML = (1/N) ∑_n (x_n − μ_ML)².
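The closed-form estimates, again on made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=-1.0, scale=0.5, size=1000)

mu_ml = x.mean()                          # (1/N) * sum(x_n), the simple average
sigma2_ml = ((x - mu_ml) ** 2).mean()     # biased ML estimate of the variance
print(mu_ml, sigma2_ml)
```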

Maximum a posteriori estimation
The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution:
θ_MAP = argmax_θ p(θ|X) = argmax_θ p(X|θ) p(θ).
This allows prior information to be taken into account.

MAP for Gaussian distribution
With a Gaussian prior on the mean, μ ~ N(μ_0, σ_0²), and known variance σ², the posterior distribution over μ is Gaussian with mean
μ_N = (σ² / (Nσ_0² + σ²)) μ_0 + (Nσ_0² / (Nσ_0² + σ²)) μ_ML — a weighted average of the prior mean and the ML estimate —
and precision 1/σ_N² = 1/σ_0² + N/σ².
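A minimal sketch of this conjugate update, with illustrative numbers for the prior and the data:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 1.0                       # known observation variance
x = rng.normal(loc=3.0, scale=np.sqrt(sigma2), size=20)
N = len(x)

mu0, sigma0_sq = 0.0, 2.0          # prior: mu ~ N(mu0, sigma0_sq)
mu_ml = x.mean()

# Posterior over mu: a precision-weighted average of prior mean and ML estimate
mu_N = (sigma2 / (N * sigma0_sq + sigma2)) * mu0 \
     + (N * sigma0_sq / (N * sigma0_sq + sigma2)) * mu_ml
sigma_N_sq = 1.0 / (1.0 / sigma0_sq + N / sigma2)

print("MAP / posterior mean:", mu_N)
print("posterior variance:  ", sigma_N_sq)
```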

Conjugate prior
In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family, p(x|η) = h(x) g(η) exp(ηᵀu(x)), there exists a conjugate prior that can be written in the form p(η|χ, ν) = f(χ, ν) g(η)^ν exp(ν ηᵀχ).
Important conjugate pairs include:
- Binomial – Beta
- Multinomial – Dirichlet
- Gaussian – Gaussian (for the mean)
- Gaussian – Gamma (for the precision)
- Exponential – Gamma

MLE for Binomial distribution
The Binomial distribution models the probability of m “heads” out of N tosses. Its only parameter, μ, encodes the probability of a single event (“heads”). The maximum likelihood estimate is given by μ_ML = m / N.

MAP for Binomial distribution
The conjugate prior for this distribution is the Beta distribution, Beta(μ|a, b) ∝ μ^(a−1) (1−μ)^(b−1). The posterior is then given by Beta(μ|m + a, l + b), where l = N − m is simply the number of “tails”.
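A minimal sketch of the Beta–Binomial update; the counts and the prior hyperparameters a, b are made up for illustration:

```python
from scipy.stats import beta

# Observed coin tosses: m heads out of N
N, m = 20, 14
l = N - m                      # number of tails

a, b = 2.0, 2.0                # Beta(a, b) prior on mu (illustrative choice)

# Conjugacy: the posterior is Beta(m + a, l + b)
posterior = beta(m + a, l + b)
mu_map = (m + a - 1) / (N + a + b - 2)   # mode of the Beta posterior

print("posterior mean:", posterior.mean())
print("MAP estimate:  ", mu_map)
```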

Models as collection of priors - 1
Take a simple regression model, y_n = wᵀφ(x_n) + ε_n with Gaussian noise ε_n ~ N(0, β⁻¹). Add a prior on the weights, w ~ N(0, α⁻¹I), and get Bayesian linear regression!
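A sketch of the resulting posterior over the weights, following the standard Gaussian-posterior formulas; the data, basis functions, and precisions α and β are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: y = 2*x - 1 plus noise
x = rng.uniform(-3, 3, size=50)
y = 2.0 * x - 1.0 + rng.normal(scale=0.5, size=50)

Phi = np.column_stack([np.ones_like(x), x])   # basis functions phi(x) = [1, x]
alpha, beta_prec = 1.0, 1.0 / 0.25            # prior precision and noise precision

# Posterior over weights: N(m_N, S_N) with
# S_N^{-1} = alpha*I + beta * Phi^T Phi  and  m_N = beta * S_N Phi^T y
S_N_inv = alpha * np.eye(2) + beta_prec * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta_prec * S_N @ Phi.T @ y

print("posterior mean of weights:", m_N)      # close to [-1, 2]
```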

Models as collection of priors - 2
Take again a simple regression model, where y_n is some function of x_n. Add a prior directly on the function, y ~ N(0, K) with covariance (kernel) matrix K, and get Gaussian processes!
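A sketch of what the prior over functions looks like: samples drawn from N(0, K) with a squared-exponential kernel (the kernel choice and all settings are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 5, 100)

# Squared-exponential (RBF) kernel as the prior covariance over function values
def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))   # jitter for numerical stability

# Each sample is one random function evaluated at the inputs x
samples = rng.multivariate_normal(mean=np.zeros(len(x)), cov=K, size=3)
print(samples.shape)   # (3, 100): three functions drawn from the GP prior
```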

Models as collection of priors - 3
Take a model where x_n is discrete and unknown. Add a prior on the states x_n, assuming they are temporally smooth (each state depends on the previous one), and get a Hidden Markov Model!
(Figure: graphical model with hidden states x_1, …, x_n, observations t_1, …, t_n, and parameters θ.)
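A small generative sketch of such a model (a hypothetical 2-state HMM with Gaussian emissions; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Transition matrix encodes temporal smoothness: states tend to persist.
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])          # p(x_n | x_{n-1})
emission_means = np.array([-2.0, 2.0])
pi = np.array([0.5, 0.5])           # initial state distribution

n_steps = 200
states = np.zeros(n_steps, dtype=int)
states[0] = rng.choice(2, p=pi)
for n in range(1, n_steps):
    states[n] = rng.choice(2, p=A[states[n - 1]])

# Each hidden state emits a Gaussian observation.
observations = rng.normal(loc=emission_means[states], scale=1.0)
print(states[:10], observations[:5])
```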

Noninformative priors
Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors. If our parameter λ is a discrete variable with K states, we can simply set each prior probability to 1/K. However, for continuous variables it is not so clear. One example of a noninformative prior is a prior over μ for the Gaussian distribution: take the Gaussian prior N(μ|μ_0, σ_0²) with σ_0² → ∞. We can see that the effect of the prior on the posterior over μ vanishes in this case.
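A quick numerical check of that limit, reusing the Gaussian-mean posterior from above on made-up data:

```python
import numpy as np

rng = np.random.default_rng(8)
sigma2 = 1.0
x = rng.normal(loc=3.0, scale=1.0, size=20)
N, mu_ml = len(x), x.mean()
mu0 = 0.0

for sigma0_sq in [0.1, 1.0, 100.0, 1e6]:
    mu_N = (sigma2 * mu0 + N * sigma0_sq * mu_ml) / (N * sigma0_sq + sigma2)
    print(f"prior variance {sigma0_sq:>9}: posterior mean = {mu_N:.4f}")
# As the prior variance grows, the posterior mean approaches the ML estimate mu_ml.
```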

Empirical Bayes
But what if we still want to assume some prior information, yet learn it from the data instead of fixing it in advance? Imagine a hierarchical model in which each parameter θ_s is drawn from a prior governed by λ, and each observation x_n depends on its θ_s. We cannot use full Bayesian inference, but we can approximate it by finding the best λ* to maximize the marginal likelihood p(X|λ).
(Figure: graphical model with hyperparameter λ, parameters θ_s for s = 1, …, S, and observations x_n for n = 1, …, N.)

Empirical Bayes
We can estimate the result by the following iterative procedure (the EM algorithm):
- Initialize λ*.
- E-step: compute p(θ|X, λ*) given the fixed λ*.
- M-step: λ* ← argmax_λ ∫ p(θ|X, λ*) ln p(X, θ|λ) dθ.
This illustrates the other term for Empirical Bayes: maximum marginal likelihood. It is not a fully Bayesian treatment, but it offers a useful compromise between the Bayesian and frequentist approaches.
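A minimal sketch of this procedure on a toy hierarchical Gaussian model; the model (θ_s ~ N(0, λ), x_{s,n} ~ N(θ_s, σ²) with σ² known) and every setting are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical hierarchical model: theta_s ~ N(0, lam), x_{s,n} ~ N(theta_s, sigma2)
S, N, sigma2 = 50, 10, 1.0
true_lam = 4.0
theta = rng.normal(0.0, np.sqrt(true_lam), size=S)
x = rng.normal(theta[:, None], np.sqrt(sigma2), size=(S, N))
x_bar = x.mean(axis=1)

lam = 1.0                                  # initialize lambda*
for _ in range(100):
    # E-step: posterior p(theta_s | x_s, lambda*) = N(m_s, v_s)
    v = 1.0 / (1.0 / lam + N / sigma2)
    m = v * (N / sigma2) * x_bar
    # M-step: maximize the expected complete-data log likelihood over lambda
    lam = np.mean(m**2 + v)

print("estimated lambda:", lam)            # should be close to true_lam = 4.0
```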

Thank you for your attention!