Bayesian Inference: Multiple Parameters


More Parameters
Most statistical models have more than one parameter; some have thousands. We need to know how to deal with many parameters, and doing so is actually a strength of the Bayesian approach. It turns out to be simple in practice, but we need some maths first.

A Simple Example
Estimating the mean and variance of a normal distribution. Some data (heights in cm): 160, 170, 167, 162, 170, 171, 164, 175, 177, 178, 184, 174, 176, 186, 189, 197. We fit a Normal distribution with two parameters, the mean ($\mu$) and variance ($\sigma^2$). The likelihood for a single observation is
$$P(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right)$$
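As a rough illustration (not part of the original slides), the likelihood above can be evaluated numerically for the height data; the candidate values $\mu = 175$ and $\sigma^2 = 100$ below are arbitrary.

```python
# Sketch: evaluating the Normal log-likelihood of the height data for one
# candidate (mu, sigma^2) pair; the candidate values are arbitrary.
import numpy as np

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])

def normal_loglik(y, mu, sigma2):
    """Sum over observations of log N(y_i | mu, sigma2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2))

print(normal_loglik(heights, mu=175.0, sigma2=100.0))
```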

The Priors
First, we will use a prior distribution that is uniform on $\mu$ and $\log(\sigma)$ (for reasons that will become clear). The prior is thus
$$P(\mu, \sigma^2) \propto (\sigma^2)^{-1}$$
This is not a proper probability distribution: the area under the curve is infinite! One can prove that it still leads to a proper posterior. Later we will use a different prior.

The Posterior
The posterior distribution for $n$ observations is:
$$P(\mu, \sigma^2 \mid y) \propto P(\mu, \sigma^2)\, P(y \mid \mu, \sigma^2)$$
$$P(\mu, \sigma^2 \mid y) \propto \frac{1}{\sigma^2} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i-\mu)^2\right)$$
$$P(\mu, \sigma^2 \mid y) \propto \frac{1}{\sigma^{n+2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right)$$
This is the joint posterior for the parameters. So, now we have an equation, what do we do with it?
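A minimal sketch of one thing we can do with this joint posterior: evaluate it on a $(\mu, \sigma^2)$ grid and normalise over the grid. The grid limits are arbitrary choices for illustration.

```python
# Sketch: the unnormalised joint log-posterior on a (mu, sigma^2) grid,
# normalised so the grid cells sum to one. Grid limits are arbitrary.
import numpy as np

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)

def log_post(mu, sigma2, y):
    # log of (sigma^2)^{-(n+2)/2} * exp(-sum_i (y_i - mu)^2 / (2 sigma^2))
    return -(n + 2) / 2 * np.log(sigma2) - np.sum((y - mu) ** 2) / (2 * sigma2)

mu_grid = np.linspace(165, 185, 201)
s2_grid = np.linspace(20, 300, 201)
lp = np.array([[log_post(m, s2, heights) for s2 in s2_grid] for m in mu_grid])
post = np.exp(lp - lp.max())   # subtract the max for numerical stability
post /= post.sum()             # normalise over the grid
```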

Joint Distributions
From the definition of a conditional distribution:
$$P(X_1, X_2) = P(X_1 \mid X_2)\, P(X_2)$$
We can read this as slices of conditional distributions, weighted by the size of each slice.

Marginal Distributions
What if we want the distribution of one variable? We take the marginal distribution:
$$P(X_1) = \int P(X_1 \mid X_2)\, P(X_2)\, dX_2$$
We just sum up the slices, but we need the marginal for $X_2$.
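A tiny discrete analogue (with made-up probabilities) shows the "sum up the slices" idea:

```python
# Toy example with made-up probabilities: the marginal of X1 is a weighted
# sum of the conditional slices P(X1 | X2), weighted by P(X2).
p_x2 = {"a": 0.3, "b": 0.7}                       # P(X2)
p_x1_given_x2 = {"a": {0: 0.9, 1: 0.1},           # P(X1 | X2 = a)
                 "b": {0: 0.2, 1: 0.8}}           # P(X1 | X2 = b)

p_x1 = {x1: sum(p_x1_given_x2[x2][x1] * p_x2[x2] for x2 in p_x2)
        for x1 in (0, 1)}
print(p_x1)   # {0: 0.41, 1: 0.59}
```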

The Normal Distribution
We can get the marginal posteriors for $\mu$ and $\sigma^2$ with some maths:
$$P(\mu, \sigma^2 \mid y) \propto \frac{1}{\sigma^{n+2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right)$$
$$P(\mu, \sigma^2 \mid y) \propto \frac{1}{\sigma^{n+2}} \exp\left(-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar{y}-\mu)^2\right]\right)$$
Here $\bar{y}$ and $s^2$ are the sample mean and variance:
$$\bar{y} = \frac{1}{n}\sum_i y_i, \qquad s^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2$$
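For the height data these sample statistics are easy to compute; a short sketch:

```python
# Sketch: the sample mean and (n-1)-denominator sample variance of the heights.
import numpy as np

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()
s2 = heights.var(ddof=1)    # ddof=1 gives the 1/(n-1) definition used above
print(n, y_bar, s2)
```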

Conditional Posterior for μ
If we remove the factors that do not depend on $\mu$, we get
$$P(\mu \mid \sigma^2, y) \propto \exp\left(-\frac{n}{2\sigma^2}(\bar{y}-\mu)^2\right)$$
Conveniently, this is just a Normal distribution, so
$$\mu \mid \sigma^2, y \sim N\left(\bar{y}, \frac{\sigma^2}{n}\right)$$
which still depends on $\sigma^2$.

Marginal Posterior for σ²
For $\sigma^2$, we want to calculate
$$P(\sigma^2 \mid y) \propto \int \frac{1}{\sigma^{n+2}} \exp\left(-\frac{1}{2\sigma^2}\left[(n-1)s^2 + n(\bar{y}-\mu)^2\right]\right) d\mu$$
This is easier than it looks. We get:
$$P(\sigma^2 \mid y) \propto (\sigma^2)^{-(n+1)/2} \exp\left(-\frac{(n-1)s^2}{2\sigma^2}\right)$$
which is an inverse gamma distribution, also known as a scaled inverse chi-squared, $\text{Inv-}\chi^2(n-1, s^2)$.

Marginal for μ
Now we want
$$P(\mu \mid y) \propto \int P(\mu, \sigma^2 \mid y)\, d\sigma^2$$
which, with some maths, becomes
$$P(\mu \mid y) \propto \left[1 + \frac{n(\mu - \bar{y})^2}{(n-1)s^2}\right]^{-n/2}$$
This is just a t distribution with $n-1$ degrees of freedom:
$$\frac{\mu - \bar{y}}{s/\sqrt{n}} \sim t_{n-1}$$
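A sketch of how this marginal can be used in practice: a 95% central interval for $\mu$ from the $t_{n-1}$ form, using scipy's location-scale t distribution (an implementation choice, not from the slides).

```python
# Sketch: a 95% credible interval for mu from the t_{n-1} marginal,
# using scipy's t distribution with loc = ybar and scale = s / sqrt(n).
import numpy as np
from scipy import stats

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()
s = heights.std(ddof=1)

lo, hi = stats.t.interval(0.95, n - 1, loc=y_bar, scale=s / np.sqrt(n))
print(lo, hi)
```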

Prediction
What if we want to predict a new value? We need
$$P(y_{new} \mid y) = \iint P(y_{new} \mid \mu, \sigma^2)\, P(\mu, \sigma^2 \mid y)\, d\sigma^2\, d\mu$$
which is also a t distribution:
$$y_{new} \mid y \sim t_{n-1}\!\left(\bar{y}, \left(1 + \tfrac{1}{n}\right)s^2\right)$$
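Similarly, a hedged sketch of a 95% predictive interval for a new height from this t form:

```python
# Sketch: a 95% posterior predictive interval for y_new using
# t_{n-1}(ybar, (1 + 1/n) s^2); scipy's scale is the square root of that.
import numpy as np
from scipy import stats

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()
s2 = heights.var(ddof=1)

scale = np.sqrt((1 + 1 / n) * s2)
lo, hi = stats.t.interval(0.95, n - 1, loc=y_bar, scale=scale)
print(lo, hi)
```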

Summary
We have the following posterior distributions:
Conditional for μ: $\mu \mid \sigma^2, y \sim N(\bar{y}, \sigma^2/n)$
Marginal for μ: $\mu \mid y \sim t_{n-1}(\bar{y}, s^2/n)$
Marginal for σ²: $\sigma^2 \mid y \sim \text{Inv-}\chi^2(n-1, s^2)$
Predictive: $y_{new} \mid y \sim t_{n-1}(\bar{y}, (1 + 1/n)s^2)$

Simulation
It is often easier to deal with these distributions by simulation: it is sometimes easier to see what is going on, and most real work is done by simulation. We draw the parameters many times from the right distribution. The easy one is the conditional for $\mu \mid \sigma^2$: simulate from a $N(\bar{y}, \sigma^2/n)$.
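A minimal sketch of this easy case, with $\sigma^2$ fixed at an arbitrary illustration value (100):

```python
# Sketch: simulate from the conditional posterior mu | sigma^2, y = N(ybar, sigma^2/n).
# The fixed sigma2 = 100 is an arbitrary illustration value.
import numpy as np

rng = np.random.default_rng(1)
heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()

sigma2 = 100.0
mu_draws = rng.normal(loc=y_bar, scale=np.sqrt(sigma2 / n), size=10_000)
print(mu_draws.mean(), mu_draws.std())
```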

Simulation
Marginal for μ: draw $\sigma^2_{(s)}$ from an $\text{Inv-}\chi^2(n-1, s^2)$, then draw $\mu_{(s)}$ from a $N(\bar{y}, \sigma^2_{(s)}/n)$.
Marginal for σ²: draw $\sigma^{-2}_{(s)}$ from a scaled $\chi^2(n-1, s^2)$, then take the inverse.
Predictive for $y_{new}$: draw $\sigma^2_{(s)}$ and $\mu_{(s)}$ as above, then draw $y_{new}$ from a $N(\mu_{(s)}, \sigma^2_{(s)})$.
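A sketch of the whole recipe for the height data, where an $\text{Inv-}\chi^2(n-1, s^2)$ draw is obtained as $(n-1)s^2$ divided by a $\chi^2_{n-1}$ draw:

```python
# Sketch of the simulation recipe above, using the chi-squared trick for
# the scaled inverse chi-squared draws.
import numpy as np

rng = np.random.default_rng(42)
heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()
s2 = heights.var(ddof=1)

n_draws = 10_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)   # sigma^2 | y
mu = rng.normal(y_bar, np.sqrt(sigma2 / n))                   # mu | sigma^2, y
y_new = rng.normal(mu, np.sqrt(sigma2))                       # y_new | mu, sigma^2

print(mu.mean(), np.quantile(y_new, [0.025, 0.975]))
```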

Different Priors
What if we want informative priors, e.g. if we have prior information to use? One possible set of priors:
$$\mu \mid \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0), \qquad \sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2)$$
Why these priors? Because the posterior has the same form of distribution: they are conjugate.

Different Posteriors
$$\mu \mid y, \sigma^2 \sim N(\mu_n, \sigma^2/\kappa_n), \qquad \mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_0 + n}, \quad \kappa_n = \kappa_0 + n$$
$$\sigma^2 \mid y \sim \text{Inv-}\chi^2(\nu_n, \sigma_n^2), \qquad \nu_n \sigma_n^2 = \nu_0 \sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}(\bar{y} - \mu_0)^2, \quad \nu_n = \nu_0 + n$$
$$\mu \mid y \sim t_{\nu_n}(\mu_n, \sigma_n^2/\kappa_n)$$
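A sketch of these updates applied to the height data; the prior settings ($\mu_0 = 170$, $\kappa_0 = 5$, $\nu_0 = 10$, $\sigma_0^2 = 50$) are made-up illustration values, not from the slides.

```python
# Sketch: the conjugate updating formulas above applied to the height data.
# The prior settings below are made-up illustration values.
import numpy as np

heights = np.array([160, 170, 167, 162, 170, 171, 164, 175,
                    177, 178, 184, 174, 176, 186, 189, 197])
n = len(heights)
y_bar = heights.mean()
s2 = heights.var(ddof=1)

mu0, kappa0, nu0, sigma0_sq = 170.0, 5.0, 10.0, 50.0   # assumed prior values

kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * y_bar) / kappa_n
sigma_n_sq = (nu0 * sigma0_sq + (n - 1) * s2
              + kappa0 * n / kappa_n * (y_bar - mu0) ** 2) / nu_n

print(mu_n, kappa_n, nu_n, sigma_n_sq)
```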

What do we want?
The basic calculations are as outlined above. From the joint posterior we calculate the marginal distributions; we can also calculate a joint distribution for a subset of the parameters by marginalising over the rest. For real models the calculations get difficult, so instead we use simulation, which makes things easier.

Marginalisation by Simulation
To simulate a conditional distribution, we plug in the parameters we are conditioning on: since $\mu \mid \sigma^2, y \sim N(\bar{y}, \sigma^2/n)$, we simulate by plugging in $\bar{y}$ and $\sigma^2/n$. To simulate the marginal, we use the relationship
$$P(X_1) = \int P(X_1 \mid X_2)\, P(X_2)\, dX_2$$
If we can draw from $P(X_1 \mid X_2)$ and $P(X_2)$, then we just draw $X_2$, then $X_1 \mid X_2$, and repeat this many times; we simply ignore the $X_2$ draws.
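A generic sketch of this recipe, where the gamma and normal choices for $P(X_2)$ and $P(X_1 \mid X_2)$ are arbitrary toy distributions:

```python
# Sketch of the generic recipe: draw X2 from P(X2), then X1 from P(X1 | X2);
# the X1 draws alone are a sample from the marginal P(X1).
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000
x2 = rng.gamma(shape=2.0, scale=1.0, size=n_draws)   # draws from P(X2)
x1 = rng.normal(loc=0.0, scale=np.sqrt(x2))          # draws from P(X1 | X2)
# Keep x1 and simply ignore the x2 draws.
print(x1.mean(), x1.var())
```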