
Bayesian inference review
Objective
–Estimate an unknown parameter θ based on observations y. The result is given as a probability distribution.
Bayesian inference
–Establish a prior on θ, if any.
–Establish the likelihood of y conditional on θ.
–Derive the posterior distribution.
Posterior analysis & prediction
–Analyze the posterior distribution of θ.
–Predict the distribution of a new observation ỹ based on the posterior distribution of θ.
Bayesian updating
–With new data, the old posterior is turned into the prior, and the process repeats.
[Figure: prior distribution + likelihood of observed data → posterior distribution]
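Not on the slide, but a minimal grid-based sketch in MATLAB may make the updating loop concrete; the prior N(90, 15²), the observation y = 97, and σ = 10 are all hypothetical values chosen for illustration:

  % One Bayesian update on a grid: posterior proportional to prior * likelihood
  theta = linspace(60, 120, 601);          % grid over the parameter theta
  prior = normpdf(theta, 90, 15);          % assumed prior belief (hypothetical)
  y = 97; sig = 10;                        % one observation, known stdev (hypothetical)
  like  = normpdf(y, theta, sig);          % likelihood of y at each theta
  post  = prior .* like;                   % unnormalized posterior
  post  = post / trapz(theta, post);       % normalize so it integrates to 1
  prior = post;                            % updating: old posterior becomes the new prior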

Bayesian inference for the normal distribution
Objective
–Estimate the unknown distribution parameters θ of a normal distribution based on observations y: mean μ and variance σ² (stdev σ).
Cases in this lecture
–Observations y that follow a normal distribution.
–Single observation y, or multiple observations y = {y1, y2, …}.
–Estimate the mean μ with known variance σ².
–Estimate the variance σ² with known mean μ.

Estimating the mean with known variance from a single datum
Variance σ² is known, the mean μ is unknown, and there is a single observation y.
Likelihood of y: p(y|μ) = N(y|μ, σ²).
No prior on μ: then p(μ) ∝ constant, or just ignore it.
Posterior density: p(μ|y) ∝ p(y|μ), i.e., μ|y ~ N(y, σ²).
–With no prior, the distribution of the mean is the same as the sampling distribution. Because the normal density is symmetric in (y − μ), it is already normalized as a function of μ, so there is no need to re-normalize.
Practice
–Generate one datum from μ = 90 with σ = 10.
–Plot the shape of the posterior pdf and superpose the original normal distribution (see the sketch below).
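A minimal MATLAB sketch of this practice (assumes the Statistics Toolbox for normrnd/normpdf):

  % Practice: single datum, known variance, flat prior
  mu_true = 90; sig = 10;              % true mean and known stdev
  y = normrnd(mu_true, sig);           % generate one observation
  mu = linspace(50, 130, 500);         % grid of candidate means
  post = normpdf(mu, y, sig);          % posterior of mu is N(y, sig^2)
  orig = normpdf(mu, mu_true, sig);    % original sampling distribution
  plot(mu, post, '-', mu, orig, '--');
  legend('posterior of \mu', 'original N(90, 10^2)');
  xlabel('\mu'); ylabel('pdf');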

Predictive distribution
Posterior prediction: p(ỹ|y) = ∫ p(ỹ|μ) p(μ|y) dμ
–The first term in the integrand does not include y because, given μ, ỹ no longer depends on y.
–See the proof in Bayesian Data Analysis that ỹ|y ~ N(y, 2σ²).
–The variance is 2σ², which is the sum of the native variance σ² and the variance σ² of the mean; the stdev is √2·σ.
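A quick Monte Carlo check of the 2σ² result (a sketch with made-up numbers y = 95, σ = 10, not from the slides):

  % Simulate the posterior predictive for a single datum y
  sig = 10; y = 95; N = 1e6;               % hypothetical observation and stdev
  mu_draws = normrnd(y, sig, N, 1);        % draw mu from the posterior N(y, sig^2)
  ytilde   = normrnd(mu_draws, sig);       % draw a new observation given each mu
  var(ytilde)                              % approx 2*sig^2 = 200
  std(ytilde)                              % approx sqrt(2)*sig, about 14.1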

Estimating the mean with known variance (conjugate prior)
The conjugate prior is normal: μ ~ N(μ0, τ0²).
The posterior density works out to μ|y ~ N(μ1, τ1²), with
–μ1 = (μ0/τ0² + y/σ²) / (1/τ0² + 1/σ²) and 1/τ1² = 1/τ0² + 1/σ²
–The posterior mean is a weighted average of the prior mean μ0 and the observation y, with weights inversely proportional to the respective variances.
–If τ0 → ∞, we get back μ1 = y and τ1 = σ, the no-prior result.
Similarly, the predictive distribution is ỹ|y ~ N(μ1, σ² + τ1²).
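A short MATLAB sketch of the conjugate update; the prior N(85, 20²) and the observation y = 95 are hypothetical values for illustration:

  % Conjugate normal update for the mean (known variance)
  sig  = 10;             % known data stdev
  mu0  = 85; tau0 = 20;  % prior mean and prior stdev (assumed values)
  y    = 95;             % single observation (assumed value)
  prec = 1/tau0^2 + 1/sig^2;              % posterior precision 1/tau1^2
  mu1  = (mu0/tau0^2 + y/sig^2) / prec;   % posterior mean: weighted average
  tau1 = sqrt(1/prec);                    % posterior stdev
  fprintf('posterior: N(%.2f, %.2f^2)\n', mu1, tau1);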

Multiple data
Independent & identically distributed (iid) observations y = {y1, y2, …, yn}.
Posterior density (no prior): μ|y ~ N(ȳ, σ²/n), where ȳ is the sample mean.
Posterior prediction: ỹ|y ~ N(ȳ, (1 + 1/n)σ²).

Practice
Observations are n = 25 data from a normal distribution with sample mean ȳ = 2.9 and known stdev σ.
–Plot the posterior pdf of the unknown population mean.
–What is the probability that the true mean is > 3?
–Superpose the original normal distribution, assuming ȳ is the true mean.
–Plot the cdfs of the two together (a sketch follows below).
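A minimal sketch of this practice; the stdev value did not survive the transcript, so σ = 0.2 (the value used in the homework) is assumed here:

  % Practice: posterior of the mean from n = 25 observations (flat prior)
  n = 25; ybar = 2.9; sig = 0.2;           % sig = 0.2 is an assumed value
  se = sig/sqrt(n);                        % posterior stdev of the mean, 0.04
  mu = linspace(2.7, 3.1, 500);
  plot(mu, normpdf(mu, ybar, se), '-', mu, normpdf(mu, ybar, sig), '--');
  legend('posterior of the mean', 'original distribution');
  Pgt3 = 1 - normcdf(3, ybar, se)          % P(true mean > 3), about 0.006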

Estimating the variance with known mean μ (multiple data)
In case of no information, a reasonable prior is p(σ²) ∝ 1/σ².
Sample distribution (observations): p(y|σ²) ∝ (σ²)^(−n/2) exp(−nv/(2σ²)), where v = (1/n) Σ (yi − μ)².
Posterior distribution: p(σ²|y) ∝ (σ²)^(−(n/2+1)) exp(−nv/(2σ²)).
–Equivalent to z = nv/σ² following a chi-square distribution with n degrees of freedom, χ²(n).
–The chi-square distribution is a standard distribution.

Working with chi-square
For the posterior pdf we have z = nv/σ² ~ χ²(n).
Obviously, P[σ² ≤ c] is the same as P[z ≥ zc = nv/c], so for the CDF of the posterior distribution at a given σ² we take one minus the CDF of χ²(n) at zc; in Matlab, 1-chi2cdf(zc,n).
By differentiating the CDF, it is easy to show that the posterior pdf is given by p(σ²|y) = (z/σ²)·f(z; n), where f is the χ²(n) pdf, or in Matlab, (z/sig2)*chi2pdf(z,n).
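A short MATLAB sketch putting those two one-liners to work; the data (true mean 3, stdev 0.2, n = 25) are made-up values for illustration:

  % Posterior of sigma^2 with known mean (noninformative prior 1/sigma^2)
  mu = 3; y = normrnd(mu, 0.2, 25, 1);     % illustrative data, n = 25
  n  = numel(y);
  v  = mean((y - mu).^2);                  % v = (1/n)*sum((yi - mu)^2)
  sig2 = linspace(0.01, 0.12, 500);        % grid of candidate variances
  z    = n*v ./ sig2;                      % z = n*v/sigma^2 ~ chi2(n)
  postpdf = (z ./ sig2) .* chi2pdf(z, n);  % posterior pdf of sigma^2
  postcdf = 1 - chi2cdf(z, n);             % posterior CDF at each sigma^2
  plot(sig2, postpdf); xlabel('\sigma^2'); ylabel('posterior pdf');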

Homework
Mean with known variance (no prior)
Generate n = 25 samples of a normal distribution with sample mean 3 and population stdev 0.2 (you can generate 25 samples and then shift and scale them to get the desired values of mean and standard deviation; see the sketch below).
1) Write the expression for the posterior distribution of the mean.
2) Plot the posterior distribution of the mean using the pdf function and by simulation with N = 1e6 samples, respectively.
3) Calculate the 2.5% and 97.5% percentiles of the unknown mean from the posterior distribution using the inv function and the drawn samples, respectively.
Variance with known mean (non-informative prior)
Use the same samples generated above.
1) Write the expression for the posterior distribution of the variance.
2) Plot the posterior distribution of the variance using the pdf function and by simulation with N = 1e6 samples, respectively.
3) Calculate the 2.5% and 97.5% percentiles of the unknown variance from the posterior distribution using the inv function and the drawn samples, respectively.
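One way to realize the shift-and-scale hint in the first part (a sketch only, not a solution; it forces the sample statistics to the target values exactly):

  % Shift and scale raw draws so the sample mean and stdev are exact
  n = 25;
  y = normrnd(0, 1, n, 1);                 % raw standard-normal draws
  y = (y - mean(y)) / std(y);              % now mean(y) = 0, std(y) = 1 exactly
  y = 3 + 0.2*y;                           % sample mean 3, sample stdev 0.2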