More about Posterior Distributions

The process of Bayesian inference involves passing from a prior distribution to a posterior distribution, so it is natural to expect that some general relations hold between these two distributions. For example, since the posterior distribution incorporates the information from the data, it should be, on average, less variable than the prior distribution.

Recall – Conditional Expectation
The theorem of total expectation states that, for random variables X and Y,

E(X) = E(E(X | Y)),

whenever E(X) exists. In the Bayesian setting we apply this with X a function of the parameter θ and Y the data s, so that, for example, E(θ) = E(E(θ | s)).
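As a quick sanity check, here is a minimal simulation sketch of the theorem; the two-stage experiment (Y Poisson, then X binomial given Y) is an arbitrary illustration, not from the original slides.

import numpy as np

rng = np.random.default_rng(0)

# Two-stage experiment: Y ~ Poisson(4), then X | Y ~ Binomial(Y, 0.5),
# so E(X | Y) = 0.5 * Y and E(X) = E(E(X | Y)) = 0.5 * 4 = 2.
y = rng.poisson(4.0, size=1_000_000)
x = rng.binomial(y, 0.5)

print(x.mean())          # ≈ E(X) ≈ 2.0
print((0.5 * y).mean())  # ≈ E(E(X | Y)) ≈ 2.0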

Claim
The posterior variance is, on average, smaller than the prior variance, that is,

E(Var(θ | s)) ≤ Var(θ).

Proof: Applying the theorem of total expectation to θ and to θ² gives the decomposition

Var(θ) = E(Var(θ | s)) + Var(E(θ | s)).

Since Var(E(θ | s)) ≥ 0, it follows that E(Var(θ | s)) ≤ Var(θ), with equality only when the posterior mean E(θ | s) does not vary with the data.
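A minimal simulation sketch of the claim in the Beta–Bernoulli setting used later in this lecture; the values α = 2, β = 3, n = 10 are illustrative, not from the original slides.

import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 3.0, 10  # illustrative prior parameters and sample size
reps = 100_000

# Draw theta from the Beta(alpha, beta) prior, then data given theta.
theta = rng.beta(alpha, beta, size=reps)
successes = rng.binomial(n, theta)

# Beta-Bernoulli posterior: Beta(alpha + successes, beta + n - successes).
a_post = alpha + successes
b_post = beta + n - successes
post_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))

prior_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print("prior variance:         ", prior_var)
print("mean posterior variance:", post_var.mean())  # smaller on average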

Example
In the location normal model, where x₁, …, xₙ is a sample from N(μ, σ₀²) with σ₀² known and the prior on μ is N(μ₀, τ₀²), the posterior variance of μ is

1 / (1/τ₀² + n/σ₀²),

which is always smaller than σ₀²/n, the variance of the sample mean. A numerical sketch follows.
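A minimal numerical sketch; the values τ₀² = 4, σ₀² = 1 and n = 5 are illustrative, not from the original slide.

# Posterior variance in the location normal model vs. variance of the sample mean.
tau0_sq = 4.0    # prior variance of mu (illustrative)
sigma0_sq = 1.0  # known sampling variance (illustrative)
n = 5            # sample size (illustrative)

post_var = 1.0 / (1.0 / tau0_sq + n / sigma0_sq)
xbar_var = sigma0_sq / n

print("posterior variance:     ", post_var)  # 1/(0.25 + 5) ≈ 0.1905
print("variance of sample mean:", xbar_var)  # 0.2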

Inference Based on the Posterior
As we have seen, the principle of conditional probability implies that the posterior distribution contains all the relevant information about the unknown parameter that is in the sampling model, the prior and the data. We now proceed to make inferences about the unknown parameter, or about some other characteristic of interest that is a function of it, using the posterior distribution. In particular, we will specify how to compute estimates and credible regions and how to carry out hypothesis assessment.

Estimation – Posterior mode
Suppose we want to calculate an estimate of the parameter of interest θ based on its posterior distribution. There are several different approaches to this problem. One of the most natural estimates is the posterior mode. It is the point where the posterior probability or density function of θ takes its maximum. In the discrete case, it is the value that has the highest posterior probability. In the continuous case, it is the value that has the highest amount of posterior probability in a short interval containing it. To calculate the posterior mode we need to maximize the posterior density π(θ | s) as a function of θ. Note that this is equivalent to maximizing the product of the prior and the likelihood, π(θ) f(s | θ), so we do not need to compute the inverse normalizing constant to implement this.
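To make the last point concrete, here is a minimal sketch that finds the posterior mode by maximizing the prior density times the likelihood on a grid, deliberately skipping the normalizing constant; the Beta prior and the data counts are hypothetical.

import numpy as np
from scipy.stats import beta, binom

# Hypothetical data: 7 successes in n = 10 Bernoulli trials; Beta(2, 3) prior.
n, successes = 10, 7
a, b = 2.0, 3.0

grid = np.linspace(1e-6, 1 - 1e-6, 100_001)
# Unnormalized posterior: prior times likelihood. The normalizing constant
# is omitted on purpose -- it does not change where the maximum occurs.
unnorm_post = beta.pdf(grid, a, b) * binom.pmf(successes, n, grid)

mode_numeric = grid[np.argmax(unnorm_post)]
mode_closed = (successes + a - 1) / (n + a + b - 2)  # derived on the next slides
print(mode_numeric, mode_closed)  # both ≈ 0.6154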

Example: Bernoulli Model
Suppose we observe a sample x₁, …, xₙ from the Bernoulli(θ) distribution with θ unknown, and we place the Beta(α, β) prior on θ. We already determined that the posterior distribution of θ is the Beta(n x̄ + α, n(1 − x̄) + β) distribution. The posterior density is then proportional to

θ^(n x̄ + α − 1) (1 − θ)^(n(1 − x̄) + β − 1).

So we need to maximize this expression or, equivalently, its logarithm,

(n x̄ + α − 1) ln θ + (n(1 − x̄) + β − 1) ln(1 − θ).

Taking the first derivative of the above function, setting it equal to 0 and solving gives the solution

θ̂ = (n x̄ + α − 1) / (n + α + β − 2).

Next we need to check the second derivative,

−(n x̄ + α − 1)/θ² − (n(1 − x̄) + β − 1)/(1 − θ)².

Now, if α ≥ 1 and β ≥ 1, we see that the second derivative is always negative, and so θ̂ is the unique posterior mode. This restriction on the choice of α and β implies that the prior has a mode in (0, 1) rather than at 0 or 1. Note that when α = β = 1, namely, if we put the uniform prior on θ, the posterior mode is θ̂ = x̄, and this is the same as the maximum likelihood estimate (MLE).
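As a closing sketch, the closed form above can be checked on a simulated sample; the sample size and true success probability here are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.6, size=25)  # hypothetical Bernoulli(0.6) sample
n, xbar = x.size, x.mean()

def posterior_mode(alpha, beta):
    # Closed form derived above; valid for alpha >= 1 and beta >= 1.
    return (n * xbar + alpha - 1) / (n + alpha + beta - 2)

print(posterior_mode(1, 1), xbar)  # uniform prior: posterior mode equals the MLE x-bar
print(posterior_mode(5, 5))        # informative prior pulls the mode toward 1/2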