Applied Bayesian Inference, KSU, April 29, 2012. §❷ An Introduction to Bayesian Inference. Robert J. Tempelman

Bayes Theorem

Recall the basic axiom of probability:
– f(θ, y) = f(y | θ) f(θ)
Also:
– f(θ, y) = f(θ | y) f(y)
Combine both expressions to get
  f(θ | y) = f(y | θ) f(θ) / f(y), or f(θ | y) ∝ f(y | θ) f(θ)
Posterior ∝ Likelihood × Prior

Prior densities/distributions

What can we specify for f(θ)?
– Anything that reflects our prior beliefs.
– Common choice: a "conjugate" prior. f(θ) is chosen such that the posterior f(θ | y) is recognizable and of the same form as f(θ).
– "Flat" prior: f(θ) ∝ constant. Then f(θ | y) ∝ f(y | θ).
– Flat priors can be dangerous… they can lead to an improper posterior, i.e. ∫ f(θ | y) dθ is not finite.

Prior information / Objective?

Introducing prior information may somewhat "bias" sample information; nevertheless, ignoring existing prior information is inconsistent with
– 1) rational human behavior, and
– 2) the nature of the scientific method.
– Memory property: past inference (the posterior) can be used as the updated prior in future inference.
Nevertheless, many applied Bayesian data analysts try to be as "objective" as possible by using diffuse (e.g., flat) priors.

Example of a conjugate prior

Recall the binomial distribution:
  f(y | p) = [n! / (y!(n−y)!)] p^y (1−p)^(n−y)
Suppose we express prior belief on p using a beta distribution:
  f(p) = [Γ(α+β) / (Γ(α)Γ(β))] p^(α−1) (1−p)^(β−1)
– Denoted as Beta(α, β)

Examples of different beta densities

[Figure: beta density curves for several (α, β) choices, including a diffuse (flat) bounded prior. It is flat, but it is proper since it is bounded! See the sketch below for how such densities can be evaluated.]
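
The slide's curves are not reproduced here, but as an illustrative sketch (not part of the original slides, and assuming the three (α, β) pairs used later in this example; the slide's actual curves may differ) the densities can be evaluated and plotted in SAS with the PDF function:

data beta_dens;
   do p = 0.005 to 0.995 by 0.005;
      flat    = pdf('BETA', p, 1, 1);   /* Beta(1,1): flat but proper on (0,1)      */
      beta91  = pdf('BETA', p, 9, 1);   /* Beta(9,1): prior belief that p is large  */
      beta218 = pdf('BETA', p, 2, 18);  /* Beta(2,18): prior belief that p is small */
      output;
   end;
run;

proc sgplot data=beta_dens;
   series x=p y=flat;
   series x=p y=beta91;
   series x=p y=beta218;
run;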

Posterior density of p

Posterior ∝ Likelihood × Prior, i.e.
  f(p | y) ∝ p^y (1−p)^(n−y) × p^(α−1) (1−p)^(β−1) = p^(y+α−1) (1−p)^(n−y+β−1)
so p | y ~ Beta(y+α, n−y+β). The beta is conjugate to the binomial.

Suppose we observe data y = 10, n = 15. Consider three alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
Posterior densities: Beta(y+α, n−y+β)
[Figure: the three resulting posterior densities; see the sketch below.]
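
As a check on the conjugate updating, here is a minimal SAS sketch (not part of the original slides; the data set and variable names are only illustrative) that computes the Beta(y+α, n−y+β) posterior parameters and posterior means for the three priors above:

data beta_post;
   y = 10; n = 15;                      /* observed successes and trials        */
   do prior = 1 to 3;
      if prior = 1 then do; alpha = 1; beta = 1;  end;
      else if prior = 2 then do; alpha = 9; beta = 1;  end;
      else do; alpha = 2; beta = 18; end;
      post_alpha = y + alpha;           /* posterior is Beta(y+alpha, n-y+beta) */
      post_beta  = n - y + beta;
      post_mean  = post_alpha / (post_alpha + post_beta);
      output;
   end;
run;

proc print data=beta_post;
   var alpha beta post_alpha post_beta post_mean;
run;

The posterior means work out to roughly 0.65, 0.76, and 0.34, showing how strongly the Beta(2,18) prior pulls the estimate away from the sample proportion 10/15 ≈ 0.67.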

Suppose we observed a larger dataset: y = 100, n = 150. Consider the same alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
Posterior densities: [Figure] With the larger dataset the likelihood dominates, so the three posteriors are much closer to one another than in the n = 15 case.

Posterior information

Given log f(θ | y) = log f(y | θ) + log f(θ) + constant, differentiating twice gives:
  Posterior information = likelihood information + prior information.
One option for a point estimate: the joint posterior mode of θ, found using Newton-Raphson.
– Also called the MAP (maximum a posteriori) estimate of θ.

Recall the plant genetic linkage example

Recall the likelihood:
  f(y | θ) ∝ (2+θ)^(y1) (1−θ)^(y2+y3) θ^(y4)
Suppose θ ~ Beta(α, β), i.e. f(θ) ∝ θ^(α−1) (1−θ)^(β−1). Then
  f(θ | y) ∝ (2+θ)^(y1) (1−θ)^(y2+y3+β−1) θ^(y4+α−1)
Almost as if you increased the number of plants in genotypes 2 and 3 by β−1 … and in genotype 4 by α−1.

Plant linkage example cont'd. Suppose α = 50 and β = 500 (a strong prior that θ is small, as set in the code below). A Newton-Raphson search for the posterior mode:

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   alpha = 50; beta = 500;
   theta = 0.01;  /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
      secndder = (-y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2);
      theta = theta + firstder/(-secndder);
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);   /* posterior standard error */
   call symputx("poststd", poststd);
   output;
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;

Output

[PROC PRINT listing (values not reproduced): columns Obs, iterate, theta, logpost showing the Newton-Raphson iterates converging to the posterior mode, with the title line reporting the posterior standard error.]

Additional elements of Bayesian inference

Suppose that θ can be partitioned into two components, a p×1 vector θ1 and a q×1 vector θ2, so that θ' = [θ1' θ2'].
If we want to make probability statements about θ, we use probability calculus on the joint posterior f(θ1, θ2 | y).
There is NO repeated-sampling concept.
– Condition on the one observed dataset.
– However, Bayes estimators typically do have very good frequentist properties!

Marginal vs. conditional inference

Suppose you're primarily interested in θ1. Base inference on the marginal posterior
  f(θ1 | y) = ∫ f(θ1, θ2 | y) dθ2
– i.e. average over uncertainty on θ2 (nuisance variables).
Of course, if θ2 were known, you would condition your inference on it accordingly:
  f(θ1 | θ2, y)
(A grid-based sketch of this marginalization follows below.)
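
To make the marginalization concrete, here is an illustrative SAS sketch (not from the original slides; the sufficient statistics, the grid, and the use of flat priors are all hypothetical choices made for simplicity). It approximates the marginal posterior of a normal mean mu by summing an unnormalized joint posterior over a grid of the nuisance variance sigma2:

data grid;
   n = 10; ybar = 5; ss = 40;        /* hypothetical n, sample mean, and sum of squared deviations */
   do mu = 2 to 8 by 0.1;
      do sigma2 = 0.5 to 20 by 0.5;
         /* log joint posterior under flat priors on mu and sigma2 (up to a constant) */
         logpost = -(n/2)*log(sigma2) - (ss + n*(ybar-mu)**2)/(2*sigma2);
         post = exp(logpost);
         output;
      end;
   end;
run;

proc means data=grid noprint nway;
   class mu;
   var post;
   output out=marg sum=marg_mu;      /* unnormalized marginal posterior of mu: sum over the sigma2 grid */
run;

Each value of marg_mu approximates the integral over sigma2 at that mu; conditioning instead on a single fixed sigma2 would correspond to keeping only one row of the inner loop.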

Two-stage model example

Given y = {y1, y2, …, yn} with y_i ~ NIID(μ, σ²), where σ² is known. We wish to infer μ.
From Bayes theorem: f(μ | y) ∝ f(y | μ) f(μ).
Suppose the prior is μ ~ N(μ0, σ0²), i.e.
  f(μ) ∝ exp( −(μ − μ0)² / (2σ0²) )

Simplify the likelihood

Viewed as a function of μ, the likelihood of the iid normal sample simplifies to
  f(y | μ) = Π_i (2πσ²)^(−1/2) exp( −(y_i − μ)² / (2σ²) ) ∝ exp( −n(μ − ȳ)² / (2σ²) ),
where ȳ is the sample mean.

Posterior density

  f(μ | y) ∝ exp( −n(μ − ȳ)² / (2σ²) ) × exp( −(μ − μ0)² / (2σ0²) )
Consider the following limit: σ0² → ∞. This is consistent with a flat (improper) prior, f(μ) ∝ constant, or equivalently with f(μ | y) ∝ f(y | μ).

Interpretation of the posterior density with a flat prior

So with f(μ) ∝ constant, f(μ | y) ∝ exp( −n(μ − ȳ)² / (2σ²) ).
Then the posterior is itself a normal density, i.e. μ | y ~ N(ȳ, σ²/n).

Posterior density with informative prior

Now f(μ | y) ∝ exp( −n(μ − ȳ)² / (2σ²) ) × exp( −(μ − μ0)² / (2σ0²) ).
After algebraic simplification:
  μ | y ~ N(μ̂, v̂), where
  μ̂ = (nȳ/σ² + μ0/σ0²) / (n/σ² + 1/σ0²) and v̂ = 1 / (n/σ² + 1/σ0²)

Note that
  1/v̂ = 1/σ0² + n/σ²
Posterior precision = prior precision + sample (likelihood) precision.
The posterior mean μ̂ is a weighted average of the data mean ȳ and the prior mean μ0, with weights proportional to the respective precisions (see the sketch below).
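
A minimal SAS sketch of this precision-weighted combination (not from the original slides; the values μ0 = 0, σ0² = 4, ȳ = 2.5, σ² = 9, and n = 20 are made up purely for illustration):

data normal_post;
   mu0 = 0;  sigma0sq = 4;            /* prior mean and prior variance            */
   ybar = 2.5; sigmasq = 9; n = 20;   /* sample mean, known variance, sample size */
   prior_prec  = 1/sigma0sq;          /* prior precision                          */
   sample_prec = n/sigmasq;           /* likelihood (sample) precision            */
   post_prec = prior_prec + sample_prec;
   post_var  = 1/post_prec;
   post_mean = (prior_prec*mu0 + sample_prec*ybar) / post_prec;
   output;
run;

proc print data=normal_post;
   var post_mean post_var;
run;

Here the sample precision (20/9 ≈ 2.2) outweighs the prior precision (0.25), so the posterior mean lands close to ȳ.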

Hierarchical models

Given data y:
– Two stage: f(y | θ) and a prior f(θ).
– Three stage: f(y | θ), f(θ | φ), and f(φ); the prior on θ depends on hyperparameters φ, which in turn get their own prior.
– What's the difference? When do you consider one over the other?

Simple hierarchical model

Random effects model:
– Y_ij = μ + a_i + e_ij
– μ: overall mean; a_i ~ NIID(0, σ_a²); e_ij ~ NIID(0, σ_e²).
Suppose we knew μ, σ_a², and σ_e². Then the conditional (posterior) mean of a_i shrinks the observed group deviation toward zero:
  E(a_i | y, μ, σ_a², σ_e²) = [ n_i σ_a² / (n_i σ_a² + σ_e²) ] (ȳ_i. − μ),
where n_i σ_a² / (n_i σ_a² + σ_e²) is the shrinkage factor (see the sketch below).
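
A small SAS sketch of the shrinkage calculation for a single group mean (not part of the original slides; all numeric values are made up for illustration):

data shrinkage;
   mu = 50;                          /* assumed known overall mean             */
   sigma2_a = 4;  sigma2_e = 16;     /* assumed known variance components      */
   ybar_i = 56;                      /* observed mean for group i              */
   do n_i = 1, 5, 20, 100;           /* number of observations in group i      */
      b_i   = (n_i*sigma2_a) / (n_i*sigma2_a + sigma2_e);  /* shrinkage factor */
      a_hat = b_i * (ybar_i - mu);   /* shrunken estimate of the random effect */
      output;
   end;
run;

proc print data=shrinkage;
   var n_i b_i a_hat;
run;

As n_i grows, the shrinkage factor approaches 1 and a_hat approaches the raw deviation ȳ_i − μ = 6; with little data, the estimate is pulled toward 0, the prior mean of a_i.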

What if we don't know μ, σ_a², or σ_e²?

Option 1: Estimate them (e.g., by the method of moments), then "plug them in". Not truly Bayesian.
– Empirical Bayesian (EB) (next section).
– Most of us using PROC MIXED/GLIMMIX are EB!

A truly Bayesian approach

1) Y_ij | θ_i ~ N(θ_i, σ_e²) for all i, j
2) θ_1, θ_2, …, θ_k are iid N(μ, σ_a²)
   o Structural prior (exchangeable entities)
3) μ ~ p(μ); σ_a² ~ p(σ_a²); σ_e² ~ p(σ_e²)
   o Subjective prior
Fully Bayesian inference (next section after that!)