§❷ An Introduction to Bayesian Inference
Robert J. Tempelman
Applied Bayesian Inference, KSU, April 29, 2012
Bayes Theorem
Recall a basic axiom of probability:
– f(θ, y) = f(y|θ) f(θ)
Also:
– f(θ, y) = f(θ|y) f(y)
Combine both expressions to get:
f(θ|y) = f(y|θ) f(θ) / f(y), or f(θ|y) ∝ f(y|θ) f(θ)
Posterior ∝ Likelihood × Prior
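As a minimal numerical sketch of Bayes theorem, the SAS step below updates prior probabilities over two candidate values of a binomial success probability; the data (y = 7 successes in n = 10 trials), the candidate values 0.3 and 0.7, and the equal prior weights are all hypothetical choices for illustration.

data bayes_rule;
   array theta{2} (0.3 0.7);       /* two hypothetical candidate values of theta */
   array prior{2} (0.5 0.5);       /* prior probabilities f(theta)               */
   array like{2};                  /* likelihoods f(y|theta)                     */
   array post{2};                  /* posterior probabilities f(theta|y)         */
   y = 7; n = 10;                  /* hypothetical binomial data                 */
   fy = 0;                         /* marginal probability f(y)                  */
   do i = 1 to 2;
      like{i} = pdf('binomial', y, theta{i}, n);
      fy + prior{i}*like{i};
   end;
   do i = 1 to 2;
      post{i} = prior{i}*like{i}/fy;   /* posterior = likelihood * prior / f(y)  */
   end;
run;
proc print data=bayes_rule noobs;
   var post1 post2;
run;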
Prior densities/distributions
What can we specify for f(θ)?
– Anything that reflects our prior beliefs.
– Common choice: a “conjugate” prior: f(θ) is chosen such that the posterior f(θ|y) is recognizable and of the same form as the prior.
– “Flat” prior: f(θ) ∝ constant. Then f(θ|y) ∝ f(y|θ).
– Flat priors can be dangerous: they can lead to an improper posterior, i.e. ∫ f(θ|y) dθ = ∞.
Prior information / Objective?
Introducing prior information may somewhat "bias" sample information; nevertheless, ignoring existing prior information is inconsistent with
– 1) rational human behavior
– 2) the nature of the scientific method.
– Memory property: past inference (the posterior) can be used as an updated prior in future inference.
Still, many applied Bayesian data analysts try to be as “objective” as possible by using diffuse (e.g., flat) priors.
Example of a conjugate prior
Recall the binomial distribution:
f(y|p) = (n choose y) p^y (1−p)^(n−y), y = 0, 1, …, n.
Suppose we express prior belief on p using a beta distribution:
f(p) ∝ p^(α−1) (1−p)^(β−1)
– Denoted as Beta(α, β).
Examples of different beta densities
[Figure: several beta densities; the Beta(1,1) case is a diffuse (flat) bounded prior, but it is proper since it is bounded!]
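A figure like this can be reproduced with the short SAS sketch below, which evaluates the three priors used in the upcoming example, Beta(1,1), Beta(9,1), and Beta(2,18), over a grid of p values; the dataset and variable names are made up.

data beta_priors;
   do p = 0.005 to 0.995 by 0.005;
      flat   = pdf('beta', p, 1, 1);     /* diffuse but proper: bounded support */
      high_p = pdf('beta', p, 9, 1);     /* prior belief that p is large        */
      low_p  = pdf('beta', p, 2, 18);    /* prior belief that p is small        */
      output;
   end;
run;

proc sgplot data=beta_priors;
   series x=p y=flat;
   series x=p y=high_p;
   series x=p y=low_p;
   yaxis label="prior density";
run;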
Posterior density of p
Posterior ∝ Likelihood × Prior:
f(p|y) ∝ p^y (1−p)^(n−y) × p^(α−1) (1−p)^(β−1) = p^(y+α−1) (1−p)^(n−y+β−1)
i.e., p|y ~ Beta(y+α, n−y+β).
The beta is conjugate to the binomial.
Suppose we observe data y = 10, n = 15.
Consider three alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
Posterior densities: Beta(y+α, n−y+β) under each prior.
[Figure: the three resulting posterior densities.]
Suppose we observed a larger dataset: y = 100, n = 150.
Consider the same alternative priors:
– Beta(1,1)
– Beta(9,1)
– Beta(2,18)
Posterior densities: Beta(y+α, n−y+β) under each prior.
[Figure: the three resulting posterior densities, now much less influenced by the prior.]
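A small SAS sketch of these two comparisons (dataset and variable names are made up): it computes the Beta(y+α, n−y+β) posterior parameters and posterior means for each prior and each sample size, showing how the influence of the prior fades as n grows.

data beta_posteriors;
   array a{3} (1 9 2);      /* prior alpha values: Beta(1,1), Beta(9,1), Beta(2,18) */
   array b{3} (1 1 18);     /* prior beta values                                     */
   do study = 1 to 2;
      if study = 1 then do; y = 10;  n = 15;  end;
      else              do; y = 100; n = 150; end;
      do prior = 1 to 3;
         alpha_post = y + a{prior};         /* posterior alpha */
         beta_post  = n - y + b{prior};     /* posterior beta  */
         post_mean  = alpha_post/(alpha_post + beta_post);
         output;
      end;
   end;
run;

proc print data=beta_posteriors noobs;
   var study prior alpha_post beta_post post_mean;
run;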
Posterior information
Given log f(θ|y) = log f(y|θ) + log f(θ) + constant:
−∂² log f(θ|y)/∂θ∂θ′ = −∂² log f(y|θ)/∂θ∂θ′ − ∂² log f(θ)/∂θ∂θ′
Posterior information = likelihood information + prior information.
One option for a point estimate: the joint posterior mode of θ, found using Newton-Raphson.
– Also called the MAP (maximum a posteriori) estimate of θ.
Recall the plant genetic linkage example
Recall the likelihood: L(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4.
Suppose θ ~ Beta(α, β), i.e. f(θ) ∝ θ^(α−1) (1−θ)^(β−1).
Then f(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3+β−1) θ^(y4+α−1).
Almost as if you increased the number of plants in genotypes 2 and 3 by β−1 … and in genotype 4 by α−1.
Plant linkage example cont’d.

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   alpha = 50; beta = 500;
   theta = 0.01;    /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
      secndder = (-y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2);
      theta = theta + firstder/(-secndder);
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);   /* posterior standard error */
   call symputx("poststd", poststd);
   output;
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;
Output
[Output: PROC PRINT listing of Obs, iterate, theta, and logpost for each Newton-Raphson iteration; the title line reports the posterior standard error.]
Additional elements of Bayesian inference
Suppose that θ can be partitioned into two components, a p×1 vector θ₁ and a q×1 vector θ₂, i.e. θ′ = [θ₁′ θ₂′].
If we want to make probability statements about θ, we use probability calculus:
f(θ₁, θ₂|y) ∝ f(y|θ₁, θ₂) f(θ₁, θ₂)
There is NO repeated sampling concept.
– We condition on the one observed dataset.
– However, Bayes estimators typically do have very good frequentist properties!
Marginal vs. conditional inference
Suppose you’re primarily interested in θ₁:
f(θ₁|y) = ∫ f(θ₁, θ₂|y) dθ₂
– i.e., average over the uncertainty on θ₂ (nuisance parameters).
Of course, if θ₂ were known, you would condition your inference on θ₁ accordingly:
f(θ₁|θ₂, y) ∝ f(y|θ₁, θ₂) f(θ₁|θ₂)
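A brute-force SAS sketch of marginalization (the small sample, the flat priors, and the grid limits are all hypothetical): it approximates the marginal posterior of a normal mean μ by summing the joint posterior over a grid of σ² values.

data marginal_mu;
   array y{5} (4.1 5.3 6.0 4.8 5.6);   /* hypothetical sample       */
   n = 5;
   dsig = 0.1;                          /* grid spacing for sigma2   */
   do mu = 3 to 7 by 0.05;
      marg = 0;
      do sigma2 = dsig to 10 by dsig;
         loglike = 0;
         do j = 1 to n;
            loglike + log(pdf('normal', y{j}, mu, sqrt(sigma2)));
         end;
         marg + exp(loglike)*dsig;      /* flat priors; Riemann sum over sigma2 */
      end;
      output;
   end;
   keep mu marg;
run;

proc sgplot data=marginal_mu;
   series x=mu y=marg;                  /* unnormalized marginal posterior f(mu|y) */
run;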
Two-stage model example
Given data y = (y_1, …, y_n) with y_i ~ NIID(μ, σ²), where σ² is known. We wish to infer μ.
From Bayes theorem: f(μ|y) ∝ f(y|μ) f(μ).
Suppose μ ~ N(μ₀, σ₀²), i.e. f(μ) ∝ exp(−(μ−μ₀)²/(2σ₀²)).
Simplify the likelihood
f(y|μ) ∝ exp(−Σᵢ(y_i−μ)²/(2σ²)) ∝ exp(−n(μ−ȳ)²/(2σ²)) as a function of μ,
since Σᵢ(y_i−μ)² = Σᵢ(y_i−ȳ)² + n(μ−ȳ)² and the first term does not involve μ.
Posterior density
f(μ|y) ∝ f(y|μ) f(μ) ∝ exp(−n(μ−ȳ)²/(2σ²)) exp(−(μ−μ₀)²/(2σ₀²))
Consider the following limit: σ₀² → ∞, so that f(μ) ∝ constant.
Consistent with a “flat” prior, f(μ) ∝ 1, or μ ~ Uniform(−∞, +∞).
Interpretation of Posterior Density with Flat Prior
So, with f(μ) ∝ 1, f(μ|y) ∝ f(y|μ) ∝ exp(−n(μ−ȳ)²/(2σ²)).
Then μ|y ~ N(ȳ, σ²/n),
i.e., the posterior mean is the sample mean ȳ and the posterior variance is σ²/n.
Posterior density with informative prior
Now f(μ|y) ∝ exp(−n(μ−ȳ)²/(2σ²)) exp(−(μ−μ₀)²/(2σ₀²)).
After algebraic simplification:
μ|y ~ N(μ̄, v̄), with μ̄ = [(n/σ²)ȳ + (1/σ₀²)μ₀] / (n/σ² + 1/σ₀²) and v̄ = (n/σ² + 1/σ₀²)⁻¹.
Note that v̄⁻¹ = 1/σ₀² + n/σ²:
Posterior precision = prior precision + sample (likelihood) precision.
Likewise μ̄ = [(n/σ²)ȳ + (1/σ₀²)μ₀] / (n/σ² + 1/σ₀²), i.e., a weighted average of the data mean and the prior mean, with weights given by their respective precisions.
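The precision-weighting formula can be checked numerically with a short SAS step; the data summary (ȳ, n, σ²) and the prior mean and variance used below are hypothetical.

data normal_posterior;
   ybar = 5.2; n = 20; sigma2 = 4;     /* hypothetical data summary, sigma2 known */
   mu0 = 0;    sigma2_0 = 10;          /* hypothetical prior mean and variance    */
   prior_prec  = 1/sigma2_0;           /* prior precision                          */
   sample_prec = n/sigma2;             /* sample (likelihood) precision            */
   post_prec   = prior_prec + sample_prec;   /* posterior precision                */
   post_var    = 1/post_prec;
   post_mean   = (sample_prec*ybar + prior_prec*mu0)/post_prec;  /* weighted average */
run;

proc print data=normal_posterior noobs;
   var post_mean post_var;
run;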
Hierarchical models
Given data y:
Two-stage: f(θ|y) ∝ f(y|θ) f(θ).
Three-stage: f(θ, φ|y) ∝ f(y|θ) f(θ|φ) f(φ), where φ are hyperparameters.
– What’s the difference? When do you consider one over the other?
Simple hierarchical model
Random effects model:
– Y_ij = μ + a_i + e_ij; μ: overall mean, a_i ~ NIID(0, σ²_a); e_ij ~ NIID(0, σ²_e).
Suppose we knew μ, σ²_a, and σ²_e:
E(μ + a_i | y) = μ + b_i(ȳ_i − μ), with shrinkage factor b_i = σ²_a / (σ²_a + σ²_e/n_i).
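A sketch of how the shrinkage factor behaves (the variance components, overall mean, and group mean below are made up): with more records per group, the shrunken group mean moves from the overall mean toward the group's own data mean.

data shrinkage;
   sigma2_a = 2; sigma2_e = 6;          /* hypothetical variance components      */
   mu = 10; ybar_i = 12;                /* hypothetical overall and group means  */
   do n_i = 1, 2, 5, 10, 50, 200;       /* records per group                     */
      b_i = sigma2_a/(sigma2_a + sigma2_e/n_i);   /* shrinkage factor            */
      post_mean = mu + b_i*(ybar_i - mu);         /* shrunken group mean         */
      output;
   end;
run;

proc print data=shrinkage noobs;
   var n_i b_i post_mean;
run;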
What if we don’t know μ, σ²_a, or σ²_e?
Option 1: Estimate them (e.g., by method of moments), then “plug them in.” Not truly Bayesian.
– Empirical Bayes (EB) (next section).
– Most of us using PROC MIXED/GLIMMIX are EB!
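The kind of plug-in analysis meant here looks like the following sketch (dataset and variable names are hypothetical): PROC MIXED estimates the variance components by REML and then plugs them into the predictions of the random effects (EBLUPs).

/* hypothetical one-way random effects analysis:
   REML estimates of sigma2_a and sigma2_e are "plugged in"
   to form the shrunken group-effect predictions (EBLUPs).   */
proc mixed data=mydata method=reml;
   class group;
   model y = / solution;          /* overall mean mu    */
   random group / solution;       /* EBLUPs for the a_i */
run;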
A truly Bayesian approach
1) Y_ij | θ_i ~ N(θ_i, σ²_e) for all i, j
2) θ_1, θ_2, …, θ_k are iid N(μ, σ²_a)
   o Structural prior (exchangeable entities)
3) μ ~ p(μ); σ²_a ~ p(σ²_a); σ²_e ~ p(σ²_e)
   o Subjective priors
Fully Bayesian inference (next section after that!)