SIR method continued
SIR: sample-importance resampling
- Find the maximum of likelihood × prior; call it Y
- Randomly sample pairs of r and N1973
- For each pair, calculate X = likelihood × prior
- Accept the pair with probability X/Y, otherwise reject it
- Note that X/Y = exp([-ln Y] - [-ln X]) = exp(NLL(Y) - NLL(X))
- The accepted pairs are the posterior
- Repeat until you have sufficient accepted pairs
(26 Antarctic blue SIR.xlsx, sheet “Normal prior”)
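A minimal sketch of this loop in R, assuming a hypothetical neg_log_post(r, N) that returns -ln(likelihood × prior), nll_best = -ln(Y) found beforehand by maximization, and illustrative uniform sampling ranges (none of these come from the workbook):

```r
# Minimal SIR sketch. ASSUMPTIONS (not from the workbook):
# neg_log_post(r, N) returns -ln(likelihood x prior) for a pair,
# nll_best = -ln(Y) at the maximum, and the sampling ranges below.
sir_sample <- function(neg_log_post, nll_best, n_draws = 20000) {
  kept <- vector("list", 0)
  for (i in seq_len(n_draws)) {
    r_star <- runif(1, -0.1, 0.2)       # illustrative sampling ranges
    N_star <- runif(1, 100, 1000)
    nll <- neg_log_post(r_star, N_star)
    # X/Y = exp(NLL(Y) - NLL(X)); computed on the log scale to avoid underflow
    if (runif(1) < exp(nll_best - nll)) {
      kept[[length(kept) + 1]] <- c(r = r_star, N1973 = N_star)
    }
  }
  do.call(rbind, kept)   # accepted pairs = draws from the posterior
}
```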
SIR: accepted, rejected
[Scatter plot of accepted and rejected draws; axes: value of r vs. value of N1973]
(26 Antarctic blue SIR.xlsx, sheet “Normal prior”)
SIR results
- 20,000 samples, 296 accepted
- r = 0.072, 95% interval = – (grid method: 0.072)
- N1973 = 320, 95% interval = –
- LOTS of rejected function calls (wasted effort)
- Tricks are almost always employed to increase the acceptance rate:
  - Accept with probability X/Z, where Z is smaller than Y; more draws will be accepted, and some draws will be duplicated in the posterior (no time now)
  - Sample parameter values from the priors and compare ratios of the likelihood only (no time now)
(26 Antarctic blue SIR.xlsx, sheet “Normal prior”)
SIR threshold to increase acceptance rate
- Choose a threshold Z where Z < the maximum likelihood × prior, Y
- Randomly sample pairs of r and N1973
- For each pair, calculate X = likelihood × prior
- If X ≤ Z, accept the pair with probability X/Z
- If X > Z, accept multiple copies of the pair
  - E.g. if X/Z = 4.6, save 4 copies with probability 0.4 or 5 copies with probability 0.6 (so the expected number of copies is 4.6)
(26 Antarctic blue SIR.xlsx, sheet “Normal prior”)
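A sketch of the threshold variant, again assuming the hypothetical neg_log_post() above and nll_at_Z = -ln(Z):

```r
# Threshold-SIR sketch: save a stochastically rounded number of
# copies so the expected count equals X/Z (assumptions as above).
sir_threshold <- function(neg_log_post, nll_at_Z, n_draws = 20000) {
  kept <- vector("list", 0)
  for (i in seq_len(n_draws)) {
    r_star <- runif(1, -0.1, 0.2)       # illustrative sampling ranges
    N_star <- runif(1, 100, 1000)
    ratio  <- exp(nll_at_Z - neg_log_post(r_star, N_star))   # = X/Z
    # e.g. ratio = 4.6: 4 copies with prob 0.4, 5 copies with prob 0.6
    n_copies <- floor(ratio) + (runif(1) < ratio - floor(ratio))
    if (n_copies > 0) {
      kept[[length(kept) + 1]] <-
        cbind(r = rep(r_star, n_copies), N1973 = rep(N_star, n_copies))
    }
  }
  do.call(rbind, kept)
}
```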
SIR with threshold: accepted multiple times, accepted once, rejected
[Scatter plot of draws; axes: value of r vs. value of N1973]
(26 Antarctic blue SIR.xlsx, sheet “Normal prior”)
Advantage of discrete samples
- Each saved draw is a sample from the posterior distribution
- We can take these pairs of (r, N1973) and project the model into the future for each pair (see the sketch below)
- This gives us future predictions that integrate over the joint distribution of the parameters
- It automatically takes into account correlations between parameter values (imagine a model with 20 parameters)
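For illustration, a projection sketch in R; it ASSUMES a simple exponential-growth model N_t = N1973 × exp(r(t − 1973)), whereas the workbook's actual population model may differ:

```r
# Forward-projection sketch: one trajectory per posterior draw.
# ASSUMED model: N_t = N1973 * exp(r * (t - 1973)); the workbook's
# actual population dynamics model may differ.
project <- function(posterior, years = 1973:2030) {
  sapply(seq_len(nrow(posterior)), function(i) {
    posterior[i, "N1973"] * exp(posterior[i, "r"] * (years - 1973))
  })   # rows = years, columns = posterior draws
}
# 95% posterior predictive band for each year:
# apply(project(post), 1, quantile, probs = c(0.025, 0.5, 0.975))
```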
MCMC method
Markov Chain Monte Carlo
Markov Chain Monte Carlo (MCMC)
- Start somewhere
- Randomly jump somewhere else
- If you found a better place, go there
- If you found a worse place, go there with some probability
- There are formal proofs that this works
MCMC algorithm I
- Start anywhere, with values r1, N1973,1, and X1 = likelihood × prior
- Jump function: add random numbers to r1 and N1973,1 to get a candidate draw r*, N1973*, and X* = likelihood × prior
- Calculate X*/X1, which equals exp([-ln X1] - [-ln X*])
- If a random number U[0,1] is < X*/X1, then r2 = r*, N1973,2 = N1973*, X2 = X* [accept draw]
- Otherwise (U ≥ X*/X1), r2 = r1, N1973,2 = N1973,1, X2 = X1 [reject draw, repeating the current values]
(27 Antarctic blue MCMC.xlsx)
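The same algorithm as a short R sketch, again assuming the hypothetical neg_log_post(); the jump sizes are illustrative guesses, not tuned values:

```r
# Metropolis MCMC sketch (assumptions: neg_log_post() as above;
# sd_r and sd_N are illustrative jump sizes).
mcmc_sample <- function(neg_log_post, r0, N0, n_iter = 10000,
                        sd_r = 0.01, sd_N = 20) {
  chain <- matrix(NA, n_iter, 2, dimnames = list(NULL, c("r", "N1973")))
  cur     <- c(r0, N0)
  cur_nll <- neg_log_post(cur[1], cur[2])
  for (i in seq_len(n_iter)) {
    cand     <- cur + rnorm(2, sd = c(sd_r, sd_N))   # jump function
    cand_nll <- neg_log_post(cand[1], cand[2])
    # X*/X1 = exp(NLL(X1) - NLL(X*)); accept if U[0,1] < ratio
    if (runif(1) < exp(cur_nll - cand_nll)) {
      cur <- cand; cur_nll <- cand_nll               # accept draw
    }                                                # else keep current values
    chain[i, ] <- cur
  }
  chain
}
```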
MCMC algorithm II
- Successive points wander around the posterior
- If you start far away, it will take some time to get near the region of highest likelihood
- Therefore, discard the first 20% of accepted draws (the burn-in period)
- Thin the chain by retaining only one in every n accepted draws
- Convergence is attained when there is no autocorrelation in the thinned chain (there are other tests for convergence); see the sketch below
(27 Antarctic blue MCMC.xlsx)
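A post-processing sketch for the chain returned by mcmc_sample() above:

```r
# Burn-in, thinning, and autocorrelation check for an MCMC chain.
post_process <- function(chain, burn_frac = 0.2, thin = 10) {
  start   <- ceiling(nrow(chain) * burn_frac) + 1    # drop burn-in draws
  thinned <- chain[seq(start, nrow(chain), by = thin), ]
  acf(thinned[, "r"])   # want ~zero autocorrelation at lags >= 1
  thinned
}
```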
Trace plots for N1973 and r, and r vs. N1973
[Figure panels: draws 1–500 and draws 2,000–10,000]
(27 Antarctic blue MCMC.xlsx)
MCMC results
- 10,000 samples, 2,669 accepted
- r = 0.074, 95% interval = – (grid method: 0.072)
- N1973 = 302, 95% interval = –
- Tuning options: increase the length of the chain, change the jump size, change the thinning rate, change the burn-in period, etc.
(27 Antarctic blue MCMC.xlsx)
Accepted vs. rejected draws
[Figure: accepted and rejected draws for MCMC (10,000 samples) and SIR (20,000 samples)]
- MCMC does not explore regions of parameter space with low likelihood
- Therefore many more draws are accepted
(27 Accepted rejected comparison.xlsx)
Answer to original question
- Are Antarctic blue whales increasing?
- Using the informative prior and MCMC: zero posterior draws out of 8,000 have r < 0
  - Answer: yes, they are increasing (P(r < 0) ≈ 0)
- Using a uniform prior U[-0.1, 0.2] and MCMC: 2 out of 8,000 draws have r < 0
  - Answer: yes, they are increasing (P(r < 0) = 2/8,000 = 0.00025)
- The choice of prior does not really matter here
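Because the posterior is a set of discrete samples, this probability is just the fraction of draws below zero, computed directly from the processed chain:

```r
# P(r < 0) is the fraction of posterior draws with r < 0;
# 'thinned' is the chain returned by post_process() above.
mean(thinned[, "r"] < 0)   # e.g. 2/8000 = 0.00025 under the uniform prior
```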
Conjugate prior method
Beta-binomial demo in R
- Sometimes we don’t need to go through the numerical methods (grid, SIR, MCMC)
- Demo in R: “27 Beta binomial Bayesian.R”
- If the prior is a particular distribution (beta) and the likelihood is a particular distribution (binomial), then the posterior will also be a beta distribution
- Such priors are called conjugate priors
(27 Beta binomial Bayesian.R)
Example
- Tag some fish and hold them for 1 month: what fraction p die?
- Data: number of deaths, number of survivors
- Prior on p (chosen to be a beta distribution)
- Likelihood of observing the data given a value of p (binomial distribution)
- Posterior is beta, with parameters that are a function of the parameters of the prior and the likelihood (see the sketch below)
(27 Beta binomial Bayesian.R)
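The conjugate update is closed-form: a Beta(a, b) prior combined with d deaths out of n fish gives a Beta(a + d, b + n − d) posterior. A minimal sketch with illustrative numbers (not the values from the course script):

```r
# Beta-binomial conjugate update (illustrative numbers, not those
# used in "27 Beta binomial Bayesian.R").
a <- 2; b <- 8             # Beta(a, b) prior on the mortality fraction p
deaths <- 5; n <- 40       # observed data: 5 of 40 tagged fish died
post_a <- a + deaths       # posterior is Beta(a + deaths, b + survivors):
post_b <- b + (n - deaths) # no grid, SIR, or MCMC needed
curve(dbeta(x, a, b), 0, 1, lty = 2, xlab = "p", ylab = "density")  # prior
curve(dbeta(x, post_a, post_b), add = TRUE)                         # posterior
qbeta(c(0.025, 0.5, 0.975), post_a, post_b)  # posterior median, 95% interval
```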
Bayesian methods summary
- Different algorithms: grid method, SIR method, MCMC method, conjugate priors, Gibbs samplers, etc.
- All involve priors, likelihoods, and posteriors
- Natural interpretation of probability
- Allow the use of other information
- Posterior draws can be used for prediction