
1 A correction on notation (Thanks Emma and Melissa)

2 Bayes’ Theorem (probabilities) What is the posterior probability that hypothesis H_i is correct? This is the classical Bayes’ Theorem: P(H_i | data) = P(data | H_i) × P(H_i) / P(data), where P(H_i | data) is the posterior probability, P(H_i) is the prior probability, and P(data) is the probability of observing the data.

3 Bayes’ Theorem (likelihoods) The same theorem written with likelihoods: P(H_i | data) = L(H_i | data) × P(H_i) / P(data), where L(H_i | data) is the likelihood of H_i given the observed data, P(H_i) is the prior probability, and P(data) is the probability of observing the data. The likelihood of H_i given the observed data equals the probability of the data given known values of H_i, i.e. L(H_i | data) = P(data | H_i), because this is the same height on the same probability distribution. Therefore the two forms are equivalent.

4 Bayesian methods II

5 Bayes in words If you observe some data, then you can calculate the posterior probability of a particular hypothesis H_i as a function of the likelihood of that hypothesis given the observed data, combined with the prior probability you assigned to that hypothesis.

6 Application of Bayesian methods Gmail uses Bayesian spam filtering:
– Every user has unique prior probabilities that each word is a spam word
– These are calculated from your actions in either replying to a message, saving it, or marking it as spam
– E.g. for me the prior probability of spam for different words is probably something like this: 0.0001 “fisheries”, 0.0001 “fish458”, 0.0001 “blue whales”, 0.999 “ONE HUNDRED MILLION DOLLARS”, 0.1 “journal”, 0.9 “CONGRATULATION!!!”, 0.7 “to formally invite you”
– Take all the prior probabilities for me for all the words in a message, combine them with the likelihood each word is spam, and each message gets a posterior probability that the whole email is spam (a toy sketch follows this slide)
Spam that sneaks through is likely to be something like “Free blue whale cruise for university professors teaching quantitative modeling, click here!”
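A minimal sketch of this idea in Python. Gmail's real filter is proprietary; the word list, the neutral default for unseen words, and the log-odds combination below are illustrative assumptions, not the actual algorithm.

```python
import math

# Hypothetical per-word spam probabilities echoing the slide -- made-up numbers.
word_spam_prob = {
    "fisheries": 0.0001,
    "fish458": 0.0001,
    "blue whales": 0.0001,
    "journal": 0.1,
    "to formally invite you": 0.7,
    "CONGRATULATION!!!": 0.9,
    "ONE HUNDRED MILLION DOLLARS": 0.999,
}

def spam_posterior(tokens, prior_spam=0.5):
    """Combine each token's spam probability as independent evidence (naive Bayes),
    working on the log-odds scale so many small probabilities don't underflow."""
    log_odds = math.log(prior_spam / (1.0 - prior_spam))
    for t in tokens:
        p = word_spam_prob.get(t, 0.4)   # unseen tokens get a mildly ham-ish default
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))

print(spam_posterior(["CONGRATULATION!!!", "ONE HUNDRED MILLION DOLLARS"]))  # close to 1
print(spam_posterior(["fisheries", "blue whales", "journal"]))               # close to 0
```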

7 Recovering Antarctic blue whales? Largest animal ever to live. Whaling 1904–1973; catches peaked at 30,365 in 1930. Current abundance 400–2,300. Are they recovering? Branch TA (2007) Abundance of Antarctic blue whales south of 60°S from three complete circumpolar sets of surveys. Journal of Cetacean Research and Management 9:253-262

8 Why Bayesian? A prior on r allows use of outside information – What is the fastest they could increase? – How fast do other whale populations increase? What is the probability of competing hypotheses (e.g. that r > 0)?

9 Model and data Model years 1973–2005, with parameters N_1973 (abundance in 1973) and r (the exponential rate of increase). No catches (they ended in 1972). Three abundance estimates, each with known CV. Lognormal likelihood. No process error and no additional CV. Branch TA et al. (2004) Evidence for increases in Antarctic blue whales based on Bayesian modelling. Marine Mammal Science 20:726-754. Branch TA (2007) Abundance of Antarctic blue whales south of 60°S from three complete circumpolar sets of surveys. Journal of Cetacean Research and Management 9:253-262
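A minimal Python sketch of this deterministic model, assuming the exponential form described above; the function and variable names are mine, not from the spreadsheet.

```python
import numpy as np

def predict_abundance(N1973, r, years):
    """Deterministic exponential model: N_t = N1973 * exp(r * (t - 1973)).
    Catches ended in 1972 and there is no process error, so growth is smooth."""
    years = np.asarray(years, dtype=float)
    return N1973 * np.exp(r * (years - 1973.0))

# e.g. predict_abundance(178, 0.104, [1981, 1988, 1998])
```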

10 Simplifying the likelihood: e.g. the N_1981 term. Take the log, flip the sign, and remove constant terms, so each year's contribution to the NLL is (ln N̂_1981 − ln N_1981)² / (2σ²), where N̂_1981 is the survey estimate, N_1981 is the model prediction, and σ is approximately the CV. We will be running the model and calculating the likelihood thousands or even millions of times. Simplifying it really helps

11 Total NLL for all data years: sum the simplified terms over the survey years, NLL = Σ_years (ln N̂_year − ln N_year)² / (2σ_year²)
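Continuing the sketch above, a hedged version of the total NLL. The survey years, estimates, and CVs below are illustrative placeholders, not the real values in the spreadsheet; predict_abundance comes from the previous sketch.

```python
import numpy as np

# Illustrative placeholders -- the real survey years, estimates and CVs live in the
# spreadsheet and are not reproduced here.
survey_years = np.array([1981.0, 1988.0, 1998.0])
survey_obs = np.array([400.0, 600.0, 1700.0])   # hypothetical abundance estimates
survey_cv = np.array([0.4, 0.4, 0.4])           # hypothetical CVs

def total_nll(N1973, r):
    """Sum of the simplified lognormal terms (ln obs - ln pred)^2 / (2 sigma^2),
    with sigma taken as the CV and constant terms dropped."""
    pred = predict_abundance(N1973, r, survey_years)
    return np.sum((np.log(survey_obs) - np.log(pred)) ** 2 / (2.0 * survey_cv ** 2))
```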

12 Maximum likelihood estimates MLE r = 0.104, N_1973 = 178, NLL = 0.347. [Figure: model fit, abundance vs. year.] 25 Antarctic blue grid.xlsx
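One way to reproduce an MLE like this, assuming the total_nll sketch above. The starting values are arbitrary, and with the placeholder data the answer will not match the slide's r = 0.104.

```python
from scipy.optimize import minimize

# total_nll() is the sketch above; x0 is an arbitrary starting guess.
fit = minimize(lambda p: total_nll(p[0], p[1]),
               x0=[200.0, 0.05],          # (N1973, r)
               method="Nelder-Mead")
N1973_mle, r_mle = fit.x
print(N1973_mle, r_mle, fit.fun)          # slide reports N1973 = 178, r = 0.104, NLL = 0.347
```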

13 Grid method 25 Antarctic blue grid.xlsx

14 Grid method for likelihoods Each grid cell contains the likelihood for one pair of r and N_1973 values. Dividing by the highest likelihood gives the scaled likelihood for each cell. To get an approximate likelihood profile on r, find the highest scaled likelihood for each value of r. 26 Antarctic blue grid.xlsx, sheet “many cells”
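A sketch of the grid calculation, reusing total_nll from above; the step sizes are chosen to give the 301 × 200 grid mentioned on slide 26.

```python
import numpy as np

# 301 values of r and 200 values of N1973, i.e. 60,200 cells.
r_grid = np.arange(-0.10, 0.20 + 1e-9, 0.001)
N_grid = np.arange(10.0, 2000.0 + 1e-9, 10.0)

# total_nll() is the sketch above; rows index r, columns index N1973.
nll = np.array([[total_nll(N, r) for N in N_grid] for r in r_grid])
like = np.exp(-nll)

scaled = like / like.max()         # scaled likelihood: divide every cell by the highest
r_profile = scaled.max(axis=1)     # approximate profile: best scaled likelihood for each r
```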

15 Likelihoods: grid surface [Figure: scaled likelihood surface over r (-0.10 to 0.20) and N_1973 (10 to 2000).] Each cell in the grid is the scaled likelihood for (a large number of) discrete values of r and N_1973. The line is the discrete N_1973 producing the highest likelihood for each r; the likelihood profile lies on this line, and the MLE is its peak. Away from the line the scaled likelihood is close to zero. 26 Antarctic blue grid.xlsx, sheet “many cells”

16 Grid likelihood: r profile The grid gives only an approximate likelihood profile, because for each value of r we are taking the lowest NLL among a finite number of N_1973 values. As the number of N_1973 values considered increases, the approximate profile gets smoother and smoother. The true profile would, for each value of r, find the exact N_1973 that minimizes the NLL. [Figure: scaled likelihood vs. value of r.] 26 Antarctic blue grid.xlsx, sheet “many cells”

17 Grid likelihood: r profile A small number of discrete r and N_1973 values leads to a poor approximation of the r profile. [Figure: scaled likelihood vs. value of r.] 26 Antarctic blue grid.xlsx, sheet “few cells”

18 Grid method for posterior Cells contain likelihood × prior for each value of r and N_1973. Each cell is a hypothesis H_i, and the posterior probability of H_i is the value in that cell divided by the sum of all cells. To get the marginal posterior for a value of r, take the sum of its column and divide by the sum of all cells. This is integration. 26 Antarctic blue grid.xlsx, sheet “many cells”
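A sketch of turning the likelihood grid into a posterior, assuming a uniform prior so the prior constant cancels; the like array comes from the grid sketch above.

```python
import numpy as np

# "like" is the grid of likelihoods from the sketch above (rows = r, columns = N1973).
# With uniform priors the prior is the same constant in every cell, so it cancels.
posterior = like / like.sum()        # posterior probability of each (r, N1973) hypothesis
r_marginal = posterior.sum(axis=1)   # marginal posterior for r: sum each row over N1973

# A non-uniform prior would multiply each cell before normalising, e.g.
# cells = like * prior_r[:, None]; posterior = cells / cells.sum()
```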

19 Bayesian: two differences Integration instead of maximization Priors

20 Integration not maximization [Figure: grid over r (-0.10 to 0.20) and N_1973 (10 to 2000), showing a maximization line and an integration column.] Likelihood: for each value of r, search for the N_1973 with the best NLL (the maximization line). Bayesian: for each value of r, integrate (“add up”) the cells across all values of N_1973 (the integration column). Where the green/yellow (high-likelihood) area is very narrow, Bayesian integration will have a smaller summed probability compared with the maximum value used in a likelihood profile. 26 Antarctic blue grid.xlsx, sheet “many cells”

21 Likelihood profile vs. Bayesian posterior [Figure: scaled likelihood profile and posterior distribution vs. value of r.] 26 Antarctic blue grid.xlsx, sheet “many cells”

22 Normal prior on r = N[0.062, 0.029²] Punt & Allison (2010) looked at actual increase rates in depleted whale populations and found a mean of 6.2% and SD of 2.9%. Multiply the likelihood by a prior for r that is normal with mean 0.062 and SD 0.029. Dropping constants, the prior adds (r − 0.062)² / (2 × 0.029²) to the NLL. [Figure: posterior with no prior vs. normal prior, over r from -0.10 to 0.20.] Punt AE & Allison C (2010) Appendix 2. Revised outcomes from the Bayesian meta-analysis, Annex D: Report of the sub-committee on the revised management procedure. Journal of Cetacean Research and Management (Suppl. 2) 11:129-130. 26 Antarctic blue grid.xlsx, sheet “many cells”

23 Uniform r prior from -0.10 to 0.118 Zerbini et al. (2010) showed it is impossible for baleen whales to increase at more than 11.8% per year. Multiply the likelihood by the prior. The prior is 0 if r < -0.10 or r > 0.118, and a uniform constant for -0.10 ≤ r ≤ 0.118; the constant is 1/(0.118 + 0.10) ≈ 4.59. The likelihood × prior is therefore zero above r = 0.118. [Figure: grid over r and N_1973 with the MLE marked and zero likelihood above r = 0.118.] Zerbini AN et al. (2010) Assessing plausible rates of population growth in humpback whales from life-history data. Marine Biology 157:1225-1236. 26 Antarctic blue grid.xlsx, sheet “many cells”

24 Effect of different priors [Figure: posterior probability vs. value of r for the different priors.] 26 Antarctic blue grid.xlsx, sheet “compare all priors”

25 Likelihood profile vs. Bayesian posterior:
– Likelihood profile: MLE estimate 0.104, 95% confidence interval 0.038-0.170
– Bayesian posterior, uniform prior U[-0.1, 0.2]: median 0.086, 95% credible interval 0.022-0.155
– Bayesian posterior, informative prior N(0.062, 0.029²): median 0.072, 95% credible interval 0.029-0.115
The differences come from integration instead of maximization, and from the informative prior. Likelihood profile 95% confidence interval: “out of 100 experiments, 95 of the calculated intervals will contain the true fixed value of r. Any single interval either contains the value or it does not.” Bayesian 95% credible interval: “there is a 95% probability that the true value of r is within the interval.” 26 Antarctic blue grid.xlsx, sheet “many cells”

26 Grid method Two parameters: 301 × 200 = 60,200 complete model calls to get those posteriors, which took about 1 second. If there were 10 parameters we would need about 10^23 points and it would take 30 quadrillion years to calculate them all. We need a more general method. 26 Antarctic blue grid.xlsx, sheet “many cells”

27 SIR method

28 Problem with grid method You don't know how fine to make the grid steps; you really want the steps to be continuous. Instead of sampling systematically, the SIR method randomly samples the grid region. Good guesses (draws) are kept and bad draws are discarded. When enough draws have been saved from SIR to get a smooth posterior (1000 or 5000), stop. 26 Antarctic blue SIR.xlsx, sheet “Normal prior”

29 SIR: sampling-importance resampling Find the maximum likelihood × prior, Y. Randomly sample pairs of r and N_1973. For each pair, calculate X = likelihood × prior. Accept the pair with probability X/Y, otherwise reject it. Note that X/Y = exp([-ln Y] − [-ln X]) = exp(NLL(Y) − NLL(X)). The accepted pairs are the posterior. Repeat until you have sufficient accepted pairs. 26 Antarctic blue SIR.xlsx, sheet “Normal prior”
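A sketch of the basic accept/reject step, reusing total_nll and taking Y from the grid's maximum likelihood; with a non-uniform prior, X and Y would be likelihood × prior instead of likelihood alone.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = like.max()                                 # best likelihood found (from the grid sketch)

accepted = []
for _ in range(20_000):                        # the slide drew 20,000 samples
    r = rng.uniform(-0.10, 0.20)
    N1973 = rng.uniform(10.0, 2000.0)
    X = np.exp(-total_nll(N1973, r))           # likelihood (times prior, if not uniform)
    if rng.uniform() < X / Y:                  # accept the pair with probability X/Y
        accepted.append((r, N1973))
```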

30 SIR: accepted, rejected [Figure: accepted and rejected draws plotted by value of r and value of N_1973.] 26 Antarctic blue SIR.xlsx, sheet “Normal prior”

31 20,000 samples, 296 accepted r = 0.072, 95% interval = 0.027-0.112 (grid method: 0.072, 0.029-0.115); N_1973 = 320, 95% interval = 145-689. LOTS of rejected function calls (waste). Tricks are almost always employed to increase acceptance rates:
– Accept with probability X/Z where Z is smaller than Y; this accepts more draws, and some draws will be duplicated in the posterior (no time now)
– Sample parameter values from the priors and compare ratios of likelihood only (no time now)
26 Antarctic blue SIR.xlsx, sheet “Normal prior”

32 SIR threshold to increase acceptance rate Choose a threshold Z where Z < the maximum likelihood × prior Y. Randomly sample pairs of r and N_1973. For each pair, calculate X = likelihood × prior. If X ≤ Z, accept the pair with probability X/Z. If X > Z, accept multiple copies of the pair: e.g. if X/Z = 4.6 then save 4 copies with probability 0.4 or 5 copies with probability 0.6 (a sketch follows below). 26 Antarctic blue SIR.xlsx, sheet “Normal prior”
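A sketch of the threshold trick, under the same assumptions as the previous sketch; the choice Z = half the maximum is arbitrary and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = like.max()
Z = 0.5 * Y                                    # threshold below the maximum (arbitrary factor)

draws = []
for _ in range(20_000):
    r = rng.uniform(-0.10, 0.20)
    N1973 = rng.uniform(10.0, 2000.0)
    X = np.exp(-total_nll(N1973, r))
    if X <= Z:
        if rng.uniform() < X / Z:              # accept once with probability X/Z
            draws.append((r, N1973))
    else:
        copies = X / Z                          # e.g. 4.6 -> keep 4 or 5 copies
        n = int(copies)
        if rng.uniform() < copies - n:          # round up with the fractional probability
            n += 1
        draws.extend([(r, N1973)] * n)
```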

33 Accepted multiple times, accepted once, rejected [Figure: draws plotted by value of r and value of N_1973, coloured by how many times they were accepted.] 26 Antarctic blue SIR.xlsx, sheet “Normal prior”

34 Advantage of discrete samples Each saved draw is a sample from the posterior distribution. We can take these pairs of (r, N_1973) and project the model into the future for each pair. This gives us future predictions based on the joint values of the parameters, and therefore takes into account the correlations between parameter values.
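A sketch of projecting each accepted pair forward, assuming the same exponential model with no future catches; the draws list comes from the SIR sketches above.

```python
import numpy as np

def project(draws, years):
    """Project each accepted (r, N1973) pair forward with the same exponential model.
    Rows are posterior draws, columns are years; quantiles across rows give
    credible intervals for future abundance."""
    years = np.asarray(years, dtype=float)
    return np.array([N1973 * np.exp(r * (years - 1973.0)) for r, N1973 in draws])

# e.g. traj = project(draws, np.arange(2006, 2031))
# np.percentile(traj, [2.5, 50, 97.5], axis=0)   # median and 95% interval per future year
```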

