
SIR method continued

SIR: sample-importance resampling
- Find the maximum likelihood (best likelihood × prior), Y
- Randomly sample pairs of r and N_1973
- For each pair, calculate X = likelihood × prior
- Accept the pair with probability X/Y, otherwise reject
- Note that X/Y = exp([-ln Y] - [-ln X]) = exp(NLL(Y) - NLL(X))
- Accepted pairs are draws from the posterior
- Repeat until you have sufficient accepted pairs (a minimal R sketch of this loop follows)
(26 Antarctic blue SIR.xlsx, sheet "Normal prior")
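The loop above can be sketched in R. This is only a schematic, assuming a user-supplied function nll(r, N1973) that returns the negative log of likelihood × prior for the whale model (not given in the slides) and illustrative sampling ranges for r and N_1973.

```r
# Minimal SIR sketch. Assumptions: nll(r, N1973) returns -ln(likelihood x prior)
# for the (unspecified) whale model; the sampling ranges are illustrative only.
sir_sample <- function(nll, n_draws = 20000,
                       r_range = c(-0.1, 0.2), N_range = c(100, 2000)) {
  # Step 1: approximate the best (smallest) negative log likelihood x prior, -ln Y
  grid_r <- runif(5000, r_range[1], r_range[2])
  grid_N <- runif(5000, N_range[1], N_range[2])
  nll_Y  <- min(mapply(nll, grid_r, grid_N))   # crude stand-in for a proper optimiser

  accepted <- data.frame(r = numeric(0), N1973 = numeric(0))
  for (i in 1:n_draws) {
    # Step 2: randomly sample a pair (r, N1973)
    r <- runif(1, r_range[1], r_range[2])
    N <- runif(1, N_range[1], N_range[2])
    nll_X <- nll(r, N)
    # Step 3: accept with probability X/Y = exp(NLL(Y) - NLL(X))
    if (runif(1) < exp(nll_Y - nll_X)) {
      accepted <- rbind(accepted, data.frame(r = r, N1973 = N))
    }
  }
  accepted   # accepted pairs are draws from the posterior
}
```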

SIR: accepted and rejected draws (scatter plot; x-axis: value of r, y-axis: value of N_1973). 26 Antarctic blue SIR.xlsx, sheet "Normal prior"

- 20,000 samples, 296 accepted
- r = 0.072, 95% interval = – (grid method: 0.072); N_1973 = 320, 95% interval = –
- LOTS of rejected function calls (waste)
- Tricks are almost always employed to increase the acceptance rate:
  - Accept with probability X/Z, where Z is smaller than Y; this accepts more draws, and some draws will be duplicated in the posterior (no time now)
  - Sample parameter values from the priors and compare ratios of the likelihood only (no time now)
(26 Antarctic blue SIR.xlsx, sheet "Normal prior")

SIR threshold to increase the acceptance rate
- Choose a threshold Z where Z < maximum likelihood Y
- Randomly sample pairs of r and N_1973
- For each pair, calculate X = likelihood × prior
- If X ≤ Z, accept the pair with probability X/Z
- If X > Z, accept multiple copies of the pair
- E.g. if X/Z = 4.6, save 4 copies with probability 0.4 or 5 copies with probability 0.6 (a sketch of this variant follows)
(26 Antarctic blue SIR.xlsx, sheet "Normal prior")
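A sketch of the thresholded variant, under the same assumptions as the SIR sketch above (user-supplied nll() and illustrative ranges); nll_Z is the chosen threshold on the negative-log scale, so Z < Y corresponds to nll_Z > nll_Y.

```r
# Thresholded SIR sketch. Assumptions as above: nll(r, N1973) returns
# -ln(likelihood x prior); nll_Z is the threshold Z on the negative-log scale.
sir_threshold <- function(nll, nll_Z, n_draws = 20000,
                          r_range = c(-0.1, 0.2), N_range = c(100, 2000)) {
  out <- list()
  for (i in 1:n_draws) {
    r <- runif(1, r_range[1], r_range[2])
    N <- runif(1, N_range[1], N_range[2])
    ratio <- exp(nll_Z - nll(r, N))           # X/Z on the natural scale
    if (ratio <= 1) {
      # X <= Z: accept the pair with probability X/Z
      n_copies <- as.integer(runif(1) < ratio)
    } else {
      # X > Z: e.g. X/Z = 4.6 -> 4 copies with probability 0.4, 5 with probability 0.6
      n_copies <- floor(ratio) + as.integer(runif(1) < (ratio - floor(ratio)))
    }
    if (n_copies > 0)
      out[[length(out) + 1]] <- data.frame(r = r, N1973 = N)[rep(1, n_copies), ]
  }
  do.call(rbind, out)                         # posterior draws, some duplicated
}
```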

Accepted multiple times, accepted once, rejected (scatter plot; x-axis: value of r, y-axis: value of N_1973). 26 Antarctic blue SIR.xlsx, sheet "Normal prior"

Advantage of discrete samples
- Each draw that is saved is a sample from the posterior distribution
- We can take these pairs of (r, N_1973) and project the model into the future for each pair (see the sketch below)
- This gives us future predictions for the joint values of the parameters
- It takes into account correlations between parameter values (imagine a model with 20 parameters)
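A sketch of projecting each saved pair forward. The slides do not state the projection model, so an exponential-growth model N_t = N_1973 × exp(r × (t - 1973)) is assumed here purely for illustration.

```r
# Projection sketch. Assumption: an exponential-growth model
# N_t = N1973 * exp(r * (t - 1973)); the actual projection model is not given
# in the slides. 'posterior' is a data frame of (r, N1973) draws, e.g. the
# output of the SIR sketch above.
project <- function(posterior, years = 1973:2030) {
  sapply(years, function(t) posterior$N1973 * exp(posterior$r * (t - 1973)))
}
# Each row is one draw's trajectory; column-wise quantiles give prediction
# intervals that respect the correlation between r and N1973, e.g.:
# traj <- project(posterior_draws)
# quantile(traj[, ncol(traj)], c(0.025, 0.5, 0.975))
```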

MCMC method: Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC)
- Start somewhere
- Randomly jump somewhere else
- If you found a better place, go there
- If you found a worse place, go there with some probability
- There are formal proofs that this works

MCMC algorithm I
- Start anywhere, with values r_1, N1973_1, and X_1 = likelihood × prior
- Jump function: add random numbers to r_1 and N1973_1 to get a candidate draw r*, N1973*, and X* = likelihood × prior
- Calculate X*/X_1, which equals exp([-ln X_1] - [-ln X*])
- If a random number U[0,1] is < X*/X_1, then r_2 = r*, N1973_2 = N1973*, X_2 = X* [accept draw]
- If the random number U[0,1] is ≥ X*/X_1, then r_2 = r_1, N1973_2 = N1973_1, X_2 = X_1 [reject draw]
(27 Antarctic blue MCMC.xlsx; a minimal R sketch of this algorithm follows)
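A minimal sketch of the algorithm above, again assuming a user-supplied nll(r, N1973) that returns -ln(likelihood × prior); the starting values and jump sizes are illustrative, not taken from the spreadsheet.

```r
# Metropolis-style MCMC sketch. Assumptions: nll(r, N1973) returns
# -ln(likelihood x prior); starting values and jump sizes are illustrative.
mcmc_sample <- function(nll, n_iter = 10000, r_start = 0.05, N_start = 500,
                        r_jump = 0.01, N_jump = 50) {
  r <- numeric(n_iter); N <- numeric(n_iter)
  r[1] <- r_start; N[1] <- N_start
  nll_cur <- nll(r[1], N[1])
  for (i in 2:n_iter) {
    # Jump function: add random numbers to the current values
    r_star <- r[i - 1] + rnorm(1, 0, r_jump)
    N_star <- N[i - 1] + rnorm(1, 0, N_jump)
    nll_star <- nll(r_star, N_star)
    # Accept with probability X*/X = exp([-ln X] - [-ln X*])
    if (runif(1) < exp(nll_cur - nll_star)) {
      r[i] <- r_star; N[i] <- N_star; nll_cur <- nll_star   # accept draw
    } else {
      r[i] <- r[i - 1]; N[i] <- N[i - 1]                    # reject draw
    }
  }
  data.frame(r = r, N1973 = N)
}
```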

MCMC algorithm II
- Successive points wander around the posterior
- If you start far away, it will take some time to get near the highest likelihood
- Therefore, discard the first 20% of accepted draws (the burn-in period)
- Thin the chain by retaining only one in every n accepted draws
- Convergence is attained when there is no autocorrelation in the thinned chain (there are other tests for convergence); a short post-processing sketch follows
(27 Antarctic blue MCMC.xlsx)
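A short sketch of these post-processing steps, assuming 'chain' is the data frame returned by the MCMC sketch above; the burn-in fraction matches the slide and the thinning interval is an illustrative choice.

```r
# Burn-in, thinning, and a simple convergence check. Assumption: 'chain' is the
# data frame returned by the MCMC sketch above.
post_process <- function(chain, burn_frac = 0.2, thin = 10) {
  n_burn  <- floor(burn_frac * nrow(chain))
  kept    <- chain[(n_burn + 1):nrow(chain), ]          # discard the burn-in period
  thinned <- kept[seq(1, nrow(kept), by = thin), ]      # retain one in every 'thin' draws
  # Low lag-1 autocorrelation in the thinned chain is one (crude) sign of convergence
  print(acf(thinned$r, lag.max = 1, plot = FALSE))
  thinned
}
```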

Trace plots: trace for N_1973, trace for r, and r vs. N_1973, shown for draws 1–500 and draws 2,000–10,000. 27 Antarctic blue MCMC.xlsx

- 10,000 samples, 2,669 accepted
- r = 0.074, 95% interval = – (grid method: 0.072); N_1973 = 302, 95% interval = –
- Tuning options: increase the length of the chain, change the jump size, change the thinning rate, change the burn-in period, etc.
(27 Antarctic blue MCMC.xlsx)

MCMC (10,000 samples) vs. SIR (20,000 samples): accepted and rejected draws
- MCMC does not explore the parameter space where the likelihood is low
- Therefore many more draws are accepted
(27 Accepted rejected comparison.xlsx)

Answer to the original question
- Are Antarctic blue whales increasing?
- Using the informative prior and MCMC: zero posterior draws out of 8,000 with r < 0. Answer: yes, they are increasing (P ≈ )
- Using a uniform prior U[-0.1, 0.2] and MCMC: 2 out of 8,000 draws with r < 0. Answer: yes, they are increasing (P = )
- The choice of prior does not really matter
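This kind of answer comes directly from counting posterior draws; here is a one-function sketch, assuming 'posterior' is a data frame of draws such as those returned by the MCMC sketch above.

```r
# Sketch: probability the population is increasing, computed from posterior draws.
# Assumption: 'posterior' is a data frame of draws with a column 'r'.
prob_increasing <- function(posterior) mean(posterior$r > 0)
# e.g. 2 draws with r < 0 out of 8,000 gives 1 - 2/8000 = 0.99975
```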

Conjugate prior method

Beta-binomial demo in R
- Sometimes we don't need to go through the numerical methods (grid, SIR, MCMC)
- Demo in R: "27 Beta binomial Bayesian.R"
- If the prior is a particular distribution (beta) and the likelihood is a particular distribution (binomial), then the posterior will also be a beta distribution
- Priors with this property are called conjugate priors
(27 Beta binomial Bayesian.r)

Example
- Tag some fish and hold them for 1 month: what fraction p die?
- Data: number of deaths, number of survivors
- Prior on p (chosen to be a beta distribution)
- Likelihood of observing the data given a value of p (binomial distribution)
- The posterior is a beta distribution, with parameters that are a function of the parameters of the prior and the likelihood (see the sketch below)
(27 Beta binomial Bayesian.r)
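A minimal sketch of the conjugate update for this example; the prior parameters and the death/survival counts below are hypothetical illustrations, not the values used in "27 Beta binomial Bayesian.R".

```r
# Beta-binomial conjugate update. The prior parameters and data below are
# hypothetical illustrations, not the values from the course script.
a <- 2; b <- 8                  # prior on p: Beta(2, 8)
deaths <- 12; survivors <- 88   # hypothetical tagging data
# Conjugacy: Beta(a, b) prior + binomial likelihood -> Beta(a + deaths, b + survivors)
a_post <- a + deaths
b_post <- b + survivors
qbeta(c(0.025, 0.5, 0.975), a_post, b_post)   # posterior median and 95% interval for p
curve(dbeta(x, a_post, b_post), from = 0, to = 1,
      xlab = "p (fraction dying)", ylab = "Posterior density")
```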

Bayesian methods summary
- Different algorithms: grid method, SIR method, MCMC method, conjugate priors, Gibbs samplers, etc.
- All involve priors, likelihoods, and posteriors
- Natural interpretation of probability
- Allow the use of other information
- Posterior draws can be used for prediction