Use of Monte-Carlo particle filters to fit and compare models for the dynamics of wild animal populations Len Thomas Newton Inst., 21st Nov 2006 "I always wanted to be a model…"

Outline  1. Introduction  2. Basic particle filtering  3. Tricks to make it work in practice  4. Applications –(i) PF, Obs error fixed –(ii) PF vs KF, One colony model –(iii) PF vs MCMC  5. Discussion

References  Our work:

Joint work with…  Methods and framework: –Ken Newman, Steve Buckland: NCSE St Andrews  Seal models: –John Harwood, Jason Matthiopoulos: NCSE & Sea Mammal Research Unit –Many others at SMRU  Comparison with Kalman filter: –Takis Besbeas, Byron Morgan: NCSE Kent  Comparison with MCMC –Carmen Fernández: Univ. Lancaster

1. Introduction

Answering questions about wildlife systems  How many?  Population trends  Vital rates  What if? –scenario planning –risk assessment –decision support  Survey design –adaptive management

State space model  State process density: g_t(n_t | n_t-1; Θ)  Observation process density: f_t(y_t | n_t; Θ)  Initial state density: g_0(n_0; Θ)  Bayesian approach, so:  Priors on Θ  Initial state density + state process density gives prior on n_1:T

British grey seal  Population in recovery from historical exploitation  NERC  Special Committee on Seals

Data  Aerial surveys of breeding colonies since 1960s count pups  Other data: intensive studies, radio tracking, genetic, counts at haul-outs

Pup production estimates

Orkney example colonies

State process model  Life cycle graph representation (figure; density dependence can act at different points in the life cycle, e.g. on pups)

Density dependence  e.g. in pup survival, with carrying capacity χ_r (figure)

More flexible models of density dependence

State process model  4 regions, each with its own pup class: North Sea, Inner Hebrides, Outer Hebrides, Orkneys  Movement depends on distance, density dependence and site faithfulness (figure)

SSMs of wildlife population dynamics: Summary of Features  State vector is high dimensional (seal model: 7 x 4 x 22 = 616).  Observations are only available on a subset of these states (seal model: 1 x 4 x 22 = 88).  State process density is a convolution of sub-processes, so it is hard to evaluate.  Parameter vector is often quite large (seal model: 11-12).  Parameters are often partially confounded, and some are poorly informed by the data.

Fitting state-space models  Analytic approaches –Kalman filter (Gaussian linear model; Besbeas et al.) –Extended Kalman filter (Gaussian nonlinear model – approximate) + other KF variations –Numerical maximization of the likelihood  Monte Carlo approximations –Likelihood-based (Geyer; de Valpine) –Bayesian  Rejection sampling (Damien Clancy)  Markov chain Monte Carlo (MCMC; Bob O'Hara, Ruth King)  Sequential Importance Sampling (SIS), a.k.a. Monte Carlo particle filtering

Inference tasks for time series data  Observe data y_1:t = (y_1, ..., y_t)  We wish to infer the unobserved states n_1:t = (n_1, ..., n_t) and parameters Θ  Fundamental inference tasks: –Smoothing p(n_1:t, Θ | y_1:t) –Filtering p(n_t, Θ_t | y_1:t) –Prediction p(n_t+x | y_1:t), x > 0

Filtering  Filtering forms the basis for the other inference tasks  Filtering is easier than smoothing (and can be very fast) –Filtering recursion: divide and conquor approach that considers each new data point one at a time p(n 0 ) p(n 1 |y 1 ) Only need to integrate over n t, not n 1:t p(n 2 |y 1:2 ) y1y1 y2y2 p(n 3 |y 1:3 ) y3y3 p(n 4 |y 1:4 ) y4y4

Monte-Carlo particle filters: online inference for evolving datasets  Particle filtering used when fast online methods required to produce updated (filtered) estimates as new data arrives: –Tracking applications in radar, sonar, etc. –Finance  Stock prices, exchange rates arrive sequentially. Online update of portfolios. –Medical monitoring  Online monitoring of ECG data for sick patients –Digital communications –Speech recognition and processing

2. Monte Carlo Particle Filtering Variants/Synonyms: Sequential Monte Carlo methods Sequential Importance Sampling (SIS) Sampling Importance Sampling Resampling (SISR) Bootstrap Filter Interacting Particle Filter Auxiliary Particle Filter

Importance sampling  Want to make inferences about some function p(), but cannot evaluate it directly  Solution: –Sample from another function q() (the importance function) that has the same support as p() (or wider support) –Correct using importance weights

Example: (figure)

Importance sampling algorithm  Given p(n_t | y_1:t) and y_t+1, want to update to p(n_t+1 | y_1:t+1)  Prediction step: make K random draws (i.e., simulate K "particles") from the importance function  Correction step: calculate the importance weights, w_i ∝ p()/q()  Normalize the weights so that they sum to 1  Approximate the target density by the weighted particles
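To make this concrete, here is a minimal one-step sketch of importance sampling in Python (not from the talk; the standard-normal target, the Gaussian proposal and all constants are invented for illustration): draw particles from a wider proposal q() and reweight by p()/q().

```python
import numpy as np

rng = np.random.default_rng(42)

# Target p(): a standard normal, known here only up to a constant.
def p_unnorm(x):
    return np.exp(-0.5 * x ** 2)

# Proposal q(): a wider normal that is easy to sample from and
# covers the support of p().
q_sd = 2.0
def q_pdf(x):
    return np.exp(-0.5 * (x / q_sd) ** 2) / (q_sd * np.sqrt(2 * np.pi))

# Prediction: draw K particles from q.  Correction: weight by p/q.
K = 100_000
x = rng.normal(0.0, q_sd, size=K)
w = p_unnorm(x) / q_pdf(x)
w = w / w.sum()                    # normalise so the weights sum to 1

# The weighted particles approximate expectations under p().
mean_est = np.sum(w * x)           # target mean is 0
var_est = np.sum(w * x ** 2)       # target variance is 1
```

The closer q() is to the target, the more uniform the normalised weights stay; a poor proposal leaves only a few particles with appreciable weight.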

Importance sampling: take home message  The key to successful importance sampling is finding a proposal q() that: –we can generate random values from –has weights p()/q() that can be evaluated  The key to efficient importance sampling is finding a proposal q() that: –we can easily/quickly generate random values from –has weights p()/q() that can be evaluated easily/quickly –is close to the target distribution

Sequential importance sampling  SIS is just repeated application of importance sampling at each time step  Basic sequential importance sampling: –Proposal distribution q() = g(n_t+1 | n_t) –Leads to weights w_t+1 ∝ w_t · f(y_t+1 | n_t+1)  To do basic SIS, need to be able to: –Simulate forward from the state process –Evaluate the observation process density (the likelihood)

Basic SIS algorithm  Generate K "particles" from the prior on {n_0, Θ}, each with weight 1/K  For each time period t = 1,...,T –For each particle i = 1,...,K  Prediction step: simulate the particle forward through the state process g(n_t+1 | n_t)  Correction step: multiply the particle's weight by the observation density f(y_t+1 | n_t+1)

Justification of weights

Example of basic SIS  State-space model of exponential population growth –State model –Observation model –Priors

Example of basic SIS, t=1  Sample (n_0, Θ) from the prior, with weights w  Prediction step gives the prior at t=1: (n_1, Θ, w)  Correction step, using the observation density f(), gives the posterior at t=1: (n_1, Θ, w_1)  (Particle table elided)

Example of basic SIS, t=2  Obs: 14  Prediction step from the posterior at t=1 gives the prior at t=2: (n_2, Θ, w_1)  Correction step, using f(y_2 = 14 | n_2), gives the posterior at t=2: (n_2, Θ, w_2)  (Particle table elided)
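The basic SIS algorithm can be sketched in code for an exponential-growth model like the one above. The slide's exact densities and priors are not shown, so the lognormal state noise, Poisson observation model and all constants below are assumptions for illustration; the proposal is the state process itself, so each particle's weight simply accumulates the observation density.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model (illustrative, not the talk's exact specification):
#   state: n_t = n_{t-1} * exp(r + eps_t),  eps_t ~ N(0, sig^2)
#   obs:   y_t ~ Poisson(n_t)
T, true_r, sig = 20, 0.10, 0.02
n_true = 50.0
y = np.empty(T, dtype=int)
for t in range(T):
    n_true *= np.exp(true_r + rng.normal(0.0, sig))
    y[t] = rng.poisson(n_true)

# Basic SIS: proposal = state process, so weights accumulate the likelihood.
K = 20_000
r = rng.normal(0.10, 0.05, size=K)               # assumed prior on growth rate
n = rng.lognormal(np.log(50.0), 0.3, size=K)     # assumed prior on initial state
logw = np.zeros(K)

for t in range(T):
    n = n * np.exp(r + rng.normal(0.0, sig, size=K))  # prediction step
    logw += y[t] * np.log(n) - n                      # correction: Poisson log-lik
                                                      # (constant y! term dropped)
w = np.exp(logw - logw.max())
w /= w.sum()

r_post = np.sum(w * r)        # filtered posterior mean of the growth rate
ess = 1.0 / np.sum(w ** 2)    # effective sample size (see the depletion slide)
```

Running this shows the depletion problem directly: after 20 time steps the effective sample size is a small fraction of K, which motivates the "tricks" in the next section.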

Problem: particle depletion  Variance of the weights increases with time, until a few particles have almost all the weight  Results in large Monte Carlo error in the approximation  Can quantify via the effective sample size: ESS = 1 / Σ_i w_i²  From the previous example: ESS at times 0, 1, 2 (values elided)
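The effective sample size is easy to compute from the normalised weights; a small helper (illustrative, not from the talk):

```python
import numpy as np

def ess(w):
    """Effective sample size of a set of importance weights:
    ESS = 1 / sum(w_i^2) after normalisation.  Equals K for uniform
    weights and approaches 1 when one particle carries all the weight."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

print(ess([0.25, 0.25, 0.25, 0.25]))   # uniform weights: ESS = 4
print(ess([0.97, 0.01, 0.01, 0.01]))   # depleted: ESS close to 1
```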

Problem: particle depletion  Worse when: –Observation error is small –Lots of data at any one time point –State process has little stochasticity –Priors are diffuse or not congruent with observations –State process model incorrect (e.g., time varying) –Outliers in the data

Some intuition  In a (basic) PF, we simulate particles from the prior, and gradually focus in on the full posterior by filtering the particles using data from one time period at a time  Analogies with MCMC: –In MCMC, we take correlated samples from the posterior. We make proposals that are accepted stochastically.  Problem is to find a "good" proposal  Limitation is time – has the sampler converged yet? –In PF, we get an importance sample from the posterior. We generate particles from a proposal, and these are assigned weights (and other stuff – see later).  Problem is to find a "good" proposal  Limitation is memory – do we have enough particles?  So, for each "trick" in MCMC, there is probably an analogous "trick" in PF (and vice versa)

3. Particle filtering “tricks” An advanced randomization technique

Tricks: solutions to the problem of particle depletion  Pruning: throw out "bad" particles (rejection)  Enrichment: boost "good" particles (resampling) –Directed enrichment (auxiliary particle filter) –Mutation (kernel smoothing)  Other stuff –Better proposals –Better resampling schemes –…

Rejection control  Idea: throw out particles with low weights  Basic algorithm, at time t: –Have a pre-determined threshold c_t, where 0 < c_t ≤ 1 –For i = 1, …, K, accept particle i with probability min(1, w_i/c_t) –If particle i is accepted, update its weight to max(w_i, c_t) –Now we have fewer than K samples  Can make up samples by sampling from the priors, projecting forward to the current time point and repeating the rejection control

Rejection control - discussion  Particularly useful at t=1 with diffuse priors  Can have a sequence of control points (not necessarily every year)  Check points don't need to be fixed – can trigger when variance of weights gets too high  Thresholds, c_t, don't need to be set in advance but can be set adaptively (e.g., mean of weights)  Instead of restarting at time t=0, can restart by sampling from particles at previous check point (= partial rejection control)
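A sketch of one rejection-control step, assuming the standard accept-with-probability min(1, w/c) rule and weight update max(w, c); the toy weights and the adaptive mean-of-weights threshold are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def rejection_control(particles, w, c, rng):
    """Accept particle i with probability min(1, w_i / c); survivors get
    weight max(w_i, c), which keeps weighted expectations unbiased."""
    accept = rng.random(len(w)) < np.minimum(1.0, w / c)
    return particles[accept], np.maximum(w[accept], c)

# Toy particle set with exponential weights, so most particles are poor.
K = 10_000
particles = rng.normal(size=K)
w = rng.exponential(1.0, size=K)

c = w.mean()                          # adaptive threshold (mean of weights)
kept, new_w = rejection_control(particles, w, c, rng)
# Low-weight particles are thinned out; every survivor has weight >= c.
```

As the slide notes, the removed particles would then be replaced by fresh draws from the priors, projected forward to the current time point.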

Resampling: pruning and enrichment  Idea: allow “good” particles to amplify themselves while killing off “bad” particles  Algorithm. Before and/or after each time step (not necessarily every time step) –For j = 1, …, K  Sample independently from the set of particles according to the probabilities  Assign new weights  Reduces particle depletion of states as “children” particles with the same “parent” now evolve independently

Resample probabilities  Should be related to the weights, e.g. proportional to w_i (as in the bootstrap filter) or to w_i^α –α could vary according to the variance of the weights –α = ½ has been suggested  Or related to the "future trend" – as in the auxiliary particle filter
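Multinomial resampling with probabilities equal to the weights (the bootstrap-filter choice) can be sketched as follows; the toy particle set is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def resample(particles, w, rng):
    """Multinomial resampling: draw K children in proportion to the
    weights, then reset every weight to 1/K."""
    K = len(w)
    idx = rng.choice(K, size=K, p=w / w.sum())
    return particles[idx], np.full(K, 1.0 / K)

# Toy example: one particle carries 90% of the weight.
K = 1_000
particles = np.arange(K, dtype=float)
w = np.full(K, 0.1 / (K - 1))
w[-1] = 0.9

children, new_w = resample(particles, w, rng)
frac_heavy = np.mean(children == K - 1)   # roughly 0.9: the "good" particle
                                          # is amplified, "bad" ones pruned
```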

Directed resampling: auxiliary particle filter  Idea: pre-select particles likely to have high weights in the future  Example algorithm: –For j = 1, …, K  Sample independently from the set of particles according to probabilities based on each particle's projected value (obtained by projecting forward deterministically)  Predict  Correct  If "future" observations are available, can extend to look >1 time step ahead – e.g., protein folding application
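One step of an auxiliary particle filter can be sketched for the same assumed exponential-growth/Poisson toy model used earlier; all constants are illustrative. Particles are pre-selected using weights evaluated at a deterministic forward projection, then corrected after stochastic propagation.

```python
import numpy as np

rng = np.random.default_rng(11)

# Assumed toy model: n_t+1 = n_t * exp(r + eps), y ~ Poisson(n).
K, r, sig = 5_000, 0.10, 0.05
n = rng.lognormal(np.log(50.0), 0.2, size=K)     # particles at time t
w = np.full(K, 1.0 / K)
y_next = 60                                       # incoming observation

def pois_loglik(y, mean):
    return y * np.log(mean) - mean                # constant y! term dropped

# 1. Project each particle forward deterministically (no process noise).
mu = n * np.exp(r)
# 2. First-stage weights: how well each projection would fit y_next.
first = np.log(w) + pois_loglik(y_next, mu)
first = np.exp(first - first.max()); first /= first.sum()
# 3. Pre-select parents by resampling on the first-stage weights.
idx = rng.choice(K, size=K, p=first)
# 4. Propagate the chosen parents through the stochastic state process.
n_new = n[idx] * np.exp(r + rng.normal(0.0, sig, size=K))
# 5. Second-stage weights correct for the pre-selection.
logw = pois_loglik(y_next, n_new) - pois_loglik(y_next, mu[idx])
w_new = np.exp(logw - logw.max()); w_new /= w_new.sum()

post_mean = np.sum(w_new * n_new)    # filtered estimate of n at t+1
```

Because parents are chosen with the observation already in view, the second-stage weights stay much flatter than basic SIS weights would.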

Kernel smoothing: enrichment of parameters through mutation  Idea: introduce small "mutations" into parameter values when resampling  Algorithm: –Given particles with parameter values Θ –Let V_t be the variance matrix of the parameter particles –For i = 1, …, K  Sample a perturbed value from a kernel centred on the particle, where h controls the size of the perturbations –Variance of parameters is now (1+h²)V_t, so need shrinkage to preserve the first 2 moments
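A sketch of the kernel-smoothing step with shrinkage of the kind described above (in the style of Liu and West), which preserves the first two weighted moments; the particle set and h are invented for illustration, and a single scalar parameter is used in place of the variance matrix V_t.

```python
import numpy as np

rng = np.random.default_rng(5)

def kernel_smooth(theta, w, h, rng):
    """Jitter parameter particles with a Gaussian kernel, shrinking each
    particle towards the weighted mean so that the first two weighted
    moments are preserved (variance stays v, not (1+h^2)v)."""
    m = np.sum(w * theta)                        # weighted mean
    v = np.sum(w * (theta - m) ** 2)             # weighted variance
    a = np.sqrt(1.0 - h ** 2)                    # shrinkage factor
    loc = a * theta + (1.0 - a) * m              # shrink towards the mean
    return loc + rng.normal(0.0, h * np.sqrt(v), size=len(theta))

K = 100_000
theta = rng.normal(2.0, 0.5, size=K)             # toy parameter particles
w = np.full(K, 1.0 / K)

smoothed = kernel_smooth(theta, w, h=0.1, rng=rng)
# Mean and variance are (approximately) unchanged, but the particles are
# now distinct, so resampling no longer produces exact duplicates.
```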

Kernel smoothing - discussion  Previous algorithm does not preserve the relationship between parameters and states –Leads to poor smoothing inference –Possibly unreliable filtered inference? –Pragmatically – use as small a value of h as possible  Extensions: –Kernel smooth states as well as parameters –Local kernel smoothing

Other "tricks"  Reducing dimension: –Rao-Blackwellization – integrating out some part of the model  Better proposals: –Start with an importance sample (rather than from the priors) –Conditional proposals  Better resampling: –Residual resampling –Stratified resampling  Alternative "mutation" algorithms: –MCMC within PF  Gradual focusing on the posterior: –Tempering/annealing  …

4. Applications

(i) Faray example  Motivation: Comparison with Kalman Filter (KF) via Integrated Population Modelling methods of Besbeas et al.

Example State Process Model: density-dependent emigration  (figure: life-cycle graph with density-dependent pup emigration)  τ fixed at 1991

Observation Process Model  Ψ = CV of observations

Priors  Parameters: –Informative priors on survival rates from intensive studies (mark-recapture) –Informative priors on fecundity, carrying capacity and observation CV from expert opinion  Initial values for states in 1984: –For pups, assume –For other ages:  Stable age prior  More diffuse prior

Fitting the Faray data  One colony: relatively low dimension problem  So few “tricks” required –Pruning (rejection control) in first time period –Multiple runs of sampler until required accuracy reached (note – ideal for parallelization) –Pruning of final results (to reduce number of particles stored)

Results – Smoothed states (figure: KF result vs. SIS result, using the more diffuse prior)

Posterior parameter estimates (table: prior, posterior median, and median ML estimate from the KF for parameters φ_a, φ_p, α, ψ, β; sensitivity to priors assessed by the method of Millar, 2004)

Results – SIS, stable age prior (figure: KF result vs. SIS result)

(ii) Extension to regional model  4 regions, each with its own pup class: North Sea, Inner Hebrides, Outer Hebrides, Orkneys  Density-dependent juvenile survival  Movement depends on distance, density dependence and site faithfulness (figure)

Fitting the regional data  Higher dimensional problem (7x4xN.years states; 11 parameters)  More “tricks” required for an efficient sampler –Pruning (rejection control) in first time period –Multiple runs with rejection control of final results –Directed enrichment (auxiliary particle filter with kernel smoothing of parameters)

Estimated pup production

Posterior parameter estimates

Predicted adults

(iii) Comparison with MCMC  Motivation: –Which is more efficient? –Which is more general? –Do the “tricks” used in SIS cause bias?  Example applications: –Simulated data for Coho salmon –Grey seal data – 4 region model with movement and density dependent pup survival

Summary of findings  To be efficient, the MCMC sampler was not at all general  We also used an additional “trick” in SIS: integrating out the observation CV parameter. SIS algorithm still quite general however.  MCMC was more efficient (lower MC variation per unit CPU time)  SIS algorithm was less efficient, but was not significantly biased

Update: Kernel smoothing bias (figure: comparison of two kernel-smoothing discount values; one value elided, the other 0.997)

5. Discussion (cartoon: "Can't we discuss this?" / "I'll make you fit into my model!!!")

Modelling framework  State-space framework –Can explicitly incorporate knowledge of biology into state process models –Explicitly model sources of uncertainty in the system –Bring together diverse sources of information  Bayesian approach –Expert knowledge frequently useful since data is often uninformative –(In theory) can fit models of arbitrary complexity

SIS vs KF  Like SIS, use of KF and extensions is still an active research topic  KF is certainly faster – but is it accurate and flexible enough?  May be complementary: –KF could be used for initial model investigation/selection –KF could provide a starting importance sample for a particle filter

SIS vs MCMC  SIS: –In other fields, widely used for “on-line” problems – where the emphasis is on fast filtered estimates  foot and mouth outbreak?  N. American West coast salmon harvest openings? –Can the general algorithms be made more efficient?  MCMC: –Better for “off-line” problems? – plenty of time to develop and run highly customized, efficient samplers –Are general, efficient samplers possible for this class of problems?  Current disadvantages of SIS: –Methods less well developed than for MCMC? –No general software (no WinBUGS equivalent – “WinSIS”)

Current / future research  SIS: –Efficient general algorithms (and software) –Comparison with MCMC and Kalman filter –Parallelization –Model selection and multi-model inference –Diagnostics  Wildlife population models: –Other seal models (random effects, covariates, colony- level analysis, more data…) –Other applications (salmon, sika deer, Canadian seals, killer whales, …)

! Just another particle…

Inference from different models  ¹Assuming N adult males is 0.73 × N adult females

Model selection

Effect of independent estimate of total population size  DDS & DDF models  Assumes the independent estimate is normally distributed with 15% CV. Calculations based on data from