Recent Advances in Statistical Ecology using Computationally Intensive Methods Ruth King

Overview
- Introduction to wildlife populations and identification of questions of interest.
- Motivating example.
- Issues to be addressed: missing data; model discrimination.
- Summary.
- Future research.

Wildlife Populations
- In recent years there has been increasing interest in wildlife populations.
- Often we may be interested in population changes over time, e.g. whether the population is declining. (Steve Buckland)
- Alternatively, we may be interested in the underlying dynamics of the system, in order to obtain a better understanding of the population.
- We shall concentrate on this latter problem, with particular focus on identifying factors that affect demographic rates.

Data Collection
- Data are often collected via some form of capture-recapture study.
- Observers go out into the field and record all animals that are seen at a series of capture events.
- Animals may be recorded via resightings or recaptures (of live animals) and recoveries (of dead animals).
- At each capture event all unmarked animals are uniquely marked; all observed animals are recorded and subsequently released back into the population.

Data
- Each animal is uniquely identifiable, so our data consist of the capture histories of each individual observed in the study.
- A typical capture history might look like 0 1 1 0 0 1 2, where 0/1 indicates that the individual was unobserved/observed at that capture time, and 2 denotes that the individual was recovered dead.
- We can then explicitly write down the corresponding likelihood as a function of the survival ($\phi$), recapture ($p$) and recovery ($\lambda$) rates.

Likelihood
- The likelihood is the product over all individuals of the probability of their corresponding capture history, conditional on their initial capture.
- For example, for an individual with history 0 1 1 0 0 1 2, the contribution to the likelihood is
  $\phi_2\, p_3\, \phi_3 (1 - p_4)\, \phi_4 (1 - p_5)\, \phi_5\, p_6 (1 - \phi_6)\, \lambda_7$.
- Then, we can use this likelihood to estimate the parameter values (either MLEs or posterior distributions).
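
To make this concrete, here is a minimal sketch (not code from the talk) of the per-history likelihood contribution; the names history_contribution, h, phi, p and lam are hypothetical, chosen to mirror the slide's φ, p and λ.

```python
def history_contribution(h, phi, p, lam):
    """Likelihood contribution of one capture history, conditional on first
    capture.  h[t] = 0 (unseen), 1 (seen alive) or 2 (recovered dead) at
    occasion t = 1..T; index 0 of every list is padding so that subscripts
    match the slide.  Caveat: trailing zeros are treated as 'alive but
    missed'; a full MRR likelihood adds the probability of never being seen
    again after the final sighting."""
    T = len(h) - 1
    first = next(t for t in range(1, T + 1) if h[t] == 1)
    L = 1.0
    for t in range(first + 1, T + 1):
        if h[t] == 2:                           # died in (t-1, t], recovered
            return L * (1 - phi[t - 1]) * lam[t]
        L *= phi[t - 1]                         # survived interval (t-1, t]
        L *= p[t] if h[t] == 1 else (1 - p[t])  # seen / missed at occasion t
    return L

# Slide example, history 0 1 1 0 0 1 2:
# phi2*p3 * phi3*(1-p4) * phi4*(1-p5) * phi5*p6 * (1-phi6)*lam7
h   = [None, 0, 1, 1, 0, 0, 1, 2]
phi = [None] + [0.8] * 7   # illustrative constant rates
p   = [None] + [0.5] * 7
lam = [None] + [0.3] * 7
print(history_contribution(h, phi, p, lam))
```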

Covariates
- Covariates are often used to explain temporal heterogeneity within the parameters.
- Typically these are “environmental” factors, such as resource availability, weather conditions or human intervention.
- Alternatively, heterogeneity in a population can often be explained via different (individual) covariates, for example sex, condition or breeding status.
- We shall consider the survival rates to be possibly dependent on the different covariates.

Case Study: Soay sheep
- We consider mark-recapture-recovery (MRR) data relating to Soay sheep on the island of Hirta.
- The sheep are free from human activity and external competition/predation.
- Thus, this population is ideal for investigating the impact of different environmental and/or individual factors on the sheep.
- We consider annual data collected on female sheep (1079 individuals).
This is joint work with Steve Brooks and Tim Coulson.

Covariate Information
(i) Individual covariates:
- Coat type (1 = dark, 2 = light);
- Horn type (1 = polled, 2 = scurred, 3 = classical);
- Birth weight (real, normalised);
- Age (in years);
- Weight (real, normalised);
- Number of lambs born to the sheep in the spring prior to the summer census (0, 1, 2);
- And in the spring following the census (0, 1, 2).
(ii) Environmental covariates:
- NAO index; population size; March rainfall; autumn rainfall; March temperature.

Survival Rates - $\phi$
- Let the set of environmental covariate values at time t be denoted by $x_t$.
- The survival rate for animal i, of age a, at time t, is given by
  $\operatorname{logit} \phi_{i,a,t} = \alpha_a + \beta_a^T x_t + \gamma_a^T y_i + \delta_a^T z_{i,t} + \epsilon_{a,t}$.
- Here, $y_i$ denotes the set of time-independent covariates and $z_{i,t}$ the time-varying covariates; $\epsilon_{a,t} \sim N(0, \sigma_a^2)$ denotes a random effect.
- Two issues arise here: missing covariate values and model choice.
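
A direct numerical transcription of this regression (a sketch; the Greek letter names above are our labelling of the slide's lost symbols, and this helper function is hypothetical):

```python
import numpy as np

def survival_rate(alpha_a, beta_a, x_t, gamma_a, y_i, delta_a, z_it, eps_at):
    """Inverse-logit of alpha_a + beta_a'x_t + gamma_a'y_i + delta_a'z_it
    + eps_at, i.e. the survival rate phi_{i,a,t} defined above."""
    eta = (alpha_a + np.dot(beta_a, x_t) + np.dot(gamma_a, y_i)
           + np.dot(delta_a, z_it) + eps_at)
    return 1.0 / (1.0 + np.exp(-eta))

# Illustrative numbers: two environmental covariates, one time-independent
# and one time-varying individual covariate.
print(survival_rate(0.5, [0.2, -0.1], [1.3, 0.4], [0.3], [1.0],
                    [-0.2], [0.7], 0.05))
```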

Issue 1: Missing Data
- The data are presented, for each individual, as its capture history alongside its time-invariant covariate values, with missing values marked.
- Note that there are also many missing values for the time-dependent weight covariate.

Problem
- Given the set of covariate values, the corresponding survival rate can be obtained, and hence the likelihood calculated.
- However, a complexity arises when there are unknown (i.e. missing) covariate values, removing the simple and explicit expression for the likelihood.

Missing Data: Classical Approaches
- Typical classical approaches to missing data problems include:
  - ignoring individuals with missing covariate values;
  - the EM algorithm (can be difficult to implement and computationally expensive);
  - imputation of missing values using some underlying model (e.g. a Gompertz curve);
  - a conditional approach for time-varying covariates (Catchpole, Morgan and Tavecchia, 2006, in submission).
- We consider a Bayesian approach, in which we assume an underlying model for the missing data that allows us to account for the corresponding uncertainty in the missing values.

Bayesian Approach
- Suppose that we wish to make inference on the parameters $\theta$, and that the observed data consist of the capture histories, c, and the covariate values, $v_{obs}$.
- Then, Bayes' theorem states:
  $\pi(\theta \mid c, v_{obs}) \propto L(c, v_{obs} \mid \theta)\, p(\theta)$.
- The posterior distribution is very complex, so we use Markov chain Monte Carlo (MCMC) to obtain estimates of the posterior statistics of interest.
- However, in our case the likelihood is analytically intractable, owing to the missing covariate values.
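
As a reminder of the machinery (a generic sketch, not the sampler actually used for the sheep data), a minimal random-walk Metropolis sampler targeting an unnormalised log-posterior looks like this; the toy log_post is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_post, theta0, n_iter=5000, step=0.5):
    """Random-walk Metropolis: propose theta' = theta + step*N(0, I) and
    accept with probability min(1, post(theta') / post(theta))."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# Toy target: a logit survival rate with 8 survivals out of 10 trials and a
# N(0, 10^2) prior on the logit scale (illustrative numbers only).
def log_post(theta):
    phi = 1.0 / (1.0 + np.exp(-theta[0]))
    return 8 * np.log(phi) + 2 * np.log(1 - phi) - theta[0] ** 2 / 200.0

chain = metropolis(log_post, [0.0])
print(chain[1000:].mean(axis=0))   # posterior mean after burn-in
```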

Auxiliary Variables
- We treat the missing covariate values ($v_{mis}$) as parameters, or auxiliary variables (AVs).
- We then form the joint posterior distribution over the parameters, $\theta$, and the AVs, given the capture histories c and observed covariate values $v_{obs}$:
  $\pi(\theta, v_{mis} \mid c, v_{obs}) \propto L(c \mid \theta, v_{mis}, v_{obs}) \times f(v_{mis}, v_{obs} \mid \theta)\, p(\theta)$.
- Here the first term is the likelihood of the capture histories given all the covariate values (observed and imputed), f is the underlying model for the covariate values, and $p(\theta)$ is the prior on the parameters.
- We can now sample from the joint posterior distribution $\pi(\theta, v_{mis} \mid c, v_{obs})$, and integrate out the missing covariate values, $v_{mis}$, within the MCMC algorithm to obtain a sample from $\pi(\theta \mid c, v_{obs})$.

f($v_{mis}$, $v_{obs}$ | $\theta$) – Categorical Data
- For categorical data (coat type and horn type), we assume the following model.
- Let $y_{1,i}$ denote the horn type of individual i. Then $y_{1,i} \in \{1, 2, 3\}$, and we assume that
  $y_{1,i} \sim \text{Multinomial}(1, q)$, where $q = (q_1, q_2, q_3)$.
- Thus, we have additional parameters, q, which can be regarded as the underlying probabilities of each horn type.
- The q's are updated within the MCMC algorithm, as are the unknown $y_{1,i}$'s.
- We assume the analogous model for coat type.
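
A sketch of the corresponding MCMC update, assuming a Dirichlet prior on q (the prior is our assumption; the slide does not specify one). In the full algorithm each missing horn type would also be weighted by that individual's capture-history likelihood, which is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def update_q_and_missing(y, n_cats=3, alpha=1.0):
    """One Gibbs sweep for a categorical covariate with missing entries
    (np.nan): draw q | y ~ Dirichlet(counts + alpha), then draw each
    missing y_i from Multinomial(1, q)."""
    y = np.asarray(y, dtype=float)
    observed = y[~np.isnan(y)].astype(int)
    counts = np.bincount(observed, minlength=n_cats + 1)[1:]  # cats 1..n_cats
    q = rng.dirichlet(counts + alpha)
    y_new = y.copy()
    missing = np.isnan(y)
    y_new[missing] = rng.choice(np.arange(1, n_cats + 1),
                                size=missing.sum(), p=q)
    return q, y_new

horn = [1, 3, np.nan, 2, 3, np.nan, 1]   # 1=polled, 2=scurred, 3=classical
q, horn_full = update_q_and_missing(horn)
print(q, horn_full)
```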

f($v_{mis}$, $v_{obs}$ | $\theta$) – Continuous Data
- Let $y_{3,i}$ denote the birth weight of individual i. Then a possible model is to assume that
  $y_{3,i} \sim N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are to be estimated.
- For the weight of individual i, aged a, at time t, denoted by $z_{1,i,t}$, we set
  $z_{1,i,t} \sim N(z_{1,i,t-1} + \beta_a + \gamma_t, \sigma_w^2)$, where the parameters $\beta_a$, $\gamma_t$ and $\sigma_w^2$ are to be estimated.
- In general, the modelling assumptions will depend on the system under study.
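
For the weight model, a missing $z_{1,i,t}$ with observed neighbours appears in two normal terms, so its full conditional (again omitting the survival-likelihood weighting) is itself normal. A sketch, with parameter names following the regression above:

```python
import numpy as np

rng = np.random.default_rng(2)

def impute_weight(z_prev, z_next, beta_a, gamma_t, beta_next, gamma_next,
                  sigma_w):
    """Draw a missing z_t from its full conditional under
        z_t     ~ N(z_{t-1} + beta_a    + gamma_t,    sigma_w^2)
        z_{t+1} ~ N(z_t     + beta_next + gamma_next, sigma_w^2).
    The product of the two densities, as a function of z_t, is
    N(m, sigma_w^2 / 2) with m the average of the two implied means."""
    m = 0.5 * ((z_prev + beta_a + gamma_t) + (z_next - beta_next - gamma_next))
    return rng.normal(m, sigma_w / np.sqrt(2.0))

# Illustrative numbers only:
print(impute_weight(z_prev=0.1, z_next=0.4, beta_a=0.2, gamma_t=-0.05,
                    beta_next=0.1, gamma_next=0.0, sigma_w=0.3))
```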

Practical Implications
- Within each step of the MCMC algorithm, we not only need to update the parameter values $\theta$, but also to impute the missing covariate values and random effects (if present).
- This can be computationally expensive for large amounts of missing data.
- The posterior results may depend on the underlying model for the covariates; a sensitivity analysis can be performed using different underlying models.
- (State-space modelling can also be implemented using similar ideas – see Steve's talk.)

Issue 2: Model Selection
- For the sheep data set we can now deal with the issue of missing covariate values.
- But how do we decide which covariates to use, and/or which age structure? Often there may be a large number of possible covariates and/or age structures.
- Discriminating between different models can often be of particular interest, since they represent competing biological hypotheses.
- Model choice can also be important for future predictions of the system.

Possible Models
- We only want to consider biologically plausible models.
- We have uncertainty about the age structure of the survival rates, and consider models with k age groups of the form $\phi_{1:a};\ \phi_{a+1:b};\ \ldots;\ \phi_{j+}$.
- The number of age groups, k, is unknown a priori, as is the covariate dependence for each age group.
- We consider similar age-type models for p and $\lambda$, with possible arbitrary time dependence.
- E.g. $\phi_1(N, BW),\ \phi_{2:7}(W, L+),\ \phi_{8+}(N, P)\ /\ p(t)\ /\ \lambda_1, \lambda_{2+}(t)$.
- The number of possible models is immense!

Bayesian Approach
- We treat the model itself as an unknown parameter to be estimated.
- Then, applying Bayes' theorem, we obtain the posterior distribution over both parameter and model space:
  $\pi(\theta_m, m \mid \text{data}) \propto L(\text{data} \mid \theta_m, m)\, p(\theta_m)\, p(m)$.
- Here $\theta_m$ denotes the parameters in model m; the terms on the right are the likelihood, the prior on the parameters in model m, and the prior on model m.

Reversible Jump MCMC
- The reversible jump (RJ)MCMC algorithm allows us to construct a Markov chain with stationary distribution equal to the posterior distribution.
- It is simply an extension of the Metropolis-Hastings algorithm that allows moves between different dimensions.
- This algorithm is needed because the number of parameters, $\theta_m$, in model m may differ between models.
- Note that this algorithm needs only one Markov chain, regardless of the number of models.
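
For reference, the standard RJ acceptance probability for a proposed move from $(m, \theta_m)$ to $(m', \theta_{m'})$, generated by drawing random numbers $u \sim q$ and applying a deterministic bijection $(\theta_{m'}, u') = g(\theta_m, u)$, is (a textbook statement rather than anything spelled out on the slide; move-choice probabilities are absorbed into $q$ and $q'$):

```latex
\alpha = \min\left\{ 1,\;
  \frac{L(\text{data}\mid\theta_{m'},m')\,p(\theta_{m'})\,p(m')\,q'(u')}
       {L(\text{data}\mid\theta_m,m)\,p(\theta_m)\,p(m)\,q(u)}
  \left|\frac{\partial(\theta_{m'},u')}{\partial(\theta_m,u)}\right|
\right\}
```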

Posterior Model Probabilities
- Posterior model probabilities allow us to compare different models quantitatively.
- The posterior probability of model m is defined as
  $\pi(m \mid \text{data}) = \int \pi(\theta_m, m \mid \text{data})\, d\theta_m$.
- These are simply estimated within the RJMCMC algorithm as the proportion of time that the chain spends in the given model.
- We are also able to obtain model-averaged estimates of parameters, which take into account both parameter and model uncertainty.
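
Estimating these quantities from RJMCMC output is mechanical; a minimal sketch on hypothetical chain output:

```python
import numpy as np
from collections import Counter

def model_probs(model_trace):
    """Posterior model probabilities, estimated as the proportion of
    RJMCMC iterations the chain spends in each model."""
    n = len(model_trace)
    return {m: c / n for m, c in Counter(model_trace).items()}

def model_averaged_mean(theta_trace):
    """Model-averaged posterior mean of a quantity defined in every model
    (e.g. a survival rate): averaging over the whole chain automatically
    folds in model uncertainty."""
    return float(np.mean(theta_trace))

# Hypothetical RJMCMC output: visited model labels and a survival rate.
models = ["m1", "m1", "m2", "m1", "m2", "m2", "m2"]
phis   = [0.71, 0.70, 0.65, 0.72, 0.66, 0.64, 0.67]
print(model_probs(models))
print(model_averaged_mean(phis))
```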

General Comments
- The RJMCMC algorithm is the most widely used algorithm to explore and summarise a posterior distribution defined jointly over parameter and model space.
- The posterior model probabilities can be sensitive to the priors specified on the parameters, $p(\theta)$.
- The acceptance probabilities for reversible jump moves are typically lower than for MH updates.
- Longer simulations are generally needed to explore the posterior distribution.
- Only a single Markov chain is necessary, irrespective of the number of possible models!

Problem 1
- Constructing efficient RJ moves can be difficult.
- This is particularly the case when updating the number of age groups for the survival rates. This step involves:
  - proposing a new age structure (typically adding/removing a change-point);
  - proposing a covariate dependence for the new age structure;
  - proposing new parameter values for this new model.
- It can be very difficult to construct a Markov chain with (reasonably) high acceptance rates.

Example: Reversibility Problem
- One obvious (and tried!) method for adding a change-point would be as follows.
- Suppose that we propose to split age group a:c, proposing new age groups a:b and b+1:c.
- Consider a small perturbation of each (non-zero) regression parameter, e.g.
  $\beta'_{a:b} = \beta_{a:c} + \epsilon$ and $\beta'_{b+1:c} = \beta_{a:c} - \epsilon$, where $\epsilon \sim N(0, \sigma^2)$.
- However, to satisfy the reversibility constraints, a change-point can only be removed when the covariate dependence structure is the same for two consecutive age groups…
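
A sketch of this split move and its deterministic reverse (function names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def split_age_group(beta_ac, sigma=0.1):
    """Split age group a:c into a:b and b+1:c by perturbing the regression
    parameter: beta'_{a:b} = beta + eps, beta'_{b+1:c} = beta - eps,
    with eps ~ N(0, sigma^2)."""
    eps = sigma * rng.standard_normal(np.shape(beta_ac))
    return beta_ac + eps, beta_ac - eps

def merge_age_groups(beta_ab, beta_bc):
    """Reverse move: merging averages the two parameters -- only a valid
    reverse of the split when both groups share the same covariate
    dependence, which is exactly the reversibility problem above."""
    return 0.5 * (np.asarray(beta_ab) + np.asarray(beta_bc))

b1, b2 = split_age_group(0.4)
print(b1, b2, merge_age_groups(b1, b2))   # merge recovers 0.4
```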

Example: High Posterior Mass
- An alternative proposal would be to set $\beta'_{a:b} = \beta_{a:c}$ (i.e. keep all parameters the same for a:b).
- Propose a model (in terms of covariates) for $\beta'_{b+1:c}$, either:
  - choosing each possible model with equal probability (the reverse move is then always possible); or
  - choosing a model that differs by at most one individual and one environmental covariate.
- The problem now lies in proposing “sensible” parameter values for the new model.
- One approach is to use posterior estimates of the parameters from a “saturated” model (full covariate dependence for some age structure) as the mean of the proposal distribution.

Problem 2
- Consider the missing covariates $v_{mis}$. If the covariate is present in the model, we have
  $\pi(v_{mis} \mid v_{obs}, \theta, \text{data}) \propto L(\text{data} \mid \theta, v_{obs}, v_{mis}) \times f(v_{mis}, v_{obs} \mid \theta)$.
- However, if the covariate is not present in the model, we have
  $\pi(v_{mis} \mid v_{obs}, \theta, \text{data}) \propto f(v_{mis}, v_{obs} \mid \theta)$.
- Thus, adding (or removing) the covariate may be difficult, owing to the (potentially) quite different posterior conditional distributions.
- One way to avoid this is to update the missing covariate values simultaneously within the model move.

Soay Sheep Analysis
- We now use these techniques to analyse the (complex) Soay sheep data.
- Discussion with experts identified several points of particular interest:
  - the age dependence of the parameters;
  - identification of the covariates influencing the survival rates;
  - whether random effects are present.
- We focus on these issues in the results presented.

Prior Specification
- We place vague priors on the parameters present in each model.
- Priors also need to be specified on the models.
- Placing an equal prior on each model puts high prior mass on models with a large number of age groups, since the number of models increases with the number of age groups.
- Thus, we specify an equal prior probability on each marginal age structure, and a flat prior over the covariate dependence (or time dependence) given the age structure.

Results: Survival Rates
- The marginal age structures with the largest posterior support are $\phi_1; \phi_{2:7}; \phi_{8+}$ and $\phi_1; \phi_{2:7}; \phi_{8:9}; \phi_{10+}$.
- Note that, with posterior probability 1, lambs have a distinct survival rate.
- Often the models with most posterior support are close neighbours of each other.

Covariate Dependence
[Figure: posterior support for the covariate dependence of survival, with Bayes factors (BF) annotated, e.g. BF = 3 and BF = 20.]

Influence of Covariates
[Figure: panels showing the effect of weight and of population size on survival, with curves for age 1 and age 10+.]

Marginalisations
- Presenting only marginal results can hide some of the more intricate details.
- This is most clearly seen from another MRR data analysis, relating to the UK lapwing population.
- There are two covariates: time and “fdays”, a measure of the harshness of the winter.
- Without going into too many details, we had MRR data and survey data (estimates of total population size).
- An integrated analysis was performed, jointly analysing both data sets.

Marginal Models
- The marginal models with most posterior support for the demographic parameters are:

  (a) $\phi_1$ – 1st-year survival:  $\phi_1$(fdays), post. prob. 0.636
  (b) $\phi_a$ – adult survival:  $\phi_a$(fdays, t), 0.522;  $\phi_a$(fdays), 0.407
  (c) $\rho$ – productivity:  $\rho$ (constant), 0.497;  $\rho$(t), 0.393
  (d) $\lambda$ – recovery rate:  $\lambda$(t), 0.730;  $\lambda$(fdays, t), 0.270

This is joint work with Steve Brooks, Chiara Mazzetta and Steve Freeman.

Warning!
- The previous (marginal) posterior estimates hide some of the intricate details.
- The marginal models of the adult survival rate and the productivity rate are highly correlated.
- In particular, the joint posterior probabilities for these parameters are:

  $\phi_a$(fdays, t), $\rho$ – post. prob. 0.45
  $\phi_a$(fdays), $\rho$(t) – post. prob. 0.36

- Thus, there is strong evidence that either the adult survival rate OR the productivity rate is time dependent.

Summary
- Bayesian techniques are widely used, and allow complex data analyses.
- Covariates can be used to explain both temporal and individual heterogeneity.
- However, missing values can add another layer of complexity and the need to make additional assumptions.
- The RJMCMC algorithm can explore the possible models and discriminate between competing biological hypotheses.
- The analysis of the Soay sheep has stimulated new discussion with biologists, in terms of the factors that affect their survival rates.

Further Research
- The development of efficient and generic RJMCMC algorithms.
- Assessing the sensitivity of the posterior to the model specification for the covariates.
- Investigation of the missing-at-random assumption for the covariates, in both classical and Bayesian frameworks.
- Prediction in the presence of time-varying covariate information.