School of Mathematical Sciences, University of Nottingham.

Slides:



Advertisements
Similar presentations
Bayes rule, priors and maximum a posteriori
Advertisements

Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.
Using complex random effect models in epidemiology and ecology
Random effect modelling of great tit nesting behaviour William Browne (University of Nottingham) In collaboration with Richard Pettifor (Institute of Zoology,
Simple methods to improve MCMC efficiency in random effect models William Browne*, Mousa Golalizadeh*, Martin Green and Fiona Steele Universities of Bristol.
Random effect modelling of great tit nesting behaviour
Random effect modelling of great tit nesting behaviour William Browne (University of Bristol) In collaboration with Richard Pettifor (Institute of Zoology,
Missing data – issues and extensions For multilevel data we need to impute missing data for variables defined at higher levels We need to have a valid.
MCMC for Poisson response models
Generalised linear mixed models in WinBUGS
Lecture 9 Model Comparison using MCMC and further models.
Introduction to Monte Carlo Markov chain (MCMC) methods
Other MCMC features in MLwiN and the MLwiN->WinBUGS interface
Lecture 23 Spatial Modelling 2 : Multiple membership and CAR models for spatial data.
MCMC estimation in MlwiN
Markov Chain Sampling Methods for Dirichlet Process Mixture Models R.M. Neal Summarized by Joon Shik Kim (Thu) Computational Models of Intelligence.
Latent normal models for missing data Harvey Goldstein Centre for Multilevel Modelling University of Bristol.
School of Mathematical Sciences, University of Nottingham.
Bayesian Estimation in MARK
Markov-Chain Monte Carlo
CHAPTER 16 MARKOV CHAIN MONTE CARLO
MCMC efficiency in Multilevel Models
Bayesian statistics – MCMC techniques
Visual Recognition Tutorial
Computing the Posterior Probability The posterior probability distribution contains the complete information concerning the parameters, but need often.
Longitudinal Experiments Larry V. Hedges Northwestern University Prepared for the IES Summer Research Training Institute July 28, 2010.
Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.
Lecture 23: Tues., Dec. 2 Today: Thursday:
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.
The Neymann-Pearson Lemma Suppose that the data x 1, …, x n has joint density function f(x 1, …, x n ;  ) where  is either  1 or  2. Let g(x 1, …,
Mixture Modeling Chongming Yang Research Support Center FHSS College.
Introduction to Multilevel Modeling Using SPSS
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
Bayes Factor Based on Han and Carlin (2001, JASA).
Modelling non-independent random effects in multilevel models William Browne Harvey Goldstein University of Bristol.
Priors, Normal Models, Computing Posteriors
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
Incorporating heterogeneity in meta-analyses: A case study Liz Stojanovski University of Newcastle Presentation at IBS Taupo, New Zealand, 2009.
Learning Theory Reza Shadmehr logistic regression, iterative re-weighted least squares.
Modelling non-independent random effects in multilevel models Harvey Goldstein and William Browne University of Bristol NCRM LEMMA 3.
Simulation of the matrix Bingham-von Mises- Fisher distribution, with applications to multivariate and relational data Discussion led by Chunping Wang.
A Comparison of Two MCMC Algorithms for Hierarchical Mixture Models Russell Almond Florida State University College of Education Educational Psychology.
Multilevel Modeling Software Wayne Osgood Crime, Law & Justice Program Department of Sociology.
MACHINE LEARNING 8. Clustering. Motivation Based on E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2  Classification problem:
Chapter 7 Point Estimation of Parameters. Learning Objectives Explain the general concepts of estimating Explain important properties of point estimators.
Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.
Multilevel and multifrailty models. Overview  Multifrailty versus multilevel Only one cluster, two frailties in cluster e.g., prognostic index (PI) analysis,
Designing Factorial Experiments with Binary Response Tel-Aviv University Faculty of Exact Sciences Department of Statistics and Operations Research Hovav.
Bayesian Modelling Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.
Introduction to Multilevel Analysis Presented by Vijay Pillai.
Kevin Stevenson AST 4762/5765. What is MCMC?  Random sampling algorithm  Estimates model parameters and their uncertainty  Only samples regions of.
Institute of Statistics and Decision Sciences In Defense of a Dissertation Submitted for the Degree of Doctor of Philosophy 26 July 2005 Regression Model.
SIR method continued. SIR: sample-importance resampling Find maximum likelihood (best likelihood × prior), Y Randomly sample pairs of r and N 1973 For.
Hierarchical Models. Conceptual: What are we talking about? – What makes a statistical model hierarchical? – How does that fit into population analysis?
Multilevel modelling: general ideas and uses
Bayesian Semi-Parametric Multiple Shrinkage
Bayesian Generalized Product Partition Model
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
How to handle missing data values
Remember that our objective is for some density f(y|) for observations where y and  are vectors of data and parameters,  being sampled from a prior.
Predictive distributions
Mathematical Foundations of BME Reza Shadmehr
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Ch13 Empirical Methods.
EVENT PROJECTION Minzhao Liu, 2018
Non response and missing data in longitudinal surveys
Parametric Methods Berlin Chen, 2005 References:
Learning From Observed Data
Mathematical Foundations of BME Reza Shadmehr
Presentation transcript:

School of Mathematical Sciences, University of Nottingham. An illustration of the use of reparameterisation methods for improving MCMC efficiency in crossed random effect models. William J. Browne School of Mathematical Sciences, University of Nottingham.

Talk Summary Wytham woods great tit dataset. Cross-classified random effect models. Hierarchical centering. Parameter expansion. Combining methods. Future work.

Wytham woods great tit dataset A longitudinal study of great tits nesting in Wytham Woods, Oxfordshire. 6 responses : 3 continuous & 3 binary Clutch size, lay date and mean nestling mass Nest success, male and female survival Here focus on clutch size, see Browne et al. (2004) for MV modelling of 6 responses. Data: 4165 attempts over a period of 34 years. There are 4 higher-level classifications of the data: female parent, male parent, nestbox and year

Data background The data can be summarised as follows: Source Number of IDs Median #obs Mean Year 34 104 122.5 Nestbox 968 4 4.30 Male parent 2986 1 1.39 Female parent 2944 1.41 Note there is very little information on each individual male and female bird but we can get some estimates of variability via a random effects model.

Cross-classified modelling: Algorithms Some algorithms e.g. IGLS in MLwiN specifically designed for nested structures. MCMC algorithms work as well for crossed as nested structures. MCMC algorithms for cross-classified models available in MLwiN and WinBUGS. Other packages e.g. Genstat have maximum likelihood methods designed specifically for cross-classified structures.

Cross-classified model We aim to fit the following model to the clutch size response. Here we have a fixed effect that picks out average clutch size and random effects for female, male, nestbox and year. Here we use notation from Browne et al. (2001). Note we need prior distributions for all parameters as indicated to use MCMC estimation. We choose standard conjugate ‘diffuse’ priors.

Results The cross-classified model was fitted using both MLwiN and WinBUGS. Each program was run for a burnin of 5,000 iterations followed by a main run of 50,000. In the following we give firstly point and interval estimates for all variances and the average clutch size parameter. Then we give for each parameter effective sample sizes (ESS) which give an estimate of how many ‘independent samples we have’ for each parameter as opposed to 50,000 dependent samples. ESS = # of iterations/,

Point and Interval Estimates Here we see similar estimates for the two packages as would be expected. Parameter MLwiN WinBUGS Fixed Effect 8.805 (8.589,9.025) 8.810 (8.593,9.021) Year 0.365 (0.215,0.611) 0.365 (0.216,0.606) Nestbox 0.107 (0.059,0.158) 0.108 (0.060,0.161) Male 0.045 (0.001,0.166) 0.034 (0.002,0.126) Female 0.975 (0.854,1.101) 0.976 (0.858,1.097) Observation 1.064 (0.952,1.173) 1.073 (0.968,1.175)

Effective Sample sizes The effective sample sizes are similar for both packages. Note that MLwiN is 5 times quicker!! Parameter MLwiN WinBUGS Fixed Effect 671 602 Year 30632 29604 Nestbox 833 788 Male 36 33 Female 3098 3685 Observation 110 135 Time 519s 2601s We will now consider methods that will improve the ESS values for particular parameters. We will firstly consider the fixed effect parameter.

Trace and autocorrelation plots for fixed effect using standard Gibbs sampling algorithm

Hierarchical Centering This method was devised by Gelfand et al. (1995) for use in nested models. Basically (where feasible) parameters are moved up the hierarchy in a model reformulation. For example: is equivalent to The motivation here is we remove the strong negative correlation between the fixed and random effects by reformulation.

Hierarchical Centering In our cross-classified model we have 4 possible hierarchies up which we can move parameters. We have chosen to move the fixed effect up the year hierarchy as it’s variance had biggest ESS although this choice is rather arbitrary. The ESS for the fixed effect increases 50-fold from 602 to 35,063 while for the year level variance we have a smaller improvement from 29,604 to 34,626. Note this formulation also runs faster 1864s vs 2601s (in WinBUGS).

Trace and autocorrelation plots for fixed effect using hierarchical centering formulation

Parameter Expansion We next consider the variances and in particular the between-male bird variance. When the posterior distribution of a variance parameter has some mass near zero this can hamper the mixing of the chains for both the variance parameter and the associated random effects. The pictures over the page illustrate such poor mixing. One solution is parameter expansion (Liu et al. 1998). In this method we add an extra parameter to the model to improve mixing.

Trace plots for between males variance and a sample male effect using standard Gibbs sampling algorithm

Autocorrelation plot for male variance and a sample male effect using standard Gibbs sampling algorithm

Parameter Expansion In our example we use parameter expansion for all 4 hierarchies. Note the  parameters have an impact on both the random effects and their variance. The original parameters can be found by: Note the models are not identical as we now have different prior distributions for the variances.

Parameter Expansion For the between males variance we have a 20-fold increase in ESS from 33 to 600. The parameter expanded model has different prior distributions for the variances although these priors are still ‘diffuse’. It should be noted that the point and interval estimate of the level 2 variance has changed from 0.034 (0.002,0.126) to 0.064 (0.000,0.172). Parameter expansion is computationally slower 3662s vs 2601s for our example.

Trace plots for between males variance and a sample male effect using parameter expansion.

Autocorrelation plot for male variance and a sample male effect using parameter expansion.

Combining the two methods Hierarchical centering and parameter expansion can easily be combined in the same model. Here we perform centering on the year classification and parameter expansion on the other 3 hierarchies.

Effective Sample sizes As we can see below the effective sample sizes for all parameters are improved for this formulation while running time remains approximately the same. Parameter WinBUGS originally WinBUGS combined Fixed Effect 602 34296 Year 29604 34817 Nestbox 788 5170 Male 33 557 Female 3685 8580 Observation 135 1431 Time 2601s 2526s

Conclusions Both hierarchical centering and parameter expansion are useful methods for improving mixing when using MCMC. Both methods are simple to implement in the WinBUGS package. They can be easily combined to produce an efficient MCMC algorithm. The change in prior distributions brought about by parameter expansion needs some further research, particularly when used with poorly identified variance parameters where the choice of ‘diffuse’ prior may be having a large effect on the posterior.

Further Work Incorporating hierarchical centering and parameter expansion in MLwiN. Investigating their use in conjunction with the Metropolis-Hastings algorithm. Extending the methods to our original multivariate response problem.

References Browne, W.J. (2004). An illustration of the use of reparameterisation methods for improving MCMC efficiency in crossed random effect models. Multilevel Modelling Newsletter 16 (1): 13-25 Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling 1: 103-124. Browne, W.J., McCleery, R.H., Sheldon, B.C., and Pettifor, R.A. (2004). Using cross-classified multivariate mixed response models with application to the reproductive success of great tits (Parus Major). Nottingham Statistics Research report 03-18. Gelfand A.E., Sahu S.K., and Carlin B.P. (1995). Efficient Parametrizations For Normal Linear Mixed Models. Biometrika 82 (3): 479-488. Liu, C., Rubin, D.B., and Wu, Y.N. (1998) Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 (4): 755-770.