Generalised linear mixed models in WinBUGS

Lecture 15: Generalised linear mixed models in WinBUGS

Lecture Contents
- Differences in WinBUGS methods
- Rejection sampling algorithm
- Adaptive rejection (AR) sampling
- Reunion Island dataset
- Hierarchical centering
- Probit formulation

Logistic regression model (recap)
Both MLwiN and WinBUGS can fit a standard Bayesian logistic regression model (e.g. for the rat tumour example), but can we write out the conditional posterior distributions and use Gibbs sampling? Such a model can be written as follows:
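
A minimal form, assuming y_i successes out of n_i trials, a single covariate x_i and flat priors on the regression coefficients (this particular notation is an assumption rather than a quotation from the slides), is

$$y_i \sim \text{Binomial}(n_i, p_i), \qquad \text{logit}(p_i) = \beta_0 + \beta_1 x_i, \qquad p(\beta_0) \propto 1, \quad p(\beta_1) \propto 1.$$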

Conditional distribution for β0
This conditional distribution is not a standard distribution, so we cannot simply simulate from it with a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC, and in this lecture we describe how WinBUGS deals with the problem.
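
As a sketch of why a Gibbs step is awkward here, under the model above with flat priors the full conditional for β0 is proportional to the likelihood viewed as a function of β0 alone:

$$p(\beta_0 \mid \beta_1, y) \propto \prod_i \left(\frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}}\right)^{y_i}\left(\frac{1}{1+e^{\beta_0+\beta_1 x_i}}\right)^{n_i-y_i},$$

which is log-concave in β0 but does not belong to any standard family.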

Why not use Gibbs?
To use Gibbs sampling we would need to generate samples from this conditional distribution. Although it is not a standard distribution, so there is no readily available function to draw from it directly, other methods exist. The method that WinBUGS uses is called adaptive rejection (AR) sampling, which can sample from any log-concave density. Before going on to AR sampling we will consider rejection sampling in general.

Rejection Sampling
Assume we wish to generate from a density f(x) but it is difficult to sample from it directly. Suppose, however, that there exists a density g(x) that is easy to sample from, with f(x) ≤ M g(x) for all x. (Finding such a g is often easier for distributions with a bounded range, but the method can also be used for unbounded distributions.) We then sample x from g(x) and accept the sampled value with probability f(x)/(M g(x)); if we reject, we sample again until a value is accepted. The algorithm works best when g(x) is close to f(x), so that M is small.
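
As a minimal sketch (the function names and arguments here are illustrative, not from the lecture), a generic rejection sampler might look like this in Python:

import numpy as np

def rejection_sample(f, g, sample_g, M, size, rng=None):
    # f, g      : vectorised density functions with f(x) <= M * g(x) everywhere
    # sample_g  : sample_g(n, rng) returns n draws from g
    # M         : envelope constant
    rng = np.random.default_rng() if rng is None else rng
    draws = []
    while len(draws) < size:
        x = sample_g(size, rng)            # propose from g
        u = rng.uniform(size=size)
        keep = u < f(x) / (M * g(x))       # accept with probability f(x) / (M g(x))
        draws.extend(x[keep])
    return np.array(draws[:size])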

Truncated Normal example
Suppose we wish to generate random variables from a standard Normal truncated at -2 and +2. The standard method would be to generate from a Normal(0,1) and reject values outside the range (-2, 2). An alternative is to generate from a (scaled) Uniform distribution on (-2, 2) as shown. The probability of acceptance is then (Φ(2) - Φ(-2)) × √(2π)/4 ≈ 0.598, and 5000 random draws gave an observed acceptance rate of 0.606.
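
A quick numerical check of this example (a sketch; the constants below are recomputed rather than taken from the slide):

import numpy as np

rng = np.random.default_rng(1)
n = 5000

trunc_const = 0.9545                                  # Phi(2) - Phi(-2)
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) / trunc_const   # target density on (-2, 2)
g = lambda x: np.full_like(x, 0.25)                   # Uniform(-2, 2) density
M = f(np.array([0.0]))[0] / 0.25                      # envelope constant, about 1.67

x = rng.uniform(-2, 2, size=n)                        # proposals from g
u = rng.uniform(size=n)
accepted = x[u < f(x) / (M * g(x))]
print(len(accepted) / n)                              # acceptance rate, close to 1/M = 0.598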

Adaptive Rejection Sampling
AR sampling (Gilks and Wild, 1992) takes rejection sampling a step further by clever construction of the envelope function g(x). The algorithm works for any log-concave function f(x). A few points are chosen and the tangent to log f(x) is constructed at each; these tangents are joined to form a piecewise-linear function that is an envelope for log f(x). When a point is sampled from the (exponentiated) envelope it is accepted or rejected as in rejection sampling, and in addition log f is evaluated at that point and the envelope is updated to be a closer approximation to log f(x).
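
In symbols (a sketch; the squeezing step used in the full algorithm is omitted): with h(x) = log f(x) and current abscissae x_1 < ... < x_k,

$$t_j(x) = h(x_j) + h'(x_j)(x - x_j), \qquad u_k(x) = \min_{j} t_j(x), \qquad g_k(x) \propto \exp\{u_k(x)\}.$$

A candidate x* is drawn from g_k and accepted with probability exp{h(x*) - u_k(x*)}; if it is rejected, x* is added to the set of abscissae so that the envelope tightens around h.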

An illustration of AR sampling

WinBUGS 1.4 new method
In fact, for the models we consider, WinBUGS 1.4 no longer uses AR sampling for the fixed effects. Instead it uses a method due to Gamerman (1997). This is essentially a Metropolis-Hastings algorithm in which, at each iteration, a multivariate Normal proposal distribution is formed by performing one iteration of Iterative Weighted Least Squares (IWLS), starting at the current point; this is a routine similar to the IGLS estimation method used in MLwiN for MQL/PQL estimates. To see the AR sampler in action you will need to use WinBUGS 1.3, or go to the Options/Blocking options screen and remove the tick. (You can try this in the practical if you wish.)
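
As a sketch of the proposal step for a logistic model with flat priors on β (the details below follow the general IWLS construction and are an assumption, not a quotation from the slides): with current value β(c), linear predictor η_i = x_i'β(c) and fitted probabilities p_i, form the IWLS weights and working responses

$$w_i = n_i p_i (1 - p_i), \qquad z_i = \eta_i + \frac{y_i - n_i p_i}{n_i p_i (1 - p_i)},$$

and propose β* from N(m, C) with C = (X'WX)^{-1} and m = C X'Wz. Because the proposal depends on the current point, the candidate is accepted or rejected using the full Metropolis-Hastings ratio, which includes the proposal densities evaluated in both directions.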

Reunion Island 2-level dataset
We will here consider the 2-level Reunion Island dataset as considered in MLwiN, and the final model fitted there:
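
Written out explicitly (the indexing and names below are inferred from the WinBUGS code on the next slide rather than copied from the slide itself), with observation i in herd j:

$$\text{fscr}_{ij} \sim \text{Binomial}(\text{denom}_{ij},\, p_{ij}), \qquad \text{logit}(p_{ij}) = \beta_1 + \beta_2\,\text{ai}_{ij} + u_j,$$
$$u_j \sim N(0, \sigma^2_u), \qquad p(\beta_1) \propto 1, \quad p(\beta_2) \propto 1, \qquad 1/\sigma^2_u \sim \text{Gamma}(0.001, 0.001).$$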

WinBUGS code for the model
Here we see the code for a 2-level logistic regression model. Note the use of dbin for the Binomial distribution and the logit link function. The file also contains initial values (from the 1st-order MQL run of this model) and the data. We are using Gamma(ε, ε) priors with ε = 0.001.

model {
  # Level 1 definition
  for (i in 1:N) {
    fscr[i] ~ dbin(p[i], denom[i])
    logit(p[i]) <- beta[1] * cons[i] + beta[2] * ai[i] + u2[herd[i]] * cons[i]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j] ~ dnorm(0, tau.u2)
  }
  # Priors for fixed effects
  for (k in 1:2) { beta[k] ~ dflat() }
  # Priors for random terms
  tau.u2 ~ dgamma(0.001, 0.001)
  sigma2.u2 <- 1/tau.u2
}

WinBUGS node info
On the Info menu you can select Node info. This will inform you (in a funny language) what method is being used for each node. For this model we get:
- beta[1]: UpdaterGLM.LogitUpdater, meaning the Gamerman method.
- u2[1]: UpdaterRejection.Logit, meaning the AR method (I think!).
- tau.u2: UpdaterGamma.Updater, meaning conjugate Gibbs sampling.

WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took just over 2 minutes. The chains all look reasonable.

Point Estimates
The WinBUGS estimates are similar to those from MLwiN. MLwiN estimates after 50,000 iterations:

Node        Mean     SD
beta[1]     0.651    0.169
beta[2]     -1.113   0.174
sigma2.u2   0.091    0.056

Reunion Island 3-level dataset
We will here consider the 3-level Reunion Island dataset as considered in MLwiN, and the final model there, which gave estimation difficulties:
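
In the same notation (again inferred from the WinBUGS code and the estimates table later in the lecture, rather than copied from the slide), with observation i on cow j in herd k:

$$\text{fscr}_{ijk} \sim \text{Binomial}(\text{denom}_{ijk},\, p_{ijk}), \qquad \text{logit}(p_{ijk}) = \beta_1 + \beta_2\,\text{ai}_{ijk} + u_{jk} + v_k,$$
$$u_{jk} \sim N(0, \sigma^2_u), \qquad v_k \sim N(0, \sigma^2_v).$$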

WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took 10 minutes! Although the intercept chain on the left looks good, and better than Metropolis-Hastings, the cow-level variance has clearly not converged yet. So the WinBUGS method does not fix the problem.

Between-cow variance after 50,000 iterations using WinBUGS

Hierarchical centering formulation
This is a method for improving the mixing of an MCMC algorithm. Consider the variance components (VC) model

$$y_{ij} = \beta_0 + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma^2_u).$$

This can be written equivalently as

$$y_{ij} = \beta_{0j} + e_{ij}, \qquad \beta_{0j} \sim N(\beta_0, \sigma^2_u),$$

and the Gibbs sampling algorithm written using this parameterisation. This may improve mixing if β0j is less correlated with β0 than uj is. See Browne (2004) for models where this works well.

Hierarchical centering in WinBUGS
It is easy to modify the WinBUGS code to fit a hierarchically centred model, as shown below. Note that the main difficulty is that, for a 3-level model, the mapping vector herd must now map from cows to herds rather than from observations to herds; this means changing the data file.

model {
  # Level 1 definition
  for (i in 1:N) {
    fscr[i] ~ dbin(p[i], denom[i])
    logit(p[i]) <- beta[2] * ai[i] + u2[cow[i]]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j] ~ dnorm(u3[herd[j]], tau.u2)
  }
  for (j in 1:n3) {
    u3[j] ~ dnorm(beta[1], tau.u3)
  }
  # Priors for fixed effects
  for (k in 1:2) { beta[k] ~ dflat() }
  # Priors for random terms
  tau.u2 ~ dgamma(0.001, 0.001)
  sigma2.u2 <- 1/tau.u2
  tau.u3 ~ dgamma(0.001, 0.001)
  sigma2.u3 <- 1/tau.u3
}

Trace plot for the hierarchically centred formulation
Unfortunately this doesn't improve things much! The parameter expansion method, also discussed in Browne (2004), may work better here and would be worth examining.

Estimates comparison
In the following table we compare estimates after 50,000 iterations (100,000 for MLwiN) from the three methods and see reasonable agreement.

Parameter   MLwiN            WinBUGS          H.C.
β0          0.563 (0.125)    0.560 (0.127)    0.570 (0.131)
β1          -1.014 (0.130)   -1.007 (0.130)   -1.021 (0.131)
σ²v         0.094 (0.043)    0.096 (0.044)
σ²u         0.229 (0.137)    0.202 (0.139)    0.231 (0.127)

Probit Regression
The logistic link function is only one possible link function for Binomial data. Any function that maps from (0,1) to the whole real line will do, and another popular choice is the probit link (the inverse of the Normal cdf). This link, together with the Normal distribution, can work to our advantage and allows another way of using Gibbs sampling for a binary-data model, through the use of latent variables.

Latent variable approach
One source of binary data is the thresholding of a continuous response: for example, in education students often take exams and, rather than a mark, we observe whether the student passes or fails. The latent variable approach works like the reverse of this, i.e. we see the binary response and from it we generate the underlying continuous variable.

Simple example
Consider the random effects probit regression model

$$p_{ij} = \Pr(y_{ij} = 1) = \Phi(\beta_0 + \beta_1 x_{ij} + u_j), \qquad u_j \sim N(0, \sigma^2_u).$$

This model is equivalent to the following Normal response model:

$$y^*_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \qquad e_{ij} \sim N(0, 1),$$

where y*ij is unobserved but is restricted to positive values when yij = 1 and negative values when yij = 0.

Gibbs Sampling algorithm
We then have four steps for our latent variable model:
1. Generate y*ij from its truncated Normal conditional distribution, for all i and j.
2. Generate β from its (multivariate) Normal conditional distribution.
3. Generate uj from its Normal conditional distribution, for each j.
4. Generate σ²u from its inverse Gamma conditional distribution.
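
A minimal sketch of this sampler for the random-intercept probit model above, assuming a flat prior on β and a Gamma(0.001, 0.001) prior on 1/σ²u (variable names and data layout are illustrative, not from the lecture):

import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, group, n_iter=5000, rng=None):
    # y: 0/1 responses; X: (n, p) design matrix; group: level-2 index (0..J-1) per observation
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    J = group.max() + 1
    beta, u, sigma2_u = np.zeros(p), np.zeros(J), 1.0
    XtX_inv = np.linalg.inv(X.T @ X)
    samples = {"beta": [], "sigma2_u": []}
    for it in range(n_iter):
        # 1. latent y*: truncated Normal, positive if y = 1, negative if y = 0
        mu = X @ beta + u[group]
        lo = np.where(y == 1, -mu, -np.inf)       # standardised truncation bounds
        hi = np.where(y == 1, np.inf, -mu)
        ystar = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
        # 2. beta: multivariate Normal conditional (flat prior)
        beta_hat = XtX_inv @ (X.T @ (ystar - u[group]))
        beta = rng.multivariate_normal(beta_hat, XtX_inv)
        # 3. u_j: Normal conditional, combining n_j residuals with the N(0, sigma2_u) distribution
        r = ystar - X @ beta
        n_j = np.bincount(group, minlength=J)
        sum_r = np.bincount(group, weights=r, minlength=J)
        prec = n_j + 1.0 / sigma2_u
        u = sum_r / prec + rng.standard_normal(J) / np.sqrt(prec)
        # 4. sigma2_u: inverse Gamma conditional
        shape = 0.001 + J / 2.0
        rate = 0.001 + 0.5 * np.sum(u ** 2)
        sigma2_u = 1.0 / rng.gamma(shape, 1.0 / rate)
        samples["beta"].append(beta.copy())
        samples["sigma2_u"].append(sigma2_u)
    return samples

Step 2 uses a flat prior to match the dflat() priors used elsewhere in the lecture; with a Normal prior the conditional mean and covariance would gain the usual prior precision terms.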

Probit model for the 2-level Reunion Island dataset (demo)
Using Gibbs sampling in MLwiN:

Trace plots for only 5k iterations

Probit model for the 3-level Reunion Island dataset
Still not great, but the effective sample size (ESS) is up to 206!

Introduction to the Practical
In the next practical you will have the chance to explore the methods from this lecture on the two datasets from the last practical:
- The Bangladesh contraceptive use dataset.
- The pig pneumonia dataset.