Multilevel survival models
A paper presented to celebrate Murray Aitkin’s 70th birthday
Harvey Goldstein (also 70)
Centre for Multilevel Modelling, University of Bristol

Time to event (survival) models
Murray has contributed:
Aitkin, M. and Clayton, D. (1980). The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. Appl. Statist. 29.
Journal of the Royal Statistical Society, Series A 1986; 149: 1-43.
Aitkin, M. A., Francis, B. and Hinde, J. (2005). Statistical Modelling in GLIM4, Chapter 6: survival models.
Basic notions, taking employment duration (u) as the example: the proportion of the workforce employed for periods greater than t is the survivor function, and the risk of unemployment in the next unit interval, given survival to t, is the hazard.
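In symbols, the two basic notions on this slide can be written as below. The symbols S, f and h are assumed here for illustration and are not taken from the slides; the definitions themselves are the standard continuous-time ones.

```latex
S(t) = \Pr(u > t), \qquad
f(t) = -\frac{dS(t)}{dt}, \qquad
h(t) = \frac{f(t)}{S(t)}
```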

The traditional grouped discrete time hazard model
Suppose time is grouped into pre-assigned categories. If the survivor function at the start of time interval t is $s_t$, then the probability of death in the interval and the hazard are $f_t = s_t - s_{t+1}$ and $h_t = f_t / s_t$. The basic data thus consist of one record for each time interval for each individual (within each higher-level unit for a multilevel structure), with the response being a binary indicator of failure in that interval. Estimation follows that for the binary response model, e.g. with a logit or probit link function. This formulation is very flexible: it extends to competing risks (multinomial response), allows time-varying covariates, automatically handles right-censored data and easily incorporates random effects in multilevel data structures. It can be fitted with existing software.

A repeated measures discrete time data structure
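The table that accompanied this slide did not survive, so the following is only a sketch of the kind of person-period layout the grouped discrete time model needs. The function names, field names and toy episodes are hypothetical and not taken from the presentation.

```python
def expand_to_person_period(records):
    """Turn one record per episode into one record per interval at risk.

    Each input record is (individual, episode, duration, event): duration is
    the number of grouped intervals observed, and event is 1 if the episode
    ended in its last interval or 0 if it was right censored.
    """
    rows = []
    for individual, episode, duration, event in records:
        for t in range(1, duration + 1):
            # The binary response is 1 only in the final interval of an
            # episode that ended in an event; censored episodes are all 0.
            y = 1 if (event == 1 and t == duration) else 0
            rows.append({"individual": individual, "episode": episode,
                         "interval": t, "y": y})
    return rows


def empirical_hazard(rows):
    """Events divided by records at risk, separately for each interval."""
    at_risk, events = {}, {}
    for row in rows:
        t = row["interval"]
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + row["y"]
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Hypothetical toy data: two episodes for individual 1, one for individual 2.
episodes = [(1, 1, 3, 1), (1, 2, 2, 0), (2, 1, 4, 1)]
person_period = expand_to_person_period(episodes)
hazards = empirical_hazard(person_period)
```

Three episodes become nine binary records here, which illustrates why the expanded file can grow large with long durations or short intervals.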

A GLM for the grouped discrete time model
The hazard is indexed by k (individual), j (episode of a partnership) and i (state: partnership or non-partnership, modelled by dummy variables). We can use a ‘standard’ binary response model, in which z indexes the modelled interval at discrete time t and a p-order polynomial in z (typically p<5) describes the baseline hazard. v is a between-individual random effect and u is a within-individual, between-episode random effect (extra-binomial frailty). The downside is that this requires data expansion and can result in very large files. So, staying with grouped discrete time data, we consider another formulation.
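The model equation itself was lost from this slide. One form consistent with the description is sketched below; the logit link, the coefficient names $\alpha_l$ and $\beta$, and the normality of the random effects are assumptions made for illustration, with the state dummies taken to be part of $X_{ijk}$.

```latex
\operatorname{logit}\{h_{ijk}(t)\}
  = \sum_{l=0}^{p} \alpha_l z_t^{\,l} + X_{ijk}\beta + v_k + u_{jk},
\qquad
v_k \sim N(0, \sigma_v^2), \quad u_{jk} \sim N(0, \sigma_u^2)
```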

An ordered categorical model
For time at death t, write the cumulative probability as a standard normal integral. Discretise the time scale, as before, by defining cut points, and consider the cumulative distribution over the resulting intervals. This defines the ordered probit model, in which a linear predictor represents the effect of any covariates and each time interval has a probability that the event occurs in it. We shall discuss how to model the threshold parameters. Note that the hazard for a time interval is the probability of an event in that interval divided by the probability of surviving to its start.
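As a rough numerical sketch of the ordered probit formulation, the code below turns a set of thresholds into interval probabilities and hazards. It assumes the cumulative probability of an event by the end of interval t is Phi(alpha_t + xb); that sign convention, the function name and the threshold values are assumptions, chosen to agree with the later remark that a larger covariate effect raises the probability of the partnership ending.

```python
from statistics import NormalDist


def interval_probabilities(thresholds, xb):
    """Return (pis, hazards) for strictly increasing thresholds.

    Assumes gamma_t = Phi(alpha_t + xb) is the probability that the event
    has occurred by the end of interval t (an assumed sign convention).
    """
    phi = NormalDist().cdf
    pis, hazards = [], []
    previous = 0.0  # cumulative probability before the first interval
    for alpha in thresholds:
        gamma = phi(alpha + xb)
        pi = gamma - previous                   # event falls in this interval
        hazards.append(pi / (1.0 - previous))   # given survival to its start
        pis.append(pi)
        previous = gamma
    return pis, hazards


alphas = [-1.0, -0.3, 0.4, 1.2]  # hypothetical threshold values
pis, hazards = interval_probabilities(alphas, xb=0.0)
pis_shifted, _ = interval_probabilities(alphas, xb=0.5)
```

Any probability mass left after the last threshold corresponds to episodes that outlast the final interval.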

Advantages of the ordered probit model
First proposed by McCullagh (1980) and used by others, e.g. Hedeker et al. (2000).
Does not require data expansion.
Generalises easily to the multivariate and multilevel case (e.g. repeated episodes within individuals).
Handles any kind of censoring or missing data.
The threshold parameters correspond to the cut points and we require that they are strictly ordered. We can fix one of them if we assume that the intercept is incorporated in the covariate term.
We generalise to the 2-level case by adding random effects, with further levels or classifications similarly specified. This is a ‘latent normal’ model and can be combined with other responses, normal and categorical, and levels in a general multivariate multilevel framework (Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling 9(3)).
An MCMC algorithm has been developed. It involves sampling from the posterior distributions of the parameters and sampling from the latent normal given the observed category. If covariates are missing, the missing data can be handled by multiple imputation.

Censoring
Data right censored after time h simply involve a random draw from the part of the standard normal that lies above the threshold for h. Interval-censored data likewise involve a random draw from the corresponding normal interval. Data left censored at time h or earlier involve a random draw from the part of the standard normal that lies below the threshold for h.
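The three kinds of draw described above can be sketched with a single truncated normal sampler. This uses inverse-CDF sampling for simplicity, which is not necessarily what the MCMC implementation does; the function name, the threshold values and the convention that later event times sit higher on the latent scale are assumptions.

```python
import math
import random
from statistics import NormalDist


def draw_truncated_normal(lower, upper, mean=0.0, rng=random):
    """Draw from N(mean, 1) restricted to (lower, upper) by inverting the CDF."""
    dist = NormalDist(mean, 1.0)
    p_low = 0.0 if math.isinf(lower) else dist.cdf(lower)
    p_high = 1.0 if math.isinf(upper) else dist.cdf(upper)
    u = rng.uniform(p_low, p_high)
    # inv_cdf requires a probability strictly between 0 and 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


rng = random.Random(1)
lower_cut, upper_cut = -0.3, 0.4  # hypothetical thresholds on the latent scale

# Right censored after the interval ending at upper_cut: draw above it.
right = [draw_truncated_normal(upper_cut, math.inf, rng=rng) for _ in range(200)]
# Interval censored: draw between the two thresholds.
interval = [draw_truncated_normal(lower_cut, upper_cut, rng=rng) for _ in range(200)]
# Left censored at or before the interval ending at lower_cut: draw below it.
left = [draw_truncated_normal(-math.inf, lower_cut, rng=rng) for _ in range(200)]
```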

Estimating threshold parameters
We require strict monotonicity, so each threshold is built up from positive increments that may depend on q time-varying explanatory variables. This guarantees that the parameters are strictly increasing; a Metropolis-Hastings (MH) algorithm is used in the MCMC step. It also allows time-varying covariates to contribute cumulatively to the parameter value. Alternatively, they can contribute according to their current mean. Other link functions, such as the logistic, are possible, and the baseline hazard can be a smooth function of time rather than a step function.
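The threshold equation was lost from this slide, so the sketch below shows only one way to guarantee strictly increasing thresholds: adding an exponentiated, and therefore positive, increment at each step. The exact form, the function name and all parameter values are assumptions and may differ from the parameterisation used in the paper.

```python
import math


def cumulative_thresholds(alpha_1, deltas, gammas=(), covariates=None):
    """Build thresholds as alpha_t = alpha_{t-1} + exp(delta_t + w_t . gamma).

    deltas holds one baseline increment parameter per step; covariates, if
    given, holds one list of time-varying covariate values per step.
    """
    values = [alpha_1]
    for t, delta in enumerate(deltas):
        w_t = covariates[t] if covariates is not None else ()
        linear = delta + sum(g * w for g, w in zip(gammas, w_t))
        # exp() is always positive, so each threshold exceeds the previous one.
        values.append(values[-1] + math.exp(linear))
    return values


deltas = [-0.5, 0.2, -1.0]  # hypothetical baseline increment parameters
base = cumulative_thresholds(-1.5, deltas)
# One hypothetical time-varying indicator, switched on from the second step.
with_covariate = cumulative_thresholds(-1.5, deltas, gammas=[-1.2],
                                       covariates=[[0], [1], [1]])
```

Because the increments accumulate, a covariate that is switched on at one step shifts every later threshold, which is the sense in which time-varying covariates contribute cumulatively.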

Example – partnership durations
Data are based on partnership histories of female respondents in the National Child Development Study, collected retrospectively at ages 33 and 42. A full description is given in Steele et al. (2005). The present analysis uses a subset of the data and explanatory variables. Durations are grouped into six-month intervals.

Partnership durations
Negative values are right-censored observations.

Fitted model (omitting threshold estimates)

Interpretation - Model B
The overall effect of having a young child at the start of a partnership is to increase the value on the latent normal scale (since it is a covariate belonging to X) and hence to increase the probability that the partnership will end in each given interval, i.e. to decrease the overall probability of remaining in a partnership for all time periods. This presumably reflects the characteristics of a partnership that starts with an existing younger child. It could also reflect unobserved characteristics of women who have children from a previous relationship.
Given the presence (or absence) of a young child at the start of the partnership, the time-varying effect works through the current average of the younger child variable at separation, in effect the proportion of the period during which a younger child is present. Compared with no younger children during the period, the (additive) threshold parameter contribution is multiplied by a factor that depends on this mean and equals 0.30 when a younger child is always present. This leads to a decrease on the latent normal scale and hence to an increase in the probability of remaining in a partnership. This suggests that the arrival of a young child during a partnership tends to prolong it, in contrast to the effect of starting the partnership with a young child.

Interpretation - Model A
In model A the fixed part contribution for a younger child is greater and the time-dependent effect is larger, with a multiplying factor of 0.73. The effect is thus to multiply the cumulative base threshold by 0.73, which provides perhaps a more straightforward interpretation than model B.

References
Steele, F., Kallis, C., Goldstein, H. and Joshi, H. (2005). The Relationship between Childbearing and Transitions from Marriage and Cohabitation in Britain. Demography 42.
Goldstein, H. (2010). A general model for the analysis of multilevel discrete time survival data. (Submitted for publication.)

The models I have used can all be traced back to work carried out by Murray Aitkin over the course of a long and distinguished career. We owe him a great debt of gratitude.