Statistical methods in longitudinal studies
Jouko Miettunen, PhD
Department of Psychiatry, University of Oulu
2 Topics of this presentation
- Logistic regression analysis
- Survival analysis
- Analysis of variance
- Random regression analysis
- Structural equation modeling
- Latent class analysis
- Imputing missing data
3 Logistic regression analysis (1)
- The most common modeling method for adjusting for confounders in epidemiology, especially in longitudinal studies
- The outcome variable must be dichotomous (no/yes, healthy/sick)
- Exposure variables can be either dichotomous or continuous
4 Variables in logistic regression
- Include sociodemographic variables, e.g. sex, social class
- Include previously known risk factors
  - Especially if they are statistically significant in the model
- Do not include too many variables
  - How many is too many depends on the size of the data set and the distribution of the variables
- Do not include strongly intercorrelated variables
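For a dichotomous outcome and a single dichotomous exposure, the exponentiated logistic regression coefficient equals the odds ratio of the 2x2 table. The sketch below illustrates this with a small Newton-Raphson fit in plain Python. The function name `fit_logistic` and the counts are hypothetical and are not taken from the cohort data; a real analysis would use a statistics package such as SPSS.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(P(y=1)) = b0 + b1*x by Newton-Raphson (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical 2x2 table (illustration only):
#              outcome yes   outcome no
# exposed           30           20
# unexposed         20           40
x = [1] * 50 + [0] * 60
y = [1] * 30 + [0] * 20 + [1] * 20 + [0] * 40

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)                  # OR estimated by the model
table_odds_ratio = (30 * 40) / (20 * 20)   # OR computed directly from the table
print(f"OR from model: {odds_ratio:.3f}, OR from table: {table_odds_ratio:.3f}")
```

With more than one exposure variable the same idea holds, but each coefficient is then an odds ratio adjusted for the other variables in the model.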
5 Example data set: Northern Finland 1966 Birth Cohort
- Women who were living in the provinces of Oulu and Lapland and were due to deliver during 1966
- N = 12,058 live births
- N = 10,934 living in Finland in 1997
- Data on biological, socio-economic and health conditions collected prospectively from pregnancy up to the age of 35 years
- Data from several registers and, e.g., from large follow-ups at 14 and 31 years
6 Example question: Northern Finland 1966 Birth Cohort
- What predicts rehospitalization in psychoses?
- N = 158 hospital-treated cases
- Exposure variables
  - sex
  - father’s social class (1980)
  - familial risk
  - onset age
  - length of first hospitalization
  - diagnosis (schizophrenia / other psychosis)
7 SPSS Output (1)
8 SPSS Output (2)
9 Survival analysis (1)
- Examines the time between two events, e.g.
  - from birth to illness onset
  - from illness onset to death
  - from end of treatment to rehospitalization
- The Kaplan-Meier method estimates the probability of remaining event-free (survival) at each time point
10 Survival analysis (2)
- Required information
  - Event (0,1)
  - Time to event (days, months, …) or to censoring
  - Data are censored due to
    - End of follow-up time
    - Loss of contact
    - Or, e.g., death from a cause other than the one of interest
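The two required pieces of information (an event indicator and a time to event or censoring) are all the Kaplan-Meier estimator needs. A minimal sketch in plain Python follows. The function name `kaplan_meier` and the follow-up times are hypothetical, and subjects censored at an event time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (months); 0 = censored
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects never lower the curve themselves; they only reduce the number at risk for later event times.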
11 Example question: Northern Finland 1966 Birth Cohort
- What predicts age at suicide?
- People alive and living in Finland at 16 years (N=10,934)
- Data up to the end of 2001
  - 58 (0.5%) suicides
  - 140 (1.3%) other deaths
  - 10,736 (98.2%) alive
- Predictor variable:
  - family type at birth (full, single-parent)
12 SPSS Output (1)
Test Statistics for Equality of Survival Distributions: log rank test, p = 0.002
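The log rank test compares observed and expected numbers of events in one group across all event times. The sketch below shows the two-group version in plain Python. The function name `logrank_test` and the data are hypothetical and do not reproduce the SPSS result above.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square with 1 df, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t in each group
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        n_b = sum(1 for tt, _, g in pooled if tt >= t and g == 1)
        # Events occurring exactly at time t
        d_a = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 0)
        d_b = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical data: group A fails early, group B fails late (no censoring)
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```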
13 Survival analysis (3)
- If groups are compared statistically, the difference between them (or the trend in that difference) should stay about the same across time; at the very least, the curves should not cross
- Can also be done with small samples
- The curve can be presented as a survival function or as a hazard function
- References, e.g.
  - Parmar & Machin: Survival analysis. A practical approach. John Wiley & Sons, 1995.
14 SPSS Output (2)
15 Example question (2): Age at suicide and family type
- Possible confounding variables
  - sex
  - social class 1966 (I-II, III-IV, V)
  - average school mark at age 14
  - psychiatric diagnosis (no, yes)
  - crime (no, violent, non-violent)
- Method: Cox regression analysis
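Cox regression estimates hazard ratios by maximizing a partial likelihood over the risk sets at each event time. The sketch below does this for a single covariate with Newton-Raphson in plain Python, assuming Breslow handling of tied times. The function name `cox_single_covariate` and the data are hypothetical; adjusting for the confounders listed above requires the multivariable version available in SPSS and other packages.

```python
import math

def cox_single_covariate(times, events, x, iterations=20):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x); returns (beta, hazard ratio)."""
    beta = 0.0
    for _ in range(iterations):
        gradient = information = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            risk_set_mean = s1 / s0
            gradient += x_i - risk_set_mean
            information += s2 / s0 - risk_set_mean ** 2
        beta += gradient / information  # Newton-Raphson step
    return beta, math.exp(beta)

# Hypothetical data: x = 1 marks the exposed group, all subjects have the event
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
exposed = [1, 1, 0, 1, 0, 1, 0, 0]

beta, hazard_ratio = cox_single_covariate(times, events, exposed)
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.2f}")
```

A hazard ratio above 1 here means the exposed group tends to have the event earlier; recoding the exposure as 1 - x gives the reciprocal ratio.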
16 Cox regression analysis SPSS Output (3)
17 Cox regression analysis SPSS Output (4)
18 Analysis of variance
- ANOVA
  - One continuous outcome (dependent) variable
- MANOVA
  - Several continuous outcome variables
- Repeated measurements ANOVA (rANOVA)
  - The same measurements are made several times on each subject
- ANOVA, MANOVA and rANOVA
  - Only categorical predictors
- ANCOVA, MANCOVA, rANCOVA
  - Also continuous predictors
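The simplest member of this family, one-way ANOVA, compares the variation between group means with the variation within groups. A minimal sketch in plain Python follows. The function name `one_way_anova_f` and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical outcome values in three categories of a predictor
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```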
19 Example question: Difference in size of hippocampus
- Northern Finland 1966 Birth Cohort
  - Follow-up study
- Schizophrenia patients (N=56) vs. healthy controls (N=104)
- Repeated measurements ANCOVA
  - Measurements of the right and left sides were treated as repeated measurements
20 Example table: Hippocampus volumes, schizophrenia and comparison subjects

                              F      Sig.
Model 1
  Within effect: side         20.3   <
  Diagnosis
  Gender
Model 2
  Within effect: side
  Covariate: brain vol.       35.0   <
  Diagnosis                          <
  Gender
  Familial psychosis
  Perinatal risk
  Handedness

Tanskanen et al. Schizophrenia Research (in press)
21 Random regression analysis
- Random regression analysis = Random-effects (multilevel) models = …
  - Allow the presence of missing data
  - Allow time-varying covariates
  - Allow subjects to be measured at different timepoints
  - Take into account several levels of data (multilevel analysis)
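As an illustration of what such a model looks like, the simplest case is a random-intercept model for repeated measurements. The notation below is an assumption made for this sketch and does not come from the slides: y_{ij} is the outcome of subject i at occasion j, t_{ij} is the measurement time, u_i is the subject-specific deviation and \varepsilon_{ij} is the residual.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
```

Because each subject contributes whatever occasions j were actually observed, subjects may have different numbers of measurements and different measurement times.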
22 Random regression analysis
- Available software
  - SAS Proc Mixed
  - Stata (GLLAMM)
  - Specific multilevel modeling software
    - MLwiN
    - HLM
23 Random regression analysis
- References
  - Goldstein et al. Tutorial in biostatistics. Multilevel modelling of medical data. Stat Med, 21.
  - Hedeker & Mermelstein. Application of random-effects regression models in relapse research. Addiction, 91, S211-30.
  - Sharma et al. A longitudinal study of plasma cortisol and depressive symptomatology by random regression analysis. Biol Psychiatry, 31.
  - Tilling et al. A new method for predicting recovery after stroke. Stroke, 32.
  - Homepage of Don Hedeker
  - Homepage of Sophia Rabe-Hesketh (GLLAMM)
24 Structural Equation Modeling
- Combination of factor analysis and regression
- Continuous and discrete predictors and outcomes
- Relationships among measured or latent variables
25 Example: Nursing orientation
[Path diagram not reproduced. Domains: Orientation to nursing; Orientation to learning nursing. Factors: Caring orientation, Expertise orientation, Life orientation, Catalytic-co-operational nursing, Controlling nursing, Confirming nursing. Background predictors are shown with p-values (e.g. male, older/younger, no children, Swedish/Finnish); path correlations range from r=.11 to r=.64.]
Vanhanen-Nuutinen et al. (manuscript)
26 Structural Equation Modeling
- References
  - Bentler & Stein. Structural equation models in medical research. Stat Methods Med Res 1: 159–181.
  - Bollen. Structural equations with latent variables. John Wiley & Sons, Inc, New York.
  - Finch & West. The investigation of personality structure: statistical models. J Res Pers 31: 439–485.
  - MacCallum & Austin. Applications of structural equation modeling in psychological research. Annu Rev Psychol 51: 201–226, 2000.
27 Latent class analysis
- A statistical method developed specifically to group subjects according to selected characteristics
  - Classifies subjects into groups
  - Identifies the characteristics that indicate the groups
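Both tasks (classifying subjects and identifying the characteristics that indicate each group) can be illustrated with a two-class latent class model for dichotomous items, fitted by the EM algorithm. The sketch below is plain Python and assumes that items are independent within a class. The function name `fit_two_class_lca`, the starting values and the response patterns are hypothetical; dedicated software such as Mplus handles more classes, covariates and model checking.

```python
def fit_two_class_lca(responses, iterations=200):
    """EM for a 2-class latent class model with 0/1 items.

    Returns (class sizes, item probabilities per class, posterior per subject).
    """
    n_items = len(responses[0])
    class_size = [0.5, 0.5]
    # Deterministic starting values (an arbitrary choice for this sketch)
    item_prob = [[0.3] * n_items, [0.7] * n_items]
    posterior = []
    for _ in range(iterations):
        # E-step: probability of each class given the subject's responses
        posterior = []
        for r in responses:
            joint = []
            for c in range(2):
                likelihood = class_size[c]
                for j, answer in enumerate(r):
                    p = item_prob[c][j]
                    likelihood *= p if answer else 1.0 - p
                joint.append(likelihood)
            total = sum(joint)
            posterior.append([v / total for v in joint])
        # M-step: update class sizes and item probabilities
        for c in range(2):
            weight = sum(p[c] for p in posterior)
            class_size[c] = weight / len(responses)
            for j in range(n_items):
                endorsed = sum(p[c] * r[j] for p, r in zip(posterior, responses))
                item_prob[c][j] = endorsed / weight
    return class_size, item_prob, posterior

# Hypothetical yes/no answers to three items from 16 subjects
responses = ([[1, 1, 1]] * 5 + [[1, 1, 0]] * 2 + [[0, 1, 1]]
             + [[0, 0, 0]] * 5 + [[0, 0, 1]] * 2 + [[1, 0, 0]])

class_size, item_prob, posterior = fit_two_class_lca(responses)
print("class sizes:", [round(s, 2) for s in class_size])
print("item probabilities:", [[round(p, 2) for p in row] for row in item_prob])
```

The posterior probabilities classify each subject into the most likely class, and the item probabilities show which characteristics distinguish the classes.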
28 Example: Anti-Social Behavior
- National Longitudinal Survey of Youth (NLSY)
- Respondent ages between 16 and 23
- Background information: age, gender and ethnicity
- N=7,…
- Dichotomously scored antisocial behavior items:
  - Damaged property
  - Fighting
  - Shoplifting
  - Stole <$50
  - Stole >$50
  - Use of force
  - Seriously threaten
  - Intent to injure
  - Use Marijuana
  - Use other drug
  - Sold Marijuana
  - Sold hard drugs
  - ‘Con’ somebody
  - Stole an Automobile
  - Broken into a building
  - Held stolen goods
  - Gambling Operation
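29 Example: Anti-Social Behavior
[Model diagram not reproduced: latent class variable C, indicated by the behavior items (Damage Property, Fighting, Shoplifting, Stole <$50, Gambling, ...), with covariates Male, Race and Age.]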
30 Example: Anti-Social Behavior probabilities
31 Relationship between class probabilities and age by gender
[Figure not reproduced. Panels: Females, Males; x-axis: age.]
32 Example: Anti-Social Behavior
- Summary of four classes:
  - Property Offense Class (9.8%)
  - Substance Involvement Class (18.3%)
  - Person Offenses Class (27.9%)
  - Normative Class (44.1%)
- Classification Table:
  - Rows: Average latent class probability for most likely latent class membership
  - Columns: Latent class
33 Latent class analysis
- References
  - Muthén & Muthén. Integrating person-centered and variable-centered analyses: Growth mixture modeling with latent trajectory classes. Alcohol Clin Exp Res, 24.
- More references and examples
  - Homepage of Mplus software
34 Missing data
- A major problem in longitudinal studies
- Usually data are not missing at random
- One “solution”
  - Compare included and excluded cases
    - Not very good!
    - A smaller sample size gives less power (less chance of getting low p-values)
35 Imputing single missing data
- With the mean of the sample (or a subsample)
  - Gives less variability to data
- Nearest neighbour imputation
  - Gives less variability to data
- Use regression techniques to predict missing data
- Mean of variables of the same subject measuring approximately the same thing
  - e.g. in psychological scales
- “Missing value analysis” is now also available in SPSS
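The loss of variability caused by mean imputation is easy to demonstrate. The sketch below uses Python's standard `statistics` module. The observed values and the number of missing values are hypothetical.

```python
import statistics

observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed values
n_missing = 2                     # hypothetical number of missing values

# Replace every missing value with the mean of the observed values
mean_observed = statistics.mean(observed)
imputed = observed + [mean_observed] * n_missing

var_before = statistics.variance(observed)  # sample variance, observed cases only
var_after = statistics.variance(imputed)    # sample variance after mean imputation
print(f"variance before: {var_before:.2f}, after: {var_after:.2f}")
```

The imputed values sit exactly at the mean, so they add cases without adding any spread; standard errors computed from the imputed data are therefore too small.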
36 Multiple imputation
- Requires special software
  - SAS/STAT (PROC MI & PROC MIANALYZE)
  - S-PLUS (MICE)
  - SOLAS for Missing Data Analysis 3.0
- References
  - Kmetic et al. Multiple imputation to account for missing data in a survey: estimating the prevalence of osteoporosis. Epidemiology, 13.
  - McCleary. Using multiple imputation for analysis of incomplete data in clinical research. Nurs Res, 51.
  - Streiner. The case of the missing data: methods of dealing with dropouts and other research vagaries. Can J Psychiatry, 47, 68-75, 2002.
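In multiple imputation the same analysis is run on several imputed data sets and the results are then combined, commonly with Rubin's rules. The sketch below shows that pooling step in plain Python. The function name `pool_rubin` and the numbers are hypothetical; creating the imputed data sets themselves is left to the software listed above.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = statistics.mean(estimates)
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

# Hypothetical results from m = 3 imputed data sets
estimates = [0.50, 0.55, 0.45]     # e.g. a regression coefficient
variances = [0.010, 0.012, 0.011]  # squared standard errors

pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled_estimate:.3f}, "
      f"pooled SE = {total_variance ** 0.5:.3f}")
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty that comes from not knowing the missing values.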
37 General references in Finnish
- Metsämuuronen. Tutkimuksen tekemisen perusteet ihmistieteissä (2003)
- Nummenmaa et al. Tutkimusaineiston analyysi (1997)
- Uhari & Nieminen. Epidemiologia & Biostatistiikka (2001)
- SPSS, SAS, etc. manuals