Published by Mohammed Hussien
1
A Seminar Presentation on Longitudinal Data Analysis
By: Mohammed Hussien, Feb 2019
Bahir Dar University, School of Public Health, Post Graduate Program
2
Presentation outline
Introduction
Methods of longitudinal data analysis
Continuous outcome variables
Binary outcome variables
Count outcome variables
Example: GEE analysis
3
Introduction
The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time.
The aim is to characterize the change in response over time and the factors that influence that change.
With repeated measures on individuals, one can capture within-individual change.
This is in contrast to cross-sectional studies, in which a single outcome is measured for each individual.
4
Introduction…
A distinctive feature of longitudinal data is that they are clustered.
Observations within a cluster will typically exhibit positive correlation.
This correlation invalidates the crucial assumption of independence that is the cornerstone of many standard statistical techniques.
Instead, statistical models for clustered data must account for this correlation.
5
Introduction…
When conducting longitudinal data analysis, the researcher needs to determine whether the data are “balanced” or “unbalanced”.
Balanced: the repeated-measures data have an equal number of observations for all possible combinations of factor levels, and all subjects are measured on the same occasions.
Unbalanced: the number of observations is unequal, or different subjects are measured on different occasions.
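As a rough illustration of the balanced/unbalanced distinction, the sketch below checks only the "same occasions for every subject" part of the definition. The long-format layout of (subject, occasion) pairs and the function name `is_balanced` are assumptions made for this example, not something taken from the slides.

```python
from collections import defaultdict


def is_balanced(records):
    """Return True if every subject was measured on the same set of occasions.

    `records` is an iterable of (subject_id, occasion) pairs in long format.
    This is a hypothetical helper; it does not check factor-level combinations.
    """
    occasions_by_subject = defaultdict(set)
    for subject_id, occasion in records:
        occasions_by_subject[subject_id].add(occasion)
    occasion_sets = list(occasions_by_subject.values())
    # Balanced: all subjects share exactly the same measurement occasions.
    return all(s == occasion_sets[0] for s in occasion_sets)


# Made-up follow-up schedules (months since baseline).
balanced = [(1, 0), (1, 6), (1, 12), (2, 0), (2, 6), (2, 12)]
unbalanced = [(1, 0), (1, 6), (1, 12), (2, 0), (2, 12)]  # subject 2 missed month 6

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```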
6
Introduction…
Longitudinal data present methodological challenges in study design and data analysis.
1. Correlation among the repeated responses of the same subject
Classic models for cross-sectional data analysis do not apply to longitudinal data.
If the correlation is ignored, inferences such as statistical tests or confidence intervals can be grossly invalid.
7
Introduction…
2. Missing values
Since patients are followed up for a period of time, it is common that some patients will drop out of the study.
3. Values of the independent variables can also change over time
The direction of causality between the outcome and the exposure variables can be complicated.
8
Methods of longitudinal data analysis
Commonly used statistical techniques mostly assume independence of the observations or measurements.
Ignoring the correlation between repeated measurements may lead to biased estimates as well as invalid P-values and confidence intervals.
Appropriate analysis of repeated-measures data requires specific statistical techniques.
9
Continuous outcome variables
Paired t-test
The simplest form of longitudinal study is that in which a continuous outcome variable is measured twice in time.
Differences in the response measurements between t1 and t2 are statistically tested and assessed.
The test statistic of the paired t-test is the average of the differences divided by its standard error, that is, the standard deviation of the differences divided by the square root of the number of subjects.
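The paired t statistic described on this slide can be sketched in a few lines of standard-library Python. The function name and the blood-pressure values are made up for illustration; the statistic would be compared against a t distribution with n − 1 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Paired t statistic: mean difference divided by its standard error."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 subjects")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n))


# Hypothetical measurements on four subjects at t1 and t2.
t1 = [80, 85, 90, 95]
t2 = [78, 80, 89, 90]
print(paired_t_statistic(t1, t2))  # negative: values fell between t1 and t2
```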
10
ANOVA and MANOVA approaches
Repeated-measures ANOVA can be used to assess differences in an outcome serially measured at different time points in the same subjects.
While ANOVA gives an overall P-value, it does not identify which group means differ.
A common approach to answer this question is to perform post hoc tests, which are pairwise comparisons of either selected means of interest or all possible combinations of means.
This involves testing multiple null hypotheses, which leads to an inflated risk of a type I error.
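To make the inflation of the type I error risk concrete, the sketch below counts the pairwise comparisons among k time points and computes the family-wise error rate under the simplifying assumption that the tests are independent (they are not exactly independent in repeated-measures data, so this is only an approximation). The Bonferroni adjustment shown is one common correction; the slides do not name a specific one.

```python
import math


def pairwise_comparisons(k):
    """Number of distinct pairs among k time points (or group means)."""
    return math.comb(k, 2)


def familywise_error(alpha, m):
    """Chance of at least one false positive in m tests, ASSUMING independence."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / m


m = pairwise_comparisons(4)          # 4 time points -> 6 pairwise tests
print(m)
print(familywise_error(0.05, m))     # roughly 0.26 instead of the nominal 0.05
print(bonferroni_alpha(0.05, m))     # roughly 0.0083 per test
```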
11
ANOVA and MANOVA…
Repeated-measures ANOVA assumptions
Compound symmetry: the correlation between repeated measurements for the same subject is constant, irrespective of how far apart in time the measurements are made (termed “compound symmetry correlation structure”).
Sphericity: the variance of the difference between any two time points is the same, irrespective of how distant the measurements are.
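The link between the two assumptions can be shown with a short derivation. The symbols below (a common variance σ² and a common correlation ρ) are notation introduced here, under the usual reading of compound symmetry as equal variances plus equal correlations.

```latex
% Compound symmetry: Var(Y_j) = \sigma^2 and Cov(Y_j, Y_k) = \rho\sigma^2 for all j \neq k.
\begin{align*}
\operatorname{Var}(Y_j - Y_k)
  &= \operatorname{Var}(Y_j) + \operatorname{Var}(Y_k) - 2\operatorname{Cov}(Y_j, Y_k) \\
  &= \sigma^2 + \sigma^2 - 2\rho\sigma^2 \\
  &= 2\sigma^2(1 - \rho) \qquad \text{for every pair } j \neq k .
\end{align*}
```

The variance of the difference does not depend on which two time points are compared, so compound symmetry implies sphericity; the converse does not hold in general.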
12
ANOVA and MANOVA…
Multivariate ANOVA (MANOVA) does not assume a specific correlation structure or sphericity.
It is a popular alternative to repeated-measures ANOVA when its assumptions are violated.
While this is an important advantage of MANOVA over ANOVA, it is tempered by the larger limitation of requiring complete data for all subjects in the MANOVA model.
Application of MANOVA must follow deletion of all subjects without complete data.
13
ANOVA and MANOVA…
ANOVA has several shortcomings or limitations:
The outcome must be continuous and should generally be normally distributed around the mean value.
It does not accommodate continuous covariates.
It cannot handle covariates that vary over time.
It does not allow for unequally spaced time intervals in different subjects.
It excludes the entire subject from the analysis when a single observation is missing.
It tests for “significance” but does not allow inferences on how the outcome is expected to change as the predictor variable(s) change.
14
Generalized Estimating Equations (GEE)
Two commonly used modern approaches make use of the flexibility of regression models while accounting for non-independent measurements:
Generalized Estimating Equations (GEE) and mixed-effects models.
15
GEE…
GEE is used to estimate the parameters in a marginal model, in combination with robust standard errors to quantify the precision of the estimates.
“Marginal” indicates that the model for the mean response depends only on the covariates of interest, and not on any random effects or previous responses.
In mixed-effects models, by contrast, the mean response depends not only on covariates but also on a vector of random effects.
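The contrast between the two model types can be written compactly. The notation is an assumption introduced here for illustration: Y_ij is the response of subject i at occasion j, X_ij and Z_ij are covariate vectors, g is a link function, β the regression parameters, and b_i the subject-specific random effects.

```latex
\begin{align*}
\text{Marginal model:} \quad
  & g\bigl(\mathrm{E}[Y_{ij} \mid X_{ij}]\bigr) = X_{ij}^{\top}\beta \\[4pt]
\text{Mixed-effects model:} \quad
  & g\bigl(\mathrm{E}[Y_{ij} \mid X_{ij}, b_i]\bigr) = X_{ij}^{\top}\beta + Z_{ij}^{\top} b_i ,
  \qquad b_i \sim N(0, G)
\end{align*}
```

In the first line the mean is conditioned on the covariates only; in the second it is also conditioned on the random effects b_i.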
16
GEE…
17
GEE…
The marginal approach models the mean response, so the interpretation refers to population-average effects of the predictor variables.
The regression parameters thus allow inferences on how the mean outcome, averaged over the whole population, is expected to change relative to changes in the independent variable(s).
18
GEE…
GEE can be used to estimate the regression parameters while accounting for repeated measurements on the same subjects.
To account for non-independence, the analyst needs to specify a working correlation structure.
The correlations are treated as nuisance parameters; interest focuses on statistical inference for the regression parameters.
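For reference, the estimating equations are usually written in the following textbook form. The slides do not show this notation, so every symbol is an assumption introduced here: N subjects, Y_i the response vector of subject i, μ_i(β) its modelled mean, A_i the diagonal matrix of variance functions, φ a scale parameter, and R(α) the working correlation matrix.

```latex
\begin{align*}
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) &= 0 ,
  & D_i &= \frac{\partial \mu_i}{\partial \beta} , \\
V_i &= \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2} .
\end{align*}
```

The working correlation enters only through V_i, which is why α is treated as a nuisance parameter while inference targets β.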
19
Working correlation structure
Working correlation structures represent the average dependence among the repeated observations across subjects.
Specification of the correct form of the working correlation structure increases the efficiency of the parameter estimates (βs and standard errors).
There are five commonly used correlation structures: independent, exchangeable, first-order autoregressive (AR1), stationary m-dependent, and unstructured.
20
Working correlation structure…
Independent correlation structure
The assumption is that responses are uncorrelated within a cluster.
This form is equivalent to assuming that the longitudinal data are not correlated.
21
Working correlation structure…
Exchangeable correlation structure
Any two responses within a cluster have the same correlation (ρ).
There is only one correlation parameter to be estimated.
The order of observations within a cluster is arbitrary.
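A small sketch of what an exchangeable working correlation matrix looks like for a cluster of n responses. The function name and the value ρ = 0.5 are arbitrary choices for illustration.

```python
def exchangeable_corr(n, rho):
    """n x n matrix with 1 on the diagonal and the same rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


# Three repeated measurements, hypothetical rho = 0.5.
for row in exchangeable_corr(3, 0.5):
    print(row)
```

Because every off-diagonal entry is the same, reordering the observations within a cluster leaves the matrix unchanged.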
22
Working correlation structure…
Autoregressive correlation structure (AR1)
The correlation between responses depends on the interval of time between responses.
Correlations decrease as the time interval between the measurements increases.
The correlation between any two responses from the same subject equals ρ^|t1 − t2|.
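The AR1 rule, correlation equal to ρ raised to the absolute time difference, can be sketched as follows. The function name, the measurement times, and ρ = 0.5 are assumptions chosen for illustration.

```python
def ar1_corr(times, rho):
    """Correlation matrix whose (s, t) entry is rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in times] for s in times]


# Measurements at occasions 1, 2, 3 with a hypothetical rho = 0.5.
for row in ar1_corr([1, 2, 3], 0.5):
    print(row)
# Adjacent occasions correlate at 0.5, occasions two apart at 0.25.
```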
23
Working correlation structure…
Stationary m-dependent correlation structure
All pairs of responses k occasions apart share the same correlation, for k = 1, 2, ..., m, whereas correlations more than m occasions apart are zero.
The assumption is that responses within a cluster are uncorrelated if they are more than m units apart.
A stationary m-dependent correlation structure has m distinct correlation parameters.
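The banded pattern of a stationary m-dependent structure can be sketched as below. The function name and the correlation values are hypothetical, and the sketch does not check that the chosen values give a valid (positive definite) correlation matrix.

```python
def m_dependent_corr(n, rhos):
    """n x n stationary m-dependent matrix, where m = len(rhos).

    rhos[k - 1] is the correlation between responses k occasions apart;
    responses more than m occasions apart get correlation 0.
    """
    m = len(rhos)

    def entry(i, j):
        lag = abs(i - j)
        if lag == 0:
            return 1.0
        if lag <= m:
            return rhos[lag - 1]
        return 0.0

    return [[entry(i, j) for j in range(n)] for i in range(n)]


# Four occasions, m = 2, with made-up correlations 0.6 (lag 1) and 0.3 (lag 2).
for row in m_dependent_corr(4, [0.6, 0.3]):
    print(row)
```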
24
Working correlation structure…
Unstructured correlation structure
There are fewer constraints on the correlation parameters.
It has a separate correlation parameter for each pair of observations within a cluster.
In general, for a cluster that has n responses, there are n(n − 1)/2 correlation parameters.
If there are a large number of correlation parameters to estimate, the model may be unstable and the results unreliable.
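To see how quickly the unstructured form grows compared with the other structures, the sketch below counts the correlation parameters each structure requires for a cluster of n responses. The function and the structure labels are names chosen for this example.

```python
def n_correlation_parameters(structure, n, m=None):
    """Number of correlation parameters for a cluster with n responses."""
    if structure == "independent":
        return 0
    if structure in ("exchangeable", "ar1"):
        return 1
    if structure == "m-dependent":
        if m is None:
            raise ValueError("m is required for the m-dependent structure")
        return m
    if structure == "unstructured":
        return n * (n - 1) // 2
    raise ValueError(f"unknown structure: {structure}")


for n in (4, 10, 20):
    print(n, n_correlation_parameters("unstructured", n))
# 4 -> 6, 10 -> 45, 20 -> 190
```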
25
Working correlation structure…
An inappropriate choice will lead to inefficient parameter estimation.
However, the loss of efficiency is lessened as the number of subjects gets large.
The model yields consistent estimates of the regression coefficients and, with robust standard errors, of their standard errors, even with misspecification of the correlation structure.
26
Working correlation structure…
Selection of a “working” correlation structure is at the discretion of the researcher.
Within GEE analysis there is no straightforward way to determine which correlation structure best describes the correlations among the repeated measurements.
Some researchers have proposed selection criteria: Pan (2001) proposes the quasi-likelihood under the independence model criterion (QIC).
The working correlation structure with the smallest QIC value is the preferred one.
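For orientation, QIC is commonly stated in the form below. The slides do not give the formula, so the notation is an assumption: Q is the quasi-likelihood evaluated under the independence working model, β̂(R) the GEE estimate under working structure R, Ω̂_I the inverse of the model-based covariance estimate under independence, and V̂_R the robust covariance estimate under R.

```latex
\[
\mathrm{QIC}(R) \;=\; -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
  \;+\; 2\, \operatorname{trace}\bigl(\hat{\Omega}_I \, \hat{V}_R\bigr)
\]
```

The first term rewards fit and the second acts as a penalty, so the criterion is read like AIC: smaller is better.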
27
Example: Results of a GEE analysis
The output below shows the results of a GEE analysis that was applied to investigate the longitudinal relationship between the outcome variable, Diastolic Blood Pressure (DBP), and the four predictors (age, gender, treatment, and pre-test level of DBP).
DBP is measured 4 times.
28
Choosing the working correlation structure
The dataset was analyzed using the different correlation structures.
The independent and exchangeable correlation structures provide identical results.
Generally, the four correlation structures give similar results or parameter estimates.
29
Choosing the working correlation structure…
QIC was used, and the independent and exchangeable correlation structures give the smallest QIC value.
The minimum correlation value was 0.46, which is far from zero, so the independent structure is not appropriate.
The exchangeable correlation structure was chosen.
30
The results of the GEE analysis with exchangeable correlation structure
31
Interpretation of the GEE analysis output
32
Interpretation…
33
Interpretation…
34
Linear Mixed-Effects Model
The linear mixed-effects model (LMM) is used to model the linear relationship between a continuous response and a set of independent variables within a longitudinal data setting.
The LMM addresses the correlated responses by modeling the between-subject variability using random effects.
This approach enables one to address the difficulty in modeling correlated responses arising from varying assessment.
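A standard way to write the linear mixed-effects model is given below, followed by its simplest special case. The notation is introduced here for illustration: Y_ij is the response of subject i at occasion j, t_ij the measurement time, b_i the random effects with covariance G, and ε_ij the residual error.

```latex
\begin{align*}
Y_{ij} &= X_{ij}^{\top}\beta + Z_{ij}^{\top} b_i + \varepsilon_{ij},
  & b_i &\sim N(0, G), \quad \varepsilon_{ij} \sim N(0, \sigma^2) \\[4pt]
\text{Random intercept only:} \quad
Y_{ij} &= \beta_0 + \beta_1 t_{ij} + b_{0i} + \varepsilon_{ij},
  & b_{0i} &\sim N(0, \sigma_b^2) \\[4pt]
\operatorname{Corr}(Y_{ij}, Y_{ik}) &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}
  & & (j \neq k)
\end{align*}
```

In the random-intercept case, the shared term b_0i is what induces the within-subject correlation, and that correlation is the same for every pair of occasions.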
35
Linear Mixed-Effects Model…
36
Linear Mixed-Effects Model…
It is not only possible to estimate parameters that describe how the mean response changes in the population of interest, but it is also possible to predict how individual response trajectories change over time.
One very appealing aspect of linear mixed-effects models is their flexibility in accommodating any degree of unbalanced longitudinal data.
37
Linear Mixed-Effects Model…
38
Linear Mixed-Effects Model…
39
Linear Mixed-Effects Model…
40
Dichotomous outcome variables
When a dichotomous outcome variable is used in a longitudinal study, and the objective of the study is to analyze the longitudinal relationship between such a variable and one or more covariates, it is possible to use:
GEE analysis
mixed-effects model analysis (generalized linear mixed-effects models)
41
Dichotomous outcome variables…
42
Dichotomous outcome variables…
As in cross-sectional multiple logistic regression analysis, all covariates can be continuous, dichotomous or categorical, although in the latter situation dummy coding can or must be used.
43
Dichotomous outcome variables…
With GEE analysis, an adjustment is made for the within-subject correlations between the repeated measurements by assuming a working correlation structure.
With mixed model analysis, this adjustment is made by allowing regression coefficients to vary between subjects, i.e. random intercepts and random slopes.
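For a dichotomous outcome, the two adjustments can be contrasted using the logit link. The notation is an assumption made for illustration: Y_ij is the 0/1 response of subject i at occasion j, t_ij the measurement time, and b_0i and b_1i a random intercept and random slope.

```latex
\begin{align*}
\text{GEE (population-average):} \quad
  & \operatorname{logit} \Pr(Y_{ij} = 1 \mid X_{ij}) = X_{ij}^{\top}\beta ,
  \quad \text{with a working correlation } R(\alpha) \\[4pt]
\text{Mixed model (subject-specific):} \quad
  & \operatorname{logit} \Pr(Y_{ij} = 1 \mid X_{ij}, b_i)
    = X_{ij}^{\top}\beta + b_{0i} + b_{1i}\, t_{ij}
\end{align*}
```

With a logit link the two sets of β are generally not numerically equal: the GEE coefficients describe population-average odds ratios, while the mixed-model coefficients describe odds ratios for a given subject.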
44
Dichotomous outcome variables…
45
Dichotomous outcome variables…
46
Count outcome variables
Longitudinal analysis with count outcome variables is comparable to a cross-sectional Poisson regression analysis, the difference being that the longitudinal technique takes into account the within-subject correlations.
The longitudinal Poisson regression analysis is an extension of the cross-sectional Poisson regression analysis, i.e. with an adjustment for the dependency of the observations within the subject.
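The count-outcome case uses the log link of Poisson regression. The symbols below are introduced here for illustration (Y_ij a count for subject i at occasion j, X_ij the covariates, β_k the coefficient of the k-th covariate) and are not taken from the slides.

```latex
\begin{align*}
\log \mathrm{E}[Y_{ij} \mid X_{ij}] &= X_{ij}^{\top}\beta
  && \text{(marginal model, fitted by GEE with a working correlation)} \\[4pt]
\exp(\beta_k) &= \text{rate ratio for a one-unit increase in the $k$-th covariate}
\end{align*}
```

The mean model is the same as in the cross-sectional analysis; only the handling of the within-subject dependence differs.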
47
Count outcome variables…
48
Count outcome variables…
49
Thank You!