Data Analysis – Statistical Issues Bernd Genser, PhD Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador Slides available at: Seminário ABRASCO- Métodos em Epidemiologia: ESTUDOS DE COORTE, Rio de Janeiro, 01-AUG BCG REVAC- Cluster Randomization Trial
The BCG trial from a statistician's point of view
Main objective: Estimate an unbiased, consistent measure of vaccine efficacy (VE), including a 95% CI, for a BCG dose given to school children in a population with high coverage of neonatal BCG vaccination.
Secondary objective: Identify effect modifiers (city, BCG scar, …).
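As a minimal sketch of the main objective, VE can be written as (1 - RR) x 100, where RR is the incidence rate ratio between the vaccinated and control arms. The function name, the Wald interval on log(RR), and all numbers below are assumptions for illustration; this naive interval ignores clustering and is not the trial's analysis.

```python
import math

def vaccine_efficacy(cases_vac, pt_vac, cases_ctl, pt_ctl, z=1.96):
    """Return (VE, lower, upper) in percent from case counts and person-time.

    VE = (1 - RR) * 100, where RR is the incidence rate ratio.
    The CI is a naive Wald interval on log(RR); it ignores clustering.
    """
    rr = (cases_vac / pt_vac) / (cases_ctl / pt_ctl)
    se_log_rr = math.sqrt(1 / cases_vac + 1 / cases_ctl)
    rr_lo = math.exp(math.log(rr) - z * se_log_rr)
    rr_hi = math.exp(math.log(rr) + z * se_log_rr)
    # The upper RR limit gives the lower VE limit, and vice versa.
    return (1 - rr) * 100, (1 - rr_hi) * 100, (1 - rr_lo) * 100

# Hypothetical numbers, for illustration only (not trial data).
ve, lo, hi = vaccine_efficacy(cases_vac=90, pt_vac=100_000,
                              cases_ctl=100, pt_ctl=100_000)
print(f"VE = {ve:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

Because VE is a decreasing function of RR, the interval limits swap when moving from the RR scale to the VE scale.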
Issues to be addressed in the statistical analysis
1) Potential confounding and effect modification
- Trial design: complex multi-level covariate structure
- Adjusting/controlling for confounding by fixed and time-varying (e.g. age) TB predictors
- Heterogeneity of VE across covariate strata expected
2) Cluster randomization: adjusting the estimates for potential intra-cluster correlation
3) Expected low incidence of TB: more clusters than cases expected => traditional statistical methods for CRTs could not be applied
Analytical Solutions for the BCG trial (1)
Issue 1: Dealing with potential confounding variables
Controlled by study design:
- Stratification/randomization: allocation groups were highly balanced in confounding variables => no statistical adjustment required for these covariates
- Matching by size of school additionally accounts for the effect of “cluster size”
Adjusted in statistical analysis:
- TB incidence is well known to depend strongly on age => age modeled as a time-varying variable
Dealing with covariates in the BCG trial
[Diagram: each covariate labelled by how it was handled: design (randomization, matching, stratification), subgroup analysis, or statistical adjustment]
Evaluation of the random allocation procedure
Analytical Solutions for the BCG trial (2)
Issue 2: Dealing with effect modification
Subgroup analyses conducted by:
- No. of BCG scars (first or second dose)
- City (Salvador and Manaus)
- Clinical form/certainty level
Strong evidence of effect heterogeneity found:
- We decided to analyze children with 1 and 0 scars separately: the effects of a 1st and a 2nd dose are completely different scientific questions => no interaction model fitted!
- All analyses were presented overall and by city and clinical form
Analytical Solutions for the BCG trial (3)
Issue 3: Adjusting the estimates for the “design effect”
Statistical problem: between-cluster variation (= intra-cluster correlation), induced by an unexplained dependence structure between children from the same school, usually caused by common unknown/unobserved risk factors
=> Consequence: standard statistical approaches can substantially underestimate the true variance of the effect estimators (overdispersion), so confidence intervals are too narrow!
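To give a feel for how much the variance can be underestimated, the textbook approximation for equal-sized clusters is DEFF = 1 + (m - 1) x ICC, with the naive standard error inflated by the square root of DEFF. The sketch below assumes equal cluster sizes and a known ICC; the function names and numbers are hypothetical and are not taken from the trial.

```python
import math

def design_effect(mean_cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc

def inflate_se(naive_se, deff):
    """Inflate a naive standard error by the square root of the design effect."""
    return naive_se * math.sqrt(deff)

# Hypothetical values: schools of 200 children and a small ICC.
deff = design_effect(mean_cluster_size=200, icc=0.005)
print("design effect:", deff)
print("adjusted SE:", inflate_se(0.10, deff))
```

Even a very small ICC matters when clusters are large, because it is multiplied by the cluster size minus one.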
Analytical Solutions for the BCG trial (4)
Statistical approaches to deal with ICC
For binary or quantitative outcomes:
- Direct adjustment of confidence intervals possible by estimating the intracluster (intraclass) correlation (ICC)
For count outcomes (Poisson distributed data):
- Explicit estimation of ICC not possible!
- Examining the magnitude of the design effect by comparing unadjusted and adjusted CIs
- Novel univariate approaches that directly adjust the CIs and P-values for the clustering
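For the binary-outcome case mentioned above, one common way to estimate the ICC is the one-way ANOVA estimator. The sketch below is an illustration under that assumption; the function name and the school data are hypothetical, and it does not apply to the Poisson outcome of this trial.

```python
def anova_icc(clusters):
    """ANOVA estimate of the ICC for a binary outcome.

    clusters is a list of (n_i, y_i): cluster size and number of events.
    """
    k = len(clusters)
    n_total = sum(n for n, _ in clusters)
    p = sum(y for _, y in clusters) / n_total
    # Between-cluster and within-cluster mean squares.
    msb = sum(n * (y / n - p) ** 2 for n, y in clusters) / (k - 1)
    msw = sum(y * (1 - y / n) for n, y in clusters) / (n_total - k)
    # Adjusted mean cluster size, which allows for unequal clusters.
    n0 = (n_total - sum(n * n for n, _ in clusters) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

# Hypothetical schools: (children, events).
schools = [(120, 6), (95, 2), (150, 9), (80, 1)]
print("estimated ICC:", anova_icc(schools))
```

The estimate can be slightly negative when clusters are more alike than expected by chance; it is then usually treated as zero.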
Analytical Solutions for the BCG trial (5)
Two basic approaches for CRTs with Poisson data:
A) Analyses at the cluster level
- “Cluster summary statistic”, meta-analysis techniques: not recommended in our trial because of the very low cluster-specific incidence, i.e. more clusters than cases!
B) Analyses at the individual level
- New approach for univariate analysis: ratio estimator approach for overdispersed Poisson data (Rao & Scott, Stat Med 1999, implemented in the software ACLUSTER): direct adjustment of confidence intervals using a robust variance estimator
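The idea behind a ratio estimator with a robust variance can be sketched as follows: each arm's rate is total cases over total person-time, and its variance is estimated from the cluster-level residuals (Taylor linearization) rather than from the Poisson assumption. This is a generic sketch in the spirit of that approach, not the ACLUSTER implementation; the function names and the cluster data are hypothetical.

```python
import math

def rate_and_robust_var(cases, person_time):
    """Ratio estimate of a rate and its cluster-level linearization variance.

    cases[i] and person_time[i] are the totals for cluster i within one arm.
    """
    k = len(cases)
    total_pt = sum(person_time)
    rate = sum(cases) / total_pt
    # Squared residual of each cluster around the pooled rate.
    resid_sq = sum((y - rate * t) ** 2 for y, t in zip(cases, person_time))
    var = k / (k - 1) * resid_sq / total_pt ** 2
    return rate, var

def robust_rate_ratio(arm1, arm0, z=1.96):
    """Rate ratio (arm1 / arm0) with a cluster-robust Wald CI on the log scale."""
    r1, v1 = rate_and_robust_var(*arm1)
    r0, v0 = rate_and_robust_var(*arm0)
    rr = r1 / r0
    # Delta method on the log scale; the two arms are independent.
    se_log_rr = math.sqrt(v1 / r1 ** 2 + v0 / r0 ** 2)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# Hypothetical clusters: (cases per school, person-years per school).
vaccinated = ([0, 1, 0, 2, 0, 1], [500, 800, 400, 900, 600, 700])
control = ([1, 0, 2, 1, 0, 2], [550, 750, 450, 850, 650, 700])
print(robust_rate_ratio(vaccinated, control))
```

Comparing this interval with the naive Poisson interval gives a direct impression of the size of the design effect.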
Ratio estimator approach for overdispersed Poisson data
Analytical Solutions for the BCG trial (6)
Multivariate modeling - Poisson regression
Basic assumption: constant rate over the follow-up time
- Could be relaxed by inclusion of time-varying variables (e.g. age)
Extensions for clustered data:
Parametric random effects or multi-level modelling: intra-cluster correlation modeled by a cluster-specific random effect
- Disadvantage: strong distributional assumptions! => Random effects models not recommended for this trial:
  - violation of distributional assumptions,
  - convergence problems,
  - large bias in variance estimation of the random effect!
Better: semi-parametric approach based on Generalized Estimating Equations (GEE): calculate an adjusted variance estimator by an iterative algorithm assuming a “working correlation structure”
- Advantage: no distributional assumptions!
- Disadvantage: very computer intensive for large datasets because of the computational complexity: time for the BCG data: 1 hour! (1000
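One usual way to include age as a time-varying variable in a Poisson regression is to split each child's follow-up into age bands, so that every record has a constant age category and its own person-time. The sketch below assumes that approach; the function name, the age bands, and the example child are hypothetical.

```python
def split_by_age(entry_age, exit_age, band_edges):
    """Split one child's follow-up into age bands.

    Returns a list of ((band_start, band_end), person_years) tuples,
    one for each band in which the child contributed follow-up time.
    """
    pieces = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Overlap between the follow-up interval and this age band.
        start, stop = max(entry_age, lo), min(exit_age, hi)
        if stop > start:
            pieces.append(((lo, hi), stop - start))
    return pieces

# A hypothetical child followed from age 8.5 to age 14.0.
for band, person_years in split_by_age(8.5, 14.0, [7, 10, 15, 20]):
    print(band, person_years)
```

Each resulting record would then enter the model with its age band as a covariate and the logarithm of its person-years as the offset.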
Results of the Poisson regression models
Naive and robust variance estimates were very similar: no “design effect” observed
Statistical software for analysing/planning CRTs
STATA 7/8/9, general-purpose statistical package, Stata Corporation
- GLM with GEE, random effects or robust variance estimation to adjust for clustering
STATA 9, MLwiN: multi-level models
ACLUSTER - Software for the Design and Analysis of Cluster Randomized Trials
- Easy computation of the intraclass correlation coefficient
- Direct adjustment approaches for univariate analysis
- Power analysis for the three types of cluster randomized study design
References
Statistics in Medicine (2001); 20 (Special Issue): Design and Analysis of Cluster Randomized Trials.
Donner A, Klar N. Design and analysis of cluster randomisation trials (2000). Arnold Publications, London.
Thank you!