Data Analysis – Statistical Issues Bernd Genser, PhD Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador


Data Analysis – Statistical Issues
Bernd Genser, PhD, Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador
Seminário ABRASCO - Métodos em Epidemiologia: ESTUDOS DE COORTE, Rio de Janeiro, 01-AUG
BCG REVAC - Cluster Randomization Trial

The BCG trial from a statistician's point of view
Main objective: Obtain an unbiased, consistent estimate of vaccine efficacy (VE), with 95% CI, for a BCG dose given to school children in a population with high coverage of neonatal BCG vaccination.
Secondary objective: Identify effect modifiers (city, BCG scar, …).
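As a minimal sketch of the main objective, VE can be written as (1 - RR) x 100, where RR is the TB incidence rate ratio between vaccinated and control children. The function name and the numbers below are hypothetical, and the confidence interval is the naive Poisson one that ignores clustering by school.

```python
import math

def vaccine_efficacy(cases_vacc, pyears_vacc, cases_ctrl, pyears_ctrl, z=1.96):
    """Return (VE, lower, upper) in percent from cases and person-years per arm.

    Assumes Poisson counts and independent children (no clustering),
    so the interval is a naive one. Both case counts must be > 0.
    """
    rate_vacc = cases_vacc / pyears_vacc
    rate_ctrl = cases_ctrl / pyears_ctrl
    rr = rate_vacc / rate_ctrl
    # Naive standard error of log(RR) for two Poisson counts
    se_log_rr = math.sqrt(1 / cases_vacc + 1 / cases_ctrl)
    rr_lo = math.exp(math.log(rr) - z * se_log_rr)
    rr_hi = math.exp(math.log(rr) + z * se_log_rr)
    # VE = (1 - RR) * 100; the upper RR bound gives the lower VE bound
    return (1 - rr) * 100, (1 - rr_hi) * 100, (1 - rr_lo) * 100

# Hypothetical data: 80 vs. 100 cases over 100,000 person-years per arm
ve, ve_lo, ve_hi = vaccine_efficacy(80, 100_000, 100, 100_000)
print(f"VE = {ve:.1f}% (95% CI {ve_lo:.1f}% to {ve_hi:.1f}%)")
```

The VE interval is obtained by transforming the interval for RR, which is why it is asymmetric around the point estimate.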

Issues to be addressed in Statistical Analysis
1) Potential confounding and effect modification
 - Trial design: complex multi-level covariate structure
 - Adjusting/controlling for confounding by fixed and time-varying (e.g. age) TB predictors
 - Heterogeneity of VE across covariate strata expected
2) Cluster randomization: adjusting the estimates for potential intra-cluster correlation
3) Expected low incidence of TB: more clusters than cases expected => traditional statistical methods for CRT could not be applied

Analytical Solutions for the BCG trial (1)
Issue 1: Dealing with potential confounding variables
Controlled by study design:
 - Stratification/randomization: allocation groups were highly balanced in confounding variables => no statistical adjustment required for these covariates
 - Matching by size of school additionally accounts for the effect of "cluster size"
Adjusted in statistical analysis:
 - TB incidence is well known to depend strongly on age => age modeled as a time-varying variable

Dealing with covariates in the BCG trial
[Diagram: each covariate is labeled with how it is handled: by design (randomization, matching, stratification), by subgroup analysis, or by statistical adjustment.]

Evaluation of the random allocation procedure

Analytical Solutions for the BCG trial (2)
Issue 2: Dealing with effect modification
Subgroup analyses conducted by:
 - No. of BCG scars (first or second dose)
 - City (Salvador and Manaus)
 - Clinical form / certainty level
Strong evidence of effect heterogeneity found:
 - We decided to analyze children with 1 and 0 scars separately: first-dose and second-dose effects are completely different scientific questions => no interaction model fitted!
 - All analyses were presented overall and by city and clinical form

Analytical Solutions for the BCG trial (3)
Issue 3: Adjusting the estimates for the "design effect"
Statistical problem: between-cluster variation (= intra-cluster correlation), induced by an unexplained dependence structure between children from the same school, usually caused by shared unknown/unobserved risk factors
=> Consequence: standard statistical approaches can substantially underestimate the true variance of the effect estimators (overdispersion), so confidence intervals are too narrow!
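To give a feel for the size of the problem, the sketch below uses the standard design-effect formula for equal cluster sizes, DEFF = 1 + (m - 1) x ICC, which is a textbook result and not taken from the slides. The function names, the cluster size of 50 and the ICC of 0.02 are illustrative assumptions.

```python
import math

def design_effect(mean_cluster_size, icc):
    """Variance inflation factor for clusters of (roughly) equal size m."""
    return 1 + (mean_cluster_size - 1) * icc

def inflate_ci(estimate, naive_se, deff, z=1.96):
    """Widen a naive confidence interval by sqrt(DEFF)."""
    adjusted_se = naive_se * math.sqrt(deff)
    return estimate - z * adjusted_se, estimate + z * adjusted_se

# Hypothetical example: schools of 50 children and an ICC of 0.02
deff = design_effect(50, 0.02)
print(f"design effect = {deff:.2f}")   # variance is almost doubled
print(inflate_ci(estimate=0.0, naive_se=1.0, deff=deff))
```

Even a small ICC matters when clusters are large, because the ICC is multiplied by the cluster size minus one.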

Analytical Solutions for the BCG trial (4)
Statistical approaches to deal with ICC:
For binary or quantitative outcomes: direct adjustment of confidence intervals is possible by estimating the intracluster (intraclass) correlation (ICC)
For count outcomes (Poisson distributed data):
 - Explicit estimation of ICC not possible!
 - Examining the magnitude of the design effect by comparing unadjusted and adjusted CI
 - Novel univariate approaches that directly adjust the CI and P-values for the clustering
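For the binary or quantitative case, one common way to estimate the ICC is the one-way ANOVA estimator. The sketch below is an illustration under that assumption; the slides do not say which estimator was used, and the function name and toy data are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intracluster correlation.

    clusters: list of lists, one inner list of outcomes (0/1 or numeric)
    per cluster. Needs at least two clusters and some variation in the data.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]
    # Between-cluster and within-cluster mean squares
    msb = sum(m * (mu - grand_mean) ** 2 for m, mu in zip(sizes, means)) / (k - 1)
    msw = sum((y - mu) ** 2 for c, mu in zip(clusters, means) for y in c) / (n - k)
    # "Average" cluster size, adjusted for unequal cluster sizes
    m0 = (n - sum(m * m for m in sizes) / n) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)

# Toy data: children within a school are identical, schools differ
print(anova_icc([[1, 1], [0, 0]]))   # 1.0, complete within-cluster agreement
```

The estimate can be negative when children within a cluster differ more than children across clusters; in practice it is often truncated at zero.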

Analytical Solutions for the BCG trial (5)
Two basic approaches for CRT with Poisson data:
A) Analyses at the cluster level: "cluster summary statistic", meta-analysis techniques. Not recommended in our trial because of the very low cluster-specific incidence, i.e. more clusters than cases!
B) Analyses at the individual level. New approach for univariate analysis: ratio estimator approach for overdispersed Poisson data (Rao & Scott, Stat Med 1999, implemented in the software ACLUSTER): direct adjustment of confidence intervals using a robust variance estimator
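The idea behind approach B can be sketched with a generic ratio estimator: the rate in each arm is total cases over total person-years, and its variance is estimated from between-school residuals by Taylor linearization. This is an illustration in the spirit of the ratio estimator approach, not the exact Rao & Scott procedure or the ACLUSTER implementation; function names and data are hypothetical.

```python
import math

def arm_summary(clusters):
    """clusters: list of (cases, person_years) tuples, one per school.

    Returns the arm rate plus robust and naive variances of log(rate).
    Requires at least two schools and at least one case in the arm.
    """
    k = len(clusters)
    total_cases = sum(y for y, _ in clusters)
    total_time = sum(t for _, t in clusters)
    rate = total_cases / total_time
    # Between-school residuals around the pooled rate
    resid_ss = sum((y - rate * t) ** 2 for y, t in clusters)
    robust_var_log = k / (k - 1) * resid_ss / total_cases ** 2
    naive_var_log = 1 / total_cases          # Poisson, ignores clustering
    return rate, robust_var_log, naive_var_log

def rate_ratio_ci(vacc_clusters, ctrl_clusters, z=1.96):
    """Rate ratio with naive and cluster-robust 95% confidence intervals."""
    r1, rv1, nv1 = arm_summary(vacc_clusters)
    r0, rv0, nv0 = arm_summary(ctrl_clusters)
    rr = r1 / r0
    out = {"rr": rr}
    for label, var in (("robust", rv1 + rv0), ("naive", nv1 + nv0)):
        se = math.sqrt(var)
        out[label] = (rr * math.exp(-z * se), rr * math.exp(z * se))
    # Values near 1 suggest little or no design effect
    out["variance_ratio"] = (rv1 + rv0) / (nv1 + nv0)
    return out

# Hypothetical data: four schools per arm, 1000 person-years each
vacc = [(1, 1000), (0, 1000), (1, 1000), (0, 1000)]
ctrl = [(2, 1000), (0, 1000), (1, 1000), (1, 1000)]
print(rate_ratio_ci(vacc, ctrl))
```

Because the variance is built from school-level totals rather than school-level rates, the calculation still works when most schools have zero cases.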

Ratio estimator approach for overdispersed Poisson data

Analytical Solutions for the BCG trial (6)
Multivariate modeling - Poisson Regression
Basic assumption: constant rate over the follow-up time
 - Could be relaxed by inclusion of time-varying variables (e.g. age)
Extensions for clustered data:
Parametric random effects or multi-level modelling: intra-cluster correlation modeled by a cluster-specific random effect
 - Disadvantage: strong distributional assumptions!
 => Random effects models not recommended for this trial:
 - violation of distributional assumptions,
 - convergence problems,
 - large bias in variance estimation of the random effect!
Better: semi-parametric approach based on Generalized Estimating Equations (GEE): calculate an adjusted variance estimator by an iterative algorithm assuming a "working correlation structure"
 - Advantage: no distributional assumptions!
 - Disadvantage: very computer intensive for large datasets because of the computational complexity: time for the BCG data: 1 hour! (1000
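One common way to include age as a time-varying variable in a Poisson model is to split each child's follow-up into age bands, so that the rate is assumed constant only within a band. The sketch below illustrates that data step; the function name and the band edges are hypothetical and not taken from the trial.

```python
def split_by_age(age_entry, age_exit, had_event, band_edges):
    """Split one child's follow-up into age bands.

    Returns a list of (band_lower, band_upper, person_years, event) rows.
    The event, if any, is assigned to the band in which follow-up ends.
    """
    rows = []
    for lower, upper in zip(band_edges[:-1], band_edges[1:]):
        start = max(age_entry, lower)
        stop = min(age_exit, upper)
        if stop <= start:
            continue  # no follow-up time in this band
        event = int(had_event and stop == age_exit)
        rows.append((lower, upper, stop - start, event))
    return rows

# A child followed from age 8.0 to 12.5 who develops TB at the end
for row in split_by_age(8.0, 12.5, True, band_edges=[5, 10, 15]):
    print(row)
# (5, 10, 2.0, 0)
# (10, 15, 2.5, 1)
```

The expanded rows can then be fitted with a Poisson regression that uses log person-years as the offset and age band as a covariate, with the school identifier kept on every row for the cluster adjustment.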

Results of the Poisson Regression models
Naive and robust variance estimates were very similar: no "design effect" observed

Statistical software for analysing/planning CRT
STATA 7/8/9, General Purpose Statistical Package, Stata Corporation
 - GLM with GEE, random effects or robust variance estimation to adjust for clustering
STATA 9, MLwiN: multi-level models
ACLUSTER - Software for the Design and Analysis of Cluster Randomized Trials
 - Easy computation of the intraclass correlation coefficient
 - Direct adjustment approaches for univariate analysis
 - Power analysis for the three types of cluster randomized study design

Literature
Statistics in Medicine (2001); 20 (Special Issue): Design and Analysis of Cluster Randomized Trials.
Donner A, Klar N. Design and analysis of cluster randomisation trials (2000). Arnold Publications, London.

Thank you!