Biostatistics Case Studies Peter D. Christenson Biostatistician Session 5: Analysis Issues in Large Observational Studies.

Case Study

Available Data
Nurses' Health Study II: Prospective cohort of 116,671 female US nurses aged [...] years at study initiation in [...].
All info from biennial questionnaires:
Dietary Info: X X X X
Physical Activity: X X X
Other: X X X X X X

Outcome Measures
Weight Gain:
Outcome: Change 1991 to 1995 and 1995 to [...].
Main Predictor: Categories of change in soft drink use.
Development of DM:
Outcome: Onset of DM by pre-1997 definition.
Main Predictor: Categories of soft drink use at one time.

Selected Subjects
N = 116,671
Exclusions:
Soft drink data missing in 1995 and [...].
Hx DM, CVD <1995 or CA ever.
Body weight missing on any Q.
Physical activity data missing in [...].
Incomplete diet or physical activity data in [...].
Hx DM, CA, CVD at baseline.
N = 51,603 for weight gain outcome
N = 91,249 for diabetes outcome

General Issue #1: Selected Subjects
Potential biases if subjects' data are not missing at random.
Partial solutions:
Compare subjects who have missing data with other subjects on other characteristics.
Can use those w/o physical activity data in 1997 in primary weight gain analyses. [Note Table 3 only considers baseline physical activity.]
Seriousness: Unlikely to have much effect with large study and high (90%) response rates. But about 40,000 subjects were excluded from weight gain analyses yet included in DM analyses.

General Issue #2: False Positive Conclusions
Using p<0.05, each individual statistical test has a 5% chance of a false positive conclusion.
Many tests are performed. False positive conclusions become likely.
An issue in most studies, but:
1. Least problematic in randomized studies with limited specific hypotheses.
2. Needs to be addressed in all observational studies, at least by reporting the number of tests.
3. Most serious in large observational studies due to many factors examined, and many investigators, multiplying the overall number of comparisons enormously.
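To make "false positive conclusions become likely" concrete, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true; neither assumption comes from the slides, and the function name is made up for illustration.

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests tests.

    Assumes independent tests and that every null hypothesis is true
    (illustrative assumptions, not taken from the slides).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(f"{n:>3} tests at p<0.05: {prob_any_false_positive(n):.2f}")
```

Under these assumptions the chance of at least one false positive already exceeds one half at 20 tests.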

Issue #2: Continued
Question: If 1000 statistical tests are performed at p<0.05, what % of positive conclusions are false?
Solution: This is identical to the issue of predictive value of a diagnostic screening test. If a disease is rare, then the specificity must be very large, or else the few cases detected will be overwhelmed by the large number of false positives.
Example: If 10% of hypotheses are truly positive [disease rate], statistical power is 50% [sensitivity], and p<0.05 is used [specificity=95%], then 100 true hypotheses yield 50 true positives and 900 null hypotheses yield 45 false positives, so 45/95=47% of positive conclusions are false.
Sterne and Davey Smith, BMJ 2001; 322:226-31
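The arithmetic in this example can be reproduced with a short sketch. The function and argument names are illustrative only; the inputs (1000 tests, 10% true hypotheses, 50% power, p<0.05) are the ones quoted on the slide.

```python
def false_share_of_positives(n_tests, frac_true, power, alpha):
    """Return (share of positive conclusions that are false, true positives, false positives)."""
    true_effects = n_tests * frac_true      # hypotheses that are really positive
    null_effects = n_tests - true_effects   # hypotheses with no real effect
    true_pos = true_effects * power         # detected real effects (sensitivity)
    false_pos = null_effects * alpha        # nulls wrongly declared positive (1 - specificity)
    return false_pos / (true_pos + false_pos), true_pos, false_pos


if __name__ == "__main__":
    share, tp, fp = false_share_of_positives(1000, 0.10, 0.50, 0.05)
    print(f"true positives={tp:.0f}, false positives={fp:.0f}, false share={share:.0%}")
```

Changing `frac_true`, `power`, or `alpha` shows how quickly the false share moves as hypotheses become less likely to be true or studies less powerful.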

Issue #2: Continued
For other power, p-values, and data dredging rates:
Sterne and Davey Smith, BMJ 2001; 322:226-31

Issue #2: Partial Solution
Use tests less, and confidence intervals more, not as surrogates for tests, but for examining the range of values in the interval for clinical relevance.
Using p<0.001 rather than p<0.05 in studies with many comparisons would help.
The study size does not increase as much as might be expected:
a factor of 1.75 to move from p<0.05 to p<0.01
a factor of 2.82 to move from p<0.05 to p<0.001
This paper does generally follow these recommendations.
Sterne and Davey Smith, BMJ 2001; 322:226-31
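One way to see where factors of this size can come from is sketched below. It assumes the required study size scales with the square of the two-sided normal critical value, which ignores the power term (equivalent to 50% power). That assumption is not stated on the slide, so this is a plausibility check rather than the source's calculation.

```python
from statistics import NormalDist


def size_factor(alpha_new: float, alpha_ref: float = 0.05) -> float:
    """Ratio of required study sizes when tightening the significance threshold.

    Assumption (not from the slides): N is proportional to the squared
    two-sided critical value, i.e. the power term is ignored.
    """
    z = NormalDist().inv_cdf
    z_new = z(1 - alpha_new / 2)   # critical value for the stricter threshold
    z_ref = z(1 - alpha_ref / 2)   # critical value for the reference threshold
    return (z_new / z_ref) ** 2


if __name__ == "__main__":
    for alpha in (0.01, 0.001):
        print(f"p<0.05 -> p<{alpha}: factor {size_factor(alpha):.2f}")
```

Under this assumption the factors come out near 1.73 and 2.82, close to the quoted 1.75 and 2.82; the small gap in the first may reflect rounding or different assumptions in the cited paper.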

Table 1: Univariate, static associations
Higher intake of sugared soft drinks is associated with:
Less physical activity
More smoking
Higher total energy intake, sucrose, fructose, total carbs.
Lower intake of protein, alcohol, magnesium, cereal fiber.
Higher glycemic index.
Note that due to large study size, all means and percents are very precise. SDs are reported; SEs would be tiny. There are many other "significant" (p<0.05) results in this table, e.g., caffeine.

Weight Gain: Table 2: Adjusted effects of changes in soft drink use
Change from low to high use: mean weight gain of 4.69 kg
Change from high to low use: mean weight gain of 1.34 kg
Low use, no change: mean weight gain of 3.21 kg
High use, no change: mean weight gain of 3.12 kg
Adjustments use stratification for categorical variables and ANCOVA-type adjustment for continuous variables such as BMI.

Weight Gain: Analysis Issues
Physical activity in 1997 was used for 1995 and 1999, when it was not asked. Thus models 2-4 are not possible for changes as described; they actually use changes beyond 1995.
Confounding is the major issue. Due to large study size, we can adjust for many potential confounders together. Note that "adjust" means subgroup calculations for categorical variables such as quintiles of fat, so Ns can become relatively small in smaller studies. This is a big strength of large studies.
Differential effects in subgroups (interactions) should have been checked. If an interaction is large, results should not be adjusted but reported separately by subgroup. But what is "large" here, with such precise data due to large N, and thus small p-values likely? Can use magnitude of differential effect as criterion instead.

Development of Diabetes: Tables 3 and 4
RR of DM for high vs. low sugared soft drink use = 1.98 with age adjustment only; 1.83 including other adjustments.
Conclusion is similar across subgroups based on other risk factors (Table 4).
Note that:
Adjustments use "most recent" information, which could be several years distant for some subjects and current for others.
Cannot use changes in soft drink use because the outcome is time to onset of DM.

Development of Diabetes: Comparison of Different Sweet Drinks
BMI-adjusted RR (95% CI) of DM for >1/day vs. <1/mo:
Sugar-sweetened: 1.39 (1.07 to 1.76)
Diet: 1.21 (0.97 to 1.50)
Fruit juice*: 0.97 (0.64 to 1.47)
* may not be adjusted for BMI (?)

Onset of DM: Analysis Issues
Values for covariate adjustment use "most recent" information, which varies among subjects.
Measurement error in self-reported soft drink use and confounders such as weight and physical activity. Note p. 928 for low correlations of 0.36, 0.55 between questionnaire and dietary records for sugared soft drinks in other studies.
These information errors can alter RRs substantially, so that 1.39 for sugared drinks and 1.21 for diet drinks would not be meaningfully different. This is termed residual confounding and can be analyzed with sensitivity analysis. [This is an issue for weight gain analyses as well.]

Residual Confounding
Analogy: Exposed = >1/day, Unexposed = <1/day.
With 20% error in soft drink classification, OR is reduced from 3.0 to 1.9. Similar for RR.
Rothman and Greenland, Modern Epidemiology, 1998, p. 129.
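The attenuation from 3.0 to 1.9 can be illustrated with a small sketch. The 2x2 counts below are hypothetical, chosen only so that the true OR is 3.0, and "20% error" is read here as 80% sensitivity and 80% specificity applied equally to cases and controls (nondifferential misclassification). Both choices are assumptions, not values taken from the slide.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio from a 2x2 table of exposure by case/control status."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after exposure misclassification."""
    obs_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    obs_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return obs_exposed, obs_unexposed


# Hypothetical true counts (exposed, unexposed); not data from the study.
cases = (240, 200)
controls = (240, 600)

true_or = odds_ratio(*cases, *controls)
obs_cases = misclassify(*cases, sensitivity=0.8, specificity=0.8)
obs_controls = misclassify(*controls, sensitivity=0.8, specificity=0.8)
observed_or = odds_ratio(*obs_cases, *obs_controls)

if __name__ == "__main__":
    print(f"true OR = {true_or:.1f}, observed OR = {observed_or:.1f}")
```

Because the same error rates apply to cases and controls, the observed OR is pulled toward 1, which is the direction of bias the slide describes.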

Sensitivity Analysis
Gives ranges of possible RRs based on amount of measurement error.
Previous slide: sensitivity analysis for dichotomous factor, such as smoker or not.
Can use correlations of continuous factors for sensitivity analyses for, say, body weight reporting errors.
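A sensitivity analysis for a dichotomous factor can be sketched by back-correcting observed counts under a range of assumed error rates. The observed counts below are hypothetical, and equal sensitivity and specificity in cases and controls is an assumption made for illustration; it is not the analysis used in the paper.

```python
def correct_counts(obs_exposed, obs_unexposed, sensitivity, specificity):
    """Back-calculate true exposed/unexposed counts from misclassified ones."""
    total = obs_exposed + obs_unexposed
    true_exposed = (obs_exposed - (1 - specificity) * total) / (sensitivity + specificity - 1)
    return true_exposed, total - true_exposed


def corrected_or(obs_cases, obs_controls, error_rate):
    """Odds ratio after correcting for an assumed classification error rate.

    Assumes sensitivity = specificity = 1 - error_rate in both groups.
    """
    accuracy = 1 - error_rate
    a, b = correct_counts(*obs_cases, accuracy, accuracy)
    c, d = correct_counts(*obs_controls, accuracy, accuracy)
    return (a * d) / (b * c)


# Hypothetical observed counts (exposed, unexposed); not data from the study.
observed_cases = (232, 208)
observed_controls = (312, 528)

if __name__ == "__main__":
    for err in (0.0, 0.1, 0.2):
        print(f"assumed error {err:.0%}: corrected OR = "
              f"{corrected_or(observed_cases, observed_controls, err):.2f}")
```

Reporting the corrected estimate across plausible error rates gives the "range of possible RRs" that the slide refers to, rather than a single adjusted value.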

Conclusions
Well-analyzed study.
No overuse of statistical testing.
Inevitable measurement error in self-reported data:
Attempted to minimize this for soft drink use by averaging use in 1991 and 1995 for DM analysis of post-1995 data.
Could possibly use sensitivity analysis to assess effect of other factors, such as weight, measured with error.
DM conclusions are weak due to similarity of results for sugared and diet drinks.