Biostatistics Case Studies Peter D. Christenson Biostatistician Session 5: Analysis Issues in Large Observational Studies.


1 Biostatistics Case Studies Peter D. Christenson Biostatistician http://gcrc.humc.edu/Biostat Session 5: Analysis Issues in Large Observational Studies

2 Case Study

3 Available Data Nurses’ Health Study II: Prospective cohort of 116,671 female US nurses aged 24-44 years at study initiation in 1989. All info from biennial questionnaires (1989, 1991, 1993, 1995, 1997, 1999). Dietary Info: collected on 4 of the 6 questionnaires. Physical Activity: collected on 3 of the 6. Other: collected on all 6.

4 Outcome Measures Weight Gain: Outcome: Change in weight from 1991 to 1995 and from 1995 to 1999. Main Predictor: Categories of change in soft drink use. Development of DM: Outcome: Onset of DM by pre-1997 definition. Main Predictor: Categories of soft drink use at one time.

5 Selected Subjects N = 116,671. Exclusions: Soft drink data missing in 1995 and 1999. Hx DM, CVD <1995 or CA ever. Body weight missing on any Q. Physical activity data missing in 1997. Exclusions: Incomplete diet or physical activity data in 1991. Hx DM, CA, CVD at baseline. N = 51,603 for weight gain outcome. N = 91,249 for diabetes outcome.

6 General Issue #1: Selected Subjects Potential biases if subjects' data are not missing at random. Partial solutions: Compare subjects who have missing data with other subjects on other characteristics. Can use those w/o physical activity data in 1997 in primary weight gain analyses. [Note Table 3 only considers baseline physical activity.] Seriousness: Unlikely to have much effect with large study and high (90%) response rates. But about 40,000 subjects (91,249 - 51,603) were excluded from weight gain analyses but included in DM analyses.

7 General Issue #2: False Positive Conclusions Using p<0.05, each individual statistical test has a 5% chance of a false positive conclusion. Many tests are performed. False positive conclusions become likely. An issue in most studies, but: 1. least problematic in randomized studies with limited specific hypotheses. 2. needs to be addressed in all observational studies, at least by reporting number of tests. 3. most serious in large observational studies due to many factors examined, and many investigators, multiplying the overall number of comparisons enormously.
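A minimal sketch of why false positives "become likely" as the number of tests grows. It assumes the tests are independent and that every null hypothesis is true; both are simplifying assumptions not stated on the slide, and the function name is hypothetical.

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n_tests independent
    tests when every null hypothesis is true (assumed here)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Print the chance of one or more false positives for a few test counts.
for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: {prob_any_false_positive(n):.3f}")
```

Under these assumptions the chance passes one half well before 20 tests, which is why reporting the number of tests performed matters.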

8 Issue #2: Continued Question: If 1000 statistical tests are performed at p<0.05, what % of positive conclusions are false? Solution: This is identical to the issue of predictive value of a diagnostic screening test. If a disease is rare, then the specificity must be very high, or else the few cases detected will be overwhelmed by the large number of false positives. Example: If 10% of hypotheses are positive [disease rate], statistical power is 50% [sensitivity], and p<0.05 is used [specificity=95%], then of 1000 tests, 100 hypotheses are truly positive and 50 of them are detected, while 5% of the 900 null hypotheses, or 45, are false positives. So 45/95=47% of positive conclusions are false: Sterne, et al BMJ 2001; 322:226-31
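The arithmetic of this example can be checked with a short sketch. The function and parameter names are hypothetical; the inputs are the slide's own figures (1000 tests, 10% true hypotheses, 50% power, p<0.05).

```python
def false_positive_share(n_tests: int, prior_true: float,
                         power: float, alpha: float):
    """Return (true positives, false positives, share of positives that are false)."""
    n_true = n_tests * prior_true          # hypotheses that are really positive
    n_null = n_tests - n_true              # hypotheses that are really null
    true_pos = n_true * power              # detected real effects
    false_pos = n_null * alpha             # nulls declared significant
    return true_pos, false_pos, false_pos / (true_pos + false_pos)


tp, fp, share = false_positive_share(1000, prior_true=0.10, power=0.50, alpha=0.05)
print(f"true positives={tp:.0f}, false positives={fp:.0f}, false share={share:.0%}")
```

Changing `alpha` to 0.001 in the call shows how a stricter threshold shrinks the false share, which is the screening-test analogy in numbers.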

9 Issue #2: Continued For other power, p-values, and data dredging rates, see Sterne, et al BMJ 2001; 322:226-31

10 Issue #2: Partial Solution Use tests less, and confidence intervals more, not as surrogates for tests, but for examining the range of values in the interval for clinical relevance. Using p<0.001 rather than p<0.05 in studies with many comparisons would help. The study size does not increase as much as might be expected: a factor of 1.75 to move from p<0.05 to p<0.01, and a factor of 2.82 to move from p<0.05 to p<0.001. This paper does generally follow these recommendations. Sterne, et al BMJ 2001; 322:226-31
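A rough sketch of where factors of this size can come from. It assumes the usual normal-approximation rule that required sample size is proportional to (z for alpha/2 + z for power) squared, with two-sided tests and power held at 50%; the 50% power setting is an assumption chosen here because it gives values close to the slide's, not something the slide states.

```python
from statistics import NormalDist


def size_factor(alpha_new: float, alpha_old: float = 0.05,
                power: float = 0.5) -> float:
    """Ratio of sample sizes needed at alpha_new versus alpha_old,
    using n proportional to (z_{1-alpha/2} + z_{power})**2."""
    z = NormalDist().inv_cdf
    z_power = z(power)                     # 0.0 when power is 50%
    old = (z(1 - alpha_old / 2) + z_power) ** 2
    new = (z(1 - alpha_new / 2) + z_power) ** 2
    return new / old


print(f"p<0.05 -> p<0.01 : x{size_factor(0.01):.2f}")
print(f"p<0.05 -> p<0.001: x{size_factor(0.001):.2f}")
```

With these assumptions the factors come out near 1.73 and 2.82. At higher power (for example `power=0.8`) the same formula gives smaller factors.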

11 Table 1: Univariate, static associations Higher intake of sugared soft drinks is associated with: Less physical activity More smoking Higher total energy intake, sucrose, fructose, total carbs. Lower intake of protein, alcohol, magnesium, cereal fiber. Higher glycemic index. Note that due to large study size, all means and percents are very precise. SDs are reported; SEs would be tiny. There are many other “significant” (p<0.05) results in this table, e.g., caffeine.
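To illustrate why SEs would be tiny here, a small sketch of SE = SD / sqrt(n). The SD of 5.0 is a made-up value for illustration only; 51,603 is the weight gain sample size from the earlier slide, and 500 stands in for a hypothetical smaller study.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean from its standard deviation and sample size."""
    return sd / math.sqrt(n)


hypothetical_sd = 5.0                      # illustrative value, not from Table 1
se_large = standard_error(hypothetical_sd, 51_603)
se_small = standard_error(hypothetical_sd, 500)
print(f"n=51,603: SE={se_large:.3f};  n=500: SE={se_small:.3f}")
```

The same SD gives an SE roughly ten times smaller in the large cohort, so almost any difference in means reaches p<0.05.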

12 Weight Gain: Table 2: Adjusted effects of changes in soft drink use Change from low to high use: mean weight gain of 4.69 kg. Change from high to low use: mean weight gain of 1.34 kg. Low use, no change: mean weight gain of 3.21 kg. High use, no change: mean weight gain of 3.12 kg. Adjustments use stratification for categorical variables and ANCOVA-type adjustment for continuous variables such as BMI.

13 Weight Gain: Analysis Issues Physical activity in 1997 was used for both 1995 and 1999, when it was not asked. Thus models 2-4 are not strictly possible for 1995-1999 changes, and for the 1991-1995 changes they actually use changes beyond 1995. Confounding is the major issue. Due to large study size, we can adjust for many potential confounders together. Note that “adjust” means subgroup calculations for categorical variables such as quintiles of fat, so Ns can become relatively small in smaller studies. This is a big strength of large studies. Differential effects in subgroups (interactions) should have been checked. If large, should not adjust, but report separately. But what is “large” here, with such precise data due to large N, and thus small p-values likely? Can use magnitude of differential effect as criterion instead.

14 Development of Diabetes: Tables 3 and 4 RR of DM for high vs. low sugared soft drink use = 1.98 with age-adjustment only; 1.83 including other adjustments. Conclusion is similar across subgroups based on other risk factors (Table 4). Note that: Adjustments use “most recent” information, which could be several years distant for some subjects and current for others. Cannot use changes in soft drink use because the outcome is time to onset of DM.

15 Development of Diabetes: Comparison of Different Sweet Drinks BMI-adjusted RR (95% CI) of DM for >1/day vs. <1/mo: Sugar-sweetened: 1.39 (1.07 – 1.76). Diet: 1.21 (0.97 – 1.50). Fruit juice*: 0.97 (0.64 – 1.47). * may not be adjusted for BMI (?)
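One rough way to gauge how similar the sugar-sweetened and diet RRs are is to compare them on the log scale, recovering each standard error from its 95% CI. This sketch treats the two estimates as independent, which is an assumption made only for illustration (both come from the same cohort), so the result is indicative at best and is not an analysis from the paper.

```python
import math
from statistics import NormalDist


def log_rr_se(lower: float, upper: float) -> float:
    """Standard error of log(RR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)


# RRs and 95% CIs as listed on the slide.
rr_sugar, se_sugar = 1.39, log_rr_se(1.07, 1.76)
rr_diet, se_diet = 1.21, log_rr_se(0.97, 1.50)

# z statistic for the difference in log RRs, assuming independence.
diff = math.log(rr_sugar) - math.log(rr_diet)
z_stat = diff / math.sqrt(se_sugar ** 2 + se_diet ** 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(f"ratio of RRs={math.exp(diff):.2f}, z={z_stat:.2f}, p={p_value:.2f}")
```

Under that independence assumption the z statistic falls well short of 1.96, consistent with the two RRs being hard to tell apart.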

16 Onset of DM: Analysis Issues Values for covariate adjustment use “most recent” information, which varies among subjects. Measurement error in self-reported soft drink use and in confounders such as weight and physical activity. Note p. 928 for low correlations of 0.36, 0.55 between questionnaire and dietary records for sugared soft drinks in other studies. These information errors can alter RRs substantially, so that 1.39 for sugared drinks and 1.21 for diet drinks would not be meaningfully different. This is termed residual confounding and can be analyzed with sensitivity analysis. [This is an issue for weight gain analyses as well.]

17 Residual Confounding Analogy: Exposed = >1/day; Unexposed = <1/day. With 20% error in soft drink classification, OR is reduced from 3.0 to 1.9. Similar for RR. Rothman and Greenland, Modern Epidemiology, 1998, p 129.
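A minimal sketch of how a 20% classification error can pull an OR of 3.0 down toward 1.9. The 2x2 counts are hypothetical, chosen here to give a true OR of 3.0; they are not the numbers from Rothman and Greenland. The error is assumed nondifferential, meaning the same 20% of cases and controls are misclassified in each direction.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio from a 2x2 exposure-by-outcome table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


def misclassify(exposed, unexposed, error_rate):
    """Move error_rate of each exposure group into the other group."""
    obs_exposed = exposed * (1 - error_rate) + unexposed * error_rate
    obs_unexposed = unexposed * (1 - error_rate) + exposed * error_rate
    return obs_exposed, obs_unexposed


# Hypothetical true counts: cases 200/100, controls 120/180 (exposed/unexposed).
true_or = odds_ratio(200, 100, 120, 180)
case_e, case_u = misclassify(200, 100, 0.20)
ctrl_e, ctrl_u = misclassify(120, 180, 0.20)
observed_or = odds_ratio(case_e, case_u, ctrl_e, ctrl_u)
print(f"true OR={true_or:.2f}, observed OR={observed_or:.2f}")
```

With these counts the observed OR is about 1.91. Other starting counts with the same true OR would attenuate by a different amount.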

18 Sensitivity Analysis Gives ranges of possible RRs based on the amount of measurement error. Previous slide: sensitivity analysis for a dichotomous factor, such as smoker or not. Can use correlations of continuous factors for sensitivity analyses for, say, body weight reporting errors.
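A sketch of a simple sensitivity analysis for a dichotomous factor: starting from observed counts, back-correct for a range of assumed classification accuracies and report the range of corrected ORs. The observed counts are hypothetical, and setting sensitivity equal to specificity is a simplifying assumption made here.

```python
def correct_exposed(obs_exposed: float, total: float,
                    sens: float, spec: float) -> float:
    """Back-calculate the true exposed count from an observed count,
    given assumed sensitivity and specificity of classification."""
    return (obs_exposed - (1 - spec) * total) / (sens + spec - 1)


def corrected_or(accuracy: float) -> float:
    """Corrected OR for hypothetical observed counts, sens = spec = accuracy."""
    # Hypothetical observed counts: cases 180/120, controls 132/168.
    case_e = correct_exposed(180, 300, accuracy, accuracy)
    ctrl_e = correct_exposed(132, 300, accuracy, accuracy)
    return (case_e * (300 - ctrl_e)) / ((300 - case_e) * ctrl_e)


# Corrected OR under each assumed accuracy of exposure classification.
results = {acc: corrected_or(acc) for acc in (1.0, 0.9, 0.8)}
for acc, value in results.items():
    print(f"assumed accuracy {acc:.0%}: corrected OR = {value:.2f}")
```

Reporting the corrected estimate across plausible accuracies, rather than a single number, is the point of the exercise.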

19 Conclusions Well analyzed study. No over-use of statistical testing. Inevitable measurement error in self-reported data: Attempted to minimize this for soft drink use by averaging use in 1991 and 1995 for DM analysis of post-1995 data. Could possibly use sensitivity analysis to assess effect of other factors, such as weight, measured with error. DM conclusions weak due to similarity of sugared and diet drinks.

