Basic epidemiologic analysis with Stata Biostatistics 212 Lecture 5.


Housekeeping
Questions about Lab 4?
  –Extra credit puzzle
Lab 3 issues
  –Make sure your do file executes
  –Make sure your do file opens the dataset
Final Project – by the last session you should:
  –Have dataset imported into Stata
  –Clean up the variables you will use
  –Sketch out (paper and pencil) a table and a figure
  –Be ready to write analysis do files

Today...
What’s the difference between epidemiologic and statistical analysis?
Interaction and confounding with 2 x 2’s
Stata’s “Epitab” commands
Adjusting for many things at once
Logistic regression
Testing for trends

Epi vs. Biostats
Statistical analysis – Evaluating the role of chance
Epidemiologic analysis – Analyzing and interpreting clinical research data in the context of scientific knowledge
  –Directionality of causes
  –Mediation vs. confounding
  –Prediction vs. causal inference
  –Clinical importance of effect size
  –“Cost” of a type I and type II error

Epi vs. Biostats
Epi
  –Confounding, interaction, and causal diagrams
  –What to adjust for?
  –What do the adjusted estimates mean?
[Figure: causal diagrams relating variables A, B, and C]

2 x 2 Tables
“Contingency tables” are the traditional analytic tool of the epidemiologist

                 Outcome +   Outcome –
  Exposure +         a           b
  Exposure –         c           d

OR = (a/b) / (c/d) = ad/bc
RR = [a/(a+b)] / [c/(c+d)]
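As a quick check of the two formulas, here is a minimal sketch in Python (chosen so it runs without Stata or the course dataset). The cell counts are hypothetical and are laid out as on the slide: rows are exposure, columns are outcome.

```python
# Hypothetical cell counts, laid out as on the slide:
# rows = exposure (yes, no), columns = outcome (yes, no).
a, b = 30, 70   # exposed:   outcome yes, outcome no
c, d = 15, 85   # unexposed: outcome yes, outcome no

# Odds ratio: odds of the outcome in the exposed over odds in the unexposed.
odds_ratio = (a / b) / (c / d)          # algebraically equal to (a * d) / (b * c)

# Risk ratio: risk of the outcome in the exposed over risk in the unexposed.
risk_ratio = (a / (a + b)) / (c / (c + d))

print(f"OR = {odds_ratio:.2f}")
print(f"RR = {risk_ratio:.2f}")
```

With a fairly common outcome, as in these made-up counts, the OR sits further from 1 than the RR, which matches the pattern in the coronary calcium example (OR = 2.1, RR = 1.9).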

2 x 2 Tables
Example
[Table: 2 x 2 table of coronary calcium by binge drinking]
OR = 2.1 (1.6 – 2.7)
RR = 1.9 (1.6 – 2.4)
Can we say that binge drinking CAUSES atherosclerosis?

2 x 2 Tables
There is a statistically significant association, but is it causal?
Does male gender confound the association?
[Figure: diagram with nodes Binge drinking, Coronary calcium, and Male]

2 x 2 Tables
Men more likely to binge
  –34% of men, 14% of women
Men have more coronary calcium
  –15% of men, 7% of women

2 x 2 Tables
But what does confounding look like in a 2x2 table?
And how do you adjust for it?
  –Stratify
  –Examine stratum-specific estimates (for interaction)
  –Combine estimates if appropriate (if no interaction)
     Weighted average of stratum-specific estimates

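2 x 2 Tables
First, stratify…
[Figure: 2 x 2 tables of CAC by binge drinking, overall and separately in men and in women. Annotations: binge drinking in 34% of men and 14% of women; CAC in 15% of men and 7% of women. Crude RR = 1.94; stratum-specific RR = 1.57 and RR = 1.50. Confidence intervals not preserved.]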

2 x 2 Tables
…compare stratum-specific estimates… (they’re about the same)
[Figure: 2 x 2 tables of CAC by binge drinking in men and in women. Stratum-specific RR = 1.57 and RR = 1.50. Confidence intervals not preserved.]

2 x 2 Tables
…and then “combine” the estimates
[Figure: 2 x 2 tables of CAC by binge drinking in men and in women. Stratum-specific RR = 1.50 and RR = 1.57; RRadj = 1.51. Confidence intervals not preserved.]

[Figure: summary of the stratified analysis of CAC by binge drinking, with sex as the stratifying variable. Crude RR = 1.94; stratum-specific RR = 1.57 and RR = 1.50; RRadj = 1.51. Confidence intervals not preserved.]
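The "combine" step can be sketched as a Mantel-Haenszel weighted average of the stratum-specific risk ratios. The Python below is an illustration only: the counts are hypothetical (not the CARDIA-style data behind the slides), and the function names are made up for this example.

```python
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total).
# All counts are hypothetical.
strata = {
    "men":   (80, 340, 100, 660),
    "women": (15, 140,  60, 860),
}

def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata.

    Equivalent to a weighted average of stratum RRs with weights
    c * n1 / N, where N is the stratum total.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return num / den

def crude_rr(strata):
    """Risk ratio after collapsing all strata into one 2 x 2 table."""
    a, n1, c, n0 = (sum(col) for col in zip(*strata.values()))
    return risk_ratio(a, n1, c, n0)

for name, cells in strata.items():
    print(f"{name}: RR = {risk_ratio(*cells):.2f}")
print(f"crude: RR = {crude_rr(strata):.2f}")
print(f"M-H adjusted: RR = {mantel_haenszel_rr(strata):.2f}")
```

In these made-up data the exposure and the outcome are both more common in men, so the crude RR is larger than either stratum-specific RR, and the pooled estimate falls between the two strata. That is the same qualitative picture as the slides' crude and adjusted estimates.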

2 x 2 Tables
How do we do this with Stata?
  –Tabulate – output not exactly what we want
  –The “epitab” commands
     Stata’s answer to stratified analyses
     cs, cc
     csi, cci
     tabodds, mhodds

2 x 2 Tables
Example – demo using Stata
cs cac binge
cs cac binge, by(male)
cs cac modalc
cs cac modalc, by(racegender)
cc cac binge

2 x 2 Tables
Example of a crude association (unadjusted)
. cs cac binge
[Stata output: cases and noncases by binge pattern (>5 drinks on occasion), exposed vs. unexposed; 292 cases, 3042 subjects in total. Rows for risk, risk difference, risk ratio, attributable fraction among the exposed, and population attributable fraction, followed by chi2(1) and its p-value. Numeric values not preserved.]

2 x 2 Tables
Example of Confounding
. cs cac binge, by(male)
[Stata output: one row per level of male with RR, 95% confidence interval, and M-H weight; then the crude and M-H combined RRs and the M-H test of homogeneity, chi2(1). Numeric values not preserved.]

2 x 2 Tables
Example of Effect Modification
. cs cac modalc, by(racegender)
[Stata output: rows for Black women, White women, Black men, and White men with RR, 95% confidence interval, and M-H weight; then the crude and M-H combined RRs and the M-H test of homogeneity, chi2(3). Numeric values not preserved.]

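2 x 2 Tables
Immediate commands
  –csi, cci
  –No dataset required – just 2x2 cell frequencies
csi a b c d
csi followed by the four cell counts (for cac binge)
Note: Stata reads the four counts as exposed cases, unexposed cases, exposed noncases, unexposed noncases, which is not the row-by-exposure order of the a, b, c, d table shown earlier.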

Multivariable adjustment
Binge drinking appears to be associated with coronary calcium
  –Association partially due to confounding by gender
What about race? Age? SES? Smoking?

Multivariable adjustment
manual stratification

  Analysis                                # 2x2 tables
  Crude association                       1
  Adjust for gender                       2
  Adjust for gender, race                 4
  Adjust for gender, race, age            68
  Adjust for “” + income, education       816
  Adjust for “” + “” + smoking            2448
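The table counts grow multiplicatively: each added covariate multiplies the number of strata by its number of categories. The short Python sketch below reproduces the slide's sequence. The category counts are an assumption inferred from the ratios of successive counts (68/4 = 17, 816/68 = 12, 2448/816 = 3); the slide does not state them.

```python
from itertools import accumulate
from operator import mul

# (covariate, number of categories); values inferred from the slide's
# counts, not stated in the lecture.
levels = [
    ("gender", 2),
    ("race", 2),
    ("age", 17),
    ("income, education", 12),   # combined categories of both variables
    ("smoking", 3),
]

# Running product: number of 2 x 2 tables after adding each covariate.
n_tables = list(accumulate((n for _, n in levels), mul))

for (name, _), count in zip(levels, n_tables):
    print(f"after adding {name}: {count} tables")
```

With roughly 3000 subjects spread over 2448 strata, most tables would hold one or two people, which is why fully stratified adjustment stops being practical.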

Multivariable adjustment
cs command
  –Does manual stratification for you
     Lists results from every stratum
     Tests for overall homogeneity
     Adjusted and crude results
  –Demo
     cs cac binge, by(male black age)
  –Can’t interpret interactions!

Multivariable adjustment
mhodds command
mhodds allows you to look at specific interactions, adjusted for multiple covariates
  –Does same stratification for you
  –Adjusted results for each interaction variable
  –P-value for specific interaction (homogeneity)
  –Summary adjusted result
Demo
mhodds cac binge age, by(racegender)
But strata get thin!

Multivariable adjustment
logistic command
Assumes logit model
  –Await biostats class for details!
  –Coefficients estimated, no actual stratification
  –Continuous variables used as they are
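To make "coefficients estimated, no actual stratification" concrete, here is a small Python sketch of what a logit model implies. The intercept and coefficients are hypothetical values picked for illustration, not estimates from the coronary calcium data, and the function names are made up for this example.

```python
import math

# Hypothetical log-odds coefficients; not estimates from the course dataset.
INTERCEPT = -3.0
COEF = {"binge": 0.4, "male": 0.8, "age": 0.05}

def predicted_probability(covariates):
    """Logit model: log-odds are a linear function of the covariates."""
    log_odds = INTERCEPT + sum(COEF[name] * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(p):
    return p / (1.0 - p)

def binge_odds_ratio(male, age):
    """Odds ratio for binge = 1 vs. 0 at fixed values of the other covariates."""
    p1 = predicted_probability({"binge": 1, "male": male, "age": age})
    p0 = predicted_probability({"binge": 0, "male": male, "age": age})
    return odds(p1) / odds(p0)

# The adjusted OR is the same at every covariate pattern: exp(coefficient).
for male, age in [(0, 35), (1, 35), (1, 45)]:
    print(male, age, round(binge_odds_ratio(male, age), 3))
print("exp(coefficient) =", round(math.exp(COEF["binge"]), 3))
```

The model returns one adjusted odds ratio per predictor, and age enters as a number rather than as a set of strata. The price is the assumption that the odds ratio is constant across covariate patterns unless an interaction term is added.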

Multivariable adjustment
logistic command
Basic syntax:
logistic outcomevar [predictorvar1 predictorvar2 predictorvar3…]

Multivariable adjustment
logistic command
If using any categorical predictors:
logistic outcomevar [i.catvar var2…]
  –Creates “dummy variables” on the fly
  –If you forget, Stata won’t know they are categorical, and you’ll get the wrong answer!
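What the i. prefix does can be sketched in a few lines of Python: a categorical variable with k levels becomes k - 1 indicator (0/1) variables, with one level left out as the reference. The helper name and the example values below are hypothetical; the 0/1/2 coding mirrors the smoke variable used in the demos.

```python
def dummy_code(values, prefix, omit=None):
    """Return indicator columns for every level except the reference level.

    By default the lowest level is the reference.
    """
    levels = sorted(set(values))
    base = levels[0] if omit is None else omit
    return {
        f"{prefix}_{level}": [int(v == level) for v in values]
        for level in levels
        if level != base
    }

# Hypothetical smoking codes for five subjects (0, 1, 2 are category labels).
smoke = [0, 1, 2, 1, 0]
dummies = dummy_code(smoke, "smoke")
for name, column in dummies.items():
    print(name, column)
```

Without the indicators, a model would treat the codes 0, 1, 2 as a quantity and fit a single odds ratio per one-unit step, which assumes the step from 0 to 1 has the same effect as the step from 1 to 2.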

Multivariable adjustment
logistic command
Demo
logistic cac binge
logistic cac binge male
logistic cac binge male black
logistic cac binge male black age
logistic cac binge male black age i.smoke
logistic cac binge##i.racegender age i.smoke
logistic cac modalc##racegender

Multivariable adjustment
logistic command
Demo
. xi: logistic cac binge male black age i.smoke
i.smoke           _Ismoke_0-2         (naturally coded; _Ismoke_0 omitted)
[Stata output: logistic regression, Number of obs = 3036, LR chi2(6). Table of odds ratios with Std. Err., z, P>|z|, and 95% confidence interval for binge, male, black, age, _Ismoke_1, and _Ismoke_2. Numeric values not preserved.]

logistic command
interaction demo
. logistic cac modalc##racegender age i.smoke
[Stata output: logistic regression, Number of obs = 2795, LR chi2(10). Table of odds ratios with Std. Err., z, P>|z|, and 95% confidence interval for modalc, the racegender levels, the modalc#racegender interaction terms, age, and the smoke levels. Numeric values not preserved.]

Multivariable adjustment
logistic command
Pros
  –Provides all ORs in the model
  –Accepted approach (mhodds rarely used by statisticians)
  –Can deal with continuous variables (like age)
  –Better estimation for large models?
Cons
  –Interaction testing more cumbersome, less automatic
  –More assumptions
  –Harder to test for trends

Multivariable adjustment
The format for linear regression and other types of regression is the same as for logistic regression, except for the initial command:
regress outcomevar [predictorvar1 predictorvar2 predictorvar3…]
ologit outcomevar [predictorvar1 predictorvar2 predictorvar3…]
etc.

Testing for trend
Test of trend with tabodds
. tabodds cac alccat
[Stata output: one row per alccat category (the lowest labeled <1) with cases, controls, odds, and 95% confidence interval; then the test of homogeneity (equal odds), chi2(3), and the score test for trend of odds, chi2(1), each with its p-value. Numeric values not preserved.]

Testing for trends
tabodds command
Adjustment for multiple variables possible
  –tabodds cac alccat, adjust(age male black)

Approaching your analysis
Number of potential models/analyses is daunting
  –Where do you start? How do you finish?
My suggestion
  –Explore
  –Plan definitive analysis, make dummy tables/figures
  –Do analysis (do/log files), fill in tables/figures
  –Show to collaborators, revise as needed
  –Write paper

Summary
Make sure you understand confounding and interaction with 2x2 tables in Stata
Epitab commands are a great way to explore your data
  –Emphasis on interaction
Logistic regression is a more general approach, ubiquitous, but testing for interactions and trends is more difficult

In lab today…
Lab 5
  –Epi analysis of coronary calcium dataset
  –Walks you through evaluation of confounding and interaction
Judgment calls – often no right answer, just focus on reasoning.
Reminder – put your answers as comments in the do file
* 15c – 15%, p<.001