Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II

Lecture 10: cohort analysis (part 1): Intro to regression and dummy variables Jeffrey E. Korte, PhD BMTRY 747: Foundations of Epidemiology II Department of Public Health Sciences Medical University of South Carolina Spring 2015

Measures of association: linked to study design
e.g. if you have a dichotomous outcome:
- Linear regression: not okay
  - Assumes continuous normal outcome
- Logistic regression: okay
  - Assumes follow-up is the same for all subjects
  - Predictor variables can be continuous, dichotomous, ordinal, nominal
  - Produces odds ratio for each variable
  - Regression coefficient is natural log of adjusted OR

Measures of association: linked to study design
If you have a dichotomous outcome (cont):
- Proportional hazards regression: usually better
  - Also called “Cox models”, “survival analysis”
  - Takes into account different follow-up possible for each subject; different risk sets for each case
  - Predictor variables can be continuous, dichotomous, ordinal, nominal
  - Produces hazard ratio for each variable
  - Assumes that the hazard ratio is constant over time

Measures of association: linked to study design
If you have a dichotomous outcome (cont):
- Log binomial or negative binomial
  - Can directly estimate relative risk
  - Preferable to logistic regression when outcome is common
  - Good for causal models in cohort studies where you are not modeling time-to-event
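The point about common outcomes can be illustrated with a small sketch. The 2x2 counts below are hypothetical, chosen only to show how the odds ratio drifts away from the risk ratio as the outcome becomes common.

```python
def risk_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio from the same 2x2 cells."""
    return (a * d) / (b * c)


# Hypothetical cohorts (not from the lecture): same risk ratio, different outcome frequency
rare = (10, 990, 5, 995)        # risks of 1% vs 0.5%
common = (400, 600, 200, 800)   # risks of 40% vs 20%

for label, cells in (("rare", rare), ("common", common)):
    print(label, round(risk_ratio(*cells), 2), round(odds_ratio(*cells), 2))
```

In both cohorts the risk ratio is 2.0, but the odds ratio stays close to 2 only when the outcome is rare.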

Measures of association
- Odds ratio
  - Can be obtained from cross-sectional, case-control, cohort study, randomized controlled trial
- Rate ratio, risk ratio, hazard ratio
  - Poisson regression, log binomial, survival analysis
  - Can be obtained from cohort study, randomized controlled trial
- Cross-sectional study can give you risk ratio, prevalence ratio, odds ratio

Measures of association
- Analysis strategy should progress in stages
  - Univariate analysis
  - Bivariate analysis
  - Modeling
- Analysis methods are linked to:
  - Study design
  - Question of interest
  - Structure of data and variable coding
  - Appropriate measure of association

Linear regression
- Assume continuous normal outcome (e.g. blood pressure)
- Model the actual outcome (no link function)
- Coefficient is the weight for each variable
- Model predicts value of Y among unexposed (i.e. all X=0), and change in Y associated with a one-unit increase for each variable, adjusted for other variables
- Can obtain predicted value for an individual

Logistic regression
- Assume dichotomous outcome (e.g. preterm birth)
- Model the “logit” link function (log odds)
- Coefficient is the weight for each variable
- Intercept is log odds of outcome among unexposed
- Other betas are the increased “log odds” associated with a one-unit increase for each variable, adjusted for the other variables
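As a minimal sketch of how the intercept and betas combine on the log odds scale, the function below converts a linear predictor back to a probability with the inverse logit. The function name and the coefficient values are hypothetical, not taken from the lecture.

```python
import math


def predicted_probability(intercept, betas, xs):
    """Inverse logit of intercept + sum(beta_i * x_i)."""
    log_odds = intercept + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model: intercept -2.0, one dichotomous exposure with beta 0.7
print(predicted_probability(-2.0, [0.7], [0]))  # unexposed: intercept alone sets the log odds
print(predicted_probability(-2.0, [0.7], [1]))  # exposed: log odds raised by beta
```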

Logistic regression
- This works out so that each beta (except for the intercept) is the adjusted log odds ratio for a one-unit increase in the exposure variable of interest
- To obtain adjusted odds ratio, exponentiate the beta
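Exponentiating a beta can be sketched as follows. The confidence interval uses the usual Wald approximation (beta plus or minus 1.96 standard errors), which is an assumption added here and is not stated on the slide; the beta and standard error are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) by exponentiating beta and its Wald limits."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical fitted coefficient and standard error
or_, lower, upper = odds_ratio_with_ci(beta=0.69, se=0.20)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```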

Why that works Odds:
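The formula on this slide was an image and did not survive in the transcript. A standard form, offered as a reconstruction rather than the author's exact notation and assuming a model with two predictors X1 and X2, is:

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2
\quad\Longrightarrow\quad
\text{odds} = \frac{p}{1-p} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}
```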

Why that works Odds ratio: estimate odds of disease in exposed (X1=1) versus unexposed (X1=0)
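The slide's equation is missing from the transcript. Under the assumption of a two-predictor model, and writing X1e and X1u for the exposed and unexposed values of X1, the ratio would presumably read:

```latex
OR = \frac{\text{odds}(X_1 = 1)}{\text{odds}(X_1 = 0)}
   = \frac{e^{\beta_0 + \beta_1 X_{1e} + \beta_2 X_2}}
          {e^{\beta_0 + \beta_1 X_{1u} + \beta_2 X_2}}
```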

Why that works
Exponential mathematics: $e^{x+y} = e^{x} \cdot e^{y}$

Why that works Adjusted OR for X1 (hold X2 steady)
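The equation for this step is missing from the transcript. A reconstruction under the same two-predictor assumption, splitting each exponent into a product and cancelling the terms shared by numerator and denominator (X2 is held at the same value in both), is:

```latex
OR_{X_1}
  = \frac{e^{\beta_0}\, e^{\beta_1 X_{1e}}\, e^{\beta_2 X_2}}
         {e^{\beta_0}\, e^{\beta_1 X_{1u}}\, e^{\beta_2 X_2}}
  = \frac{e^{\beta_1 X_{1e}}}{e^{\beta_1 X_{1u}}}
```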

Why that works
If X1 is coded (1=yes, 0=no), then:
- Exposed: $\beta_1 X_{1e} = \beta_1$
- Unexposed: $\beta_1 X_{1u} = 0$
Note that $e^{0} = 1$

Why that works So therefore, in logistic regression:
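The concluding equation is missing from the transcript. For an exposure coded 1=yes, 0=no, the standard result, given here as a reconstruction, is:

```latex
OR_{X_1} = \frac{e^{\beta_1 \cdot 1}}{e^{\beta_1 \cdot 0}}
         = \frac{e^{\beta_1}}{e^{0}}
         = e^{\beta_1}
```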

Why that works
Note: exponentiated beta is simply the adjusted OR for a one-unit increase in the variable, no matter how it is coded
- Dichotomous 1=yes, 0=no
- Dichotomous 0=yes, 1=no
- Dichotomous 85=yes, 0=no
- Continuous ranging from 85 to 450
- Ordinal (1, 2, 3, 4, 5)
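A small sketch of what the coding choices above imply for a two-level variable. The function name and the yes-versus-no odds ratio of 2.0 are hypothetical; the only assumption is that the yes-versus-no log odds ratio is spread evenly over the distance between the two codes.

```python
import math


def per_unit_odds_ratio(code_yes, code_no, or_yes_vs_no):
    """Per-unit OR a logistic model would report for a two-level variable.

    beta * (code_yes - code_no) must equal ln(OR yes vs no), so
    beta = ln(OR) / (code_yes - code_no).
    """
    beta = math.log(or_yes_vs_no) / (code_yes - code_no)
    return math.exp(beta)


# Hypothetical exposure whose yes-versus-no odds ratio is 2.0
for code_yes, code_no in ((1, 0), (0, 1), (85, 0)):
    print(code_yes, code_no, round(per_unit_odds_ratio(code_yes, code_no, 2.0), 4))
```

With 0=yes, 1=no the reported OR is the reciprocal, and with 85=yes, 0=no the reported per-unit OR must be raised to the 85th power to recover the yes-versus-no comparison.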

Why that works One-unit increase for continuous variable:
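The equation for the continuous case is missing from the transcript. A reconstruction, comparing any value x of X1 with x+1 while the other predictors are held fixed (so their terms cancel), is:

```latex
OR = \frac{e^{\beta_1 (x+1)}}{e^{\beta_1 x}}
   = e^{\beta_1 (x+1) - \beta_1 x}
   = e^{\beta_1}
```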

Implication
- Odds ratio is assumed to be the same for a one-unit increase anywhere along the scale of the exposure variable
  - e.g. 1 versus 0, 2 versus 1, 3 versus 2, etc.
  - i.e. the relationship is linear in the logit
- This is why it is called a “multiplicative model”
  - Each unit increase in exposure is assumed to multiply risk, rather than add to the risk, by a defined amount

Linearity in the logit
- Dichotomous variable: okay
  - Two points always form a perfect line
- Continuous or ordinal variables: maybe
  - Need to test the assumption by constructing “dummy variables”
  - Categorize the variable with regular intervals
  - Define reference category
    - All other categories will be compared to it
  - Assess whether dose-response curve is smooth

Dummy variables: example
- Outcome variable: Type II diabetes
- Continuous exposure variable: age
- Fit as continuous variable: logistic regression model will assume the OR is the same for each one-unit increase in age
  - e.g. OR=1.14 (95% CI: 1.08 – 1.19)
- Does this fit the data well?
  - Find out by testing dummy variables for age groups
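To make the "same OR for each one-unit increase" assumption concrete, the sketch below compounds the example OR of 1.14 per year over several years. The helper name is hypothetical; the calculation simply raises the per-unit OR to the number of units, which is what a model that is linear in the logit implies.

```python
def odds_ratio_for_increase(or_per_unit, units):
    """OR implied for an increase of `units` when the model is linear in the logit."""
    return or_per_unit ** units


# Using the per-year OR of 1.14 from the example above
for years in (1, 5, 10):
    print(years, round(odds_ratio_for_increase(1.14, years), 2))
```

If the compounded values look implausible for the data at hand, that is one hint that the linearity assumption deserves a closer look.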

Dummy variables: example
- Age range: 18-85
- Choose categories with regular intervals to test linearity in logit
- Reference category should be chosen as described earlier
  - Robust sample size
  - May be convenient to have it be the category with expected lowest (or highest) risk

Dummy variables: example
Make 7 categories:
- 18-29 (reference)
- 30-39
- 40-49
- 50-59
- 60-69
- 70-79
- 80-85

Dummy variables: example
6 dummy variables for 7 categories: each dichotomous (1=yes, 0=no)
- 18-29 (reference)
- 30-39: dummy variable 1 (AGE2)
- 40-49: dummy variable 2 (AGE3)
- 50-59: dummy variable 3 (AGE4)
- 60-69: dummy variable 4 (AGE5)
- 70-79: dummy variable 5 (AGE6)
- 80-85: dummy variable 6 (AGE7)

Dummy variables: example

Age group   AGE2  AGE3  AGE4  AGE5  AGE6  AGE7
18-29       0     0     0     0     0     0
30-39       1     0     0     0     0     0
40-49       0     1     0     0     0     0
50-59       0     0     1     0     0     0
60-69       0     0     0     1     0     0
70-79       0     0     0     0     1     0
80-85       0     0     0     0     0     1
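The coding scheme can be sketched in plain Python. The function and constant names are hypothetical, and the sketch assumes ages are recorded as whole years within the 18-85 range; the AGE2 to AGE7 names follow the transcript.

```python
# Age bands from the example; the first band is the reference category
AGE_BANDS = [(18, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 85)]


def age_dummies(age):
    """Return {'AGE2': 0 or 1, ..., 'AGE7': 0 or 1} for an integer age.

    Subjects in the 18-29 reference band get 0 on every dummy variable.
    """
    if not AGE_BANDS[0][0] <= age <= AGE_BANDS[-1][1]:
        raise ValueError(f"age {age} is outside the 18-85 study range")
    dummies = {}
    for index, (low, high) in enumerate(AGE_BANDS[1:], start=2):
        dummies[f"AGE{index}"] = 1 if low <= age <= high else 0
    return dummies


print(age_dummies(25))  # reference band: all zeros
print(age_dummies(45))  # only AGE3 is set
```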

Dummy variables: example
- This coding strategy results in independent comparisons of risk between the reference group and each other category
- Each dummy variable corresponds to the comparison between that category (dummy variable = 1) and the reference category

Dummy variables: example
Possible results (logistic regression):
- 18-29 (reference)
- 30-39: AGE2: OR=1.5 (0.7 – 2.2)
- 40-49: AGE3: OR=2.0 (0.9 – 2.9)
- 50-59: AGE4: OR=2.5 (1.3 – 3.1)
- 60-69: AGE5: OR=3.0 (1.8 – 4.1)
- 70-79: AGE6: OR=3.5 (2.0 – 4.9)
- 80-85: AGE7: OR=4.0 (2.4 – 6.8)

Dummy variables: example
Possible results (logistic regression):
- 18-29 (reference)
- 30-39: AGE2: OR=1.5 (0.7 – 2.2)
- 40-49: AGE3: OR=2.0 (0.9 – 2.9)
- 50-59: AGE4: OR=2.5 (1.3 – 3.1)
- 60-69: AGE5: OR=2.6 (1.2 – 3.7)
- 70-79: AGE6: OR=2.4 (1.1 – 3.3)
- 80-85: AGE7: OR=2.5 (1.1 – 3.4)

Dummy variables: example
Possible results (logistic regression):
- 18-29 (reference)
- 30-39: AGE2: OR=0.92 (0.64 – 1.4)
- 40-49: AGE3: OR=1.2 (0.76 – 1.5)
- 50-59: AGE4: OR=1.6 (1.0 – 2.4)
- 60-69: AGE5: OR=2.6 (1.2 – 3.7)
- 70-79: AGE6: OR=4.8 (2.1 – 7.1)
- 80-85: AGE7: OR=7.5 (2.9 – 13.4)

Dummy variables
To confirm linearity in the logit:
- Beta coefficients for successive categories should progress in a linear scale (e.g. each category has a beta approximately 0.4 units higher than the previous beta)
- Odds ratios for successive categories should progress in a multiplicative scale (e.g. each category has an odds ratio approximately 1.8 times higher than the previous odds ratio)
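One informal way to apply this check is to take the log of each category's odds ratio and look at the step from one category to the next; roughly equal steps suggest linearity in the logit. The sketch below is an eyeball aid under that assumption, not a formal test, and the function name is hypothetical.

```python
import math


def log_or_steps(odds_ratios):
    """Successive differences in beta (log OR), starting from the reference category.

    The reference category has OR = 1, so its beta is 0.
    """
    betas = [0.0] + [math.log(value) for value in odds_ratios]
    return [round(later - earlier, 2) for earlier, later in zip(betas, betas[1:])]


# Hypothetical ORs that grow by a constant factor of 1.5: steps are equal
print(log_or_steps([1.5, 2.25, 3.375]))

# ORs from the transcript's result set that levels off after age 50-59
print(log_or_steps([1.5, 2.0, 2.5, 2.6, 2.4, 2.5]))
```

Confidence intervals are ignored here, so small departures from equal steps should not be over-interpreted.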

Dummy variables
- If the exposure-disease relationship is not linear in the logit, it may be advisable to create new dummy variables to use in modeling
  - Choose categories (including the designation of reference category) in some meaningful way based on theoretical considerations and sample size
  - This is in contrast to choosing standard-width categories to assess log-linearity

Dummy variables
Final notes:
- Any simple dichotomous variable is an example of a dummy variable
- Also known as “nominal” variables
- Three types of variables: continuous, ordinal, nominal