POPLHLTH 304 Regression (modelling) in Epidemiology Simon Thornley (Slides adapted from Assoc. Prof. Roger Marshall)

Which method does not control for confounding?
A) Stratification
B) Exclusion criteria
C) Regression modelling
D) Objective assessment of outcomes

Observational epidemiology. Usually in epidemiology, studies are “observational”. Myriad factors determine the occurrence of disease. Trying to separate the effects of specific factors from others (confounding variables) is often difficult. Regression models (as an alternative to stratification) are useful.

Stratification difficulty. Often there are too many confounders, and stratifying leads to too many strata, e.g. 4 categories of age, 2 of sex, 4 of ethnicity = 32 strata. Empty cells are problematic. Need a better, more statistically efficient way to deal with the problem. Want to control for (many) possible confounding variables while eliciting the effect (relative risk or odds ratio) of an exposure of interest. Building statistical models is one solution.
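The strata count on this slide is simple multiplication, which a short Python sketch can make concrete. The category labels and the study size of 200 are hypothetical; only the counts 4, 2 and 4 come from the slide.

```python
from itertools import product

# Hypothetical category labels; only the counts (4, 2, 4) come from the slide.
age_groups = ["age1", "age2", "age3", "age4"]
sexes = ["female", "male"]
ethnic_groups = ["eth1", "eth2", "eth3", "eth4"]

# Every combination of the three confounders is one stratum.
strata = list(product(age_groups, sexes, ethnic_groups))
n_strata = len(strata)
print("Number of strata:", n_strata)

# Assumed study size, purely for illustration: with few subjects per stratum,
# some exposure-by-disease cells are likely to be empty.
n_subjects = 200
print("Average subjects per stratum:", n_subjects / n_strata)
```

Adding one more confounder with 3 categories would triple the number of strata, which is why stratification quickly becomes impractical.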

What is a [statistical regression] model? Usually regarded as a formula that relates an outcome Y to one or more predictors (exposures) $X_1, X_2, \ldots$ of Y. The formula imposes a framework that we assume reflects how Y is related to $X_1, X_2, \ldots$ in the real world. The model is specified in terms of unknown parameters, which are estimated from data (‘model fitting’).

Linear (regression) model. Often may consider “Y increases with X”, e.g. blood pressure increases with age. May also consider it does so “linearly”. Data seem to support this, though with much variability.

Straight line model (simple linear regression model) for how Y “depends on X”. [Figure: straight line plotted on X and Y axes] This is the model structure or framework. Fitting to data involves drawing a “good fitting” line through the points; the line gives the mean of Y for a given X, $E(Y \mid X)$.
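As a rough illustration of fitting a straight line, the sketch below assumes that “good fitting” means ordinary least squares, which the slide does not state. The ages and blood pressures are made-up numbers, not data from the course.

```python
# Hypothetical data: ages (years) and systolic blood pressure (mmHg).
ages = [30, 40, 50, 60, 70]
sbp = [118, 125, 131, 142, 149]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(sbp) / n

# Least-squares slope = Sxy / Sxx; the intercept makes the line pass
# through the point (mean of X, mean of Y).
sxx = sum((x - mean_x) ** 2 for x in ages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, sbp))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def expected_sbp(age):
    """Fitted mean of Y for a given X, i.e. E(Y|X) under the straight-line model."""
    return intercept + slope * age


print(f"E(SBP | age) = {intercept:.2f} + {slope:.3f} * age")
print(f"Fitted mean SBP at age 45: {expected_sbp(45):.1f}")
```

The fitted line does not pass through every point; it describes the average of Y at each X, with individual values scattered around it.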

Regression. Relationship between X and Y. Y “depends” on X (rather than X depends on Y). Y is the dependent (outcome, disease) variable. X is the independent (exposure, predictor, covariate) variable.

Consider 2 potential predictors of Y, say $X_1$ and $X_2$. Can plot the data as scatter points in a 3-dimensional space. [Figure: 3-D axes Y, $X_1$, $X_2$]

Analog of a line in 2-D is a plane in 3-D. [Figure: plane in 3-D axes Y, $X_1$, $X_2$]

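Straight line model with just $X_1$ is
$E(Y \mid X_1) = \beta_0 + \beta_1 X_1$
Extending this to a plane is
$E(Y \mid X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Or further
$E(Y \mid X_1, X_2, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
Here E(Y|…) means “average Y given ….”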

Binary Y in epidemiology. In epi, Y is often a binary disease/no disease outcome. $X_1, X_2$, etc. are risk factors for the disease, one of which may be an exposure of interest and the others confounders.

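Logistic model: need to modify the linear model to account for binary Y, occurrence of disease D. Again, information on $X_1, X_2, \ldots$ is collapsed into a risk score $Q = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$. The relationship between the probability of disease and Q now follows the logistic formula:
$P(D \mid X_1, X_2, \ldots) = \dfrac{e^{Q}}{1 + e^{Q}}$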

Logistic regression formula is of the form Probability of CHD = $e^{Q}/(1+e^{Q})$, where Q is a weighted sum of risk factors (a linear score). For example: Q = -5.31 + 1.09*SMOKE + 0.41*SEX, where SEX=1 if man, 0 if woman, and SMOKE=1 if smokes, 0 if not. The values -5.31, 1.09, 0.41 are estimated from the data and are the “beta-coefficients”.

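The model gives a probability for each of the 4 combinations of SMOKE and SEX, with Prob = $e^{Q}/(1+e^{Q})$ in each case:
Smoking man (SMOKE=1, SEX=1): Q = intercept + 1.09 x 1 + 0.41 x 1
Nonsmoking man (SMOKE=0, SEX=1): Q = intercept + 1.09 x 0 + 0.41 x 1 = -4.91
Nonsmoking woman (SMOKE=0, SEX=0): Q = -5.32
Smoking woman (SMOKE=1, SEX=0): Q = intercept + 1.09 x 1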
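The four probabilities can be recomputed with a few lines of Python. This is a sketch under one assumption: it uses the coefficients exactly as quoted (-5.31, 1.09, 0.41). The Q values that survive on the slide (-4.91 and -5.32) suggest the original calculation used a slightly different, unrounded intercept, so the results may differ from the slide in the last digit.

```python
import math

# Coefficients as quoted in the slides. Taking the intercept at its quoted
# (rounded) value of -5.31 is an assumption.
B0, B_SMOKE, B_SEX = -5.31, 1.09, 0.41


def risk_score(smoke, sex):
    """Linear score Q for SMOKE (1 = smokes) and SEX (1 = man)."""
    return B0 + B_SMOKE * smoke + B_SEX * sex


def logistic(q):
    """Logistic formula: probability of disease for a given score Q."""
    return math.exp(q) / (1.0 + math.exp(q))


groups = {
    "smoking man": (1, 1),
    "nonsmoking man": (0, 1),
    "nonsmoking woman": (0, 0),
    "smoking woman": (1, 0),
}

probs = {}
for label, (smoke, sex) in groups.items():
    q = risk_score(smoke, sex)
    probs[label] = logistic(q)
    print(f"{label:17s} Q = {q:6.2f}  Prob = {probs[label]:.4f}")
```

Each probability is small, which is the “rare disease” situation in which odds ratios and relative risks are close to each other.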

Relative risk estimates. RR for smoking (in men) is Prob(smoking man) / Prob(nonsmoking man) = 2.92. RR for smoking (in women) is Prob(smoking woman) / Prob(nonsmoking woman) = 2.95. Notice these are also approximately $e^{1.09} = 2.97$, i.e. taking the exponential of a variable's beta-coefficient estimates its RR (actually $e^{1.09} = 2.97$ is the disease odds ratio, but this is approximately equal to the RR when the disease is rare).
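The comparison between the relative risks and the odds ratio can be checked numerically. The sketch below again assumes the coefficients as quoted (-5.31, 1.09, 0.41), so the relative risks may differ from the slide's 2.92 and 2.95 in the second decimal place because of rounding.

```python
import math

# Coefficients as quoted in the slides (rounded values are an assumption).
B0, B_SMOKE, B_SEX = -5.31, 1.09, 0.41


def prob(smoke, sex):
    """Probability of disease from the logistic formula."""
    q = B0 + B_SMOKE * smoke + B_SEX * sex
    return math.exp(q) / (1.0 + math.exp(q))


def odds(p):
    return p / (1.0 - p)


# Relative risk: ratio of probabilities (smokers vs nonsmokers).
rr_men = prob(1, 1) / prob(0, 1)
rr_women = prob(1, 0) / prob(0, 0)

# Odds ratio: ratio of odds; in a logistic model this is exp(beta) exactly.
or_men = odds(prob(1, 1)) / odds(prob(0, 1))
or_from_beta = math.exp(B_SMOKE)

print(f"RR for smoking, men:   {rr_men:.2f}")
print(f"RR for smoking, women: {rr_women:.2f}")
print(f"OR for smoking:        {or_men:.2f}")
print(f"exp(1.09):             {or_from_beta:.2f}")
```

The relative risks sit just below the odds ratio; the gap is small here only because every probability involved is a few percent or less.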

Why logistic formula? Ans: P(D|…) always between 0 and 1 whatever value of Q i.e. behaves like a probability should.
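A quick numerical check of this property, as a sketch: the scores below are arbitrary illustrative values, and the function uses the form 1/(1+e^(-Q)), which is algebraically the same as e^Q/(1+e^Q).

```python
import math


def logistic(q):
    """1 / (1 + e^(-Q)), algebraically equal to e^Q / (1 + e^Q)."""
    return 1.0 / (1.0 + math.exp(-q))


# Arbitrary scores, from strongly negative to strongly positive.
scores = [-20, -5, -1, 0, 1, 5, 20]
probs = [logistic(q) for q in scores]

for q, p in zip(scores, probs):
    print(f"Q = {q:4d}  ->  P = {p:.6f}")
```

A straight-line model for the probability itself would not have this property: for large enough positive or negative Q it would give values above 1 or below 0.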

Can include as many variables in Q as we like, e.g. $Q = \beta_0 + \beta_1\,\mathrm{SMOKE} + 0.31\,\mathrm{SEX} + 0.124\,\mathrm{AGE} - 0.2\,\mathrm{ETHNIC} + \ldots$, but the model may be too ambitious, i.e. can a single model be expected to account accurately for the effects of numerous variables?

Logistic model in epidemiology: controlling for confounding. Y is occurrence of disease in a cohort study. $X_1$ is the binary exposure of interest. $X_2, X_3, \ldots$ are confounding risk factors. $\beta_1$ is the effect of $X_1$ “controlling” for the effects of $X_2, X_3$, etc.

Relative risk. In fact $e^{\beta_1}$ is approximately the relative risk of $X_1$ (assumed to be the same for all values of $X_2, X_3$), i.e. no effect modification/interaction. E.g. if $X_1$ is smoking, $X_2$ is age and $X_3$ is alcohol, the RR for smoking is the same whatever the age or alcohol level.
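The “same whatever age, alcohol” assumption can be seen directly in a model without interaction terms. In the sketch below only the smoking coefficient 1.09 comes from the slides; the intercept and the age and alcohol coefficients are hypothetical values chosen for illustration.

```python
import math

B_SMOKE = 1.09  # quoted in the slides
# Hypothetical values, chosen only for illustration:
B0, B_AGE, B_ALCOHOL = -6.0, 0.05, 0.30


def prob(smoke, age, alcohol):
    """Logistic model with no interaction terms."""
    q = B0 + B_SMOKE * smoke + B_AGE * age + B_ALCOHOL * alcohol
    return 1.0 / (1.0 + math.exp(-q))


def odds(p):
    return p / (1.0 - p)


odds_ratios = []
for age in (40, 60, 80):
    for alcohol in (0, 1):
        or_smoke = odds(prob(1, age, alcohol)) / odds(prob(0, age, alcohol))
        odds_ratios.append(or_smoke)
        print(f"age {age}, alcohol {alcohol}: OR for smoking = {or_smoke:.3f}")

print(f"exp(beta_smoke) = {math.exp(B_SMOKE):.3f}")
```

The odds ratio for smoking is identical in every stratum because the model contains no smoking-by-age or smoking-by-alcohol product term; allowing it to vary would require adding such interaction terms.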

Case-control studies. Development is for cohort studies (since probabilities of disease P(D | …) are estimable in a cohort study), but can use for case-control studies too (even though probabilities of disease are not estimable). Can still use $e^{\beta_1}$ as an RR estimate.
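A small worked example shows why the odds ratio survives case-control sampling while the disease probabilities do not. All counts are hypothetical, and the design assumed here (keep every case, sample 1 in 20 non-cases regardless of exposure) is an idealised one.

```python
# Hypothetical full cohort: counts of cases and non-cases by exposure.
cohort = {
    "exposed": {"cases": 200, "noncases": 9800},
    "unexposed": {"cases": 100, "noncases": 9900},
}

# Idealised case-control sample: all cases, 1 in 20 non-cases,
# sampled independently of exposure.
case_control = {
    group: {"cases": c["cases"], "noncases": c["noncases"] // 20}
    for group, c in cohort.items()
}


def odds_ratio(table):
    e, u = table["exposed"], table["unexposed"]
    return (e["cases"] * u["noncases"]) / (u["cases"] * e["noncases"])


def apparent_risk(table, group):
    g = table[group]
    return g["cases"] / (g["cases"] + g["noncases"])


or_cohort = odds_ratio(cohort)
or_cc = odds_ratio(case_control)
risk_cohort = apparent_risk(cohort, "exposed")
risk_cc = apparent_risk(case_control, "exposed")

print(f"OR in cohort:       {or_cohort:.3f}")
print(f"OR in case-control: {or_cc:.3f}")
print(f"Risk in exposed (cohort):                {risk_cohort:.3f}")
print(f"Apparent 'risk' in exposed (case-control): {risk_cc:.3f}")
```

The sampling fraction for non-cases cancels out of the odds ratio but not out of the proportion diseased, which is why only the former can be taken from case-control data.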

Logistic modelling advantages. Can adjust for many confounders at once. Beta coefficients give odds ratio estimates of relative risk, valid if disease is rare. Deals with “interactions” (effect modification) if necessary. Easy to do on computer. Gives confidence intervals, P-values, etc. Can apply to case-control data.

Disadvantages. Model is just a model, not necessarily reality. Black box approach, can lose touch with data. Requires decisions: what variables in model? How to code variables? Continuous or dichotomised? ORs not valid as RR for non-rare disease (in cohort).
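The last point can be illustrated by holding the odds ratio fixed and varying how common the disease is. In this sketch the odds ratio is e^1.09 from the earlier smoking example; the baseline risks in the unexposed group are hypothetical.

```python
import math

ODDS_RATIO = math.exp(1.09)  # smoking odds ratio from the earlier example

# Hypothetical baseline risks in the unexposed, from rare to common.
baseline_risks = [0.001, 0.01, 0.1, 0.3]

relative_risks = []
for p0 in baseline_risks:
    odds0 = p0 / (1.0 - p0)
    odds1 = ODDS_RATIO * odds0      # odds in the exposed
    p1 = odds1 / (1.0 + odds1)      # convert odds back to a probability
    rr = p1 / p0
    relative_risks.append(rr)
    print(f"baseline risk {p0:5.3f}: RR = {rr:.2f}  (OR = {ODDS_RATIO:.2f})")
```

With a baseline risk of 0.1% the relative risk is almost equal to the odds ratio, but as the disease becomes common the odds ratio increasingly overstates the relative risk.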

Logistic regression is favoured in epidemiology because:
A) It can be used to adjust for many confounders at once
B) It enhances statistical power over stratification
C) It results in an outcome that is constrained between zero and one (the domain of a probability).

How do you estimate an odds ratio from a logistic model?
A) It is equal to the beta coefficient
B) It is equal to the exponential of the beta coefficient
C) It is equal to the logit of the sum of the product of the variables and the beta coefficients.

Which one of the following statements is true?
A) The choice of independent and dependent variable in regression modelling is unimportant
B) A regression model estimates the average value of the dependent variable, given the values of a number of independent variables
C) Independent variables are outcomes and dependent variables exposures.