Multivariable Modeling

Presentation transcript:

1 Multivariable Modeling

2
• Adjustment by statistical model for the relationships of predictors to the outcome.
• Represents the frequency or magnitude of one phenomenon as a mathematical function of “predictors” and random variation.
  – The phenomenon modeled may be continuous, e.g. HDL cholesterol, or categorical, e.g. survival or death.
  – The model consists of:
    · a mathematical form of an equation to predict some aspect of the distribution of the predicted phenomenon, e.g. mean cholesterol or probability of death
    · a probability law that describes, on a group basis, how individuals vary from what the equation predicts

3 Multivariable Modeling
• The prediction equation typically includes:
  – The exposure of interest.
  – Other exposures of potential importance.
  – Potential confounders.
• The predictors may be:
  – Continuous variables.
  – Categorical variables with two or more categories, or
  – Combinations of these.
• The data are used to estimate the coefficients of the prediction equation and the magnitude of random variation.
• The coefficients represent the statistical effects of the various predictors, assuming that the other predictors are adjusted for by holding them constant.
  – The prediction equation relating the outcome to any single predictor, holding the others constant, may be linear, quadratic, cyclical, or of many other forms.

4 Multivariable Modeling: Multiple Linear Regression
• Models the mean of a quantitative outcome as a function of the values of predictor variables.
• Assumes independent observations whose random variation around the predicted mean is approximately Gaussian (normal).
• Contrary to what the name suggests, these models need not be linear in the predictor variables. They are always, however, linear in the coefficients by which the values of predictor variables are multiplied.
• Example: $E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$, where $x_1$ is the value of the exposure of interest, $x_2$ and $x_3$ are values of other variables that may biologically affect $y$, and $z_1$ and $z_2$ are possible confounders. The x’s and z’s may be continuous, or values of 0-1 “dummy variables” representing categories of qualitative variables.
• The Greek-letter coefficients are estimated from the observed data.
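
To illustrate "linear in the coefficients", here is a small sketch. The quadratic term is an assumed example, not one taken from the slides, and the coefficient names are reused only for illustration. A model can include a squared predictor and still be a multiple linear regression:

```latex
E(y) = \beta_1 x_1 + \beta_2 x_1^{2}
```

The predicted mean is a curve in $x_1$, yet each coefficient enters only as a multiplier, so $x_1^{2}$ is simply treated as a second predictor when the model is fitted.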

5 Example
• Outcome is systolic BP
• $\alpha = 30$ (the intercept)
• $X_1$ is age
• $\beta_1 = 1.5$
• $X_2$ is a dummy variable for gender (male = 1, female = 0)
• $\beta_2 = 10$
• Equation is: Mean BP = 30 + 1.5 $X_1$ + 10 $X_2$ …
• So mean systolic blood pressure = 30 + (1.5 × age) + (10 × 0) for women
• And for men = 30 + (1.5 × age) + (10 × 1)
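
The example equation can be checked with a few lines of Python. This is only a sketch: the function name `mean_sbp` and the age of 40 are chosen here for illustration, while the three coefficients are the ones given on the slide.

```python
def mean_sbp(age, male):
    """Predicted mean systolic BP (mmHg) from the slide's example equation."""
    intercept = 30.0   # intercept from the slide
    beta_age = 1.5     # mmHg per year of age
    beta_male = 10.0   # mmHg added for men (dummy variable: male=1, female=0)
    return intercept + beta_age * age + beta_male * (1 if male else 0)


# Age 40 is an arbitrary value chosen for illustration.
print(mean_sbp(40, male=False))  # 90.0
print(mean_sbp(40, male=True))   # 100.0
```

At any given age the two predictions differ by exactly 10 mmHg, and each extra year adds 1.5 mmHg for both sexes.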

6 Interpretation
• The most important item here is $\beta_1 = 1.5$ mmHg.
• This will be reported as follows:
• We found an association between age and SBP, with a mean increase of 1.5 mmHg for every increase in age of 1 year after adjustment for ….
• This applies equally to men and women, with the mean SBP being 10 mmHg higher in men at every age.

7
• Note that gender is not an effect modifier, because the 1.5 mmHg increase per year of age is the same for men and women.
• Gender here is independently associated with the outcome.
• It could be a confounder in your crude calculations if it is associated with age in your sample.
• But even if it is a confounder in the crude calculation, the 1.5 mmHg slope is already adjusted for gender. (It is adjusted for all the other variables in the equation.)

8 Multiple Linear Regression
• When a confounder is added into the equation, the $\beta$ of the exposure you are interested in becomes “adjusted” for the confounder.
• That is to say, “This is the correct association after a confounder is taken into account.”
• This is how confounders are searched for in regression: by adding each one into the equation and finding out whether the $\beta$ for the exposure of interest changes.
• Whenever a variable's $\beta$ gets close to zero, that variable is taken out.
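
The following sketch shows what "becomes adjusted" looks like numerically. The six patients are made up (men deliberately older than women), the outcome is generated without noise from the earlier example equation, and the small least-squares solver is a plain-Python stand-in for whatever statistical package would be used in practice.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up sample in which the men happen to be older than the women.
ages = [30, 40, 50, 50, 60, 70]
male = [0, 0, 0, 1, 1, 1]
sbp = [30 + 1.5 * a + 10 * m for a, m in zip(ages, male)]

crude = ols([[1, a] for a in ages], sbp)                       # intercept, age
adjusted = ols([[1, a, m] for a, m in zip(ages, male)], sbp)   # intercept, age, gender

print(round(crude[1], 3))     # crude age slope
print(round(adjusted[1], 3))  # age slope adjusted for gender
```

In this toy sample the crude age slope comes out at about 1.8 mmHg per year, because older patients are also more often men. Once gender is in the equation the slope returns to 1.5, which is the change in $\beta$ that the slide uses to detect a confounder.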

9 Regression
• Think of it this way: each risk factor is adjusted for all the other risk factors in the model.

10 Interpretation
• “There were 10 factors significantly associated with the outcome in univariate analysis. In multivariate analysis only factors 1-5 remained significant.”
• Factors 1-5 are truly associated with the outcome.
• Factors 6-10 are not independently associated with the outcome.
• Factors 6-10 were confounded by factors 1-5.
• The $\beta$ for factors 6-10 became 0 (or close to it) after adjustment for factors 1-5.

11 Multivariable Modeling: Multiple linear regression
• Thus, $\beta_1$ represents the predicted change in the mean value of $y$ associated with an increase of one unit in the variable represented by $x_1$, with the variables represented by the other x’s and z’s held fixed.
• This type of model accommodates effect modification through the use of interaction terms, e.g., $\delta x_1 z_2$, which allows the effect of a change of one unit in $x_1$ to vary with the value of $z_2$:
  $E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2 + \delta x_1 z_2$
• $\delta$ is thus a difference of differences: how the effect on $y$ of a one unit increase in $x_1$ is itself modified by a one unit increase in $z_2$.
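
The "difference of differences" reading can be written out in two steps. This is a sketch that assumes the notation $\beta_1$ for the coefficient of $x_1$ and $\delta$ for the interaction coefficient, with all other predictors held fixed.

```latex
\begin{aligned}
E(y \mid x_1 + 1,\, z_2) - E(y \mid x_1,\, z_2) &= \beta_1 + \delta z_2 \\
\bigl[\beta_1 + \delta (z_2 + 1)\bigr] - \bigl[\beta_1 + \delta z_2\bigr] &= \delta
\end{aligned}
```

The first line is the effect of one extra unit of $x_1$ at a given $z_2$. The second line shows that this effect itself shifts by $\delta$ when $z_2$ rises by one unit.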

12 Multivariable Modeling: Multiple linear regression
  $E(y) = \dots + \delta x_1 z_2$

13 Example
• Mean SBP = ……… + 0.5 × age × race
• Mean SBP = 30 + (1.5 × age) + 10 × (1 for men and 0 for women) + 0.5 × age × (0 for white and 1 for black)
• That is to say, for every increase of 1 year in age, mean SBP goes up 1.5 mmHg in whites but 2 mmHg in blacks.
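
A short sketch of the interaction example follows. It assumes the intercept (30) and age coefficient (1.5) carry over from the earlier systolic BP example, since those numbers are not legible on this slide. The function names and the starting age of 40 are chosen here for illustration.

```python
def mean_sbp_interaction(age, male, black):
    """Predicted mean SBP (mmHg) with an age-by-race interaction term."""
    return (30.0
            + 1.5 * age            # age effect in the reference group (white)
            + 10.0 * male          # male = 1, female = 0
            + 0.5 * age * black)   # interaction: black = 1, white = 0


def slope_per_year(male, black, age=40):
    """Change in predicted mean SBP for a one-year increase in age."""
    return (mean_sbp_interaction(age + 1, male, black)
            - mean_sbp_interaction(age, male, black))


print(slope_per_year(male=0, black=0))  # 1.5
print(slope_per_year(male=0, black=1))  # 2.0
```

The age slope is 1.5 when the race indicator is 0 and 1.5 + 0.5 = 2 when it is 1, which is what makes race an effect modifier here, in contrast to gender.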

14 Multivariable Modeling: Multiple logistic regression
• Models the probability of a dichotomous outcome as a function of the values of predictor variables.
• Assumes independent observations with binomial distributions.
• The right side of the equation is the same as in linear regression. The left side (the outcome) is different.
• $\ln(\text{odds of outcome}) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$, where the odds are, e.g., the odds of response to cancer chemotherapy, and the other symbols are all as defined above for multiple linear regression.
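
Because the left side is a log odds, a predicted value has to be converted back to get a probability. A minimal sketch of that conversion, using made-up log-odds values and a hypothetical function name:

```python
import math


def probability_from_log_odds(log_odds):
    """Invert the logit: p = exp(L) / (1 + exp(L)) = 1 / (1 + exp(-L))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values of the right-hand side of the equation.
for log_odds in (-2.0, 0.0, 2.0):
    print(log_odds, round(probability_from_log_odds(log_odds), 3))
```

Whatever the right side evaluates to, the resulting probability stays between 0 and 1. A log odds of 0 corresponds to a probability of 0.5.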

15 Multivariable Modeling: Multiple logistic regression
• More specifically:
• When $x_1$ represents levels of a dichotomous predictor by the values 0 (absent) and 1 (present), then $\exp(\beta_1)$ is the predicted odds ratio relating predictor to outcome, e.g., smoking to lung cancer, adjusted for other possible predictors and confounders.
• When $x_1$ represents values of a quantitative predictor, then $\exp(\beta_1)$ is the odds ratio between predictor and outcome, e.g., diastolic blood pressure and stroke, associated with a one unit increase in the predictor, and adjusted for other possible predictors and confounders.
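
As a sketch of the exponentiation step, the snippet below turns logistic coefficients into odds ratios. Both coefficient values are hypothetical and chosen only to make the arithmetic visible. They are not estimates from any study.

```python
import math


def odds_ratio(beta, units=1.0):
    """Odds ratio for a `units`-sized increase in a predictor with coefficient beta."""
    return math.exp(beta * units)


beta_smoking = math.log(2.0)  # hypothetical coefficient, chosen so that OR = 2
beta_dbp = 0.05               # hypothetical coefficient per 1 mmHg of diastolic BP

print(round(odds_ratio(beta_smoking), 2))         # smoker vs non-smoker
print(round(odds_ratio(beta_dbp), 3))             # per 1 mmHg
print(round(odds_ratio(beta_dbp, units=10), 3))   # per 10 mmHg
```

For a quantitative predictor, the odds ratio for a 10-unit increase is the one-unit odds ratio raised to the 10th power, not 10 times it.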

16 Multivariable Modeling: Multiple Logistic Regression
• Again, multiple variables will be introduced to see whether the OR for others becomes 1 (or close to 1), or whether the associated p-values become non-significant (NS).
• These variables are then dropped from the equation because they were not truly associated with the outcome but were only confounded by the other variables.

17 Multivariable Modeling: Multiple Logistic Regression
• At the end, the relevant variables’ ORs will be reported, and interactions will also be reported.
• The OR will be reported as the adjusted OR for that association. (Adjusted for all the variables in the model.)

18 Interpretation
• Moderate alcohol consumption protects from coronary disease (OR = 0.56).
• It is well established that moderate alcohol consumption CAUSES an increase in HDL.
• It has been postulated that alcohol’s coronary protective effect is mediated by raising HDL.
• “When HDL level was introduced into the model the RR for moderate drinking increased from 0.56 to 0.77 but remained significant.”

19 Interpretation
• HDL explains some but not all of alcohol’s coronary protective effect.
• The RR for alcohol (0.77) is independent of its effect on HDL.
• Alcohol offers more protection (RR 0.56) through its effect on raising HDL.
• Some of the protective effect is mediated (not confounded) by HDL.
• Both HDL and alcohol are truly and independently associated with decreased coronary events.

20 Not a confounder
• HDL is not a confounder.
• Why did we adjust for it?
• When should you do that?
• When should you not?

21 Propensity Scores
• If there aren’t enough outcome events, you can’t reliably use logistic regression.
• Rifampin and Pyrazinamide (R/P) versus Isoniazid (INH) for latent TB.
• 411 patients, 18 cases of hepatotoxicity.
• Not randomized.
• Patients at higher risk for hepatotoxicity received R/P.

22 Propensity Scores
• A crude comparison would be unfair to R/P.
• Need some “adjustment”.
• Typically we use logistic regression to look for any and all factors associated with hepatotoxicity and adjust for those.
• When the outcome is rare this cannot be done.

23 Propensity Scores
• We can look for factors associated with the treatment choice.
• Certain variables (e.g. alcohol use) make a patient more likely to receive R/P.
• These factors are given numeric scores.
• The higher the score, the higher the propensity to be treated with R/P.
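
One common way to turn such factors into a score is a logistic model of treatment choice, where the propensity score is the predicted probability of receiving the treatment. The slides do not say how the scores were derived, so this is an assumption. The sketch below uses hypothetical factor names, weights and intercept that are not taken from the R/P study.

```python
import math

# Hypothetical log-odds weights for receiving R/P; not taken from the study.
WEIGHTS = {"alcohol_use": 0.9, "prior_hepatitis": 0.7}
INTERCEPT = -1.0  # hypothetical baseline log-odds of receiving R/P


def propensity_score(patient):
    """Estimated probability that this patient is treated with R/P."""
    log_odds = INTERCEPT + sum(w * patient.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


low = propensity_score({"alcohol_use": 0, "prior_hepatitis": 0})
high = propensity_score({"alcohol_use": 1, "prior_hepatitis": 1})
print(round(low, 3), round(high, 3))
```

Note that the model being fitted here predicts the treatment, not the outcome, so it can use all 411 patients even though only 18 had hepatotoxicity.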

24 Propensity Scores
• You calculate a propensity score for every patient.
• Compare patients with equal propensity scores as to the incidence of the outcome.
• There might be 90 patients with the same propensity score.
• They are all moderate alcohol drinkers, they all had a remote history of hepatitis, and so on.
• This can accommodate a great many variables.

25 Those 90 patients
• With identical high propensity scores, they had a high likelihood of receiving R/P.
• Indeed, 75 of them received R/P and only 15 received INH.
• But NOW we can compare the incidence of the outcome in these 75 to that in these 15.

26 Typically
• 5 groups, defined by the quintiles of the score, are used.
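
The quintile approach can be sketched as follows. The ten patients, their scores and their outcomes are all made up, and the rank-based quintile rule is one simple choice among several (tied scores would be split arbitrarily by it).

```python
def quintile_labels(scores):
    """Assign each score to a quintile 1..5 by rank (1 = lowest propensity)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels


def rates_by_quintile(scores, treated, outcome):
    """Event rate per (quintile, treatment arm); None when an arm is empty."""
    labels = quintile_labels(scores)
    table = {}
    for q in range(1, 6):
        row = {}
        for arm in (0, 1):
            events = [o for lab, t, o in zip(labels, treated, outcome)
                      if lab == q and t == arm]
            row[arm] = sum(events) / len(events) if events else None
        table[q] = row
    return table


# Ten made-up patients: propensity score, treated (1/0), outcome event (1/0).
scores = [0.05, 0.10, 0.25, 0.30, 0.45, 0.50, 0.65, 0.70, 0.85, 0.90]
treated = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
outcome = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0]

for q, row in rates_by_quintile(scores, treated, outcome).items():
    print(q, row)
```

Each printed row compares treated and untreated patients who had a similar likelihood of being treated, which is the comparison the crude rates cannot give.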

27 In a Clinical Trial of Platelet Inhibitor
• Data were collected regarding outcomes (death etc.).
• We also have information about who received early statin therapy.
• But receiving an early statin was not a random process.
• It was totally up to clinical discretion.

28 We want to study
• The association between early statin therapy and outcome.
• BUT
• The patients who received statins are very different from those who didn’t.

29 Crude event rates
• Would be an unfair comparison.
• PROPENSITY SCORES
• We find out what factors were associated with statin use.
• For example, younger patients were more likely to receive a statin.

30 Propensity Scores
• Are then used to classify patients by quintile of increasing probability of early statin initiation. (The 1st quintile is least likely, the 5th most likely.)
• Patients within each quintile are similar in their likelihood of receiving a statin.

31

32

33 Patients in 1st quintile
• Were least likely to receive a statin.
• Of 2391 patients, 144 received a statin and 2247 did not.
• All these 2391 patients are very similar in all measured confounding factors and can be compared.