Published by Nathaniel Gaines. Modified over 9 years ago.
1 Multivariable Modeling
2
- Adjustment by statistical model for the relationships of predictors to the outcome.
- Represents the frequency or magnitude of one phenomenon as a mathematical function of “predictors” and random variation.
  - The phenomenon modeled may be continuous, e.g. HDL cholesterol, or categorical, e.g. survival or death.
  - The model consists of:
    - a mathematical form of an equation to predict some aspect of the distribution of the predicted phenomenon, e.g. mean cholesterol or probability of death
    - a probability law that describes, on a group basis, how individuals vary from what the equation predicts
3 Multivariable Modeling
- The prediction equation typically includes:
  - The exposure of interest.
  - Other exposures of potential importance.
  - Potential confounders.
- The predictors may be:
  - Continuous variables.
  - Categorical variables with two or more categories, or
  - Combinations of these.
- The data are used to estimate the coefficients of the prediction equation and the magnitude of random variation.
- The coefficients represent the statistical effects of the various predictors, assuming that the other predictors are adjusted for by holding them constant.
  - The prediction equation relating the outcome to any single predictor, holding the others constant, may be linear, quadratic, cyclical, or of many other forms.
4 Multivariable Modeling: Multiple Linear Regression
- Models the mean of a quantitative outcome as a function of the values of predictor variables.
- Assumes independent observations whose random variation about the predicted mean is approximately Gaussian (normal).
- Contrary to what the name suggests, these models need not be linear in the predictor variables. They are always, however, linear in the coefficients by which the values of predictor variables are multiplied.
- Example: $E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$, where $x_1$ is the value of the exposure of interest, $x_2$ and $x_3$ are values of other variables that may biologically affect $y$, and $z_1$ and $z_2$ are possible confounders. The x’s and z’s may be continuous, or values of 0-1 “dummy variables” representing categories of qualitative variables.
- The Greek coefficients are estimated from the observed data.
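To illustrate "linear in the coefficients but not necessarily in the predictors", here is a small sketch of a model with a squared term. The coefficient symbols $\beta_1$, $\beta_2$ and the choice of a quadratic in a single predictor $x_1$ are assumptions made for this illustration, not something shown on the slide.

```latex
E(y) = \beta_1 x_1 + \beta_2 x_1^{2}
```

The mean of $y$ is a curved function of $x_1$, yet each coefficient still only multiplies a known quantity ($x_1$ or $x_1^2$), so the model can still be fitted as a multiple linear regression with $x_1$ and $x_1^2$ as its two predictors.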
5 Example
- Outcome is systolic BP
- Intercept $\alpha = 30$
- $X_1$ is age
- $\beta_1 = 1.5$
- $X_2$ is a dummy variable for gender (male = 1, female = 0)
- $\beta_2 = 10$
- Equation is: Mean BP = 30 + 1.5 $X_1$ + 10 $X_2$ + … + …
- So mean systolic blood pressure = 30 + (1.5 × age) + (10 × 0) for women
- And for men = 30 + (1.5 × age) + (10 × 1)
6 Interpretation
- The most important item here is $\beta_1 = 1.5$ mmHg per year of age.
- This will be reported as follows:
- We found an association between age and SBP, with a mean increase of 1.5 mmHg for every 1-year increase in age, after adjustment for ….
- This applies equally to men and women, with the mean SBP being 10 mmHg higher in men at every age.
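The interpretation above can be checked numerically. The following is a minimal sketch that encodes only the terms shown in the example equation (intercept 30, 1.5 per year of age, 10 for men); the function name `mean_sbp` is hypothetical, and the elided terms of the equation are left out.

```python
def mean_sbp(age, male):
    """Predicted mean systolic BP (mmHg) from the example equation.

    Assumption: only the terms shown on the slide are included; the
    elided terms ("+ ... + ...") are omitted.
    male is the 0/1 dummy variable (male = 1, female = 0).
    """
    return 30 + 1.5 * age + 10 * male


# Effect of one extra year of age, within each gender
age_effect_women = mean_sbp(51, 0) - mean_sbp(50, 0)
age_effect_men = mean_sbp(51, 1) - mean_sbp(50, 1)

# Difference between men and women of the same age
gender_gap_at_50 = mean_sbp(50, 1) - mean_sbp(50, 0)
gender_gap_at_70 = mean_sbp(70, 1) - mean_sbp(70, 0)

print(age_effect_women, age_effect_men)    # 1.5 1.5
print(gender_gap_at_50, gender_gap_at_70)  # 10.0 10.0
```

The age effect is the same in both genders and the gender gap is the same at every age, which is what a model without an interaction term implies.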
7
- Note that gender is not an effect modifier, because the 1.5 mmHg-per-year slope is the same for men and women.
- Gender here is independently associated with the outcome.
- It could be a confounder in your crude calculations if it is associated with age in your sample.
- But even if it is a confounder in the crude calculation, the 1.5 mmHg-per-year slope is already adjusted for gender. (It is adjusted for all the other variables in the equation.)
8 Multiple Linear Regression
- When a confounder is added into the equation, the $\beta$ of the exposure you are interested in becomes “adjusted” for the confounder.
- That is to say, “This is the correct association after a confounder is taken into account”.
- This is how confounders are searched for in regression: by adding each into the equation and finding out whether the $\beta$ for the exposure of interest changes.
- Whenever a $\beta$ gets close to zero, that variable is taken out.
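A small numerical sketch of what "adjusted" means follows. The data are hypothetical and noise-free, generated from Mean SBP = 30 + 1.5 × age + 10 × male, with the men deliberately older than the women so that gender confounds the crude age slope. The helper names `slope` and `center_within` are made up for this illustration. The adjusted slope is obtained by centering age and SBP within each gender, which yields the same age coefficient as fitting the two-predictor regression (the Frisch-Waugh-Lovell result); this is a convenience for a dependency-free example, not necessarily how the slides' author computed it.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def center_within(values, groups):
    """Subtract each group's mean from the values in that group."""
    group_means = {
        g: mean(v for v, h in zip(values, groups) if h == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical data: three women (younger) and three men (older)
ages = [30, 40, 50, 50, 60, 70]
male = [0, 0, 0, 1, 1, 1]
sbp = [30 + 1.5 * a + 10 * m for a, m in zip(ages, male)]

crude = slope(ages, sbp)  # ignores gender
adjusted = slope(center_within(ages, male), center_within(sbp, male))

print(round(crude, 3), round(adjusted, 3))  # 1.8 1.5
```

The crude slope (1.8) overstates the age effect because older subjects in this made-up sample are more often men; once gender is accounted for, the slope returns to 1.5.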
9 Regression
- Think of it as each risk factor being adjusted for all the other risk factors in the model.
10 Interpretation
- “There were 10 factors significantly associated with the outcome in univariate analysis. In multivariate analysis only factors 1-5 remained significant.”
- Factors 1-5 are truly associated with the outcome.
- Factors 6-10 are not independently associated with the outcome.
- Factors 6-10 were confounded by factors 1-5.
- $\beta$ for factors 6-10 became close to 0 after adjustment for factors 1-5.
11 Multivariable Modeling: Multiple linear regression
- Thus, $\beta_1$ represents the predicted change in the mean value of $y$ associated with an increase of one unit in the variable represented by $x_1$, with the variables represented by the other x’s and z’s held fixed.
- This type of model accommodates effect modification through the use of interaction terms, e.g., $x_1 z_2$, which allows the effect of a change of one unit in $x_1$ to vary with the value of $z_2$:
  $E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2 + \delta x_1 z_2$.
- $\delta$ is thus a difference of differences: how the effect on $y$ of a one unit increase in $x_1$ is itself modified by a one unit increase in $z_2$.
12 Multivariable Modeling: Multiple linear regression
$E(y) = \dots + \delta x_1 z_2$
13 Example
- Mean SBP = … + 0.5 × age × race
- Mean SBP = 30 + 1.5 × age + 10 × (1 for men and 0 for women) + 0.5 × age × (0 for white and 1 for black)
- That is to say, for every increase of 1 year in age, BP goes up 1.5 mmHg in whites but 1.5 + 0.5 = 2 mmHg in blacks.
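The interaction example can be worked through in a few lines. This sketch uses exactly the coefficients in the example equation; the function name `mean_sbp_with_interaction` and the 0/1 argument names are hypothetical.

```python
def mean_sbp_with_interaction(age, male, black):
    """Mean SBP with an age-by-race interaction term.

    male:  1 for men, 0 for women
    black: 1 for black, 0 for white
    """
    return 30 + 1.5 * age + 10 * male + 0.5 * age * black


# Change in mean SBP for one extra year of age, by race (women shown)
slope_white = mean_sbp_with_interaction(51, 0, 0) - mean_sbp_with_interaction(50, 0, 0)
slope_black = mean_sbp_with_interaction(51, 0, 1) - mean_sbp_with_interaction(50, 0, 1)

# The interaction coefficient is the difference between those two slopes
difference_of_differences = slope_black - slope_white

print(slope_white, slope_black, difference_of_differences)  # 1.5 2.0 0.5
```

The last value recovers the interaction coefficient (0.5), which is the "difference of differences" reading of an interaction term.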
14 Multivariable Modeling: Multiple logistic regression
- Models the probability of a dichotomous outcome as a function of the values of predictor variables.
- Assumes independent observations with binomial distributions.
- The right side of the equation is the same as in linear regression. The left side (the outcome) is different.
- $\ln(\text{odds of outcome}) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$, e.g. the odds of response to cancer chemotherapy; the other symbols are all as defined above for multiple linear regression.
15 Multivariable Modeling: Multiple logistic regression
- More specifically:
- When $x_1$ represents levels of a dichotomous predictor by the values 0 (absent) and 1 (present), then $\exp(\beta_1)$ is the predicted odds ratio relating predictor to outcome, e.g., smoking to lung cancer, adjusted for other possible predictors and confounders.
- When $x_1$ represents values of a quantitative predictor, then $\exp(\beta_1)$ is the odds ratio between predictor and outcome, e.g., stroke and diastolic blood pressure, associated with a one unit increase in the predictor, and adjusted for other possible predictors and confounders.
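The claim that the exponentiated coefficient is the adjusted odds ratio can be verified with a short sketch. Everything numeric here is an assumption for illustration: the intercept `alpha`, the coefficients `beta1` and `gamma1`, and the single extra covariate `z1` are hypothetical and do not come from the slides.

```python
import math


def log_odds(x1, z1, alpha=-2.0, beta1=0.7, gamma1=0.03):
    """Hypothetical logistic model: ln(odds) = alpha + beta1*x1 + gamma1*z1."""
    return alpha + beta1 * x1 + gamma1 * z1


def probability(lo):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-lo))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio_for_x1(z1):
    """Odds ratio comparing x1 = 1 with x1 = 0 at a fixed value of z1."""
    exposed = odds(probability(log_odds(1, z1)))
    unexposed = odds(probability(log_odds(0, z1)))
    return exposed / unexposed


or_at_40 = odds_ratio_for_x1(40)
or_at_70 = odds_ratio_for_x1(70)
expected = math.exp(0.7)  # exp(beta1)

print(round(or_at_40, 4), round(or_at_70, 4), round(expected, 4))
```

Although the predicted probabilities differ at `z1 = 40` and `z1 = 70`, the odds ratio for `x1` is the same at both and equals `exp(beta1)`, which is why a single adjusted OR can be reported for the exposure.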
16 Multivariable Modeling: Multiple Logistic Regression
- Again, multiple variables will be introduced to see if the OR for others will become 1 (or close to 1), or if the associated p-values will become non-significant (NS).
- These variables are then dropped out of the equation because they were not truly associated with the outcome but were only confounded by the other variables.
17 Multivariable Modeling: Multiple Logistic Regression
- At the end, the relevant variables’ ORs will be reported, and interactions will also be reported.
- The OR will be reported as the adjusted OR for that association (adjusted for all the variables in the model).
18 Interpretation
- Moderate alcohol consumption protects from coronary disease (OR = 0.56).
- It is well established that moderate alcohol consumption CAUSES an increase in HDL.
- It has been postulated that alcohol’s coronary protective effect is mediated by raising HDL.
- “When HDL level was introduced into the model the RR for moderate drinking increased from 0.56 to 0.77 but remained significant.”
19 Interpretation
- HDL explains some but not all of alcohol’s coronary protective effect.
- The RR for alcohol (0.77) is independent of its effect on HDL.
- Alcohol offers more protection overall (RR 0.56) once its effect of raising HDL is included.
- Some of the protective effect is mediated (not confounded) by HDL.
- Both HDL and alcohol are truly and independently associated with decreased coronary events.
20 Not a confounder
- HDL is not a confounder.
- Why did we adjust for it?
- When should you do that?
- When should you not?
21 Propensity Scores
- If there aren’t enough outcome events, you can’t reliably use logistic regression to adjust for many factors.
- Rifampin and Pyrazinamide (R/P) versus Isoniazid (INH) for latent TB.
- 411 patients. 18 cases of hepatotoxicity.
- Not randomized.
- Patients at higher risk for hepatotoxicity received R/P.
22 Propensity Scores
- A crude comparison would be unfair to R/P.
- Need some “adjustment”.
- Typically we use logistic regression to look for any and all factors associated with hepatotoxicity and adjust for those.
- When the outcome is rare this cannot be done.
23 Propensity Scores
- We can look for factors associated with the treatment choice.
- Certain variables (e.g. alcohol use) make a patient more likely to receive R/P.
- These factors are given numeric scores.
- The higher the score, the higher the propensity to be treated with R/P.
24 Propensity Scores
- You calculate a propensity score for every patient.
- Compare patients with equal propensity scores as to the incidence of the outcome.
- There might be 90 patients with the same propensity score.
- They all are moderate alcohol drinkers, they all had a remote history of hepatitis, and so on.
- This can accommodate a large number of variables.
25 Those 90 patients
- With identical high propensity scores had a high likelihood of receiving R/P.
- Guess what: 75 of them received R/P and only 15 received INH.
- But NOW we can compare the incidence of that outcome in these 75 to these 15.
26 Typically
- 5 groups using the quintiles of the score are used.
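The quintile approach can be sketched as follows. This is a minimal illustration under several assumptions: the propensity scores are taken as already estimated (for example from a model of treatment choice), the 20 patient records are entirely hypothetical, and the function names `quintile_strata`, `event_rate`, and `stratified_rates` are invented for this example.

```python
def quintile_strata(records):
    """Sort patients by propensity score and split into 5 near-equal groups."""
    ordered = sorted(records, key=lambda r: r["score"])
    n = len(ordered)
    return [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]


def event_rate(group):
    """Proportion of patients in the group with the outcome (None if empty)."""
    return sum(r["event"] for r in group) / len(group) if group else None


def stratified_rates(records):
    """Per quintile: (quintile, n treated, treated rate, n control, control rate)."""
    rows = []
    for k, stratum in enumerate(quintile_strata(records), start=1):
        treated = [r for r in stratum if r["treated"]]
        control = [r for r in stratum if not r["treated"]]
        rows.append((k, len(treated), event_rate(treated),
                     len(control), event_rate(control)))
    return rows


# Hypothetical data: 20 patients, more of them treated in higher quintiles
treated_per_stratum = [1, 1, 2, 3, 3]
events = {3, 6, 9, 12, 17}  # arbitrary patient indices with the outcome
records = []
for i in range(20):
    q, j = divmod(i, 4)
    records.append({
        "score": (i + 0.5) / 20,
        "treated": j < treated_per_stratum[q],
        "event": i in events,
    })

rows = stratified_rates(records)
for row in rows:
    print(row)
```

Within each row, treated and untreated patients have similar propensity scores, so their event rates can be compared more fairly than in a crude, pooled comparison.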
27 In a Clinical Trial of Platelet Inhibitor
- Data were collected regarding outcomes (death etc.).
- Also we have information about who received early statin therapy.
- But receiving early statin was not a random process.
- Totally up to clinical discretion.
28 We want to study
- The association between early statin therapy and outcome.
- BUT
- The patients who received statins are very different from those who didn’t.
29 Crude event rates
- Would be an unfair comparison.
- PROPENSITY SCORES
- We find out what factors were associated with statin use.
- For example, younger patients were more likely to receive a statin.
30 Propensity Scores
- Are then used to classify patients by quintile of increasing probability of early statin initiation. (The 1st quintile least likely, the 5th most likely.)
- Patients within each quintile were similar in their likelihood to receive a statin.
33 Patients in 1st quintile
- Were least likely to receive a statin.
- Of 2391 patients, 144 received a statin and 2247 did not.
- All these 2391 patients are very similar in all confounding factors and can be compared.