A Conceptual Approach to Multivariable Analysis
Mitchell H. Katz, MD
Director of Health
San Francisco Department of Public Health
Table 2. Bivariate Risk Factors for Perinatal Transmission of HIV
Maternal variable | No. of women* | No. of infected infants (%) | Odds ratio | 95% confidence interval | P
Diagnosis of genital HSV infection during pregnancy
  Yes | 21 | 6 (28.6) | | |
  No | 381 | 40 (10.5) | | |
Lack of zidovudine prophylaxis during pregnancy/delivery
  Yes | 124 | 21 (16.9) | | |
  No | 266 | 25 (9.4) | | |
Duration of membrane rupture (h)
  >4 | | (19.2) | | |
  < | | (7.4) | | |
Gestational age at delivery (wk)
  < | | (27.3) | | | <.001
  > | | (8.6) | | |
* Numbers for some variables do not add up to total because of missing data.
From: Chen KT, et al. Genital herpes simplex virus infection and perinatal transmission of human immunodeficiency virus. Obstet Gynecol. 2005;106:
Table 3. Risk Factors for Perinatal Transmission of HIV
Risk factor | Adjusted OR | 95% CI | P
Diagnosis of genital HSV infection during pregnancy | | |
Lack of zidovudine prophylaxis during pregnancy/delivery | | |
Rupture of membranes >4 hours | | |
Delivery at <37 weeks of gestation | | |
Why do Multivariable Analyses?
Identify independent predictors of outcome.
Table 4. Does indicated pre-term delivery decrease neonatal morbidity in infants <1000 grams?
Intraventricular hemorrhage (III/IV)
 | Yes | No | Total
Indicated pre-term delivery | 9 (5.8%) | 147 (94.2%) | 156
Spontaneous rupture of membranes | 38 (14.9%) | 217 (85.1%) | 255
OR = 0.35 (0.16 – 0.76)
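The bivariate odds ratio in Table 4 can be recomputed from the four cell counts. The sketch below assumes a Wald (Woolf) interval on the log-odds scale; the slide does not say which interval method was used, so the limits may differ slightly from the 0.16 – 0.76 shown. The function name `odds_ratio_with_ci` is illustrative only.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Counts from Table 4: indicated pre-term delivery vs. spontaneous rupture
or_, lower, upper = odds_ratio_with_ci(9, 147, 38, 217)
print(f"OR = {or_:.2f} (Wald 95% CI {lower:.2f} to {upper:.2f})")
```

The point estimate rounds to the 0.35 reported on the slide.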
 | Indicated pre-term labor | Spontaneous pre-term labor
Gestational age | 28 weeks | 26 weeks
Multivariable analysis adjusting for gestational age
Indicated pre-term versus spontaneous delivery:
Bivariate odds ratio for intraventricular hemorrhage = 0.35 ( )
Multivariable odds ratio for intraventricular hemorrhage = 0.66 ( ) (adjusting for gestational age)
Why do Multivariable Analyses?
Identify independent predictors of outcome.
Apparent associations between a risk factor and an outcome may actually be due to a third factor: a confounder.
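To see how a confounder can create an apparent association, the sketch below compares a crude odds ratio with a stratum-adjusted one. It uses Mantel-Haenszel stratification, a simpler relative of regression adjustment and not the method used in the studies cited here. All counts are hypothetical and chosen only so that the exposure has no effect within either stratum.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table (a, b exposed; c, d unexposed)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of the confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# HYPOTHETICAL counts, one tuple per confounder stratum:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
strata = [
    (5, 15, 30, 90),  # high-risk stratum, few exposed subjects
    (4, 96, 1, 24),   # low-risk stratum, mostly exposed subjects
]

# Crude analysis: collapse the strata into a single 2x2 table
crude = odds_ratio(*[sum(col) for col in zip(*strata)])
adjusted = mantel_haenszel_or(strata)

print(f"crude OR    = {crude:.2f}")
print(f"adjusted OR = {adjusted:.2f}")
```

In this made-up example the exposure looks protective in the crude table only because exposed subjects are concentrated in the low-risk stratum; within each stratum the odds ratio is 1.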
[Diagram: Randomized group assignment; Outcome; Potential confounder]
Randomized controlled trial of preoperative vaginal preparation with povidone-iodine versus abdominal scrub only prior to cesarean section
Endometritis | Vaginal scrub | Abdominal scrub only
Yes | 10 (7%) | 24 (14%)
No | 132 (93%) | 142 (86%)
OR = .45 ( )
From: Starr R, et al. Preoperative vaginal preparation with povidone-iodine and the risk of postcesarean endometritis. Obstet Gynecol. 2005;105:
Table 5. Multivariate Analysis of Factors Affecting Risk for Postcesarean Endometritis (N=308)
Variable | Adjusted odds ratio | 95% confidence interval
Vaginal scrub | |
Severe anemia (hematocrit <30%) | |
Use of intrapartum internal monitors | |
History of antenatal genitourinary infections | |
Does the child with CSF pleocytosis have bacterial meningitis?
Multivariable Logistic Regression Analysis*
Predictor | β coefficient | P value | 95% CI
Gram stain | 5.6 | < |
CSF protein >80 mg/dL | 2.2 | < |
Peripheral ANC > cells/mm³ | | |
Seizure at or before presentation | | |
CSF ANC >1000 cells/mm³ | | |
From: Nigrovic LE, et al. Development and validation of a multivariable predictive model to distinguish bacterial from aseptic meningitis in children in the post-Haemophilus influenzae era. Pediatrics. 2002;110:
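In logistic regression, a β coefficient is a log odds ratio, so exponentiating it gives the adjusted odds ratio for that predictor. A minimal sketch using the two coefficients that appear on the slide (the dictionary names are illustrative):

```python
import math

# Beta coefficients as shown on the slide
betas = {
    "Gram stain": 5.6,
    "CSF protein >80 mg/dL": 2.2,
}

# Adjusted odds ratio = exp(beta)
odds_ratios = {name: math.exp(beta) for name, beta in betas.items()}

for name, value in odds_ratios.items():
    print(f"{name}: beta = {betas[name]}, OR = {value:.1f}")
```

A coefficient of 2.2 corresponds to an odds ratio of about 9, and 5.6 to an odds ratio of about 270.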
Causality versus Prediction Models (Diagnostic/Prognostic)
For prediction models, it does not matter whether the variables cause the outcome; association is all that is needed.
Prediction models must be stronger models because decisions will be made based on them.
For causal models, explaining a small percentage of the variance is enough.
Prediction models will not generally be used unless they are replicated successfully.
Why do Multivariable Analyses?
Identify independent predictors of outcome.
Apparent associations between a risk factor and an outcome may actually be due to a third factor: a confounder.
Construct predictive and diagnostic models.
Variation in subjects’ risk factors within study groups can result in incorrect estimates of treatment effect in nonlinear models.
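The last point can be illustrated with a small calculation. The sketch assumes a treatment with the same odds ratio in two equally sized, perfectly balanced risk groups (so there is no confounding); the baseline risks and the odds ratio of 3 are hypothetical. Ignoring the risk factor still gives a smaller odds ratio than the one that holds within each group.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob(o):
    """Convert odds to a probability."""
    return o / (1 + o)

conditional_or = 3.0          # HYPOTHETICAL treatment OR within each risk group
baseline_risks = [0.1, 0.5]   # HYPOTHETICAL control-arm risk in each group

# Risk in the treated arm of each group, applying the same OR to both
treated_risks = [prob(conditional_or * odds(p)) for p in baseline_risks]

# Groups are equal in size and balanced across arms, so average the risks
marginal_control = sum(baseline_risks) / len(baseline_risks)
marginal_treated = sum(treated_risks) / len(treated_risks)
marginal_or = odds(marginal_treated) / odds(marginal_control)

print(f"OR within each group    = {conditional_or:.2f}")
print(f"OR ignoring risk groups = {marginal_or:.2f}")
```

With these numbers the unadjusted odds ratio is about 2.33 even though the odds ratio is 3 in both groups, which is why adjusting for strong risk factors can matter even in a randomized trial analyzed with logistic regression.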
Why is multivariable analysis harder?
Multidimensional instead of a flat plane.
Harder to assess whether the model fits the data.
Table 3.1 Type of outcome variable determines choice of multivariable analysis.
Type of outcome | Example of outcome variable | Type of multivariable analysis*
Interval | Blood pressure, weight, temperature | Multiple linear regression; analysis of variance (and related procedures)
Dichotomous | Death, cancer, intensive care unit admission | Multiple logistic regression
Time to occurrence of a dichotomous event | Time to death, time to cancer | Proportional hazards analysis
Rare outcomes and counts | Time to leukemia, number of infections | Poisson regression
* This text focuses on those procedures that are bolded.
What type of independent variables can be included in multivariable analysis?
Yes | No
Dichotomous | Nominal
Interval | Ordinal
Create multiple dichotomous variables for nominal and ordinal variables.
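Creating multiple dichotomous variables from a nominal variable can be sketched as follows. One category is chosen as the reference and each remaining category gets its own 0/1 indicator. The variable, its categories, and the helper name `dummy_code` are hypothetical.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 indicators, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels} for v in values]

# HYPOTHETICAL nominal variable with three categories
delivery_mode = ["vaginal", "cesarean", "forceps", "vaginal"]

rows = dummy_code(delivery_mode, reference="vaginal")
for original, indicators in zip(delivery_mode, rows):
    print(original, indicators)
```

A variable with k categories yields k − 1 indicators; subjects in the reference category have zeros on all of them.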
What independent variables should be entered into model?
Yes | No
Variable of interest | Intervening variables
Known confounders | Variables not on the causal path
Potential confounders | Variables with lots of missing data
Figure 5. Maternal HIV viral load is an intervening variable between antiretroviral treatment and perinatal transmission.
Prediction of Preterm Delivery
Risk factor | Multivariable odds ratio
Low plasma protein A | 2.2
Elevated AFP | 2.1
If both risk factors are present, the odds ratio for preterm delivery is 2.2 × 2.1 = 4.6, relative to having neither.
From: Smith GCS, et al. Pregnancy-associated plasma protein A and alpha-fetoprotein and prediction of adverse perinatal outcome. Obstet Gynecol. 2006;107:161-6.
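The product arises because a logistic model without an interaction term adds effects on the log-odds scale, and adding logarithms is the same as multiplying odds ratios. A minimal sketch using the two odds ratios from the slide (variable names are illustrative):

```python
import math

or_low_papp_a = 2.2   # low plasma protein A
or_high_afp = 2.1     # elevated AFP

# The model adds coefficients (log odds ratios) for a subject with both factors
combined_log_odds = math.log(or_low_papp_a) + math.log(or_high_afp)
combined_or = math.exp(combined_log_odds)

print(f"combined OR = {combined_or:.2f}")
```

The result, 4.62, is the 4.6 shown on the slide. It holds only if the multiplicative assumption is reasonable, that is, if the two factors do not interact.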
Important assumptions of multivariable analysis
Observations independent
Censoring is random
Proportionality assumption
Multiplicative assumption
Checking the assumptions of the model
Plot the residuals (the difference between the observed and the estimated value).
Assess the pattern.
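As a rough illustration of a residual check, the sketch below fits a straight line by ordinary least squares to deliberately curved, hypothetical data and lists the residuals. In practice the residuals would be plotted against the fitted values or the predictor; here they are printed to keep the example dependency-free.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# HYPOTHETICAL data following a curve (y = x squared), not a straight line
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

intercept, slope = fit_line(xs, ys)
# Residual = observed value minus estimated value
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter around zero, suggests the linear model does not fit the data.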
Limitations of multivariable analysis
Can statistically adjust only for confounders that you know about and have measured.
Models are chosen because we believe they fit the data, but the fit is always imperfect.