1
Improving Overlap Farrokh Alemi, Ph.D.
This presentation reviews methods for improving overlap between cases and controls in stratified covariate balancing. It was organized by Dr. Alemi.
2
Overlap
The problem with stratification is that, as the number of covariates increases, fewer and fewer cases and controls match: the number of cases per stratum decreases, and some combinations of covariates become quite rare. In these circumstances, a large portion of the cases may have no matching controls and therefore go unused, reducing the generalizability of the findings.
3
$$\text{Overlap} = \frac{\sum_c m_c p_c}{\sum_c m_c p_c + \sum_c (1 - m_c p_c)} \times 100\%$$
Percent overlap reports the percent of cases that are matched to controls. In this equation, $c$ is an index to cases, and $m_c$ is 1 when the case is matched to a control and 0 otherwise.
4
$$\text{Overlap} = \frac{\sum_c m_c p_c}{\sum_c m_c p_c + \sum_c (1 - m_c p_c)} \times 100\%$$
The parameter $p_c$ indicates the percent of covariates in the case that were matched to the controls. If all covariates in the case were matched, then $p_c = 100\%$.
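The overlap calculation can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: it assumes the share of matched covariates is stored as a fraction between 0 and 1, and the function name `percent_overlap` and the five cases are hypothetical.

```python
def percent_overlap(cases):
    """Percent overlap from a list of (matched, share) pairs.

    matched is 1 if the case was matched to a control and 0 otherwise;
    share is the fraction (0 to 1) of the case's covariates that were matched.
    """
    numerator = sum(m * p for m, p in cases)
    # The denominator adds the unmatched portion of every case,
    # so it always equals the total number of cases.
    denominator = numerator + sum(1 - m * p for m, p in cases)
    return 100.0 * numerator / denominator


# Hypothetical data: three full matches, one partial match
# (2 of 3 covariates), and one case with no match at all.
cases = [(1, 1.0), (1, 1.0), (1, 1.0), (1, 2 / 3), (0, 0.0)]
overlap = percent_overlap(cases)
print(f"Percent overlap: {overlap:.1f}%")
```

Unmatched controls never enter the calculation, which reflects the point that only the cases determine the overlap.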
5
Unmatched Controls are Ignored
Note that the percent of overlap does not depend on controls that were not matched. The intent of the analysis is to examine the effect of treatment on treated patients; thus, what matters is matching to the cases. Unmatched controls don't affect the treatment effect and therefore can be ignored.
6
Better than 80%
When the percent of overlap is low (e.g., lower than 80%), findings cannot be generalized because many cases are not matched to controls.
7
Reduce Stratification & Increase Overlap
The remainder of this lecture discusses how one can reduce stratification, increase overlap between cases and controls, and improve the generalizability of stratified covariate balancing.
8
Parents in Markov Blanket
1. Expected Values
2. Parents in Markov Blanket
3. Synthetic Controls
We discuss three strategies. The first is to modify the case so that a partial match can be made; this approach calculates the outcome for the modified case using expected values. The second is to use the parents in the Markov Blanket of treatment to identify irrelevant covariates and drop them from the analysis. The third is to add new synthetic controls.
9
1. Partial Match through Expected Values
The first method alters the case and its outcome to levels that can be matched to the controls. In this method, the percent of overlap is increased by partially matching the covariates of the unmatched cases. The unmatched case is reduced to the largest portion of the case that matches at least one control. The matched covariates are referred to as the shared component. The outcome for the altered case is set to the expected outcome for all cases that share this component.
10
Partial Match: Expected Values
[Figure: a case (male, 70 years, unable to walk) partially matched to a control, with the outcome averaged over ages for both]
For example, suppose a 70-year-old male patient with a walking disability has no match among the controls. The closest match we can find is male patients with walking disabilities. The patient's age is then dropped from the analysis. The outcome for the new case is the average for all male cases with a walking disability, which includes individuals in different age groups. This new case is matched to male patients unable to walk among the controls.
11
Partial Match: Expected Values
[Figure: a case (male, 70 years, unable to walk) partially matched to a control, with the outcome averaged over ages for both]
The outcome is changed to the expected value associated with the shared component. The percent of overlap improves in these cases by the portion of covariates matched, in this example 2 out of 3.
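The partial-match procedure can be sketched as follows. This is an illustration under stated assumptions rather than the presenter's implementation: the record layout, the helper names `shared_component` and `expected_outcome`, and all data values are hypothetical, and ties between equally large shared components are broken by taking the first one found.

```python
from itertools import combinations
from statistics import mean


def shared_component(case, controls, covariates):
    """Largest subset of covariates on which the case matches at least one control."""
    for size in range(len(covariates), 0, -1):
        for subset in combinations(covariates, size):
            if any(all(ctrl[v] == case[v] for v in subset) for ctrl in controls):
                return subset
    return ()


def expected_outcome(case, cases, subset):
    """Average outcome over all cases that share the given component with the case."""
    return mean(c["outcome"] for c in cases if all(c[v] == case[v] for v in subset))


covariates = ("male", "age_group", "unable_to_walk")

# Hypothetical cases; the first one has no exact match among the controls.
cases = [
    {"male": 1, "age_group": "70s", "unable_to_walk": 1, "outcome": 0},
    {"male": 1, "age_group": "60s", "unable_to_walk": 1, "outcome": 1},
    {"male": 1, "age_group": "80s", "unable_to_walk": 1, "outcome": 1},
]
# Hypothetical controls.
controls = [
    {"male": 1, "age_group": "60s", "unable_to_walk": 1, "outcome": 1},
    {"male": 0, "age_group": "70s", "unable_to_walk": 0, "outcome": 1},
]

unmatched = cases[0]
subset = shared_component(unmatched, controls, covariates)
new_outcome = expected_outcome(unmatched, cases, subset)
share_matched = len(subset) / len(covariates)
print(subset, new_outcome, share_matched)
```

Here age is the covariate that gets dropped, the altered case takes the average outcome of all male cases unable to walk, and the case contributes 2 out of 3 to the overlap.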
12
Parents in Markov Blanket
2. Partial Match through Parents in Markov Blanket
The second method of improving overlap is to drop irrelevant variables by identifying the parents in the Markov Blanket of treatment.
13
Parents in Markov Blanket
Partial Match: Parents in Markov Blanket
The Markov Blanket of treatment is a set of covariates that blocks the effect of other covariates on treatment. It includes parents, children, and co-parents: the direct causes of treatment, its effects, and the other direct causes of those effects, respectively.
14
Parents in Markov Blanket
Partial Match: Parents in Markov Blanket
Parents in the Markov Blanket are identified by focusing the analysis on independent variables that occur prior to treatment. In electronic health records, healthcare events are time stamped, so it is relatively easy to identify what occurred prior to the treatment. Age, gender, and other demographics are determined before treatment. Medical history and comorbidities also occur prior to treatment.
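Selecting pre-treatment events by time stamp can be sketched as below. The event records, treatment dates, and the function name `pre_treatment_events` are hypothetical; a real electronic health record extract would have its own layout.

```python
from datetime import date

# Hypothetical EHR extract: (patient_id, event, event_date).
events = [
    (1, "diabetes", date(2019, 3, 1)),
    (1, "hip_fracture", date(2020, 6, 15)),
    (1, "wound_infection", date(2021, 2, 10)),
    (2, "diabetes", date(2021, 5, 5)),
]
# Hypothetical treatment date for each patient.
treatment_date = {1: date(2021, 1, 20), 2: date(2021, 4, 1)}


def pre_treatment_events(events, treatment_date):
    """Keep only events dated strictly before the patient's treatment."""
    return [
        (pid, event, when)
        for pid, event, when in events
        if pid in treatment_date and when < treatment_date[pid]
    ]


candidates = pre_treatment_events(events, treatment_date)
candidate_names = sorted({event for _, event, _ in candidates})
print(candidate_names)
```

Events recorded after treatment, such as a complication, drop out and are not considered as possible parents of treatment.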
15
Parents in Markov Blanket
LASSO Regression: Parents in Markov Blanket
Many algorithms exist for identifying the parents in the Markov Blanket. Here we focus on one that uses LASSO regression, a type of regression that, among the variables with a statistically significant impact, retains only those with a large effect size.
16
Parents in Markov Blanket
LASSO Regression: Parents in Markov Blanket
$$T_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$
Prior to conducting the regression, we exclude covariates that occur after treatment. This step removes covariates on the causal path from treatment to outcome. For example, complications of treatment are excluded from the list of independent variables.
17
Regress Treatment on Prior Events
LASSO Regression: Parents in Markov Blanket
Next, the treatment variable is regressed on independent variables that occur before the treatment, for example, patient demographics, medical history, or comorbidities.
18
Significant & Large Main Effects
LASSO Regression: Parents in Markov Blanket
Parents in the Markov Blanket consist of covariates that (a) have a statistically significant impact on treatment, the dependent variable in this regression, and (b) have an effect size greater than a pre-set cutoff value. Stratification focuses on the parents in the Markov Blanket and ignores other covariates, so matching uses only the relevant covariates. The procedure has been shown to reduce the number of covariates 3- to 4000-fold, depending on the number of variables in the initial data.
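The selection step can be illustrated with a small, dependency-free LASSO. This is a sketch under several assumptions, not the algorithm used in the lecture: it fits a linear LASSO by coordinate descent to a binary treatment indicator (a linear probability simplification), it applies only the effect-size cutoff and omits the significance test, and the covariate names, data, penalty, and cutoff are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, penalty, sweeps=200):
    """Coordinate descent for (1/2n) * squared error + penalty * sum of |beta|."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    intercept = sum(y) / n
    for _ in range(sweeps):
        for j in range(k):
            rho = 0.0    # association of covariate j with the partial residual
            scale = 0.0  # sum of squares of covariate j
            for i in range(n):
                other = sum(X[i][m] * beta[m] for m in range(k) if m != j)
                rho += X[i][j] * (y[i] - intercept - other)
                scale += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (scale / n) if scale else 0.0
        fitted = [sum(X[i][m] * beta[m] for m in range(k)) for i in range(n)]
        intercept = sum(y[i] - fitted[i] for i in range(n)) / n
    return intercept, beta


# Hypothetical pre-treatment covariates for eight patients.
names = ["unable_to_walk", "male"]
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
treated = [1, 1, 1, 1, 0, 0, 0, 0]  # treatment indicator, the dependent variable

intercept, beta = lasso(X, treated, penalty=0.05)
cutoff = 0.1  # hypothetical pre-set effect-size cutoff
parents = [name for name, b in zip(names, beta) if abs(b) > cutoff]
print(parents)
```

In this toy data, treatment tracks walking disability and is unrelated to gender, so only the first covariate survives and stratification would proceed on it alone. In practice a logistic LASSO from a statistics package would be a more natural fit for a binary treatment.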
19
3. Add Synthetic Controls
The third method of improving overlap is to add synthetic controls. The approach is similar in idea to oversampling rare events. It uses the existing sample to estimate what would have been observed for the missing controls.
20
Create Model of Outcomes for Control Patients
Add Synthetic Cases
The analysis is not done on all of the data. It focuses only on control patients, because the missing controls must reflect the pattern of outcomes among the controls. The missing outcome for the control is estimated from a model of the data, usually a regression or the 2 nearest cases. In these models, it is important to take into account interactions among the covariates.
21
Features of Unmatched Case
Add Synthetic Cases Predict Outcome for Features of Unmatched Case The model is used to predict the outcome for the missing control by evaluating the model at covariates in the unmatched cases.
22
Add Synthetic Cases
[Equation, layout lost in extraction: Survival = -1 + ... with terms (1 - .8 Male), (1 - .4 Unable to walk), (1 - .2 Unable to bathe), (1 - .9 Unable to toilet), (1 - .3 Above 74 years old)]
For example, suppose that a 70-year-old male resident who is unable to walk is an unmatched case. We need to find a control that would match this case. Assume also that, for control patients, the outcome (survival rate) is predicted by the equation shown here. Once the predicted outcome is available, the case's covariates, paired with the predicted outcome, can be added to the controls as a synthetic control, and the analysis is repeated with the synthetic controls matched to the previously unmatched cases.
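The synthetic-control step can be sketched with the nearest-cases option mentioned earlier. This is an illustration only: it averages the outcomes of the 2 nearest controls instead of evaluating a regression equation, it measures distance as the number of differing covariates, and the feature order, data, and function names are hypothetical.

```python
def hamming(a, b):
    """Number of covariates on which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def synthetic_control(case_features, controls, k=2):
    """Build a control with the case's features and the mean outcome of its k nearest controls."""
    nearest = sorted(controls, key=lambda ctrl: hamming(ctrl[0], case_features))[:k]
    predicted = sum(outcome for _, outcome in nearest) / len(nearest)
    return (case_features, predicted)


# Features are (male, above_74, unable_to_walk); outcomes are hypothetical survival rates.
controls = [
    ((1, 1, 1), 0.2),
    ((1, 0, 0), 0.6),
    ((0, 0, 0), 0.9),
    ((0, 1, 0), 0.7),
]

# Unmatched case: male, 70 years old (not above 74), unable to walk.
unmatched_case = (1, 0, 1)
assert all(features != unmatched_case for features, _ in controls)

new_control = synthetic_control(unmatched_case, controls)
augmented_controls = controls + [new_control]
print(new_control)
```

Only control patients feed the prediction, so the synthetic control reflects the pattern of outcomes among the controls. The model is fit without reference to the cases' own outcomes.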
23
Several methods exist for improving overlap between cases and controls
The take-home message is that different methods exist for improving the match between cases and controls.