1
What Would It Take to Change Your Inference? Quantifying the Discourse about Causal Inferences in the Social Sciences: Replacement of Cases. 2018. Kenneth A. Frank. #konfoundit (AERA on-line video – cost is $105)

Motivation: Statistical inferences are often challenged because of uncontrolled bias. There may be bias due to uncontrolled confounding variables or non-random selection into a sample. We answer the question of what it would take to change an inference by formalizing the sources of bias and quantifying the discourse about causal inferences in terms of those sources. For example, we transform challenges such as "But the inference of a treatment effect might not be valid because of pre-existing differences between the treatment groups" into questions such as "How much bias must there have been due to uncontrolled pre-existing differences to make the inference invalid?"

Approaches: In part I we use Rubin's causal model to interpret how much bias there must be to invalidate an inference in terms of replacing observed cases with counterfactual cases or cases from an unsampled population. This generates statements such as "One would have to replace qqq% of the cases with cases with no effect to invalidate the inference." In part II we quantify the robustness of causal inferences in terms of correlations associated with unobserved variables or unsampled populations. This generates statements such as "An omitted variable would have to be correlated at rrr with the treatment and the outcome to invalidate the inference." Calculations for bivariate and multivariate analyses are presented using an app, macros in STATA and R, and a spreadsheet for calculating indices [KonFound-it!].

Format: The format is a mixture of presentation, individual exploration, and group work. Participants may include graduate students and professors, although all must be comfortable with basic regression and multiple regression. Participants should bring their own laptop, or be willing to work with another participant who has one. Participants may choose to bring an example of an inference from a published study or their own work, as well as data analyses they are currently conducting. Participants will learn to quantify the robustness of statistical and causal inferences from quantitative analyses: to generate statements such as "To invalidate the inference, XX% of the estimate must be due to bias" or "To invalidate the inference about the effect of a treatment on an outcome, an omitted variable would have to be correlated at ZZ with the treatment and with the outcome," and to calculate these quantities using the KonFound-it app (konfound-it.com/), an Excel spreadsheet, or STATA macros. The techniques can be applied to concerns about internal and external validity in participants' own analyses or to inferences in the literature.
2
Materials for course
3
Quick Examples of Quantifying the Robustness to Invalidate an Inference
Downloads:
spreadsheet for calculating indices [KonFound-it!©]
quick examples
powerpoint for replacement of cases
powerpoint for correlation framework
powerpoint for comparison of frameworks
the value of controlling for pretests
published empirical examples
R Shiny app for sensitivity analysis (KonFound-it): http://konfound-it.com
The full R package is at: https://cran.r-project.org/web/packages/konfound/
STATA:
. ssc install konfound
. ssc install moss
. ssc install matsort
. ssc install indeplist
Commands: konfound for a procedure you ran in STATA; pkonfound for a published example; mkonfound for multiple analyses.
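A minimal R sketch of the two workflows named above. The argument names (est_eff, std_err, n_obs, n_covariates) follow the konfound package documentation; the numbers are purely illustrative, and the lm model is just a stand-in for an analysis you ran yourself.

```r
# install.packages("konfound")  # once
library(konfound)

# Published example: a reported coefficient of 2 with standard error 0.4,
# from a regression with 100 observations and 3 covariates.
pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)

# For a model you fit yourself, pass the fitted model and the focal variable:
m <- lm(mpg ~ wt + hp, data = mtcars)
konfound(m, wt)
```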
4
Comparison of Correlation and % Bias to Invalidate Frameworks
Different bases for evaluating r − r#
Regression as linear adjustment to replace cases
Applications of the frameworks
5
% bias necessary to invalidate the inference
[Figure: correlation scale from .1 to .9 showing the observed correlation r and the threshold r#; the % bias necessary to invalidate the inference is the gap between r and r#.]
6
Correlation framework
[Figure: correlation scale from .1 to .9 showing r and r#.] Robustness refers to an alternative world with a different sample or different control variables: a counterfactual world. r# scales with n (assuming the threshold is defined by statistical significance), whereas % bias scales with effect size. One scales by the study design (sample size), the other by the study result (effect size).
7
[Figure: correlation scale from .1 to .9 showing r and r#.] Robustness refers to an alternative world with a different sample or different control variables: a counterfactual world. The ITCV gives credit to large effects exceeding large thresholds (for small n); the % bias gives credit to smaller effects exceeding smaller thresholds. If the threshold is based on statistical significance, then % bias gives more credit to larger studies and the ITCV gives more credit to smaller studies: the ITCV scales by the study design (giving credit to smaller sample sizes), while % bias scales by the study result (effect size).
8
[Figure: correlation scale from .1 to .9 showing r and r#.] Robustness refers to an alternative world with a different sample or different control variables: a counterfactual world.
9
Transforming % Bias to Replace into the ITCV
10
Reflection: What part is most confusing to you?
Why? Is there more than one interpretation? Talk with one other person and share; then find a new partner and share problems and solutions.
11
Equating the expressions, evaluating for % mixture of unobserved cases (π)
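A compact version of the equating, consistent with the π = 1 − r#/r expression used later in the deck: replacing a proportion π of cases with zero-effect cases moves the estimate exactly to the threshold.

```latex
(1-\pi)\,r + \pi \cdot 0 = r^{\#}
\quad\Longrightarrow\quad
\pi = 1 - \frac{r^{\#}}{r}
```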
12
Impact Threshold for a confounding variable
Qinyun Lin: Impact Threshold for a Confounding Variable vs. % Bias to Invalidate
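A hedged R sketch of the impact threshold as I understand Frank (2000): the threshold correlation r# is derived from the critical t value, and the ITCV is the impact (r_x,cv × r_y,cv) an omitted confound would need to pull the estimate down to that threshold. The degrees-of-freedom convention (n − q − 2, with q covariates) is my assumption; check it against the konfound documentation.

```r
# Impact threshold for a confounding variable, assuming the threshold r#
# is set by statistical significance at level alpha.
itcv <- function(r, n, q = 0, alpha = 0.05) {
  df     <- n - q - 2                      # assumed df for a partial correlation
  t_crit <- qt(1 - alpha / 2, df)          # two-tailed critical t
  r_hash <- t_crit / sqrt(t_crit^2 + df)   # threshold correlation r#
  (r - r_hash) / (1 - abs(r_hash))         # impact = r_x,cv * r_y,cv needed
}

itcv(r = 0.3, n = 100)  # impact an omitted confound would need (about .13)
```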
13
Linear Adjustment to Invalidate Inference
Linear model distributes change over all cases. How many cases would you have to replace with zero-effect counterfactuals to change the inference? Assume the estimate is 6 and the threshold is 4 (δ# = 4): % bias to invalidate = 1 − δ#/estimate = 1 − 4/6 = .33 (one case in three). [Table residue: treatment-group outcomes 9, 10, 11 and control-group outcomes 3, 4, 5; group means 10 and 4, for an estimated effect of 6.] The inference would be invalid if you replaced 33% of cases (here, one case) with counterfactuals for which there was no treatment effect. New estimate = (1 − % replaced) × estimate + % replaced × (no effect) = (1 − .33) × 6 = .66 × 6 ≈ 4.
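The arithmetic from the slide, spelled out in R so the two quantities (% bias to invalidate, and the post-replacement estimate) can be checked directly:

```r
# Worked numbers from the slide: estimated effect 6, threshold 4.
est       <- 6
threshold <- 4

pct_bias <- 1 - threshold / est   # 1 - 4/6 = 0.33
pct_bias

# Replacing that share of cases with zero-effect counterfactuals
# moves the estimate exactly to the threshold:
new_est <- (1 - pct_bias) * est + pct_bias * 0
new_est                            # 4
```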
14
Integrated Framework. In case replacement, π_i represents the probability of a case being replaced; in the omitted-variables case, π_i is constant across all i. The omitted variable = f(Y_i^ideal − Y_i^observed), where f is any linear function. In the weighted case, 0 < π_i < 1, varying across cases.
15
From the previous slide, regression adjustment can be thought of as case replacement: adjusting for an omitted variable is understood as replacing some cases with adjusted others. This is consistent with Heckman. So the replacement can come from within, as in a counterfactual, or from another sample. The real difference between the frameworks is whether you replace all cases with small adjustments, or some cases with full adjustment and the others with none. One can define a % adjustment weight that takes a value of 1 or 0 for case replacement, and equal weights across all cases for regression:

Y_i^ideal = w_i Y_i^observed + (1 − w_i) Y_i^unobserved

In case replacement, w_i = 1 with probability 1 − π, where π = the proportion of cases that must be replaced to invalidate the inference. In the omitted-variables case, all w_i = 1 − π. Replacement of the values of Y is achieved through an omitted variable: omitted variable = π(Y_i^unobserved − Y_i^observed), where Y_i^unobserved is just a constant representing no treatment effect.
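A toy sketch of that weighting identity with hypothetical data: case replacement sets w_i to 1 or 0 case by case, while the omitted-variable (regression) adjustment gives every case the same weight 1 − π; both shrink the mean the same amount in expectation.

```r
# Y_ideal = w * Y_obs + (1 - w) * Y_unobs, two choices of weights
set.seed(1)
n      <- 9
y_obs  <- rnorm(n, mean = 6)   # observed outcomes (illustrative)
y_unob <- rep(0, n)            # constant representing no treatment effect
p_rep  <- 1/3                  # proportion of cases to replace

# Case replacement: w_i is 1 (kept) or 0 (fully replaced)
w_case <- rep(c(1, 0), times = c(round((1 - p_rep) * n), round(p_rep * n)))
y_case <- w_case * y_obs + (1 - w_case) * y_unob

# Regression-style adjustment: every case gets the same weight 1 - p_rep
y_reg <- (1 - p_rep) * y_obs + p_rep * y_unob

mean(y_case)  # shrinks the mean by replacing whole cases
mean(y_reg)   # shrinks every case a little; equal in expectation
```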
17
How to get correlations from % bias
18
Control for a Confound ↔ Adjusting the Outcome
Replacing observed cases with linear counterfactuals
19
Replacement with the null-hypothesis case: r_xy|cv = 0
21
[SAS output residue: Parameter Estimates table with columns Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|, and rows Intercept, x, cv (confound); values not recovered.]
22
Comparison of Frameworks
% bias to invalidate:
Any estimate + threshold (e.g., t critical × standard error)
Case replacement
Good for experimental settings (treatments)
Think in terms of cases (people or schools), counterfactual or from an unsampled population
Assumes equal effect of replacing any case, or weighted cases with weighted replacements

Correlational (ITCV):
Uses causal diagrams (Pearl)
Good for continuous predictors
Linear models only
Think in terms of correlations and variables
Good for internal validity; not good for comparing between studies (different thresholds make comparison difficult)

Both can be applied to internal and external validity, with any threshold.
23
Extension of ITCV to Logistic
Logistic regression can be thought of as iteratively reweighted least squares, but the weights would change with the omitted variable. Simulation: Kelcey, B. Improving and Assessing Propensity Score Based Causal Inferences in Multilevel and Nonlinear Settings. Unpublished doctoral dissertation, University of Michigan. Ichino, Andrea, Fabrizia Mealli, and Tommaso Nannicini (2008). "From temporary help jobs to permanent employment: what can we learn from matching estimators and their sensitivity?" Journal of Applied Econometrics 23:305–327. Nannicini, Tommaso (2007). "Simulation-based sensitivity analysis for matching estimators." Stata Journal 7:334–350. Harding, David J. (2003). "Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on High School Dropout and Teenage Pregnancy." American Journal of Sociology 109(3).
24
Omitted Variables and % Bias to Invalidate
Here we think of r_xy as a correlation that can be conceptualized as adjusted by replacing π observed cases with unobserved cases in which the relationship between x and y has been conditioned on a confound cv, i.e., r_xy|cv. If we set r_xy|cv = 0, we get π = 1 − r#/r, the % bias to invalidate. So how do we get r_xy|cv = 0? r_xy|cv = 0 if r_x,cv = 1 and correspondingly r_y,cv = r_xy; that is, if there is an omitted confound in the unobserved data that is perfectly correlated with x. The inference is invalid if we replace π cases with cases in which an omitted variable is perfectly correlated with x. We could also have r_xy|cv = 0 if r_y,cv = 1 and correspondingly r_x,cv = r_xy; that is, the inference is invalid if we replace π cases with cases in which an omitted variable is perfectly correlated with the outcome (cv entirely explains the outcome). Or we could have r_x,cv = r_y,cv = (r_xy)^(1/2), or any combination such that r_x,cv = r_xy / r_y,cv.
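A small R check of these conditions with illustrative numbers: the first-order partial correlation formula makes clear that r_xy|cv = 0 whenever r_x,cv × r_y,cv = r_xy. The perfect-correlation cases (r_x,cv = 1 or r_y,cv = 1) are limiting cases in which the denominator also goes to zero.

```r
# First-order partial correlation of x and y given cv
partial_r <- function(r_xy, r_xcv, r_ycv) {
  (r_xy - r_xcv * r_ycv) / sqrt((1 - r_xcv^2) * (1 - r_ycv^2))
}

r_xy <- 0.36  # illustrative observed correlation

partial_r(r_xy, r_xcv = sqrt(r_xy), r_ycv = sqrt(r_xy))   # 0
partial_r(r_xy, r_xcv = 0.90,       r_ycv = r_xy / 0.90)  # 0
```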
25
Internal vs External Validity
The ITCV can be applied to external validity if you follow Heckman in describing sample bias as an omitted variable. Counterfactuals can be considered an unsampled population. The only difference between internal and external validity, then, is that the counterfactuals cannot possibly be sampled, whereas for external validity the researcher has chosen not to sample from a population; it is about the agency of the researcher.