
What would it take to Change your Inference? Quantifying the Discourse about Causal Inferences in the Social Sciences: Replacement of Cases (2018)
Kenneth A. Frank  #konfoundit
(AERA on-line video – cost is $105)

Motivation
Statistical inferences are often challenged because of uncontrolled bias. There may be bias due to uncontrolled confounding variables or non-random selection into a sample. We will answer the question of what it would take to change an inference by formalizing the sources of bias and quantifying the discourse about causal inferences in terms of those sources. For example, we will transform challenges such as "But the inference of a treatment effect might not be valid because of pre-existing differences between the treatment groups" into questions such as "How much bias must there have been due to uncontrolled pre-existing differences to make the inference invalid?"

Approaches
In part I we will use Rubin's causal model to interpret how much bias there must be to invalidate an inference in terms of replacing observed cases with counterfactual cases or cases from an unsampled population. This generates statements such as "One would have to replace qqq% of the cases with cases with no effect to invalidate the inference." In part II, we will quantify the robustness of causal inferences in terms of correlations associated with unobserved variables or unsampled populations. This generates statements such as "An omitted variable would have to be correlated at rrr with the treatment and outcome to invalidate the inference." Calculations for bivariate and multivariate analyses will be presented using an app (http://konfound-it.com) as well as macros in STATA and R and a spreadsheet for calculating indices [KonFound-it!].

Format
The format will be a mixture of presentation, individual exploration, and group work. Participants may include graduate students and professors, although all must be comfortable with basic regression and multiple regression. Participants should bring their own laptop, or be willing to work with another student who has a laptop. Participants may choose to bring to the course an example of an inference from a published study or their own work, as well as data analyses they are currently conducting. Participants will learn to quantify the robustness of statistical and causal inferences from quantitative analyses. They will learn how to generate statements such as "To invalidate the inference, XX% of the estimate must be due to bias" or "To invalidate the inference about the effect of a treatment on an outcome, an omitted variable would have to be correlated at ZZ with the treatment and with the outcome." Participants will learn to calculate the specific quantities using the KonFound-it app (konfound-it.com/), an Excel spreadsheet, or STATA macros. The techniques can be applied to concerns about internal and external validity in participants' own analyses or to inferences in the literature.

Materials for course https://www.msu.edu/~kenfrank/research.htm#causal

Quick Examples of Quantifying the Robustness to Invalidate an Inference
https://msu.edu/~kenfrank/research.htm
Downloads:
spreadsheet for calculating indices [KonFound-it!©]
quick examples
powerpoint for replacement of cases
powerpoint for correlation framework
powerpoint for comparison of frameworks
the value of controlling for pretests
published empirical examples
R Shiny app for sensitivity analysis: http://konfound-it.com
R Shiny app KonFound-it
The whole R package is at: https://cran.r-project.org/web/packages/konfound/
STATA:
. ssc install konfound
. ssc install moss
. ssc install matsort
. ssc install indeplist
Commands:
konfound – for a procedure you ran in Stata
pkonfound – for a published example
mkonfound – for multiple analyses
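
For the R route, a minimal sketch of what a session might look like (assuming the CRAN konfound package's documented pkonfound() and konfound() interface; the numbers and model names are purely hypothetical):

# install.packages("konfound")   # once, from CRAN
library(konfound)

# For a published estimate: supply the estimate, its standard error,
# the sample size, and the number of covariates in the model
pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)

# For a model you fit yourself, pass the model object and the focal predictor, e.g.:
# m <- lm(y ~ x + cv, data = mydata)
# konfound(m, x)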

Comparison of Correlation and % Bias to Invalidate Frameworks
Different bases for evaluating r – r#
Regression as linear adjustment to replace cases
Applications of the frameworks

% bias necessary to invalidate the inference
[Figure: correlation scale from .1 to .9 marking the observed correlation r and the threshold r#.]

Correlation framework
[Figure: correlation scale from .1 to .9 marking r and the threshold r#.]
Robustness refers to an alternative world with a different sample or different control variables. It's a counterfactual world. Maybe we need a slide that has the same r but a different threshold?
r# scales with n (assuming the threshold is defined by statistical significance), whereas % bias scales with the effect size: one scales by the study design (sample size), the other by the study result (effect size).

[Figure: correlation scale from .1 to .9 marking r and the threshold r#.]
Robustness refers to an alternative world with a different sample or different control variables. It's a counterfactual world. Maybe we need a slide that has the same r but a different threshold?
r# scales with n (assuming the threshold is defined by statistical significance), whereas % bias scales with the effect size. The ITCV gives credit to large effects that exceed large thresholds (for small n); the % bias gives credit to smaller effects that exceed smaller thresholds. If the threshold is based on statistical significance, then % bias gives more credit to larger studies and the ITCV gives more credit to smaller studies. The ITCV scales by the study design (it gives credit to smaller sample sizes); the % bias scales by the study result (effect size).
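
To make the scaling contrast concrete, a small illustrative R sketch (my own, assuming the threshold correlation is defined by two-tailed statistical significance, r# = t_critical / sqrt(t_critical^2 + df), with df = n − q − 2 for q covariates):

# How r# shrinks with sample size while the observed r stays fixed
r_threshold <- function(n, q = 0, alpha = 0.05) {
  df <- n - q - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

r_obs <- 0.40                          # hypothetical observed correlation
for (n in c(30L, 100L, 500L)) {
  r_sharp <- r_threshold(n)
  cat(sprintf("n = %3d: r# = %.3f, %% bias to invalidate = %2.0f%%\n",
              n, r_sharp, 100 * (1 - r_sharp / r_obs)))
}
# Larger n -> smaller r# -> larger % bias to invalidate for the same observed r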

[Figure: correlation scale from .1 to .9 marking r and the threshold r#.]
Robustness refers to an alternative world with a different sample or different control variables. It's a counterfactual world.

Transforming % Bias to Replace to ITCV
Hmmmm….

Reflection
What part is most confusing to you? Why? Is there more than one interpretation?
Talk with one other person and share.
Find a new partner and share problems and solutions.

Equating the expressions and solving for the % mixture of unobserved cases (π)
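
A sketch of the algebra this slide points to, in the notation used later in the deck (observed correlation r, threshold r#, and replacement cases in which the x–y relationship is conditioned to zero):

(1-\pi)\, r + \pi \, r_{xy|cv} = r^{\#}, \qquad r_{xy|cv} = 0
\;\;\Longrightarrow\;\; \pi = 1 - \frac{r^{\#}}{r}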

Impact Threshold for a Confounding Variable (Qinyun Lin)
Impact threshold for a confounding variable vs. % bias to invalidate
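
For reference, a sketch of the two quantities being compared, as I understand the correlation framework (treat the exact expressions as assumptions here rather than as the workshop's definitions):

\mathrm{impact\ of\ } cv = r_{x \cdot cv} \times r_{y \cdot cv},
\qquad
\mathrm{ITCV} = \frac{r_{xy} - r^{\#}}{1 - \lvert r^{\#} \rvert},
\qquad
\%\ \mathrm{bias\ to\ invalidate} = 1 - \frac{r^{\#}}{r_{xy}}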

Linear Adjustment to Invalidate Inference
A linear model distributes the change over all cases.
How many cases would you have to replace with zero-effect counterfactuals to change the inference?
Assume the estimate is 6 and the threshold is 4 (δ# = 4): 1 − δ#/estimate = 1 − 4/6 = .33 (1/3).
[Table: case-level observed outcomes and their zero-effect counterfactual replacements, averaging to the estimate of 6.]
The inference would be invalid if you replaced 33% of the cases (here, 1 case) with counterfactuals for which there was no treatment effect.
New estimate = (1 − % replaced) × estimate + % replaced × (no effect) = (1 − % replaced) × estimate = (1 − 1/3) × 6 = 4.
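
The same arithmetic as a minimal R sketch (only the estimate of 6 and the threshold of 4 are taken from the slide; the rest is illustrative):

est       <- 6                       # observed estimate (delta-hat)
threshold <- 4                       # threshold for an inference (delta#)

pct_bias  <- 1 - threshold / est     # proportion of cases to replace
pct_bias                             # 0.33

# Replacing that proportion with zero-effect counterfactuals pulls the
# estimate back exactly to the threshold
new_est <- (1 - pct_bias) * est + pct_bias * 0
new_est                              # 4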

Integrated Framework
In case replacement, π_i represents the probability of a case being replaced.
In the omitted-variables case, π_i is constant across all i. The omitted variable = f(y_i^ideal − y_i^observed), where f is any linear function.
In the weighted case, 0 < π_i < 1, varying across cases.

From the previous slide, regression adjustment can be thought of as case replacement: adjusting for an omitted variable is understood as replacing some cases with adjusted others. This is consistent with Heckman. So the replacement can come from within, as with counterfactuals, or from another sample. The real difference between the frameworks is whether you replace all cases with small adjustments, or some cases with full adjustment and the others with none. One could define a % adjustment weight: it takes a value of 1 or 0 for case replacement, and equal weights across all cases for regression.
Y_i^ideal = w_i · Y_i^observed + (1 − w_i) · Y_i^unobserved
In case replacement, w_i = 1 with probability 1 − π, where π = the proportion of cases that must be replaced to invalidate the inference. In the omitted-variables case, all w_i = 1 − π. Replacement of the values of Y is achieved through an omitted variable:
omitted variable = π(Y_i^unobserved − Y_i^observed),
where Y_i^unobserved is just a constant representing no treatment effect.
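
A small R sketch of the weighting idea above (my own illustration with hypothetical outcomes; the two schemes agree in expectation, and exactly when the observed outcomes are identical across cases):

set.seed(1)
n   <- 10
pi_ <- 0.3                                # proportion of bias / of cases to replace
y_obs   <- rnorm(n, mean = 6)             # hypothetical observed outcomes
y_unobs <- rep(0, n)                      # "no effect" counterfactual value

k <- round(pi_ * n)                       # number of cases replaced
w_case <- c(rep(0, k), rep(1, n - k))     # w_i = 0 for replaced cases, 1 otherwise
mean(w_case * y_obs + (1 - w_case) * y_unobs)

w_ov <- rep(1 - pi_, n)                   # constant partial weight for every case
mean(w_ov * y_obs + (1 - w_ov) * y_unobs) # equals (1 - pi_) * mean(y_obs)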

How to get correlations from % bias

Control for Confound ↔ Adjusting the Outcome
Replacing observed cases with linear counterfactuals

Replacement with the null hypothesis case: r_xy|cv = 0

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1         4.47723            1.14264        3.92      0.0296
x             1         5.04554            2.09835        2.40      0.0955
cv            1         0.57948            1.14931        0.50      0.6488

confound: 0.2458681211, 0.7846119788, 1.4401851375, -0.486850108, -0.841877368, -1.141937761
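
The output above is from the workshop's own small example. As a separate, purely hypothetical R sketch of the same logic (simulated data in which cv drives both x and y, so conditioning on cv pushes the x–y partial correlation toward zero):

set.seed(42)
n  <- 100
cv <- rnorm(n)                       # confounding variable
x  <- cv + rnorm(n, sd = 0.5)        # x is strongly related to cv
y  <- 2 * cv + rnorm(n)              # y is driven by cv, not by x

coef(summary(lm(y ~ x)))             # naive model: x looks predictive
coef(summary(lm(y ~ x + cv)))        # controlling for cv: x's coefficient collapses

# Partial correlation r_xy|cv via residuals
cor(resid(lm(x ~ cv)), resid(lm(y ~ cv)))   # near 0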

Comparison of Frameworks

% bias to invalidate (case replacement):
Any estimate + threshold (e.g., t critical × standard error)
Good for experimental settings (treatments)
Think in terms of cases (people or schools), counterfactual or unsampled population
Assumes equal effect of replacing any case, or weighted cases with weighted replacements

Correlational (ITCV):
Uses causal diagrams (Pearl)
Good for continuous predictors
Linear models only
Think in terms of correlations and variables
ITCV is good for internal validity, not as good for comparing between studies (different thresholds make comparison difficult)

Both can be applied to internal and external validity, with any threshold.

Extension of ITCV to Logistic Regression
Logistic regression can be thought of as iteratively reweighted least squares, but the weights would change with the omitted variable.
Simulation:
Kelcey, B. 2009. Improving and Assessing Propensity Score Based Causal Inferences in Multilevel and Nonlinear Settings. Unpublished doctoral dissertation, University of Michigan.
Ichino, Andrea, Fabrizia Mealli, and Tommaso Nannicini. 2008. "From temporary help jobs to permanent employment: What can we learn from matching estimators and their sensitivity?" Journal of Applied Econometrics 23: 305-327.
Nannicini, Tommaso. 2007. "Simulation-based sensitivity analysis for matching estimators." Stata Journal 7: 334-350.
Harding, David J. 2003. "Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on High School Dropout and Teenage Pregnancy." American Journal of Sociology 109(3): 676-719.

Omitted Variables and % Bias to Invalidate
Here we think of r_xy as a correlation that can be conceptualized as adjusted by replacing π observed cases with unobserved cases in which the relationship between x and y has been conditioned on a confounding variable cv, i.e., r_xy|cv. If we set r_xy|cv = 0 we get π = 1 − r#/r, the % bias to invalidate.
So how do we get r_xy|cv = 0?
r_xy|cv = 0 if r_x,cv = 1 and, correspondingly, r_y,cv = r_xy; that is, if there is an omitted confound in the unobserved data that is perfectly correlated with x. The inference is invalid if we replace π cases with cases in which an omitted variable is perfectly correlated with x.
It could also be that r_xy|cv = 0 if r_y,cv = 1 and, correspondingly, r_x,cv = r_xy; that is, the inference is invalid if we replace π cases with cases in which an omitted variable is perfectly correlated with the outcome, so that cv entirely explains the outcome.
Or we could have r_x,cv = r_y,cv = (r_xy)^.5, or any combination such that r_x,cv = r_xy / r_y,cv.
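
The conditions listed above follow from the standard partial-correlation formula; a sketch of the algebra:

r_{xy|cv} \;=\; \frac{r_{xy} - r_{x,cv}\, r_{y,cv}}{\sqrt{\left(1 - r_{x,cv}^{2}\right)\left(1 - r_{y,cv}^{2}\right)}} \;=\; 0
\;\;\Longleftrightarrow\;\;
r_{xy} = r_{x,cv}\, r_{y,cv}

so any pair of correlations with r_{x,cv} = r_{xy} / r_{y,cv} suffices, and taking r_{x,cv} = r_{y,cv} gives both equal to \sqrt{r_{xy}}.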

Internal vs. External Validity
The ITCV can be applied to external validity if you follow Heckman and describe sample bias as an omitted variable. Counterfactuals can be considered an unsampled population. The only difference between internal validity and external validity is then that the counterfactuals cannot possibly be sampled, whereas for external validity the researcher has chosen not to sample from a population; it is about the agency of the researcher.