Using Instrumental Variables (IV) Analysis in Institutional Research & Program Evaluation GARY PIKE HIGHER EDUCATION & STUDENT AFFAIRS INDIANA UNIVERSITY.

Presentation transcript:

Using Instrumental Variables (IV) Analysis in Institutional Research & Program Evaluation GARY PIKE HIGHER EDUCATION & STUDENT AFFAIRS INDIANA UNIVERSITY SCHOOL OF EDUCATION

Introductions
- Who are you?
- What do you do?
- Why are you here?
- What is your background/experience?

Overview
- Introductions
- The role of IV in institutional research
- The role of regression in IR
- Omitted variable bias
- Using IV analysis to account for omitted variable bias
- Stata example: College and Civic Engagement
- Using IV in program evaluation
- A primer on causal inference
- Using IV analysis in quasi-experimental designs
- Stata example: The Effect of a Grants Program on 9th Grade Attainment
- Another Stata example: Fifteen-to-Finish

Using Instrumental Variables in Institutional Research

Regression in IR
- Regression is the workhorse of institutional research.
- Predicted GPA for admission standards.
- Role of financial aid in retention and graduation.
- Examining the possible impact of “Fifteen-to-Finish.”
- Evaluation of freshman interest groups.
- Impact of fraternity/sorority membership.
- Faculty salary studies.

Regression in ER
- “… one can hardly pick up an issue of a higher education journal without running across at least one study in which OLS regression was the methodology of choice.” Ethington, Thomas, & Pike (2002).
- Of the articles I’ve written in the last 10 years, exactly 2 have not used some form of regression analysis.
  - Weighting adjustments in surveys.
  - Cluster/factor analysis.

If Regression is so important… shouldn’t we get it right?
- Unbiased
- Consistent
- Asymptotically unbiased

The Basic Regression Model

Regression Assumptions
- Linearity
- Normality
- Homogeneity of Variance
- Fixed “X”
- Independence
  - COV[X1, ε] = 0
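As a reading aid for the last assumption, the model can be written out under the usual single-predictor notation. The symbols β0 and β1 are assumptions here; the slides only name X1 and ε.

```latex
Y = \beta_0 + \beta_1 X_1 + \varepsilon,
\qquad \operatorname{COV}[X_1, \varepsilon] = 0
% If the covariance is not zero, OLS converges to the wrong value:
\operatorname{plim}\,\hat{\beta}_1
  = \beta_1 + \frac{\operatorname{COV}[X_1, \varepsilon]}{\operatorname{VAR}[X_1]}
```

The second line shows why this assumption matters: any correlation between the predictor and the error term shifts the estimated slope, and a larger sample does not remove the shift.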

The Omitted Variable Problem

Violating Independence

Violating Independence
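The omitted variable problem can be illustrated with a small simulation. This is a minimal sketch, not the workshop's data: the coefficients (0.3, 0.4, 0.5), the sample size, and the function names are made-up assumptions, and the variable names only echo the GPA example. It fits a "long" regression with both predictors and a "short" regression that omits high school rank.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def simulate_omitted_variable(n=20000, seed=1):
    """Return (long_slope, short_slope) for SAT in simulated GPA data."""
    rng = random.Random(seed)
    # Hypothetical standardized predictors: rank and SAT are correlated.
    hscpr = [rng.gauss(0, 1) for _ in range(n)]
    sat = [0.5 * h + rng.gauss(0, 1) for h in hscpr]
    # Assumed true model: gpa = 0.3*sat + 0.4*hscpr + error.
    gpa = [0.3 * s + 0.4 * h + rng.gauss(0, 1) for s, h in zip(sat, hscpr)]

    # Long regression (both predictors): two-regressor OLS formula for the SAT slope.
    s_ss, s_hh, s_sh = cov(sat, sat), cov(hscpr, hscpr), cov(sat, hscpr)
    s_ys, s_yh = cov(gpa, sat), cov(gpa, hscpr)
    long_slope = (s_ys * s_hh - s_yh * s_sh) / (s_ss * s_hh - s_sh ** 2)

    # Short regression (hscpr omitted): simple OLS slope.
    short_slope = s_ys / s_ss
    return long_slope, short_slope

if __name__ == "__main__":
    long_slope, short_slope = simulate_omitted_variable()
    print(f"long regression SAT slope:  {long_slope:.3f}")
    print(f"short regression SAT slope: {short_slope:.3f}")
```

Under these assumed values the short regression converges to 0.3 + 0.4 × 0.5 / 1.25 = 0.46 rather than 0.3, because the omitted variable moves into the error term and is correlated with SAT.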

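An Example (Population Parameters)
[Table residue; numeric values were lost. First table: columns cumgpa, sat100, hscpr10; rows cumgpa, sat, hscpr. Second table: regression output with columns Coef., Std. Err., t, P>|t|, Beta; rows sat, hscpr, _cons.]

The Results GPA Revisited
[Two regression tables; numeric values were lost. Both have columns Coef., Std. Err., t, P>|t|, Beta. Rows of the first: sat, hscpr, _cons. Rows of the second: sat, _cons.]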

Stata Interlude 1

A Note from the Interlude

From Sample to Population

How Instrumental Variables Work

To be an Instrument (I)
- The instrument (I) must be strongly related to (correlated with) the explanatory variable (X).
- The instrument (I) must be unrelated to (not correlated with) the error term (ε).
Alternatively
- The instrument (I) must be related to the outcome variable (Y) only through the explanatory variable (X).
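With one instrument and one explanatory variable, the two conditions above lead to a simple estimator: the ratio of cov[Y, I] to cov[X, I]. The sketch below illustrates this on simulated data. All coefficients, the seed, and the function names are illustrative assumptions; the true effect of X on Y is set to 0.5.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def simulate_iv(n=20000, seed=2):
    """Return (ols_slope, iv_slope) when an omitted variable biases OLS."""
    rng = random.Random(seed)
    omitted = [rng.gauss(0, 1) for _ in range(n)]      # unobserved; part of the error
    instrument = [rng.gauss(0, 1) for _ in range(n)]   # I: independent of the omitted variable
    # X depends on the instrument (relevance) and on the omitted variable.
    x = [0.8 * i + o + rng.gauss(0, 1) for i, o in zip(instrument, omitted)]
    # Y depends on X (assumed true effect 0.5) and on the omitted variable,
    # but on the instrument only through X (exclusion).
    y = [0.5 * xi + o + rng.gauss(0, 1) for xi, o in zip(x, omitted)]

    ols_slope = cov(y, x) / cov(x, x)
    iv_slope = cov(y, instrument) / cov(x, instrument)
    return ols_slope, iv_slope

if __name__ == "__main__":
    ols_slope, iv_slope = simulate_iv()
    print(f"OLS slope: {ols_slope:.3f}")
    print(f"IV slope:  {iv_slope:.3f}")
```

In this setup OLS overstates the effect, while the covariance ratio stays close to 0.5. If the instrument were only weakly related to X, the denominator would be near zero and the ratio would become unstable.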

Stata Interlude 2 DEE’S (2004) STUDY OF THE EFFECTS OF ATTENDING COLLEGE ON CIVIC ENGAGEMENT (REGISTERING TO VOTE).

[Stata output; numeric values were lost in extraction.]
First-stage regression of college on distance and _cons: Number of obs = 9227; F(1, 9225).
Instrumental variables (2SLS) regression of register on college and _cons: Number of obs = 9227; Wald chi2(1).
Instrumented: college
Instruments: distance
cov[Y,I] = …; cov[X,I] = …; cov[Y,I] / cov[X,I] = …

Testing the Assumptions of IV
- The instrument must be related to the explanatory variable.
  - In our example, we have an F-test showing the relationship between the instrument (distance from a college) and the explanatory variable (whether the student attended college): F = 115.86; df = 1, 9225; p < ….
  - Stock, Wright, and Yogo (2002) argue that the F-ratio should be greater than 10. (Two or more instruments require larger F-ratios.)
- There is no path linking the instrument directly to the outcome.
  - Path diagram: I → X → Y
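The first-stage F-ratio for a single instrument can be computed from the squared correlation between the instrument and the explanatory variable. The helper below is a minimal sketch with a hypothetical function name; it covers only the case of one instrument and no covariates, where F has 1 and n − 2 degrees of freedom.

```python
def first_stage_f(x, instrument):
    """F-ratio from regressing x on a single instrument plus an intercept.

    Uses F = (n - 2) * r^2 / (1 - r^2), where r is the correlation between
    x and the instrument. Not defined when the fit is perfect (r^2 = 1).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_i = sum(instrument) / n
    s_xx = sum((v - mean_x) ** 2 for v in x)
    s_ii = sum((v - mean_i) ** 2 for v in instrument)
    s_xi = sum((a - mean_x) * (b - mean_i) for a, b in zip(x, instrument))
    r_squared = s_xi ** 2 / (s_xx * s_ii)
    return (n - 2) * r_squared / (1 - r_squared)

def is_strong_instrument(x, instrument, threshold=10.0):
    """Apply the rule of thumb that the first-stage F should exceed 10."""
    return first_stage_f(x, instrument) > threshold
```

For example, `first_stage_f([0, 1, 1, 2], [0, 1, 2, 3])` gives 18 (up to floating-point rounding), which clears the threshold of 10. With covariates or several instruments, the relevant statistic is the F-test on the excluded instruments only, which this sketch does not compute.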

Adding Covariates
- We frequently want to add covariates to our models.
- These covariates may help to account for some of the relationship between the outcome and the explanatory variable.
- They provide a better explanation of the outcome, and thereby increase the power/efficiency of estimation.
- Another reason to include covariates is to address the “no third path” requirement. (Dee included race/ethnicity & achievement test scores.)
- When covariates are present, the instrument needs to be related to the explanatory variable above and beyond the relationships of the covariates to the explanatory variable.
- In addition to not being directly related to the outcome, the instrument should not be related to the outcome through the covariates.
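The role of covariates can be made explicit by writing out the two stages. This is a sketch in assumed notation: W stands for the covariates, and the π coefficients and the first-stage error v are symbols introduced here, not taken from the slides.

```latex
\begin{aligned}
\text{First stage:}\quad  & X = \pi_0 + \pi_1 I + \pi_2 W + v \\
\text{Second stage:}\quad & Y = \beta_0 + \beta_1 \hat{X} + \beta_2 W + \varepsilon
\end{aligned}
```

The covariates appear in both stages. "Above and beyond the covariates" then means that π1 differs from zero with W held constant. Running the second stage by hand gives the right coefficient but the wrong standard errors, which is one reason to rely on a dedicated 2SLS routine.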

Stata Interlude 3

[Stata output; numeric values were lost in extraction.]
First-stage regression of college on black, hispanic, otherrace, distance, and _cons: Number of obs = 9227; F(4, 9222).

Instrumental variables (2SLS) regression of register on college, black, hispanic, otherrace, and _cons: Number of obs = 9227; Wald chi2(4).
Instrumented: college
Instruments: black hispanic otherrace distance

Adding Instruments
- Only having a single instrument (e.g., distance) is problematic because there is no test of the “no third path” assumption.
- If there are more instruments in the model than there are explanatory variables, the model is “over-identified,” and there are statistical tests that can be used to evaluate (1) whether there are direct paths between the instruments and the outcome, and/or (2) whether the instruments are related to the outcome through the covariates.
- In Dee’s study, he used the number of schools within a 35-mile radius of a student’s high school as a second instrument. (Unfortunately, that variable isn’t available in the public-use dataset.)
- Alternatively, I’m going to use sex (i.e., female) as the second instrument.

Stata Interlude 4

[Stata output; numeric values were lost in extraction.]
First-stage regression of college on black, hispanic, otherrace, distance, female, and _cons: Number of obs = 9227; F(5, 9221).

Instrumental variables (2SLS) regression of register on college, black, hispanic, otherrace, and _cons: Number of obs = 9227; Wald chi2(4).
Instrumented: college
Instruments: black hispanic otherrace distance female
. estat overid
Tests of overidentifying restrictions: Sargan (score) chi2(1) and Basmann chi2(1).
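The logic behind an overidentification test can be sketched outside Stata. The code below is an illustrative simulation, not a reproduction of the Dee analysis: it has one explanatory variable, two made-up instruments, no covariates, and hypothetical function names. It computes the Sargan statistic as n times the R-squared from regressing the 2SLS residuals on the instruments.

```python
import random

def ols(y, columns):
    """OLS of y on an intercept plus the given columns; returns coefficients."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y).
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(k):
            if r != p:
                factor = a[r][p] / a[p][p]
                a[r] = [u - factor * v for u, v in zip(a[r], a[p])]
    return [a[p][k] / a[p][p] for p in range(k)]

def fitted(coefs, columns):
    """Fitted values for an intercept-plus-columns model."""
    n = len(columns[0])
    return [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
            for i in range(n)]

def sargan(y, x, instruments):
    """Return (2SLS slope, Sargan statistic) for one endogenous x."""
    n = len(y)
    x_hat = fitted(ols(x, instruments), instruments)        # first stage
    b0, b1 = ols(y, [x_hat])                                 # second stage
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]     # residuals use actual x
    resid_hat = fitted(ols(resid, instruments), instruments)
    mean_r = sum(resid) / n
    r_squared = (sum((v - mean_r) ** 2 for v in resid_hat)
                 / sum((v - mean_r) ** 2 for v in resid))
    return b1, n * r_squared   # chi2 with (number of instruments - 1) df

def simulate_sargan(direct_effect, n=3000, seed=4):
    """direct_effect > 0 gives the second instrument a 'third path' to y."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    omitted = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.6 * a + 0.6 * b + o + rng.gauss(0, 1) for a, b, o in zip(z1, z2, omitted)]
    y = [0.5 * xi + direct_effect * b + o + rng.gauss(0, 1)
         for xi, b, o in zip(x, z2, omitted)]
    return sargan(y, x, [z1, z2])
```

Calling `simulate_sargan(0.0)` should give a small statistic (consistent with a chi-square with 1 degree of freedom), while `simulate_sargan(0.5)` should give a very large one. A large statistic signals that at least one instrument is invalid, but the test cannot say which one, and it has no power if all instruments are invalid in the same direction.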

BREAK

Using Instrumental Variables in Program Evaluation

Causal Inference in Program Evaluation
- Regression is a correlational procedure, and no matter how many variables you have in the model, it’s still correlational.
- If we are going to evaluate the effectiveness of education programs and initiatives, I would prefer to say the program “caused” the outcome, rather than saying the program is “correlated” with the outcome.

A Quick Tour of Causal Inference

Counterfactuals

Treatment Effects

Descriptive Program Evaluation

Random Assignment

Using Instrumental Variables

However,
- An instrumental variables approach can’t be used to estimate the average treatment effect (ATE) for all individuals. In fact, it may not be able to estimate the average treatment effect on the treated (ATET).
- Four types of individuals:
  - Always Takers – They will always participate in the treatment.
  - Never Takers – They will never participate in the treatment.
  - Defiers – They behave opposite to expectations.
  - Compliers – They behave in line with expectations.
- Angrist & Pischke (2009) note that instrumental variables can be used to estimate treatment effects for compliers; they refer to this as a Local Average Treatment Effect (LATE).
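A small simulation can show why the IV estimate is "local" to compliers. Everything here is an assumption made for illustration: the shares of each type (20% always takers, 30% never takers, 50% compliers, no defiers), the treatment effects (3.0 for always takers, 1.0 for compliers), and the function name. The IV estimate is computed as the difference in mean outcomes by instrument divided by the difference in participation rates by instrument.

```python
import random

def simulate_late(n=40000, seed=3):
    """Return (iv_estimate, naive_difference) with heterogeneous effects."""
    rng = random.Random(seed)
    # Assumed effects and baselines per type; the never-taker effect is
    # never observed because never takers do not participate.
    effect = {"always": 3.0, "never": 0.0, "complier": 1.0}
    baseline = {"always": 1.0, "never": -1.0, "complier": 0.0}
    rows = []
    for _ in range(n):
        kind = rng.choices(["always", "never", "complier"],
                           weights=[0.2, 0.3, 0.5])[0]
        z = rng.random() < 0.5                       # randomly assigned instrument
        d = kind == "always" or (kind == "complier" and z)   # participation
        y = baseline[kind] + effect[kind] * d + rng.gauss(0, 1)
        rows.append((z, d, y))

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Effect of the instrument on the outcome, scaled by its effect on participation.
    outcome_gap = (mean(y for z, d, y in rows if z)
                   - mean(y for z, d, y in rows if not z))
    takeup_gap = (mean(d for z, d, y in rows if z)
                  - mean(d for z, d, y in rows if not z))
    iv_estimate = outcome_gap / takeup_gap

    # Naive comparison of participants with non-participants.
    naive = (mean(y for z, d, y in rows if d)
             - mean(y for z, d, y in rows if not d))
    return iv_estimate, naive

if __name__ == "__main__":
    iv_estimate, naive = simulate_late()
    print(f"IV estimate:      {iv_estimate:.3f}")
    print(f"naive difference: {naive:.3f}")
```

Under these assumptions the IV estimate lands near 1.0, the complier effect. It does not recover the larger always-taker effect, and the naive comparison is inflated by both selection and the always takers.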

However #2,
- There are some additional assumptions we need to satisfy:
  - The instrument must be (strongly) related to the treatment variable.
  - The instrument must be unrelated to the outcome, except through the treatment (i.e., no third path).
  - The influence of the treatment will be the same for all individuals, and individuals not receiving the treatment will not be influenced by individuals receiving the treatment (Stable Unit Treatment Value Assumption, SUTVA).
  - The distribution of the instrument across individuals should be comparable to random assignment. As a practical matter, the instrument should be exogenous (or close to exogenous).
  - The instrument has a unidirectional effect on participation in the treatment (monotonicity).

As a Practical Matter…
- An instrumental variables analysis works best when individuals are randomly assigned to a treatment condition, and then some individuals choose not to participate.
- Example: Students are randomly assigned to two groups. The first group is invited to join a themed learning community; the second group is not invited to (and cannot) join the learning community.
  - Students who are invited to join the themed learning community are free to decide whether or not to join.
  - The random assignment of students to the learning community invitation group becomes the instrument.
  - Actually joining the learning community becomes the treatment.
  - The outcome might be GPA, and a variety of exogenous covariates related to GPA (e.g., SAT & HS GPA) may be included in the analysis.

Assumptions Revisited
- Given that only students who are randomly invited can join the themed learning community (TLC), the relationship between the instrument and the treatment is likely to be strong.
- Since students are randomly assigned to the invitation group (instrument), the instrument should not be related to the outcome, except through the treatment.
- SUTVA can be a problem. Some students may benefit more from the TLC than others, and there can be spillover: TLC students carry their experiences to non-TLC students.
- The instrument is based on random assignment.
- We need to be able to assume that there are no defiers in the study.

Stata Interlude 5 ANGRIST, BETTINGER, BLOOM, KING, & KREMER (2002). A STUDY OF THE PACES SCHOLARSHIP PROGRAM IN BOGOTÁ, COLOMBIA.

Variables
- Outcome Variable: Did students finish 8th grade (finish8th).
- Treatment Variable: Did they participate in the PACES scholarship program (use_fin_aid).
- Instrument: Were the students selected to be informed about the PACES scholarship (won_lottry).
- Exogenous Covariates:
  - Age of the student at the beginning of the study (base_age).
  - Sex of the student (male).

[Stata output; numeric values were lost in extraction.]
OLS regression of finish8th on use_fin_aid, base_age, male, and _cons: F(3, 1167); columns Coef., Std. Err., t, P>|t|, Beta.

First-stage regression of use_fin_aid on base_age, male, won_lottry, and _cons: Number of obs = 1171; F(3, 1167).
Instrumental variables (2SLS) regression of finish8th on use_fin_aid, base_age, male, and _cons: Number of obs = 1171; Wald chi2(3).
Instrumented: use_fin_aid
Instruments: base_age male won_lottry

Question: Why doesn’t everyone use instrumental variables? Answer: A good instrument is hard to find!

Example: Fifteen-to-Finish
- Dependent Variable: First-year cumulative grade point average (cumgpa).
- Treatment: Student enrolled in 15 or more credit hours in the Fall (fifteen).
- Covariates:
  - SAT (combined) score divided by 100 (sat100).
  - High school class percentile rank divided by 10 (hscpr10).
  - Student is female.
  - Underrepresented minority student.
  - Hours worked.
- Instrument: Student placed in University College. (The lore is that advisors in University College encourage students to take credits.)

[Stata output; numeric values were lost in extraction.]
OLS regression of cumgpa on fifteen, female, sat100, hscpr10, urm, hrswork, and _cons: F(6, 2547); columns Coef., Std. Err., t, P>|t|, Beta.

First-stage regression of fifteen on female, sat100, hscpr10, urm, hrswork, univcol, firstgen, and _cons: Number of obs = 2554; F(7, 2546).

Instrumental variables (2SLS) regression of cumgpa on fifteen, female, sat100, hscpr10, urm, hrswork, and _cons: Number of obs = 2554; Wald chi2(6).
Instrumented: fifteen
Instruments: female sat100 hscpr10 urm hrswork univcol firstgen
Tests of overidentifying restrictions: Sargan (score) chi2(1) and Basmann chi2(1).

Types of Instruments
- Identifying appropriate instruments requires a thorough understanding of theory and research related to what you are studying.
- You need to understand the setting in which your data were (or will be) obtained.
- Types of instruments:
  - Proximity of educational institutions;
  - Economic conditions (e.g., unemployment rate);
  - Institutional rules and personal (demographic) characteristics; &
  - Deviations from cohort trends.

If applied econometrics were easy, theorists would do it. … DON’T PANIC! (ANGRIST & PISCHKE, 2009)