Assumptions of value-added models for estimating school effects sean f reardon stephen w raudenbush april, 2008.

Slides:

Advertisements

Similar presentations

Multiple Regression Analysis

Advertisements

The Multiple Regression Model.

Statistical Analysis Overview I Session 2 Peg Burchinal Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill.

Sources of bias in experiments and quasi-experiments sean f. reardon stanford university 11 december, 2006.

IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.

MEASURING COLLEGE VALUE-ADDED: A DELICATE INSTRUMENT Richard J. Shavelson SK Partners & Stanford University AERA Ben Domingue University of Colorado Boulder.

Omitted Variable Bias Methods of Economic Investigation Lecture 7 1.

Linear regression models

Introduction to Regression Analysis

Correlation. Introduction Two meanings of correlation –Research design –Statistical Relationship –Scatterplots.

Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L10.1 CorrelationCorrelation The underlying principle of correlation analysis.

The Simple Linear Regression Model: Specification and Estimation

Threats to Conclusion Validity. Low statistical power Low statistical power Violated assumptions of statistical tests Violated assumptions of statistical.

Econ Prof. Buckles1 Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 1. Estimation.

Common Problems in Writing Statistical Plan of Clinical Trial Protocol Liying XU CCTER CUHK.

ANCOVA Psy 420 Andrew Ainsworth. What is ANCOVA?

Lecture 19: Tues., Nov. 11th R-squared (8.6.1) Review

Clustered or Multilevel Data

Overview of Lecture Parametric Analysis is used for

Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.

Statistics 350 Lecture 10. Today Last Day: Start Chapter 3 Today: Section 3.8 Homework #3: Chapter 2 Problems (page 89-99): 13, 16,55, 56 Due: February.

1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.

Analysis of Clustered and Longitudinal Data

Objectives of Multiple Regression

3. Multiple Regression Analysis: Estimation -Although bivariate linear regressions are sometimes useful, they are often unrealistic -SLR.4, that all factors.

Opportunities and Challenges in a Multi-Site Regression Discontinuity Design Stephen W. Raudenbush University of Chicago Presentation at the MultiLevel.

Introduction to Linear Regression and Correlation Analysis

Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.

Hypothesis Testing in Linear Regression Analysis

Measurement Error.

Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-1 Review and Preview.

T tests comparing two means t tests comparing two means.

Comparing Two Proportions

A Neural Network MonteCarlo approach to nucleon Form Factors parametrization Paris, ° CLAS12 Europen Workshop In collaboration with: A. Bacchetta.

PARAMETRIC STATISTICAL INFERENCE

Basic Statistical Concepts  M. Burgman & J. Carey 2002.

Effect Size Estimation in Fixed Factors Between-Groups ANOVA

Effect Size Estimation in Fixed Factors Between- Groups Anova.

Problems with the Design and Implementation of Randomized Experiments By Larry V. Hedges Northwestern University Presented at the 2009 IES Research Conference.

Issues in Assessment Design, Vertical Alignment, and Data Management : Working with Growth Models Pete Goldschmidt UCLA Graduate School of Education &

Repeated Measurements Analysis. Repeated Measures Analysis of Variance Situations in which biologists would make repeated measurements on same individual.

Managerial Economics Demand Estimation & Forecasting.

Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.

Chapter 4 Linear Regression 1. Introduction Managerial decisions are often based on the relationship between two or more variables. For example, after.

Propensity Score Matching for Causal Inference: Possibilities, Limitations, and an Example sean f. reardon MAPSS colloquium March 6, 2007.

2.4 Units of Measurement and Functional Form -Two important econometric issues are: 1) Changing measurement -When does scaling variables have an effect.

Stat 112: Notes 2 Today’s class: Section 3.3. –Full description of simple linear regression model. –Checking the assumptions of the simple linear regression.

ANOVA Assumptions 1.Normality (sampling distribution of the mean) 2.Homogeneity of Variance 3.Independence of Observations - reason for random assignment.

CHAPTER 2 Research Methods in Industrial/Organizational Psychology

T tests comparing two means t tests comparing two means.

Chapter Eight: Quantitative Methods

Education 793 Class Notes Inference and Hypothesis Testing Using the Normal Distribution 8 October 2003.

Chapter 22 Comparing Two Proportions.  Comparisons between two percentages are much more common than questions about isolated percentages.  We often.

INFERENCE Farrokh Alemi Ph.D.. Point Estimates Point Estimates Vary.

T tests comparing two means t tests comparing two means.

Using Prior Scores to Evaluate Bias in Value-Added Models Raj Chetty, Stanford University and NBER John N. Friedman, Brown University and NBER Jonah Rockoff,

The “Big Picture” (from Heath 1995). Simple Linear Regression.

Exposure Prediction and Measurement Error in Air Pollution and Health Studies Lianne Sheppard Adam A. Szpiro, Sun-Young Kim University of Washington CMAS.

Classical Test Theory Psych DeShon. Big Picture To make good decisions, you must know how much error is in the data upon which the decisions are.

IRT Equating Kolen & Brennan, 2004 & 2014 EPSY

Applied Regression Analysis BUSI 6220

Measuring College Value-Added: A Delicate Instrument

Inference about the slope parameter and correlation

Stephen W. Raudenbush University of Chicago December 11, 2006

Comparison of Correlation and % Bias to Invalidate Frameworks

Fundamentals of regression analysis

Common Problems in Writing Statistical Plan of Clinical Trial Protocol

Confidence Intervals Chapter 10 Section 1.

Linear Regression Summer School IFPRI

Chapter 3 Hernán & Robins Observational Studies

Presentation transcript:

assumptions of value-added models for estimating school effects sean f reardon stephen w raudenbush april, 2008

goals of this paper  frame value-added models (VAMs) within counterfactual model of causality  articulate six assumptions necessary to interpret estimated VAM parameters as causal effects  discuss plausibility of these assumptions  assess sensitivity of VAM estimates to violations of three of these assumptions

counterfactual model of school effects  define the potential outcome,, as the test score for student i that would be observed (at some time t) if student i were assigned to school j.  define as the distribution of potential outcomes if all students in population P are assigned to school j.  comparing effectiveness of schools j and k requires a comparison of and (typically we compare their means).

VAM assumptions (1) and (2)  (1) manipulability: each student can, in principle, be assigned to any school j  must be independently manipulable  necessary in order to define potential outcome  (2) no interference across units:  also known as stable unit treatment value assumption (SUTVA)  potential outcome must be unique; must not depend on assignment of other students  both are required so that we can write down the estimand of interest

a stylized VAM  consider a stylized VAM:  the mean potential outcome is then  so we can compare schools j and k by comparing  this relies on two additional assumptions

VAM assumptions (3) and (4)  (3) homogeneity: average effect of school j is constant over x.  requires that if school j is better than school k for students with x=x 1, then school j is also better than school k for students with x=x 2. (a good school is good for every type of student)  (4) interval metric: Y is measured in an interval- scaled metric  ranking schools by their mean potential outcomes implies Y is interval-scaled

fitting VAMs from observed data  we cannot observe the full distribution of potential outcomes for all students in all schools  (this is “the fundamental problem of causal inference” Holland, 1986)  we must estimate the parameters of a VAM (especially ) from observed data  we use the observed/realized potential outcomes in school j (and their covariance with x) to estimate the parameter(s) of (such as the mean of ).  this requires two additional assumptions

VAM assumptions (5) and (6)  (5) ignorability: school assignment is independent of potential outcomes, given observed covariates x.  necessary in order to obtain unbiased estimates  (6) common support/correct functional form: the functional form of the model is valid in regions where data are sparse  if there are no/few students with x=x in school j (no common support), then we rely on the functional form of the VAM and extrapolation to infer the mean potential outcomes of students with x=x in school j.

how plausible are these assumptions?  manipulability  no interference between units  homogeneity  interval metric  ignorability  common support/functional form

how sensitive are VAM estimates to violations of these assumptions?  for the sake of argument, we assume manipulability, SUTVA, and ignorability  we focus on implications of plausible violations of homogeneity, interval metric, and common support assumptions  assess sensitivity of estimates to violations of the assumptions via simulation exercises

simulation study  ‘true’ model for potential outcomes (data generating model):  this is a very simple model (too simple)  heterogeneity parameter is we use

observed data parameters  given the true potential outcomes, two key parameters determine features of the observed data:  intracluster correlation of Y i0 (determines degree of common support) we use ICC = 0.1, 0.2, 0.3  curvature parameter of metric of Y allow observed Y metric to be a nonlinear (quadratic) transformation of the true Y metric define curvature as ratio of derivative of transformation function at 95 th percentile to derivative at 5 th percentile we use curvature = 1/5, 1/3, 2/3, 1, 1.5, 3, 5

simulation model  we simulate observed data from 500 schools, each containing 500 students  large samples ensure precise estimation of school effects  assume Y is measured without error (rel.=1.0)  assume Y 0 is measured without error  assume ICC of gains for students with average initial score is 0.40 (based on extant research)

four VAM estimators  model A: linear, no heterogeneity  model B: quadratic, no heterogeneity  model C: linear, with heterogeneity  model D: quadratic, with heterogeneity  note:

correlation of VAM A estimates and true school effects

correlation of VAM B estimates and true school effects

correlation of VAM C estimates and true school effects

correlation of VAM D estimates and true school effects

conclusions  sorting among schools need not degrade estimates of school effects, so long as homogeneity, interval metric (and ignorability) hold.  heterogeneity of effects degrades estimates in the presence of sorting (lack of common support) and/or failure of interval metric assumption.  failure of interval metric assumption degrades estimates, particularly in presence of heterogeneity of effects or incorrect specification of functional form.  if there is heterogeneity of effects, models that allow for it perform substantially better than those that do not.

caveats  many other potential sources of error in VAMs  bias due to violation of ignorability assumption  bias due to violation of SUTVA assumption  small school sizes may lead to substantially greater sampling variation  small school sizes may lead to less common support  measurement error in initial score Y 0  measurement error in outcome score Y  more complex potential outcome processes (our is exceedingly simplified)