

Quasi-Experimental Strategies When Randomization Is Not Feasible: Propensity Score Matching Shenyang Guo Judy Wildfire School of Social Work University of North Carolina at Chapel Hill For presentation at the 9th Annual Child Welfare Waiver Demonstration Projects Meeting, Washington DC, June 9, 2005

Acknowledgements Support for this research is from two sources. The Children’s Bureau provided a grant for the development of innovative quantitative methods for child welfare research (Shenyang Guo, PI). The NC Dept of Health and Human Services is supporting an evaluation of NC’s Title IV-E Waiver Demonstration project (Judy Wildfire, PI). We thank Rick Barth, Mark Fraser, and Lynn Usher for comments and suggestions on this presentation.

Presentation Overview
• The challenges: situations under which randomized experiments are infeasible
• The counterfactual framework
• The fundamental assumption embedded in all evaluations
• The propensity score matching (PSM) approach
• An illustration
• Analytic plan for the evaluation of the NC Waiver Demonstration project

A PSM Resource Package
• “PSM References” compiled by Guo, Barth, and Gibbons (2003).
• Guo, S., Barth, R.P., & Gibbons, C. (forthcoming). Propensity score matching strategies for evaluating substance abuse services for child welfare clients. Children and Youth Services Review.
• Guo, S., & Barth, R.P. (2005). Running propensity score matching with STATA/PSMATCH2. Workshop conducted at the SSWR Annual Conference.

Limitations of Randomization
• Can researchers really accomplish randomization in social and behavioral evaluation research? Heckman & Smith (1995, 1998)
• A case from education: evaluating the Catholic school effect on learning vis-à-vis the public school effect. Morgan (2001) found that the Catholic school effect is strongest among those Catholic school students who are less likely to attend Catholic schools.

The Counterfactual Framework
• Counterfactual: what would have been the outcomes for persons who were served had they not been served?
• The counterfactual framework was developed by Neyman (1923) and Rubin (1978).
• Its key assumption is that individuals selected into experimental groups have potential outcomes in both states: the one in which they are observed and the one in which they are not. A rigorous evaluation design should aim to produce a robust estimate of the counterfactual.
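In standard potential-outcomes notation (our notation, not from the slide), person i has an outcome Y_1i if served and Y_0i if not, and only one of the two is ever observed:

```latex
\delta_i = Y_{1i} - Y_{0i}, \qquad Y_i = W_i \, Y_{1i} + (1 - W_i) \, Y_{0i}
```

where W_i = 1 indicates service receipt and delta_i is the individual treatment effect; the unobserved potential outcome is the counterfactual the evaluation must estimate.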

The Fundamental Assumption
• Rosenbaum & Rubin (1983): conditional on observed covariates, service assignment should not correlate with the outcome under either the service or the control condition.
• Different versions: “unconfoundedness” and “ignorable treatment assignment” (Rosenbaum & Rubin, 1983), “selection on observables” (Barnow, Cain, & Goldberger, 1980), “conditional independence” (Lechner, 1999, 2002), and “exogeneity” (Imbens, 2004).
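In potential-outcomes notation (ours, not on the slide), the assumption says that, given the observed covariates X, assignment W is independent of the potential outcomes:

```latex
(Y_0, Y_1) \perp W \mid X
```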

Violation of the Assumption Is Equivalent to Ignoring Threats to Internal Validity
• Internal validity concerns whether the intervention caused the changes in the target problem (Campbell & Stanley, 1963; Cook & Campbell, 1979).
• Key among the nine threats outlined by Campbell & Stanley is selection.

Violation of the Assumption Also Is Equivalent to Violating an OLS Assumption
• An ordinary least squares (OLS) regression model assumes no correlation between the error term and the explanatory variables. This assumption is often violated when researchers evaluate service effects from observational data.
• Consider the model y = α + τw + ε. The independent variable w is usually correlated with the error term ε; the consequence is an inconsistent and biased estimate of the treatment effect τ.
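A minimal simulation of this problem (ours, not from the slides; all numbers are invented): when an unobserved confounder drives both service receipt w and the outcome y, the OLS coefficient on w overstates the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved confounder u drives both service receipt and the outcome,
# so w ends up correlated with the regression error term.
u = rng.normal(size=n)
w = (u + rng.normal(size=n) > 0).astype(float)
tau = 0.30                                   # true treatment effect
y = 1.0 + tau * w + u + rng.normal(size=n)

# OLS slope of y on w: cov(w, y) / var(w)
tau_hat = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
print(tau_hat)   # substantially larger than the true 0.30
```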

Development and Application of PSM: An Overview
• Who? Heckman (1978, 1979); Rosenbaum and Rubin (1983); Heckman et al. (1997), the difference-in-differences matching approach.
• When is PSM appropriate? (1) Analyzing treatment effects from observational data; (2) evaluating the effectiveness of an intervention with a quasi-experimental design.
• Software packages: R; PSMATCH2 for Stata.

General Procedure
• Run logistic regression: dependent variable Y = 1 if participant, Y = 0 otherwise; choose appropriate conditioning variables; obtain the propensity score as the predicted probability p or its logit, log[p/(1-p)].
• Then either: a 1-to-1 or 1-to-n match (nearest neighbor matching, caliper matching, Mahalanobis metric matching, or Mahalanobis matching with the propensity score added), followed by stratification (subclassification), then multivariate analysis based on the new sample.
• Or: a kernel or local linear weight match, then estimate difference-in-differences (Heckman).
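The procedure above can be sketched end to end on synthetic data (our invented example, with one confounder x and a true effect of 0.5): estimate the propensity score by logistic regression, then do a greedy 1-to-1 nearest-neighbor match with replacement.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Synthetic data: covariate x drives both selection and the outcome.
x = rng.normal(size=n)
w = (rng.uniform(size=n) < 1 / (1 + np.exp(-0.8 * x))).astype(float)
y = 2.0 + 0.5 * w + x + rng.normal(size=n)        # true effect 0.5

# Step 1: logistic regression of w on x (plain gradient ascent).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (w - p) / n
ps = 1 / (1 + np.exp(-X @ beta))                  # estimated propensity score

# Step 2: greedy 1-to-1 nearest-neighbor match on ps, with replacement.
treated = np.flatnonzero(w == 1)
controls = np.flatnonzero(w == 0)
nn = np.argmin(np.abs(ps[treated][:, None] - ps[controls][None, :]), axis=1)

# Step 3: treatment effect on the treated, estimated from the matched sample.
att = float(np.mean(y[treated] - y[controls[nn]]))
naive = float(y[treated].mean() - y[controls].mean())
print(att, naive)   # att is close to 0.5; the naive contrast is badly inflated
```

Production analyses would use a dedicated package (e.g., Stata's PSMATCH2, as the slides note) rather than this hand-rolled sketch.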

Evaluation Challenges Posed by Title IV-E Waiver Demonstration Projects in NC (1)
• The fundamental challenge is that under welfare reform, counties and states have more freedom to choose their own policies.
• It is a system reform initiative that combines individual-level, agency-level, and community strategies to change outcomes for children in child welfare. An evaluation must consider the combined effect of all of these in determining the impact of the waiver.
• Counties are allowed flexibility to define their own “waiver” interventions that fit the needs and environment of the population they serve. There is no one “silver bullet” being implemented across the state.

Evaluation Challenges Posed by Title IV-E Waiver Demonstration Projects in NC (2)
• It is not administratively feasible to assign individual families or children to distinct service conditions, because the changes in approach to service reflect a fundamental shift in philosophy.
• Group randomization (Bloom and Raudenbush, 2004) is not feasible because participation reflects a voluntary commitment to reform.
• Counties participating in the Demonstration necessarily and desirably reflect a “selection bias”.

An Illustration: A Monte Carlo Study Simulating the Evaluation of the Title IV-E Waiver Demonstration Project in NC

Objective of the Monte Carlo Study
• To simulate the settings typically found in the Title IV-E Waiver Demonstration project. Through data simulation, we compare PSM to a “quasi-random” method. Because the population parameters are known in advance, the study shows the bias and efficiency of estimation associated with each method.

The Hypothetical Population
• Suppose that in a hypothetical state there are 20 counties participating in the Waiver Demonstration and another 20 counties forming a comparison group. The intervention is designed to reduce the number of placements among children with a substantiated report.
• Assume further that in each county there are 1,000 children who meet the study criteria and are eligible for the study. That is, the population N = 1,000 × 20 + 1,000 × 20 = 40,000.
• Assume further that the Waiver counties have lower average income, and their children have more severe behavioral problems, than the Comparison counties (selection bias is built in).

The Hypothetical Population (cont’d)
• Assume further that the intervention has a small effect (i.e., the hazard rate of placement since the first substantiated report is 30% lower for the Waiver group than for the Comparison group).
• Assume further that, due to cost constraints, we cannot evaluate the outcomes for the entire population. Instead, we must create a sample comprising 2,000 Waiver children and 2,000 comparison children.

Various Types of Research and Sampling Designs
• Pure random: create a random sample of 4,000 and assign 2,000 to Waiver and 2,000 to Comparison purely at random. This method is infeasible due to spillover effects.
• Quasi-random: draw a random sample of 2,000 from the 20,000 Waiver subjects and a random sample of 2,000 from the 20,000 Comparison subjects.
• PSM: draw a random sample of 2,000 from the 20,000 Waiver subjects, then find the 2,000 best matches on propensity scores from the Comparison group.
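The quasi-random and PSM draws can be sketched as follows (a toy version; the uniformly distributed propensity scores are invented stand-ins for scores estimated from covariates):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population: 20 Waiver counties and 20 comparison counties,
# 1,000 eligible children each, so N = 40,000 (children indexed 0..39999).
waiver_pool = np.arange(20_000)
comparison_pool = np.arange(20_000, 40_000)

# Stand-in propensity scores; a real study would estimate them from covariates.
ps = rng.uniform(size=40_000)

# Quasi-random design: two independent simple random samples of 2,000.
waiver_sample = rng.choice(waiver_pool, size=2_000, replace=False)
qr_comparison_sample = rng.choice(comparison_pool, size=2_000, replace=False)

# PSM design: keep the same Waiver sample, but take each child's nearest
# propensity-score neighbor among the comparison pool (with replacement),
# using a sorted array and binary search instead of a full distance matrix.
order = np.argsort(ps[comparison_pool])
sorted_ps = ps[comparison_pool][order]
pos = np.searchsorted(sorted_ps, ps[waiver_sample]).clip(1, sorted_ps.size - 1)
nearer_left = ps[waiver_sample] - sorted_ps[pos - 1] < sorted_ps[pos] - ps[waiver_sample]
psm_comparison_sample = comparison_pool[order[np.where(nearer_left, pos - 1, pos)]]
```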

Research Question for the Monte Carlo Study Between the two approaches (i.e., quasi-random and PSM), which design is better?

Descriptive Statistics of the Population

Estimated Survival Functions in Population: Small Effect Size of Waiver Intervention

Finding 1: The quasi-random approach violates the fundamental assumption, but PSM does not
• Quasi-random: this pattern is shown by all 100 simulations.
• PSM: this pattern is shown by all 100 simulations.

Finding 2: Based on Cox regression analysis, PSM yields less error than the quasi-random approach (100 simulations using all predictors) Effect TRUE = -.37 MEAN PSM = MEAN RANDOM = Error PSM = Error RANDOM = SD PSM = SD RANDOM =
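The Error and SD figures in Findings 2 and 3 summarize bias and efficiency across the 100 simulations; the computation looks like this (with invented per-simulation estimates standing in for the Cox-regression results):

```python
import numpy as np

rng = np.random.default_rng(4)
true_effect = -0.37

# Invented effect estimates standing in for the 100 Cox-regression runs;
# only the summary computations below mirror the slide.
psm_estimates = true_effect + rng.normal(scale=0.05, size=100)
quasi_estimates = true_effect + 0.10 + rng.normal(scale=0.08, size=100)

def summarize(estimates):
    mean = estimates.mean()
    error = mean - true_effect        # bias of the method
    sd = estimates.std(ddof=1)        # efficiency: spread across simulations
    return mean, error, sd

psm_mean, psm_error, psm_sd = summarize(psm_estimates)
qr_mean, qr_error, qr_sd = summarize(quasi_estimates)
```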

Finding 3: Even if a Cox model is not correctly specified (INCOME is dropped), PSM is more robust (100 Simulations) Effect TRUE = -.37 MEAN PSM = MEAN RANDOM = Error PSM = Error RANDOM = SD PSM = SD RANDOM =

Limitations
• Propensity scores cannot adjust for unobserved covariates.
• PSM works better in larger samples.
• PSM handles a covariate that is related to the service condition but unrelated to the outcome in the same way as a covariate with the same relation to the service condition but strongly related to the outcome (Rubin, 1997).

Continuing Debate Among Researchers
• There is much skepticism that PSM, or any other nonexperimental approach, can reliably simulate, much less replace, randomized experiments.
• Proponents: Heckman et al. (1997), Dehejia & Wahba (1999), etc.
• Opponents: LaLonde (1986), Michalopoulos, Bloom, & Hill (2004), etc.

Context of the Debate
• Debate about whether PSM is better than a randomized experiment is moot with regard to evaluating Waiver Demonstrations, in which a saturation approach makes individual or group randomization infeasible.
• Conclusion: PSM provides an analytic framework that can add an important degree of control in quasi-experimental evaluations.

NC Waiver Demonstration Project
• System reform effort that utilizes flexible funding to implement county-specific “demonstrations” in 39 Waiver counties
• All Waiver counties agree to work to change these outcomes: reduce the likelihood of out-of-home placement while maintaining safety for children; reduce the length of stay for children who must enter out-of-home placement; and reduce the likelihood of repeat abuse or neglect for all children

NC Waiver Demonstration Project
• County demonstrations include implementation of strategies targeted to individual clients, agency staff, and community members that all combine to change the experiences of children involved with the child welfare system

Evaluation of the NC Waiver Demonstration Project
• Comparison group design with 39 Waiver counties and 30 comparison counties
• Outcomes for all children served in these counties are tracked using longitudinal data derived from administrative data files maintained by the state
• Only limited services data are available for all children in the state, so it is not possible to adequately track changes in service utilization at the client level with existing data files

Process Evaluation
• Select 2 random samples of children in the Waiver counties
• Using PSM, select 2 matched samples of children in the comparison counties, matching on age, race, and gender
• Implement a case record review for the samples to get supplemental information on services used at varying time points in the case

Why PSM?
• Waiver counties self-selected into the demonstration
• Comparison counties were selected so that the group as a whole matched the Waiver counties on key county-level elements: rate of change in IV-E expenditures over the last several years, patterns of use of out-of-home placement, and county size
• There were age and race differences in the distribution of children served in the 2 groups

Why PSM?
• We could not collect services data on all children in the counties and therefore had to rely on a sampling strategy
• PSM assured that the samples were similarly distributed on key individual characteristics

Process Evaluation
• Use the sample data in conjunction with county aggregate data on service availability to describe service patterns for clients in Waiver and comparison counties
• Repeat this process midway through and at the end of the demonstration period to assess changes due to the Waiver

Outcome analysis
• Assess changes in outcomes for all children in Waiver counties versus comparison counties
• Supplement these analyses by examining outcomes for children in the samples

Current status
• Very early in the evaluation
• Will be finishing the first round of case record abstractions in both Waiver and comparison counties this month
• Will begin looking at baseline services data from the samples and linking these data back to existing data sources to check the validity and reliability of the data