
Quantitative Impact Evaluation Methods: With illustrations from education research. Stephen Taylor & Brahm Fleisch, ZENEX, July 2015. Department of Basic Education / WITS University.

In this presentation: What is impact evaluation? The evaluation problem we need to solve. A menu of methods (identification strategies). Sample size in education research. Advantages and challenges with impact evaluation in education.

Context: A lack of focus on impact “Development programs and policies are typically designed to change outcomes, for example, to raise incomes, to improve learning, or to reduce illness. Whether or not these changes are actually achieved is a crucial public policy question but one that is not often examined. More commonly, program managers and policy makers focus on controlling and measuring the inputs and immediate outputs of a program—how much money is spent, how many textbooks are distributed—rather than on assessing whether programs have achieved their intended goals of improving well-being.” (World Bank)

What is impact evaluation? “Simply put, an impact evaluation assesses the changes in the well-being of individuals that can be attributed to a particular project, program, or policy.” (World Bank) At the heart of evaluation is the issue of causality. This is what policy-makers should be interested in.

Types of evaluation questions. Basic question: Is programme X effective compared with the absence of the programme? E.g. Is the school feeding programme leading to better nutrition and learning than would otherwise be the case? When there are alternative ways of implementing programme X, which way is most effective? E.g. Is school feeding more effective when administered by school staff or by an external service provider?

Theory of Change: Process evaluation vs Impact Evaluation. Textbooks are delivered → textbooks are distributed by schools to learners → textbooks are used in class and/or at home → textbooks are of sufficient quality → tests are able to measure improved learning → improved test scores.

The evaluation problem: knowing a counterfactual. We cannot observe the counterfactual: two alternative scenarios for the same person or group. So we have to identify or construct comparison groups as a “pseudo-counterfactual”, or an estimate of the counterfactual. The big question is: when is a comparison group a valid estimate of the counterfactual? Selection bias: e.g. teacher professional development programmes (the teachers or schools that opt in differ from those that do not). Reverse causality: e.g. years of schooling and IQ; test scores and extra lessons; marriage and happiness.

Solutions to the Evaluation Problem

Solution 1: Pre- and post-measures. Assumption: no other factors are likely to have caused changes over the time period. (Picture taken from a J-PAL presentation.)

Solution 1: Pre- and post-measures. (Chart: number of accidents over time.) With a time series of measurements, this method is more credible.

Solution 2: Simple difference. One point in time; two groups. Assumption: no other systematic differences between the groups.

Solution 3: Difference-in-differences. Uses pre- and post-scores for both a treatment group and a control group. Assumption: the two groups were on a parallel trend.

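A minimal sketch of how the estimates from Solutions 1 to 3 are computed. All the test-score averages below are made up for illustration; they do not come from any study.

```python
# Made-up average test scores (illustrative only).
scores = {
    "treatment": {"pre": 40.0, "post": 48.0},
    "control":   {"pre": 41.0, "post": 45.0},
}

# Solution 1: pre- and post-measures (treatment group only).
# Assumes nothing else changed over the period.
pre_post = scores["treatment"]["post"] - scores["treatment"]["pre"]

# Solution 2: simple difference at one point in time (endline only).
# Assumes no other systematic differences between the two groups.
simple_diff = scores["treatment"]["post"] - scores["control"]["post"]

# Solution 3: difference-in-differences.
# Assumes the two groups would have followed parallel trends without the programme.
did = (scores["treatment"]["post"] - scores["treatment"]["pre"]) - (
    scores["control"]["post"] - scores["control"]["pre"]
)

print(f"Pre/post estimate:          {pre_post:.1f}")
print(f"Simple-difference estimate: {simple_diff:.1f}")
print(f"Difference-in-differences:  {did:.1f}")
```

With these made-up numbers the three estimators give different answers, which is the point: each rests on a different assumption about the counterfactual.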

Solution 4: Regression control & matching. Take a step back to think about omitted variable bias...

The problem of omitted variables. Evidence from SACMEQ: schools with good access to English textbooks have an average reading score of 508.82; schools with poor access to English textbooks have an average reading score of 457.11.

The problem of omitted variables. Evidence from SACMEQ: average reading score by SES quintile and textbook access:

SES quintile | Poor access | Good access
1            | 424.7       | 427.3
2            | 440.6       | 449.6
3            | 452.7       | 464.5
4            | 458.0       | 508.8
5            | 629.5       | 645.6

Solution 4: Regression control & matching. Include all the necessary explanatory variables in a regression; this is a very common solution when working with cross-section data, e.g. TIMSS. Matching: for every treated case, find a similar-looking comparison case. In reality there are usually many potential “confounding factors” that simultaneously determine “treatment” (e.g. attending extra lessons) and outcomes. If we can observe (measure) all of these things then we can include them as “control variables” in a multivariate regression. But often there are important unobserved omitted variables.

Solution 4: Regression control & matching E.g. Impact of in-service teacher training on a school’s matric outcomes. School socio-economic status is one NB observable characteristic to include in a regression But what about professional culture within the school (unobserved)?

Solution 5: Fixed effects with panel data If we have a panel dataset, e.g. Matric data for several years, we can observe outcomes for the same school at different times with varying participation in teacher training.

Solution 5: Fixed effects with panel data. (Chart contrasting a school with high participation due to school culture and a school with low participation due to school culture.)

Solution 5: Fixed effects in summary. Sometimes we may wish to take out the school “fixed effect”: e.g. the same school over time; or only compare students within a school; or compare grades within a school. Sometimes we may wish to take out the individual “fixed effect”: e.g. students across time; or students across subjects (and hence teachers). For a fixed effects approach to work, there must be variation (of the outcome variable and the treatment variable) within the “fixed effect unit”. The fixed effects approach controls for all time-invariant characteristics (observable and unobservable).
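A sketch of the fixed effects idea on a hypothetical school-by-year panel (simulated data, hypothetical variable names): school dummies absorb anything about a school that does not change over time, such as its culture, so the remaining variation in training identifies the effect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_schools, n_years = 200, 4
school = np.repeat(np.arange(n_schools), n_years)
culture = np.repeat(rng.normal(size=n_schools), n_years)     # time-invariant, unobserved
training = (culture + rng.normal(size=n_schools * n_years) > 0).astype(int)
score = 50 + 2 * training + 3 * culture + rng.normal(size=n_schools * n_years)

df = pd.DataFrame({"school": school, "training": training, "score": score})

pooled = smf.ols("score ~ training", data=df).fit()
fe = smf.ols("score ~ training + C(school)", data=df).fit()  # school dummies = fixed effects

print(f"Pooled OLS estimate:    {pooled.params['training']:.2f}  (biased by culture)")
print(f"Fixed-effects estimate: {fe.params['training']:.2f}  (should be close to the true effect of 2)")
```

Note that the fixed effects estimate only uses schools whose training participation varies across years; anything constant within a school drops out.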

Solution 6: Randomisation

Why randomise?

How to randomise

Randomised controlled trials. Simplest design: one treatment group, one control group. Multiple arms: “horse race”. Multiple arms: variations. Cross-cutting designs.
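A sketch of how schools might be randomly assigned in a cross-cutting (2 x 2) design. The 120 school IDs and the arm names (training vs reward, echoing the capability-vs-motivation example later in the deck) are hypothetical; the fixed seed is there so the assignment can be reproduced and audited.

```python
import numpy as np

rng = np.random.default_rng(2015)
schools = [f"school_{i:03d}" for i in range(1, 121)]   # 120 hypothetical schools

shuffled = rng.permutation(schools)
arms = {
    "control":             shuffled[0:30],
    "training_only":       shuffled[30:60],
    "reward_only":         shuffled[60:90],
    "training_and_reward": shuffled[90:120],
}
for arm, members in arms.items():
    print(arm, len(members))    # 30 schools per arm
```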

Randomised controlled trials. By “pushing a lever”, RCTs can tell us about the binding constraints in the school system; observational data tells us how things are. Each treatment arm should have a clear theory of change.

Case study: RCT of a Reading Catch-Up Programme.

Sample size in education settings. The point of a sample: larger samples mean more precise estimates. Purposive sampling vs random sampling. Confidence intervals, hypothesis testing & statistical significance. Simple random samples: why is this often not feasible/optimal? Complex sampling: clustering, stratification, sampling weights. How large is large enough? (A bowl of soup?)

Randomised controlled trials: power calculations. The ingredients: the minimum detectable effect size; the power to ensure one observes any actual impact; the level of confidence with which one can proclaim an observed impact (the alpha parameter); the number of schools and the number of learners to be observed within each school; the extent to which variation in educational achievement reflects between-school differences relative to within-school differences amongst learners (the intra-cluster correlation); having a baseline measure; the expected correlation between baseline and endline measurement; and the ratio of treatment to control schools.
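A back-of-the-envelope sketch of a minimum detectable effect size (MDES) for a school-randomised trial, in standard-deviation units, following the standard cluster-RCT approximation. All numeric inputs below are assumptions for illustration, not parameters from any study in this deck, and the baseline covariate is treated, as a simplification, as shrinking the whole variance by a factor (1 - r²).

```python
from statistics import NormalDist
import math

def mdes(n_schools, learners_per_school, icc, prop_treated=0.5,
         alpha=0.05, power=0.8, r2_baseline=0.0):
    """Minimum detectable effect size (in SD units) for a school-randomised design."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Variance of the treatment-effect estimate, with the outcome standardised to SD = 1.
    var = (1 / (prop_treated * (1 - prop_treated) * n_schools)) \
          * (icc + (1 - icc) / learners_per_school) \
          * (1 - r2_baseline)
    return z * math.sqrt(var)

# Illustrative inputs: 100 schools, 20 learners tested per school, ICC of 0.3.
print(f"No baseline:   {mdes(100, 20, icc=0.3):.2f} SD")
print(f"With baseline: {mdes(100, 20, icc=0.3, r2_baseline=0.5):.2f} SD")
```

The sketch shows why the number of schools and the intra-cluster correlation matter far more than the number of learners per school, and why a good baseline measure buys a smaller detectable effect.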

Case Study: The impact of study guides on matric performance: Evidence from a randomised experiment

Background to the “Mind the Gap” study. Randomised controlled trials (RCTs) are very rare in SA education, and even in other sectors. The Mind the Gap study guides were developed during 2012, aimed at helping learners acquire the basic knowledge and skills necessary to pass the matric exam. They were distributed to schools in some parts of the country: mainly underperforming districts in EC and NC, a bit in Gauteng and elsewhere, but not in Mpumalanga. Impact evaluation using 4 subjects in MP: ACCN, ECON, GEOG, LFSC.

The sampling frame: the national list of schools enrolled for the matric 2012 examination, restricted to schools in Mpumalanga, and further restricted to schools registered to write the matric 2012 exam in English. The final sampling frame consists of 318 schools. Guides were randomly allocated to 79 schools (books were couriered, so delivery was reliable), leaving 239 control schools. Books were delivered late in the year: September.
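A sketch of the kind of random allocation described above: drawing 79 treatment schools from a frame of 318 with a fixed seed, so the draw can be reproduced. The school IDs and the seed are hypothetical placeholders, not the study's actual frame.

```python
import pandas as pd

frame = pd.DataFrame({"school_id": [f"MP_{i:03d}" for i in range(1, 319)]})   # 318 schools

treated = frame.sample(n=79, random_state=2012)                 # seed chosen arbitrarily
frame["treatment"] = frame["school_id"].isin(treated["school_id"]).astype(int)
print(frame["treatment"].value_counts())                        # 79 treatment, 239 control
```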

Main results: OLS regressions with baseline. To summarise: no significant impact in Accounting & Economics; impacts of roughly 2 percentage points in Geography & Life Sciences.
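A sketch, not the authors' exact specification, of the kind of regression this slide refers to: endline marks on a treatment dummy plus a baseline control, with standard errors clustered at the school level. The data are simulated purely so the code runs, and all column names and coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_schools, learners = 318, 30
treated_flag = np.zeros(n_schools, dtype=int)
treated_flag[rng.permutation(n_schools)[:79]] = 1               # 79 randomly treated schools

school_id = np.repeat(np.arange(n_schools), learners)
treatment = np.repeat(treated_flag, learners)
school_effect = np.repeat(rng.normal(0, 8, n_schools), learners)
baseline = 45 + school_effect + rng.normal(0, 10, n_schools * learners)
endline = 5 + 0.8 * baseline + 2 * treatment + rng.normal(0, 8, n_schools * learners)

df = pd.DataFrame({"school_id": school_id, "treatment": treatment,
                   "baseline": baseline, "endline": endline})

model = smf.ols("endline ~ treatment + baseline", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]})
print(model.params["treatment"], model.bse["treatment"])        # impact estimate and clustered SE
```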

Heterogeneous effects

Did impact vary by school functionality? (Panels: Geography; Life Sciences.)

Matric 2010 simulation: roughly a 1 percentage point increase in the matric pass rate. 5,609 is the number of children who did not pass matric in 2010 but who would have passed had Mind the Gap been nationally available for Geography and Life Sciences.

Interpreting the size of the impact. Very rough rule of thumb: 1 year of learning = 0.4 to 0.5 standard deviations of test scores. Geography: 13.5% of a standard deviation; Life Sciences: 14.4% of a standard deviation; roughly a third of a year of learning. The unit cost per study guide (reflecting material development, printing and distribution) is estimated to be R41.82.
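The back-of-the-envelope arithmetic behind the "roughly a third of a year" claim, using the midpoint of the 0.4 to 0.5 SD rule of thumb quoted on the slide (the rule itself is only a rough convention).

```python
effects = {"Geography": 0.135, "Life Sciences": 0.144}   # impacts in SD units, from the slide
sd_per_year = 0.45                                       # midpoint of the 0.4-0.5 SD rule of thumb

for subject, effect in effects.items():
    print(f"{subject}: about {effect / sd_per_year:.2f} of a year of learning")
# Both work out to roughly a third of a year of learning, as stated on the slide.
```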

Cost-effectiveness: MTG at 3.04 SD per $100 spent (comparison chart based on Kremer, Brannen & Glennerster, 2013).

Interpretation of results. Two guides had no impact: interventions do not always impact on desired outcomes, and interventions are not uniform in effectiveness. Was it the quality of the ACCN & ECON material, or of the GEOG & LFSC materials? Were there contextual factors pre-disposing LFSC & GEOG to have an impact but not ACCN & ECON? A certain level of school functionality / managerial capacity may be needed in order for resources to be effective. The timing of the delivery of the guides. External validity: we are more certain about delivery in MP than if this were taken to scale; awareness campaigns could increase the impact at scale.

Critiques of RCTs Ethics....

Critiques of RCTs: external validity. Necessary and sufficient conditions for impact evaluations (internal and external validity): internal validity = causal inference; external validity = transferability to the population. Context: geography, time, etc.? E.g. private schools, class size. Special experimental conditions: Hawthorne effects, the implementation agent, system support.

External validity: recommendations. Choose a representative & relevant study population. Investigate heterogeneous impacts. Investigate intermediate outcomes. Use a realistic (scalable) model of implementation and cost structure. Work with government... but be careful. No pre-test...? Or use administrative data (ANA & NSC provide an opportunity here for DBE collaboration).

Evaluations in Government: Advantages, risks and perverse incentives. Dispelling gnosticism: interventions don’t always work. “It is a waste of money to add an evaluation component to nutritional programs – these evaluations never find an impact anyway – we should just move ahead with what we nutritionists know is right.” Nutritional advocate in a project decision meeting, quoted by Pritchett (2002).

Advantages of evaluations: curtailing ineffective and therefore wasteful programmes; finding out how to improve programme implementation; scaling up effective interventions; finding out how best to design a new intervention; finding out about the binding constraints in a particular context (e.g. the SA school system). E.g. “capability” vs “motivation”: Will teachers improve their content knowledge if they attend training workshops? Will teachers improve their content knowledge if they receive a reward for doing so? Will teachers improve their content knowledge only if they receive both a reward and training?

Evaluations in Government: Advantages, risks and perverse incentives. Accountability: shifts the focus from inputs (e.g. number of teachers trained) to outcomes; from form to function (rather than mimicry). Cooperation between government and other actors (researchers, NGOs, etc.). Encourages policy-makers to interact with research and evidence. Thinking about theories of change: shifts the focus from whether government programme X succeeded or failed to why, and to the agency of programme recipients to change behaviour. Benefits for research: reduces publication bias.

Evaluations in Government: Risks and perverse incentives. True believers and programme managers may not have an interest in evaluation; Pritchett (2002) argues that “pilot and persuade” is often the strategic choice. Budgets: the cost-effectiveness argument is a no-brainer, but evaluation is typically not budgeted for. If evaluations cost time (as experienced by programme managers) and money, and may well produce inconvenient outcomes, who can be expected to instigate evaluations? The DPME? Donors?

Solution 7: Regression Discontinuity Design
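The slide gives the heading only; as a minimal illustration of what a sharp regression discontinuity estimates, here is a sketch on simulated data (no real study): units at or above a cutoff on a running variable receive the programme, and the jump in outcomes at the cutoff is the estimated impact. The bandwidth, variable names and parameter values are all assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000
running = rng.uniform(-50, 50, n)             # e.g. a prior score, centred at the cutoff
treated = (running >= 0).astype(int)          # programme assigned at the cutoff
outcome = 40 + 0.2 * running + 5 * treated + rng.normal(0, 5, n)   # true jump = 5

df = pd.DataFrame({"y": outcome, "x": running, "d": treated})
bandwidth = 20
local = df[df.x.abs() <= bandwidth]           # keep observations close to the cutoff

# Local linear regression with separate slopes on each side of the cutoff.
rdd = smf.ols("y ~ d + x + d:x", data=local).fit()
print(rdd.params["d"])                         # estimated discontinuity, close to 5
```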

Solution 8: Natural experiments & instrumental variables. (Diagram: mountain range → watching TV → attitudes, e.g. tolerance.)

Solution 8: Natural experiments & instrumental variables. (Diagram: compulsory school age → years of schooling → earnings.)
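A sketch of two-stage least squares for the schooling-and-earnings diagram above, on simulated data: a binary instrument (think of being born just before or after a compulsory-schooling cutoff) shifts years of schooling but is assumed to affect earnings only through schooling. All variable names and coefficients are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5000
ability = rng.normal(size=n)                       # unobserved confounder
instrument = rng.integers(0, 2, n)                 # e.g. born just after the cutoff
schooling = 10 + 0.5 * instrument + 1.0 * ability + rng.normal(size=n)
log_earnings = 1.0 + 0.08 * schooling + 0.3 * ability + rng.normal(0, 0.2, n)   # true return = 0.08

df = pd.DataFrame({"z": instrument, "s": schooling, "y": log_earnings})

ols = smf.ols("y ~ s", data=df).fit()              # biased upwards by unobserved ability
first = smf.ols("s ~ z", data=df).fit()            # first stage: instrument shifts schooling
df["s_hat"] = first.fittedvalues
second = smf.ols("y ~ s_hat", data=df).fit()       # second stage: point estimate only;
                                                   # proper 2SLS software corrects the SEs
print(f"OLS: {ols.params['s']:.3f}   2SLS: {second.params['s_hat']:.3f}   (true 0.08)")
```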

In summary: Pre & post; Simple difference; Difference-in-differences; Regression control; Fixed effects; RCT; RDD; IV.

In summary. Non-experimental (observed data): Pre & post; Simple difference; Difference-in-differences; Regression & matching; Fixed effects. Experimental: RCT. Quasi-experimental: RDD; IV.