1
Quantitative Impact Evaluation Methods: With illustrations from education research
Stephen Taylor (Department of Basic Education) and Brahm Fleisch (WITS University), ZENEX, July 2015
2
In this presentation
What is impact evaluation?
The evaluation problem we need to solve
A menu of methods (identification strategies)
Sample size in education research
Advantages and challenges with impact evaluation in education
3
Context: A lack of focus on impact
“Development programs and policies are typically designed to change outcomes, for example, to raise incomes, to improve learning, or to reduce illness. Whether or not these changes are actually achieved is a crucial public policy question but one that is not often examined. More commonly, program managers and policy makers focus on controlling and measuring the inputs and immediate outputs of a program—how much money is spent, how many textbooks are distributed—rather than on assessing whether programs have achieved their intended goals of improving well-being.” (World Bank)
4
What is impact evaluation?
“Simply put, an impact evaluation assesses the changes in the well-being of individuals that can be attributed to a particular project, program, or policy.” (World Bank) At the heart of evaluation is the issue of causality. This is what policy-makers should be interested in.
5
Types of evaluation questions
Basic question: Is programme X effective compared with the absence of the programme?
E.g. Is the school feeding programme leading to better nutrition and learning than would otherwise be the case?
When there are alternative ways of implementing programme X, which way is most effective?
E.g. Is school feeding more effective when administered by school staff or by an external service provider?
6
Theory of Change: Process evaluation vs Impact Evaluation
Textbooks are delivered →
Textbooks are distributed by schools to learners →
Textbooks are used in class and/or at home →
Textbooks are of sufficient quality →
Tests are able to measure improved learning →
Improved test scores
(Process evaluation examines the earlier links in this chain; impact evaluation examines the final outcome.)
7
The evaluation problem: knowing a counterfactual
We cannot observe the counterfactual: 2 alternative scenarios for the same person or group.
So we have to identify or construct comparison groups as a “pseudo-counterfactual”, i.e. an estimate of the counterfactual.
The big question is: when is a comparison group a valid estimate of the counterfactual?
Selection bias: e.g. teacher professional development programmes
Reverse causality: e.g. years of schooling and IQ; test scores and extra lessons; marriage and happiness
12
Solutions to the Evaluation Problem
13
Solution 1: Pre- and post-measures
Assumption: no other factors are likely to have caused changes over the time period.
(Picture taken from a JPAL presentation.)
14
Solution 1: Pre- and post-measures
(Figure: number of accidents over time.)
With a time series of measurements, this method is more credible.
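As an illustration of the pre/post logic, here is a minimal Python sketch with made-up score data; the arrays and numbers are purely hypothetical, not from the presentation.

```python
# Minimal pre/post sketch with made-up data (not from the presentation).
import numpy as np

pre = np.array([41.0, 38.5, 45.2, 50.1, 39.8])   # hypothetical baseline scores
post = np.array([44.3, 40.1, 47.0, 52.6, 41.5])  # hypothetical endline scores

# The pre/post "impact" estimate is simply the mean change over time.
# It is only credible if nothing else (normal growth, other programmes,
# shocks) moved scores over the same period, as the assumption above states.
print("Pre/post estimate:", (post - pre).mean())
```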
15
Solution 2: Simple difference
One point in time; 2 groups.
Assumption: no other systematic differences between the groups.
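A minimal sketch of the simple-difference comparison, again with made-up data; the group means and two-sample t-test are standard, but the numbers are purely illustrative.

```python
# Minimal simple-difference sketch with made-up data (not from the presentation).
import numpy as np
from scipy import stats

treated = np.array([52.1, 48.3, 55.0, 50.7, 49.9])     # hypothetical endline scores
comparison = np.array([47.2, 45.8, 51.0, 44.6, 46.3])  # hypothetical endline scores

# The estimate is the gap in means at a single point in time. It equals the
# programme impact only if the two groups would otherwise have scored the same;
# with self-selected groups it mixes the impact with selection bias.
print("Simple-difference estimate:", treated.mean() - comparison.mean())
t_stat, p_value = stats.ttest_ind(treated, comparison)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```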
16
Solution 3: Difference-in-Difference
Uses pre- and post-scores for both a treatment group and a control group.
Assumption: the 2 groups were on a parallel trend.
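The difference-in-differences arithmetic can be written out directly; this sketch uses invented group means only to show the calculation.

```python
# Minimal difference-in-differences sketch with invented group means.
pre_treat, post_treat = 40.0, 46.0   # treatment group before / after
pre_comp, post_comp = 42.0, 44.5     # comparison group before / after

change_treat = post_treat - pre_treat  # programme effect + common time trend
change_comp = post_comp - pre_comp     # common time trend only, IF trends are parallel

did = change_treat - change_comp
print("Difference-in-differences estimate:", did)  # 6.0 - 2.5 = 3.5
```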
18
Solution 4: Regression control & matching
Take a step back to think about omitted variables bias...
20
The problem of omitted variables
Evidence from SACMEQ:
Schools with good access to English textbooks: average reading score:
Schools with poor access to English textbooks: average reading score:
22
The problem of omitted variables
Evidence from SACMEQ: average reading score by SES quintile and textbook access

SES quintile   Poor access   Good access
1              424.7         427.3
2              440.6         449.6
3              452.7         464.5
4              458.0         508.8
5              629.5         645.6
23
Solution 4: Regression control & matching
Include all the necessary explanatory variables in a regression. A very common solution when working with cross-sectional data, e.g. TIMSS.
Matching: for every treated case, find a similar-“looking” comparison case.
In reality there are usually many potential “confounding factors” that simultaneously determine “treatment” (e.g. attending extra lessons) and outcomes. If we can observe (measure) all these things, then we can include them as “control variables” in a multivariate regression, as in the sketch below.
But often there are important unobserved omitted variables.
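Here is a minimal simulated example of regression control: a confounder (ses) drives both treatment take-up and scores, so the naive coefficient is biased upwards, while conditioning on the observed confounder recovers roughly the true effect. Variable names and numbers are invented for illustration.

```python
# Minimal regression-control sketch on simulated cross-sectional data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
ses = rng.normal(size=n)                                    # observed confounder
extra_lessons = (ses + rng.normal(size=n) > 0).astype(int)  # better-off learners select in
score = 50 + 5 * ses + 2 * extra_lessons + rng.normal(scale=5, size=n)
df = pd.DataFrame({"score": score, "extra_lessons": extra_lessons, "ses": ses})

# Naive regression: the coefficient on extra_lessons picks up the SES advantage too.
naive = smf.ols("score ~ extra_lessons", data=df).fit()
# Controlled regression: conditioning on ses recovers (roughly) the true effect of 2.
controlled = smf.ols("score ~ extra_lessons + ses", data=df).fit()
print(naive.params["extra_lessons"], controlled.params["extra_lessons"])

# Matching follows the same logic: for each treated case, find untreated cases with
# similar observed characteristics. Neither approach removes bias from unobservables.
```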
24
Solution 4: Regression control & matching
E.g. the impact of in-service teacher training on a school’s matric outcomes.
School socio-economic status is one important observable characteristic to include in a regression.
But what about professional culture within the school (unobserved)?
25
Solution 5: Fixed effects with panel data
If we have a panel dataset, e.g. Matric data for several years, we can observe outcomes for the same school at different times with varying participation in teacher training.
27
Solution 5: Fixed effects with panel data
(Figure: schools with high participation due to school culture vs schools with low participation due to school culture.)
29
Solution 5: Fixed effects in summary
Sometimes we may wish to take out the school “fixed effect”:
E.g. the same school over time
Or, only compare students within a school
Or compare grades within a school
Sometimes we may wish to take out the individual “fixed effect”:
E.g. students across time
Or, students across subjects (and hence teachers)
For a fixed effects approach to work, there must be variation (of the outcome variable and the treatment variable) within the “fixed effect unit”.
The fixed effects approach controls for all time-invariant characteristics (observable and unobservable), as in the sketch below.
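A minimal simulated sketch of the school fixed-effects idea: an unobserved, time-invariant “school culture” drives both training participation and pass rates, so pooled OLS is biased, while school dummies absorb it. Names and parameter values are invented.

```python
# Minimal school fixed-effects sketch on simulated panel data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_schools, n_years = 100, 4
school = np.repeat(np.arange(n_schools), n_years)
culture = np.repeat(rng.normal(size=n_schools), n_years)   # unobserved, time-invariant
trained = np.clip(0.5 * culture + rng.normal(scale=0.3, size=n_schools * n_years), 0, 1)
pass_rate = 60 + 10 * culture + 5 * trained + rng.normal(scale=2, size=n_schools * n_years)
df = pd.DataFrame({"school": school, "trained": trained, "pass_rate": pass_rate})

# Pooled OLS is biased upwards because culture drives both training and results...
pooled = smf.ols("pass_rate ~ trained", data=df).fit()
# ...while school dummies (fixed effects) absorb every time-invariant school
# characteristic, observed or not, so only within-school variation identifies the effect.
fixed_effects = smf.ols("pass_rate ~ trained + C(school)", data=df).fit()
print(pooled.params["trained"], fixed_effects.params["trained"])
```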
30
Solution 6: Randomisation
31
Why randomise?
35
How to randomise
36
Randomised controlled trials
Simplest design: 1 treatment group, 1 control group
Multiple arms: “horse race”
Multiple arms: variations
Cross-cutting designs
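A minimal sketch of how schools might be assigned to multiple arms, stratifying by a hypothetical quintile variable so each arm is balanced within strata; the school list, arm names and stratifier are all invented.

```python
# Minimal stratified random-assignment sketch with an invented school list.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2015)
schools = pd.DataFrame({
    "school_id": range(1, 121),
    "quintile": np.tile([1, 2, 3, 4, 5], 24),   # hypothetical stratifier
})
arms = ["control", "treatment_A", "treatment_B"]  # e.g. a "horse race" design

def assign_within_stratum(group):
    # Shuffle the stratum, then deal schools out to the arms in turn.
    shuffled = group.sample(frac=1, random_state=int(rng.integers(1_000_000)))
    shuffled["arm"] = [arms[i % len(arms)] for i in range(len(shuffled))]
    return shuffled

assigned = schools.groupby("quintile", group_keys=False).apply(assign_within_stratum)
print(assigned.groupby(["quintile", "arm"]).size())  # arms balanced within each quintile
```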
37
Randomised controlled trials
By “pushing a lever”, RCTs can tell us about the binding constraints in the school system.
Observational data tells us how things are.
Each treatment arm should have a clear theory of change.
38
Case Study: RCT of a Reading Catch Up Programme
39
Sample size in education settings
The point of a sample: larger samples mean more precise estimates
Purposive sampling vs random sampling
Confidence intervals, hypothesis testing & statistical significance
Simple random samples: why is this often not feasible/optimal?
Complex sampling: clustering, stratification, sampling weights
How large is large enough? A bowl of soup?
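On clustering: the design-effect formula below shows why testing many learners in few schools buys less precision than a simple random sample of the same size. The sample sizes and intraclass correlation are assumed values for illustration.

```python
# Minimal design-effect sketch; all parameter values are assumptions.
def design_effect(m, rho):
    # Standard formula: deff = 1 + (m - 1) * rho, where m is learners sampled
    # per school and rho is the intraclass correlation (the share of score
    # variance that lies between schools rather than within them).
    return 1 + (m - 1) * rho

n_srs = 400   # learners needed under a simple random sample (assumed)
m = 20        # learners tested per school (assumed)
rho = 0.3     # assumed ICC; systems with big between-school gaps are often this high

deff = design_effect(m, rho)
print("Design effect:", deff)                                   # 1 + 19 * 0.3 = 6.7
print("Learners needed with clustering:", round(n_srs * deff))  # 2680
print("Schools needed:", round(n_srs * deff / m))               # 134
```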
40
Randomised controlled trials: Power calculations
Power calculations depend on:
The minimum detectable effect size
The power to ensure one observes any actual impact
The level of confidence with which one can proclaim an observed impact (the alpha parameter)
The number of learners to be observed within each school
The extent to which variation in educational achievement reflects between-school differences relative to within-school differences amongst learners
Having a baseline measure, and the expected correlation between baseline and endline measurement
The ratio of treatment to control schools
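These ingredients enter the standard minimum-detectable-effect-size formula for cluster-randomised (school-randomised) trials; the sketch below implements that formula with assumed parameter values, not the ones used in the Mind the Gap design.

```python
# Minimal MDES sketch for a school-randomised trial; parameter values are assumed.
from scipy import stats

def mdes(n_schools, learners_per_school, icc, r2, prop_treated=0.5,
         alpha=0.05, power=0.8):
    """Minimum detectable effect size in standard deviations of the outcome.

    Simplification: the same baseline R-squared is applied at both the school
    and learner level.
    """
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    p = prop_treated
    between = icc * (1 - r2) / (p * (1 - p) * n_schools)
    within = (1 - icc) * (1 - r2) / (p * (1 - p) * n_schools * learners_per_school)
    return z * (between + within) ** 0.5

# e.g. 200 schools, 20 learners each, ICC of 0.3, baseline explaining half the variance
print(round(mdes(n_schools=200, learners_per_school=20, icc=0.3, r2=0.5), 3))  # ~0.16 SD
```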
41
Case Study: The impact of study guides on matric performance: Evidence from a randomised experiment
42
Background to the “Mind The Gap” study
Randomised controlled trials (RCTs) are very rare in SA education, and even in other sectors.
The Mind the Gap study guides were developed during 2012, aimed at the basic knowledge and skills necessary to pass the matric exam.
They were distributed to schools in some parts of the country: mainly underperforming districts in the Eastern Cape and Northern Cape, a bit in Gauteng and elsewhere, but not in Mpumalanga.
Impact evaluation using 4 subjects in Mpumalanga: Accounting (ACCN), Economics (ECON), Geography (GEOG) and Life Sciences (LFSC).
43
The Sampling Frame
National list of schools enrolled for the matric 2012 examination.
The list was then restricted to schools in Mpumalanga only.
Further restricted to schools registered to write the matric 2012 exam in English.
The final sampling frame consists of 318 schools.
Guides were randomly allocated to 79 schools (books were couriered, so delivery was reliable), leaving 239 control schools.
Books were delivered late in the year: September.
44
Main Results: OLS regressions with baseline
To summarise: No significant impact in Accounting & Economics; Impacts of roughly 2 percentage points in Geography & Life Sciences
45
Heterogeneous effects
46
Did impact vary by school functionality?
(Figure panels: Geography; Life Sciences.)
47
Matric 2010 simulation
Roughly a 1 percentage point increase in the matric pass rate.
5,609: the number of children who did not pass matric in 2010 but would have passed had Mind the Gap been nationally available for Geography and Life Sciences.
48
Interpreting the size of the impact
Very rough rule of thumb: 1 year of learning = 0.4 to 0.5 standard deviations of test scores.
Geography: 0.135 SD; Life Sciences: 0.144 SD. That is roughly a third of a year of learning.
The unit cost per study guide (reflecting material development, printing and distribution) is estimated to be R41.82.
49
Mind the Gap: 3.04 SD per $100 (the cost-effectiveness metric of Kremer, Brannen & Glennerster, 2013).
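The “SD per $100” figure can be reproduced roughly from the effect sizes and unit cost on the previous slide; the rand/dollar exchange rate below is an assumption, so the result only approximates the 3.04 quoted here.

```python
# Rough reconstruction of the SD-per-$100 metric; the exchange rate is assumed.
effect_sd_per_learner = (0.135 + 0.144) / 2   # Geography and Life Sciences, from the slides
cost_per_guide_rand = 41.82                   # unit cost per study guide, from the slides
rand_per_dollar = 10.0                        # assumed exchange rate around 2013

cost_per_guide_usd = cost_per_guide_rand / rand_per_dollar
guides_per_100_usd = 100 / cost_per_guide_usd
print("SD gained per $100:", round(guides_per_100_usd * effect_sd_per_learner, 2))
```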
50
Interpretation of results
2 of the guides had no impact:
Interventions do not always impact on desired outcomes, and interventions are not uniform in effectiveness.
The quality of the ACCN & ECON material? Or of the GEOG & LFSC materials?
Contextual factors pre-disposing LFSC & GEOG to have an impact but not ACCN & ECON?
A certain level of school functionality / managerial capacity may be needed in order for resources to be effective.
The timing of the delivery of the guides.
External validity: we are more certain about delivery in Mpumalanga than if this were taken to scale; awareness campaigns could increase the impact at scale.
51
Critiques of RCTs: Ethics...
52
Critiques of RCTs: External validity
Necessary and sufficient conditions for impact evaluations (internal and external validity)
Internal validity = causal inference
External validity = transferability to the population
Context: geography, time, etc.? E.g. private schools, class size
Special experimental conditions: Hawthorne effects, implementation agent, system support
53
External validity: Recommendations
Choose a representative & relevant study population
Investigate heterogeneous impacts
Investigate intermediate outcomes
Use a realistic (scalable) model of implementation and cost structure
Work with government... but be careful
No pre-test...? Or use administrative data (ANA & NSC provide an opportunity here for DBE collaboration)
54
Evaluations in Government: Advantages, risks and perverse incentives
Dispelling gnosticism: interventions don’t always work.
“It is a waste of money to add an evaluation component to nutritional programs – these evaluations never find an impact anyway – we should just move ahead with what we nutritionists know is right.” (Nutritional advocate in a project decision meeting, quoted by Pritchett, 2002)
55
Advantages of Evaluations
Curtailing ineffective and therefore wasteful programmes
Finding out how to improve programme implementation
Scaling up effective interventions
Finding out how best to design a new intervention
Finding out about the binding constraints in a particular context (e.g. the SA school system)
E.g. “capability” vs “motivation”:
Will teachers improve their content knowledge if they attend training workshops?
Will teachers improve their content knowledge if they receive a reward for doing so?
Will teachers improve their content knowledge only if they receive both a reward and training?
56
Evaluations in Government: Advantages, risks and perverse incentives
Accountability: shifts the focus from inputs (e.g. number of teachers trained) to outcomes; from form to function (mimicry).
Cooperation between government and other actors (researchers, NGOs, etc.)
Encourages policy-makers to interact with research and evidence
Thinking about theories of change: shifts the focus from whether government programme X succeeded or failed to why.
The agency of programme recipients to change behaviour.
Benefits for research: reduces publication bias.
57
Evaluations in Government: Risks and perverse incentives
True believers and programme managers may not have an interest in evaluation.
Pritchett (2002): “pilot and persuade” is often the strategic choice.
Budgets: the cost-effectiveness argument is a no-brainer, but evaluation is typically not budgeted for.
If evaluations cost time (experienced by programme managers) and money, and may well produce inconvenient outcomes, who can be expected to instigate evaluations? DPME? Donors?
58
Solution 7: Regression Discontinuity Design
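A minimal sketch of the regression discontinuity idea on simulated data: treatment switches on at a cutoff of a running variable (e.g. a mark used to award a bursary, an invented example), and the jump in outcomes at the cutoff estimates the impact.

```python
# Minimal regression-discontinuity sketch on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
running = rng.uniform(-50, 50, n)           # distance from the cutoff at zero
treated = (running >= 0).astype(int)        # treatment switches on at the cutoff
outcome = 30 + 0.2 * running + 5 * treated + rng.normal(scale=3, size=n)
df = pd.DataFrame({"running": running, "treated": treated, "outcome": outcome})

# Local linear regression in a window around the cutoff, with separate slopes on
# each side; the coefficient on `treated` is the estimated jump (true value is 5).
window = df[df["running"].abs() < 20]
rdd = smf.ols("outcome ~ treated + running + treated:running", data=window).fit()
print(rdd.params["treated"])
```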
59
Solution 8: Natural experiments & Instrumental Variables
Example instrument: a mountain range (blocking TV reception) → watching TV → attitudes (e.g. tolerance)
60
Solution 8: Natural experiments & Instrumental Variables
Example instrument: compulsory school age → years of schooling → earnings
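A minimal two-stage least squares sketch patterned on this example: an instrument (being subject to a higher compulsory school-leaving age) shifts years of schooling but affects earnings only through schooling. The data are simulated and, for simplicity, the second stage is run as plain OLS, so its standard errors should not be trusted.

```python
# Minimal instrumental-variables (2SLS) sketch on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5000
ability = rng.normal(size=n)            # unobserved confounder
instrument = rng.integers(0, 2, n)      # e.g. faced a higher compulsory leaving age
schooling = 10 + 1.0 * instrument + 1.0 * ability + rng.normal(size=n)
earnings = 5 + 0.8 * schooling + 2.0 * ability + rng.normal(size=n)
df = pd.DataFrame({"instrument": instrument, "schooling": schooling, "earnings": earnings})

# Stage 1: the part of schooling driven by the instrument (unrelated to ability).
df["schooling_hat"] = smf.ols("schooling ~ instrument", data=df).fit().fittedvalues
# Stage 2: regress earnings on predicted schooling, and compare with biased OLS.
ols_biased = smf.ols("earnings ~ schooling", data=df).fit()
two_stage = smf.ols("earnings ~ schooling_hat", data=df).fit()
print(ols_biased.params["schooling"], two_stage.params["schooling_hat"])  # true effect: 0.8
```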
61
In summary
Pre & Post; Simple Difference; Difference-in-differences; Regression control; Fixed effects; RCT; RDD; IV
62
In summary
Non-experimental (observed data): Pre & Post, Simple Difference, Difference-in-differences, Regression & matching, Fixed effects
Experimental: RCT
Quasi-experimental: RDD, IV