Slide 1: EEF Evaluators' Conference, 25th June 2015
Slide 2: Session 1: Interpretation / impact, 25th June 2015
Slide 3: Rethinking the EEF Padlocks
Calum Davey, Education Endowment Foundation
25th June 2015
Slide 4: Overview
→ Background
→ Problems
→ Attrition
→ Power/chance
→ Testing
→ Proposal
→ Discussion
Slide 5: Background
– Summary of the security of evaluation findings
– 'Padlocks' developed in consultation with evaluators
Slide 6: Background
– Summary of the security of evaluation findings
– 'Padlocks' developed in consultation with evaluators

Group | Number of pupils | Effect size | Estimated months' progress | Evidence strength
Literacy intervention | 550 | 0.10 (0.03, 0.18) | +2 |
Slide 7: Background (same content as slide 6)
Slide 8: Background
– Summary of the security of evaluation findings
– 'Padlocks' developed in consultation with evaluators
– Five categories, combined to create an overall rating:

Group | Number of pupils | Effect size | Estimated months' progress | Evidence strength
Literacy intervention | 550 | 0.10 (0.03, 0.18) | +2 |

Rating | 1. Design | 2. Power (MDES) | 3. Attrition | 4. Balance | 5. Threats to validity
5 | Fair and clear experimental design (RCT) | < 0.2 | < 10% | Well-balanced on observables | No threats to validity
4 | Fair and clear experimental design (RCT, RDD) | < 0.3 | < 20% | |
3 | Well-matched comparison (quasi-experiment) | < 0.4 | < 30% | |
2 | Matched comparison (quasi-experiment) | < 0.5 | < 40% | |
1 | Comparison group with poor or no matching | < 0.6 | < 50% | |
0 | No comparator | > 0.6 | > 50% | Imbalanced on observables | Significant threats
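A minimal sketch of how the five category scores might combine into the overall rating, based on the thresholds in the table above. Treating the overall rating as the minimum of the category scores is an assumption made here for illustration, not EEF's published algorithm, and only the Power and Attrition columns are encoded.

```python
# Sketch of the combination logic implied by the criteria table above.
# Assumption: the overall padlock rating is capped by the weakest category
# (a "minimum of the scores" rule used for illustration only).

def power_score(mdes):
    thresholds = [(0.2, 5), (0.3, 4), (0.4, 3), (0.5, 2), (0.6, 1)]
    return next((score for cutoff, score in thresholds if mdes < cutoff), 0)

def attrition_score(attrition):
    thresholds = [(0.10, 5), (0.20, 4), (0.30, 3), (0.40, 2), (0.50, 1)]
    return next((score for cutoff, score in thresholds if attrition < cutoff), 0)

def overall_rating(design, mdes, attrition, balance, validity):
    return min(design, power_score(mdes), attrition_score(attrition), balance, validity)

# A well-run RCT (design score 5) with MDES 0.25 and 12% attrition:
print(overall_rating(design=5, mdes=0.25, attrition=0.12, balance=5, validity=5))  # -> 4
```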
Slide 9: Background
Slide 10: Oxford Improving Numeracy and Literacy (padlock rating criteria table repeated from slide 8)
Slide 11: Act, Sing, Play (padlock rating criteria table repeated from slide 8)
Slide 12: Team Alphie (padlock rating criteria table repeated from slide 8)
Slide 13: Problems: power
– MDES at baseline
– MDES changes
– Confusion with p-values and CIs:
  – Effect bigger than MDES! E.g. Calderdale: ES = 0.74, MDES < 0.5
  – P-value < 0.05! E.g. Butterfly Phonics: ES = 0.43, p < 0.05, MDES > 0.5

Rating | 2. Power (MDES)
5 | < 0.2
4 | < 0.3
3 | < 0.4
2 | < 0.5
1 | < 0.6
0 | > 0.6
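The "p < 0.05 despite a high MDES" case has a simple arithmetic explanation: the MDES is the effect detectable with 80% power, while statistical significance only requires the observed effect to exceed roughly 1.96 standard errors, about 70% of the MDES. A minimal sketch, assuming an individually-randomised two-arm trial with equal arms, no covariate adjustment, and invented pupil numbers:

```python
# Why an observed effect can be significant (p < 0.05) yet smaller than the
# MDES quoted in the padlock rating. Numbers are illustrative, not from any
# EEF evaluation.
from scipy.stats import norm

n_treat, n_control = 275, 275                      # e.g. 550 pupils split evenly
se = (1 / n_treat + 1 / n_control) ** 0.5          # approx. SE of a standardised effect size

# MDES at 80% power, two-sided alpha = 0.05
multiplier = norm.ppf(0.975) + norm.ppf(0.80)      # approx. 2.80
mdes = multiplier * se

# Smallest effect that would reach p < 0.05
critical_effect = norm.ppf(0.975) * se             # approx. 1.96 * se, about 70% of the MDES

print(f"MDES (80% power): {mdes:.2f}")
print(f"Effect needed for p < 0.05: {critical_effect:.2f}")
# Any observed effect between these two values is significant yet below the
# MDES, so "ES < MDES" does not contradict "p < 0.05".
```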
Slide 14: Problems: attrition
– Calculated overall at the level of randomisation
– 10% of pupils off school each day
– Disadvantages individually-randomised trials:
  – Act, Sing, Play (pupil-randomised): 0% attrition at school or class level, 10% at pupil level
  – Oxford Science (school-randomised): 3% attrition at school level, 16% at pupil level
– Are the levels right?

Rating | 3. Attrition
5 | < 10%
4 | < 20%
3 | < 30%
2 | < 40%
1 | < 50%
0 | > 50%
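A minimal sketch of how the level at which attrition is calculated changes the headline figure; the school and pupil counts below are invented for illustration and are not taken from the Act, Sing, Play or Oxford Science evaluations.

```python
# How the level of calculation changes the headline attrition figure.
# Counts are hypothetical.

def attrition(randomised, analysed):
    return 1 - analysed / randomised

# School-randomised trial: few whole schools drop out, but more pupils miss the test
schools_randomised, schools_analysed = 60, 58
pupils_randomised, pupils_analysed = 3000, 2520

print(f"School-level attrition: {attrition(schools_randomised, schools_analysed):.0%}")  # ~3%
print(f"Pupil-level attrition:  {attrition(pupils_randomised, pupils_analysed):.0%}")    # ~16%
# A rating based only on the level of randomisation rewards this trial relative
# to a pupil-randomised trial with the same pupil-level attrition.
```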
Slide 15: Problems: testing
– Lots of testing administered by teachers
– Teachers rarely blinded to intervention status
– What is the threat to validity when effect sizes are small?

Rating | 5. Threats to validity
5 | No threats to validity
4 |
3 |
2 |
1 |
0 | Significant threats
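One way to see the scale of the problem: compare a hypothetical scoring bias from unblinded testers against a typical small effect size. Both numbers below are assumptions chosen for illustration, not estimates from any EEF trial.

```python
# How much of a small effect could be accounted for by unblinded testers?
# Both figures are hypothetical, chosen only to illustrate the scale.

observed_effect = 0.10      # a typical small effect size, in standard deviations
hypothetical_bias = 0.05    # assumed average scoring advantage when the tester
                            # knows the pupil received the intervention

share_explained = hypothetical_bias / observed_effect
print(f"Bias would account for {share_explained:.0%} of the observed effect")  # -> 50%
```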
Slide 16: Potential solution?
– Assess 'chance' as well as MDES in the padlock?
– Assess attrition at pupil level for all trials?
– Randomise invigilation of testing to assess bias?
– Number of pupils (number with intervention)?
– Confidence interval for months' progress?
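A minimal sketch of the last proposal: reporting a confidence interval in months' progress alongside the effect size. The conversion factor (one month per 0.05 standard deviations) is an illustrative approximation chosen so the point estimate matches the +2 months shown on the Background slides, not EEF's published conversion table.

```python
# Reporting a confidence interval in months' progress alongside the effect size.
# Assumption: a simple linear conversion of one month per 0.05 SD, for
# illustration only.

def months_progress(effect_size, sd_per_month=0.05):
    return effect_size / sd_per_month

es, ci_low, ci_high = 0.10, 0.03, 0.18   # e.g. the literacy intervention from the Background slides

print(f"Estimated progress: +{months_progress(es):.0f} months "
      f"(95% CI: +{months_progress(ci_low):.0f} to +{months_progress(ci_high):.0f})")
# -> Estimated progress: +2 months (95% CI: +1 to +4)
```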
Slide 17: Discussion
– Can p-values, confidence intervals, power, sample size, etc. be combined into a measure of 'chance'?
– What are the advantages and disadvantages of reporting confidence intervals alongside the security rating?
– Is it right to include all attrition in the security rating? What potential disadvantages are there?
– What is the most appropriate way to ensure unbiased testing?
– Would it be possible to conduct a trial across evaluations?