
EEF Evaluators’ Conference, 25th June 2015

Session 1: Interpretation / impact, 25th June 2015

Rethinking the EEF Padlocks
Calum Davey, Education Endowment Foundation, 25th June 2015

Overview
- Background
- Problems
  - Attrition
  - Power/chance
  - Testing
- Proposal
- Discussion

Background
- Summary of the security of evaluation findings
- ‘Padlocks’ developed in consultation with evaluators

Background
- Summary of the security of evaluation findings
- ‘Padlocks’ developed in consultation with evaluators

Group | Number of pupils | Effect size | Estimated months’ progress | Evidence strength
Literacy intervention | – | (0.03, 0.18) | +2 | –

Background
- Summary of the security of evaluation findings
- ‘Padlocks’ developed in consultation with evaluators
- Five categories, combined to create the overall rating (one possible combination rule is sketched after the tables):

Group | Number of pupils | Effect size | Estimated months’ progress | Evidence strength
Literacy intervention | – | (0.03, 0.18) | +2 | –

Rating | 1. Design | 2. Power (MDES) | 3. Attrition | 4. Balance | 5. Threats to validity
5 | Fair and clear experimental design (RCT) | < 0.2 | < 10% | Well-balanced on observables | No threats to validity
4 | Fair and clear experimental design (RCT, RDD) | < 0.3 | < 20% | |
3 | Well-matched comparison (quasi-experiment) | < 0.4 | < 30% | |
2 | Matched comparison (quasi-experiment) | < 0.5 | < 40% | |
1 | Comparison group with poor or no matching | < 0.6 | < 50% | |
0 | No comparator | > 0.6 | > 50% | Imbalanced on observables | Significant threats
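
A minimal sketch, in Python, of how the five criteria above might be combined into an overall padlock rating. The slide only says the categories are "combined to create the overall rating"; the combination rule below (take the lowest criterion rating, with imbalance or validity threats pulling the rating to zero) is an assumption for illustration, not the EEF's actual procedure, and all function and variable names are hypothetical.

```python
# Illustrative sketch of the padlock criteria, not EEF's actual implementation.
# The combination rule (take the minimum across criteria, with imbalance or
# validity threats pulling the rating to zero) is an assumption.

DESIGN_RATINGS = {
    "rct": 5,                  # fair and clear experimental design (RCT)
    "rdd": 4,                  # regression discontinuity design
    "well_matched_quasi": 3,   # well-matched comparison (quasi-experiment)
    "matched_quasi": 2,        # matched comparison (quasi-experiment)
    "poorly_matched": 1,       # comparison group with poor or no matching
    "no_comparator": 0,
}

# Threshold bands read straight off the table: (rating, "less than" bound).
MDES_BANDS = [(5, 0.2), (4, 0.3), (3, 0.4), (2, 0.5), (1, 0.6)]            # otherwise 0
ATTRITION_BANDS = [(5, 0.10), (4, 0.20), (3, 0.30), (2, 0.40), (1, 0.50)]  # otherwise 0


def band_rating(value, bands):
    """Return the highest rating whose '< threshold' condition the value satisfies."""
    for rating, upper in bands:
        if value < upper:
            return rating
    return 0


def padlock_rating(design, mdes, attrition, balanced=True, validity_threats=False):
    """Combine the five criteria into an overall 0-5 rating (assumed rule: minimum)."""
    criterion_ratings = [
        DESIGN_RATINGS[design],
        band_rating(mdes, MDES_BANDS),
        band_rating(attrition, ATTRITION_BANDS),
    ]
    if not balanced or validity_threats:
        return 0
    return min(criterion_ratings)


if __name__ == "__main__":
    # An RCT with MDES 0.25 and 15% attrition scores 4 under this assumed rule.
    print(padlock_rating("rct", mdes=0.25, attrition=0.15))
```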

Background

Oxford Improving Numeracy and Literacy (padlock criteria table as above)

Act, Sing, Play (padlock criteria table as above)

Team Alphie (padlock criteria table as above)

Problems: power
- MDES is calculated at baseline
- MDES changes (a back-of-the-envelope MDES calculation is sketched below)
- Confusion with p-values and CIs:
  - Effect bigger than the MDES! E.g. Calderdale: ES = 0.74, MDES < 0.5
  - p-value < 0.05! E.g. Butterfly Phonics: ES = 0.43, p < 0.05, MDES > 0.5

Rating | 2. Power (MDES)
5 | < 0.2
4 | < 0.3
3 | < 0.4
2 | < 0.5
1 | < 0.6
0 | > 0.6
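
A back-of-the-envelope MDES calculation for a two-arm, individually randomised trial, using the standard normal-approximation (Bloom-style) formula. It is a sketch under simplifying assumptions: cluster-randomised designs need intra-cluster correlation and design-effect terms that are omitted here, and the function name and defaults are illustrative rather than how EEF evaluators necessarily compute MDES.

```python
# Back-of-the-envelope MDES for a two-arm, individually randomised trial
# (normal approximation). Cluster designs would need ICC / design-effect terms.
from scipy.stats import norm


def mdes_individual(n_total, prop_treated=0.5, alpha=0.05, power=0.80, r_squared=0.0):
    """Minimum detectable effect size in standard-deviation units.

    n_total      -- total number of randomised pupils
    prop_treated -- proportion allocated to the intervention arm
    r_squared    -- outcome variance explained by baseline covariates
    """
    multiplier = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # ~2.80 at the defaults
    se_effect = ((1 - r_squared) / (prop_treated * (1 - prop_treated) * n_total)) ** 0.5
    return multiplier * se_effect


if __name__ == "__main__":
    # The MDES shrinks as the analysed sample grows, so an MDES fixed at baseline
    # can sit awkwardly beside the p-value or confidence interval at analysis.
    for n in (200, 500, 1000):
        print(n, round(mdes_individual(n, r_squared=0.5), 2))
```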

Problems: attrition
- Calculated overall, at the level of randomisation
- 10% of pupils are off school each day
- This disadvantages individually randomised trials:
  - Act, Sing, Play (pupil-randomised): 0% attrition at school or class level, 10% at pupil level
  - Oxford Science (school-randomised): 3% attrition at school level, 16% at pupil level
- Are the levels right? (see the sketch below)

Rating | 3. Attrition
5 | < 10%
4 | < 20%
3 | < 30%
2 | < 40%
1 | < 50%
0 | > 50%
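
A small illustration of why the level at which attrition is calculated matters. The school and pupil counts are invented; only the pattern (near-zero attrition at the randomised level, much higher attrition at pupil level) mirrors the slide's examples.

```python
# Hypothetical counts illustrating why the level at which attrition is
# calculated matters; only the pattern mirrors the slide's examples.

def attrition(randomised, analysed):
    """Proportion of randomised units lost by the time of the final analysis."""
    return 1 - analysed / randomised


# A school-randomised trial: almost no schools drop out,
# but many pupils are absent or unmatched at test time.
school_level = attrition(randomised=50, analysed=49)
pupil_level = attrition(randomised=1500, analysed=1260)

print(f"attrition at the level of randomisation (schools): {school_level:.0%}")
print(f"attrition at pupil level:                          {pupil_level:.0%}")
# Scoring the padlock on the 2% school figure, while a pupil-randomised trial is
# scored on its 10%+ pupil figure, is the asymmetry the slide questions.
```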

Problems: testing
- A lot of testing is administered by teachers
- Teachers are rarely blinded to intervention status
- What is the threat to validity when effect sizes are small? (see the toy simulation below)

Rating | 5. Threats to validity
5 | No threats to validity
0 | Significant threats
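
A toy simulation of the threat the slide raises: if unblinded teachers score intervention pupils slightly more generously, a modest marking bias can account for a large share of a small effect size. The true effect and bias sizes below are assumptions chosen purely for illustration.

```python
# Toy simulation: unblinded teachers score intervention pupils a little more
# generously. True effect and bias are assumptions, in standard-deviation units.
import numpy as np

rng = np.random.default_rng(2015)

n_per_arm = 1000
true_effect = 0.10      # assumed genuine effect
scoring_bias = 0.05     # assumed extra credit for intervention pupils

control = rng.normal(0.0, 1.0, n_per_arm)
treated = rng.normal(true_effect + scoring_bias, 1.0, n_per_arm)

pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
observed_es = (treated.mean() - control.mean()) / pooled_sd

print(f"observed effect size: {observed_es:.2f} "
      f"(true effect {true_effect}, scoring bias {scoring_bias})")
# When true effects are small, even a modest marking bias is a large share of the estimate.
```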

Potential solution?
- Assess ‘chance’ as well as MDES in the padlock?
- Assess attrition at pupil level for all trials?
- Randomise invigilation of testing to assess bias?
- Report the number of pupils (number with intervention)?
- Report a confidence interval for months’ progress? (see the sketch below)
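
A sketch of what reporting a confidence interval on the months' progress scale could look like. The linear conversion rate used here (roughly one month per 0.09 standard deviations) is an illustrative assumption, not the EEF's published effect-size-to-months conversion.

```python
# Converting an effect-size confidence interval to the months' progress scale.
# The linear rate (one month per ~0.09 SD) is an illustrative assumption,
# not the EEF's published conversion table.

MONTHS_PER_SD = 1 / 0.09   # assumed conversion rate


def months_progress(effect_size):
    return round(effect_size * MONTHS_PER_SD)


def months_interval(ci_lower, ci_upper):
    return months_progress(ci_lower), months_progress(ci_upper)


if __name__ == "__main__":
    # The Background slide's headline example reports +2 months with an effect-size
    # interval of (0.03, 0.18); converted, that interval reads roughly 0 to 2 months.
    low, high = months_interval(0.03, 0.18)
    print(f"estimated progress interval: {low} to {high} months")
```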

Discussion
- Can p-values, confidence intervals, power, sample size, etc. be combined into a measure of ‘chance’? (see the sketch below)
- What are the advantages and disadvantages of reporting confidence intervals alongside the security rating?
- Is it right to include all attrition in the security rating? What potential disadvantages are there?
- What is the most appropriate way to ensure unbiased testing?
- Would it be possible to conduct a trial across evaluations?
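
On the first question: under a normal approximation, the p-value, the confidence interval and the MDES are all functions of the same two quantities, the effect estimate and its standard error, which is what makes a single ‘chance’ summary conceivable. The sketch below shows this; the numbers are hypothetical, chosen to mirror the tension on the power slide (p < 0.05 yet MDES > 0.5), and it is not a proposed EEF metric.

```python
# Under a normal approximation, p-value, confidence interval and MDES are all
# functions of the effect estimate and its standard error. Numbers are hypothetical.
from scipy.stats import norm


def chance_summary(effect_size, std_error, alpha=0.05, target_power=0.80):
    z = effect_size / std_error
    p_value = 2 * (1 - norm.cdf(abs(z)))
    half_width = norm.ppf(1 - alpha / 2) * std_error
    ci_95 = (effect_size - half_width, effect_size + half_width)
    mdes = (norm.ppf(1 - alpha / 2) + norm.ppf(target_power)) * std_error
    return {"p_value": p_value, "ci_95": ci_95, "mdes": mdes}


if __name__ == "__main__":
    # A hypothetical trial where the estimate is 'significant' (p < 0.05) even though
    # the standard error implies an MDES above 0.5.
    print(chance_summary(effect_size=0.43, std_error=0.20))
```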