How do we know what works? Robert Coe ResearchEd, London, 5 Sept 2015.

How do we know what works?
- Progress in evidence-based education
- Defining ‘what works’
- The case for RCTs
- Some standard objections
- When ‘what works’ doesn’t work
- Practical implications

How far have we come?

1999
- Very few UK education researchers had done RCTs
- Dominant view: you can’t (or shouldn’t) do RCTs in education
- Very limited policy interest in robust evaluation

2015
- Growing, sustainable body of UK researchers with education RCT expertise
- EEF funding changed those views
- Policy interest excellent in parts

To claim something ‘works’:
- Is there a choice between two (or more) plausible options?
  - Well-defined (including how to implement)
  - Repeatable, generalisable, transferable
  - Feasible, acceptable, equipoise
- Can we agree what outcome(s) are important?
  - Value judgements resolved or explicit
  - Valid measurement process
- Is there rigorous systematic evidence to support one choice?
  - Systematic review
  - Overall average difference and ‘moderators’ (see the pooled-average sketch below)
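
The ‘overall average difference’ from a systematic review is typically an inverse-variance weighted mean of trial effect sizes, optionally split by a moderator. A minimal sketch in Python, using entirely made-up effect sizes, standard errors and a hypothetical ‘school phase’ moderator (not figures from any real review):

```python
# Minimal fixed-effect pooling sketch: inverse-variance weighted average of
# standardised effect sizes, plus a simple moderator split. All numbers invented.
import numpy as np

d  = np.array([0.10, 0.25, 0.05, 0.30, 0.15, 0.20])   # effect sizes from 6 trials
se = np.array([0.08, 0.10, 0.06, 0.12, 0.09, 0.07])   # their standard errors
phase = np.array(["primary", "primary", "secondary",
                  "primary", "secondary", "secondary"])

w = 1 / se**2                                # inverse-variance weights
pooled = np.sum(w * d) / np.sum(w)           # overall average difference
pooled_se = np.sqrt(1 / np.sum(w))
print(f"Pooled effect: {pooled:.2f} (SE {pooled_se:.2f})")

# 'Moderator': does the pooled effect differ by school phase?
for g in ("primary", "secondary"):
    m = phase == g
    print(g, round(float(np.sum(w[m] * d[m]) / np.sum(w[m])), 2))
```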

In a ‘research-based’ profession:

“Professionals would, for some decisions they need to take, be able to access and understand high-quality evidence that particular courses of action would be likely to lead to better outcomes than others.” (Coe)

“Professionals would, for the majority of decisions they need to take, be able to find and access credible research studies that provided evidence that particular courses of action would, implemented as directed, be substantially more likely to lead to better outcomes than others.” (Wiliam, 2014)

From Corder et al. (2015), International Journal of Behavioral Nutrition and Physical Activity

Appropriately cautious claims:
- “An extra hour of screen time was associated with 9.3 (−14.3, −4.3) fewer [GCSE] points”
- “it would be impossible to tell whether reductions in screen time caused an increase in academic performance without a randomised controlled trial”

But also some implicit causal claims:
- “Screen time was associated with lower academic performance, suggesting that strategies to limit screen behaviours among adolescents may benefit academic performance”

Media quotes:
- But even if pupils spent more time studying, more time spent watching TV or online still harmed their results, the analysis suggested.
- “We believe that programmes aimed at reducing screen time could have important benefits for teenagers’ exam grades, as well as their health,” said Dr Van Sluijs.
- “We found that TV viewing, computer games and internet use were detrimental to academic performance”

Is screen time the cause of poorer GCSEs?
- A statistical association can be evidence for a causal relationship
  - if other explanations for the relationship have been systematically generated, tested and discredited
  - e.g. smoking and cancer
- But even high correlations, sophisticated models and ‘strong’ controls do not guarantee this (see the simulation sketch below)
  - Coe (2009): “What appeared to the original researchers to be substantial and unequivocal causal effects were reduced to tiny and uncertain differences when the effects of plausible unobserved differences were taken into account.”
- In this study:
  - No control for any prior cognitive measure
  - Weak control for SES (IMD from LSOAs)
  - Many obvious alternative explanations
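
The Coe (2009) point can be shown with a toy simulation. This is a hypothetical sketch, not a reanalysis of Corder et al.: it assumes a single invented unobserved variable (prior attainment) drives both screen time and GCSE points, so a naive regression shows a large negative ‘effect’ of screen time even though the simulated causal effect is exactly zero.

```python
# Hypothetical confounding simulation: prior attainment (unobserved in practice)
# influences both screen time and GCSE points; screen time itself has NO effect.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
prior = rng.normal(0, 1, n)                        # unobserved prior attainment
screen = 3 - 0.8 * prior + rng.normal(0, 1, n)     # hours/day, higher if prior is low
gcse = 50 + 10 * prior + rng.normal(0, 5, n)       # points depend only on prior

# Naive regression of GCSE points on screen time (no controls)
X = np.column_stack([np.ones(n), screen])
naive = np.linalg.lstsq(X, gcse, rcond=None)[0]

# Same regression, controlling for the (normally unobserved) confounder
X2 = np.column_stack([np.ones(n), screen, prior])
adjusted = np.linalg.lstsq(X2, gcse, rcond=None)[0]

print(f"Naive slope:    {naive[1]:.1f} points per extra hour")    # strongly negative
print(f"Adjusted slope: {adjusted[1]:.1f} points per extra hour") # close to zero
```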

Is screen time the cause of poorer GCSEs?
- It is a meaningless question:
  - What are the well-defined, feasible, repeatable options for action?
- A question you could answer (see the sketch below):
  - Does intervention X to reduce the time 14-year-olds spend on non-educational screen time lead to increases in their GCSEs?
- Related questions:
  - Does X actually reduce screen time?
  - What support factors are required for it to work?
  - Does it work more/less with some groups?
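
A sketch of how the answerable question would actually be addressed: randomise pupils to a hypothetical intervention X or to business as usual, then compare mean GCSE points with a confidence interval. All numbers, including the assumed +2-point effect, are invented for illustration.

```python
# Simulated two-arm RCT of a hypothetical screen-time intervention X.
import numpy as np

rng = np.random.default_rng(3)
n_per_arm = 400
control = rng.normal(50, 10, n_per_arm)      # GCSE points, business as usual
treatment = rng.normal(52, 10, n_per_arm)    # assumes a +2 point true effect

diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / n_per_arm + control.var(ddof=1) / n_per_arm)
low, high = diff - 1.96 * se, diff + 1.96 * se
print(f"Estimated effect: {diff:.1f} points, 95% CI ({low:.1f}, {high:.1f})")
# Random allocation answers 'would B have improved anyway?' by design,
# without leaning on statistical control for confounders.
```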

Is the RCT a gold standard?

Claim: If you do A, it will improve B
Evidence: We did A and it improved B

1. Would B have improved anyway? (counterfactual)
2. Was it really A? (attribution)
3. Did B really improve? (interpretation)
4. Will it work again for me? (generalisation)

1. Would B have improved anyway? (counterfactual)
- Was there an equivalent, randomly allocated comparison group?
  - Randomisation done properly?
  - Beware attrition
- Was there a comparison group, equivalent on observed measures?
  - Quality of the measures?
  - Quantity of the measures (including repeated measures)?
  - Unobserved differences (e.g. enthusiasm, choice)?
- Was there a non-equivalent comparison group?
  - Select an overlapping subset (propensity score matching; see the sketch below)
  - Statistical ‘control’ is problematic
- If no direct comparison: impossible to interpret
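
A minimal sketch of the ‘overlapping subset’ idea: 1-to-1 nearest-neighbour matching on an estimated propensity score. The data, covariates and selection model are all invented, scikit-learn is assumed to be available, and a real analysis would add caliper rules and balance checks.

```python
# Toy propensity score matching: units that opt in differ systematically,
# so match each treated unit to the untreated unit with the closest score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2_000
prior = rng.normal(0, 1, n)                  # observed covariate: prior score
fsm = rng.binomial(1, 0.3, n)                # observed covariate: FSM eligibility
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * prior - 0.5 * fsm))))

X = np.column_stack([prior, fsm])
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

t_idx = np.where(treated == 1)[0]
c_idx = np.where(treated == 0)[0]
# Nearest untreated neighbour on the propensity score (with replacement)
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

print("Matched pairs:       ", len(t_idx))
print("Mean |PS difference|:", np.abs(ps[t_idx] - ps[matches]).mean().round(3))
```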

2. Was it really A? (attribution)
- Could the process of being observed or involved in an experiment have caused it (reactivity/Hawthorne effects)?
- Could the involvement of the researcher or developer be a factor?
- Contamination: what did the control group do?
- Did they actually do A (faithfully)?
- Did other things change?

3. Did B really improve? (interpretation)
- Was the measure of B adequate?
  - Validity of the measure (biased, unreliable, misinterpreted)
  - Too narrow/broad
  - Ceiling/floor effects
  - Un-blinded judgements
- Was the ‘post-test’ timing too soon/late?
- Any attrition? (missing data, lost persons, units)
- Could it have been just chance? (statistical significance; see the simulation below)
- Was the reporting comprehensive and unbiased?
  - Data dredging
  - Selective reporting
  - Publication bias
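
A small simulation of the ‘just chance’ and data-dredging problems: with ten outcome measures and no true effect at all, roughly 1 - 0.95^10 ≈ 40% of trials will still show at least one ‘significant’ result. The trial sizes and number of outcomes are arbitrary assumptions, and SciPy is assumed for the t-tests.

```python
# With many outcomes and selective reporting, 'significant' results appear by
# chance alone. Both arms are drawn from the SAME population (no real effect).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
trials, n_per_arm, n_outcomes = 1_000, 50, 10
at_least_one = 0
for _ in range(trials):
    control = rng.normal(0, 1, (n_per_arm, n_outcomes))
    treatment = rng.normal(0, 1, (n_per_arm, n_outcomes))
    pvals = ttest_ind(treatment, control).pvalue    # one p-value per outcome
    at_least_one += int(np.any(pvals < 0.05))

print(f"'Significant' on at least one outcome: {at_least_one / trials:.0%}")
```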

4. Will it work again for me? (generalisation)
- Representativeness
  - Context (including support factors)
  - Population (achieved, not just intended)
- Intervention not specified or replicable
- Will it still work at a large scale?

Claim: If you do A, it will improve B
Evidence: We did A and it improved B

1. Would B have improved anyway? (counterfactual)
2. Was it really A? (attribution)
3. Did B really improve? (interpretation)
4. Will it work again for me? (generalisation)

Does the RCT help?

Some standard objections to RCTs
- Causation
  - The social world is too complex / humans are free agents
- Values
  - Positivism requires an objective, value-free stance
- Generalisation
  - Every context is unique
- Too hard
  - Problems with RCTs: clustering, power, file-drawer, wrong questions, moderators, wrong outcomes, etc. (see the clustering/power sketch below)
- Not my thing
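
A back-of-envelope sketch of the clustering and power objection, using the standard design effect 1 + (m - 1) * ICC and a common approximation for the minimum detectable effect size (80% power, 5% two-sided alpha). The cluster size, ICC and recruitment figures are illustrative assumptions, not from any named trial.

```python
# Clustering shrinks the effective sample, so a 'large' trial can be underpowered.
import math

n_pupils_per_arm = 1_000    # pupils recruited per arm
cluster_size = 25           # pupils per class (m)
icc = 0.15                  # assumed intra-cluster correlation for attainment data

design_effect = 1 + (cluster_size - 1) * icc
effective_n = n_pupils_per_arm / design_effect
print(f"Design effect: {design_effect:.2f}")
print(f"Effective sample per arm: {effective_n:.0f} of {n_pupils_per_arm} pupils")

# MDES ~= (1.96 + 0.84) * sqrt(2 / effective_n) for a two-arm comparison
mdes = (1.96 + 0.84) * math.sqrt(2 / effective_n)
print(f"Minimum detectable effect size: about {mdes:.2f} standard deviations")
```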

When ‘what works’ doesn’t work …

What should have worked
- Durham Shared Maths (EEF)
- California class-size reduction (Cartwright & Hardie, 2012)
- Scale-up (Slavin & Smith, 2008)
- AfL (Assessment for Learning)

What should you do?
- Don’t ignore the evidence just because it is imperfect: understand the limitations and help to improve it
- Simple, superficial knowledge of research evidence may not improve decision making: deep, integrated understanding is required
- Routinely monitor the effectiveness of your practice
- Evaluate the impact of any changes you make

‘Four Aces’ from rEdScot: Experience, Data, Feedback, Research