Building Evidence in Education: Conference for EEF Evaluators 11th July: Theory 12th July: Practice www.educationendowmentfoundation.org.uk.

Presentation transcript:

Building Evidence in Education: Conference for EEF Evaluators 11th July: Theory 12th July: Practice www.educationendowmentfoundation.org.uk

The EEF by numbers: £200m estimated spend over lifetime of the EEF; 1,800 schools participating in projects; 33 topics in the Toolkit; 300,000 pupils involved in EEF projects; 11 members of EEF team; 16 independent evaluation teams; 3,000 heads presented to since launch; 56 projects funded to date.

Research Design Stephen Gorard s.a.c.gorard@durham.ac.uk http://www.evaluationdesign.co.uk/

Outline of a full cycle of research

A model of causation in social science. Association: For X (a possible cause) and Y (a possible effect) to be in a causal relationship they must be repeatedly associated. This association must be strong and clearly observable. It must be replicable, and it must be specific to X and Y. Sequence: X and Y must proceed in sequence. X must always precede Y (where both appear), and the appearance of Y must be safely predictable from the appearance of X. Intervention: It must have been demonstrated repeatedly that an intervention to change the strength or appearance of X strongly and clearly changes the strength or appearance of Y. Explanatory mechanism: There must be a coherent mechanism to explain the causal link. This mechanism must be the simplest available without which the evidence cannot be explained. Put another way, if the proposed mechanism were not true then there must be no simpler or equally simple way of explaining the evidence for it.

Red herrings and real problems. Some reflections on the evaluation of Aimhigher http://www.heacademy.ac.uk/assets/documents/aim_higher/Aspire-Reflections_on_evaluation_of_Aimhigher.doc In an influential review of Widening Participation (WP) research written for the HEFCE and published in July 2006, Gorard et al (2006) harshly criticised the evaluation of WP initiatives. In their view, to date no convincing evidence of impact has been produced on pre-entry interventions for school pupils and partnership-based interventions, such as Aimhigher. Gorard et al’s criticisms were addressed by the HEFCE in another review of WP research published later in the same year, in November 2006, and based on a survey of the evidence collected by the HEIs. It reasserted the value [of Aimhigher and other WP initiatives] as a monitoring and evaluating device and emphasised that, to date, attitudes of learners and teachers have been consistently and overwhelmingly positive. HEFCE feels satisfied that convincing and precise evidence has been produced on attainment by the national evaluation carried out by the National Foundation for Educational Research (NFER), and, to a lesser extent, on HE participation by the NFER and the HEIs. For example, it has been found that participating in Aimhigher activities was associated with ‘[a]n average improvement of 2.5 points in GCSE total point scores’ and a ‘3.9 percentage point increase in Year 11 pupils intending to progress to HE’ (HEFCE 2006: 23). Moreover, ‘[i]f the ‘evidence bar’ is set too high’, the HEFCE (2006: 6-7) pointed out, ‘we run the risk of discouraging any attempt to estimate the effectiveness of the interventions’. There seems no scope for setting up a social science experiment in which the experiences of a WP group are compared with those of a control group.

Session 1: Part 2: Trial design (45 mins.) Professor David Torgerson Director, York Trials Unit, University of York david.torgerson@york.ac.uk Professor Carole Torgerson School of Education, Durham University carole.torgerson@durham.ac.uk

2008 Palgrave Macmillan

Key design issues: independent concealed randomisation; type of randomisation; types of trials; sample size; regression discontinuity design.

Independent concealed randomisation. One of the most important issues is the need to undertake independent allocation. Many methodological studies have shown that unless the randomisation is undertaken by someone with no stake in the trial results, there is a serious risk of bias. In health trials this is the source of bias with the most supporting evidence.

Subversion of a health RCT

Hewitt et al. BMJ;2005:.

Type of randomisation: simple or restricted? Simple randomisation is similar to tossing a coin. Advantages: difficult to go wrong; with large samples (n > 100) and combined with ANCOVA it is efficient. Disadvantages: for small samples it can produce imbalance and inefficiency in analysis. Restricted randomisation ensures better balance. Advantages: gets better balance and is more efficient for small samples. Disadvantages: more complicated; can go wrong.

Restricted allocation. Minimisation: not strictly randomisation; uses an algorithm to ensure balance in covariates. Stratified: using blocks of repeating allocations produces balance on 1 or 2 variables. Matched pairs: matches units (e.g., schools) and allocates one to each group; can reduce power in some cases and has other disadvantages.
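The contrast between simple and blocked (restricted) allocation can be sketched in a few lines of Python. This is an illustration only, not the procedure used in any EEF trial: the function names, the block size of 4, the sample of 20 units and the seed are all assumptions chosen for the example.

```python
import random

def simple_randomisation(n_units, rng):
    """Allocate each unit independently, like tossing a fair coin."""
    return [rng.choice(["intervention", "control"]) for _ in range(n_units)]

def blocked_randomisation(n_units, block_size, rng):
    """Allocate in shuffled blocks, each holding equal numbers per arm."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    allocations = []
    while len(allocations) < n_units:
        block = ["intervention", "control"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_units]

# Hypothetical seed; fixing it lets an independent person reproduce and audit the list.
rng = random.Random(2013)
simple = simple_randomisation(20, rng)
blocked = blocked_randomisation(20, 4, rng)

print("simple :", simple.count("intervention"), "intervention of", len(simple))
print("blocked:", blocked.count("intervention"), "intervention of", len(blocked))
```

With 20 units and blocks of 4, the blocked list always contains exactly 10 units per arm, whereas the simple list can drift away from an even split. In a real trial the list would be generated and held by someone independent of recruitment.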

Discussion (5 mins.) Discuss how randomisation was undertaken in your EEF trial(s) and note whether this was independent and concealed, and whether it was restricted. If so, what method was used?

Types of trial. Individual randomisation: most powerful design for a given sample size. Cluster design: randomises groups of individuals (classes; schools; periods of time; geographical areas). Stepped wedge: a type of cluster design; randomises the order of implementation so all schools eventually receive the intervention.

Individual allocation. Appropriate when it is possible to separate intervention and control conditions. The DISCOVER summer school evaluation uses individual randomisation, as control children cannot gain access to the intervention. Many educational interventions are delivered at class or school level, so individual allocation cannot be used.

Variations on a theme. Factorial designs: two trials for the price of one. Unequal allocation: when the sample size is fixed, equal allocation is best; when costs are fixed, unequal allocation is best. DISCOVER is using unequal allocation for the intervention to ensure efficient use of summer school resources.
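The trade-off between equal and unequal allocation can be made concrete with two standard results, sketched below under the assumption of equal outcome variance in both arms. For a fixed total sample, a k:1 allocation has efficiency 4k/(1+k)^2 relative to 1:1; for a fixed budget, the variance-minimising ratio of control to intervention units is the square root of the cost ratio. The function names and the costs are hypothetical and are not taken from DISCOVER.

```python
import math

def relative_efficiency(ratio):
    """Efficiency of a ratio:1 allocation relative to 1:1 at the same total sample size."""
    return 4 * ratio / (1 + ratio) ** 2

def optimal_control_to_intervention_ratio(cost_intervention, cost_control):
    """Ratio n_control / n_intervention that minimises variance for a fixed budget."""
    return math.sqrt(cost_intervention / cost_control)

for ratio in (1, 2, 3):
    print(f"{ratio}:1 allocation -> relative efficiency {relative_efficiency(ratio):.3f}")

# Hypothetical costs: an intervention place costs 4 times as much as a control place.
best_ratio = optimal_control_to_intervention_ratio(cost_intervention=400, cost_control=100)
print("controls per intervention pupil:", best_ratio)
```

The loss from moving to 2:1 at a fixed sample size is modest, which is why unequal allocation can be attractive when one arm is much more expensive than the other.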

Individual RCT: key points. Trial registration; pre-test BEFORE randomisation; independent allocation; spill-over/contamination must not exceed 30%, or cluster allocation is more efficient; post-testing done blindly or in exam conditions, with marking done blindly; primary outcome specified before analysis; statistical analysis plan written and approved before data are examined.

Cluster allocation. More complex to design than an individual RCT. Many educational interventions need to use cluster allocation. Cluster allocation usually avoids contamination and can make intervention delivery logistically easier.

Cluster allocation: additional key points. Small number of clusters, so usually need to use restricted randomisation. Need to recruit participants and pre-test BEFORE cluster allocation. Teachers must be linked to class BEFORE randomisation. Analysis and sample size need to take clustering into account. Better to have a large number of clusters with small numbers per cluster than few clusters with large numbers.
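Why many small clusters beat few large ones can be illustrated with the usual design effect, 1 + (m - 1) x ICC, where m is the cluster size and ICC the intra-cluster correlation. The sketch below assumes equal cluster sizes; the ICC of 0.15 and the two designs are hypothetical values chosen for illustration.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent pupils the clustered sample is 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

ICC = 0.15  # hypothetical intra-cluster correlation

# Both designs test 1,000 pupils in total.
many_small = effective_sample_size(n_clusters=40, cluster_size=25, icc=ICC)
few_large = effective_sample_size(n_clusters=10, cluster_size=100, icc=ICC)

print(f"40 clusters of 25 : effective n = {many_small:.0f}")
print(f"10 clusters of 100: effective n = {few_large:.0f}")
```

Under these assumptions the design with 40 clusters retains more than three times the effective sample of the design with 10 clusters, even though the number of pupils tested is the same.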

Variations on a theme. What level of randomisation? Pupil > class > year > school. Balanced design: an efficient design is a balanced approach, in which Year 7 gets the intervention in half the schools and Year 8 gets the intervention in the other schools, with each school’s adjacent year acting as control. Alternatively, Year 7s in intervention schools get a literacy intervention and Year 7s in control schools get maths. Split plot: cluster-level allocation followed by individual randomisation; a form of factorial. The Exeter evaluation is using a partial split plot.

Stepped wedge. A form of cluster design, which may be more efficient than a standard cluster design. If we have 12 schools, all are pre-tested; 4 are randomised to receive the intervention for the first 6 months and all are tested; another 4 are then given the intervention and all are tested; the final 4 are given the intervention and all are tested. Requires testing at every point.
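The 12-school example can be sketched as a randomised roll-out schedule. This is a minimal illustration: the school labels, the seed and the function name are hypothetical, and the code assumes three equal waves of 4 schools with a testing point after each wave.

```python
import random

def stepped_wedge_schedule(schools, n_steps, rng):
    """Randomise the order in which schools start the intervention."""
    if len(schools) % n_steps != 0:
        raise ValueError("schools must divide evenly across steps")
    order = schools[:]
    rng.shuffle(order)
    per_step = len(order) // n_steps
    # Value = first testing point at which the school has received the intervention.
    return {school: 1 + i // per_step for i, school in enumerate(order)}

schools = [f"School {i:02d}" for i in range(1, 13)]  # hypothetical labels
start_period = stepped_wedge_schedule(schools, n_steps=3, rng=random.Random(12))

n_periods = 4  # pre-test (period 0) plus one testing point after each of the 3 steps
for school in sorted(start_period, key=lambda s: (start_period[s], s)):
    row = ["I" if period >= start_period[school] else "C" for period in range(n_periods)]
    print(school, " ".join(row))
```

Each printed row shows a school's status at the four testing points: every school is a control (C) at pre-test and has received the intervention (I) by the final test, which is why testing is needed at every point.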

Discussion (5 mins.) Discuss the trial designs that have been used and the challenges associated with them.

Sample size calculation. Most interventions will not work very well. Effect sizes of 0.20 to 0.30: likely. Effect sizes of 0.30 to 0.50: unusual. Effect sizes > 0.50: very unlikely. Need large sample sizes to detect modest differences. Example: 512 for 0.25; 800 for 0.20 (not clustered design). A powerful covariate can reduce this: a 0.70 correlation reduces sample size by 50%.

How to do it? Free programmes online: PSPower; Optimal Design Software. In your head (back of envelope) using an approximation formula (i.e., 32 / effect size squared). When the sample size is fixed: still good practice to estimate the likelihood of detecting a difference.
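The back-of-envelope rule can be written out as a short function. The sketch below assumes the rule gives the total sample across both arms of an individually randomised trial, on the conventional basis of roughly 80% power at a two-sided 5% significance level, and that a pre-test covariate with correlation r to the outcome scales the requirement by (1 - r^2). The function name is hypothetical.

```python
import math

def total_sample_size(effect_size, covariate_correlation=0.0):
    """Approximate total sample size (both arms) from the 32 / d^2 rule of thumb."""
    base = 32 / effect_size ** 2
    # A covariate correlated r with the outcome leaves (1 - r^2) of the variance.
    adjusted = base * (1 - covariate_correlation ** 2)
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(adjusted, 6))

for d in (0.20, 0.25, 0.50):
    print(f"effect size {d:.2f}: total n = {total_sample_size(d)}")

print("effect size 0.25 with r = 0.70 covariate:", total_sample_size(0.25, 0.70))
```

For effect sizes of 0.25 and 0.20 this reproduces the figures of 512 and 800, and a covariate correlation of 0.70 leaves about 51% of the unadjusted requirement, in line with the 50% reduction quoted. Clustered designs need a further adjustment for the design effect.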

Pilot trials: sample size. A modelling study suggests that a pilot with 10% of the main study’s sample will produce a one-sided 80% confidence interval that will include the ‘true’ estimate if it exists. Cocks K, Torgerson DJ. Sample size calculations for pilot randomised trials: a confidence interval approach. Journal of Clinical Epidemiology 2013;66:197-201.

Discussion Discuss how sample size calculations were undertaken and whether sample sizes are large enough to detect modest differences between groups.

Regression discontinuity. Theoretically, the most robust non-randomised approach is the RD design. It has been rediscovered several times since Thistlethwaite and Campbell first described it in the 1960s.

What is it? Regression discontinuity, sometimes known as risk-based cut-off design, selects people into a group on the basis of a measurable continuous variable, for example age, test scores, waiting list position, or income.

How does it work? Having selected on a pre-test variable, we then correlate post-test outcomes with the pre-test variable and test to see if there is an interruption, break or discontinuity in the regression line at the cut-off.
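The idea of testing for a break in the regression line can be sketched on simulated data. Everything here is an assumption for illustration: the cut-off of 50, the true effect of 5 points, the noise level and the linear model post = b0 + b1 x (pre - cutoff) + b2 x treated. The coefficient b2 estimates the size of the discontinuity. A real analysis would also check the functional form and report standard errors.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Simulated data: pupils scoring below the cut-off receive the intervention.
CUTOFF, TRUE_EFFECT = 50.0, 5.0  # hypothetical values
rng = random.Random(2004)
rows, post = [], []
for _ in range(400):
    pre = rng.uniform(0, 100)
    treated = 1.0 if pre < CUTOFF else 0.0
    rows.append([1.0, pre - CUTOFF, treated])
    post.append(20 + 0.8 * (pre - CUTOFF) + TRUE_EFFECT * treated + rng.gauss(0, 1))

intercept, slope, jump = fit_ols(rows, post)
print(f"estimated discontinuity at the cut-off: {jump:.2f}")
```

If the treatment were ineffective, the estimated jump would be close to zero and the regression line would pass through the cut-off without a break.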

Effective treatment

Ineffective treatment

Do summer schools work? Some states in the USA mandate summer schools for children who fall below a certain score in a high-stakes test. But will sending children off to have extra tuition during their summer break be effective? Because the children are chosen on the basis of a cut point on a quantitative scale, this is ideal RD territory. Jacob and Lefgren, Rev of Economics and Statistics, 2004,86:226-44.

Proportion treated by test scores

Treatment against outcomes

Evaluation of SHINE on secondaries. Randomised controlled trial design not possible. Regression discontinuity design with ‘tie-breaker randomisation’. Advantages of this design. Challenges of this design.