1. Statistical power in educational settings. Workshop at Wellcome seminar on educational research, May 2008. Dylan Wiliam, Institute of Education, University of London.

Presentation transcript:

1 Statistical power in educational settings Workshop at Wellcome seminar on educational research, May 2008 Dylan Wiliam Institute of Education, University of London

3 The argument…
Premise 1
• Learning is insensitive to instruction
• Measures of learning even more so
• So even small system-wide gains in learning are educationally important
Premise 2
• Education systems are inherently multi-levelled
• Taking account of clustering in data lowers statistical power
• Educational experiments are inherently weak
Conclusion
• RCTs in education frequently need to be very large, and therefore expensive

4 Learning is slow… [Figure. Source: Leverhulme Numeracy Research Programme]

5 …especially for deep learning… Hart, 1981

6 …and measures are insensitive… Sequential tests of educational progress (ETS, 1957)

7 …and measures are insensitive… NAEP TIMSS

8 …so small gains in learning are worthwhile
• Average rate of progress of cohorts is 0.3 standard deviations per year
• Average cost of one year’s education for a cohort in England is £3bn
• An effect size of 0.05 sd might be regarded as “small”
• But system-wide, is worth £6bn
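The slide does not show how the £6bn figure is reached. The sketch below is one reading that reproduces it: a gain of 0.05 sd is treated as a fraction of a year's normal progress, priced at the cost of a cohort-year, and multiplied across all cohorts in school at once. The figure of 12 cohorts is an assumption made here for illustration, not a number from the slide.

```python
# Figures taken from the slide.
ANNUAL_PROGRESS_SD = 0.3        # average progress of a cohort, in sd per year
COST_PER_COHORT_YEAR_BN = 3.0   # cost of one year's education for one cohort, in GBP bn
EFFECT_SIZE_SD = 0.05           # the "small" effect under discussion

# ASSUMPTION (not on the slide): about 12 year-cohorts are in school at any time.
ASSUMED_COHORTS_IN_SYSTEM = 12


def system_wide_value_bn(effect_sd, progress_sd_per_year, cost_per_cohort_bn, n_cohorts):
    """Value of an effect, priced as the cost of the equivalent teaching time."""
    years_of_progress = effect_sd / progress_sd_per_year   # 0.05 / 0.3 = 1/6 of a year
    value_per_cohort_bn = years_of_progress * cost_per_cohort_bn
    return value_per_cohort_bn * n_cohorts


value_bn = system_wide_value_bn(
    EFFECT_SIZE_SD, ANNUAL_PROGRESS_SD, COST_PER_COHORT_YEAR_BN, ASSUMED_COHORTS_IN_SYSTEM
)
print(f"Approximate system-wide value: £{value_bn:.1f}bn")
```

Under these assumptions the effect is worth about £0.5bn per cohort, so the system-wide total scales directly with the number of cohorts assumed.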

9 …but hard to detect…
Statistical power: the likelihood that a statistical test will reject a false null hypothesis
• Depends on
  • The level set for statistical significance
  • The size of the difference between compared groups (effect size)
  • The sensitivity of the measures
Clustering reduces statistical power, but is an inherent feature of educational settings, especially in school-wide interventions
• Teacher quality
• Ability grouping
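To make the definition of power concrete, here is a minimal sketch for the simplest case: a two-sided comparison of two equally sized groups of individually randomised pupils, using a normal approximation. The function names are illustrative, and the sketch also depends on sample size, which the slide's list leaves implicit. It ignores clustering and measurement sensitivity, so it should be read as an optimistic lower bound on the numbers needed, not as a design calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the true standardised mean difference (in sd units).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls in either rejection region.
    return _STD_NORMAL.cdf(noncentrality - z_crit) + _STD_NORMAL.cdf(-noncentrality - z_crit)


def n_per_group_needed(effect_size, power=0.80, alpha=0.05):
    """Smallest group size giving at least the requested power (same approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.2, 0.05):
        print(f"effect size {d}: about {n_per_group_needed(d)} pupils per group for 80% power")
```

Even before clustering is considered, an effect of 0.05 sd needs several thousand pupils per group at the conventional 5% significance level and 80% power.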

10 …especially in educational settings (Konstantopoulos, 2006)
p = #students
n = #classrooms
δ = effect size
ρ_c = classroom clustering
ρ_s = school clustering
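The slide's figure is not recoverable from the transcript, but the effect of two-level clustering on power can be sketched with the standard design-effect approximation for a balanced design in which whole schools are randomised. This is not necessarily the calculation used in Konstantopoulos (2006). The sketch assumes that the two clustering parameters are the shares of total outcome variance lying between classrooms within schools and between schools; the parameter names and the example values (0.1 for each) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(students_per_class, classes_per_school, rho_class, rho_school):
    """Variance inflation relative to individually randomised pupils.

    ASSUMPTION: rho_class and rho_school are the proportions of total variance
    between classrooms (within schools) and between schools, respectively.
    """
    pupils_per_school = students_per_class * classes_per_school
    return 1 + (students_per_class - 1) * rho_class + (pupils_per_school - 1) * rho_school


def clustered_power(effect_size, schools_per_arm, students_per_class,
                    classes_per_school, rho_class, rho_school, alpha=0.05):
    """Approximate two-sided power when schools are the unit of randomisation."""
    std_normal = NormalDist()
    pupils_per_arm = schools_per_arm * classes_per_school * students_per_class
    # Clustering shrinks the sample to an "effective" number of independent pupils.
    effective_n = pupils_per_arm / design_effect(
        students_per_class, classes_per_school, rho_class, rho_school
    )
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(effective_n / 2)
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 20 schools per arm, 4 classes of 25 pupils each.
    for rho in (0.0, 0.1):
        p = clustered_power(0.2, 20, 25, 4, rho_class=rho, rho_school=rho)
        print(f"clustering {rho}: power = {p:.2f}")
```

With no clustering the design effect is 1 and the result matches an individually randomised trial; any positive clustering lowers power for the same number of pupils, which is the slide's point about school-wide interventions.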

11 So… The most important question is not “Are RCTs good?” but “When are RCTs good?” How should we answer?

12 Institute of Education Sciences (USA)
Five goals
1. identify existing programs, practices, and policies that may have an impact on student outcomes and the factors that may mediate or moderate the effects of these programs, practices, and policies;
2. develop programs, practices, and policies that are theoretically and empirically based;
3. evaluate the efficacy of fully developed programs, practices, and policies;
4. evaluate the impact of programs, practices, and policies implemented at scale;
5. develop and/or validate data and measurement systems and tools.