
Dylan Wiliam (@dylanwiliam)
Why teaching isn’t—and probably never will be—a research-based profession (and why that’s a good thing)
www.dylanwiliamcenter.com
www.dylanwiliam.org

Outline
- What does it mean for a profession to be “research-based”?
- Why educational research falls short
- What educational research should do, and how it should do it
- The role of teachers (and researchers) in educational research

What does it mean to be research-based?
In a ‘research-based’ profession, professionals would, for the majority of decisions they need to take, be able to find and access credible research studies providing evidence that particular courses of action would, if implemented as directed, be substantially more likely to lead to better outcomes.
- This may be the case for aspects of medicine
- It is certainly not the case for teaching

Meta-analysis in education: “I think you’ll find it’s a bit more complicated than that” (Goldacre, 2008)

Understanding meta-analysis
A technique for aggregating results from different studies by converting empirical results to a common measure (usually effect size).
The standardized effect size is defined as the difference between the experimental-group mean and the control-group mean, divided by the (pooled) standard deviation.
Problems with meta-analysis:
- The “file drawer” problem
- Variations in intervention quality
- Variation in population variability
- Selection of studies
- Sensitivity of outcome measures
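
As a concrete illustration of the common measure, here is a minimal sketch in Python of computing a standardized effect size (Cohen's d with a pooled standard deviation) for a single study; the scores are invented purely for illustration:

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized effect size: difference in means divided by the pooled SD."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * stdev(treatment) ** 2 +
                  (n_c - 1) * stdev(control) ** 2) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5

# Hypothetical test scores for one study (illustrative numbers only)
treatment_scores = [72, 68, 75, 80, 71, 77, 69, 74]
control_scores = [65, 70, 66, 72, 63, 68, 71, 64]
print(f"Effect size d = {cohens_d(treatment_scores, control_scores):.2f}")
```

A meta-analysis then averages such values across studies, usually weighting each study by its precision; each of the problems listed above distorts either the individual d values or which studies get averaged.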

The file-drawer problem
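
A minimal simulation sketch of why the file-drawer problem matters (Python; the true effect, sample size, and selection rule are illustrative assumptions, not taken from any real literature): when only statistically significant results reach publication, the average published effect can be several times the true effect.

```python
import random
from math import sqrt, erfc
from statistics import mean, stdev

random.seed(2)

TRUE_EFFECT = 0.2    # assumed true standardized effect (illustrative)
N_PER_GROUP = 20     # a typical under-powered study
N_STUDIES = 2000

def run_study():
    """Simulate one small two-group study; return its effect size and p-value."""
    treat = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    ctrl = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    pooled_sd = sqrt((stdev(treat) ** 2 + stdev(ctrl) ** 2) / 2)
    d = (mean(treat) - mean(ctrl)) / pooled_sd
    z = d * sqrt(N_PER_GROUP / 2)     # normal approximation to the test statistic
    p = erfc(abs(z) / sqrt(2))        # two-sided p-value
    return d, p

results = [run_study() for _ in range(N_STUDIES)]
all_effects = [d for d, _ in results]
# Only significant positive results get "published"; the rest stay in the file drawer
published = [d for d, p in results if p < 0.05 and d > 0]

print(f"True effect:                   {TRUE_EFFECT:.2f}")
print(f"Mean effect, all studies:      {mean(all_effects):.2f}")
print(f"Mean effect, 'published' only: {mean(published):.2f} ({len(published)} of {N_STUDIES} studies)")
```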

The importance of statistical power
The statistical power of an experiment is the probability that it will yield an effect large enough to be statistically significant. This matters because significant results are much more likely to get published.
In single-level designs, power depends on:
- the significance level set (e.g., p<0.05)
- the magnitude of the effect
- the size of the experiment
The power of most social science experiments is low:
- Psychology: 0.4 (Sedlmeier & Gigerenzer, 1989)
- Neuroscience: 0.2 (Button et al., 2013)
- Education: 0.4
Only lucky experiments get published…
And that’s “Why most published research findings are false” (Ioannidis, 2005)
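
To make the power figures above concrete, here is a minimal sketch (Python, using the standard normal approximation for a two-sample comparison at the conventional p<0.05 threshold; the sample sizes are illustrative assumptions) of how power depends on effect size and experiment size:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(effect_size, n_per_group, z_crit=1.96):
    """Approximate power of a two-sample comparison at alpha = 0.05 (two-sided),
    using the normal approximation: power ~ P(Z > z_crit - d * sqrt(n / 2))."""
    noncentrality = effect_size * sqrt(n_per_group / 2.0)
    return 1.0 - normal_cdf(z_crit - noncentrality)

# A "typical" education effect size of 0.4 with different experiment sizes
for n in (20, 50, 100, 200):
    print(f"{n:>3} students per group: power ~ {approx_power(0.4, n):.2f}")
```

With 20 students per group the chance of detecting a genuine effect of 0.4 is only about one in four, which is why so many published findings in low-powered fields are the product of luck rather than robust effects.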

Significant (XKCD 2011)

Consequences of low statistical power
An fMRI scan of an Atlantic salmon, which was shown 15 photos of humans in social situations for 10 seconds each and asked to determine the emotion being displayed.
Voxelwise statistics on the salmon data were calculated through an ordinary least-squares estimation of the general linear model (GLM).
Areas of significant blood-oxygen-level changes (p<0.001) were found.
[Scan image with t-value colour scale, 2.5 to 4.5]
The salmon was not alive at the time of the scanning.
Bennett, Baird, Miller, and Wolford (2014)
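
The salmon’s “activation” is a multiple-comparisons artefact: test tens of thousands of voxels at p<0.001 and some will cross the threshold by chance alone. A minimal sketch (Python; the voxel count is an illustrative assumption) of how many false positives pure noise produces, with and without a Bonferroni correction:

```python
import random

random.seed(1)

N_VOXELS = 60_000   # illustrative number of voxels tested independently
ALPHA = 0.001       # the per-voxel threshold used in the salmon study

# Under the null hypothesis (no real signal anywhere), each voxel's p-value
# is uniformly distributed on [0, 1], so we can simulate the p-values directly.
p_values = [random.random() for _ in range(N_VOXELS)]

uncorrected = sum(p < ALPHA for p in p_values)
bonferroni = sum(p < ALPHA / N_VOXELS for p in p_values)

print(f"Voxels tested:                        {N_VOXELS}")
print(f"Expected false positives (alpha * N): {ALPHA * N_VOXELS:.0f}")
print(f"'Significant' voxels, uncorrected:    {uncorrected}")
print(f"'Significant' voxels, Bonferroni:     {bonferroni}")
```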

The reproducibility project
Attempt to replicate 100 studies published in three leading US psychology journals:
- Psychological Science
- Journal of Personality and Social Psychology
- Journal of Experimental Psychology: Learning, Memory, & Cognition
Replications conducted in collaboration with original researchers.
Results:
- 47% of results significant (97% for original results)
- 39% of experiments replicated
- Effect sizes on average half of those in original studies
Open Science Collaboration (2015)

Effect sizes

Variation in intervention quality

Quality
Interventions vary in their:
- Duration
- Intensity (e.g., class-size reduction by 20%, 30%, or 50%; response to intervention)
- Collateral effects (e.g., assignment of teachers)

Variation in variability

Annual growth in achievement, by age
A 50% increase in the rate of learning for six-year-olds is equivalent to an effect size of 0.76.
A 50% increase in the rate of learning for 15-year-olds is equivalent to an effect size of 0.1.
Bloom, Hill, Black, and Lipsey (2008)
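
The arithmetic behind those two figures follows from the annual-growth benchmarks the slide implies (a year of learning is worth roughly 1.52 standard deviations at age six but only about 0.2 at age 15); a minimal sketch of the conversion, with those implied values as the assumptions:

```python
# Annual growth in achievement, expressed as an effect size; values implied by
# the slide's figures (0.76 = 50% of 1.52, and 0.1 = 50% of 0.2)
annual_growth = {"six-year-olds": 1.52, "15-year-olds": 0.20}

RATE_INCREASE = 0.50  # a 50% increase in the rate of learning

for group, growth in annual_growth.items():
    print(f"{group}: 50% faster learning ~ effect size {RATE_INCREASE * growth:.2f}")
```

The same intervention therefore looks far more impressive, in effect-size terms, when it is evaluated with younger children, as the next slide notes.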

Variation in variability
- Studies with younger children will produce larger effect-size estimates
- Studies with restricted populations (e.g., children with special needs, gifted students) will produce larger effect-size estimates

Selection of studies

Feedback in STEM subjects
Review of 9000 papers on feedback in mathematics, science and technology; only 238 papers retained:
- Background papers: 24
- Descriptive papers: 79
- Qualitative papers: 24
- Quantitative papers: 111 (mathematics: 60; science: 35; technology: 16)
Ruiz-Primo and Li (2013)

Main findings
Characteristic of studies included                     Math    Science
Feedback treatment is a single event lasting minutes    85%       72%
Reliability of outcome measures                         39%       63%
Validity of outcome measures                            24%        3%
Dealing only or mainly with declarative knowledge       12%       36%
Schematic knowledge (e.g., knowing why)                  9%        0%
Multiple feedback events in a week                      14%       17%

Sensitivity to instruction

Sensitivity of outcome measures
Distance of assessment from the curriculum:
- Immediate: e.g., science journals, notebooks, and classroom tests
- Close: e.g., where an immediate assessment asked about the number of pendulum swings in 15 seconds, a close assessment asks about the time taken for 10 swings
- Proximal: e.g., if an immediate assessment asked students to construct boats out of paper cups, the proximal assessment would ask for an explanation of what makes bottles float
- Distal: e.g., where the assessment task is sampled from a different domain and the problem, procedures, materials and measurement methods differ from those used in the original activities
- Remote: e.g., standardized national achievement tests
Ruiz-Primo, Shavelson, Hamilton, and Klein (2002)

Impact of sensitivity to instruction
[Chart: effect sizes for assessments at “close” versus “proximal” distance from instruction]

Meta-analysis in education
Some problems are unavoidable:
- Selection of studies
- Sensitivity to instruction
Some problems are avoidable:
- File-drawer problem
- Variation in quality
- Variation in variability
Unfortunately, most of the people doing meta-analysis in education:
- don’t discuss the unavoidable problems, and
- don’t avoid the avoidable ones

Effects of feedback
Kluger and DeNisi (1996): review of 3000 research reports.
Excluding those:
- without adequate controls
- with poor design
- with fewer than 10 participants
- where performance was not measured
- without details of effect sizes
left 131 reports, 607 effect sizes, involving 23,663 observations on 12,652 individuals.
On average, feedback increases achievement.
Effect sizes highly variable: 38% (231 out of 607) of effect sizes were negative.

Getting feedback right is hard
                     Feedback indicates performance…
Response type        …falls short of goal       …exceeds goal
Change behavior      Increase effort            Exert less effort
Change goal          Reduce aspiration          Increase aspiration
Abandon goal         Decide goal is too hard    Decide goal is too easy
Reject feedback      Feedback is ignored
Kluger and DeNisi (1996)

Limitations of educational research
The naturalistic fallacy: “You can’t deduce an ‘ought’ from an ‘is’” (Hume, 1739)
Educational research can only tell you what was, not what might be:
- Ability grouping
- Homework
- Class-size reduction
- Teachers’ aides
- Teacher incentives

Limitations of educational research
In education, “What works?” is rarely the right question, because everything works somewhere and nothing works everywhere. The right question is, “Under what conditions does this work?”
All teachers and leaders need to be critical consumers of educational research.

Looking at educational research
The results:
- may not be reproducible
- may not be applicable to your context
- may solve a problem you don’t have
- may require resources you don’t have

What should educational research do?

What should educational research do? The purpose of educational research is the pursuit of knowledge that causes real, lasting changes not only in the way people think about learning and teaching, but also in how they act (Wiliam & Lester, 2008)

Approaches to research (Shotter, 1993)
(Monological) scientific rationalism:
- True knowledge begins in doubt and distrust
- Proper knowledge is found by following rules
- Knowledge is a possession, and an individual knower is in an ownership relation to that knowledge
- In justifying claims to knowledge there can be no appeal other than to reason
(Dialogical) communicative rationalism:
- The world can be studied only from involvement in it
- Knowledge of the world is practical-moral knowledge
- We embody, rather than own, knowledge

The knowledge-creating spiral Nonaka and Takeuchi (1995)

So, where should we focus our efforts?
Intervention                Plausible theory of action?   Evidence of actual impact?   Evidence likely in the future?
Brain gym                   No
Learning styles             Yes
Lesson study                Maybe
Educational neuroscience
Formative assessment

Combining scientific and communicative rationalism: A case study

Classroom formative assessment
Background:
- Reviews of research on formative assessment: Fuchs and Fuchs (1986); Natriello (1987); Crooks (1988); Black and Wiliam (1998a)
- Dissemination in professional journals: Black and Wiliam (1998b)
- Field experiments in implementation

The KMOFAP* project
- 24 (then 36, then 48) teachers working on developing their formative assessment practice
- Regular meetings to discuss progress
- “Polyexperiment” design:
  - Each teacher chose which class to work with
  - Each teacher suggested the best possible comparison group
  - Effect size calculated for each teacher
  - “Jack-knife” estimate of overall effect size (see the sketch below)
*King’s-Medway-Oxfordshire Formative Assessment Project
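
A minimal sketch (Python; the per-teacher effect sizes are invented purely for illustration, not the project’s data) of how a jack-knife estimate of an overall effect size works: recompute the mean leaving out one teacher at a time, and use the spread of those leave-one-out means to estimate a standard error for the overall figure:

```python
from statistics import mean

# Hypothetical per-teacher effect sizes (illustrative only, not KMOFAP data)
effect_sizes = [0.45, 0.12, 0.60, 0.33, 0.28, 0.51, 0.20, 0.38]

n = len(effect_sizes)
overall = mean(effect_sizes)

# Leave-one-out ("jack-knife") means: drop each teacher in turn
loo_means = [mean(effect_sizes[:i] + effect_sizes[i + 1:]) for i in range(n)]

# Jack-knife standard error of the overall estimate
jackknife_se = ((n - 1) / n * sum((m - mean(loo_means)) ** 2 for m in loo_means)) ** 0.5

print(f"Overall effect size: {overall:.2f}")
print(f"Jack-knife SE:       {jackknife_se:.2f}")
```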

Publication strategy
Technical outputs:
- Empirical results: Wiliam, Lee, Harrison and Black (2004)
- Theoretical frameworks: Wiliam and Thompson (2008); Black and Wiliam (2009)
Professional outputs:
- Books: Black, Harrison, Lee, Marshall and Wiliam (2003)
- Journal articles: Black, Harrison, Lee, Marshall and Wiliam (2004)
- Professional development resources: Wiliam and Leahy (2014)

The roles of teachers and researchers
The role of teachers:
- All teachers should be seeking to improve their practice through a process of ‘disciplined inquiry’, focusing on what we already know is effective but are not doing
- Some may wish to share their work with others
- Some may wish to write their work up for publication
- Some may wish to pursue research degrees
- Some may even wish to undertake research
The role of education researchers:
- Helping teachers, leaders and policymakers identify productive directions for developing practice
- Working with teachers to make their findings applicable in contexts other than the context of data collection

Thank You www.dylanwiliam.net

References
Bennett, C. M., Baird, A. A., Miller, M. B., & Wolford, G. L. (2014). Neural correlates of interspecies perspective taking in the post-mortem Atlantic salmon: An argument for multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1(1), 1-5.
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for learning: Putting it into practice. Buckingham, UK: Open University Press.
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2004). Working inside the black box: Assessment for learning in the classroom. Phi Delta Kappan, 86(1), 8-21.
Black, P. J., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles, Policy and Practice, 5(1), 7-74.
Black, P. J., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139-148.
Black, P. J., & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5-31.
Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1(4), 289-328.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, advance online publication. doi: 10.1038/nrn3475
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438-481.
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199-208.
Goldacre, B. (2008). Bad science. London, UK: Fourth Estate.
Hume, D. (1748). An enquiry concerning human understanding. London, UK: Andrew Millar.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Natriello, G. (1987). The impact of evaluation processes on students. Educational Psychologist, 22(2), 155-175.
Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company: How Japanese companies create the dynamics of innovation. New York, NY: Oxford University Press.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi: 10.1126/science.aac4716
Ruiz-Primo, M. A., & Li, M. (2013). Examining formative feedback in the classroom context: New research perspectives. In J. H. McMillan (Ed.), Sage handbook of research on classroom assessment (2nd ed., pp. 215-232). Thousand Oaks, CA: Sage.
Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic science education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369-393.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309-316. doi: 10.1037/0033-2909.105.2.309
Wiliam, D., & Leahy, S. (2014). Embedding formative assessment professional development pack. West Palm Beach, FL: Learning Sciences International.
Wiliam, D., Lee, C., Harrison, C., & Black, P. J. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles, Policy and Practice, 11(1), 49-65.
Wiliam, D., & Thompson, M. (2008). Integrating assessment with instruction: What will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 53-82). Mahwah, NJ: Lawrence Erlbaum Associates.
XKCD [Munroe, R.]. (2011, April 6). Significant. Retrieved December 9, 2014, from http://xkcd.com/882/