The effect of testing on student achievement: 1910-2010

Presentation transcript:

The effect of testing on student achievement: 1910-2010
Richard P. PHELPS
© 2012, Richard P. PHELPS
International Test Commission, 8th Conference, Amsterdam, July 2012

Meta-analysis
A method for summarizing a large research literature with a single, comparable measure.

The effect of testing on student achievement
- A 12-year-long study analyzed close to 700 separate studies and more than 1,600 separate effects.
- Another 2,000 studies were reviewed and found incomplete or inappropriate.
- Lacking sufficient time and money, hundreds of other studies were not reviewed.

Looking for studies to include in the meta-analyses
Included only those studies that found an effect of testing on student achievement or on teacher instruction…

Studies included in the meta-analyses
…when:
- a test is newly introduced or newly removed
- the quantity of testing is increased or reduced
- test stakes are introduced or increased, or removed or reduced

Studies included in the meta-analyses
3. …plus previous research summaries, e.g.:
- Kulik, Kulik, Bangert-Drowns, & Schwalb (1983-1991) on mastery testing, frequency of testing, and programs for high-risk university students
- Basol & Johanson (2009) on testing frequency
- Jaekyung Lee (2007) on cross-state studies
- W. J. Haynie (2007) on career-tech education

Number of studies and effects, by methodology type

Methodology type                           Studies   Effects
Quantitative                                 177       640
Surveys and opinion polls (US & Canada)      247       813
Qualitative                                  245       245
TOTAL                                        669     1,698

Effect size: Cohen's d

d = (Y_E - Y_C) / S_pooled

  Y_E = mean of the experimental group
  Y_C = mean of the control group
  S_pooled = pooled standard deviation
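The formula above can be sketched in Python. The function name is illustrative, and the degrees-of-freedom weighting used for the pooled standard deviation is one common convention, which the slide does not specify.

```python
import math

def cohens_d(mean_exp, mean_ctl, sd_exp, sd_ctl, n_exp, n_ctl):
    """Cohen's d: standardized difference between two group means."""
    # Pooled SD weights each group's variance by its degrees of
    # freedom (n - 1); this is one standard convention, assumed here.
    s_pooled = math.sqrt(((n_exp - 1) * sd_exp**2 + (n_ctl - 1) * sd_ctl**2)
                         / (n_exp + n_ctl - 2))
    return (mean_exp - mean_ctl) / s_pooled

# Example: experimental mean half a pooled SD above the control mean.
print(round(cohens_d(105.0, 100.0, 10.0, 10.0, 50, 50), 2))  # 0.5
```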

Effect size: Other formulae

d = t * ((n1 + n2) / (n1 * n2))^0.5
d = 2r / (1 - r²)^0.5
d = ((Y_E,post - Y_E,pre) - (Y_C,post - Y_C,pre)) / S_pooled,post
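These three conversions (from a t statistic, from a correlation r, and from pre/post group means) can be written as small helpers; the function names are illustrative.

```python
import math

def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to d."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def d_from_r(r):
    """Convert a correlation coefficient r to d."""
    return 2 * r / math.sqrt(1 - r**2)

def d_pre_post(ye_pre, ye_post, yc_pre, yc_post, s_pooled_post):
    """Difference-in-differences d: gain of the experimental group
    minus gain of the control group, over the pooled posttest SD."""
    return ((ye_post - ye_pre) - (yc_post - yc_pre)) / s_pooled_post
```

For example, with equal groups of 50 and t = 2.0, d_from_t gives 0.4; a correlation of r = 0.6 converts to d = 1.5.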

Effect size: Interpretation

d between 0.25 and 0.50 → weak effect
d between 0.50 and 0.75 → medium effect
d greater than 0.75 → strong effect

Quantitative studies (population coverage ≈ 7 million persons)

Quantitative studies: Effect size
- "Bare bones" calculation: d ≈ +0.55 …a medium effect
- Adjusted for measurement error: d ≈ +0.71 …a stronger effect
- Using same-study-author aggregation: d ≈ +0.88 …a strong effect
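One common way to adjust a "bare bones" effect size for measurement error is the Hunter-Schmidt attenuation correction, which divides d by the square root of the outcome measure's reliability. The slide does not state which correction or reliability was used; the 0.60 reliability below is a hypothetical value chosen so the example matches the reported magnitudes.

```python
import math

def correct_for_measurement_error(d, reliability):
    """Attenuation correction (Hunter-Schmidt style): divide d by
    the square root of the outcome measure's reliability."""
    return d / math.sqrt(reliability)

# Hypothetical reliability of 0.60: a bare-bones d of 0.55 rises
# to roughly 0.71, the size of adjustment reported above.
print(round(correct_for_measurement_error(0.55, 0.60), 2))  # 0.71
```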

Which predictors matter?

Treatment group…                                          Mean effect size
…is made aware of performance, and control group is not        +0.98
…receives targeted instruction (e.g., remediation)             +0.96
…is tested with higher stakes than control group               +0.87
…is tested more frequently than control group                  +0.85

More Moderators – Source of Test

Source of test          Studies   Mean effect size
Researcher or teacher      87          0.93
National                   24          0.87
Commercial                 38          0.82
State or district          11          0.72
TOTAL                     160

More Moderators – Sponsor of Test

Sponsor of test   Studies   Mean effect size
International        5          1.02
Local               99          0.93
National            45          0.81
State               11          0.64
TOTAL              160

More Moderators – Study Design

Study design                     Studies   Mean effect size
Pre-post                            12          0.97
Experiment, quasi-experiment       107          0.94
Multivariate                        26          0.80
Experiment, posttest only            7          0.60
Pre-post (with shadow test)          8          0.58
TOTAL                              160

More Moderators – Scale of Analysis

Scale of analysis   Studies   Mean effect size
Aggregated              9          1.60
Small-scale           118          0.91
Large-scale            33          0.57
TOTAL                 160

More Moderators – Scale of Administration

Scale of administration   Studies   Mean effect size
Classroom                    115          0.95
Mid-scale                      6          0.72
Large-scale                   39          0.71
TOTAL                        160

Surveys and opinion polls

Percentage of survey items, by respondent group and type of survey

Number and percent of survey items, by test stakes and target group

Test stakes   Items    %        Target group   Items    %
High            507   62        Students         393   46
Medium          184   23        Schools          281   33
Low              33    4        Teachers         116   14
Unknown          89   11        No stakes         64    7
TOTAL           813             TOTAL            854

Opinion polls, by year
- 244 polls between 1958 and 2008, in the U.S. & Canada
- 813 unique question-response combinations
- close to 700,000 individual respondents

Surveys and opinion polls: Regular standardized tests, performance tests

Respondent opinion (d)                   Regular tests (N ≈ 125)   Performance tests (N ≈ 50)
Achievement is increased                          1.2                        1.0
…weighted by size of study population             1.9                        0.5
Instruction is improved                           1.4                        0.9
Tests help align instruction

Qualitative studies: Summary (One cannot calculate an effect size.)

Qualitative studies, by methodology type

Methodology                                      Studies    %
Case study                                         120     43
Experiment or pre-post study                        21      7
Interviews (individual or group)                    75     27
Journal                                              2      1
Review of official records, documents, reports      33     12
Research review                                      8      3
Survey                                              22      8
TOTAL                                              281    100

Qualitative studies: Effect on student achievement
244 studies conducted in the past century in over 30 countries

Direction of effect   Studies    %    % (excluding inferred)
Positive                 204    84            93
Positive inferred         24    10             —
Mixed                      5     2             2
No change                  8     3             4
Negative                   3     1             1
TOTAL                    244   100           100

Qualitative studies: Testing improves student achievement and teacher instruction

Achievement is improved   Studies    %
Yes                          200    95
Mixed results                  1    <1
No                            10     5
TOTAL                        211   100

Instruction is improved   Studies    %
Yes                          158    96
No                             7     4
TOTAL                        165   100

Qualitative studies: Variation by rigor and test stakes

Direction of effect    Level of rigor: High   Medium   Low   Total
Positive                         95             67      42    204
Positive inferred                10              8       6     24
Mixed                             3              1       1      5
No change                         4                             8
Negative                                                        3
TOTAL                           113             80      51    244

Direction of effect    Stakes: High   Medium   Low   Unknown   Total
Positive                      133       27      38      6       204
Positive inferred              12        5       7      0        24
Mixed                           4        1       0      0         5
No change                       2        0       6      0         8
Negative                        3        0       0      0         3
TOTAL                         154       33      51      6       244

Qualitative studies: Regular standardized tests and performance tests

Study results (%)                         Regular tests (N = 176)   Performance tests (N = 69)
Generally positive                                  93                        95
High-stakes tests                                   71                        42
High level of study rigor                           46                        48
Student attitudes toward test positive              60
Teacher attitudes toward test positive              55                        80
Student achievement improved
Instruction improved                                92                       100
Large-scale testing                                 86                        68

An enormous research literature
- Yet assertions that it does not exist at all are common.
- Some such claims are made by those who oppose standardized testing, and may be wishful thinking.
- Others are "firstness" claims.

Dismissive research reviews
With a dismissive research literature review, a researcher assures readers that no other researcher has studied the same topic.

Firstness claims
With a firstness claim, a researcher insists that he or she is the first ever to study a topic.

Social costs are enormous
- Research conducted by those without power or celebrity is dismissed, ignored, and lost.
- Public policies are skewed, based exclusively on the research results of those with power or celebrity.
- Society pays again and again for research that has already been done.

The effect of testing on student achievement: 1910-2010
Richard P. PHELPS