The effect of testing on student achievement: 1910-2010
Richard P. PHELPS, International Test Commission, 8th Conference, Amsterdam, July 2012. © 2012 Richard P. Phelps
Meta-analysis: a method for summarizing a large research literature with a single, comparable measure.
The effect of testing on student achievement
A 12-year-long study:
- analyzed close to 700 separate studies and more than 1,600 separate effects
- reviewed 2,000 other studies and found them incomplete or inappropriate
- lacking sufficient time and money, hundreds of other studies will not be reviewed
Looking for studies to include in the meta-analyses
Included only those studies that found an effect from testing on student achievement or on teacher instruction…
Studies included in the meta-analyses
…when:
- a test is newly introduced, or newly removed
- the quantity of testing is increased or reduced
- test stakes are introduced or increased, or removed or reduced
Studies included in the meta-analyses
…plus previous research summaries, e.g.:
- Kulik, Kulik, Bangert-Drowns, & Schwalb on mastery testing, frequency of testing, and programs for high-risk university students
- Basol & Johanson (2009) on testing frequency
- Jaekyung Lee (2007) on cross-state studies
- W. J. Haynie (2007) on career-tech ed
Number of studies and effects, by methodology type
- Quantitative: 177 studies, 640 effects
- Surveys and public opinion polls (US & Canada): 247 studies, 813 effects
- Qualitative: 245 studies, 245 effects
- TOTAL: 669 studies, 1,698 effects
Effect size: Cohen’s d
d = (Y_E − Y_C) / S_pooled
Y_E = mean of the experimental group
Y_C = mean of the control group
S_pooled = pooled standard deviation
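As an illustration (not part of the original presentation), a minimal Python sketch of this calculation, using two made-up groups of test scores:

```python
# Minimal sketch: Cohen's d for two independent groups,
# using the pooled standard deviation. The score data are invented.
import math

def cohens_d(experimental, control):
    """Cohen's d = (mean_E - mean_C) / pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    mean_e = sum(experimental) / n_e
    mean_c = sum(control) / n_c
    # Sample variances (n - 1 in the denominator).
    var_e = sum((x - mean_e) ** 2 for x in experimental) / (n_e - 1)
    var_c = sum((x - mean_c) ** 2 for x in control) / (n_c - 1)
    # Pooled standard deviation across the two groups.
    s_pooled = math.sqrt(((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2))
    return (mean_e - mean_c) / s_pooled

# Example: a tested (experimental) group vs. an untested (control) group.
print(round(cohens_d([78, 82, 85, 90, 88], [70, 75, 80, 72, 78]), 2))
```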
Effect size: Other formulae
d = t × sqrt((n1 + n2) / (n1 × n2))
d = 2r / sqrt(1 − r²)
d = ((Y_E,post − Y_E,pre) − (Y_C,post − Y_C,pre)) / S_pooled,post
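A companion sketch (also not from the presentation) of these alternative formulae, for studies that report a t statistic, a correlation r, or pre/post means rather than raw group statistics; all inputs are invented:

```python
# Minimal sketch of the alternative effect-size formulae above.
import math

def d_from_t(t, n1, n2):
    """d = t * sqrt((n1 + n2) / (n1 * n2)), from an independent-samples t statistic."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def d_from_r(r):
    """d = 2r / sqrt(1 - r^2), from a correlation between group membership and outcome."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_from_gains(ye_pre, ye_post, yc_pre, yc_post, s_pooled_post):
    """Gain-score d: difference of pre-to-post gains over the pooled posttest SD."""
    return ((ye_post - ye_pre) - (yc_post - yc_pre)) / s_pooled_post

print(round(d_from_t(2.5, 30, 32), 2))             # e.g. t = 2.5 with n1 = 30, n2 = 32
print(round(d_from_r(0.35), 2))                    # e.g. r = 0.35
print(round(d_from_gains(50, 62, 51, 55, 10), 2))  # made-up pre/post means, posttest SD = 10
```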
Effect size: Interpretation
d between 0.25 and 0.50: weak effect
d between 0.50 and …: medium effect
d greater than …: strong effect
Quantitative studies (population coverage ≈ 7 million persons)
Quantitative studies: Effect size
- “Bare bones” calculation: d ≈ …, a medium effect
- Bare bones effect size adjusted for measurement error (see the sketch below): d ≈ …, a stronger effect
- Using same-study-author aggregation: d ≈ …, a strong effect
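The slide does not show how the adjustment for measurement error was computed. One standard approach in Hunter-Schmidt style meta-analysis corrects for attenuation by dividing the observed d by the square root of the outcome measure's reliability; the sketch below assumes that approach, with invented numbers, and is not Phelps's actual procedure:

```python
# Hedged sketch: correction for attenuation due to outcome unreliability.
# d_corrected = d_observed / sqrt(r_yy), where r_yy is the reliability
# of the outcome measure. The values used here are made up.
import math

def correct_for_attenuation(d_observed, reliability_yy):
    """Return the disattenuated effect size."""
    return d_observed / math.sqrt(reliability_yy)

print(round(correct_for_attenuation(0.55, 0.80), 2))  # observed d = 0.55, reliability = 0.80
```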
Which predictors matter?
Treatment group… (mean effect size)
- …is made aware of performance, and control group is not: +0.98
- …receives targeted instruction (e.g., remediation): +0.96
- …is tested with higher stakes than control group: +0.87
- …is tested more frequently than control group: +0.85
More Moderators – Source of Test
Number of studies and mean effect size:
- Researcher or teacher: 87 studies, d = 0.93
- National: 24 studies, d = 0.87
- Commercial: 38 studies, d = 0.82
- State or district: 11 studies, d = 0.72
- Total: 160 studies
More Moderators – Sponsor of Test
Number of studies and mean effect size:
- International: 5 studies, d = 1.02
- Local: 99 studies, d = 0.93
- National: 45 studies, d = 0.81
- State: 11 studies, d = 0.64
- Total: 160 studies
More Moderators - Study Design
Number of studies and mean effect size:
- Pre-post: 12 studies, d = 0.97
- Experiment, quasi-experiment: 107 studies, d = 0.94
- Multivariate: 26 studies, d = 0.80
- Experiment, posttest only: 7 studies, d = 0.60
- Pre-post (with shadow test): 8 studies, d = 0.58
- Total: 160 studies
More Moderators – Scale of Analysis
Number of studies and mean effect size:
- Aggregated: 9 studies, d = 1.60
- Small-scale: 118 studies, d = 0.91
- Large-scale: 33 studies, d = 0.57
- Total: 160 studies
More Moderators – Scale of Administration
Number of studies and mean effect size:
- Classroom: 115 studies, d = 0.95
- Mid-scale: 6 studies, d = 0.72
- Large-scale: 39 studies, d = 0.71
- Total: 160 studies
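As context for the moderator tables above, a minimal Python sketch (not Phelps's code; the study records are invented) of how such a breakdown can be produced: group each study's effect size by a moderator and report the count and mean effect size per category.

```python
# Minimal sketch: moderator breakdown of effect sizes.
# Each record holds one study's moderator value and effect size (invented data).
from collections import defaultdict

studies = [
    {"scale": "Classroom",   "d": 1.10},
    {"scale": "Classroom",   "d": 0.80},
    {"scale": "Mid-scale",   "d": 0.72},
    {"scale": "Large-scale", "d": 0.65},
    {"scale": "Large-scale", "d": 0.77},
]

# Group effect sizes by the moderator (here, scale of administration).
by_scale = defaultdict(list)
for study in studies:
    by_scale[study["scale"]].append(study["d"])

# Report count and unweighted mean effect size per category.
for scale, effects in by_scale.items():
    print(f"{scale}: {len(effects)} studies, mean d = {sum(effects) / len(effects):.2f}")
```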
Surveys and opinion polls
Percentage of survey items, by respondent group and type of survey
Number and percent of survey items, by test stakes and target group
By test stakes:
- High: 507 items (62%)
- Medium: 184 (23%)
- Low: 33 (4%)
- No stakes: 89 (11%)
- TOTAL: 813
By target group:
- Students: 393 items (46%)
- Schools: 281 (33%)
- Teachers: 116 (14%)
- Unknown: 64 (7%)
- TOTAL: 854
Opinion polls, by year
- 244 polls between … in the U.S. & Canada
- 813 unique question-response combinations
- close to 700,000 individual respondents
Surveys and opinion polls: regular standardized tests and performance tests
Respondent opinion (d), regular tests (N ≈ 125) vs. performance tests (N ≈ 50):
- Achievement is increased: 1.2 vs. 1.0
- …weighted by size of study population: 1.9 vs. 0.5
- Instruction is improved: 1.4 vs. 0.9
- Tests help align instruction: … vs. …
Qualitative studies: Summary (One cannot calculate an effect size.)
Qualitative studies, by methodology type
Number of studies and percent:
- Case study: 120 studies (43%)
- Experiment or pre-post study: 21 (7%)
- Interviews (individual or group): 75 (27%)
- Journal: 2 (1%)
- Review of official records, documents, reports: 33 (12%)
- Research review: 8 (3%)
- Survey: 22 (8%)
- TOTAL: 281 (100%)
Qualitative studies: Effect on student achievement
244 studies conducted in the past century in over 30 countries
Direction of effect (number of studies, percent; percent excluding the “inferred” category):
- Positive: 204 (84%; 93%)
- Positive, inferred: 24 (10%)
- Mixed: 5 (2%)
- No change: 8 (3%; 4%)
- Negative: 3 (1%)
- TOTAL: 244 (100%)
Qualitative studies: Testing improves student achievement and teacher instruction
Achievement is improved:
- Yes: 200 studies (95%)
- Mixed results: 1 (<1%)
- No: 10 (5%)
- TOTAL: 211 (100%)
Instruction is improved:
- Yes: 158 studies (96%)
- No: 7 (4%)
- TOTAL: 165 (100%)
Qualitative studies: Variation by rigor and test stakes
By level of study rigor (number of studies):
- Positive: high 95, medium 67, low 42 (total 204)
- Positive, inferred: high 10, medium 8, low 6 (total 24)
- Mixed: high 3, medium 1, low … (total 5)
- No change: 4, …
- Negative: …
- TOTAL: high 113, medium 80, low 51 (total 244)
By test stakes (number of studies):
- Positive: high 133, medium 27, low 38, unknown 6 (total 204)
- Positive, inferred: 12, 5, 7 (total 24)
- Mixed: 4, 1, …
- No change: 2, … (total 8)
- Negative: 3
- TOTAL: high 154, medium 33, low 51, unknown … (total 244)
Qualitative studies: Regular standardized tests and performance tests
Study results (% of studies), regular tests (N = 176) vs. performance tests (N = 69):
- Generally positive: 93 vs. 95
- High-stakes tests: 71 vs. 42
- High level of study rigor: 46 vs. 48
- Student attitudes toward test positive: 60 vs. …
- Teacher attitudes toward test positive: 55 vs. 80
- Student achievement improved: … vs. …
- Instruction improved: 92 vs. 100
- Large-scale testing: 86 vs. 68
An enormous research literature
But assertions that it does not exist at all are common:
- Some claims are made by those who oppose standardized testing, and may be wishful thinking
- Others are “firstness” claims
Dismissive research reviews
With a dismissive research literature review, a researcher assures readers that no other researcher has studied the same topic.
Firstness claims
With a firstness claim, a researcher insists that he or she is the first ever to study a topic.
Social costs are enormous
- Research conducted by those without power or celebrity is dismissed: ignored and lost
- Public policies are skewed, based exclusively on the research results of those with power or celebrity
- Society pays again and again for research that has already been done