Published by Leslie Eaton. Modified over 9 years ago.
1
Do the Scores Mean the Same Thing If We Use the Computer? Randy Bennett rbennett@ets.org
2
Chapter I: Online testing in the states II: What is “comparability?” III: Comparability of delivery modes IV: What can we do? V: Conclusion
3
The States and Online Testing Source: Education Week survey of state technology contacts, Technology Counts, 2004
4
Generalizations from the State Initiatives State efforts: Are being pursued at multiple grade levels, in all key content areas, and for a variety of populations Involve both low- and high-stakes assessments Vary widely in progress and target implementation dates Initially use multiple-choice items almost exclusively Are in some cases an explicit part of an integrated plan
5
Reasons for Delivering Tests Online Speed of scoring and reporting Mass customization Promise of being able to measure things that can’t be measured on paper Eventual reduction in costs
6
Major Issues Near-term cost Timelines Equipment, software, and network availability and dependability Security Measurement and fairness
7
Chapter I: Online testing in the states II: What is “comparability?” III: Comparability of delivery modes IV: What can we do? V: Conclusion
8
What is “Comparability?” Definition Commonality of score meaning across testing “conditions” Scores are comparable when they can be used interchangeably
9
What Testing Conditions Might Affect Comparability? Delivery modes Computer platforms CR scoring Under different presentation methods By different processing mechanisms
10
What is “Comparability?” Criteria Highly similar rank-ordering of individuals across conditions Highly similar score distributions across conditions
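As a rough illustration of the two criteria on this slide, the sketch below checks rank-order agreement with a Spearman correlation and compares the score distributions through their means and standard deviations. The scores and function names are hypothetical, and Spearman's coefficient is only one possible way to quantify "highly similar rank-ordering"; the slides do not prescribe a statistic.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_order_agreement(scores_a, scores_b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


# Hypothetical scores for the same six examinees under two conditions.
paper = [12, 15, 19, 22, 25, 30]
computer = [10, 14, 17, 21, 22, 29]

rho = rank_order_agreement(paper, computer)
summary = {
    "paper": (mean(paper), stdev(paper)),
    "computer": (mean(computer), stdev(computer)),
}
print(f"rank-order agreement: {rho:.2f}")
print(summary)
```

In this made-up data the examinees are ordered identically under both conditions, yet the computer scores run lower on average, so the first criterion is met while the second is not.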
11
When is Comparability Important? When scores need to have common meaning with respect to: One another Some reference group A content standard If scores are not comparable across “conditions,” then decisions may be wrong
12
What Kinds of Decisions Could Be Wrong? Wrong decisions could be made: About individuals or groups In high- or low-stakes situations Examples Promotion or graduation Diagnosis or learning progress School effectiveness Group proficiency
13
Chapter I: Online testing in the states II: What is “comparability?” III: Comparability of delivery modes IV: What can we do? V: Conclusion
14
Comparability of Delivery Modes Do the scores from paper and computer tests mean the same thing? Differences in presentation characteristics Differences in response requirements Differences in general administration characteristics
15
Comparability of Delivery Modes Most states need to deliver in both modes Not all schools have enough computers Some students don’t have computer skills
16
Research on Comparability of Delivery Modes Among Adults (mostly) Mead & Drasgow (1993) Meta-analysis of studies that compared paper and computer versions of the same tests with respect to: Rank ordering of individuals The difference in mean scores
17
Research on Comparability of Delivery Modes Mead & Drasgow (1993) Across 159 correlations, found values of: .97 for timed power tests; .72 for speeded tests. For the timed power tests, the standardized mean difference was -.03
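For readers unfamiliar with the statistic, a standardized mean difference can be sketched as the difference between two group means divided by a pooled standard deviation. The data below are invented, and the direction of subtraction (computer minus paper) is an assumption made for illustration; the slide does not state which convention the meta-analysis used.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Return (mean_a - mean_b) divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical scores from two small groups, one per delivery mode.
computer = [48, 52, 50, 47, 53]
paper = [49, 52, 51, 48, 55]

d = standardized_mean_difference(computer, paper)
print(f"standardized mean difference: {d:.2f}")
```

Under this convention a negative value means the computer group scored lower, in standard deviation units, than the paper group.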
18
Research on Comparability of Delivery Modes Gallagher, Bridgeman, & Cahalan (2000) Does delivery mode differentially affect particular groups? Analyzed data from the GRE, GMAT, SAT I, Praxis, and TOEFL Found that delivery mode consistently changed the size of the differences between some groups, but only by small amounts
19
Large-Sample Studies in Mathematics K-12 students score higher on P&P than online versions: Choi & Tinkler (2002), Grade 3; Coon, McLeod, & Thissen (2002), Grade 5; Ito & Sykes (2004), Grades 4-12; Davis & Gardner (2004), Grade 10; Sandene, Bennett, Braswell, & Oranje (in press). No difference between modes: Poggio et al. (2004), Grade 7
20
NAEP Math Online Study 8th-grade students scored higher on P&P than online versions In general, there was no differential impact of mode for population groups Computer facility predicted online test score
21
Large-Sample Studies in Reading and Verbal Skills K-12 students score higher on P&P than online versions of MC tests: Choi & Tinkler (2002), Grades 3 and 10; Coon, McLeod, & Thissen (2002), Grade 3; Ito & Sykes (2004), Grades 4-12; Davis & Gardner (2004), Grade 10. No difference or higher on online than P&P versions of MC tests: Pommerich (2004), Grades 11-12
22
Research on Comparability of Delivery Modes Constructed response Uncommon in online state assessment Should produce larger mode effects than MC items CR items require more responding More responding on computer suggests need for greater technology skill
23
Large-Sample Studies in Writing Adults score higher on P&P than online versions of essay tests of writing skill Bridgeman & Cooper (1998): GMAT Yu et al. (2004): Praxis Wolfe et al. (2004): TOEFL
24
NAEP Writing Online Study No significant difference in mean scores for 8th-grade students between P&P and online versions In general, there was no differential impact of mode for population groups Computer facility predicted online test score
25
Chapter I: Online testing in the states II: What is “comparability?” III: Comparability of delivery modes IV: What can we do? V: Conclusion
26
What Can We Do? General steps Increase research efforts to identify likely sources of irrelevant score variation Publish the results in high-quality assessment journals Allows results to be: Vetted through peer review Challenged in a rejoinder Disseminated to the field
27
What Can We Do? Comparability of delivery modes If the rank orderings of scores are highly similar across modes but the distributions don’t match, consider equating scores If equating is not possible: Use two score scales Separate cut-scores and/or norms Use one delivery mode
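One simple way to picture equating is linear equating, which maps a score from one mode onto the other mode's scale by matching means and standard deviations. The sketch below is a minimal illustration with invented data and a hypothetical function name; it assumes the two groups of examinees are equivalent, and the slides do not say which equating method should be used.

```python
from statistics import mean, stdev


def linear_equating(from_scores, to_scores):
    """Build a function mapping scores on the 'from' scale to the 'to' scale.

    A converted score sits the same number of standard deviations from the
    target mean as the original score sits from its own mean.
    """
    m_from, s_from = mean(from_scores), stdev(from_scores)
    m_to, s_to = mean(to_scores), stdev(to_scores)
    slope = s_to / s_from

    def convert(score):
        return m_to + slope * (score - m_from)

    return convert


# Hypothetical score samples from equivalent groups tested in each mode.
computer = [10, 14, 18, 22, 26]
paper = [13, 18, 23, 28, 33]

to_paper_scale = linear_equating(computer, paper)
print(to_paper_scale(18))  # the computer mean maps to the paper mean
print(to_paper_scale(26))
```

If the relationship between modes is not linear, or the groups are not equivalent, a different design or method would be needed, which is where the slide's fallback options (separate scales, separate cut-scores or norms, or a single delivery mode) come in.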
28
Chapter I: Online testing in the states II: What is “comparability?” III: Comparability of delivery modes IV: What can we do? V: Conclusion
29
Conclusion Score comparability may be affected by variation in delivery mode Such effects may have undesirable consequences for institutions and for individuals Agencies should: Increase efforts to study the impact of variation in delivery Take steps to manage sources of variation found to affect performance
30
Do the Scores Mean the Same Thing If We Use the Computer? Randy Bennett rbennett@ets.org