Do the Scores Mean the Same Thing If We Use the Computer? Randy Bennett

1 Do the Scores Mean the Same Thing If We Use the Computer? Randy Bennett rbennett@ets.org

2 Chapter I: Online testing in the states II: What is “comparability”? III: Comparability of delivery modes IV: What can we do? V: Conclusion

3 The States and Online Testing (Source: Education Week survey of state technology contacts, Technology Counts, 2004)

4 Generalizations from the State Initiatives. State efforts: are being pursued at multiple grade levels, in all key content areas, and for a variety of populations; involve both low- and high-stakes assessments; vary widely in progress and target implementation dates; initially use multiple-choice items almost exclusively; and are, in some cases, an explicit part of an integrated plan.

5 Reasons for Delivering Tests Online: speed of scoring and reporting; mass customization; the promise of being able to measure things that can’t be measured on paper; and an eventual reduction in costs.

6 Major Issues: near-term cost; timelines; equipment, software, and network availability and dependability; security; and measurement and fairness.

7 Chapter I: Online testing in the states II: What is “comparability”? III: Comparability of delivery modes IV: What can we do? V: Conclusion

8 What is “Comparability”? Definition: commonality of score meaning across testing “conditions.” Scores are comparable when they can be used interchangeably.

9 What Testing Conditions Might Affect Comparability? Delivery modes; computer platforms; and constructed-response (CR) scoring, both under different presentation methods and by different processing mechanisms.

10 What is “Comparability”? Criteria: highly similar rank-ordering of individuals across conditions, and highly similar score distributions across conditions.
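When the same examinees take both versions, the two criteria on this slide can be checked directly. The following is a minimal Python sketch, not a method from the presentation. It assumes a hypothetical paired design (each examinee has a paper score and an online score), uses Spearman's rank correlation for rank-ordering, and summarizes the distributions only by their means and standard deviations, which is a simplification; a fuller check would compare whole distributions. The function names and scores are hypothetical.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def compare_modes(paper, online):
    """Hypothetical helper: rank-order agreement plus simple distribution summaries."""
    return {
        # Spearman's rho is the Pearson correlation of the ranks.
        "rank_correlation": pearson(average_ranks(paper), average_ranks(online)),
        "mean_difference": mean(online) - mean(paper),
        "sd_ratio": pstdev(online) / pstdev(paper),
    }


# Hypothetical scores for the same six examinees in each mode.
paper = [12, 15, 18, 20, 23, 27]
online = [10, 14, 15, 19, 21, 25]
print(compare_modes(paper, online))
```

In this made-up example every examinee keeps the same rank, so the rank correlation is 1, yet online scores average about 1.8 points lower: the first criterion is met and the second is not.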

11 When is Comparability Important? When scores need to have common meaning with respect to one another, some reference group, or a content standard. If scores are not comparable across “conditions,” then decisions based on them may be wrong.

12 What Kinds of Decisions Could Be Wrong? Wrong decisions could be made about individuals or groups, in high- or low-stakes situations. Examples: promotion or graduation; diagnosis or learning progress; school effectiveness; group proficiency.

13 Chapter I: Online testing in the states II: What is “comparability”? III: Comparability of delivery modes IV: What can we do? V: Conclusion

14 Comparability of Delivery Modes. Do the scores from paper and computer tests mean the same thing? The two modes can differ in presentation characteristics, response requirements, and general administration characteristics.

15 Comparability of Delivery Modes. Most states need to deliver tests in both modes: not all schools have enough computers, and some students don’t have computer skills.

16 Research on Comparability of Delivery Modes Among Adults (mostly). Mead & Drasgow (1993): a meta-analysis of studies that compared paper and computer versions of the same tests with respect to the rank ordering of individuals and the difference in mean scores.

17 Research on Comparability of Delivery Modes. Mead & Drasgow (1993): across 159 correlations between paper and computer versions, found values of .97 for timed power tests and .72 for speeded tests. For the timed power tests, the standardized mean difference was -.03.
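A standardized mean difference expresses the gap between two means in standard-deviation units. The Python sketch below uses one common definition, the difference in means divided by the pooled standard deviation. The sign convention (computer minus paper, so a negative value means lower computer scores), the function name, and the data are assumptions for illustration, not details taken from Mead & Drasgow (1993).

```python
from statistics import mean, variance


def standardized_mean_difference(computer, paper):
    """(computer mean - paper mean) / pooled SD.

    Sign convention is an assumption: negative values mean the
    computer version produced lower scores.
    """
    n_c, n_p = len(computer), len(paper)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_c - 1) * variance(computer) + (n_p - 1) * variance(paper)) / (n_c + n_p - 2)
    return (mean(computer) - mean(paper)) / pooled_var ** 0.5


# Hypothetical scores: the computer group averages 2 points lower, and both SDs are 2.
computer = [48, 50, 52]
paper = [50, 52, 54]
print(standardized_mean_difference(computer, paper))  # -1.0
```

On this scale, the -.03 reported for timed power tests corresponds to a difference of about three hundredths of a standard deviation.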

18 Research on Comparability of Delivery Modes. Gallagher, Bridgeman, & Cahalan (2000): Does delivery mode differentially affect particular groups? Analyzed data from the GRE, GMAT, SAT I, Praxis, and TOEFL. Found that delivery mode consistently changed the size of the differences between some groups, but only by small amounts.
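One way to ask whether delivery mode changes the size of a difference between groups is to compute a standardized gap between two groups in each mode and compare the two gaps. The Python sketch below is only an illustration: the group scores are hypothetical, and standardizing each gap by the combined standard deviation of both groups in that mode is an assumption. Gallagher, Bridgeman, & Cahalan (2000) may have used a different metric, and this is not their procedure.

```python
from statistics import mean, pstdev


def standardized_gap(group_a, group_b):
    """Mean of group A minus mean of group B, in units of the combined SD."""
    return (mean(group_a) - mean(group_b)) / pstdev(group_a + group_b)


def change_in_gap(paper_a, paper_b, online_a, online_b):
    """How much the standardized gap shifts when moving from paper to online."""
    return standardized_gap(online_a, online_b) - standardized_gap(paper_a, paper_b)


# Hypothetical scores for two groups of examinees in each mode.
paper_a, paper_b = [30, 34, 38, 42], [28, 32, 36, 40]
online_a, online_b = [29, 33, 37, 41], [26, 30, 34, 38]
print(round(change_in_gap(paper_a, paper_b, online_a, online_b), 3))
```

A value near zero would mean that mode leaves the group difference essentially unchanged; shifting every score in both groups by the same constant, for example, produces a change of zero.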

19 Large-Sample Studies in Mathematics. K-12 students score higher on paper-and-pencil (P&P) than on online versions: Choi & Tinkler (2002), Grade 3; Coon, McLeod, & Thissen (2002), Grade 5; Ito & Sykes (2004), Grades 4-12; Davis & Gardner (2004), Grade 10; Sandene, Bennett, Braswell, & Oranje (in press). No difference between modes: Poggio et al. (2004), Grade 7.

20 NAEP Math Online Study. 8th-grade students scored higher on P&P than on online versions. In general, there was no differential impact of mode for population groups. Computer facility predicted online test score.
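A finding such as "computer facility predicted online test score" is typically examined with a regression model. The sketch below is an illustration only; the slide does not describe the NAEP study's actual model. It assumes hypothetical variables (a paper score, a computer-facility score, and an online score for each student) and fits an ordinary least-squares model with two predictors, so the facility coefficient reflects its association with the online score after accounting for paper performance.

```python
from statistics import mean


def fit_two_predictor_ols(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, via the normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2x2 system for the two slopes, then recover the intercept.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical data: paper score, computer-facility score, online score.
paper = [10, 20, 30, 40, 50]
facility = [1, 3, 2, 5, 4]
online = [10, 21, 23, 37, 39]
intercept, b_paper, b_facility = fit_two_predictor_ols(paper, facility, online)
print(intercept, b_paper, b_facility)
```

Because the hypothetical online scores were generated exactly as 2 + 0.5 × paper + 3 × facility, the fit recovers those coefficients (2.0, 0.5, 3.0); real data would include error, and a facility coefficient reliably above zero would be the pattern the slide describes.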

21 Large-Sample Studies in Reading and Verbal Skills. K-12 students score higher on P&P than on online versions of multiple-choice (MC) tests: Choi & Tinkler (2002), Grades 3 and 10; Coon, McLeod, & Thissen (2002), Grade 3; Ito & Sykes (2004), Grades 4-12; Davis & Gardner (2004), Grade 10. No difference, or higher scores on online than on P&P versions of MC tests: Pommerich (2004), Grades 11-12.

22 Research on Comparability of Delivery Modes. Constructed response (CR): uncommon in online state assessment. CR items should produce larger mode effects than multiple-choice items, because they require more responding, and more responding on computer suggests the need for greater technology skill.

23 Large-Sample Studies in Writing. Adults score higher on P&P than on online versions of essay tests of writing skill: Bridgeman & Cooper (1998), GMAT; Yu et al. (2004), Praxis; Wolfe et al. (2004), TOEFL.

24 NAEP Writing Online Study. No significant difference in mean scores for 8th-grade students between P&P and online versions. In general, there was no differential impact of mode for population groups. Computer facility predicted online test score.

25 Chapter I: Online testing in the states II: What is “comparability”? III: Comparability of delivery modes IV: What can we do? V: Conclusion

26 What Can We Do? General steps: increase research efforts to identify likely sources of irrelevant score variation; publish the results in high-quality assessment journals, which allows results to be vetted through peer review, challenged in a rejoinder, and disseminated to the field.

27 What Can We Do? Comparability of delivery modes: if the rank orders of scores agree very closely across modes but the score distributions don’t match, consider equating scores. If equating is not possible, either use two score scales, with separate cut scores and/or norms, or use one delivery mode.
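The slide does not say which equating method to use. As one illustration, the Python sketch below applies linear equating under an assumed random-groups design (equivalent groups take each mode): an online score x is mapped to the paper scale by matching means and standard deviations, e(x) = μ_P + (σ_P / σ_O)(x − μ_O). Equipercentile equating, which matches entire distributions rather than just their first two moments, is a common alternative. The function name and data are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(online_scores, paper_scores):
    """Return a function mapping online-scale scores onto the paper scale.

    Assumes a random-groups design: the two samples come from
    equivalent groups, so differences between their distributions are
    attributed to the delivery mode rather than to the examinees.
    """
    mu_o, sd_o = mean(online_scores), pstdev(online_scores)
    mu_p, sd_p = mean(paper_scores), pstdev(paper_scores)

    def to_paper_scale(x):
        # Keep the standardized position fixed: (x - mu_o) / sd_o == (e - mu_p) / sd_p.
        return mu_p + (sd_p / sd_o) * (x - mu_o)

    return to_paper_scale


# Hypothetical score samples from equivalent groups.
online_group = [8, 10, 12]
paper_group = [18, 22, 26]
equate = linear_equate(online_group, paper_group)
print([round(equate(x), 2) for x in online_group])
```

Because the hypothetical paper sample's mean is 12 points higher and its standard deviation twice as large, online scores of 8, 10, and 12 map to 18, 22, and 26. Linear equating preserves rank order, which is why the slide reserves it for cases where rank orders already agree closely.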

28 Chapter I: Online testing in the states II: What is “comparability”? III: Comparability of delivery modes IV: What can we do? V: Conclusion

29 Conclusion. Score comparability may be affected by variation in delivery mode. Such effects may have undesirable consequences for institutions and for individuals. Agencies should increase efforts to study the impact of variation in delivery, and take steps to manage sources of variation found to affect performance.

