Published by Scott Chambers; modified over 9 years ago
1
Automated Scoring: Smarter Balanced Studies
CCSSO NCSA, San Diego, CA, June 2015
2
Smarter Pilot and Field Test Studies
- Moved the field forward
  - Big data sets
  - Many methods, spectacular researchers
- Immediate, practical results
  - Field test improved on pilot findings
  - We learned a lot about advantages and limitations
3
Field Test Items in the Study
- 683 English language arts (ELA)/literacy short-text, constructed-response items
  - Reading short text (CAT)
  - Writing brief writes (CAT)
  - Research performance task (PT) questions
- 238 mathematics short-text, constructed-response items
  - Includes 40 mathematical reasoning items
- 66 ELA/literacy essay items
4
Criteria (an engine result is flagged if any of the following holds)
- Quadratic weighted kappa between engine and human scores less than 0.70
- Pearson correlation between engine and human scores less than 0.70
- Standardized difference between engine and human scores greater than 0.12 in absolute value
- Degradation in quadratic weighted kappa or correlation from human-human to engine-human >= 0
- Standardized difference between engine and human scores for a subgroup greater than 0.10 in absolute value
- Notable reduction in perfect (exact) agreement rates from human-human to engine-human equal to or greater than 0.05
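The criteria above can be checked mechanically once each response has two human reads and one engine score. The following is a minimal sketch, not the Smarter Balanced procedure: all function names are hypothetical, scores are assumed to be integers on a known scale, the engine is compared against the first human read, and the standardized difference uses one common definition (mean difference divided by the pooled standard deviation), which is an assumption. Because the slide prints the degradation threshold as ">= 0", the sketch takes it as a parameter rather than hard-coding a value.

```python
import math
from collections import Counter
from statistics import mean, variance


def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (k - 1)^2."""
    k = max_score - min_score + 1
    n = len(a)
    observed = Counter(zip(a, b))            # joint counts of (rater A, rater B) pairs
    hist_a, hist_b = Counter(a), Counter(b)  # marginal counts for each rater
    num = den = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[(i, j)] / n             # observed weighted disagreement
            den += w * hist_a[i] * hist_b[j] / (n * n)  # chance-expected weighted disagreement
    return 1.0 - num / den


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_difference(engine, human):
    """(mean_engine - mean_human) / pooled SD; an assumed, commonly used definition."""
    pooled = math.sqrt((variance(engine) + variance(human)) / 2)
    diff = mean(engine) - mean(human)
    if pooled == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / pooled


def exact_agreement(a, b):
    return sum(p == q for p, q in zip(a, b)) / len(a)


def flag_engine(human1, human2, engine, min_score, max_score, degradation_threshold):
    """Return the criteria the engine trips; human1 and human2 are two human reads."""
    flags = []
    qwk_eh = quadratic_weighted_kappa(engine, human1, min_score, max_score)
    qwk_hh = quadratic_weighted_kappa(human1, human2, min_score, max_score)
    r_eh, r_hh = pearson(engine, human1), pearson(human1, human2)
    if qwk_eh < 0.70:
        flags.append("QWK(engine, human) < 0.70")
    if r_eh < 0.70:
        flags.append("r(engine, human) < 0.70")
    if abs(standardized_difference(engine, human1)) > 0.12:
        flags.append("|std diff| > 0.12")
    if qwk_hh - qwk_eh >= degradation_threshold or r_hh - r_eh >= degradation_threshold:
        flags.append("degradation from H-H to E-H")
    if exact_agreement(human1, human2) - exact_agreement(engine, human1) >= 0.05:
        flags.append("exact agreement drop >= 0.05")
    return flags


def flag_subgroups(human, engine, groups):
    """Return subgroup labels whose |standardized difference| exceeds 0.10.

    Each subgroup needs at least two responses for the variance to be defined.
    """
    flagged = []
    for g in sorted(set(groups)):
        h = [s for s, grp in zip(human, groups) if grp == g]
        e = [s for s, grp in zip(engine, groups) if grp == g]
        if abs(standardized_difference(e, h)) > 0.10:
            flagged.append(g)
    return flagged
```

An engine that reproduces the human scores exactly trips nothing; an engine that scores every response one point high (capped at the scale maximum) trips at least the standardized-difference and exact-agreement criteria.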
5
Read-Behind Studies
Costs limit the number of responses that receive a second human read. Can using scoring engines as a second rater improve scoring?
Results: Scoring scenarios in which an automated scoring system serves as a second rater behind a human rater (a "read-behind") produce high-quality scores. Machine-human (M-H) and human-human (H-H) results are similar.
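A read-behind can be pictured as a routing rule: the human score is the operational score, the engine scores the same response, and a disagreement beyond some tolerance sends the response to a second human. The sketch below assumes that workflow; the class, the function, the `second_reader` callback, and the rule that the second human's score becomes final are all hypothetical choices for illustration, not the rules used in the Smarter Balanced studies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ScoredResponse:
    response_id: str
    human_score: int                       # first (and normally only) human read
    engine_score: int                      # engine read-behind score
    second_human_score: Optional[int] = None
    final_score: Optional[int] = None
    adjudicated: bool = False


def read_behind(responses: List[ScoredResponse],
                second_reader: Callable[[str], int],
                tolerance: int = 0) -> List[ScoredResponse]:
    """Use the engine as a second rater behind the human rater.

    tolerance=0: 'non-exact' rule, any human/engine disagreement triggers adjudication.
    tolerance=1: 'non-adjacent' rule, only gaps larger than one point trigger it.
    """
    for r in responses:
        if abs(r.human_score - r.engine_score) > tolerance:
            # Hypothetical resolution rule: the second human's score becomes final.
            r.second_human_score = second_reader(r.response_id)
            r.final_score = r.second_human_score
            r.adjudicated = True
        else:
            # Scores agree within tolerance: the human score stands.
            r.final_score = r.human_score
    return responses
```

Setting `tolerance=0` corresponds to the non-exact adjudication condition mentioned for mathematics; a larger tolerance sends fewer responses to second readers but lets more one-point disagreements through unreviewed.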
6
Targeting Responses for Human Review
Can scoring engines detect the responses most likely to be rated differently by humans and machines, so that those responses can be routed to second raters?
Result: Using scoring engines to identify candidates for a second human read yielded major reliability improvements over random assignment of responses.
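The contrast between targeted and random second reads comes down to how a fixed second-read budget is spent. The slide does not say how the engine identifies candidates, so this sketch assumes each response already carries a hypothetical "risk" value (a predicted likelihood that human and engine scores would differ); the function name and its parameters are illustrative only.

```python
import random
from typing import Dict, List


def select_for_second_read(risk: Dict[str, float], budget: int,
                           strategy: str = "targeted", seed: int = 0) -> List[str]:
    """Pick response ids to route to a second human reader.

    risk maps response id -> predicted likelihood that human and engine scores
    would differ (hypothetical input; how it is estimated is engine-specific).
    """
    ids = list(risk)
    budget = min(budget, len(ids))
    if strategy == "random":
        # Baseline: second reads assigned at random.
        return random.Random(seed).sample(ids, budget)
    if strategy == "targeted":
        # Engine-guided: spend the second-read budget on the riskiest responses.
        return sorted(ids, key=lambda i: risk[i], reverse=True)[:budget]
    raise ValueError(f"unknown strategy: {strategy}")
```

With the same budget, the targeted strategy concentrates second reads on the responses where a discrepancy is most likely, which is where a second read can change the score; random assignment spreads them over many responses that would have been scored the same way anyway.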
7
Item Characteristics that Correlate with Agreement for Human and Automated Scoring: ELA
- Reading short-text items
  - Item-specific rubrics yield higher reliability than generic rubrics.
  - Agreement was higher when the text was fictional.
- Essays
  - Generic rubrics are associated with higher reliability for the conventions trait.
  - For the other traits, prompt-specific engine training is preferred.
- Brief writes
  - Agreement was significantly higher for narrative stimuli.
All of the trends above held for both human and machine scoring.
8
Item Characteristics that Correlate with Agreement for Human and Automated Scoring: Mathematics
- Using an automated scoring system as a read-behind improves score quality, provided non-exact adjudication is used (any human/engine disagreement, not only a non-adjacent one, triggers adjudication).
- In mathematics, hand-scoring agreement was statistically significantly higher than agreement for the best engines.
  - Mathematics responses could be expressed in a large number of ways.
  - Student responses tended to be short.
9
Moving Forward
- Summative tests
  - Use engines as a second rater
  - Target responses for second human reads
  - Smarter rules allow vendors to use scoring engines, but none are currently doing so
- Interim assessments
  - Provide to teachers to score specific tasks
- Classroom assessment
  - Provide to teachers to allow assignment of more writing tasks
10
Policy Issues
- Resistance to AI use
  - The Chinese Room argument (a machine can manipulate symbols without understanding them)
  - Threat to training and understanding
- Inflated expectations lead to disappointment
  - Doesn't always work
  - Requires planning and coordination
  - Is not cheap
11
Moving Forward (continued)
- Platform integration
  - Current engines use batch or stand-alone processing
  - Need trained engine apps that work with online delivery engines in real time
- Item development
  - Studies gave better information about what kinds of items are likely to succeed
  - It is desirable to have scoring engine experts involved in task development
12
Want Details?
The Field Test Scoring Study and appendices have been posted to SmarterApp:
http://www.smarterapp.org/deployment/FieldTest_AutomatedScoringResearchStudies.html
An updated version of the pilot study is on the Smarter website:
http://www.smarterbalanced.org/pub-n-res/pilot-test-automated-scoring-research-studies/
13
Thank you for your attention