Published by Rosalyn Armstrong. Modified over 9 years ago.
1
Combined Human and Automated Scoring of Writing
Stuart Kahl, Measured Progress
Measured Progress ©2012
2
The Challenges of Performance Tasks
3
Contention
The number of independent score points should reflect the amount of evidence.
4
Uses of Automated Essay Scoring
1. Second reads (double scoring)
2. Additional independent scores
3. Both
5
Questions
1. Are a couple of human scores plus computer-generated trait scores “better than” many human analytic scores?
2. When humans focus on fewer traits, are their agreement rates higher?
6
The Student Work We Used
From Grade 8 NECAP*
1 long and 1 short essay (based on 2 common prompts) and 10 MC responses from each of 1694 students.
From Grade 11 NECAP*
2 long essays (based on 2 common prompts) from each of 590 students.
7
The Essay Score Data
- MP/Gates human – 1 holistic, 5 traits (organization, support, focus, language, conventions), double-scored
- MP/Gates human – 1 holistic, 1 trait (support), double-scored
- Computer-generated trait scores (word choice, mechanics, style, organization, development)
- NECAP human – 1 holistic, double-scored
8
Statistics
- Scorer agreement – discrepancy rates, where a discrepancy is a difference of more than 1 point (>1) between two scores
- Decision accuracy – estimate of the proportion of categorization decisions that would match the decisions that would result if scores contained no measurement error
- Decision consistency – estimate of the proportion of categorization decisions that would match decisions based on scores from a parallel form
- Standard error at cut points
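The first and third statistics above can be illustrated with a small sketch. It is only an illustration: the slides do not say how the estimates were computed, and operational decision consistency is usually estimated from a single administration with a psychometric model rather than from two observed forms. The function names, the cut scores, and the sample scores below are all hypothetical.

```python
from typing import Sequence


def discrepancy_rate(first: Sequence[int], second: Sequence[int],
                     tolerance: int = 1) -> float:
    """Share of responses whose two scores differ by more than `tolerance`.

    With tolerance=1 this matches the ">1" discrepancy definition on the slide.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    discrepant = sum(1 for a, b in zip(first, second) if abs(a - b) > tolerance)
    return discrepant / len(first)


def classify(score: float, cuts: Sequence[float]) -> int:
    """Category index = number of cut scores at or below the score."""
    return sum(1 for cut in cuts if score >= cut)


def classification_agreement(form_a: Sequence[float], form_b: Sequence[float],
                             cuts: Sequence[float]) -> float:
    """Proportion of students placed in the same category by both score sets.

    A toy stand-in for decision consistency, assuming both forms were
    actually administered (an assumption, not the slides' method).
    """
    if len(form_a) != len(form_b) or not form_a:
        raise ValueError("need two non-empty score lists of equal length")
    same = sum(1 for a, b in zip(form_a, form_b)
               if classify(a, cuts) == classify(b, cuts))
    return same / len(form_a)


# Hypothetical data: two readers' scores, then total scores on two forms.
reader_1 = [4, 3, 2, 5]
reader_2 = [4, 1, 3, 2]
hypothetical_cuts = [3, 6, 9]  # three cuts, four categories

rate = discrepancy_rate(reader_1, reader_2)
agreement = classification_agreement([2, 5, 7, 10], [1, 6, 5, 9], hypothetical_cuts)
```

Restricting `classify` to a single cut would give the two-category comparison the later tables report as "Near Proficient vs Proficient".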
9
MP/Gates Scorer Agreement – # Discrepancies (>1)
10
Decision Accuracy (and Consistency) – Grade 8

| Score Combination (1 long + 1 short essay) | Overall | Near Proficient vs Proficient |
|---|---|---|
| MP/Gates human holistic + 5 traits | .86 (.80) | .93 (.91) |
| MP/Gates human holistic + 1 trait + automated 5 traits | .84 (.77) | .93 (.91) |
| NECAP holistic + MC + automated 5 traits | .82 (.74) | .93 (.90) |
11
Decision Accuracy (and Consistency) – Grade 8, continued

| Score Combination | Overall | Near Proficient vs Proficient |
|---|---|---|
| NECAP holistic + auto. traits (1 long essay w/MC) | .76 (.67) | .91 (.88) |
| NECAP holistic + auto. traits (1 long essay w/o MC) | .75 (.66) | .91 (.88) |
| NECAP holistic + auto. traits (1 long + 1 short essay w/MC) | .82 (.74) | .93 (.90) |
| NECAP holistic + auto. traits (1 long + 1 short essay w/o MC) | .81 (.73) | .93 (.90) |
12
Decision Accuracy (and Consistency) – Grade 11

| Score Combination (2 long essays) | Overall | Near Proficient vs Proficient |
|---|---|---|
| MP/Gates human holistic + 5 traits | .73 (.65) | .92 (.89) |
| MP/Gates human holistic + 1 trait + automated 5 traits | .69 (.60) | .90 (.86) |
| NECAP holistic + MC + automated 5 traits | .63 (.54) | .88 (.84) |
13
Standard Errors at Cuts – Grade 8

| Score Combination (1 long + 1 short essay) | C1 | C2 | C3 |
|---|---|---|---|
| MP/Gates human holistic + 5 traits | .87 | 1.01 | 1.40 |
| MP/Gates human holistic + 1 trait + automated 5 traits | 1.36 | 1.44 | 1.57 |
| NECAP holistic + MC + automated 5 traits | 1.68 | 1.70 | 2.04 |
14
Standard Errors at Cuts – Grade 8, continued

| Score Combination | C1 | C2 | C3 |
|---|---|---|---|
| NECAP holistic + auto. traits (1 long essay w/MC) | 1.84 | 1.85 | 2.28 |
| NECAP holistic + auto. traits (1 long essay w/o MC) | 2.13 | 2.04 | 2.45 |
| NECAP holistic + auto. traits (1 long + 1 short essay w/MC) | 1.68 | 1.70 | 2.04 |
| NECAP holistic + auto. traits (1 long + 1 short essay w/o MC) | 1.89 | 1.80 | 2.13 |
15
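Standard Errors at Cuts – Grade 11

Score Combination (2 long essays), columns C1, C2, C3:
MP/Gates human holistic + 5 traits: 1.09, 1.10, 1.24
MP/Gates human holistic + 1 trait + auto. 5 traits: 1.39, 1.54 (only two values appear in the source; their columns are not identified)
NECAP holistic + MC + auto. 5 traits: 1.73, 1.83 (only two values appear in the source; their columns are not identified)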
16
Preliminary Findings
Primary
- The approach (human holistic + 5 traits vs human holistic + 1 trait + automated 5 traits) did not make a difference with respect to decision accuracy/consistency, but it did with respect to standard error: the first approach was associated with lower standard errors.
- Scorer discrepancy rates were lower when scorers evaluated fewer traits.
Secondary
- The inclusion of MC items with student essays did not make a difference with respect to decision accuracy/consistency, but it did reduce standard errors at the cuts.
- The addition of a second essay both improved decision accuracy/consistency and reduced standard errors at the cuts.
17
Still need to:
- investigate other score combinations relative to the ones we looked at, especially “holistic alone.”
- understand why the approach (of those investigated) and the MC items made no difference with respect to decision accuracy/consistency, but did with respect to standard errors at the cuts.
- test significance.
18
What Might Be
Human holistic and limited analytic scores
+ “trained” automated holistic scores as second read and as check of human scores to determine need for arbitration
+ “untrained” automated analytic trait scores
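The idea of using an automated holistic score as a second read that triggers arbitration could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the slide gives no rule for when arbitration is needed, so the threshold here reuses the ">1" discrepancy definition from the Statistics slide, and `ScoredEssay`, `needs_arbitration`, and `route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ScoredEssay:
    essay_id: str
    human_holistic: int       # score from the human reader
    automated_holistic: int   # score from the "trained" automated engine


def needs_arbitration(essay: ScoredEssay, max_gap: int = 1) -> bool:
    """Flag an essay when the two holistic scores differ by more than max_gap.

    max_gap=1 is an assumption borrowed from the ">1" discrepancy definition.
    """
    return abs(essay.human_holistic - essay.automated_holistic) > max_gap


def route(essays: Iterable[ScoredEssay],
          max_gap: int = 1) -> Tuple[List[str], List[str]]:
    """Split essay ids into (accepted, sent_to_arbitration)."""
    accepted: List[str] = []
    arbitration: List[str] = []
    for essay in essays:
        target = arbitration if needs_arbitration(essay, max_gap) else accepted
        target.append(essay.essay_id)
    return accepted, arbitration


# Hypothetical batch of three essays.
batch = [
    ScoredEssay("a", human_holistic=4, automated_holistic=4),
    ScoredEssay("b", human_holistic=2, automated_holistic=4),
    ScoredEssay("c", human_holistic=5, automated_holistic=4),
]
accepted_ids, arbitration_ids = route(batch)
```

In this arrangement only the flagged essays would go to an additional human reader, which is one way the automated score could serve as a check on the human score.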
19
P.O. Box 1217, Dover, NH 03821-1217 | Web: measuredprogress.org | Office: 603.749.9102
It’s all about student learning. Period.