Measured Progress ©2012 Combined Human and Automated Scoring of Writing Stuart Kahl Measured Progress.


1 Measured Progress ©2012 Combined Human and Automated Scoring of Writing Stuart Kahl Measured Progress

2 The Challenges of Performance Tasks

3 Contention
The number of independent score points should reflect the amount of evidence.

4 Uses of Automated Essay Scoring
1. Second reads (double scoring)
2. Additional independent scores
3. Both

5 Questions
1. Are a couple of human scores plus computer-generated trait scores “better than” many human analytic scores?
2. When humans focus on fewer traits, are their agreement rates higher?

6 The Student Work We Used
From Grade 8 NECAP*: 1 long and 1 short essay (based on 2 common prompts) and 10 MC responses from each of 1694 students.
From Grade 11 NECAP*: 2 long essays (based on 2 common prompts) from each of 590 students.

7 The Essay Score Data
• MP/Gates human – 1 holistic, 5 traits (organization, support, focus, language, conventions), double-scored
• MP/Gates human – 1 holistic, 1 trait (support), double-scored
• Computer-generated trait scores (word choice, mechanics, style, organization, development)
• NECAP human – 1 holistic, double-scored

8 Statistics
• Scorer agreement – discrepancy (>1) rates
• Decision accuracy – estimate of the proportion of categorization decisions that would match the decisions that would result if scores contained no measurement error
• Decision consistency – estimate of the proportion of categorization decisions that would match decisions based on scores from a parallel form
• Standard error at cut points
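The first and third statistics can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the slides do not say how decision consistency was estimated, and operational estimates are usually model-based from a single administration. Here consistency is simply counted from two observed score sets. The function names, the sample scores, and the cut scores are all hypothetical.

```python
from bisect import bisect_right


def discrepancy_rate(first_reads, second_reads, tolerance=1):
    """Share of responses whose two scores differ by more than `tolerance` points."""
    if not first_reads or len(first_reads) != len(second_reads):
        raise ValueError("need two non-empty score lists of equal length")
    discrepant = sum(
        1 for a, b in zip(first_reads, second_reads) if abs(a - b) > tolerance
    )
    return discrepant / len(first_reads)


def categorize(score, cuts):
    """Index of the performance category: the number of cut scores at or below `score`.

    Assumption: a score exactly at a cut falls in the higher category.
    """
    return bisect_right(cuts, score)


def decision_consistency(form_a, form_b, cuts):
    """Share of students placed in the same category by two parallel score sets."""
    if not form_a or len(form_a) != len(form_b):
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(
        1 for a, b in zip(form_a, form_b) if categorize(a, cuts) == categorize(b, cuts)
    )
    return matches / len(form_a)


# Hypothetical data: four responses double-scored, and three cuts (C1, C2, C3).
rate = discrepancy_rate([4, 3, 2, 5], [4, 1, 3, 2])
consistency = decision_consistency([7, 10, 13, 17], [9, 11, 11, 16], cuts=[8, 12, 16])
```

With three cuts, `categorize` returns 0 through 3, one index per performance level.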

9 MP/Gates Scorer Agreement – # Discrepancies (>1)

10 Decision Accuracy (and Consistency) – Grade 8
Score Combination (1 long + 1 short essay)              Overall     Near Proficient vs Proficient
MP/Gates human holistic + 5 traits                      .86 (.80)   .93 (.91)
MP/Gates human holistic + 1 trait + automated 5 traits  .84 (.77)   .93 (.91)
NECAP holistic + MC + automated 5 traits                .82 (.74)   .93 (.90)

11 Decision Accuracy (and Consistency) – Grade 8, continued
Score Combination                                              Overall     Near Proficient vs Proficient
NECAP holistic + auto. traits (1 long essay w/MC)              .76 (.67)   .91 (.88)
NECAP holistic + auto. traits (1 long essay w/o MC)            .75 (.66)   .91 (.88)
NECAP holistic + auto. traits (1 long + 1 short essay w/MC)    .82 (.74)   .93 (.90)
NECAP holistic + auto. traits (1 long + 1 short essay w/o MC)  .81 (.73)   .93 (.90)

12 Decision Accuracy (and Consistency) – Grade 11
Score Combination (2 long essays)                       Overall     Near Proficient vs Proficient
MP/Gates human holistic + 5 traits                      .73 (.65)   .92 (.89)
MP/Gates human holistic + 1 trait + automated 5 traits  .69 (.60)   .90 (.86)
NECAP holistic + MC + automated 5 traits                .63 (.54)   .88 (.84)

13 Standard Errors at Cuts – Grade 8
Score Combination (1 long + 1 short essay)              C1    C2    C3
MP/Gates human holistic + 5 traits                      .87   1.01  1.40
MP/Gates human holistic + 1 trait + automated 5 traits  1.36  1.44  1.57
NECAP holistic + MC + automated 5 traits                1.68  1.70  2.04

14 Standard Errors at Cuts – Grade 8, continued
Score Combination                                              C1    C2    C3
NECAP holistic + auto. traits (1 long essay w/MC)              1.84  1.85  2.28
NECAP holistic + auto. traits (1 long essay w/o MC)            2.13  2.04  2.45
NECAP holistic + auto. traits (1 long + 1 short essay w/MC)    1.68  1.70  2.04
NECAP holistic + auto. traits (1 long + 1 short essay w/o MC)  1.89  1.80  2.13

15 Standard Errors at Cuts – Grade 11
Score Combination (2 long essays)                   C1    C2    C3
MP/Gates human holistic + 5 traits                  1.09  1.10  1.24
MP/Gates human holistic + 1 trait + auto. 5 traits  1.39  1.54  [only two values in the transcript; column assignment unclear]
NECAP holistic + MC + auto. 5 traits                1.73  1.83  [only two values in the transcript; column assignment unclear]

16 Preliminary Findings
• Primary
  • The approach (human holistic + 5 traits vs human holistic + 1 trait + automated 5 traits) did not make a difference with respect to decision accuracy/consistency, but did with respect to standard error, with the first approach associated with lower standard errors.
  • Scorer discrepancy rates were lower when scorers evaluated fewer traits.
• Secondary
  • The inclusion of MC items with student essays did not make a difference with respect to decision accuracy/consistency, but did reduce standard errors at the cuts.
  • The addition of a second essay both improved decision accuracy/consistency and reduced standard errors at the cuts.
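The two secondary findings about standard errors can be checked against the Grade 8 "continued" table with a few lines of arithmetic. The sketch below only subtracts the reported values. The dictionary keys and the helper name `reduction` are labels introduced here for illustration, and the values are read from the slide on the assumption that each row lists C1, C2, C3 in that order.

```python
# Standard errors at the three cuts (C1, C2, C3) for the NECAP holistic +
# automated trait combinations, as reported on the Grade 8 "continued" slide.
se_at_cuts = {
    "1 long, w/o MC": (2.13, 2.04, 2.45),
    "1 long, w/MC": (1.84, 1.85, 2.28),
    "1 long + 1 short, w/o MC": (1.89, 1.80, 2.13),
    "1 long + 1 short, w/MC": (1.68, 1.70, 2.04),
}


def reduction(before, after):
    """Per-cut drop in standard error when moving from `before` to `after`."""
    return tuple(
        round(b - a, 2) for b, a in zip(se_at_cuts[before], se_at_cuts[after])
    )


# Effect of adding the MC items, with one essay and with two essays.
mc_effect_one_essay = reduction("1 long, w/o MC", "1 long, w/MC")
mc_effect_two_essays = reduction("1 long + 1 short, w/o MC", "1 long + 1 short, w/MC")

# Effect of adding the second (short) essay, MC items included in both cases.
second_essay_effect = reduction("1 long, w/MC", "1 long + 1 short, w/MC")
```

Every difference is positive, which matches the stated direction of both findings. Whether the differences are statistically significant is not something this arithmetic can show.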

17 Still need to:
• investigate other score combinations relative to the ones we looked at, especially “holistic alone.”
• understand why approach (the ones investigated) and MC items made no difference with respect to decision accuracy/consistency, but did with respect to standard errors at the cuts.
• test significance.

18 What Might Be
Human holistic and limited analytic scores
+ “trained” automated holistic scores as second read and as check of human scores to determine need for arbitration
+ “untrained” automated analytic trait scores
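One way the "second read and check" role might work is sketched below in Python. The slide does not state an arbitration rule. The sketch assumes the same ">1 point" threshold used earlier for scorer discrepancies, and the function names, response IDs, and scores are all hypothetical.

```python
def needs_arbitration(human_score, automated_score, tolerance=1):
    """True when the human and automated holistic scores differ by more than `tolerance`."""
    return abs(human_score - automated_score) > tolerance


def route_responses(scored_responses, tolerance=1):
    """Split (response_id, human_score, automated_score) triples into two queues.

    Assumption: responses within tolerance are accepted as scored, and the
    rest are sent to a human arbitrator.
    """
    routed = {"accepted": [], "arbitration": []}
    for response_id, human_score, automated_score in scored_responses:
        if needs_arbitration(human_score, automated_score, tolerance):
            routed["arbitration"].append(response_id)
        else:
            routed["accepted"].append(response_id)
    return routed


# Hypothetical holistic scores for four essays.
sample = [("r1", 4, 4), ("r2", 2, 4), ("r3", 5, 4), ("r4", 1, 3)]
routed = route_responses(sample)
```

In this sketch the automated holistic score serves only as the check. How the two scores would be combined once a response is accepted is left open, as it is on the slide.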

19 P.O. Box 1217, Dover, NH 03821-1217 | Web: measuredprogress.org | Office: 603.749.9102
It’s all about student learning. Period.

