Measure Up! Benchmark Assessment Quality Assurance Process
RCAN, September 10, 2010
Measure Up! Objective
To monitor and improve Benchmark Assessments in order to attain the most accurate possible measurement of student achievement with respect to California Content Standards Tests.
Measure Up! Components
Content
– Structure & Course Guides
Predictability
– Correlation of Benchmark Exams to CST scores
– Association of Benchmark Exams with CST Performance Levels
Item Analysis
– Difficulty
– Discrimination
– Representative of CST items
Evolving
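The "Correlation of Benchmark Exams to CST scores" component can be illustrated with a small sketch. It computes a Pearson correlation coefficient (the r reported on the predictability slides, assuming r there is Pearson's). The function name `pearson_r` and both score lists are hypothetical and made up for illustration; they are not PSUSD data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for illustration only (one entry per student).
benchmark_pct = [35, 48, 52, 61, 70, 84]      # aggregated benchmark percent correct
cst_scale = [270, 300, 310, 340, 365, 410]    # CST scale scores

r = pearson_r(benchmark_pct, cst_scale)
print(f"r = {r:.2f}")
```

A value of r near +1 would suggest the aggregated benchmarks rank students much as the CST does; a value near 0 would suggest little linear relationship.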
PSUSD Benchmark Structure
CST: ~153 ID, 60 to 75 Items, Blueprint Aligned
Benchmark #1: ~45 ID, ~20-35 Items, 1st 45 ID Paced Standards, Partial Match to CST
Benchmark #2: ~90 ID, ~20-35 Items, 2nd 45 ID Paced Standards, Partial Match to CST
Benchmark #3: ~135 ID, ~20-35 Items, 3rd 45 ID Paced Standards, Partial Match to CST
Algebra I (8th Grade) Aggregation of 3 Benchmarks
Algebra I (8th Grade) CST Predictability
Chart: Algebra I BM (8th Grade) and Algebra I CST 2009 (8th Grade); Prof = [values missing]
Chart: N = 23, N = 119, N = 125, N = 51, N = 5
ELA 10th Grade Aggregation of 3 Benchmarks
ELA 10th Grade CST Predictability
Chart: ELA 10th Gd BM and ELA 10th Gd CST 2009; Prof = 53, 53
Chart: N = 26, N = 151, N = 326, N = 249, N = 36
US History 11th Grade Aggregation of 3 Benchmarks
US History 11th Grade CST Predictability
Chart: US His 11th Gd BM and US His 11th Gd CST 2009; r = .46; Prof = 40, 63
Chart: N = 68, N = 116, N = 142, N = 72, N = 7
Science 8th Grade Aggregation of 3 Benchmarks
Science 8th Grade CST Predictability
Chart: Science 8th Gd BM and Science 8th Gd CST 2009; r = .77; Prof = 39, 44
Chart: N = 84, N = 307, N = 494, N = 275, N = 24
Math 6th Grade Aggregation of 3 Benchmarks
Math 6th Grade CST Predictability
Chart: Math 6th Gd BM and Math 6th Gd CST 2009; r = .84; Prof = 43, 48
Chart: N = 164, N = 393, N = 460, N = 224, N = 31
ELA 4th Grade Aggregation of 3 Benchmarks
ELA 4th Grade CST Predictability
Chart: r = .80; Prof = [values missing]
Chart: N = 88, N = 176, N = 334, N = 388, N = 93
Item Level Analysis
Item Difficulty
The p value for any item
– percentage of correct answers
– usually in decimal form
Ideally p value range is .30 to .80 for most items
For example
– p value of .28 = 28% of the test takers got the item right
– p value of .75 = 75% of the test takers got the item right
– p value of .95 = 95% of the test takers got the item right
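The p value calculation above can be sketched in a few lines. This is a minimal illustration, assuming each student's answers are already scored as 1 (correct) or 0 (incorrect). The function names `item_p_values` and `flag_item`, the flag labels, and the response data are hypothetical; only the .30 to .80 range comes from the slide.

```python
def item_p_values(responses):
    """Return the p value (proportion correct) for each item.

    responses: one list per student, holding 1 (correct) or 0 (incorrect)
    for each item, in item order.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def flag_item(p, low=0.30, high=0.80):
    """Compare a p value with the ideal range from the slide (.30 to .80)."""
    if p < low:
        return "below range"
    if p > high:
        return "above range"
    return "in range"


# Made-up scored responses: 4 students, 3 items.
responses = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]

p_values = item_p_values(responses)
flags = [flag_item(p) for p in p_values]
for number, (p, flag) in enumerate(zip(p_values, flags), start=1):
    print(f"Item {number}: p = {p:.2f} ({flag})")
```

In this made-up data, item 1 falls inside the ideal range, item 2 falls below it, and item 3 falls above it.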
Item Difficulty Monitoring
Table: Item Number, N Students, Standard, p Value (rows cover NS, MG, and PS standards; cell values not recovered)
Item Discrimination
Is item “discriminating” appropriately between higher & lower scoring students?
Discrimination Index (DI) = difference between how upper half and lower half of students score on an item
DI ranges between -1 and +1
We want items to discriminate positively
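The DI definition above can be sketched as follows. This is one common way to compute an upper/lower discrimination index, not necessarily the exact procedure used for these benchmarks. It assumes students are ranked by total test score, split into halves (the middle student is dropped when the count is odd), and that DI is the proportion correct in the upper half minus the proportion correct in the lower half. The function name `discrimination_index` and the response data are hypothetical.

```python
def discrimination_index(responses, item):
    """Upper-half minus lower-half proportion correct for one item.

    responses: one list per student of 1 (correct) / 0 (incorrect) scores.
    item: zero-based index of the item to evaluate.
    """
    # Rank students from highest to lowest total score.
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least 2 students")
    upper = ranked[:half]
    lower = ranked[-half:]  # with an odd count, the middle student is left out
    p_upper = sum(student[item] for student in upper) / half
    p_lower = sum(student[item] for student in lower) / half
    return p_upper - p_lower


# Made-up scored responses: 4 students, 4 items.
responses = [
    [1, 1, 1, 0],  # total 3
    [1, 1, 0, 0],  # total 2
    [1, 0, 0, 0],  # total 1
    [0, 0, 0, 1],  # total 1
]

di_values = [discrimination_index(responses, i) for i in range(4)]
for number, di in enumerate(di_values, start=1):
    print(f"Item {number}: DI = {di:+.2f}")
```

In this made-up data the last item has a negative DI: lower-scoring students answered it correctly more often than higher-scoring students, which is the pattern the slide says we do not want.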
Item Discrimination Monitoring
Table: Item Number, N Students, Standard, p Value, DI, Max DI (rows cover NS, MG, and PS standards; cell values not recovered)
th Grade Math 3rd Benchmark
Representative of CST Items and our continual Revision Process
7NS Item: p = .33, DI = .34
RTQ for 7NS1.6: CST = 1, RTQ = 1
7NS Item: p = .42, DI = .31
Additionally…
– Item #3 replaced (as #3) with item modeled after RTQ #16 (7NS1.7*); CST = 5, RTQ = 7
– Item #15 replaced (as #13) with item modeled after RTQ #47 (7AF1.5)
– Item #s 23 & 24 replaced (as #s 21 & 22) with items modeled after RTQ #s 89 & 88 (7MG3.4*)
Measure Up! Next Steps
Benchmark Exam Structure
Institutionalize System
CAHSEE
Writing Prompt
Discrimination
Distractor Shaping
Questions?