Appropriate Use of Benchmark Data for Program Evaluation—Going Beyond Raw Scores
California Educational Research Association
Anaheim, CA, December 5, 2013
Objectives:
- Compare statistically defensible ways of establishing performance level cutoffs
- Show how simple scaling of benchmark scores can lead to better use for evaluation purposes
Commonly Used Methods for Setting Cutoffs on District Benchmarks (not recommended):
- Use default settings on the assessment platform (e.g., 20%, 40%, 60%, 80%)
- Ask curriculum experts for their opinion on where cutoffs should be set
- Determine the percent correct corresponding to performance levels on the CSTs and apply it to the benchmarks
Statistically Defensible Methods of Establishing Performance Level Bands:
- IRT scaling (UC Berkeley BEAR Center / Ed. Data Systems / EADMS / SchoolCity)
- Equipercentile equating (an option in Illuminate and EADMS)
- Linear equating (sets cutoffs at the same z-score as the CST cutoffs)
- Regression (predicts benchmark scores from CST scores, or vice versa)
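The linear equating method above can be sketched in a few lines: convert the CST cutoff to a z-score on the CST scale, then map that z-score onto the benchmark score distribution. This is a minimal illustration, not any platform's implementation; the function name and sample data are hypothetical.

```python
from statistics import mean, stdev

def linear_equated_cutoff(cst_cutoff, cst_scores, benchmark_scores):
    """Map a CST cutoff onto the benchmark scale by matching z-scores
    (linear equating): the benchmark cutoff sits the same number of
    standard deviations from the benchmark mean as the CST cutoff
    does from the CST mean."""
    z = (cst_cutoff - mean(cst_scores)) / stdev(cst_scores)
    return mean(benchmark_scores) + z * stdev(benchmark_scores)
```

For example, if a CST cutoff lies one standard deviation above the district's CST mean, the equated benchmark cutoff is one standard deviation above the district's benchmark mean.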
Equipercentile Equating at the Performance Level Cut-points:
- Establishes cutoffs for benchmarks at the same local percentile ranks as the cutoffs for the CSTs
- Results in better correspondence with CST performance levels
- Because the same local percentile cutoffs are applied to each within-grade benchmark, comparisons across tests within a grade level are more defensible
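The idea can be sketched as: find the local percentile rank of the CST cutoff among district CST scores, then take the benchmark score that sits at that same local percentile rank. This is a bare-bones illustration with hypothetical function names; production implementations (e.g., in Illuminate or EADMS) typically add interpolation and smoothing.

```python
def percentile_rank(scores, value):
    """Local percentile rank of `value`: proportion of scores strictly
    below it (one common definition; others average in ties)."""
    return sum(s < value for s in scores) / len(scores)

def equipercentile_cutoff(cst_scores, cst_cutoff, benchmark_scores):
    """Benchmark raw score at the same local percentile rank
    as the CST cutoff."""
    pr = percentile_rank(cst_scores, cst_cutoff)
    ordered = sorted(benchmark_scores)
    index = min(int(pr * len(ordered)), len(ordered) - 1)
    return ordered[index]
```

Applying this separately to each performance level cutoff yields benchmark bands whose frequencies track the CST distribution, which is what the frequency tables below illustrate.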
Equipercentile Comparison to Other Approaches:
- Regression (predicting the benchmark cutoff from the CST scaled score, or vice versa)
- Z-score (establishing the benchmark cutoff from the CST cutoff z-score)
40-Item Algebra Benchmark #1 (Frequencies)

Level            Actual CST  Regression  Z-score  Equipercentile
Far Below Basic      86          64        139          93
Below Basic         299         195        156         276
Basic               259         356        233         246
Proficient          293         380         —          252
Advanced             62          31         —           72
40-Item Algebra Benchmark #2 (Frequencies)

Level            Actual CST  Regression  Z-score  Equipercentile
Far Below Basic      93          50        141         105
Below Basic         333         343        287         323
Basic               291         267        201         276
Proficient          243         355        369         245
Advanced             60           5         22          71
40-Item Algebra Benchmark #3 (Frequencies)

Level            Actual CST  Regression  Z-score  Equipercentile
Far Below Basic      75          36        125          64
Below Basic         282         273        211         272
Basic               288         315        253          —
Proficient          239         321        296         248
Advanced             63           2         —           62
40-Item Algebra Benchmark #4 (Frequencies)

Level            Actual CST  Regression  Z-score  Equipercentile
Far Below Basic      76          46        106          80
Below Basic         285         258        240         266
Basic               293         337        252         295
Proficient          243         318        333         247
Advanced             64           2         30          73
Limitations of Raw Scores:
- Cannot be combined across tests
- Cannot be used to compute gain scores
- Not equal interval (making the use of inferential statistics questionable)
Creating Useful Scales for Benchmarks:
- Z-scores & T-scores
- Normalized z-scores & T-scores (equal interval)
- Normal curve equivalents (equal interval)
Z-scores & T-scores:
- Z-score = (score - mean) / standard deviation
- T-score = (z-score * 10) + 50
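The two formulas above translate directly into code; this minimal sketch (function names are illustrative) computes both from a list of district raw scores:

```python
from statistics import mean, stdev

def z_score(x, scores):
    """Z-score = (score - mean) / standard deviation,
    using the district score distribution."""
    return (x - mean(scores)) / stdev(scores)

def t_score(z):
    """T-score = (z-score * 10) + 50: mean 50, SD 10, no negatives
    for scores within about 5 SDs of the mean."""
    return z * 10 + 50
```

A raw score one standard deviation above the mean gets z = 1.0 and T = 60.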
Normalized Z-scores & T-scores:
- Step 1: compute the percentile rank
- Step 2: convert the percentile to a normalized z-score (from a table of areas under the normal curve)
- Step 3: convert to a normalized T-score (optional)
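The steps above can be sketched as follows; the inverse normal CDF in the standard library replaces the printed table of areas under the normal curve (function names are illustrative):

```python
from statistics import NormalDist

def normalized_z(percentile):
    """Step 2: convert a percentile rank (0-100, exclusive) to a
    normalized z-score via the inverse standard normal CDF."""
    return NormalDist().inv_cdf(percentile / 100)

def normalized_t(percentile):
    """Step 3 (optional): rescale the normalized z-score to a T-score."""
    return normalized_z(percentile) * 10 + 50
```

The 50th percentile maps to z = 0 (T = 50), and roughly the 84th percentile maps to z = 1 (T = 60), matching the normal curve table.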
Normal Curve Equivalent (NCE):
- Step 1: compute the normalized z-score (see the prior slide)
- Step 2: convert to an NCE with the formula: (normalized z-score * 21.06) + 50
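Chaining the two steps gives a one-function sketch (the function name is illustrative); the 21.06 multiplier is chosen so that NCEs of 1, 50, and 99 coincide with percentile ranks of 1, 50, and 99:

```python
from statistics import NormalDist

def nce(percentile):
    """Normal Curve Equivalent: normalized z-score times 21.06, plus 50.
    Yields an equal-interval scale anchored to percentiles 1, 50, 99."""
    z = NormalDist().inv_cdf(percentile / 100)
    return z * 21.06 + 50
```

So the 50th percentile maps to an NCE of 50, while the 1st and 99th percentiles map to NCEs of approximately 1 and 99.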
Some Benefits of Scaling:
- Simple z-scores or T-scores allow scores to be combined across different tests (e.g., grade levels)
- Normalized z-scores or T-scores (or NCEs) may allow for more defensible use of inferential statistical tests (e.g., t-tests, ANOVA, ANCOVA)
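The first benefit (combining across tests) works because standardizing within each test puts every test on a common mean-0, SD-1 scale. A minimal sketch, with a hypothetical function name and made-up grade-level data:

```python
from statistics import mean, stdev

def pool_across_tests(scores_by_test):
    """Standardize each test's scores within-test, then pool the
    z-scores so different tests (e.g., grade levels) share one scale."""
    pooled = []
    for scores in scores_by_test.values():
        m, s = mean(scores), stdev(scores)
        pooled.extend((x - m) / s for x in scores)
    return pooled
```

Two grade levels with very different raw-score ranges contribute comparable values after pooling, which is what makes a combined program-level analysis defensible.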
Caveats for Use of These Derived Scores:
- Using these scales to compute growth across years is valid only for subsets of data within the district: develop the scales with district-wide data, then compare the smaller groups being evaluated. Why? District-wide data will have the same z-score mean and standard deviation (0 and 1) every year by construction, so district-wide "growth" on these scales is always zero.
- Normalized z-scores, T-scores, and NCEs should be used only when the population can reasonably be assumed to be normally distributed.
Master of Arts in Educational Evaluation
Co-concentration: School of Social Science, Policy and Evaluation and School of Educational Studies
Courses include:
- Applied research and assessment methods (4 units)
- Evaluation theory and methods (14 units)
- Education courses (16 units)
- Statistical methods (8 units)
- Electives (8 units)
Contact: Dr. Nazanin Zargarpour, Program Director, 909-607-1916, cec.zargarpour@cgu.edu
Questions/Comments?
Contact: Tom Barrett, Ph.D., President, Barrett Enterprises LLC
951-905-5367 (office), 951-237-9452 (cell)
www.BarrettEnterprisesLLC.com