California Educational Research Association

Presentation transcript:

Appropriate Use of Benchmark Data for Program Evaluation: Going Beyond Raw Scores
California Educational Research Association, Anaheim, CA, December 5, 2013

Objectives
- Compare statistically defensible ways of establishing performance level cutoffs
- Show how simple scaling of the benchmark scores can lead to better use for evaluation purposes

Commonly Used Methods for Setting Cutoffs on District Benchmarks (not recommended)
- Use the default settings on the assessment platform (e.g., 20%, 40%, 60%, 80%)
- Ask curriculum experts for their opinion on where cutoffs should be set
- Determine the percent correct corresponding to performance levels on the CSTs and apply it to the benchmarks

Statistically Defensible Methods of Establishing Performance Level Bands
- IRT scaling (UC Berkeley BEAR Center/Ed. Data Systems/EADMS/SchoolCity)
- Equipercentile equating method (option in Illuminate and EADMS)
- Linear equating method (sets cutoffs at the same z-score as the CST cutoffs)
- Regression (predicts benchmark scores from CST scores or vice versa)
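The linear equating idea listed above can be sketched in a few lines of Python. This is a minimal illustration, not any platform's implementation: the function name, the sample scores, and the choice of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def linear_equated_cutoff(cst_cutoff, cst_scores, benchmark_scores):
    """Place the benchmark cutoff at the same z-score as the CST cutoff.

    Assumes both score lists come from the same local group of students
    and uses the population standard deviation (an assumption).
    """
    z = (cst_cutoff - mean(cst_scores)) / pstdev(cst_scores)
    return mean(benchmark_scores) + z * pstdev(benchmark_scores)


# Hypothetical local data: CST scaled scores and benchmark raw scores.
cst_scores = [250, 300, 350, 400, 450]
benchmark_scores = [10, 15, 20, 25, 30]

cutoff_at_mean = linear_equated_cutoff(350, cst_scores, benchmark_scores)
cutoff_above_mean = linear_equated_cutoff(400, cst_scores, benchmark_scores)
print(cutoff_at_mean, cutoff_above_mean)
```

A CST cutoff that sits at the local CST mean maps to the local benchmark mean, and a cutoff a given number of standard deviations above the mean maps to the benchmark score the same number of standard deviations above its mean.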

Equipercentile Equating at the Performance Level Cut-points
- Establishes benchmark cutoffs at the same local percentile ranks as the CST cutoffs
- Results in better correspondence with CST performance levels
- Because the same local percentile cutoffs are applied to each within-grade benchmark, comparisons across tests within a grade level are more defensible
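One simple way to read the equipercentile rule above is sketched below. It is a sketch under stated assumptions: "percentile rank" is taken as the percent of students scoring below a cutoff, the function names and data are hypothetical, and commercial platforms may handle ties and interpolation differently.

```python
def percent_below(scores, cutoff):
    """Percent of students scoring strictly below the cutoff."""
    return 100.0 * sum(s < cutoff for s in scores) / len(scores)


def equipercentile_cutoff(cst_scores, cst_cutoff, benchmark_scores):
    """Lowest observed benchmark score with at least as large a share of
    students below it as the CST cutoff has on the CST."""
    target = percent_below(cst_scores, cst_cutoff)
    for candidate in sorted(set(benchmark_scores)):
        if percent_below(benchmark_scores, candidate) >= target:
            return candidate
    # Everyone falls below the CST cutoff, so no benchmark score qualifies.
    return max(benchmark_scores) + 1


# Hypothetical local data for ten students.
cst_scores = [260, 280, 300, 320, 340, 350, 360, 380, 400, 420]
benchmark_scores = [8, 12, 15, 18, 20, 22, 25, 28, 31, 35]

proficient_cutoff = equipercentile_cutoff(cst_scores, 350, benchmark_scores)
print(proficient_cutoff)
```

In this toy data half of the students fall below the CST cutoff of 350, so the benchmark cutoff lands at the score that also leaves half of the students below it.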

Equipercentile Comparison to Other Approaches
- Regression (predicting the benchmark cutoff from the CST scaled score or vice versa)
- Z-score (establishing the benchmark cutoff from the CST cutoff Z-score)
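To see how the regression and Z-score approaches can disagree, the sketch below computes both cutoffs from the same paired scores. The data and function names are hypothetical, and ordinary least squares is assumed for the regression. With an imperfect correlation, the regression-predicted cutoff is pulled toward the benchmark mean relative to the Z-score cutoff.

```python
from statistics import mean, pstdev


def regression_cutoff(cst_cutoff, cst_scores, benchmark_scores):
    """Predict the benchmark score at the CST cutoff by least squares."""
    mx, my = mean(cst_scores), mean(benchmark_scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cst_scores, benchmark_scores))
    sxx = sum((x - mx) ** 2 for x in cst_scores)
    slope = sxy / sxx
    return my + slope * (cst_cutoff - mx)


def z_score_cutoff(cst_cutoff, cst_scores, benchmark_scores):
    """Benchmark score at the same z-score as the CST cutoff."""
    z = (cst_cutoff - mean(cst_scores)) / pstdev(cst_scores)
    return mean(benchmark_scores) + z * pstdev(benchmark_scores)


# Hypothetical paired scores for five students (same order in both lists).
cst_scores = [250, 300, 350, 400, 450]
benchmark_scores = [12, 14, 22, 24, 28]

reg = regression_cutoff(400, cst_scores, benchmark_scores)
zcut = z_score_cutoff(400, cst_scores, benchmark_scores)
print(round(reg, 2), round(zcut, 2))
```

The gap between the two cutoffs grows as the CST-to-benchmark correlation weakens, which is one reason the two methods can classify different numbers of students into the extreme performance levels.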

40-Item Algebra Benchmark #1 (Frequencies)
Columns: Actual CST, Regression, Z-score, Equipercentile
Far Below Basic: 86, 64, 139, 93
Below Basic: 299, 195, 156, 276
Basic: 259, 356, 233, 246
Proficient: 293, 380, 252
Advanced: 62, 31, 72

40-Item Algebra Benchmark #2 (Frequencies)
Columns: Actual CST, Regression, Z-score, Equipercentile
Far Below Basic: 93, 50, 141, 105
Below Basic: 333, 343, 287, 323
Basic: 291, 267, 201, 276
Proficient: 243, 355, 369, 245
Advanced: 60, 5, 22, 71

40-Item Algebra Benchmark #3 (Frequencies)
Columns: Actual CST, Regression, Z-score, Equipercentile
Far Below Basic: 75, 36, 125, 64
Below Basic: 282, 273, 211, 272
Basic: 288, 315, 253
Proficient: 239, 321, 296, 248
Advanced: 63, 2, 62

40-Item Algebra Benchmark #4 (Frequencies)
Columns: Actual CST, Regression, Z-score, Equipercentile
Far Below Basic: 76, 46, 106, 80
Below Basic: 285, 258, 240, 266
Basic: 293, 337, 252, 295
Proficient: 243, 318, 333, 247
Advanced: 64, 2, 30, 73

Limitations of Raw Scores
- Cannot be combined across tests
- Cannot be used to compute gain scores
- Not equal interval (questionable use of inferential statistics)

Creating Useful Scales for Benchmarks
- Z-scores & T-scores
- Normalized Z-scores & T-scores (equal interval)
- Normal Curve Equivalent (equal interval)

Z-scores & T-scores
Z-score = (score - mean) / standard deviation
T-score = (Z-score × 10) + 50
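The two formulas above translate directly into code. The sketch below is illustrative only: the function names and raw scores are hypothetical, and the population standard deviation is assumed because the scale is being built for the whole tested group.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Z-score = (score - mean) / standard deviation."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [(s - m) / sd for s in raw_scores]


def t_scores(raw_scores):
    """T-score = (Z-score * 10) + 50."""
    return [z * 10 + 50 for z in z_scores(raw_scores)]


# Hypothetical raw scores on a 40-item benchmark.
raw = [10, 20, 30]
z = z_scores(raw)
t = t_scores(raw)
print([round(v, 2) for v in z], [round(v, 2) for v in t])
```

By construction the Z-scores have mean 0 and standard deviation 1, and the T-scores have mean 50 and standard deviation 10, whatever the raw score scale was.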

Normalized Z-scores & T-scores
Step 1: compute the percentile rank
Step 2: convert the percentile rank to a normalized Z-score (from a table of areas under the normal curve)
Step 3: convert to a normalized T-score (optional)
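The three steps above can be sketched with Python's standard library, using the inverse normal CDF in place of a printed table of areas. The mid-rank definition of percentile rank (percent below plus half the percent tied) is an assumption chosen so that no rank is exactly 0 or 100; the function names and scores are hypothetical.

```python
from statistics import NormalDist


def percentile_ranks(scores):
    """Step 1: mid-rank percentile rank of each score within the group."""
    n = len(scores)
    return [
        100.0 * (sum(x < s for x in scores) + 0.5 * sum(x == s for x in scores)) / n
        for s in scores
    ]


def normalized_z(scores):
    """Step 2: look up the z-score with that area below it on the normal curve."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf(pr / 100.0) for pr in percentile_ranks(scores)]


def normalized_t(scores):
    """Step 3 (optional): rescale to mean 50, standard deviation 10."""
    return [z * 10 + 50 for z in normalized_z(scores)]


# Hypothetical, skewed raw scores.
raw = [5, 10, 10, 20, 40]
pr = percentile_ranks(raw)
nz = normalized_z(raw)
nt = normalized_t(raw)
print(pr, [round(v, 3) for v in nz])
```

Note that the normalized Z-scores depend only on rank order, so the large raw-score gap between 20 and 40 is not preserved.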

Normal Curve Equivalent (NCE)
Step 1: compute the normalized Z-score (see prior slide)
Step 2: convert to an NCE with the formula NCE = (normalized Z-score × 21.06) + 50
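A short sketch of the NCE conversion follows, starting from a percentile rank. The function name is hypothetical. The multiplier 21.06 is the value that makes NCEs of 1, 50, and 99 line up with percentile ranks of 1, 50, and 99, which the example checks.

```python
from statistics import NormalDist


def nce_from_percentile_rank(percentile_rank):
    """NCE = (normalized Z-score * 21.06) + 50.

    percentile_rank must be strictly between 0 and 100.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return z * 21.06 + 50


for pr in (1, 25, 50, 75, 99):
    print(pr, round(nce_from_percentile_rank(pr), 1))
```

Between those anchor points the two scales diverge: a percentile rank of 25 corresponds to an NCE near 36, because percentile ranks are bunched near the middle of a normal distribution while NCEs are spaced in equal z-score units.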

Some Benefits of Scaling
- A simple Z-score or T-score allows scores to be combined across different tests (e.g., grade levels)
- A normalized Z-score or T-score (or NCE) may allow for more defensible use of inferential statistical tests (e.g., t-tests, ANOVA, ANCOVA)

Caveats for Use of These Derived Scores
- Using these scales to compute growth across years is limited to subsets of data within the district: scales should be developed with district-wide data, and then the smaller groups being evaluated can be compared. Why? District-wide data will always have the same Z-score mean and standard deviation (0 and 1) in every year, so district-wide growth cannot appear on these scales.
- Normalized Z-scores, T-scores, and NCEs should be used only when the population can reasonably be assumed to be normal
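The first caveat can be illustrated with a small sketch. All group names and scores here are hypothetical. Each test is scaled on the full district, so the district-wide mean Z-score is 0 on both tests even though every raw score went up; only the relative change of a subgroup is visible.

```python
from statistics import mean, pstdev


def scale_on_district(groups):
    """Z-score every student against the district-wide mean and SD."""
    everyone = [s for scores in groups.values() for s in scores]
    m, sd = mean(everyone), pstdev(everyone)
    return {name: [(s - m) / sd for s in scores] for name, scores in groups.items()}


# Hypothetical raw scores on two benchmarks for the same students.
fall = {"program": [10, 12, 14], "other": [16, 18, 20]}
spring = {"program": [20, 22, 24], "other": [22, 24, 26]}

fall_z = scale_on_district(fall)
spring_z = scale_on_district(spring)

program_gain = mean(spring_z["program"]) - mean(fall_z["program"])
district_gain = (
    mean(z for zs in spring_z.values() for z in zs)
    - mean(z for zs in fall_z.values() for z in zs)
)
print(round(program_gain, 3), round(district_gain, 3))
```

In this toy data the program group moves closer to the district mean, which shows up as a positive Z-score gain, while the district-wide gain is zero by construction.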

Master of Arts in Educational Evaluation
Co-concentration: School of Social Science, Policy and Evaluation and School of Educational Studies
Courses include:
- Applied research and assessment methods (4 units)
- Evaluation theory and methods (14 units)
- Education courses (16 units)
- Statistical methods (8 units)
- Electives (8 units)
Contact: Dr. Nazanin Zargarpour, Program Director, 909-607-1916, cec.zargarpour@cgu.edu

Questions/Comments?
Contact: Tom Barrett, Ph.D., President, Barrett Enterprises LLC
951-905-5367 (office), 951-237-9452 (cell)
www.BarrettEnterprisesLLC.com