Copyright © 2007 Pearson Education, Inc. or its affiliates. All rights reserved.

Evaluation of Distributed Scoring with the Texas Assessment of Knowledge and Skills (TAKS)
Laurie Laughlin Davis, Leslie Keng, & Shelley Ragland

Overview of Studies
Part of a set of four research studies designed to build iteratively on the results of the previous studies and inform a decision by the Texas Education Agency (TEA) about moving to distributed scoring:
– Study 1: 2008 Online Training Demonstration
– Study 2: 2009 Online Training Pilot
– Study 3: 2008 Distributed Scoring Research
– Study 4: 2009 Distributed Scoring Research in the Texas context

Study Features
Use of experienced Texas scorers
Use of potential "regional" scorers for both the regional and distributed scoring conditions
– Mitigates the influence of scorer demographic characteristics on study outcomes
2009 TAKS Grade 11 ELA essay prompt
Two populations of Texas students:
– Online retesters, who type their essays
– Paper primary testers, who handwrite their essays

Study Features
3,999 handwritten primary student essay responses, selected to be representative of the statewide population
1,291 online retest student essay responses
2 scoring conditions:
– Online Training with Distributed Scoring
– Stand-Up Training with Regional Scoring
All essay responses scored through both conditions

Study Features
Both scoring conditions replicated (week 1 and week 2):
– Week 1 Regional (41 scorers)
– Week 1 Distributed (41 scorers)
– Week 2 Regional (37 scorers)
– Week 2 Distributed (38 scorers)
Key question: How different are distributed and regional scoring from each other, given the normal variation we might see within a scoring condition due to scorer effects and similar factors?
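To make that question concrete, here is a minimal sketch (in Python, using fabricated scores on a 1-4 rubric) of how between-condition agreement could be compared against within-condition, week-to-week agreement. The array names, agreement statistic, and data are illustrative assumptions, not the study's actual analysis.

```python
# Illustrative sketch only: compares regional-vs-distributed agreement against
# the week 1-vs-week 2 agreement within a single condition, using fabricated
# scores on a 1-4 rubric (not study data).
import numpy as np

def exact_agreement(scores_a, scores_b):
    """Proportion of essays receiving identical final scores."""
    return float(np.mean(np.asarray(scores_a) == np.asarray(scores_b)))

rng = np.random.default_rng(seed=0)
n_essays = 3999

# Fabricated final scores for the same essays under three scoring events.
regional_week1 = rng.integers(1, 5, size=n_essays)
perturb = lambda base: np.where(rng.random(n_essays) < 0.90, base,
                                rng.integers(1, 5, size=n_essays))
regional_week2 = perturb(regional_week1)     # within-condition replication
distributed_week1 = perturb(regional_week1)  # between-condition comparison

within = exact_agreement(regional_week1, regional_week2)
between = exact_agreement(regional_week1, distributed_week1)
print(f"within-condition (week-to-week) agreement: {within:.3f}")
print(f"between-condition (regional vs. distributed) agreement: {between:.3f}")
```

If the between-condition agreement falls within the range seen week to week inside a single condition, the difference attributable to distributed scoring is no larger than ordinary scorer-effect variation.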

Study Features
Use of operational scoring practices to the extent feasible:
– 157 total scorers
– Each essay received scores from at least 2 scorers
– Resolution and adjudication readings were applied
– Not all scorers scored all essays
– However, analytics were not conducted
Evaluation of the impact of the scoring model on classification decisions
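The slide lists resolution and adjudication readings without spelling out the rule, so the following is only a hedged sketch of a generic two-read workflow; the adjacency threshold and combination rule are assumptions, not the documented TAKS procedure.

```python
# Hypothetical two-read workflow: identical or adjacent reads are combined;
# non-adjacent reads require a third (resolution) read. The threshold and
# combination rule are assumptions, not the TAKS operational rules.
import math

def final_essay_score(read_1, read_2, resolution_read=None):
    if abs(read_1 - read_2) <= 1:                 # exact or adjacent agreement
        return math.ceil((read_1 + read_2) / 2)   # average, rounded up
    if resolution_read is not None:               # discrepancy resolved by a third read
        return resolution_read
    raise ValueError("Non-adjacent reads require a resolution/adjudication read.")

print(final_essay_score(3, 3))                     # -> 3
print(final_essay_score(2, 3))                     # -> 3 (average rounded up)
print(final_essay_score(1, 4, resolution_read=2))  # -> 2
```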

Copyright © 2007 Pearson Education, inc. or its affiliates. All rights reserved. 7 *Percentages may not sum to 100% due to rounding **Essay scores from study did not include specialists or analytics

Copyright © 2007 Pearson Education, inc. or its affiliates. All rights reserved. 8 *Percentages may not sum to 100% due to rounding **Essay scores from study did not include specialists or analytics

Copyright © 2007 Pearson Education, inc. or its affiliates. All rights reserved. 9

10 *Percentages may not sum to 100% due to rounding **Essay scores from study did not include specialists or analytics

Copyright © 2007 Pearson Education, inc. or its affiliates. All rights reserved. 11 *Percentages may not sum to 100% due to rounding **Essay scores from study did not include specialists or analytics

Conclusions
Perfect agreement between the regional and distributed distributions of final essay scores for primary testers
Near-perfect agreement between the regional and distributed distributions of final essay scores for retesters
95-98% consistency of student classification decisions at the total test level between regional and distributed scoring in the study
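As a rough illustration of what the 95-98% classification consistency figure measures, the sketch below compares pass/fail decisions on total test scores under the two conditions; the cut score and score values are invented for the example.

```python
# Illustrative only: proportion of students receiving the same pass/fail
# classification under regional and distributed scoring. The cut score and
# total scores below are fabricated.
def classification_consistency(regional_totals, distributed_totals, cut_score):
    same = sum((r >= cut_score) == (d >= cut_score)
               for r, d in zip(regional_totals, distributed_totals))
    return same / len(regional_totals)

regional_totals    = [2100, 2250, 2090, 2400, 2050]
distributed_totals = [2100, 2230, 2110, 2400, 2040]
print(classification_consistency(regional_totals, distributed_totals, cut_score=2100))
# -> 0.8 (one student near the cut is classified differently)
```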

Conclusions
This study continues to support the use of distributed scoring
These results show a high degree of similarity between the regional and distributed scoring methods
Similarity of results could be even stronger if analytics were implemented
– Especially true for retesters, where a higher percentage of students earn essay scores of '1'

Limitations
Study was conducted on a sample of student essay papers (for primary testers)
Study duration was short relative to operational scoring projects
– Some metrics (e.g., inter-rater reliability, rate of scoring) may be artificially suppressed
Scoring process during the study was similar to the operational context, but did not perfectly replicate it
– For example, it did not include specialists or analytics
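For context on the inter-rater reliability metrics mentioned above, exact and adjacent agreement between the two independent reads are the usual summaries; the sketch below computes them on fabricated reads, not study data.

```python
# Sketch of common inter-rater agreement summaries for two independent reads
# on a 1-4 essay rubric; the reads below are fabricated, not study data.
import numpy as np

def rater_agreement(read_1, read_2):
    diff = np.abs(np.asarray(read_1) - np.asarray(read_2))
    return {
        "exact": float(np.mean(diff == 0)),
        "exact_or_adjacent": float(np.mean(diff <= 1)),
    }

print(rater_agreement([3, 2, 4, 1, 3, 2], [3, 3, 4, 2, 2, 2]))
# -> {'exact': 0.5, 'exact_or_adjacent': 1.0}
```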