Unraveling the Mysteries of Setting Standards and Scaled Scores Julie Miles PhD, 10.31.2013.


Overview of the Session

1. What Is Standard Setting?
   - Basic Vocabulary
   - Definition
   - Performance Level Descriptors
   - Threshold Descriptions
   - When Does It Occur?
   - Methods Used in Virginia
2. The Connection to Scaled Scores
   - Converting Raw Scores to Scaled Scores
   - Example Conversion
3. From Scaled Scores to Equated Forms
   - How Are Scaled Scores Connected to Equating?
   - The Basics of Equating
   - Recap of How It All Comes Together

1. What Is Standard Setting?

What Is Standard Setting? Basic Vocabulary

Content Standards: the content and skills that students are expected to know and be able to do.
Performance Levels (also called Achievement Levels or Performance Categories): labels for levels of student achievement (e.g., Below Basic, Basic, Proficient, and Advanced).
Performance Level Descriptors (PLDs): descriptions of the competencies associated with each level of achievement.
Cut Scores (Performance Standards): scores on an assessment that separate one level of achievement from another.

What Is Standard Setting? Definition

A judgmental process with a variety of steps that involves relevant stakeholders throughout. The steps typically include:
1. Identifying the relevant knowledge and skills to be taught and assessed at each grade/content area to support the goals of the state
2. Defining the expectations associated with each performance level
3. Convening a committee of educators to provide content-based recommendations for cut scores at each grade or subject area
4. Review of the cut score recommendations and adoption by the State Board of Education

What Is Standard Setting? Performance Level Descriptors (PLDs)

PLDs define the knowledge, skills, and abilities (KSAs) that students are expected to demonstrate to gain entry into a specific performance level (e.g., Proficient or Advanced). The main goal of standard setting is to quantify, or operationalize, the PLDs.

EXAMPLE Proficient PLD: Explain the role of geography in the political, cultural, and economic development of Virginia and the United States.

What Is Standard Setting? Threshold Descriptions (TDs)

TDs define what students who are "just over the threshold" of a performance level (e.g., a student scoring 400 or 401, or 500 or 501) should be able to demonstrate in terms of KSAs. These are the borderline, or minimally qualified, students.

EXAMPLE Proficient PLD: Explain the role of geography in the political, cultural, and economic development of Virginia and the United States.
EXAMPLE "Just-Barely" Proficient TD: Identify and explain major geographic features on maps. Interpret charts based on background geographic information.

What Is Standard Setting? When Does It Occur?

What Is Standard Setting? Methods Used in Virginia

Virginia predominantly uses the "Modified Angoff" (SOL and VMAST), "Body of Work" (VAAP), and "Reasoned Judgment" (VGLA) methods. All methods typically share similar components:
1. Overview of standard setting
2. Review of the test blueprint and performance level descriptions
3. Creation of the threshold descriptions
4. Overview of the actual test administered to students
5. Three rounds of judgments by the committee:
   - MC tests: should a "just-barely" student get the item correct 2 out of 3 times?
   - VGLA: how many points should a "just-barely" student earn on this SOL?
   - VAAP: which performance level does a collection of evidence (COE) represent?
6. The final round results in cut score recommendations provided to the SBOE: the number of correct answers needed to gain entry into each performance level.
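The Modified Angoff tally described above can be sketched in a few lines. The panelist ratings below are invented for illustration; each panelist records 1 if a "just-barely" student should get the item correct 2 out of 3 times, and 0 otherwise.

```python
# Hypothetical sketch of the Modified Angoff yes/no tally (illustrative
# ratings, not actual Virginia SOL data). The raw cut score is the expected
# total score for the borderline ("just-barely") student.

ratings = {
    "panelist_1": [1, 0, 1, 1, 0],
    "panelist_2": [1, 1, 1, 0, 0],
    "panelist_3": [1, 0, 1, 1, 1],
}

n_items = 5

# Average judgment per item across panelists = expected score on that item.
item_means = [
    sum(r[i] for r in ratings.values()) / len(ratings) for i in range(n_items)
]

# Raw cut score = sum of the per-item expectations.
raw_cut = sum(item_means)
print(round(raw_cut, 2))
```

In an operational setting this is repeated over three rounds, with panelists seeing impact data between rounds before the final recommendation goes forward.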

2. The Connection to Scaled Scores

The Connection to Scaled Scores: Converting Raw Scores to Scaled Scores

The cut score recommendations from standard setting are in a raw score metric, but a raw score is not helpful from year to year:
- Student ability differs from student to student.
- Test forms change from year to year (and within a year): a raw score of 36 on a slightly easier form does not indicate the same level of achievement as a raw score of 36 on a slightly more difficult form.

We need a metric that is stable from year to year. (This is where my team earns its keep.) The metric is based on item response theory (IRT) and is called "theta." The theta value associated with each raw score is converted to a scaled score that remains stable from year to year, so that a 400 is comparable to a 400 regardless of the student, year, or form.

The Connection to Scaled Scores: Example Conversion to Scaled Scores (Algebra II)

The scaled score is a linear transformation of theta, SS = a·θ + b, anchored so that the pass/proficient cut maps to 400 and the pass/advanced cut maps to 500:

  500 = a·θ_a + b
  400 = a·θ_p + b

where θ_a is the value of theta (2.616) corresponding to the raw score (45) at the pass/advanced level, and θ_p is the value of theta (0.6416) corresponding to the raw score (30) at the pass/proficient level. Solving for a yields:

  a = (500 − 400) / (θ_a − θ_p)

And substituting the values of theta corresponding to the raw score cuts gives:

  a = 100 / (2.616 − 0.6416) ≈ 50.65

Solving for b yields:

  b = 400 − a·θ_p

And substituting the values of θ_p and a gives:

  b = 400 − 50.65 × 0.6416 ≈ 367.5
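The conversion above can be reproduced directly; the 400/500 anchors and the theta values come from the slide itself.

```python
# Sketch of the Algebra II raw-to-scale conversion: a linear transform
# SS = a*theta + b anchored so the proficient cut maps to 400 and the
# advanced cut maps to 500. Theta values are those quoted on the slide.

theta_advanced = 2.616     # theta at the raw-score cut of 45 (pass/advanced)
theta_proficient = 0.6416  # theta at the raw-score cut of 30 (pass/proficient)

# Two anchor points determine the slope and intercept.
a = (500 - 400) / (theta_advanced - theta_proficient)
b = 400 - a * theta_proficient

def scale(theta):
    """Convert a theta estimate to a scaled score."""
    return a * theta + b

print(round(a, 2), round(b, 1))  # slope and intercept
```

Any raw score's theta can then be passed through `scale` to report it on the stable 400/500 metric.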

3. From Scaled Scores to Equated Forms

From Scaled Scores to Equated Forms: How Are Scaled Scores Connected to Equating?

- When a test is built, the item difficulties (in the Rasch metric) are known from the field test statistical analyses.
- Tests are built to Rasch difficulty targets for the overall test and all reporting categories, based on the standard setting form.
- Even though an attempt is made to construct test forms of equal Rasch-based difficulty from form to form and year to year, there will be small variations in difficulty.
- When building tests, the IRT model makes it possible to estimate the raw score that corresponds to a scaled score of 400.
- Each core form of a test is equated to the established scale so that scores indicate the same level of achievement regardless of the core form taken.
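A minimal sketch of the Rasch link between theta and an expected raw score (the test characteristic curve), which is what lets the test builders estimate the raw score corresponding to a given scaled-score cut. The item difficulties here are invented for illustration.

```python
import math

# Illustrative Rasch model sketch: the expected number-correct score at a
# given ability theta is the sum of per-item correct-response probabilities.

def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta, difficulties):
    """Expected number-correct score at ability theta (the TCC)."""
    return sum(p_correct(theta, b) for b in difficulties)

item_b = [-1.5, -0.6, 0.0, 0.5, 1.3]  # made-up Rasch difficulties

# Expected raw score for a student at theta = 0.5 on this 5-item form:
print(round(expected_raw_score(0.5, item_b), 2))
```

Inverting this curve (finding the theta, and hence raw score, that lands on a target scaled score) is how a raw cut on a new form is located.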

From Scaled Scores to Equated Forms: The Basics of Equating

Common-Item Nonequivalent Groups Design: the common-item set is constructed as a "mini version" of the total test.

  Year 1 (Test X)          Year 2 (Test Y)
  Item C1 ... Item C10     Item C1 ... Item C10     (common items, appear on both forms)
  Item X1 ... Item X50     Item Y1 ... Item Y50     (unique items)

From Scaled Scores to Equated Forms: The Basics of Equating (continued)

Example: Year 1's form (Test X) is more difficult than Year 2's form (Test Y). Because the common items appear on both forms, the difference in their mean Rasch difficulty (b) estimates the difference in form difficulty:

  Mean b of common items on Test X (Year 1): 0.5
  Mean b of common items on Test Y (Year 2): 0.2
  Difference: 0.5 − 0.2 = 0.3

(Sample Year 2 item difficulties from the slide: Item C1 = −1.3, Item C10 = 0.5, Item Y1 = −1.5, Item Y2 = −0.6, Item Y50 = 1.3.)
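The mean-shift idea above can be sketched as follows. The item difficulty values are invented for illustration (chosen so the common-item means match the slide's 0.5 and 0.2); operational equating involves additional steps, such as screening common items for drift.

```python
# Simplified common-item equating sketch (made-up b values): the shift
# between forms is estimated from the difference in mean Rasch difficulty
# of the common items, then applied to the new form's item difficulties to
# place them on the old form's scale.

common_b_year1 = [0.9, -1.0, 1.1, 0.4, 1.1]   # common items on Test X
common_b_year2 = [0.6, -1.3, 0.8, 0.1, 0.8]   # same items on Test Y

mean_y1 = sum(common_b_year1) / len(common_b_year1)
mean_y2 = sum(common_b_year2) / len(common_b_year2)
shift = mean_y1 - mean_y2   # Year 2's form ran easier by this amount

# Adjust the Year 2 difficulties onto the Year 1 scale.
year2_on_year1_scale = [b + shift for b in common_b_year2]
print(round(shift, 2))
```

Once the Year 2 items sit on the Year 1 scale, the established raw-to-scale conversion carries over, so a 400 on either form indicates the same achievement.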

Recap of How It All Comes Together

Test Is Developed → Cut Scores Are Recommended → Cut Scores Are Adopted by SBOE → Test Is Scaled → Test Is Equated → Scores

Questions?