Comparability Challenges and Solution Approaches for Emerging Assessment and Accountability Options
Brian Gong, Center for Assessment
Presentation in the session "Establishing Comparability of Assessment Results in an Era of Flexibility"
CCSSO National Conference on Student Assessment, June 28, 2017, Austin, TX
Overview
- Description of the context of new calls for comparability
- Three issues, with some possible solution approaches
- Summary
Comparability - Gong
Comparability & interpretation
- The key to comparability is interpretation and use: we want enough comparability to support our intended interpretations and uses.
- The measurement field has deep knowledge about what affects comparability, what types of interpretations can be supported, and what methods may be used to promote and evaluate comparability of scores and interpretations.
- However, new desired uses/contexts and new types of assessments challenge us to consider what we mean by "comparable" and how to support interpretations of comparability with new methods.
"Comparability" – We assume it
- In almost every test interpretation and use today, we assume that test scores are comparable:
  - We aggregate scores.
  - We interpret trends in performance over time.
  - We compare individuals and groups to each other.
  - We produce derivative scores that assume we can operate mathematically on multiple scores (e.g., indices, growth, value-added).
  - We make policy decisions and take practical actions on the basis of these test-score interpretations (e.g., school accountability, teacher evaluation, instructional intervention).
BUT… we are uneasy
- Because we also want many conditions that are not strictly the same (standardized):
  - Different test forms for security
  - Different test forms for efficiency (e.g., CAT)
  - Different test forms for validity (sampling of the domain)
  - Different items, translations/formats, cognitive demand, and administration conditions for validity (accommodations, special populations)
  - Different tests for policy and other reasons (each state; ACT/SAT; NAEP; TIMSS/PISA; AP/IB; Common Core?)
  - Different tests across time
  - Different tests across populations
  - Different tests across time and populations
In addition, we want
- Different content/skills as grades progress
- Individual choice for production, application, and specialization
- Individualized information for diagnosis and program evaluation for individuals, subgroups, and programs
Our dilemma
- We want to act as though test scores were strictly comparable, but
- We also want many conditions that prohibit making the tests and/or testing conditions the same, and in some cases we know the same items are invalid for different individuals.
- So…
  - How can we conceptually understand the dimensions that inform our interpretations and uses?
  - What technical tools and approaches are available to support us in making interpretations that involve "comparability of test scores"?
New options, new flexibility
- Multiple tests that serve roughly the same purpose but share no items, relying on special studies to establish comparability (e.g., a state high school test and college entrance exams)
- Multiple tests that are quite different in purpose and share no items (e.g., a state test and a commercial interim assessment, or another commercial assessment such as the OECD district-level PISA alongside the state test)
- Tests that may allow references from one testing program to another by sharing items (e.g., drawing on openly available item banks with sufficient information to link to scales and/or performance levels)
Why might a state want this type of flexibility?
- Researchers have mapped state proficiency cuts to NAEP and will likely continue to do so, enabling state-to-NAEP and, indirectly, state-to-state comparisons of proficiency.
- A state might want item-level linking because it wants:
  - Comparisons to a scale other than NAEP
  - Comparisons at the scale-score level
  - Control over, and detailed knowledge of, the technical aspects
  - Control over the timing, interpretation, and publicity
- Such a state needs trusted resources to do the linking to an external test, because it cannot develop them on its own.
Comparability Continua
- Content comparability (content basis of test variations), from less to more: same content area → same content standards → same test specs → same test items
- Score comparability (score level), from less to more: pass/fail score or decision → achievement level score → scale score → raw score
Comparability Continua – 2
- Population comparability (population characteristics), from less to more: adjusted in interpretations → adjusted characteristics → similar characteristics → same students
- Reporting-level comparability (level of reporting unit), from less to more: state → district → school → student
Context of interpretation and use
- We can solve some of our problems by better specifying what we mean.
- We don't always need to create comparability at the "more" end of the continuum for content or scores.
  - Example: accountability is social valuing; it may not need comparable test scores from the assessment (e.g., 1%, 2%, ELP, very-low on-grade assessments)
  - Example: a claim about comparable achievement-level performance at the state level
Item-bank linking: a researchable task
- "Extreme linking" (Dorans) is commonly done, with appropriate safeguards and checks.
- Other challenges to traditional linking, notably CAT, have been researched, and acceptable solutions have led to wide use (e.g., parameter invariance over item order, test length, time of administration, etc.).
- Similarly, other item-bank solutions will need to specify under which conditions what types of comparability can be maintained, and show that empirically, but this is an exciting option.
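To make the linking idea concrete: one standard technique from the linking/equating literature is equipercentile linking, which maps a score on test X to the test Y score with the same percentile rank in a common (or equivalent) group. The sketch below is illustrative only, not a method from this presentation; the function name and the simple empirical-quantile approach are my assumptions, and an operational linking study would add smoothing, standard errors, and the safeguards noted above.

```python
import numpy as np

def equipercentile_link(scores_x, scores_y, x_points):
    """Illustrative equipercentile linking (equivalent-groups design).

    scores_x, scores_y: observed score samples on tests X and Y.
    x_points: X-scale scores to convert to the Y scale.
    Returns the Y scores whose percentile ranks match those of x_points.
    """
    scores_x = np.sort(np.asarray(scores_x, dtype=float))
    scores_y = np.asarray(scores_y, dtype=float)
    # Percentile rank of each x_point within the X distribution
    pr = np.searchsorted(scores_x, x_points, side="right") / len(scores_x)
    # Y score at the same percentile rank (empirical quantile of Y)
    return np.quantile(scores_y, np.clip(pr, 0.0, 1.0))
```

A quick sanity check: if test Y is simply test X shifted up by 10 points, the linking function should recover (approximately) that 10-point shift, and converted scores should stay in rank order.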
Summary
- Use the flexibility available to achieve your policy goals and intended uses.
- Specify, specify, specify, so you know what is comparable and what is not, by intention or by constraint. (Where are you on the continua of content comparability, score comparability, population comparability, and reporting-unit comparability?)
- Validity and strict comparability may not go together.
- Use the tools available to support appropriate comparability:
  - Focus on valid interpretations as well as technical demands.
  - Empirically check your results.
Questions? Comments? Thank you!
Brian Gong
bgong@nciea.org