Test co-calibration and equating
Paul K. Crane, MD, MPH
General Internal Medicine, University of Washington
Outline
- Definitions and motivation
- Educational testing literature
- Concurrent administration designs
- Separate administration designs
- PARSCALE coding considerations
- Illustration with the CSI ‘D’ and CASI
- Coming attractions; comments
Definition
- Distinction between “equating” and “co-calibration”: we almost always mean “co-calibration”
- The general idea is to get all tests of a kind onto the same metric
- Error terms will likely differ, but the tests are trying to measure the same thing
5 things needed for “equating”
- The scales measure the same concept
- The scales have the same level of precision
- The procedure converting scale A to scale B is the inverse of the procedure converting scale B to scale A
- The distribution of converted scores should be identical for individuals at a given level
- The equating function should be population invariant
(Linn, 1993; Mislevy, 1992; Dorans, 2000)
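In symbols (a sketch using notation introduced here, not the cited authors’: let e_{B←A} denote the function converting scale A scores to the scale B metric):

```latex
% Symmetry (requirement 3): converting A to B and then back to A
% must return the original score
e_{A \leftarrow B}\bigl(e_{B \leftarrow A}(x)\bigr) = x
% Population invariance (requirement 5): the conversion estimated in
% population P must equal the conversion estimated in population Q
e^{P}_{B \leftarrow A}(x) = e^{Q}_{B \leftarrow A}(x) \quad \text{for all } x
```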
Motivation for co-calibration
- Many tests measure “the same thing”:
  - Cognitive screens: MMSE, 3MS, CASI, CSI ‘D’, Hasegawa, Blessed, …
  - Depression measures: PRIME-MD, CES-D, HAM-D, BDI, SCID, …
- The literature is only interpretable if one is familiar with the nuances of the particular test(s) used
- Studies that employ multiple measures (such as the CHS) face difficulty incorporating all of their data into their analyses
- In sum: co-calibration facilitates both interpretation and analysis
Educational literature
Distinct problems:
- Multiple levels of the same topic, e.g., 4th-grade math, 5th-grade math, etc. (“vertical” equating)
- Multiple forms of the same test, e.g., the dozens of forms of the SAT and GRE used to prevent cheating (“horizontal” equating)
- Making sure item difficulty stays constant from year to year (item drift analyses)
Strategies are the same
- Either have common items administered to different populations, or common people taking different tests
- Analyze one big dataset that contains all items and all people (designs and sketches below)
- Verify that the common elements (people or items) are behaving as expected
Concurrent administration
Common-population design: a single population answers every item on every test.

              Test 1   Test 2   Test 3
  Population    X        X        X
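A minimal Python sketch of this layout (simulated data; item names and test lengths are hypothetical): because one population takes all the tests, the responses are simply placed side by side in a single wide matrix.

```python
# Concurrent (common-population) design: one sample takes all three
# tests, so their item responses sit side by side in one wide matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500  # one common population

# Simulated 0/1 item responses for three tests taken by everyone
test1 = pd.DataFrame(rng.integers(0, 2, (n, 10)),
                     columns=[f"t1_item{i}" for i in range(1, 11)])
test2 = pd.DataFrame(rng.integers(0, 2, (n, 8)),
                     columns=[f"t2_item{i}" for i in range(1, 9)])
test3 = pd.DataFrame(rng.integers(0, 2, (n, 12)),
                     columns=[f"t3_item{i}" for i in range(1, 13)])

# Concatenate columns: it is "as if there is a single longer test,"
# with no structurally missing data.
combined = pd.concat([test1, test2, test3], axis=1)
print(combined.shape)  # (500, 30)
```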
Separate administration
Anchor-test design (e.g., McHorney): every population answers the anchor items; each population’s unique items are structurally missing for the other populations.

           Anchor items   Pop. 1 items   Pop. 2 items   Pop. 3 items
  Pop. 1        X              X           (missing)      (missing)
  Pop. 2        X          (missing)           X          (missing)
  Pop. 3        X          (missing)      (missing)           X
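A minimal Python sketch of the same design (simulated data; names and sizes are hypothetical): stacking the three populations’ records and aligning on item names leaves the block-missing pattern shown above.

```python
# Anchor-test (separate administration) design: each population
# answers the anchor items plus its own unique items; everything
# else is structurally missing.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
anchor = [f"anchor{i}" for i in range(1, 6)]
unique = {p: [f"pop{p}_item{i}" for i in range(1, 9)] for p in (1, 2, 3)}

frames = []
for p in (1, 2, 3):
    cols = anchor + unique[p]  # the items this population actually saw
    frames.append(pd.DataFrame(rng.integers(0, 2, (300, len(cols))),
                               columns=cols))

# Stacking rows and aligning on item names leaves NaN (missing) in
# every cell for items a population never saw -- the block-missing
# pattern on the slide.
stacked = pd.concat(frames, axis=0, ignore_index=True)
print(stacked.shape)  # (900, 29): 5 anchor + 3 x 8 unique items
print(stacked.isna().mean().round(2).head())
```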
Item bank development
Three forms (A, B, C) are linked through pairwise overlapping item blocks:

                 Pop. 1   Pop. 2   Pop. 3
  A unique          X
  A ∩ B items       X        X
  B unique                   X
  B ∩ C items                X        X
  A ∩ C items       X                 X
  C unique                            X
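A minimal Python sketch of this linked design (hypothetical item names): aligning the three forms on shared item names recovers the overlap structure automatically.

```python
# Item-bank design: three forms share pairwise overlapping item
# subsets, and concatenating by item name links the populations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Each form = its unique items plus the pairwise overlaps in the table
forms = {
    "A": ["a1", "a2", "ab1", "ab2", "ac1"],
    "B": ["b1", "b2", "ab1", "ab2", "bc1"],
    "C": ["c1", "c2", "bc1", "ac1"],
}

frames = [pd.DataFrame(rng.integers(0, 2, (200, len(items))), columns=items)
          for items in forms.values()]

# Overlap items (ab*, bc*, ac*) are observed in two populations each;
# unique items are observed in only one.
bank = pd.concat(frames, axis=0, ignore_index=True)
print(sorted(bank.columns))
print(bank.notna().sum())  # overlap items: 400 responses; unique: 200
```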
Comments
- Fairly simple; God is in the details!
- The afternoon workgroup will address the details
- Illustration to follow
PARSCALE code
- For concurrent administration, it is as if there is a single longer test
- For separate administration, the data file is basically a lot of missing data
- Once the data are in the correct format, PARSCALE does the rest (see the sketch below)
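PARSCALE-style programs read fixed-column response records; below is a minimal Python sketch of writing such a file (ours, not PARSCALE control syntax; the ‘9’ not-presented code and the 5-character ID width are assumptions).

```python
# Write the stacked data matrix (e.g., 'stacked' from the anchor-design
# sketch above) as a fixed-width file: one record per person, an ID
# field, then one column per item, with a designated code ('9' here,
# an assumption) for items a person never saw.
import pandas as pd

def write_fixed_width(df: pd.DataFrame, path: str, miss: str = "9") -> None:
    with open(path, "w") as f:
        for i, row in df.iterrows():  # assumes a simple integer index
            resp = "".join(miss if pd.isna(v) else str(int(v)) for v in row)
            f.write(f"{i:05d}{resp}\n")  # 5-char ID, then item columns

# Example usage (after building 'stacked' as in the earlier sketch):
# write_fixed_width(stacked, "cocal.dat")
```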
Illustration: CSI ‘D’ and CASI
[Figure slides: test information curves, standard error of measurement (SEM), and relative information for the co-calibrated CSI ‘D’ and CASI]
Coming attractions
- Optimizing screening tests from a pool of items (on Friday)
- Item banking and computer adaptive testing (the PROMIS initiative)
- Incorporation of DIF assessment (tomorrow)
- Comments and questions