1
Adventures in Equating Land: Facing the Intra-Individual Consistency Index Monster*
*Louis Roussos retains all rights to the title
2
Overview of Equating Designs and Methods
Designs
–Single Group
–Random Groups
–Common Item Nonequivalent Groups (CING)
Methods
–Mean
–Linear
–Equipercentile
–IRT True Score or Observed Score
3
Guidelines for Selecting Common Items for Multiple-Choice (MC) Only Exams
–Representative of the total test (Kolen & Brennan, 2004)
–20% of the total test
–Same item positions
–Similar average/spread of item difficulties (Durans, Kubiak, & Melican, 1997)
–Content representative (Klein & Jarjoura, 1985)
4
Challenges in Equating Mixed-Format Tests (Kolen & Brennan, 2004; Muraki, Hombo, & Lee, 2000)
Constructed Response (CR) scored by raters
Small number of CR tasks
–Inadequate sampling of the construct
–Changes in the construct across forms
Common items
–Content/difficulty balance of common items
–MC-only common items may result in inadequate representation of groups/construct
IRT
–Small number of tasks may result in unstable parameter estimates
–Typically assumes a single dimension underlies both item types
Format effects
5
Current Research
Number of CR items
–Smaller RMSD with larger numbers of items and/or score points (Li & Yin, 2008; Fitzpatrick & Yen, 2001)
–Misclassification (Fitzpatrick & Yen, 2001)
  With fewer than 12 items, more score points resulted in smaller error rates
  With more than 12 items, error rates were less than 10% regardless of the number of score points
Trend scoring (Tate, 1999, 2000; Kim, Walker, & McHale, 2008)
–Rescoring samples of CR items
–Smaller bias and equating error
6
Current Research (cont.)
Format effects (FE)
–MC and CR measure similar constructs (Ercikan et al., 1993; Traub, 1993)
–Males scored higher on MC; females scored higher on CR (DeMars, 1998; Garner & Engelhard, 1999)
–Kim & Kolen (2006): narrow-range tests (e.g., credentialing) vs. wide-range tests (e.g., achievement)
Individual Consistency Index (Tatsuoka & Tatsuoka, 1982)
–Detecting aberrant response patterns
–Not specifically in the context of mixed-format tests
7
Purpose and Research Questions
Purpose: Examine the impact of equating mixed-format tests when student subscores differ across item types. Specifically:
–To what extent does the intra-individual consistency of examinee responses across item formats impact equating results?
–How does the selection of common items differentially impact equating results with varying levels of intra-individual consistency?
8
Data
“Old Form” (OL) treated as “truth”
–Large-scale 6th grade testing program
–Mathematics
–54 point test
  34 multiple choice (MC)
  5 short answer (SA)
  5 constructed response (CR) worth 4 points each
–Approx. 70,000 examinees
“New Form” (NE)
–Exactly the same items as OL
–Samples of examinees from OL
9
2006-07 Scoring Test (39 items)
–OL (old form): all examinees
–NE (new form): samples of 3,000 examinees
–Both OL and NE contain exactly the same items; the only difference between the forms is the examinees
10
Intra-Individual Consistency
–Consistency of student responses across formats
–Regression of dichotomous-item subscores (MC and SA) onto polytomous-item subscores (CR)
–Standardized residuals
  Range from approximately -4.00 to +8.00
  Example: an index of +2.00 means the student's CR subscore is under-predicted by two standard deviations based on the MC subscore
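A minimal sketch of one way to compute such an index, reading the regression in the direction implied by the +2.00 example (CR subscore predicted from the MC/SA subscore). The function name, the ordinary-least-squares fit, and the standardization are illustrative assumptions, not the study's actual procedure.

```python
import numpy as np

def consistency_index(mc_sa_subscore, cr_subscore):
    """Standardized residuals from a simple linear regression of the CR
    subscore on the MC/SA subscore (one value per examinee)."""
    x = np.asarray(mc_sa_subscore, dtype=float)
    y = np.asarray(cr_subscore, dtype=float)
    # Fit y = b0 + b1 * x by ordinary least squares.
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    # Standardize so that, e.g., +2.00 means the CR subscore is roughly two
    # standard deviations above what the MC/SA subscore predicts.
    return residuals / residuals.std(ddof=2)

# Hypothetical grouping into the NEG / MID / POS samples described next:
# idx = consistency_index(mc_sa, cr)
# neg, mid, pos = idx < -1.5, (idx >= -1.5) & (idx <= 1.5), idx > 1.5
```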
11
Samples
–Three groups of examinees based on the intra-individual consistency index
  Below -1.50 (NEG)
  -1.50 to +1.50 (MID)
  Above +1.50 (POS)
–3,000 examinees per sample
–Sampled from each group based on percentages
–Samples selected to have the same quartiles and median as the whole group of examinees
12
Sampling Conditions
60/20/20
–60% sampled from one of the groups (i.e., NEG, MID, POS)
–20% sampled from each of the remaining groups
–Repeated for each of the three groups
40/30/30
–40% sampled from one of the groups, 30% from each of the remaining groups, again repeated for each group
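A minimal sketch of how one of these stratified draws could be implemented, assuming the three index groups are held as arrays of examinee IDs (neg_ids, mid_ids, pos_ids are hypothetical names). The study also matched the quartiles and median of the whole examinee group, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(2007)

def draw_sample(major, minor_a, minor_b, n=3000, weights=(0.6, 0.2, 0.2)):
    """Draw n examinees: weights[0] from the 'major' group and
    weights[1]/weights[2] from the two remaining groups."""
    counts = [int(round(w * n)) for w in weights]
    parts = [rng.choice(group, size=c, replace=False)
             for group, c in zip((major, minor_a, minor_b), counts)]
    return np.concatenate(parts)

# e.g., the "60% NEG" condition:
# sample = draw_sample(neg_ids, mid_ids, pos_ids)
# the 40/30/30 condition would use weights=(0.4, 0.3, 0.3).
```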
13
Common Items
Six sets of common items
–MC only (12 points)
–CR only (12 points)
–MC (4) and CR (8)
–MC (8) and CR (4)
–MC (4), CR (4), and SA (4)
–MC (7), CR (4), and SA (1)
Each set representative of the total test in terms of content, difficulty, and length
14
Equating
–Common-item nonequivalent groups design
–Item parameters calibrated using Parscale 4.1
  3-parameter logistic model (3PL) for MC items
  2PL model for SA items
  Graded Response Model for CR items
–IRT scale transformation: mean/mean, mean/sigma, Stocking-Lord, and Haebara
–IRT true score equating
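For one of the four listed transformation methods, a minimal sketch of the mean/sigma approach using common-item difficulty (b) estimates from the two calibrations: the slope A and intercept B place the new-form parameters on the old-form scale (a* = a / A, b* = A·b + B). Variable names are illustrative; this is not Parscale's or the study's actual code.

```python
import numpy as np

def mean_sigma(b_old, b_new):
    """Return (A, B) mapping new-form parameters onto the old-form scale,
    from the common items' difficulty estimates in each calibration."""
    b_old, b_new = np.asarray(b_old, float), np.asarray(b_new, float)
    A = b_old.std(ddof=1) / b_new.std(ddof=1)
    B = b_old.mean() - A * b_new.mean()
    return A, B

def rescale_items(a_new, b_new, A, B):
    """Apply the transformation to new-form discriminations and difficulties."""
    return np.asarray(a_new, float) / A, A * np.asarray(b_new, float) + B
```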
15
Equating OL and NE
–All items shared in common between OL and NE
–“Truth” established by equating NE to OL using all items as common items
–Study equatings conducted using only a selection of items treated as common
16
Evaluation
–Bias and RMSE
  At each score point
  Averaged over score points
–Classification consistency
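A minimal sketch of these evaluation statistics, assuming each condition yields repeated equatings (one per sample) that are compared with the "truth" equating at each raw-score point. Array names and the replication structure are illustrative assumptions.

```python
import numpy as np

def bias_rmse(equated, truth):
    """equated: (replications x score points) equated scores from the
    repeated samples; truth: criterion equated score at each score point.
    Returns per-score-point bias and RMSE, plus their averages over the
    score scale."""
    diff = np.asarray(equated, float) - np.asarray(truth, float)
    bias = diff.mean(axis=0)                  # signed error at each score point
    rmse = np.sqrt((diff ** 2).mean(axis=0))  # error magnitude at each score point
    return bias, rmse, bias.mean(), rmse.mean()
```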
17
Results: 60% Mid
18
Results: 40% Mid
19
In the extreme…
20
Across the Score Scale: Average Bias
21
Across the Score Scale: Average RMSE
22
Across the Score Scale: Misclassification Rates
23
Classification Consistency: Proficient
24
Discussion
–Different equating results depending on the sampling conditions
–Differences more exaggerated when using common-item sets composed mostly of CR items
–The 60% MID condition was most similar to the full data, with small differences across common-item selections
25
Limitations and Implications
Limitations
–Sampling conditions
–Common-item selections
–Only one equating method
Implications for future research
–Sampling conditions, common-item selections, additional equating methods
–Other content areas and grade levels
–Other testing programs
–Simulation studies
26
Thanks!
Rob Keller
Mike, Louis, Won, Candy, and Jessalyn