Effects of Item Bank Design and Item Selection Methods on Content Balance and Efficiency in Computerized Adaptive Reading Tests with a Mix of Short and Long Passages Aligned to Common Core State Standards
Shudong Wang, Northwest Evaluation Association
Liru Zhang, Delaware Department of Education
Paper presented at the National Conference on Student Assessment (NCSA), June 22-25, 2015, San Diego, California.
I. Introduction
- The Common Core State Standards (CCSS) define a set of consistent learning goals. Any assessment that claims to measure the CCSS has to provide evidence that it actually measures them.
- The Smarter Balanced Assessment Consortium (SBAC) program has adopted a Cognitive Rigor Matrix (CRM). The CRM crosses content standards (sub-contents) with cognitive complexity (Depth-of-Knowledge levels, DOK).
- Assessments aligned to the CCSS (such as SBAC) could capitalize on the efficiency of computerized adaptive testing (CAT). Test efficiency is defined as the mean number of items required to reach a certain reliability level.
- Reading Comprehension (RC), part of the English Language Arts and Literacy domain specified by the CCSS, is one of the mandatory accountability measures. A short passage is usually associated with a single item (a discrete item); a long passage is usually associated with a group of items (a testlet).
- The constraints used to select RC items in CAT operate at two levels:
  - Passage (testlet) level: topic, number of words, complexity measured by Lexile, etc.
  - Individual item level: item type, content standard (sub-content), cognitive complexity (DOK), etc.
- A major issue in using CAT to measure RC is the feasibility of constraints at both levels and their impact on the test. The danger of exhausting the item bank before certain test attributes can be satisfied always exists and can lead to less-than-optimal measurement precision.
The CAT design can vary from adapting at the individual item level, as in classical CAT, to adapting at the level of a group (testlet) of items, as in computer-adaptive multi-stage testing (ca-MST; Luecht & Nungester, 1998; Luecht & Sireci, 2011).
The major purpose of this study is to investigate the effects of the item selection method and of different pool distributions on the content validity and efficiency of a mixed CAT and ca-MST (i-CAT) design.
II. Method
Design of Study: a balanced factorial experimental design with 100 replications.

Table 1. Plan of Design

Independent Variables:
- Distribution of Item Bank (B): 1, 2, 3, 4
- Testlet Selection Method (T): 1 (Yes), 2 (No)
- Individual Item Selection Method (I): 1 (IM), 2 (ICM), 3 (ICDM)

Dependent Variables:
- Bias, SE, and RMSE, each as conditional and overall indices

where T = true theta, N = number of examinees, and R = number of replications.
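The formulas for these indices appeared as images in the original slide and did not survive extraction. A plausible reconstruction from the slide's variable key, using the standard simulation-study definitions (hatted thetas denote ability estimates), is:

\[
\mathrm{Bias} = \frac{1}{NR}\sum_{i=1}^{N}\sum_{r=1}^{R}\bigl(\hat{\theta}_{ir} - T_i\bigr)
\]
\[
\mathrm{SE} = \sqrt{\frac{1}{NR}\sum_{i=1}^{N}\sum_{r=1}^{R}\bigl(\hat{\theta}_{ir} - \bar{\theta}_i\bigr)^2},
\qquad \bar{\theta}_i = \frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_{ir}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{NR}\sum_{i=1}^{N}\sum_{r=1}^{R}\bigl(\hat{\theta}_{ir} - T_i\bigr)^2}
\]

The conditional indices use the same expressions but aggregate over replications at each fixed true-theta point instead of over all examinees.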
IRT Model: Rasch (the testlet effect was ignored).

Item Banks and Test:
- There are 2,380 items in total in each of the 4 banks: 900 discrete items and 200 passages (testlets), with 5 to 10 items per testlet.
- Table 2. Characteristics of Item Bank and Targeted Test Property. [Table not recoverable from this extraction; its rows gave, for each item bank and the targeted test, the % of items per DOK level (1-4), the % per content area, and rb-DOK.]

Ability Estimation Method:
- Owen Bayesian estimation (OBE; Divgi, 1986) provides the provisional ability estimates used to select items.
- Maximum likelihood estimation (MLE) provides the final scores.

Method for Selecting the First Item:
- The first item's difficulty is 0.5 logit below the examinee's true ability, which is drawn from N(0, 1).
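A minimal Python sketch (not the authors' code) of the Rasch response function, its item information, and the first-item rule described above; the 2,380 bank difficulties are hypothetical standard-normal draws:

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def rasch_info(theta, b):
    """Fisher information of a Rasch item: I = p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def select_first_item(true_theta, bank_difficulties):
    """Pick the item whose difficulty is closest to 0.5 logit below
    the examinee's true ability, per the rule on this slide."""
    target_b = true_theta - 0.5
    return int(np.argmin(np.abs(bank_difficulties - target_b)))

rng = np.random.default_rng(2015)
theta = rng.normal(0.0, 1.0)               # true ability from N(0, 1)
bank_b = rng.normal(0.0, 1.0, size=2380)   # hypothetical difficulty pool
first = select_first_item(theta, bank_b)
print(theta, bank_b[first], rasch_info(theta, bank_b[first]))
```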
Item Selection Methods

Testlet Level (Testlet Selection Method):
1. |theta estimate - median of testlet item difficulties| < 0.3 or 0.5 logit (Yes)
2. |theta estimate - median of testlet item difficulties| < 5.0 logits (No; effectively unconstrained)

Item Level (Individual Item Selection Method):
- Item information only (IM)
- IM + sub-content balance (ICM)
- ICM + DOK balance (ICDM)

Each method (a combination of testlet and individual item selection) uses the same three-tier procedure (see the sketch after this list):
- Tier 1: Select a group of items (e.g., 5 to 15) from the item bank that meet the level-one criteria (Inf >= [threshold left blank in the source] / sub-content / DOK). If no item can be found, go to Tier 2.
- Tier 2: Select a group of items from the item bank that meet the level-two criteria (Inf >= 0.216 / sub-content / DOK). If no item can be found, go to Tier 3.
- Tier 3: Select the items from the item bank with the best (sorted) values on the Tier 2 criteria.
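A minimal sketch of the three-tier fallback in Python. The Tier 1 information threshold is blank in the source, so TIER1_INFO is a placeholder, and the sub-content/DOK balance checks are reduced to a caller-supplied predicate:

```python
import numpy as np

TIER1_INFO = 0.24   # placeholder: the Tier 1 threshold is blank in the source
TIER2_INFO = 0.216  # Tier 2 threshold given on the slide

def rasch_info(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-(theta - np.asarray(b))))
    return p * (1.0 - p)

def select_item(theta_hat, bank_b, available, meets_constraints, rng=None):
    """Three-tier fallback: strict information + constraints (Tier 1),
    relaxed information + constraints (Tier 2), then best available
    by sorting on the Tier 2 criteria (Tier 3)."""
    rng = rng or np.random.default_rng()
    info = rasch_info(theta_hat, bank_b)
    for threshold in (TIER1_INFO, TIER2_INFO):
        pool = [i for i in available
                if info[i] >= threshold and meets_constraints(i)]
        if pool:
            # administer one item from the qualifying group
            # (the study draws groups of roughly 5 to 15 items)
            return int(rng.choice(pool))
    # Tier 3: nothing satisfied the thresholds; prefer items that at
    # least meet the balance constraints, then take the most informative
    return max(available, key=lambda i: (meets_constraints(i), info[i]))
```

For IM the predicate would always return True; for ICM and ICDM it would check the running sub-content (and DOK) counts against the targeted test blueprint.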
III. Results and Discussion
1. Overall Accuracy
Among the three independent variables (item bank, testlet selection, and item selection), only the testlet selection and item selection methods have statistically significant impacts on bias, SE, and RMSE.
2. Conditional Accuracy (Item Bank 1 only)
3. Sub-Content (Item Bank 2 only)
4. DOK (Item Banks 1 and 3 only)
5. Item Exposure Rates at Eight True Theta Points
[Exposure-rate plots for the conditions B=1, M=1, T=1 and B=1, M=1, T=2; figures not recoverable from this extraction.]
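Item exposure rate is conventionally the proportion of simulated examinees who are administered an item. A minimal sketch of how such rates would be computed from simulation records (the administered matrix and function name are illustrative, not the authors' code):

```python
import numpy as np

def exposure_rates(administered):
    """administered: (n_examinees, n_items) 0/1 matrix recording whether
    each simulated examinee saw each item. Returns, per item, the
    exposure rate = number of administrations / number of examinees."""
    return np.asarray(administered, dtype=float).mean(axis=0)

# Toy example: 3 examinees, 4 items
seen = [[1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0]]
print(exposure_rates(seen))  # -> [1.0, 0.3333, 0.3333, 0.0]
```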
IV. Conclusions
- In general, regardless of the distribution of the item bank:
  - Individual item selection has a statistically significant impact on test accuracy. The IM method has the highest accuracy compared with ICM and ICDM.
  - Testlet selection has a statistically significant impact on test accuracy. Not using the testlet constraint decreases the accuracy of the test.
- In general, both the distribution of the item bank and the individual item selection method directly affect the content balance of the test. Across the different item banks, tests whose attributes are controlled by the item selection method have better content quality: ICM improves sub-content quality, and ICDM improves DOK quality in the test.
- For an item bank of limited size, the number of item selection criteria should be kept to a minimum.
- The exposure rate of testlet items is more difficult to control than that of discrete items. Mixing discrete and testlet items produces reasonable results.
Thank you! For any questions: Shudong.wang@NWEA.org