A comparison of exposure control procedures in CATs using the 3PL model.

Audrey Leroux, Myriam Lopez, Ian Hembry, Barbara Dodd

The purpose of the study
- Compare the progressive-restricted standard error, randomesque, and Sympson-Hetter procedures with no exposure control in computerized adaptive testing.
- Manipulated conditions: item pool size, stopping rule.
- Criteria: bias, RMSE, item utility, item overlap.

Item exposure control
- The application of computerized adaptive tests and computer-based tests has increased.
- Aim: reduce item exposure rates and increase item pool usage.
- Exposure control refers to constraining the administration of the more popular items, which would otherwise become compromised through repeated administration (Georgiadou, Triantafillou, & Economides, 2007).
- It guarantees more variety in the items the examinees receive.

Variables related to the control of item exposure
- Precision of measurement: the degree to which the CAT system with exposure controls recovers examinees’ known abilities.
- Exposure rate: the ratio of the number of times an item is administered to the total number of CATs administered.
- Pool utilization: the percentage of items not administered in any of the CAT administrations.
- Test overlap: the number of items in common among examinees.
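The three exposure-related quantities above can be computed from a record of which items each simulated examinee received. The following is a minimal sketch, assuming a 0/1 administration matrix; the function names and the matrix layout are illustrative and do not come from the study.

```python
import numpy as np

def exposure_rates(admin):
    """admin: (n_examinees, n_items) 0/1 matrix; 1 means the item was administered.
    Returns, per item, administrations divided by the number of CATs."""
    return np.asarray(admin).mean(axis=0)

def non_administration_rate(admin):
    """Proportion of pool items never administered in any CAT."""
    return float((np.asarray(admin).sum(axis=0) == 0).mean())

def mean_test_overlap(admin):
    """Average number of items shared by a pair of examinees."""
    admin = np.asarray(admin)
    n = admin.shape[0]
    shared = admin @ admin.T                  # shared[i, j] = items common to i and j
    return float(shared[np.triu_indices(n, k=1)].mean())
```

Averaging over examinee pairs is one common way to summarize overlap; the slides do not say which summary the authors used.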

Randomization strategies
- Randomly select the item to administer from a group of several items near the optimum under the maximum information strategy (McBride & Martin, 1983).
- Randomesque strategy (Kingsbury & Zara, 1989): throughout testing, repeatedly selects the same number of the most informative items, from which one is randomly chosen for administration.
- Decreases the overlap in items seen by examinees of similar ability.
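The randomesque step described above can be sketched as follows. This is an illustration only: the 3PL information function uses a scaling constant D = 1.7, which is an assumption, and the names `info_3pl` and `randomesque_select` are hypothetical.

```python
import numpy as np

def info_3pl(theta, a, b, c, D=1.7):
    """Fisher information of 3PL items at ability theta (D = 1.7 is assumed)."""
    p = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def randomesque_select(theta, a, b, c, available, group_size=5, rng=None):
    """Pick one item at random among the group_size most informative available items."""
    rng = np.random.default_rng() if rng is None else rng
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    available = np.asarray(available)
    info = info_3pl(theta, a[available], b[available], c[available])
    top = available[np.argsort(info)[::-1][:group_size]]
    return int(rng.choice(top))
```

With `group_size=1` the function reduces to plain maximum information selection, which corresponds to the no-exposure-control condition.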

Conditional strategies
- Specify a desired maximum exposure rate and use exposure control parameters to decide whether or not a selected item can be administered.
- The values of these parameters have to be set through an iterative adjustment process: at each step, the effects of the previous adjustments are estimated using computer simulations of adaptive test administrations.

Sympson-Hetter (SH)
- Let $t$ denote the iteration step; $P^{(t)}(A_i \mid S_i)$, the value of the control parameter for item $i$ at step $t$; and $P^{(t)}(S_i)$ and $P^{(t)}(A_i)$, the probabilities of selecting and administering item $i$ at step $t$.
- Once the simulation at step $t$ is completed, $P^{(t)}(S_i)$ and $P^{(t)}(A_i)$ are estimated; for items whose estimate of $P^{(t)}(A_i)$ does not meet the requirement, the values of the control parameters are adjusted.
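The slide does not spell out the adjustment itself. A minimal sketch, assuming the commonly cited Sympson-Hetter update (set the control parameter to r_max / P(S_i) when the selection probability exceeds the target rate r_max, and to 1 otherwise), could look like this; `sh_adjust` and `r_max` are illustrative names.

```python
import numpy as np

def sh_adjust(p_select, r_max):
    """One iteration of the assumed update: new P(A_i | S_i) from the
    estimated selection probabilities P(S_i) and the target rate r_max."""
    p_select = np.asarray(p_select, dtype=float)
    # The floor in the denominator only avoids division by zero for never-selected items.
    return np.where(p_select > r_max, r_max / np.maximum(p_select, 1e-12), 1.0)
```

During a CAT, an item that has been selected would then be administered with probability equal to its control parameter, and the simulate-and-adjust cycle repeats until the administration rates stay at or below the target.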

Stratified strategies
- Under maximum information selection, items with large a values are more likely to be selected than those with small a values.
- These strategies stratify the item pool and constrain items to be administered from a given stratum.
- a-stratified procedure (Chang & Ying, 1999): group items with similar a values and select within one group at each stage.
- a-stratified with b-blocking (Chang, Qian, & Ying, 2001): group items into M blocks in ascending order of b-parameter values, then stratify each of the M blocks into K strata according to their a parameters.
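The two stratification schemes above can be sketched in a few lines. This is an illustration under assumptions: strata are made as equal in size as possible, and for b-blocking the k-th strata of all blocks are pooled into the k-th stratum of the whole pool. Function names are hypothetical.

```python
import numpy as np

def a_stratify(a, n_strata):
    """Split item indices into strata of ascending a (discrimination)."""
    order = np.argsort(a)
    return [part.tolist() for part in np.array_split(order, n_strata)]

def a_stratify_b_blocking(a, b, n_blocks, n_strata):
    """Form M blocks by ascending b, split each block into K strata by a,
    then pool the k-th stratum of every block (pooling rule is assumed)."""
    a = np.asarray(a)
    blocks = np.array_split(np.argsort(b), n_blocks)
    strata = [[] for _ in range(n_strata)]
    for block in blocks:
        by_a = block[np.argsort(a[block])]
        for k, part in enumerate(np.array_split(by_a, n_strata)):
            strata[k].extend(part.tolist())
    return strata
```

Early test stages would draw from the low-a strata and later stages from the high-a strata, so highly discriminating items are held back until the ability estimate has stabilized.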

Combined strategies
- Two or more procedures are combined.
- Progressive-restricted procedure (PR) (Revuelta & Ponsoda, 1998)
  - Restricted: decide the maximum exposure rate, (100k)%.
  - Progressive: s = h/m.
- Progressive-restricted standard error procedure (PR-SE)
  - s: the ratio of the stopping rule SE to the current SE.
  - Dichotomous 3PL model: administers fewer items than PR, but yields similar correlations between estimated and known theta and item overlap as low as PR.
  - Polytomous partial credit model: uses most of the item pool.
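How s enters item selection is not shown on the slide. The sketch below assumes the usual progressive weighting, in which each eligible item gets a weight (1 - s) * R_i + s * I_i, with R_i drawn uniformly between 0 and the largest current information value; it also assumes h is the number of items already administered, m is the test length, and s is capped at 1 in the SE version. All names are illustrative.

```python
import numpy as np

def eligible_items(exposure, k):
    """Restricted component: keep items whose exposure rate is still below k."""
    return np.flatnonzero(np.asarray(exposure) < k)

def s_fixed_length(h, m):
    """PR: s = h / m (assumed: items administered so far over test length)."""
    return h / m

def s_standard_error(target_se, current_se):
    """PR-SE: ratio of stopping rule SE to current SE (cap at 1 is assumed)."""
    return min(1.0, target_se / current_se)

def progressive_weights(info, s, rng=None):
    """Blend a random component with item information; select the argmax."""
    rng = np.random.default_rng() if rng is None else rng
    info = np.asarray(info, dtype=float)
    random_part = rng.uniform(0.0, info.max(), size=info.shape)
    return (1.0 - s) * random_part + s * info
```

Early in the test s is small, so selection is nearly random; as s approaches 1, selection approaches maximum information.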

Method
Study design
- Four exposure control methods:
  - Progressive-restricted standard error procedure (exposure rate = .3)
  - Sympson-Hetter (exposure rate = .3)
  - Randomesque (five items)
  - No exposure control
- Dichotomous 3PL model
- Two item pool sizes (300 and 540 items)
- Two stopping rules:
  - Fixed length with 50 items
  - Variable length: stop at an SE of .3 or at a maximum of 50 items

Item pools
- Nine different test forms, each containing 60 items, so the large item pool has 540 items.
- Five of the nine forms were randomly selected for the small item pool.
- Each test form covered 6 content areas: 24%, 16%, 15%, 15%, 23%, and 7%.
Data generation
- SAS macro IRTGEN
- 200 datasets

CAT simulations
- Content balancing (Kingsbury & Zara, 1989): each next item is chosen from the content group with the largest difference between its desired and current percentage.
- EAP for ability estimation.
- Criteria: bias, RMSE, item exposure rates, pool utilization, and item overlap across test administrations.
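The content balancing rule above amounts to a small bookkeeping step before each item selection. A minimal sketch, with hypothetical names and with ties broken by taking the first area:

```python
def next_content_area(targets, counts):
    """targets: desired proportion per content area.
    counts: number of items administered so far per content area.
    Returns the index of the area furthest below its target."""
    total = sum(counts)
    gaps = []
    for target, count in zip(targets, counts):
        current = count / total if total else 0.0
        gaps.append(target - current)
    return max(range(len(gaps)), key=gaps.__getitem__)
```

With the six content percentages reported for the test forms, the first item would come from the 24% area and, once that area has one item, the second would come from the 23% area.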

Results

Because EAP was used for ability estimation, the theta estimates are shrunken toward the prior mean, so negative thetas are positively biased.
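The shrinkage can be seen in a small EAP computation. This is a sketch under assumptions that are not stated in the slides: a standard normal prior, an evenly spaced quadrature grid on [-4, 4], and D = 1.7; the function name is hypothetical.

```python
import numpy as np

def eap_3pl(responses, a, b, c, D=1.7, n_points=81):
    """Posterior mean of theta under a N(0, 1) prior (assumed) for 3PL items."""
    responses, a, b, c = (np.asarray(x, dtype=float) for x in (responses, a, b, c))
    nodes = np.linspace(-4.0, 4.0, n_points)
    prior = np.exp(-0.5 * nodes ** 2)
    # p[q, i] = probability of a correct response to item i at node q
    p = c + (1 - c) / (1 + np.exp(-D * a * (nodes[:, None] - b)))
    likelihood = np.prod(np.where(responses == 1, p, 1 - p), axis=1)
    posterior = prior * likelihood
    return float((nodes * posterior).sum() / posterior.sum())
```

For an all-incorrect response pattern the maximum likelihood estimate is unbounded below, whereas the EAP estimate stays finite because the prior pulls it toward 0. That pull is what produces positive bias for examinees with negative theta.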

Discussion
- The PR-SE method yielded good precision, low non-administration rates, and a reduction in item overlap.
- Suggestions for use:
  - Randomesque: minimal exposure control with high precision and test efficiency
  - SH: more balance between precision and item exposure control
  - PR-SE: higher item exposure control

Comments
- Stratified strategies have been used in CAT. Why did the authors state that these methods are seldom used?
- By definition, bias is the difference between the expected estimate and the parameter, and RMSE is the root of the averaged squared difference between the estimate and the parameter. Therefore, Equation 2 should be called the averaged residual, and Equation 3 the root of the squared averaged residual.
- Barrada, Olea, Ponsoda, & Abad (2008) incorporated a quadratic form to obtain a nonlinear function. The purpose of a nonlinear s is to control the contribution of the random component and to make the information component more flexible.
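The definitions of bias and RMSE given in the comments can be written out for one examinee with a known theta and a set of replicated estimates. This is a sketch of those textbook definitions, not of the paper's Equations 2 and 3, which are not reproduced in the slides.

```python
import numpy as np

def bias(estimates, true_theta):
    """Mean estimate over replications minus the true parameter."""
    return float(np.mean(np.asarray(estimates, dtype=float) - true_theta))

def rmse(estimates, true_theta):
    """Root of the mean squared difference between estimate and parameter."""
    diff = np.asarray(estimates, dtype=float) - true_theta
    return float(np.sqrt(np.mean(diff ** 2)))
```

For example, estimates of 0.5 and 1.5 for a true theta of 1.0 give a bias of 0 but an RMSE of 0.5, which shows why the two criteria are reported separately.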

Future studies
- Investigate the impact of different ability distributions.
- Compare with other exposure control procedures.
- Combine the PR procedure with different stopping rules.
- The PR-SE method should be compared with contemporary ones such as the on-the-fly SH method and the omega method proposed by Chen (2011).
- It is possible to incorporate a nonlinear s into the PR-SE procedure. The s parameter in the PR-SE method is defined as the ratio of the stopping rule SE to the current SE when administering an item.

Another possible idea is to apply the PR and PR-SE methods to CDM-CAT.