A Comparison of Progressive Item Selection Procedures for Computerized Adaptive Tests Brian Bontempo, Mountain Measurement Gage Kingsbury, NWEA Anthony.

Presentation transcript:

A Comparison of Progressive Item Selection Procedures for Computerized Adaptive Tests
Brian Bontempo, Mountain Measurement
Gage Kingsbury, NWEA
Anthony Zara, Pearson VUE

Soap Box
Problems with Item Exposure Control Mechanism research to date:
–Focus has been on the frequency of exposure, not the duration of time in the field (fresh items vs. stale items)
–Not enough empirical research linking exposure to parameter drift
–Focus has been on overexposure, with too little attention to underexposure (of high-quality items)
These procedures have been referred to as Item Exposure Control Mechanisms rather than Item Selection Algorithms

Issues with Maximum Information CAT
Item Overexposure & Underexposure
Sparse Data Matrix
–Narrow ability distribution around each operational item
P-Values approach target probability
Item-Total Point-Biserial Correlation Coefficients have restriction of range issues
DIF: no examinees around true difficulty so estimation is off
Parameter drift: no examinees around true difficulty so estimation is off
Item Overlap between adjacent tests

Item Selection Algorithms
Kingsbury, G.G. & Zara, A.R. (1991)
–The items with the most information form a fixed-size "pond". From there, a single item is selected at random.
Revuelta, J. & Ponsoda, V. (1998)
–Items are selected completely at random at the beginning of the test and entirely based on maximum information at the end: $w = (1 - s) R_i + s I$
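As a rough illustration of the Kingsbury & Zara procedure described above, the following Python sketch picks one item at random from the most informative unused items. It assumes the Rasch model (the model used in the simulation study); the names `rasch_info`, `randomesque_select`, and `pond_size` are hypothetical and do not come from the presentation.

```python
import math
import random

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def randomesque_select(theta, difficulties, administered, pond_size, rng):
    """Pick one item at random from the pond_size most informative unused items."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    # Rank unused items from most to least informative at the current ability.
    candidates.sort(key=lambda i: rasch_info(theta, difficulties[i]), reverse=True)
    return rng.choice(candidates[:pond_size])

# Hypothetical five-item pool; item indices are positions in the list.
pool = [-2.0, -1.0, 0.0, 1.0, 2.0]
chosen = randomesque_select(0.0, pool, administered=set(), pond_size=3,
                            rng=random.Random(1))
```

With a pond of one item this reduces to plain maximum-information selection, which is why the pond size controls the trade-off between precision and exposure.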

Item Selection Algorithms
Kingsbury, G.G. & Zara, A.R. (1991)
–Succeeded in reducing exposure and overlap
–Did not widen the variance of the ability of candidates taking each item
Revuelta, J. & Ponsoda, V. (1998)
–Succeeded in reducing exposure
–Succeeded in widening the variance of the ability of candidates taking each item
–Major problems with overlap between adjacent tests

Hybrid Randomesque Progressive Item Selection Algorithms
Improve pool utilization
Improve the usefulness of p-value, pt-bis, DIF, and drift
Reduce overlap

Hybrid Randomesque Progressive Item Selection Algorithms
Progressive Random to Targeted using Information
–Select one item at random from the pond of items with the greatest weights ($w$)
$w = (1 - s) R_i + s I$
$s$ = Serial position (sequence number) / test length
$R$ = Random component
$I$ = Test Information
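The weighting above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the Rasch model supplies the information term, and the random component is drawn uniformly between 0 and the largest information value among the unused items. The slide does not specify the distribution of the random component, so that choice, like the function names, is hypothetical.

```python
import math
import random

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def progressive_info_select(theta, difficulties, administered, position,
                            test_length, pond_size, rng):
    """Pick one item at random from the pond_size items with the greatest w."""
    s = position / test_length  # 0 at the start of the test, 1 at the end
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    info = {i: rasch_info(theta, difficulties[i]) for i in candidates}
    max_info = max(info.values())
    # w = (1 - s) * R_i + s * I; R_i ~ Uniform(0, max_info) is an assumption.
    weights = {i: (1.0 - s) * rng.uniform(0.0, max_info) + s * info[i]
               for i in candidates}
    pond = sorted(candidates, key=weights.get, reverse=True)[:pond_size]
    return rng.choice(pond)
```

Early in the test the random term dominates, so the pond is close to a random draw from the pool; by the last item the weights equal the information values and the pond matches the randomesque pond.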

Hybrid Randomesque Progressive Item Selection Algorithms
Progressive Random to Targeted with a fixed probability of correct response
–Select one item at random from the pond of items with the greatest weights ($w$)
$w = (1 - s) / R_i + s / |P_{ij} - P_{\text{target}}|$
$s$ = Serial position (sequence number) / test length
$R$ = Random component
$P_{ij}$ = Probability of Correct Response
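A possible Python rendering of this weight is shown below. It is a sketch, not the authors' implementation: it assumes the Rasch model for the probability of a correct response, draws the random component uniformly from (0, 1], and floors the distance to the target probability at a small `eps` to avoid division by zero. All three choices and all names are assumptions made for illustration.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fixed_probability_select(theta, difficulties, administered, position,
                             test_length, pond_size, p_target, rng, eps=1e-9):
    """Pick one item at random from the pond_size items with the greatest w."""
    s = position / test_length
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    weights = {}
    for i in candidates:
        r = 1.0 - rng.random()  # assumed R_i in (0, 1], never zero
        gap = max(abs(rasch_prob(theta, difficulties[i]) - p_target), eps)
        # w = (1 - s) / R_i + s / |P_ij - P_target|
        weights[i] = (1.0 - s) / r + s / gap
    pond = sorted(candidates, key=weights.get, reverse=True)[:pond_size]
    return rng.choice(pond)
```

At the end of the test only the second term remains, so the items whose success probability is closest to the target receive the largest weights.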

Hybrid Randomesque Progressive Item Selection Algorithms
Progressive Random to Targeted with a linear shrinking pond size
–Select one item at random from the pond of items that are best targeted or yield the highest information
Pond size $= N_{\text{pool}} - s (N_{\text{pool}} / N_{\text{test}}) + c$
$s$ = Serial position (sequence number) / test length
$N_{\text{pool}}$ = Number of Items in Item Pool
$N_{\text{test}}$ = Number of Items in the Test
$c$ = constant
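The shrinking pond can be sketched as below. One interpretive assumption is flagged loudly: for the pond to shrink linearly from the whole pool down to roughly $c$ items, $s$ has to be the raw sequence number (0 to the test length) in this formula, so the sketch passes the item position directly. Truncating to an integer, clamping the pond to at least one item, and treating "best targeted" as the smallest distance between Rasch difficulty and ability are further assumptions. Function names are hypothetical.

```python
import random

def pond_size(s, n_pool, n_test, c):
    """Pond size = N_pool - s * (N_pool / N_test) + c, clamped to [1, N_pool]."""
    raw = n_pool - s * (n_pool / n_test) + c
    return max(1, min(n_pool, int(raw)))

def shrinking_pond_select(theta, difficulties, administered, position,
                          n_test, c, rng):
    """Pick one item at random from a pond that shrinks as the test proceeds."""
    n_pool = len(difficulties)
    size = pond_size(position, n_pool, n_test, c)  # s = sequence number (assumed)
    candidates = [i for i in range(n_pool) if i not in administered]
    # Under the Rasch model the best targeted items have difficulty nearest theta.
    candidates.sort(key=lambda i: abs(difficulties[i] - theta))
    return rng.choice(candidates[:size])
```

For a 1,000-item pool and a 25-item test, the pond under these assumptions starts at the full pool and loses 40 items per position until only the single best targeted item remains.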

Hybrid Randomesque Progressive Item Selection Algorithms
Progressive Random to Targeted using SEM
–Select one item at random from the items that are within the probability range derived from the confidence interval around the ability estimate
$P_i(\theta_{\text{low}}) < P_i(\theta) < P_i(\theta_{\text{high}})$
$P_i(\theta_{\text{low}})$ = Calculate the item parameters for a perfectly targeted item using the ability estimate at the low end of the confidence interval. Then calculate the probability of a correct response to this item using the ability estimate.
$P_i(\theta_{\text{high}})$ = Calculate the item parameters for a perfectly targeted item using the ability estimate at the high end of the confidence interval. Then calculate the probability of a correct response to this item using the ability estimate.
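One way to read the SEM-based rule is sketched below in Python. The sketch assumes the Rasch model, where a perfectly targeted item has difficulty equal to the ability it targets, and it assumes the confidence interval is the ability estimate plus or minus `z` standard errors. Defaulting `z` to the 1.36 quoted for this algorithm in the simulation is a guess at what that number means. The bounds are ordered with `min`/`max` so the code does not depend on which end yields the larger probability, and the fallback to the nearest item when the band is empty is also an assumption.

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sem_band_select(theta_hat, sem, difficulties, administered, rng, z=1.36):
    """Pick one item at random from those inside the SEM-derived probability band."""
    theta_low, theta_high = theta_hat - z * sem, theta_hat + z * sem
    # Perfectly targeted Rasch items at each end of the interval (assumed b = theta).
    p_a = rasch_prob(theta_hat, theta_low)
    p_b = rasch_prob(theta_hat, theta_high)
    lo, hi = min(p_a, p_b), max(p_a, p_b)
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    pond = [i for i in candidates
            if lo < rasch_prob(theta_hat, difficulties[i]) < hi]
    if not pond:
        # Assumed fallback: take the unused item nearest the ability estimate.
        pond = [min(candidates, key=lambda i: abs(difficulties[i] - theta_hat))]
    return rng.choice(pond)
```

Because the standard error falls as items accumulate, the band narrows on its own during the test, which gives the random-to-targeted progression without an explicit position term.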

Simulation Study

Algorithms Tested
Maximum information
Kingsbury & Zara
Progressive
Progressive Random to Targeted using Information (pond size = 10)
Progressive Random to Targeted with a fixed probability of correct response (pond size = 10)
Progressive Random to Targeted with varying pond size (c = length of test / item pool size)
Progressive Random to Targeted using SEM (1.36)

Simulation Design
Item pool: 1,000 actual item parameter estimates (1PL/Rasch)
Test design: 3 different fixed test lengths
–25 items
–50 items
–100 items
Test takers: A sample of 10,000 test takers was drawn randomly from the initial sample of test takers. For each simulation, the ability estimate from the actual test was used as the true trait level.
21 simulations per test taker (3 test lengths × 7 item selection algorithms)

Evaluation Criteria
Impact on test precision
Impact on the variance in the ability distribution for each item
Impact on item exposure and usage
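Two of these criteria can be computed directly from a log of simulated administrations. The Python sketch below is an illustration only: it assumes each record is an (item, ability) pair, defines the exposure rate as administrations divided by the number of test takers, and uses the population variance. The presentation does not state its exact formulas, so these definitions and the function name are assumptions.

```python
from collections import defaultdict
from statistics import pvariance

def exposure_and_ability_variance(administrations, n_test_takers):
    """Summarize per-item exposure rate and variance of examinee ability.

    administrations: iterable of (item_id, theta) pairs, one per item seen.
    """
    thetas = defaultdict(list)
    for item_id, theta in administrations:
        thetas[item_id].append(theta)
    # Exposure rate: share of test takers who saw the item (assumed definition).
    exposure = {item: len(v) / n_test_takers for item, v in thetas.items()}
    # Spread of ability among the examinees who saw each item.
    variance = {item: pvariance(v) for item, v in thetas.items()}
    return exposure, variance

# Hypothetical log: item "a" seen by two examinees, item "b" by one.
log = [("a", -1.0), ("a", 1.0), ("b", 0.0)]
exposure, variance = exposure_and_ability_variance(log, n_test_takers=2)
```

Items that never appear in the log are absent from both dictionaries, so counting unused items requires comparing the keys against the full pool.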

Results

Precision

Exposure

Variance in Ability Estimate

P-Value

Item-Total Point-Biserial

Summary
Quality CAT design should focus on effective Item Selection Algorithms, not Item Exposure Control Mechanisms.
We can evaluate Item Selection Algorithms based on efficiency, pool utilization, and the distribution of the variance in the ability estimates around the items.
Four Hybrid Progressive Randomesque item selection algorithms were defined.
The Progressive Random to Targeted using Test Information algorithm proved successful.

Future Research
The algorithms need to be tweaked.
The algorithms need to be tested on longer tests.
The overlap between adjacent tests needs to be assessed.
The study needs to include a purely random item selection algorithm as a benchmark.

Thank You for Listening!
For a copy of the paper, contact: Brian Bontempo, Ph.D.