+ A New Stopping Rule for Computerized Adaptive Testing

+ Objective The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared with that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.

+ Theoretical Framework
- Fixed-length stopping rule
  - Simple to implement
  - Limits efficiency: items may be administered unnecessarily
- Variable-length stopping rules
  - The standard error (SE) stopping rule
    - Yields equivalent measurement precision across the examinee trait continuum
    - Limits measurement precision even when informative items are still available
  - The minimum information stopping rule
    - Prevents the administration of items that contribute little information
    - Delivers less accurate measurement precision than the minimum SE stopping rule

+ Theoretical Framework The difficult balance between measurement precision and testing efficiency can subject existing stopping rules to two potential problems, which are of interest in the current study:
- Meaningful gains in measurement precision may be both desirable and possible with the administration of only a few additional items (fixed-length and minimum SE rules).
- Items may be unnecessarily administered to examinees for whom more precise trait estimates are unlikely:
  - poor match between pool information and the trait distribution (fixed-length and minimum SE rules);
  - good match between pool information and the trait distribution (minimum information rule).

+ Theoretical Framework

+ On medical and psychiatric CAT assessments, there is strong motivation to reduce testing burden while maintaining high measurement precision for examinees whose trait levels fall within the targeted range. The new stopping rule, called the predicted standard error reduction (PSER) rule, seeks to balance the dual concerns of measurement precision and testing efficiency by considering the predicted change in measurement precision that would result from the administration of additional items. The purpose of this research is to explore the properties of the PSER and to assess its performance, in terms of the number of items administered and the accuracy of theta estimation, in comparison with existing stopping rules.

+ Method Stopping rules compared:
- The PSER
  - Hyper threshold (H+): if the predicted reduction ΔSE >= 0.03, continue even though the SE target is met
  - Hypo threshold (H−): if ΔSE < 0.01, stop even though the SE target is not met
- The minimum SE stopping rule
- The modified minimum information stopping rule (combined with the minimum SE)
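The threshold logic above can be sketched as a small decision function. This is a minimal illustration, not the authors' implementation: the function name and the fallback to the ordinary minimum-SE criterion in the middle band are my assumptions, while the threshold values (0.03 hyper, 0.01 hypo) and the SE target (0.3) come from the slides.

```python
def pser_should_stop(current_se, predicted_se, se_target=0.3,
                     hyper=0.03, hypo=0.01):
    """Return True if the CAT should stop after the current item.

    predicted_se is the standard error expected after administering the
    best remaining item (derived from the predictive posterior variance).
    """
    delta_se = current_se - predicted_se  # predicted reduction in SE
    if delta_se >= hyper:
        # Large gains still available: continue even if the SE target is met.
        return False
    if delta_se < hypo:
        # Negligible gains remain: stop even if the SE target is not met.
        return True
    # In between: fall back to the ordinary minimum-SE criterion (assumed).
    return current_se <= se_target
```

For example, with a current SE of 0.29 (target already met) but a predicted reduction of 0.04, the rule continues; with a current SE of 0.35 but a predicted reduction of only 0.005, it stops.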

+ Method Simulation design:
- Item pools: matched to the trait distribution, and mismatched (information shifted +1.5 to the right); pool sizes of 30 and 90 items
- 1,000 examinees drawn from N(0, 1)
- Graded response model (GRM)
- MEPV for item selection and EAP for trait estimation
- Criteria:
  - PSER (0.03 hyper, 0.01 hypo) → hyper + hypo behavior
  - Minimum SE (0.3) → baseline
  - Minimum information (0.5) → hypo behavior
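The EAP trait-estimation step used in the design above can be sketched by quadrature over a standard-normal prior. This is an illustration only: the study used the GRM, but a dichotomous 2PL response function is substituted here for brevity, and all function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    # 2PL item response function (the study itself used the GRM;
    # this dichotomous model is a simplification for illustration).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_quad=61):
    # EAP trait estimate over a N(0, 1) prior, using a fixed
    # rectangular quadrature grid on [-4, 4].
    nodes = [-4.0 + 8.0 * k / (n_quad - 1) for k in range(n_quad)]
    post = []
    for t in nodes:
        like = math.exp(-0.5 * t * t)  # N(0, 1) prior, unnormalized
        for u, (a, b) in zip(responses, items):
            p = p_correct(t, a, b)
            like *= p if u == 1 else 1.0 - p
        post.append(like)
    total = sum(post)
    mean = sum(t * w for t, w in zip(nodes, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, post)) / total
    return mean, math.sqrt(var)  # theta estimate and its posterior SE
```

As expected of EAP, administering more informative items shrinks the posterior SE, which is the quantity the SE-based stopping rules monitor.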

+ Results

+ Conclusion The new method (PSER) is based on the predictive posterior variance, a statistic computed as part of most fully Bayesian item selection procedures, including the minimum expected posterior variance (MEPV) criterion. The PSER asks one basic question: is it worth continuing? "Worth" is conceptualized as the amount of reduction in uncertainty and is measured against two threshold values, the hyper (H+) and hypo (H−) thresholds. The PSER stopping rule thus seeks to balance the dual concerns of measurement precision (H+) and testing efficiency (H−) by considering the predicted change in measurement precision that would result from the administration of additional items.
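The predictive posterior variance behind this conclusion can be illustrated as follows: for a candidate item, average the posterior variance over the possible responses, weighted by their predictive probabilities (the MEPV idea), and compare the implied SE with the current one. This is a sketch under assumptions: a dichotomous 2PL response function stands in for the study's GRM, and all names are illustrative.

```python
import math

def irf(theta, a, b):
    # 2PL response probability (GRM in the actual study).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def posterior(responses, items, n_quad=61):
    # Discrete posterior over theta on a [-4, 4] grid, N(0, 1) prior.
    nodes = [-4.0 + 8.0 * k / (n_quad - 1) for k in range(n_quad)]
    w = []
    for t in nodes:
        like = math.exp(-0.5 * t * t)
        for u, (a, b) in zip(responses, items):
            p = irf(t, a, b)
            like *= p if u == 1 else 1.0 - p
        w.append(like)
    s = sum(w)
    return nodes, [x / s for x in w]

def post_var(nodes, weights):
    m = sum(t * p for t, p in zip(nodes, weights))
    return sum((t - m) ** 2 * p for t, p in zip(nodes, weights))

def predicted_se_reduction(responses, items, candidate):
    """Predicted drop in SE from administering `candidate` = (a, b):
    the expected posterior variance averages the posterior variance over
    the two possible responses, weighted by predictive probabilities."""
    nodes, w = posterior(responses, items)
    current_se = math.sqrt(post_var(nodes, w))
    a, b = candidate
    p_plus = sum(irf(t, a, b) * p for t, p in zip(nodes, w))  # P(u = 1)
    mepv = 0.0
    for u, pu in ((1, p_plus), (0, 1.0 - p_plus)):
        n2, w2 = posterior(responses + [u], items + [candidate])
        mepv += pu * post_var(n2, w2)
    return current_se - math.sqrt(mepv)
```

By the law of total variance the predicted reduction is never negative, and an item targeted at the current posterior yields a larger predicted reduction than an off-target item, which is exactly the quantity the PSER compares against H+ and H−.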