Hong Jiao, George Macredy, Junhui Liu, & Youngmi Cho (2012)


• Starting Point (first item)
  ◦ “Best guess”, “Use what you’ve got”, or “Start easy”.
  ◦ Selecting five items randomly from the calibration item pool
• Item Selection Algorithm
  ◦ Fisher information
  ◦ Kullback-Leibler (KL) information
• Termination Rule
  ◦ Fixed-length
  ◦ Fixed-precision
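The item selection and termination components above can be sketched as follows. This is a minimal illustration, assuming a plain Rasch model, for which the Fisher information of an item at ability theta is P(1 - P). The function names, the `se_target` parameter, and the example difficulties are hypothetical and do not come from the slides.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def fisher_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def select_item(theta_hat, difficulties, administered):
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    return max(candidates,
               key=lambda i: fisher_information(theta_hat, difficulties[i]))

def should_stop(n_administered, standard_error, max_items=20, se_target=None):
    """Fixed-precision rule if se_target is given, else fixed-length rule."""
    if se_target is not None and standard_error <= se_target:
        return True
    return n_administered >= max_items

# Hypothetical four-item pool; item 2 has already been administered.
next_item = select_item(0.2, [-2.0, -0.5, 0.3, 1.5], administered={2})
```

Under the Rasch model the most informative item is the one whose difficulty lies closest to the current ability estimate, which is why `select_item` picks the item with difficulty -0.5 in the example.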

• The latent trait measured within each latent class is unidimensional, but the latent traits measured across latent classes are multidimensional.
• Estimation of ability parameters
  ◦ One single latent ability parameter
  ◦ Class-specific ability parameters

• Method 1: Estimation of a single latent ability parameter; items are selected to maximize the KL information between two latent classes at the current ability estimate.
  ◦ Maximizes the information to distinguish between the latent classes conditional on the current ability estimate.
  ◦ Appropriate for use when the same latent ability is measured across latent classes.
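A between-class KL index of this kind can be sketched as below. The exact formula on the original slide is not preserved in the transcript, so this is only an illustration under stated assumptions: a mixture Rasch model with class-specific item difficulties, and KL divergence taken from class 1 to class 2 (the direction is an assumption). All function names are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def kl_between_classes(theta_hat, b_class1, b_class2):
    """KL divergence between the item response distributions of two latent
    classes, both evaluated at the same ability estimate theta_hat.
    Direction (class 1 -> class 2) is an assumption of this sketch."""
    p1 = rasch_prob(theta_hat, b_class1)
    p2 = rasch_prob(theta_hat, b_class2)
    return (p1 * math.log(p1 / p2)
            + (1.0 - p1) * math.log((1.0 - p1) / (1.0 - p2)))

def select_item_kl(theta_hat, pool, administered):
    """Index of the unused item with the largest between-class KL value.
    `pool` is a list of (b_class1, b_class2) difficulty pairs."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates,
               key=lambda i: kl_between_classes(theta_hat, *pool[i]))

# Hypothetical pool: item 1 has the largest class-specific difficulty gap.
pool = [(0.0, 0.0), (-1.0, 1.0), (0.5, 0.6)]
best = select_item_kl(0.0, pool, administered=set())
```

An item whose difficulty is identical in both classes has a KL value of zero and carries no information for classification, so it is never preferred by this index.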

• Method 2: Estimation of a single latent ability parameter; items are selected to maximize the distinction between latent classes as well as between the current ability estimate and its true value.
  ◦ Maximizes the information to distinguish between both the latent classes and the upper and lower bounds of the interval set around the current ability estimate.
  ◦ Appropriate for use when the same latent ability is measured across latent classes.

• Method 3: Estimation of one latent ability for each latent class; items are selected to maximize the distinction between latent classes and between the current ability estimates for each latent class.
  ◦ No interim latent class membership updating.

• Method 4: Combines Methods 1 and 3. The index is a sum of the weighted KL information based on each class-specific ability estimate, and so makes use of all available sources of information.
  ◦ Only appropriate for use when the same latent trait is measured across the two classes.
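The idea of a weighted sum of KL values evaluated at class-specific ability estimates can be illustrated as follows. The transcript does not preserve the formula or say how the weights are defined, so the weights here are simply supplied by the caller (the example uses the 50/50 mixing proportions as a placeholder). The model, the KL direction, and all names are assumptions of this sketch rather than the authors' definition.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def kl_between_classes(theta, b_class1, b_class2):
    """KL divergence between the two classes' response distributions at theta."""
    p1 = rasch_prob(theta, b_class1)
    p2 = rasch_prob(theta, b_class2)
    return (p1 * math.log(p1 / p2)
            + (1.0 - p1) * math.log((1.0 - p1) / (1.0 - p2)))

def weighted_kl(theta_hats, weights, b_class1, b_class2):
    """Sum over classes of weight_g * KL evaluated at that class's
    ability estimate theta_hats[g]. Weights are caller-supplied."""
    return sum(w * kl_between_classes(t, b_class1, b_class2)
               for t, w in zip(theta_hats, weights))

# Hypothetical class-specific ability estimates and placeholder weights.
value = weighted_kl(theta_hats=(0.0, 0.0), weights=(0.5, 0.5),
                    b_class1=-1.0, b_class2=1.0)
```

When both class-specific ability estimates coincide and the weights sum to one, the weighted index reduces to the single between-class KL value at that common estimate.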

• 12 item selection methods

• Memberships: 2; 5000 examinees for each class.
• Four item pools, each with 500 items.
• Mixing proportion: 50% for both latent classes.
• Test length: 20 items

                            Large item separation   Small item separation
Large ability separation    Pool 1                  Pool 2
Small ability separation    Pool 3                  Pool 4
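A simulation design of this shape could be set up roughly as below. The slides do not state how item separation or ability separation were quantified, so the definitions used here (spread of the between-class difficulty shift, and difference between class ability means) and the numeric values are purely hypothetical; only the counts (two classes, 5000 examinees per class, 500 items per pool) come from the design above.

```python
import math
import random

def simulate_item_pool(n_items, item_separation, rng):
    """Return a list of (b_class1, b_class2) difficulty pairs."""
    pool = []
    for _ in range(n_items):
        b1 = rng.gauss(0.0, 1.0)
        # Hypothetical definition: class 2 difficulty is class 1 difficulty
        # plus a random shift whose spread equals item_separation.
        b2 = b1 + rng.gauss(0.0, item_separation)
        pool.append((b1, b2))
    return pool

def simulate_examinees(n_per_class, ability_separation, rng):
    """Return (class_label, theta) pairs with equal class sizes.
    Hypothetical definition: class means differ by ability_separation."""
    examinees = []
    half = ability_separation / 2.0
    for label, mean in ((0, -half), (1, half)):
        for _ in range(n_per_class):
            examinees.append((label, rng.gauss(mean, 1.0)))
    return examinees

def simulate_response(theta, b, rng):
    """Draw a 0/1 response from the Rasch model."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return 1 if rng.random() < p else 0

rng = random.Random(2012)  # fixed seed for reproducibility
pool = simulate_item_pool(500, item_separation=1.0, rng=rng)
examinees = simulate_examinees(5000, ability_separation=1.0, rng=rng)
```

Crossing a large and a small value of each separation parameter would give four pools, matching the 2 x 2 layout of the table.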

• Ability estimation
  ◦ For Methods 1 & 2: a single ability estimate across classes.
    ▪ Administration of an item → estimated latent class membership → estimated ability parameter.
    ▪ Items were administered sequentially, and the latent class membership and ability parameter were updated after each item.
  ◦ For Methods 3 & 4: class-specific ability estimates.
    ▪ Administration of an item → estimated class-specific ability parameters.
    ▪ Items were administered sequentially, and the ability parameters were updated after each item.
    ▪ The latent class membership was estimated only after the last item was administered.
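The interim class-membership update used with a single ability estimate can be illustrated with Bayes' rule. This is a sketch under assumptions: a mixture Rasch model with class-specific difficulties, the 50/50 mixing proportions as the prior, and the current ability estimate treated as known. The slides do not specify the estimator actually used, and the function names are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def posterior_class_probs(responses, item_params, theta_hat, mixing=(0.5, 0.5)):
    """Posterior probability of each latent class given the responses so far.
    `item_params[i]` holds the class-specific difficulties of item i."""
    likelihoods = []
    for g, prior in enumerate(mixing):
        like = prior
        for x, difficulties in zip(responses, item_params):
            p = rasch_prob(theta_hat, difficulties[g])
            like *= p if x == 1 else (1.0 - p)
        likelihoods.append(like)
    total = sum(likelihoods)
    return [like / total for like in likelihoods]

def classify(posterior):
    """Assign the examinee to the class with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda g: posterior[g])

# One correct response to an item that is easy for class 0 and hard for class 1.
post = posterior_class_probs([1], [(-1.0, 1.0)], theta_hat=0.0)
decision = classify(post)
```

For Methods 3 and 4, the same computation would be run once after the final item rather than after every item, using each class's own ability estimate in place of the shared `theta_hat`.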

The distribution of the converged posterior classification decisions as a function of item sequence (Items 5-20) in the CAT administration. The classification stabilized, or converged, for more than 70% of the examinees after administration of the first five items.

For Pool 2, the number of examinees whose classification converged at Item 5 was smaller than for Pool 1, because Pool 2 provided less KL information. All alternatives under Method 2 required fewer items to produce stable classification decisions for a majority of the examinees.

• If more than two latent classes are involved in the test, are these KL methods still workable?
• Consider the mixture model in computerized classification testing.
• Why did random item selection yield significantly more accurate estimates of person ability than the four proposed methods?
• Speededness behavior is a kind of latent class. This condition could be added by applying the latent class model to only the last several items.