
A Cognitive Diagnosis Model for Cognitively-Based Multiple-Choice Options Jimmy de la Torre Department of Educational Psychology Rutgers, The State University of New Jersey

All wrong answers are wrong; but some wrong answers are more wrong than others.

Introduction
– Assessments should educate and improve student performance, not merely audit it
– In other words, assessments should not only ascertain the status of learning, but also further learning
– Due to the emphasis on accountability, more and more resources are allocated toward assessments that only audit learning
– Tests used to support school and system accountability do not provide diagnostic information about individual students

– Tests based on unidimensional IRT models report single-valued scores that submerge any distinct skills
– These scores are useful in establishing relative order, but not in evaluating students' specific strengths and weaknesses
– Cluster scores have been used, but these scores are unreliable and provide only superficial information about the underlying processes
– Needed are assessments that can provide interpretative, diagnostic, highly informative, and potentially prescriptive information

– Some psychometric models allow the merger of advances in cognitive and psychometric theories to provide inferences more relevant to learning
– These models are called cognitive diagnosis models (CDMs)
– CDMs are discrete latent variable models
– They are developed specifically for diagnosing the presence or absence of multiple fine-grained skills, processes, or problem-solving strategies involved in an assessment

Fundamental difference between IRT and CDM: a fraction subtraction example
– IRT: performance is based on a unidimensional continuous latent trait; students with higher latent traits have a higher probability of answering the question correctly
– CDM: performance is based on a binary attribute vector; successful performance on the task requires a series of successful implementations of the attributes specified for the task
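To make the contrast concrete, here is a minimal Python sketch (all parameter values invented for illustration): under IRT the success probability varies smoothly with a single continuous trait theta, whereas under a CDM it is driven by whether a binary attribute vector covers the attributes the task requires.

```python
import numpy as np

# IRT view: success probability is a smooth function of one continuous trait.
def p_irt(theta, a=1.0, b=0.0):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

print([round(p_irt(t), 2) for t in (-2, -1, 0, 1, 2)])
# [0.12, 0.27, 0.5, 0.73, 0.88]  -- probability rises continuously with theta

# CDM view: performance is driven by a binary attribute vector; the task
# succeeds (up to slip/guessing, introduced later) only if every required
# attribute has been mastered.
required = np.array([1, 1, 1, 0, 0])      # attributes 1-3 are needed (hypothetical)
student = np.array([1, 1, 0, 1, 1])       # attribute 3 not mastered
print(bool(np.all(student >= required)))  # False -> the solution process breaks down
```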

Required attributes:
(1) Borrowing from whole
(2) Basic fraction subtraction
(3) Reducing
Other attributes:
(4) Separating whole from fraction
(5) Converting whole to fraction

Background
– Denote the response and attribute vectors of examinee i by X_i and α_i, respectively
– Each attribute pattern is a unique latent class; thus, K attributes define 2^K latent classes
– Attribute specifications for the items can be found in the Q-matrix, a J x K binary matrix
– DINA (Deterministic Input, Noisy "And" gate) is a CDM that can be used in modeling the distribution of X_i given α_i
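A minimal sketch of the background objects just listed, using a made-up 3 x 3 Q-matrix (not the fraction-subtraction Q-matrix from the talk) to show how a Q-matrix is stored and why K attributes define 2^K latent classes.

```python
import itertools

import numpy as np

# Hypothetical J x K Q-matrix (3 items, K = 3 attributes); q[j, k] = 1 means
# item j requires attribute k.  Illustrative only -- not the talk's Q-matrix.
Q = np.array([
    [1, 1, 0],   # item 1 requires attributes 1 and 2
    [0, 1, 1],   # item 2 requires attributes 2 and 3
    [1, 1, 1],   # item 3 requires all three attributes
])

J, K = Q.shape

# K binary attributes define 2^K latent classes (attribute patterns).
latent_classes = list(itertools.product([0, 1], repeat=K))
print(f"{K} attributes -> {len(latent_classes)} latent classes")  # 3 -> 8
```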

In the DINA model:
– η_ij = ∏_k (α_ik)^(q_jk) is the latent group classification of examinee i with respect to item j
– P(h|g) is the probability that examinees in group g will respond with h to item j
– In the more conventional DINA notation, P(1|0) is the guessing parameter and P(0|1) is the slip parameter
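Below is a minimal sketch of the DINA item response function summarized above, written in terms of the conventional guessing and slip parameters; the attribute vector, Q-vector, and parameter values are invented for illustration.

```python
import numpy as np

def dina_prob_correct(alpha, q_j, guess_j, slip_j):
    """P(X_ij = 1 | alpha) under the DINA model.

    alpha   : binary attribute vector of the examinee
    q_j     : binary attribute-requirement vector of item j
    guess_j : P(correct | examinee lacks at least one required attribute)
    slip_j  : P(incorrect | examinee has all required attributes)
    """
    # Latent group: eta = 1 only if every required attribute is mastered.
    eta = int(np.all(alpha >= q_j))
    return (1 - slip_j) ** eta * guess_j ** (1 - eta)

# Illustrative values (not from the talk):
alpha = np.array([1, 1, 0])
q_j = np.array([1, 1, 0])
print(dina_prob_correct(alpha, q_j, guess_j=0.2, slip_j=0.1))  # 0.9
```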

– Of the various test formats, multiple-choice (MC) has been widely used for its ability to sample and accommodate diverse content
– Typical CDM analyses of MC tests involve dichotomized scores (i.e., correct/incorrect)
– This approach ignores the diagnostic insights about student difficulties and alternative conceptions contained in the distractors
– Wrong answers can reveal both what students know and what they do not know

– The purpose of the paper is to propose a two-component framework for maximizing the diagnostic value of MC assessments
– Component 1 prescribes how MC options can be designed to contain more diagnostic information
– Component 2 describes a CDM that can exploit such information
– The viability (i.e., estimability, efficiency) of the proposed framework is evaluated using a simulation study

Component 1: Cognitively-Based MC Options
– For the MC format, the response X_ij takes one of the values {1, 2, …, H_j}, where each number represents a different option
– An option is coded, or cognitively based, if it is constructed to correspond to some of the latent classes
– Each coded option has an attribute specification
– Attribute specifications for non-coded options are implicitly represented by the zero vector

A Fraction Subtraction Example (item stem and response options A–D are fraction expressions shown on the slide)

Attributes Required for Each Option: a table rating options A–D against the five attributes, (1) Borrowing from whole, (2) Basic fraction subtraction, (3) Reducing, (4) Separating whole from fraction, (5) Converting whole to fraction (the option-by-attribute entries are shown on the slide)

– The option with the largest number of required attributes is the key
– Distractors are created to reflect the type of responses that students who lack one or more of the required attributes for the key are likely to give
– Knowledge states represented by the distractors should be subsets of the knowledge state that corresponds to the key
– The number of latent classes under the proposed framework is equal to the number of coded options plus 1
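The classification of examinees into latent groups can be sketched as follows, assuming (as stated above) that the coded options' attribute specifications are nested within the key's; the option-level Q-vectors here are hypothetical, not those of the fraction-subtraction item.

```python
import numpy as np

# Hypothetical option-level Q-vectors for one item with 5 attributes.
# Coded options are listed from least to most demanding; the last one (the
# key) requires the most attributes, and each distractor's requirement is a
# subset of the key's (nested specification).  Non-coded options map to
# group 0, so the number of latent groups is (number of coded options + 1).
option_q = {
    "B": np.array([0, 1, 0, 0, 0]),   # coded distractor
    "C": np.array([1, 1, 0, 0, 0]),   # coded distractor
    "A": np.array([1, 1, 1, 0, 0]),   # key
}

def latent_group(alpha, option_q):
    """Index of the most demanding coded option whose required attributes
    the examinee has fully mastered (0 if none)."""
    group = 0
    for g, (option, q) in enumerate(option_q.items(), start=1):
        if np.all(alpha >= q):
            group = g
    return group

alpha = np.array([1, 1, 0, 0, 1])   # masters attributes 1, 2, and 5
print(latent_group(alpha, option_q))  # 2: can produce option C but not the key
```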

(Diagram: the nested knowledge states corresponding to the coded options, labeled as latent groups "0" through "4".)

Component 2: The MC-DINA Model
– Let q_jh be the Q-vector (attribute specification) for option h of item j, with the zero vector for non-coded options
– With respect to item j, examinee i is in latent group g_ij, the most attribute-demanding coded option whose required attributes the examinee has fully mastered
– The probability of examinee i choosing option h of item j is P(X_ij = h | g_ij = g) = P_j(h|g)

– This is the DINA model extended to coded MC options, hence the MC-DINA model
– Each item has its own set of P(h|g) parameters: one probability for each option h within each latent group g (see the table below)
– The expected response for a group, say h, is its coded option h: the "correct" response for group h
– The MC-DINA model can still be used even if only the key is coded, as long as the distractors are distinguished from each other
– The MC-DINA model is equivalent to the DINA model if no distinctions are made between the distractors

MC-DINA item parameters for a four-option item with latent groups 0–4:

Group   A        B        C        D
0       P(A|0)   P(B|0)   P(C|0)   P(D|0)
1       P(A|1)   P(B|1)   P(C|1)   P(D|1)
2       P(A|2)   P(B|2)   P(C|2)   P(D|2)
3       P(A|3)   P(B|3)   P(C|3)   P(D|3)
4       P(A|4)   P(B|4)   P(C|4)   P(D|4)
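To make the table concrete, here is a minimal sketch of how one item's P(h|g) parameters could be stored and used to draw a response; the probability values are invented, chosen only so that each group tends to choose its own coded option.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical P(h|g) table for one item: rows = latent groups 0..4,
# columns = options A..D.  Each row sums to 1.  For illustration, group g
# (g >= 1) puts the most mass on "its" coded option; group 0 guesses.
P = np.array([
    [0.25, 0.25, 0.25, 0.25],   # group 0
    [0.70, 0.10, 0.10, 0.10],   # group 1 -> A
    [0.10, 0.70, 0.10, 0.10],   # group 2 -> B
    [0.10, 0.10, 0.70, 0.10],   # group 3 -> C
    [0.05, 0.05, 0.05, 0.85],   # group 4 -> D (the key)
])

options = np.array(["A", "B", "C", "D"])

def draw_response(group):
    """Draw one MC response for an examinee in the given latent group."""
    return rng.choice(options, p=P[group])

print(draw_response(4))   # most likely "D"
print(draw_response(0))   # any option, with equal probability
```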

DINA Model for Nominal Response (N-DINA Model): only two latent groups, but all four options distinguished:

Group   A        B        C        D
0       P(A|0)   P(B|0)   P(C|0)   P(D|0)
1       P(A|1)   P(B|1)   P(C|1)   P(D|1)

Plain DINA Model: options collapsed to incorrect (0) versus correct (1):

Group   0        1
0       P(0|0)   P(1|0)
1       P(0|1)   P(1|1)

P(1|0) – guessing parameter
P(0|1) – slip parameter

Estimation
– As in IRT, JMLE of the MC-DINA model parameters can lead to inconsistent estimates
– Using MMLE, we maximize L = ∏_i L(X_i), where L(X_i) = Σ_c L(X_i | α_c) p(α_c) is the marginalized likelihood of examinee i and p(α_c) is the prior probability of attribute pattern α_c
– The estimator based on an EM algorithm is P_j(h|g) = R_jgh / I_jg, where R_jgh is the expected number of examinees in group g choosing option h of item j and I_jg = Σ_h R_jgh is the expected number of examinees in group g of item j
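A minimal sketch of the M-step implied by this estimator: given expected counts from the E-step (invented here for illustration; the E-step itself, which computes posterior class membership from the marginalized likelihood, is not shown), the update is a normalized count.

```python
import numpy as np

# Hypothetical E-step output for one item: R[g, h] = expected number of
# examinees in latent group g who chose option h (groups 0..4, options A..D).
R = np.array([
    [30.0, 28.0, 33.0, 29.0],
    [52.0,  8.0,  6.0,  9.0],
    [ 7.0, 61.0,  6.0,  8.0],
    [ 5.0,  6.0, 48.0,  7.0],
    [ 9.0, 11.0, 12.0, 230.0],
])

I_jg = R.sum(axis=1, keepdims=True)   # expected number of examinees per group
P_hat = R / I_jg                      # M-step: P_hat(h | g) = R[g, h] / I_jg

print(np.round(P_hat, 2))
print(P_hat.sum(axis=1))              # each row sums to 1
```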

A Simulation Study
Purpose: to investigate how
– well the item parameters and SEs can be estimated
– accurately the attributes can be classified
– MC-DINA compares with the traditional DINA
1,000 examinees, 30 items, 5 attributes
Parameters: (true values shown on the slide)
Number of replicates: 100

Required attributes per item: 1, 2, or 3 (10 items each)
Exhaustive hierarchically linear specification (option-level Q-vectors shown on the slide; a sketch of one reading follows below):
– one-attribute item
– two-attribute item
– three-attribute item
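The slide does not spell out the specification, so the sketch below assumes one plausible reading of "exhaustive hierarchically linear": a j-attribute item has j nested coded options, requiring the first attribute, the first two, and so on up to all j required attributes.

```python
import numpy as np

def linear_option_q(required, K=5):
    """Build nested (hierarchically linear) option Q-vectors for an item.

    required : tuple of attribute indices the key requires, in order.
    Returns one Q-vector per coded option: the first requires only the first
    attribute, the next the first two, ..., the last (the key) all of them.
    """
    q_vectors = []
    for depth in range(1, len(required) + 1):
        q = np.zeros(K, dtype=int)
        q[list(required[:depth])] = 1
        q_vectors.append(q)
    return q_vectors

# A three-attribute item requiring attributes 1, 2, and 3 (0-indexed 0, 1, 2):
for q in linear_option_q((0, 1, 2)):
    print(q)
# [1 0 0 0 0]   <- least demanding coded option
# [1 1 0 0 0]
# [1 1 1 0 0]   <- the key
```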

Results: Bias, Mean, and Empirical SE Across 30 Items

Bias, Mean and Empirical SE by Item Classification (True Probability: 0.25)

Bias, Mean and Empirical SE by Item Classification (True Probability: 0.82)

Bias, Mean and Empirical SE by Item Classification (True Probability: 0.06)

Review of Parameter Estimation Results
– The algorithm provides accurate estimates of the model parameters and SEs
– The SE of a parameter estimate does not depend on item type
– What factor affects the precision of the estimates? I_jg, the expected number of examinees in group g of item j

Illustration of the impact of I_jg: consider the following three items (item table shown on the slide)

Implications
– The differences in sample sizes in the latent groups account for the observed differences in the SEs of the parameter estimates
– This underscores the importance not only of the overall sample size I, but also of the expected numbers of examinees in the latent groups, in determining the precision of the estimates
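A small numeric illustration of this point, using the textbook binomial approximation SE ≈ sqrt(p(1 − p)/n) for an estimated choice probability (an assumption for illustration only; the study reports empirical SEs): shrinking the expected group size inflates the SE even when the total sample size stays at 1,000.

```python
import math

p = 0.25                      # a true choice probability, as in the results slides
for n_group in (500, 250, 100, 25):
    se = math.sqrt(p * (1 - p) / n_group)
    print(f"expected group size {n_group:4d}:  SE ~ {se:.3f}")
# The SE grows as the expected group size shrinks, even though the total
# sample size (1,000 examinees) is unchanged.
```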

Attribute Classification Accuracy: Percent of Attributes Correctly Classified (results table shown on the slide)

Summary and Conclusion
– There is an urgent need for assessments that provide interpretative, diagnostic, highly informative, and potentially prescriptive scores
– Scores of this type can inform classroom instruction and learning
– With appropriate construction, MC items can be designed to be more diagnostically informative
– The diagnostic information in MC distractors can be harnessed using the MC-DINA model

– Parameters of the MC-DINA model can be accurately estimated
– MC-DINA attribute classification accuracy is dramatically better than that of the traditional DINA
– Caveat: this framework addresses only the psychometric aspect of cognitive diagnosis
– Development of cognitively diagnostic assessments is a multi-disciplinary endeavor requiring collaboration among experts from learning science, cognitive science, subject domains, didactics, psychometrics, ...

Further Considerations
– A more general version of the model (e.g., attribute specifications need not be linear, exhaustive, nor hierarchical)
– Applications to traditional MC assessments
– Issues related to sample size:
  – sample size needed for different numbers of items and attributes, and types of attribute specifications
  – trade-off between the number of coded options and the sample size necessary for stable estimates
  – feasibility of some simplifying assumptions, such as equiprobability in choosing non-expected responses

That’s all folks!