Published by Dominick Jennings. Modified over 9 years ago.
1
Item Response Theory
Dan Mungas, Ph.D.
Department of Neurology
University of California, Davis
2
What is it? Why should anyone care?
3
IRT Basics
4
Item Response Theory - What Is It
Modern approach to psychometric test development
  – Mathematical measurement theory
  – Associated numeric and computational methods
Widely used in large-scale educational, achievement, and aptitude testing
More than 40 years of conceptual and methodological development
5
Item Response Theory - Methods
Dataset consists of a rectangular table
  – Rows correspond to subjects
  – Columns correspond to items
IRT applications simultaneously estimate subject ability and item parameters
  – Iterative maximum likelihood estimation algorithms
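The idea of estimating abilities and item parameters together from the subjects-by-items table can be sketched as follows. This is only an illustration under stated assumptions: it uses the Rasch (1PL) model, a made-up 6 x 4 response table, and plain gradient ascent as a stand-in for the iterative maximum likelihood algorithms the slide mentions. Real IRT software typically uses more refined estimators, and the function names here are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability that ability theta passes an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(X, theta, b):
    """Log-likelihood of the whole table, treating responses as independent given ability."""
    total = 0.0
    for j, row in enumerate(X):
        for i, x in enumerate(row):
            p = p_correct(theta[j], b[i])
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def fit_rasch_joint_ml(X, n_iter=200, step=0.1):
    """Toy joint estimation: update all abilities and difficulties by gradient ascent."""
    n_subj, n_items = len(X), len(X[0])
    theta = [0.0] * n_subj
    b = [0.0] * n_items
    for _ in range(n_iter):
        P = [[p_correct(theta[j], b[i]) for i in range(n_items)] for j in range(n_subj)]
        # d(logL)/d(theta_j) = sum_i (x_ji - p_ji); d(logL)/d(b_i) = sum_j (p_ji - x_ji)
        grad_theta = [sum(X[j][i] - P[j][i] for i in range(n_items)) for j in range(n_subj)]
        grad_b = [sum(P[j][i] - X[j][i] for j in range(n_subj)) for i in range(n_items)]
        theta = [t + step * g for t, g in zip(theta, grad_theta)]
        b = [d + step * g for d, g in zip(b, grad_b)]
        # Fix the origin of the common scale: mean item difficulty is set to 0.
        shift = sum(b) / n_items
        theta = [t - shift for t in theta]
        b = [d - shift for d in b]
    return theta, b

# Hypothetical data: rows are subjects, columns are items (1 = correct, 0 = incorrect).
X = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
theta_hat, b_hat = fit_rasch_joint_ml(X)
print("abilities:   ", [round(t, 2) for t in theta_hat])
print("difficulties:", [round(d, 2) for d in b_hat])
```

Items passed by fewer subjects come out with higher difficulty, and abilities and difficulties are expressed on the same scale.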
6
Physical Function Scale
Hays, Morales & Reise (2000)
Item                                                                   LIMITED   LIMITED    NOT LIMITED
                                                                       A LOT     A LITTLE   AT ALL
Vigorous activities, running, lifting heavy objects, strenuous sports  1         2          3
Climbing one flight                                                    1         2          3
Walking more than 1 mile                                               1         2          3
Walking one block                                                      1         2          3
Bathing / dressing self                                                1         2          3
Preparing meals / doing laundry                                        1         2          3
Shopping                                                               1         2          3
Getting around inside home                                             1         2          3
Feeding self                                                           1         2          3
7
Basic Data Structure
Subject   Item1   Item2   Item3   Item4
S1        X11     X12     X13     X14
S2        X21     X22     X23     X24
S3        X31     X32     X33     X34
S4        X41     X42     X43     X44
8
Item Response Theory - Basic Results
Item parameters
  – Difficulty
  – Discrimination
  – Correction for guessing (most applicable to multiple-choice items)
Subject ability (in the psychometric sense)
  – Capacity to successfully respond to test items (or propensity to respond in a certain direction)
  – Net result of all genetic and environmental influences
  – Measured by scales composed of homogeneous items
Item difficulty and subject ability are on the same scale
9
Item Response Theory - Fundamental Assumptions
Unidimensionality - items measure a single, homogeneous domain
Local independence - covariance among items is determined only by the latent dimension measured by the item set
10
IRT Models
1PL (Rasch)
  – Only difficulty and ability are estimated
  – Discrimination is assumed to be equal across items
2PL
  – Discrimination, difficulty, and ability are estimated
  – Guessing is assumed to have no effect
3PL
  – Discrimination, difficulty, guessing, and ability are estimated (multiple-choice items)
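The three models can be read as one response function with progressively more free parameters. The sketch below uses the standard logistic form, which the slides do not spell out. The symbols `a` (discrimination), `b` (difficulty), and `c` (guessing) are the usual textbook names, and the optional scaling constant that some texts include is omitted here.

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL logistic model.

    theta: subject ability
    a: item discrimination, b: item difficulty, c: lower asymptote (guessing)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def icc_2pl(theta, a, b):
    """2PL: the 3PL with guessing fixed at 0."""
    return icc_3pl(theta, a=a, b=b, c=0.0)

def icc_1pl(theta, b):
    """1PL (Rasch): the 2PL with the same discrimination (here 1) for every item."""
    return icc_3pl(theta, a=1.0, b=b, c=0.0)

# When ability equals difficulty, the 1PL and 2PL give a probability of 0.5;
# with guessing c the 3PL gives c + (1 - c) / 2.
print(icc_1pl(0.0, b=0.0))
print(icc_2pl(1.0, a=2.0, b=1.0))
print(icc_3pl(0.0, a=1.0, b=0.0, c=0.2))
```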
11
Item Response Theory - Invariance Properties
Invariance requires that the basic assumptions are met
Item parameters are invariant across different samples
  – Within the range of overlap of the distributions
  – Distributions of the samples can differ
Ability estimates are invariant across different item sets
  – Assumes that the ability range of the items spans the ability range of the subjects of interest
12
Item Response Theory - Outcomes
Item-level results
  – Item Characteristic Curve (ICC): non-linear function relating ability to the probability of a correct response to the item
  – Item Information Curve (IIC): non-linear function showing precision of measurement (reliability) at different ability points
  – Both curves are defined by the item parameters
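As a rough illustration of how both curves follow from the item parameters, the sketch below assumes a 2PL logistic item, for which the information function has the well-known closed form a^2 * P * (1 - P). The parameter values are made up.

```python
import math

def icc_2pl(theta, a, b):
    """Item characteristic curve: probability of a correct response (2PL)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    """Item information curve for a 2PL item: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item: discrimination 2.0, difficulty 0.0.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta,
          round(icc_2pl(theta, 2.0, 0.0), 3),
          round(item_information_2pl(theta, 2.0, 0.0), 3))
```

Under this model information peaks where ability equals the item's difficulty, and the height of the peak grows with the square of the discrimination.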
13
Item Characteristic Curves
14
Information Curves
16
Item Response Theory - Outcomes
Test-level results
  – Test Characteristic Curve (TCC): non-linear function relating ability to expected total test score
  – Test Information Curve (TIC): non-linear function showing precision of measurement (reliability) at different ability points
17
Test Characteristic Curve Mini-Mental State Examination
18
Test Information Curves for Mattis Dementia Rating Scale and IRT Derived Scales
19
Why Do We Care - Applications of IRT in Health Care Settings
Refined scoring of tests
Characterization of psychometric properties of existing tests
Construction of new tests
20
Test Scoring
IRT permits refined scoring that weights items differentially based on their item parameters
21
Physical Function Scale
Hays, Morales & Reise (2000)
Item                                                                   LIMITED   LIMITED    NOT LIMITED
                                                                       A LOT     A LITTLE   AT ALL
Vigorous activities, running, lifting heavy objects, strenuous sports  1         2          3
Climbing one flight                                                    1         2          3
Walking more than 1 mile                                               1         2          3
Walking one block                                                      1         2          3
Bathing / dressing self                                                1         2          3
Preparing meals / doing laundry                                        1         2          3
Shopping                                                               1         2          3
Getting around inside home                                             1         2          3
Feeding self                                                           1         2          3
22
How to Score the Test
Simple approach: total the circled numbers to obtain a score
But:
  – Should “limited a lot” for walking a mile receive the same weight as “limited a lot” in getting around inside the home?
  – Should “limited a lot” for walking one block be twice as bad as “limited a little” for walking one block?
23
How IRT Can Help
IRT provides a data-driven means of rational scoring for such measures
Items that are more discriminating are given greater weight
In practice, the simple sum score is often very good; improvement is at the margins
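One way to see the weighting is to score two response patterns that have the same sum score but differ in which items were passed. The sketch below assumes a 2PL model with made-up item parameters and uses a simple grid search for the maximum likelihood ability estimate. The item values and function names are hypothetical, and operational scoring would normally use a proper optimizer or Bayesian estimator.

```python
import math

# Hypothetical (discrimination, difficulty) pairs for four dichotomous items.
ITEMS = [(0.5, -1.0), (2.0, 0.5), (1.0, 0.0), (1.5, 1.0)]

def p_2pl(theta, a, b):
    """Probability of a positive response to a 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses):
    """Log-likelihood of one subject's response pattern at a given ability."""
    ll = 0.0
    for (a, b), x in zip(ITEMS, responses):
        p = p_2pl(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def ml_ability(responses):
    """Grid-search maximum likelihood estimate of ability on [-4, 4]."""
    grid = [k / 100.0 for k in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses))

pattern_a = [0, 1, 0, 1]  # passes the two most discriminating items
pattern_b = [1, 0, 1, 0]  # passes the two least discriminating items

print("sum scores:", sum(pattern_a), sum(pattern_b))
print("ability estimates:", ml_ability(pattern_a), ml_ability(pattern_b))
```

Both patterns have a sum score of 2, yet the pattern that passes the more discriminating items receives the higher ability estimate.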
24
Description of Psychometric Properties
The Test Information Curve (TIC) shows reliability that varies continuously with ability
  – Depicts ability levels associated with high and low reliability
The standard error of measurement (SEM) is directly related to the information value I(θ)
  – SEM = 1 / sqrt(I(θ))
SEM and I(θ) also have a direct correspondence to the traditional reliability coefficient r
  – r = 1 - 1 / I(θ)
25
I(θ), SEM, r
I(θ)   SEM (s.d. units)   r
 1     1.00               0.00
 2     0.71               0.50
 4     0.50               0.75
 9     0.33               0.89
12     0.29               0.92
16     0.25               0.94
25     0.20               0.96
36     0.17               0.97
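The tabulated values follow directly from the two relations SEM = 1 / sqrt(I) and r = 1 - 1 / I, with ability expressed in standard deviation units. A short sketch that recomputes them (the function names are illustrative):

```python
import math

def sem_from_information(info):
    """Standard error of measurement in s.d. units: 1 / sqrt(I)."""
    return 1.0 / math.sqrt(info)

def reliability_from_information(info):
    """Reliability implied by information when ability variance is 1: 1 - 1 / I."""
    return 1.0 - 1.0 / info

rows = [(i, round(sem_from_information(i), 2), round(reliability_from_information(i), 2))
        for i in (1, 2, 4, 9, 12, 16, 25, 36)]
for info, sem, r in rows:
    print(f"{info:>3}  {sem:.2f}  {r:.2f}")
```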
26
TICs for English and Spanish language Versions of Two Scales
27
Construction of New Scales
Items can be selected to create scales with desired measurement properties
Can be used for prospective test development
Can be used to create new scales from existing tests/item pools
28
TICs from an Existing Global Cognition Scale and Re-Calibrated Existing Cognitive Tests
29
Principles of Scale Construction
Information corresponds to assessment goals
  – Broad and flat TIC for a longitudinal change measure in a population with heterogeneous ability
  – For a selection or diagnostic test, a TIC that peaks at the point of the ability continuum where discrimination is most important
30
Other Issues in IRT
Polytomous IRT models are available
  – Useful for ordinal (Likert) rating scales
An item with k possible scores is treated like k - 1 separate items, each with a different difficulty parameter
Information is greater for a polytomous item than for the same item dichotomized at a cutpoint
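The slides do not say which polytomous model is meant. As one common possibility, the sketch below assumes Samejima's graded response model, in which an item with k ordered scores has k - 1 threshold (difficulty) parameters and one discrimination. The parameter values are made up, and the example uses a three-category item like those on the physical function scale.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be in increasing order; an item with k categories has
    k - 1 thresholds, each acting like the difficulty of a dichotomous split.
    """
    # Cumulative curves: probability of responding in category j or higher.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Hypothetical three-category item: discrimination 1.5, thresholds at -1 and 1.
probs = grm_category_probs(0.0, 1.5, [-1.0, 1.0])
print([round(p, 3) for p in probs])
```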
31
Other Issues in IRT
Applicable to a broad range of content domains
IRT certainly applies to cognitive abilities
Also applies to other health outcomes
  – Quality of life
  – Physical function
  – Fatigue
  – Depression
  – Pain
32
Other Issues in IRT
Differential Item Functioning (DIF) - Test Bias
IRT provides explicit methods to evaluate and quantify the extent to which items and tests have different measurement properties in different groups
  – e.g., racial and ethnic groups, linguistic groups, gender
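One simple way to quantify such a difference, shown here only as an illustration and not necessarily the method used in the presentation, is the unsigned area between the item characteristic curves estimated separately in two groups. The sketch assumes 2PL curves and made-up parameters; "reference" and "focal" are the conventional labels for the two groups.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response to a 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def unsigned_area_between_iccs(params_ref, params_focal, lo=-8.0, hi=8.0, n_steps=1600):
    """Trapezoid-rule area between two groups' ICCs over [lo, hi]."""
    width = (hi - lo) / n_steps
    gaps = []
    for k in range(n_steps + 1):
        theta = lo + k * width
        gaps.append(abs(icc_2pl(theta, *params_ref) - icc_2pl(theta, *params_focal)))
    return width * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))

# Hypothetical item that is 0.5 units harder in the focal group, same discrimination.
area = unsigned_area_between_iccs((1.0, 0.0), (1.0, 0.5))
print(round(area, 3))
```

When the two curves share the same discrimination, the area is approximately the difference in difficulty; an item with identical curves in both groups gives an area of zero.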
33
English and Spanish Item Characteristic Curves for “Lamb/Cordero” Item
34
English and Spanish Item Characteristic Curves for “Stone/Piedra” Item
35
Challenges / Limitations of IRT
Large samples required for stable estimation
  – 150-200 for 1PL
  – 400-500 for 2PL
  – 600-1000 for 3PL
Analytic methods are labor intensive
  – A number of (expensive) applications are readily available for IRT analyses
  – Evaluation of basic assumptions, identification of the appropriate model, and systematic IRT analysis require considerable expertise and labor