A Different Way to Think About Measurement Development: An Introduction to Item Response Theory (IRT)

Presentation transcript:

A Different Way to Think About Measurement Development: An Introduction to Item Response Theory (IRT)
Joseph Olsen, Dean Busby, & Lena Chiu
Jan 23, 2015

Contents
- Introduction
- Item Response Models and Outcomes
- Software Packages
- Demonstration
- Additional Concepts
- References and Resources

Introduction
IRT surfaced in the 1970s (originally called "latent trait models"). It became popular in the 1980s, when it was adopted for ability tests such as the SAT and GRE. Social scientists have begun using IRT in the past decade.

Classical Test Theory (CTT) versus IRT
Generally speaking, CTT is used with continuous variables, while IRT is used with categorical (dichotomous/polytomous) variables. In personality and attitude assessment we are more likely to use CTT, but IRT offers advantages, including item characteristic curves and item information curves. IRT models the latent trait more precisely and accurately: a well-built IRT model is more precise than CTT (although an IRT model can be misspecified just as a CTT model can). IRT also aims for reliable measurement across the whole trait continuum, from the lowest to the highest levels, which is usually not a consideration in CTT analyses.

IRT Models and Outcomes
- Item difficulty: how difficult the item is. In social science studies this is sometimes called "item endorsability": some items are more readily endorsed than others.
- Item discrimination: how strongly the response on the item is related to the underlying latent trait, or how well the item discriminates among participants located at different points on the latent continuum.
- Pseudo-chance parameter: the probability of choosing a correct answer by chance.
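To make the three parameters concrete, here is a minimal illustrative Python sketch (not from the original slides; theta, a, b, and c stand for the latent trait, discrimination, difficulty, and pseudo-chance parameters defined above):

import math

def p_endorse_3pl(theta, a, b, c):
    # 3PL response function: pseudo-chance floor c plus the remaining
    # probability mass scaled by a logistic curve in a*(theta - b).
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# c = 0 gives the 2PL; c = 0 with a common a across items gives the 1PL.
print(p_endorse_3pl(theta=0.0, a=1.0, b=0.0, c=0.0))  # 0.5 when theta == b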

IRT Models and Outcomes
- 3 Parameter Logistic (3PL) model: contains all three parameters. Not usually used in social science scales.
- 2 Parameter Logistic (2PL) model: estimates item difficulty and item discrimination, with the pseudo-chance parameter constrained to 0.
- 1 Parameter Logistic (1PL) model: estimates only item difficulty, holding item discrimination constant across all items and constraining the pseudo-chance parameter to 0.
We compare model fit indices to decide which model is most appropriate.

Example for Deciding between 2PL and 1PL Models

Syntax for Mplus 1PL model:
Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 (1);
Avoidance@1;

Syntax for Mplus 2PL model:
Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
Avoidance@1;

Latent Trait   Model Type   Log Likelihood   -2 Log Likelihood   Parameters   Chi-Square   DF   Significance
Avoidance      1PL Model    -170020.989      340041.978          112          1325.554     7    < 0.001
Avoidance      2PL Model    -169358.212      338716.424          119
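The chi-square in the table is simply the difference in -2 log likelihoods, with degrees of freedom equal to the difference in parameter counts. A quick check in Python, using the values from the table above (assumes SciPy is installed for the p-value):

from scipy.stats import chi2

ll_1pl, k_1pl = -170020.989, 112   # 1PL: log likelihood, free parameters
ll_2pl, k_2pl = -169358.212, 119   # 2PL: log likelihood, free parameters

chi_square = -2 * (ll_1pl - ll_2pl)   # 1325.554
df = k_2pl - k_1pl                    # 7
p_value = chi2.sf(chi_square, df)     # < 0.001: prefer the 2PL model
print(chi_square, df, p_value)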

Software Packages - Mplus
Mplus can estimate basic IRT models; see the demonstrations later in this presentation. For more complex models, software designed specifically for IRT is required.

Software Packages - FlexMirt https://flexmirt.vpgcentral.com/

Software Packages – IRT Pro http://www.ssicentral.com/irt/

Demonstration: Graded Response Item Response Theory (IRT) Model for Avoidant Attachment Items

Sample and Measures
The Avoidant Attachment Scale in RELATE. The sample comprises 6089 individuals who took READY and answered the Avoidant Attachment Scale questions.

Eight Items Measuring Avoidant Attachment
755. I find it relatively easy to get close to others.
756. I'm not very comfortable having to depend on other people.
757. I'm comfortable having others depend on me.
758. I don't like people getting too close to me.
759. I'm somewhat uncomfortable being too close to others.
760. I find it difficult to trust others completely.
761. I'm nervous whenever anyone gets too close to me.
762. Others often want me to be more intimate than I feel comfortable being.
Reverse-coded items: 756, 758, 759, 760, 761, 762.
Original response categories: 1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Undecided; 5 = Somewhat Agree; 6 = Agree; 7 = Strongly Agree.
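Since six of the eight items are reverse coded, raw responses must be reoriented before scoring. A minimal illustrative Python sketch of the recoding on the 1-7 scale (the item numbers come from the list above; the helper names are ours):

REVERSED_ITEMS = {756, 758, 759, 760, 761, 762}

def reverse_code(response):
    # On a 1-7 scale, reversing maps 1 <-> 7, 2 <-> 6, 3 <-> 5, and 4 -> 4.
    return 8 - response

def oriented_response(item_number, response):
    # Flip only the reverse-coded items so all items point the same direction.
    if item_number in REVERSED_ITEMS:
        return reverse_code(response)
    return response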

Mplus Commands for Single-Factor EFA with Categorical Items

Title: Single-Factor Exploratory Factor Analysis (EFA)
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759
    T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769
    T1v770 T1v771;
  MISSING ARE ALL (-9999);
  CATEGORICAL ARE ALL;
  USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
  TYPE IS EFA 1 1;
PLOT: TYPE IS PLOT2;

Establishing Construct Unidimensionality: Scree Plot for 8 Avoidant Attachment Items
Categorical exploratory factor analysis eigenvalues: 4.085, .897, .796, .656, .544, .470, .306, .246.
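A quick, informal way to read these eigenvalues (an illustration we add here, not part of the original analysis) is to check how dominant the first factor is:

eigenvalues = [4.085, 0.897, 0.796, 0.656, 0.544, 0.470, 0.306, 0.246]

# The eight eigenvalues of a correlation matrix sum to 8, so the first
# factor carries about half of the total variance and is over four times
# the size of the second, consistent with a unidimensional construct.
first_share = eigenvalues[0] / sum(eigenvalues)    # about 0.51
first_to_second = eigenvalues[0] / eigenvalues[1]  # about 4.6
print(round(first_share, 2), round(first_to_second, 1))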

Mplus Commands for Single-Factor CFA with Categorical Items

Title: Single-Factor Categorical Confirmatory Factor Analysis (CFA) -
    Equivalent to a 2-Parameter Logistic (2PL) Graded Response Model
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759
    T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769
    T1v770 T1v771;
  MISSING ARE ALL (-9999);
  CATEGORICAL ARE ALL;
  USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
MODEL: avoid BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
  avoid@1;
PLOT: TYPE IS PLOT2;
OUTPUT: STDYX;

EFA and Standardized CFA Factor Loadings with Maximum Likelihood (Logistic) Estimation for Categorical Items

Item                                                                           EFA    CFA
755. I find it relatively easy to get close to others.                         .642   .646
756. I'm not very comfortable having to depend on other people.                .489   .495
757. I'm comfortable having others depend on me.                               .402   .405
758. I don't like people getting too close to me.                              .850   .856
759. I'm somewhat uncomfortable being too close to others.                     .834   .844
760. I find it difficult to trust others completely.                           .625   .630
761. I'm nervous whenever anyone gets too close to me.                         .837   .848
762. Others often want me to be more intimate than I feel comfortable being.   .576   .582

Item Characteristic Curves: Category Usage For Items 755 and 756 756. I’m not very comfortable having to depend on other people. (reversed) 755. I find it relatively easy to get close to others.

Item Characteristic Curves: Category Usage For Items 757 and 758 757. I’m comfortable having others depend on me. 758. I don’t like people getting too close to me. (reversed)

Item Characteristic Curves: Category Usage For Items 759 and 760 759. I’m somewhat uncomfortable being too close to others. (reversed) 760. I find it difficult to trust others completely. (reversed)

Item Characteristic Curves: Category Usage For Items 761 and 762 761. I'm nervous whenever anyone gets too close to me. (reversed) 762. Others often want me to be more intimate than I feel comfortable being. (reversed)

Item and Test Information Curves for the Avoidant Attachment Items (Items 755-762) Item Information Curves Test (Total) Information Curve

Partial Total Information Curves for Two Sets of Items
Set 1: items 755, 756, 757, 760, and 762. Set 2: items 758, 759, and 761.

Quick Comparison between CTT and IRT Models and Output

CTT Reliability Test (Cronbach’s Alpha=0.833)
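For comparison with the IRT output, Cronbach's alpha can be computed directly from an item-response matrix. A minimal sketch in Python/NumPy (a generic helper we add for illustration, not the original analysis code; assumes reverse-coded items have already been recoded):

import numpy as np

def cronbach_alpha(item_scores):
    # item_scores: (n_respondents, n_items) array of scored responses.
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)       # variance of the total scores
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)

# The slides report alpha = 0.833 for the eight avoidant attachment items.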

CTT Confirmatory Factor Analysis

MODEL FIT INFORMATION
Akaike (AIC): 163581.316
Bayesian (BIC): 163742.454
RMSEA (Root Mean Square Error of Approximation): 0.087; Probability RMSEA <= .05: 0.000
CFI: 0.944
TLI: 0.921

STDYX Standardization, ATTACHME BY (Estimate, S.E., Est./S.E., Two-Tailed P-Value):
T1V755   0.577   0.010    60.529   0.000
T1V756   0.473   0.011    43.298   0.000
T1V757   0.337   0.012    27.213   0.000
T1V758   0.808   0.006   139.126   0.000
T1V759   0.791   0.006   129.527   0.000
T1V760   0.598   0.009    64.402   0.000
T1V761   0.802   0.006   135.732   0.000
T1V762   0.538   0.010    53.443   0.000

Item text for the two sets:
Set 1:
755. I find it relatively easy to get close to others.
756. I'm not very comfortable having to depend on other people.
757. I'm comfortable having others depend on me.
760. I find it difficult to trust others completely.
762. Others often want me to be more intimate than I feel comfortable being.
Set 2:
758. I don't like people getting too close to me.
759. I'm somewhat uncomfortable being too close to others.
761. I'm nervous whenever anyone gets too close to me.

Additional Information: More Concept Introductions

Item Response Theory Models for Dichotomous and Polytomous Items: Introduction to the Graded Response Model

The Rasch Model

Threshold parameterization (IRTPRO, FlexMirt):
$\ln\frac{P(x=1)}{P(x=0)} = \ln\frac{P(x=1)}{1-P(x=1)} = \theta - \beta_j$
- $\theta$ is the latent trait
- $\beta_j$ is the estimated difficulty of item j
- $\sigma_\theta^2$ is the estimated variance of the latent trait
- The expected value (mean) of the latent trait is 0

Intercept parameterization (Mplus):
$\ln\frac{P(x=1)}{1-P(x=1)} = \theta + \beta_j^*$, where $\beta_j^* = -\beta_j$

Probability model:
$P(x=1 \mid \theta) = \frac{\exp(\theta - \beta_j)}{1 + \exp(\theta - \beta_j)}$
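As a quick worked example with hypothetical values: a person at $\theta = 1$ responding to an item with difficulty $\beta_j = 0$ has log-odds $\theta - \beta_j = 1$, so

$P(x=1 \mid \theta) = \frac{e^{1}}{1 + e^{1}} \approx .73$

That is, a person one unit above an item's difficulty endorses it about 73% of the time, while a person exactly at the item's difficulty ($\theta = \beta_j$) endorses it with probability .5.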

Logistic Item Characteristic Curves for Five Equally Discriminating Items Differing only in Difficulty

The One Parameter Logistic (1PL) Model

Threshold parameterization:
$\ln\frac{P(x=1)}{P(x=0)} = \ln\frac{P(x=1)}{1-P(x=1)} = \alpha(\theta - \beta_j)$
- $\theta$ is the latent trait
- $\beta_j$ is the difficulty of item j
- $\alpha$ is a common discrimination parameter for all items
- The estimated variance of the latent trait is fixed at 1
- The expected value of the latent trait is 0

Intercept parameterization:
$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha\theta + \beta_j^*$, where $\beta_j^* = -\alpha\beta_j$

The Two Parameter Logistic (2PL) Model

Threshold parameterization:
$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha_j(\theta - \beta_j)$
- $\alpha_j$ is an item-specific discrimination parameter
- The estimated variance of the latent trait is fixed at 1

Intercept parameterization:
$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha_j\theta + \beta_j^*$, where $\beta_j^* = -\alpha_j\beta_j$

Probability model:
$P(x=1 \mid \theta) = \frac{\exp[\alpha_j(\theta - \beta_j)]}{1 + \exp[\alpha_j(\theta - \beta_j)]} = \Psi[\alpha_j(\theta - \beta_j)]$
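The item information curves shown in the demonstration follow from this model: for the 2PL, a standard result is that item information at $\theta$ is $\alpha_j^2 P(\theta)(1 - P(\theta))$, so information peaks where $\theta = \beta_j$ and grows with the square of the discrimination. A small illustrative Python sketch (hypothetical values, not the slides' code):

import math

def p_2pl(theta, a, b):
    # 2PL probability of endorsing the item.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    # Fisher information for one 2PL item: a^2 * P * (1 - P).
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Information peaks at theta == b and scales with the square of a:
print(item_information_2pl(0.0, 2.0, 0.0))   # 1.0
print(item_information_2pl(0.0, 0.5, 0.0))   # 0.0625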

Item Characteristic Curves for Five Equally Difficult Items Differing only in their Discrimination Parameters

Parameter Constraints for Selected Dichotomous Item IRT Models (three-item example)
- Rasch: $\alpha_1 = \alpha_2 = \alpha_3 = 1$
- One Parameter Logistic (1PL): $\alpha_1 = \alpha_2 = \alpha_3$, with $\sigma_\theta^2 = 1$
- Two Parameter Logistic (2PL), Confirmatory Factor Analysis (CFA) parameterization: $\alpha_1 = 1$
- Effect-coded 2PL: $\alpha_1 + \alpha_2 + \alpha_3 = J = 3$ (the average slope is 1)
All other parameters are freely estimated.
[Path diagram: latent trait $\theta$ with variance $\sigma_\theta^2$ and loadings $\alpha_1, \alpha_2, \alpha_3$ on item1-item3, each item with intercept/threshold $\beta_1^*, \beta_2^*, \beta_3^*$.]

Cumulative Probability Logits for Dichotomous and Polytomous (Graded Response) Logistic IRT Models
- Dichotomous: $\ln\frac{P(x=1)}{P(x=0)}$, a single step between failure and success.
- Polytomous (graded response model, cumulative probability): $\ln\frac{P(x \ge k)}{P(x < k)}$, one step at each category boundary (e.g., three steps for a four-category item).

The Graded Response Model (GRM)

Threshold parameterization (IRTPRO, FlexMirt):
$\ln\frac{P(x \ge k)}{P(x < k)} = \alpha_j(\theta - \beta_{jk})$
- $\alpha_j$ is the estimated discrimination parameter for item j
- $\beta_{jk}$ is the estimated category boundary threshold between categories k and k-1 for item j
- The estimated variance of the latent trait ($\theta$) is fixed at 1

Intercept parameterization (Mplus):
$\ln\frac{P(x \ge k)}{P(x < k)} = \alpha_j\theta + \beta_{jk}^*$, where $\beta_{jk}^* = -\alpha_j\beta_{jk}$
- $\alpha_j$ is the estimated factor loading for item j
- $\beta_{jk}^*$ is the estimated (rescaled and sign-reversed) category boundary threshold between categories k and k-1 for item j
- The estimated variance of the latent variable ($\theta$) is fixed at 1
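The category usage curves shown in the demonstration come from differencing these cumulative probabilities: $P(x = k) = P(x \ge k) - P(x \ge k+1)$. A minimal Python sketch with hypothetical parameter values (our illustration, not the slides' code):

import math

def cumulative_probs(theta, a, betas):
    # P(x >= k) for k = 1..K-1 under the graded response model;
    # betas are the ordered category boundary thresholds for one item.
    return [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in betas]

def category_probs(theta, a, betas):
    # P(x = k) for k = 0..K-1, obtained by differencing the cumulative curve.
    cum = [1.0] + cumulative_probs(theta, a, betas) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 4-category item with thresholds -1, 0, 1 and discrimination 1.5:
print(category_probs(0.0, 1.5, [-1.0, 0.0, 1.0]))  # four probabilities, sum to 1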

The Graded Response Model with a Common Discrimination Parameter

Threshold parameterization:
$\ln\frac{P(x \ge k)}{P(x < k)} = \alpha(\theta - \beta_{jk})$
- $\alpha$ is a common discrimination parameter for all items
- The estimated variance of the latent trait is fixed at 1

Intercept parameterization:
$\ln\frac{P(x \ge k)}{P(x < k)} = \alpha\theta + \beta_{jk}^*$, where $\beta_{jk}^* = -\alpha\beta_{jk}$

The Graded Response "Rasch" Model

Graded response Rasch threshold parameterization:
$\ln\frac{P(x \ge k)}{P(x < k)} = \theta - \beta_{jk}$
- $\sigma_\theta^2$ is the estimated variance of the latent trait

Graded response Rasch intercept parameterization:
$\ln\frac{P(x \ge k)}{P(x < k)} = \theta + \beta_{jk}^*$, where $\beta_{jk}^* = -\beta_{jk}$

Model Constraints for the Graded Response Model (three-item example)
- Graded response Rasch model: $\alpha_1 = \alpha_2 = \alpha_3 = 1$
- Graded response model with a common discrimination parameter: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha$, with $\sigma_\theta^2 = 1$
- Traditional graded response model: all loadings freely estimated, with $\sigma_\theta^2 = 1$
- CFA graded response model: $\alpha_1 = 1$
- Effect-coded graded response model: $\alpha_1 + \alpha_2 + \alpha_3 = 3$
[Path diagram: latent trait $\theta$ with variance $\sigma_\theta^2$ and loadings $\alpha_1, \alpha_2, \alpha_3$ on item1-item3; each item has three category boundary thresholds $\beta_{j1}^*, \beta_{j2}^*, \beta_{j3}^*$.]

Estimating the Graded Response Model as Constrained Item Factor Analysis Models with Mplus

Graded response Rasch model:
MODEL: f by item1-item7@1; ! Fix all of the loadings at 1
  f*; ! Estimate the latent trait variance

Graded response model with a common discrimination parameter:
MODEL: f by item1-item7* (a); ! Estimate a common factor loading
  f@1; ! Fix the latent trait variance at 1

Traditional graded response model:
MODEL: f by item1-item7*; ! Freely estimate all of the loadings
  f@1; ! Fix the latent trait variance at 1

Graded response model with a reference indicator (Mplus default):
MODEL: f by item1-item7; ! Fix the loading for the first item at 1
  f*; ! Estimate the latent trait variance

Effect-coded graded response model:
MODEL: f by item1-item7* (a1-a7); ! Estimate and label the loadings
  f*; ! Estimate the latent trait variance
MODEL CONSTRAINT: ! Constrain the loadings to average 1
  a1 = 7 - a2 - a3 - a4 - a5 - a6 - a7;

Graded Response and Generalized Partial Credit Logistic IRT Models for Polytomous Data
- Graded response model (cumulative probability logits): $\ln\frac{P(x \ge k)}{P(x < k)}$
- Generalized partial credit model (adjacent category logits): $\ln\frac{P(x = k)}{P(x = k-1)}$
[Diagram: both models define three steps for a four-category item, but the graded response model compares each cumulative category boundary while the generalized partial credit model compares adjacent categories.]
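The two formulations model different odds, which the following illustrative Python snippet makes explicit (hypothetical category probabilities, not data from the slides):

import math

# Hypothetical category probabilities for a 4-category item at some theta:
probs = [0.10, 0.30, 0.40, 0.20]  # P(x = 0..3), sums to 1

# Cumulative logits (graded response model): ln P(x >= k) / P(x < k)
cumulative_logits = [
    math.log(sum(probs[k:]) / sum(probs[:k])) for k in range(1, len(probs))
]

# Adjacent-category logits (generalized partial credit model):
# ln P(x = k) / P(x = k - 1)
adjacent_logits = [
    math.log(probs[k] / probs[k - 1]) for k in range(1, len(probs))
]

print(cumulative_logits)
print(adjacent_logits)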

References and Resources
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

References and Resources
Joint Committee on Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. http://www.apa.org/science/programs/testing/standards.aspx
