A Different Way to Think About Measurement Development:

An Introduction to Item Response Theory (IRT)
Joseph Olsen, Dean Busby, & Lena Chiu
Jan 23, 2015

Content
Introduction
Item Response Models and Outcomes
Software Packages
Demonstration
Additional Concepts
References and Resources

Introduction
IRT surfaced in the 1970s (originally called “latent trait models”).
It became popular in the 1980s and was adopted in ability tests such as the SAT and GRE.
Social scientists started using IRT in the past decade.
How it works:

Classical Test Theory (CTT) versus IRT
Generally speaking, if you have continuous variables, you use CTT. When you have categorical (dichotomous/polytomous) variables, you use IRT.
In personality and attitude assessment we’re more likely to use CTT, but IRT provides advantages, including item characteristic curves and item information curves.
A well-built IRT model measures the latent trait more precisely and accurately than CTT. (But you can mess up IRT just like you can mess up CTT.)
IRT tries to achieve reliable measurement across the whole trait continuum, from the lowest to the highest level. That is usually not a consideration in CTT analyses.

IRT Models and Outcomes
Item Difficulty: How difficult the item is. In social science studies it is sometimes called “item endorsability.” (Some items are more readily endorsed than others. You’re more likely to say yes or no on these items.)
Item Discrimination: How strongly the response to the item is related to the underlying latent trait, or how well the item discriminates among participants located at different points on the latent continuum.
Pseudo-chance parameter: The probability of choosing a correct answer by chance.

IRT Models and Outcomes
3 Parameter Logistic (3PL) model: A model that contains all 3 parameters. Not usually used in social science scales.
2 Parameter Logistic (2PL) model: A model that estimates item difficulty and item discrimination (with the pseudo-chance parameter constrained to 0).
1 Parameter Logistic (1PL) model: A model that estimates only item difficulty (and holds item discrimination constant across all items, with the pseudo-chance parameter constrained to 0).
We compare the model fit indices to decide which model is the most appropriate to use.
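The way the three models nest inside one another can be sketched in a few lines of Python. This is an illustration only and not part of the original slides: the function name `irt_probability`, its argument names, and the item parameter values are assumptions, and the formula is the standard logistic form in which the pseudo-chance parameter sets the lower asymptote.

```python
import math

def irt_probability(theta, difficulty, discrimination=1.0, pseudo_chance=0.0):
    """Probability of endorsing (or correctly answering) a dichotomous item.

    3PL: difficulty, discrimination, and pseudo_chance are all supplied.
    2PL: pseudo_chance stays at 0.
    1PL: pseudo_chance stays at 0 and all items share one discrimination value.
    """
    logit = discrimination * (theta - difficulty)
    logistic = 1.0 / (1.0 + math.exp(-logit))
    # The pseudo-chance parameter raises the lower asymptote of the curve.
    return pseudo_chance + (1.0 - pseudo_chance) * logistic

# Hypothetical item parameters, chosen only for illustration.
p_3pl = irt_probability(0.0, difficulty=0.0, discrimination=1.5, pseudo_chance=0.2)
p_2pl = irt_probability(0.0, difficulty=0.0, discrimination=1.5)
p_1pl = irt_probability(1.0, difficulty=0.0)
```

When the latent trait equals the item difficulty, the 2PL probability is 0.5, while the 3PL probability sits halfway between the pseudo-chance value and 1.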

Example for Deciding between 2PL and 1PL Models
Syntax for Mplus 1PL model:
    Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 (1);
    Avoidance@1;
Syntax for Mplus 2PL model:
    Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
    Avoidance@1;

Latent Trait | Model Type | Log Likelihood | -2 Log Likelihood | Parameters | Chi-Square | DF    | Significance
Avoidance    | 1PL Model  |                |                   | 112        |            | 7.000 | < 0.001
             | 2PL Model  |                |                   | 119        |            |       |
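The comparison behind this table is a likelihood ratio (chi-square difference) test: the chi-square is the difference in -2 log likelihood between the constrained 1PL model and the 2PL model, and the degrees of freedom are the difference in the number of parameters. A minimal Python sketch follows. The log likelihood values are hypothetical because they did not survive on the slide, the parameter counts are taken to be the 112 and 119 shown above, and the helper names `chi2_sf` and `likelihood_ratio_test` are assumptions. This simple difference applies to the ML estimator used here; other estimators may need a scaling correction.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from a closed form (df = 2 or df = 1) and step up by recurrence.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(loglik_restricted, loglik_general,
                          n_par_restricted, n_par_general):
    """Return (chi-square, df, p-value) for two nested models."""
    chi_square = -2.0 * (loglik_restricted - loglik_general)
    df = n_par_general - n_par_restricted
    return chi_square, df, chi2_sf(chi_square, df)

# Hypothetical log likelihoods; parameter counts as shown in the table.
chi_square, df, p_value = likelihood_ratio_test(-50150.0, -50100.0, 112, 119)
```

A significant result favors the less constrained 2PL model, meaning the items differ in discrimination.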

Software Packages - MPLUS
Mplus is capable of estimating basic IRT models. See the demonstrations later in this presentation. For more complex models, software designed specifically for IRT is required.

Software Packages - FlexMirt

Software Packages – IRT Pro

Demonstration: Graded Response Item Response Theory (IRT) Model for Avoidant Attachment Items

Sample and Measures
Measure: The Avoidant Attachment Scale in RELATE.
Sample: 6,089 individuals who took READY and answered the Avoidant Attachment Scale questions.

Eight Items Measuring Avoidant Attachment
755. I find it relatively easy to get close to others.
756. I’m not very comfortable having to depend on other people.
757. I’m comfortable having others depend on me.
758. I don’t like people getting too close to me.
759. I’m somewhat uncomfortable being too close to others.
760. I find it difficult to trust others completely.
761. I’m nervous whenever anyone gets too close to me.
762. Others often want me to be more intimate than I feel comfortable being.
Reverse-coded items: 756, 758, 759, 760, 761, 762.
Original response categories: 1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Undecided; 5 = Somewhat Agree; 6 = Agree; 7 = Strongly Agree
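Reverse coding on a 1 to 7 scale maps each response x to 8 - x, so that 1 becomes 7 and 4 stays 4. A small Python sketch, assuming responses are stored as integers keyed by item number; the names `REVERSED_ITEMS` and `reverse_code` and the example respondent are hypothetical, and missing-value codes would need to be handled before this step.

```python
# Items listed as reverse coded on the slide.
REVERSED_ITEMS = {756, 758, 759, 760, 761, 762}
SCALE_MIN, SCALE_MAX = 1, 7

def reverse_code(item_number, response):
    """Flip the response for reverse-coded items; leave other items unchanged."""
    if item_number in REVERSED_ITEMS:
        return SCALE_MIN + SCALE_MAX - response
    return response

# Hypothetical respondent, for illustration only.
raw = {755: 2, 756: 6, 757: 3, 758: 7, 759: 5, 760: 4, 761: 1, 762: 6}
scored = {item: reverse_code(item, value) for item, value in raw.items()}
```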

Mplus Commands for Single-Factor EFA with Categorical Items
Title: Single-Factor Exploratory Factor Analysis (EFA)
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge
        T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762
        T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
    MISSING ARE ALL (-9999);
    CATEGORICAL ARE ALL;
    USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
    TYPE IS EFA 1 1;
PLOT: TYPE IS PLOT2;

Establishing Construct Unidimensionality: Scree Plot for 8 Avoidant Attachment Items
Categorical Exploratory Factor Analysis Eigenvalues: 4.085, .897, .796, .656, .544, .470, .306, .246
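Two quick summaries of these eigenvalues are often used alongside the scree plot when judging unidimensionality: the ratio of the first to the second eigenvalue and the share of the total accounted for by the first factor. The sketch below computes both in Python. The choice of these heuristics, and the eigenvalue-greater-than-one count, are additions for illustration and are not stated on the slide.

```python
# Eigenvalues reported on the slide for the eight avoidant attachment items.
eigenvalues = [4.085, 0.897, 0.796, 0.656, 0.544, 0.470, 0.306, 0.246]

# Ratio of the first to the second eigenvalue (larger suggests one dominant factor).
first_to_second_ratio = eigenvalues[0] / eigenvalues[1]

# Share of the eigenvalue total carried by the first factor.
proportion_first = eigenvalues[0] / sum(eigenvalues)

# Number of eigenvalues above 1 (a common rule of thumb).
n_above_one = sum(1 for value in eigenvalues if value > 1.0)
```

Here only one eigenvalue exceeds 1, the first is about 4.6 times the second, and the first factor carries about 51% of the total, which is consistent with treating the scale as unidimensional.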

Mplus Commands for Single-Factor CFA with Categorical Items
Title: Single-Factor Categorical Confirmatory Factor Analysis (CFA) -
    Equivalent to a 2-Parameter Logistic (2PL) Graded Response Model
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge
        T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762
        T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
    MISSING ARE ALL (-9999);
    CATEGORICAL ARE ALL;
    USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
MODEL: avoid BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
    avoid@1;
PLOT: TYPE IS PLOT2;
OUTPUT: STDYX;

EFA and Standardized CFA Factor Loadings with Maximum Likelihood (Logistic) Estimation for Categorical Items
Columns: Item; EFA factor loading; CFA factor loading
Item 755. I find it relatively easy to get close to others.
Item 756. I’m not very comfortable having to depend on other people.
Item 757. I’m comfortable having others depend on me.
Item 758. I don’t like people getting too close to me.
Item 759. I’m somewhat uncomfortable being too close to others.
Item 760. I find it difficult to trust others completely.
Item 761. I’m nervous whenever anyone gets too close to me.
Item 762. Others often want me to be more intimate than I feel comfortable being.

Item Characteristic Curves: Category Usage For Items 755 and 756
755. I find it relatively easy to get close to others.
756. I’m not very comfortable having to depend on other people. (reversed)

757. I’m comfortable having others depend on me.
758. I don’t like people getting too close to me. (reversed)

759. I’m somewhat uncomfortable being too close to others. (reversed)
760. I find it difficult to trust others completely. (reversed)

761. I’m nervous whenever anyone gets too close to me. (reversed)
762. Others often want me to be more intimate than I feel comfortable being. (reversed)

Item and Test Information Curves for the Avoidant Attachment Items (Items 755-762)
Item Information Curves
Test (Total) Information Curve

Partial Total Information Curves for Two Sets of Items
Items 755, 756, 757, 760, and 762
Items 758, 759, 761

Quick Comparison between CTT and IRT Models and Output

CTT Reliability Test (Cronbach’s Alpha=0.833)
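Cronbach's alpha is computed from the item variances and the variance of the total score: alpha = k/(k-1) × (1 − sum of item variances / variance of total score), where k is the number of items. A minimal Python sketch follows. The function name `cronbach_alpha` and the toy data are assumptions for illustration; the 0.833 reported above comes from the actual READY data, which are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)

# Toy data: three respondents, two items (hypothetical values).
toy_data = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_data)
```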

CTT Confirmatory Factor Analysis
MODEL FIT INFORMATION
Akaike (AIC)
Bayesian (BIC)
RMSEA (Root Mean Square Error Of Approximation): Estimate; Probability RMSEA <=
CFI
TLI
STDYX Standardization
Columns: Estimate; S.E.; Est./S.E.; Two-Tailed P-Value
ATTACHME BY T1V (eight indicator rows)

755. I find it relatively easy to get close to others.
756. I’m not very comfortable having to depend on other people.
757. I’m comfortable having others depend on me.
760. I find it difficult to trust others completely.
762. Others often want me to be more intimate than I feel comfortable being.
758. I don’t like people getting too close to me.
759. I’m somewhat uncomfortable being too close to others.
761. I’m nervous whenever anyone gets too close to me.

Additional Information: More Concept Introductions

Item Response Theory Models for Dichotomous and Polytomous items
Introduction to the Graded Response Model

The Rasch Model
Threshold parameterization (IRTPRO, FlexMirt):
$$\ln\frac{P(x=1)}{P(x=0)} = \ln\frac{P(x=1)}{1-P(x=1)} = \theta - \beta_j$$
$\theta$ is the latent trait
$\beta_j$ is the estimated difficulty of item j
$\sigma_\theta^2$ is the estimated variance of the latent trait
The expected value (mean) of the latent trait is 0
Intercept parameterization (Mplus):
$$\ln\frac{P(x=1)}{1-P(x=1)} = \theta + \beta_j^*, \qquad \beta_j^* = -\beta_j$$
Probability model:
$$P(x=1 \mid \theta) = \frac{\exp(\theta-\beta_j)}{1+\exp(\theta-\beta_j)}$$

Logistic Item Characteristic Curves for Five Equally Discriminating Items Differing only in Difficulty

The One Parameter Logistic (1PL) Model
Threshold parameterization:
$$\ln\frac{P(x=1)}{P(x=0)} = \ln\frac{P(x=1)}{1-P(x=1)} = \alpha(\theta - \beta_j)$$
$\theta$ is the latent trait
$\beta_j$ is the difficulty of item j
$\alpha$ is a common discrimination parameter for all items
The estimated variance of the latent trait is fixed at 1
The expected value of the latent trait is 0
Intercept parameterization:
$$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha\theta + \beta_j^*, \qquad \beta_j^* = -\alpha\beta_j \quad (\text{equivalently } \beta_j = -\beta_j^*/\alpha)$$

The Two Parameter Logistic (2PL) Model
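Threshold parameterization:
$$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha_j(\theta - \beta_j)$$
$\alpha_j$ is an item-specific discrimination parameter
The estimated variance of the latent trait is fixed at 1
Intercept parameterization:
$$\ln\frac{P(x=1)}{1-P(x=1)} = \alpha_j\theta + \beta_j^*, \qquad \beta_j^* = -\alpha_j\beta_j \quad (\text{equivalently } \beta_j = -\beta_j^*/\alpha_j)$$
Probability model:
$$P(x=1 \mid \theta) = \frac{\exp[\alpha_j(\theta-\beta_j)]}{1+\exp[\alpha_j(\theta-\beta_j)]} = \Psi[\alpha_j(\theta-\beta_j)]$$
where $\Psi$ is the logistic cumulative distribution function.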
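Because the threshold form expands as α_j(θ − β_j) = α_jθ − α_jβ_j, the intercept is the negative product of discrimination and difficulty, and the difficulty is recovered by dividing the negated intercept by the discrimination. The Python sketch below converts between the two forms and checks that they give the same logit. The function names and parameter values are hypothetical.

```python
def difficulty_to_intercept(difficulty, discrimination):
    """Threshold form -> intercept form: intercept = -discrimination * difficulty."""
    return -discrimination * difficulty

def intercept_to_difficulty(intercept, discrimination):
    """Intercept form -> threshold form: difficulty = -intercept / discrimination."""
    return -intercept / discrimination

# Hypothetical item: discrimination 1.6, difficulty 0.5, evaluated at theta = 1.2.
a, b, theta = 1.6, 0.5, 1.2
intercept = difficulty_to_intercept(b, a)

logit_threshold_form = a * (theta - b)
logit_intercept_form = a * theta + intercept
recovered_difficulty = intercept_to_difficulty(intercept, a)
```

This is the conversion needed when comparing Mplus output (intercept form) with IRTPRO or flexMIRT output (threshold form).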

Item Characteristic Curves for Five Equally Difficult Items Differing only in their Discrimination Parameters

Parameter Constraints for Selected Dichotomous Item IRT Models
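Rasch: $\alpha_1 = \alpha_2 = \alpha_3 = 1$
One Parameter Logistic (1PL): $\alpha_1 = \alpha_2 = \alpha_3$, $\sigma_\theta^2 = 1$
Two Parameter Logistic (2PL): $\sigma_\theta^2 = 1$
Confirmatory Factor Analysis (CFA): $\alpha_1 = 1$
Effect-coded 2PL: $\alpha_1 + \alpha_2 + \alpha_3 = J = 3$ (average slope is 1)
All other parameters are freely estimated
Path diagram: latent trait $\theta$ (variance $\sigma_\theta^2$) with loadings $\alpha_1, \alpha_2, \alpha_3$ on item1, item2, item3, which have intercepts/thresholds $\beta_1^*, \beta_2^*, \beta_3^*$.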

Logits for Dichotomous and Polytomous (Graded Response) Logistic IRT Models
Dichotomous: $\ln\frac{P(x=1)}{P(x=0)}$
Polytomous (Graded Response Model), cumulative probability: $\ln\frac{P(x\ge k)}{P(x<k)}$
Diagram: a single step (Step1, marked F and S) for the dichotomous case; Step1, Step2, and Step3 across categories 1, 2, 3 for the polytomous case.

The Graded Response Model (GRM)
Threshold parameterization (IRTPRO, FlexMirt):
$$\ln\frac{P(x\ge k)}{P(x<k)} = \alpha_j(\theta - \beta_{jk})$$
$\alpha_j$ is the estimated discrimination parameter for item j
$\beta_{jk}$ is the estimated category boundary threshold between categories k and k-1 for item j
The estimated variance of the latent trait ($\theta$) is fixed at 1
Intercept parameterization (Mplus):
$$\ln\frac{P(x\ge k)}{P(x<k)} = \alpha_j\theta + \beta_{jk}^*, \qquad \beta_{jk}^* = -\alpha_j\beta_{jk} \quad (\text{equivalently } \beta_{jk} = -\beta_{jk}^*/\alpha_j)$$
$\alpha_j$ is the estimated factor loading for item j
$\beta_{jk}^*$ is the estimated (rescaled and sign-reversed) category boundary threshold between categories k and k-1 for item j
The estimated variance of the latent variable ($\theta$) is fixed at 1
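In the graded response model each threshold gives a cumulative probability P(x ≥ k), and the probability of responding in a single category is the difference between adjacent cumulative probabilities. The Python sketch below shows this for one 7-category item. The function names and parameter values are hypothetical, and the thresholds are assumed to be in increasing order.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def grm_category_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be increasing; K - 1 thresholds define K categories.
    """
    # Cumulative probabilities P(x >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [logistic(discrimination * (theta - b)) for b in thresholds]
    cumulative += [0.0]
    # Each category probability is the drop between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 7-category item: six thresholds, discrimination 1.0, theta 0.
probs = grm_category_probabilities(0.0, 1.0, [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
```

Plotting these probabilities against the latent trait gives the category curves shown in the item characteristic curve slides.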

The Graded Response Model with a Common Discrimination Parameter
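Threshold parameterization:
$$\ln\frac{P(x\ge k)}{P(x<k)} = \alpha(\theta - \beta_{jk})$$
$\alpha$ is a common discrimination parameter for all items
The estimated variance of the latent trait is fixed at 1
Intercept parameterization:
$$\ln\frac{P(x\ge k)}{P(x<k)} = \alpha\theta + \beta_{jk}^*, \qquad \beta_{jk}^* = -\alpha\beta_{jk} \quad (\text{equivalently } \beta_{jk} = -\beta_{jk}^*/\alpha)$$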

The Graded Response “Rasch” Model
Graded Response Rasch threshold parameterization:
$$\ln\frac{P(x\ge k)}{P(x<k)} = \theta - \beta_{jk}$$
$\sigma_\theta^2$ is the estimated variance of the latent trait
Graded Response Rasch intercept parameterization:
$$\ln\frac{P(x\ge k)}{P(x<k)} = \theta + \beta_{jk}^*, \qquad \beta_{jk}^* = -\beta_{jk}$$

Model Constraints for the Graded Response Model
Graded Response Rasch Model: $\alpha_1 = \alpha_2 = \alpha_3 = 1$
Graded Response Model with a Common Discrimination Parameter: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha$, $\sigma_\theta^2 = 1$
Traditional Graded Response Model: $\sigma_\theta^2 = 1$
CFA Graded Response Model: $\alpha_1 = 1$
Effect-coded Graded Response Model: $\alpha_1 + \alpha_2 + \alpha_3 = 3$
Path diagram: latent trait $\theta$ (variance $\sigma_\theta^2$) with loadings $\alpha_1, \alpha_2, \alpha_3$ on item1, item2, item3; each item j has thresholds $\beta_{j1}^*, \beta_{j2}^*, \beta_{j3}^*$.

Estimating the Graded Response Model as Constrained Item Factor Analysis Models with Mplus
Graded Response Rasch model:
    MODEL: f by item1-item7@1;  ! Fix all of the loadings at 1
           f*;                  ! Estimate the latent trait variance
Graded Response Model with a Common Discrimination Parameter:
    MODEL: f by item1-item7* (a);  ! Estimate a common factor loading
           f@1;                    ! Fix the latent trait variance at 1
Traditional Graded Response Model:
    MODEL: f by item1-item7*;  ! Freely estimate all of the loadings
           f@1;                ! Fix the latent trait variance at 1
Graded Response model with a reference indicator (Mplus default):
    MODEL: f by item1-item7;  ! Fix the loading for the first item at 1
           f*;                ! Estimate the latent trait variance
Effect coded Graded Response model:
    MODEL: f by item1-item7* (a1-a7);  ! Estimate and label the loadings
           f*;                         ! Estimate the latent trait variance
    MODEL CONSTRAINT:
           ! Constrain the loadings to average 1
           a1 = 7 - a2 - a3 - a4 - a5 - a6 - a7;

Graded Response and Generalized Partial Credit Logistic IRT Models for Polytomous Data
Graded Response Model, cumulative probability logits: $\ln\frac{P(x\ge k)}{P(x<k)}$
Generalized Partial Credit Model, adjacent category logits: $\ln\frac{P(x=k)}{P(x=k-1)}$
Diagram (both panels): Step1, Step2, and Step3 across categories 1, 2, 3, each step marked F and S.
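The difference between the two kinds of logit is easiest to see by computing both from the same set of category probabilities. In the Python sketch below, the cumulative logit compares everything at or above category k with everything below it, while the adjacent category logit compares category k only with category k-1. The probabilities and function names are hypothetical.

```python
import math

def cumulative_logits(probs):
    """ln[P(x >= k) / P(x < k)] for each boundary k (graded response model)."""
    return [math.log(sum(probs[k:]) / sum(probs[:k])) for k in range(1, len(probs))]

def adjacent_category_logits(probs):
    """ln[P(x = k) / P(x = k - 1)] for each step k (generalized partial credit model)."""
    return [math.log(probs[k] / probs[k - 1]) for k in range(1, len(probs))]

# Hypothetical probabilities for a 4-category item.
category_probs = [0.1, 0.2, 0.3, 0.4]
cumulative = cumulative_logits(category_probs)
adjacent = adjacent_category_logits(category_probs)
```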

References and Resources
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.
Joint Committee on Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
