A Different Way to Think About Measurement Development: An Introduction to Item Response Theory (IRT) Joseph Olsen, Dean Busby, & Lena Chiu Jan 23, 2015.



Content
Introduction
Item Response Models and Outcomes
Software Packages
Demonstration
Additional Concepts
References and Resources

Introduction
IRT surfaced in the 1970s (originally called "latent trait models"). It became popular in the 1980s and was adopted in ability tests like the SAT and GRE. Social scientists started using IRT in the past decade.

Classical Test Theory (CTT) versus IRT
Generally speaking, with continuous variables you use CTT; with categorical (dichotomous/polytomous) variables you use IRT.
In personality and attitude assessment we are more likely to use CTT, but IRT provides advantages, including item characteristic curves and item information curves.
IRT provides more precise and accurate measures of the latent trait. A well-built IRT model is more precise than CTT. (But you can mess up IRT just like you can mess up CTT.)
IRT tries to achieve reliable measurement across the whole trait continuum, from the lowest to the highest levels. That is usually not a consideration in CTT analyses.

IRT Models and Outcomes
Item Difficulty: How difficult the item is. In social science studies this is sometimes called "item endorsability": some items are more readily endorsed than others.
Item Discrimination: How strongly the response on the item is related to the underlying latent trait, or how well the item discriminates among participants located at different points on the latent continuum.
Pseudo-chance parameter: The probability of choosing a correct answer by chance.
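A minimal sketch of how these three parameters enter the standard 3PL item response function; the function name and parameter values below are illustrative, not taken from the presentation:

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL item response function: probability of a correct (or endorsed)
    response for a person at latent trait level theta.
    a = discrimination, b = difficulty, c = pseudo-chance."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Setting c = 0 gives the 2PL; additionally using one common a for
# all items gives the 1PL. With c = 0, a person located exactly at
# the item's difficulty (theta == b) has a 50% chance of endorsement:
print(p_correct(0.0, a=1.5, b=0.0))  # 0.5
```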

IRT Models and Outcomes
3-Parameter Logistic (3PL) model: A model that contains all three parameters. Not usually used for social science scales.
2-Parameter Logistic (2PL) model: A model that estimates item difficulty and item discrimination (with pseudo-chance constrained to 0).
1-Parameter Logistic (1PL) model: A model that estimates only item difficulty (holding item discrimination constant across all items, with pseudo-chance constrained to 0).
We compare model fit indices to decide which model is most appropriate.

Example for Deciding between 2PL and 1PL Models
Syntax for Mplus 1PL model:
Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 (1);
Avoidance@1;
Syntax for Mplus 2PL model:
Avoidance by T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
Avoidance@1;
Model comparison table (Latent Trait, Model Type, Log Likelihood, -2 Log Likelihood, Parameters, Chi-Square, DF, Significance) for the Avoidance 1PL versus 2PL models.
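Because the 1PL is nested within the 2PL, the comparison in the table is a likelihood ratio (chi-square difference) test on the -2 log likelihoods. A minimal sketch, with hypothetical log-likelihood values (the function name and numbers are mine, not from the slides):

```python
# Standard chi-square critical values at alpha = .05 by degrees of freedom
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488,
                5: 11.070, 6: 12.592, 7: 14.067, 8: 15.507}

def lrt(ll_restricted, ll_full, df):
    """Likelihood ratio test for nested IRT models.
    chi2 = (-2 * ll_restricted) - (-2 * ll_full)."""
    chi2 = 2.0 * (ll_full - ll_restricted)
    return chi2, chi2 > CHI2_CRIT_05[df]

# Hypothetical values: with 8 items, the 2PL frees 7 discrimination
# parameters relative to the 1PL's single common one, so df = 7.
chi2, significant = lrt(-10250.0, -10200.0, df=7)
print(chi2, significant)  # 100.0 True -> prefer the 2PL
```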

Software Packages - MPLUS
Mplus is capable of estimating basic IRT models. See the demonstrations later in this presentation. For more complex models, software designed specifically for IRT is required.

Software Packages - FlexMirt

Software Packages – IRT Pro

Demonstration: Graded Response Item Response Theory (IRT) Model for Avoidant Attachment Items

Sample and Measures
The Avoidant Attachment Scale in RELATE; individuals who took READY and answered the Avoidant Attachment Scale questions.

Eight Items Measuring Avoidant Attachment
755. I find it relatively easy to get close to others.
756. I'm not very comfortable having to depend on other people.
757. I'm comfortable having others depend on me.
758. I don't like people getting too close to me.
759. I'm somewhat uncomfortable being too close to others.
760. I find it difficult to trust others completely.
761. I'm nervous whenever anyone gets too close to me.
762. Others often want me to be more intimate than I feel comfortable being.
Reverse Coded Items: 756, 758, 759, 760, 761, 762.
Original Response Categories: 1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Undecided; 5 = Somewhat Agree; 6 = Agree; 7 = Strongly Agree
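Before analysis, the reverse-coded items must be recoded so that higher scores consistently indicate more avoidance. A minimal sketch for the 7-point scale above (the function name is assumed, not from the slides):

```python
# Reverse-coded item numbers, as listed on the slide
REVERSE_CODED = {756, 758, 759, 760, 761, 762}

def recode(item_number, response, max_category=7):
    """Reverse 7-point responses (1..7 -> 7..1) for reverse-coded items;
    leave positively worded items (755, 757) unchanged."""
    if item_number in REVERSE_CODED:
        return (max_category + 1) - response
    return response

print(recode(756, 7))  # 1 (Strongly Agree on a reversed item)
print(recode(755, 7))  # 7 (item 755 is not reversed)
```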

Mplus Commands for Single-Factor EFA with Categorical Items
Title: Single-Factor Exploratory Factor Analysis (EFA)
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
MISSING ARE ALL (-9999);
CATEGORICAL ARE ALL;
USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
TYPE IS EFA 1 1;
PLOT: TYPE IS PLOT2;

Establishing Construct Unidimensionality: Scree Plot for 8 Avoidant Attachment Items
Categorical Exploratory Factor Analysis Eigenvalues: 4.085, .897, .796, .656, .544, .470, .306, .246

Mplus Commands for Single-Factor CFA with Categorical Items
Title: Single-Factor Categorical Confirmatory Factor Analysis (CFA), Equivalent to a 2-Parameter Logistic (2PL) Graded Response Model
Data: File is READY_attachment scale.dat;
Variable: Names are Gender Culture T1vAge T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762 T1v763 T1v764 T1v765 T1v766 T1v767 T1v768 T1v769 T1v770 T1v771;
MISSING ARE ALL (-9999);
CATEGORICAL ARE ALL;
USEVARIABLES ARE T1v755 T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
ANALYSIS: ESTIMATOR IS ML;
MODEL: avoid BY T1v755* T1v756 T1v757 T1v758 T1v759 T1v760 T1v761 T1v762;
PLOT: TYPE IS PLOT2;
OUTPUT: STDYX;

EFA and Standardized CFA Factor Loadings with Maximum Likelihood (Logistic) Estimation for Categorical Items
Factor Loadings (EFA, CFA):
Item 755. I find it relatively easy to get close to others.
Item 756. I'm not very comfortable having to depend on other people.
Item 757. I'm comfortable having others depend on me.
Item 758. I don't like people getting too close to me.
Item 759. I'm somewhat uncomfortable being too close to others.
Item 760. I find it difficult to trust others completely.
Item 761. I'm nervous whenever anyone gets too close to me.
Item 762. Others often want me to be more intimate than I feel comfortable being.

Item Characteristic Curves: Category Usage for Items 755 and 756
755. I find it relatively easy to get close to others.
756. I'm not very comfortable having to depend on other people. (reversed)

Item Characteristic Curves: Category Usage for Items 757 and 758
757. I'm comfortable having others depend on me.
758. I don't like people getting too close to me. (reversed)

Item Characteristic Curves: Category Usage for Items 759 and 760
759. I'm somewhat uncomfortable being too close to others. (reversed)
760. I find it difficult to trust others completely. (reversed)

Item Characteristic Curves: Category Usage for Items 761 and 762
761. I'm nervous whenever anyone gets too close to me. (reversed)
762. Others often want me to be more intimate than I feel comfortable being. (reversed)

Item and Test Information Curves for the Avoidant Attachment Items (Items 755-762)
Item Information Curves
Test (Total) Information Curve

Partial Total Information Curves for Two Sets of Items
Items 755, 756, 757, 760, and 762
Items 758, 759, and 761

Quick Comparison between CTT and IRT Models and Output

CTT Reliability Test (Cronbach’s Alpha=0.833)
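The CTT reliability reported here is Cronbach's alpha, which can be computed directly from the item-by-respondent data. A minimal sketch with a tiny hypothetical data set (not the RELATE/READY data):

```python
def cronbach_alpha(items):
    """items: list of per-item response lists, respondents in the same order.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Tiny hypothetical data set: 3 items, 4 respondents
data = [[1, 2, 3, 4], [1, 2, 3, 3], [2, 2, 3, 4]]
print(round(cronbach_alpha(data), 3))  # 0.957
```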

CTT Confirmatory Factor Analysis
MODEL FIT INFORMATION
Akaike (AIC)
Bayesian (BIC)
RMSEA (Root Mean Square Error Of Approximation): Estimate; Probability RMSEA <= .05
CFI
TLI
STDYX Standardization: Estimate, S.E., Est./S.E., Two-Tailed P-Value
ATTACHME BY T1V755 T1V756 T1V757 T1V758 T1V759 T1V760 T1V761 T1V762

755. I find it relatively easy to get close to others.
756. I'm not very comfortable having to depend on other people.
757. I'm comfortable having others depend on me.
760. I find it difficult to trust others completely.
762. Others often want me to be more intimate than I feel comfortable being.
758. I don't like people getting too close to me.
759. I'm somewhat uncomfortable being too close to others.
761. I'm nervous whenever anyone gets too close to me.

Additional Information: More Concept Introductions

Item Response Theory Models for Dichotomous and Polytomous Items
Introduction to the Graded Response Model

The Rasch Model
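In its standard form, the Rasch model gives the probability that person j endorses (or correctly answers) item i as a function of the person's trait level and the item's difficulty:

```latex
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```

where \(\theta_j\) is person j's latent trait level and \(b_i\) is item i's difficulty. All items share the same unit discrimination, so their item characteristic curves are parallel and differ only in location.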

Logistic Item Characteristic Curves for Five Equally Discriminating Items Differing only in Difficulty

The One Parameter Logistic (1PL) Model

The Two Parameter Logistic (2PL) Model
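In standard notation, the 2PL adds an item discrimination parameter \(a_i\) to the 1PL form:

```latex
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\{a_i(\theta_j - b_i)\}}{1 + \exp\{a_i(\theta_j - b_i)\}}
```

Constraining all \(a_i\) to be equal recovers the 1PL; adding a lower asymptote \(c_i\) gives the 3PL.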

Item Characteristic Curves for Five Equally Difficult Items Differing only in their Discrimination Parameters

Parameter Constraints for Selected Dichotomous Item IRT Models
(Path diagram: latent trait with its variance, loadings to item 1, item 2, and item 3, and the items' intercepts/thresholds.)

Logits for Dichotomous and Polytomous (Graded Response) Logistic IRT Models

Dichotomous:
         0  1
Step 1   F  S

Polytomous (Cumulative Probability):
         0  1  2  3
Step 1   F  S  S  S
Step 2   F  F  S  S
Step 3   F  F  F  S

The Graded Response Model (GRM)
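A minimal sketch of how GRM category probabilities are formed as differences of adjacent cumulative probabilities (the function name and parameter values are hypothetical):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded Response Model: P(X = k) as differences of cumulative
    probabilities P(X >= k) = logistic(a * (theta - b_k)).
    thresholds must be ordered b_1 < b_2 < ... < b_{K-1} so that
    every category probability is positive."""
    def cum(b):
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    # P(X >= 0) = 1 by definition; P(X >= K) = 0
    cums = [1.0] + [cum(b) for b in thresholds] + [0.0]
    return [cums[k] - cums[k + 1] for k in range(len(thresholds) + 1)]

# Four ordered categories need three thresholds (hypothetical values)
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print(probs, sum(probs))  # category probabilities sum to 1
```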

The Graded Response Model with a Common Discrimination Parameter

The Graded Response “Rasch” Model

Model Constraints for the Graded Response Model
(Path diagram: loadings for item1, item2, and item3 each fixed at 1; thresholds and loadings shown.)

Estimating the Graded Response Model as Constrained Item Factor Analysis Models with Mplus

Graded Response Rasch model:
MODEL: f by item1-item7@1; ! Fix all of the loadings at 1
f*; ! Estimate the latent trait variance

Graded Response Model with a Common Discrimination Parameter:
MODEL: f by item1-item7* (a); ! Estimate a common factor loading
f@1; ! Fix the latent trait variance at 1

Traditional Graded Response Model:
MODEL: f by item1-item7*; ! Freely estimate all of the loadings
f@1; ! Fix the latent trait variance at 1

Graded Response model with a reference indicator (Mplus default):
MODEL: f by item1-item7; ! Fix the loading for the first item at 1
f*; ! Estimate the latent trait variance

Effect coded Graded Response model:
MODEL: f by item1-item7* (a1-a7); ! Estimate and label the loadings
f*; ! Estimate the latent trait variance
MODEL CONSTRAINT: ! Constrain the loadings to average 1
a1=7-a2-a3-a4-a5-a6-a7;

Graded Response and Generalized Partial Credit Logistic IRT Models for Polytomous Data

Graded Response Model (Cumulative Probability Logits):
         0  1  2  3
Step 1   F  S  S  S
Step 2   F  F  S  S
Step 3   F  F  F  S

Generalized Partial Credit Model (Adjacent Category Logits):
         0  1  2  3
Step 1   F  S
Step 2      F  S
Step 3         F  S

References and Resources
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford Press.

References and Resources:
Joint Committee on Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
