1 IRT basics: Theory and parameter estimation Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko.

Similar presentations
DIF Analysis. Galina Larina, March 2012, University of Ostrava.

Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
LOGO One of the easiest to use Software: Winsteps
Fit of Ideal-point and Dominance IRT Models to Simulated Data Chenwei Liao and Alan D Mead Illinois Institute of Technology.
Logistic Regression Psy 524 Ainsworth.
Test Equating Zhang Zhonghua Chinese University of Hong Kong.
Item Response Theory in Health Measurement
Introduction to Item Response Theory
IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
HSRP 734: Advanced Statistical Methods July 24, 2008.
Overview of field trial analysis procedures National Research Coordinators Meeting Windsor, June 2008.
Common Factor Analysis “World View” of PC vs. CF Choosing between PC and CF PAF -- most common kind of CF Communality & Communality Estimation Common Factor.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach Mark D. Reckase Tianli Li Michigan State University.
A Different Way to Think About Measurement Development: An Introduction to Item Response Theory (IRT) Joseph Olsen, Dean Busby, & Lena Chiu Jan 23, 2015.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Model Checking in the Proportional Hazard model
SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Basic Relationships Purpose of multiple regression Different types of multiple regression.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Item Response Theory Psych 818 DeShon. IRT ● Typically used for 0,1 data (yes, no; correct, incorrect) – Set of probabilistic models that… – Describes.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
DIFFERENTIAL ITEM FUNCTIONING AND COGNITIVE ASSESSMENT USING IRT-BASED METHODS Jeanne Teresi, Ed.D., Ph.D. Katja Ocepek-Welikson, M.Phil.
Part 2 DIF detection in STATA. Dif Detect - Stata Developed by Paul Crane et al, Washington University based on Ordinal logistic regression (Zumbo, 1999)
Introduction Neuropsychological Symptoms Scale The Neuropsychological Symptoms Scale (NSS; Dean, 2010) was designed for use in the clinical interview to.
SAS PROC IRT July 20, 2015 RCMAR/EXPORT Methods Seminar 3-4pm Acknowledgements: - Karen L. Spritzer - NCI (1U2-CCA )
Investigating Faking Using a Multilevel Logistic Regression Approach to Measuring Person Fit.
April 4 Logistic Regression –Lee Chapter 9 –Cody and Smith 9:F.
1 Differential Item Functioning in Mplus Summer School Week 2.
1 EPSY 546: LECTURE 1 SUMMARY George Karabatsos. 2 REVIEW.
Slide 1 The Kleinbaum Sample Problem This problem comes from an example in the text: David G. Kleinbaum. Logistic Regression: A Self-Learning Text. New.
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch.
The ABC’s of Pattern Scoring
Examining Data. Constructing a variable 1. Assemble a set of items that might work together to define a construct/ variable. 2. Hypothesize the hierarchy.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
Item Factor Analysis Item Response Theory Beaujean Chapter 6.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
About GGum Zhao Shouying. General on IRT Item response theory models have become increasingly popular measurement tools in the past thirty-five years.
Item Parameter Estimation: Does WinBUGS Do Better Than BILOG-MG?
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Using PARSCALE via Stata and Dan’s spreadsheet Laura Gibbons, PhD.
Demonstration of SEM-based IRT in Mplus
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
PARCC Field Test Study Comparability of High School Mathematics End-of- Course Assessments National Conference on Student Assessment San Diego June 2015.
Item Response Theory Dan Mungas, Ph.D. Department of Neurology University of California, Davis.
Two Approaches to Estimation of Classification Accuracy Rate Under Item Response Theory Quinn N. Lathrop and Ying Cheng Assistant Professor Ph.D., University.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
Chapter 14 EXPLORATORY FACTOR ANALYSIS. Exploratory Factor Analysis  Statistical technique for dealing with multiple variables  Many variables are reduced.
Chapter 17 STRUCTURAL EQUATION MODELING. Structural Equation Modeling (SEM)  Relatively new statistical technique used to test theoretical or causal.
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Classical Test Theory Margaret Wu.
Item Analysis: Classical and Beyond
Practical Introduction to PARSCALE
Examining Data.
Using the RUMM2030 outputs as feedback on learner performance in Communication in English for Adult learners. Nthabeleng Lepota, 13th SAAEA Conference.
Presentation transcript:

1 IRT basics: Theory and parameter estimation Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko

2 Overview
How do I begin a set of IRT analyses?
What do I need? Software; data
What do I do? Input/syntax files; examination of output
On-line!

3 “Eye-ARE-What?” Item response theory (IRT) is a set of probabilistic models that describes the relationship between a respondent’s standing on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment) and his or her probability of making a particular response to an individual item.

4 But what does that buy you? IRT provides more information than classical test theory (CTT): classical item statistics depend on the particular set of items and the sample examined, whereas IRT item parameters do not depend on the sample examined. IRT also lets you examine item bias/measurement equivalence and provides conditional standard errors of measurement.

5 Before we begin… Data preparation: raw data must be recoded where necessary (negatively worded items are reverse coded so that all items in the scale point in the same, positive direction). Dichotomization (optional): reducing multiple response options to two values (0, 1; wrong, right).
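The recoding step above can be sketched in a few lines. This is illustrative only: the item indices in reverse_keyed, the number of options, and the dichotomization cut-point are hypothetical, not taken from the tutorial's data.

```python
# Sketch of the data-preparation step, assuming 1-5 Likert responses stored
# as a list of lists. reverse_keyed lists the (hypothetical) indices of the
# negatively worded items; cut is the (hypothetical) dichotomization point.
def prepare_responses(data, n_options=5, reverse_keyed=(), cut=4):
    """Reverse-code negatively worded items, then (optionally) dichotomize."""
    recoded = []
    for row in data:
        new_row = []
        for j, x in enumerate(row):
            if j in reverse_keyed:
                x = (n_options + 1) - x      # e.g., 5 -> 1, 1 -> 5
            new_row.append(x)
        recoded.append(new_row)
    # Optional dichotomization: options >= cut become 1, all others 0
    dichotomous = [[1 if x >= cut else 0 for x in row] for row in recoded]
    return recoded, dichotomous
```

For example, with item 1 reverse-keyed, a response of 5 on that item becomes 1 before any dichotomization is applied.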

6 Calibration and validation files The data are split into two separate files: a calibration sample for estimating the IRT item parameters, and a validation sample for assessing the fit of the model to the data. Data files for the programs discussed here must be in ASCII/text format.
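A minimal sketch of that split, assuming a fixed-width ASCII layout of a 4-character ID followed by one column per item (the shape a FORTRAN statement like (4A1,10A1) would read); the 50/50 split and the seed are illustrative choices, not requirements of the programs.

```python
import random

# Sketch: split respondents into calibration and validation halves and
# format fixed-width ASCII records (4-char right-justified ID + one
# character per item response). The layout is illustrative.
def split_for_calibration(ids, responses, seed=42):
    rng = random.Random(seed)                 # fixed seed for a repeatable split
    order = list(range(len(ids)))
    rng.shuffle(order)
    half = len(order) // 2

    def fmt(i):
        return f"{ids[i]:>4}" + "".join(str(x) for x in responses[i])

    calib = [fmt(i) for i in order[:half]]    # for parameter estimation
    valid = [fmt(i) for i in order[half:]]    # for model-fit checks
    return calib, valid
```

Each returned list can then be written line by line to its own text file for the calibration and validation runs.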

7 Investigating dimensionality The models presented make a common assumption of unidimensionality. Hattie (1985) reviewed 30 techniques for assessing it. Some propose examining the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980). On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF).

8 PAF and scree plots If the data are dichotomous, factor analyze tetrachoric correlations, which assume a continuum underlies the item responses. Look for a dominant first factor.
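The eigenvalue-ratio check can be sketched as follows. Note the shortcut: plain Pearson correlations keep the example self-contained, whereas for dichotomous items the slides recommend tetrachoric correlations (and PAF rather than a raw eigendecomposition of the correlation matrix).

```python
import numpy as np

# Rough unidimensionality check: ratio of the first to the second eigenvalue
# of the inter-item correlation matrix. A large ratio suggests a dominant
# first factor. Pearson correlations stand in for the tetrachoric
# correlations the slides recommend for dichotomous data.
def eigenvalue_ratio(responses):
    X = np.asarray(responses, dtype=float)
    R = np.corrcoef(X, rowvar=False)              # item x item correlations
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # descending eigenvalues
    return eigvals[0] / eigvals[1]
```

On data generated from a single latent trait, the ratio comes out large; multidimensional data push the second eigenvalue up and the ratio toward 1.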

9 Two models presented The Three-Parameter Logistic model (3PL): for dichotomous data (e.g., cognitive ability tests). Samejima's Graded Response model: for polytomous data where the options are ordered along a continuum (e.g., Likert scales). Both are common models among applied psychologists.

10 The 3PL model Three parameters: a = item discrimination; b = item extremity/difficulty; c = lower asymptote (“pseudo-guessing”). Theta (θ) refers to the latent trait.
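The equation on this slide is an image that does not survive in the transcript; in the standard notation the parameters above imply, the 3PL item response function is

```latex
P_i(\theta) \;=\; c_i \;+\; (1 - c_i)\,
\frac{1}{1 + \exp\!\bigl[-D\,a_i(\theta - b_i)\bigr]},
\qquad D \approx 1.7,
```

where \(a_i\), \(b_i\), and \(c_i\) are the discrimination, difficulty, and pseudo-guessing parameters for item \(i\), and \(D \approx 1.7\) is the scaling constant that brings the logistic curve close to the normal ogive.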

11 Effect of the “a” parameter Small “a,” poor discrimination

12 Effect of the “a” parameter Larger “a,” better discrimination

13 Effect of the “b” parameter Low “b,” “easy item”

14 Effect of the “b” parameter Higher “b,” more difficult item. “b” is inversely related to the CTT p-value (proportion answering correctly).

15 Effect of the “c” parameter c=0, asymptote at zero

16 Effect of the “c” parameter With c > 0, even “low ability” respondents have some probability of endorsing the correct response.
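The parameter effects illustrated in the plots on slides 11-16 can be reproduced numerically with a direct implementation of the 3PL function (a sketch; the parameter values in the comments are illustrative):

```python
import math

# 3PL item response function with the D = 1.7 scaling constant.
def p_3pl(theta, a, b, c, D=1.7):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Larger "a": steeper curve around theta = b (better discrimination).
# Larger "b": curve shifts right (more difficult item, lower p at a given theta).
# Larger "c": the lower asymptote rises (a "guessing" floor for low-theta
# respondents), so the probability never drops to zero.
```

At theta = b (with c = 0) the probability is exactly .5, and as theta falls far below b the probability approaches c rather than zero.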

17 Estimating 3PL parameters
DOS version of BILOG (Scientific Software)
Multiple files in the directory, but small size overall
Easier to estimate parameters for a large number of scales or experimental groups
Data file must be saved as ASCII text: ID number, then the individual responses
Input file (ASCII text)

18 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Title line

19 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Data file name; characters in ID field; number of parameters; file for missing responses

20 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Requested files for: scoring, parameters, covariances

21 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Number of items; sample size

22 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
FORTRAN statement for reading the data; name of the scale/measure

23 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Estimation specifications (not the default for BILOG)

24 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Scoring: maximum likelihood, no prior distribution of scale scores, no rescaling

25 Phase one output file (*.PH1)
CLASSICAL ITEM STATISTICS FOR SUBTEST AGR
Columns: ITEM NAME | NUMBER TRIED | NUMBER RIGHT | PERCENT | LOGIT/1.7 | ITEM*TEST CORRELATION (PEARSON, BISERIAL)
(numeric values not preserved in the transcript)
Can indicate problems in parameter estimation

26 Phase two output file (*.PH2)
CYCLE 12: LARGEST CHANGE = …   LOG LIKELIHOOD = …
CYCLE 13: LARGEST CHANGE = …   [FULL NEWTON STEP]   -2 LOG LIKELIHOOD = …
CYCLE 14: LARGEST CHANGE = …
Check for convergence

27 Phase three output file (*.PH3)
Theta estimation: scoring of individual respondents
Required for DTF analyses

28 Parameter file (specified, *.PAR)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
AGR … (one line per item, giving the “a,” “b,” and “c” estimates; numeric values not preserved in the transcript)
(32X,2F12.6,12X,F12.6)

29 PARTO3PL output (*.3PL)
0001AGR … (one line per item: sequence number, item name, then the a, b, and c estimates; numeric values not preserved in the transcript)

30 Scoring and covariance files Like the *.PAR file, these are produced only when specifically requested. *.COV provides the parameters as well as the variances/covariances among the parameter estimates; it is necessary for DIF analyses. *.SCO provides ability score information for each respondent.

31 Samejima's Graded Response model Used when options are ordered along a continuum, as with Likert scales. Notation: v = response to the polytomously scored item i; k = a particular option; a = discrimination parameter; b = extremity parameter.
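The slide's equations are images that do not survive in the transcript; in the standard formulation, the model works through cumulative (boundary) response curves and their differences:

```latex
P^{*}_{ik}(\theta) \;=\; \frac{1}{1 + \exp\!\bigl[-D\,a_i(\theta - b_{ik})\bigr]},
\qquad
P_{ik}(\theta) \;=\; P^{*}_{ik}(\theta) \;-\; P^{*}_{i,k+1}(\theta),
```

where, for an item with \(m_i\) ordered options, the \(m_i - 1\) boundary curves use increasing thresholds \(b_{i1} < \dots < b_{i,m_i-1}\), with the conventions \(P^{*}_{i0}(\theta) \equiv 1\) and \(P^{*}_{i,m_i}(\theta) \equiv 0\), so the option probabilities sum to one at every \(\theta\).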

32 Sample SGR Plot “Low option” “High option” Low discrimination (a=0.4)

33 Sample SGR Plot Better discrimination (a=2)
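The option response curves shown in these plots can be sketched directly from the boundary-curve formulation (a minimal sketch; the parameter values in the usage below are illustrative):

```python
import math

# Samejima's graded response model for one polytomous item: each boundary
# curve gives P(response in category k or higher), and the option response
# probabilities are successive differences of the boundary curves.
# Thresholds in bs must be increasing; D = 1.7 is the usual scaling constant.
def sgr_option_probs(theta, a, bs, D=1.7):
    def boundary(b):
        return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    # Conventions: P*_0 = 1 above the lowest boundary, P*_m = 0 at the top
    stars = [1.0] + [boundary(b) for b in bs] + [0.0]
    return [stars[k] - stars[k + 1] for k in range(len(stars) - 1)]
```

With a larger "a" the boundary curves steepen, so the middle option curves become more peaked, which is the contrast between the a = 0.4 and a = 2 plots on these two slides.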

34 Running MULTILOG
MULTILOG for DOS; example with a DOS batch file
INFORLOG with MULTILOG: INFORLOG is typically interactive, but the process can be automated with a batch file and an input file (described on-line)
*.IN1 (parameter estimation)
*.IN2 (scoring)

35 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
Title line

36 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
Number of items; number of examinees; characters in the ID field; single group

37 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
SGR model; number of options for each item

38 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
Number of cycles for estimation; end of the command syntax

39 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
Five characters denoting the five options

40 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
Recoding of options for MULTILOG

41 The second input file (*.IN2)
SCORING AGREEABLENESS SCALE SGR MODEL
>PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>START;
Y
>SAVE;
>END;
(4A1,10A1)
Scoring; Y = yes to INFORLOG (parameters in a separate file)

42 Running MULTILOG Run the batch file. *.IN1 → *.LS1 (the *.lis file renamed as *.ls1): ensure that the data were read in and the model was specified correctly; this file also provides a report of the estimation procedure with the estimated item parameters. Things of note…

43 0ITEM 1: 5 GRADED CATEGORIES
P(#)   ESTIMATE   (S.E.)
A      …          (0.12)
B( 1)  …          (0.18)
B( 2)  …          (0.11)
B( 3)  …          (0.06)
B( 4)  …          (0.10)
I(THETA): OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN CATEGORY(K):
OBS. FREQ. / OBS. PROP. / EXP. PROP. (values not preserved in the transcript)
Notes: “a” includes a 1.7 scaling factor; frequencies for each option; collapsing options

44 Scoring output *.IN2 → *.LS2. The last portion of the file contains the person parameters: estimated theta, its standard error, the number of iterations used, and the respondent's ID number.

45 What now? Review: data requirements for IRT; two models, the 3PL (dichotomous) and SGR (polytomous), with more on-line! MODFIT can plot IRFs and ORFs and assess model-data fit (input the parameters and the validation sample).