1 IRT basics: Theory and parameter estimation Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko
2 Overview: How do I begin a set of IRT analyses? What do I need? Software; data. What do I do? Input/syntax files; examination of output. On-line!
3 “Eye-ARE-What?” Item response theory (IRT): a set of probabilistic models that describe the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment) and his or her probability of a particular response to an individual item.
4 But what does that buy you? IRT provides more information than classical test theory (CTT). Classical test statistics depend on the set of items and the sample examined. IRT item parameters do not depend on the sample examined, provided the model fits the data (they are invariant up to a linear rescaling of theta). IRT can be used to examine item bias/measurement equivalence and provides conditional standard errors of measurement.
5 Before we begin… Data preparation: raw data must be recoded if necessary (negatively worded items must be reverse coded so that all items in the scale indicate a positive direction). Dichotomization (optional): reducing multiple options to two values (0, 1; right, wrong).
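The recoding steps above can be sketched in a few lines of Python. This is only an illustration: the 5-option scale, the cut point of 4, the item positions, and the function names are all assumptions, not part of the tutorial.

```python
def reverse_code(response, n_options=5):
    """Reverse a 1..n_options rating so that high values indicate the positive direction."""
    return n_options + 1 - response


def dichotomize(response, cut=4):
    """Collapse a polytomous response to 0/1: 1 if the response is at or above the cut."""
    return 1 if response >= cut else 0


# Hypothetical respondent: four Likert items; the items at index 1 and 3 are negatively worded.
raw = [5, 2, 3, 4]
negatively_worded = {1, 3}

recoded = [reverse_code(r) if i in negatively_worded else r for i, r in enumerate(raw)]
binary = [dichotomize(r) for r in recoded]

print(recoded)  # [5, 4, 3, 2]
print(binary)   # [1, 1, 0, 0]
```

Where to place the cut for dichotomization is a substantive decision; the value used here is arbitrary.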
6 Calibration and validation files. Data are split into two separate files: a calibration sample for estimating IRT parameters, and a validation sample for assessing the fit of the model to the data. Data files for the programs that we will be discussing must be in ASCII/text format.
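A minimal sketch of the split, assuming a random half/half division and a record layout of a 4-character ID followed by one character per item (the layout that the format statement (4A1,10A1) used later would read). The data, the seed, and the function names are hypothetical.

```python
import random


def split_sample(records, seed=0):
    """Randomly split records into calibration and validation halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def to_ascii_line(person_id, responses):
    """Format one record as a 4-character ID followed by one character per item."""
    return f"{person_id:04d}" + "".join(str(r) for r in responses)


# Hypothetical data: 6 respondents, 10 dichotomous items each.
rng = random.Random(1)
records = [(pid, [rng.randint(0, 1) for _ in range(10)]) for pid in range(1, 7)]

calibration, validation = split_sample(records)
cal_lines = [to_ascii_line(pid, resp) for pid, resp in calibration]
val_lines = [to_ascii_line(pid, resp) for pid, resp in validation]

# In practice each list would be written to its own plain-text file.
print("\n".join(cal_lines))
```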
7 Investigating dimensionality. The models presented make a common assumption of unidimensionality. Hattie (1985) reviewed 30 techniques. Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980). On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF).
8 PAF and scree plots. If the data are dichotomous, factor analyze tetrachoric correlations (assume a continuum underlies the item responses). Look for a dominant first factor.
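The eigenvalue ratio can be illustrated with a small pure-Python sketch. It is a simplification, not the on-line procedure: it takes a correlation matrix that has already been computed (Pearson or tetrachoric), it does not perform PAF (which would replace the diagonal with communality estimates), and it finds the first two eigenvalues by power iteration with deflation. The matrix and function names are hypothetical.

```python
import math


def top_eigen(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric, positive semidefinite matrix."""
    k = len(matrix)
    vec = [float(i + 1) for i in range(k)]  # non-uniform starting vector
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    prod = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
    return sum(v * p for v, p in zip(vec, prod)), vec  # Rayleigh quotient, eigenvector


def first_two_eigenvalues(matrix):
    """First eigenvalue, then the largest eigenvalue of the deflated matrix (the second)."""
    k = len(matrix)
    first, vec = top_eigen(matrix)
    deflated = [[matrix[i][j] - first * vec[i] * vec[j] for j in range(k)] for i in range(k)]
    second, _ = top_eigen(deflated)
    return first, second


# Hypothetical correlation matrix for three items, all inter-item correlations 0.5.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]

first, second = first_two_eigenvalues(corr)
ratio = first / second
print(round(first, 3), round(second, 3), round(ratio, 3))
```

A large ratio is read as evidence of a dominant first factor; how large is large enough is a judgment call that this sketch does not settle.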
9 Two models presented. The Three Parameter Logistic model (3PL): for dichotomous data, e.g., cognitive ability tests. Samejima's Graded Response model: for polytomous data where options are ordered along a continuum, e.g., Likert scales. Both are common models among applied psychologists.
10 The 3PL model. Three parameters: a = item discrimination; b = item extremity/difficulty; c = lower asymptote, “pseudo-guessing.” Theta refers to the latent trait.
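For reference, the 3PL item response function in its usual textbook form is written out below. The item subscript i and the scaling constant D are notational assumptions: D = 1.7 is a common convention that puts the logistic curve close to the normal ogive, and some programs set D = 1 instead.

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp\left[-D\,a_i(\theta - b_i)\right]},
\qquad D = 1.7
```

At theta = b_i the probability is c_i + (1 - c_i)/2, halfway between the lower asymptote and 1.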
11 Effect of the “a” parameter Small “a,” poor discrimination
12 Effect of the “a” parameter Larger “a,” better discrimination
13 Effect of the “b” parameter Low “b,” “easy item”
14 Effect of the “b” parameter Higher “b,” more difficult item “b” is inversely related to the CTT p value (proportion correct)
15 Effect of the “c” parameter c=0, asymptote at zero
16 Effect of the “c” parameter With c > 0, “low ability” respondents may still endorse the correct response
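The effects shown in the preceding plots can be reproduced numerically. The sketch below assumes the usual 3PL form with a scaling constant D = 1.7 (an assumption; some programs use D = 1); the parameter values are made up for illustration.

```python
import math

D = 1.7  # scaling constant; assumed here, not taken from the slides


def p_3pl(theta, a, b, c):
    """Probability of a correct/keyed response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


thetas = [-2, -1, 0, 1, 2]

# Hypothetical items that differ in one parameter at a time.
items = {
    "small a (0.5)": (0.5, 0.0, 0.0),
    "large a (2.0)": (2.0, 0.0, 0.0),
    "low b (-1.0)": (1.0, -1.0, 0.0),
    "high b (1.0)": (1.0, 1.0, 0.0),
    "c = 0.2": (1.0, 0.0, 0.2),
}

for label, (a, b, c) in items.items():
    print(f"{label:14s}", [round(p_3pl(t, a, b, c), 2) for t in thetas])
```

Reading across a row shows the curve: a larger a gives a steeper rise around b, a higher b shifts the rise to the right, and c > 0 keeps the probability above zero at low theta.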
17 Estimating 3PL parameters. DOS version of BILOG (Scientific Software): multiple files in the directory, but small size overall; easier to estimate parameters for a large number of scales or experimental groups. Data file must be saved as ASCII text and contains an ID number and the individual responses. Input file (ASCII text).
18 BILOG input file (*.BLG)
AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
>COMMENT
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
(4A1,10A1)
>TEST TNAME=AGR;
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
The first line of the file is the title line.
19 BILOG input file (*.BLG): the >GLOBAL command
>GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
DFN = data file name; NIDW = characters in the ID field; NPARM = number of parameters; OFNAME = file for missing.
20 BILOG input file (*.BLG): the >SAVE command
>SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
Requested files for: scoring (SCO), parameters (PARM), covariances (COV).
21 BILOG input file (*.BLG): the >LENGTH and >INPUT commands
>LENGTH NITEMS=(10);
>INPUT SAMPLE=99999;
NITEMS = number of items; SAMPLE = sample size.
22 BILOG input file (*.BLG): the format statement and the >TEST command
(4A1,10A1)
>TEST TNAME=AGR;
(4A1,10A1) is the FORTRAN statement for reading the data: 4 ID characters followed by 10 one-character item responses. TNAME = name of the scale/measure.
23 BILOG input file (*.BLG): the >CALIB command
>CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
Estimation specifications (not the default for BILOG).
24 BILOG input file (*.BLG): the >SCORE command
>SCORE MET=2, IDIST=0, RSC=0, NOPRINT;
Scoring: maximum likelihood, no prior distribution of scale scores, no rescaling.
25 Phase one output file (*.PH1): CLASSICAL ITEM STATISTICS FOR SUBTEST AGR. Columns: ITEM, NAME, NUMBER TRIED, NUMBER RIGHT, PERCENT, LOGIT/1.7, ITEM*TEST CORRELATION (PEARSON, BISERIAL). These statistics can indicate problems in parameter estimation.
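As a rough illustration of how classical statistics can flag problem items before calibration, the sketch below computes the proportion right and a Pearson item-total correlation for each item. It is not BILOG's computation: whether BILOG's ITEM*TEST correlation includes the item in the total is not stated in the slides, and the 0.2 cutoff and the data are hypothetical.

```python
import math


def classical_item_stats(data):
    """Return (proportion right, Pearson item-total correlation) for each item.

    data: list of respondents, each a list of 0/1 item scores.
    The total score here includes the item itself (an assumption).
    """
    n = len(data)
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    stats = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        p = sum(item) / n
        var_i = p * (1.0 - p)
        cov = sum((x - p) * (t - mean_t) for x, t in zip(item, totals)) / n
        r = cov / math.sqrt(var_i * var_t) if var_i > 0 and var_t > 0 else float("nan")
        stats.append((p, r))
    return stats


# Hypothetical data: 6 respondents, 4 items; the last item behaves oddly.
data = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

stats = classical_item_stats(data)
flagged = [j for j, (p, r) in enumerate(stats) if r < 0.2]  # hypothetical cutoff
for j, (p, r) in enumerate(stats):
    print(f"item {j}: p = {p:.2f}, r = {r:.2f}")
print("flagged items:", flagged)
```

An item with a negative or near-zero item-total correlation is a candidate for a miskeyed or un-reversed item, which is the kind of problem that tends to surface later as poor or non-converging parameter estimates.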
26 Phase two output file (*.PH2): check for convergence. For successive estimation cycles (here cycles 12 to 14) the listing reports CYCLE n: LARGEST CHANGE = and -2 LOG LIKELIHOOD =, with [FULL NEWTON STEP] noted in the listing.
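A convergence check of this kind could be automated along the following lines. The exact layout of the *.PH2 file is an assumption based on the fragments above, and every numeric value in the sample text is invented for illustration; CRIT mirrors CRIT=.001 on the >CALIB line.

```python
import re

CRIT = 0.001  # convergence criterion, as on the >CALIB line

CYCLE_PATTERN = re.compile(r"CYCLE\s+(\d+):\s+LARGEST CHANGE\s*=\s*([-+]?\d*\.?\d+)")


def largest_changes(text):
    """Extract (cycle number, largest change) pairs from phase-two output text."""
    return [(int(cycle), float(value)) for cycle, value in CYCLE_PATTERN.findall(text)]


def converged(text, crit=CRIT):
    """True if the last reported largest change is at or below the criterion."""
    changes = largest_changes(text)
    return bool(changes) and changes[-1][1] <= crit


# Hypothetical excerpt; the numbers are made up.
sample = """
 CYCLE 12: LARGEST CHANGE = 0.00452
 -2 LOG LIKELIHOOD = 15234.118
 CYCLE 13: LARGEST CHANGE = 0.00187
 [FULL NEWTON STEP]
 -2 LOG LIKELIHOOD = 15233.902
 CYCLE 14: LARGEST CHANGE = 0.00071
"""

changes = largest_changes(sample)
print(changes)
print("converged:", converged(sample))
```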
27 Phase three output file (*.PH3): theta estimation, i.e., scoring of individual respondents. Required for DTF (differential test functioning) analyses.
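28 Parameter file (specified, *.PAR). The file begins with the title line (AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.) and >COMMENT, followed by the item lines for AGR. The “a,” “b,” and “c” estimates on each item line are read with the format (32X,2F12.6,12X,F12.6).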
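The format statement (32X,2F12.6,12X,F12.6) translates directly into fixed-column slicing: skip 32 characters, read two 12-character numbers (a and b), skip 12 characters, read one 12-character number (c). The sketch below applies that layout to a made-up item line; the item label, the filler in the skipped columns, and the parameter values are hypothetical.

```python
def parse_par_line(line):
    """Read a, b, c from one item line using the layout (32X,2F12.6,12X,F12.6)."""
    a = float(line[32:44])  # columns 33-44
    b = float(line[44:56])  # columns 45-56
    c = float(line[68:80])  # columns 69-80, after a 12-character skip
    return a, b, c


# Hypothetical item line built to match the layout.
line = (
    "AGR01".ljust(32)        # 32 skipped characters (labels and other fields)
    + f"{1.234567:12.6f}"    # a
    + f"{-0.5:12.6f}"        # b
    + " " * 12               # 12 skipped characters
    + f"{0.2:12.6f}"         # c
)

a, b, c = parse_par_line(line)
print(a, b, c)  # 1.234567 -0.5 0.2
```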
29 PARTO3PL output (*.3PL): the file lists the AGR items (the listing begins 0001AGR) with columns for a, b, and c.
30 Scoring and covariance files. Like the *.PAR file, these must be specifically requested. *.COV: provides the parameters as well as the variances/covariances between the parameters; necessary for DIF analyses. *.SCO: provides ability score information for each respondent.
31 Samejima's Graded Response model. Used when options are ordered along a continuum, as with Likert scales. v = response to the polytomously scored item i; k = particular option; a = discrimination parameter; b = extremity parameter.
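For reference, the graded response model is usually written with one boundary curve per threshold, using the symbols defined above. The boundary notation P*, the number of options m, and the indexing of the thresholds from 1 to m - 1 are notational assumptions; no 1.7 scaling constant appears here, so a is on the logistic metric.

```latex
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_{ik})\right]},
\quad k = 1, \dots, m - 1,
\qquad
P(v_i = k \mid \theta) = P^{*}_{i,k-1}(\theta) - P^{*}_{ik}(\theta),
\quad P^{*}_{i0}(\theta) = 1,\; P^{*}_{im}(\theta) = 0
```

Here P*_{ik} is the probability of responding above option k, so each option's probability is the difference between two adjacent boundary curves.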
32 Sample SGR Plot “Low option” “High option” Low discrimination (a=0.4)
33 Sample SGR Plot Better discrimination (a=2)
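The two plots can be reproduced numerically with a short sketch of the graded response model in its usual form: one logistic boundary curve per threshold, with option probabilities given by differences of adjacent boundaries. The threshold values and the function name are hypothetical; the a values match the two plots.

```python
import math


def sgr_probs(theta, a, bs):
    """Option probabilities under the graded response model.

    bs: ordered thresholds b_1 < ... < b_(m-1) for an item with m options.
    """
    boundaries = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]
    cum = [1.0] + boundaries + [0.0]  # P(option above k), padded with 1 and 0
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


bs = [-1.5, -0.5, 0.5, 1.5]  # hypothetical thresholds for a 5-option item

for a in (0.4, 2.0):
    print(f"a = {a}")
    for theta in (-3, 0, 3):
        probs = sgr_probs(theta, a, bs)
        print(f"  theta = {theta:+d}:", [round(p, 2) for p in probs])
```

With a = 0.4 the option probabilities change slowly across theta, so the options separate respondents poorly; with a = 2 the lowest and highest options dominate at the extremes.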
34 Running MULTILOG. MULTILOG for DOS: example with a DOS batch file. INFORLOG with MULTILOG: INFORLOG is typically interactive; the process is automated with a batch file and an input file (described on-line). *.IN1 (parameter estimation); *.IN2 (scoring).
35 The first input file (*.IN1)
CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>EST NC=50;
>SAVE;
>END;
(4A1,10A1)
The first line of the file is the title line.
36 The first input file (*.IN1): the >PRO command
>PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
NI = number of items; NE = number of examinees; NCHAR = characters in the ID field; NG=1 = single group.
37 The first input file (*.IN1): the >TEST command
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
GR = SGR model; NC = number of options for each item.
38 The first input file (*.IN1): the >EST and >END commands
>EST NC=50;
>END;
NC on >EST = number of cycles for estimation; >END marks the end of the command syntax.
39 The first input file (*.IN1), continued: five characters denoting the five options.
40 The first input file (*.IN1), continued: recoding of options for MULTILOG.
41 The second input file (*.IN2)
SCORING AGREEABLENESS SCALE SGR MODEL
>PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
>TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
>START;
Y
>SAVE;
>END;
(4A1,10A1)
SCORE requests scoring; Y answers yes to INFORLOG (parameters in a separate file).
42 Running MULTILOG. Run the batch file: *.IN1 produces *.LS1 (the *.lis file renamed as *.ls1). Use this file to ensure that the data were read in and the model specified correctly; it also provides a report of the estimation procedure with the estimated item parameters. Things of note…
43 0ITEM 1: 5 GRADED CATEGORIES
P(#) ESTIMATE (S.E.)
A (0.12)
B( 1) (0.18)
B( 2) (0.11)
B( 3) (0.06)
B( 4) (0.10)
I(THETA):
OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN CATEGORY(K):
OBS. FREQ
OBS. PROP
EXP. PROP
Things of note: “a” includes a 1.7 scaling factor; check the frequencies for each option; consider collapsing options.
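Two of the points above lend themselves to a short sketch. It reads the note about the 1.7 scaling factor as meaning that the reported “a” is roughly 1.7 times the value on the normal-ogive metric, so dividing by 1.7 gives a comparable number; that reading is an assumption. The minimum count of 20, the frequencies, and the particular collapsing rule are hypothetical.

```python
def to_normal_metric(a_logistic, scaling=1.7):
    """Rescale a logistic-metric discrimination by the 1.7 constant."""
    return a_logistic / scaling


def sparse_options(frequencies, min_count=20):
    """Return 1-based option numbers whose observed frequency is below min_count."""
    return [k for k, f in enumerate(frequencies, start=1) if f < min_count]


def collapse_lowest_two(response):
    """Merge options 1 and 2 of a 5-option item, leaving options 1 to 4."""
    return max(response - 1, 1)


# Hypothetical observed frequencies for one 5-option item (1500 respondents).
obs_freq = [5, 60, 300, 700, 435]

print(round(to_normal_metric(2.04), 2))                    # 1.2
print(sparse_options(obs_freq))                            # [1]
print([collapse_lowest_two(r) for r in [1, 2, 3, 4, 5]])   # [1, 1, 2, 3, 4]
```

If options are collapsed, the NC entry for that item and the recoding lines in the input file have to change to match.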
44 Scoring output: *.IN2 produces *.LS2. The last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number).
45 What now? Review: data requirements for IRT; two models, 3PL (dichotomous) and SGR (polytomous), with more on-line! MODFIT: can plot IRFs and ORFs; model-data fit (input: parameters, validation sample).