The Impact of Item Response Theory in Educational Assessment: A Practical Point of View Cees A.W. Glas University of Twente, The Netherlands University.

1 The Impact of Item Response Theory in Educational Assessment: A Practical Point of View
Cees A.W. Glas, University of Twente, The Netherlands

2 Measuring body height with a questionnaire
1. I bump my head quite often
2. For school pictures I was always asked to stand in the first row
3. In bed, I often suffer from cold feet
4. When walking down the stairs, I often take two steps at a time
5. I think I would do well in a basketball team
6. As a police officer, I would not make much of an impression
7. In most cars I sit uncomfortably
8. I literally look up to most of my friends
9. Etc.

3 Test of Body Height
[Figure: numbered questionnaire items and three persons (Jim, Ann, Jo); the positions of the numeric labels are not recoverable.]

4 The Rasch model

5 Item Response Curve Rasch model
[Figure: probability of a correct response (vertical axis) against the latent ability scale (horizontal axis).]

6 Item Response Function
[Figure: probability of success plotted against ability, annotated with the discrimination, guessing, and difficulty parameters.]
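The formulas on slides 4 to 6 did not survive extraction. As a minimal sketch, the three parameters named on slide 6 match the three-parameter logistic model, and the Rasch model is the special case with discrimination 1 and no guessing. The function names and the logistic form used here are assumptions, not taken from the slides.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of success for ability theta.

    a: discrimination (slope), b: difficulty (location), c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_rasch(theta, b):
    """Rasch model: the special case with a = 1 and c = 0."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(irf_rasch(theta, b=0.0), 3),
              round(irf_3pl(theta, a=2.0, b=0.0, c=0.2), 3))
```

When ability equals difficulty, the Rasch probability is one half; with guessing c the curve is lifted so that it never drops below c.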

7 Applications
- Local reliability and optimal test construction
- Test Equating
- Multilevel item response theory in school effectiveness research

8 Item and Test Information
Information is a local measure of reliability.
Item and test information function.
In adaptive testing, items are selected to maximize information at the estimated ability of the examinee.
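A minimal sketch of maximum-information item selection, assuming a two-parameter logistic model for which item information is a² P (1 − P). The item pool, the parameter values, and the function names are hypothetical and only illustrate the selection rule.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Test information is the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

def select_next_item(theta_hat, pool, administered):
    """Index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

if __name__ == "__main__":
    pool = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical (a, b) pairs
    first = select_next_item(0.1, pool, set())
    print("first item:", first)
    print("standard error after one item:",
          1.0 / math.sqrt(total_information(0.1, [pool[first]])))
```

The reciprocal square root of the test information approximates the standard error of the ability estimate, which is why information serves as a local measure of reliability.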

9 Adaptive Item Selection

10 Adaptive Item Selection Cont’d
[Figure: information curve of Item 1.]

11 Adaptive Item Selection Cont’d
[Figure: information curves of Item 1 and Item 2, together with the test information.]

12 Adaptive Item Selection Cont’d
[Figure: information curves of Items 1, 2, and 3, together with the test information.]

13 Item and Test Information Cont’d
[Figure; surviving labels: Items, Ability.]

14 Adaptive Testing with Content Constraints
- Psychometrically optimal adaptive individualized testing
- Test content specifications
- Psychometrically optimal within content constraints and practical constraints
- Discrete optimization problem

15 Adaptive Testing with Content Constraints
Law School Admission Test
- content constraints
- item type constraints
- word count constraints
- answer key constraints
- gender / minority orientation
- clusters of items (testlets)
- some items contain clues to each other

16 Test Constraints
Constraints are imposed by linear-programming techniques. For every item i a variable is defined.

17 Test assembly model
- Item i is selected for the test or not.

18 Test assembly model
- Item i is selected for the test or not.
- At most 5 items on statistics
- Items 12 and 35 contain clues to each other
- Time available is 60 minutes

19 Test assembly model
- Maximize information in the test
- Item i is selected for the test or not.
- At most 5 items on statistics
- Items 12 and 35 contain clues to each other
- Time available is 60 minutes
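The model on slides 17 to 19 can be sketched as a 0-1 selection problem. The sketch below uses exhaustive enumeration in place of a linear-programming solver, which is only feasible for a tiny pool. The pool, its information values and response times, and the function name are hypothetical; the statistics limit is a parameter and is scaled down to 2 in the example so that the constraint actually binds.

```python
from itertools import combinations

def assemble(pool, max_statistics, enemy_pairs, time_limit):
    """Pick the subset of items with maximum total information that satisfies
    the content, enemy-item, and time constraints (brute force, not LP)."""
    best_ids, best_info = (), 0.0
    ids = sorted(pool)
    for size in range(1, len(ids) + 1):
        for subset in combinations(ids, size):
            chosen = set(subset)
            n_stats = sum(pool[i]["topic"] == "statistics" for i in subset)
            minutes = sum(pool[i]["minutes"] for i in subset)
            has_enemies = any(a in chosen and b in chosen for a, b in enemy_pairs)
            if n_stats > max_statistics or minutes > time_limit or has_enemies:
                continue  # infeasible test
            info = sum(pool[i]["info"] for i in subset)
            if info > best_info:
                best_ids, best_info = subset, info
    return best_ids, best_info

# Hypothetical pool: item id -> information at the target ability, minutes, topic.
POOL = {
    12: {"info": 0.90, "minutes": 20, "topic": "statistics"},
    35: {"info": 0.80, "minutes": 15, "topic": "statistics"},
    7:  {"info": 0.60, "minutes": 15, "topic": "statistics"},
    3:  {"info": 0.50, "minutes": 10, "topic": "algebra"},
    21: {"info": 0.35, "minutes": 20, "topic": "algebra"},
    8:  {"info": 0.30, "minutes": 10, "topic": "geometry"},
}

if __name__ == "__main__":
    ids, info = assemble(POOL, max_statistics=2, enemy_pairs=[(12, 35)], time_limit=60)
    print(ids, round(info, 2))
```

With realistic pool sizes the number of subsets grows far too quickly for enumeration, which is why the slides refer to linear-programming techniques.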

20 Applications
- Local reliability and optimal test construction
- Test Equating
- Multilevel item response theory in school effectiveness research

21 Equating of Examinations
Problem: the level of the students and the difficulty of the examinations fluctuate over the years.
Objective: to determine pass/fail cut-off scores on examinations such that they reflect the same level of proficiency on the latent scale, taking into account the difficulty level of the examinations and differences in proficiency level over the years.
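One way to make the objective concrete is to map a cut-off score through the latent scale. This is a sketch under strong assumptions that the slides do not state: both examinations follow the Rasch model, their item difficulties have already been placed on a common scale, and the cut-off is carried over via the expected-score curve. The difficulties and function names are hypothetical.

```python
import math

def expected_score(theta, difficulties):
    """Expected number-correct score at ability theta under the Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def ability_at_score(score, difficulties, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the expected-score curve by bisection (it is increasing in theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equated_cutoff(ref_cutoff, ref_difficulties, new_difficulties):
    """Score on the new examination at the ability implied by the reference cut-off."""
    theta_cut = ability_at_score(ref_cutoff, ref_difficulties)
    return expected_score(theta_cut, new_difficulties)

if __name__ == "__main__":
    reference = [-1.0, 0.0, 1.0]   # hypothetical difficulties, reference year
    harder = [0.0, 1.0, 2.0]       # hypothetical difficulties, new year
    print(equated_cutoff(1.5, reference, harder))
```

A harder examination yields a lower equated cut-off, so the same proficiency level is required to pass in both years.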






27 Simple Deterministic Model
Important feature of the model: parameter separation, that is, distinct parameters for persons and items.


29 Model for Item with 5 response categories
[Figure: response probabilities for categories X=0 through X=4 plotted against the latent ability scale.]

30 Multidimensional IRT model

31 Equating of Examinations
Problem: the level of the students and the difficulty of the examinations fluctuate over the years.
Objective: to determine pass/fail cut-off scores on examinations such that they reflect the same level of proficiency on the latent scale, taking into account the difficulty level of the examinations and differences in proficiency level over the years.

32 Anchor Item Equating Design

33 Problems Anchor Item Design
- Student ability increases between test administrations due to learning.
- Ability and item ordering differ between the anchor test and the examination because students are less motivated on the anchor test.
- If the anchor test becomes known, the test functions differently over the years.
- All these effects violate the model and bias the estimated cut-off scores.

34 Equating Design Central Examinations, the Netherlands

35 Equating Design SweSat

36 Applications
- Local reliability and optimal test construction
- Test Equating
- Multilevel item response theory in school effectiveness research

37 Measurement model: GPCM
Alternatives to the GPCM (Muraki):
- Graded Response Model (Samejima)
- Sequential Model (Tutz)
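The GPCM formula itself is missing from the transcript. As a minimal sketch, the generalized partial credit model gives each response category a probability proportional to the exponential of a cumulative sum of a(θ − b_v) terms. The function name and the threshold values are hypothetical.

```python
import math

def gpcm_probabilities(theta, a, thresholds):
    """Category probabilities P(X = 0), ..., P(X = m) under the GPCM.

    a: discrimination; thresholds: step parameters b_1, ..., b_m.
    """
    cumulative = [0.0]  # the sum for category 0 is empty
    for b in thresholds:
        cumulative.append(cumulative[-1] + a * (theta - b))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    # An item with 5 response categories (X = 0, ..., 4) needs 4 step parameters.
    for theta in (-2.0, 0.0, 2.0):
        probs = gpcm_probabilities(theta, a=1.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
        print(theta, [round(p, 3) for p in probs])
```

With a single threshold the expression reduces to a two-parameter logistic item, and with a fixed at 1 it becomes the partial credit model.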

38 Structural Model Takane and de Leeuw (1987)
The model is equivalent to a factor analysis model:
- Discrimination parameters are factor loadings
- Ability parameters are factor scores
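The equivalence can be written out for a dichotomous item in normal-ogive form. The slide's own formula did not survive extraction, so the parameterization below (probit link, discrimination a_i, intercept d_i, unit-variance ability) is an assumption and may differ from the one used in the talk.

```latex
% IRT side (normal ogive):
P(X_i = 1 \mid \theta) = \Phi(a_i \theta - d_i), \qquad \theta \sim N(0, 1)

% Factor-analysis side: X_i = 1 exactly when the latent response y_i^* exceeds \tau_i
y_i^* = \lambda_i \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\, 1 - \lambda_i^2)

% Equating P(y_i^* > \tau_i \mid \theta) with the IRT probability gives
\lambda_i = \frac{a_i}{\sqrt{1 + a_i^2}}, \qquad \tau_i = \frac{d_i}{\sqrt{1 + a_i^2}}
```

The loading is a monotone transformation of the discrimination, and the common factor plays the role of the ability parameter.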

39 IRT structural modeling

40 Problems with “ordinary” regression and analysis of variance models
- Different aggregation levels: school level and student level
- Variance structure: students within schools are more similar than students from different schools
- Old unsatisfactory solutions: aggregating to school level, disaggregating to student level
- Newer solutions: multilevel models (Bryk & Raudenbush, Longford, Goldstein)


42 Motivation for this approach
All the niceties of IRT are available in multilevel analysis:
- Method to model unreliability in the dependent and independent variables
- Heteroscedasticity: reliability is defined locally
- Incomplete test administration and calibration design (possibility to include selection models)
- No assumption of normally distributed scores
- Fewer ceiling problems

43 An Example (Shalabi, Fox, Glas, Bosker)
3384 grade seven pupils in 119 schools in the West Bank
- Mathematics test
- Gender
- SES
- IQ
- School Leadership
- School Climate

44 Intra-class correlation:
Model: Intra-class correlation:
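The formulas on this slide are missing from the transcript. As a sketch, the usual two-level random-intercept model on the latent ability of student i in school j, and the intra-class correlation that follows from it, are shown below. The symbols are assumptions and may not match the slide's notation.

```latex
% Student level and school level:
\theta_{ij} = \beta_{0j} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2)
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau^2)

% Intra-class correlation: share of the total variance that lies between schools
\rho = \frac{\tau^2}{\tau^2 + \sigma^2}
```

A value of ρ near zero means that schools hardly differ, while a large value means that students within the same school resemble each other strongly.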



47 Conclusions
- IRT is based on the idea of parameter separation.
- An IRT measurement model can be combined with a structural model.
- The combined model is equivalent to factor analysis and latent variable models, and as such is a generalization of other well-known regression models.
- Applications of IRT:
  - Local reliability and optimal test construction
  - Test Equating
  - Multilevel IRT in school effectiveness research
