The Impact of Item Response Theory in Educational Assessment: A Practical Point of View Cees A.W. Glas University of Twente, The Netherlands.

Similar presentations
Structural Equation Modeling. What is SEM Swiss Army Knife of Statistics Can replicate virtually any model from “canned” stats packages (some limitations.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
Logistic Regression Psy 524 Ainsworth.
1 Scaling of the Cognitive Data and Use of Student Performance Estimates Guide to the PISA Data Analysis ManualPISA Data Analysis Manual.
Item Response Theory in Health Measurement
Remaining Challenges and What to Do Next: Undiscovered Areas Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research
Introduction to Item Response Theory
IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Latent Change in Discrete Data: Rasch Models
Programme in Statistics (Courses and Contents). Elementary Probability and Statistics (I) 3(2+1)Stat. 101 College of Science, Computer Science, Education.
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Comparison of Reliability Measures under Factor Analysis and Item Response Theory —Ying Cheng , Ke-Hai Yuan , and Cheng Liu Presented by Zhu Jinxin.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Discriminant Analysis Testing latent variables as predictors of groups.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Item Response Theory for Survey Data Analysis EPSY 5245 Michael C. Rodriguez.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
Introduction to plausible values National Research Coordinators Meeting Madrid, February 2010.
 Random Guessing › A function of the proficiency of a person relative to the difficulty of an item(Waller, 1973, 1976, 1989) › Not a property of an item.
Naglieri Nonverbal Ability Test (NNAT) Miami-Dade County Public Schools NNAT Workshop March 26, 28, & 29, 2007.
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
Introduction Multilevel Analysis
1 Multivariate Analysis (Source: W.G Zikmund, B.J Babin, J.C Carr and M. Griffin, Business Research Methods, 8th Edition, U.S, South-Western Cengage Learning,
Test Scaling and Value-Added Measurement Dale Ballou Vanderbilt University April, 2008.
Chapter 16 Data Analysis: Testing for Associations.
Multivariate Data Analysis Chapter 1 - Introduction.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
Scaling and Equating Joe Willhoft Assistant Superintendent of Assessment and Student Information Yoonsun Lee Director of Assessment and Psychometrics Office.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
The ABC’s of Pattern Scoring
University of Ostrava Czech republic 26-31, March, 2012.
Item Factor Analysis Item Response Theory Beaujean Chapter 6.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Latent regression models. Where does the probability come from? Why isn’t the model deterministic. Each item tests something unique – We are interested.
Item Response Theory in Health Measurement
Summary of Bayesian Estimation in the Rasch Model H. Swaminathan and J. Gifford Journal of Educational Statistics (1982)
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
McGraw-Hill/Irwin © 2003 The McGraw-Hill Companies, Inc.,All Rights Reserved. Part Four ANALYSIS AND PRESENTATION OF DATA.
Classical Test Theory Psych DeShon. Big Picture To make good decisions, you must know how much error is in the data upon which the decisions are.
Nonequivalent Groups: Linear Methods Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2 nd ed.). New.
Evaluating Patient-Reports about Health
Daniel Muijs Saad Chahine
Jean-Guy Blais Université de Montréal
Claus H. Carstensen, Institute for Science Education IPN Kiel, Germany
Unit 11: Testing and Individual Differences
Assessing the Quality of Instructional Materials: Item Response Theory
UCLA Department of Medicine
Evaluating Patient-Reports about Health
UCLA Department of Medicine
The Impact of Item Response Theory in Educational Assessment: A Practical Point of View Cees A.W. Glas University of Twente, The Netherlands University.
Classical Test Theory Margaret Wu.
Item Analysis: Classical and Beyond
Reliability & Validity
Booklet Design and Equating
12 Inferential Analysis.
The psychometrics of Likert surveys: Lessons learned from analyses of the 16pf Questionnaire Alan D. Mead.
Aligned to Common Core State Standards
Mohamed Dirir, Norma Sinclair, and Erin Strauts
From GLM to HLM Working with Continuous Outcomes
UNIT IV ITEM ANALYSIS IN TEST DEVELOPMENT
12 Inferential Analysis.
Investigating item difficulty change by item positions under the Rasch model Luc Le & Van Nguyen 17th International meeting of the Psychometric Society,
Item Analysis: Classical and Beyond
Chapter 7 The Normal Distribution and Its Applications
Item Analysis: Classical and Beyond
Investigations into Comparability for the PARCC Assessments
Presentation transcript:

The Impact of Item Response Theory in Educational Assessment: A Practical Point of View Cees A.W. Glas, University of Twente, The Netherlands, c.a.w.glas@gw.utwente.nl

Measuring body height with a questionnaire
1. I bump my head quite often
2. For school pictures I was always asked to stand in the first row
3. In bed, I often suffer from cold feet
4. When walking down the stairs, I often take two steps at a time
5. I think I would do well in a basketball team
6. As a police officer, I would not make much of an impression
7. In most cars I sit uncomfortably
8. I literally look up to most of my friends
9. Etc.

Test of Body Height [Figure: example score patterns of three test takers, Jim, Ann, and Jo, on a set of body-height items]

The Rasch model
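For reference, the standard form of the Rasch model for the response of person j to item i is:

```latex
% Rasch model: probability of a correct response of person j to item i,
% with ability theta_j and item difficulty b_i.
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```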

Item Response Curve [Figure: item response curve of the Rasch model, probability of a correct response plotted against the latent ability scale]

Item Response Function [Figure: item response function, probability of success plotted against ability; the discrimination, difficulty, and guessing parameters determine the slope, location, and lower asymptote of the curve]
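In standard notation this curve is the three-parameter logistic (3PL) model, in which the discrimination, difficulty, and guessing parameters named on the slide appear as a_i, b_i, and c_i:

```latex
% 3PL model: a_i = discrimination (slope), b_i = difficulty (location),
% c_i = guessing parameter (lower asymptote), theta = ability.
P(X_i = 1 \mid \theta) = c_i + (1 - c_i)\,
  \frac{\exp\big(a_i(\theta - b_i)\big)}{1 + \exp\big(a_i(\theta - b_i)\big)}
```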

Applications. Local reliability and optimal test construction; test equating; multilevel item response theory in school effectiveness research.

Item and Test Information. Information is a local measure of reliability, quantified by the item and test information functions. In adaptive testing, items are selected to maximize information at the estimated ability of the examinee.
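A minimal Python sketch of this selection rule under the Rasch model; the item difficulties and the ability estimate below are invented for illustration:

```python
import numpy as np

def p_correct(theta, b):
    """Rasch probability of a correct response at ability theta, item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: I(theta) = P(theta) * (1 - P(theta))."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

# Hypothetical item bank: difficulties of five items.
difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 2.0])

theta_hat = 0.3  # current ability estimate of the examinee
info = item_information(theta_hat, difficulties)

# Test information is the sum of the item informations of the administered
# items; adaptive testing administers the item with maximum information
# at the current ability estimate.
best_item = int(np.argmax(info))
print(f"item informations: {info.round(3)}, select item {best_item}")
```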

Adaptive Item Selection [Figure sequence: the item information functions of Item 1, Item 2, and Item 3 are added to the plot one at a time, together with the growing test information function, which is their sum]

Item and Test Information Cont'd [Figure: item information functions of the items in the bank plotted along the ability scale]

Adaptive Testing with Content Constraints. Adaptive, individualized testing is psychometrically optimal, but a test must also satisfy its content specifications. The goal becomes a test that is psychometrically optimal within content and practical constraints, which is a discrete optimization problem.

Adaptive Testing with Content Constraints. Constraints on the Law School Admission Test include: content constraints; item type constraints; word count constraints; answer key constraints; gender / minority orientation; clusters of items (testlets); some items contain clues to each other.

Test Constraints. Constraints are imposed using linear programming techniques: for every item i, a 0/1 decision variable is defined that indicates whether the item is selected for the test.

Test Assembly Model. Maximize the information in the test, subject to the constraints that item i is either selected for the test or not (a 0/1 decision); at most 5 items on statistics are included; items 12 and 35 contain clues to each other, so at most one of them is selected; and the time available is 60 minutes. A sketch of this model as an integer program follows below.
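A minimal sketch of this test assembly model as a 0/1 integer program, here written with the PuLP library; the information values, response times, and item index sets are invented for illustration:

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

n_items = 40
info = [0.1 + 0.02 * i for i in range(n_items)]   # item information at the target ability
time = [1.0 + 0.05 * i for i in range(n_items)]   # minutes per item
stats_items = range(10, 20)                       # indices of items on statistics

prob = LpProblem("test_assembly", LpMaximize)
# One binary decision variable per item: selected for the test or not.
x = [LpVariable(f"x_{i}", cat="Binary") for i in range(n_items)]

prob += lpSum(info[i] * x[i] for i in range(n_items))        # maximize test information
prob += lpSum(x[i] for i in stats_items) <= 5                # at most 5 items on statistics
prob += x[11] + x[34] <= 1                                   # items 12 and 35 (0-based: 11, 34) clue each other
prob += lpSum(time[i] * x[i] for i in range(n_items)) <= 60  # 60 minutes available

prob.solve()
selected = [i + 1 for i in range(n_items) if x[i].value() > 0.5]
print("selected items:", selected)
```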

Equating of Examinations. Problem: the level of the students and the difficulty of the examinations fluctuate over the years. Objective: to determine pass/fail cut-off scores on examinations in such a way that they reflect the same level of proficiency on the latent scale, taking into account the difficulty level of the examinations and the differences in proficiency level across years.

Simple Deterministic Model. Important feature of the model: parameter separation, that is, distinct parameters for persons and items.

Model for Item with 5 response categories [Figure: category response curves for categories X = 0 through X = 4, probability of each response category plotted against the latent ability scale]

Multidimensional IRT model
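For reference, the standard compensatory multidimensional two-parameter logistic form is:

```latex
% Compensatory multidimensional 2PL: theta is a vector of Q latent abilities,
% a_i a vector of discrimination (loading) parameters, b_i a difficulty.
P(X_i = 1 \mid \boldsymbol{\theta}) =
  \frac{\exp\big(\mathbf{a}_i^{\top}\boldsymbol{\theta} - b_i\big)}
       {1 + \exp\big(\mathbf{a}_i^{\top}\boldsymbol{\theta} - b_i\big)}
```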

Anchor Item Equating Design

Problems with the Anchor Item Design. Student ability increases between test administrations due to learning. Differences in ability and item ordering between the anchor test and the examination arise from low student motivation on the anchor test. If the anchor test becomes known, the test functions differently over the years. All these effects violate the model and bias the estimated cut-off scores.

Equating Design Central Examinations, the Netherlands

Equating Design SweSAT

Measurement model: the Generalized Partial Credit Model (GPCM; Muraki). Alternatives to the GPCM: the Graded Response Model (Samejima) and the Sequential Model (Tutz).
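For reference, the GPCM gives the probability of a score in category x of item i in its standard parameterization (with the k = 0 term of the sums set to zero by convention):

```latex
% GPCM: a_i = discrimination of item i, b_ik = category threshold parameters,
% m_i = highest score category of item i, theta = ability.
P(X_i = x \mid \theta) =
  \frac{\exp\Big(\sum_{k=0}^{x} a_i(\theta - b_{ik})\Big)}
       {\sum_{y=0}^{m_i} \exp\Big(\sum_{k=0}^{y} a_i(\theta - b_{ik})\Big)}
```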

Structural Model. Takane and de Leeuw (1987): the model is equivalent to a factor analysis model; discrimination parameters are factor loadings, and ability parameters are factor scores.
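The equivalence can be written out for the normal-ogive version of the model, along the lines of Takane and de Leeuw (1987):

```latex
% Normal-ogive IRT model:
P(X_i = 1 \mid \theta) = \Phi(a_i \theta - b_i)
% ... is equivalent to a dichotomized linear factor model with factor
% loading a_i, factor score theta, and threshold b_i:
X_i^{*} = a_i \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,1), \qquad
X_i = 1 \iff X_i^{*} > b_i
```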

IRT structural modeling

Problems with “ordinary” regression and analysis of variance models. Different aggregation levels: school level and student level. Variance structure: students within schools are more similar than students from different schools. Old, unsatisfactory solutions: aggregating to the school level or disaggregating to the student level. Newer solution: multilevel models (Bryk & Raudenbush, Longford, Goldstein).

Motivation for this approach. All the niceties of IRT become available in multilevel analysis: a method to model unreliability in the dependent and independent variables; heteroscedasticity, since reliability is defined locally; incomplete test administration and calibration designs (with the possibility to include selection models); no assumption of normally distributed scores; fewer ceiling problems. A schematic formulation of the combined model is sketched below.
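A schematic formulation of such a combined model, with a Rasch measurement part and a two-level structural part; the symbols and covariate structure are illustrative, not the exact specification used in the talk:

```latex
% Multilevel IRT for pupil i in school j responding to item p.
% Measurement model (Rasch):
P(X_{pij} = 1 \mid \theta_{ij}) = \frac{\exp(\theta_{ij} - b_p)}{1 + \exp(\theta_{ij} - b_p)}
% Structural model, pupil level (x_ij: a pupil covariate, e.g. SES):
\theta_{ij} = \beta_{0j} + \beta_1 x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2)
% Structural model, school level (w_j: a school covariate, e.g. school climate):
\beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j}, \qquad u_{0j} \sim N(0, \tau^2)
```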

An Example (Shalabi, Fox, Glas, Bosker). 3384 grade-seven pupils in 119 schools in the West Bank; a mathematics test as the outcome; gender, SES, IQ, school leadership, and school climate as predictors.

Intra-class correlation. Model: y_ij = β₀ + u_j + e_ij, with school effect u_j ~ N(0, τ²) and pupil-level residual e_ij ~ N(0, σ²). Intra-class correlation: ρ = τ² / (τ² + σ²), the proportion of the total variance located at the school level. For example, τ² = 0.25 and σ² = 0.75 give ρ = 0.25.

Conclusions. IRT is based on the idea of parameter separation. An IRT measurement model can be combined with a structural model. The combined model is equivalent to factor analysis and latent variable models and, as such, a generalization of other well-known regression models. Applications of IRT: local reliability and optimal test construction; test equating; multilevel IRT in school effectiveness research.