Modern Test Theory: Item Response Theory (IRT)

Similar presentations
MEASUREMENT Goal To develop reliable and valid measures using state-of-the-art measurement models Members: Chang, Berdes, Gehlert, Gibbons, Schrauf, Weiss.

Implications and Extensions of Rasch Measurement.
What You Need to Know about the Computer Adaptive NREMT Exam.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 January 23, 2012.
What is a CAT?. Introduction COMPUTER ADAPTIVE TEST + performance task.
DIF Analysis Galina Larina of March, 2012 University of Ostrava.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
VALIDITY AND RELIABILITY
Part II Sigma Freud & Descriptive Statistics
Item Response Theory in Health Measurement
Introduction to Item Response Theory
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Latent Change in Discrete Data: Rasch Models
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Examing Rounding Rules in Angoff Type Standard Setting Methods Adam E. Wyse Mark D. Reckase.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Classical Test Theory By ____________________. What is CCT?
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Computerized Adaptive Testing: What is it and How Does it Work?
Intelligent System Lab. (iLab) Southern Taiwan University of Science and Technology 1 Estimation of Item Difficulty Index Based on Item Response Theory.
Measurement Problems within Assessment: Can Rasch Analysis help us? Mike Horton Bipin Bhakta Alan Tennant.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Item Response Theory Psych 818 DeShon. IRT ● Typically used for 0,1 data (yes, no; correct, incorrect) – Set of probabilistic models that… – Describes.
Item Response Theory for Survey Data Analysis EPSY 5245 Michael C. Rodriguez.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
Alternatives in Assessment. Assessment Options Popham, W. J. (1995). Classroom assessment : what teachers need to know. Boston: Allyn and Bacon.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
Technical Adequacy Session One Part Three.
Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs.
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
EAssessment Colin Milligan Heriot-Watt University.
Intelligent System Lab (iLab), Southern Taiwan University of Science and Technology 1 Evaluation for the Test Quality of Dynamic Question Generation by Particle Swarm Optimization for Adaptive Testing Department of.
The North Carolina Online Computer Skills Assessment: Relationships between Item Response Times and Item Residuals – or – The North Carolina Online Computer.
Introduction to Validity
1 An Investigation of The Response Time for Maths Items in A Computer Adaptive Test C. Wheadon & Q. He, CEM CENTRE, DURHAM UNIVERSITY, UK Chris Wheadon.
Assessment – basic concepts LETRAS – Pre-service education Diagnostic and Final Assessments Formative X Summative Assessment.
MEASUREMENT: SCALE DEVELOPMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.
NCLEX ® is a Computerized Adaptive Test (CAT) How Does It Work?
MEASUREMENT. MeasurementThe assignment of numbers to observed phenomena according to certain rules. Rules of CorrespondenceDefines measurement in a given.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
The ABC’s of Pattern Scoring
University of Ostrava Czech republic 26-31, March, 2012.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
Item Factor Analysis Item Response Theory Beaujean Chapter 6.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Item Response Theory in Health Measurement
FIT ANALYSIS IN RASCH MODEL University of Ostrava Czech republic 26-31, March, 2012.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Chapter 6 - Standardized Measurement and Assessment
Item Response Theory Dan Mungas, Ph.D. Department of Neurology
Item Response Theory Dan Mungas, Ph.D. Department of Neurology University of California, Davis.
Two Approaches to Estimation of Classification Accuracy Rate Under Item Response Theory Quinn N. Lathrop and Ying Cheng Assistant Professor Ph.D., University.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.
Item Response Theory and Computerized Adaptive Testing Hands-on Workshop, day 2 John Rust, Iva Cek,
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
The Invariance of the easyCBM® Mathematics Measures Across Educational Setting, Language, and Ethnic Groups Joseph F. Nese, Daniel Anderson, and Gerald.
Adopting The Item Response Theory in Operations Management Research
Classical Test Theory Margaret Wu.
Item Analysis: Classical and Beyond
Booklet Design and Equating
Aligned to Common Core State Standards
By ____________________
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Unidimensionality (U): What’s it good for in applied research?
Presentation transcript:

Modern Test Theory: Item Response Theory (IRT)

Limitations of classical test theory
An examinee’s ability is defined in terms of a particular test
The difficulty of a test item is defined in terms of a particular group of test-takers
In short, “examinee characteristics and test item characteristics cannot be separated: each can be interpreted only in the context of the other” (Hambleton et al., 1991, p. 3)
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.

Joe and the 8-item test
On a very easy test, all eight items fall below Joe’s ability and his score is 8; on a very hard test, all eight fall above it and his score is 0; on a narrow hard test his score is 3. Joe’s ability is the same in every case; only the items have changed.
Adapted from: Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago: MESA Press.

Non-linearity of scores
[Figure: Joe’s and Tom’s abilities are marked on the same scale as Items 1–8 in three arrangements, yielding scores of 0, 8, and 4. Examinees of different ability can earn identical raw scores when items are badly targeted, so raw scores are a non-linear function of ability.]

Latent trait and performance
[Figure: Under classical test theory, a latent variable (the true score) underlies the scores on Forms 1, 2, and 3, each with its own error term (Error 1–3). Under item response theory, the latent variable directly underlies the responses to Items 1, 2, and 3.]
Embretson, S. E. (1999). Issues in the measurement of cognitive abilities. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement (pp. 1–15). Mahwah, NJ: Lawrence Erlbaum Associates.

Item Response Theory (IRT)
The performance of an examinee on a test item can be predicted (explained) by latent traits. As a person’s level of the underlying trait increases, the probability of a correct response to an item increases. This relationship between person and item can be visualized by an item characteristic curve (ICC) (Hambleton et al., 1991).

Understanding Item Characteristic Curves
Imagine a continuum of vocabulary knowledge, running from common words (sleepy) through rarer synonyms (somnolent) to obscure ones (oscitant).
Thorndike, R. M. (1999). IRT and intelligence testing: Past, present, and future. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement. Mahwah, NJ: Lawrence Erlbaum Associates.

Understanding ICC (2)
[Figure: item characteristic curves, from Thorndike, 1999, p. 20]

Item Difficulty

Item Discrimination

3-Parameter Model
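To make the three preceding slides concrete, here is a minimal sketch (not part of the original presentation) of the three-parameter logistic (3PL) item characteristic curve, using the standard parameter names a (discrimination), b (difficulty), and c (pseudo-guessing). Setting c = 0 gives the 2-parameter model; additionally fixing a = 1 gives a 1-parameter (Rasch-type) model.

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee trait level
    a: discrimination (slope of the curve at its steepest point)
    b: difficulty (trait level where the curve rises fastest)
    c: pseudo-guessing (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Compare an easy item, a hard discriminating item, and the same
# hard item with a guessing floor, across a range of abilities:
for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(icc_3pl(theta, a=1.0, b=-1.0), 2),        # easy item
          round(icc_3pl(theta, a=1.5, b=1.0), 2),         # hard item
          round(icc_3pl(theta, a=1.5, b=1.0, c=0.2), 2))  # with guessing
```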

Vocabulary ICC revisited

Basic IRT concept
P(item passed) = f(trait level − item difficulty)
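In other words, what matters is the gap between the person and the item. Under the one-parameter (Rasch) instantiation of this idea (a standard result, not spelled out on the original slide), f is the logistic function:

P(X = 1 | θ, b) = exp(θ − b) / (1 + exp(θ − b))

so an examinee whose trait level exactly matches the item’s difficulty (θ = b) has a .50 chance of passing, and the probability climbs toward 1 as θ − b grows.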

Assumptions of IRT
Unidimensionality – only one ability is measured by the set of items on a test
Local independence – holding the trait level constant, an examinee’s responses to any two items are statistically independent
1-parameter model – no guessing, and item discrimination is the same for all items
2-parameter model – no guessing
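As a rough, commonly used check on the unidimensionality assumption (a sketch, not from the original slides; the data here are hypothetical): inspect the eigenvalues of the inter-item correlation matrix and look for a single dominant first eigenvalue.

```python
import numpy as np

# responses: rows = examinees, columns = items (0/1); hypothetical data
responses = np.random.binomial(1, 0.6, size=(500, 20))

# Eigenvalues of the inter-item correlation matrix (Pearson here;
# tetrachoric correlations would suit 0/1 data better).
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# A large ratio of the first to the second eigenvalue is consistent
# with one dominant dimension underlying the items.
print(eigenvalues[:5], eigenvalues[0] / eigenvalues[1])
```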

Advantages of IRT
Sample-free item calibration
Test-free person measurement
Item banking facility
Computer delivery of tests
Test tailoring facility
Score reporting facility
Item bias detection
Henning, G. (1987). A guide to language testing: Development, evaluation, research. Boston: Heinle & Heinle.

Linking items across test forms
As long as some items are common to both forms (linking items), ability estimates from performance on different items can be placed on the same scale.
[Figure: items common to Test A and Test B] (Henning, 1987, p. 133)
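A minimal sketch of the simplest common-item linking method, mean/mean linking (the item names and difficulty values are hypothetical, not from Henning): difficulties calibrated separately on two forms are put on one scale by shifting Form B by the average difference observed on the shared items.

```python
# Difficulty estimates for items common to Form A and Form B,
# each form calibrated separately (hypothetical values).
common_a = {"item3": -0.50, "item7": 0.20, "item9": 1.10}
common_b = {"item3": -0.20, "item7": 0.55, "item9": 1.35}

# Mean/mean linking: the average difficulty difference on the
# common items estimates the shift between the two scales.
shift = sum(common_a[i] - common_b[i] for i in common_a) / len(common_a)

def to_form_a_scale(b_difficulty):
    """Re-express a Form B difficulty on the Form A scale."""
    return b_difficulty + shift

print(round(shift, 3), round(to_form_a_scale(0.80), 3))
```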

Score reporting facility
(McNamara, 1996, p. 201)

Test tailoring facility
An untailored standardized test gives maximum information near its mean. Imagine that a university requires a score above 67 for admission and above 82 for exemption from language classes. A tailored test can be “loaded” with items that provide maximum information at those cut-points (see the sketch below).
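One way to see what “loading” a test with informative items means, sketched under the 2-parameter model (the cut scores would first have to be mapped from the raw-score scale onto the trait scale; the item pool below is hypothetical): the Fisher information a 2PL item contributes at trait level θ is a²·P(θ)·(1 − P(θ)), so a test builder picks items whose information peaks near the cut-points.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical item pool: (discrimination a, difficulty b)
pool = [(1.6, -0.4), (1.2, 0.9), (0.8, 1.1), (1.9, 1.0), (1.4, 2.0)]

# Rank items by the information they provide at a cut-point on the
# trait scale (theta = 1.0, standing in for the upper cut score).
cut = 1.0
ranked = sorted(pool, key=lambda item: info_2pl(cut, *item), reverse=True)
print(ranked[:3])
```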

Computerized testing
Computer-delivered tests
–Tests which use a computer rather than pencil and paper for test content delivery
–Items can take advantage of the computer’s multimedia capabilities
Computer-adaptive tests
–The test is created “on the fly” to match the examinee’s ability level (a minimal version of the selection loop is sketched below)
Web-based tests
–Delivered over the World Wide Web
–Test-takers can access them from anywhere
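A minimal sketch of the adaptive logic behind a computer-adaptive test (everything here, including the crude halving-step ability update, is a simplification; operational CATs use maximum-likelihood or Bayesian θ estimation): administer the unused item that is most informative at the current ability estimate, then nudge the estimate up or down according to the response.

```python
import math
import random

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def run_cat(pool, answer, n_items=8):
    """Crude CAT loop: max-information selection, halving-step update.

    pool: list of (a, b) item parameters (hypothetical values)
    answer: callback taking (a, b), returning True for a correct response
    """
    theta, step = 0.0, 1.0
    remaining = list(pool)
    for _ in range(min(n_items, len(remaining))):
        # Administer the unused item most informative at the current estimate.
        item = max(remaining, key=lambda it: info_2pl(theta, *it))
        remaining.remove(item)
        # Nudge the estimate toward the observed performance.
        theta += step if answer(*item) else -step
        step /= 2  # smaller corrections as evidence accumulates
    return theta

# Simulated examinee with true ability 1.2 on a hypothetical 13-item pool:
pool = [(1.0, b / 2.0) for b in range(-6, 7)]
estimate = run_cat(pool, lambda a, b: random.random() < p_2pl(1.2, a, b))
print(round(estimate, 2))
```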

Adaptive testing
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing (pp. 3–10). Washington, DC: American Psychological Association.

CAT advantages
Increased efficiency
–More able examinees are not bored with easy questions
–Less able examinees are not frustrated with questions far beyond their level
Immediate feedback is possible
Examinees can work at their own pace
Audiovisual material can be incorporated
Potential for “on demand” testing

CAT challenges
Technical sophistication required to develop and administer a CAT
Need for a large item pool
Overexposure of the best items
Ensuring consistency of measures and content across candidates
Public perception of computer-based scores
–as completely infallible
–as completely bogus