Item Response Theory

Similar presentations
Test Development.

Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
The measurement model: what does it mean and what you can do with it? Presented by Michael Nering, Ph. D.
Logistic Regression Psy 524 Ainsworth.
Item Response Theory in Health Measurement
Some terminology When the relation between variables are expressed in this manner, we call the relevant equation(s) mathematical models The intercept and.
Introduction to Item Response Theory
IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Objectives (BPS chapter 24)
Correlation 2 Computations, and the best fitting line.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Stat 217 – Day 26 Regression, cont.. Last Time – Two quantitative variables Graphical summary  Scatterplot: direction, form (linear?), strength Numerical.
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Today Concepts underlying inferential statistics
C82MCP Diploma Statistics School of Psychology University of Nottingham 1 Linear Regression and Linear Prediction Predicting the score on one variable.
Intro to Parametric Statistics, Assumptions & Degrees of Freedom Some terms we will need Normal Distributions Degrees of freedom Z-values of individual.
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Classical Test Theory By ____________________. What is CCT?
Relationships Among Variables
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Multivariate Methods EPSY 5245 Michael C. Rodriguez.
Computerized Adaptive Testing: What is it and How Does it Work?
Standard Error of the Mean
Issues in Experimental Design Reliability and ‘Error’
Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs.
Estimation of Statistical Parameters
The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.
Basic linear regression and multiple regression Psych Fraley.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
Statistics for the Social Sciences Psychology 340 Fall 2013 Correlation and Regression.
1 Psych 5510/6510 Chapter 10. Interactions and Polynomial Regression: Models with Products of Continuous Predictors Spring, 2009.
Tests and Measurements Intersession 2006.
Research Process Parts of the research study Parts of the research study Aim: purpose of the study Aim: purpose of the study Target population: group whose.
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
Central Tendency & Dispersion
The ABC’s of Pattern Scoring
University of Ostrava Czech republic 26-31, March, 2012.
Correlation & Regression Analysis
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 11: Models Marshall University Genomics Core Facility.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Item Response Theory in Health Measurement
FIT ANALYSIS IN RASCH MODEL University of Ostrava Czech republic 26-31, March, 2012.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Reliability a measure is reliable if it gives the same information every time it is used. reliability is assessed by a number – typically a correlation.
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
Item Response Theory Dan Mungas, Ph.D. Department of Neurology University of California, Davis.
Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 7: Regression.
Two Approaches to Estimation of Classification Accuracy Rate Under Item Response Theory Quinn N. Lathrop and Ying Cheng Assistant Professor Ph.D., University.
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
Chapter 16: Sample Size “See what kind of love the Father has given to us, that we should be called children of God; and so we are. The reason why the.
Item Analysis: Classical and Beyond
HLM with Educational Large-Scale Assessment Data: Restrictions on Inferences due to Limited Sample Sizes Sabine Meinck International Association.
Reliability & Validity
POSC 202A: Lecture Lecture: Substantive Significance, Relationship between Variables 1.
CHAPTER 3 Describing Relationships
By ____________________
Product moment correlation
Presentation transcript:

Item Response Theory

What’s wrong with the old approach?
Classical test theory
– Sample dependent
– Parallel test form issue when comparing examinee scores
Reliability
– No predictability
– “Error” is the same for everybody

So, what is IRT?
– A family of mathematical models that describe the interaction between examinees and test items
– Examinee performance can be predicted in terms of the underlying trait
– Provides a means for estimating scores for people and characteristics of items
– A common framework for describing people and items

Some Terminology
“Ability”
– We use this as a generic term to describe the “thing” that we are trying to measure
– The “thing” can be any old “thing,” and we need not concern ourselves with labeling the “thing,” but examples include:
Reading ability
Math performance
Depression

The ogive
A naturally occurring form that describes something about people. Used throughout science, engineering, and the social sciences. Also used in architecture, carpentry, photography, art, and so forth.

The ogive

The Item Characteristic Curve (ICC)
This function really does everything:
– Scales items & people onto a common metric
– Helps in standard setting
– Foundation of equating
– Gives scores some meaning in terms of student ability

The ICC
Any curve in a Cartesian coordinate system can be defined by a formula. The simplest formula for the ogive is the logistic function:
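The equation itself did not survive in this transcript; a standard one-parameter (Rasch-type) logistic form, consistent with the b and θ definitions on the next slide, is:

P_i(\theta_j) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}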

The ICC
where b_i is the item parameter and θ_j is the person parameter. The equation gives the probability of responding correctly to item i given the ability of person j.

b is the inflection point of the ICC (figure: item i with b_i = 0.125).

We can now use the item parameter to calculate p. Let’s assume we have a student with θ = 1.0 and our item i from the previous slide with b_i = 0.125. Then we can simply plug the numbers into our formula.

Using the item parameters to calculate p (figure: the ICC for item i evaluated at θ = 1.00).
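The resulting p value is not preserved in the transcript; under the one-parameter form sketched above (our assumption), the arithmetic with θ = 1.00 and b_i = 0.125 works out to roughly:

p = \frac{e^{1.00 - 0.125}}{1 + e^{1.00 - 0.125}} = \frac{e^{0.875}}{1 + e^{0.875}} \approx 0.71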

Wait a minute
What do you mean a student with an ability of 1.0?? Does an ability of 0.0 mean that a student has NO ability? What if my student has a reading ability estimate of -1.2?

The ability scale
Ability is on an arbitrary scale that just so happens to be centered around 0.0. We use arbitrary scales all the time:
– Fahrenheit
– Celsius
– Decibels
– DJIA

Scaled Scores
Although ability estimates are centered around zero, reported scores are not. However, scaled scores are typically a linear transformation of ability estimates. Example of a linear transformation:
– (Ability x Slope) + Intercept
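As a minimal sketch of that transformation (the slope of 10 and intercept of 50 below are illustrative placeholders, not values from the presentation):

```python
def to_scaled_score(theta, slope=10.0, intercept=50.0):
    """Convert an IRT ability estimate (theta) to a reporting scale.

    The slope and intercept are illustrative; operational programs choose
    them so the reported scale has the desired center, spread, and cut points.
    """
    return theta * slope + intercept


# theta = 0.0 (average ability) lands at the center of the reporting scale,
# and a negative theta such as -1.2 still becomes a positive reported score.
print(to_scaled_score(0.0))   # 50.0
print(to_scaled_score(-1.2))  # 38.0
```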

The need for scaled scores: about half the kids will have negative ability estimates.

The Two Scales of Measurement
Reporting Scale (Scaled Scores)
– Student/parent level report
– School/district report
– Cross year comparisons
– Performance level categorization
The Psychometric Scale (θ)
– IRT item and person parameters
– Equating
– Standard setting

Unfortunately, life can get a lot worse
Items vary from one another in a variety of ways:
– Difficulty
– Discrimination
– Guessing
– Item type (MC vs. CR)

Items can vary in terms of difficulty (figure: an easier item and a harder item plotted against the ability of a student).

Items can vary in terms of discrimination
Discrimination is reflected by the “pitch” in the ICC; thus, we allow the ICCs to vary in terms of their slope.

Good item discrimination (figure: two close ability levels produce a noticeable difference in p).

Poor item discrimination (figure: the same two ability levels produce a much smaller difference in p).

Guessing
This item’s ICC asymptotically approaches 0.25 at the low end of ability, reflecting the chance of answering correctly by guessing.
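Difficulty, discrimination, and guessing correspond to the three parameters of the three-parameter logistic (3PL) model. The equation is not shown in the transcript, but its standard form is:

P_i(\theta) = c_i + (1 - c_i)\,\frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}

where b_i is the difficulty (location), a_i the discrimination (slope), and c_i the lower asymptote (pseudo-guessing) parameter.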

Constructed Response Items

Items and people
They interact in a variety of ways. We can use IRT to show that a nice little S-shaped curve describes this interaction: as ability increases, the probability of a correct response increases.

Advantages of IRT
Because of the stochastic nature of IRT, there are many statistical principles we can take advantage of. A test is a sum of its parts.

The test characteristic curve
A test is made up of many items. The TCC can be used to summarize across all of our items: it is simply the summation of the ICCs along our ability continuum. For any ability level, we can use the TCC to estimate the overall test score for an examinee.
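In symbols (not shown on the slide), the expected total score on an n-item test at ability θ is just the sum of the item ICCs:

T(\theta) = \sum_{i=1}^{n} P_i(\theta)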

Several ICCs are on a test

The test characteristic curve

The test characteristic curve
From an observed test score (i.e., a student’s total test score) we can estimate ability. The TCC is used in standard setting to establish performance levels. The TCC can also be used to equate tests from one year to the next.

Estimating Ability (figure: a total score of 3 maps through the TCC to an ability of ≈ 0.175).
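A minimal sketch of that lookup, assuming a one-parameter model and made-up item difficulties (none of the numbers below come from the presentation): the observed total score is mapped to ability by numerically inverting the TCC.

```python
import math

def icc(theta, b):
    """One-parameter logistic ICC: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta, difficulties):
    """Test characteristic curve: expected total score at ability theta."""
    return sum(icc(theta, b) for b in difficulties)

def ability_from_score(total_score, difficulties, lo=-4.0, hi=4.0):
    """Find the theta whose expected score equals the observed total score,
    by bisection (the TCC is monotonically increasing in theta)."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < total_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Five illustrative items; with the presentation's real item parameters a
# total score of 3 would map to the slide's ability of roughly 0.175.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(round(ability_from_score(3.0, difficulties), 3))
```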

Psychometric “Information”
The amount that an item contributes to estimating ability. Items that are close to a person’s ability provide more information than items that are far away. An item is most informative around its point of inflection.
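Under the one-parameter model sketched earlier (our assumption; the slide does not show a formula), item information takes a particularly simple form, peaking where P_i(θ) = 0.5, i.e., at θ = b_i:

I_i(\theta) = P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr]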

Item Information (figure): the item is most informative here because this is where we can discriminate among nearby θ values.

Item Information (figure): the item is much less informative at points along θ where there is little slope in the ICC.

Test Information
Test information is the sum of item information. Tests are also most “informative” where the slope of the TCC is the greatest. Information (like everything else in IRT) is a function of ability. Test information really is test “precision.”
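In symbols, the test information function is simply:

I(\theta) = \sum_{i=1}^{n} I_i(\theta)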

Let’s start with a TCC

Information Functions
We can evaluate information at a given cut point (figure: the information function evaluated at the BP/P cut point).

Information and CTT
CTT has reliability and, of course, the famous α coefficient. IRT has the test information function:
– Test quality can be evaluated conditionally along the performance continuum
In IRT, information is, conveniently, reciprocally related to standard error.
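Specifically (a standard IRT result, not written out on the slide), the conditional standard error of the ability estimate is the inverse square root of test information:

SE(\theta) = \frac{1}{\sqrt{I(\theta)}}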

Standard Error as a function of ability (figure: the SE curve across θ, with a point marked where SE = 0.25).

Standard Error of Ability (figure: total score = 3, ability ≈ 0.175).

Standard Error of Ability (figure: total score = 3, ability ≈ 0.175, with a confidence region around the ability estimate).

Item Response Theory
A vast kingdom of equations and a dizzying array of complex concepts. Ultimately, we use IRT to explain the interaction between students and test items. The cornerstone of IRT is the ICC, which depicts that as ability increases, the chance of getting an item correct increases.

Item Response Theory
Everything in IRT can be studied conditionally along the performance continuum. The CTT concept of reliability corresponds to what we call test information, and we can think of this as a measure of test precision. SE is related to information and can also be studied along θ.

The Utility of Item Response Theory
– Can be used to estimate characteristics of items and people
– Can be used in the test development process to maximize information (minimize SE) at critical points along θ
– Can even be used for test administration purposes