
Item Response Theory for Survey Data Analysis
EPSY 5245
Michael C. Rodriguez

Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum.

Reeve, B. B., & Mâsse, L. C. (2004). Item response theory modeling for questionnaire evaluation.

ADDITIONAL READINGS & ONLINE RESOURCES
IRT Resources Online: Introduction to IRT by Zickar (online).
Hambleton, R. K., & Jones, R. W. (1993). An NCME instructional module on comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3).

One Approach – Construct Map
- Construct Definition (measuring a trait): a simple form – more or less, high to low
- Item Development: realizations of the construct
- Outcome Space: the aspect of the response we value – how to score
- Measurement Model: how we relate scores to constructs

From Construct to Item Responses
[Diagram: Construct → Item Responses → Outcome Space → Measurement Model, with causality running from the construct to the responses and inferences running back. Source: Mark Wilson, 2005]

Background to IRT
IRT is a way of thinking about measurement: a probabilistic model. We give an item or task to a person and observe an item-person interaction. The result is a scored response whose probability depends on the person's ability.
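To make the idea concrete, here is a minimal Python sketch (with invented ability and difficulty values) of the simplest such model, the one-parameter Rasch item response function:

import math

def rasch_probability(theta, b):
    # Probability of endorsing (or answering correctly) under the Rasch
    # model: it depends only on the gap between person ability (theta)
    # and item difficulty (b), both expressed in logits.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(rasch_probability(theta=1.0, b=0.0))  # person 1 logit above the item: ~0.73
print(rasch_probability(theta=0.5, b=0.5))  # ability equals difficulty: 0.50

When ability equals difficulty the probability is exactly .50, which is what lets items and persons share one scale.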

Ability scores are more fundamental because they are test independent. Examinees come to a test administration with trait levels in relation to the construct being measured – not necessarily in relation to the test being administered.

Each child has a trait score that is defined in relation to the construct at the time of an assessment, and this remains invariant over samples of assessment tasks. Their trait score is not a function of what tasks they perform – their performance on the tasks is a function of their ability.

Measurement Models: IRT v. Rasch
Most IRT models are based on a paradigm that identifies a model to explain variation in the data – the goal is to find the model that best characterizes the data. Rasch is an approach based on the paradigm of constructing a measure that characterizes a construct on a linear scale – such that the total score fully characterizes a person on that construct.
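One consequence of the Rasch paradigm can be shown directly. In the sketch below (invented item parameters), two response patterns with the same total score yield the same maximum-likelihood ability estimate under the Rasch model, but not under a two-parameter (2PL) model, where items carry different discrimination weights:

import numpy as np

def pattern_loglik(theta, responses, b, a=None):
    # Log-likelihood of a 0/1 response pattern at ability theta;
    # a=None gives the Rasch model (all item slopes equal to 1).
    b = np.asarray(b, dtype=float)
    a = np.ones_like(b) if a is None else np.asarray(a, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    r = np.asarray(responses)
    return float(np.sum(r * np.log(p) + (1 - r) * np.log(1 - p)))

grid = np.linspace(-4, 4, 801)
b = [-1.0, 0.0, 1.0]                 # invented difficulties
a = [0.5, 1.0, 2.0]                  # invented discriminations (2PL only)
pat1, pat2 = [1, 1, 0], [0, 1, 1]    # same total score of 2

for label, slopes in (("Rasch", None), ("2PL", a)):
    mles = [grid[np.argmax([pattern_loglik(t, pat, b, slopes) for t in grid])]
            for pat in (pat1, pat2)]
    print(label, mles)  # Rasch: identical estimates; 2PL: different ones

This is the sense in which, under Rasch, the total score fully characterizes a person.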

Rasch Philosophy
Rasch models provide a basis and justification for obtaining person locations on a continuum from total scores on assessments. Although it is not uncommon to treat total scores directly as measurements, they are actually counts of discrete observations rather than measurements.

Each observation represents the observable outcome of a comparison between a person and an item. Such outcomes are directly analogous to observing the rotation of a balance scale in one direction or the other. That observation indicates that one object has greater mass than the other, but counts of such observations cannot be treated directly as measurements.

Item Characteristic Curve
[Figure: item characteristic curve]

Test Characteristic Curve
[Figures: test characteristic curves illustrating that the raw-to-logit mapping is nonlinear – the same 4-point difference on the raw score scale corresponds to 0.5 logits in one region of the Rasch scale and 1.2 logits in another]
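The nonlinearity those curves display can be reproduced with a small sketch (invented item difficulties): the test characteristic curve gives the expected raw score at each ability, and inverting it shows that equal raw-score gaps cover unequal stretches of the logit scale.

import numpy as np

def expected_raw_score(theta, b):
    # Test characteristic curve: the expected raw score at ability theta
    # is the sum of the Rasch item probabilities.
    return float(np.sum(1.0 / (1.0 + np.exp(-(theta - np.asarray(b))))))

def theta_for_raw(target, b, lo=-6.0, hi=6.0):
    # Invert the TCC by bisection; it is strictly increasing in theta.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, b) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

b = np.linspace(-2.5, 2.5, 40)  # invented 40-item test
middle = theta_for_raw(24, b) - theta_for_raw(20, b)   # 4 raw points mid-scale
extreme = theta_for_raw(38, b) - theta_for_raw(34, b)  # 4 raw points near the top
print(round(middle, 2), round(extreme, 2))  # the same raw gap spans more
                                            # logits toward the extreme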

From Numbers to Meaning
Numbers themselves do not mean much. Is 10 meters a short distance? A long distance? We need context to bring meaning to the measure of 10 meters. However, 10 meters should always be 10 meters, no matter who takes the measure or how it is taken.

Sample Dependent Statistics
Is an item with a p-value of .90 easy or difficult? … 90% of this sample passed the item.
Is a person with a score of 5 out of 50 items low in ability? … they correctly answered 10% of these items.

IRT Scaling
- Person-free item difficulty: locates the items on the ability continuum
- Item-free person ability: locates the persons on the ability continuum
- Places items and persons on the same scale – the ITEM MAP (toy sketch below)
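A toy sketch of that shared scale (all logit values invented): print the person distribution and the item locations against one logit axis, in the spirit of the item map on the next slide.

import numpy as np

person_thetas = np.array([-1.6, -0.9, -0.4, 0.1, 0.3, 0.8, 1.5])  # invented
item_locations = {"Q1": -1.2, "Q2": -0.3, "Q3": 0.5, "Q4": 1.1}   # invented

# One row per half-logit bin: persons on the left ('#' per person),
# item labels on the right.
for top in np.arange(2.0, -2.5, -0.5):
    lo = top - 0.5
    persons = "#" * int(np.sum((person_thetas > lo) & (person_thetas <= top)))
    items = " ".join(name for name, b in item_locations.items() if lo < b <= top)
    print(f"{top:5.1f} | {persons:<8} | {items}")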

Item Map

Construct Map
1. Explains the construct; serves as an interpretation guide.
2. Enables design of items that lead individuals to give responses informing important levels of the construct map; identifies relevant item features.
3. Provides a criterion for analyzing responses for their degree of consistency with the intended construct.
4. Item selection or retention should be based on informed professional judgment.

Construct Map Describing Task Characteristics

IRT Assumptions
- Local independence: responses to different items on the test are independent of one another, conditional on the trait they have in common (i.e., conditional on the latent trait, items are uncorrelated); see the sketch after this list.
- Unidimensionality: only one dominant trait is being measured. (Multidimensional models now exist.)
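Local independence is what makes the likelihood tractable. A minimal sketch with invented parameters: conditional on theta, the probability of an entire response pattern factors into a product of item-level probabilities.

import numpy as np

def pattern_probability(theta, b, pattern):
    # Under local independence, the likelihood of a full 0/1 response
    # pattern given theta is the product of the item-level Rasch
    # probabilities (p for a 1, 1 - p for a 0).
    p = 1.0 / (1.0 + np.exp(-(theta - np.asarray(b))))
    pattern = np.asarray(pattern)
    return float(np.prod(np.where(pattern == 1, p, 1.0 - p)))

# Invented example: three items, a person at theta = 0.5
print(pattern_probability(0.5, b=[-1.0, 0.0, 1.0], pattern=[1, 1, 0]))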

Validity
Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.
Standards for Educational & Psychological Testing (AERA, APA, NCME, 1999)

Validation
Validation is the process of gathering evidence to achieve these goals, including evidence related to the
- construct,
- content,
- response processes,
- internal structure,
- relations to other variables, and
- the consequential bases of validity.

Evidence
In all cases, the most important sources of validity evidence are those most closely related to the nature of the inferences we draw regarding scores. We can begin to secure evidence to support our intended inferences in the design of any test.

OTL: Uses of Assessment

&INST
TITLE = OTL Math Education Assessment Uses
DATA = merged.dat        ; response data file
NAME1 = 1                ; column where the person label starts
NAMELEN = 29             ; length of the person label
ITEM1 = 30               ; column of the first item response
NI = 136                 ; number of items
IDFILE = *               ; inline item deletion list (empty here)
*
CODES = 1234             ; valid response codes
MODELS = R               ; rating scale model
GROUPS = 0               ; each item has its own rating scale structure
&END

Response Data

Run Analysis…

Beliefs: Quality of Instruction

&INST
TITLE = Program as a whole: program effectiveness
DATA = merged2.dat       ; response data file
NAME1 = 1                ; column where the person label starts
NAMELEN = 16             ; length of the person label
ITEM1 = 17               ; column of the first item response
NI = 53                  ; number of items
IDFILE = *               ; inline item deletion list (empty here)
*
CODES =                  ; valid response codes (value not captured in the transcript)
MODELS = R               ; rating scale model
GROUPS = 0               ; each item has its own rating scale structure
&END

Response Data

Run Analysis…

Evaluate the Functioning of Scale Properties
We can ask questions about the response scale: Does the 5-point scale work as interpreted? Do we need 5 points?
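One way to probe those questions, sketched below with invented threshold values: under the Rasch (Andrich) rating scale model, each category should be the most probable response somewhere along the trait continuum; a category that never peaks suggests the scale offers more points than respondents actually use.

import numpy as np

def category_probabilities(theta, b, taus):
    # Rasch (Andrich) rating scale model: probability of each response
    # category 0..m for one item, given item location b and the shared
    # rating-scale thresholds tau_1..tau_m.
    exponents = [0.0]  # category 0 carries an empty sum
    for tau in taus:
        exponents.append(exponents[-1] + (theta - b - tau))
    numerators = np.exp(exponents)
    return numerators / numerators.sum()

# Invented thresholds for a 5-point item: scan the continuum and record
# which category is modal at each point.
taus = [-1.8, -0.6, 0.4, 1.7]
modal = {int(np.argmax(category_probabilities(t, 0.0, taus)))
         for t in np.linspace(-4, 4, 161)}
print(sorted(modal))  # all of 0..4 appear, so every category peaks somewhere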

Minnesota Student Survey: Mental Distress

Mental Distress Item
During the last 30 days, have you felt you were under any stress or pressure?
o Yes, almost more than I could take
o Yes, quite a bit of pressure
o Yes, more than usual
o Yes, a little
o No


Mental Distress Item
During the last 30 days, have you felt sad?
o All the time
o Most of the time
o Some of the time
o A little of the time
o None of the time


Item Map – Thresholds
[Winsteps item map on the Rasch logit scale, showing the person distribution alongside item threshold locations (e.g., U49hr.45, U49br.45, U50r.45, U51r.35, U52r.35, U53r.35)]

Other Possible Analyses
- Differential Item Functioning: group differences, conditioned on trait level; a form of measurement invariance (item bias) – the conditioning idea is sketched below
- Equating over time: keep the score scale location constant over time by fixing item parameters at their established locations; examine parameter drift over time
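The conditioning idea behind DIF can be sketched with made-up data (this illustrates the logic, not the presentation's specific procedure): stratify examinees by raw total score as a proxy for trait level, then compare the two groups' success rates on the studied item within each stratum. Persistent within-stratum gaps suggest DIF.

import numpy as np

def dif_by_stratum(item_scores, total_scores, group):
    # Within each raw-total-score stratum, compare the proportion
    # endorsing the studied item in group 0 vs. group 1.
    for s in np.unique(total_scores):
        in_stratum = total_scores == s
        rates = []
        for g in (0, 1):
            mask = in_stratum & (group == g)
            rates.append(item_scores[mask].mean() if mask.any() else float("nan"))
        print(f"total={s}: group0={rates[0]:.2f}  group1={rates[1]:.2f}")

# Made-up example: 8 examinees, one studied item
item_scores  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
total_scores = np.array([5, 5, 5, 8, 8, 5, 8, 8])
group        = np.array([0, 1, 0, 0, 1, 1, 0, 1])
dif_by_stratum(item_scores, total_scores, group)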

Returning to TIMSS Liking Math…
[Two response distributions with nearly identical means but different Rasch locations: Mean = 2.45, Rasch = 0.40 versus Mean = 2.46, Rasch = 0.60]

Ordering of Items by…

By item mean               By Rasch location
Math is important          Math is important
Enjoy learning math        Enjoy learning math
Like math                  Like math
Math is easy               Math is boring
Like a job involving math  Math is easy
Math is boring             Like a job involving math

[Item map of category thresholds for the Easy, Boring, and Job items on the Rasch logit scale: the .35 thresholds sit highest (Easy, Boring, Job), the .25 thresholds in the middle (Boring, Easy, Job), and the .15 thresholds lowest (Job, Boring, Easy), so the items' relative order changes across threshold levels]