Introduction to IRT/Rasch Measurement with Winsteps Ken Conrad, University of Illinois at Chicago Barth Riley and Michael Dennis, Chestnut Health Systems.

Agenda
12:30 Ken Conrad: PowerPoint presentation comparing classical test theory to Rasch, including history and an introduction to the Rasch model.
2:15 Break.
2:30 Discussion of an application of Rasch analysis to the measurement of posttraumatic stress disorder, with interpretation of Rasch/Winsteps output.
3:15 Barth Riley: Implications and extensions of Rasch measurement.
4:15 Break.
4:30 Mike Dennis: Practical applications of IRT/Rasch in SUD screening and outcome assessment.
5:15 Open discussion and Q&A.
5:30 End of workshop.

The Dream of Rulers of Human Functioning
Beyond organ function to human function (WHO, 1947), e.g., quality of life: we need to ask the person.
1970s: physical, social, and mental health issues.
Measuring many constructs requires many items: time, money, and respondent burden.
Today: the need for psychometric efficiency without loss of reliability and construct validity.

The Prevailing Paradigm: Classical Test Theory
CTT: more items for more reliability.
Since we seek efficiency (fewer items), items tend to cluster where most of the people are: around the mean.
Result: redundancy at mid-range, few items at the extremes, and ceiling and floor effects.
It becomes impossible to measure improvement for those at the ceiling or decline for those at the floor.

How Children Measure Wooden Rods (from Piaget)
Classification: separate the rods from the cups, the balls, etc. (nominal).
Seriation: line them up by size (ordinal).
Iteration: develop a unit to know how much bigger one rod is (interval).
Standardization: make a rule(r) and a process for determining how many units each rod has.
Children know that classification and seriation are not measurement; Stevens did not (nominal, ordinal, interval, ratio).

The Improvement: IRT/Rasch Measurement and Computers
The Rasch measurement model enables construction of a ruler with as many items as we want at any level of the construct.
The computer enables choice of items based on each person's pattern of responses.
Each test is tailored to the individual, and not all of the items are needed.
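The "ruler" here is the dichotomous Rasch model, which puts person ability and item difficulty on one logit scale. A minimal sketch (the difficulty values in the example are illustrative, not from any real instrument):

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability of a correct (or endorsed)
    response for a person of ability theta on an item of difficulty b,
    both expressed on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability is exactly 0.5.
print(rasch_p(0.0, 0.0))                          # 0.5
# Easier items (lower b) are more likely to be answered correctly.
print(rasch_p(0.0, -1.0) > rasch_p(0.0, 1.0))     # True
```

Because ability and difficulty share one interval scale, a person one logit above an item always has the same success probability (about 0.73), no matter which sample they came from.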

Classical Test Theory
A measure is a sample of items from an infinite domain of items that represent the attribute of interest.
Items are treated as replicates of one another, in the sense that differences among the items are ignored in scaling.
More items = more reliability.
Everyone gets the same items.
Answers are needed to all items.

Ranking Is Sample Dependent
E.g., NBA players vs. jockeys: height could be rated on the same 1-5 ordinal metric, where both a jockey and an NBA player could be rated 5, but this could only be interpreted with reference to a particular sample.
The sample defines height. With interval scaling, height defines the sample: over 6 feet = NBA, under 6 feet = jockey.

Classical Test Theory
Uses ordinal data as interval. Using presumably impermissible transformations, i.e., treating ordinal as interval, usually makes little, if any, difference to the results of most analyses.
Thus, if it behaves like an interval scale, it can be treated as one.
Just use the raw scores. Add 'em up. Clean and easy.

CTT Assumption: All Items Are Created Equal
But we know that is not true. Is that how we measure potatoes? How about spelling?
Items actually range from easy to hard, like addition to division.
E.g., a Guttman-style response pattern can reveal:
Lack of recent practice on item 5:
Educated guess on item 8:
Slow, nervous start:

No Difficulty Parameter in CTT
What if two students both got 5 out of 10 correct, but one (Peter) got the 5 easiest right and the other (Paul) got the 5 hardest?
Do they have the same ability? Wouldn't you like to get a better idea of what happened on Paul's test? Did he arrive late? Were test pages missing? Maybe they were word problems, and Paul is a foreign student.
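Under the Rasch model, Peter and Paul would receive the same ability estimate (the raw score is a sufficient statistic), but person-fit statistics flag Paul's improbable pattern. A minimal sketch of the outfit mean-square, using hypothetical item difficulties and a representative ability value:

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit_msq(theta, difficulties, responses):
    """Outfit mean-square: the average squared standardized residual.
    Values near 1 fit the Rasch model; large values flag aberrant patterns."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        total += (x - p) ** 2 / (p * (1 - p))
    return total / len(difficulties)

# Ten items ordered easy -> hard (hypothetical difficulties in logits).
diffs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
peter = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # got the five easiest right
paul  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # got the five hardest right
theta = 0.25  # same raw score, hence the same ability estimate for both

print(outfit_msq(theta, diffs, peter))   # well below 1: highly predictable pattern
print(outfit_msq(theta, diffs, paul))    # well above 1: misfitting, improbable pattern
```

CTT stops at the raw score of 5; the fit statistic is what tells us Paul's test deserves a closer look.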

With CTT, it is extremely difficult to compare a person's scores on two or more different tests; we usually compare z-scores.
This assumes that the samples for both tests center on the same mean.
It also assumes that all of the tests are normally distributed, which is rarely the case.

Assumptions of CTT
CTT = take the whole test, e.g., SD, D, A, or SA on 50 items. What if there is missing data?
CTT uses ordinal scaling but assumes equal intervals in the rating scale. However, we know that the distances between scale points usually are not equal, e.g.:
"The President is doing a good job." SD D A SA
To WWII veterans: "Do you wear fashionable shoes?" N SD D A SA
CTT gives us very limited ability to examine the performance of our rating scales. Do they really work the way we want them to?

Cronbach's Alpha
Adding items improves alpha, but are they good items? Ceiling and floor effects improve alpha.
CTT assumes homoscedasticity: that the error of measurement is the same at the high end of the scale as in the middle or at the low end.
However, ordinal measures are biased, especially at the extremes, where there is much more error.

From Counting to Measuring
E.g., from counting potatoes to measuring their quality.
From counting the number of drinks to measuring substance use disorders.
From summing Likert ratings to linear, interval measurement.