Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient-Reported Outcome Item Banks

Similar presentations
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.

Differential Item Functioning of the English- and Spanish-Administered HINTS Psychological Distress Scale Chih-Hung Chang, Ph.D. Feinberg School of Medicine.
Reliability Definition: The stability or consistency of a test. Assumption: True score = obtained score +/- error Domain Sampling Model Item Domain Test.
Item Response Theory in Health Measurement
How does PROMIS compare to other HRQOL measures? Ron D. Hays, Ph.D. UCLA, Los Angeles, CA Symposium 1077: Measurement Tools to Enhance Health- Related.
Remaining Challenges and What to Do Next: Undiscovered Areas Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research
Introduction to Item Response Theory
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Why Patient-Reported Outcomes Are Important: Growing Implications and Applications for Rheumatologists Ron D. Hays, Ph.D. UCLA Department of Medicine RAND.
1 Evaluating Multi-Item Scales Using Item-Scale Correlations and Confirmatory Factor Analysis Ron D. Hays, Ph.D. RCMAR.
Item Response Theory in Patient-Reported Outcomes Research Ron D. Hays, Ph.D., UCLA Society of General Internal Medicine California-Hawaii Regional Meeting.
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1 Chapter 6 The Normal Distribution and Other Continuous Distributions.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
1 Health-Related Quality of Life as an Indicator of Quality of Care Ron D. Hays, Ph.D. HS216—Quality Assessment: Making the Business Case.
Modern Test Theory Item Response Theory (IRT). Limitations of classical test theory An examinee’s ability is defined in terms of a particular test The.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
The emotional distress of children with cancer in China: An Item Response Analysis of C-Ped-PROMIS Anxiety and Depression Short Forms Yanyan Liu 1, Changrong.
Introduction Neuropsychological Symptoms Scale The Neuropsychological Symptoms Scale (NSS; Dean, 2010) was designed for use in the clinical interview to.
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
SAS PROC IRT July 20, 2015 RCMAR/EXPORT Methods Seminar 3-4pm Acknowledgements: - Karen L. Spritzer - NCI (1U2-CCA )
Tests and Measurements Intersession 2006.
Evaluating Health-Related Quality of Life Measures June 12, 2014 (1:00 – 2:00 PDT) Kaiser Methods Webinar Series Ron D.Hays, Ph.D.
Quick Review Central tendency: Mean, Median, Mode Shape: Normal, Skewed, Modality Variability: Standard Deviation, Variance.
MEASUREMENT: SCALE DEVELOPMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.
Measurement Issues by Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research (June 30, 2008, 4:55-5:35pm) Diabetes & Obesity.
1 11/17/2015 Psychometric Modeling and Calibration Ron D. Hays, Ph.D. September 11, 2006 PROMIS development process session 10:45am-12:30 pm.
CHAPTER OVERVIEW The Measurement Process Levels of Measurement Reliability and Validity: Why They Are Very, Very Important A Conceptual Definition of Reliability.
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Item Response Theory (IRT) Models for Questionnaire Evaluation: Response to Reeve Ron D. Hays October 22, 2009, ~3:45-4:05pm
Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 6, 2008 (3:30-6:30pm) HS 249F.
Multi-item Scale Evaluation Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
© 2008 Pearson Addison-Wesley. All rights reserved Chapter 6 Putting Statistics to Work.
Copyright © 2005 Pearson Education, Inc. Slide 6-1.
Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.
Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 8, 2006 (3:00-6:00pm) HS 249F.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Item Response Theory in Health Measurement
Overlap between Subjective Well-being and Health-related Quality of Life. 3 Ron D. Hays, Ph.D. (Alina Palimaru) November 18, 2015 (11:30-12:00 noon) Geriatric.
Considerations in Comparing Groups of People with PROs Ron D. Hays, Ph.D. UCLA Department of Medicine May 6, 2008, 3:45-5:00pm ISPOR, Toronto, Canada.
Warsaw Summer School 2015, OSU Study Abroad Program Normal Distribution.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire.
Impact of using a next button in a web-based health survey on time to complete and reliability of measurement Ron D. Hays (Rita Bode, Nan Rothrock, William.
Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research HS237A 11/10/09, 2-4pm,
© 2009 Pearson Prentice Hall, Salkind. Chapter 5 Measurement, Reliability and Validity.
Evaluating Patient-Reports about Health
Looking at the both ‘ends’ of the social aptitude dimension
Psychometric Evaluation of Items Ron D. Hays
Vertical Scaling in Value-Added Models for Student Learning
UCLA Department of Medicine
Evaluating Patient-Reports about Health
UCLA Department of Medicine
Evaluating Multi-Item Scales
Evaluating IRT Assumptions
Final Session of Summer: Psychometric Evaluation
Evaluating Multi-item Scales
Multitrait Scaling and IRT: Part I
Evaluating Multi-item Scales
Item Response Theory Applications in Health Ron D. Hays, Discussant
Psychometric testing and validation (Multi-trait scaling and IRT)
Presentation transcript:

Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient-Reported Outcome Item Banks San Diego Convention Center (Room 14-A)

Latent Trait and Item Responses [Path diagram: a latent trait drives the response probabilities for each item. Items 1 and 2 are dichotomous, with probabilities P(X1 = 1) and P(X1 = 0), and P(X2 = 1) and P(X2 = 0); Item 3 is polytomous, with probabilities P(X3 = 0), P(X3 = 1), and P(X3 = 2).]

Item Responses and Trait Levels [Figure: Items 1-3 and Persons 1-3 located along the trait continuum.]

Item Response Theory (IRT) IRT models the relationship between a person's response Y_i to item i and his or her level of the latent construct θ being measured. The threshold parameter b_ik estimates how difficult it is for item i to receive a score of k or higher, and the discrimination parameter a_i estimates the item's discriminatory power. If, for one group versus another at the same level of θ, we observe systematically different probabilities of scoring k or above, then item i displays differential item functioning (DIF).
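One common parameterization consistent with this description (a reference sketch; the slides do not state the formula explicitly) is Samejima's graded response model, in which each boundary probability is a two-parameter logistic function of θ:

\[
P(X_i \ge k \mid \theta) = \frac{1}{1 + \exp\left[-a_i(\theta - b_{ik})\right]},
\qquad
P(X_i = k \mid \theta) = P(X_i \ge k \mid \theta) - P(X_i \ge k+1 \mid \theta).
\]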

Some Nice IRT Features
– Category response curves (CRCs)
– Computer-adaptive testing (CAT)
– Assessing differential item functioning (DIF)

Posttraumatic Growth Inventory Indicate for each of the statements below the degree to which this change occurred in your life as a result of your crisis. (Appreciating each day)
(0) I did not experience this change as a result of my crisis
(1) I experienced this change to a very small degree as a result of my crisis
(2) I experienced this change to a small degree as a result of my crisis
(3) I experienced this change to a moderate degree as a result of my crisis
(4) I experienced this change to a great degree as a result of my crisis
(5) I experienced this change to a very great degree as a result of my crisis

Category Response Curves [Figure: category response curves for "Appreciating each day," with curves for each response option (no change, very small, small, moderate, great, and very great change) plotted across the trait continuum from No Change to Great Change.]
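Curves like these can be computed directly from estimated item parameters. Below is a minimal Python sketch under the graded response model; the discrimination and threshold values are hypothetical, not the estimates behind the slide.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under the graded response model.
    a = discrimination; b = ordered thresholds (one fewer than the categories)."""
    b = np.asarray(b, dtype=float)
    p_ge = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k) for k = 1..K-1
    p_ge = np.concatenate(([1.0], p_ge, [0.0]))     # add P(X >= 0) = 1 and P(X >= K) = 0
    return p_ge[:-1] - p_ge[1:]                     # P(X = k) for k = 0..K-1

# Hypothetical parameters for a 6-category item such as "Appreciating each day"
theta_grid = np.linspace(-3, 3, 121)
a, b = 1.8, [-1.5, -0.8, 0.0, 0.9, 1.8]
crcs = np.array([grm_category_probs(t, a, b) for t in theta_grid])
# crcs[:, k] traces the category response curve for category k; if a middle
# category is never the most probable at any theta, it is a candidate to drop.
```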

Drop Response Options? Indicate for each of the statements below the degree to which this change occurred in your life as a result of your crisis. (Appreciating each day)
(0) I did not experience this change as a result of my crisis
(1) I experienced this change to a moderate degree as a result of my crisis
(2) I experienced this change to a great degree as a result of my crisis
(3) I experienced this change to a very great degree as a result of my crisis

Reword? It might be challenging to determine alternative wording that would make the replacement categories more likely to be endorsed.

Keep as is? CAHPS global rating items
– 0 = worst possible; 10 = best possible
– The 11 response categories capture about 3 levels of information (10 / 9 / 8-0 or 10-9 / 8 / 7-0).
– The scale is administered as is and then collapsed in analysis.
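For analysis, the collapsing step can be as simple as a recode. The sketch below uses one of the groupings mentioned on the slide (10 / 9 / 8-0); the cut points are illustrative and easy to swap.

```python
def collapse_cahps(rating, cuts=(9, 10)):
    """Recode a 0-10 CAHPS global rating into three analytic levels.
    With the default cuts this yields 0-8, 9, and 10 (illustrative only)."""
    if rating >= cuts[1]:
        return 2   # top level (10)
    if rating >= cuts[0]:
        return 1   # middle level (9)
    return 0       # bottom level (0-8)

collapsed = [collapse_cahps(r) for r in [10, 9, 8, 5, 0]]   # -> [2, 1, 0, 0, 0]
```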

Response Burden vs. Standard Error (SE)
– Rule of thumb for paper surveys: 3-5 items per minute (about 8 items per minute for dichotomous items).
– Lowering the SE means adding items, or replacing existing items with more informative ones, at the target range of the continuum.

Computer Administration
Polimetrix panel sample
– items per minute (automatic advance)
– 8-9 items per minute (next button)
Scleroderma patients at UCLA
– 6 items per minute

CAT
– Only as much response burden as needed for the target level of reliability.
– For z-scores (mean = 0 and SD = 1):
– Reliability = 1 – SE² = 0.90 (when SE = 0.32)
– Information = 1/SE² ≈ 10 (when SE = 0.32)
– Reliability = 1 – 1/information
CATs for patient-reported outcomes yield 0.90 reliability with about 5 items.
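These identities are easy to verify numerically. A small Python check mirroring the numbers on the slide:

```python
def reliability_from_se(se):
    """Reliability of an IRT score on the z-score metric: 1 - SE^2."""
    return 1.0 - se ** 2

def information_from_se(se):
    """Test information is the reciprocal of the error variance: 1 / SE^2."""
    return 1.0 / se ** 2

se = 0.32
print(reliability_from_se(se))   # 0.8976 -> about 0.90
print(information_from_se(se))   # 9.77   -> about 10
```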

Differential Item Functioning (DIF)
The probability of choosing each response category should be the same for people who have the same estimated scale score, regardless of their other characteristics.
Evaluation of DIF:
– Different subgroups
– Mode differences

DIF (2-parameter model) [Figure: item response curves for White and African American (AA) respondents, illustrating location DIF and slope DIF. Location DIF = uniform DIF; slope DIF = non-uniform DIF.]
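One common screen for the two kinds of DIF shown here (not necessarily the procedure used for these slides) is logistic regression with nested models: a group main effect tests location (uniform) DIF, and a score-by-group interaction tests slope (non-uniform) DIF. A minimal Python sketch, dichotomizing the item for simplicity and using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif_test(item, theta, group):
    """Likelihood-ratio DIF screen for one item.
    item: item responses; theta: IRT scale scores; group: 0/1 indicator.
    Returns chi-square statistics (1 df each) for uniform and non-uniform DIF."""
    theta = np.asarray(theta, dtype=float)
    group = np.asarray(group, dtype=float)
    y = (np.asarray(item) >= 1).astype(int)          # dichotomize for this sketch
    base = sm.Logit(y, sm.add_constant(np.column_stack([theta]))).fit(disp=0)
    unif = sm.Logit(y, sm.add_constant(np.column_stack([theta, group]))).fit(disp=0)
    nonu = sm.Logit(y, sm.add_constant(
        np.column_stack([theta, group, theta * group]))).fit(disp=0)
    lr_uniform = 2 * (unif.llf - base.llf)            # group main effect
    lr_nonuniform = 2 * (nonu.llf - unif.llf)         # theta x group interaction
    return lr_uniform, lr_nonuniform
```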
