Item Response Theory Applications in Health Ron D. Hays, Discussant


Item Response Theory Applications in Health
Ron D. Hays, Discussant

These comments are supported in part by the UCLA/DREW Project EXPORT, National Institutes of Health, National Center on Minority Health & Health Disparities (P20-MD00148-01), and the UCLA Center for Health Improvement in Minority Elders / Resource Centers for Minority Aging Research, National Institutes of Health, National Institute on Aging (AG-02-004).

Hard to find fault with this trio

Issues: Sample Size
Samples of 500 or more have been recommended for estimating the graded response model (Reise & Yu, 1990).
- Bonnie: n = 139 per group
- Leo: n > 10,000 per group (cross-validation)
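
Those sample sizes matter because each graded-response item carries several parameters to estimate (one slope plus K-1 thresholds for a K-category item). As a rough illustration, not from the talk, here is a minimal sketch of Samejima's graded response model with hypothetical parameter values:

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Samejima graded-response-model category probabilities for one item.

    theta      : latent trait level (scalar)
    a          : discrimination (slope) parameter
    thresholds : increasing category boundaries b_1 < ... < b_{K-1}
    Returns a length-K vector of category probabilities that sums to 1.
    """
    b = np.asarray(thresholds, dtype=float)
    # P*(X >= k): probability of responding in category k or above
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    upper = np.concatenate(([1.0], p_star))   # P*(X >= lowest category) = 1
    lower = np.concatenate((p_star, [0.0]))   # P*(X >= K) = 0
    return upper - lower                      # adjacent differences give category probs

# A 5-category item has 1 slope + 4 thresholds to estimate per item,
# which is part of why samples of ~500 are recommended.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
```

The slope and threshold values here are illustrative only; in practice they are estimated from the response data, which is where the sample-size recommendation bites.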

Issues (continued): Differential Item and Scale Functioning
- Bonnie: CES-D mode of administration; detection of DIF
- Leo: CAHPS English vs. Spanish; compensatory and non-compensatory DIF; adjustment for DIF
Mode differences were greatest in the middle of the scale; the extremes were not well estimated.
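
For readers unfamiliar with uniform DIF screening, one common approach is the Mantel-Haenszel procedure: group differences in item performance are examined within strata defined by total score. The counts below are made up for illustration, not data from either study:

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across score strata.

    tables: list of (a, b, c, d) counts per stratum, where
        a = reference group endorsing,  b = reference group not endorsing,
        c = focal group endorsing,      d = focal group not endorsing.
    A common odds ratio near 1 suggests no uniform DIF on the studied item.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Synthetic example: both groups perform alike within each total-score stratum,
# so the common odds ratio should be close to 1 (no uniform DIF).
strata = [(30, 70, 28, 72), (50, 50, 52, 48), (80, 20, 78, 22)]
or_mh = mantel_haenszel_or(strata)
```

IRT-based DIF methods (comparing item parameters across groups) address the same question model-based; Mantel-Haenszel is shown here only because it is compact and transparent.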

Issues (continued): Implications for Scale or Composite Creation
- Leo: items assessing getting care quickly (administered with a never-to-always response scale) produced DIF; items assessing getting needed care (administered with a no-problem-to-big-problem response scale) did not.

Issues (continued): Appropriate Model
- Number of parameters (1-PL, 2-PL, 3-PL)
- Dimensionality (SF-36 is multidimensional; Bonnie)
Assumptions of the IRT model tested: unidimensionality and local independence
- Leo: linear confirmatory factor analysis
- Alternatives: categorical confirmatory factor analysis, full-information factor analysis, Rasch residual factor analysis
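
As an illustrative (and admittedly crude) unidimensionality screen, one can inspect the eigenvalues of the inter-item correlation matrix; the categorical methods listed above are more defensible for ordinal item responses. A sketch with simulated data generated from a single latent trait:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 1000, 6

theta = rng.standard_normal(n_persons)                      # one latent trait
# Every item is trait + noise, so the data are unidimensional by construction
items = theta[:, None] + 0.5 * rng.standard_normal((n_persons, n_items))

corr = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]           # descending order

# A large first-to-second eigenvalue ratio suggests one dominant factor
ratio = eigvals[0] / eigvals[1]
```

With truly multidimensional data the first eigenvalue no longer dominates, which is one informal signal that fitting a unidimensional IRT model may be inappropriate.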

Issues (continued): Appropriate Unit of Analysis
- Leo: evaluated at the person level, but CAHPS composites are used for health-plan comparisons.
Cross-over design: Mail -> Phone and Phone -> Mail
- Estimate IRT scores at both time points
- Analogous to the same people taking the CARES, EORTC, FACT, and SF-36 (Chi-Hung)
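
Estimating IRT scores at both time points on a common metric can be sketched as expected a posteriori (EAP) scoring: as long as the item parameters are held fixed, scores from the mail and phone administrations land on the same scale. The 2-PL item parameters below are hypothetical:

```python
import numpy as np

def eap_score(responses, a, b, nodes=None):
    """Expected a posteriori (EAP) trait estimate under a 2-PL IRT model.

    responses : 0/1 item responses
    a, b      : item discrimination and difficulty arrays (held fixed)
    Uses a standard-normal prior and fixed quadrature nodes, so estimates
    from two administrations (e.g., mail and phone) share one metric.
    """
    if nodes is None:
        nodes = np.linspace(-4, 4, 81)
    x = np.asarray(responses, dtype=float)
    logits = np.asarray(a) * (nodes[:, None] - np.asarray(b))
    p = 1.0 / (1.0 + np.exp(-logits))                      # (nodes, items)
    like = np.prod(p**x * (1.0 - p)**(1.0 - x), axis=1)    # likelihood at each node
    post = like * np.exp(-0.5 * nodes**2)                  # unnormalized posterior
    return float(np.sum(nodes * post) / np.sum(post))      # posterior mean

a = np.array([1.0, 1.5, 2.0])   # hypothetical discriminations
b = np.array([-0.5, 0.0, 0.5])  # hypothetical difficulties
high = eap_score([1, 1, 1], a, b)   # endorsing everything -> positive estimate
low  = eap_score([0, 0, 0], a, b)   # endorsing nothing -> negative estimate
```

The same function scored against the same parameter set at time 1 and time 2 yields comparable person scores, which is the point of the cross-over analysis.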

CTT and IRT results should be compared
“The critical question is not whether IRT models are superior to CTT methods. Of course they are, in the same way that a modern CD player provides superior sound when compared to a 1960s LP player… The real question is, does application of IRT result in sufficient improvement in the quality of … measurement to justify the added complexity?” (Reise & Henson, Journal of Personality Assessment, in press)
IRT models support research, test development, and statistical analyses in many of ETS's major testing programs, such as the GRE®, GMAT®, TOEFL®, and CLEP®. IRT models underlie the development of computer-based tests (CBT).

Expert Ratings of Stigma (Bonnie)
The lack of association between expert ratings of item stigma and mode DIF may mean that experts are not the best source on which items are more or less socially desirable. All CES-D items are potentially subject to response bias. Consistent evidence that mode effects are due to socially desirable response set (SDRS) bias. Qualitative research may help to decide among competing explanations.

Issues (continued): Item Banking (Chi-Hung)
- Evaluation of mode effects (CAT)
- Maintenance of the bank (public or private)
- Capitalize on other advantages of computer administration, such as:
  -> Interaction with the respondent
  -> Immediate feedback of results
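
A computer-adaptive test (CAT) drawing on an item bank typically administers, at each step, the unasked item with maximum Fisher information at the current trait estimate. A minimal 2-PL sketch, with hypothetical item parameters:

```python
import numpy as np

def next_item(theta_hat, a, b, administered):
    """Pick the unadministered 2-PL item with maximum Fisher information.

    For a 2-PL item, I(theta) = a^2 * P(theta) * (1 - P(theta)), which peaks
    near theta = b, so a CAT tends to select items whose difficulty matches
    the respondent's current trait estimate.
    """
    p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1.0 - p)
    info[list(administered)] = -np.inf      # never re-administer an item
    return int(np.argmax(info))

a = np.array([1.0, 1.0, 2.0])   # hypothetical discriminations
b = np.array([0.0, 2.0, 0.0])   # hypothetical difficulties
# At theta_hat = 0, the well-targeted, highly discriminating item (index 2) wins
choice = next_item(theta_hat=0.0, a=a, b=b, administered=set())
```

Bank maintenance matters here: the selection rule is only as good as the calibrated parameters stored in the bank, and mode effects (computer vs. paper calibration) propagate directly into item selection.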

Coming June 24-25, 2004!
Advances in Health Outcomes Measurement: Exploring the Current State and the Future Applications of Item Response Theory, Item Banks and Computer-Adaptive Testing
Conference co-sponsored by the National Cancer Institute and the Drug Information Association

For more information about the conference, contact:
Bryce Reeve, Ph.D.
Outcomes Research Branch, National Cancer Institute
Phone: 301-594-6574
E-mail: reeveb@mail.nih.gov

My contact information:
hays@rand.org
drhays@ucla.edu
rhays@ix.netcom.com