Evaluating Multi-item Scales

Slides:

Advertisements

Similar presentations

DIF Analysis Galina Larina of March, 2012 University of Ostrava.

Advertisements

Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.

Structural Equation Modeling Using Mplus Chongming Yang Research Support Center FHSS College.

Item Response Theory in Health Measurement

Overview of Main Survey Data Analysis and Scaling National Research Coordinators Meeting Madrid, February 2010.

1 Evaluating Multi-Item Scales Using Item-Scale Correlations and Confirmatory Factor Analysis Ron D. Hays, Ph.D. RCMAR.

SOC 681 James G. Anderson, PhD

© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.

1 Chapter 17: Introduction to Regression. 2 Introduction to Linear Regression The Pearson correlation measures the degree to which a set of data points.

1 8/14/2015 Evaluating the Significance of Health-Related Quality of Life Change in Individual Patients Ron Hays October 8, 2004 UCLA GIM/HSR.

1 Health-Related Quality of Life as an Indicator of Quality of Care Ron D. Hays, Ph.D. HS216—Quality Assessment: Making the Business Case.

Unanswered Questions in Typical Literature Review 1. Thoroughness – How thorough was the literature search? – Did it include a computer search and a hand.

The ABC’s of Pattern Scoring Dr. Cornelia Orr. Slide 2 Vocabulary Measurement – Psychometrics is a type of measurement Classical test theory Item Response.

CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.

Use of CAHPS® Database by Researchers: Findings Related to Differences by Race and Ethnicity Ron D. Hays, Ph.D. RAND.

Tests and Measurements Intersession 2006.

CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.

Measurement Models: Exploratory and Confirmatory Factor Analysis James G. Anderson, Ph.D. Purdue University.

Measurement Issues by Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research (June 30, 2008, 4:55-5:35pm) Diabetes & Obesity.

1 Differential Item Functioning in Mplus Summer School Week 2.

Measurement Models: Identification and Estimation James G. Anderson, Ph.D. Purdue University.

Differential Item Functioning. Anatomy of the name DIFFERENTIAL –Differential Calculus? –Comparing two groups ITEM –Focus on ONE item at a time –Not the.

Item Response Theory (IRT) Models for Questionnaire Evaluation: Response to Reeve Ron D. Hays October 22, 2009, ~3:45-4:05pm

Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 6, 2008 (3:30-6:30pm) HS 249F.

Multi-item Scale Evaluation Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research

Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.

CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.

Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.

Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.

Evaluating Self-Report Data Using Psychometric Methods Ron D. Hays, PhD February 8, 2006 (3:00-6:00pm) HS 249F.

Item Response Theory in Health Measurement

Considerations in Comparing Groups of People with PROs Ron D. Hays, Ph.D. UCLA Department of Medicine May 6, 2008, 3:45-5:00pm ISPOR, Toronto, Canada.

Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire.

Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.

Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research HS237A 11/10/09, 2-4pm,

Latent Curve Modeling to Understand Achievement Emotions Gavin Brown Quant-DARE Methods Showcase Feb 22, 2016.

Evaluating Patient-Reports about Health

Psychometric Evaluation of Items Ron D. Hays

QDET2, Miami, FL, Hibiscus A

Vertical Scaling in Value-Added Models for Student Learning

The University of Manchester

Structural Equation Modeling using MPlus

CHAPTER 7 Linear Correlation & Regression Methods

A Different Way to Think About Measurement Development:

UCLA Department of Medicine

Evaluating Patient-Reports about Health

UCLA Department of Medicine

Evaluating Multi-Item Scales

Ron D. Hays GIM-HSR Friday Noon Seminar Series November 4, 2016

assessing scale reliability

CJT 765: Structural Equation Modeling

Evaluating IRT Assumptions

Final Session of Summer: Psychometric Evaluation

الاختبارات محكية المرجع بناء وتحليل (دراسة مقارنة )

תוקף ומהימנות של ה- Dementia Quality of Life בארה"בMeasure

M248: Analyzing data Block D UNIT D2 Regression.

Confirmatory Factor Analysis

15.1 The Role of Statistics in the Research Process

Evaluating the Psychometric Properties of Multi-Item Scales

Evaluating Multi-Item Scales

Multitrait Scaling and IRT: Part I

Evaluating Multi-item Scales

Item Response Theory Applications in Health Ron D. Hays, Discussant

Evaluating Multi-Item Scales

Psychometric testing and validation (Multi-trait scaling and IRT)

Introduction to Psychometric Analysis of Survey Data

UCLA GIM/HSR Research Seminar Series

Evaluating the Significance of Individual Change

GIM & HSR Research Seminar: October 5, 2018

UCLA Department of Medicine

Presentation transcript:

Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research drhays@ucla.edu http://twitter.com/RonDHays http://gim.med.ucla.edu/FacultyPages/Hays/ HS225A 11/04/10, 10-11:50 pm, 51-279 CHS 1

Example Responses to 2-Item Scale ID Poor Fair Good Very Good Excellent 01 2 02 1 03 04 05 2

Respondents (BMS) 4 11.6 2.9 Items (JMS) 1 0.1 0.1 Cronbach’s Alpha 01 55 02 45 03 42 04 35 05 22 Respondents (BMS) 4 11.6 2.9 Items (JMS) 1 0.1 0.1 Resp. x Items (EMS) 4 4.4 1.1 Total 9 16.1 Source SS MS df Alpha = 2.9 - 1.1 = 1.8 = 0.62 2.9 2.9 3

Computations Respondents SS Item SS Total SS (102+92+62+82+42)/2 – 372/10 = 11.6 Item SS (182+192)/5 – 372/10 = 0.1 Total SS (52+ 52+42+52+42+22+32+52+22+22) – 372/10 = 16.1 Res. x Item SS= Tot. SS – (Res. SS+Item SS) 4

Alpha for Different Numbers of Items and Average Correlation Average Inter-item Correlation ( r ) Number of Items (k) .0 .2 .4 .6 .8 1.0 2 .000 .333 .572 .750 .889 1.000 4 .000 .500 .727 .857 .941 1.000 6 .000 .600 .800 .900 .960 1.000 8 .000 .666 .842 .924 .970 1.000 Alphast = k * r 1 + (k -1) * r 5

Spearman-Brown Prophecy Formula ( ) N • alpha x alpha = y 1 + (N - 1) * alpha x N = how much longer scale y is than scale x 6

Example Spearman-Brown Calculation MHI-18 18/32 (0.98) (1+(18/32 –1)*0.98 = 0.55125/0.57125 = 0.96 7

Reliability Minimum Standards 0.70 or above (for group comparisons) 0.90 or higher (for individual assessment) SEM = SD (1- reliability)1/2 8

Intraclass Correlation and Reliability Model Reliability Intraclass Correlation One-way Two-way fixed Two-way random BMS = Between Ratee Mean Square WMS = Within Mean Square JMS = Item or Rater Mean Square EMS = Ratee x Item (Rater) Mean Square 9 9

Equivalence of Survey Data Missing data rates were significantly higher for African Americans on all CAHPS items Internal consistency reliability did not differ Plan-level reliability estimates were significantly lower for African Americans than whites M. Fongwa et al. (2006). Comparison of data quality for reports and ratings of ambulatory care by African American and White Medicare managed care enrollees. Journal of Aging and Health, 18, 707-721. 10 10

Item-scale correlation matrix 12 12

Item-scale correlation matrix 13 13

14

15

Confirmatory Factor Analysis Observed covariances compared to covariances generated by hypothesized model Statistical and practical tests of fit Factor loadings Correlations between factors 16

Fit Indices 1 - Non-normed fit index: Normed fit index: 2  -  2 Normed fit index: Non-normed fit index: Comparative fit index: null model  2 2 2   null null - model df df null model  2 null - 1 df null 2  - df 1 - model model  - 2 df null null 17

Hays, Cunningham, Ettl, Beck & Shapiro (1995, Assessment) 205 symptomatic HIV+ individuals receiving care at two west coast public hospitals 64 HRQOL items 9 access, 5 social support, 10 coping, 4 social engagement and 9 HIV symptom items

Differential Item Functioning (2-Parameter Model) AA White White Slope DIF Location DIF AA 21 Location = uniform; Slope = non-uniform 21

Language DIF Example Ordinal logistic regression to evaluate differential item functioning Purified IRT trait score as matching criterion McFadden’s pseudo R2 >= 0.02 Thetas estimated in Spanish data using English calibrations Linearly transformed Spanish calibrations (Stocking-Lord method of equating)

Lordif http://CRAN.R-project.org/package=lordif Model 1 : logit P(ui >= k) = αk + β1 * ability Model 2 : logit P(ui >= k) = αk + β1 * ability + β2 * group Model 3 : logit P(ui >= k) = αk + β1 * ability + β2 * group + β3 * ability * group DIFF assessment (log likelihood values compared): - Overall: Model 3 versus Model 1 Non-uniform: Model 3 versus Model 2 Uniform: Model 2 versus Model 1

Sample Demographics English (n = 1504) Spanish (n = 640) % Female 52% 58% % Hispanic 11% 100% Education < High school 2% 14% High school 18% 22% Some college 39% 31% College degree 41% 33% Age 51 (SD = 18) 38 (SD = 11)

Results One-factor categorical model fit the data well (CFI=0.971, TLI=0.970, and RMSEA=0.052). Large residual correlation of 0.67 between “Are you able to run ten miles” and “Are you able to run five miles?” 50 of the 114 items had language DIF 16 uniform 34 non-uniform

Impact of DIF on Test Characteristic Curves (TCCs)

Stocking-Lord Method Spanish calibrations transformed so that their TCC most closely matches English TCC. a* = a/A and b* = A * b + B Optimal values of A (slope) and B (intercept) transformation constants found through multivariate search to minimize weighted sum of squared distances between TCCs of English and Spanish transformed parameters Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.

CAT-based Theta Estimates Using English (x-axis) and Spanish (y-axis) Parameters for 114 Items in Spanish Sample (n = 640, ICC = 0.89)

CAT-based Theta Estimates Using English (x-axis) and Spanish (y-axis) Parameters for 64 non-DIF Items in Spanish Sample (n = 640, ICC = 0.96)

Implications Hybrid model needed to account for language DIF English calibrations for non-DIF items Spanish calibrations for DIF items

Thank you. 31