Evaluating Multi-Item Scales

Similar presentations
Measurement Concepts Operational Definition: is the definition of a variable in terms of the actual procedures used by the researcher to measure and/or.
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.
 Degree to which inferences made using data are justified or supported by evidence  Some types of validity ◦ Criterion-related ◦ Content ◦ Construct.
Part II Sigma Freud & Descriptive Statistics
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT
How does PROMIS compare to other HRQOL measures? Ron D. Hays, Ph.D. UCLA, Los Angeles, CA Symposium 1077: Measurement Tools to Enhance Health- Related.
Why Patient-Reported Outcomes Are Important: Growing Implications and Applications for Rheumatologists Ron D. Hays, Ph.D. UCLA Department of Medicine RAND.
Methods for Assessing Safety Culture: A View from the Outside October 2, 2014 (9:30 – 10:30 EDT) Safety Culture Conference (AHRQ Watts Branch Conference.
1 Evaluating Multi-Item Scales Using Item-Scale Correlations and Confirmatory Factor Analysis Ron D. Hays, Ph.D. RCMAR.
Item Response Theory in Patient-Reported Outcomes Research Ron D. Hays, Ph.D., UCLA Society of General Internal Medicine California-Hawaii Regional Meeting.
Part II Knowing How to Assess Chapter 5 Minimizing Error p115 Review of Appl 644 – Measurement Theory – Reliability – Validity Assessment is broader term.
Primer on Evaluating Reliability and Validity of Multi-Item Scales Questionnaire Design and Testing Workshop October 25, 2013, 3:30-5:00pm Wilshire.
Instrumentation.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
McMillan Educational Research: Fundamentals for the Consumer, 6e © 2012 Pearson Education, Inc. All rights reserved. Educational Research: Fundamentals.
Use of Health-Related Quality of Life Measures to Assess Individual Patients July 24, 2014 (1:00 – 2:00 PDT) Kaiser Permanente Methods Webinar Series Ron.
Tests and Measurements Intersession 2006.
Evaluating Health-Related Quality of Life Measures June 12, 2014 (1:00 – 2:00 PDT) Kaiser Methods Webinar Series Ron D.Hays, Ph.D.
Measuring Health-Related Quality of Life
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
Measurement Issues by Ron D. Hays UCLA Division of General Internal Medicine & Health Services Research (June 30, 2008, 4:55-5:35pm) Diabetes & Obesity.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
Item Response Theory (IRT) Models for Questionnaire Evaluation: Response to Reeve Ron D. Hays October 22, 2009, ~3:45-4:05pm
Multi-item Scale Evaluation Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire Design and Testing.
Measurement MANA 4328 Dr. Jeanne Michalski
Psychometric Evaluation of Questionnaire Design and Testing Workshop December , 10:00-11:30 am Wilshire Suite 710 DATA.
Reliability performance on language tests is also affected by factors other than communicative language ability. (1) test method facets They are systematic.
Ron D. Hays, Ph.D. Joshua S. Malett, Sarah Gaillot, and Marc N. Elliott Physical Functioning Among Medicare Beneficiaries International Society for Quality.
Chapter 6 - Standardized Measurement and Assessment
Measurement of Outcomes Ron D. Hays Accelerating eXcellence In translational Science (AXIS) January 17, 2013 (2:00-3:00 pm) 1720 E. 120 th Street, L.A.,
Considerations in Comparing Groups of People with PROs Ron D. Hays, Ph.D. UCLA Department of Medicine May 6, 2008, 3:45-5:00pm ISPOR, Toronto, Canada.
Multitrait Scaling and IRT: Part I Ron D. Hays, Ph.D. Questionnaire.
Overview of Item Response Theory Ron D. Hays November 14, 2012 (8:10-8:30am) Geriatrics Society of America (GSA) Pre-Conference Workshop on Patient- Reported.
Evaluating Multi-item Scales Ron D. Hays, Ph.D. UCLA Division of General Internal Medicine/Health Services Research HS237A 11/10/09, 2-4pm,
Evaluating Multi-Item Scales Health Services Research Design (HS 225B) January 26, 2015, 1:00-3:00pm CHS.
Reducing Burden on Patient- Reported Outcomes Using Multidimensional Computer Adaptive Testing Scott B. MorrisMichael Bass Mirinae LeeRichard E. Neapolitan.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 25 Critiquing Assessments Sherrilene Classen, Craig A. Velozo.
Evaluating Patient-Reports about Health
Survey Methodology Reliability and Validity
Psychometric Evaluation of Items Ron D. Hays
Reliability and Validity
UCLA Department of Medicine
Evaluating Patient-Reports about Health
UCLA Department of Medicine
Assessment Theory and Models Part II
Associated with quantitative studies
Test Design & Construction
Evaluation of measuring tools: validity
Understanding Results
Item Analysis: Classical and Beyond
Evaluating IRT Assumptions
Final Session of Summer: Psychometric Evaluation
By ____________________
A Multi-Dimensional PSER Stopping Rule
Item Analysis: Classical and Beyond
Evaluating the Psychometric Properties of Multi-Item Scales
Evaluating Multi-Item Scales
Evaluating Multi-item Scales
Multitrait Scaling and IRT: Part I
Evaluating Multi-item Scales
Item Analysis: Classical and Beyond
Patient-Reported Indicators of Quality of Care
Evaluating Multi-Item Scales
Psychometric testing and validation (Multi-trait scaling and IRT)
Introduction to Psychometric Analysis of Survey Data
Evaluating the Significance of Individual Change
UCLA Department of Medicine
Presentation transcript:

Evaluating Multi-Item Scales Health Services Research Design (HPM 225B) January 25, 2016, 1:00-3:00pm, CHS 61-269

Iterative Development http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM205269.pdf

Physical Functioning: the ability to conduct a variety of activities ranging from self-care to running. It is a predictor of hospitalization, institutionalization, and mortality. Six physical functioning items were included in the 2010 Consumer Assessment of Healthcare Providers and Systems (CAHPS®) Medicare Survey.

Because of a health or physical problem, are you unable to do or have any difficulty doing the following activities? Walking? Getting in or out of chairs? Bathing? Dressing? Using the toilet? Eating?
Response options: I am unable to do this activity (0); Yes, I have difficulty (1); No, I do not have difficulty (2)

Medicare beneficiary sample (n = 366,701): 58% female; 57% high school education or less; age 14% 18-64, 48% 65-74, 29% 75-84, 9% 85+

Item-Scale Correlations
Walking (0, 1, 2): 0.71
Chairs (0, 1, 2): 0.80
Bathing (0, 1, 2): 0.83
Dressing (0, 1, 2): 0.86
Toileting (0, 1, 2): 0.84
Eating (0, 1, 2): 0.75
Possible 6-item scale range: 0-12 (2% floor, 65% ceiling)
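A minimal sketch of how item-scale correlations like these can be computed, correlating each item with the sum of the remaining items (corrected for overlap); the DataFrame `df` and its column names are assumptions for illustration, and the slide does not state whether its values are corrected for overlap.

```python
# Item-scale correlations for the six physical functioning items, correlating
# each item (scored 0-2) with the sum of the remaining five items.
# The DataFrame `df` and these column names are assumptions for illustration.
import pandas as pd

PF_ITEMS = ["walking", "chairs", "bathing", "dressing", "toileting", "eating"]

def item_scale_correlations(df: pd.DataFrame, items: list) -> pd.Series:
    total = df[items].sum(axis=1)                      # summed 0-12 scale score
    return pd.Series({item: df[item].corr(total - df[item]) for item in items})

# scale_score = df[PF_ITEMS].sum(axis=1)               # possible range 0-12
# print(item_scale_correlations(df, PF_ITEMS).round(2))
```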

Reliability: the degree to which the same score is obtained when the target being measured (person, plant, or whatever) has not changed.
Internal consistency (items): requires 2 or more items.
Test-retest (administrations) correlations: requires 2 or more time points.

Reliability Formulas (reliability and intraclass correlation for the two-way random, two-way mixed, and one-way models)
BMS = Between-Ratee Mean Square; N = n of ratees; WMS = Within Mean Square; k = n of items or raters; JMS = Item or Rater Mean Square; EMS = Ratee x Item (Rater) Mean Square
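A sketch of the standard Shrout and Fleiss (1979) intraclass correlation formulas that this legend (BMS, WMS, JMS, EMS, k, N) refers to; mapping the slide's "two-way random," "two-way mixed," and "one-way" rows to ICC(2), ICC(3), and ICC(1) is an assumption.

```python
# Intraclass correlations from ANOVA mean squares (Shrout & Fleiss, 1979),
# assuming the usual ratee-by-item (or ratee-by-rater) decomposition.

def icc_single(bms, wms, jms, ems, k, n, model="two_way_random"):
    """Reliability of a single item or rater."""
    if model == "one_way":
        return (bms - wms) / (bms + (k - 1) * wms)                    # ICC(1,1)
    if model == "two_way_mixed":
        return (bms - ems) / (bms + (k - 1) * ems)                    # ICC(3,1)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)  # ICC(2,1)

def icc_average(bms, wms, jms, ems, k, n, model="two_way_random"):
    """Reliability of the average across k items or raters."""
    if model == "one_way":
        return (bms - wms) / bms                                      # ICC(1,k)
    if model == "two_way_mixed":
        return (bms - ems) / bms                                      # ICC(3,k) = coefficient alpha
    return (bms - ems) / (bms + (jms - ems) / n)                      # ICC(2,k)
```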

Internal Consistency Reliability (Coefficient Alpha) = (MSbms - MSems) / MSbms
Ordinal alpha (coefficient alpha computed from the polychoric correlation matrix) = 0.98
http://support.sas.com/resources/papers/proceedings14/2042-2014.pdf
http://gim.med.ucla.edu/FacultyPages/Hays/utils/
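A minimal sketch of coefficient alpha computed from raw item scores using the variance form, which is algebraically equivalent to the mean-square form above on complete data; the data matrix here is hypothetical.

```python
# Coefficient alpha from a respondents-by-items score matrix (variance form);
# equivalent to the ANOVA form (MSbms - MSems) / MSbms on complete data.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: n_respondents x k_items array with no missing data."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# alpha = cronbach_alpha(df[PF_ITEMS].to_numpy())   # df as in the earlier sketch
```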

Item-scale correlation matrix (“Multi-trait Scaling”)

Item Response Theory (IRT) IRT models the relationship between a person’s response Yi to the question (i) and his or her level of the latent construct () being measured by positing bik estimates how difficult it is to have a score of k or more on item (i). ai estimates the discriminatory power of the item.
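Written out, the boundary ("threshold") form of the graded response model described here can be sketched as follows; whether a 1.7 scaling constant is used is not stated on the slide.

```latex
P(Y_i \ge k \mid \theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}, \qquad
P(Y_i = k \mid \theta) = P(Y_i \ge k \mid \theta) - P(Y_i \ge k+1 \mid \theta)
```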

Item Responses and Trait Levels (figure: Persons 1, 2, and 3 positioned along the trait continuum) www.nihpromis.org

Normal Curve (bell-shaped): z = -1 to 1 covers 68.3%; z = -2 to 2 covers 95.4%; z = -3 to 3 covers 99.7%
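These coverage figures follow directly from the standard normal CDF; a quick check, assuming scipy is available:

```python
# Coverage of the standard normal distribution between -z and +z.
from scipy.stats import norm

for z in (1, 2, 3):
    print(f"z = -{z} to {z}: {norm.cdf(z) - norm.cdf(-z):.1%}")
# z = -1 to 1: 68.3%, z = -2 to 2: 95.4%, z = -3 to 3: 99.7%
```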

Threshold #1 Parameter (Graded Response Model): Physical Functioning, 1st threshold (Unable to do)
Walking -1.86, Chairs -1.91, Bathing -1.72, Dressing -1.78, Toileting -1.87, Eating -1.98

Threshold #2 Parameter (Graded Response Model): Physical Functioning, 2nd threshold (Unable to do or have difficulty)
Walking -0.55, Chairs -0.81, Bathing -1.02, Dressing -1.10, Toileting -1.27, Eating -1.53

Item Parameters (Graded Response Model), Physical Functioning
Item        1st Threshold (Unable to do)   2nd Threshold (Have difficulty)   Slope (Discrimination)
Walking     -1.86                          -0.55                             4.63
Chairs      -1.91                          -0.81                             5.65
Bathing     -1.72                          -1.02                             6.34
Dressing    -1.78                          -1.10                             8.23
Toileting   -1.87                          -1.27                             7.23
Eating      -1.98                          -1.53                             4.87
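A sketch of turning these tabled slopes and thresholds into category probabilities under the graded response model; using the plain logistic metric (no 1.7 scaling constant) is an assumption.

```python
# Category probabilities under the graded response model, using the tabled
# slopes and thresholds; the plain logistic metric (no 1.7 constant) is assumed.
import numpy as np

ITEM_PARAMS = {            # item: (slope a, 1st threshold b1, 2nd threshold b2)
    "Walking":   (4.63, -1.86, -0.55),
    "Chairs":    (5.65, -1.91, -0.81),
    "Bathing":   (6.34, -1.72, -1.02),
    "Dressing":  (8.23, -1.78, -1.10),
    "Toileting": (7.23, -1.87, -1.27),
    "Eating":    (4.87, -1.98, -1.53),
}

def grm_category_probs(a, thresholds, theta):
    """Return P(unable), P(have difficulty), P(no difficulty) at latent score theta."""
    boundary = [1.0] + [1 / (1 + np.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [boundary[k] - boundary[k + 1] for k in range(len(thresholds) + 1)]

# a, b1, b2 = ITEM_PARAMS["Walking"]
# print(grm_category_probs(a, [b1, b2], theta=-1.0))
```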

Confirmatory Factor Analysis (Polychoric* Correlations): one-factor model with Walking, Chairs, Bathing, Dressing, Toileting, and Eating as indicators; residual correlations <= 0.04
* Polychoric correlation: the estimated correlation between two underlying normally distributed continuous variables

Item Characteristic Curves (category response curves for "Unable to do," "Have difficulty," and "I do not have difficulty")

Reliability = (Info – 1) / Info
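A sketch of how Reliability = (Info - 1) / Info plays out with the item parameters above: Samejima's graded-response item information is summed across the six items at a chosen θ, assuming θ is scaled to have variance 1 and the plain logistic metric.

```python
# Test information and IRT reliability = (Info - 1) / Info at a chosen theta,
# using Samejima's graded-response information for the six tabled items.
import numpy as np

ITEMS = [   # (slope a, 1st threshold, 2nd threshold) from the parameter table
    (4.63, -1.86, -0.55), (5.65, -1.91, -0.81), (6.34, -1.72, -1.02),
    (8.23, -1.78, -1.10), (7.23, -1.87, -1.27), (4.87, -1.98, -1.53),
]

def grm_item_information(a, thresholds, theta):
    boundary = [1.0] + [1 / (1 + np.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    slope = [a * p * (1 - p) for p in boundary]     # derivatives of boundary curves
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = boundary[k] - boundary[k + 1]
        if p_k > 0:
            info += (slope[k] - slope[k + 1]) ** 2 / p_k
    return info

theta = -1.5
test_info = sum(grm_item_information(a, [b1, b2], theta) for a, b1, b2 in ITEMS)
print(round((test_info - 1) / test_info, 2))        # reliability at theta = -1.5
```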

Validity
Content validity: Does the measure "appear" to reflect what it is intended to (expert judges or patient judgments)? Do the items operationalize the concept? Do the items cover all aspects of the concept? Does the scale name represent the item content?
Construct validity: Are the associations of the measure with other variables consistent with hypotheses?

Physical Function Scale Correlations: r = 0.39 with self-rated general health; r = -0.23 with number of chronic conditions
Cohen's rules of thumb for correlations corresponding to effect sizes of 0.20 SD, 0.50 SD, and 0.80 SD: 0.100 is a small correlation, 0.243 is a medium correlation, and 0.371 is a large correlation (r's of 0.10, 0.30, and 0.50 are often cited as small, medium, and large, respectively).
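The 0.100, 0.243, and 0.371 benchmarks are the correlations corresponding to d = 0.20, 0.50, and 0.80 under the standard conversion, assuming equal group sizes; a worked check:

```latex
r = \frac{d}{\sqrt{d^{2} + 4}}: \quad
\frac{0.20}{\sqrt{4.04}} \approx 0.100, \qquad
\frac{0.50}{\sqrt{4.25}} \approx 0.243, \qquad
\frac{0.80}{\sqrt{4.64}} \approx 0.371
```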

Differential Item Functioning (DIF): the probability of choosing each response category should be the same for people with the same estimated scale score, regardless of their other characteristics. DIF is evaluated by comparing subgroups.
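A small sketch of what location DIF means under a 2-parameter logistic model: two groups at the same θ have different endorsement probabilities when the item location differs by group. All parameter values here are hypothetical.

```python
# Location DIF under a 2-parameter logistic model: at the same theta, the item
# is "easier" to endorse in one group than the other. Values are hypothetical.
import numpy as np

def p_2pl(theta, a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = 0.0
print("group A:", round(p_2pl(theta, a=1.5, b=-0.2), 3))   # same slope, lower location
print("group B:", round(p_2pl(theta, a=1.5, b=0.4), 3))    # shifted location -> location DIF
```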

DIF (2-parameter model): figure comparing item characteristic curves for "I cry when upset" and "I get sad for no reason" by gender (men vs. women) and race (White vs. African American), illustrating slope DIF and location DIF; higher score = more depressive symptoms

Computer Adaptive Testing (CAT). Early research on CAT was funded by the U.S. military to enhance mental measurement. The National Association of Securities Dealers (NASD) began developing a CAT regulatory exam in 1978 to assess knowledge of securities and securities futures products among licensed brokers. The National Council of State Boards of Nursing started using CAT in 1994.

Reliability Target for Use of Measures with Individuals
Reliability ranges from 0 to 1; 0.90 or above is the goal.
SE = SD x (1 - reliability)^(1/2)
Reliability = 1 - (SE/10)^2 on the T-score metric (T = z x 10 + 50, SD = 10)
Reliability = 0.90 when SE = 3.2
95% CI = estimated score +/- 1.96 x SE
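A sketch of this T-score bookkeeping (SD = 10 assumed), which also reproduces the anger CAT example on the following slides (SE = 2.8 gives reliability 0.92 and a 95% CI of 44.7 to 55.7):

```python
# T-score metric (mean 50, SD 10): reliability from SE and a 95% CI.
def t_score(z):
    return 50 + 10 * z

def reliability_from_se(se, sd=10.0):
    return 1 - (se / sd) ** 2

def ci_95(score, se):
    return (score - 1.96 * se, score + 1.96 * se)

print(round(reliability_from_se(3.2), 2))              # ~0.90, the individual-use target
print(round(reliability_from_se(2.8), 2))              # 0.92, as in the anger CAT example
print(tuple(round(x, 1) for x in ci_95(50.2, 2.8)))    # (44.7, 55.7)
```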

In the past 7 days … I was grouchy [1st question]
Never [39]  Rarely [48]  Sometimes [56]  Often [64]  Always [72]
Estimated Anger = 56.1, SE = 5.7 (rel. = 0.68)

In the past 7 days … I felt like I was ready to explode [2nd question] Never Rarely Sometimes Often Always Estimated Anger = 51.9 SE = 4.8 (rel. = 0.77)

In the past 7 days … I felt angry [3rd question] Never Rarely Sometimes Often Always Estimated Anger = 50.5 SE = 3.9 (rel. = 0.85)

In the past 7 days … I felt angrier than I thought I should [4th question] Never Rarely Sometimes Often Always Estimated Anger = 48.8 SE = 3.6 (rel. = 0.87)

In the past 7 days … I felt annoyed [5th question] Never Rarely Sometimes Often Always Estimated Anger = 50.1 SE = 3.2 (rel. = 0.90)

In the past 7 days … I made myself angry about something just by thinking about it. [6th question] Never Rarely Sometimes Often Always Estimated Anger = 50.2 SE = 2.8 (rel = 0.92) (95% CI: 44.7-55.7)

Recommended Reading: Cappelleri, J. C., Lundy, J. J., & Hays, R. D. (2014). Overview of classical test theory and item response theory for quantitative assessment of items in developing patient-reported outcome measures. Clinical Therapeutics, 36(5), 648-662.

drhays@ucla.edu (310-794-2294). http://gim.med.ucla.edu