Making PROs fit your needs: Practical basics of patient reported outcome measures Cindy J. Nowinski, MD, PhD.


1 Making PROs fit your needs: Practical basics of patient reported outcome measures
Cindy J. Nowinski, MD, PhD

2 Outline
PART 1 – Basic Concepts
- What are PROs
- Why ask about PROs
- Measuring PROs: classical vs. IRT-based measures
- Item banks and what you can do with them
- Some NIH-sponsored PRO measures

3 Outline – continued
PART 2 – Case Studies
- Generic vs. disease-specific
- Measure content

4 Patient Reported Outcomes (PROs)
Any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else. (U.S. Food and Drug Administration)
PROs can include function and symptoms, but may also encompass all of health.

5 Why Ask Patients? They know best about ….
- Symptoms and health-related quality of life: pain, fatigue, distress
- How symptoms affect continuing meaningful activities
- Knowledge, attitudes, behavior
- Satisfaction
The same biological/clinical value in two patients does not mean the same impact, and clinician and patient reports often differ. Health-related QOL scores (especially physical function) predict survival in many conditions, including cancer, chronic heart failure, COPD, and RA (Hahn EA et al., 2007, Mayo Clin Proc, 82).

6 Glossary
Domain = the feeling, function, or perception you want to measure (e.g., anxiety, physical function, general health perceptions)
Item = a statement or question that a patient answers, e.g., "I feel hopeless" (Not at all / A little bit / Somewhat / Quite a bit / Very much)
Instrument = multiple items assessing the same construct (aka scale, measure, assessment)
Legacy = an existing instrument that is a "gold standard" or commonly used

7 How do I choose a PRO measure?

8 Desirable Features
- Measures what you think it should measure
- Quantitative (scale-based), i.e., yields a score
- Covers the desired range of a construct/domain
- Psychometrically sound
- Meaningful (ability to walk, ability to manage a household, etc.)
- Questions are easy to understand
- Feasible
- Developed using rigorous qualitative and quantitative methods

9 “Psychometrically Sound”
Reliable Valid Responsive to change No floor or ceiling effects

10 Reliability (Precision)
Reliability = the amount of error associated with a measurement
- Test-retest
- Intra-rater
- Other
Reliable: all arrows hit the same spot. This leads to precise measurement, which improves the power and efficiency of clinical research.

11 Physical Function Measurement Precision and Range

12 Validity
Does it measure what it is supposed to measure?
Reliable: all arrows hit the same spot. Valid: that spot is the bullseye. Together these lead to precise measurement that improves the power and efficiency of clinical research.

13 Validity
- Scores should correlate with accepted measures of the same domain (convergent validity)
- Scores should not correlate with measures of unrelated domains (divergent validity)
- Scores should discriminate between different levels of severity/difficulty (known-groups validity)

14 Responsiveness When people experience clinical benefit or decline, their PRO scores should also change.

15 Responsiveness in Depressive Disorder: PROMIS Depression

16 Floor and Ceiling Effects
Floor effect – the measure doesn't have items to assess people at the bottom (floor) of the domain's range.
Ceiling effect – the measure doesn't have items to assess people at the top (ceiling) of the domain's range.
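A quick way to screen a dataset for these effects is to count respondents at the scale extremes. A minimal sketch with hypothetical data; the 15% flag used here is a common rule of thumb, not a fixed standard.

```python
# Screen for floor/ceiling effects by counting respondents at the scale
# extremes. Data are hypothetical; the 15% flag is a rule of thumb.

FLAG = 0.15

def extreme_rates(scores, minimum, maximum):
    n = len(scores)
    floor = sum(s == minimum for s in scores) / n
    ceiling = sum(s == maximum for s in scores) / n
    return floor, ceiling

scores = [0, 0, 0, 2, 5, 7, 10, 10, 10, 10]   # hypothetical 0-10 scale
floor, ceiling = extreme_rates(scores, 0, 10)
# floor = 0.3 and ceiling = 0.4 both exceed FLAG: the measure is missing
# items at both ends of this sample's range.
```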

17 Classical Test Theory versus Item Response Theory (IRT) -based measures

18 Classical Test Theory
An individual takes an assessment; their total score on that assessment is used for comparison purposes.
- High score: the person is higher on the trait
- Low score: the person is lower on the trait

19 Item Response Theory
Each individual item can be used for comparison purposes.
- Endorsing a better rating on "hard" items: the person is higher on the trait
- Endorsing a worse rating on "easy" items: the person is lower on the trait
Items that measure the same construct can be aggregated into longer assessments; this is the basis for item banks.

20 What is an Item Bank?
A type of instrument: a large collection of items measuring one thing.
- Items are "calibrated" according to difficulty and cover the full range of a trait/domain/construct
- Items are evaluated and tested to ensure relevance, clarity, and psychometric robustness
- Items in the same bank are linked on a common metric; any or all items can be used to provide a score on that domain
- Basis for Computer Adaptive Testing (CAT), in which items are selected to maximize precision and retain clinical relevance
- Enables customized short forms tailored to the population and research purpose

21 Physical Functioning Item Bank
[Figure: Physical Functioning Item Bank items ordered by difficulty along the measurement scale, e.g.:]
Are you able to run five miles?
Are you able to run or jog for two miles?
Are you able to walk a block on flat ground?
Are you able to walk from one room to another?
Are you able to stand without losing your balance for 1 minute?
Are you able to get in and out of bed?

22 CATs?

23 What is wrong with today's static measures?
[Figure: two questionnaires plotted on the trait scale – one with high precision but a small range, one with a wide range but low precision.]
Static questionnaires are either not precise enough or their measurement range is too narrow, producing ceiling and floor effects.

24 Computer Adaptive Tests
[Figure: a three-question CAT walk-through on a depression scale, from low to high depression.]
A CAT yields a questionnaire with high precision AND a wide range. The purpose of this slide is to walk the audience through the CAT process with an example, showing how the measurement range narrows and precision increases with each question.
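The item-selection step of a CAT can be sketched under a simple Rasch (1PL) model: after each response, pick the remaining item that is most informative at the current trait estimate. The difficulties below are hypothetical; real CAT engines also re-estimate theta after each response and apply stopping rules.

```python
import math

# One CAT step under a Rasch (1PL) model: choose the unasked item with
# the highest Fisher information at the current trait estimate.
# Item difficulties are hypothetical, not a real calibrated bank.

def information(theta, b):
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)   # Rasch item information, maximal when b == theta

def next_item(theta, difficulties, asked):
    candidates = [i for i in range(len(difficulties)) if i not in asked]
    return max(candidates, key=lambda i: information(theta, difficulties[i]))

bank = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical item difficulties
# With a current estimate theta = 0.8 and item 2 already asked, the most
# informative remaining item is index 3 (difficulty 1.0, closest to theta).
choice = next_item(0.8, bank, asked={2})
```

Each answered item sharpens the theta estimate, so the next item is better targeted, which is why a CAT reaches high precision across a wide range with few questions.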

25 Item banks also enable a variety of fixed length forms (short forms)

26 Physical Functioning Item Bank
[Figure: three fixed-length short forms (Physical Function Forms A, B, and C) drawing different subsets of Items 1–16 from the Physical Functioning Item Bank.]

27 What if the patient can’t respond?

28 Proxy versus Patient
Better agreement on observable domains: physical and instrumental activities of daily living, physical health, and motor function.
Lower agreement on more subjective dimensions: social functioning, pain, cognitive function, and psychological or emotional well-being.
Proxies report greater impairment (more symptoms, functional difficulties, emotional distress, and negative quality of life) but under-report pain.

29 NIH-Sponsored PRO Measurement Systems
Patient Reported Outcomes Measurement Information System (PROMIS) Neuro-QoL NIH Toolbox for Assessment of Neurological and Behavioral Function (Emotion) Adult Sickle Cell Quality of Life Measure (ASCQ-ME) HealthMeasures

30 NIH Systems Utilize a Common Metric
- The same instrument is used for many diseases
- The same "scale" applies to all instruments/diseases (T = 50, SD = 10)
- The same scale regardless of instrument format: single item, short form, long form, or computerized adaptive test (CAT)
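The common metric can be illustrated with the standard T-score conversion: an IRT trait estimate theta (mean 0, SD 1 in the reference population) maps linearly to the T metric. A minimal sketch:

```python
# The NIH systems report on a T metric: mean 50, SD 10 in the reference
# population. A trait estimate theta (mean 0, SD 1) converts linearly.

def to_t_score(theta, mean=50.0, sd=10.0):
    return mean + sd * theta

average = to_t_score(0.0)   # 50.0: at the reference-population mean
high = to_t_score(1.0)      # 60.0: one SD above the mean
low = to_t_score(-2.0)      # 30.0: two SDs below the mean
```

Because every format (single item, short form, long form, CAT) estimates the same theta, all of them report on this one T metric.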

31

32 Domain Framework

33 How are different measures related to each other?

34 A common problem when using a variety of patient-reported outcome measures is the comparability of scales on which the outcomes are reported. Linking establishes relationships between scores on two different measures. The PRO Rosetta Stone (PROsetta Stone®) linked HealthMeasures instruments with other related instruments (e.g., SF-36, Brief Pain Inventory, CES-D, MASQ, FACIT-Fatigue) to expand the range of PRO assessment options within a common, standardized metric. It provides equivalent scores for different scales that measure the same health outcome.
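Linking can be illustrated with its simplest case: linear (mean-sigma) linking in a common sample that took both instruments. The numbers below are hypothetical, and this is not the PROsetta Stone procedure itself, which relies on more sophisticated IRT-based and equipercentile methods.

```python
# Sketch of linear (mean-sigma) linking between two instruments that
# measure the same domain. Scores are hypothetical.

def linear_link(scores_a, scores_b):
    """Return (slope, intercept) mapping instrument A's scores onto
    instrument B's metric by matching means and standard deviations
    in a sample that completed both."""
    def mean(xs):
        return sum(xs) / len(xs)
    def sd(xs):
        m = mean(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    slope = sd(scores_b) / sd(scores_a)
    intercept = mean(scores_b) - slope * mean(scores_a)
    return slope, intercept

a = [10, 20, 30, 40]   # hypothetical scores on instrument A
b = [45, 50, 55, 60]   # same people on instrument B's metric
slope, intercept = linear_link(a, b)
# A score of 25 on instrument A maps to slope * 25 + intercept = 52.5
# on instrument B's metric.
```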

35 Linking Outcomes Measures

36 Feasibility?

37 Data Collection Options for HealthMeasures
- Paper and pencil
- Online: Assessment Center
- Assessment Center API – an application programming interface that exposes the Assessment Center instrument library (including CATs) for use by external systems
- REDCap – Research Electronic Data Capture
- NIH Toolbox App for iPad

38 Scoring Options
- Automatic with Assessment Center, the Assessment Center API, and the NIH Toolbox App
- Manual (for standard forms) using look-up tables
- Assessment Center Scoring Service: automatic scoring of custom fixed forms

39 CASE STUDIES

40 Selecting an Instrument
Who? English-speaking adults with muscular dystrophy and their caregivers
What? Sleep difficulties
How? Online, no cost, minimal burden, multiple times
Why? Community engagement indicated a need to explore sleep issues as a potential factor in overall QoL for both teen/adult patients and caregivers. As the disorder progresses, teens and adults need help getting to sleep and going back to sleep.

41 Generic versus Disease/Condition-Specific
Comparisons: generic – across and within conditions; disease-specific – within condition
Content: generic – "universal," may miss important areas/symptoms; disease-specific – targeted, face valid
Responsiveness: generic – less? disease-specific – more?
Neuro-QoL or PROMIS?

42 PROMIS Physical Health Banks
Adult: Pain Behavior, Pain Interference, Fatigue, Sleep Disturbance, Sleep-related Impairment, Physical Function, Sexual Function
Pediatric: Pain Interference, Fatigue, Upper Extremity, Physical Health, Mobility, Asthma Impact

43 Does it measure what I want to measure?

44 Domain/Bank Definition
The PROMIS Sleep Disturbance item bank assesses perceptions of sleep quality, sleep depth, and restoration associated with sleep; perceived difficulties and concerns with getting to sleep or staying asleep; and perceptions of the adequacy of and satisfaction with sleep. It does not include symptoms of specific sleep disorders, nor does it provide subjective estimates of sleep quantities (e.g., total amount of sleep, time to fall asleep, or amount of wakefulness during sleep).

The PROMIS Sleep-Related Impairment item bank assesses perceptions of alertness, sleepiness, and tiredness during usual waking hours, and the perceived functional impairments during wakefulness associated with sleep problems or impaired alertness. It measures the level of waking alertness, sleepiness, and function within the context of overall sleep-wake function, but does not directly assess cognitive, affective, or performance impairments.

45 Content of Measure
Review the items for face validity and for inappropriate or inapplicable items.

46 Options
PROMIS Item Bank v1.0 – Sleep Disturbance
PROMIS Short Form v1.0 – Sleep Disturbance 4a
PROMIS Short Form v1.0 – Sleep Disturbance 6a
PROMIS Short Form v1.0 – Sleep Disturbance 8a
PROMIS Short Form v1.0 – Sleep Disturbance 8b

47 What’s the best option for me?
Purpose of measurement? Research, clinical care, longitudinal
Level of assessment? Group versus individual
Brevity versus precision

48 Many Instrument Types
Mode: CAT – computer; short form, scale, and profile – computer and paper
Precision: CAT – high for all trait levels; short form – varies by length and how well the form is targeted to the specific subject; scale and profile – varies by length
Brevity: CAT – variable length (4–12 items); short form – range of lengths available; scale – instrument dependent; profile – 4, 6, or 8 items per domain

49

50 Brevity versus Precision – Fixed-Length Forms
↑ length → ↑ precision, ↓ error
Individual-level assessment requires more precision than group-level assessment.

51 Other Considerations
Q: Does the source or reason for sleep disturbance matter in how we choose a measure or specific items?
A: It might, if your purpose is to learn more about the underlying causes of the sleep problem. The PROMIS generic sleep disturbance measure is intended to function the same across conditions, as an indicator of the level of sleep disturbance irrespective of cause. A study by Cooke et al. found that the sleep disturbance measure functioned consistently for adults with muscular dystrophy, MS, SCI, and post-polio syndrome and across different age groups. It is not intended to give information about the cause(s) of the disturbed sleep or the symptoms of a particular sleep disorder. With respect to specific items, concerns about content validity and relevance might lead you to choose some items over others.
Q: What are the best practices for "adding to" a measure (e.g., including an open-ended question) so we can explore the different reasons behind sleep disturbance across patients and caregivers?
A: An open-ended question can give valuable information, but it can be very labor intensive to analyze the resulting responses. Alternatively, the question could list likely sources of sleep disturbance (participant selects all that apply) plus an "other" category in which the participant describes a source not listed. Another option is a more targeted sleep questionnaire, i.e., measures of particular sleep disorders or sleep symptoms. A fourth option is to look at associations between sleep problems and measures of different constructs to see whether potential relationships can be identified; for example, measures of caregiver burden could be given to the parents, and/or parents could complete generic (non-caregiver) measures of constructs hypothesized to be causing disturbed sleep.

52 Other Considerations
Q: We hypothesize, but do not have much evidence to support, that sleep disturbance does not become a common concern until the teen-to-adult years. How can we best measure sleep/fatigue domains across a range of ages?
A:
Baseline: One consideration is when to start the assessment. You want to start before sleep/fatigue problems are hypothesized to begin, but you don't want to burden participants unnecessarily with questions that aren't relevant.
Frequency: Consider more frequent assessment if the construct fluctuates; less frequent if it is relatively stable.
Which measures: PROMIS offers pediatric parent-report and self-report measures of fatigue, and together these cover the age range of interest. There are no pediatric sleep disturbance measures in PROMIS. Options: assess only ages 18+; use the adult measure for older teens anyway (recognizing that normative values may not apply and the language may be at too high a literacy level); or use a different sleep measure for pediatric participants or for the entire age range.
Cross-sectional versus longitudinal: Giving measures to the different age groups at the same time (cross-sectional) can suggest whether the hypothesis is correct, but longitudinal assessment is needed to evaluate the developmental trajectory.

53 Content Credits
Thanks to the faculty and staff of HealthMeasures.

54 Questions?




