1 1 Measurement Issues in Health Disparities Research Anita L. Stewart, Ph.D. University of California, San Francisco Health Disparities Research Methods EPI 222, Spring April 14, 2011

2 2 Overview of Class u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups u Modifying measures

3 3 Background u U.S. population becoming more diverse u Minority groups are being included in research due to: –NIH mandate (1993 – women and minorities) –Health disparities initiatives

4 4 Types of Diverse Groups u Health disparities research focuses on differences in health between … –Minority vs. non-minority –Lower income vs. others –Lower education vs. others –Limited English Proficiency (LEP) vs. others –…. and many others

5 5 Measurement Implications of Research in Diverse Groups u Most self-reported measures were developed and tested in mainstream, well-educated groups u Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups –Although this is changing rapidly

6 6 Measurement Adequacy vs. Measurement Equivalence u Adequacy - within a “diverse” group –concepts are appropriate and relevant –psychometric properties meet minimal criteria »Good variability »Reliable and valid »Sensitive to change over time u Equivalence - between “diverse” groups –conceptual and psychometric properties are comparable

7 7 Why Not Use Culture-Specific Measures? u Measurement goal is to identify measures that can be used across all groups in one study, yet maintain sensitivity to diversity and have minimal bias u Most health disparities studies compare mean scores across diverse groups

8 8 Generic/Universal vs Group-Specific (Etic versus Emic) u Concepts unlikely to be defined exactly the same way across diverse ethnic groups u Generic/universal (etic) –features of a concept that are appropriate across groups u Group-Specific (emic) –idiosyncratic or culture-specific portions of a concept

9 9 Etic versus Emic (cont.) u Goal in health disparities research with more than one group: –identify generic/universal portions of a concept that are applicable across all groups u For within-group studies: –the culture-specific portion is also relevant

10 10 Overview of Class u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups

11 11 Conceptual and Psychometric Adequacy and Equivalence
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

12 12 Left Side of Matrix: Adequacy in a Single Group
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

13 13 Right Side of Matrix: Equivalence in More Than One Group
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

14 14 Overview of Class u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups u Modifying measures

15 15 Approaches to Explore Conceptual Adequacy in Diverse Groups u Literature reviews of concepts and measures u In-depth interviews and focus groups –discuss concepts, obtain their views u Expert consultation from diverse groups –review concept definitions –rate relevance of items

16 16 Example: Review of Measures of Dietary Intake in Minority Populations u Reviewed food frequency questionnaires for use in minority populations u Performed well in some groups and poorly in others u Group differences that could affect scores: –Portion sizes differ –Missing ethnic foods u Could underestimate total intake and nutrients RJ Coates et al. Am J Clin Nutr; 1997;65(suppl):1108S-15S.

17 17 A Structured Method for Examining Conceptual Relevance u Compiled set of 33 typical HRQL items u Administered to older African Americans u After each question, asked “how relevant is this question to the way you think about your health?” –0-10 scale with 0=not at all relevant, 10=extremely relevant Cunningham WE et al., Qual Life Res, 1999;8:749-768.

18 18 HRQL Relevance Results u Most relevant items: –Spirituality, weight-related health, hopefulness u Least relevant items: –Physical functioning, role limitations due to emotional problems

19 19 Qualitative Research: Expert Panel Reviewed Spanish FACT-G u Functional Assessment of Cancer Therapy – General (FACT-G) u Bilingual/bicultural panel reviewed items for conceptual relevance to Hispanics –One item had low relevance ("I worry about dying") »Added new item "I worry my condition will get worse" –One domain missing – spirituality »Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts D Cella et al. Med Care 1998: 36;1407

20 20 Example of Inadequate Concept u Patient satisfaction typically conceptualized in terms of, e.g., –access, technical care, communication, continuity, coordination, interpersonal style u In minority and low income groups, additional relevant domains: –discrimination by health professionals –sensitivity to language barriers MN Fongwa et al., Ethnicity Dis, 2006;16(3):948-955.

21 21 Measuring Park/Recreation Environments in Low-Income Communities u New focus on how environments promote physical activity –Many good new measures of environments u Reviewed adequacy for lower-income, minority communities

22 22 Measuring Park/Recreation Environments in Low-Income Communities (cont) u Recommendations: In low-income communities of color: –Identify and address most salient environmental needs –Incorporate research on preferred recreational activities –Ensure representation of perceptions of residents MF Floyd et al. Am J Prev Med, 2009;36:S156-S160.

23 23 Psychometric Adequacy in any Group u Minimal standards: – Sufficient variability – Minimal missing data – Adequate reliability/reproducibility – Evidence of construct validity – Evidence of sensitivity to change

24 24 Example: Adequacy of Reliability of Spanish SF-36 in Argentinean Sample
SF-36 scale                     Coefficient alpha
Physical functioning            .85
Role limitations - physical     .84
Bodily pain                     .80
General health perceptions      .69
Vitality                        .82
Social functioning              .76
Role limitations - emotional    .75
Mental health                   .84
F Augustovski et al, J Clin Epid, 2008, 61:1279-84.
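Coefficient alpha, the internal-consistency statistic reported on this slide, can be computed directly from item-level responses. The following is a minimal sketch rather than the analysis used in the cited study; the function name `cronbach_alpha` and the `demo` responses are hypothetical and serve only as illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                                 # number of items in the scale
    item_var_sum = x.var(axis=0, ddof=1).sum()     # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)          # variance of the summed scale score
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses: 5 respondents answering a 3-item scale
demo = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 5], [5, 5, 5]]
alpha_demo = cronbach_alpha(demo)
print(f"alpha = {alpha_demo:.2f}")
```

Running the same function separately within each group gives the within-group reliabilities that the minimal standards refer to.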

25 25 Overview of Class u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups u Modifying measures

26 26 Conceptual Equivalence Across Groups
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

27 27 Conceptual Equivalence u Is the concept relevant, familiar, acceptable to all diverse groups being studied? u Is the concept defined the same way in all groups? –all relevant “domains” included (none missing) –interpreted similarly

28 28 Example: Developing Concept of Interpersonal Processes of Care
Sources feeding into the IPC II conceptual framework:
IPC Version I framework in Milbank Quarterly
19 focus groups - African American, Spanish- and English-speaking Latino, and White adults
Literature review of quality of care in diverse groups

29 29 IPC-II Conceptual Framework: Reflects Concerns of All 4 Groups
I. COMMUNICATION: General clarity; Elicitation/responsiveness; Explanations of processes, condition, self-care, meds
II. DECISION MAKING: Responsive to patient preferences; Consider ability to comply
III. INTERPERSONAL STYLE: Respectfulness; Courteousness; Perceived discrimination; Emotional support; Cultural sensitivity

30 30 IPC-II Conceptual Framework (cont)
IV. OFFICE STAFF: Respectfulness; Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS: MD’s and office staff’s sensitivity to language

31 31 Conceptual Equivalence: Spanish- and English-speaking Inpatients u Administered Hospital Quality of Care Survey (H-CAHPS®), asked 2 open-ended questions to detect experiences missed by survey »What they liked most about care »What aspects of care they would change u Analyzed responses in relation to existing survey items or new topics MP Hurtado et al. Health Serv Res, 2005;40-6, Part II:2140-2161

32 32 Psychometric Equivalence
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

33 33 Psychometric or Measurement Equivalence u When comparing groups (as in health disparities research): –Measures should have similar or equivalent measurement properties in all diverse groups of interest in your study »e.g., English and Spanish, African Americans and Caucasians

34 34 Psychometric Equivalence Across Groups u Psychometric characteristics should be “equivalent” across all groups: – Sufficient variability – Minimal missing data – Reliability/reproducibility – Construct validity – Sensitivity to change

35 35 Bias (Systematic Error) - A Special Concern u Observed group mean differences in a measure can be due to: –Culturally- or group-mediated differences in true score (true differences) -- OR -- –Bias - systematic differences between observed scores not attributable to true scores

36 36 Random versus Systematic Error
Observed item score = true score + error
Error has two components:
Random error: relevant to reliability
Systematic error (“bias”): relevant to validity

37 37 Bias (Systematic Error) u Systematic measurement error may make group comparisons invalid u Systematic differences in scores can be due to group differences in: –the meaning of concepts or items –the extent to which measures represent a concept –cognitive processes of responding –use of response scales
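The difference between random and systematic error can be illustrated with a small simulation. This is a minimal sketch under assumed values that do not come from the lecture: both groups share identical true scores, random error has mean zero, and group B's observed scores carry a hypothetical constant bias of 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 20000
bias_b = 0.5  # hypothetical systematic error affecting group B only

# Same true scores in both groups, so the true group difference is zero.
true_scores = rng.normal(loc=0.0, scale=1.0, size=n)

# Observed score = true score + random error (+ systematic error in group B).
observed_a = true_scores + rng.normal(0.0, 1.0, size=n)
observed_b = true_scores + rng.normal(0.0, 1.0, size=n) + bias_b

observed_diff = observed_b.mean() - observed_a.mean()
print(f"Observed mean difference (B - A): {observed_diff:.2f}")
```

Random error largely averages out of the group means, so the observed difference stays close to the assumed bias even though the true difference is zero. This is the sense in which systematic error can make a group comparison invalid.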

38 38 Bias or “Systematic Difference”? u Bias = “deviation from true score” u Cannot speak of a “bias” in one group compared to another without knowing the true score u Preferred term: differential “item” functioning (DIF) –Item (or measure) that has a different meaning in one group than another

39 39 Item Equivalence u No Differential Item Functioning (DIF) –Items are similarly related to the underlying trait u Meaning of response categories is similar across groups u Distance between response categories is similar across groups

40 40 Methods for Identifying Differential Item Functioning (DIF) u Item Response Theory (IRT) u Examines each item in relation to underlying latent trait u Tests if responses to one item predict the underlying latent “score” similarly in two groups –if not, items have “differential item functioning”

41 41 Example of Effect of DIF u 5 CES-D items administered to Black and White men –1 item subject to differential item functioning (bias) u 5-item scale including item suggested that Black men had more somatic symptoms than White men (p <.01) u 4-item scale excluding biased item showed no differences S Gregorich, Med Care, 2006;44:S78-S94.

42 42 Equivalence of Reliability?? No! u Difficult to compare reliability because it depends on the distribution of the construct in a sample –Thus lower reliability in one group may simply reflect poorer variability u More important is the adequacy of the reliability in both groups –Reliability meets minimal criteria within each group
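The dependence of reliability on the sample distribution follows from the classical test theory definition of reliability, which the slide does not state explicitly. As a worked sketch with hypothetical variances, hold the error variance at 4 in both groups and let only the true-score variance differ:

```latex
\[
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]
\[
\text{Group A: } \frac{16}{16 + 4} = 0.80
\qquad
\text{Group B: } \frac{6}{6 + 4} = 0.60
\]
```

The measure is equally precise in both groups (same error variance), yet reliability is lower in group B simply because true scores vary less there.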

43 43 Equivalence of Criterion Validity u Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g. –a measure predicts utilization in both groups –a cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
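Checking whether a screening cutpoint behaves the same way in two groups amounts to computing sensitivity and specificity separately in each group. The sketch below uses hypothetical scores, condition flags, and a hypothetical cutpoint of 16; none of these values come from the lecture.

```python
def sens_spec(scores, has_condition, cutpoint):
    """Sensitivity and specificity of 'score >= cutpoint' as a positive screen."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        positive = score >= cutpoint
        if condition and positive:
            tp += 1      # true positive
        elif condition:
            fn += 1      # false negative
        elif positive:
            fp += 1      # false positive
        else:
            tn += 1      # true negative
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: first three people in each group have the condition
condition = [True, True, True, False, False, False]
group_a_scores = [18, 20, 16, 10, 8, 12]
group_b_scores = [18, 14, 12, 10, 17, 8]

sens_a, spec_a = sens_spec(group_a_scores, condition, cutpoint=16)
sens_b, spec_b = sens_spec(group_b_scores, condition, cutpoint=16)
print(f"Group A: sensitivity={sens_a:.2f}, specificity={spec_a:.2f}")
print(f"Group B: sensitivity={sens_b:.2f}, specificity={spec_b:.2f}")
```

In this made-up example the cutpoint separates cases perfectly in group A but misses two of three cases in group B, the kind of discrepancy that would argue against criterion-validity equivalence.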

44 44 Equivalence of Construct Validity u Are hypothesized patterns of associations confirmed in both groups? –Example: Scores on the Spanish version of the FACT-G had similar relationships with other health measures as scores on the English version u Primarily tested through subjectively examining pattern of correlations u Can also test using confirmatory factor analysis (CFA)

45 45 Equivalence of Construct Validity of Spanish SF-36 in Argentinean Sample u Compared Spanish SF-36 construct validity test results to U.S. English SF-36 results u Tested several previously tested hypotheses (which were confirmed): –PCS decreases with age and # of diseases –Relationship of PCS and MCS with utilization –Known groups validity (scores lower for those with various diseases) F Augustovski et al, J Clin Epid, 2008, 61:1279-84.

46 46 Equivalence of Factor Structure u Factor structure similar in new group to structure in original study –measurement model is the same across groups u Methods –Specify number of factors –Determine if hypothesized model fits the data

47 47 Factor Structure of CES-D u Original study found 4 factors –Somatic symptoms –Depressive affect –Interpersonal behavior –Positive affect u In a new population group: do you find 4 factors? LS Radloff, Applied Psychol Measurement, 1977;1:385-401.

48 48 How Evidence for Equivalence of Factor Structure is Obtained u Subjectively –visually compare factor loadings across group-specific exploratory factor analyses u Empirically –confirmatory factor analysis of data that includes multiple groups –studies of psychometric invariance
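One common descriptive supplement to visual comparison of loadings is Tucker's congruence coefficient, which summarizes how similar two loading patterns are. The slides do not mention this coefficient; it is swapped in here only as an illustration, it is not a substitute for the multi-group confirmatory factor analysis described under "Empirically", and the loadings below are hypothetical.

```python
import math

def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return numerator / denominator

# Hypothetical loadings of four items on one factor in two groups
group_1 = [0.70, 0.65, 0.80, 0.60]
group_2 = [0.72, 0.60, 0.78, 0.55]
phi = tucker_congruence(group_1, group_2)
print(f"congruence = {phi:.3f}")
```

Values near 1 indicate that the two groups' loading patterns are nearly proportional; the coefficient says nothing about intercepts or residuals.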

49 49 Empirical Examination of Equivalence of Factor Structure u Psychometric invariance (equivalence) u Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant) –measurement model is the same across groups u Empirical comparison across groups using confirmatory factor analysis –Not simply by examination

50 50 Hierarchical Tests of Psychometric Equivalence Across all groups – a sequential process: u Same number of factors or dimensions u Same items on same factors u Same factor loadings u No bias on any item across groups u Same residuals on items u No item or scale bias AND same residuals

51 51 Criteria for Evaluating Invariance Across Groups: Technical Terms
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Scalar or Strong Factorial Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances are unbiased
Strict Factorial Invariance: Both scalar and residual criteria are met
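These criteria are usually expressed as equality constraints on a multi-group factor model. The notation below is an assumption introduced for illustration and does not appear in the slides: for group g, x is the vector of observed item scores, tau the item intercepts, Lambda the loading matrix, xi the latent factors, and Theta the covariance matrix of the residuals delta.

```latex
\[
x_g = \tau_g + \Lambda_g \xi_g + \delta_g,
\qquad \operatorname{Cov}(\delta_g) = \Theta_g
\]
\[
\begin{aligned}
\text{Configural:} &\quad \Lambda_g \text{ has the same pattern of zero and nonzero loadings in every group} \\
\text{Metric:} &\quad \Lambda_g = \Lambda \\
\text{Scalar (strong):} &\quad \Lambda_g = \Lambda,\ \tau_g = \tau \\
\text{Residual:} &\quad \Lambda_g = \Lambda,\ \Theta_g = \Theta \\
\text{Strict:} &\quad \Lambda_g = \Lambda,\ \tau_g = \tau,\ \Theta_g = \Theta
\end{aligned}
\]
```

Under this reading, "observed scores are unbiased" corresponds to equal intercepts: with equal loadings and intercepts, group differences in observed item means can only arise from differences in latent factor means.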

52 52 Factor Structure of CES-D u Original study found 4 factors –Somatic symptoms –Depressive affect –Interpersonal behavior –Positive affect u In a new population group: do you find 4 factors? LS Radloff, Applied Psychol Measurement, 1977;1:385-401.

53 53 Test for Evidence of Dimensional Invariance u Two studies of Latinos: –2 factors in both studies »Depression and well-being u American Indian adolescents –3 factors »Depressed affect »Somatic symptoms and reduced activity »Positive affect TQ Miller et al., J Gerontol: Soc Sci 1997;520:S259 SM Manson et al., Psychol Assessment 1990;2:231-237

54 54 Configural Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

55 55 Configural Invariance u Assumes: dimensional invariance is found (same number of factors) u Definition: Item-factor patterns are the same, same items load on same factors in both groups u CES-D example –4 factors found in Anglos, Blacks, and Chicanos –Same items loaded on each factor in all groups RE Roberts et al., Psychiatry Research, 1980;2:125-134

56 56 Metric Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

57 57 Metric Invariance or Factor Pattern Invariance u Assumes: dimensional and configural invariance are found u Definition: Item loadings are the same across groups –i.e., the correlation of each item with its factor is the same in all groups

58 58 Metric Invariance Example from Interpersonal Processes of Care u Out of 91 items – factor structure of 29 items met criteria of invariance across 4 groups –Spanish-speaking Latinos, English speaking Latinos, African Americans, Whites u Dimensional –Similar factor structure across all 4 groups u Configural –Same items loaded on each factor in all 4 groups u Metric –Same item loadings in all 4 groups Stewart et al., Health Services Research, 2007; 42 (3, Part I):1235-56.

59 59 Seven “Metric Invariant” Scales: Same Item Loadings Across Groups
I. COMMUNICATION: Hurried communication; Elicited concerns, responded; Explained results, medications
II. DECISION MAKING: Patient-centered decision-making
III. INTERPERSONAL STYLE: Compassionate, respectful; Discriminated; Disrespectful office staff

60 60 Strong Factorial Invariance
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

61 61 Strong Factorial Invariance or Scalar Invariance u Assumes: dimensional, configural, and metric invariance are found u Definition: Observed scores are unbiased, i.e., means can be compared across groups u Requires test of equivalence of mean scores across groups using confirmatory factor analysis

62 62 Seven “Scalar Invariant” (Unbiased) IPC Scales (18 items)
I. COMMUNICATION: Hurried communication – lack of clarity; Elicited concerns, responded; Explained results, medications – explained results
II. DECISION MAKING: Patient-centered decision-making – decided together
III. INTERPERSONAL STYLE: Compassionate, respectful – (subset) compassionate, respectful; Discriminated – discriminated due to race/ethnicity; Disrespectful office staff

63 63 Equivalence of Spanish and English Hospital Quality of Care Survey (H-CAHPS ® ) u Tested 7 subscales (e.g., nurse communication, pain control, discharge information) u Compared Spanish and English groups: –Item-scale correlations, internal consistency reliability, factor structure, and construct validity u Concluded these were equivalent MP Hurtado et al. Health Serv Res, 2005;40-6, Part II:2140-2161

64 64 Overview of Class u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence –Adequacy in one group –Equivalence across groups u Modifying measures

65 65 What if Measures Need Modifying or Adapting? u Why would we modify a measure? u What information is used to modify? u What are the types of modifications? u How should we test modified measures?

66 66 When Problems are Found Through Pretesting… Investigators Face a Choice Use the existing measure “as is” to preserve integrity of measure OR Try to modify the measure to address problems in diverse group

67 67 Argument in Favor of Using Measure “As Is” u Modifications can change the measure’s validity and reliability u Allows comparison of findings to other research using the measure

68 68 Argument Against Using Measure “As Is” …. …when problems are found u If reliability and validity are poor… u Results pertaining to the measure could be erroneous –Limited internal validity

69 69 Reasons for Considering Modifying an Existing Measure u In health disparities research –Sample/population differs from that in which original measure developed u More broadly –Measure developed a while ago –Poor format/presentation –Study context issues

70 70 Key Reason: Population Group Differences from Original u Research in diverse population groups –Different culture, race/ethnic group –Lower level of socioeconomic status (SES) –Limited English proficiency, lower literacy u Mainstream research –Different disease, health problem, patient group, age group

71 71 Why Might a Measure Not be Suitable for New Population Group? u Concept or dimension is missing u Meaning of concepts differs from mainstream u New group may not interpret items as intended u Process of answering questions may differ

72 72 Poor Format/Presentation = High Respondent Burden u Instructions unnecessarily wordy, unclear u Way of responding is complicated u Difficult to navigate the questionnaire –Crowded on the page –Hard to track across the page u Hard to read –Poor contrast, small font

73 73 Example: Complex Instructions Instructions: There are 12 statements on this form. They are statements about families. You are to decide which of these statements are true of your family and which are false. If you think the statement is TRUE or MOSTLY TRUE of your family, please mark the box in the T (TRUE) column. If you think the statement is FALSE or MOSTLY FALSE of your family, please mark the box in the F (FALSE) column. You may feel that some of the statements are true for some family members and false for others. Mark the box in the T column if the statement is TRUE for most members. Mark the box in the F column if the statement is FALSE for most members. If the members are evenly divided, decide what is the stronger overall impression and answer accordingly. Remember, we would like to know what your family seems like to you. So do not try to figure out how other members see your family, but do give us your general impression of your family for each statement. Do not skip any item. Please begin with the first item.

74 74 Example: Burdensome Way of Responding
For each question, choose from the following alternatives:
0 = Never   1 = Almost Never   2 = Sometimes   3 = Fairly Often   4 = Very Often
1. In the last month, how often have you felt nervous and “stressed”? ..... 0 1 2 3 4
2. In the last month, how often have you felt that things were going your way? ..... 0 1 2 3 4
S Cohen et al. J Health Soc Beh, 1983;24(4):385-396.

75 75 What Information is Used to Decide How to Modify a Measure? u Same data identifying conceptual differences in diverse population… –often includes information for making revisions

76 76 Published Review - Physical Activity Measures for Minority Women u WHI convened experts to identify issues in measuring PA in minority and older women u Some conclusions: –Assess culturally sensitive activities (e.g., walking for transportation and errands) –Measure intermittent activities –Phrases “leisure time, free time, spare time” (used to denote non-occupational activities) not understood u Review can help select appropriate measures and adapt as needed LC Masse et al., J Women’s Health, 1998;7:57-67.

77 77 Types of Modifications u Format or presentation u Content –Dimensions –Item stems –Response options

78 78 Format/Presentation Modifications u Goal: reduce respondent burden u Improve appearance or way of responding –Simplify instructions –Modify format for responding –Create more space, reduce crowded items –Improve contrast, increase font size

79 79 Types of Modifications u Format or presentation u Content –Dimensions –Item stems –Response options Add Drop Replace Modify

80 80 Content Modification Example: Add Dimension u Study of older Korean/Chinese immigrants u Added language support to existing social support measure u Based on focus group data: –Help with translation at medical appointments –Help to ask questions in English when on the phone –Help to learn English S Wong et al. Int J Health Human Dev, 2005;61:105-121.

81 81 Content Modification Example: Add Dimension (cont) u New items were embedded in existing social support measure using same format

82 82 Minor to Major Modifications? u Each type of modification can hypothetically be rated on a continuum from having minor to major impact on reliability and validity of original measure –Minor – slight changes in format/presentation …… –Major – numerous changes in dimensions, items, and response choices

83 83 Need to Test Psychometric Properties of Modified Measures u All modifications, no matter how small, can affect reliability and validity of original measure u Burden is on investigator to test modified measure

84 84 Recommendations for Testing Modified Measures u Pretest modified measure extensively before fielding in new study u Build in ability to do psychometric testing when measure is fielded –Add validity variables (e.g., similar to original measure to test comparability) –Add follow-up to assess test-retest reliability

85 85 Analyze Psychometric Adequacy of Modified Measure in New Study u Modified measure should meet minimal criteria –Item-scale correlations –Internal-consistency reliability
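Item-scale correlations for a modified measure are often computed in "corrected" form, correlating each item with the sum of the other items in its scale so that the item does not inflate its own correlation. The sketch below assumes that convention, which the slide does not specify; the function name and `demo` data are hypothetical.

```python
import numpy as np

def corrected_item_scale_correlations(items):
    """Correlate each item with the sum of the remaining items in its scale."""
    x = np.asarray(items, dtype=float)
    total = x.sum(axis=1)
    result = []
    for j in range(x.shape[1]):
        rest = total - x[:, j]  # scale score with item j removed
        result.append(float(np.corrcoef(x[:, j], rest)[0, 1]))
    return result

# Hypothetical responses: 5 respondents answering a 3-item scale
demo = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 4, 5], [5, 5, 5]]
correlations = corrected_item_scale_correlations(demo)
print([round(c, 2) for c in correlations])
```

Items whose corrected correlation with their own scale is low are candidates for closer review before the modified measure is fielded.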

86 86 Analyzing Modified Measure: Comparability to Original Measure u Compare measurement results of modified measure to original measure –Reliability (sample dependent) –Factor structure –Construct validity –Sensitivity to change

87 87 Overall Conclusions u Measurement in health disparities research is a relatively new field u We encourage reporting on adequacy and equivalence of measures tested in any diverse population u As evidence grows, it will become easier to find measures that work better across diverse groups

88 88 Resource: Reviews of Measures for Diverse Populations u Multicultural measurement in older populations, JH Skinner et al (eds), Springer Publishing Co: NY, 2002 –ALSO published as: Measurement in older ethnically diverse populations, J Mental Health Aging, Vol 7, Spring 2001 Reviews measures that have been used cross-culturally in: acculturation, socioeconomic status, social support, cognition, health, depression, and religiosity.

89 89 Resource: Special Journal Issue u Measurement in a multi-ethnic society –Med Care, Vol 44, November 2006 –Qualitative and quantitative methods in addressing measurement in diverse populations

90 90 Guidelines for Translating Measures u Handout: annotated bibliography of articles in which optimal methods of translation are used u Compiled by CADC Measurement and Methods Core

91 91 Homework for Class 3 u Complete rows 12-17 in matrix –Use form posted on the website u Include your name in the filename –Smith_HW_epi222_class3 u Email by Monday April 18 to Anita.Stewart@ucsf.edu

