
1 1 Class 7 Measurement Issues in Research with Diverse Populations Including Health Disparities Research October 30, 2008 Anita L. Stewart Institute for Health & Aging University of California, San Francisco

2 2 Overview of Class 7 u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence u Importance of mixed methods in developing measures for research in diverse populations

3 3 Background u U.S. population becoming more diverse u Minority groups are being included in research due to: –NIH mandate –Health disparities initiatives

4 4 Types of Diverse Groups u Health disparities research focuses on differences in health between … –Minority vs. non-minority –Lower income vs. others –Lower education vs. others –Limited English Proficiency (LEP) vs. others –…. and many others

5 5 Measurement Implications of Research in Diverse Groups u Most self-reported measures were developed and tested in mainstream, well-educated groups u Little information is available on appropriateness, reliability, validity, and responsiveness in diverse groups –Although this is changing rapidly

6 6 What We Need to Know About Possible Measures u For any one diverse group – evidence that measures are: –Appropriate and relevant –Good variability –Reliable and valid –Sensitive to change over time u Across all “diverse” groups in your study –Meet above criteria in all groups –Measuring the same concept in all groups

7 7 Measurement Adequacy vs. Measurement Equivalence u Adequacy - within a “diverse” group –concepts are appropriate –psychometric properties meet minimal criteria u Equivalence - between “diverse” groups –conceptual and psychometric properties are comparable

8 8 Why Not Use Culture-Specific Measures? u Measurement goal is to identify measures that can be used across all groups in one study, yet maintain sensitivity to diversity and have minimal bias u Most health disparities studies compare mean scores across diverse groups

9 9 Generic/Universal vs Group-Specific (Etic versus Emic) u Concepts unlikely to be defined exactly the same way across diverse ethnic groups u Generic/universal (etic) –features of a concept that are appropriate across groups u Group-Specific (emic) –idiosyncratic or culture-specific portions of a concept

10 10 Etic versus Emic (cont.) u Goal in health disparities research with more than one group: –identify the generic/universal portion of a concept that is applicable across all groups u For within-group studies: –the culture-specific portion is also relevant

11 11 Overview of Class 7 u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence u Importance of mixed methods in developing measures for research in diverse populations

12 12 Conceptual and Psychometric Adequacy and Equivalence
Conceptual, adequacy in 1 group: Concept meaningful within one group
Conceptual, equivalence across groups: Concept equivalent across groups
Psychometric, adequacy in 1 group: Psychometric properties meet minimal standards within one group
Psychometric, equivalence across groups: Psychometric properties invariant (equivalent) across groups

13 13 Left Side of Matrix: Adequacy in a Single Group Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

14 14 Right Side of Matrix: Equivalence in More Than One Group Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

15 15 Conceptual Adequacy in One Group Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

16 16 Qualitative Approaches to Explore Conceptual Adequacy in Diverse Groups u Literature reviews –ethnographic and anthropological u In-depth interviews and focus groups –discuss concepts, obtain their views u Expert consultation from diverse groups –review concept definitions –rate relevance of items

17 17 Conceptual Relevance of Spanish FACT-G u Bilingual/bicultural expert panel reviewed all 28 items for relevance –One item had low cultural relevance to quality of life –One concept was missing – spirituality u Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts –Sample item “I worry about dying” Cella D et al. Med Care 1998: 36;1407

18 18 A Structured Method for Examining Conceptual Relevance u Compiled set of 33 HRQL items u Assessed relevance to older African Americans u After each question, asked “how relevant is this question to the way you think about your health?” –Response scale: 0-10 scale with endpoints labeled –0=not at all relevant, 10=extremely relevant Cunningham WE et al., Qual Life Res, 1999;8:749-768.

19 19 Results: Most Relevant Items u Spirituality (3 items) –importance of spirituality to well-being –level of spirituality –being sick affected spirituality u Weight-related health (2 items) u Hopefulness (1 item)

20 20 Results: Least Relevant Items u Physical functioning u Role limitations due to emotional problems u All standard MOS measures ranked in the lower 2/3, including all SF12 items

21 21 Psychometric Adequacy in One Group Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

22 22 Psychometric Adequacy in any Group u Minimal standards: – Sufficient variability – Minimal missing data – Adequate reliability/reproducibility – Evidence of construct validity – Evidence of responsiveness to change

23 23 Evidence of Psychometric Inadequacy in Various Diverse Groups u SF-36 social functioning scale - internal consistency reliability <.70 in three different samples: –Chinese language, adults aged 55-96 years –Japanese language, Japanese elders –English, Pima Indians Stewart AL & Nápoles-Springer A, Med Care, 2000;38(9 Suppl):II-102
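The internal consistency criterion above can be checked separately within each group. The following is a minimal sketch, assuming complete item responses (no missing data) and using the .70 threshold mentioned on the slide; the function names and the data layout are illustrative assumptions, not part of the original lecture.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for one group.

    rows: list of respondents, each a list of k item scores
    (assumes no missing values; hypothetical data layout).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    # Variance of the summed scale score
    total_var = pvariance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("scale scores have no variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_by_group(groups, minimum=0.70):
    """Return {group name: (alpha, meets_minimum)} for each group."""
    result = {}
    for name, rows in groups.items():
        alpha = cronbach_alpha(rows)
        result[name] = (alpha, alpha >= minimum)
    return result
```

Reporting alpha per group, rather than for the pooled sample, is what shows whether a scale falls below the minimum in one particular group.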

24 24 Conceptual Equivalence Across Groups Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

25 25 Conceptual Equivalence u Is the concept relevant, familiar, acceptable to all diverse groups being studied? u Is the concept defined the same way in all groups? –all relevant “domains” included (none missing) –interpreted similarly

26 26 Obtain Perspective of All Diverse Groups on Concept u Measure development sequence: develop concept, create item pool, pretest/revise, field survey, psychometric analyses, final measures u Obtain perspectives of diverse groups

27 27 Example: Developing Concept of Interpersonal Processes of Care u Goal: instrument to explore disparities in processes of care between diverse patient groups u Initial conceptual framework reflected issues and perspectives of Latino, African American, and White patients u Based on: –Focus groups –Literature –Clinical experience

28 28 IPC Version I u Communication Elicitation of concerns, explanations, general clarity u Decision-making Involving patients in decisions u Interpersonal Style Respectfulness, emotional support, non-discrimination, cultural sensitivity A Stewart et al., Milbank Quart, 1999: 77:305 (in class 1 readings)

29 29 Limitations of First IPC Framework u Tested on small sample of 600 patients from San Francisco General Hospital u Several hypothesized concepts were not confirmed –e.g., cultural sensitivity u Needed further development

30 30 Developed Revised IPC Concept u Inputs to the draft IPC II conceptual framework: –IPC Version I framework in Milbank Quarterly –19 new focus groups - African American, Latino, and White adults –Literature review of quality of care in diverse groups

31 31 IPC-II Conceptual Framework
I. COMMUNICATION: General clarity; Elicitation/responsiveness; Explanations of processes, condition, self-care, meds; Empowerment
II. DECISION MAKING: Responsive to patient preferences; Consider ability to comply
III. INTERPERSONAL STYLE: Respectfulness; Courteousness; Perceived discrimination; Emotional support; Cultural sensitivity

32 32 IPC-II Conceptual Framework (cont) IV. OFFICE STAFF Respectfulness Discrimination V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS MD’s and office staff’s sensitivity to language

33 33 Psychometric Equivalence Conceptual Psychometric Adequacy in 1 Group Equivalence Across Groups Concept equivalent across groups Psychometric properties meet minimal standards within one group Psychometric properties invariant (equivalent) across groups Concept meaningful within one group

34 34 Psychometric Equivalence: Two Meanings u Measures have similar measurement properties in all diverse groups of interest in your study –e.g., English and Spanish, African Americans and Caucasians u Measurement properties in your diverse group are similar to original (mainstream) groups on which the measures were developed

35 35 Group Comparisons are the Most Problematic u To compare mean levels of health or its determinants in one study, need “equivalent” concepts and measures u If not: –Potential true differences may be obscured –Observed group differences may be spurious

36 36 Alternative Explanations for Observed Group Mean Differences u Culturally- or group-mediated differences in true score (true differences) -- OR -- u Bias - systematic differences between observed scores not attributable to true scores

37 37 Bias - A Special Concern u Measurement bias may make group comparisons invalid u Bias can be due to group differences in: –the meaning of concepts or items –the extent to which measures represent a concept –cognitive processes of responding –use of response scales –appropriateness of data collection methods

38 38 Example of Effect of Biased Items u 5 CES-D items administered to Black and White men –1 item subject to differential item functioning (bias) u 5-item scale including item suggested that Black men had more somatic symptoms than White men (p <.01) u 4-item scale excluding biased item showed no differences S Gregorich, Med Care, 2006;44:S78-S94.

39 39 Bias or “Systematic Difference”? u Bias refers to “deviation from true score” u Cannot speak of a measure being “biased” in one group compared to another w/o knowing true score u Preferred term: differential “item” functioning (DIF) –Item (or measure) that has a different meaning in one group than another

40 40 Item Equivalence u Differential Item Functioning (DIF) –Items are non-equivalent if they are differentially related to the underlying trait u Meaning of response categories is similar across groups u Distance between response categories is similar across groups

41 41 Methods for Identifying Differential Item Functioning (DIF) u Item Response Theory (IRT) u Examines each item in relation to underlying latent trait u Tests if responses to one item predict the underlying latent “score” similarly in two groups –if not, items have “differential item functioning”
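The slide names IRT as the method. As a simpler illustration of the same idea (do two groups with the same overall level respond to an item differently?), the sketch below swaps in the Mantel-Haenszel procedure, which is not IRT. It assumes a yes/no item, a "ref" and a "focal" group, and respondents matched on a stratum such as total scale score; all names and the record layout are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for endorsing an item, reference vs focal group.

    records: iterable of (group, stratum, endorsed) where group is "ref"
    or "focal", stratum is a matching level (e.g., total score band), and
    endorsed is True/False. A value near 1 suggests no uniform DIF.
    """
    # Per stratum: [ref yes, ref no, focal yes, focal no]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        if group not in ("ref", "focal"):
            raise ValueError("group must be 'ref' or 'focal'")
        idx = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[stratum][idx] += 1

    num = 0.0
    den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # ref endorses, focal does not
        den += b * c / n  # ref does not, focal endorses
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den
```

An odds ratio far from 1 within matched strata flags the item for closer review; it does not by itself say which group's responses are closer to the true score.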

42 42 Equivalence of Response Choices: Spanish and English Self-rated Health u Excellent u Very good u Good u Fair u Poor u Excelente u Muy buena u Buena u Regular u Mala “Regular” in Spanish may be closer to “good” in English, thus is not comparable to the meaning of “fair”

43 43 Spanish and English Self-rated Health Responses u Excellent u Very good u Good u Fair u Poor u Excelente u Muy buena u Buena u Regular (Pasable?) u Mala Another choice, “pasable,” may be closer in meaning to “fair”

44 44 u Osmond study?

45 45 Equivalence of Reliability?? No! u Difficult to compare reliability because it depends on the distribution of the construct in a sample –Thus lower reliability in one group may simply reflect poorer variability u More important is the adequacy of the reliability in both groups –Reliability meets minimal criteria within each group
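The dependence of reliability on variability follows from the classical test theory definition, reliability = true-score variance / (true-score variance + error variance). A small sketch with made-up variances (assumptions for illustration only) shows that the same measurement error yields lower reliability in a group whose true scores vary less.

```python
def reliability(true_var, error_var):
    """Classical test theory reliability: true variance / observed variance."""
    if true_var < 0 or error_var < 0 or true_var + error_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return true_var / (true_var + error_var)


# Hypothetical groups with identical error variance but different spread
heterogeneous_group = reliability(true_var=4.0, error_var=1.0)
homogeneous_group = reliability(true_var=1.0, error_var=1.0)
```

Here the measure is equally precise in both groups, yet the coefficient is lower in the homogeneous group, which is why the slide asks only that reliability be adequate within each group.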

46 46 Example: Argentinean Spanish SF-36 Reliability (N=2,638)
SF-36 scale                     Coefficient alpha
Physical functioning            .85
Role limitations - physical     .84
Bodily pain                     .80
General health perceptions      .69
Vitality                        .82
Social functioning              .76
Role limitations - emotional    .75
Mental health                   .84
F Augustovski et al, J Clin Epid, 2008, in press

47 47 Equivalence of Criterion Validity u Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g. –a measure predicts utilization in both groups –a cutpoint on a screening measure has the same specificity and sensitivity in identifying a condition in both groups
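The cutpoint check can be made concrete by computing sensitivity and specificity separately in each group against the same criterion. This is a minimal sketch; it assumes that scores at or above the cutpoint count as screening positive and that a criterion diagnosis is available for every respondent. The function names are illustrative.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Return (sensitivity, specificity) for one group.

    scores: screening scores; has_condition: matching True/False criterion
    values. Assumes score >= cutpoint means "screen positive".
    """
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        positive = score >= cutpoint
        if condition and positive:
            tp += 1
        elif condition:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


def compare_groups(groups, cutpoint):
    """groups: {name: (scores, has_condition)} -> {name: (sens, spec)}."""
    return {name: sensitivity_specificity(s, c, cutpoint)
            for name, (s, c) in groups.items()}
```

If the same cutpoint gives clearly different sensitivity or specificity across groups, the screening measure is not equivalent in criterion validity at that cutpoint.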

48 48 Equivalence of Construct Validity u Are hypothesized patterns of associations confirmed in both groups? –Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version u Primarily tested through subjectively examining pattern of correlations –Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)

49 49 u Example of FACT-G construct validity same

50 50 Equivalence of Construct Validity of Argentinean Spanish SF-36 u Compared Spanish SF-36 construct validity test results to U.S. English SF-36 results u Tested several previously tested hypotheses (which were confirmed): –PCS decreases with age and # of diseases –Relationship of PCS and MCS with utilization –Known groups validity (scores lower for those with various diseases)

51 51 Equivalence of Factor Structure u Factor structure is similar in new group to structure in original groups in which measure was tested –measurement model is the same across groups u Methods –Specify the number of factors you are looking for –Determine if the hypothesized model fits the data

52 52 Factor Structure of CES-D in Older Chinese Immigrants

53 53 Confirmatory Factor Analysis (CFA) u Can specify a hypothesized structure a priori u Can test mean and covariance structures –to estimate bias

54 54 Example: Argentinean Spanish

55 55 Equivalence of Factor Structure: Testing Psychometric Invariance u Psychometric invariance (equivalence) u Important properties of theoretically-based factor structure (measurement model) do not vary across groups (are invariant) –measurement model is the same across groups u Empirical comparison across groups –Not simply by examination

56 56 Criteria for Psychometric Invariance Across all groups – a sequential process: u Same number of factors or dimensions u Same items on same factors u Same factor loadings u No bias on any item across groups u Same residuals on items u No item or scale bias AND same residuals
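Because the criteria are sequential, each level is only evaluated once the previous one holds. The sketch below encodes that ordering under a simplifying assumption: the pass/fail result for each level has already been obtained elsewhere (for example, from multiple-group confirmatory factor analysis), and the levels are treated as a single chain. The level names follow the technical terms used later in the deck; the function itself is hypothetical.

```python
# Ordered from weakest to strongest criterion
LEVELS = ["dimensional", "configural", "metric", "scalar", "residual"]


def invariance_achieved(results):
    """Return the list of levels achieved, stopping at the first failure.

    results: {level name: True/False} from prior model tests.
    "strict" is appended only when scalar and residual invariance both
    hold, along with every earlier level.
    """
    achieved = []
    for level in LEVELS:
        if not results.get(level, False):
            break
        achieved.append(level)
    if achieved == LEVELS:
        achieved.append("strict")
    return achieved
```

For example, a scale that passes the first three tests but fails the scalar test supports comparing associations across groups but not comparing group means.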

57 57 How Evidence for These Properties is Obtained u Subjectively –visually compare factor pattern matrixes across “group-specific” exploratory factor analysis solutions u Empirically –confirmatory factor analysis of data that includes multiple groups

58 58 Great Article on Psychometric Invariance Gregorich, S.E. Do self-report instruments allow meaningful comparisons across population groups? Testing measurement invariance using the confirmatory factor analysis framework. Med Care, 2006;44 (11, supplement 3):S78-S94.

59 59 Criteria for Evaluating Invariance Across Groups: Technical Terms
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Scalar or Strong Factorial Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances are unbiased
Strict Factorial Invariance: Both scalar and residual criteria are met

60 60 Dimensional Invariance of CES-D: Two Examples u Definition: same number of factors observed in all groups u Original 4 CES-D factors –Somatic symptoms –Depressive affect –Interpersonal behavior –Positive affect LS Radloff, The CES-D scale: A self-report depression scale for research in the general population, Applied Psychol Measurement, 1977;1:385-401.

61 61 Examples: Studies in Which Dimensional Invariance Criterion Failed u Hispanic EPESE (n=2,536) and a study of older Mexican Americans (n=330) –2 factors in both studies –Depression (somatic symptoms, depressive affect, and interpersonal behavior) –Well-being u American Indian adolescents (n=179) –3 factors –Depressed affect –Somatic symptoms and reduced activity –Positive affect TQ Miller et al., J Gerontol: Soc Sci 1997;52B:S259 SM Manson et al., Psychol Assessment 1990;2:231-237

62 62 Configural Invariance u Assumes: dimensional invariance is found (same number of factors) u Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups u CES-D example –4 factors found in Anglos, Blacks, and Chicanos –Same items loaded on each factor in all groups RE Roberts et al., Psychiatry Research, 1980;2:125-134

63 63 Configural Invariance of CES-D (Roberts)
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

64 64 Metric Invariance or Factor Pattern Invariance u Assumes: dimensional and configural invariance are found u Definition: Item loadings are the same across groups –i.e., the correlation of each item with its factor is the same in all groups

65 65 Example from Interpersonal Processes of Care Measurement Studies u Out of 91 items - 29 items achieved metric invariance (Spanish-speaking Latinos, English speaking Latinos, African Americans, Whites) –Similar factor structure across all 4 groups –Same items loaded on each factor in all 4 groups –Same item loadings in all 4 groups

66 66 Metric Invariance of IPC Across 4 Groups for 29 Items
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

67 67 “Metric Invariant” Scales (29 items) I. COMMUNICATION Hurried communication Elicited concerns, responded Explained results, medications II. DECISION MAKING Patient-centered decision-making III. INTERPERSONAL STYLE Compassionate, respectful Discriminated Disrespectful office staff

68 68 Strong Factorial Invariance or Scalar Invariance u Assumes: dimensional, configural, and metric invariance are found u Definition: Observed scores are unbiased, i.e., means can be compared across groups u Requires test of equivalence of mean scores across groups using confirmatory factor analysis

69 69 Obtained Partial Scalar Invariance Across 4 Groups for 18 Items
Dimensional Invariance: Same number of factors
Configural Invariance: Same items load on same factors
Metric or Factor Pattern Invariance: Items have same loadings on same factors
Strong Factorial or Scalar Invariance: Observed scores are unbiased
Residual Invariance: Observed item and factor variances can be compared across groups
Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

70 70 “Scalar Invariant” (Unbiased) Scales (18 items) I. COMMUNICATION Hurried communication – lack of clarity Elicited concerns, responded Explained results, medications – explained results II. DECISION MAKING Patient-centered decision-making – decided together III. INTERPERSONAL STYLE Compassionate, respectful–(subset) compassionate, respectful Discriminated – discriminated due to race/ethnicity Disrespectful office staff

71 71 CAHPS ® Equivalence Studies u Consumer Assessment of Healthcare Providers and Systems (CAHPS ® ) is a collection of quality of care surveys –Inpatient, outpatient, healthcare systems, special populations u Outstanding methodology – good models of studies of equivalence

72 72 Equivalence of Spanish and English Hospital Quality of Care Survey (H-CAHPS ® ) u 7 subscales: nurse communication, MD communication, communication about meds, nursing services, discharge information, pain control, and physical environment u Report on translation and adaptation, pretesting, item-scale correlations, factor structure, internal consistency reliability, and construct validity MP Hurtado et al. Health Serv Res, 2005;40-6, Part II:2140-2161

73 73 Overview of Class 7 u Background: culture-specific versus generic measures u Conceptual and psychometric adequacy and equivalence u Importance of mixed methods in developing measures for research in diverse populations

74 74 Integrating Qualitative and Quantitative Methods in Assessing Cultural Equivalence u Optimal approach - use qualitative and quantitative methods in tandem to address issues of cultural equivalence

75 75 Distinction Between International and U.S. Studies u International studies do this routinely –Translate existing measure for a new country and language –Assume non-equivalence to begin with (different nations, languages) u During translation, items added/modified to improve conceptual equivalence –Product is “adapted” and translated instrument

76 76 Typical U.S. Approach in Studies of English Speaking Diverse Groups u Select existing well-tested measures (developed in mainstream) and assume they will work (universality) u Assumes perspectives of diverse groups are similar to mainstream –“Cultural hegemony” (Guyatt) –“Middle-class ethnocentrism” (Rogler)

77 77 Mixed Methods: Developing IPC Measure of Cultural Sensitivity u Initial concept and items – from qualitative work –Psychometric analyses - measure did not meet minimal criteria u Second version of concept and items – from new qualitative work, results of first study –Psychometric analyses - measure did not meet minimal criteria u Analyzed focus group data in more depth –found that cultural sensitivity is multidimensional u Now conducting new study testing a multidimensional survey of cultural sensitivity Class 1 slides provide details

78 78 What to do if Measures Are Not Equivalent in a Specific Study Comparing Groups u Need guidelines for how to handle data when substantial non-comparability is found in a study –Drop bad or “biased” items from scores »Compare results with and without biased items –Analyze study by stratifying diverse groups u The current challenge for measurement in minority health studies
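Comparing results with and without suspect items can be done as a simple sensitivity analysis. The sketch below assumes each respondent's scale score is the mean of the retained items and that the index of the suspect item is already known; the data layout and function names are illustrative assumptions.

```python
def scale_scores(rows, drop=()):
    """Mean-of-items score per respondent, optionally dropping item indexes."""
    keep = [i for i in range(len(rows[0])) if i not in drop]
    if not keep:
        raise ValueError("cannot drop every item")
    return [sum(r[i] for i in keep) / len(keep) for r in rows]


def mean_difference(group_a, group_b, drop=()):
    """Difference in mean scale score (group A minus group B)."""
    a = scale_scores(group_a, drop)
    b = scale_scores(group_b, drop)
    return sum(a) / len(a) - sum(b) / len(b)


def with_and_without(group_a, group_b, suspect_items):
    """Return (difference with all items, difference without suspect items)."""
    return (mean_difference(group_a, group_b),
            mean_difference(group_a, group_b, drop=suspect_items))
```

If a group difference appears only when the suspect item is included, as in the CES-D example earlier in the deck, the difference may reflect the item rather than the underlying concept.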

79 79 Example: 20-item Spanish CES-D in Older Latinos u 2 items had low item-scale correlations, high rates of missing data in two studies –I felt hopeful about the future –I felt I was just as good as other people
Item-scale correlations    Study 1        Study 2
20-item version            -.20 to .73    .05 to .78
18-item version            .45 to .76     .33 to .79
V Gonzalez et al, Arthritis and Rheumatism, 1996;38:1429-1446.
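Item-scale correlations like those reported above can be computed by correlating each item with the sum of the remaining items. Whether the cited study corrected for overlap in this way is not stated on the slide, so the correction is an assumption here, as are the function names and the complete-data layout.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def item_scale_correlations(rows):
    """Correlation of each item with the sum of all other items.

    rows: list of respondents, each a list of item scores (no missing data).
    """
    k = len(rows[0])
    correlations = []
    for i in range(k):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]  # scale score excluding item i
        correlations.append(pearson(item, rest))
    return correlations
```

Items with low or negative values in one group, such as the two positively worded items named above, are candidates for removal or rewording in that group.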

80 80 Example: Instruments Can be Modified u GHAA Consumer Satisfaction Survey u Adapted to be appropriate for African American patients –Focus groups conducted to obtain perspectives of African Americans –New domains added (e.g., discrimination/ stereotyping) –New items added to existing domains –New scales met multi-trait scaling criteria M Fongwa et al. Ethnicity and Disease, 2006:16;948-955.

81 81 Resource: Reviews of Measures for Diverse Populations u Multicultural measurement in older populations, JH Skinner et al (eds), Springer Publishing Co: NY, 2002 –ALSO published as: Measurement in older ethnically diverse populations, J Mental Health Aging, Vol 7, Spring 2001 Reviews measures that have been used cross-culturally in: acculturation, socio-economic status, social supports, cognition, health and functional capacity, depression, health locus of control, health-related quality of life, and religiosity

82 82 Special Journal Issue u Measurement in a multi-ethnic society –Med Care, Vol 44, November 2006 –Qualitative and quantitative methods in addressing measurement in diverse populations

83 83 Conclusions u Measurement in health disparities and minority health research is a relatively new field - few guidelines u Encourage first steps - test and report adequacy and equivalence u As evidence grows, concepts and measures that work better across diverse groups will be identified

84 84 Health Disparities Research Methods (Epi 222): Spring u Course Director: Eliseo Pérez-Stable, MD u Thursday 2:45-4:15 –China Basin u Last year’s syllabus: http://www.epibiostat.ucsf.edu/courses/schedule/diverse_pops.html

85 85 Epi 222 Provides Overview Of…. u Methodological considerations in research in ethnically diverse populations u Meaning of race, ethnicity, social class and culture u Qualitative methods in developing and pre-testing instruments u Methods for adapting measures and research methods for use with diverse ethnic groups u Model of community-based participatory research

86 86 Homework for Next Week u Finish matrix: complete rows 31-38 –Translations, equivalence across diverse groups, acceptability for your population, modifications possible

87 87 Next Week (Class 8) u Pretesting measures and creating a questionnaire

