Published by Roland Stokes. Modified over 9 years ago.
1
1 Measurement Issues in Health Disparities Research
Anita L. Stewart, Ph.D.
University of California, San Francisco
Clinical Research with Diverse Communities, EPI 222, Spring
April 17, 2008
2
2 Background
• U.S. population becoming more diverse
• More minority groups are being included in research due to:
  – NIH mandate
  – Recent health disparities initiatives
3
3 Types of Diverse Groups
• Health disparities research focuses on differences in health between the following groups:
  – Minority vs. non-minority
  – Low income vs. others
  – Low education vs. others
  – Limited English skills vs. others
  – ... and others
4
4 Health Disparities Research
• Describe health disparities
  – Health differences across various diverse groups
• Identify mechanisms by which health disparities occur
  – Individual level
  – Environmental level
• Intervene to reduce health disparities
5
5 Health Care Disparities
• Differential access to and quality of health care are well known
  – Thus, health care disparities become a plausible mechanism for health disparities
• Understanding determinants of health care disparities is also of interest
6
6 Types of Self-Report Measures Needed
• Measures of health, and of various mechanisms for disparities
  – Class 4 will present numerous mechanisms
• Examples from this class: sense of control, self-efficacy for managing disease, health-related quality of life for various health conditions
7
7 Measurement Implications of Research in Diverse Groups
• Most self-reported measures were developed and tested in mainstream, well-educated groups
  – Subgroup analysis of measures has been rare
• Thus, little information is available on appropriateness, reliability, validity, and responsiveness in minority and other diverse groups
8
8 The Measurement Goal: Identify Measures That Can Be Used…
• To compare diverse groups
• To study mechanisms within any particular “diverse” group
9
9 Group Comparisons are the Most Problematic
• Disparities research involves comparing mean levels of health or its determinants
• Requires “equivalent” concepts and measures; without them:
  – Potential true differences may be obscured
  – Observed group differences may be inaccurate
10
10 Alternative Explanations for Observed Group Differences
• Observed group mean differences in a measure can be due to
  – culturally- or group-mediated differences in true score (true differences)
  -- OR --
  – bias: systematic differences between group observed scores not attributable to true scores
11
11 Bias - A Special Concern
• Measurement bias in any one group may make group comparisons invalid
• Bias can be due to group differences in:
  – the meaning of concepts or items
  – the extent to which a measure represents a concept
  – cognitive processes of responding
  – use of response scales
  – appropriateness of data collection methods
12
12 Example of Effect of Biased Items
• 5 CES-D items administered to black and white men
  – 1 item subject to differential item functioning (bias)
• The 5-item scale, including the biased item, suggested that black men had higher levels of somatic symptoms than white men (p < .01)
• The 4-item scale, excluding the biased item, showed no differences between black and white men
Gregorich S, Med Care, 2006;44:S78-S94.
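The mechanism in this example can be illustrated with a minimal Python sketch. The response data below are hypothetical, made up for illustration only, and are not the CES-D data from Gregorich (2006): the two groups answer four items identically, and one group is shifted upward by one point on a fifth item.

```python
from statistics import mean

# Hypothetical responses (rows = respondents, columns = 5 items scored 0-3).
# Items 0-3 are identical across groups; item 4 is shifted by +1 in group_b,
# standing in for an item that functions differently in that group.
group_a = [[1, 0, 2, 1, 1], [0, 1, 1, 0, 0], [2, 2, 1, 2, 1], [1, 1, 0, 1, 0]]
group_b = [[1, 0, 2, 1, 2], [0, 1, 1, 0, 1], [2, 2, 1, 2, 2], [1, 1, 0, 1, 1]]

def mean_scale_score(rows, items):
    """Mean of the summed score over the chosen item indices."""
    return mean(sum(row[i] for i in items) for row in rows)

# Group difference with and without the shifted item.
diff_5_items = mean_scale_score(group_b, range(5)) - mean_scale_score(group_a, range(5))
diff_4_items = mean_scale_score(group_b, range(4)) - mean_scale_score(group_a, range(4))

print(diff_5_items)  # difference driven entirely by the shifted item
print(diff_4_items)  # no difference once that item is excluded
```

In this toy setup the whole 5-item group difference comes from the single shifted item, which is the pattern the slide describes.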
13
13 Bias or “Systematic Difference”?
• Bias refers to “deviation from true score”
• Cannot speak of a measure being “biased” in one group compared to another without knowing the true score
• Preferred term: differential “item” functioning
  – An item (or measure) that has a different meaning in one group than in another
14
14 Typical Sequence of Developing New Self-Report Measures
Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
16
16 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups
17
17 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups; ... to reflect these perspectives
18
18 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups; ... to reflect these perspectives; ... in all diverse groups
20
20 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups; ... to reflect these perspectives; ... in all diverse groups; Measurement studies across groups
21
21 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups; ... to reflect these perspectives; ... in all diverse groups; If results are non-equivalent
22
22 Measurement Adequacy vs. Measurement Equivalence
• Making group comparisons requires conceptual and psychometric adequacy and equivalence
• Adequacy - within a “diverse” group
  – concepts are appropriate
  – psychometric properties meet minimal criteria
• Equivalence - between “diverse” groups
  – conceptual and psychometric properties are comparable
23
23 Why Not Use Culture-Specific Measures?
• Measurement goal: identify measures that can be used across all groups, yet maintain sensitivity to diversity and have minimal bias
• Most health disparities studies require comparing mean scores across diverse groups
  – need comparable measures
24
24 Conceptual and Psychometric Adequacy and Equivalence
                Adequacy in 1 Group | Equivalence Across Groups
Conceptual:     Concept meaningful within one group | Concept equivalent across groups
Psychometric:   Psychometric properties meet minimal standards within one group | Psychometric properties invariant (equivalent) across groups
25
25 Left Side of Matrix: Issues in a Single Group (adequacy/equivalence matrix from slide 24, repeated)
26
26 Right Side of Matrix: Issues in More Than One Group (adequacy/equivalence matrix from slide 24, repeated)
27
27 Conceptual Adequacy in One Group (adequacy/equivalence matrix from slide 24, repeated)
28
28 Conceptual Adequacy in One Group
• Is the concept relevant, meaningful, and acceptable to a “diverse” group?
• Traditional research
  – Conceptual adequacy = simply defining a concept
  – Mainstream population “assumed”
• Minority and health disparities research
  – Mainstream concepts may be inadequate
  – Concept should correspond to how a particular group thinks about it
29
29 Qualitative Approaches to Explore Conceptual Adequacy in Diverse Groups
• Literature reviews
  – ethnographic and anthropological
• In-depth interviews and focus groups
  – discuss concepts, obtain their views
• Expert consultation from diverse groups
  – review concept definitions
  – rate relevance of items
30
30 Example of Inadequate Concept
• Patient satisfaction is typically conceptualized in mainstream populations in terms of, e.g.,
  – access, technical care, communication, continuity, interpersonal style
• In minority and low income groups, additional relevant domains include, e.g.,
  – discrimination by health professionals
  – sensitivity to language barriers
31
31 Method for Examining Conceptual Relevance
• Compiled set of 33 HRQL items spanning many concepts
• Assessed relevance to older African Americans
• After answering each question, asked “how relevant is this question to the way you think about your health?”
  – Response scale: 0-10 scale with endpoints labeled
  – Labels: 0=not at all relevant, 10=extremely relevant
Cunningham WE et al., Qual Life Res, 1999;8:749-768.
32
32 Results: Conceptual Relevance
• Most relevant items:
  – Spirituality (3 items)
  – Weight-related health (2 items)
  – Hopefulness (1 item)
• Spirituality items
  – importance of spirituality to well-being, level of spirituality, whether being sick affected spirituality
33
33 Results: Conceptual Relevance
• Least relevant items:
  – Physical functioning
  – Role limitations due to emotional problems
• All standard MOS measures ranked in the lower 2/3, including all SF-12 items
34
34 Conceptual Relevance of Spanish FACT-G
• Bilingual/bicultural expert panel reviewed all 28 items for relevance
  – One item had low cultural relevance to quality of life
  – One concept was missing: spirituality
• Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
  – Sample item “I worry about dying”
Cella D et al., Med Care, 1998;36:1407
35
35 Psychometric Adequacy in One Group (adequacy/equivalence matrix from slide 24, repeated)
36
36 Psychometric Adequacy in any Group
• Minimal standards:
  – Sufficient variability
  – Minimal missing data
  – Adequate reliability/reproducibility
  – Evidence of construct validity
  – Evidence of responsiveness to change
• Basic classical test theory approach
37
37 Evidence of Psychometric Inadequacy of Measure in Various Diverse Groups
• SF-36 social functioning scale: internal consistency reliability < .70 in three different samples:
  – Chinese language, adults aged 55-96 years
  – Japanese language, Japanese elders
  – English, Pima Indians
Stewart AL & Nápoles-Springer A, Med Care, 2000;38(9 Suppl):II-102
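Internal consistency reliability is commonly summarized with Cronbach's alpha, so a within-group check against the .70 criterion might look like the sketch below. The slides do not show a computation; the function name and the response data are hypothetical, and the standard alpha formula with sample variances is assumed.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item responses (one row per respondent).

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
    using sample (n-1) variances throughout.
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses from one group: 5 respondents x 4 items.
responses = [
    [4, 3, 4, 3],
    [2, 2, 1, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2), "meets .70 criterion:", alpha >= 0.70)
```

Running the same function separately on each group's responses gives the per-group reliabilities that this slide compares against the .70 threshold.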
38
38 Conceptual Equivalence Across Groups (adequacy/equivalence matrix from slide 24, repeated)
39
39 Conceptual Equivalence
• Is the concept relevant, familiar, and acceptable to all diverse groups being studied?
• Is the concept defined the same way in all groups?
  – all relevant “domains” included (none missing)
  – interpreted similarly
• Is the concept appropriate for all diverse groups?
40
40 Generic/Universal vs Group-Specific (Etic versus Emic)
• Concepts are unlikely to be defined exactly the same way across diverse ethnic groups
• Generic/universal (etic)
  – features of a concept that are appropriate across groups
• Group-specific (emic)
  – idiosyncratic or culture-specific portions of a concept
41
41 Etic versus Emic (cont.)
• Goal in health disparities research on more than one group:
  – identify the generic/universal portion of a concept (could be the entire concept) that can be applied across all groups
• For within-group analyses or studies
  – the culture-specific portion is also relevant
  – Same as examining “conceptual adequacy” within one group
42
42 Approaches Similar to Those for Conceptual Adequacy
• Main difference: need to ensure the concept is equivalent across groups
  – Additional criterion
• What do we mean by equivalent conceptually?
• Methods are poorly developed
43
43 Obtain Perspective of All Diverse Groups on Concept
Sequence: Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps: Obtain perspectives of diverse groups
44
44 Example: Develop Concept of Interpersonal Processes of Care
• Began with conceptual framework from literature and psychometric studies of preliminary survey
• IPC Version I Conceptual Framework
• Three major multi-dimensional categories:
  – Communication
  – Decision-making
  – Interpersonal Style
45
45 IPC Version I: Subdomains
• Communication: elicitation of concerns, explanations, general clarity
• Decision-making: involving patients in decisions
• Interpersonal Style: respectfulness, emotional support, non-discrimination, cultural sensitivity
Stewart et al., Milbank Quarterly, 1999;77:305
46
46 Limitations of First IPC Framework
• Tested on small sample of 600 patients from San Francisco General Hospital
• Several hypothesized concepts were not confirmed
  – e.g., cultural sensitivity
• Needed further development and validation on a larger sample
47
47 Developed Revised IPC Concept
Sources:
  – IPC Version I framework in Milbank Quarterly
  – 19 focus groups: African American, Latino, and White adults
  – Literature review of quality of care in diverse groups
Result: Draft IPC II conceptual framework
48
48 IPC-II Conceptual Framework
I. COMMUNICATION
  General clarity
  Elicitation/responsiveness
  Explanations of processes, condition, self-care, meds
  Empowerment
II. DECISION MAKING
  Responsive to patient preferences
  Consider ability to comply
III. INTERPERSONAL STYLE
  Respectfulness
  Courteousness
  Perceived discrimination
  Emotional support
  Cultural sensitivity
49
49 IPC-II Conceptual Framework (cont.)
IV. OFFICE STAFF
  Respectfulness
  Discrimination
V. FOR LIMITED ENGLISH PROFICIENCY PATIENTS
  MD’s and office staff’s sensitivity to language
50
50 Psychometric Equivalence (adequacy/equivalence matrix from slide 24, repeated)
51
51 Psychometric Equivalence
• Measures have similar measurement properties in all diverse groups of interest in your study
  – e.g., English and Spanish language, African Americans and Caucasians
• Measures have similar measurement properties in one diverse group as in the original (mainstream) groups on which the measures were developed
52
52 Equivalence of Reliability?? No!
• Difficult to compare reliability because it depends on the distribution of the construct in a sample
  – Thus lower reliability in one group may simply reflect lower variability in that group
• More important is the adequacy of the reliability in both groups
  – Reliability meets minimal criteria within each group
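The dependence of reliability on sample variability follows from the classical test theory definition of reliability as the share of observed-score variance due to true scores. The notation and the numbers below are illustrative assumptions, not taken from the slides.

```latex
\[
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]
% Same error variance, different true-score spread (illustrative numbers):
% \sigma_E^2 = 1,\ \sigma_T^2 = 4 \;\Rightarrow\; \rho_{XX'} = 4/5 = 0.80
% \sigma_E^2 = 1,\ \sigma_T^2 = 1 \;\Rightarrow\; \rho_{XX'} = 1/2 = 0.50
```

With identical measurement error, the group with the narrower spread of true scores shows the lower reliability coefficient, even though the measure is no less precise in that group.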
53
53 Equivalence of Criterion Validity
• Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.
  – a measure predicts utilization in both groups
  – a cutpoint on a screening measure has the same specificity and sensitivity in both groups
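The cutpoint comparison could be checked with a sketch like the following, which computes sensitivity and specificity separately in each group. The scores, the criterion diagnoses, and the cutpoint of 16 are all hypothetical values chosen for illustration.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Return (sensitivity, specificity) when a score >= cutpoint screens positive."""
    pairs = list(zip(scores, has_condition))
    true_pos = sum(1 for s, c in pairs if c and s >= cutpoint)
    false_neg = sum(1 for s, c in pairs if c and s < cutpoint)
    true_neg = sum(1 for s, c in pairs if not c and s < cutpoint)
    false_pos = sum(1 for s, c in pairs if not c and s >= cutpoint)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)

# Hypothetical screening scores and criterion status for two groups.
condition = [True, True, False, False, True, False]
group_1_scores = [18, 22, 9, 7, 16, 5]
group_2_scores = [18, 14, 9, 17, 12, 5]

CUTPOINT = 16  # illustrative cutpoint, applied identically in both groups
sens_1, spec_1 = sensitivity_specificity(group_1_scores, condition, CUTPOINT)
sens_2, spec_2 = sensitivity_specificity(group_2_scores, condition, CUTPOINT)

print("group 1:", sens_1, spec_1)
print("group 2:", sens_2, spec_2)
```

In this made-up data the same cutpoint classifies group 1 perfectly but misses most cases in group 2, which is the kind of non-equivalence the slide asks analysts to look for.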
54
54 Equivalence of Construct Validity
• Are hypothesized patterns of associations confirmed in both groups?
  – Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version
• Primarily tested through subjectively examining the pattern of correlations
  – Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)
55
55 Item Equivalence
• Differential Item Functioning (DIF)
  – Items are non-equivalent if they are differentially related to the underlying trait
  – Equivalence indicated by no DIF
• Meaning of response categories is similar across groups
• Distance between response categories is similar across groups
56
56 Equivalence of Response Choices: Spanish and English Self-rated Health
English | Spanish
Excellent | Excelente
Very good | Muy buena
Good | Buena
Fair | Regular
Poor | Mala
“Regular” in Spanish may be closer to “good” in English, and thus is not comparable in meaning to “fair”
57
57 Spanish and English Self-rated Health Responses
English | Spanish
Excellent | Excelente
Very good | Muy buena
Good | Buena
Fair | Regular (Pasable?)
Poor | Mala
Another choice, “pasable,” may be closer in meaning to “fair”
58
58 Methods for Identifying Differential Item Functioning (DIF)
• Item Response Theory (IRT)
• Examines each item in relation to the underlying latent trait
• Tests if responses to one item predict the underlying latent “score” similarly in two groups
  – if not, items have “differential item functioning”
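Fitting an IRT model is beyond a short example, so the sketch below swaps in the Mantel-Haenszel procedure, a simpler non-IRT screening method that is not the method named on this slide. It matches respondents on a total or rest score and asks whether, at the same matched score, the two groups endorse a yes/no item at the same rate. The function names and the counts are hypothetical.

```python
from collections import defaultdict

def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across matching-score strata for one dichotomous item.

    records: iterable of (group, matching_score, response) with group in
    {"ref", "focal"} and response 1 (endorsed) or 0 (not endorsed).
    A value near 1 suggests no DIF. Raises ZeroDivisionError if the
    denominator sum is zero (e.g. nobody in the focal group endorsed the item).
    """
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, response in records:
        tables[score][group][0 if response == 1 else 1] += 1
    numerator = denominator = 0.0
    for table in tables.values():
        a, b = table["ref"]    # reference group: endorsed, not endorsed
        c, d = table["focal"]  # focal group: endorsed, not endorsed
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

def make_records(counts):
    """Expand {(group, score): (n_endorsed, n_not_endorsed)} into records."""
    records = []
    for (group, score), (yes, no) in counts.items():
        records += [(group, score, 1)] * yes + [(group, score, 0)] * no
    return records

# Hypothetical counts: same endorsement rates at each matched score (no DIF).
no_dif = make_records({("ref", 0): (2, 2), ("focal", 0): (2, 2),
                       ("ref", 1): (3, 1), ("focal", 1): (3, 1)})
# Hypothetical counts: focal group endorses less often at the same matched score.
with_dif = make_records({("ref", 0): (2, 2), ("focal", 0): (1, 3),
                         ("ref", 1): (3, 1), ("focal", 1): (2, 2)})

or_no_dif = mantel_haenszel_odds_ratio(no_dif)
or_with_dif = mantel_haenszel_odds_ratio(with_dif)
print(or_no_dif, or_with_dif)
```

An odds ratio well away from 1 flags the item for closer review; an IRT-based analysis would instead compare item parameters estimated in each group.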
59
59 Equivalence of Factor Structure
• Factor structure in the new group is similar to the structure in the original groups in which the measure was tested
  – In other words, the measurement model is the same across groups
• Methods
  – Specify the number of factors you are looking for
  – Determine if the hypothesized model fits the data
60
60 Confirmatory Factor Analysis (CFA)
• Can specify a hypothesized structure a priori
• Can test mean and covariance structures
  – to estimate bias
61
61 Equivalence of Factor Structure: Testing Psychometric Invariance
• Psychometric invariance (equivalence)
• Important properties of the theoretically-based factor structure (measurement model) do not vary across groups (are invariant)
  – measurement model is the same across groups
• Empirical comparison across groups
  – Not simply by examination
62
62 Criteria for Psychometric Invariance
Across all groups, a sequential process:
• Same number of factors or dimensions
• Same items on same factors
• Same factor loadings
• No bias on any item across groups
• Same residuals on items
• No item or scale bias AND same residuals
63
63 Criteria for Evaluating Invariance Across Groups: Technical Terms
• Dimensional Invariance: Same number of factors
• Configural Invariance: Same items load on same factors
• Metric or Factor Pattern Invariance: Items have same loadings on same factors
• Scalar or Strong Factorial Invariance: Observed scores are unbiased
• Residual Invariance: Observed item and factor variances are unbiased
• Strict Factorial Invariance: Both scalar and residual criteria are met
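These levels can be written as nested constraints on a multi-group common factor model. The notation below is the conventional CFA notation, assumed here for illustration and not shown on the slides: for respondent i in group g, x is the vector of item responses, τ the item intercepts, Λ the factor loadings, ξ the latent factors, and Θ the residual covariance matrix.

```latex
\[
x_{ig} = \tau_g + \Lambda_g \xi_{ig} + \delta_{ig}, \qquad
\operatorname{Cov}(\delta_{ig}) = \Theta_g
\]
\begin{align*}
\text{Dimensional:} &\quad \Lambda_g \text{ has the same number of columns for all } g \\
\text{Configural:} &\quad \Lambda_g \text{ has the same pattern of zero and nonzero entries for all } g \\
\text{Metric:} &\quad \Lambda_g = \Lambda \\
\text{Scalar (strong):} &\quad \Lambda_g = \Lambda,\ \tau_g = \tau \\
\text{Residual:} &\quad \Lambda_g = \Lambda,\ \Theta_g = \Theta \\
\text{Strict:} &\quad \Lambda_g = \Lambda,\ \tau_g = \tau,\ \Theta_g = \Theta
\end{align*}
```

Under the scalar constraints the expected item scores are τ + Λ E(ξ) in every group, so differences in observed means can only come from differences in factor means, which is the sense in which observed scores are "unbiased".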
64
64 Interpersonal Processes of Care (IPC)
• Social-psychological aspects of the patient-physician interaction
  – communication, respectfulness, patient-centered decision-making, and being sensitive to patients’ needs
• Developed survey of 92 items based on principles outlined above
65
65 Conducted Survey
• From over 16,000 primary care patients, randomly sampled those who:
  – Made at least one visit in prior 12 months
  – Had records indicating they were African American, Latino, or White (Caucasian)
• Sampled within race/ethnic group
66
66 Sample Size (N=1,664)
• 383 Spanish speaking Latino
• 435 African American
• 428 English speaking Latino
• 418 White
67
67 Results
• Of the 92 items, 29 had similar factor structure across all 4 groups
  – achieved “metric invariance”
68
68 Results: Metric Invariance Across 4 Groups for 29 Items
• Dimensional Invariance: Same number of factors
• Configural Invariance: Same items load on same factors
• Metric or Factor Pattern Invariance: Items have same loadings on same factors
• Strong Factorial or Scalar Invariance: Observed scores are unbiased
• Residual Invariance: Observed item and factor variances can be compared across groups
• Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
69
69 Seven “Metric Invariant” Scales (29 items)
I. COMMUNICATION
  Hurried communication
  Elicited concerns, responded
  Explained results, medications
II. DECISION MAKING
  Patient-centered decision-making
III. INTERPERSONAL STYLE
  Compassionate, respectful
  Discriminated
  Disrespectful office staff
70
70 Continued Exploration of Invariance: Item Bias
• Tested invariance of model parameter estimates across groups for “scalar invariance”
  – Bias in items
71
71 Obtained Partial Scalar Invariance Across 4 Groups for 18 Items
• Dimensional Invariance: Same number of factors
• Configural Invariance: Same items load on same factors
• Metric or Factor Pattern Invariance: Items have same loadings on same factors
• Strong Factorial or Scalar Invariance: Observed scores are unbiased
• Residual Invariance: Observed item and factor variances can be compared across groups
• Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met
72
72 Seven “Scalar Invariant” (Unbiased) Scales (18 items)
I. COMMUNICATION
  Hurried communication – lack of clarity
  Elicited concerns, responded
  Explained results, medications – explained results
II. DECISION MAKING
  Patient-centered decision-making – decided together
III. INTERPERSONAL STYLE
  Compassionate, respectful – (subset) compassionate, respectful
  Discriminated – discriminated due to race/ethnicity
  Disrespectful office staff
73
73 What to Do if Measures Are Not Equivalent in a Specific Study Comparing Groups
• Need guidelines for how to handle data when substantial non-comparability is found in a study
  – Drop bad or “biased” items from scores
    » Compare results with and without biased items
  – Analyze study by stratifying diverse groups
• The current challenge for measurement in minority health studies
74
74 Example: 20-item Spanish CES-D in Older Latinos
• 2 items had very low item-scale correlations and high rates of missing data in two studies
  – I felt hopeful about the future
  – I felt I was just as good as other people
                              Study 1       Study 2
• 20-item version
  – Item-scale correlations   -.20 to .73   .05 to .78
  – Cronbach’s alpha
• 18-item version
  – Item-scale correlations   .45 to .76    .33 to .79
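Item-scale correlations of this kind can be screened with a short sketch such as the one below. It computes each item's correlation with the sum of the remaining items (the "corrected" item-rest form); whether the correlations on this slide were corrected for overlap is not stated, so that choice is an assumption, and the response data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def item_rest_correlations(rows):
    """Correlation of each item with the sum of all other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result

# Hypothetical responses: items 0-2 move together, item 3 runs the other way,
# as a badly performing item might.
responses = [
    [3, 3, 2, 0],
    [2, 2, 2, 1],
    [1, 1, 1, 2],
    [0, 1, 0, 3],
    [3, 2, 3, 1],
]

correlations = item_rest_correlations(responses)
for index, r in enumerate(correlations):
    print(f"item {index}: {r:.2f}")
```

Items with very low or negative values are candidates for removal, after which the remaining items can be rescored and the correlations recomputed, as in the 20-item versus 18-item comparison above.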
75
75 Example: Measure Can be Modified
• GHAA Consumer Satisfaction Survey
• Adapted to be appropriate for African American patients
  – Focus groups conducted to obtain perspectives of African Americans
  – New domains added (e.g., discrimination/stereotyping)
  – New items added to existing domains
Fongwa M et al., Ethnicity and Disease, 2006;16:948-955.
76
76 Approaches to Conducting Studies When You Are Not Sure
• Use a combination of “universal” and group-specific items
  – use universal items to compare across groups
  – use specific items (added onto universal items) when conducting analyses within one group
    » To find a variable that correlates with a health measure within one group
77
77 Conclusions
• Measurement in health disparities and minority health research is a relatively new field with few guidelines
• Encourage first steps: test and report adequacy and equivalence
• As evidence grows, concepts and measures that work better across diverse groups will be identified
78
78 Two Special Journal Issues on Measurement in Diverse Populations
• Measurement in older ethnically diverse populations
  – J Mental Health Aging, Vol 7, Spring 2001
• Measurement in a multi-ethnic society
  – Med Care, Vol 44, November 2006
79
79 Homework for Class 3
• Using the same template and measure you reviewed for class 2, complete sections 14-21
  – Use the same file and submit the entire document by email to anita.stewart@ucsf.edu