Published by Amice Barnett. Modified over 9 years ago.

1 Class 8: Measurement Issues in Diverse Populations, Including Health Disparities Research
November 17, 2005
Anita L. Stewart
Institute for Health & Aging, University of California, San Francisco

2 Background
• U.S. population becoming more diverse
• More minority groups are being included in research due to:
  – NIH mandate
  – Recent health disparities initiatives

3 Types of Diverse Groups
• Health disparities research focuses on differences in health between the following groups:
  – Minority vs. non-minority
  – Low income vs. others
  – Low education vs. others
  – Limited English skills vs. others
  – ... and others

4 Health Disparities Research
• Increasing research to:
  – Describe health disparities
    » Differences in health across various diverse groups
  – Identify determinants of health disparities
    » Individual level
    » Environmental level
  – Intervene to reduce health disparities

5 The Measurement Problem
• Measurement goal: identify measures that can be used across all groups, and
  – are sensitive to diversity
  – have minimal bias between groups
• Most self-reported measures were developed and tested in mainstream, well-educated groups
  – little research on measurement characteristics in diverse groups

6 Issues Concerning Group Comparisons
• Observed mean differences in a measure can be due to
  – culturally- or group-mediated differences in true score (true differences)
  -- OR --
  – bias: systematic differences between the groups' observed scores that are not attributable to true scores

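The distinction on this slide can be written as a simple decomposition. The notation is an assumption introduced here for illustration and does not appear on the slide: $X_g$ is the observed score in group $g$, $T_g$ the true score, $b_g$ a constant group-specific bias, and $E_g$ random error with mean zero.

```latex
\begin{align}
X_g &= T_g + b_g + E_g, \qquad \mathbb{E}[E_g] = 0 \\
\mathbb{E}[X_1] - \mathbb{E}[X_2] &= \bigl(\mathbb{E}[T_1] - \mathbb{E}[T_2]\bigr) + (b_1 - b_2)
\end{align}
```

Under this sketch, an observed mean difference equals the true difference only when $b_1 = b_2$. If the true means are equal, any observed difference is entirely differential bias.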
7 Bias - A Special Concern
• Measurement bias in any one group may make group comparisons invalid
• Bias can be due to group differences in:
  – the meaning of concepts or items
  – the extent to which measures represent a concept
  – cognitive processes of responding
  – use of response scales
  – appropriateness of data collection methods

8 Effects of Bias on Depression: Chinese and White Respondents
• In Chinese respondents, 3 sources of bias that lower the observed score:
  – tendency to not express negative feelings
  – exacerbated by face-to-face interview
  – meaning of the word “depression” is more severe than for Whites, so respondents are less likely to endorse it
• Comparing groups, assuming the true level of depression is the same in both groups:
  – Observed scores would be lower in the Chinese group
  – But the lower level is due to these biases

9 Typical Sequence of Developing New Self-Report Measures
Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures

10-15 Extra Steps in Sequence of Developing New Self-Report Measures for Diverse Groups
(Flow diagram built up step by step over slides 10-15.)
Develop concept → Create item pool → Pretest/revise → Field survey → Psychometric analyses → Final measures
Extra steps added along the sequence:
  – Obtain perspectives of diverse groups
  – .. to reflect these perspectives
  – .. in all diverse groups
  – Measurement studies across groups
  – If results are non-equivalent

16 Measurement Adequacy vs. Measurement Equivalence
• Making group comparisons requires conceptual and psychometric adequacy and equivalence
• Adequacy - within a group
  – concepts are appropriate
  – psychometric properties meet minimal criteria
• Equivalence - between groups
  – conceptual and psychometric properties are comparable

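17 Conceptual and Psychometric Adequacy and Equivalence
Matrix (rows: Conceptual, Psychometric; columns: Adequacy in 1 Group, Equivalence Across Groups):
  – Conceptual, Adequacy in 1 Group: Concept meaningful within one group
  – Conceptual, Equivalence Across Groups: Concept equivalent across groups
  – Psychometric, Adequacy in 1 Group: Psychometric properties meet minimal standards within one group
  – Psychometric, Equivalence Across Groups: Psychometric properties invariant (equivalent) across groups
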
18 Left Side of Matrix: Issues in a Single Group (matrix from slide 17 repeated)
19 Right Side of Matrix: Issues in More Than One Group (matrix from slide 17 repeated)
20 Conceptual Adequacy in One Group (matrix from slide 17 repeated)

21 Conceptual Adequacy in One Group
• Is concept relevant, meaningful, and acceptable in that group?
• Traditional research
  – Conceptual adequacy = simply defining a concept
  – Mainstream population “assumed”
• Minority and cross-cultural research
  – Mainstream concepts may be inadequate
  – Concept should correspond to how a particular group thinks about it

22 (Old) Example of Inadequate Concept
• Patient satisfaction typically conceptualized in mainstream populations in terms of, e.g.,
  – access, technical care, communication, continuity, interpersonal style
• In minority and low income groups, additional relevant domains include, e.g.,
  – discrimination by health professionals
  – sensitivity to language barriers

23 Psychometric Adequacy in One Group (matrix from slide 17 repeated)

24 Psychometric Adequacy in Any Group
• Minimal standards:
  – Sufficient variability
  – Minimal missing data
  – Adequate reliability/reproducibility
  – Evidence of construct validity
  – Evidence of responsiveness to change
• Basic classical test theory approach

25 Evidence of Psychometric Inadequacy of SF-36 Scale in Three Diverse Groups
• SF-36 social functioning scale: internal consistency reliability < .70 in three different samples:
  – Chinese language, adults aged 55-96 years
  – Japanese language, Japanese elders
  – English, Pima Indians
Stewart AL & Nápoles-Springer A, 2000 (see readings)

26 Conceptual Equivalence Across Groups (matrix from slide 17 repeated)

27 Conceptual Equivalence
• Is the concept relevant, familiar, acceptable to all diverse groups being studied?
• Is the concept defined the same way in all groups?
  – all relevant “domains” included (none missing)
  – interpreted similarly
• Is the concept appropriate for all diverse groups?

28 Example: Subjective Test of Conceptual Equivalence of Spanish FACT-G
• Bilingual/bicultural expert panel reviewed all 28 items
  – One item had low cultural relevance to quality of life
  – One concept was missing: spirituality
• Developed new spirituality scale (FACIT-Sp) with input from cancer patients, psychotherapists, and religious experts
  – Sample item: “I worry about dying”
Cella D et al. Med Care 1998;36:1407

29 Example of Nonequivalent Concept: Depression
• Mainstream concept expressed and reported via affect, somatic symptoms, behavior, thought patterns
• In Asian Americans
  – public expression of self-reflection is discouraged
  – saving face and self-sacrifice are powerful forces in molding behavior and expression

30 Generic/Universal vs. Group-Specific (Etic versus Emic)
• Concepts unlikely to be defined exactly the same way across diverse ethnic groups
• Generic/universal (etic)
  – features of a concept that are appropriate across groups
• Group-specific (emic)
  – idiosyncratic portions of a concept

31 Etic versus Emic (cont.)
• Goal in health disparities research
  – identify generic/universal portion of a concept (could be entire concept) that can be applied across all groups
• For within-group analyses or studies
  – the culture-specific portion is also relevant

32 Qualitative Approaches to Explore Conceptual Equivalence in Diverse Groups
• Literature reviews
  – ethnographic and anthropological
• In-depth interviews and focus groups
  – discuss concepts, obtain their views
• Expert consultation from diverse groups
  – review concept definitions
  – rate relevance of items

33 Psychometric Equivalence (matrix from slide 17 repeated)

34 Equivalence of Reliability?? No!
• Difficult to compare reliability because it depends on the distribution of the construct in a sample
  – Thus lower reliability in one group may simply reflect less variability in that group
• More important is the adequacy of the reliability in both groups
  – Reliability meets minimal criteria within each group

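The dependence of reliability on sample variability can be sketched with Cronbach's alpha, a common internal consistency coefficient. This is an illustrative sketch, not the lecture's own analysis: the function name, the two tiny data sets, and the 0.70 criterion are assumptions chosen for the example. Both hypothetical groups have the same item-level error, but one spans a narrower range of the construct.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item data: in both groups each respondent's two items
# differ by exactly 1 point, but group_wide covers a broader score range.
group_wide = [[1, 2], [2, 1], [4, 5], [5, 4]]
group_narrow = [[2, 3], [3, 2], [3, 4], [4, 3]]

alpha_wide = cronbach_alpha(group_wide)
alpha_narrow = cronbach_alpha(group_narrow)

MINIMUM_ALPHA = 0.70  # assumed minimal criterion for this illustration
for name, alpha in [("wide", alpha_wide), ("narrow", alpha_narrow)]:
    status = "meets" if alpha >= MINIMUM_ALPHA else "falls below"
    print(f"{name}: alpha = {alpha:.2f} ({status} {MINIMUM_ALPHA})")
```

In this toy data the narrower group gets a much lower alpha even though the items behave identically, which is why the slide recommends checking adequacy within each group rather than comparing coefficients between groups.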
35 Equivalence of Criterion Validity
• Determine if hypothesized patterns of associations with specified criteria are confirmed in both groups, e.g.
  – a measure predicts utilization in both groups
  – a cutpoint on a screening measure has the same specificity and sensitivity in both groups

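The cutpoint check can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the scores, the criterion labels, and the cutpoint value are all hypothetical, and a score at or above the cutpoint is assumed to count as a positive screen.

```python
def sensitivity_specificity(scores, has_condition, cutpoint):
    """Return (sensitivity, specificity) when score >= cutpoint screens positive."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        screened_positive = score >= cutpoint
        if condition and screened_positive:
            tp += 1  # true positive
        elif condition:
            fn += 1  # false negative
        elif screened_positive:
            fp += 1  # false positive
        else:
            tn += 1  # true negative
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: the first three respondents in each group have the
# condition according to the criterion; the last three do not.
has_condition = [True, True, True, False, False, False]
scores_group_a = [20, 18, 16, 10, 8, 5]
scores_group_b = [17, 14, 12, 10, 8, 5]  # cases score lower in this group

CUTPOINT = 16  # illustrative value only
result_a = sensitivity_specificity(scores_group_a, has_condition, CUTPOINT)
result_b = sensitivity_specificity(scores_group_b, has_condition, CUTPOINT)
print("Group A (sensitivity, specificity):", result_a)
print("Group B (sensitivity, specificity):", result_b)
```

Here the same cutpoint misses most cases in the second group, so the cutpoint would not show equivalent criterion validity across the two groups.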
36 Equivalence of Construct Validity
• Are hypothesized patterns of associations confirmed in both groups?
  – Example: Scores on the Spanish version of the FACT had similar relationships with other health measures as scores on the English version
• Primarily tested through subjectively examining pattern of correlations
  – Can test differences using confirmatory factor analysis (e.g., through Structural Equation Modeling)

37 Equivalence of Factor Structure
• Factor structure in the new group is similar to the structure in the original groups in which the measure was tested
  – In other words, the measurement model is the same across groups
• Methods
  – Specify the number of factors you are looking for
  – Determine if the hypothesized model fits the data

38 Exploratory Factor Analysis (EFA)
• Factor analysis methods that do not constrain the number of factors or the magnitude of the loadings
• Identifies an underlying structure of a set of items with no particular hypotheses
• Goal: identify as few explanatory variables (i.e., factors) as possible that account for covariation among the items

39 Confirmatory Factor Analysis (CFA)
• Methods that specify a hypothesized structure a priori (before looking at the results)
• Can test mean and covariance structures
  – to estimate bias

40 Equivalence of Factor Structure: Assuring Psychometric Invariance
• Psychometric invariance (technical term for psychometric equivalence)
• Invariance means that important properties of a theoretically based factor structure (measurement model) do not differ or vary across groups (are invariant)
  – In other words, the measurement model is the same across groups
• Empirical comparison of factor structure

41 Criteria for Psychometric Invariance: Non-technical Language
Across two or more groups, determine whether each criterion is true (a sequential process):
  1. Same number of factors (dimensions)
  2. Same items load on (correlate with) same factors
  3. Each item has same factor loadings
  4. No bias on any item or scale across groups
  5. Same residuals on items
  6. No item or scale bias AND same residuals

42 Criteria for Evaluating Invariance Across Groups: Technical Terms
• Dimensional Invariance: Same number of dimensions
• Configural Invariance: Same items load on same dimensions
• Metric Invariance, Factor Pattern Invariance: Items have same loadings on same dimensions
• Strong Factorial Invariance, Scalar Invariance: Observed scores are unbiased
• Residual Invariance: Observed item and factor variances can be compared across groups
• Strict Factorial Invariance: Both scalar invariance and residual invariance criteria are met

43 Dimensional Invariance
• Definition: Factor structure is the same, i.e., the same number of factors is observed in both groups
• CES-D Example:
  – Four factors found in men and 3 factors in women (n=1000), 18-92 years of age
  – Failed the dimensional invariance criterion
    » a different number of factors was found in the two groups
JM Golding et al., J Clin Psychol 1991;47:61-75

44 Example: Dimensional Invariance of CES-D in Hispanic EPESE
• Original 4 factors
  – Somatic symptoms
  – Depressive affect
  – Interpersonal behavior
  – Positive affect
• Hispanic EPESE: only 2 factors
  – Depression (included somatic symptoms, depressive affect, and interpersonal behavior)
  – Well-being
Miller TQ et al., The factor structure of the CES-D in two surveys of elderly Mexican Americans, J Gerontol: Soc Sci, 1997;52B:S259-69.

45 Configural Invariance
• Assumes: dimensional invariance is found
  – that there were the same number of factors
• Definition: Item-factor patterns are the same, i.e., the same items load on the same factors in both groups
• CES-D Example
  – 4 factors found in Anglos, Blacks, and Chicanos
  – Same items loaded on each factor in all groups
RE Roberts et al., Psychiatry Research, 1980;2:125-134

46 Metric Invariance or Factor Pattern Invariance
• Assumes: dimensional and configural invariance are found
• Definition: Item loadings are the same across groups, i.e., the correlation of each item with its factor is the same in both groups

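As a rough descriptive screen before any formal test, loadings estimated separately in two groups can be compared directly. The sketch below uses Tucker's congruence coefficient, which is a swapped-in descriptive index and not the CFA-based invariance test the lecture refers to; the loadings and names are hypothetical. Note that the coefficient equals 1 for proportional loadings as well as identical ones, so per-item gaps are inspected too.

```python
from math import sqrt


def congruence_coefficient(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(loadings_a) != len(loadings_b):
        raise ValueError("both groups need loadings for the same items")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm_a = sqrt(sum(a * a for a in loadings_a))
    norm_b = sqrt(sum(b * b for b in loadings_b))
    return cross / (norm_a * norm_b)


# Hypothetical loadings of four items on one factor in two groups
group_1 = [0.70, 0.65, 0.60, 0.72]
group_2 = [0.68, 0.66, 0.20, 0.70]  # third item loads much lower here

phi = congruence_coefficient(group_1, group_2)
# Per-item gaps point to where the loadings diverge
gaps = [round(abs(a - b), 2) for a, b in zip(group_1, group_2)]
largest_gap_item = gaps.index(max(gaps)) + 1  # 1-based item number
print(f"congruence = {phi:.3f}; largest loading gap at item {largest_gap_item}")
```

A formal test of metric invariance would instead fit a multi-group confirmatory factor model with loadings constrained equal and compare its fit with the unconstrained model.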
47 Strong Factorial Invariance or Scalar Invariance
• Assumes: dimensional, configural, and metric (factor pattern) invariance are found
• Definition: Observed scores are unbiased, i.e., means can be compared across groups
• Requires test of equivalence of mean scores across groups using confirmatory factor analysis

48 Residual Invariance
• Assumes: dimensional, configural, and metric (factor pattern) invariance are found
• Definition: Observed item and factor residual variances can be compared across groups, i.e., similar error associated with each item across groups

49 Strict Factorial Invariance
• Assumes: dimensional, configural, metric (factor pattern), and strong factorial (scalar) invariance are found

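The sequence of invariance criteria on slides 41 to 49 can be summarized in standard multi-group factor model notation. The symbols are assumptions added for illustration and do not appear on the slides: for group $g$, $x_g$ is the vector of observed item scores, $\tau_g$ the item intercepts, $\Lambda_g$ the factor loadings, $\xi_g$ the latent factors with mean $\kappa_g$, and $\Theta_g$ the covariance matrix of the residuals $\delta_g$.

```latex
\begin{align}
x_g &= \tau_g + \Lambda_g \xi_g + \delta_g, \qquad \operatorname{Cov}(\delta_g) = \Theta_g \\
\text{Dimensional:}\quad & \Lambda_1 \text{ and } \Lambda_2 \text{ have the same number of columns} \\
\text{Configural:}\quad & \Lambda_1 \text{ and } \Lambda_2 \text{ have the same pattern of zero and nonzero loadings} \\
\text{Metric:}\quad & \Lambda_1 = \Lambda_2 \\
\text{Scalar (strong):}\quad & \Lambda_1 = \Lambda_2,\ \tau_1 = \tau_2 \\
\text{Residual:}\quad & \Lambda_1 = \Lambda_2,\ \Theta_1 = \Theta_2 \\
\text{Strict:}\quad & \Lambda_1 = \Lambda_2,\ \tau_1 = \tau_2,\ \Theta_1 = \Theta_2
\end{align}
```

Under scalar invariance, $\mathbb{E}[x_g] = \tau + \Lambda \kappa_g$, so differences in observed item means between groups arise only from differences in the latent means $\kappa_g$. This is the sense in which observed scores are unbiased and means can be compared.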
50 Integrating Qualitative and Quantitative Methods in Assessing Cultural Equivalence
• Optimal approach: use qualitative and quantitative methods in tandem to address issues of cultural equivalence
• International studies do this routinely
  – typically develop a translated version of an existing measure for a new country and language
    » Swedish version of SF-36
    » Japanese version of Mental Health Inventory

51 Distinction Between International and U.S. Studies
• International studies assume conceptual non-equivalence to begin with
  – Different nations, languages
• Usually dealing with translated measures
• During translation, items can be added or modified to improve conceptual and semantic equivalence
  – Product is an “adapted” instrument

52 International Versus U.S. Approach
• Assess Conceptual Equivalence (Qualitative)
• Assess Psychometric Equivalence (Quantitative)

53 Typical International Approach
• Boxes: Assess Conceptual Equivalence (Qualitative); Assess Psychometric Equivalence (Quantitative)
• Begin here (assumes conceptual differences across countries)
• If new domains or definitions are found, can revise and add items
• Translated “adapted” version is the goal
• May achieve conceptual equivalence before testing psychometric adequacy

54 Typical U.S. Approach in Studies of English-Speaking Diverse Groups
• Select existing well-tested measures (developed in mainstream) and assume they will work (universality)
• Assumes perspectives of diverse group are similar to mainstream
  – “Cultural hegemony” (Guyatt)
  – “Middle-class ethnocentrism” (Rogler)

55-58 Typical U.S. Approach When No Translation is Done
(Flow diagram built up over slides 55-57; slide 58 repeats the full diagram under the title “Typical U.S. Subgroup Approach When No Translation is Done”.)
• Boxes: Assess Conceptual Equivalence (Qualitative); Assess Psychometric Equivalence (Quantitative)
• Most studies begin here (assumes universality of constructs)
• If equiv.: Proceed with analysis. May miss important domains and definitions
• If problems: No Guidelines! If refine items based on qualitative studies, no longer have comparable instrument

59 What to Do if Measures Are Not Equivalent in a Specific Study Comparing Groups
• Need guidelines for how to handle data when substantial non-comparability is found in a study
  – Drop bad or “biased” items from scores
    » Compare results with and without biased items
  – Analyze study by stratifying diverse groups
• The current challenge for measurement in minority health studies

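Comparing results with and without flagged items can be as simple as scoring the scale twice. The sketch below is illustrative only: the item names, responses, and the choice of which item is "biased" are hypothetical, and the scale score is assumed to be the mean of the selected items.

```python
from statistics import mean


def scale_score(responses, items):
    """Mean of the selected items for one respondent (dict of item -> score)."""
    return mean(responses[item] for item in items)


def group_mean_difference(group_a, group_b, items):
    """Difference in mean scale scores (group A minus group B)."""
    mean_a = mean(scale_score(r, items) for r in group_a)
    mean_b = mean(scale_score(r, items) for r in group_b)
    return mean_a - mean_b


# Hypothetical responses; item "q3" is assumed to have been flagged as biased
group_a = [{"q1": 3, "q2": 3, "q3": 3}, {"q1": 2, "q2": 2, "q3": 2}]
group_b = [{"q1": 3, "q2": 3, "q3": 0}, {"q1": 2, "q2": 2, "q3": 0}]

all_items = ["q1", "q2", "q3"]
unbiased_items = ["q1", "q2"]

diff_all = group_mean_difference(group_a, group_b, all_items)
diff_unbiased = group_mean_difference(group_a, group_b, unbiased_items)
print(f"with flagged item: {diff_all:.2f}; without: {diff_unbiased:.2f}")
```

If the group difference shrinks or disappears once the flagged item is dropped, as in this toy data, the original difference was at least partly driven by that item.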
60 Example: 20-item Spanish CES-D in Older Latinos
• 2 items had very low item-scale correlations and high rates of missing data in two studies
  – I felt hopeful about the future
  – I felt I was just as good as other people
• 20-item version
  – Item-scale correlations: -.20 to .73 (Study 1); .05 to .78 (Study 2)
  – Cronbach’s alpha: [values not shown]
• 18-item version
  – Item-scale correlations: .45 to .76 (Study 1); .33 to .79 (Study 2)

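Item-scale correlations like those on this slide can be computed by correlating each item with the sum of the remaining items. The sketch below assumes the "corrected" form (the item is excluded from its own total); the slide does not say which form the studies used. The data and the 0.30 flagging threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_scale_correlations(rows):
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical responses: 5 respondents x 3 items; the third item does not
# track the other two
rows = [[1, 1, 2], [2, 2, 1], [3, 3, 2], [4, 4, 1], [5, 5, 2]]

correlations = corrected_item_scale_correlations(rows)
THRESHOLD = 0.30  # assumed cutoff for a "very low" correlation
flagged = [j + 1 for j, r in enumerate(correlations) if r < THRESHOLD]
print("item-scale correlations:", [round(r, 2) for r in correlations])
print("items flagged for review:", flagged)
```

Items flagged this way are candidates for the kind of removal shown on the slide, where dropping two items raised the lowest item-scale correlation in both studies.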
61 Example: Measure Can be Modified
• GHAA Consumer Satisfaction Survey
• Adapted to be appropriate for African American patients
  – Focus groups conducted to obtain perspectives of African Americans
  – New domains added (e.g., discrimination/stereotyping)
  – New items added to existing domains
Fongwa M et al. Characteristics of a patient satisfaction instrument tailored to concerns of African Americans. Ethnicity and Disease, in press.

62 Approaches to Conducting Studies When You Are Not Sure
• Use a combination of “universal” and group-specific items
  – use universal items to compare across groups
  – use specific items (added onto universal items) when conducting analyses within one group
    » To find a variable that correlates with a health measure within one group

63 Conclusions
• Measurement in health disparities research is a relatively new field
  – Few guidelines
• Encourage first steps
  – Test and report adequacy and equivalence
• As evidence grows, concepts and measures that work better across diverse groups will be identified

64 Homework: Optional
• If you are interested in conducting research in diverse populations,
  – Complete rows 40-44 in the matrix (equivalence across diverse groups)
