Class 9 and 10: Interpreting Pretest Data, Considerations in Modifying Measures, Testing Scales and Creating Scores, Creating and Presenting Change Scores
December 3, 2009
Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco

Overview of Class 9 and 10
- Analyzing pretest data
- Modifying/adapting measures
- Keeping track of your study measures
- Testing basic psychometric properties, creating summated ratings scales, and presenting measurement information
- Creating and presenting change scores

Tasks in Analyzing Pretest Data
For each item:
- Tabulate problems
- Determine importance of problems
Results become the basis for possible revisions/adaptations.

Methods of Summarizing Problems
- Optimal: transcripts of all pretest interviews
- Problems identified through standard administration (interviewer or respondent behaviors) or through probes
- Analyze dialogue (narrative) for clues to solve problems

Behavioral Coding: Problems with Standard Administration
Systematic approach to identifying problems with items administered by interviewer:
- Interviewer problems
- Respondent problems
Method:
- Listen to taped interview
- Read transcript

Examples of Interviewer “Behaviors” Indicating Problem Items
- Question misread or altered
  - Slight change: meaning not affected
  - Major change: alters meaning
- Question skipped by interviewer

Examples of Respondent “Behaviors” Indicating Problem Items
- Asked for clarification or repeat of question
- Indicated did not understand question
- Qualified answer (e.g., “it depends”)
- Indicated answer falls between existing response choices
- Didn’t know the answer
- Refused

Behavioral Coding Summary Sheet: Problems with Standard Administration

Item # | Interviewer: difficulty reading | Subject: asks to repeat Q | Subject: asks for clarification
1      |                                 |                           |
2      |                                 |                           |
3      |                                 |                           |
4      |                                 |                           |

Summarize Behavioral Codes for Each Item
- Proportion of interviews (respondents) with each problematic behavior
- Number of occurrences of the problem divided by N (e.g., 7/48 respondents requested clarification); see the sketch below
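
A minimal sketch of this tallying, assuming the behavior codes have already been entered as 0/1 indicators per interview (hypothetical column names; Python/pandas used purely for illustration):

    import pandas as pd

    # Hypothetical coded pretest data for one item: one row per interview,
    # one 0/1 indicator per problem behavior.
    codes = pd.DataFrame({
        "misread":       [0, 1, 0, 0, 0, 0, 0, 0],
        "asked_repeat":  [0, 0, 1, 0, 0, 1, 0, 0],
        "asked_clarify": [1, 0, 0, 0, 1, 0, 0, 1],
    })

    proportions = codes.mean()               # occurrences divided by N
    print(proportions)                       # e.g., 3/8 asked for clarification
    print(proportions[proportions > 0.15])   # flag "common" problems (>15% rule, below)

The same proportions, computed per item, are what get pasted into the summary sheet shown next.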

Behavioral Coding Summary Sheet: Standard Administration (e.g., n=20)

Item # | Interviewer: difficulty reading | Subject: asks to repeat Q | Subject: asks for clarification
1      |                                 | 2/20                      | 1/20
2      |                                 |                           |
3      |                                 |                           | 3/20
4      |                                 |                           |
…

Missing Data: Clue to Problems
- More missing data is associated with unclear, difficult, or irrelevant items
- Obtained for self-report administration

How Missing Data Prevalence Helps
Items with a large percent of responses missing are a clue to a problem. In the H-CAHPS® pretest, for the item “Did hospital staff talk with you about whether you would have the help you needed when you left the hospital?”:
- 35% missing for the Spanish group
- 29% missing for the English group
A sketch of this per-item check follows.
MP Hurtado et al. Health Serv Res, 2005;40(6 Pt II):2140-2161.
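
A minimal sketch of the per-item missing-data check (hypothetical item names; None marks an item left blank):

    import pandas as pd

    # Hypothetical self-report pretest responses
    df = pd.DataFrame({
        "help_after_discharge": [3, None, 2, None, 4, None],
        "staff_courtesy":       [4, 4, 3, 5, None, 4],
    })

    pct_missing = (df.isna().mean() * 100).sort_values(ascending=False)
    print(pct_missing)   # items with a large percent missing are flagged for review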

Behavioral Coding: Problems Identified Through Probes
Systematic approach to identifying respondent problems found via probes. Method:
- Listen to taped interview
- Read transcript

Usefulness of Transcript of Probe Interviews
Can organize responses of all interview subjects by item (see example handout):
- Item 1 probe
  - Response by subject 1
  - Response by subject 2
  - Etc.
- Item 2 probe
  - …

Results: Probing Meaning of Phrase
“I asked you how often doctors asked you about your health beliefs. What does the term ‘health beliefs’ mean to you?”
- S1: “…I don’t want medicine”
- S5: “…How I feel, if I was exercising…”
- S7: “…Like religion? Not believing in going to doctors?”

Results: Beck Depression Inventory (BDI)
- Cognitive interviews with older adults, oncology patients, and less educated adults
- Administered selected BDI items
- Asked respondents to paraphrase items
TL Sentell, Community Mental Health Journal, 2008;39:323.

Results: Beck Depression Inventory (BDI) (cont.)
- Depending on the item, 0-62% of respondents correctly paraphrased it
- Most misunderstandings: vocabulary confusion
- Phrase: “I am critical of myself for my weaknesses and mistakes”
  - “Critical is when you’re very sick”
  - “I don’t know how to explain mistakes”

Behavioral Coding of Probe Results
“I asked you how often doctors asked you about your health beliefs. What does the term ‘health beliefs’ mean to you?”
Behavioral coding: number of times the response indicated lack of understanding as intended (e.g., 2/10 respondents did not understand the meaning, based on their responses to the probe)

Results: Probing Meaning of Phrase
“On about how many of the past 7 days did you eat foods that are high in fiber, like whole grains, raw fruits, and raw vegetables?”
Probe: “What does the term ‘high fiber’ mean to you?”
- Behavioral coding of standard administration: over half of respondents exhibited a problem
- Review of answers to probe: over ¼ did not understand the term
Blixt S et al. Proceedings of the Section on Survey Research Methods, American Statistical Association, 1993:1442.

Behavioral Coding Summary: Probes (+ Standard Administration)

Item # | Probe (N=10): meaning unclear | Interviewer: difficulty reading | Subject: asks to repeat Q | Subject: asks for clarification
1      | 2/10                          |                                 | 2/20                      | 1/20
2      |                               |                                 |                           |
3      | 4/15                          |                                 |                           | 3/20
4      |                               |                                 |                           |

Probes Can Identify Problems Even When No Problem “Behaviors” Found
Respondents may appear to answer the question appropriately during standard administration (no problem behavior codes), yet problems are identified with probes:
- Probe on meaning: response indicates lack of understanding
- Probe on use of response options: response indicates options are problematic

Results: No Behavior Coding Issues but Probe Detected Problems
“I seem to get sick a little easier than other people” (definitely true, mostly true, mostly false, definitely false)
- Behavioral coding of standard administration: very few problems
- Review of answers to probe: almost 3/4 had comprehension problems
- Most problems centered on the term “mostly” (either it’s true or it’s not)
Blixt S et al. Proceedings of the Section on Survey Research Methods, American Statistical Association, 1993:1442.

Interpret Behavioral Coding Results
- Determine if problems are common; items with only a few problems may be fine
- Quantifying “common” problems:
  - Several types of problems (many row entries)
  - Several subjects experienced a problem
  - Problem with the item identified in >15% of interviews

Continue Analyzing Items with “Common” Problems
Identify “serious” common problems:
- Gross misunderstanding of the question
- Yields completely erroneous answer
- Couldn’t answer the question at all
Some less serious problems can be addressed by improved instructions or a slight modification.

Addressing More Serious Problems
- Conduct content analysis of transcript
- Use qualitative analysis software (e.g., NVivo)
For these items, reviewing the dialogue that ensued during administration of the item and probes:
- Can reveal the source of problems
- Can help in deciding whether to keep, modify, or drop items

Overview of Class 9 and 10
- Analyzing pretest data
- Modifying/adapting measures
- Keeping track of your study measures
- Testing basic psychometric properties, creating summated ratings scales, and presenting measurement information
- Creating and presenting change scores

Full Talk on Modifying Measures
“A Framework for Understanding and Discussing Modifications to Measures in Studies of Diverse Groups”
GSA 2009 RCMAR Preconference Workshop
Slides posted on syllabus: “GSA preconference slides”

Overview
- What is the problem?
- Why would we modify a measure?
- What information is used to modify?
- What are the types of modifications?
- How should we test modified measures?

When Problems Are Found Through Pretesting, Investigators Face a Choice
The dilemma once problems are found:
- Use the existing measure “as is” to preserve integrity of the measure, OR
- Try to modify the measure to address problems in the diverse group

Argument in Favor of Using Measure “As Is”
- Modifications can change the measure’s validity and reliability
- Allows comparison of findings to other research using the measure

Argument Against Using Measure “As Is” When Problems Are Found
If reliability and validity are poor:
- Results pertaining to the measure could be erroneous
- Limited internal validity
- Erroneous conclusions about the research questions of interest
- Ability to compare to other research (external validity) is moot

Reasons for Considering Modifying an Existing Measure
Key reason:
- Sample/population differs from that in which the original measure was developed
Other reasons:
- Measure developed a while ago
- Poor format/presentation
- Study context issues
There are four basic reasons; several have nothing to do with sample differences.

Key Reason: Population Group Differences from Original
- Mainstream research: different disease, health problem, patient group, age group
- Research in diverse population groups: different culture, race/ethnic group; lower socioeconomic status (SES); limited English proficiency, lower literacy
In mainstream research (the bulk of the literature), modifications tend to be made because of differences in the disease or health problem, or patient-group differences such as age, e.g., a fatigue severity measure modified for MS patients.

Reasons: Measure Developed in a Prior “Era”
- Historical events or changes have affected the concept definition (in all populations, or specific to a diverse group)
- Language use out of date
- Science of self-report not well developed
Many existing measures were developed in the 1970s and 1980s. This has little to do with diverse groups; rather, historical events and changes in society have intervened. Many older measures use phrases that are out of date, and since then we have learned a great deal about the science of self-reported measures.

Reasons: Poor Format/Presentation = High Respondent Burden
- Instructions unnecessarily wordy, unclear
- Way of responding is complicated
- Difficult to navigate the questionnaire: crowded on the page, hard to track across the page
- Hard to read: poor contrast, small font
Following to some extent from the prior era, many measures were not designed according to good survey design principles and are poorly formatted and presented on the page.

Example: Complex Instructions (Family Environment Scale)
“There are 12 statements on this form. They are statements about families. You are to decide which of these statements are true of your family and which are false. If you think the statement is TRUE or MOSTLY TRUE of your family, please mark the box in the T (TRUE) column. If you think the statement is FALSE or MOSTLY FALSE of your family, please mark the box in the F (FALSE) column. You may feel that some of the statements are true for some family members and false for others. Mark the box in the T column if the statement is TRUE for most members. Mark the box in the F column if the statement is FALSE for most members. If the members are evenly divided, decide what is the stronger overall impression and answer accordingly. Remember, we would like to know what your family seems like to you. So do not try to figure out how other members see your family, but do give us your general impression of your family for each statement. Do not skip any item. Please begin with the first item.”
A perfect example of instructions one might need to simplify; many existing measures have similar problems, if not quite so extreme.

Example: Burdensome Way of Responding
For each question, choose from the following alternatives:
0 = Never, 1 = Almost Never, 2 = Sometimes, 3 = Fairly Often, 4 = Very Often
1. In the last month, how often have you felt nervous and “stressed”? ... 0 1 2 3 4
2. In the last month, how often have you felt that things were going your way? ... 0 1 2 3 4
Many instruments, presumably to save paper, present the response choices only once alongside the instructions and ask respondents to use them to answer. In this example the same numbers at least appear beside the items, but the respondent has to remember the meaning of each number or refer back and forth; by the bottom of a page of 22 items, this is cumbersome. Some instruments instead provide a blank space alongside each item in which respondents are to write the correct response number.
S Cohen et al. J Health Soc Behav, 1983;24(4):385-396.

What Information is Used to Decide How to Modify a Measure?
To modify a measure, we need information on which to base the modifications: information on conceptual differences in the diverse population, including information needed to make revisions. This is identified through:
- Qualitative research
- Published reviews of measures

Basis: Qualitative Research Methods
- Focus groups
- In-depth qualitative interviews
- Expert panel reviews
- Standard pretests
- Cognitive interview pretests

Basis: Qualitative Research – Two Applications
- Explore concept definition in diverse group (independent of a particular measure)
- Explore a specific measure in diverse group (conceptual adequacy, administration problems)
Qualitative methods help us learn how a concept is defined by a new group; the results can show how the concept differs from the original and what added information is needed to make revisions. Qualitative methods can also obtain feedback on an existing measure: they can help find problems, and they often provide solutions.

Basis: Published Reviews
- Increasingly, systematic reviews of how well existing measures work in diverse population groups
- Summaries across studies
- Results and recommendations provide a basis for specific modifications
The RCMARs published one such special issue, including reviews of measures of physical activity, depression, and neighborhood environments in minority populations.

Example of a Published Review: Measures of Dietary Intake in Minority Populations
- Reviewed food frequency questionnaires for use in minority populations
- Performed well in some groups and poorly in others
- Group differences that could affect scores: portion sizes differ; missing ethnic foods; could underestimate total intake and nutrients
Their suggestions provide a basis for modifying existing FFQs for use in diverse populations.
RJ Coates et al. Am J Clin Nutr, 1997;65(suppl):1108S-1115S.

Types of Modifications
- Format or presentation
- Content: dimensions, item stems, response options
Most modifications can be classified as content or format; there are three main types of content modifications.

Format/Presentation Modifications
Goal: reduce respondent burden; improve appearance or way of responding
- Simplify instructions
- Modify format for responding
- Create more space, reduce crowded items
- Add illustrations
- Improve contrast, increase font size

Resource on Formatting
Paul Mullin et al. Applying cognitive design principles to formatting HRQOL instruments. Quality of Life Research, 2000;9:13-27.

Types of Modifications
- Format or presentation
- Content: dimensions, item stems, response options
  - Add, drop, replace, or modify

Content Modification Example: Add Dimension
- Social support: typical dimensions are tangible, emotional, informational
- Older Korean/Chinese immigrants: additional dimension of language support
- Added to existing measure (based on focus group data):
  - Help with translation at medical appointments
  - Help to ask questions in English when on the phone
  - Help to learn English
Example by RCMAR scholars: S Wong et al. Int J Health Human Dev, 2005;61:105-121.

Content Modification Example: Modify Item Stems
If wording is unclear, add parenthetical phrases:
Have you ever been told by a doctor that you have…
- Diabetes (sugar in blood or high sugar)?
- Hypertension (high blood pressure)?
- Anemia (low blood)?
Modifying item stems is often needed to meet the needs of those with limited English proficiency, lower levels of education, or limited literacy. This can be done without substantive changes by simply adding parenthetical phrases as alternatives to words that might not be understood.

Adding or Deleting a Response Choice
- If too few response choices: add an option within the existing response scale
- If too many response choices: drop one option

Content Modification Example: Too Few Response Choices
“How much is each person (or group of persons) supportive for you at this time in your life? Your wife, husband, or significant other person:”
- None
- Some
- A lot
Often, for simplicity, measures were designed with only a few response choices. However, given the tendency of many respondents not to endorse either extreme, this provides extremely limited variability. Although not done to my knowledge, this measure could benefit from the addition of 1-2 more levels.
G Parkerson et al. Fam Med, 1991;23:357-360.

Content Modification Example: Replace Response Choices
Same item as above, with the original three choices replaced by five:
- Original: None / Some / A lot
- Replacement: Not at all / A little / Moderately / Quite a bit / Extremely

Content Modification Example: Replace Response Choices
Health Perceptions Scale in older adults (e.g., “My health is excellent,” “I expect my health to get worse”)
- Original: 1 - Definitely true; 2 - Mostly true; 3 - Don’t know; 4 - Mostly false; 5 - Definitely false
- Modified: 1 - Not at all true; 2 - A little true; 3 - Somewhat true; 4 - Mostly true; 5 - Definitely true
Agree/disagree and other bidirectional response scales are very hard for respondents to use. In a study of Thai older adults, this investigator replaced the bidirectional scale with a unidimensional response scale, varying in the extent to which the statement is true.
L Thiamwong et al. Thai J Nursing Res, 2008;12(4):286-296.

Minor to Major Modifications?
Each type of modification can hypothetically be rated on a continuum from having minor to major impact on the reliability and validity of the original measure:
- Minor: slight changes in format/presentation
- Major: numerous changes in dimensions, items, and response choices
Defining the middle of the continuum is harder than one would think.

Do Not Make Assumptions
- All modifications, no matter how small, can affect the reliability and validity of the original measure
- Do not guess the impact of modifications
- The burden is on the investigator to test the modified measure
But what about several very minor modifications? What we all agree on is that any modification CAN affect the reliability and validity of the original measure; at this stage in modifications research, investigators need to test the modified measures.

Recommendations for Testing Modified Measures
- Pretest modified measure extensively before fielding in new study
- Build in ability to do psychometric testing when measure is fielded
  - Add validity variables (e.g., similar to original measure to test comparability)
  - Add follow-up to assess test-retest reliability
Two key recommendations: pretest, and build in the ability to conduct the analyses needed to justify the modifications.

Analyze Psychometric Adequacy of Modified Measure in New Study
Modified measure should meet minimal criteria:
- Item-scale correlations
- Internal-consistency reliability

Analyzing Modified Measure: Comparability to Original Measure
Compare measurement results of the modified measure to the original measure:
- Reliability (sample dependent)
- Factor structure
- Construct validity
- Sensitivity to change
The question is not whether the modified measure is better than the original, but whether it is comparable: Does the factor structure conform to that of the original measure? Does the measure correlate with validity variables similarly to the original? Does the modified measure detect as much change over time as the original?

Some Suggestions: Avoid Dropping Items
If modifications are only changes in item stems:
- Retain all original items
- Add modified items at the end
- Can test original and modified measure
Assumes only a few new items.

Overview of Class 9 and 10
- Analyzing pretest data
- Modifying/adapting measures
- Keeping track of your study measures
- Testing basic psychometric properties, creating summated ratings scales, and presenting measurement information
- Creating and presenting change scores

Questionnaire Guides
- Organizing your survey measures
- Keep track of measurement decisions
- Sample guide to measures (class 8): documents sources of measures, any modifications, and reasons for modification

Sample “Summary of Survey Variables…” Handout
Develop a “codebook” of scoring rules. Several purposes:
- Variable list
- Meaning of scores (direction of high score)
- Special coding
- How missing data are handled
- Type of variable (helps in analyses)

Overview of Class 9 and 10
- Analyzing pretest data
- Modifying/adapting measures
- Keeping track of your study measures
- Testing basic psychometric properties, creating summated ratings scales, and presenting measurement information
- Creating and presenting change scores

On to Your Field Test or Study
- What to do once you have your baseline data
- How to create summated scale scores

Review Surveys for Data Quality
Examine each survey in detail as soon as it is returned, and mark any:
- Missing data
- Inconsistent or ambiguous answers
- Skip patterns that were not followed

Reclaim Missing and Ambiguous Data
Go over problems with the respondent:
- If the survey was returned in person, review it then
- If mailed, call the respondent ASAP and go over missing and ambiguous answers
- If you cannot reach the respondent by telephone, make a copy for your files and mail back the survey with a request to clarify missing data

Print Frequencies of Each Item and Review: Range Checks
- Verify that responses for each item are within the acceptable range
- Out-of-range values can be checked on the original questionnaire, then corrected or considered “missing”
- Sometimes out-of-range values mean that an item has been entered in the wrong column: a check on data entry quality
A sketch of these checks follows.
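
A minimal sketch of these frequency and range checks (hypothetical items scored 1-5; Python/pandas here, though the same review can be done with SAS/SPSS frequencies):

    import pandas as pd

    # Hypothetical entered data; 99 is a data-entry error
    df = pd.DataFrame({
        "item1": [1, 3, 5, 99, 2],
        "item2": [2, 2, 4, 4, 5],
    })

    for col in df.columns:
        print(df[col].value_counts().sort_index())        # frequencies to review
        bad_rows = df.index[~df[col].isin(range(1, 6))]   # range check
        if len(bad_rows):
            print(f"{col}: out-of-range values in rows {list(bad_rows)}")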

Testing Scaling Properties and Reliability in Your Sample for Multi-Item Scales
- Obtain item-scale correlations (part of internal consistency reliability program)
- Calculate reliability in your sample (regardless of known reliability in other studies):
  - Internal consistency for multi-item scales
  - Test-retest if you obtained it

SAS/SPSS Both Make Item Convergence Analysis Easy
Reliability programs provide:
- Item-scale correlations corrected for overlap
- Internal consistency reliability (coefficient alpha)
- Reliability with each item removed (to see the effect of removing an item)

SAS: Obtaining Item-Scale Correlations and Coefficient Alpha

    PROC CORR DATA=data-set-name ALPHA NOMISS;
      VAR item1 item2 item3;  /* your list of scale items */
    RUN;

Output:
- Coefficient alpha
- Item correlations
- Item-scale correlations corrected for overlap
SAS Manual, Chapter 3: Assessing Scale Reliability with Coefficient Alpha

Testing Reliability in Stata
www.stata.com/help.cgi?alpha

    alpha varlist [if] [in] [, options]

Note: item-rest correlations are those corrected for overlap.

What to Look For
- Review unstandardized coefficient alpha and item-total or item-scale correlations (corrected for overlap)
- Each item should correlate at least .30 with the total
- Internal consistency should be at least .70
A sketch of these computations follows.
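
A minimal sketch of these computations (hypothetical 4-item scale with complete cases; the formulas are the standard coefficient alpha and corrected item-total correlation, not any particular package's output):

    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        """Coefficient alpha for a DataFrame with one column per item."""
        k = items.shape[1]
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / total_var)

    def corrected_item_total(items: pd.DataFrame) -> pd.Series:
        """Each item's correlation with the sum of the remaining items."""
        return pd.Series({
            col: items[col].corr(items.drop(columns=col).sum(axis=1))
            for col in items.columns
        })

    items = pd.DataFrame({                      # hypothetical responses
        "q1": [1, 2, 4, 5, 3, 4], "q2": [2, 2, 5, 4, 3, 5],
        "q3": [1, 3, 4, 5, 2, 4], "q4": [2, 1, 4, 4, 3, 5],
    })
    print(cronbach_alpha(items))                # want >= .70
    print(corrected_item_total(items))          # want each >= .30
    for col in items.columns:                   # alpha with each item removed
        print(col, cronbach_alpha(items.drop(columns=col)))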

Item-Scale Correlations
If any items correlate <.30 with the sum of the other items in the scale:
- How much lower than .30?
- How many items?
- Does omitting this item increase reliability substantially?

Item-Scale Correlations (cont.)
If you try deleting items:
- Do it one item at a time: removing one item changes all item-scale correlations
- Sometimes removing the worst item corrects other problems

What if Reliability is Too Low?
- How much lower than .70?
- Does removing items with low item-scale correlations increase alpha?
- For new scales under development: modify using item-scale criteria
- For standard (published) scales:
  - Report problems as caveats in your analyses, or
  - Create a modified scale, and report results using both the standard and modified scales

Creating Summated Ratings Scale Scores
After final items are determined (meet criteria for item-scale correlations and internal consistency reliability), translate your “codebook” scoring rules into program code (SAS, SPSS), as sketched below:
- Reverse items scored in the wrong direction (e.g., so higher = better)
- Average all items: allows a score if any item is answered
- Apply a different missing data rule if desired (e.g., no score if more than 50% of items are missing)
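
A minimal sketch of these scoring rules (hypothetical 4-item scale on a 1-5 response scale, with two items needing reversal; the missing-data rule shown is the stricter 50% variant mentioned above):

    import pandas as pd

    df = pd.DataFrame({                  # hypothetical responses; None = missing
        "q1": [1, 4, 5], "q2": [5, 2, None],
        "q3": [2, None, 4], "q4": [4, None, 1],
    })
    items, reverse = ["q1", "q2", "q3", "q4"], ["q2", "q4"]

    scored = df[items].copy()
    scored[reverse] = 6 - scored[reverse]       # reverse 1-5 items: x -> (1+5) - x

    answered = scored.notna().sum(axis=1)
    df["scale"] = scored.mean(axis=1)           # mean of answered items
    df.loc[answered < len(items) / 2, "scale"] = None   # >50% missing: no score
    print(df)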

Review Summated Scores: Descriptive Statistics
- Review for out-of-range values, outliers, expected mean
- For scores with problems, review programming statements, locate errors, and correct
- Repeat the process until the computer algorithm is producing accurate scores
- To test programming accuracy: calculate scores by hand from 2 questionnaires and check that they match the computer-generated scores

Summarize Measurement Characteristics (Handout)
Present for each final scale (see the sketch below):
- % missing
- Mean, standard deviation
- Observed range, possible range
- Floor and ceiling effects, skewness statistic
- Range of item-scale correlations
- Number of item-scale correlations > .30
- Internal consistency reliability
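
A minimal sketch of computing these characteristics for one scale score (hypothetical 1-5 score; item-scale correlations and reliability come from the earlier reliability step):

    import pandas as pd

    def scale_summary(score: pd.Series, lo: float, hi: float) -> dict:
        s = score.dropna()
        return {
            "pct_missing":  score.isna().mean() * 100,
            "mean": s.mean(), "sd": s.std(ddof=1),
            "observed_range": (s.min(), s.max()),
            "possible_range": (lo, hi),
            "pct_floor":   (s == lo).mean() * 100,    # floor effect
            "pct_ceiling": (s == hi).mean() * 100,    # ceiling effect
            "skewness": s.skew(),
        }

    score = pd.Series([1.0, 2.5, 3.0, 5.0, 5.0, None, 4.5])
    print(scale_summary(score, lo=1, hi=5))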

Overview of Class 9 and 10
- Analyzing pretest data
- Modifying/adapting measures
- Keeping track of your study measures
- Testing basic psychometric properties, creating summated ratings scales, and presenting measurement information
- Creating and presenting change scores

Two Basic Types of Change Scores
- Measured change: difference in scores between baseline and follow-up
- Perceived change: how much change the respondent reports (from some prior time period)

Change Scores are Important Variables!
Creating change score variables is complex:
- Requires thought ahead of time
- Don’t rely on your programmer
- Include specification of change scores in your codebook

Measured Change
Example: measure administered at baseline and 1 month after treatment
- Pain in past 2 weeks: 0-10 numeric scale, 10 = worst pain
- Hypothetical results for 1 person:
  - Time 1 (baseline): score of 5
  - Time 2 (one month): score of 8

How Should Change be Scored?
Time 1 (baseline): score of 5; Time 2 (one month): score of 8
Two options:
- Option 1: time 2 minus time 1 = 3
- Option 2: time 1 minus time 2 = -3

Interpreting Change Score
What do you want the change score to indicate?
- Positive change score = improving?
- Positive change score = worsening?
The scoring “rule” depends on:
- Direction of scores on the original measure (is a higher score better or worse?)
- Which score was subtracted from which

Define Change Score in Codebook: Algorithms
You want positive score = improvement:
- If high score on measure is better: time 2 minus time 1
- If high score on measure is worse: time 1 minus time 2
You want positive score = decline:
- If high score on measure is better: time 1 minus time 2
- If high score on measure is worse: time 2 minus time 1

Recommendation: Make Change Score Intuitively Meaningful
Calculate the change score so that a positive change score = improved (see the sketch below):
- If high score on measure = better: time 2 minus time 1
- If high score on measure = worse: time 1 minus time 2
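
A minimal sketch of this rule in code (hypothetical variable names; the pain example from earlier, where 10 = worst pain, so high = worse):

    import pandas as pd

    df = pd.DataFrame({"pain_t1": [5, 7, 2], "pain_t2": [8, 4, 2]})

    higher_is_better = False                   # direction of the original measure
    if higher_is_better:
        df["change"] = df["pain_t2"] - df["pain_t1"]
    else:
        df["change"] = df["pain_t1"] - df["pain_t2"]
    # A positive change score now always means "improved":
    print(df)    # first person: 5 - 8 = -3, i.e., worsened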

Interpreting “Measured Change” Scores: What is Wrong?
In a study predicting utilization of health care (outpatient visits) over a 1-year period as a function of self-efficacy, a results sentence read:
“Reduced utilization at one year was associated with level of self efficacy at baseline (p < .01) and with 6-month changes in self efficacy (p < .05).”

Interpreting “Measured Change” Scores: Making it Clearer
“Reduced outpatient visits at one year were associated with lower levels of self efficacy at baseline (p < .01) and with 6-month improvements in self efficacy.”

Two Basic Types of Change Scores
- Measured change: difference in scores between baseline and follow-up
- Perceived change: how much change the respondent reports (from some prior time period)

Perceived Change (Retrospective Change)
How much has your physical functioning changed since your surgery?
-3 very much worse
-2 much worse
-1 worse
0 no change
1 better
2 much better
3 very much better

Perceived/Retrospective Change
- Perceived change enables the respondent to define a concept in terms of what it means to them
- Measured change is change on the specific questions contained in a particular measure

Example of Measured vs. Perceived Change
- Measuring change in physical functioning 2 months after abdominal surgery
- Case: a woman has more problems bending over than before surgery

Measured Change Since Abdominal Surgery
Physical functioning measured at baseline and 2 months after surgery:
- Difficulty walking
- Difficulty climbing stairs
Measured change: change on these specific physical functions. Measured change will not detect change in bending over.

Measuring Perceived Change in Physical Functioning
To what extent did your physical functioning change since just before your surgery?
- Much worse
- Worse
- No change
- Better
- Much better
If the person considers bending over to be part of physical functioning, she will report becoming worse.

Recommendations: Include Both Types of Measures
Measured change enables:
- Comparison with other studies
- Possibly more sensitivity (has more scale levels)
- Investigator-defined clinically relevant outcomes
Perceived/retrospective change enables:
- The person to report on the domain using their own definition
- Picking up changes “unmeasured” by the particular measure

Thank you!
Final paper due by December 10.
See handout (posted on syllabus during week 7).