Reliability, the Properties of Random Errors, and Composite Scores

Reliability, the Properties of Random Errors, and Composite Scores Lecture 7, Psych 350 - R. Chris Fraley http://www.yourpersonality.net/psych350/fall2012/

Reliability Reliability: the extent to which measurements are free of random errors. Random error: a nonsystematic mistake in measurement, such as misreading a questionnaire item, an observer looking away while coding behavior, or a response scale that does not quite fit.

Reliability What are the implications of random measurement errors for the quality of our measurements?

Reliability O = T + E + S, where O = a measured score (e.g., performance on an exam), T = the true score (the value we want), E = random error, and S = systematic error. We'll ignore S for now (we'll return to it later), leaving O = T + E.

Reliability O = T + E. The error becomes part of what we're measuring. This is a problem if we're operationally defining our variables using equivalence definitions, because part of our measurement reflects the true value we want and part reflects error. Once we've taken a measurement, we have one equation with two unknowns: if O = 10, then 10 = T + E, and we can't separate the relative contributions of T and E.

Reliability: Do random errors accumulate? Question: If we aggregate or average multiple observations, will random errors accumulate?

Reliability: Do random errors accumulate? Answer: No. If E is truly random, we are just as likely to overestimate T as we are to underestimate T. Height example

[Figure: a height scale running from 5'2" to 6'9" (62-81 inches), with seven observed heights O scattered around the true height T]

Reliability: Do random errors accumulate? Note: The average of the seven O’s is equal to T
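The claim that random errors average out can be checked with a short simulation (a hypothetical sketch: the true height, error spread, and sample size are made-up numbers):

```python
import random

random.seed(1)  # reproducible draws

T = 70  # true score: a person's height in inches (hypothetical)

# Each observation O = T + E, where E is random error with mean zero
observations = [T + random.gauss(0, 2) for _ in range(10_000)]

mean_O = sum(observations) / len(observations)
print(round(mean_O, 2))  # very close to 70: over- and underestimates cancel
```

Averaging many observations drives the error term toward zero, which is exactly why the composite scores discussed next are useful.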

Composite scores These demonstrations suggest that one important way to reduce the influence of random errors is to aggregate multiple measurements of the same construct into a composite score: use multiple questionnaire items when surveying an attitude, behavior, or trait; use more than one observer when coding behavior; use both observer- and self-reports when possible.

Example: Self-esteem survey items
1. I feel that I'm a person of worth, at least on an equal plane with others. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
2. I feel that I have a number of good qualities. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
4. I am able to do things as well as most other people. (Strongly Disagree 1 2 3 4 5 Strongly Agree)

Example: Self-esteem survey items
1. I feel that I'm a person of worth, at least on an equal plane with others. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
2. I feel that I have a number of good qualities. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
4. I am able to do things as well as most other people. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
Composite self-esteem score = (4 + 5 + 3)/3 = 4

Two things to note about aggregation Reverse keyed items: Some measurements are keyed in the direction opposite to the construct of interest, so that high values on the item represent low values on the trait of interest.

Example: Self-esteem survey items
1. I feel that I'm a person of worth, at least on an equal plane with others. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
2. I feel that I have a number of good qualities. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
3. All in all, I am inclined to feel that I am a failure. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
4. I am able to do things as well as most other people. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
5. I feel I do not have much to be proud of. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
Inappropriate composite self-esteem score = (5 + 5 + 1 + 4 + 1)/5 = 3.2

Reverse keying: Transform the measures such that high scores become low scores and vice versa.
Example: Self-esteem survey items
1. I feel that I'm a person of worth, at least on an equal plane with others. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
2. I feel that I have a number of good qualities. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
3. All in all, I am inclined to feel that I am a failure. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
4. I am able to do things as well as most other people. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
5. I feel I do not have much to be proud of. (Strongly Disagree 1 2 3 4 5 Strongly Agree)
Appropriate composite self-esteem score = (5 + 5 + 5 + 4 + 5)/5 = 4.8

A simple algorithm for reverse keying in SPSS or Excel: New X = Max + Min - X, where Max is the highest possible value on the response scale (5 on the self-esteem scale) and Min is the lowest possible value (1).
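The Max + Min - X rule translates directly into code; a minimal sketch in Python (the same arithmetic works as an SPSS COMPUTE statement or an Excel formula). The response values are the hypothetical ones from the self-esteem example:

```python
def reverse_key(x, min_val=1, max_val=5):
    """Reverse-key a response so high scores become low and vice versa."""
    return max_val + min_val - x

# On a 1-5 scale: 5 -> 1, 4 -> 2, 3 -> 3, 2 -> 4, 1 -> 5
responses = [5, 5, 1, 4, 1]           # items 3 and 5 are reverse keyed
keyed = [reverse_key(r) if i in (2, 4) else r
         for i, r in enumerate(responses)]
print(keyed)                          # [5, 5, 5, 4, 5]
print(sum(keyed) / len(keyed))        # 4.8, the appropriate composite
```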

Cautions: Two potential problems with aggregation
Example: stress
Person  Heart rate  Complaints  Average/composite
A       80          2           41
B       80          3           42
C       120         2           61
D       120         3           62

Cautions: Two potential problems with aggregation
Example: stress
Person  Heart rate  Complaints  Average/composite
A       80          2           41
B       80          3           42
C       120         2           61
D       120         3           62
The first problem is that the metric for the composite doesn't make much sense. Person A: (80 beats per minute + 2 complaints) / 2 = 41 complaints/beats per minute???

Two things to note about aggregation Second, the variables may have different variances. If this is true, then some indicators will “count” more in the average than others.

Example: stress
Person  Heart rate  Complaints  Average/composite
A       80          2           41
B       80          3           42
C       120         2           61
D       120         3           62
The correlation between the composite and heart rate is .99. The correlation between the composite and complaints is .05.

Two things to note about aggregation One common solution to these problems is to standardize the variables before aggregating them, giving each variable the same mean (0) and variance (1).

Variables with a large range/variance will influence the composite score more than variables with a small range. Standardization helps solve this problem.
Person  Heart rate (z)  Complaints (z)  Average
A       -.87            -.87            -.87
B       -.87             .87              0
C        .87            -.87              0
D        .87             .87             .87
The correlation between the composite and heart rate is .71. The correlation between the composite and complaints is .71.
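Standardize-then-average can be sketched with the stress example's numbers (using the sample standard deviation, which reproduces the ±.87 z-scores):

```python
from statistics import mean, stdev

heart_rate = [80, 80, 120, 120]   # persons A-D
complaints = [2, 3, 2, 3]

def standardize(xs):
    """Convert raw scores to z-scores (mean 0, sample SD 1)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

z_hr = standardize(heart_rate)    # approximately [-.87, -.87, .87, .87]
z_cp = standardize(complaints)    # approximately [-.87, .87, -.87, .87]

# After standardization both variables are on the same scale,
# so neither dominates the composite.
composite = [(a + b) / 2 for a, b in zip(z_hr, z_cp)]
print([round(c, 2) for c in composite])   # roughly [-.87, 0, 0, .87]
```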

Reliability: Estimating reliability Question: How can we quantify the reliability of our measurements? Answer: Two common ways: (a) test-retest reliability (b) internal consistency reliability

Reliability: Estimating reliability Test-retest reliability: Reliability assessed by measuring something at least twice at different time points. Test-retest correlation. The logic is as follows: If the errors of measurement are truly random, then the same errors are unlikely to be made more than once. Thus, to the degree that two measurements of the same thing agree, it is unlikely that those measurements contain random error.
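A test-retest correlation is just a Pearson correlation between the two measurement occasions; a minimal sketch with hypothetical scores for five people:

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))

time1 = [10, 12, 9, 15, 11]   # scores at the first testing (hypothetical)
time2 = [11, 12, 10, 14, 12]  # scores at the retest
print(round(pearson_r(time1, time2), 2))  # high agreement, high reliability
```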

[Figure: two test-retest scatterplots, one showing high test-retest reliability (r = .92) and one showing low reliability (r = .27)]

Reliability: Estimating reliability Internal consistency: Reliability assessed by measuring something at least twice within the same broad slice of time. Split-half: based on an arbitrary split (e.g., comparing odd and even items, or the first and second halves of the test). Split-half correlation. Cronbach's alpha (α): based on the average of all possible split-half correlations.
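Cronbach's alpha can be computed directly from the item variances and the variance of the total score, α = (k / (k - 1)) × (1 - Σ item variances / variance of total). A sketch with hypothetical responses from five people on three items:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, all scored over the same people."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # composite per person
    item_var_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical responses: one list per item, one entry per person
item1 = [4, 5, 3, 4, 2]
item2 = [5, 5, 2, 4, 3]
item3 = [3, 4, 3, 5, 2]
print(round(cronbach_alpha([item1, item2, item3]), 2))  # 0.84
```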

The reliability of the composite (α) increases as the number of measurements (k) increases. In fact, the reliability of the composite can get relatively high even if the items themselves do not correlate strongly.
[Figure: composite reliability plotted against the number of measurements k, for average inter-item correlations of .50, .25, and .10]
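The curves described here follow the Spearman-Brown formula, reliability = k·r̄ / (1 + (k - 1)·r̄), where r̄ is the average inter-item correlation; a quick sketch:

```python
def composite_reliability(avg_r, k):
    """Spearman-Brown: reliability of a composite of k parallel measurements."""
    return k * avg_r / (1 + (k - 1) * avg_r)

# With 10 items, even weakly correlated items give a usable composite
for avg_r in (0.50, 0.25, 0.10):
    print(avg_r, round(composite_reliability(avg_r, 10), 2))
```

With an average inter-item correlation of only .10, ten items already yield a reliability around .53, and twenty items around .69.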


Reliability: Final notes An important implication: As you increase the number of measures, the amount of random error in the averaged measurement decreases. An important assumption: The entity being measured is not changing. An important note: Common indices of reliability range from 0 to 1—in the metric of correlation coefficients; higher numbers indicate better reliability (i.e., less random error).