Chapter 5 Measurement, Reliability and Validity
CHAPTER OBJECTIVES
STUDENTS SHOULD BE ABLE TO:
- Explain why measurement is important to the research process.
- Discuss the four levels of measurement and provide an example of each.
- Explain the concept of reliability in terms of observed score, true score, and error.
- Describe the two elements that can make up an error score.
- List methods for increasing reliability.
- Discuss four ways in which reliability can be examined.
- Provide a conceptual definition of validity.
- List the three traditional types of validity.
- Explain the relationship between reliability and validity.
CHAPTER OVERVIEW
- The Measurement Process
- Levels of Measurement
- Reliability and Validity: Why They Are Very, Very Important
- Validity
- The Relationship Between Reliability and Validity
- Closing (and Very Important) Thoughts
THE MEASUREMENT PROCESS
Two definitions
- Stevens: “assignment of numerals to objects or events according to rules.”
- “…the assignment of values to outcomes.”
Chapter foci
- Levels of measurement
- Reliability and validity
LEVELS OF MEASUREMENT

Level of Measurement   For Example                               Quality of Level
Ratio                  Rachael is 5’ 10” and Gregory is 5’ 5”    Absolute zero
Interval               Rachael is 5” taller than Gregory         An inch is an inch is an inch
Ordinal                Rachael is taller than Gregory            Greater than
Nominal                Rachael is tall and Gregory is short      Different from

- Variables are measured at one of these four levels.
- Qualities of one level are characteristic of the next level up.
- The more precise (higher) the level of measurement, the more accurate the measurement process.
NOMINAL SCALE
Qualities: Assignment of labels
Examples: Gender (male or female); preference (like or dislike); voting record (for or against)
What you can say: Each observation belongs in its own category
What you can’t say: An observation represents “more” or “less” than another observation
ORDINAL SCALE
Qualities: Assignment of values along some underlying dimension
Examples: Rank in college; order of finishing a race
What you can say: One observation is ranked above or below another
What you can’t say: The amount by which one observation is more or less than another
INTERVAL SCALE
Qualities: Equal distances between points
Examples: Number of words spelled correctly; intelligence test scores; temperature
What you can say: One score differs from another on some measure that has equal-appearing intervals
What you can’t say: The amount of difference is an exact representation of differences in the variable being studied
RATIO SCALE
Qualities: Meaningful and non-arbitrary zero
Examples: Age; weight; time
What you can say: One value is twice as much as another, or no quantity of that variable can exist
What you can’t say: Not much!
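The difference between "equal distances" (interval) and "twice as much" (ratio) can be sketched with temperature. This is only an illustration under stated assumptions: the two temperatures are hypothetical, and the helper `celsius_to_kelvin` is introduced here and does not come from the slides. Celsius has an arbitrary zero, so it behaves as an interval scale; Kelvin has an absolute zero, so it behaves as a ratio scale.

```python
# Hypothetical temperatures, chosen only for illustration.
warm_c, cool_c = 20.0, 10.0


def celsius_to_kelvin(c):
    """Convert Celsius (arbitrary zero) to Kelvin (absolute zero)."""
    return c + 273.15


# Differences are meaningful on an interval scale: they stay the same
# when the zero point moves.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only on a ratio scale: the Celsius ratio is an
# artifact of where 0 degrees happens to sit.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The difference survives the change of zero, while the apparent "twice as warm" ratio does not, which is why "twice as much" statements are reserved for ratio-level variables.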
CONTINUOUS VERSUS DISCRETE VARIABLES
Continuous variables
- Values can range along a continuum
- E.g., height
Discrete (categorical) variables
- Values are defined by category boundaries
- E.g., gender
WHAT IS ALL THE FUSS?
- Measurement should be as precise as possible
- In psychology, most variables are probably measured at the nominal or ordinal level
- But how a variable is measured can determine the level of precision
RELIABILITY AND VALIDITY: WHY THEY ARE VERY, VERY IMPORTANT
RELIABILITY AND VALIDITY
- Reliability: the tool is consistent
- Validity: the tool measures what it should
- Good assessment tools underlie the rejection of null hypotheses OR the acceptance of research hypotheses
A CONCEPTUAL DEFINITION OF RELIABILITY
Observed Score = True Score + Error Score
The error score is made up of method error and trait error.
Observed score
- Score actually observed
- Consists of two components: true score and error score
True score
- Perfect reflection of the true value for an individual
- Theoretical score
Error score
- Difference between observed and true score
- Method error is due to characteristics of the test or testing situation
- Trait error is due to individual characteristics
Conceptually, reliability = True Score / (True Score + Error Score)
Reliability of the observed score becomes higher if error is reduced!
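A small simulation can make the ratio above concrete. This is a sketch under stated assumptions, not the chapter's own procedure: it treats the ratio in terms of variances (the usual classical test theory reading), draws true scores and error scores from independent normal distributions, and uses hypothetical values (mean 100, SD 15) and a hypothetical function name, `simulated_reliability`.

```python
import random
import statistics


def simulated_reliability(error_sd, n=5000, seed=0):
    """Return var(true) / var(observed) for simulated scores.

    Observed score = true score + error score, with independent normal
    true scores (mean 100, SD 15) and error scores (mean 0, SD error_sd).
    All of these values are assumptions made for illustration.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 15) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


# Theoretical values: 225 / (225 + 225) = 0.5 and 225 / (225 + 25) = 0.9.
noisy = simulated_reliability(error_sd=15)
cleaner = simulated_reliability(error_sd=5)

print(f"error SD 15: reliability is about {noisy:.2f}")
print(f"error SD 5:  reliability is about {cleaner:.2f}")
```

Shrinking the error score while leaving the true scores alone pushes the ratio toward 1, which is the sense in which reducing error raises reliability.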
INCREASING RELIABILITY
Decreasing error:
- Increase sample size
- Eliminate unclear questions
- Standardize testing conditions
- Moderate the degree of difficulty of the tests
- Minimize the effects of external events
- Standardize instructions
- Maintain consistent scoring procedures
HOW RELIABILITY IS MEASURED
- Reliability is measured using a correlation coefficient, r(test1•test2)
- Reliability coefficients
  - Indicate how scores on one test change relative to scores on a second test
  - Can range from -1.00 to +1.00
  - +1.00 = perfect reliability
  - 0.00 = no reliability
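As a rough illustration of the correlation coefficient described above, the sketch below computes a Pearson r between two administrations of the same test. The scores are hypothetical, and `pearson_r` is a helper written for this example, not a function from any particular statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five participants on two administrations.
test1 = [12, 15, 9, 20, 17]
test2 = [13, 14, 10, 19, 18]

r = pearson_r(test1, test2)
print(f"r(test1, test2) = {r:.2f}")
```

Participants keep nearly the same rank order across the two administrations, so the coefficient comes out close to +1.00.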
TYPES OF RELIABILITY

Test-Retest
  What it is: A measure of stability
  How you do it: Administer the same test/measure at two different times to the same group of participants
  What the reliability coefficient looks like: r(test1•test2)

Parallel Forms
  What it is: A measure of equivalence
  How you do it: Administer two different forms of the same test to the same group of participants
  What the reliability coefficient looks like: r(form1•form2)

Inter-Rater
  What it is: A measure of agreement
  How you do it: Have two raters rate behaviors and then determine the amount of agreement between them
  What the reliability coefficient looks like: Percentage of agreements

Internal Consistency
  What it is: A measure of how consistently each item measures the same underlying construct
  How you do it: Correlate performance on each item with overall performance across participants
  What the reliability coefficient looks like: Cronbach’s alpha; Kuder-Richardson
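Two of the coefficients listed above can be sketched in a few lines: percentage of agreements for inter-rater reliability and Cronbach's alpha for internal consistency. All ratings and item scores are hypothetical, the function names are introduced here for illustration, and the alpha computation uses the standard formula k/(k-1) × (1 − sum of item variances / variance of total scores).

```python
import statistics


def percent_agreement(rater_a, rater_b):
    """Share of observations on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][p] is participant p's score on item i."""
    k = len(item_scores)
    # Total score per participant, summed across items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var_sum = sum(statistics.variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / statistics.variance(totals))


# Hypothetical data: two raters coding ten behaviors as on-task (1) or off-task (0).
rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

# Hypothetical data: three items answered by five participants.
items = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]

agreement = percent_agreement(rater_a, rater_b)
alpha = cronbach_alpha(items)

print(f"percentage of agreements: {agreement:.0%}")
print(f"Cronbach's alpha: {alpha:.2f}")
```

Simple percentage agreement does not correct for agreement expected by chance; it is shown here only because it is the coefficient the slide names.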
VALIDITY
- A valid test does what it was designed to do
- A valid test measures what it was designed to measure
A CONCEPTUAL DEFINITION OF VALIDITY
- Validity refers to the test’s results, not to the test itself
- Validity ranges from low to high; it is not “either/or”
- Validity must be interpreted within the testing context
TYPES OF VALIDITY

Content
  What is it? A measure of how well the items represent the entire universe of items
  How do you establish it? Ask an expert if the items assess what you want them to

Criterion: Concurrent
  What is it? A measure of how well a test estimates a criterion
  How do you establish it? Select a criterion and correlate scores on the test with scores on the criterion in the present

Criterion: Predictive
  What is it? A measure of how well a test predicts a criterion
  How do you establish it? Select a criterion and correlate scores on the test with scores on the criterion in the future

Construct
  What is it? A measure of how well a test assesses some underlying construct
  How do you establish it? Assess the underlying construct on which the test is based and correlate these scores with the test scores
HOW TO ESTABLISH CONSTRUCT VALIDITY OF A NEW TEST
- Correlate the new test with an established test
- Show that people with and without certain traits score differently
- Determine whether the tasks required on the test are consistent with the theory guiding test development
MULTITRAIT-MULTIMETHOD MATRIX
[Figure: matrix crossing two traits (Trait 1: Impulsivity; Trait 2: Activity Level) with two methods (Method 1: Paper and Pencil; Method 2: Activity Level Monitor). Surviving cell labels: Moderate, Low.]
- Convergent validity: different methods measuring the same trait yield similar results
- Discriminant validity: different methods measuring different traits yield different results
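The convergent and discriminant patterns can be illustrated with two correlations. This is a sketch only: the scores are hypothetical and are not the values from the slide's matrix, and `pearson_r` is a helper written for this example. The same trait measured by two methods should correlate highly (convergent), while different traits measured by different methods should correlate weakly (discriminant).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for six participants (not taken from the slide).
impulsivity_paper = [1, 2, 3, 4, 5, 6]    # Trait 1, Method 1
impulsivity_monitor = [2, 1, 3, 5, 4, 6]  # Trait 1, Method 2
activity_monitor = [3, 5, 1, 6, 2, 4]     # Trait 2, Method 2

# Same trait, different methods: evidence of convergent validity if high.
convergent = pearson_r(impulsivity_paper, impulsivity_monitor)
# Different traits, different methods: evidence of discriminant validity if low.
discriminant = pearson_r(impulsivity_paper, activity_monitor)

print(f"convergent r:   {convergent:.2f}")
print(f"discriminant r: {discriminant:.2f}")
```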
THE RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
- A valid test must be reliable
- But a reliable test need not be valid
CLOSING (AND VERY IMPORTANT) THOUGHTS
- You must define a reliable and valid dependent variable, or you will not know whether there truly is no difference between groups!
- Use a test with established and acceptable levels of reliability and validity.
- If you cannot do this, either develop such a test for your thesis or dissertation (and do no more than that) or change what you are measuring.
HAVE WE MET THE OBJECTIVES?
CAN YOU:
- Explain why measurement is important to the research process?
- Discuss the four levels of measurement and provide an example of each?
- Explain the concept of reliability in terms of observed score, true score, and error?
- Describe the two elements that can make up an error score?
- List methods for increasing reliability?
- Discuss four ways in which reliability can be examined?
- Provide a conceptual definition of validity?
- List the three traditional types of validity?
- Explain the relationship between reliability and validity?