Presentation transcript:

1 Reliability in Scales. Reliability is a question of consistency: do we get the same numbers on repeated measurements? Low reliability: reaction time. High reliability: measuring weight. Psychological tests fall somewhere in between.

2 Reliability & Measurement error. Measures are not perfectly reliable because they contain error: the “built-in accuracy” of the scale (a Pokemon wristwatch vs. the USN atomic clock). We can express this as X = T + e, where X = your measurement, T = the “true score”, and e = the error involved in measuring it (+ or -).
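The X = T + e idea can be sketched in a few lines of Python (not part of the original slides; the true score and error sizes are invented for illustration):

```python
import random
import statistics

random.seed(0)

TRUE_SCORE = 100  # T: the person's hypothetical "true" score

def measure(true_score, error_sd):
    """One observation: X = T + e, where e is a random error (+ or -)."""
    return true_score + random.gauss(0, error_sd)

# Small e: observations cluster tightly around the true score.
precise = [measure(TRUE_SCORE, error_sd=2) for _ in range(1000)]
# Large e: observations scatter widely, hiding the true score.
sloppy = [measure(TRUE_SCORE, error_sd=20) for _ in range(1000)]

print(statistics.mean(precise), statistics.pstdev(precise))
print(statistics.mean(sloppy), statistics.pstdev(sloppy))
```

Both sets of measurements average out near the true score of 100, but the large-e scale spreads its observations roughly ten times as widely, which is exactly the distribution picture the next slide describes.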

3 Example: the effect of e. Imagine we have someone with a “true” intelligence score of 100. If your scale has a large e, then your measurements will vary a lot (say from 60 all the way to 130). If your scale has a small e, your measurements will vary only a little (say from 90 to 110).

4 Measurements as distributions. Think of e as variance in a distribution centred on the true score T. Small e: scores cluster close to the true score. Large e: scores are all over the place (it is hard to say what the true score is).

5 More on the error. Measures with a large e are dodgy (the error hides the true score). We can reduce the size of e, but never eliminate it completely. Measuring reliability is measuring the impact of e.

6 Different forms of reliability. Reliability (the “effect of e”) can be very hard to conceptualise, so to help we break it up into 2 subclasses. Temporal stability: if I measure you today and tomorrow, do I get the same result? Internal consistency: are all the questions in the test measuring the same thing?

7 Temporal stability. The big idea: if I test you now, and then I test you tomorrow, I should get the same result. Why have it? We can’t measure changes otherwise; it tells us that we can trust results (small time-related error); and it tells us that there is no learning effect.

8 Measuring temporal stability. How can we measure whether a test is temporally stable? The problem: we have 2 sets of scores, and we need to see if they are the same. Solution: use a correlation. If the two sets are strongly related, then they are basically the same.

9 Example: Correlations & Stability. Imagine a test with ten questions, and a person does it twice (on Monday and Wednesday): M: … W: … Are these scores the same? (r = 0.897)

10 Example: correlations & stability. Now imagine a crappy scale: M: … W: … Are these scores basically the same? (r = 0.211)

11 Different approaches to stability. There are two main ways of testing temporal stability. Test-retest method: give the same test to the same people. Alternate forms: give a highly similar test to the same people.

12 Test-retest method. Method: 1. Select a group of people. 2. Give them your test. 3. Get them to come back later. 4. Give them the test again. 5. Correlate the two sets of results to see how well they agree.

13 Things to note. It must be the same people: we want to know that if client X returns, we can measure that person again. The amount of time between tests depends on your requirements. The correlation value must be very high: above 0.85.

14 Why it works. We get 2 results from each person to compare, which means we can rely on the test to work for the same people. We use a lot of people in our assessment, which means we can rely on the test regardless of who our client is. The correlation tells us the degree to which the 2 tests agree (r² is the proportion of agreement).

15 The learning effect. What if you have a test where learning/practice can affect your score (e.g. a class test)? The test-retest method will yield poor correlations, because people will score higher marks the second time around. This will make it look as if temporal stability is poor!

16 Alternate forms reliability. Answer: do test-retest, but don’t use the same test twice; use a highly similar test. In order for this to work, both forms must be equally difficult. The more similar, the better.

17 Making alternate forms of a test. It is simple to ensure both forms are equally difficult: make twice as many questions as you will want in the test, then randomly divide them into two halves. Each half is a test! The random division ensures both forms are equally difficult.
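The random-division step can be sketched as follows (a hypothetical item pool of placeholder labels stands in for real questions):

```python
import random

random.seed(1)

# A hypothetical pool of 12 items; in practice these would be real questions.
item_pool = [f"Q{i}" for i in range(1, 13)]

shuffled = item_pool[:]
random.shuffle(shuffled)       # random order mixes easy and hard items together
form_a = sorted(shuffled[:6])  # first half of the shuffled pool becomes Form A
form_b = sorted(shuffled[6:])  # second half becomes Form B

print(form_a)
print(form_b)
```

Every item lands in exactly one form, and because the assignment is random, easy and hard items should be spread roughly evenly between the two, which is what makes the forms comparably difficult.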

18 The procedure: alternate forms. Once you have your 2 forms: collect a sample of people, give them the first form of the test, wait a while, give them the second form of the test, and correlate the results. If the correlation is high (> 0.85), you have stability.

19 Which to use: alternate forms or test-retest? If you are measuring something which can be learned/perfected by practice, use alternate forms. If not, you could choose either: test-retest, where usable, is preferable, since it removes the confound about difficulty. In many cases you don’t really know if learning is an issue, so alternate forms is “safer”, but statistically poorer.

20 What if you don’t have temporal stability? Temporal stability is not required for all tests. It matters most for tests which work longitudinally, and is very important if you want to track changes over time. This excludes all “once-off” tests (e.g. aptitude tests).

21 Internal consistency. A different type of reliability. The big idea: are all the questions in my test tapping into the same thing? (Or: are some questions irrelevant?) All tests require this property.

22 Why it’s important. Imagine we have an arithmetic ability test with 4 questions: 1. What is 5 x 3? 2. What is …? 3. What is the capital of Ukraine? 4. What is 5 x 2 + 3?

23 Why it’s important. Item 3 does not contribute to measuring arithmetic. Someone who is a maths whiz (and should get 4/4) might only get 3/4, while a complete maths idiot (who should get 0/4) could get 1/4. It does not belong in this test! If we include it in our total, it will confuse us; items such as this become “third variables”.

24 How do we know if an item belongs? We need to figure out whether a particular item is testing the same thing as the others. We can correlate the item’s scores with the scores of some other item that we do know belongs. High correlation (above 0.85): it tests the same thing. Low correlation (below 0.85): it measures something else.

25 Our example again. Some people who know maths will also know geography, but not everyone! Correlate Q1 with Q3: the correlation will be weak. Those who know arithmetic will know how to do the other items, so correlating Q1 with Q2 or Q4 will give a high correlation.

26 Doing it for real. Problem: how do we know which items are suspect? Any item could be at fault, and it is not always obvious. Solution: check them all, using either the split-half method or Cronbach’s alpha.

27 Split half approach. Basic idea: check one half of the test against the other half. If the first half correlates well with the other half, then they are tapping into the same thing. Problem to overcome: each half of the test must be of the same difficulty.

28 Split half - procedure. Give a bunch of people your test, decide on how to split the test in half, and correlate the halves. If the correlation is high (above 0.85), the test is reliable.

29 Where to split? Problem: how do we split the test? The first 10 questions vs. the last 10? Odd-numbered vs. even-numbered questions? Any method is acceptable, as long as the halves are of equivalent difficulty. How do you show that? Not by correlation: paradox! (A low r could reflect either unequal difficulty or poor reliability.)

30 Cronbach’s  coefficient A major problem with split-half approach How do you know that inside a half there aren’t a few bad items? Catches most, but not all Solution: Select another half to split at But: if you have the same number of bad items in each half, they balance out - hidden!

31 The splitting headache. Imagine you have a few bad items spread evenly through the test (in the slide’s figure, the black bars are the bad items). If you use a first-3/last-3 split, you end up with one bad item in each half, so they balance out (hidden). If you use an even/odd split, they are balanced out as well (hidden). So how do you split?

32 A solution to splitting. Remember: we don’t know which the bad items are, so we can’t make bizarre splits to work around them. Solution: brute force! Work out the correlations between every possible split, and average them out!

33 Cronbach’s α. Not to be confused with α (the probability of a Type I error) from significance tests! It works out the correlation between each half and each other half, and averages them out. This makes it impossible for bad items to “hide” by balancing out.
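The slides describe α as an average over all possible splits; the standard computational shortcut (an assumption beyond the slides, which do not give a formula) works from item variances instead: α = k/(k-1) × (1 - Σ item variances / variance of total scores). A sketch with invented data:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(scores[0])                  # number of items
    items = list(zip(*scores))          # one tuple of scores per item
    item_vars = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: rows are people, columns are items measuring one thing.
scores = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(scores), 3))
```

Because every item ranks these people the same way, α lands above the 0.9 threshold the next slide recommends; a test with an off-topic item mixed in would score much lower.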

34 Interpreting Cronbach’s α. It gives numbers between 0 and 1, and needs to be very high (above 0.9). It is a measure of the homogeneity of the test: if your test is designed to measure more than one thing, the score will be low.

35 Other forms of reliability. Kuder-Richardson formula 20 (KR20): like Cronbach’s alpha, but specialised for correct/incorrect answers. Inter-scorer reliability: for judgement tests, to what degree do several judges agree on the answer? Expressed as a correlation.
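KR20 can be sketched the same way as α, using the textbook formula KR20 = k/(k-1) × (1 - Σ pᵢqᵢ / variance of total scores), where pᵢ is the proportion answering item i correctly and qᵢ = 1 - pᵢ (the response data below are invented for illustration):

```python
def kr20(responses):
    """KR-20 for dichotomous (1 = correct, 0 = incorrect) item responses."""
    k = len(responses[0])                 # number of items
    n = len(responses)                    # number of people
    items = list(zip(*responses))         # one tuple of responses per item
    # Sum of p*q across items, where p is the proportion correct on an item.
    pq = sum((sum(item) / n) * (1 - sum(item) / n) for item in items)
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - pq / var_t)

# Hypothetical right/wrong data: rows are people, columns are items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
]
print(round(kr20(responses), 3))
```

With only four items and five people the estimate is rough, but the stronger scorers get most items right and the weaker ones most items wrong, so KR20 comes out moderately high.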