Can you do it again? Reliability and Other Desired Characteristics Linn and Gronlund, Chap. 5
Let’s take a quiz…
In a nutshell… Reliability is the consistency of a measurement. Today and next class we will look at:
- Six ways to estimate reliability
- The standard error of measurement
- What influences reliability
- Cut scores
- One slide on usability and practical considerations
Reliability via a mathematical model: X = T + E
X = obtained or observed score (fallible)
T = true score (reflects stable characteristics)
E = measurement error (reflects random error)
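This model is classical test theory, and reliability falls out of it as the proportion of observed-score variance due to true scores. A brief sketch, assuming the standard conditions (error averages zero and is uncorrelated with true score):

```latex
% With E uncorrelated with T, the variances add:
\sigma_X^2 = \sigma_T^2 + \sigma_E^2
% Reliability is the share of observed variance that is true-score variance:
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

So as the random error variance shrinks, reliability climbs toward 1.00.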
Some initial thoughts… All measurement is susceptible to error. Any factor that introduces error into the measurement process affects reliability. To increase our confidence in scores, we try to detect, understand, and minimize random measurement error.
More initial thoughts Just like validity, when talking about reliability one talks about the scores, not the test or assessment. Reliability is necessary for validity, but it is not sufficient by itself. (Let me explain…) One must have reliability to have validity: valid scores must first be consistent, but consistent scores are not automatically valid.
Error from Content Sampling… Many theorists believe the major source of random measurement error is “Content Sampling.” Since tests represent performance on only a sample of behavior, the adequacy of that sample is very important.
And from temporal instability… Random measurement error can also be the result of "temporal instability," which is random and transient:
- Situation-centered influences (e.g., lighting and noise)
- Person-centered influences (e.g., fatigue, illness)
Or error comes from… Administration errors (e.g., incorrect instructions, inaccurate timing). Scoring errors (e.g., subjective scoring, clerical errors).
Back to correlations… Coefficients range from -1.00 to 1.00. Let's look on p. 106. A correlation coefficient relates two scores from the same group (e.g., number of kids and level of stress).
- Validity coefficients are based on some other criterion; they predict or estimate (e.g., SAT correlated with GPA).
- Reliability coefficients are based on results within the same procedure; both sets of scores measure the same construct.
Let's look at six options…
Method #1 – Test-Retest
What if I gave you the exact same quiz right now? Again? How many of you asked classes ahead of you in the day what was on a quiz? This measures the "stability" of the test (a small computational sketch follows). The longer one waits before the retest, the more external factors influence the reliability. Two examples:
- GRE scores and 5 years
- IEP re-evaluations every 3 years
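Because a reliability coefficient is a correlation between two sets of scores from the same procedure, a test-retest estimate is just the Pearson r between the two administrations. A minimal sketch (the scores are invented for illustration):

```python
# Test-retest reliability: correlate two administrations of the same test.
import numpy as np

first_administration = np.array([72, 85, 90, 64, 78, 88, 95, 70])
second_administration = np.array([75, 82, 91, 60, 80, 85, 94, 73])

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r.
r_test_retest = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"test-retest reliability: r = {r_test_retest:.2f}")
```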
Method #2 – Equivalent Forms
Just make sure the forms really are equivalent. All standardized tests use this. This measures the "equivalence" of the test. Now, your book uses the phrase "in close succession," meaning the two forms are given at nearly the same time.
Method #3 – Test-Retest with Equivalent Forms
One gets both stability and equivalence. How many of you took the SAT or ACT more than once? Did you improve? The best method for determining reliability is test-retest with equivalent forms.
Method #4 – Split-Halves
This one you can calculate with only one administration of the test. Let's do it to your quiz. The Spearman-Brown formula steps the half-test correlation up to full length: full-test r = 2(r of the halves) / (1 + r of the halves). Do an example… (see the sketch below). This is a reliability r that estimates the full-test reliability, and it is used to measure internal consistency.
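A minimal sketch of the split-half procedure (the item responses are invented for illustration): score the odd and even halves, correlate them, then apply the Spearman-Brown step-up.

```python
# Split-half reliability with the Spearman-Brown step-up.
import numpy as np

# Rows = examinees, columns = items scored 0/1.
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)     # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, estimated full-test r = {r_full:.2f}")
```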
Method #5 – Coefficient Alpha
Also known as the Kuder-Richardson formula, KR-20, when the items are dichotomous. It too is calculated with only one administration of the test, to measure internal consistency. Technically, it is the average of all possible split-half coefficients. It should be used when the test is specific to a single topic.
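A minimal sketch of computing alpha from its standard formula, alpha = (k / (k - 1)) x (1 - sum of item variances / total-score variance); the data are invented, and for 0/1 items this value equals KR-20.

```python
# Coefficient (Cronbach's) alpha from an examinee-by-item score matrix.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: rows = examinees, columns = items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
])
print(f"alpha = {cronbach_alpha(items):.2f}")  # equals KR-20 for 0/1 items
```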
Method #6 – Interrater Consistency
Think of the Russian judge in Olympic figure skating. How do they handle this now? See the table on p. 113; the table on p. 114 is more likely. Raters are trained using rubrics, etc. Back to figure skating… (a sketch of a simple agreement index follows).
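A minimal interrater sketch (the ratings are invented for illustration): the simplest consistency indices are the percentage of exact agreements between two raters and the correlation between their ratings.

```python
# Two raters scoring the same eight performances on a 1-5 rubric.
import numpy as np

rater_a = np.array([4, 3, 5, 2, 4, 3, 5, 4])
rater_b = np.array([4, 3, 4, 2, 4, 2, 5, 4])

percent_agreement = np.mean(rater_a == rater_b) * 100  # exact matches
r_raters = np.corrcoef(rater_a, rater_b)[0, 1]         # rank-order consistency
print(f"exact agreement: {percent_agreement:.0f}%, correlation: r = {r_raters:.2f}")
```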
In summing up these methods… Test-retest with equivalent forms is the best for determining reliability (see p. 115). Each method can be used for specific types of assessments; it is the assessment that determines which type of reliability you will report.
Standard Error of Measurement
You scored x and you scored y; so what did you really score? Scores vary; this estimates how much. Computers make this somewhat doable. Let me show the bell curve and explain the standard error of measurement (it's kinda like a standard deviation…). Use IQ for an example…
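The usual formula, standard in measurement texts though not spelled out on the slide, ties the SEM to the score standard deviation and the reliability coefficient:

```latex
% Standard error of measurement from the SD and the reliability:
SEM = s_X \sqrt{1 - r_{XX'}}
% IQ example (s_X = 15): if r_{XX'} = .89, then
% SEM = 15 \sqrt{1 - .89} \approx 5 \text{ points.}
```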
More on SEM
High reliability coefficients give small SEMs, and vice versa. Statistics can calculate a table… (p. 122, but really see p. 117). If the SEM = 9.3, John scores a 722, and Mary a 732, are Mary and John really different? Finally, two great things:
- The units of the SEM are the same units as the measurement itself, and
- The SEM is normally consistent from group to group on the same assessment.
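A quick check of the John-and-Mary question from the slide: with SEM = 9.3, build a band of plus or minus one SEM (roughly 68% of the error distribution) around each observed score and see whether the bands overlap.

```python
# Worked example from the slide: SEM = 9.3, John = 722, Mary = 732.
sem = 9.3
john, mary = 722, 732

john_band = (john - sem, john + sem)   # (712.7, 731.3)
mary_band = (mary - sem, mary + sem)   # (722.7, 741.3)

overlap = john_band[1] >= mary_band[0]
print(f"John: {john_band}, Mary: {mary_band}, bands overlap: {overlap}")
# The bands overlap, so the 10-point gap is within measurement error.
```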
Factors that influence reliability
Have you ever taken a test in a hot room? Hungry? Tired? Here are three factors the book deals with.
Factor #1 Number of Items or Assessment Tasks
Generally, the more items, the higher the reliability will be. Adding 10 very hard or very easy questions would not increase reliability at all, because… (see the formula sketched below).
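The usual way to quantify the items-to-reliability link is the general Spearman-Brown prophecy formula (the split-half step-up is just the k = 2 case); it assumes the added items are comparable in quality to the existing ones, which is exactly what fails for very hard or very easy filler questions.

```latex
% Reliability of a test lengthened by a factor of k:
r_{kk} = \frac{k \, r_{11}}{1 + (k - 1)\, r_{11}}
% Example: doubling (k = 2) a test with r_{11} = .60 gives
% r_{22} = \frac{2(.60)}{1 + .60} = .75.
```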
Factor #2 Spread of Scores
The larger the spread of scores, the higher the reliability. Which tells you more about examinees: a range of 90-100 or one of 0-200? My physics story…
Factor #3 Objectivity
Back to multiple-choice tests. Back to the Russian judges…
Fixed Standards (Cut Scores)
Why my cousin-in-law says we test… Do they meet a standard? (GWB) Who cares about the score; can they do it? Back to a rater table… sort of… (p. 126). Cut scores are dangerous: where do you put the line? The SEM? (One option is sketched below.)
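One hedged way to act on that worry, sketched below with invented values: treat scores within one SEM of the cut score as a borderline band that gets a second look, rather than an automatic pass/fail.

```python
# Sketch: flag examinees within 1 SEM of a cut score as "borderline."
cut_score = 700
sem = 9.3

def classify(score: float) -> str:
    """Pass/fail with a borderline band of +/- 1 SEM around the cut."""
    if abs(score - cut_score) <= sem:
        return "borderline: review before deciding"
    return "pass" if score >= cut_score else "fail"

for score in (685, 695, 700, 706, 715):
    print(score, "->", classify(score))
```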
Now for the practical…
- Are the tests easy to administer?
- Do you have time constraints?
- Can you easily score them?
- Do the scores apply to what you are after?
- Can you get an equivalent measure?
- And it's all about money.