Reliability & Validity



Presentation on theme: "Reliability & Validity" — Presentation transcript:

1 Reliability & Validity
Chapter 4 Reliability & Validity

2 Reliability & Validity
Reliability and validity aid in determining test accuracy and dependability.
Reliability—the dependability or consistency of an instrument across time or items.
Validity—the degree to which an instrument measures what it was designed to measure.
Instruments should have both properties but may have only one, in which case the instrument is not as strong.

3 Correlation (r)
Correlation—the degree of relationship between two variables, such as two administrations of the same test or the administration of equivalent forms.
Correlation coefficients range from +1.00 to -1.00.
Perfect positive correlation = +1.00
Perfect negative correlation = -1.00
No correlation = 0
Coefficients closer to +1.00 or -1.00 represent stronger relationships.
The greater the degree of the relationship, the more reliable the instrument.
The + or - sign does not indicate strength, only direction.
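As a quick illustration of the correlation described above, the sketch below computes Pearson's r for two hypothetical administrations of the same test; the score lists are invented for illustration and do not come from any actual instrument.

```python
# Minimal sketch: correlating two administrations of the same test.
# The score lists are hypothetical, illustrative data only.
import numpy as np

first_admin = np.array([85, 92, 78, 88, 95, 70, 82, 90])    # scores, administration 1
second_admin = np.array([83, 94, 75, 90, 93, 72, 80, 91])   # scores, administration 2

# np.corrcoef returns the correlation matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(first_admin, second_admin)[0, 1]
print(f"Pearson's r = {r:+.2f}")  # the sign shows direction; the magnitude shows strength
```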

4 Scattergram
Scattergrams provide a graphic representation of a data set and show a correlation. The more closely the dots on a scattergram approximate a straight line, the nearer to perfect the correlation.
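A scattergram like the one described here can be drawn with a few lines of plotting code; the sketch below assumes matplotlib is available and reuses the same hypothetical paired scores from the previous example.

```python
# Minimal sketch of a scattergram for two test administrations (hypothetical data).
import matplotlib.pyplot as plt

first_admin = [85, 92, 78, 88, 95, 70, 82, 90]
second_admin = [83, 94, 75, 90, 93, 72, 80, 91]

plt.scatter(first_admin, second_admin)          # each dot is one student's pair of scores
plt.xlabel("Score, first administration")
plt.ylabel("Score, second administration")
plt.title("Scattergram of two test administrations")
plt.show()                                      # dots near a straight line indicate a strong correlation
```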

5 Types of Correlation
Positive correlation: the variables move in the same direction; scores on both variables increase simultaneously.
Negative correlation: high scores on one variable are associated with low scores on the other variable.
No correlation: data from the two variables are not associated or have no relationship; there is no linear direction on a scattergram.

6 Methods of Measuring Reliability
Pearson’s r (Pearson’s Product Moment correlation): used with interval or ratio data.
Internal consistency: the consistency of items on an instrument to measure a skill, trait or domain.
Approaches include test-retest, equivalent forms, split-half, and Kuder-Richardson formulas.

7 Test-Retest Reliability
Test-retest reliability assumes the trait being measured is stable over time.
If the trait remains constant, re-administering the instrument will yield scores similar to the first administration.
It is important to conduct the retest shortly after the first test to control for influencing variables.
Difficulties:
Too soon: students may remember test items (practice effect) and score higher the second time.
Too long an interval: greater influence of time-related variables (e.g., learning, maturation).

8 Equivalent (Alternate) Forms Reliability
Equivalent forms reliability: two forms of the same instrument are used, with items matched for difficulty.
Advantage: two tests of the same difficulty level can be administered within a short time frame without the influence of practice effects.

9 Internal Consistency Measures
Split-half reliability: takes all available items on a test, divides them in half, and establishes the reliability of one half against the other. It does not establish the reliability of the entire test, since reliability increases with the number of items.
Kuder-Richardson 20 (KR-20): used to check consistency across items of an instrument with right or wrong answers.
Coefficient alpha: used to check consistency across items of an instrument where credit varies across responses.
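As a rough sketch of how KR-20 is computed, the code below applies the standard formula, KR-20 = (k / (k - 1)) × (1 − Σ p·q / σ²), to a small hypothetical matrix of right/wrong (1/0) item responses; the data and the use of the population variance are assumptions made only for illustration.

```python
# Minimal sketch of the KR-20 formula, assuming a 0/1 item-response matrix
# (rows = students, columns = items). The data below are hypothetical.
import numpy as np

responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1],
])

k = responses.shape[1]                   # number of items
p = responses.mean(axis=0)               # proportion answering each item correctly
q = 1 - p                                # proportion answering each item incorrectly
total_scores = responses.sum(axis=1)     # each student's total score
var_total = total_scores.var(ddof=0)     # variance of total scores (population form)

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)
print(f"KR-20 = {kr20:.2f}")             # values nearer 1.00 indicate greater internal consistency
```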

10 Interrater Reliability
The consistency of a test across examiners: one person administers a test, and a second person rescores it.
The two sets of scores are then correlated to determine how much variability exists between them.
Especially important for subjectively scored tests.

11 Which Type of Reliability is Best?
Three types of reliability: consistency over time, consistency of items on a test, and consistency of scorers.
Optimal r values: .60 is adequate; .80 is very good (preferred).
Which type is chosen depends upon the purpose of the assessment.
The reliability coefficient is a group statistic and can be influenced by the make-up of the group, so it is important to review the manual to determine the make-up of that group.

12 Standard Error of Measurement
Basic assumption of assessment: error exists.
Variables that affect scores arise for a variety of reasons: poor testing environment, errors in the test, and student variables (e.g., hungry, tired).
This variance is called error, and its typical size is expressed as the standard error of measurement.
Instruments with a small standard error of measurement are preferred.
A single test may not accurately reflect a student’s true score.

13 Calculating Standard Error of Measurement
The SEM estimates the amount of error present in an obtained score:
SEM = SD × √(1 − r)
where SEM = standard error of measurement, SD = standard deviation, and r = reliability coefficient.
SEM is based on normal distribution theory.
Confidence interval: the range of scores around an obtained score, plus or minus the SEM.
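A minimal worked example of the SEM formula and the resulting confidence band; the standard deviation, reliability coefficient, and obtained score below are hypothetical values, not taken from any particular test.

```python
# Minimal sketch of SEM = SD * sqrt(1 - r) and a +/- 1 SEM confidence band.
# All values are hypothetical and for illustration only.
import math

sd = 15.0        # standard deviation of the test's scores (assumed)
r = 0.90         # reliability coefficient reported in the manual (assumed)
obtained = 85    # a student's obtained score (assumed)

sem = sd * math.sqrt(1 - r)                    # standard error of measurement
lower, upper = obtained - sem, obtained + sem  # band of +/- 1 SEM around the obtained score

print(f"SEM = {sem:.2f}")
print(f"Score {obtained} likely falls between {lower:.1f} and {upper:.1f}")
```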

14 Application of SEM
The range of error and the range of a student’s score may vary substantially, which may change the interpretation of the score for placement purposes.
SEM varies by age, grade and subtest.
When SEM is applied to scores, discrepancies may not be significant.

15 Estimated True Scores
A method of calculating the amount of error, which is correlated with the distance of the score from the mean of the group: the further a score is from the mean, the greater the chance for error.
A true score is always assumed to be nearer to the mean than the obtained score.
Estimated true scores can be used to establish a range of scores.
Estimated true score = M + r(X − M)
where M = mean of the group, r = reliability coefficient, and X = obtained score.
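A short worked example of the estimated true score formula, again using hypothetical values for the group mean, reliability coefficient, and obtained score.

```python
# Minimal sketch of the estimated true score formula M + r(X - M).
# The mean, reliability, and obtained score are hypothetical values.
mean = 100.0      # M: mean of the group (assumed)
r = 0.85          # reliability coefficient (assumed)
obtained = 70.0   # X: obtained score (assumed)

estimated_true = mean + r * (obtained - mean)   # pulls the score back toward the mean
print(f"Estimated true score = {estimated_true:.1f}")  # 74.5, nearer the mean than 70
```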

16 Test Validity
Does the test actually measure what it is supposed to measure?
Criterion-related validity: comparing scores with other criteria known to be indicators of the same trait or skill.
Concurrent validity: two tests are given within a very short timeframe (often the same day); if scores are similar, the tests are said to be measuring the same trait.
Predictive validity: measures how well an instrument can predict performance on some other variable.

17 Content Validity Ensuring that the items in a test are representative of content purported to be measured. PROBLEM: Teachers often generalize and assume the test covers more than it does (e.g., the WRAT-3 reading subtest only measures word recognition—not phonemic awareness, phonics, vocabulary, reading comprehension, etc.). Some of the variables of content validity may influence the manner in which results are obtained and can contribute to bias in testing. Presentation Format: The method by which items are presented to the student Response Mode: The method for the examinee to answer items.

18 Construct Validity A term used to describe a psychological trait, personality trait, psychological concept, attribute or theoretical characteristic. The construct must be clearly defined although they are often abstract concepts. Types of studies that can establish construct validity Developmental changes Correlations with other tests Factor analysis Internal consistency Convergent and discriminate validation Experimental interventions

19 Validity of Test vs. Validity of Use
Tests may be used inappropriately even though they are valid instruments.
Results obtained may be used in an invalid manner.
Tests may be biased and/or discriminate against different groups.
Item bias: when an item is answered incorrectly a disproportionate number of times by one group compared to another.
Predictive validity may predict accurately for one group and not another.

