Standardized Test Scores: Common Representations for Parents and Students
Types of Standardized Tests Norm-Referenced Test: Normed using large groups of test takers. Compares one test taker to another. Measures achievement and predicts future performance. Criterion-Referenced Test: Measures a student against a specific set of knowledge (the criterion).
Criterion-Referenced Tests To determine whether each student has achieved specific skills or concepts. To find out how much students know before instruction begins and after it has finished. Measures specific skills which make up a designated curriculum. These skills are identified by teachers and curriculum experts.
Each skill is expressed as an instructional objective. Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant. Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing. The items which test any given skill are parallel in difficulty.
Norm-Referenced Tests To rank each student with respect to the achievement of others in broad areas of knowledge. To discriminate between high and low achievers. Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.
Each skill is usually tested by fewer than four items. Items vary in difficulty. Items are selected that discriminate between high and low achievers. Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine. Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement in specific sub-areas.
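The Bell Curve Standard Deviations: roughly 2%, 14%, 34%, 34%, 14%, and 2% of scores fall in the six one-standard-deviation bands running from -3 to +3 standard deviations around the mean.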
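The percentages on the bell curve are rounded areas under a normal curve. The sketch below recomputes them, assuming scores follow an exact normal distribution; the function names `normal_cdf` and `band_percentages` are illustrative and do not come from the slides.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_percentages(edges=(-3, -2, -1, 0, 1, 2, 3)):
    """Percentage of scores between each pair of adjacent edges (in SD units)."""
    return [
        round(100 * (normal_cdf(hi) - normal_cdf(lo)), 1)
        for lo, hi in zip(edges, edges[1:])
    ]


# Assumption: a perfectly normal distribution of scores.
print(band_percentages())  # [2.1, 13.6, 34.1, 34.1, 13.6, 2.1]
```

Rounded to whole numbers, these values give the familiar 2%, 14%, 34%, 34%, 14%, 2% labels. The small remainder lies beyond three standard deviations on either side.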
Raw Score (RS) A raw score is the number of points earned from correct answers on a properly scored test. The RS should not be used directly in interpretation.
Percentile Rank Not to be confused with percentages, percentiles rank individuals within a group. Percentiles are defined on a scale of 1 to 99, with 50 being average (the median). A percentile rank shows the percentage of scores in the group that are at or below a specific student's score.
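A minimal sketch of the "at or below" definition, using a made-up group of ten raw scores. The function name `percentile_rank` and the clamping to the 1 to 99 scale are assumptions made for illustration; published tests derive percentile ranks from their own norming samples.

```python
def percentile_rank(score, group_scores):
    """Percentage of scores in the group at or below `score`, kept within 1..99."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    pct = round(100 * at_or_below / len(group_scores))
    return max(1, min(99, pct))


# Hypothetical raw scores for a group of ten test takers.
group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]

# Six of the ten scores are at or below 25.
print(percentile_rank(25, group))  # 60
```

A percentile rank of 60 here says nothing about the percentage of items answered correctly; it only locates the score within the group.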
Grade Equivalency Scores The digit before the decimal point represents the grade level and the digit after it represents the month of that grade. It is a misinterpretation to treat the GE as an estimate of the grade in which a student should be placed. “If Mary, a second grader, made a GE of 4.7, her score is the same as the average score made by the students in the seventh month of the fourth grade on the same second-grade test that Mary took.”
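As a small illustration of how to read a GE, the sketch below splits a score such as 4.7 into its grade and month parts. The helper name `describe_grade_equivalent` is hypothetical, and the score is passed as text so that the two digits are read exactly as printed on a score report.

```python
def describe_grade_equivalent(ge: str) -> str:
    """Turn a GE such as '4.7' into a plain-language reading."""
    grade, month = ge.split(".")
    return (
        f"same as the average score of students in month {int(month)} "
        f"of grade {int(grade)} on this test"
    )


print(describe_grade_equivalent("4.7"))
# same as the average score of students in month 7 of grade 4 on this test
```

The wording "on this test" is deliberate: the reading describes performance on the test actually taken, not a recommended grade placement.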
Standard Nine (Stanine) Scores Show a comparison of student scores on a scale ranging from a low of 1 to a high of 9. Scores of 4, 5, and 6 are considered average.
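One common way to assign stanines is from percentile ranks, using the conventional bands of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores. The slides do not state these bands, so treat the cutoffs below as an assumption; the name `stanine_from_percentile` is illustrative.

```python
from bisect import bisect_right

# Assumed conventional cutoffs: the lowest percentile rank that falls in
# stanines 2 through 9 (bands of 4, 7, 12, 17, 20, 17, 12, 7, 4 percent).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile: float) -> int:
    """Map a percentile rank (1..99) to a stanine (1..9)."""
    return bisect_right(STANINE_CUTOFFS, percentile) + 1


for pr in (3, 25, 50, 76, 96):
    print(pr, stanine_from_percentile(pr))
```

With these cutoffs, a percentile rank of 50 falls in stanine 5, the middle of the scale.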
Reliability The degree to which the test yields consistent results.
Validity The degree to which the test measures what it is supposed to measure.
Types of Validity
1. Content Validity: Does the test reflect the area to be tested? (If it is a comprehension test, is there a vocabulary portion?)
2. Criterion Validity: a) Predictive: How well does the test predict future performance in the area tested? b) Concurrent: How closely do the results match those of tests that measure similar competencies?
3. Construct Validity: How closely do the items on the test match what you believe about the area tested? Does it match your philosophy of reading?
4. Consequential Validity (new): To what use will the test results be put, and is it a fair use of these results?
Standard Error of Measurement How much would the score vary if the test were given to the same person a second time? (If the SEM is ±5 and your score is 50, then there is about a 2/3 chance that your score the second time would fall between 45 and 55.)
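The band implied by an SEM is simple arithmetic: the observed score minus and plus the SEM. The sketch below works this out for an observed score of 50 and an SEM of 5. The function name `score_band` is illustrative, and the "about 2/3" reading assumes measurement errors are roughly normally distributed.

```python
def score_band(observed_score, sem, n_sems=1):
    """Return the (low, high) band of `n_sems` SEMs around an observed score."""
    return (observed_score - n_sems * sem, observed_score + n_sems * sem)


# One SEM either side: about a 2/3 chance under the normality assumption.
print(score_band(50, 5))  # (45, 55)

# Two SEMs either side: a wider band, roughly 95% under the same assumption.
print(score_band(50, 5, n_sems=2))  # (40, 60)
```

Reporting a score as a band rather than a single number makes it clearer that small differences between two students' scores may not be meaningful.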
Comparison – Norm / Criterion Referenced Tests