46-320-01 Tests and Measurements Intersession 2006
Writing Items DeVellis (1991) Define Item Pool Avoid long items Appropriate level of reading Avoid double-barreled items Mix positively and negatively worded items Cultural/ethnic sensitivity
Item Format Dichotomous format Two alternatives Pros: Ease of construction and scoring, absolute judgment Cons: memorization, chance of being correct
Item Format Polytomous format More than two alternatives Pros: less chance guessing, fast time, distractors Corrected scores: Guessing?
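The "corrected scores" bullet can be illustrated with the classic correction-for-guessing formula, corrected = R - W / (k - 1), where R is the number right, W the number wrong (omitted items are not counted), and k the number of alternatives per item. This is a minimal sketch; the slide does not say which correction the course uses, and the function name is illustrative.

```python
def corrected_score(num_right: int, num_wrong: int, num_alternatives: int) -> float:
    """Classic correction for guessing: R - W / (k - 1).

    Omitted items are counted as neither right nor wrong.
    """
    if num_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return num_right - num_wrong / (num_alternatives - 1)


# Hypothetical example: 30 right and 8 wrong on five-alternative items.
print(corrected_score(30, 8, 5))
```

On a true/false test (k = 2) every wrong answer cancels one right answer, which is why pure guessing is expected to score about zero after correction.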
Item Format Likert format: degree of agreement; five alternatives vs. six; reverse scoring. Category format: 10-point scale (why 10?); remember context. Visual analogue scale: 100 mm line.
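Reverse scoring of a negatively worded Likert item can be sketched as follows. The 1-to-5 default range and the function name are assumptions for illustration; the same rule works for a six-alternative format.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Flip a response so that the lowest option maps to the highest and vice versa."""
    if not low <= response <= high:
        raise ValueError("response is outside the scale range")
    return low + high - response


# "Strongly agree" (5) on a negatively worded item is scored as 1.
print(reverse_score(5))
```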
Item Format Checklist Q-Sort Usually adjectives Increases options (9) Form normal distribution
Item Analysis Purpose: shorten a test and increase reliability and validity Item difficulty Proportion who get the item correct Probability of chance Optimum level Variable difficulty (0.3 to 0.7) Internal criteria = test score
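Item difficulty and the "optimum level" bullet can be sketched in a few lines. The optimum used here is one common rule of thumb (halfway between the chance level and 1.0); the slide does not state the exact rule, so treat it as an assumption. Function names are illustrative.

```python
def item_difficulty(item_correct: list[int]) -> float:
    """Proportion of test-takers who got the item correct (1 = correct, 0 = incorrect)."""
    return sum(item_correct) / len(item_correct)


def optimum_difficulty(num_alternatives: int) -> float:
    """Rule of thumb (assumed): halfway between chance success and 1.0."""
    chance = 1 / num_alternatives
    return (1.0 + chance) / 2


print(item_difficulty([1, 1, 0, 1]))   # three of four test-takers passed the item
print(optimum_difficulty(4))           # four-alternative multiple-choice item
```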
Discriminability Extreme group method: discrimination index; negative discriminator. Point biserial method: small test n; higher correlation, better the item.
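The point biserial method correlates a pass/fail item with the total test score. A minimal sketch using the textbook form r = ((M1 - M) / S) * sqrt(p / (1 - p)), where M1 is the mean total score of those who passed the item, M and S are the mean and (population) standard deviation of all total scores, and p is the proportion passing. The data are made up for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_correct: list[int], total_scores: list[float]) -> float:
    """Point biserial correlation between a 0/1 item and total test scores."""
    passed = [t for c, t in zip(item_correct, total_scores) if c == 1]
    p = len(passed) / len(total_scores)
    sd = pstdev(total_scores)
    if p in (0.0, 1.0) or sd == 0:
        raise ValueError("correlation is undefined without variation")
    return (mean(passed) - mean(total_scores)) / sd * sqrt(p / (1 - p))


# Hypothetical data: high scorers pass the item, low scorers fail it.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 3, 2, 1]
print(round(point_biserial(item, totals), 3))
```

A negative value marks a negative discriminator: low scorers pass the item more often than high scorers.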
Discrimination
Item   U (20)   M    L    Difficulty (U+M+L)   Discrimination (U-L)
1      15       9    7    31                   8
2      20       20   16   56                   4
3      19       18   9    46                   10
4      10       11   16   37                   -6
5      ?        13   ?    35                   ?
6      ?        14   ?    39                   ?
Table Explained Class n = 60 Discrimination: rough index = U – L Item Difficulty: U + M + L Items: 2 = too easy 7 = too difficult 4 & 5 = negative discriminative value
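The two rough indices from the table can be computed directly from the counts of correct answers in the upper, middle, and lower groups. A small sketch using item 1 from the table (U = 15, M = 9, L = 7); the function name is illustrative.

```python
def rough_item_indices(upper: int, middle: int, lower: int) -> tuple[int, int]:
    """Return (difficulty, discrimination) as rough counts.

    difficulty     = U + M + L  (how many of the whole class passed the item)
    discrimination = U - L      (negative means the lower group did better)
    """
    return upper + middle + lower, upper - lower


difficulty, discrimination = rough_item_indices(15, 9, 7)
print(difficulty, discrimination)  # 31 8
```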
Further Item Analysis
Response options chosen, by group (columns: Item, Group [Upper / Lower], Response Options 1 2 3 4 5)
Recoverable cell values: 20 16 10 9 11 7 8
Discrimination Index: Percentages
        Percent Passing
Item    Upper    Lower    Index of Discrimination (D)
1       75       35       40
2       100      80       20
3       95       45       50
4       50       80       -30
5       55       ?        ?
6       ?        ?        ?
7       25       ?        ?
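Expressing the index in percentages removes the dependence on group size. A sketch assuming equal-sized upper and lower groups of 20, as in the earlier table; item 1 (15 and 7 correct) reproduces the first row above.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int):
    """Return (percent passing in upper group, percent passing in lower group, D)."""
    pct_upper = 100 * upper_correct / group_size
    pct_lower = 100 * lower_correct / group_size
    return pct_upper, pct_lower, pct_upper - pct_lower


print(discrimination_index(15, 7, 20))  # (75.0, 35.0, 40.0)
```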
Item Characteristic Curve X axis: total test score (trait estimate) Y axis: proportion of test-takers with the item correct Often use class intervals
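An empirical item characteristic curve can be tabulated by grouping total scores into class intervals and computing the proportion correct in each. A minimal sketch with made-up data; the interval width and the function name are illustrative choices.

```python
from collections import defaultdict


def empirical_icc(total_scores, item_correct, interval_width):
    """Map each class interval (keyed by its lower bound) to the proportion passing the item."""
    counts = defaultdict(lambda: [0, 0])  # lower bound -> [number correct, number of test-takers]
    for score, correct in zip(total_scores, item_correct):
        lower_bound = (score // interval_width) * interval_width
        counts[lower_bound][0] += correct
        counts[lower_bound][1] += 1
    return {bound: right / n for bound, (right, n) in sorted(counts.items())}


# Hypothetical class: total scores (x axis) and item results (1 = correct).
totals = [1, 2, 6, 7, 11, 12]
item = [0, 0, 0, 1, 1, 1]
print(empirical_icc(totals, item, 5))  # {0: 0.0, 5: 0.5, 10: 1.0}
```

A curve that rises steadily with total score, as here, is what a discriminating item looks like.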
Discriminability Best scenario
Item Response Theory Each item has an item characteristic curve Specific range of difficulty can be identified with a test characteristic curve Difficulty and discriminability Sample items Peaked conventional vs. rectangular conventional vs. adaptive
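One way to picture an item characteristic curve with a difficulty and a discriminability parameter is the two-parameter logistic model, P(theta) = 1 / (1 + exp(-a(theta - b))). The slide does not name a specific model, so the choice of this one, and the parameter names, are assumptions for illustration.

```python
from math import exp


def icc_2pl(theta: float, difficulty: float, discrimination: float) -> float:
    """Probability of a correct response at trait level theta (two-parameter logistic, assumed)."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


# At theta equal to the item's difficulty, the probability of success is 0.5.
print(icc_2pl(0.0, 0.0, 1.0))
# A more discriminating item separates test-takers above the difficulty level more sharply.
print(icc_2pl(1.0, 0.0, 2.0) > icc_2pl(1.0, 0.0, 1.0))
```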
Criterion-Referenced Tests Specify objectives – aids learning Give test to two groups Exposed vs. not Antimode – cutting score Any problems with this?
Test Manuals Proprietary - qualifications Nonproprietary Standards for Educational and Psychological Testing *reflects changes in federal law and measurement trends affecting validity; testing individuals with disabilities or different linguistic backgrounds; and new types of tests as well as new uses of existing tests (*taken from apa.org)
Test Manuals Should include: How to administer (standard conditions) How to score How to interpret Information on reliability, validity, norms Be critical!
Base Rates and Hit Rates What does this test contribute beyond what is already known? Cutting score not necessarily correct decision Hit rate vs. base rate comparison False negatives and false positives
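The hit rate vs. base rate comparison can be worked through from a 2 x 2 table of decisions. In this sketch the base rate is taken as the proportion of people who actually have the characteristic, and the hit rate as the proportion of correct decisions made with the cutting score; the counts are hypothetical.

```python
def decision_rates(true_pos: int, false_pos: int, true_neg: int, false_neg: int) -> dict:
    """Summarize a 2 x 2 decision table as proportions of all people tested."""
    n = true_pos + false_pos + true_neg + false_neg
    return {
        "base_rate": (true_pos + false_neg) / n,   # actually have the characteristic
        "hit_rate": (true_pos + true_neg) / n,     # decisions that were correct
        "false_positives": false_pos / n,          # flagged by the test, but should not be
        "false_negatives": false_neg / n,          # missed by the test
    }


# Hypothetical counts for 100 people.
print(decision_rates(true_pos=30, false_pos=10, true_neg=50, false_neg=10))
```

The test adds something only if its hit rate beats what could be achieved from the base rate alone, and the two kinds of error may not be equally costly.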
Taylor-Russell Tables What does the test contribute beyond base? Need Definition of success Base rate Selection ratio Test validity coefficient Determines likelihood someone selected on basis of test will succeed
Taylor-Russell Tables Source: Fisher, Schoenfeldt, & Shaw (2003), Table 7.2
Taylor-Russell Tables Best: validity high, selection ratio low Bad: validity low, selection ratio high Useless: no validity Selecting low scorers?
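The logic behind the tables can be approximated numerically. This sketch assumes test and criterion scores are bivariate normal with correlation equal to the validity coefficient, and integrates with a simple midpoint rule; it is an illustration of the idea, not a reproduction of the published tables, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def success_rate_among_selected(base_rate, selection_ratio, validity, steps=4000):
    """Approximate P(success on the criterion | selected on the test).

    Assumes bivariate normal test and criterion scores (an assumption of this sketch).
    """
    if not (0 < base_rate < 1 and 0 < selection_ratio < 1 and -1 < validity < 1):
        raise ValueError("rates must lie in (0, 1) and validity in (-1, 1)")
    std = NormalDist()
    test_cut = std.inv_cdf(1 - selection_ratio)  # top `selection_ratio` are selected
    crit_cut = std.inv_cdf(1 - base_rate)        # top `base_rate` count as successes
    resid_sd = sqrt(1 - validity ** 2)
    width = 10.0 / steps                         # integrate 10 SDs past the test cutoff
    joint = 0.0
    for i in range(steps):
        x = test_cut + (i + 0.5) * width         # midpoint of the i-th slice
        p_success_given_x = 1 - std.cdf((crit_cut - validity * x) / resid_sd)
        joint += std.pdf(x) * p_success_given_x * width
    return joint / selection_ratio


# With no validity, selection on the test does no better than the base rate.
print(round(success_rate_among_selected(0.5, 0.3, 0.0), 3))
# High validity with a low selection ratio vs. low validity with a high one.
print(success_rate_among_selected(0.5, 0.1, 0.8) > success_rate_among_selected(0.5, 0.9, 0.2))
```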
Incremental Validity Unique information from using a test Predicting future behavior and self-ratings Prediction should consider: Simpler method? Less expensive method? Less subject strain?
Mental Measurements Yearbook Test reviews