Analyzing test data using Excel Gerard Seinhorst STANAG 6001 Testing Workshop 2018 Workshop C1 Kranjska Gora, Slovenia
WORKSHOP OBJECTIVES Understand how to describe and analyze test data, and draw conclusions from it Understand how item analysis can help you to make informed decisions about selection of items for the final test Take away any potential fear of statistics
ANALYSIS OF TEST DATA (Descriptive Statistics)
TEST ANALYSIS - describing/analyzing test results and the test population: Measures of Central Tendency, Measures of Dispersion, Reliability estimates
ITEM ANALYSIS - describing/analyzing individual item characteristics: Item Difficulty, Item Discrimination, Distractor Efficiency
TEST ANALYSIS: Describing/Analyzing test results - Measures of Central Tendency These give us an indication of the typical score on a test and answer questions such as: In general, how did the test takers do on the test? Was the test easy or difficult for this group? How many test takers passed the test? Statistics: 1) Mean - average score 2) Mode - most frequent score 3) Median - middle point in a rank-ordered set of scores
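To make these three statistics concrete, here is a minimal Excel sketch, assuming the total scores of five test takers sit in cells B2:B6 (a hypothetical layout, not the workshop data):
Scores in B2:B6: 10, 12, 12, 14, 17
Mean:   =AVERAGE(B2:B6)  → 13
Mode:   =MODE(B2:B6)     → 12 (the most frequent score)
Median: =MEDIAN(B2:B6)   → 12 (the middle score once ranked)
Because the mean is slightly higher than the mode and the median, this small set of scores is slightly positively skewed rather than normally distributed.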
TEST ANALYSIS: Describing/Analyzing test results Measures of Central Tendency When the mean, mode and median are all very similar, we have a “normal distribution” of scores (bell-shaped curve) When they are not similar, the results are ‘skewed’
TEST ANALYSIS: Describing/Analyzing test results - Measures of Dispersion These give us an indication of how similar or spread out the scores are and answer questions such as: How much difference is there between the highest and lowest score? How similar were the test takers' results? Are there any extreme scores ('outliers')? Statistics: 1) Range - difference between the highest and lowest score 2) Standard Deviation - average distance of the scores from the mean
TEST ANALYSIS: Describing/Analyzing test results - Standard Deviation (SD, s.d. or σ)
Low SD: scores are mostly close to the mean
High SD: scores are more widely dispersed
Example:
scores of test 1: 48, 49, 50, 51, 52
scores of test 2: 10, 20, 40, 80, 100
MEAN of both tests (250 : 5) = 50
RANGE: test 1 (52 minus 48) = 4; test 2 (100 minus 10) = 90
STANDARD DEVIATION: test 1 = 1.58; test 2 = 38.73
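A minimal Excel sketch of the example above, assuming the test 1 scores are in A2:A6 and the test 2 scores in B2:B6 (a hypothetical layout). Note that the 1.58 and 38.73 quoted above are sample standard deviations (=STDEV.S); the population formula =STDEV.P would give 1.41 and 34.64:
Mean test 1:  =AVERAGE(A2:A6)         → 50
Mean test 2:  =AVERAGE(B2:B6)         → 50
Range test 1: =MAX(A2:A6)-MIN(A2:A6)  → 4
Range test 2: =MAX(B2:B6)-MIN(B2:B6)  → 90
SD test 1:    =STDEV.S(A2:A6)         → 1.58
SD test 2:    =STDEV.S(B2:B6)         → 38.73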
Bar Chart – Normal distribution
Bar Chart – Negatively skewed distribution
Bar Chart – Positively skewed distribution
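The frequency counts behind bar charts like these can be calculated in Excel before inserting the chart. A minimal sketch, assuming the total scores are in B3:B120 and a column of possible scores (bins) in D3:D43 (a hypothetical layout):
Frequency of the score in D3: =COUNTIF($B$3:$B$120,D3)   (copy down alongside the bins)
Select the bins and their frequencies and insert a column chart to see whether the distribution is normal or skewed.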
ITEM ANALYSIS: Analyzing item characteristics
Item analysis answers questions such as: How do individual items contribute to the total test results? How difficult was each item? Were some items too difficult or too easy? How effective were the distractors? Were there any items that more low achievers than high achievers answered correctly?
Statistics: 1) Facility Value - percentage of test takers who answered the item correctly; measure of item difficulty 2) Discrimination Index - measure of how well an item differentiates between high and low achievers 3) Distractor Efficiency - percentage of the test takers who chose a particular distractor; measure of how well a distractor is functioning
FACILITY VALUE (FV)
FV is the item difficulty: the number of test takers who answered the item correctly, divided by the total number of test takers
Example: if 14 out of 20 test takers answered item Y correctly, the FV of item Y is 14/20 = 70% or 0.70
FV ranges from 0.00 to 1.00; the higher the FV, the easier the item.
Recommended item difficulty is between 30% and 70% (0.30 - 0.70)
Items with a difficulty below 30% or above 85% usually have low discrimination and should be revised or replaced. An exception might be at the beginning of a test, where easier items (85% or higher) may be desirable. Items with an FV below 30% may be used to establish the ceiling of language ability.
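A minimal Excel sketch for the FV, assuming the responses to one item are scored 1 (correct) or 0 (incorrect) in C3:C22 for 20 test takers (a hypothetical layout):
FV: =SUM(C3:C22)/COUNT(C3:C22)
or simply: =AVERAGE(C3:C22)   (the mean of 1/0 scores is the proportion correct)
With 14 ones and 6 zeros either formula returns 0.70, matching the item Y example above.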
ITEM DISCRIMINATION (DI)
The degree to which test takers with high overall test scores also got a particular item correct; indicates how well an item distinguishes between high achievers and low achievers
Calculation: DI = FV(upper) - FV(lower), i.e. the FV of the upper group (the 1/3 of test takers with the highest total scores) minus the FV of the bottom group (the 1/3 with the lowest total scores)
Ranges from -1.00 to +1.00
Optimal values:
.40 and above  very good item
.30 - .39  reasonably good item, possibly room for improvement
.20 - .29  acceptable, but needing improvement
<.20  poor item, to be rejected or revised
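A minimal Excel sketch for the DI, assuming the rows have first been sorted by total score from highest to lowest, with 30 test takers in rows 3-32 and the 1/0 scores for one item in column C, so that the upper third occupies rows 3-12 and the lower third rows 23-32 (a hypothetical layout):
FV upper group: =AVERAGE(C3:C12)
FV lower group: =AVERAGE(C23:C32)
DI:             =AVERAGE(C3:C12)-AVERAGE(C23:C32)
For example, if 9 of the top 10 and 3 of the bottom 10 answered the item correctly, DI = 0.9 - 0.3 = +0.6.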
RELATIONSHIP BETWEEN FV AND DI
[Table: fourteen example items showing the number of correct responses in the top, middle and bottom groups (n = 10 each) and the resulting FV (from 100% down to 0%) and DI (from +1.0 down to -1.0), illustrating how an item's FV constrains the DI it can reach, e.g. an item with an FV of 100% necessarily has a DI of 0.0.]
DISTRACTOR ANALYSIS
Distractor Efficiency is the degree to which a distractor worked as intended, i.e., attracting the low achievers but not the high achievers. It is calculated as the number of test takers who selected that particular distractor, divided by the total number of test takers.
A distractor chosen by fewer than 7% of the test takers (less than 0.07) is normally not functioning well and should be revised. Bear in mind, however, that the easier the item, the lower the distractor efficiency of its distractors will be.
Example (n = 200; A is the key):
Item # | A* | B | C | D | Omitted
14 | 140 | 2 | 12 | 46 | 0
% selected | 70% | 1% | 6% | 23% | 0%
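A minimal Excel sketch for distractor counts, assuming the raw answers (the letters A-D, blank for omitted) to item 14 are in E3:E202 for the 200 test takers in the example (a hypothetical layout):
Times B was chosen:         =COUNTIF($E$3:$E$202,"B")       → 2
Distractor Efficiency of B: =COUNTIF($E$3:$E$202,"B")/200   → 0.01 (1%), below the 0.07 guideline
Repeat with "C" and "D" for the other distractors; the same formula with "A" divided by 200 returns the FV of the key (0.70), and =COUNTBLANK(E3:E202) counts the omits.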
OPTIMAL VALUES
STATISTIC | OPTIMAL VALUE | LIMITATION
TEST ANALYSIS
Mean, mode, median | N/A* | Affected by test taker ability; should be interpreted in relation to the maximum possible score
Range | N/A* |
SD | N/A* |
ITEM ANALYSIS
FV | 0.30 - 0.70 | Depends on test population and test type/purpose
DI | > 0.40 | Affected by the range of test takers' ability
Distractor Efficiency | ≥ 0.07 | Indicates only how often a distractor was chosen, not whether it was chosen by a high achiever or a low achiever
* Note: Descriptive statistics do not have an optimal value – they merely describe and summarize test or population characteristics without one value a priori being 'better' than another
SMALL-GROUP WORK
SMALL-GROUP WORK - 4 Activities
Objectives:
- hands-on practice in describing and analyzing test data, and drawing conclusions from it
- using item analysis to make informed decisions about the selection of items for the final test
SMALL-GROUP WORK - Instructions
Each group should have a handout, a laptop with MS Excel, and at least one group member who is familiar with MS Excel.
The data files can be found on the flash drive or on Google Drive: https://drive.google.com/drive/folders/1uP2JhlLtmWwsp-rVl5bfMACQRN5A-zz6?usp=sharing
Work on the activities until 15.30 hrs; ask for help when needed.
Take a 20-minute coffee break around 14.50 hrs.
At 15.30 hrs: discussion of findings in plenary.
Remember… Numbers are like people: torture them enough and they’ll tell you anything. ANONYMOUS
Using MS Excel to calculate descriptive statistics
Number of test takers (n): =COUNT(range[student ID])   (use =COUNTA if the IDs are not numeric)
Number of test items (k):  =COUNT(range[item#1…#40])
Mean:               =AVERAGE(range[total score])
Mode:               =MODE(range[total score])
Median:             =MEDIAN(range[total score])
Highest score:      =MAX(range[total score])
Lowest score:       =MIN(range[total score])
Range:              =MAX(range[total score])-MIN(range[total score])
Standard Deviation: =STDEV.P(range[total score])   (population SD; =STDEV.S gives the sample SD used in the earlier worked example: 1.58 and 38.73)
NB: (range) = the group of cells with the data you want to analyze, e.g. (B3:B120)