Lies, Damned Lies & Statistical Analysis for Language Testing
Stephen Walker
UECA Assessment Symposium, Saturday, 14 July 2018
Hands up if you know what these mean:
- Dichotomous vs polytomous items
- P values
- Point-biserial correlations
- CTT vs IRT
Presentation Aims
1. Why do you need to do statistical analysis?
2. How do you actually do it?
3. What information do you get?
4. How do you use the results?
Why do we need to do statistical analysis?
An Art and a Science
"…good test developers and creative item writers are probably born rather than trained." (Charles Alderson)
Statistical analysis is… an absolutely essential, but often the most misunderstood, step in developing a defensible test.
Numbers…
- reveal how well items & tests work, or don't work, and lead to an understanding of why
- provide feedback to test designers & item writers; as teachers, we know the value of feedback to learning
- are to applied statistics what language is to applied linguistics
- help to make the results of tests meaningful and useful to test users
How do you actually do it?
Prepare the Data
Get yourself a Matrix
(Not that kind of Matrix!)
This kind of Matrix!
[Spreadsheet layout: Student ID in the first column, then one column per item (Item 1 to Item 5), with one row per test taker (IDs 12345678, 98754321, 11111111 … 10101010).]
A Control File
- contains the answers
- tells the software what to do
- looks something like this
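The core job the answer key in a control file makes possible is turning raw responses into a dichotomous (right/wrong) matrix. The sketch below is a minimal illustration of that scoring step only, not any package's control-file format; the answer key and responses are hypothetical, and it assumes one multiple-choice letter per item.

```python
# Hypothetical answer key for a 5-item multiple-choice test.
ANSWER_KEY = ["B", "D", "A", "C", "B"]

# Hypothetical raw responses: one row per test taker, one column per item.
RAW_RESPONSES = {
    "12345678": ["B", "D", "A", "C", "A"],
    "98754321": ["B", "A", "A", "C", "B"],
    "11111111": ["C", "D", "B", "C", "B"],
}


def score_responses(raw, key):
    """Turn raw option letters into a dichotomous matrix: 1 = correct, 0 = incorrect."""
    scored = {}
    for student_id, answers in raw.items():
        if len(answers) != len(key):
            raise ValueError(f"{student_id}: expected {len(key)} answers, got {len(answers)}")
        scored[student_id] = [1 if given == correct else 0
                              for given, correct in zip(answers, key)]
    return scored


for sid, row in score_responses(RAW_RESPONSES, ANSWER_KEY).items():
    print(sid, row)
```

Every statistic that follows (P values, Rpbis) is calculated from a scored matrix like this one, so a wrong letter in the key shows up later as a badly behaved item.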
Get some software
What information can we get from different analyses?
P value
P value = item difficulty = item facility = item easiness
- the probability that examinees will get an item correct
- to calculate the P value, count the number of test takers who got the item right and divide by the total number of test takers
- the result is a proportion, like a percentage but on a 0-1 scale rather than 0-100
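The count-and-divide rule above can be written as a one-line function over a column of the scored matrix. This is a minimal sketch assuming items are already scored 1 (correct) or 0 (incorrect); the two items are hypothetical.

```python
def p_value(item_scores):
    """P value (item facility): proportion of test takers who got the item right.

    item_scores: list of dichotomous scores, 1 = correct, 0 = incorrect.
    """
    if not item_scores:
        raise ValueError("need at least one test taker")
    return sum(item_scores) / len(item_scores)


# Hypothetical scores for five test takers.
very_easy_item = [1, 1, 1, 1, 1]  # everyone right: 5 / 5
very_hard_item = [0, 1, 0, 0, 0]  # one person right: 1 / 5

print(p_value(very_easy_item))  # 1.0
print(p_value(very_hard_item))  # 0.2
```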
P value
[Figure: responses of five test takers (Ann, Tony, Jim, Ruth, Hong) to Items 1 to 6, with each item's P value shown on a 0.0 to 1.0 scale.]
- Everyone got Item 6 right. It's very easy for these test takers. Its P value is 1.0 (5 ÷ 5 = 1).
- Only 1 person got this item right. It's difficult for these test takers. Its P value is 0.2 (1 ÷ 5 = 0.2).
- This approach to calculating difficulty is sample-dependent. If we had a different sample of people, the statistics could be quite different.
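To make "sample-dependent" concrete: the item below is identical in both cases, and only the group answering it changes. Both groups are hypothetical.

```python
def p_value(item_scores):
    """Proportion of test takers who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


# The same item, answered by two different (hypothetical) groups of five.
group_a = [1, 0, 0, 0, 0]  # e.g. a lower-level class
group_b = [1, 1, 1, 1, 0]  # e.g. a higher-level class

print(p_value(group_a))  # 0.2: looks "too difficult"
print(p_value(group_b))  # 0.8: looks "moderately easy"
```

Nothing about the item changed, so a P value describes how an item behaved with a particular group of learners rather than a fixed property of the item.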
P value interpretation (Range: Possible interpretation. Notes)
- 0.0-0.3: Too difficult. Your item might be mis-keyed or have other issues, so it needs to be checked.
- 0.3-0.7: Difficult to moderately difficult. Test takers are finding items in this range challenging.
- 0.7-0.9: Moderately easy. Most test takers are getting these items correct.
- 0.9-1.0: Too easy. These items are too easy to provide much information on examinees, and can be detrimental to reliability.
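The bands above can be applied automatically to every item after an analysis run. One assumption is needed: the ranges share their end points (0.3, 0.7, 0.9), so this sketch puts a boundary value in the higher band. The labels are paraphrased from the table.

```python
def interpret_p_value(p):
    """Map a P value to the interpretation bands in the table.

    Assumption: boundary values (0.3, 0.7, 0.9) go to the higher band,
    because the ranges in the table overlap at their edges.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must be between 0 and 1")
    if p < 0.3:
        return "Too difficult: check for mis-keying or other issues"
    if p < 0.7:
        return "Difficult to moderately difficult"
    if p < 0.9:
        return "Moderately easy"
    return "Too easy: little information about examinees"


for p in (0.2, 0.5, 0.8, 1.0):
    print(p, interpret_p_value(p))
```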
Rpbis: point-biserial correlation
- Measures how well items differentiate between high- and low-ability test takers
- Ranges from -1.0 to 1.0
- Items that discriminate well have higher Rpbis values, but rarely above 0.5
- A negative Rpbis means high-ability test takers tend to answer incorrectly while low-ability test takers answer correctly. This usually indicates that the specified answer is actually wrong!
- 0.0-0.1: no to little discrimination (noise)
- Rpbis and P value are considered together
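A point-biserial correlation is simply the Pearson correlation between a 0/1 item score and the total test score. The sketch below computes it from a scored matrix. Some item-analysis programs instead correlate the item with the total *excluding* that item (a "corrected" Rpbis), so a flag for that is included; which version your software reports is worth checking. The matrix is hypothetical, and with only five test takers the values are far more extreme than real data would give.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined if everyone scored the same
    return cov / math.sqrt(var_x * var_y)


def rpbis(matrix, item, corrected=False):
    """Point-biserial: correlation of one 0/1 item column with total scores.

    matrix: one row per test taker, each a list of 0/1 item scores.
    corrected=True removes the item itself from each total before correlating.
    """
    item_scores = [row[item] for row in matrix]
    totals = [sum(row) - (row[item] if corrected else 0) for row in matrix]
    return pearson(item_scores, totals)


# Hypothetical scored matrix: 5 test takers x 4 items.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
for i in range(4):
    print(f"Item {i + 1}: Rpbis = {rpbis(scores, i):.2f}")
```

Item 4 comes out negative: the two lowest scorers got it right and the top scorer got it wrong, the pattern the slide warns usually points to a wrong key.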
Rpbis value interpretation
- 0.20+ = Good items: higher-ability test takers tend to get these items correct
- 0.10-0.20 = Maybe OK item: review it
- 0.0-0.10 = Problems suggested: revise or replace
- <0.0 = Problematic items: replace
NB: if the correct answer has a negative Rpbis and a distractor has a positive Rpbis, the distractor is probably correct.
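The NB check needs an Rpbis for every option, not just the key. One common way to get it is to treat "chose this option" as the 0/1 variable and correlate it with total score. This sketch does that, applies the bands above (assuming boundary values go to the higher band), and flags the negative-key/positive-distractor pattern. The item, its key "B" and the total scores are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN when either list has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def interpret_rpbis(r):
    """Bands from the slide; boundary values go to the higher band (assumption)."""
    if math.isnan(r):
        return "Undefined: everyone answered the same way"
    if r >= 0.20:
        return "Good item"
    if r >= 0.10:
        return "Maybe OK: review it"
    if r >= 0.0:
        return "Problems suggested: revise or replace"
    return "Problematic: replace"


def option_rpbis(choices, totals):
    """Rpbis of every option chosen for one item (1 = chose it, 0 = did not)."""
    return {opt: pearson([1 if c == opt else 0 for c in choices], totals)
            for opt in sorted(set(choices))}


def possible_miskey(choices, totals, keyed_option):
    """NB check: keyed answer negative while some distractor is positive."""
    by_option = option_rpbis(choices, totals)
    key_r = by_option.get(keyed_option, float("nan"))
    suspects = [opt for opt, r in by_option.items()
                if opt != keyed_option and r > 0]
    return key_r < 0 and bool(suspects), suspects


# Hypothetical item keyed "B": the two strongest test takers chose "C".
choices = ["C", "C", "B", "B", "A"]
totals = [9, 8, 5, 3, 2]  # hypothetical total test scores

for opt, r in option_rpbis(choices, totals).items():
    print(opt, round(r, 2), interpret_rpbis(r))
print(possible_miskey(choices, totals, "B"))  # (True, ['C'])
```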
How do you use the results within the test development cycle?
UQ-ICTE Reading & Listening Test Development Cycle
Pre-test Review Meeting
- The item writer team should be involved
- Bring the common wrong answers, the item analysis results, the pilot test, the script (for listening tests) and the answer key, and meet somewhere to discuss them
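Collecting the common wrong answers for the meeting is a simple counting job. This sketch tallies wrong answers to one item, most frequent first. It assumes answers should be trimmed and lower-cased before counting, which suits short-answer items such as listening gap-fills; drop that step if spelling, case or spacing matters for marking. The item and its answers are hypothetical.

```python
from collections import Counter


def common_wrong_answers(answers, key):
    """Most frequent wrong answers to one item, most common first.

    Assumption: answers are normalised (trimmed, lower-cased) before counting.
    """
    normalised = [a.strip().lower() for a in answers]
    target = key.strip().lower()
    return Counter(a for a in normalised if a != target).most_common()


# Hypothetical answers to a listening gap-fill item keyed "library".
answers = ["library", "Library ", "librery", "bookshop", "librery", "library"]
print(common_wrong_answers(answers, "library"))  # [('librery', 2), ('bookshop', 1)]
```

A frequent misspelling like "librery" is exactly the kind of answer the team may need to decide whether to accept.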
Decisions made in the pre-test review
- Which items should be cut because they are too easy or too hard for these learners?
- Which items should be rewritten?
- Which distractors are not tempting, or too tempting because they are actually correct (double keys)?
- Are test takers lost?
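The distractor questions can be pre-screened from option counts before the meeting. This sketch flags distractors that almost nobody chose (not tempting) and distractors chosen more often than the key (worth checking as a possible double key or mis-key). The 5% cut-off is a hypothetical choice, not a standard, and the response data are invented.

```python
from collections import Counter


def review_distractors(choices, options, keyed_option, min_share=0.05):
    """Flag distractors for discussion at the pre-test review.

    Returns (not_tempting, more_popular_than_key):
    - not_tempting: distractors chosen by fewer than min_share of test takers
      (min_share=0.05 is a hypothetical cut-off)
    - more_popular_than_key: distractors chosen more often than the keyed answer
    """
    counts = Counter(choices)  # missing options count as 0
    n = len(choices)
    distractors = [opt for opt in options if opt != keyed_option]
    not_tempting = [opt for opt in distractors if counts[opt] / n < min_share]
    more_popular = [opt for opt in distractors
                    if counts[opt] > counts[keyed_option]]
    return not_tempting, more_popular


# Hypothetical responses from 20 test takers to a four-option item keyed "A".
choices = ["A"] * 12 + ["B"] * 6 + ["C"] * 2
print(review_distractors(choices, "ABCD", "A"))  # (['D'], [])
```

A flag is only a prompt for discussion: a popular distractor can also mean the item is simply hard, which is why the P value and Rpbis are looked at alongside it.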
I hope this presentation encourages you to:
- use statistics as a tool to help you understand your own tests
- produce better tests with evidence to support any claims made
- explain to others why piloting & statistical analysis are an essential part of reliable test development
- do the analysis yourself along with those involved in the test development cycle
Thank you
Stephen Walker, Academic Manager: Assessment
E: s.walker@icte.uq.edu.au
T: (07) 3346 6770