Measurement Problems within Assessment: Can Rasch Analysis help us?
Mike Horton, Bipin Bhakta, Alan Tennant
Mental Arithmetic - Test 1
… =
2 x 3 =
18 ÷ 2 =
Arithmetic is one of the '3 R's'. True or False?
17 x 13 =
Mental Arithmetic - Test 1 - Answers
… = 9
2 x 3 = 6
18 ÷ 2 = 9
Arithmetic is one of the '3 R's'. True or False? = True
17 x 13 = 221
Assumptions underpinning test score addition
- All questions must map onto the same underlying construct (unidimensionality)
- All questions must be unbiased between groups (item bias / Differential Item Functioning, DIF)
- The raw score is a sufficient statistic
Mental Arithmetic - Test 1 – Potential Problems
… = 9
2 x 3 = 6
18 ÷ 2 = 9
Arithmetic is one of the '3 R's'. True or False? = True
17 x 13 = 221
Plus: Item Bias - gender DIF has been shown to be a particular problem in mathematics exams (e.g. Scheuneman & Grima, 1997; Lane et al., 1996)
Mental Arithmetic - Test 2
17 x 13 =
47 x 64 =
768 ÷ 16 =
53² =
7³ =
Mental Arithmetic - Test 2 - Answers
17 x 13 = 221
47 x 64 = 3008
768 ÷ 16 = 48
53² = 2809
7³ = 343
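A quick mechanical check of these answers (plain integer arithmetic, nothing Rasch-specific):

```python
print(17 * 13)    # 221
print(47 * 64)    # 3008
print(768 // 16)  # 48
print(53 ** 2)    # 2809
print(7 ** 3)     # 343
```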
Assumptions of Test Equating (Holland & Dorans, 2006)
- Tests measure the same characteristic
- Tests measure at the same level of difficulty
- Tests measure with the same level of accuracy
Requirements of Test Equating (Dorans & Holland, 2000)
- The tests should measure the same construct
- The measures from the tests should have the same reliability
- The function used to equate measures from one test to another should be symmetrical: the transformation in one direction is the inverse of the transformation in the other
- Examinees should be indifferent about which of the equated test forms is administered
- The function for equating tests should be invariant across subpopulations of examinees
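For the linear case analysed by Dorans & Holland, a minimal sketch of the symmetry requirement (the means and standard deviations below are illustrative, not taken from the slides): the function carrying Test X scores onto the Test Y scale, composed with its counterpart in the other direction, returns the original score.

```python
def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear equating: match standardised scores on the two forms."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Illustrative summary statistics for two test forms
mx, sx = 30.0, 5.0   # Test X
my, sy = 34.0, 4.0   # Test Y

x = 27.0
y = linear_equate(x, mx, sx, my, sy)        # X -> Y direction
x_back = linear_equate(y, my, sy, mx, sx)   # Y -> X direction
print(y, x_back)  # x_back recovers the original x (up to floating-point rounding)
```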
Are these elements currently assessed?
- Unidimensionality: assumed on face validity; Cronbach's alpha
- Exam difficulty equivalence: subjective procedures; Classical Test Theory = sample dependent
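Cronbach's alpha, as mentioned above, is computed from the item-level variances and the variance of the total score. A minimal sketch (the response matrix here is made up for illustration):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: 6 examinees x 4 dichotomously scored items
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
])
print(round(cronbach_alpha(responses), 2))
```

Note that alpha only summarises internal consistency; a high value does not by itself establish unidimensionality, which is the point the slide is making.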
What is Rasch Analysis?
(Figure: book cover - MESA Press, Chicago, 1980)
Rasch Analysis
The Rasch model is a probabilistic unidimensional model:
- the easier the question, the more likely the correct response
- the more able the student, the more likely the question will be passed compared to a less able student
The model assumes that the probability that a student will correctly answer a question is a logistic function of the difference between the student's ability and the difficulty of the question.
(Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press, 1980)
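In symbols, this is the standard statement of the dichotomous Rasch model (not reproduced from the slides), with beta_n the ability of student n and delta_i the difficulty of item i:

```latex
P(X_{ni} = 1 \mid \beta_n, \delta_i) = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}
```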
Assumptions of the Rasch Model
- Stochastic ordering of items
- Unidimensionality
- Local independence of items
What Would We Expect When These People Meet These Items?
(Items ordered Easy → Hard; persons ordered Least → Most Able)

          Item 1     Item 2     Item 3
Person 1  Correct    Incorrect
Person 2  Correct               Incorrect
Person 3  Correct    Incorrect  Correct
Person 4  Incorrect  Correct
Person 5  Correct
The Guttman Pattern (figure: response matrix ordered by Total Score)
The Rasch Guttman Pattern (figure: response matrix ordered by Total Score)
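To illustrate the contrast these two figures draw, a small sketch (the abilities and difficulties are illustrative, not taken from the slides): a deterministic Guttman matrix scores an item correct whenever ability exceeds difficulty, whereas under the Rasch model that outcome is only made more probable.

```python
import numpy as np

rng = np.random.default_rng(0)

abilities = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # persons, least to most able (logits)
difficulties = np.array([-1.5, 0.0, 1.5])           # items, easy to hard (logits)

# Deterministic Guttman pattern: correct iff ability exceeds difficulty
guttman = (abilities[:, None] > difficulties[None, :]).astype(int)

# Rasch model: correct with probability logistic(ability - difficulty)
prob = 1 / (1 + np.exp(-(abilities[:, None] - difficulties[None, :])))
rasch_sample = rng.binomial(1, prob)

print("Guttman pattern:\n", guttman)
print("Success probabilities:\n", prob.round(2))
print("One simulated Rasch response matrix:\n", rasch_sample)
```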
The Probabilistic Rasch Model (figure: the probability of a student's success on an item plotted against the difference, in logits, between the ability of the student and the difficulty of the item; marked values on the curve include 88%, 73%, 27%, 12% and 5%)
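The percentages marked on the curve follow directly from the model; a minimal check:

```python
import numpy as np

def p_success(ability: float, difficulty: float) -> float:
    """Rasch probability that a student answers an item correctly."""
    diff = ability - difficulty  # difference in logits
    return np.exp(diff) / (1 + np.exp(diff))

for d in [2, 1, 0, -1, -2, -3]:
    print(f"difference {d:+d} logits -> {p_success(d, 0):.0%}")
# +2 -> 88%, +1 -> 73%, 0 -> 50%, -1 -> 27%, -2 -> 12%, -3 -> 5%
```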
Rasch Analysis
When data fit the model, item difficulties generalise beyond the specific conditions under which they were observed (specific objectivity).
In other words, item difficulties are not sample dependent, as they are in Classical Test Theory.
What Else Does Rasch Offer Us?
When data fit the Rasch model, the assumptions of summation are met:
- All questions map onto the same underlying construct
- All questions are unbiased between groups (DIF)
- The raw score is a sufficient statistic
We can then test for other things, such as the quality of distractors.
It gives us the mathematical basis to compare test scores via equating.
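As one illustration of equating on the Rasch metric (the item difficulties below are illustrative, and both forms are assumed to have been calibrated onto a common logit scale): the expected raw score on each test is a function of ability, so a raw score on one form can be mapped through ability to the corresponding raw score on the other.

```python
import numpy as np
from scipy.optimize import brentq

def expected_score(theta: float, difficulties: np.ndarray) -> float:
    """Expected raw score under the Rasch model at ability theta (logits)."""
    return float(np.sum(1 / (1 + np.exp(-(theta - difficulties)))))

# Illustrative item difficulties (logits) on a common calibrated scale
test_a = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
test_b = np.array([-0.5, 0.5, 1.0, 1.5, 2.5])   # a harder form

def equate(raw_a: float) -> float:
    """Map a raw score on Test A to the equivalent raw score on Test B."""
    # Find the ability whose expected Test A score equals raw_a ...
    theta = brentq(lambda t: expected_score(t, test_a) - raw_a, -8, 8)
    # ... then read off the expected Test B score at that ability.
    return expected_score(theta, test_b)

print(round(equate(3.0), 2))  # a raw 3/5 on Test A maps to a lower raw score on the harder Test B
```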
Limitations of Rasch Analysis
- The model tests the internal psychometric properties
- The model assumes unidimensionality
- The model cannot set standards
Summary
- The Rasch model offers a unified framework under which all of these assumptions can be tested together
- It gives us a lot of information about individual items, which can be used to ensure that item and test construction is of a high quality
- It provides a rigorous mathematical basis for test equating
References
Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press, 1980.
Dorans NJ, Holland PW. Population invariance and the equatability of tests: basic theory and the linear case. Journal of Educational Measurement, 2000; 37.
Holland PW, Dorans NJ. Linking and equating. In: Brennan RL (ed.), Educational Measurement (4th ed.). Westport, CT: American Council on Education and Praeger Publishers, 2006.
Scheuneman JD, Grima A. Characteristics of quantitative word items associated with differential item functioning for female and black examinees. Applied Measurement in Education, 1997.
Lane S, Wang N, Magone M. Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement, 1996; 15(4): 21-27.