Measurement Problems within Assessment: Can Rasch Analysis help us? Mike Horton, Bipin Bhakta, Alan Tennant

Presentation transcript:

Measurement Problems within Assessment: Can Rasch Analysis help us? Mike Horton, Bipin Bhakta, Alan Tennant

Mental Arithmetic - Test 1
… =
2 x 3 =
18 ÷ 2 =
Arithmetic is one of the ‘3 R’s’. True or False?
17 x 13 =

Mental Arithmetic - Test 1 - Answers
… = 9
2 x 3 = 6
18 ÷ 2 = 9
Arithmetic is one of the ‘3 R’s’. True or False? = True
17 x 13 = 221

Assumptions underpinning test score addition
- All questions must map onto the same underlying construct (unidimensionality)
- All questions must be unbiased between groups (item bias → differential item functioning, DIF)
- The raw score is a sufficient statistic


Mental Arithmetic - Test 1 - Potential Problems
… = 9
2 x 3 = 6
18 ÷ 2 = 9
Arithmetic is one of the ‘3 R’s’. True or False? = True
17 x 13 = 221
Plus item bias: gender DIF has been shown to be a particular problem in mathematics exams (e.g. Scheuneman & Grima, 1997; Lane et al., 1996).

Mental Arithmetic - Test 2
17 x 13 =
47 x 64 =
768 ÷ 16 =
53² =
7³ =

Mental Arithmetic - Test 2 - Answers
17 x 13 = 221
47 x 64 = 3008
768 ÷ 16 = 48
53² = 2809
7³ = 343

Assumptions of Test Equating (Holland & Dorans, 2006)
- Tests measure the same characteristic
- Tests measure at the same level of difficulty
- Tests measure with the same level of accuracy

Requirements of Test Equating (Dorans & Holland, 2000)
- The tests should measure the same construct
- The measures from the tests should have the same reliability
- The function used to equate measures from one test to another should be inversely symmetrical (see the note below)
- Examinees should be indifferent about which of the equated test forms will be administered
- The function for equating tests should be invariant across subpopulations of examinees
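The symmetry requirement can be written compactly (the notation here is illustrative rather than taken from the slides): if $e_{XY}$ is the function that converts scores on form X to the scale of form Y, then the conversion used in the other direction must be its inverse,

\[
e_{YX} = e_{XY}^{-1}.
\]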

Are these elements currently assessed?
- Unidimensionality is assumed on face validity
  - Cronbach’s alpha (see the sketch below)
- Exam difficulty equivalence
  - Subjective procedures
  - Classical Test Theory = sample dependent
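Cronbach’s alpha is the internal-consistency statistic usually reported here; a high alpha, however, does not by itself establish unidimensionality. A minimal sketch of the calculation (Python with NumPy; the function name and the persons-by-items data layout are illustrative):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a persons x items array of item scores."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                               # number of items
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Example: four examinees answering three dichotomous items
print(cronbach_alpha([[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]))
```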

What is Rasch Analysis?

MESA Press, Chicago, 1980

Rasch Analysis
The Rasch model is a probabilistic unidimensional model:
- the easier the question, the more likely a correct response
- the more able the student, the more likely the question will be passed compared to a less able student
The model assumes that the probability that a student will correctly answer a question is a logistic function of the difference between the student’s ability and the difficulty of the question.
(Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press, 1980)
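Written out for the dichotomous case (using the conventional notation, where $\theta_n$ is the ability of student $n$ and $b_i$ the difficulty of question $i$), the model is

\[
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}.
\]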

Assumptions of the Rasch Model
- Stochastic ordering of items
- Unidimensionality
- Local independence of items (see below)
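Local independence means that, once ability has been taken into account, responses to different items are statistically independent. In the notation used above, for a test of $k$ items:

\[
P(x_{n1}, x_{n2}, \ldots, x_{nk} \mid \theta_n) = \prod_{i=1}^{k} P(x_{ni} \mid \theta_n).
\]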

What Would We Expect When These People Meet These Items?
Items run from easy (Item 1) to hard (Item 3); persons from least able (Person 1) to most able (Person 5). The responses shown on the slide are:
Person 1: Correct, Incorrect
Person 2: Correct, Incorrect
Person 3: Correct, Incorrect, Correct
Person 4: Incorrect, Correct
Person 5: Correct


The Guttman Pattern (figure: response matrix with total scores)

The Rasch Guttman Pattern (figure: response matrix with total scores)
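A small simulation under the Rasch model (Python, standard library only; the ability and difficulty values are illustrative) shows the kind of near-Guttman response matrix the model predicts: responses are mostly ordered by person ability and item difficulty, with occasional probabilistic reversals.

```python
import math
import random

random.seed(1)
difficulties = [-1.5, -0.5, 0.5, 1.5]     # four items, ordered easy to hard (illustrative)
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]   # five persons, ordered least to most able (illustrative)

for theta in abilities:
    # Simulate a dichotomous response to each item under the Rasch model
    responses = [int(random.random() < math.exp(theta - b) / (1 + math.exp(theta - b)))
                 for b in difficulties]
    print(f"ability {theta:+.1f}: {responses}  total = {sum(responses)}")
```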

The Probabilistic Rasch Model (figure: the probability of a student’s success on an item plotted against the difference, in logits, between the ability of the student and the difficulty of the item; points marked at 88%, 73%, 27%, 12% and 5%)
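The curve is the logistic function of the ability-difficulty difference described earlier; a minimal sketch (Python, standard library only; the function name is illustrative) reproduces the probabilities marked on the slide:

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    logit_difference = ability - difficulty
    return math.exp(logit_difference) / (1 + math.exp(logit_difference))

# Probability of success at various ability-difficulty differences (in logits)
for diff in (2, 1, 0, -1, -2, -3):
    print(f"{diff:+d} logits: {rasch_probability(diff, 0):.0%}")
# +2: 88%, +1: 73%, 0: 50%, -1: 27%, -2: 12%, -3: 5%
```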

Rasch Analysis
When data fit the model, item difficulties generalise beyond the specific conditions under which they were observed (specific objectivity).
In other words, item difficulties are not sample dependent, as they are in Classical Test Theory.

What Else Does Rasch Offer Us?
When data fit the Rasch model, the assumptions underpinning summation are met:
- all questions map onto the same underlying construct
- all questions are unbiased between groups (no DIF)
- the raw score is a sufficient statistic
We can then test for other things, such as the quality of distractors.
It also gives us the mathematical basis to compare test scores via equating (see the sketch below).
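To illustrate that last point, a minimal sketch (Python, standard library only; the helper functions, the bisection approach and the example difficulties are illustrative, not taken from the presentation). Once the items of two forms have been calibrated onto a common logit scale, a raw score on either form can be converted to an ability estimate on that shared scale, and the estimates are directly comparable.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score on a test, given ability theta and Rasch item difficulties (logits)."""
    return sum(math.exp(theta - b) / (1 + math.exp(theta - b)) for b in difficulties)

def ability_from_raw_score(raw, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Ability (in logits) whose expected score matches the observed raw score.

    Uses bisection; valid for non-extreme scores (0 < raw < number of items)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two forms whose items sit on the same calibrated logit scale (illustrative values)
form_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
form_b = [-0.5, 0.0, 0.5, 1.0, 1.5]   # a slightly harder form

# The same raw score corresponds to different abilities on the two forms
print(ability_from_raw_score(3, form_a))   # ability implied by 3/5 on form A
print(ability_from_raw_score(3, form_b))   # a higher ability is implied by 3/5 on form B
```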

Limitations of Rasch Analysis
- The model tests only the internal psychometric properties of a test
- The model assumes unidimensionality
- The model cannot set standards

Summary
- The Rasch model offers a unified framework under which all of the assumptions can be tested together
- It gives us a lot of information about individual items, which can be utilised to ensure that item and test construction is of high quality
- It provides a rigorous mathematical basis for test equating

References
- Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press, 1980.
- Dorans NJ & Holland PW. Population invariance and the equatability of tests: Basic theory and the linear case. Journal of Educational Measurement, 2000; 37.
- Holland PW & Dorans NJ. Linking and equating. In RL Brennan (Ed.), Educational Measurement (4th ed.). Westport, CT: American Council on Education and Praeger Publishers, 2006.
- Scheuneman JD & Grima A. Characteristics of quantitative word items associated with differential item functioning for female and black examinees. Applied Measurement in Education, 1997.
- Lane S, Wang N & Magone M. Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement, 1996; 15(4): 21-27.