A controversy in PISA and other large-scale assessments: the trade-off between model fit, invariance and validity
  David Andrich
  CEM: 30 years of Evidence in Education
  London: 23 September 2014
Program for International Student Assessment (PISA)
  Many uses and misuses
    e.g. may reject the program
    e.g. may reject the methodology
  Will consider one methodological attack here
General assessment plan in PISA
  To cover the curriculum, multiple booklets (16) with links are used in each country
  Students do different booklets
  All countries receive the same booklets
  Place different booklets on the same scale
  Use a probabilistic model for this purpose
  The model estimates are then used to compare countries
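This slide does not name the probabilistic model. A minimal sketch, assuming the dichotomous Rasch model (suggested by the reference to Rasch analysis later in the talk, but not confirmed at this point), is given below. The function name and the example values are illustrative only.

```python
import math


def rasch_probability(proficiency, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are locations on the same logit scale. The choice of
    model is an assumption, not something stated on the slide.
    """
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))


# A student whose proficiency equals the item difficulty succeeds half the time.
print(rasch_probability(0.0, 0.0))             # 0.5
# The same student facing an item one logit easier.
print(round(rasch_probability(0.0, -1.0), 3))  # 0.731
```

Because only the difference between proficiency and difficulty enters the model, students who took different booklets can be placed on one scale as long as the booklets share linking items.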
A methodological attack: DIF
  Valid comparisons: items should work invariantly among countries (same relative difficulty in all countries)
  Not invariant: said to have differential item functioning (DIF)
  If DIF, what can be done about it?
  If DIF, can comparisons be made valid?
  It depends!
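The idea of "same relative difficulty in all countries" can be sketched as a simple check. This is an illustration only, not the procedure used in PISA: the difficulty values, the 0.3 logit tolerance and the function names are all hypothetical.

```python
def relative_difficulties(difficulties):
    """Centre item difficulties on their mean so only relative positions remain."""
    mean = sum(difficulties) / len(difficulties)
    return [d - mean for d in difficulties]


def flag_dif(difficulties_by_country, tolerance=0.3):
    """Return indices of items whose relative difficulty differs across
    countries by more than `tolerance` logits (hypothetical cut-off)."""
    centred = {c: relative_difficulties(d) for c, d in difficulties_by_country.items()}
    n_items = len(next(iter(centred.values())))
    flagged = []
    for i in range(n_items):
        values = [centred[c][i] for c in centred]
        if max(values) - min(values) > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical item difficulty estimates (logits) from separate country analyses.
estimates = {
    "C1": [-1.0, 0.0, 1.0, 0.0],
    "C2": [-0.5, 0.5, 1.5, 0.5],  # every item 0.5 harder: relative difficulty unchanged
    "C3": [-1.0, 0.0, 1.0, 0.8],  # last item relatively harder than elsewhere
}
print(flag_dif(estimates))  # [3]
```

A uniform shift, as in C2, reflects a difference in overall level and is not flagged. Only the item whose position relative to the other items changes is flagged.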
The presentation
  1. Distinguish between causal and index variables
  2. Imagine the assessment of physics in multiple domains
  3. Set up an idealised assessment design in three countries
  4. Illustrate the model used and the concepts of (a) fit to the model and (b) DIF
  5. Show the tension between model fit and validity
Causal and Index Variables
  Stenner, A. J., et al. (2008). Formative and reflective models: Can a Rasch analysis tell the difference? Rasch Measurement Transactions, 22, 1059–1060.
Causal and Index Variables
  Causal example: heat, indicated by thermometers
    A change in heat causes a change on the thermometer
    (i) Same changes on all thermometers
    (ii) Thermometers are exchangeable
  Index example: indicators of SES (education, occupational prestige, income, and neighbourhood)
    (i) A change in one indicator does not change the other indicators
    (ii) Indicators are not exchangeable
Science proficiency in light
  Assess understanding of light (a relatively thin variable)
  Items of a test relate to the curriculum on light
  Causal variable: understanding of light governs performance on all items of the test
  Items are in principle exchangeable (avoid the effect of teaching to the test)
Assess a broad physics construct
Students from three countries
  Simulation of an idealisation of the PISA controversy
  All countries of equal proficiency
  Item difficulties similar in the 5 domains, 8 items each
  All items administered to all countries
  Some DIF by domain
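A design of this kind can be sketched as a small simulation. The following is not the simulation used in the talk. It assumes the dichotomous Rasch model, a standard normal proficiency distribution in every country, evenly spaced item difficulties, 2000 students per country, and a DIF size of 0.5 logits. The direction of DIF (C1 advantaged on the Sound items 17 to 24, C3 advantaged on the Electricity and Magnetism items 25 to 32) follows the later slides. All names in the code are illustrative.

```python
import math
import random

N_DOMAINS, ITEMS_PER_DOMAIN = 5, 8      # from the slide: 5 domains, 8 items each
COUNTRIES = ["C1", "C2", "C3"]
SOUND, ELEC_MAG = 2, 3                  # zero-based domains holding items 17-24 and 25-32
DIF_SIZE = 0.5                          # assumed advantage in logits (hypothetical)
# Assumed difficulties: the same evenly spaced values in every domain.
BASE = [-1.5 + 3.0 * k / (ITEMS_PER_DOMAIN - 1) for k in range(ITEMS_PER_DOMAIN)]


def item_difficulty(country, domain, k):
    """Difficulty of item k of a domain as experienced in a given country."""
    advantaged = (country == "C1" and domain == SOUND) or (
        country == "C3" and domain == ELEC_MAG
    )
    return BASE[k] - DIF_SIZE if advantaged else BASE[k]


def simulate(n_students=2000, seed=1):
    """Return {country: list of 0/1 response rows, one row of 40 items per student}."""
    rng = random.Random(seed)
    data = {}
    for country in COUNTRIES:
        rows = []
        for _ in range(n_students):
            beta = rng.gauss(0.0, 1.0)  # same proficiency distribution in every country
            row = []
            for domain in range(N_DOMAINS):
                for k in range(ITEMS_PER_DOMAIN):
                    delta = item_difficulty(country, domain, k)
                    p = 1.0 / (1.0 + math.exp(-(beta - delta)))
                    row.append(1 if rng.random() < p else 0)
            rows.append(row)
        data[country] = rows
    return data


def domain_proportion(rows, domain):
    """Observed proportion correct on the 8 items of one domain."""
    start = domain * ITEMS_PER_DOMAIN
    correct = sum(sum(row[start:start + ITEMS_PER_DOMAIN]) for row in rows)
    return correct / (len(rows) * ITEMS_PER_DOMAIN)


data = simulate()
for country in COUNTRIES:
    print(country, [round(domain_proportion(data[country], d), 2) for d in range(N_DOMAINS)])
```

Under these assumptions the three countries should show similar proportions correct in the domains without DIF, with C1 higher on Sound and C3 higher on Electricity and Magnetism.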
Model and Fit of Item 21: Sound
Model and DIF, Items 17 to 24, Sound: C1 > C2, C3
Model and Fit of Item 29: Electricity and Magnetism
Model and DIF, Items 25 to 32, Electricity and Magnetism: C3 > C1, C2
Resolve items by country: Sound
Split items by country: Electricity and Magnetism
Summary of Means
Summary: DIF and Interpretation
  Splitting the items of a domain by country is equivalent to deleting that domain
  Most valid interpretation? Depends on the source of DIF: artefact or substantive
  This cannot be answered by statistics alone
  Understand the DIF and its implications for the test and the curriculum
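Once a domain's items are split by country, they no longer link the countries, so the comparison rests on the remaining items. The sketch below illustrates only the deletion side of that statement. It uses expected proportions correct as a rough proxy for the country comparison, not Rasch estimation. It assumes the dichotomous Rasch model, a student of proficiency 0 in each country, and a DIF advantage of 0.5 logits. The domain names D1, D2 and D5 are placeholders, and none of the numbers come from the talk.

```python
import math

ITEMS_PER_DOMAIN = 8
DOMAINS = ["D1", "D2", "Sound", "Elec & Mag", "D5"]  # D1, D2, D5 are placeholder names
# Assumed DIF: a country finds one domain 0.5 logits easier (hypothetical size).
ADVANTAGE = {("C1", "Sound"): 0.5, ("C3", "Elec & Mag"): 0.5}
# Assumed difficulties: the same evenly spaced values in every domain.
BASE = [-1.5 + 3.0 * k / (ITEMS_PER_DOMAIN - 1) for k in range(ITEMS_PER_DOMAIN)]


def expected_proportion(country, proficiency, domains):
    """Expected proportion correct over the chosen domains (Rasch model assumed)."""
    probs = []
    for domain in domains:
        shift = ADVANTAGE.get((country, domain), 0.0)
        for delta in BASE:
            probs.append(1.0 / (1.0 + math.exp(-(proficiency - (delta - shift)))))
    return sum(probs) / len(probs)


# Domains kept when the two DIF domains are deleted.
kept = [d for d in DOMAINS if d not in ("Sound", "Elec & Mag")]
for country in ("C1", "C2", "C3"):
    all_items = expected_proportion(country, 0.0, DOMAINS)
    without_dif = expected_proportion(country, 0.0, kept)
    print(country, round(all_items, 3), round(without_dif, 3))
```

With all items retained, C1 and C3 are expected to score higher than C2 even though the students have the same proficiency. With the two DIF domains removed, the three expected proportions coincide. Which comparison is more valid is the substantive question the slide raises, and the code cannot settle it.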
Thank You