Systemic Use of Value-Added and Item Response Models for Accountability
Daniel Muijs (University of Southampton) and Saad Chahine
Introduction
- Growing use of student attainment data in education systems worldwide, both for accountability and for school improvement purposes
- This requires high-quality data, leading to increased use of statistical models such as Value-Added and Item Response Theory models
- The question, then, is what a high-quality model looks like
Objectives
- How are VA and IRT models used in international education systems, both in terms of accountability and school improvement?
- To what extent do such systems use the models in a technically valid and reliable way?
- What are the consequences of the use of these models in these disparate education systems?
Definitions
- Value-Added Assessment (VAA): assessment and mathematical modelling used to measure change over time. Used for school improvement and for accountability.
- Value-Added Models (VAMs) are the statistical models used to estimate the growth in learning. Data can be analysed at various levels (e.g., students, teachers, schools, school boards) and may or may not include indicators such as background variables of students and communities.
- Item Response Models are latent-trait measurement models which assume that the score on a test item is determined by two factors: item difficulty and person ability (a standard formulation is sketched below).
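The slides name the two factors but give no formula. As an illustration only, the simplest item response model, the Rasch (1-PL) model used in some of the case studies below, expresses the probability of a correct response as a function of person ability and item difficulty:

```latex
% Rasch (1-PL) item response function (illustrative): probability that
% person j, with ability \theta_j, answers item i, with difficulty b_i,
% correctly.
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```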
Methodology
- Systematic literature review and case studies, following a 6-step process
- Step 1: Defining the research questions and the inclusion criteria for selecting papers. Key criteria were:
  - Does the paper provide full information about the methods used, including the variables entered into the models and the assumptions underlying them?
  - Does the paper provide full information on the population, sample and conditions of use of any models and instruments used?
  - Does the paper employ methods that are practical and applicable to real-world problems and situations?
- Step 2: Searching the population of articles, using academic, official and 'grey' literature
Methodology (continued)
- Step 3: Selecting articles for inclusion using a scoring system (met, partly met, not met) for each criterion above, which allowed us to select papers more objectively
- Step 4: Looking for any risk of potential bias in the selected articles, with respect both to methodology and to the omission of confounding factors and unintended consequences. A three-point scale was used.
- Step 5: Interpreting the findings, taking into account the importance of context and the practical implications
- Step 6: Selecting cases for in-depth study, based on three criteria: maximum variation, to include as diverse a range of cases as possible; quality of evidence, as determined through the procedures above; and transferability to contexts other than where the case took place. Six cases were selected.
Case study 1: Using IRT in a national testing system: the SIMCE system in Chile
- National assessment model for Chile
- Since 2000, Item Response Modelling has been used to model the SIMCE tests; a 3-PL model is used (sketched below)
- The test for each subject is composed of 4 forms; forms are equated through anchor items
- Part of a simple VA accountability model (SNED)
- Data published for school choice
- Formative role through school reports
- Seen as reliable and valid, though not necessarily used as intended, and the VA model is simplistic
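The slides state that a 3-PL model is used but do not define it; below is a minimal sketch of the standard three-parameter logistic item response function. The function and the parameter values are illustrative and not taken from SIMCE documentation.

```python
import numpy as np

def three_pl(theta, a, b, c):
    """Standard 3-PL item response function.

    Probability of a correct response for a person with ability `theta`
    on an item with discrimination `a`, difficulty `b` and
    pseudo-guessing parameter `c`.
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Example: an item of average difficulty (b=0), moderate discrimination
# (a=1.2) and a guessing floor of 0.2, answered by a person of average ability.
print(three_pl(theta=0.0, a=1.2, b=0.0, c=0.2))  # ~0.6
```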
Case study 2: The CITO Pupil Following System in Dutch Primary Education
- Battery of tests undertaken by schools in the Netherlands: 85% of Dutch primary schools take part in the central test at the end of primary school, and 75% take additional tests in Y5, Y6 and Y7
- Value-added results are published for each school based on the final test
- A 1-PL IRT model is used for test development and equating purposes
- The VA model incorporates parental SES and pupil characteristics such as ethnicity, as well as the prior attainment tests (an illustrative sketch follows)
- Used as part of the inspectorate accountability system
- Used formatively through individual pupil reports, which allow target-setting, and a school report for self-evaluation
- Tests are reliable and valid, though the accountability and pupil-advice uses are controversial
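The slides list the ingredients of the CITO value-added model (prior attainment, parental SES, pupil characteristics) but not its form. The sketch below shows one generic way such a contextualised value-added estimate can be computed, regressing the final score on prior attainment and background variables and averaging residuals by school; the column names and the single-level OLS specification are assumptions for illustration, not CITO's actual model.

```python
# Illustrative contextualised value-added calculation, NOT CITO's actual model:
# regress the final test score on prior attainment and background variables,
# then summarise each school's mean residual as its value-added estimate.
# Column names (final_score, prior_score, ses, ethnicity, school_id) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def school_value_added(df: pd.DataFrame) -> pd.Series:
    model = smf.ols(
        "final_score ~ prior_score + ses + C(ethnicity)", data=df
    ).fit()
    df = df.assign(residual=model.resid)
    # A school's value-added is taken here as the average residual of its pupils.
    return df.groupby("school_id")["residual"].mean()
```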
Case study 3: Value-added systems in English secondary education
- A range of systems exists, following pioneering work at the CEM Centre in the 1980s
- A national value-added system for accountability was introduced in 1999 and discontinued in 2010, though individual schools still receive VA data
- Range of non-governmental (NG) initiatives, including the CEM Centre and FFT
- The national system used VA and contextual value-added (CVA), based on multilevel models but not IRT (a generic two-level formulation is given below)
- The CEM Centre does use IRT in developing its baseline tests
- Data used for pupil target-setting and school improvement
- England is at the forefront of data use, but national tests and systems are not always reliable and stable
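The slides describe the national CVA system as based on multilevel models. The two-level random-intercept formulation below is illustrative of that class of model, not the official specification:

```latex
% Generic two-level contextual value-added model (illustrative, not the
% official specification): pupil i in school j.
y_{ij} = \beta_0 + \beta_1 \, \text{prior}_{ij} + \boldsymbol{\beta}_2^{\top} \mathbf{x}_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma^2_u), \quad e_{ij} \sim N(0, \sigma^2_e)
```

Here y_{ij} is the outcome of pupil i in school j, x_{ij} the contextual covariates, and the school-level residual u_j is what is reported as the school's (contextual) value-added.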
Case study 4: Value-added accountability at the teacher level: the Tennessee VAAS
- Oldest VA system in the US, started in 1992 (Tennessee Education Improvement Act)
- VA at the teacher level as well as the school level
- Based on the TCAP tests, designed using a 3-PL IRT model and aligned to the Tennessee curriculum
- Uses Sanders' layered fixed-effects model (purely test based); a simplified form is sketched below
- Used as part of teacher evaluation (35% of the total evaluation score)
- TCAP tests seen as reliable and valid; the VA model is statistically sophisticated
- Criticism of the use of data for teacher evaluation and of the lack of model transparency
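The slides do not reproduce the layered model. A commonly cited simplified form (which omits the full covariance structure and estimation details of the operational model) is that a pupil's score in year t accumulates the effects of all teachers who taught that pupil up to that year:

```latex
% Simplified layered value-added model (illustrative): the score of pupil i
% in year t layers the effects of the teachers j(i,k) who taught that pupil
% in years k = 1, ..., t on top of the year mean.
y_{it} = \mu_t + \sum_{k=1}^{t} \tau_{j(i,k)} + \varepsilon_{it}
```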
Case study 5: A new kid on the block: the New York value-added model
- Relatively recent system
- Multilevel-modelling-based VA model that includes contextual variables
- Based on standardised tests developed using a 3-PL model and aligned to the NYC curriculum
- Used for accountability at school and teacher levels, and for school improvement through the provision of reports to schools
- Individual data controversially published, and the system is seen as lacking transparency
Case study 6: International studies and their use of measurement
- Two studies looked at: TIMSS/PIRLS and PISA
- TIMSS/PIRLS: mix of tests and surveys. Tests developed using 2-PL and 3-PL IRT models (depending on item format). Matrix sampling and IRT equating are used, resulting in a plausible-values approach to deriving scores (sketched below).
- PISA: mix of tests and surveys. Tests constructed using the Rasch model. Uses multiple forms and the plausible-values approach.
- Both studies are designed to very high technical standards and are unique in the breadth and depth of their comparative international data
- Criticism of the possibility of valid comparisons, and of the uses made of the studies in educational policy contexts
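The plausible-values approach is named but not explained on the slide. The toy sketch below conveys the idea: instead of one point estimate of ability per respondent, several values are drawn from each respondent's (approximate) posterior ability distribution, so that population statistics carry measurement uncertainty through to the analysis. The normal approximation and the numbers are assumptions for illustration, not the operational TIMSS/PIRLS/PISA procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_plausible_values(posterior_mean, posterior_sd, n_pv=5):
    """Draw n_pv plausible values per respondent from a normal
    approximation to the posterior distribution of ability."""
    posterior_mean = np.asarray(posterior_mean)
    posterior_sd = np.asarray(posterior_sd)
    return rng.normal(posterior_mean[:, None], posterior_sd[:, None],
                      size=(posterior_mean.size, n_pv))

# Example: three respondents; the population mean is computed per set of
# plausible values and then averaged across the five sets.
pv = draw_plausible_values([0.2, -0.5, 1.1], [0.3, 0.4, 0.25])
print(pv.mean(axis=0))          # one population mean per plausible-value set
print(pv.mean(axis=0).mean())   # combined estimate across sets
```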
Conclusion
- VA models are increasingly combined with IRT for test development and equating
- Typically multiple goals: accountability and school improvement, and sometimes pupil advice and target-setting. This can be problematic.
- Some VA/IRT models are more reliable in terms of accountability than others, but…
- Developing VA/IRT models entails choices: What variables to include? What levels to include? What to test?
Conclusion (continued)
- Accountability is often based on overly simplistic scores, without acknowledgement of uncertainty (see the sketch below)
- Effective use of the models requires clear and accessible data presentation, and training for users
- Most useful when pupil-level data is provided as well as school- and/or teacher-level data
- Complex, but valuable!
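As a small illustration of acknowledging uncertainty, the sketch below reports a value-added estimate with a 95% confidence interval rather than as a bare score; the figures and the function are hypothetical.

```python
# Illustrative only: reporting a school's value-added estimate with a 95%
# confidence interval instead of a bare score, and flagging whether the
# estimate is statistically distinguishable from the average (zero).
# The numbers below are made up for the example.

def report_value_added(estimate: float, standard_error: float) -> str:
    lower = estimate - 1.96 * standard_error
    upper = estimate + 1.96 * standard_error
    distinguishable = lower > 0 or upper < 0
    verdict = "differs from average" if distinguishable else "not distinguishable from average"
    return f"VA = {estimate:+.2f} (95% CI {lower:+.2f} to {upper:+.2f}); {verdict}"

print(report_value_added(estimate=0.15, standard_error=0.12))
# -> VA = +0.15 (95% CI -0.09 to +0.39); not distinguishable from average
```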