Maintaining, adjusting and generalizing standards and cut-scores
Robert Coe, Durham University
Standard setting in the Nordic countries
Centre for Educational Measurement, University of Oslo (CEMO)
Oslo, 22 September 2015
@ProfCoe
Slides available at: www.twitter.com/ProfCoe

Different meanings of 'standards'
Does it test something sensible?
Is the content complex & extensive?
Are the questions hard?
Has the design/development followed the rules?
Has it been marked properly?
Are the scores reliable?
Does it actually measure the intended construct?
Can the outcomes (grades/scores) be used as desired?
Does a particular cut-point indicate the same as
– Its 'equivalent' in previous versions
– Some kind of equivalent in other assessments
– Some specified level of performance

In England we are worried about
Standards over time
Standards across qualifications, subjects, or specifications within the same broad qualification
Standards between awarding organisations, or assessment processes
Standards across countries
Standards between groups of candidates (e.g. males/females, rich/poor)

Comparability (Newton, 2010)
Candidates who score at linked (grade boundary) marks must be the same in terms of …
– the character of their attainments (phenomenal)
– the causes of their attainments (causal)
– the extent to which their attainments predict their future success (predictive)

Comparability (Coe, Newton & Elliott, 2012)
Any rational claim about the comparability of grades in different qualifications amounts to a claim that those grades can be treated as interchangeable for some purpose or interpretation.
We should talk about the comparability of grades or scores (rather than of qualifications), since these are the outcomes of an assessment that are interpreted and used.
Most interpretations of a grade achieved in an examination relate directly to the candidate: we are interested in what the grade tells us about the person who achieved it, inferring characteristics of the person from the observed performance.
Any claim about interchangeability relates to a particular construct.

Test development and standards
Theoretical:
Specify the construct
Develop the assessments to measure it
Use equating/linking procedures to link key cut-points
Candidates with linked scores are equivalent (wrt the construct)
Pragmatic:
Assessments evolve, shaped by
– Explicit constructs
– Past practice
– User requirements (wide range of different uses & purposes)
– Political drivers
– Pragmatic constraints
Comparability defined by public opinion (Cresswell, 1996, 2012)

An integration: rational and pragmatic
Consider the different ways exam results are used (interchangeably)
Identify an implied construct for each (in terms of which they are interchangeable)
Develop a defensible method for minimising unfairness and undesirable behaviour that results from these interchangeability requirements

Uses, implied constructs and interchangeability requirements (1–2)
1. Use / interpretation: The claim by teachers in the 2012 GCSE English dispute that students who met the criteria deserve a C
   Implied construct: The grade indicates specific competences within the subject domain that have been demonstrated on the assessment occasion.
   Interchangeability requirement: Performance judged to meet the same 'criteria' gets the same grade on different occasions, specifications, boards
2. Use / interpretation: The use of a B in GCSE maths as a filter for A level study in maths
   Implied construct: The grade indicates specific competences within the subject domain that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades (across occasions, specifications, boards) represent the same level of the construct (mathematics)

Uses, implied constructs and interchangeability requirements (3–4)
3. Use / interpretation: The use of '5A*-C EM' (at least 5 grade Cs inc Eng & math) at GCSE as a filter for any A level study
   Implied construct: The grade indicates competences transferable to other academic study that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of their predictions for subsequent academic outcomes.
4. Use / interpretation: Employers requiring job applicants to have '5A*-C EM'
   Implied construct: The grade indicates competences transferable to employment contexts that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in maths and English (across occasions, specifications, boards) must predict the same level of relevant, reproducible workplace competences.

Uses, implied constructs and interchangeability requirements (5–6)
5. Use / interpretation: Use of GCSE results in league tables to judge schools
   Implied construct: Average grades for a class or school (especially if referenced against prior attainment) indicate the impact (and hence quality) of the teaching experienced.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of some measure of the teaching (quality and quantity) that is typically (after controlling for pre-existing or irrelevant differences) associated with those outcomes.
6. Use / interpretation: Comparison of GCSE results of different types of school to justify impact of policy
   Implied construct: Average grades across the jurisdiction indicate the impact (and hence quality) of the system's schooling provision.
   Interchangeability requirement: As in 5

Grade C in GCSE French could be made comparable to the same grade
In French in previous years (or parallel specifications), in terms of what is specified to be demonstrated
In French in previous years (or parallel specifications), or in other languages, in terms of the candidate's ability to communicate in the target language
In other (academic) subjects, in terms of their prediction of subsequent attainment in other (academic) subjects
In other subjects, in terms of how hard it is to get students to reach this level

It follows that …
We cannot talk about standards (setting or maintaining) until we decide which of these uses/interpretations we want to support
In at least some cases the different uses/interpretations will be incompatible
If we want the 'standard' to be captured in the outcome (score/grade) we have to prioritise (or optimise)
Alternatively, we can use different equivalences for different uses

A level data (chart slides)

A taxonomy of standard setting and maintaining methods (from Coe & Walker, 2013)

Judgement-based methods
Criterion-based judgement
– Judgement against specific competences
– Judgement against overall grade descriptors
Item-based judgement
– Angoff method (see the sketch after this list)
– Bookmark method
Comparative judgement
– Cross-moderation
– Paired comparison
Judgement of demand
– CRAS (complexity, resources, abstractness, strategies)
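
As an illustration of the Angoff method listed above (this does not appear in the slides; the panel, the five-item test and all the ratings are invented), the sketch below shows the basic arithmetic: each judge estimates the probability that a just-qualified (borderline) candidate would answer each item correctly, and the cut score is the sum of the item means across judges.

```python
# Minimal Angoff sketch with hypothetical ratings: judge -> per-item probability
# that a borderline candidate answers correctly (5-item test assumed).
from statistics import mean

ratings = {
    "judge_1": [0.80, 0.60, 0.55, 0.40, 0.70],
    "judge_2": [0.75, 0.65, 0.50, 0.45, 0.65],
    "judge_3": [0.85, 0.55, 0.60, 0.35, 0.75],
}

n_items = len(next(iter(ratings.values())))
# Average the judges' estimates item by item.
item_means = [mean(r[i] for r in ratings.values()) for i in range(n_items)]

# The cut score is the expected raw score of a borderline candidate.
cut_score = sum(item_means)
print(f"Angoff cut score: {cut_score:.1f} out of {n_items}")
```

The Bookmark method, by contrast, would typically present the items ordered by difficulty (often from an IRT calibration) and ask each judge to place a marker at the last item a borderline candidate would be expected to answer correctly at a chosen response probability.
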
Equating methods
Classical equating models (sketched after this list)
– Linear equating
– Equipercentile equating
IRT equating
– Rasch model
– Other IRT models
Equating designs
– Equivalent groups
– Common persons
– Common items
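
By way of illustration only (nothing below comes from the talk; the score distributions are simulated under an assumed equivalent-groups design), the two classical models treat a cut-point differently: linear equating matches the mean and standard deviation of the two forms, while equipercentile equating matches percentile ranks.

```python
# Sketch of linear and equipercentile equating for a single cut-point,
# using simulated raw scores for two test forms taken by equivalent groups.
import numpy as np

rng = np.random.default_rng(0)
form_x = rng.normal(50, 10, 2000)   # hypothetical scores on the new form
form_y = rng.normal(53, 12, 2000)   # hypothetical scores on the reference form

def linear_equate(x, ref_x, ref_y):
    """Place score x from form X onto the form Y scale by matching mean and SD."""
    return ref_y.mean() + ref_y.std() / ref_x.std() * (x - ref_x.mean())

def equipercentile_equate(x, ref_x, ref_y):
    """Map x to the form Y score with the same percentile rank."""
    pct = (ref_x < x).mean() * 100        # percentile rank of x on form X
    return np.percentile(ref_y, pct)      # form Y score at that percentile

cut_x = 60                                # a cut-point on the new form
print("linear:        ", round(linear_equate(cut_x, form_x, form_y), 1))
print("equipercentile:", round(equipercentile_equate(cut_x, form_x, form_y), 1))
```

With roughly normal distributions the two answers are close; when the forms differ in shape as well as difficulty, equipercentile (or IRT-based) equating is usually preferred.
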
Linking/comparability methods
Reference/anchor test
– Concurrent
– Prior
Common candidate methods
– Subject pairs
– Subject matrix
– Latent trait
Pre-testing designs (when high-stakes & released)
– Live testing with additional future trial test items
– Random future test versions within live testing
– Low-stakes pre-testing two versions in counterbalanced trial
– Low-stakes pre-testing with an anchor test
Norm/cohort referencing (sketched after this list)
– Pure cohort referencing
– Adjusted cohort referencing
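
A minimal sketch of pure cohort referencing, again with invented figures: grade boundaries are set so that fixed proportions of the current cohort reach each grade, which keeps outcomes stable by construction but lets the performance standard drift with the cohort. Adjusted cohort referencing would instead modify the target proportions using evidence about the cohort, for example its prior attainment.

```python
# Pure cohort (norm) referencing sketch: boundaries are the score quantiles
# that give assumed target proportions of the cohort at or above each grade.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(55, 15, 5000).clip(0, 100)   # hypothetical cohort of raw scores

# Assumed cumulative proportions achieving each grade or better.
targets = {"A": 0.20, "B": 0.45, "C": 0.70}

boundaries = {
    grade: float(np.quantile(scores, 1 - prop))  # score with (1 - prop) of cohort below it
    for grade, prop in targets.items()
}
print(boundaries)
```
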