Maintaining, adjusting and generalizing standards and cut-scores
Robert Coe, Durham University
Standard setting in the Nordic countries
Centre for Educational Measurement, University of Oslo (CEMO)
Oslo, 22 September 2015
@ProfCoe
Slides available at: www.twitter.com/ProfCoe

Different meanings of 'standards'
Does it test something sensible?
Is the content complex & extensive?
Are the questions hard?
Has the design/development followed the rules?
Has it been marked properly?
Are the scores reliable?
Does it actually measure the intended construct?
Can the outcomes (grades/scores) be used as desired?
Does a particular cut-point indicate the same as
– Its 'equivalent' in previous versions
– Some kind of equivalent in other assessments
– Some specified level of performance

In England we are worried about
Standards over time
Standards across qualifications, subjects, or specifications within the same broad qualification
Standards between awarding organisations, or assessment processes
Standards across countries
Standards between groups of candidates (e.g. males/females, rich/poor)

Comparability (Newton, 2010)
Candidates who score at linked (grade boundary) marks must be the same in terms of …
– the character of their attainments (phenomenal)
– the causes of their attainments (causal)
– the extent to which their attainments predict their future success (predictive)

Comparability (Coe, Newton & Elliott, 2012)
Any rational claim about the comparability of grades in different qualifications amounts to a claim that those grades can be treated as interchangeable for some purpose or interpretation.
We should talk about the comparability of grades or scores (rather than of qualifications), since these are the outcomes of an assessment that are interpreted and used.
Most interpretations of a grade achieved in an examination relate directly to the candidate: we are interested in what the grade tells us about the person who achieved it, inferring characteristics of the person from the observed performance.
Any claim about interchangeability relates to a particular construct.

Test development and standards
Theoretical:
Specify the construct
Develop the assessments to measure it
Use equating/linking procedures to link key cut-points
Candidates with linked scores are equivalent (wrt the construct)
Pragmatic:
Assessments evolve, shaped by
– Explicit constructs
– Past practice
– User requirements (wide range of different uses & purposes)
– Political drivers
– Pragmatic constraints
Comparability defined by public opinion (Cresswell, 1996, 2012)

An integration: rational and pragmatic
Consider the different ways exam results are used (interchangeably)
Identify an implied construct for each (in terms of which they are interchangeable)
Develop a defensible method for minimising unfairness and undesirable behaviour that results from these interchangeability requirements

Uses, implied constructs and interchangeability requirements (1–2)
1. Use / interpretation: The claim by teachers in the 2012 GCSE English dispute that students who met the criteria deserve a C
   Implied construct: The grade indicates specific competences within the subject domain that have been demonstrated on the assessment occasion.
   Interchangeability requirement: Performance judged to meet the same 'criteria' gets the same grade on different occasions, specifications, boards
2. Use / interpretation: The use of a B in GCSE maths as a filter for A level study in maths
   Implied construct: The grade indicates specific competences within the subject domain that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades (across occasions, specifications, boards) represent the same level of the construct (mathematics)

Uses, implied constructs and interchangeability requirements (3–4)
3. Use / interpretation: The use of '5A*-C EM' (at least 5 grade Cs inc Eng & math) at GCSE as a filter for any A level study
   Implied construct: The grade indicates competences transferable to other academic study that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of their predictions for subsequent academic outcomes.
4. Use / interpretation: Employers requiring job applicants to have '5A*-C EM'
   Implied construct: The grade indicates competences transferable to employment contexts that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in maths and English (across occasions, specifications, boards) must predict the same level of relevant, reproducible workplace competences.

Uses, implied constructs and interchangeability requirements (5–6)
5. Use / interpretation: Use of GCSE results in league tables to judge schools
   Implied construct: Average grades for a class or school (especially if referenced against prior attainment) indicate the impact (and hence quality) of the teaching experienced.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of some measure of the teaching (quality and quantity) that is typically (after controlling for pre-existing or irrelevant differences) associated with those outcomes.
6. Use / interpretation: Comparison of GCSE results of different types of school to justify impact of policy
   Implied construct: Average grades across the jurisdiction indicate the impact (and hence quality) of the system's schooling provision.
   Interchangeability requirement: As in 5

Grade C in GCSE French could be made comparable to the same grade
In French in previous years (or parallel specifications), in terms of what is specified to be demonstrated
In French in previous years (or parallel specifications), or in other languages, in terms of the candidate's ability to communicate in the target language
In other (academic) subjects, in terms of their prediction of subsequent attainment in other (academic) subjects
In other subjects, in terms of how hard it is to get students to reach this level

It follows that …
We cannot talk about standards (setting or maintaining) until we decide which of these uses/interpretations we want to support
In at least some cases the different uses/interpretations will be incompatible
If we want the 'standard' to be captured in the outcome (score/grade) we have to prioritise (or optimise)
Alternatively, we can use different equivalences for different uses

A level data (chart slides)

A taxonomy of standard setting and maintaining methods (from Coe & Walker, 2013)

Judgement-based methods
Criterion-based judgement
– Judgement against specific competences
– Judgement against overall grade descriptors
Item-based judgement
– Angoff method (see the sketch after this list)
– Bookmark method
Comparative judgement
– Cross-moderation
– Paired comparison
Judgement of demand
– CRAS (complexity, resources, abstractness, strategies)
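
As an illustration of the Angoff method listed above (this does not appear in the slides; the panel, the five-item test and all the ratings are invented), the sketch below shows the basic arithmetic: each judge estimates the probability that a just-qualified (borderline) candidate would answer each item correctly, and the cut score is the sum of the item means across judges.

```python
# Minimal Angoff sketch with hypothetical ratings: judge -> per-item probability
# that a borderline candidate answers correctly (5-item test assumed).
from statistics import mean

ratings = {
    "judge_1": [0.80, 0.60, 0.55, 0.40, 0.70],
    "judge_2": [0.75, 0.65, 0.50, 0.45, 0.65],
    "judge_3": [0.85, 0.55, 0.60, 0.35, 0.75],
}

n_items = len(next(iter(ratings.values())))
# Average the judges' estimates item by item.
item_means = [mean(r[i] for r in ratings.values()) for i in range(n_items)]

# The cut score is the expected raw score of a borderline candidate.
cut_score = sum(item_means)
print(f"Angoff cut score: {cut_score:.1f} out of {n_items}")
```

The Bookmark method, by contrast, would typically present the items ordered by difficulty (often from an IRT calibration) and ask each judge to place a marker at the last item a borderline candidate would be expected to answer correctly at a chosen response probability.
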
Equating methods
Classical equating models (sketched after this list)
– Linear equating
– Equipercentile equating
IRT equating
– Rasch model
– Other IRT models
Equating designs
– Equivalent groups
– Common persons
– Common items
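
By way of illustration only (nothing below comes from the talk; the score distributions are simulated under an assumed equivalent-groups design), the two classical models treat a cut-point differently: linear equating matches the mean and standard deviation of the two forms, while equipercentile equating matches percentile ranks.

```python
# Sketch of linear and equipercentile equating for a single cut-point,
# using simulated raw scores for two test forms taken by equivalent groups.
import numpy as np

rng = np.random.default_rng(0)
form_x = rng.normal(50, 10, 2000)   # hypothetical scores on the new form
form_y = rng.normal(53, 12, 2000)   # hypothetical scores on the reference form

def linear_equate(x, ref_x, ref_y):
    """Place score x from form X onto the form Y scale by matching mean and SD."""
    return ref_y.mean() + ref_y.std() / ref_x.std() * (x - ref_x.mean())

def equipercentile_equate(x, ref_x, ref_y):
    """Map x to the form Y score with the same percentile rank."""
    pct = (ref_x < x).mean() * 100        # percentile rank of x on form X
    return np.percentile(ref_y, pct)      # form Y score at that percentile

cut_x = 60                                # a cut-point on the new form
print("linear:        ", round(linear_equate(cut_x, form_x, form_y), 1))
print("equipercentile:", round(equipercentile_equate(cut_x, form_x, form_y), 1))
```

With roughly normal distributions the two answers are close; when the forms differ in shape as well as difficulty, equipercentile (or IRT-based) equating is usually preferred.
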
Linking/comparability methods
Reference/anchor test
– Concurrent
– Prior
Common candidate methods
– Subject pairs
– Subject matrix
– Latent trait
Pre-testing designs (when high-stakes & released)
– Live testing with additional future trial test items
– Random future test versions within live testing
– Low-stakes pre-testing two versions in counterbalanced trial
– Low-stakes pre-testing with an anchor test
Norm/cohort referencing (sketched after this list)
– Pure cohort referencing
– Adjusted cohort referencing
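
A minimal sketch of pure cohort referencing, again with invented figures: grade boundaries are set so that fixed proportions of the current cohort reach each grade, which keeps outcomes stable by construction but lets the performance standard drift with the cohort. Adjusted cohort referencing would instead modify the target proportions using evidence about the cohort, for example its prior attainment.

```python
# Pure cohort (norm) referencing sketch: boundaries are the score quantiles
# that give assumed target proportions of the cohort at or above each grade.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(55, 15, 5000).clip(0, 100)   # hypothetical cohort of raw scores

# Assumed cumulative proportions achieving each grade or better.
targets = {"A": 0.20, "B": 0.45, "C": 0.70}

boundaries = {
    grade: float(np.quantile(scores, 1 - prop))  # score with (1 - prop) of cohort below it
    for grade, prop in targets.items()
}
print(boundaries)
```
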