Maintaining, adjusting and generalizing standards and cut-scores. Robert Coe, Durham University. Standard setting in the Nordic countries, Centre for Educational Measurement, University of Oslo (CEMO).

Presentation transcript:

Maintaining, adjusting and generalizing standards and cut-scores. Robert Coe, Durham University. Standard setting in the Nordic countries, Centre for Educational Measurement, University of Oslo (CEMO). Oslo, 22 September. Slides available at:

Different meanings of 'standards'
- Does it test something sensible?
- Is the content complex & extensive?
- Are the questions hard?
- Has the design/development followed the rules?
- Has it been marked properly?
- Are the scores reliable?
- Does it actually measure the intended construct?
- Can the outcomes (grades/scores) be used as desired?
- Does a particular cut-point indicate the same as:
  - its 'equivalent' in previous versions
  - some kind of equivalent in other assessments
  - some specified level of performance

In England we are worried about
- Standards over time
- Standards across qualifications, subjects, or specifications within the same broad qualification
- Standards between awarding organisations, or assessment processes
- Standards across countries
- Standards between groups of candidates (e.g. males/females, rich/poor)

Comparability (Newton, 2010)
Candidates who score at linked (grade boundary) marks must be the same in terms of:
- the character of their attainments (phenomenal)
- the causes of their attainments (causal)
- the extent to which their attainments predict their future success (predictive)

Comparability (Coe, Newton & Elliott, 2012)
- Any rational claim about the comparability of grades in different qualifications amounts to a claim that those grades can be treated as interchangeable for some purpose or interpretation.
- We should talk about the comparability of grades or scores (rather than of qualifications), since these are the outcomes of an assessment that are interpreted and used.
- Most interpretations of a grade achieved in an examination relate directly to the candidate: we are interested in what the grade tells us about the person who achieved it, inferring characteristics of the person from the observed performance.
- Any claim about interchangeability relates to a particular construct.

Test development and standards

Theoretical route:
- Specify the construct
- Develop the assessments to measure it
- Use equating/linking procedures to link key cut-points
- Candidates with linked scores are equivalent (with respect to the construct)

Pragmatic route:
- Assessments evolve, shaped by:
  - explicit constructs
  - past practice
  - user requirements (a wide range of different uses & purposes)
  - political drivers
  - pragmatic constraints
- Comparability is defined by public opinion (Cresswell, 1996, 2012)

An integration: rational and pragmatic
- Consider the different ways exam results are used (interchangeably)
- Identify an implied construct for each (in terms of which the results are interchangeable)
- Develop a defensible method for minimising the unfairness and undesirable behaviour that result from these interchangeability requirements

Uses, implied constructs and interchangeability requirements

1. Use/interpretation: The claim by teachers in the 2012 GCSE English dispute that students who met the criteria deserve a C.
   Implied construct: The grade indicates specific competences within the subject domain that have been demonstrated on the assessment occasion.
   Interchangeability requirement: Performance judged to meet the same 'criteria' gets the same grade across occasions, specifications and boards.

2. Use/interpretation: The use of a B in GCSE maths as a filter for A level study in maths.
   Implied construct: The grade indicates specific competences within the subject domain that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades (across occasions, specifications and boards) represent the same level of the construct (mathematics).

3. Use/interpretation: The use of '5A*-C EM' (at least 5 grade Cs including English & maths) at GCSE as a filter for any A level study.
   Implied construct: The grade indicates competences transferable to other academic study that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of their predictions for subsequent academic outcomes.

4. Use/interpretation: Employers requiring job applicants to have '5A*-C EM'.
   Implied construct: The grade indicates competences transferable to employment contexts that the candidate is likely to be able to reproduce in the future.
   Interchangeability requirement: Grades achieved in maths and English (across occasions, specifications and boards) must predict the same level of relevant, reproducible workplace competences.

5. Use/interpretation: Use of GCSE results in league tables to judge schools.
   Implied construct: Average grades for a class or school (especially if referenced against prior attainment) indicate the impact (and hence quality) of the teaching experienced.
   Interchangeability requirement: Grades achieved in different combinations of subjects and other allowable qualifications must be equivalent in terms of some measure of the teaching (quality and quantity) that is typically (after controlling for pre-existing or irrelevant differences) associated with those outcomes.

6. Use/interpretation: Comparison of GCSE results of different types of school to justify the impact of policy.
   Implied construct: Average grades across the jurisdiction indicate the impact (and hence quality) of the system's schooling provision.
   Interchangeability requirement: As in 5.

Grade C in GCSE French could be made comparable to the same grade:
- In French in previous years (or parallel specifications), in terms of what is specified to be demonstrated
- In French in previous years (or parallel specifications), or in other languages, in terms of the candidate's ability to communicate in the target language
- In other (academic) subjects, in terms of their prediction of subsequent attainment in other (academic) subjects
- In other subjects, in terms of how hard it is to get students to reach this level

It follows that …
- We cannot talk about standards (setting or maintaining) until we decide which of these uses/interpretations we want to support
- In at least some cases the different uses/interpretations will be incompatible
- If we want the 'standard' to be captured in the outcome (score/grade), we have to prioritise (or optimise)
- Alternatively, we can use different equivalences for different uses

A level data (chart slides; no recoverable text)

A taxonomy of standard setting and maintaining methods (from Coe & Walker, 2013)

Judgement-based methods
- Criterion-based judgement
  - Judgement against specific competences
  - Judgement against overall grade descriptors
- Item-based judgement
  - Angoff method
  - Bookmark method
- Comparative judgement
  - Cross-moderation
  - Paired comparison
- Judgement of demand
  - CRAS (complexity, resources, abstractness, strategies)
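Of these, the Angoff method is the most mechanical to compute: each judge estimates, for every item, the probability that a minimally competent candidate would answer it correctly; a judge's implied cut-score is the sum of their item ratings, and the panel cut-score is the mean across judges. A minimal sketch (the panel size, item count and ratings are invented for illustration):

```python
# Angoff standard setting: each judge rates P(minimally competent
# candidate answers the item correctly) for every item on the test.

# Hypothetical ratings: 3 judges x 4 items, probabilities in [0, 1].
ratings = [
    [0.8, 0.6, 0.4, 0.7],  # judge 1
    [0.7, 0.5, 0.5, 0.6],  # judge 2
    [0.9, 0.6, 0.3, 0.8],  # judge 3
]

# Each judge's implied cut-score is the sum of their ratings.
judge_cut_scores = [sum(judge) for judge in ratings]

# The panel cut-score is the mean of the judges' cut-scores.
panel_cut_score = sum(judge_cut_scores) / len(judge_cut_scores)

print([round(s, 2) for s in judge_cut_scores])  # [2.5, 2.3, 2.6]
print(round(panel_cut_score, 2))                # 2.47
```

In practice the panel would also see impact data and discuss discrepant ratings before the cut-score is fixed; the arithmetic above is only the core of the method.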

Equating methods
- Classical equating models
  - Linear equating
  - Equipercentile equating
- IRT equating
  - Rasch model
  - Other IRT models
- Equating designs
  - Equivalent groups
  - Common persons
  - Common items
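Linear equating, the simplest of these models, can be sketched in a few lines: a raw score y on form Y is mapped onto form X's scale by matching the two forms' means and standard deviations, x = mu_X + (sd_X / sd_Y)(y − mu_Y). Equipercentile equating instead matches the full score distributions, not just the first two moments. The score data below are invented, standing in for an equivalent-groups design:

```python
import statistics

# Linear equating under an equivalent-groups design:
#   x = mu_X + (sd_X / sd_Y) * (y - mu_Y)
# maps a raw score y on form Y onto the scale of form X.

# Hypothetical raw scores from two equivalent groups of candidates.
form_x = [12, 15, 18, 20, 22, 25, 28, 30]
form_y = [10, 12, 14, 16, 18, 20, 22, 24]

mu_x, sd_x = statistics.mean(form_x), statistics.pstdev(form_x)
mu_y, sd_y = statistics.mean(form_y), statistics.pstdev(form_y)

def linear_equate(y):
    """Return the form X equivalent of a form Y score."""
    return mu_x + (sd_x / sd_y) * (y - mu_y)

# A grade boundary of 17 marks on form Y (its mean) lands at the
# mean of form X, preserving the candidates' relative standing:
print(round(linear_equate(17), 2))  # 21.25
```

Because the transformation is linear, a cut-point one standard deviation above the form Y mean maps to one standard deviation above the form X mean, which is exactly the sense in which linked scores are claimed to be equivalent.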

Linking/comparability methods
- Reference/anchor test
  - Concurrent
  - Prior
- Common candidate methods
  - Subject pairs
  - Subject matrix
  - Latent trait
- Pre-testing designs (when tests are high-stakes & released)
  - Live testing with additional future trial test items
  - Random future test versions within live testing
  - Low-stakes pre-testing of two versions in a counterbalanced trial
  - Low-stakes pre-testing with an anchor test
- Norm/cohort referencing
  - Pure cohort referencing
  - Adjusted cohort referencing
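Pure cohort referencing, the last family above, is the simplest to make concrete: grade boundaries are placed so that fixed proportions of the cohort reach each grade, whatever the raw score distribution looks like that year. A sketch (the cohort, mark range and grade proportions are all invented for illustration):

```python
import random

# Pure cohort referencing: boundaries are set so that fixed
# proportions of the cohort achieve each grade, regardless of
# how the cohort actually performed in raw-score terms.

random.seed(0)  # reproducible hypothetical cohort of 1,000 marks
cohort_marks = [random.randint(0, 100) for _ in range(1000)]

def percentile_cut(marks, top_fraction):
    """Lowest mark reached by (at least) the top `top_fraction` of the cohort."""
    ordered = sorted(marks)
    return ordered[int(len(ordered) * (1 - top_fraction))]

cut_a = percentile_cut(cohort_marks, 0.20)  # top 20% get an A
cut_b = percentile_cut(cohort_marks, 0.50)  # top 50% get a B or above

print(cut_a >= cut_b)  # True: a higher grade needs a higher mark
```

Adjusted cohort referencing tempers this by letting the proportions themselves shift in line with evidence about the cohort, for example its prior attainment, rather than holding them fixed from year to year.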