Designing an assessment system

Presentation transcript:

Designing an assessment system
Presentation to the Scottish Qualifications Authority, August 2007
Dylan Wiliam, Institute of Education, University of London
www.dylanwiliam.net

Overview
The purposes of assessment
The structure of the assessment system
The locus of assessment
The extensiveness of the assessment
Assessment format
Scoring models
Quality issues
The role of teachers
Contextual issues

Functions of assessment
Three functions of assessment:
  For evaluating institutions (evaluative)
  For describing individuals (summative)
  For supporting learning:
    Monitoring learning: whether learning is taking place
    Diagnosing (informing) learning: what is not being learnt
    Forming learning: what to do about it
No system can easily support all three functions
Traditionally, we have grouped the first two, and ignored the third
Learning is sidelined; the summative and evaluative functions are weakened
Instead, we need to separate the first (evaluative) from the other two

The Lake Wobegon effect
"All the women are strong, all the men are good-looking, and all the children are above average." (Garrison Keillor)
[Chart: scores plotted against time]

Goodhart’s law
All performance indicators lose their usefulness when used as objects of policy:
  Privatization of British Rail
  Targets in the Health Service
  “Bubble” students in high-stakes settings

Reconciling different pressures
The “high-stakes” genie is out of the bottle, and we cannot put it back
The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything
The only thing left to us is to try to develop “tests worth teaching to”
This is fundamentally an issue of validity

Validity
Validity is a property of inferences, not of assessments
“One validates, not a test, but an interpretation of data arising from a specified procedure” (Cronbach, 1971; emphasis in original)
No such thing as a valid (or indeed invalid) assessment
No such thing as a biased assessment
A pons asinorum for thinking about assessment

Threats to validity
Inadequate reliability
Construct-irrelevant variance: the assessment includes aspects that are irrelevant to the construct of interest (the assessment is “too big”)
Construct under-representation: the assessment fails to include important aspects of the construct of interest (the assessment is “too small”)
With clear construct definition, all of these are technical issues, not value issues

Two key challenges
Construct-irrelevant variance: sensitivity to instruction
Construct under-representation: extensiveness of assessment

Sensitivity to instruction (1)
[Chart: distributions of attainment one year apart on an item highly sensitive to instruction]

Sensitivity to instruction (2)
[Chart: distributions of attainment one year apart on an item moderately sensitive to instruction]

Sensitivity to instruction (3)
[Chart: distributions of attainment one year apart on an item relatively insensitive to instruction]

Sensitivity to instruction (4)
[Chart: distributions of attainment one year apart on an item completely insensitive to instruction]

Consequences (1)

Consequences (2)

Consequences (3)

Insensitivity to instruction
Primarily attributable to the fact that learning is slower than assumed
Exacerbated by the normal mechanisms of test development
Leads to erroneous attributions about the effects of schooling

A sensitivity to instruction index
Test                         Sensitivity index
IQ-type test (insensitive)
NAEP                         6
TIMSS                        8
ETS “STEP” tests (1957)
ITBS                         10
Completely sensitive test    100
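The presentation does not define how this index is calculated, so the sketch below is not the method behind the table. It is a purely illustrative assumption: one simple way to express sensitivity to instruction is the standardized gain between score distributions collected a year apart, echoing the charts above. The function name, the 0-10 score scale, and the example data are all hypothetical.

```python
# Illustrative sketch only: a standardized-gain measure of sensitivity to
# instruction, NOT the index reported in the table above (which the
# presentation does not define).

from statistics import mean, stdev

def standardized_gain(scores_year_1, scores_year_2):
    """Mean gain over one year, in pooled standard-deviation units.

    Values near 0 suggest an item is insensitive to a year of instruction;
    larger values suggest greater sensitivity.
    """
    pooled_sd = stdev(scores_year_1 + scores_year_2)
    if pooled_sd == 0:
        return 0.0
    return (mean(scores_year_2) - mean(scores_year_1)) / pooled_sd

# Hypothetical item-level scores (0-10) for the same cohort, one year apart.
insensitive_item = standardized_gain([4, 5, 6, 5, 4], [5, 4, 6, 5, 5])
sensitive_item = standardized_gain([2, 3, 2, 4, 3], [7, 8, 6, 8, 7])
print(round(insensitive_item, 2), round(sensitive_item, 2))  # small vs. large gain
```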

Extensiveness of assessment
Using teacher assessment in certification is attractive:
  Increases reliability (increased test time; see the sketch following this slide)
  Increases validity (addresses aspects of construct under-representation)
But problematic:
  Lack of trust (“Fox guarding the hen house”)
  Problems of biased inferences (construct-irrelevant variance)
  Can introduce new kinds of construct under-representation
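The reliability point can be made concrete with the standard Spearman-Brown prophecy formula, which predicts the reliability of a lengthened test from the reliability of the original. The presentation does not cite this formula; the sketch below, with an arbitrary starting reliability of 0.75, is simply one conventional way to quantify why adding teacher-assessed evidence (in effect, more test time) raises reliability.

```python
# Sketch: the Spearman-Brown prophecy formula, used here to illustrate the
# "increased test time -> increased reliability" claim. The 0.75 figure and
# the doubling factor are arbitrary examples, not values from the presentation.

def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor,
    assuming the added material is parallel to the original."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A 0.75-reliable exam, doubled in effective length by adding comparable
# teacher-assessed evidence, would be predicted to reach about 0.86.
print(round(spearman_brown(0.75, 2), 2))
```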

The challenge
To design an assessment system that is:
  Distributed, so that evidence collection is not undertaken entirely at the end
  Synoptic, so that learning has to accumulate

A possible model
All students are assessed at test time
Different students in the same class are assigned different tasks
The performance of the class defines an “envelope” of scores, e.g.:
  Advanced: 5 students
  Proficient: 8 students
  Basic: 10 students
  Below basic: 2 students
The teacher allocates levels to individual students on the basis of whole-year performance (see the sketch following this slide)
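Below is a minimal sketch of how the envelope model could work in practice, under the assumption (not spelled out on the slide) that the class's test performance fixes only how many students may be awarded each level, and the teacher then decides which students receive them using whole-year evidence. The function, class size, and student names are hypothetical.

```python
# Sketch of the "envelope" allocation described above: the test yields a quota
# of levels for the class; the teacher's whole-year ranking decides which
# students get them. Assumed interpretation, not a specified procedure.

def allocate_levels(envelope, whole_year_ranking):
    """Assign levels to students.

    envelope: mapping of level -> number of students permitted at that level,
        listed best level first (plain dicts keep insertion order in Python 3.7+).
    whole_year_ranking: student names ordered by the teacher's judgement of
        whole-year performance, strongest first.
    """
    allocation = {}
    position = 0
    for level, quota in envelope.items():
        for student in whole_year_ranking[position:position + quota]:
            allocation[student] = level
        position += quota
    return allocation

# Hypothetical class of 25, using the envelope from the slide.
envelope = {"Advanced": 5, "Proficient": 8, "Basic": 10, "Below basic": 2}
students = [f"student_{i:02d}" for i in range(1, 26)]  # already ranked by the teacher
print(allocate_levels(envelope, students)["student_06"])  # -> "Proficient"
```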

Benefits and problems
Benefits:
  The only way to teach to the test is to improve everyone’s performance on everything (which is what we want!)
  Validity and reliability are enhanced
Problems:
  Students’ scores are not “inspectable”
  Assumes student motivation

The effects of context
Beliefs about what constitutes learning
Beliefs in the reliability and validity of the results of various tools
A preference for and trust in numerical data, with a bias towards a single number
Trust in the judgments and integrity of the teaching profession
Belief in the value of competition between students
Belief in the value of competition between schools
Belief that test results measure school effectiveness
Fear of national economic decline and education’s role in this
Belief that the key to schools’ effectiveness is strong top-down management

Conclusion
There is no “perfect” assessment system anywhere.
Each nation’s assessment system is exquisitely tuned to local constraints and affordances.
Assessment practices have impacts on teaching and learning which may be strongly amplified or attenuated by the national context.
The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as it is by educational evidence and values.

Conclusion (2)
It is probably idle to draw up maps for the ideal assessment policy for a country, even though the principles and the evidence to support such an ideal might be clearly agreed within the ‘expert’ community.
Instead, focus on those arguments and initiatives which are least offensive to existing assumptions and beliefs, and which will nevertheless serve to catalyze a shift in them while at the same time improving some aspects of present practice.

Questions? Comments?
Institute of Education, University of London
20 Bedford Way, London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email info@ioe.ac.uk