What kinds of assessment support learning of key competences? Dylan Wiliam EC seminar on the assessment of key competences Brussels, Belgium, 15 October

Overview of presentation
- Functions of assessment
  - Evaluative
  - Summative
  - Formative
- Validity and the consequences of assessment
- Formative assessment
- Designing systems for assessing key competences

Functions of assessment
- Evaluative (E): for evaluating institutions, curricula and organizations
- Summative (S): for describing individuals
- Formative (F): for supporting learning

Examples of assessment systems
- E: NAEP, "No Child Left Behind"
- S: Baccalaureat, Abitur, Matura
- E+S: GCSE (England)
- E+S+F: National Curriculum Assessment (England)

Validity
- Validity is a property of inferences, not of assessments
  - "One validates, not a test, but an interpretation of data arising from a specified procedure" (Cronbach, 1971; emphasis in original)
- The phrase "a valid test" is therefore a category error (like "a happy rock")
  - No such thing as a valid (or indeed invalid) assessment
  - No such thing as a biased assessment
- Reliability is a prerequisite for validity
  - Talking about "reliability and validity" is like talking about "swallows and birds"
  - Validity includes reliability

Modern conceptions of validity
- Validity subsumes all aspects of assessment quality
  - Reliability
  - Representativeness (content coverage)
  - Relevance
  - Predictiveness
- But not impact (Popham: right concern, wrong concept)
- "Validity is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (Messick, 1989, p. 13)

Threats to validity
- Inadequate reliability
- Construct-irrelevant variance
  - Differences in scores are caused, in part, by differences not relevant to the construct of interest
  - The assessment assesses things it shouldn't
  - The assessment is "too big"
- Construct under-representation
  - Differences in the construct are not reflected in scores
  - The assessment doesn't assess things it should
  - The assessment is "too small"
- With clear construct definition, all of these are technical issues, not value issues

Be careful what you wish for…
- Campbell's law (US) | Goodhart's law (UK): "All performance indicators lose their usefulness when used as objects of policy"
- The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything
- Where the evaluative function is paramount, the challenge is to find "tests worth teaching to"

The Lake Wobegon effect revisited
"All the women are strong, all the men are good-looking, and all the children are above average." (Garrison Keillor)

Achievement of English 16-year-olds (chart)

Consequential validity? No such thing!
"As has been stressed several times already, it is not that adverse social consequences of test use render the use invalid, but, rather, that adverse social consequences should not be attributable to any source of test invalidity such as construct-irrelevant variance. If the adverse social consequences are empirically traceable to sources of test invalidity, then the validity of the test use is jeopardized. If the social consequences cannot be so traced—or if the validation process can discount sources of test invalidity as the likely determinants, or at least render them less plausible—then the validity of the test use is not overturned. Adverse social consequences associated with valid test interpretation and use may implicate the attributes validly assessed, to be sure, as they function under the existing social conditions of the applied setting, but they are not in themselves indicative of invalidity." (Messick, 1989)

Centrality of construct definition
- Construct definition is essential to effective assessment
- Allows a clear distinction between adverse impact and bias
  - (and anyway, bias is a property of inferences, not of instruments)
- Examples of how construct definition distinguishes impact and bias
  - Mental rotation of three-dimensional solids
  - Testing for admission to higher education
  - Testing of English language learners

A brief history of formative assessment
"Formative assessment" has been used to describe:
- The time at which the assessment is scheduled
  - Any assessment taken before the last one
- A purpose for assessing
  - "Assessment for learning"
- A function that the assessment outcomes serve
  - Assessments that change teaching
  - Formative use of assessments

Feedback metaphor
Feedback in engineering:
- Positive feedback: leads to explosive increase or collapse (bad!)
- Negative feedback: leads to asymptotic convergence to, or damped oscillation about, a stable equilibrium
Components of a feedback system:
- data on the actual level of some measurable attribute;
- data on the reference level of that attribute;
- a mechanism for comparing the two levels and generating information about the 'gap' between the two levels;
- a mechanism by which the information can be used to alter the gap.
To an engineer, information is therefore feedback only if the information fed back is used in reducing the gap between actual and desired states (a minimal simulation of this loop is sketched below).
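As an illustrative sketch only (not part of the original slides), the short Python loop below simulates this negative-feedback cycle: the actual level is repeatedly compared with the reference level, and the resulting gap information is used to move the actual level toward the reference. The names negative_feedback, actual, reference and gain are hypothetical.

```python
# A minimal negative-feedback loop: compare the actual level with a reference
# level and use the gap to move the actual level toward the reference.
# All names and numbers are illustrative, not taken from the slides.

def negative_feedback(actual, reference, gain=0.5, steps=10):
    """Return the trajectory of 'actual' as proportional corrections are applied."""
    history = [actual]
    for _ in range(steps):
        gap = reference - actual   # information about the gap between the two levels
        actual += gain * gap       # the gap information is used to alter the gap
        history.append(actual)
    return history

if __name__ == "__main__":
    print(negative_feedback(actual=2.0, reference=10.0))
    # The values converge asymptotically toward 10.0; with 1 < gain < 2 they
    # oscillate toward it, and flipping the sign (positive feedback) makes them diverge.
```

In the classroom analogy of this slide, evidence of student achievement plays the role of the gap information: it counts as feedback only if it is actually used to close the gap.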

Relevant studies
Fuchs & Fuchs (1986); Natriello (1987); Crooks (1988); Bangert-Drowns et al. (1991); Kluger & DeNisi (1996); Black & Wiliam (1998); Nyquist (2003); Dempster (1991, 1992); Elshout-Mohr (1994); Brookhart (2004); Allal & Lopez (2005); Köller (2005); Brookhart (2007); Wiliam (2007); Hattie & Timperley (2007); Shute (2008)

Feedback
Kinds of feedback in Higher Education (Nyquist, 2003):
- Weaker feedback only: knowledge of results (KoR)
- Feedback only: KoR + clear goals or knowledge of correct results (KCR)
- Weak formative assessment: KCR + explanation (KCR+e)
- Moderate formative assessment: (KCR+e) + specific actions for gap reduction
- Strong formative assessment: (KCR+e) + activity

Effect of formative assessment (HE)
(columns: N, Effect*; *corrected values)
- Weaker feedback only
- Feedback only
- Weaker formative assessment
- Moderate formative assessment
- Strong formative assessment
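For readers unfamiliar with the metric: effect sizes in meta-analyses of this kind are usually standardized mean differences. The definition below is a reminder of that convention, an assumption about the metric rather than something stated on the slide.

```latex
% Standardized mean difference (Cohen's d); assumed (not stated on the slide)
% to be the effect-size metric behind the corrected values reported here.
\[
  d = \frac{\bar{x}_{\mathrm{T}} - \bar{x}_{\mathrm{C}}}{s_{\mathrm{pooled}}},
  \qquad
  s_{\mathrm{pooled}} = \sqrt{\frac{(n_{\mathrm{T}}-1)\,s_{\mathrm{T}}^{2} + (n_{\mathrm{C}}-1)\,s_{\mathrm{C}}^{2}}{n_{\mathrm{T}} + n_{\mathrm{C}} - 2}}
\]
```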

The formative assessment hi-jack…
- Long-cycle
  - Span: across units, terms
  - Length: four weeks to one year
  - Impact: student monitoring; curriculum alignment
- Medium-cycle
  - Span: within and between teaching units
  - Length: one to four weeks
  - Impact: improved, student-involved assessment; teacher cognition about learning
- Short-cycle
  - Span: within and between lessons
  - Length: day-by-day (24 to 48 hours); minute-by-minute (5 seconds to 2 hours)
  - Impact: classroom practice; student engagement

Functions of assessment
- For evaluating institutions, organizations and curricula
- For describing individuals
- For supporting learning
  - Monitoring learning: whether learning is taking place
  - Diagnosing (informing) learning: what is not being learnt
  - Instructionally tractable: what to do about it

Formative assessment: a new definition
"An assessment functions formatively to the extent that evidence about student achievement elicited by the assessment is interpreted and used to make decisions about the next steps in instruction that are likely to be better, or better founded, than the decisions that would have been taken in the absence of that evidence. Formative assessment therefore involves the creation of, and capitalization upon, moments of contingency (short, medium and long cycle) in instruction with a view to regulating learning (proactive, interactive, and retroactive)." (Wiliam, 2009)

Some principles
A commitment to formative assessment:
- Does not entail any view of what is to be learned
- Does not entail any view of what happens when learning takes place

The learning milieu
Feedback must cause cognitive engagement in learning:
- Mastery orientation vs. performance orientation (Dweck)
- Growth pathway vs. well-being pathway (Boekaerts)

Defining formative assessment
- Key processes
  - Establishing where the learners are in their learning
  - Establishing where they are going
  - Working out how to get there
- Participants
  - Teachers
  - Peers
  - Learners

Aspects of formative assessment
Columns: Where the learner is going | Where the learner is | How to get there
- Teacher: Clarify and share learning intentions | Engineering effective discussions, tasks and activities that elicit evidence of learning | Providing feedback that moves learners forward
- Peer: Understand and share learning intentions | Activating students as learning resources for one another
- Learner: Understand learning intentions | Activating students as owners of their own learning

Five "key strategies"…
- Clarifying, understanding, and sharing learning intentions (curriculum philosophy)
- Engineering effective classroom discussions, tasks and activities that elicit evidence of learning (classroom discourse, interactive whole-class teaching)
- Providing feedback that moves learners forward (feedback)
- Activating students as learning resources for one another (collaborative learning, reciprocal teaching, peer-assessment)
- Activating students as owners of their own learning (metacognition, motivation, interest, attribution, self-assessment)
(Wiliam & Thompson, 2007)

…and one big idea Use evidence about learning to adapt instruction to meet student needs

Examples of techniques
- Learning intentions: "sharing exemplars"
- Eliciting evidence: "mini white-boards"
- Providing feedback: "match the comments to the essays"
- Students as owners of their learning: "coloured cups"
- Students as learning resources: "pre-flight checklist"

So how do we design assessments?
- Reliability requires random sampling from the domain of interest
- Increasing reliability requires increasing the size of the sample (a standard way to quantify this is sketched below)
- Using teacher assessment in certification is attractive:
  - Increases reliability (increased test time)
  - Increases validity (addresses aspects of construct under-representation)
- But problematic:
  - Lack of trust ("fox guarding the hen house")
  - Problems of biased inferences (construct-irrelevant variance)
  - Can introduce new kinds of construct under-representation
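The slide states the sample-size/reliability relationship in words only. One standard way to make it precise (a textbook psychometric result, not taken from the slides) is the Spearman-Brown prophecy formula, which predicts the reliability of an assessment lengthened by a factor k with comparable tasks:

```latex
% Spearman-Brown prophecy formula (standard psychometrics, not from the slides).
% rho   : reliability of the current assessment
% k     : factor by which the assessment is lengthened with comparable tasks
% rho_k : predicted reliability of the lengthened assessment
\[
  \rho_k = \frac{k\,\rho}{1 + (k-1)\,\rho}
\]
% Example: doubling (k = 2) an assessment with rho = 0.7 gives
% rho_k = 1.4 / 1.7, roughly 0.82.
```

Read this way, folding teacher assessment into certification effectively increases k, and with it the achievable reliability, provided the construct-irrelevant variance problems listed above can be controlled.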

Progression in understanding light
1. Know that light comes from different sources
2. Know that light passes through some materials and not others, and that when it does not, shadows may be formed
3. Know that light can be made to change direction, and that shiny surfaces can form images
4. Know that light travels in straight lines, and this can be used to explain the formation of shadows
5. Understand how light is reflected
6. Understand how prisms and lenses refract and disperse light
7. Be able to describe how simple optical devices work
8. Understand refraction as an effect of differences of velocities in different media
9. [nothing new at this level]
10. Understand the processes of dispersion, interference, diffraction and polarisation of light

The challenge
To design an assessment system that is:
- Distributed, so that evidence collection is not undertaken entirely at the end
- Synoptic, so that learning has to accumulate
- Extensive, so that all important aspects are covered (breadth and depth)
- Progressive, so that assessment outcomes relate to learning progressions
- Manageable, so that costs are proportionate to benefits
- Trusted, so that stakeholders have faith in the outcomes

The effects of context
Beliefs:
- about what constitutes learning;
- in the value of competition between students;
- in the value of competition between schools;
- that test results measure school effectiveness;
- about the trustworthiness of numerical data, with a bias towards a single number;
- that the key to schools' effectiveness is strong top-down management;
- that teachers need to be told what to do, or conversely that they have all the answers.

Conclusion
- There is no "perfect" assessment system anywhere. Each nation's assessment system is exquisitely tuned to local constraints and affordances.
- Every country's assessment system works in practice but not in theory.
- Assessment practices have impacts on teaching and learning which may be strongly amplified or attenuated by the national context.
- The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as it is by educational evidence and values.

Conclusion (2)
- It is probably idle to draw up maps for the ideal assessment policy for a country, even though the principles and the evidence to support such an ideal might be clearly agreed within the 'expert' community.
- Instead, it seems likely to be more productive to focus on those arguments and initiatives that are least offensive to existing assumptions and beliefs, and that will nevertheless serve to catalyze a shift in those assumptions and beliefs while at the same time improving some aspects of present practice.