1
Formative and summative classroom assessment: Where we are and where we might go Dylan Wiliam, UCL
These days, most schools have large amounts of data—on students, on teachers, and on many other aspects of school life. However, much of the time, these data are not used to maximum effect. Sometimes data are not used enough, so that decisions are based on hunch or whim, even though there are data that might improve those decisions. At other times, data are used too much, with the same data being used for different purposes that conflict, so that one use of the data makes them less meaningful for other uses. In this keynote presentation, Dylan Wiliam will outline a number of principles for assessment design that schools and districts can use to audit their existing assessment systems, to examine key trade-offs and compromises, and to develop coherent assessment systems that both support learning and provide meaningful information about students’ achievements. NCME Special Conference on Classroom Assessment: Assessment in the Disciplines, October 2018
2
Outline Three strands of classroom assessment research
Formative assessment: Critiques and responses
Putting the pieces back together
3
Three important strands
Classroom evaluation
Teaching as a contingent activity
Student voice
4
I. Classroom assessment
Classroom assessment is not (necessarily) formative assessment (and vice-versa) Location Relation to instruction (synchronous vs. asynchronous) Purpose (instructional guidance, evaluation) Environment, resources and conditions Authority (teacher, peer, learner, other) Agent (teacher, peer, learner) Subject (individuals, groups, class) Assessor (teacher, peer, learner, machine) Black and Wiliam (2004)
5
Research on classroom evaluation processes
“The impact of evaluation processes on students” (Natriello, 1987) “The impact of classroom evaluation practices on students” (Crooks, 1988) “For the purpose of this review, classroom evaluation is defined as evaluation based on activities that students undertake as an integral part of the educational programs in which they are enrolled.” (p. 437) inside and outside the classroom curriculum-embedded and terminal tests teacher designed and “off-the-shelf” tests adjunct questions and other exercises in learning materials oral questions
6
II. Teaching as a contingent activity
Individualized instructional systems 1912: The Individual System (Burke) 1919: The Winnetka Plan (Washburne) 1919: The Dalton Plan (Parkhurst) 1925: Teaching machines (Pressey) … 1966: Learning for mastery (Bloom) “In fact, we may even insist that our educational efforts have been unsuccessful to the extent to which our distribution of achievement approximates the normal distribution.” (Bloom, 1968)
7
Evaluation: summative and formative (Cronbach)
“But to call in the evaluator only upon the completion of course development, to confirm what has been done, is to offer him a menial role and to make meager use of his services. To be influential in course improvement, evidence must become available midway in curriculum development, not in the home stretch, when the developer is naturally reluctant to tear open a supposedly finished body of materials and techniques. Evaluation, used to improve the course while it is still fluid, contributes more to improvement of education than evaluation used to appraise a product already placed on the market.” (Cronbach, 1963)
8
Evaluation: summative and formative (Scriven)
“And there are many contexts in which calling in an evaluator to perform a final evaluation of the project or person is an act of proper recognition of responsibility to the person, product, or taxpayers. It therefore seems a little excessive to refer to this as simply ‘a menial role’, as Cronbach does.” “It is obviously a great service if this kind of terminal evaluation (we might call it summative as opposed to formative evaluation) can demonstrate that an expensive textbook is not significantly better than the competition, or that it is enormously better than any competitor.” (Scriven, 1963, emphasis in original)
9
Summative and formative evaluation (Bloom)
“Much of what we have been discussing in the section on the effects of examinations has been concerned with what may be termed “summative evaluation.” This is the evaluation which is used at the end of a course, term, or educational program. Although the procedures for such evaluation may have a profound effect on the learning and instruction, much of this effect may be in anticipation of the examination or as a short- or long-term consequence of the examination after it has been given.”
10
Teaching as a contingent activity
“Quite in contrast is the use of “formative evaluation” to provide feedback and correctives at each stage in the teaching-learning process. By formative evaluation we mean evaluation by brief tests used by teachers and students as aids in the learning process. While such tests may be graded and used as part of the judging and classificatory function of evaluation, we see much more effective use of formative evaluation if it is separated from the grading process and used primarily as an aid to teaching.” (Bloom, 1969)
11
III. The student’s voice: Records of achievement
School leaving examinations in England
Top 20%: General Certificate of Education
Next 40%: Certificate of Secondary Education
“Half our future” (Newsom Report, 1963)
“Boys and girls who stay at school until they are 16 may reasonably look for some record of achievement when they leave. Some form of leaver's certificate which combined assessment with a record of the pupil's school career would be valued by parents, future employers and colleges of further education and should, we believe, be available to all pupils who complete a full secondary course.” (p. 80)
12
Research synthesis: Configurative and aggregative
Configurative vs. aggregative (Gough, 2012):
Philosophy: idealist vs. realist
Relation to theory: generate and explore vs. test
Approach to synthesis: configurating vs. aggregating
Methods: iterative, theoretical search vs. a priori, exhaustive search
Quality assessment: value contribution vs. avoid bias
Product: emergent concepts vs. magnitude/precision
Use: enlightenment vs. instrumental
13
Where should our efforts be focused?
Which of these is most strongly associated with high student achievement?
Student speaks the language of instruction at home
Student behavior in the school is good
The amount of inquiry-based instruction
The amount of teacher-directed instruction
The school’s socio-economic profile
Top 3 factors:
1. Student’s socio-economic profile
2. Index of adaptive instruction
3. The amount of teacher-directed instruction
OECD (2016, Fig II.7.2)
14
“More research is (always) needed…”
“Furthermore, despite the existence of some marginal and even negative results, the range of conditions and contexts under which studies have shown that gains can be achieved must indicate that the principles that underlie achievement of substantial improvements in learning are robust. Significant gains can be achieved by many different routes, and initiatives here are not likely to fail through neglect of delicate and subtle features.” (Black & Wiliam, 1998)
15
Critiques and responses
16
Formative assessment: A critical review
The definitional issue The domain-dependency issue The effectiveness issue The measurement issue The professional development issue The system issue Bennett (2011)
17
The definitional issue
Need for clear definitions So that research outcomes are commensurable To communicate effectively Theorization and definition Theorizing what? Prescriptive: formative assessment as we would like it to be in terms of what students should learn in terms of what happens when learning takes place in terms of how instruction should be organized in terms of how teachers should teach Descriptive: formative assessment as it is
18
Theorization and definition
Possible variables Category (instruments, outcomes, functions) Beneficiaries (teachers, learners) Timescale (months, weeks, days, hours, minutes) Consequences (outcomes, instruction, decisions) Theory of action (what gets formed?)
19
Formative Assessment: A contested term
Long-cycle: span across terms and teaching units; length four weeks to one year; impact on monitoring and curriculum alignment
Medium-cycle: span within and between teaching units; length one to four weeks; impact on student-involved assessment
Short-cycle: span within and between lessons; length minute-by-minute and day-by-day; impact on engagement and responsiveness
20
Assessment for learning (Mittler, 1973)
“Assessment for learning is any assessment for which the first priority in its design and practice is to serve the purpose of promoting pupils’ learning. It thus differs from assessment designed primarily to serve the purposes of accountability, or of ranking, or of certifying competence. An assessment activity can help learning if it provides information to be used as feedback, by teachers, and by their pupils, in assessing themselves and each other, to modify the teaching and learning activities in which they are engaged. Such assessment becomes ‘formative assessment’ when the evidence is actually used to adapt the teaching work to meet learning needs.” (Black, Harrison, Lee, Marshall & Wiliam, 2004 p. 2)
21
How does assessment improve learning?
Announced? Given? Scored? Used?
Assessment for motivation
Retrieval practice
Instructional correctives
Formative assessment
22
An inclusive definition of formative assessment
An assessment functions formatively to the extent that evidence about student achievement is elicited, interpreted, and used by teachers, learners, or their peers, to make decisions about future instruction that are likely to be better, or better founded, than the decisions that would have been taken in the absence of that evidence.
23
The domain-dependency issue
24
Both/and rather than either/or
Domain dependency: a theoretical stance or an empirical question?
Trade-offs: domain-dependent (questions, feedback) vs. domain-independent (strategies, techniques)
Formative assessment is, trivially, both domain-dependent and domain-independent
Key question: How far can we take formative assessment as a domain-independent process?
25
The effectiveness issue
Problems with meta-analysis The “file drawer” problem Variations in intervention quality Selection of studies Variation in population variability Sensitivity of outcome measures
26
Annual growth in achievement, by age
A 50% increase in the rate of learning for six-year-olds is equivalent to an effect size of 0.76 A 50% increase in the rate of learning for 15-year-olds is equivalent to an effect size of 0.1 Bloom, Hill, Black, and Lipsey (2008)
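The arithmetic behind these equivalences can be made explicit (a sketch; the annual-growth figures are implied by the slide's numbers rather than stated on it): a 50% increase in the rate of learning adds half of one year's growth, so the effect size is half the annual growth in standard-deviation units.

```latex
% ES for a 50% increase in learning rate, given annual growth g (in sd units)
\[
\mathrm{ES} = 0.5 \times g, \qquad
g_{\text{age }6} \approx 1.52 \;\Rightarrow\; \mathrm{ES} \approx 0.76, \qquad
g_{\text{age }15} \approx 0.2 \;\Rightarrow\; \mathrm{ES} \approx 0.10
\]
```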
27
Recent meta-analytic findings
Content area             N    95% CI for effect size (lower / mean / upper)
Mathematics             19    0.14 / 0.17 / 0.20
English Language Arts    4    0.30 / 0.32 / 0.34
Science                 17    0.06 / 0.19 / 0.31
Total                   40    mean effect size ≈ 0.20
All but 2 of these effect sizes involved students over the age of 10
A big effect size: equivalent to a 50% to 70% increase in the rate of learning
Kingston and Nash (2011, 2015)
28
The measurement issue Formative assessment as assessment
An assessment is a procedure for making inferences (Cronbach, 1971, p. 447):
We give students things to do
We observe their responses
We collect evidence
We make inferences
The terms “formative” and “summative” are best thought of as descriptions of inferences
29
Data and evidence
“Evidence is data related to a claim” (Wainer, 2011, p. 148)
Some radical shifts:
From data to information
From data-push to decision-pull
From data-driven decision-making to decision-driven data collection
30
The professional development issue
If we treat formative assessment as domain-specific vs. domain-general:
Benefits of professional development are: mostly limited to the domains studied vs. mostly applicable to all aspects of practice
Professional development improves student achievement: a little vs. a lot
Professional development is mostly a matter of: knowledge acquisition vs. habit change
And doing it is: easy vs. hard
31
Implementation issues
Articulation with other policy priorities: teacher evaluation frameworks (Marzano, Danielson); differentiated instruction; response to (instruction and) intervention
Policy environment: teacher pre-service education
Focus: commitment to continuous improvement; district-level policies that require all teachers to improve; improvement focused on evidence-based practices
32
Unpacking Formative Assessment
Three processes: where the learner is going; where the learner is now; how to get the learner there
Three agents: teacher, peer, student
Five strategies:
1. Clarifying, sharing, and understanding learning intentions and success criteria
2. Eliciting evidence of learning
3. Providing feedback that moves learners forward
4. Activating students as learning resources for one another
5. Activating students as owners of their own learning
33
The relationship of formative assessment to other policy priorities
34
Educational Endowment Foundation toolkit (1)
Intervention (cost, quality of evidence): extra months of learning
+8: Feedback ($$, ★★★); Metacognition and self-regulation (★★★★)
+6: Peer tutoring; Early years intervention ($$$$$)
+5: One to one tuition ($$$$); Homework (secondary) ($); Collaborative learning
+4: Phonics; Small group tuition ($$$); Behaviour interventions (★★); Digital technology; Social and emotional learning
35
Educational Endowment Foundation toolkit (2)
Intervention (cost, quality of evidence): extra months of learning
+3: Parental involvement ($$$, ★★★); Reducing class size ($$$$$); Summer schools (★★)
+2: Sports participation; Arts participation ($$); Extended school time; Individualized instruction ($); After school programmes ($$$$); Learning styles
+1: Mentoring; Homework (primary)
36
Educational Endowment Foundation toolkit
Intervention (cost, quality of evidence): extra months of learning
0: Teaching assistants ($$$$, ★★); Performance pay ($$, ★); Aspiration interventions ($$$); Block scheduling ($); School uniform; Physical environment
-1: Ability grouping (★★★)
37
Unpacking Formative Assessment
Three processes: where the learner is going; where the learner is now; how to get the learner there
Three agents: teacher, peer, student
Five strategies:
1. Clarifying, sharing, and understanding learning intentions and success criteria
2. Eliciting evidence of learning
3. Providing feedback that moves learners forward
4. Activating students as learning resources for one another
5. Activating students as owners of their own learning
38
The system issue: Embedding formative assessment
Whole-school 2-year PD programme Focus on five strategies of formative assessment clarifying, sharing and understanding learning intentions eliciting evidence of achievement feedback that moves learning forward activating students as learning resources for one another activating students as owners of their own learning Detailed resource packs for groups of 8 to 14 teachers 18 monthly Teacher Learning Community (TLC) meetings Peer observations between meetings
39
A “signature pedagogy” for teacher learning
Every monthly TLC meeting follows the same structure Introduction (5 minutes) Starter activity (5 minutes) Feedback (25–50 minutes) New learning about formative assessment (20–40 minutes) Personal action planning (15 minutes) Review of learning (5 minutes)
40
Evaluation: “intention to treat” design, powered to detect an effect size of 0.2 with 80% power
Participants: 140 schools recruited (70 treatment, 70 control); excluding those with previous involvement in similar work, 58 treatment and 66 control; 22,709 students in year 10 (age 15+) in Sep 2015
Outcome measure: “Attainment 8”, the average score on externally set exams in 8 subjects, taken in May 2017 (i.e., 5/6 of the way through the school year)
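The headline power target can be reproduced with a naive two-sample calculation (a sketch under simplifying assumptions: it ignores the clustering of students within schools, which is why the real trial recruited 140 whole schools rather than ~400 individual students):

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Naive per-group sample size for a two-sample comparison of means
    (normal approximation, individual randomization, no clustering)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)


print(n_per_group(0.2))  # 393 students per arm under these naive assumptions
```

With clustering by school, the design effect inflates this figure substantially, motivating the school-level recruitment described above.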
41
English literature (Macbeth)
Read the following extract from Act 1 Scene 5 of Macbeth and then answer the question that follows. At this point in the play Lady Macbeth is speaking. She has just received the news that King Duncan will be spending the night at her castle.
42
Question (45 minutes) Starting with this speech, explain how far you think Shakespeare presents Lady Macbeth as a powerful woman. Write about: how Shakespeare presents Lady Macbeth in this speech how Shakespeare presents Lady Macbeth in the play as a whole Assessment and Qualifications Alliance (2014)
43
History Pearson (2015)
44
Impact on student achievement
One year’s learning for 15-year-olds: 0.3 sd
Attrition of learning per year: 0.1 sd
Expected progress for control group students over the two years (measured 5/6 of the way through the second year): 0.52 sd
Effect size on exams (8 subjects) for those who had not previously participated in similar work: 0.13
Increase in rate of learning = 0.13/0.52 = 0.25
Speckesser, Runge, Foliano, Bursnall, Hudson-Sharp, Rolfe, and Anders (2018)
45
Cost-benefit analysis
Class size reduction (e.g., Tennessee STAR study): additional cost $5,000 per student per year; benefit: 12% more learning
Embedded Formative Assessment: additional cost $3 per student per year; benefit: 25% more learning
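The comparison becomes starker when expressed as learning gained per dollar spent (a sketch using only the figures on the slide; the ratio is illustrative arithmetic, not a claim from the talk):

```python
def learning_per_dollar(extra_learning_pct, cost_per_student_per_year):
    """Percentage points of extra learning bought per dollar per student per year."""
    return extra_learning_pct / cost_per_student_per_year


class_size = learning_per_dollar(12, 5000)  # class-size reduction (STAR-style)
efa = learning_per_dollar(25, 3)            # Embedded Formative Assessment
print(round(efa / class_size))  # 3472: EFA buys ~3,500x more learning per dollar
```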
46
Unfinished business Links with Pedagogy (Black & Wiliam, 2018)
Instructional design Learning versus performance Cognitive load theory
47
The challenge: Making classroom formative assessment cohere with all the other kinds of assessment going on in a school
48
Before we can assess… The ‘backward design’ of an assessment system
Where do we want our students to get to? “Big ideas”
What are the ways they can get there? Learning progressions; “degree of difficulty”; “marks for style”; support model
When should we check on/report progress? Inherent checkpoints; “troublesome knowledge”; useful checkpoints; key transitions
49
“All models are wrong, some are useful” (Box, 1976)
“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?” “About six inches to the mile.” “Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!” “Have you used it much?” I enquired. “It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.” (Carroll, 1893)
50
Mapping out the terrain
Function (columns): instructional guidance (“formative”); describing individuals (“summative”); institutional accountability (“evaluative”)
Timescale (rows):
Annually: academic promotion; high-stakes accountability; end-of-course exams
Quarterly: growth measures; benchmarks
Monthly: common assessments; end-of-unit tests; before the end-of-unit tests
Weekly: graded work
Daily: exit pass
Hourly: hinge-point questions
51
Principles for classroom assessment
Assessment outcomes should be recorded only when there is an over-riding need to do so, and at the finest useful level
Assessment records should: focus on learning (not activity, not performance); support instruction; be cumulative, but punctuated; be accessible by students and carers
Grain-size should depend on instructional utility
Frequency should depend on errors of measurement and rates of progress
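The frequency principle can be made concrete with a toy calculation (a sketch; the numbers are illustrative assumptions, not from the talk): re-assessing only becomes informative once expected growth exceeds the measurement error of the assessment.

```python
def min_useful_interval_years(sem_sd, growth_sd_per_year):
    """Shortest interval after which expected progress exceeds the standard
    error of measurement, so an observed change is signal rather than noise."""
    return sem_sd / growth_sd_per_year


# e.g., an SEM of 0.15 sd and annual growth of 0.3 sd (illustrative values):
print(min_useful_interval_years(0.15, 0.3))  # 0.5 -> about twice a year
```

The same logic implies that younger students (faster growth) can usefully be assessed more often than older ones.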
52
Instruction-driven assessment
Development of science skills in eighth grade Use of laboratory equipment Metric unit conversion Density calculations Density applications Density as a characteristic property Phases of matter Gas laws Communication (graphing) Communication (lab reports) Inquiry skills Clymer and Wiliam (2006/2007)
53
Assessment matrix
Rows (assessments): Homework 1, Homework 2, Laboratory 1, Homework 3, Module test, Laboratory 2, Homework 4, Final exam
Columns (skills): equipment; metric units; density calculations; density properties; phases of matter; gas laws; communication (graph); communication (report)
Cells (✓) record which skills each assessment provides evidence about
55
Evidence accumulation
Assessment as a process of evidentiary reasoning
The mean is not always (in fact hardly ever) the answer
Trade-offs between: recent and older information; achievement and aptitude; precision and accuracy; quality and quantity; representativeness and consequences
Reported outcomes should be re-interpretations—not aggregates—of assessment outcomes
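One simple way to honour the trade-off between recent and older information, as a sketch (the half-life parameter and scores are made-up illustrations, not a method from the talk): weight newer evidence more heavily instead of taking a plain mean.

```python
def recency_weighted_estimate(scores, half_life=3):
    """Average of assessment scores (oldest first) in which a score's weight
    halves for every `half_life` assessments further back in time."""
    n = len(scores)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# For a student who is improving, this sits above the plain mean of 65:
print(recency_weighted_estimate([50, 60, 70, 80]))
```

This is one re-interpretation of the record; the point is that the aggregation rule is a choice, not a default.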
56
Meanings and consequences of assessment
Evidential basis What does the assessment outcome mean? Consequential basis What does the assessment process do?
57
Meanings and consequences of school grades
Two rationales for grading (Stiggins, 1991)
Evidential (What does the assessment outcome mean?): assessment as evidentiary reasoning; assessment outcomes as supports for inferences
Consequential (What does the assessment process do?): assessment outcomes as rewards and punishments; assessments create incentives for students to do what we want them to do
These two rationales interact, and conflict: achievement grades for completion of homework; achievement grades for effort; penalties for late submission; zeroes for missing work
58
Summary
Classroom formative assessment: independent of curriculum, psychology, and pedagogy; readily incorporated by teachers into existing practice; implementable at scale, with minimal support; a highly (the most?) cost-effective way to improve achievement
Classroom summative assessment: support for inferences about student achievement; social consequences
Productive synergies are possible