CAN INSTRUCTIONALLY INSENSITIVE ACCOUNTABILITY TESTS EVER EVALUATE EDUCATORS FAIRLY? W. James Popham University of California, Los Angeles Winter Conference.

Presentation transcript:

CAN INSTRUCTIONALLY INSENSITIVE ACCOUNTABILITY TESTS EVER EVALUATE EDUCATORS FAIRLY?
W. James Popham, University of California, Los Angeles
Winter Conference, Washington Educational Research Association and Office of Superintendent of Public Instruction, Seattle, December 4, 2008

In nations where students’ scores on “accountability” tests play a pivotal role in the evaluation of schools, it is assumed that students’ performances on such tests accurately reflect instructional quality.

But what if students’ scores on educational accountability tests did not accurately reflect instructional quality?

A DEFINITION OF INSTRUCTIONAL SENSITIVITY
The degree to which students’ performances on a test accurately reflect the quality of instruction specifically provided to promote students’ mastery of what is being assessed.

A Continuum of Instructional Sensitivity
Completely Insensitive ... Totally Sensitive
Accountability tests, such as numerous assessments used in the U.S., differ in their ability to detect instructional quality.

WHY MIGHT A TEST ITEM BE INSTRUCTIONALLY INSENSITIVE?
Alignment Leniency
Excessive Easiness
Excessive Difficulty
Confusion-Engendering Item Flaws
Socioeconomic Status (SES) Links
Academic Aptitude Links

ALIGNMENT LENIENCY
Many items on accountability tests, when judged as to their alignment with the curricular aims they are supposed to be measuring, will be regarded as aligned with those aims (skills and/or knowledge) even if the items are only tangentially related to the curricular aim being assessed.

An Example of Lenient Alignment
Item 23: Using the bus schedule on the adjacent page, if your purpose was to determine the shortest time to reach Boston from Denver on a Monday, on which bus should you begin your journey?
A. Bus 214
B. Bus 197
C. Bus 110
D. Bus 202

Was the item aligned?
If the curricular aim had been for students to be able to use appropriate functional texts such as train or bus schedules.
If the curricular aim had been for students to be able to determine whether given functional texts would fulfill their purpose for using such texts.

EXCESSIVE EASINESS
If an item is so easy that even completely untaught students would answer it correctly, then the item can’t distinguish between well taught and poorly taught students.
E.g., How many letters are there in the word seven?

EXCESSIVE DIFFICULTY
If an item is so difficult that even marvelously instructed students might not answer it correctly, then the item can’t distinguish between well taught and poorly taught students.
E.g., Without using your computer, what is the square root of 1,522,756?
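
The two slides on easiness and difficulty describe what are often called ceiling and floor effects. A minimal sketch of how such items might be flagged follows, assuming we have the proportion of untaught students and of well taught students who answer each item correctly. The function name and the 0.90 and 0.10 cut-offs are illustrative assumptions, not values from the presentation.

```python
def flag_item(p_untaught, p_taught, easy_cutoff=0.90, hard_cutoff=0.10):
    """Classify an item from the proportion of students answering it correctly.

    p_untaught: proportion correct among students who received no instruction.
    p_taught:   proportion correct among well taught students.
    The cut-offs are hypothetical defaults chosen only for illustration.
    """
    if p_untaught >= easy_cutoff:
        # Nearly everyone succeeds without instruction, so teaching cannot show up.
        return "too easy"
    if p_taught <= hard_cutoff:
        # Even well taught students fail, so teaching cannot show up either.
        return "too hard"
    return "usable"


# Hypothetical proportions for three items.
print(flag_item(0.97, 0.99))  # an item like counting the letters in "seven"
print(flag_item(0.01, 0.04))  # an item like a six-digit square root by hand
print(flag_item(0.30, 0.80))  # an item where instruction makes a visible difference
```

In both flagged cases the item leaves no room for well taught and poorly taught groups to differ, which is the point the slides make.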

ITEM FLAWS
Items embodying serious deficits (e.g., ambiguities, garbled syntax, more than one correct answer, or no correct answer) will prevent well taught students from answering the item correctly, hence make it impossible for the item to accurately distinguish between effectively and ineffectively taught students.

SOCIOECONOMIC STATUS (SES) LINKS
If an item gives a meaningful advantage to students from higher SES families, then the item will tend to measure what students bring to school rather than how well they are taught once they get there.

A 6th-Grade Science Item:
A plant’s fruit always contains seeds. Which of the items below is not a fruit?
A. orange
B. pumpkin
C. apple
D. celery

A 4th-Grade Reading Item:
My father’s field is computer graphics.
In which of the sentences below does the word field mean the same thing as in the sentence above?
A. The shortstop knew how to field his position.
B. We prepared the field by plowing it.
C. What field do you plan to enter when you graduate?
D. The nurse examined my field of vision.

ACADEMIC APTITUDE LINKS
If an item gives a meaningful advantage to students who possess greater inherited quantitative, verbal, or spatial aptitudes, then the item will tend to measure what students bring to school rather than how well they are taught once they get there.

A 6th-Grade Social Studies Item:
If someone really wants to conserve resources, one good way to do so is to:
A. leave lights on even if they are not needed.
B. wash small loads instead of large loads in a clothes-washing machine.
C. write on both sides of a piece of paper.
D. place used newspapers in the garbage.

A 3rd-Grade Mathematics Item:
The secret number is inside the circle. It is also inside the square. It is NOT inside the triangle. Which of these is the secret number?
A. 2
B. 3
C. 5
D. 7

A 4th-Grade Mathematics Item:
Which of the letters below, when folded in half, will have two parts that match exactly?
F (A)   Z (B)   S (C)   B (D)

WHY MIGHT A TEST ITEM BE INSTRUCTIONALLY INSENSITIVE?
Alignment Leniency
Excessive Easiness
Excessive Difficulty
Confusion-Engendering Item Flaws
Socioeconomic Status (SES) Links
Academic Aptitude Links

A LESSON TO BE LEARNED:
When the measurement community became convinced that assessment bias in our high-stakes tests was threatening validity, we set out to (1) detect assessment bias and (2) reduce it. We were successful. We can be equally successful in coping with instructional insensitivity.

TWO STRATEGIES FOR DETERMINING INSTRUCTIONAL SENSITIVITY
A Judgmental Strategy whereby seasoned, well trained educators supply item-by-item ratings using a rigorous item-evaluation rubric
An Empirical Strategy contrasting per-item performances of (1) taught versus untaught students or (2) effectively taught versus ineffectively taught students

JUDGMENTAL DETERMINATION OF AN ITEM’S INSTRUCTIONAL SENSITIVITY
An Illustrative Review Question:
“If a teacher has provided reasonably effective instruction related to the objective measured by this item, is it likely a substantial majority of the teacher’s students will respond correctly to the item?”
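
One way reviewers' answers to a question like this could be summarized is as the share of reviewers answering "yes" for each item. The sketch below assumes a simple yes/no response per reviewer; the presentation does not specify a rating scale or an aggregation rule, so the data layout, the item names, and the function name are all hypothetical.

```python
def share_yes(ratings):
    """Return, for each item, the fraction of reviewers who answered "yes".

    ratings: dict mapping an item label to a list of booleans, one per reviewer
             (True means the reviewer answered "yes" to the review question).
    This aggregation rule is an assumption made for illustration.
    """
    return {item: sum(votes) / len(votes) for item, votes in ratings.items()}


# Hypothetical ratings from four reviewers on two items.
example_ratings = {
    "item_1": [True, True, True, False],
    "item_2": [False, False, True, False],
}
print(share_yes(example_ratings))
```

Items with a low share of "yes" answers would be candidates for closer review; where to draw that line is a policy choice the sketch leaves open.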

EMPIRICAL DETERMINATION OF AN ITEM’S INSTRUCTIONAL SENSITIVITY
Contrasting per-item performances of taught versus untaught students
Contrasting per-item performances of effectively taught versus ineffectively taught students
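
A minimal sketch of such a contrast, assuming each student's responses are scored 0 (incorrect) or 1 (correct) per item: for every item, subtract the proportion correct in the untaught group from the proportion correct in the taught group. The difference-in-proportions index, the function name, and the data are illustrative assumptions; the presentation does not prescribe a particular statistic.

```python
def sensitivity_index(taught, untaught):
    """Per-item difference in proportion correct (taught minus untaught).

    taught, untaught: lists of rows, one row per student, each row a list of
    0/1 item scores in the same item order. Values near 0 suggest the item
    does not separate the two groups; larger positive values suggest it does.
    """
    def proportion_correct(responses):
        n_students = len(responses)
        n_items = len(responses[0])
        return [sum(row[i] for row in responses) / n_students
                for i in range(n_items)]

    p_taught = proportion_correct(taught)
    p_untaught = proportion_correct(untaught)
    return [t - u for t, u in zip(p_taught, p_untaught)]


# Hypothetical scores: four students per group, three items.
taught_scores = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 1, 1]]
untaught_scores = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
print(sensitivity_index(taught_scores, untaught_scores))  # [0.0, 0.5, 0.5]
```

In this made-up data the first item is answered correctly by everyone in both groups, so its index is 0.0, while the other two items show a 0.5 gap. The same function applies to the second contrast on the slide by passing effectively taught and ineffectively taught groups instead.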

INSTRUCTIONAL INSENSITIVITY: UNFAIR AND HARMFUL
When we allow educators’ quality to be determined on the basis of accountability tests incapable of performing that task, we are being profoundly unfair to those educators.
Far worse, some wrongly evaluated, desperation-driven educators will engage in classroom practices that are educationally harmful to children.

Presenter’s address: Reactions or suggestions regarding this topic will be welcomed.