Published by Audrey Wells. Modified over 9 years ago.
1
Assessment and Performance Standards: How Good Is Good Enough?
March 4-6, 2008
2
© 2008 The McGraw-Hill Companies, Inc. All rights reserved.
William Lorié, Ph.D.
Director, International R&D, CTB/McGraw-Hill
3
Agenda
- Chapter I: So you want to be a decathlete
- Chapter II: You want me to jump how high?
- Chapter III: Favorable winds, sun in my face: A detour into human performance
- Chapter IV: Philosophies for setting the bar
4
Chapter I: So you want to be a decathlete
6
How Good Is Good Enough?
7
[Figure panels: 2004 UK level “A”; 2000 UK level “A”]
8
The Standard Is Different for the Generalist
In 2004, a decathlete needed 8000 points to reach the “A” level qualification standard for the UK Olympic team.
At 800 points per event, I can “get by” with a high jump of 2 meters…
…or less, if I am relatively strong in other events.
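The “800 points per event means about 2 meters” arithmetic can be checked with a short sketch. It assumes the IAAF (now World Athletics) combined-events scoring formula for the high jump, points = floor(A · (M − B)^C) with the height M in centimetres and constants A = 0.8465, B = 75, C = 1.42; the slide itself does not state the formula, and the function names are hypothetical.

```python
import math

# Assumed IAAF/World Athletics combined-events constants for the high jump
# (height M in centimetres): points = floor(A * (M - B) ** C).
A, B, C = 0.8465, 75.0, 1.42

def high_jump_points(height_cm):
    """Decathlon points for a high jump clearance, under the assumed constants."""
    if height_cm <= B:
        return 0  # the formula only awards points above the B threshold
    return math.floor(A * (height_cm - B) ** C)

def min_height_for(points_needed, start_cm=76, max_cm=300):
    """Smallest whole-centimetre clearance that scores at least points_needed."""
    for h in range(start_cm, max_cm + 1):
        if high_jump_points(h) >= points_needed:
            return h
    return None

points_per_event = 8000 / 10               # an even split of the 8000-point standard
print(high_jump_points(200))               # 803 (a 2.00 m clearance)
print(min_height_for(points_per_event))    # 200
```

Under these assumptions 1.99 m scores fewer than 800 points and 2.00 m scores 803, which is where the slide's “2 meters” comes from.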
9
Chapter II: You want me to jump how high?
10
Educational Tests Are More Like Decathlons Than High Jumps
- Student learning outcomes are varied and interlinked
- At every level of schooling, especially early on, we want students to do well in a number of broad learning outcomes, not just a few
- Students can be strong overall, weak overall, or strong in some areas and weak in others
11
Content and Performance Standards
Who's being tested? | High jumper | Decathlete
What's on the test? | High jump event | Ten different but related events
What do they need to pass? | Jump 2.3 meters | Get 8000 points (try to high jump at least 2 meters)
12
So, How Good Is Good Enough?
A matter of judgment, which takes into consideration that:
- The test is a sample of tasks that all count toward the final score
- It is not essential to master any one given task
- Tasks are a sample from a broader domain that we care about: not everything that could be tested is tested
Triacontakaioctathlon (38 events), anyone?
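The idea that every task counts toward the final score, while no single task must be mastered, is what measurement specialists usually call a compensatory decision rule. The sketch below contrasts it with a conjunctive rule (every task must clear its own bar), which the slide does not discuss; the function names and the athlete's points are hypothetical.

```python
def compensatory_pass(event_points, total_needed):
    """Compensatory rule: only the total matters, so strength in one event offsets weakness in another."""
    return sum(event_points) >= total_needed

def conjunctive_pass(event_points, minimum_per_event):
    """Conjunctive rule (for contrast only): every single event must clear its own bar."""
    return all(points >= minimum_per_event for points in event_points)

# Hypothetical decathlete: 850 points in nine events, a weak 700 in the high jump.
athlete = [850] * 9 + [700]
print(sum(athlete))                        # 8350
print(compensatory_pass(athlete, 8000))    # True
print(conjunctive_pass(athlete, 800))      # False
```

Under the compensatory rule the weak high jump is absorbed by the other nine events, which mirrors how a total test score treats a missed item.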
13
Chapter III: Favorable winds, sun in my face: A detour into human performance
14
Olympic High Jump Athlete's Performance
[Figure labels: Typical (average); Recent worst; Personal best; World record]
16
All Individual Human Performance Has Variation…
Possible sources of variation, for a high jumper or decathlete and for a student taking a high school exit exam:
- Systematic: weather conditions; altitude; indoors or out; gender; time of year; curriculum; quality of instruction
- Non-systematic: sharpened focus; loss of concentration; muscle fatigue / failure; lapses in judgment; moments of insight; mood
17
…and Error Is a Part of All Measurement
After you've standardized your field conditions and controlled everything you can think of, you still get variation in individual performance. In measurement, that remaining variation is attributed entirely to non-systematic sources. Those sources are all lumped together and called Error. Error is a technical concept, not a synonym for a mistake.
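The usual formal version of this idea is classical test theory, which the slide does not name. As a sketch of that standard framing: an observed score X splits into a true score T and an Error term E that averages zero and is unrelated to T, so observed-score variance splits the same way. The reliability ρ_XX' and the standard error of measurement (SEM) then quantify how much of the variation is Error.

```latex
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}, \qquad
\mathrm{SEM} = \sigma_E = \sigma_X \sqrt{1 - \rho_{XX'}}
```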
18
Where Is There Error in Educational Measurement?
- The average score of French 8th graders on TIMSS
- My college entrance exam scores
- Diane Lotfi's 5th grade standardized achievement test scores
- The grades I gave my 9th year students in physical science when I was a teacher
- Student grade-point averages
Throughout your entire recorded academic career
19
Don't Panic
- In the long run, the Errors average out to zero
- When it matters most, rigorous steps are taken to quantify and minimize Error
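A quick simulation makes “the Errors average out to zero” concrete. It assumes, hypothetically, that each administration adds independent, normally distributed Error with mean zero to a fixed true score; the function name and the numbers (true score 60, Error SD 3) are illustrative only.

```python
import random
import statistics

def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Hypothetical model: observed = true score + Error, with Error ~ Normal(0, error_sd)."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_administrations)]

# Any single administration can miss by several points, but the average Error
# shrinks toward zero as the number of administrations grows.
for n in (5, 50, 5000):
    scores = simulate_observed_scores(true_score=60.0, error_sd=3.0, n_administrations=n)
    mean_error = statistics.fmean(scores) - 60.0
    print(f"{n:>5} administrations: mean Error = {mean_error:+.3f}")
```

The long-run average says nothing about any one student on any one day, which is exactly why Error matters near a cut score.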
20
The Problem of Error and Performance Standards
21
Coach, can you give me another chance?
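The problem these slides point to can be sketched numerically, under assumptions the slides do not state: a student's observed score is normally distributed around their true score with a known standard error of measurement (SEM), and a retake has an independent Error with the same true score. The student (true score 52), the cut (50), and the SEM (3) are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_observed_below_cut(true_score, cut, sem):
    """P(observed score < cut), assuming observed ~ Normal(true_score, sem)."""
    return normal_cdf((cut - true_score) / sem)

# Hypothetical student whose true score (52) is just above the cut (50); SEM = 3.
p_fail = prob_observed_below_cut(true_score=52.0, cut=50.0, sem=3.0)

# "Another chance": with an independent Error on a retake, failing both
# attempts is far less likely than failing one.
p_fail_twice = p_fail ** 2
print(f"fail once: {p_fail:.3f}, fail twice: {p_fail_twice:.3f}")
```

Under these assumptions a student who truly meets the standard still has roughly a one-in-four chance of falling below the cut on a single attempt.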
22
Chapter IV: Philosophies for setting the bar
23
How Do We Set the Bar? Two Ways: Think of People or Think of Tasks
24
The People Approach, Roughly…
I know my students well. I can make judgments about whether each has met the bar, for example whether they:
- “Have minimal competency in 4th grade mathematics”
- “Merit a high school leaving certificate”
- “Are prepared for the next unit of instruction in Arabic”
A standardized test is then given, and the score that best separates the two groups (students judged to have met the bar and students judged not to have met it) is chosen as the standard.
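The last step on this slide can be sketched as a search over candidate cut scores for the one that misclassifies the fewest students, which is close in spirit to what is often called the contrasting-groups method. The function name, the tie-breaking rule (lowest cut wins), and the scores are hypothetical.

```python
def contrasting_groups_cut(meets, does_not_meet):
    """Return (cut, errors): the observed score that, used as a cut, misclassifies the fewest students.

    A student is classified as meeting the bar when score >= cut.
    Errors = judged-to-meet students below the cut + judged-not-to-meet students at or above it.
    Ties go to the lowest cut because candidates are scanned in ascending order.
    """
    candidates = sorted(set(meets) | set(does_not_meet))
    best = None
    for cut in candidates:
        errors = sum(s < cut for s in meets) + sum(s >= cut for s in does_not_meet)
        if best is None or errors < best[1]:
            best = (cut, errors)
    return best

# Hypothetical test scores for students a teacher judged to meet / not meet the bar.
meets = [14, 15, 17, 18, 20]
does_not_meet = [8, 10, 12, 13, 16]
print(contrasting_groups_cut(meets, does_not_meet))  # (14, 1)
```

Here only one student (the judged-not-to-meet student who scored 16) ends up on the “wrong” side of the chosen cut.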
25
In Practical Terms, Most Standard Setting (or Level Setting) Follows the Task Approach
26
Some Select Task Approaches
- Angoff and modifications
- Ebel
- Nedelsky
- Jaeger-Mills
- Bookmark
- Body of Work
- Briefing Book
- Item-Descriptor Matching
27
What They Have in Common
- They consider items, tasks, or, more specifically, performances on tasks
- They rely on the concept (abstraction) of the minimally qualified individual
- Most have been generally accepted in the field
- Angoff is among the earliest and is the most widely used
- Bookmark is the most popular in achievement testing
- All have been both praised and criticized
28
Standard Setting Is Arguably the Most Controversial and Most Consequential Area of Educational Measurement
Why?
- Results vary with the method, the judges, and the language of the performance standard
- The cut point sometimes has important consequences for students, teachers, schools, entire systems, and reform efforts
That 8000 Can Alter Your Life Plan
29
Thank you. Questions?
30
Group Activity: Modified Angoff Standard Setting
You have been convened by the Ministry of a GCC country to establish standards for “Proficiency” in 5th grade mathematics.
- Step 1: Discuss the minimally proficient student: his / her knowledge, skills, and abilities
- Step 2: Review / take a test of 20 mathematics items at the 5th grade level
- Step 3: We will give you verbal instructions on how to make Angoff judgments on the items
- Step 4: You will make one round of judgments, and we will provide feedback
31
Instructions for Standard Setting Judges
- What is the probability that a minimally Proficient grade 5 student will answer this item correctly?
- In a group of 100 minimally Proficient grade 5 students, what percent would you expect to answer this item correctly?
(Convince yourselves that these are equivalent statements.)
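One common way to turn a judge's ratings into a cut score in the basic Angoff computation is to convert each percent to a probability and add them up: the total is the expected raw score of a minimally Proficient student on the 20-item test. A sketch, with invented ratings for one hypothetical judge; in a panel, judges' cut scores are then commonly averaged.

```python
def angoff_cut(percent_ratings):
    """Basic Angoff cut score for one judge.

    Each rating answers: "Of 100 minimally Proficient students, how many would answer
    this item correctly?" Dividing by 100 turns it into a probability, and the sum of
    the item probabilities is the expected raw score of a minimally Proficient student.
    """
    if any(not 0 <= r <= 100 for r in percent_ratings):
        raise ValueError("ratings must be between 0 and 100")
    return sum(percent_ratings) / 100

# Hypothetical ratings from one judge for the 20 items.
ratings = [70, 65, 80, 55, 60, 75, 50, 85, 40, 65,
           70, 60, 45, 90, 55, 60, 70, 50, 65, 80]
print(angoff_cut(ratings))  # 12.9 (out of 20)
```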
32
Types of Feedback
- Table mean and dispersion
- Group mean and dispersion
- Impact
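A sketch of how the three kinds of feedback might be computed after a round. It assumes each judge's round result is already a cut score (for example, the sum of their Angoff ratings), that “impact” means the share of a reference group of examinees scoring at or above the group mean cut, and that raw scores are compared with the unrounded cut; the table names and all numbers are hypothetical.

```python
import statistics

def mean_and_sd(cuts):
    """Mean and sample standard deviation of a set of judges' cut scores."""
    return statistics.fmean(cuts), statistics.stdev(cuts)

def impact(cut, examinee_scores):
    """Share of examinees who would be classified Proficient (raw score >= cut)."""
    return sum(score >= cut for score in examinee_scores) / len(examinee_scores)

# Hypothetical round-1 cut scores (out of 20), one per judge, grouped by table.
tables = {"Table 1": [12.9, 13.5, 11.8], "Table 2": [14.2, 13.0, 12.4]}

# Table feedback: each table sees its own mean and dispersion.
for name, cuts in tables.items():
    m, sd = mean_and_sd(cuts)
    print(f"{name}: mean {m:.2f}, SD {sd:.2f}")

# Group feedback: all judges pooled.
all_cuts = [cut for cuts in tables.values() for cut in cuts]
group_mean, group_sd = mean_and_sd(all_cuts)
print(f"Group: mean {group_mean:.2f}, SD {group_sd:.2f}")

# Impact feedback: hypothetical raw scores from a reference group of examinees.
scores = [8, 10, 11, 12, 13, 13, 14, 15, 16, 18]
print(f"Impact: {impact(group_mean, scores):.0%} would be classified Proficient")
```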
33
What Would Happen in the Real Thing?
- Multiple rounds
- Calculation of level setting error
- Review by sponsoring agency
- Final decision
- Implementation
- Possible future review
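For the “calculation of level setting error” step, one simple and commonly used estimate treats the panel as a sample of possible judges and reports the standard error of the panel's mean cut score: the SD of the judges' cuts divided by the square root of the number of judges. That choice is an assumption here; the slide does not say which estimate is used, and other approaches exist. The function name and the cut scores are hypothetical.

```python
import math
import statistics

def cut_score_standard_error(judge_cuts):
    """Standard error of the panel's mean cut score: SD of judges' cuts / sqrt(number of judges).

    Treats the judges as independent draws from a population of possible judges.
    """
    return statistics.stdev(judge_cuts) / math.sqrt(len(judge_cuts))

# Hypothetical final-round cut scores from five judges.
judge_cuts = [12, 14, 13, 15, 11]
se = cut_score_standard_error(judge_cuts)
print(f"{statistics.fmean(judge_cuts):.2f} ± {se:.2f}")  # 13.00 ± 0.71
```

A larger or more consistent panel shrinks this error, which is one reason real studies use multiple rounds and many judges.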