Teacher Evaluation and Performance Measurement Doug Staiger, Dartmouth College

Not this. [Chart: percent of teachers rated Satisfactory (or equivalent) vs. Unsatisfactory (or equivalent).]
Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness. New York: The New Teacher Project.

Not this.

Transformative Feedback

Recent Work on Teacher Evaluation
- Efforts to identify effective teaching using achievement gains: work with Tom Kane & others in LAUSD, NYC, Charlotte...
- Efforts to better identify effective teaching:
  - Measures of Effective Teaching (MET) Project (Bill & Melinda Gates Foundation)
  - National Center for Teacher Effectiveness (NCTE) (US Department of Education)

The Measures of Effective Teaching Project
- Two school years: 2009-10 and 2010-11
- Grades 4-8: ELA and Math
- High School: ELA I, Algebra I, and Biology
- [Table: participating teachers]

The MET data is unique …
- in the variety of indicators tested:
  - 5 instruments for classroom observations (FFT used here)
  - Student surveys (Tripod Survey)
  - Value-added on state tests
- in its scale:
  - 3,000 teachers
  - 22,500 observation scores (7,500 lesson videos x 3 scores)
  - Trained observers
  - 44,500 students completing surveys and supplemental assessments in year 1
  - 3,120 additional observations by principals/peer observers in Hillsborough County, FL
- and in the variety of student outcomes studied:
  - Gains on state math and ELA tests
  - Gains on supplemental tests (BAM & SAT9 OE)
  - Student-reported outcomes (effort and enjoyment in class, grit)

What is “Effective” Teaching?
- Can be an inputs-based concept: observable actions or characteristics
- Can be an outcomes-based concept: measured by student success
- Ultimately, we care about impact on student outcomes:
  - Current focus on standardized exams
  - Interest in other outcomes (college, non-cognitive)

Multiple Measures of Teaching Effectiveness

Measure #1: Student Achievement Gains (“Value Added”)

Basics of Value-Added Analysis
- Teacher value-added compares actual student achievement at the end of the year to an expectation for each student
- Value-added is the difference between actual and expected achievement, averaged over all of the teacher's students
- Expected achievement is the typical achievement of other students who looked similar at the start of the year:
  - Same prior-year test scores
  - Same demographics and program participation
  - Same characteristics of peers in the classroom or school
- Various flavors, all of which work similarly (a minimal sketch follows below):
  - Student growth percentiles
  - Average change in score or percentile
  - Based on prior-year test or fall pre-test
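The basic recipe can be illustrated in a few lines. Below is a minimal sketch (regression-adjusted expectation, then classroom averaging), not any district's production estimator; the DataFrame and its column names (score, prior_score, free_lunch, teacher_id) are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm

def simple_value_added(df: pd.DataFrame) -> pd.Series:
    """Teacher value-added: actual minus expected score, averaged per teacher."""
    # Expected achievement: predicted end-of-year score for students who
    # "looked similar at the start of the year" (prior score + demographics).
    X = sm.add_constant(df[["prior_score", "free_lunch"]])
    expected = sm.OLS(df["score"], X).fit().predict(X)
    # Average the actual-minus-expected gap over each teacher's students.
    return (df["score"] - expected).groupby(df["teacher_id"]).mean()
```

Calling simple_value_added on student-level data returns one number per teacher, in test-score units: positive means the teacher's students beat expectations on average.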

There are Large Differences in Teacher Effects on Student Achievement Gains
- Most evidence comes from “value added” analysis, but randomized experiments show similar findings
- Huge literature about “teacher effects” on achievement:
  - Large, persistent variation across teachers
  - Difficult to predict at hire
  - Partially predictable after hire
  - Improve only in the first few years of teaching
  - Not related to most determinants of pay (certification, degrees, experience beyond the first few years)

Large Variation in Value Added of LAUSD Teachers is Not Related to Teacher Certification

Variation in Value Added of LAUSD Teachers is Related to Prior Performance

Why Not Just Hire Good Teachers?
- “Wise selection is the best means of improving the school system, and the greatest lack of economy exists wherever teachers have been poorly chosen.” (Frank Pierrepont Graves, NYS Commissioner, 1932)
- Unfortunately, easier said than done:
  - Decades of work on type of certification, graduate education, exam scores, GPA, college selectivity, TFA
  - (Very) small, positive effects on student outcomes

Large Variation in Value Added of NYC Teachers is Not Related to Recruitment Channel

Of Course, Teacher Impact on State Test Scores is Not All We Care About
- Depends on design & content of the test
- Test scores are proximate measures, but recent evidence suggests they capture long-run impact on student learning and other outcomes
- Test scores are only one dimension of performance: non-cognitive skills (grit, dependability, ...)

Value Added is Controversial
- “We need to find a way to measure classroom success and teacher effectiveness. Pretending that student outcomes are not part of the equation is like pretending that professional basketball has nothing to do with the score.” (Arne Duncan, 2009)
- “There is no way that any of this current data could actually, fairly, honestly or with any integrity be used to isolate the contributions of an individual teacher.” (Randi Weingarten, 2008)

What we learned from MET: Value-added measures
- Value-added identified teachers who caused students to learn more on state tests following random assignment.
- The same teachers also caused students to learn more on supplemental assessments and to enjoy class more.
- Low year-to-year correlations in value-added (and other performance measures) understate year-to-career correlations (see the sketch below).
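Why single-year correlations understate the signal: a hedged illustration under a simple model (an assumption for exposition, not a MET result) in which an observed yearly measure is stable teacher quality plus independent noise. Under that model, the year-to-year correlation r equals the stable share of variance, so a single year correlates with long-run career quality at sqrt(r).

```python
import math

# Assumed model: yearly measure = stable quality + independent noise.
# Then corr(year t, year t+1) = r (the stable variance share), and
# corr(one year, long-run career quality) = sqrt(r).
def year_to_career_corr(year_to_year_r: float) -> float:
    return math.sqrt(year_to_year_r)

for r in (0.2, 0.4, 0.6):
    print(f"year-to-year r = {r:.1f} -> year-to-career r = {year_to_career_corr(r):.2f}")
# 0.2 -> 0.45, 0.4 -> 0.63, 0.6 -> 0.77
```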


Measure #2: Classroom Observations

Classroom Observation Using Digital Video

Helping Districts Test Their Own New Classroom Observations: Access to Validation Engine
What you can expect from us:
- SEA/LEA chooses a rubric and trains raters
- The MET Project delivers sample videos
- SEA/LEA ratings are used to predict value-added and gauge reliability

Two Cross-Subject Observation Instruments

Instrument | Developer | Origin | Instructional Focus | Structure | Scoring
Framework for Teaching (FFT) | Charlotte Danielson | Outgrowth of ETS's PRAXIS III licensing exam | Constructivism; intellectual engagement | 4 domains; 22 components (MET uses 8 components*) | 4 points
Classroom Assessment Scoring System (CLASS) | Robert Pianta, Univ. of Virginia | Tool for research on early childhood development | Teacher-student interactions | 3 domains; 12 dimensions | 7 points

* Not “flexibility & responsiveness” and “organization of physical space”

FFT competencies scored:

Classroom Environment
- Creating an environment of respect and rapport
- Establishing a culture of learning
- Managing classroom procedures
- Managing student behavior

Instruction
- Communicating with students
- Using questioning and discussion techniques
- Engaging students in learning
- Using assessments in instruction

Math Observation Instruments

Instrument | Developer | Origin | Instructional Focus | Structure | Scoring
Mathematical Quality of Instruction (MQI) | Heather Hill, Harvard | Outgrowth of a written test of math teaching knowledge | Math errors and imprecision | 6 overall elements of instruction | 3 points
UTEACH Observation Protocol (UTOP) | Michael Marder, Univ. of Texas-Austin | Teacher prep program for math & science majors | Values different modes, from direct instruction to inquiry-based | 4 sections; 22 subsections | 5 points

ELA Observation Instrument

Instrument | Developer | Origin | Instructional Focus | Structure | Scoring
Protocol for Language Arts Teaching Observations (PLATO) | Pam Grossman, Stanford | Research on effective middle-grade ELA instruction | Modeling, explicit teaching of strategies, guided practice | 13 elements (6 included in MET study) | 4 points

What we learned from MET: Classroom observations
- Observation scores were correlated with a teacher's value-added.
- Different instruments were highly correlated with each other (although subject-specific instruments were distinct from the general-pedagogical instruments).
- Reliability requires certified observers and more than one observer per teacher (because rater judgments differ).
- Principals rate their own teachers higher than other observers do, but their rankings are similar.
- When teachers select their own videos, scores are higher, but rankings remain the same.

Four Steps to High-Quality Classroom Observations

Step 1: Define Expectations. Framework for Teaching (Danielson). [Chart: actual scores for 7,500 lessons.]

Step 2: Ensure Accuracy of Observers

Step 3: Monitor Reliability

More than one observer matters for reliability: scoring one more lesson adds +.07, and adding one more observer adds +.16 (see the sketch below).
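The gains from extra lessons and observers follow the usual averaging logic. Below is a hedged sketch using the Spearman-Brown prophecy formula; the single-lesson reliability is an illustrative number, and the MET increments (+.07, +.16) are empirical findings, not reproduced by this formula alone.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Reliability of the average of k parallel observations."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)

# Illustrative: if one scored lesson has reliability 0.35, averaging
# four lessons raises the reliability of the average score substantially.
print(spearman_brown(0.35, 1))  # 0.35
print(spearman_brown(0.35, 4))  # ~0.68
```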

Step 4: Verify Alignment with Outcomes. Teachers with higher observation scores had students who learned more.

Measure #3: What do students say?

Students Distinguish Between Teachers

[Charts: percent of students agreeing with Tripod survey items, by classroom.]

What we learned from MET: Student surveys
- Surveys are a low-cost way to cover untested grades and subjects.
- Student survey results are related to teacher value-added.
- Student surveys are the most reliable measures we tested.

Multiple Measures: The “Dynamic Trio” of classroom observations, student feedback, and student achievement gains.

Dynamic Trio: Three Criteria
- Predictive power: Which measure could most accurately identify teachers likely to have large gains when working with another group of students?
- Reliability: Which measures were most stable from section to section, or year to year, for a given teacher?
- Potential for diagnostic insight: Which have the potential to help a teacher see areas of practice needing improvement? (We've not tested this yet.)

Dynamic Trio: Measures have different strengths … and weaknesses.

Dynamic Trio: Combining Measures Improved Reliability as well as Predictive Power

[Table 16 of the research report, “The Reliability and Predictive Power of Measures of Teaching”: difference in math VA (top 25% vs. bottom 25%) and reliability, for observation alone (FFT), student survey alone, VA alone, the combined measure with equal weights, and the combined measure with criterion weights.]

Note: For the equally weighted combination, we assigned a weight of .33 to each of the three measures (a sketch of this composite follows below). The criterion weights were chosen to maximize the ability to predict a teacher's value-added with other students. The next MET report will explore different weighting schemes. Reliability is based on one course section and 2 observations.
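A minimal sketch of the equal-weights composite described in the note above. The DataFrame and its column names are hypothetical; the criterion-weights variant would instead fit the weights to best predict a teacher's value-added with other students.

```python
import pandas as pd

EQUAL_WEIGHTS = {"observation": 1/3, "survey": 1/3, "value_added": 1/3}

def combined_measure(df: pd.DataFrame, weights=EQUAL_WEIGHTS) -> pd.Series:
    # Standardize each teacher-level measure so the weights are comparable
    # across scales, then take the weighted average.
    cols = list(weights)
    z = (df[cols] - df[cols].mean()) / df[cols].std()
    return sum(w * z[c] for c, w in weights.items())
```

Averaging three noisy but correlated measures is what buys the extra reliability and predictive power reported in the table.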

What we learned from MET: Combining measures
- The teachers identified as more effective caused students to learn more following random assignment.
- Combining value-added with student surveys and classroom observations produces two benefits:
  - Increased reliability
  - Increased correlation with other outcomes, such as value-added on supplemental assessments and happiness in class
- Weighting value-added below .33, though, lowered the correlation with other outcomes and lowered reliability.

Can the measures be used for “high stakes”?
- High-stakes decisions are being made now, with little or no data.
- No information is perfect, but better information should lead to better decisions and fewer mistakes.

Scenario 1: Teacher. You have been teaching biology for 10 years and want to improve your practice. What weaknesses should you focus on, and how will you know if you're making progress?

Scenario 2: Principal. A probationary teacher in your school is approaching the end of his or her 2nd year. If you retain the teacher, he or she automatically earns tenure under the collective bargaining agreement. Should you grant tenure, or recruit a new novice teacher?

Scenario 3: Superintendent. Your district is considering offering coaching opportunities and higher pay to a subset of your teachers. Should you (i) allocate those slots on the basis of seniority, or (ii) ensure that only excellent instructors are coaches? How would you measure effectiveness fairly?

No information is perfect, but better information → better decisions. How do these compare to existing measures? [Charts: differences identified by master's degrees, years of experience, and classroom observations alone.]

Compared to What? Compared to MA degrees and years of experience, the combined measure identifies larger differences … on state tests

Compared to What? … and on low-stakes assessments

Compared to What? … as well as on student-reported outcomes.

The Value of Going Beyond Classroom Observation. [Chart: observations alone vs. observations plus student perceptions vs. observations, student perceptions, and VA on state tests.]

Compared to Classroom Observations Alone, the Combined Measure Identifies Larger Differences (Math Value-Added). [Chart: average math value-added with another class, by percentile rank on FFT, ranking teachers using FFT only; FFT and Tripod; and FFT, Tripod, and value-added.]

Improving Teaching: What are Districts Doing?

Robust evaluation systems themselves improve teaching outcomes. (Source: Eric S. Taylor and John H. Tyler, “Can Teacher Evaluation Improve Teaching?” Education Next, Fall 2012.)

Teacher Effectiveness Continues to Improve in Better Environments. (Source: Matthew A. Kraft and John P. Papay, “Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience,” January 2013, on the NCTE website.)

The Best Foot Forward Project
1. Teachers record their own lessons: record ≥1 lesson every 2 weeks; submit 5 lessons over the course of the year; viewed by principals and content experts.
2. Observers view and discuss videos with teachers: observers are trained to use video for feedback and to identify discrete, coachable changes.
3. Teachers can share videos with each other.
4. Students provide anonymous feedback.

Next Up: Dashboard for Tracking Teacher Evaluations and Benchmarking Performance
1. Distribution of observation scores: What are the differences in scores, and are the differences between schools, districts, grades, and subjects larger than might have occurred by chance? (A sketch of one simple check follows below.)
2. Observations and value-added: What are the relationships among the different measures? Do they differ by district, school, grade level, or subject? Are they weaker or stronger than we observed in MET?
3. Reliability: How does each measure vary from school to school and year to year?
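For the first dashboard question, one simple check is a one-way ANOVA testing whether between-school differences in observation scores exceed chance. This is a hedged sketch; the data layout and column names (obs_score, school_id) are hypothetical.

```python
import pandas as pd
from scipy import stats

def between_school_anova(df: pd.DataFrame):
    # Group teacher observation scores by school and test whether the
    # between-school variation exceeds what chance alone would produce.
    groups = [g["obs_score"].values for _, g in df.groupby("school_id")]
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value  # a small p-value suggests real between-school differences
```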

Useful Resources
Available at:
- Student surveys: Tripod survey and “Asking Students about Teaching” practitioner brief
- Roster validation: report by Battelle for Kids on ways to allow teachers to verify the students in their class, “Identifying The Importance of Accurately Linking Instruction to Students to Determine Teacher Effectiveness”
- Software for certifying observers using pre-scored videos: certification engine from Empirical Education
Available at:
- Classroom observation: links to FFT, CLASS, etc., and webinars with six organizations currently supporting classroom observations
Additional examples of sites with useful resources:
- TNTP:
- Pearson: