Mark D. Reckase Michigan State University The Evaluation of Teachers and Schools Using the Educator Response Function (ERF)

Background Current educational policy is built around the goal of helping students reach educational goals specified by the states. Attainment of this goal is generally given a label tied to performance on a test designed to match the educational goals; the label is often “Proficient”. School systems are often evaluated by computing the proportion of students who reach the Proficient level. Teachers are not usually evaluated with this criterion because students differ in the level of challenge they pose for reaching Proficient.

Background Most teacher evaluation procedures based on test scores do not use the concept of proficiency. Instead, they attribute the difference between observed student performance and the performance predicted from previous achievement and other variables to the effect of the teacher. The results are usually presented in a norm-referenced way, showing which teachers are above average on the difference between observed and predicted performance. But a teacher could be quite strong working with underachieving students and still have an average difference below that of most teachers. Or a teacher could have all students above Proficient but a low average difference between observed and predicted performance.

Background It would seem desirable that: (a) the evaluations of teachers be related to the policy requirement of meeting the proficiency standard; (b) the evaluations take into account the level of challenge posed by working with students with different characteristics; and (c) the amount of data required for the analysis not be burdensome. The procedure proposed here is designed to meet these goals.

The Educator Response Function The educator response function is a mathematical model that relates the capabilities of teachers and the level of challenge of students to the probability that a given teacher-student combination will reach the Proficient level specified by the state department of education. The model treats “teaching capability” as a hypothetical construct on which teachers vary; a teacher’s level on this construct is the Educator Performance Level (EPL). The goal of a teacher evaluation is to determine the teacher’s location on the teaching-capability construct, yielding a value for the EPL.

Challenge Index A second component of the model is the amount of challenge a student poses for the work of reaching the Proficient level. The amount of challenge is represented by a point on a hypothetical construct, and the quantification of that point is called the Challenge Index (CI) for the student. The location on the construct is determined from observable indicators such as (a) previous year’s achievement, (b) attendance record, (c) home language different from the language of instruction, (d) presence of disabilities, (e) low SES level, and (f) educational level of parents.

Estimating the Challenge Index Approach 1: Using the previous cohort of students, predict performance in the target grade G from the indicator variables. The predicted performance is centered at 0 and then multiplied by -1 to reverse the scale, so that high values mean high challenge and low values mean low challenge. Determine the point on the reversed scale that is equivalent to the Proficient standard and set it to a fixed value such as 100; students above 100 are predicted not to meet the Proficient standard. (A sketch of this approach appears below.) Approach 2: Calibrate the indicator variables as items using an IRT model and estimate a value on the IRT scale for each student in the current cohort.
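A minimal sketch of Approach 1 in Python, assuming a previous-cohort table with a prior-achievement score and 0/1 indicator variables. The column names, the synthetic data, the regression coefficients, and the cut-score value are illustrative assumptions, not details from the talk.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic previous-cohort data (illustrative only).
n = 500
prior_score = rng.normal(300, 20, n)   # prior-grade test score
ed = rng.integers(0, 2, n)             # economic disadvantage (0/1)
swd = rng.integers(0, 2, n)            # student with disability (0/1)
ell = rng.integers(0, 2, n)            # English language learner (0/1)
X = np.column_stack([prior_score, ed, swd, ell])
y = 0.8 * prior_score - 6 * ed - 9 * swd - 4 * ell + rng.normal(0, 10, n)

# Step 1: regress target-grade performance on the indicators; in practice
# the model fit on the previous cohort is applied to the current cohort.
model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Step 2: center at 0 and multiply by -1 so high values mean high challenge.
challenge_raw = -(pred - pred.mean())

# Step 3: anchor the scale so the Proficient cut-score maps to 100.
proficient_cut = 280.0                 # assumed cut on the test-score scale
cut_on_challenge = -(proficient_cut - pred.mean())
ci = challenge_raw - cut_on_challenge + 100.0

# Students with CI above 100 are predicted not to reach Proficient.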

Educator Performance Level The conceptual framework is to evaluate teachers relative to the CI of the students they can help reach proficiency. A teacher who can help high-CI students become proficient is very effective; a teacher who cannot help even low-CI students reach proficiency is not. Students are treated as test items, with a student’s CI value serving as the difficulty index. The EPL for a teacher is determined from the CI levels of the students who reach proficiency.

Estimating the EPL Students receive a code of 1 or 0 according to whether or not they are proficient. These codes are treated as scores on the students as test items. The relationship between the EPL and student performance is assumed to follow a two-parameter logistic model in the form of a person characteristic curve, as written below.
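One common way to write the two-parameter logistic model with the teacher in the person role and the student in the item role is the following; the notation here is illustrative rather than taken from the talk:

\[ P(X_{ij} = 1 \mid \theta_j, c_i) = \frac{1}{1 + \exp[-a(\theta_j - c_i)]} \]

where \theta_j is the EPL of teacher j, c_i is the Challenge Index of student i, X_{ij} = 1 if the student reaches Proficient, and a is a slope parameter. When \theta_j = c_i the probability is .5, which matches the interpretation of the EPL given in the example below.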

Estimating the EPL The students assigned to a teacher make up the items on a test, and their proficiency codes are the scores on those items. Using IRT technology, the EPL for a teacher is obtained as the maximum likelihood estimate given the pattern of student outcomes and the CI levels of the students. The information from the student outcomes can be used to obtain the standard error of the estimate of the teacher’s location on the EPL construct. Note that the EPL is reported on the CI scale. (A sketch of this estimation step follows.)
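A minimal sketch of the maximum likelihood step, assuming a fixed common slope a and the usual Fisher-information formula for the standard error; the slope value, the bounds, and the function names are assumptions made for illustration, not the talk’s implementation.

import numpy as np
from scipy.optimize import minimize_scalar

def p_proficient(epl, ci, a=0.1):
    # Two-parameter logistic probability that a student with Challenge
    # Index ci reaches Proficient under a teacher with the given EPL.
    return 1.0 / (1.0 + np.exp(-a * (epl - ci)))

def estimate_epl(ci, proficient, a=0.1):
    # ci: array of Challenge Index values, one per student.
    # proficient: array of 0/1 proficiency outcomes, one per student.
    ci = np.asarray(ci, dtype=float)
    y = np.asarray(proficient, dtype=float)

    def neg_log_lik(epl):
        p = p_proficient(epl, ci, a)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    # If every student (or no student) is proficient, the MLE sits at a bound.
    result = minimize_scalar(neg_log_lik, bounds=(0.0, 200.0), method="bounded")
    epl_hat = result.x

    # Standard error from the test information: I(epl) = sum of a^2 * p * (1 - p).
    p = p_proficient(epl_hat, ci, a)
    se = 1.0 / np.sqrt(np.sum(a**2 * p * (1 - p)))
    return epl_hat, se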

Example: Teacher with 44 Students CI distribution for students assigned to the teacher. Note that most of them are below 100. This is not a very challenging group of students.

Example: Teacher with 44 Students Proficiency levels of students as a function of CI. Most of those with a low CI are proficient.

Estimation of the EPL for the Teacher The two-parameter logistic model is fit to the data for this teacher, giving EPL = 100. This means that the probability that this teacher helps a student with CI = 100 reach proficiency is .5. The standard error is 3.9.

Example: Teacher with 42 Students Most of these students have CI values above 100, so this is a more challenging teaching assignment than the first teacher’s.

Example: Teacher with 42 Students The EPL estimate is 120 with a standard error of 6.1. This teacher has a higher EPL because students with high CI values were proficient. The standard error is larger because the division between proficient and non-proficient students along the CI scale is not as distinct.

An Empirical Demonstration 213 teachers of Grade 4 students were linked to student performance. The CI was developed using the regression procedure based on the previous year’s students. Reading: Y = b*Read3 − 6.103*ED − 9.090*SWD − 3.830*ELL + e, where b is the estimated coefficient for Read3; Read3 is the state test in Reading for Grade 3; ED is a 0/1 variable indicating Economic Disadvantage (free or reduced-price lunch); SWD is a 0/1 variable indicating Students with Disabilities; ELL is a 0/1 variable indicating English Language Learner; and e is the error term in the regression model.

An Empirical Demonstration Predicted test scores were rescaled to reverse the endpoints and to set the value at the Proficient cut-score to 100. EPL estimates were then obtained for all of the teachers using maximum likelihood estimation, as in the sketch below.
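Looping the earlier estimate_epl sketch over teachers would look roughly like this; the file name and column names are hypothetical, and this assumes the estimate_epl function defined above.

import pandas as pd

# Assumed columns: teacher_id, ci (Challenge Index), proficient (0/1).
students = pd.read_csv("grade4_students.csv")

rows = []
for teacher_id, group in students.groupby("teacher_id"):
    # Teachers with all-0 or all-1 outcomes get boundary estimates,
    # which show up as extreme values in the EPL distribution.
    epl, se = estimate_epl(group["ci"].to_numpy(), group["proficient"].to_numpy())
    rows.append({"teacher_id": teacher_id, "epl": epl, "se": se, "n": len(group)})

epl_table = pd.DataFrame(rows)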

Distribution of EPL Mean = 96.6, SD = 17.8. The extreme values at the left were mostly teachers with only one student, although one teacher had 14 students with none proficient. At the right is one teacher with 27 students, all of whom reached Proficient.

Commentary Most teachers were in the middle of the distribution; it is highly peaked. The median standard error is about 3, so teachers whose EPL estimates are more than 6 points (two median standard errors) apart differ significantly in EPL. Estimates are poor when few students are assigned to a teacher. To get a high EPL, teachers need to help challenging students reach proficiency, which may have positive implications for the use of this procedure.

Implementation Issues This paper presents a new idea and some analyses as proof of concept. The critical part of the method is defining the Challenge Index; in practice, the variables that define it should be selected in collaboration with teachers and school administrators. The procedure needs data only from the previous cohort of students. In principle, CI values can be determined for the students assigned to every teacher, but a proficiency standard is needed for every subject matter area.

Implementation Issues The method may have the benefit of encouraging teachers to work with challenging students. The CI estimation procedure should be updated each year as tests and student characteristics change. As always, more research is needed.