Measuring Principals’ Effectiveness: Results from New Jersey’s Principal Evaluation Study
SREE Spring Conference
March 2, 2017
Christine Ross • Mariesa Herrmann
Research partnership with New Jersey Department of Education
NJ developed a principal evaluation system, but little research was available to guide its design
NJ’s principal evaluation system includes measures of professional practice and student achievement growth
These measures are analogous to those in teacher evaluation systems
Problem: little evidence on the reliability and validity of these measures
NJDOE requested an assessment of the new principal evaluation system
Pilot year: 2012/13
Statewide implementation: 2013/14
Main questions for the study
Variation in ratings. To what extent did ratings overall and on each of the component measures vary across principals?
Stability of measures. How stable were school median student growth percentiles (SGPs) across years?
  Schools with the same principal and those that changed principals
  Smaller and larger schools
Correlations with schoolwide student characteristics. What were the correlations between principals’ ratings and the student characteristics of the schools they led?
Correlations among component measures. What were the correlations among component measure ratings?
Weights on principal evaluation components in overall ratings, by type of school, 2013-14
[Figure. Axis: Types of schools]
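The way component weights combine into an overall rating can be sketched as a weighted average. This is a minimal illustration, not the NJDOE formula: the weights, the component names, and the 1 to 4 rating scale below are all hypothetical, and the actual 2013-14 weights are those shown in the slide's figure.

```python
# Hypothetical weights for one type of school (illustration only; the real
# 2013-14 weights appear in the slide's figure and are not reproduced here).
EXAMPLE_WEIGHTS = {
    "principal_practice": 0.30,
    "evaluation_leadership": 0.20,
    "principal_goals": 0.10,
    "teacher_sgo": 0.10,
    "school_median_sgp": 0.30,
}


def overall_rating(component_ratings, weights):
    """Return the weighted average of component ratings.

    Both arguments are dicts keyed by component name. All ratings are
    assumed to be on the same scale (here, a hypothetical 1-4 scale).
    """
    if set(component_ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same components")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * component_ratings[name] for name in weights)


# Made-up component ratings for a single principal.
example_ratings = {
    "principal_practice": 3.0,
    "evaluation_leadership": 3.5,
    "principal_goals": 4.0,
    "teacher_sgo": 3.5,
    "school_median_sgp": 2.5,
}

print(round(overall_rating(example_ratings, EXAMPLE_WEIGHTS), 2))
```

A school type without a school median SGP would use a different weight dictionary that omits that component and redistributes its weight.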
Nearly all principals received overall ratings of effective or highly effective
[Figure. Axis: Percentage of Principals]
Note: 1,656 principals
Few principals were rated highly effective on SGPs; most were rated highly effective on principal goals and teacher SGOs
Note: 1,183 principals
School median SGPs were stable across years, even for new principals
[Figure: two panels]
Same principals in 2011-12 and 2012-13 (1,374 principals/schools): correlation 0.69
New principals in 2012-13 (352 principals in 356 schools): correlation 0.63
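The year-to-year stability reported on this slide is a correlation between each school's median SGP in two consecutive years. A minimal sketch of that calculation follows, assuming a Pearson correlation over schools present in both years; the function names and the student-level data are made up for illustration and do not come from the study.

```python
from math import sqrt
from statistics import median


def school_medians(student_sgps_by_school):
    """Map each school to the median of its students' SGPs."""
    return {school: median(sgps) for school, sgps in student_sgps_by_school.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


def year_to_year_correlation(medians_year1, medians_year2):
    """Correlate school median SGPs across years, using schools in both years."""
    shared = sorted(set(medians_year1) & set(medians_year2))
    return pearson([medians_year1[s] for s in shared],
                   [medians_year2[s] for s in shared])


# Made-up student SGPs for three hypothetical schools in two years.
year1 = school_medians({"A": [30, 50, 70], "B": [40, 45, 60], "C": [20, 65, 80]})
year2 = school_medians({"A": [35, 55, 60], "B": [30, 40, 70], "C": [45, 60, 75]})
print(year_to_year_correlation(year1, year2))
```

Running the same function separately on schools that kept their principal and on schools that changed principals would give the two-panel comparison described above.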
School median SGPs were less stable for smaller schools than for larger schools
[Figure. Axes: Change in School Median SGPs 2012/13-2013/14 (Percentile Points); Number of Tested Students in Grades 4–8]
Note: 1,267 principals
Findings from 3 years
Principals leading schools with larger proportions of economically disadvantaged students tended to receive lower ratings than other principals
Component measure: correlation coefficient with schoolwide percentage of economically disadvantaged students, English learner students
School Median SGP Rating: –.33*, –.05*
Principal Practice Instrument Rating: –.20*, –.14*
Evaluation Leadership Instrument Rating: –.11*
Principal Goals Rating: –.15*, –.10*
Teachers’ Student Growth Objectives Rating: –.29*
Overall Rating: –.24*, –.12*
* Statistically significant at p < .05, two-tailed test.
Note: 1,450 to 1,781 principals
Component measures were modestly correlated with each other
Component measure: correlation with School Median SGP Rating, Principal Practice Instrument Rating, Evaluation Leadership Instrument Rating, Principal Goals Rating
Principal Practice Instrument Rating: .16*, –, a
Evaluation Leadership Instrument Rating: .08*, .61*
Principal Goals Rating: .10*, .32*
Teachers’ Student Growth Objectives Rating: .27*, .23*, .25*
* Statistically significant at p < .05, two-tailed test.
– indicates correlation is for the same measure.
a indicates correlation is shown in a different cell.
Note: 1,183 to 1,752 principals
Additional research could further assess measures’ reliability and validity
Topics for further research on principal evaluation:
Implementation quality
Inter-rater reliability of practice instruments, and training needed to attain high rates of inter-rater reliability
Internal consistency of practice instruments
  Requires item-level data on principal practice instruments and evaluation leadership instrument
Measures of principals’ contributions to student achievement growth
  Requires student achievement data and principal school assignment data over multiple years
Confirming findings with additional year of ratings data
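Internal consistency is commonly summarized with Cronbach's alpha, which is one reason the item-level data mentioned above would be needed. The sketch below shows that statistic under stated assumptions; the slides do not say which internal consistency measure would be used, and the item scores are invented.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table of item scores.

    item_scores holds one row per principal and one column per instrument
    item. Sample variances are used for both the items and the totals.
    """
    if len(item_scores) < 2:
        raise ValueError("need scores for at least two principals")
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in item_scores):
        raise ValueError("every row must have the same number of items")

    item_variances = [variance(column) for column in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores: four principals rated on three hypothetical items.
scores = [
    [3, 3, 2],
    [4, 4, 4],
    [2, 3, 2],
    [4, 3, 4],
]
print(cronbach_alpha(scores))
```

Values closer to 1 indicate that the items move together across principals, which is what an instrument intended to measure a single construct would be expected to show.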
For More Information
Christine Ross, CRoss@mathematica-mpr.com
Mariesa Herrmann, MHerrmann@mathematica-mpr.com
REL 2016-156: Herrmann, M., & Ross, C. (2016). Measuring principals’ effectiveness: Results from New Jersey’s first year of statewide principal evaluation
REL 2015-089: Ross, C., Herrmann, M., & Angus, M. H. (2015). Measuring principals’ effectiveness: Results from New Jersey’s principal evaluation pilot
Additional slides
Most school median SGPs were transformed into a rating of “effective”
[Figure: transforms school median SGP into school median SGP rating. Rating categories: Ineffective, Partially effective, Effective, Highly effective]
Note: The number of principals with school median SGPs is 1,742.
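The transformation from a school median SGP to a rating category can be thought of as a lookup against cutoff values. The sketch below is illustrative only: the cutoffs are hypothetical placeholders, not New Jersey's actual conversion, and the function name is invented.

```python
from bisect import bisect_right

# Hypothetical lower bounds for "Partially effective", "Effective", and
# "Highly effective". These are NOT the state's actual cutoffs.
HYPOTHETICAL_CUTOFFS = [35, 45, 65]
LABELS = ["Ineffective", "Partially effective", "Effective", "Highly effective"]


def sgp_rating(median_sgp, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Return the rating category for a school median SGP (1 to 99)."""
    if not 1 <= median_sgp <= 99:
        raise ValueError("a median SGP must lie between 1 and 99")
    # bisect_right counts how many cutoffs are <= median_sgp, which is the
    # index of the category the value falls into.
    return LABELS[bisect_right(cutoffs, median_sgp)]


for value in (30, 40, 50, 70):
    print(value, sgp_rating(value))
```

With cutoffs shaped like these, a wide middle band maps to "Effective", which is consistent in spirit with most schools landing in that category.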
Districts mainly selected commercially available principal practice instruments
Note: Four other instruments are the New Jersey LoTi Principal Evaluation Instrument, the Rhode Island Model: Building Administrator Evaluation and Support Model, Principal Evaluation and Improvement Instrument, and the Thoughtful Classroom Principal Effectiveness Framework.
Source: New Jersey Department of Education survey of school districts, February 2013 and October 2014
Instrument developers provided qualitative information on validity
All seven indicated that the practice instruments are consistent with ISLLC standards for principal leadership
Five indicated that the instruments were developed and informed by research on the relationship between principal practice and school performance or student achievement
None provided information on the statistical relationship between scores on the instrument and student achievement
Most instrument developers provided incomplete or no information on reliability
Internal consistency reliability:
  One developer indicated that analyses of internal consistency reliability confirmed desired constructs
Inter-rater reliability:
  One developer provided information on inter-rater reliability standards and training required to meet those standards
  Two other developers provide ongoing inter-rater reliability refresher training, but no standards for reliability are indicated
Statewide principal SGP ratings were relatively stable across years, among principals in the same school
[Figure. Rating categories: Ineffective, Partially effective, Effective, Highly effective]
Note: The analysis is based on principals who were in the same school in 2012 and 2013 (1,374 principals/schools).
School median SGPs were less stable for smaller schools than for larger schools
[Figure. Axes: Change in School Median SGPs 2012/13-2013/14 (Percentile Points); Number of Tested Students in Grades 4–8. Regions marked: Increased by More than 5 Percentile Points; Decreased by More than 5 Percentile Points]
Note: 1,267 principals
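The comparison on this slide can be sketched by flagging schools whose median SGP moved by more than 5 percentile points and comparing the share of such schools among smaller and larger schools. The 5-point threshold comes from the slide; the size cutoff, the function names, and the data are hypothetical.

```python
def classify_change(change, threshold=5):
    """Label a year-to-year change in school median SGP."""
    if change > threshold:
        return "increased"
    if change < -threshold:
        return "decreased"
    return "stable"


def share_with_large_change(schools, size_cutoff, threshold=5):
    """Share of schools with a large change, for smaller and larger schools.

    schools is a list of (tested_students, median_sgp_year1, median_sgp_year2).
    size_cutoff is a hypothetical dividing line between smaller and larger
    schools; the study does not define one here.
    """
    flags = {"smaller": [], "larger": []}
    for tested_students, sgp_year1, sgp_year2 in schools:
        group = "smaller" if tested_students < size_cutoff else "larger"
        label = classify_change(sgp_year2 - sgp_year1, threshold)
        flags[group].append(label != "stable")
    return {group: (sum(values) / len(values) if values else None)
            for group, values in flags.items()}


# Made-up schools: (tested students, median SGP in year 1, median SGP in year 2).
example_schools = [
    (50, 40, 52),
    (60, 50, 43),
    (80, 47, 49),
    (400, 50, 53),
    (450, 48, 46),
    (500, 55, 61),
]
print(share_with_large_change(example_schools, size_cutoff=100))
```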
Large changes in school median SGP between years 1 and 2 were less persistent in year 3 for smaller schools than larger ones
Note: 808 principals