Teacher Preparation Programs and Teacher Quality: Are There Real Differences Across Programs?
Cory Koedel, Eric Parsons, Michael Podgursky and Mark Ehlert

Motivation
- Evaluations of teacher preparation programs (TPPs) using student achievement data are increasingly popular.
- A major component of successful Race to the Top applications (Crowe, 2011).
- Strongly advocated by the United States Department of Education.
- This paper began as a typical TPP evaluation.

Evaluation Models
- The models being used to evaluate TPPs are value-added models or variants of value-added models.
- Basic idea: construct a forecasting model for students' year-t achievement that depends on prior achievement and other information. The predictor of interest is the TPP of the teacher assigned to the student in year t.
- Key question: Do students taught by teachers from some TPPs systematically exceed or fall short of their predicted performance?

Example Model
- Yijst = test score for student i with teacher j at school s in year t.
- Xijst = student characteristics (race, gender, free/reduced-price lunch status, etc.).
- Sijst = school characteristics (analogous to the student characteristics, but aggregated).
- TPPijst = vector of variables indicating which TPP certified teacher j.
- With or without school fixed effects (I focus here on without; see Mihaly et al. (forthcoming) for discussion).
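The regression equation itself does not appear in the transcript (it was presumably displayed on the slide). A plausible reconstruction, consistent with the variable definitions above and offered as an assumption rather than the authors' exact specification, is

$$
Y_{ijst} = \lambda\, Y_{ijs,t-1} + X_{ijst}\beta + S_{ijst}\delta + TPP_{ijst}\gamma + [\mu_s] + \varepsilon_{ijst},
$$

where $Y_{ijs,t-1}$ is prior achievement, $\gamma$ is the vector of TPP effects of interest, and $\mu_s$ is the optional school fixed effect.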

Example Model and the Clustering Problem
- Yijst = test score for student i with teacher j at school s in year t.
- Xijst = student characteristics (race, gender, free/reduced-price lunch status, etc.).
- Sijst = school characteristics (analogous to the student characteristics, but aggregated).
- TPPijst = vector of variables indicating which TPP certified teacher j.
- With or without school fixed effects (I focus here on without; see Mihaly et al. (forthcoming) for a general discussion of model choice).
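The clustering problem this slide refers to, as the later slides make clear, is that the TPP indicators vary only at the teacher level, so student observations are not independent draws. Below is a minimal simulation sketch (my own illustration, not the authors' code or data; all names and parameter values are assumptions) showing how naive OLS understates the uncertainty in estimated TPP effects relative to standard errors clustered at the teacher level.

```python
# Minimal sketch: why clustering matters when the regressor of interest (the TPP)
# varies only at the teacher level. Illustrative simulation, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_tpp, teachers_per_tpp, students_per_teacher = 12, 80, 50

rows = []
for p in range(n_tpp):
    tpp_effect = rng.normal(0, 0.03)              # tiny true differences across TPPs
    for j in range(teachers_per_tpp):
        teacher_effect = rng.normal(0, 0.20)      # large within-TPP teacher variation
        teacher_id = p * teachers_per_tpp + j
        for _ in range(students_per_teacher):
            prior = rng.normal()
            score = 0.6 * prior + tpp_effect + teacher_effect + rng.normal(0, 0.8)
            rows.append((score, prior, f"tpp{p:02d}", teacher_id))

df = pd.DataFrame(rows, columns=["score", "prior", "tpp", "teacher"])

# Naive OLS: conventional standard errors, students treated as independent.
naive = smf.ols("score ~ prior + C(tpp)", data=df).fit()

# Same model, standard errors clustered at the teacher level.
clustered = smf.ols("score ~ prior + C(tpp)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["teacher"]}
)

print("mean SE on TPP indicators, naive:    ", naive.bse.filter(like="tpp").mean())
print("mean SE on TPP indicators, clustered:", clustered.bse.filter(like="tpp").mean())
```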

Missouri Data
- Teachers certified between 2004 and 2008 and observed teaching in 2008-09 in a Missouri public elementary school.
- Teachers' classrooms observed for up to three years (through 2010-11).
Main Analysis
- "All" programs: 24 TPPs with more than 15 teachers observed in the analytic data; 1,309 teachers in 656 different elementary schools (more than 60,000 student-year records attached).
- Large programs: 12 TPPs with 50+ teachers (these programs produced three-fourths of the teachers in our sample).
- For the large programs we observe over 80 teachers per program on average, which is more than in the reports in Louisiana and Tennessee, and in notable research studies (e.g., Boyd et al., 2009). We do not have a small sample.

Results from Missouri

Interpretation
- The findings from the previous slide suggest that the differences in point estimates across teacher training programs are entirely reflective of statistical noise.
- To illustrate this point, we perform falsification tests in which we randomly assign teachers to TPPs and show that similarly sized nominal “differences” across TPPs are reflected in the point estimates (not shown for brevity).
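A minimal sketch of the kind of falsification test described above (my own illustration with simulated data, not the authors' code; names and parameter values are assumptions): shuffle the TPP labels across teachers and compare the spread of the estimated "TPP effects" under the random labels with the spread under the true labels. If the two are similar, the estimated differences are largely noise.

```python
# Falsification-test sketch on simulated data (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_tpp, teachers_per_tpp, students_per_teacher = 12, 80, 50

# Simulated data: small true TPP differences, large within-TPP teacher variation.
teacher = np.repeat(np.arange(n_tpp * teachers_per_tpp), students_per_teacher)
tpp = teacher // teachers_per_tpp
prior = rng.normal(size=teacher.size)
score = (0.6 * prior
         + rng.normal(0, 0.03, n_tpp)[tpp]                  # true TPP effects
         + rng.normal(0, 0.20, teacher.max() + 1)[teacher]  # teacher effects
         + rng.normal(0, 0.8, teacher.size))                # student-level noise
df = pd.DataFrame({"score": score, "prior": prior, "tpp": tpp, "teacher": teacher})

def tpp_effect_spread(data):
    """Standard deviation of the estimated TPP indicator coefficients."""
    fit = smf.ols("score ~ prior + C(tpp)", data=data).fit()
    return fit.params.filter(like="C(tpp)").std()

# Spread of estimated TPP effects under the true labels.
true_spread = tpp_effect_spread(df)

# Shuffle TPP labels across teachers (each teacher's students stay together).
per_teacher_tpp = df.groupby("teacher")["tpp"].first()
fake_labels = pd.Series(rng.permutation(per_teacher_tpp.values), index=per_teacher_tpp.index)
df_fake = df.assign(tpp=df["teacher"].map(fake_labels))

print("spread under true labels:    ", true_spread)
print("spread under shuffled labels:", tpp_effect_spread(df_fake))
```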

Why isn’t this working?
- There is too much variation in teacher performance within programs, and too little between programs.
- Large-program math models:
  - Change in R² for the student-achievement model when we add TPP indicators (preferred specification): 0.001
  - Change in R² for the student-achievement model when we add individual-teacher indicators (preferred specification): 0.047
  - Ratio: 0.019
- That is, variation in teacher quality across TPPs explains just 1.9 percent of the total variation in test scores attributable to differences across individual teachers.
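For reference, the ratio reported above is simply the share of the teacher-attributable gain in explanatory power that is captured by the TPP indicators (my notation, not the slide's):

$$
\text{ratio} = \frac{\Delta R^2_{\text{TPP indicators}}}{\Delta R^2_{\text{teacher indicators}}} = 0.019.
$$

The rounded increments shown above (0.001 and 0.047) give roughly 0.02; the reported 0.019 is presumably computed from the unrounded estimates.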

Why isn’t this working? (continued)
- Because there is so much variation in quality within programs and so little between, the data environment is challenging.
- The data requirements to perform this type of evaluation, speaking in the abstract and purely from a statistical perspective, are substantial.
- Even in states where the data are of high quality, we will generally not have enough data to do a good job of this given the clustering structure.
- The most interesting and surprising finding for us is that the differences between teachers from different TPPs are so small. This is at the root of the statistical issue: the models are being asked to statistically distinguish very small program differences with lots of noise floating around.

Important Caveat
- Our analysis only looks at “traditional” TPPs.
- Evaluations that consider alternative-certification (alt-cert) programs, or more generally that compare more-heterogeneous groups of programs, may find differences that were not present in our study.

Should we give up?
- My opinion: No.
- Why? The current statistical problem is driven in large part by the fact that the real differences between TPPs are small. Given that there has never been an evaluation system that monitored output, this is perhaps unsurprising.
- The mere presence of an evaluation system moving forward may prompt positive changes in TPPs as programs attempt to move up in the rankings. This could improve students’ short-term and long-term outcomes in meaningful ways.
- Even if these types of evaluations are not particularly informative now, they do no harm as long as people understand that the magnitudes of the differences between programs are currently small.
- Keeping the systems in place leaves open the option that they may grow into a more valuable recruitment and monitoring tool in the future.

Key Takeaways
- The differences in teacher quality between teachers from different TPPs are small. In Missouri, we cannot reject the null hypothesis that the differences we see in our models are driven purely by noise in the data; that is, that there are no real differences.
- This does not mean that these types of evaluations cannot become more useful over time, so all hope is not lost.
- However, it is important that educational administrators not put undue weight on the rankings produced by these models. The rankings are noisy, and the implied magnitudes of the differences between programs are typically very small.
- The data indicate that if you rank the TPPs from best to worst based on estimates from these types of models, there are still many teachers from the lowest-ranking program who will perform better than many teachers from the highest-ranking program.

A brief note on selection