
The Determinants of Student Achievement: Different Estimates for Different Measures
Tim Sass, Department of Economics, Florida State University
CALDER Conference, October 4, 2007

Different Measures: Types of Tests
- Criterion-referenced tests
  - Test whether a student has learned the elements in state-established instructional standards
  - State-specific
- Nationally normed tests
  - Test whether a student has learned a set of concepts and skills that may or may not correspond to any particular state's curriculum benchmarks
  - Allow interstate comparisons

Different Measures: Scaling
- Non-vertically aligned scale scores
  - The scale is potentially different at each grade level
  - Learning gains cannot be compared across grades
  - Criterion-referenced tests are typically not vertically aligned
- Vertical or developmental scales
  - A single equal-interval scale that spans all grade levels
  - A one-unit change means the same thing at all levels, within and between grades
  - Some norm-referenced exams are of this type (e.g., the Stanford Achievement Test)
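The scaling distinction above can be made concrete in a small sketch; every score below is hypothetical, invented purely for illustration:

```python
# On a vertical (developmental) scale, scores from adjacent grades sit on
# the same equal-interval ruler, so a learning gain is a simple difference.
vertical_grade4 = 620   # hypothetical developmental-scale score, grade 4
vertical_grade5 = 655   # same student, one year later
gain = vertical_grade5 - vertical_grade4  # 35 points of measured growth

# On non-vertically aligned scales, each grade's exam has its own metric
# (e.g., every grade's scores might run 100-500 each year), so the
# difference below is NOT a learning gain in any meaningful unit.
sss_grade4 = 310        # hypothetical grade-4 scale score
sss_grade5 = 305        # hypothetical grade-5 score, on a different scale
not_a_gain = sss_grade5 - sss_grade4  # uninterpretable as growth
```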

[Figure: Non-Vertically Aligned Scores — a separate scale for each grade (3-10), spanning content from single-digit addition to trigonometry]

[Figure: Vertically Scaled Scores — a single scale spanning grades 3-10, from simple addition to trigonometry]
If done right, a vertically scaled exam is ideal for analyzing learning gains, since a one-point change has the same meaning everywhere on the scale.

Different Measures: Scale Scores Normalized by Grade and Year
- Frequently used by researchers to compare a student's performance on criterion-referenced tests over time
  - Compares a student's performance relative to that of other students taking the same grade-level exam in the same year
  - The unit of measure is the standard deviation
  - If the performance distribution changes from grade to grade, normalized scores may not be comparable
- Also sometimes used to equate performance on different exams when a state changes its test midstream
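A minimal sketch of the normalization described above, using only the standard library; the scores are invented for illustration:

```python
import statistics

def normalize_by_grade_year(scores):
    """Convert raw scale scores to standard-deviation units relative to
    all students taking the same grade-level exam in the same year."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical grade-5 scale scores from a single year
grade5_scores = [296, 310, 325, 340, 354]
z_scores = normalize_by_grade_year(grade5_scores)
# By construction, the normalized scores have mean 0 and standard deviation 1.
```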

[Figure: Normalized Scores — the grade-5 score distribution, centered at 0]
A normalized score sets the mean to zero and rescales the score in standard-deviation units.

Different Results: Analysis of the Effectiveness of NBPTS-Certified Teachers
- Harris and Sass, "The Effects of NBPTS-Certified Teachers on Student Achievement" (February 2007)
- Compares the effectiveness of NBPTS-certified teachers (NBCTs) with that of non-NBCTs in Florida
- In many cases, results vary depending on whether one uses scores from Florida's criterion-referenced test, the FCAT-Sunshine State Standards exam (FCAT-SSS), or the Stanford Achievement Test, a norm-referenced test (FCAT-NRT)

Value-Added Estimates of Reading Achievement
[Table: coefficient estimates are not recoverable from the transcript]
- Columns: FCAT-SSS Developmental Scale | FCAT-NRT Developmental Scale | FCAT-SSS Normalized by Grade & Year | FCAT-NRT Normalized by Grade & Year
- Rows (selected explanatory variables): NBPTS Certified; First-Year Teacher; Years of Teaching Experience (three categories); Advanced Degree; Class Size
Notes: All coefficients are expressed in standard-deviation units. The omitted experience category is teachers with 10+ years of experience. Coefficients in green are statistically significant at the 95% confidence level.
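The kind of value-added regression behind estimates like these can be sketched as follows. The data are simulated and the specification is deliberately stripped down (the actual study uses richer controls and fixed effects); the variable names and effect sizes are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Simulated student-year observations
lagged_score = rng.normal(0.0, 1.0, n)       # prior-year normalized score
nbpts = rng.integers(0, 2, n).astype(float)  # teacher NBPTS-certified?
class_size = rng.normal(22.0, 4.0, n)

TRUE_NBPTS_EFFECT = 0.05                     # in student-level SD units
score = (0.7 * lagged_score
         + TRUE_NBPTS_EFFECT * nbpts
         - 0.002 * class_size
         + rng.normal(0.0, 0.5, n))

# OLS: score ~ constant + lagged_score + nbpts + class_size
X = np.column_stack([np.ones(n), lagged_score, nbpts, class_size])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"estimated NBPTS effect: {beta[2]:.3f} SD")
```

Because the outcome is a normalized score, the coefficient on the certification indicator reads directly in standard-deviation units, matching the table's metric.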

Different Results
- More variation in estimated effects across exams than across different scalings of the same exam
- Estimated effects of variables representing small proportions of teachers are the most variable
  - NBPTS certification
  - Advanced degrees
- Why are there differences across exams?
  - Differences in material covered
  - Differential ceiling effects

[Figure: Vertically Scaled Scores With Ceiling — the single grade 3-10 scale (simple addition to trigonometry), truncated at the top of the scale]
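A small simulation (all numbers invented) of the differential ceiling effect pictured above: two groups of students gain the same true amount, but the group starting near the ceiling shows a smaller measured gain:

```python
import random

random.seed(0)
CEILING = 500       # hypothetical top of the scale
TRUE_GAIN = 25.0    # both groups truly gain this much

def observed(score):
    """Scores above the test's ceiling are censored at the ceiling."""
    return min(score, CEILING)

def mean(xs):
    return sum(xs) / len(xs)

def measured_gain(pretest_scores):
    pre = [observed(s) for s in pretest_scores]
    post = [observed(s + TRUE_GAIN) for s in pretest_scores]
    return mean(post) - mean(pre)

low_group = [random.gauss(400, 30) for _ in range(500)]   # far from ceiling
high_group = [random.gauss(480, 30) for _ in range(500)]  # near the ceiling

# The high group's measured gain is attenuated even though true learning is
# identical, which can bias comparisons of teachers serving high achievers.
print(measured_gain(low_group), measured_gain(high_group))
```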

Conclusions
- Not much difference between developmental scale scores and non-vertically aligned scores that are normalized by grade and year
- Different tests can yield different results
  - Low-incidence variables seem to be most sensitive to the test instrument
  - It is not clear whether the differences are due to the material tested or to differential ceiling effects