Validity in the Context of High-Stakes Accountability?
Rebecca Holcombe
June 24, 2015
Johanna Bandler

American Psychological Association: “Measurement validity simply means whether a test provides useful information for a particular purpose.”

State purposes for which we want useful information:
- Monitoring equity and quality
- Identifying schools that need intervention
- Identifying promising practices

Only part of what we want students to know is tested. Under high stakes, schools are incentivized to focus narrowly. Rating based on a subset of goals: is it enough?
[Diagram: nested sets, from largest to smallest: what we want students to learn; what is measured by local assessments; what is measured for accountability purposes.]

Rating schools: What does a single measure indicate? Narrowing instruction to high-stakes subjects? [Chart: scores improved in both math and science.]

When we see this gain pattern, should we celebrate or worry? Narrowing instruction within subjects to the content tested for high-stakes purposes? (This is not VT data. Credit: Jennifer Jennings, NYU.)

[Chart: 2011 High School Math Mean Scale Scores by School Size, with schools grouped into top quartile, middle half, and bottom quartile.] Are scores reliable enough to "identify" the "right" schools?

[Charts: 2011 vs. 2012 High School Math Mean Scale Scores by School Size; colors reflect each school's 2011 status.]

The problem of small "n"s: Are we identifying the right schools? How many students need to take the test to get reliable school-level results?
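As a back-of-the-envelope illustration (not from the slides), treat each student's proficient/not-proficient result as a binomial outcome with true rate p; the sampling error of a school's observed rate then shrinks only with the square root of the number of test takers. A minimal Python sketch, with hypothetical numbers:

import math

def proficiency_se(p, n):
    """Standard error of an observed proficiency rate when the true rate is p
    and n students are tested (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

for n in (10, 25, 50, 100, 400):
    half_width = 2 * proficiency_se(0.5, n)  # rough 95% band around the true rate
    print(f"n = {n:3d} test takers: observed rate = 50% +/- {half_width:.1%}")

With 25 test takers, a school whose "true" rate is 50% can plausibly report anywhere from about 30% to 70% proficient in a given year, which is why small schools churn in and out of "identified" status.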

Assuming scores are reliable, can we trust proficiency cut scores? [Chart callouts: 1 student is 8% of the total. "Wow! Increase of 33% proficient!" A strong increase of 6.6, but does it feel like it doubled?]

Assuming scores are reliable, can we trust proficiency cut scores? [Chart callouts: in one school, 1 student is 7% of the total; in another, 1 student is 2.5% of the total.]
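The percentages on these slides are easy to reproduce. A hedged sketch, with cohort sizes back-solved from the slides' figures (12, 15, and 40 test takers are assumptions, not VT data):

def one_student_swing(n_testers):
    """Percentage-point change in 'percent proficient' when a single student
    crosses the cut score."""
    return 100.0 / n_testers

for n in (12, 15, 40):  # hypothetical cohort sizes: 1/12 ~ 8%, 1/15 ~ 7%, 1/40 = 2.5%
    print(f"{n} test takers: one student moves the rate by {one_student_swing(n):.1f} points")

In a cohort of 12, a single student having a good or bad day moves the reported proficiency rate by about 8 points, a swing that can look like dramatic improvement or decline.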

Assuming we trust cut scores, is the "predictive validity" of college readiness a function of "readiness" or of sampling bias? One study compared the probability of graduating for students scoring just below and just above the cut score (Papay, Murnane and Willett, 2010).

Assuming we trust cut scores, is the "predictive validity" of college readiness a function of "readiness" or of sampling bias? Compared to peers who "just pass" the 10th grade MCAS, low-income urban students who "just fail":
- have an 8 percentage point lower probability of graduating on time;
- have a 4 percentage point greater probability of dropping out in the year after initial testing.
No such effects were observed for suburban students (regardless of income) or for wealthier urban students (Papay, Murnane and Willett, 2010).
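The design behind this comparison is a regression discontinuity: students just below and just above the cut are assumed to be essentially similar, so a jump in outcomes at the cut reflects the consequence of the label, not of underlying readiness. A minimal sketch of the comparison, with hypothetical records (not the authors' code or data):

def rd_comparison(records, cut, bandwidth):
    """Mean graduation rate for students just below vs. just above `cut`,
    within `bandwidth` score points on either side."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    below = [r["graduated"] for r in records if cut - bandwidth <= r["score"] < cut]
    above = [r["graduated"] for r in records if cut <= r["score"] < cut + bandwidth]
    return mean(below), mean(above)

# Hypothetical student records (score on the exit exam, graduated yes/no)
records = [
    {"score": 217, "graduated": 0}, {"score": 218, "graduated": 0},
    {"score": 219, "graduated": 1}, {"score": 220, "graduated": 1},
    {"score": 221, "graduated": 1}, {"score": 222, "graduated": 1},
]
lo, hi = rd_comparison(records, cut=220, bandwidth=3)
print(f"just below the cut: {lo:.0%} graduated; just above: {hi:.0%} graduated")

A large gap between two groups this similar suggests the failing label itself carries consequences.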

Is what we are measuring the impact of schools on learning?

Jurisdiction  | % of 4th graders at or above "Proficient" (2013 NAEP)
Minnesota     | 59.4%
New Hampshire | 58.7%
Massachusetts | 58.4%
Indiana       | 51.8%
Vermont       | 51.5%

Is what we are measuring the impact of schools on learning?

Jurisdiction  | % proficient (2013 NAEP) | Income of households (2-year-average medians) | % of ... year olds with some kind of postsecondary degree (2010 census)
Minnesota     | 59.4% | $61,... | ...
New Hampshire | 58.7% | $70,... | ...
Massachusetts | 58.4% | $63,... | ...
Indiana       | 51.8% | $48,... | ...
Vermont       | 51.5% | $55,... | ...

Wow, Indiana!

Is what we are measuring the impact of schools on learning?

Jurisdiction  | % proficient (2013 NAEP) | Income of households (2-year-average medians) | % of ... year olds with some kind of postsecondary degree (2010 census) | Inclusion rate, students with disabilities
Minnesota     | 59.4% | $61,... | ... | 84%
New Hampshire | 58.7% | $70,... | ... | 83%
Massachusetts | 58.4% | $63,... | ... | 88%
Indiana       | 51.8% | $48,... | ... | 88%
Vermont       | 51.5% | $55,... | ... | 93%

Given this range, how do we understand results?
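One way to see why inclusion rates matter: if students with disabilities are, on average, less likely to score proficient, a state that tests fewer of them reports a higher rate with no difference in learning. A hedged sketch with invented numbers (the 60%, 25%, and 15% figures below are assumptions; only the 84% and 93% inclusion rates come from the table):

def reported_proficiency(p_general, p_swd, swd_share, inclusion_rate):
    """Percent proficient among *tested* students when only `inclusion_rate`
    of students with disabilities sit for the test."""
    tested_general = 1 - swd_share
    tested_swd = swd_share * inclusion_rate
    total_proficient = tested_general * p_general + tested_swd * p_swd
    return total_proficient / (tested_general + tested_swd)

for name, inclusion in (("lower-inclusion state", 0.84), ("higher-inclusion state", 0.93)):
    rate = reported_proficiency(p_general=0.60, p_swd=0.25, swd_share=0.15, inclusion_rate=inclusion)
    print(f"{name} ({inclusion:.0%} tested): reported proficiency {rate:.1%}")

Identical underlying performance, different reported rates: comparisons that ignore inclusion reward exclusion.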

Reliability and New Assessments: "You're asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition. The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring." - James Pellegrino, SBAC Technical Advisory Committee, quoted in the New York Times, 6/22/15
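"Objective agreement in scoring" is typically quantified with an inter-rater statistic such as Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. A minimal sketch (kappa is a standard technique, not something the slide names; the rubric scores below are invented):

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 0-3 rubric scores assigned by two trained raters to ten essays
a = [2, 3, 1, 2, 0, 3, 2, 1, 2, 3]
b = [2, 2, 1, 3, 0, 3, 1, 1, 2, 2]
print(f"kappa = {cohens_kappa(a, b):.2f}")

Kappa near 1 means raters agree far beyond chance; values well below that, even for trained raters, illustrate the difficulty Pellegrino describes for open-ended items.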

Closing thought: “Setting absurd standards and then announcing massive failures has undermined public support for public schools... We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment.” (Gary Orfield, 2014)

Summary: VT takeaways
1. Assuming we want to rate schools and apply sanctions based on student mastery of a subset of important content, skills, and item formats, we may not be able to distinguish between schools where more learning has taken place and schools where students have learned more of the tested content and formats at the expense of other valued learning.
2. Assuming we are comfortable with evaluating based on a subset of goals, scores may not be reliable enough to "identify" the "right" schools.
3. Assuming scores are reliable, performance reporting categories may (and probably do) distort underlying patterns of learning.
4. Assuming we trust scores and performance categories, what we are measuring may not be the impact of schools on learning.

Resources:
- Memo to SBAC on Performance Categories: Categories_11_2014.pdf
- Memo to parents and caregivers on SBAC: rs_SBAC_Another%20Measure%20of%20Learning_3_17_2015.pdf
- Memo to schools on SBAC: g%20Perspective%20SBAC_3_23_2015.pdf
- Vermont State Board of Education Statement and Resolution on Assessment and Accountability
- Letter to parents and caregivers about the limitations of NCLB: Letter_to_parents_and_caregivers_AOE_8_8_14.pdf

Partial Bibliography:
- Darling-Hammond, Linda; Haertel, Edward; Pellegrino, James. (2014). Making good use of new assessments: Interpreting and using scores from the Smarter Balanced Assessment Consortium. Smarter Balanced Assessment Consortium. Making_Good_Use-of_New_Assessments.pdf
- Geller, Wendy & Bailey, Glenn. VT Agency of Education Data and Research Work Group.
- Ho, Andrew Dean. (2008). The problem with proficiency: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6).
- Hollingshead, L. & Childs, R.A. (2011). Reporting the percentage of students above a cut score: The effect of group size. Educational Measurement: Issues and Practice, 30(1), 36–43.
- Orfield, Gary. (2014). A new civil rights agenda for American education. Educational Researcher, August/September 2014, p. 286.
- Papay, John P.; Murnane, Richard J. & Willett, John B. (2010). The consequences of high school exit examinations for low-performing urban students: Evidence from Massachusetts. Educational Evaluation & Policy Analysis, 32(1).