Slide 1
Validity in the Context of High-Stakes Accountability?
Rebecca Holcombe
June 24, 2015
Johanna Bandler
Slide 2
American Psychological Association: "Measurement validity simply means whether a test provides useful information for a particular purpose."
Slide 3
State purposes for which we want useful information:
- Monitoring equity and quality
- Identifying schools that need intervention
- Identifying promising practices
Slide 4
Only part of what we want students to know is tested. Under high stakes, schools are incentivized to focus narrowly. Rating based on a subset of goals: is it enough?
[Diagram labels: "What we want students to learn" / "Measured by local assessments" / "Measured for accountability purposes"]
Slide 5
Rating schools: What does a single measure indicate? Narrowing instruction to high-stakes subjects? Scores improved in both math and science.
Slide 6
When we see this gain pattern, should we celebrate or worry? Narrowing instruction within subjects to content tested for high-stakes purposes? (This is not VT data. Credit: Jennifer Jennings, NYU.)
Slide 7
[Chart: 2011 High School Math Mean Scale Scores by School Size, marking the top quartile, middle half, and bottom quartile of schools]
Are scores reliable enough to "identify" the "right" schools?
Slide 8
[Charts: 2011 and 2012 High School Math Mean Scale Scores by School Size; colors reflect 2011 status]
Slide 9
The problem of small "n"s: Are we identifying the right schools? How many students need to take the test to get reliable school-level results?
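A quick way to see the small-"n" problem is to simulate it. The sketch below is illustrative only (the score distribution, school sizes, and number of trials are assumptions, not VT data): every simulated school draws students from the same population, yet small schools' mean scores swing far more than large schools'.

```python
import random
import statistics

random.seed(1)

# Assumed population: every school samples students from the same
# score distribution (mean 500, SD 50), so any difference between
# school means is pure sampling noise.
def school_mean(n_students):
    scores = [random.gauss(500, 50) for _ in range(n_students)]
    return statistics.mean(scores)

for n in (10, 30, 100, 400):
    means = [school_mean(n) for _ in range(1000)]
    print(f"n = {n:>3}: SD of school mean scores = {statistics.stdev(means):.1f}")

# The spread of school means shrinks roughly as 1/sqrt(n), so rankings
# of small schools are dominated by noise rather than by school quality.
```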
Slide 10
Assuming scores are reliable, can we trust proficiency cut scores? In this school, 1 student is 8% of the total. Wow, a 33-percentage-point increase in "proficient"! The mean scale score shows a strong increase of 6.6 points, but does it feel like learning doubled?
Slide 11
Assuming scores are reliable, can we trust proficiency cut scores? In one school, 1 student is 7% of the total; in another, 1 student is 2.5% of the total.
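The arithmetic behind these percentages is worth making explicit: each student's weight in "percent proficient" is simply 100 divided by the number of test-takers. A minimal sketch (the school sizes of 12, 14, and 40 are inferred from the per-student percentages on the slides):

```python
# Each student's weight in "percent proficient" is 100 / n_students.
for n_students in (12, 14, 40):  # roughly 8%, 7%, and 2.5% per student
    print(f"school of {n_students:>2}: one student moves percent-proficient "
          f"by {100 / n_students:.1f} points")

# In the 12-student school, only 4 students crossing the cut score
# produces the dramatic-looking 33-point jump.
print(f"4 of 12 students crossing the cut: {4 * 100 / 12:.0f}-point swing")
```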
Slide 12
Assuming we trust cut scores, is the "predictive validity" of college readiness a function of "readiness" or of sampling bias? The study compared the probability of graduating for students just below and just above the cut score (Papay, Murnane and Willett, 2010).
Slide 13
Assuming we trust cut scores, is the "predictive validity" of college readiness a function of "readiness" or of sampling bias? Compared to peers who "just pass" the 10th-grade MCAS, low-income, urban students who "just fail":
- have an 8-percentage-point lower probability of graduating on time
- have a 4-percentage-point greater probability of dropping out in the year after initial testing
No such effects were observed for suburban students (regardless of income) or for wealthier urban students (Papay, Murnane and Willett, 2010).
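Comparing students just below and just above a cut score is a regression discontinuity design: students a point apart are essentially interchangeable, so a jump in outcomes at the threshold can be attributed to the pass/fail label itself. A minimal sketch of the logic on synthetic data (the cut score, sample, and built-in 8-point effect are assumptions for illustration, not the Papay et al. estimates):

```python
import random
import statistics

random.seed(0)
CUT = 220  # hypothetical scale-score cut

# Synthetic students: graduation probability rises smoothly with score,
# plus an assumed 8-percentage-point penalty for the "fail" label.
scores, grads = [], []
for _ in range(20000):
    s = random.uniform(200, 240)
    p = 0.5 + 0.01 * (s - CUT) - (0.08 if s < CUT else 0.0)
    scores.append(s)
    grads.append(1.0 if random.random() < p else 0.0)

# RD estimate: fit a line on each side of the cut and compare the two
# fitted graduation rates at the threshold.
def fitted_at_cut(side):
    xs = [s for s in scores if side(s)]
    ys = [g for s, g in zip(scores, grads) if side(s)]
    fit = statistics.linear_regression(xs, ys)
    return fit.slope * CUT + fit.intercept

gap = fitted_at_cut(lambda s: s >= CUT) - fitted_at_cut(lambda s: s < CUT)
print(f"estimated jump at the cut: {gap:+.1%}")  # ~ the assumed +8 points
```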
Slide 14
Is what we are measuring the impact of schools on learning?

Jurisdiction   | % of 4th graders at or above "Proficient," 2013 NAEP
Minnesota      | 59.4%
New Hampshire  | 58.7%
Massachusetts  | 58.4%
Indiana        | 51.8%
Vermont        | 51.5%
Slide 15
Is what we are measuring the impact of schools on learning?

Jurisdiction   | 2013 NAEP: % at/above "Proficient" | Median household income (2-yr avg, 2012-13) | % of 25-34-year-olds with a postsecondary degree (2010 census)
Minnesota      | 59.4% | $61,800.00 | 49.8%
New Hampshire  | 58.7% | $70,063.00 | 46.0%
Massachusetts  | 58.4% | $63,772.19 | 54.3%
Indiana        | 51.8% | $48,690.82 | 36.1%
Vermont        | 51.5% | $55,615.76 | 44.5%

Wow, Indiana!
Slide 16
Is what we are measuring the impact of schools on learning?

Jurisdiction   | 2013 NAEP: % at/above "Proficient" | Median household income | % with postsecondary degree | Inclusion rate, students with disabilities
Minnesota      | 59.4% | $61,800.00 | 49.8% | 84%
New Hampshire  | 58.7% | $70,063.00 | 46.0% | 83%
Massachusetts  | 58.4% | $63,772.19 | 54.3% | 88%
Indiana        | 51.8% | $48,690.82 | 36.1% | 88%
Vermont        | 51.5% | $55,615.76 | 44.5% | 93%

Given this range, how do we understand results?
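Inclusion rates matter because students with disabilities who are excluded from testing tend to score lower, so a state testing 93% of them is graded on a tougher sample than one testing 84%. A rough bounding exercise makes the size of the distortion concrete (the 13% share of students with disabilities and the worst-case assumption that every excluded student would score below "Proficient" are illustrative assumptions, not figures from the slides):

```python
# Bounding exercise: recompute percent proficient as if every excluded
# student with a disability had been tested and scored below "Proficient"
# (a pessimistic, purely illustrative assumption).
SWD_SHARE = 0.13  # assumed share of 4th graders with disabilities

states = {  # (reported % proficient, SWD inclusion rate)
    "Minnesota": (59.4, 0.84),
    "Vermont":   (51.5, 0.93),
}

for name, (reported, inclusion) in states.items():
    excluded = SWD_SHARE * (1 - inclusion)  # excluded share of all students
    adjusted = reported * (1 - excluded)    # excluded students count as failing
    print(f"{name}: reported {reported:.1f}%, adjusted lower bound {adjusted:.1f}%")

# Under these assumptions Minnesota drops ~1.2 points and Vermont only ~0.5,
# so differing inclusion rates alone narrow the apparent gap between them.
```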
Slide 17
Reliability and New Assessments
"You're asking people still, even with the best of rubrics and evidence and training, to make judgments about complex forms of cognition. The more we go towards the kinds of interesting thinking and problems and situations that tend to be more about open-ended answers, the harder it is to get objective agreement in scoring."
- James Pellegrino (SBAC Technical Advisory Committee), quoted in the New York Times, 6/22/15
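Agreement between scorers of open-ended items is typically quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch of the calculation (the two raters and their 0-3 rubric scores are made-up data for illustration):

```python
from collections import Counter

# Hypothetical scores two raters gave the same 12 open-ended responses
# on a 0-3 rubric.
rater_a = [0, 1, 1, 2, 2, 2, 3, 3, 1, 2, 0, 3]
rater_b = [0, 1, 2, 2, 2, 3, 3, 2, 1, 2, 1, 3]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both raters pick the same category at
# random, given each rater's marginal score distribution.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[c] * count_b[c] for c in set(rater_a) | set(rater_b)) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement {observed:.2f}, chance {expected:.2f}, kappa {kappa:.2f}")
# Human-scored open-ended items rarely reach the near-perfect agreement
# that machine-scored multiple-choice items get by construction.
```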
Slide 18
Closing thought: "Setting absurd standards and then announcing massive failures has undermined public support for public schools... We are dismantling public school systems whose problems are basically the problems of racial and economic polarization, segregation and economic disinvestment." (Gary Orfield, 2014)
Slide 19
Summary: VT takeaways
1. Assuming we want to rate schools and apply sanctions based on student mastery of a subset of important content, skills and item formats, we may not be able to distinguish between schools where more learning has taken place and schools where students have learned more of the tested content and formats at the expense of other valued learning.
2. Assuming we are comfortable with evaluating based on a subset of goals, scores may not be reliable enough to "identify" the "right" schools.
3. Assuming scores are reliable, performance reporting categories may (and probably do) distort underlying patterns of learning.
4. Assuming we trust scores and performance categories, what we are measuring may not be the impact of schools on learning.
Slide 20
Resources:
- Memo to SBAC on Performance Categories: http://education.vermont.gov/documents/VT_SBAC-Governing-States_Performance-Categories_11_2014.pdf
- Memo to parents and caregivers on SBAC: http://education.vermont.gov/documents/RH_Letter%20to%20Parents%20and%20Caregivers_SBAC_Another%20Measure%20of%20Learning_3_17_2015.pdf
- Memo to schools on SBAC: http://education.vermont.gov/documents/RH_Memo%20to%20Supts%20Principals_Keeping%20Perspective%20SBAC_3_23_2015.pdf
- Vermont State Board of Education Statement and Resolution on Assessment and Accountability: http://education.vermont.gov/documents/EDU-SBE_AssmntAcct_Adpted081914.pdf
- Letter to parents and caregivers about the limitations of NCLB: http://education.vermont.gov/documents/EDU-Letter_to_parents_and_caregivers_AOE_8_8_14.pdf
Slide 21
Partial Bibliography:
- Darling-Hammond, Linda; Haertel, Edward & Pellegrino, James. (2014). Making good use of new assessments: Interpreting and using scores from the Smarter Balanced Assessment Consortium. Smarter Balanced Assessment Consortium. http://education.vermont.gov/documents/EDU-WhitePaper-Making_Good_Use-of_New_Assessments.pdf
- Geller, Wendy & Bailey, Glenn. VT Agency of Education Data and Research Work Group.
- Ho, Andrew Dean. (2008). The problem with proficiency: Limitations of statistics and policy under No Child Left Behind. Educational Researcher, 37(6), 351.
- Hollingshead, L. & Childs, R.A. (2011). Reporting the percentage of students above a cut score: The effect of group size. Educational Measurement: Issues and Practice, 30(1), 36–43.
- Orfield, Gary. (2014). A new civil rights agenda for American education. Educational Researcher, August/September 2014, 286.
- Papay, John P.; Murnane, Richard J. & Willett, John B. (2010). The consequences of high school exit examinations for low-performing urban students: Evidence from Massachusetts. Educational Evaluation & Policy Analysis, 32(1), 5–23.