Evaluating the Validity of State Accountability Systems: Examples of Evaluation Studies
Scott Marion, Center for Assessment
Presented at AERA, Division H, San Diego, CA, April 15, 2004

A Working Definition
An accountability system can be said to have validity when the evidence is strong enough to support the inferences that:
- The components of the system are aligned to the purposes, and are working in harmony to help the system accomplish those purposes; and
- The system is accomplishing what was intended (and did not accomplish what was not intended).
Carlson, D. (2002), in Marion et al., CCSSO.

The Center's Accountability System Validity Framework
I. The Theory of Action (How do you think things should work?)
II. Accuracy and Consistency of School Classifications (Did you identify the right schools?)
III. Consequences of the System (Did people do what was expected? What was fair? What was effective?)
IV. Interventions (Did people, including the state, do what was legally required and what was needed to help students, schools, and districts succeed?)

Some Assessment Considerations
- Assessment validity is a necessary but not nearly sufficient condition for ensuring accountability system validity.
- Dale Carlson depicted this relationship using the following heuristic:

Marion. Center for Assessment. AERA  Setting purposes and focus  Selecting indicators  Drawing inferences and making decisions  Implementing the decisions  Evaluating the effects  Data collection, scoring, and analyses Student Assessment System Standards and Specifications Annual Assessment s Scoring and Analysis Public Disaggregated Reporting From Carlson in Marion et al. (2002)

Accuracy and Consistency of School Classifications

Accuracy and Consistency: Balancing Type I and II Errors
- Type I error in an accountability context means identifying a school as failing AYP when it did not "truly" fail (a "false positive").
- Type II error means that a school that should have been identified as failing AYP was incorrectly considered a "passing" school (a "false negative").
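
The distinction can be made concrete with a small tally. Below is a minimal sketch (not from the presentation; the data and column names are hypothetical) of how Type I and Type II error rates might be computed once each school has both an observed AYP decision and a "true" status established by a fuller evidence review.

import pandas as pd

# Hypothetical file: one row per school.
# observed_fail = school identified as failing AYP by the accountability rules
# truly_fail    = school judged to be truly failing by the fuller evidence review
schools = pd.DataFrame({
    "observed_fail": [True, True, False, False, True, False],
    "truly_fail":    [True, False, False, True, True, False],
})

type_i   = ((schools.observed_fail) & (~schools.truly_fail)).mean()   # false positives
type_ii  = ((~schools.observed_fail) & (schools.truly_fail)).mean()   # false negatives
accuracy = (schools.observed_fail == schools.truly_fail).mean()

print(f"Type I rate: {type_i:.2f}, Type II rate: {type_ii:.2f}, accuracy: {accuracy:.2f}")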

Type I and II Errors
- We do not question the importance of minimizing Type I errors, given the sanctions associated with NCLB, including the effects of multiple conjunctive decisions on the nominal Type I error rate.
- Yet few states appear concerned with minimizing Type II errors.
- We think it is important to identify schools for improvement if students are not being served adequately. (See R. Hill's 2003 RILS presentation for a "two-tier" approach to addressing both types of errors.)

A Couple of Accuracy Study Examples
- You can do a simple correlational study to examine the relationship between the number of subgroups and the likelihood of a school being identified.
  - Is validity threatened if schools' chances of being identified are simply a function of the presence of subgroups?
- What is the effect of school/district size on the likelihood of being identified for improvement?
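
A minimal sketch of this correlational study (hypothetical data and variable names; not part of the presentation) might relate the number of reportable subgroups in a school to whether the school was identified for improvement, using a point-biserial correlation.

import pandas as pd
from scipy import stats

# Hypothetical school-level file: one row per school.
schools = pd.DataFrame({
    "n_subgroups": [2, 5, 7, 3, 6, 8, 4, 2],     # number of reportable subgroups
    "identified":  [0, 1, 1, 0, 1, 1, 0, 0],     # 1 = identified for improvement
})

# Point-biserial correlation between a binary outcome and a count.
r, p = stats.pointbiserialr(schools["identified"], schools["n_subgroups"])
print(f"r = {r:.2f}, p = {p:.3f}")

A strong positive correlation would suggest that identification is largely a function of how many subgroups a school reports, which is exactly the validity threat the slide raises.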

Type I & Type II Errors (CI)
- Gather additional data about schools that "made AYP" due to the confidence interval (CI), such as historic achievement data, school climate information, etc.
- Design a systematic approach (qualitative or quantitative) for evaluating, based on the full set of data, whether or not these schools should "truly" be considered passing, tied to your values.
  - "Truly" passing: protected against a Type I error.
  - Should have failed: a Type II error.
- Calculate the percentage of schools correctly classified as a result of using confidence intervals.
- Similar analyses can also be done using discriminant analysis procedures.
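
A minimal sketch (hypothetical data and variable names; not part of the presentation) of this study: compare the AYP decision made with the confidence interval against the "true" status from the fuller evidence review, report the share of schools correctly classified, and, as one way to operationalize the discriminant-analysis variant, fit a linear discriminant model on the additional evidence.

import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical file: one row per school that made AYP because of the CI.
schools = pd.DataFrame({
    "made_ayp_with_ci": [1, 1, 1, 0, 1, 1, 0, 1],             # decision after applying the CI
    "truly_passing":    [1, 0, 1, 0, 1, 0, 0, 1],             # judgment from the full data review
    "hist_achievement": [62, 41, 70, 38, 66, 45, 35, 72],      # e.g., historic percent proficient
    "climate_index":    [3.8, 2.1, 4.0, 2.4, 3.5, 2.6, 2.0, 4.2],
})

pct_correct = (schools.made_ayp_with_ci == schools.truly_passing).mean()
print(f"Correctly classified under the CI rule: {pct_correct:.0%}")

# Discriminant-analysis variant: can the additional evidence separate truly
# passing from truly failing schools among those that made AYP via the CI?
lda = LinearDiscriminantAnalysis()
features = schools[["hist_achievement", "climate_index"]]
lda.fit(features, schools["truly_passing"])
print("LDA training accuracy:", lda.score(features, schools["truly_passing"]))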

Type I and Type II (min-n)
- Select a sample of schools that failed to make AYP because one or more subgroups missed the AMO by 5% proficient or less, but no subgroup failed by more than this amount.
- Gather additional data about these schools.
- Design a systematic approach for evaluating, based on the full set of data, whether or not these schools should "truly" be considered failing.
  - Incorrectly labeled failing: a Type I error.
  - "Truly" failing: protected against a Type II error.
- Calculate the percentage of schools correctly classified as a result of using this minimum-n.
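
A minimal sketch (hypothetical column names and AMO value; not part of the presentation) of the sampling step: keep only schools that missed AYP because one or more subgroups fell short of the AMO by five percentage points of proficiency or less, with no subgroup missing by more.

import pandas as pd

AMO = 47.0   # hypothetical annual measurable objective, in percent proficient

# Hypothetical long file: one row per school-by-subgroup combination.
rows = pd.DataFrame({
    "school":   ["A", "A", "B", "B", "C", "C"],
    "subgroup": ["all", "swd", "all", "ell", "all", "frl"],
    "pct_prof": [55.0, 44.5, 58.0, 30.0, 46.0, 43.5],
})

# Shortfall relative to the AMO (zero if the subgroup met it).
rows["shortfall"] = (AMO - rows["pct_prof"]).clip(lower=0)

max_shortfall = rows.groupby("school")["shortfall"].max()
borderline = max_shortfall[(max_shortfall > 0) & (max_shortfall <= 5)]
print("Borderline schools for the follow-up review:", list(borderline.index))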

Type I and Type II (min-n)
- Establish a relatively low minimum-n (e.g., n = 5).
- Compute the percent proficient for each subgroup in each school.
- For each subgroup, divide the schools into two groups: those below the accountability minimum-n (e.g., n = 30) and those at or above it.
- Estimate the population distribution of school means for each of these two groups of schools.
- If the two population means are not significantly different, we can conclude that schools are not escaping identification simply because their subgroups fall below the min-n (a potential Type II error).
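
A minimal sketch (hypothetical data; not part of the presentation) of the comparison step for a single subgroup: split schools by whether the subgroup meets the accountability minimum-n and test whether mean percent proficient differs between the two groups.

import pandas as pd
from scipy import stats

MIN_N_ACCOUNTABILITY = 30

# Hypothetical file: one row per school, for one subgroup.
schools = pd.DataFrame({
    "subgroup_n":        [12, 45, 8, 60, 25, 33, 17, 52],                      # subgroup size
    "subgroup_pct_prof": [48.0, 51.0, 44.0, 55.0, 47.0, 50.0, 46.0, 53.0],
})

below = schools.loc[schools.subgroup_n < MIN_N_ACCOUNTABILITY, "subgroup_pct_prof"]
above = schools.loc[schools.subgroup_n >= MIN_N_ACCOUNTABILITY, "subgroup_pct_prof"]

# Welch's t-test; a non-significant difference suggests the min-n rule is not
# systematically shielding lower-performing schools from identification.
t, p = stats.ttest_ind(below, above, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")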

Results from one state
[Two results slides; the figures are not included in this transcript.]

Types of Policy Consequences
- Cells on the shaded diagonal are the focus. We're willing to suspend conspiracy theories regarding the "intended negative" and to take the "unintended positive" as a delightful but rare occurrence.

The 2x2 matrix of consequences (the shaded diagonal is intended positive and unintended negative):

             Positive               Negative
Intended     Intended positive      Intended negative
Unintended   Unintended positive    Unintended negative

Intended Positive
- "…to ensure that all children have a fair, equal, and significant opportunity to obtain a high-quality education and reach, at a minimum, proficiency on challenging State academic achievement standards and state academic assessments."
- We, as evaluators, should look to see whether or not these outcomes have been achieved (e.g., closing the achievement gap).

Unintended Negative
- Theorists (e.g., Cronbach, Stake, House, Shepard) have made a convincing case that evaluators must search for and make explicit the unintended negative consequences caused by programs or policies.

A Categorization of Potential Consequences
A. Effects on Title I programs, budgets, personnel, and students
B. Definitional (design) effects
   - FAY
   - SWD and ELL placements
C. Closing the achievement gap

A Study Example
What effect does FAY (full academic year) have on school/district identification?
- Who is excluded from accountability because of FAY, at the school, district, and state levels?
- What are the characteristics of schools/districts with high exclusion rates because of FAY?
- What happens to mobile students instructionally? Is district accountability "working" for these students?
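
A minimal sketch (hypothetical columns; not part of the presentation) of the first question: compute, for each school, the share of tested students excluded from accountability because they did not meet the full-academic-year requirement.

import pandas as pd

# Hypothetical student-level file: one row per tested student.
students = pd.DataFrame({
    "school": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "fay":    [True, True, False, True, False, False, True, True],   # met the FAY requirement?
})

# Share of students excluded under FAY at each school.
exclusion = 1 - students.groupby("school")["fay"].mean()
exclusion.name = "pct_excluded"
print(exclusion)

Schools or districts with unusually high exclusion rates are the natural candidates for a closer look at who their mobile students are and how those students are being served.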

Another Study Example
- Gather data about the qualifications of leaders and teachers in Title I and non-Title I schools.
- Monitor the trends in teacher and leader quality to see if there is a relative loss in educator quality in Title I schools.
- Would these findings lead to specific interventions (e.g., higher pay for teachers moving into high-impact schools)?
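
A minimal sketch (hypothetical data and indicator; not part of the presentation) of the monitoring step: track an educator quality indicator over time in Title I versus non-Title I schools and watch for a widening gap, which would signal a relative loss of quality in Title I schools.

import pandas as pd

records = pd.DataFrame({
    "year":                [2002, 2002, 2003, 2003, 2004, 2004],
    "title_i":             [True, False, True, False, True, False],
    "pct_fully_certified": [88.0, 93.0, 86.5, 93.5, 84.0, 94.0],   # hypothetical quality indicator
})

trend = records.pivot(index="year", columns="title_i", values="pct_fully_certified")
trend["gap"] = trend[False] - trend[True]   # non-Title I minus Title I
print(trend)

A gap that grows year over year is the kind of unintended negative consequence that would justify a specific intervention.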

Coherent and Comprehensive Accountability System
- How does your "Theory of Action" reflect your "Theory of Assistance"?
- How do your consequences (positive and negative) lead to improvements in teaching and learning?
- What unintended negative consequences are you protecting against?

Assistance and "Accountability System Validity"
- Evaluation is focused on the "right" things
- Performance of schools is accurately portrayed
- Information (goals, targets, performance, consequences) is used appropriately in accountability
- Appropriate schools are identified to receive consequences
- Schools/districts receive appropriate consequences
- AND THEN… [black box: local control, individual implementation…]
- Students and schools are "better" than they would be without the accountability system

What Is in the Black Box?
Part of the validity evaluation should involve evaluating the effectiveness of the specific interventions…
- What is your intervention plan, including practical constraints (e.g., "triage")?
- What evidence do you have that it works and is fair?
- What will you do to improve your system?

Sample Study
How effective was school choice?
- What school choice was offered? How? Who transferred? Why? What happened? Why?
- What happened to students who transferred (programs, services)?
- What happened to students who did not transfer?
- What school rating changes are due mostly to population shifts?

More Study Topics
- How differentiated is your assistance in terms of NCLB "hurdles"?
- Do you differentiate in terms of different contexts (e.g., small/rural schools)?
- Do you identify and account for differences in other resources (e.g., teacher "quality")?
- Do you track the quality of implementation?

Now What?
- Remember, validity is NOT a yes-or-no decision.
  - Evidence supporting or challenging the validity of the accountability system should be accumulated over time to make a case for or against the system.
  - The evidence does not require a "beyond a shadow of a doubt" criminal-trial decision, but rather a "preponderance of evidence" civil decision. However, we do not have to wait until the trial is over to make all of our decisions; we can use the accumulating evidence in formative ways to improve the existing system.
- Using a validity framework helps organize your evidence so you can make a more coherent argument for or against the validity of the system.