
The Concept of Validity in the Context of NCLB
Robert L. Linn, CRESST, University of Colorado at Boulder
Presentation at the Ninth Annual Maryland Assessment Conference, The Concept of Validity: Revisions, New Directions and Application, College Park, MD: University of Maryland. Sponsored by the Maryland State Department of Education and the Maryland Assessment Research Center for Education Success, October 9 and 10, 2008.

Validity: Points of Broad Consensus
Validity is the most fundamental consideration in the evaluation of the appropriateness of claims about, and uses and interpretations of, assessment results.
Validity is a matter of degree rather than all or none.

Validity (continued)
There is broad, but not universal, agreement on the following (for an exception, see Lissitz & Samuelsen, 2007):
It is the uses and interpretations of test scores, rather than the test itself, that are validated.
Validity may be relatively high for one use or interpretation of assessment results but quite low for another use or interpretation.

Validity (continued)
A comprehensive validation program for state tests used for purposes of NCLB requires systematic analysis of the myriad uses, interpretations, and claims that are made.
Evidence relative to particular uses, interpretations, and claims needs to be accumulated and organized into relevant validity arguments (Kane, 2006).

1999 Test Standards
"Validity is the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests."
"Validation logically begins with an explicit statement of the proposed interpretation of test scores, along with a rationale for the relevance of the interpretation to the proposed use." (AERA, APA, & NCME, 1999, p. 9)

Foundation for the Position in the Test Standards
The concept of validity in the Test Standards builds on the work of major validity theorists:
– Cronbach (1971, 1980, 1988, 1989)
– Kane (1993)
– Messick (1975, 1989)
– Shepard (1993)

Kane (2006) Argument-Based Approach
Interpretive argument: specification of the proposed interpretations and uses of test scores
Validity argument: evaluation of the interpretive argument
Builds on earlier work by Cronbach (1989), Kane (1992), Messick (1989), and Shepard (1993)

Validity Argument (Cronbach, 1988)
– Functional perspective
– Political perspective
– Operationalist perspective
– Economic perspective
– Explanatory perspective

Guiding Questions (Shepard, 1993)
"What does the testing practice claim to do? What are the arguments for and against the intended aims of the test? What does the test do in the system other than what it claims?" (p. 429)

NCLB Accountability
States are required to administer tests of mathematics and reading or English language arts to all students in grades 3 through 8.
Science tests are required at one grade in each of three levels: elementary, middle, and high school.

NCLB Accountability (continued)
States had to adopt academic achievement standards defining proficient performance and two other levels (usually called basic and advanced).
States had to establish targets, known as annual measurable objectives (AMOs), set on trajectories that would lead to all students performing at the proficient level or above by 2014.

NCLB Targets
Current status: the AMO is a percent-proficient target for each year, set to be on a trajectory to 100% proficient or above by 2014.
Change (safe harbor): a school can make AYP if the percentage of students below proficient is reduced by at least 10% compared to the previous year.
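The arithmetic behind these two targets can be sketched as follows. This is illustrative only: `linear_amo` assumes a straight-line trajectory with invented baseline numbers, whereas states actually adopted their own, often stepped, AMO schedules.

```python
def linear_amo(baseline_pct, baseline_year, year, end_year=2014):
    """Illustrative straight-line AMO: percent proficient must rise from
    the baseline percentage to 100% by the end year."""
    fraction = (year - baseline_year) / (end_year - baseline_year)
    return baseline_pct + (100.0 - baseline_pct) * fraction

def meets_status_target(pct_proficient, amo):
    # Status model: this year's percent proficient meets or exceeds the AMO.
    return pct_proficient >= amo

def meets_safe_harbor(pct_not_proficient_last_year, pct_not_proficient_now):
    # Safe harbor: the percentage of students below proficient is reduced
    # by at least 10% relative to the previous year.
    return pct_not_proficient_now <= 0.9 * pct_not_proficient_last_year
```

For example, with a hypothetical 40% proficient baseline in 2002, the 2008 AMO on a straight line would be 70%; a school that moved from 50% to 44% of students below proficient would qualify under safe harbor even if it missed that AMO.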

NCLB Targets (continued)
Disaggregated reporting for subgroups:
– Economically disadvantaged students
– Major racial and ethnic groups
– Students with disabilities
– Students with limited English proficiency

NCLB Targets (continued)
Subgroup reporting is critical for monitoring the closing of achievement gaps.
It has no real relevance for small schools with homogeneous student bodies.
However, it creates many hurdles that large, diverse schools must clear.

Multiple-Hurdle Approach
NCLB uses a multiple-hurdle approach: schools must meet multiple targets each year – participation and achievement, separately for reading and mathematics, for the total student body and for subgroups of sufficient size.

Multiple-Hurdle Approach (continued)
There are many ways to fail to make AYP (miss any target), but only one way to make AYP (meet or exceed every target).
Large schools with diverse student bodies are at a relative disadvantage in comparison to small schools or schools with relatively homogeneous student bodies.
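A minimal sketch of this conjunctive, multiple-hurdle rule (the target names and results below are hypothetical; real AYP determinations also involve minimum subgroup sizes, participation rates, and other conditions):

```python
def makes_ayp(target_results):
    """Multiple-hurdle rule: AYP is made only if every applicable target
    is met; missing any single target means failing to make AYP."""
    return all(target_results.values())

# A small homogeneous school faces few hurdles; a large diverse school
# faces many more targets, and hence many more ways to miss AYP.
small_school = {"reading_all": True, "math_all": True}
large_school = {
    "reading_all": True, "math_all": True,
    "reading_swd": False,   # one missed subgroup target is enough to fail
    "math_swd": True,
    "reading_lep": True, "math_lep": True,
}
```

The asymmetry the slide describes falls directly out of the conjunction: the more subgroups of sufficient size a school has, the more entries must all be `True`.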

Growth Models
Growth pilot program: AYP is based on the percentage of students who are either proficient or on a growth trajectory toward proficiency within three years.
Because the required growth trajectories are rapid, few schools that would not make AYP under the status approach make it because of the growth approach.
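Under a simple linear-projection assumption (one of several projection methods; the scale scores and proficiency cut below are invented for illustration), the "proficient or on a growth trajectory toward proficient within three years" criterion can be sketched as:

```python
def proficient_or_on_track(scores, proficient_cut, horizon_years=3):
    """True if the student is already proficient, or if projecting the
    most recent annual gain forward reaches the cut within the horizon."""
    if scores[-1] >= proficient_cut:
        return True
    if len(scores) < 2:
        return False          # no gain to project from
    annual_gain = scores[-1] - scores[-2]
    return scores[-1] + annual_gain * horizon_years >= proficient_cut

def pct_proficient_or_on_track(score_histories, proficient_cut):
    # The growth-pilot AYP quantity: percent of students who are
    # proficient or on track, compared against the AMO.
    hits = sum(proficient_or_on_track(s, proficient_cut) for s in score_histories)
    return 100.0 * hits / len(score_histories)
```

Note how demanding the criterion is: a student well below the cut counts only if her latest gain, sustained for three straight years, would reach it, which is why the growth pilot changed AYP outcomes for few schools.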

Primary Use and Interpretation of Test Results for NCLB
Use: identification of schools as making or failing to make AYP.
– Schools that fail to make AYP two or more years in a row are placed in the "needs improvement" category.
Interpretation: schools that make AYP are better or more effective than schools that fail to make AYP.

Multi-level Interpretations
The validity of interpretations of individual student scores is not equivalent to the validity of interpretations of aggregate results (Zumbo & Forer, in press).
We need to think in terms of validation at the aggregate level (e.g., school or school district) as well as at the individual student level.

Validation of the School Quality Inference
Validating the claim that school A, which makes AYP, is of higher quality or more effective than school B, which fails to make AYP, requires elimination of plausible rival hypotheses for the difference in AYP status:
– AYP differences due to higher achievement at school A than at school B in earlier years, e.g., when children enter school
– AYP differences due to differences in demographics
– AYP differences due to differences in parental support

Inferences from Growth Models
Growth models rule out the alternate explanation of differences in prior achievement.
Nonetheless, causal inferences about school effectiveness are not justified by the growth approach to test-based accountability (Raudenbush, 2004; Rubin, Stuart, & Zanutto, 2004).

Growth Model Results
There are many rival explanations for between-school differences in growth besides differences in school quality or effectiveness.
Results are better thought of as descriptive – useful for generating hypotheses about school quality that then need to be evaluated.

School Characteristics and Instructional Practice
School differences in achievement and in growth describe outcomes and can be the source of hypotheses about school effectiveness.
Accountability systems need to be informed by direct information about school characteristics and instructional practices.

NCLB Peer Review
Peer review purposes:
1. Inform states about what would be useful evidence
2. Guide the review teams who advise the Department

Validity Evidence for Peer Review
– Related to test content
– Based on relationships to other variables
– Based on student response processes
– Based on internal structure
– Alignment of assessments to content standards
– Based on consequences of assessments

Consequences and Validity
"Perhaps the most contentious topic in validity is the role of consequences" (Brennan, 2006, p. 8).
Although investigations of the consequences of test use are commonly referred to as "consequential validity," Messick did not use that designation.

Messick's Facets of Validity

                      Test Interpretation    Test Use
Evidential Basis      Construct validity     Construct validity + relevance/utility
Consequential Basis   Value implications     Social consequences

Controversy
Many experts (e.g., Popham, Mehrens, Green, Ebel, and, most recently, Lissitz and Samuelsen) have argued that consequences should not be considered part of validity, while others (e.g., Lane, Linn, Moss, Shepard, Brennan, and Kane) have argued that they should.

Controversy (continued)
There is fairly broad agreement that it is important to look at the positive and negative effects of test use as part of an overall evaluation, even if such an evaluation is considered beyond the scope of validation per se.

Peer Review Guidance on Consequences
"In validating an assessment, the State must also consider the consequences of its interpretation and use. Messick (1989) points out that these are different functions and that the impact of an assessment can be traced either to an interpretation or to how it is used. Furthermore, as in all evaluative endeavors, States must attend not only to the intended outcomes, but also to unintended effects" (U.S. Department of Education, 2004, p. 33).

Test Standards
The Standards take a narrow view of consequences and validity:
– Consequences that are directly due to the way in which the construct is measured
– The degree to which intended benefits are realized
– Excludes "evidence that may inform decisions about social policy but falls outside the realm of validity"

Test Standards, Standard 1.24
"When unintended consequences result from test use, an attempt should be made to investigate whether such consequences arise from the test's sensitivity to characteristics other than those it is intended to assess or to the test's failure fully to represent the intended construct" (1999, p. 23).

Michael Kane
"Consequences have always been a part of our conception of validity… Traditional definitions of validity in terms of how well a testing program achieves its goals… necessarily raise questions about consequences, positive and negative" (Kane, 2006, p. 54).

Consequences of Uses of NCLB Assessments
There is controversy regarding consequences as a component of validity, but not about the importance of evaluating consequences.
Frameworks:
– Bill Mehrens
– Suzanne Lane and her colleagues

Mehrens Framework
– Curricular and instructional reform
– Teacher motivation and stress
– Student motivation and self-concept
– Changes in student achievement
– Public awareness of student achievement

Lane et al. Framework
Identification of a set of propositions about consequences that are central to an interpretive argument:
– e.g., school administrators and teachers are motivated to adapt instruction and curriculum to the content standards
– e.g., students are motivated to learn as well as to perform their best on the assessment
Teacher and student questionnaires and interviews regarding motivation and instructional practices
Collection of multiple indicators of student achievement

Frameworks of Lane and Mehrens
Both are applicable to the status approach to AYP, to growth-model approaches to AYP, and to other accountability uses of growth models (e.g., value-added models).
With growth models, the emphasis on student learning may be greater than in a status approach to accountability.

Curricular and Instructional Reform
Questionnaire studies are most common:
– Teachers
– Principals
Interviews:
– Teachers
– Principals
Qualitative studies
Collection of instructional artifacts

Teacher Motivation and Stress; Student Motivation and Self-Concept
Questionnaire studies are most common:
– Teachers
– Students
Interviews:
– Teachers
– Students
Qualitative studies

Student Achievement
Center on Education Policy:
– Tracked trends on state tests before and after enactment of NCLB
– Tracked the size of achievement gaps
– Compared trends in achievement and gaps on state tests to NAEP
– Found generally modest increases in achievement and modest reductions in the size of gaps
– This doesn't prove an effect of NCLB tests, but it is generally consistent with their intention

Alternate Assessments
Inclusion of students with severe cognitive disabilities in alternate assessments is intended to improve learning for those students.
Inclusion is judged to be having positive effects on students participating in alternate assessments.
More evidence is needed on the influence on instruction for included students and the effects on their learning.

End-of-Course Tests
Use of questionnaires, interviews, and collection of instructional artifacts to document changes in:
– Rigor of courses and instruction
– Uniformity of instruction across schools
– Student course-taking patterns
– Student dropout rates

Conclusion
Two major validity issues have yet to be addressed by states regarding their NCLB testing programs:
1. The validity of inferences about school quality based on test-based AYP determinations for schools
2. The consequences of state testing programs used for purposes of NCLB
Neither issue is easy to address, but both are important to the justification of state testing programs used for NCLB.

"Validation is doing your damnedest with your mind – no holds barred. Eddington, as you know, said that about science" (Cronbach, 1988, p. 14).