Validity in Action: State Assessment Validity Evidence for Compliance with NCLB
William D. Schafer, Joyce Wang, and Vivian Wang, University of Maryland

Objectives
– review the evidence that state testing programs provide to the United States Department of Education (USED) on the validity of their assessments
– examine in detail the validity evidence that selected states provided for their peer reviews
– make recommendations for improving the evidence submissions supporting validity for state assessments

Data Sources
– official decision letters on each state's final assessment system under NCLB, publicly available on the USED web site
– peer review reports for five selected states
– technical reports, where available, for states that have received full approval from USED, downloaded from each state's web site

Types of Validity Evidence
– the AERA/APA/NCME Standards lists five types of validity evidence:
  – content-based evidence
  – response-process-based evidence
  – evidence based on internal structure
  – evidence based on relationships with other variables
  – evidence based on consequences
– we will look at the judgments that each type should support in the context of statewide assessments of educational achievement

Content-Based Evidence
judgments that need to be supported:
– the domain is described in the academic content standards at the grade level
– the test items sample that content domain appropriately
– achievement level descriptions refer back to the content domain of the test

Response-Process-Based Evidence
judgment that needs to be supported:
– the activities the test demands of students are consistent with the cognitive processes the test is supposed to represent (as implied by the content standards)

Evidence Based on Internal Structure
judgment that needs to be supported:
– test score relationships are consistent with the strand structures of the academic content standards

Evidence Based on Relationships with Other Variables
judgments that need to be supported:
– higher correlations occur when traits are more similar
– low correlations (perhaps partialling out ability) exist with student characteristics such as gender, race-ethnicity, and disability (a partial-correlation sketch follows)
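The "partialling out ability" idea can be shown with a first-order partial correlation. The sketch below is illustrative only: the synthetic prior-ability measure and 0/1 student flag are placeholders for a state's student-level data file, not anything drawn from a peer review submission.

```python
# Illustrative sketch: correlate scores with a 0/1 student characteristic,
# then remove a prior-ability proxy with a first-order partial correlation.
# All data here are synthetic placeholders.
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after partialling out z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(1)
ability = rng.normal(size=5000)                         # prior-ability proxy
# the flag is made to covary with prior ability only to create a spurious raw correlation
flag = (ability + rng.normal(size=5000) > 0).astype(float)
score = ability + rng.normal(scale=0.5, size=5000)      # no direct flag effect

raw = np.corrcoef(score, flag)[0, 1]
adj = partial_corr(score, flag, ability)
print(f"raw r = {raw:.3f}; partial r (ability removed) = {adj:.3f}")
```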

Evidence Based on Consequences
judgments that need to be supported:
– test use maximizes positive outcomes
– test use minimizes negative outcomes

Decision Letters
– decision letters were viewed at the USED web site; they are public documents
– 19 of the states were required to provide additional validity evidence
– the evidence was not classified by USED, but we classified it into the five types to make the project manageable
– decision-letter evidence is required by USED; these elements may be thought of as necessary for states to submit

Content-Based Evidence
– evidence to show that assessments measure the academic content standards and not characteristics outside the academic content standards or grade-level expectations
– blueprints, item specifications, and test development procedures
– evidence of alignment with content standards (an emphasis in peer review)
– explanations of design and scoring
– standard setting process, results, and impact

Response-Process-Based Evidence
– evidence to show that items are tapping the intended cognitive processes; this sort of evidence is commonly part of alignment studies

Evidence Based on Internal Structure
– item interrelationships
– subscale score correlations showing they are consistent with the structures inherent in the academic content standards
– scoring and reporting are consistent with the subdomain structure of the content standards
– justification of score use given the observed threat that subdomain correlations may be higher between content areas than within content areas (see the sketch below)
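One way to examine the within- versus between-content-area threat is to average the strand subscore correlations inside and across content areas. The sketch below is a minimal illustration with synthetic subscores; the strand names are invented, not taken from any state's content standards.

```python
# Compare average within-content and between-content strand subscore
# correlations. The subscores generated here are synthetic placeholders
# for a state's strand-level score file.
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 3000
g = rng.normal(size=n)                         # general achievement factor
math_f, read_f = rng.normal(size=n), rng.normal(size=n)
df = pd.DataFrame({
    "math_number":   g + math_f + rng.normal(size=n),
    "math_algebra":  g + math_f + rng.normal(size=n),
    "read_literary": g + read_f + rng.normal(size=n),
    "read_inform":   g + read_f + rng.normal(size=n),
})
strands = {"math": ["math_number", "math_algebra"],
           "read": ["read_literary", "read_inform"]}

corr = df.corr()
within = [p for cols in strands.values() for p in itertools.combinations(cols, 2)]
between = list(itertools.product(strands["math"], strands["read"]))

def mean_r(pairs):
    return float(np.mean([corr.loc[a, b] for a, b in pairs]))

print("mean within-content r :", round(mean_r(within), 2))   # expected higher
print("mean between-content r:", round(mean_r(between), 2))  # expected lower
```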

Evidence Based on Relationships with Other Variables
– criterion validity: relationships between test scores and external variables

Evidence Based on Consequences
– studies of intended and unintended consequences

Evidence from State Submissions
– each state submitted voluminous evidence to USED
– the Peer Review Reports included descriptions of the evidence submitted
– we had sets of reports for five states
– this evidence may be over and above what is actually required

Evidence of Purposes
– each state was asked to provide evidence about the purposes of its assessments, and each state did so
– this is an important part of Kane's (2006) concept of a validity argument
– because it does not fall into the categories of validity evidence in the USED Peer Review Guidance, we did not include it in our review

Content-Based Evidence
– test blueprints and construction process
– alignment reports (illustrative index computations follow this list)
  – categorical concurrence (each content strand has enough items for a subscore report)
  – range of knowledge (the number of content elements in each strand that have items associated with them)
  – balance of representation (the distribution of items across the content elements within each strand)
– achievement level descriptions (ALDs) compared with the strand structure
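For readers unfamiliar with how alignment indices of this kind are computed, the sketch below works through illustrative versions of the three statistics from a simple item-to-objective map, in the spirit of Webb-style alignment studies. The item data, objective lists, and the six-item concurrence threshold are placeholders, not figures from any state's report.

```python
# Illustrative alignment indices computed from an item-to-objective map.
# Items, strands, objectives, and thresholds are all invented for the example.
from collections import defaultdict

# each entry: (item_id, content strand, objective within the strand)
item_map = [
    (1, "Algebra", "A1"), (2, "Algebra", "A1"), (3, "Algebra", "A2"),
    (4, "Algebra", "A3"), (5, "Algebra", "A2"), (6, "Algebra", "A1"),
    (7, "Geometry", "G1"), (8, "Geometry", "G2"),
]
objectives = {"Algebra": ["A1", "A2", "A3", "A4"],
              "Geometry": ["G1", "G2", "G3"]}

hits = defaultdict(list)                      # strand -> objectives hit by items
for _, strand, obj in item_map:
    hits[strand].append(obj)

for strand, objs in objectives.items():
    strand_hits = hits[strand]
    n_items, hit_objs = len(strand_hits), set(strand_hits)
    concurrence = n_items >= 6                # illustrative "enough items" rule
    range_of_knowledge = len(hit_objs) / len(objs)
    # balance: 1 - (sum of |1/O - items_k/H|)/2 over objectives hit
    balance = 1 - sum(abs(1 / len(hit_objs) - strand_hits.count(k) / n_items)
                      for k in hit_objs) / 2
    print(f"{strand}: items={n_items}  concurrence={concurrence}  "
          f"range={range_of_knowledge:.2f}  balance={balance:.2f}")
```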

Response-Process-Based Evidence
– alignment reports
  – depth of knowledge (relates the cognition tapped by each item to that implied in the statement of the content-standards element the item is associated with)
– think-aloud studies (proposed)

Evidence Based on Internal Structure
– dimensional analysis at the item level (sketched below)
  – principal components analysis
  – dimensionality hypothesis testing
– intercorrelations among the subtest scores
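A minimal sketch of such a dimensional analysis follows: eigenvalues of the inter-item correlation matrix (a principal components view of the item-level data) plus a subtest intercorrelation, computed on synthetic 0/1 item responses that stand in for a real student-by-item score matrix.

```python
# Item-level dimensionality check on synthetic dichotomous item data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_students, n_items = 2000, 30
theta = rng.normal(size=n_students)                   # latent ability
difficulty = rng.uniform(-2, 2, size=n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - difficulty[None, :])))
items = pd.DataFrame((rng.random((n_students, n_items)) < p).astype(int),
                     columns=[f"item_{i+1:02d}" for i in range(n_items)])

# eigenvalues of the inter-item correlation matrix; a dominant first
# eigenvalue is consistent with an essentially unidimensional test
eigvals = np.linalg.eigvalsh(items.corr().to_numpy())[::-1]
print("first five eigenvalues:", np.round(eigvals[:5], 2))
print("ratio of 1st to 2nd   :", round(eigvals[0] / eigvals[1], 2))

# subtest intercorrelation (here, arbitrary halves of the item set)
sub1 = items.iloc[:, :15].sum(axis=1)
sub2 = items.iloc[:, 15:].sum(axis=1)
print("subtest intercorrelation:", round(sub1.corr(sub2), 2))
```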

Evidence Based on Relationships with Other Variables
– correlations with external tests of similar (and dissimilar) constructs
– correlations with student demographics and course-taking patterns
– choosing and implementing accommodations for disabilities and limited English proficiency
– bias studies (e.g., DIF) and passage reviews (a DIF sketch follows this list)
– universal design principles
– monitoring of test administration procedures
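As an illustration of one common bias-study statistic, the sketch below computes a Mantel-Haenszel DIF index for a single item, matching examinees on total score. The examinee data are synthetic; an operational study would use reference and focal groups from the live administration.

```python
# Mantel-Haenszel DIF sketch for one 0/1-scored item, stratified on total score.
# Data are synthetic and constructed so the item shows little DIF.
import numpy as np

def mantel_haenszel(item, total, focal):
    """Common odds ratio and ETS delta (-2.35 * ln alpha) for one item."""
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        a = np.sum(m & (item == 1) & ~focal)   # reference group, correct
        b = np.sum(m & (item == 0) & ~focal)   # reference group, incorrect
        c = np.sum(m & (item == 1) & focal)    # focal group, correct
        d = np.sum(m & (item == 0) & focal)    # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den
    return alpha, -2.35 * np.log(alpha)        # delta near 0 suggests little DIF

rng = np.random.default_rng(4)
total = rng.integers(0, 41, size=4000)
focal = rng.random(4000) < 0.3
item = (rng.random(4000) < 0.02 * total).astype(int)   # depends on total only
print(mantel_haenszel(item, total, focal))
```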

Evidence Based on Consequences
– longitudinal change in dropout and graduation rates and in NAEP results
– use of results to evaluate schools and districts
– use of test data to improve curriculum and instruction
– use of adequate yearly progress reports
– use of tests to make promotion and graduation decisions

Synthesis of Evidentiary Needs
– it would be useful to have a minimum list for state regulatory submissions
– can we use these studies to generate a list? a list based on our evidence is most likely over-inclusive, and as soon as we propose one it will surely be challenged
– it seems reasonable to submit the following
  – for each test series (e.g., regular, alternate)
  – for each tested content and grade combination

Content Evidence
– content standards
– test blueprint
– item (and passage) development process
– item categorization rules and process
– forms development process (e.g., item sampling, item location, section timing)
– results of alignment studies

Process Evidence
– test blueprint (if it has a process dimension)
– item categorization rules and method (if items are categorized by process)
– results of alignment studies
– results of other studies, such as think-alouds

Internal Structure Evidence
– subscore correlations
– item-subscore correlations (sketched below)
– dimensionality analyses
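Item-subscore correlations have not been sketched above, so a minimal illustration follows: each item is correlated with its own strand subscore computed without that item (a corrected item-subscore correlation). The two-strand item layout is synthetic and purely illustrative.

```python
# Corrected item-subscore correlations on synthetic dichotomous item data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 1500
theta = rng.normal(size=n)                              # latent ability
items = pd.DataFrame({
    f"item_{i:02d}": (rng.random(n) < 1 / (1 + np.exp(-theta))).astype(int)
    for i in range(1, 9)
})
strand_items = {"strand_A": ["item_01", "item_02", "item_03", "item_04"],
                "strand_B": ["item_05", "item_06", "item_07", "item_08"]}

for strand, cols in strand_items.items():
    for col in cols:
        rest = items[[c for c in cols if c != col]].sum(axis=1)  # subscore w/o item
        print(f"{strand} {col}: corrected item-subscore r = {items[col].corr(rest):.2f}")
```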

Relations with Other Variables
– convergent evidence (contrasted with discriminant evidence in the sketch following this list)
  – correlations with independent, standardized measures
  – correlations with within-class variables, such as grades
– discriminant evidence
  – correlations with standardized tests of other traits (e.g., math with reading)
  – correlations with within-class variables, such as grades in other content areas
  – correlations with irrelevant student characteristics (e.g., gender)
  – item-level (e.g., DIF) studies
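The convergent/discriminant contrast can be summarized in a small correlation table. The sketch below simulates a matched student file with state and external scores in two content areas; the column names and the simulated traits are placeholders, not results from any actual study.

```python
# Convergent vs. discriminant correlations on a simulated matched score file.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n = 2500
math_t, read_t = rng.normal(size=n), rng.normal(size=n)   # simulated traits
df = pd.DataFrame({
    "state_math": math_t + rng.normal(scale=0.6, size=n),
    "state_read": read_t + rng.normal(scale=0.6, size=n),
    "ext_math":   math_t + rng.normal(scale=0.6, size=n),
    "ext_read":   read_t + rng.normal(scale=0.6, size=n),
})
checks = [("state_math", "ext_math", "convergent"),
          ("state_read", "ext_read", "convergent"),
          ("state_math", "ext_read", "discriminant"),
          ("state_read", "ext_math", "discriminant")]
for a, b, kind in checks:
    print(f"{a} vs {b} ({kind}): r = {df[a].corr(df[b]):.2f}")
```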

Consequential Evidence
– purposes of the test, as they describe intended consequences
– uses of results by educators
– trends over time
– studies that generate and evaluate positive and negative aspects from user input

Validity in the Accountability Context – Role of Processes
– the majority of the evidence submitted capitalizes on well-known methods for studying the validity of a particular test form – a product
– but the object of study in accountability is actually a process by which tests are developed and used
  – a test form is important only as a representative of a process of test development
  – programs are expected to engage in a continual process of self-evaluation and improvement

Process Evidence
– assume it is useful to distinguish between product evidence and process evidence
  – product evidence focuses on a particular test, and
  – process evidence focuses on a testing program
– we will review and extend some suggestions for process evidence that were originally proposed in the context of state assessment and accountability peer reviews

What is a Process?
– a recurring activity that takes material, operates on it, and produces a product
– the concept is borrowed from project management
– a process could be as large as the entire assessment and accountability program, or as small as, say, the production of a single test item
– one challenge is to organize the activities of a program into useful processes

Is Validity a Process Concept?
– that is, is there a sense in which we can speak of the validity of a process?
– validity is justification for an interpretation of a score
  – a test form is a static element that can contribute support for an interpretation
  – a process is a dynamic element that can contribute support for future interpretations
– so we give this one a tentative "yes"

Elements of Process Evidence
– process
  – the process is described
  – the inputs and operating rules are laid out
– product
  – the results of the process are presented or described
– evaluation (how these questions are considered)
  – is the process adequate?
  – can (or how can) it be improved?
  – should it be improved (e.g., do the benefits justify the costs)?
– improvement (how the consideration is acted on)
  – the recommendations from the evaluation are considered for implementation in order to improve the process

Examples of Process Evidence
– three examples of these four elements of process evidence follow
– they vary markedly in scope, from small to large, and illustrate the nature of process evidence for different contexts within an assessment and accountability program

Bias and Sensitivity Committee Selection
– process: desired composition, generation of committee members, contacting potential members, proposed meeting schedule, etc.
– product: committee composition, especially the constituencies represented
– evaluation: comparison of actual with desired composition, follow-up with persons who declined, suggestions for improvement
– improvement: who has responsibility to consider the recommendations generated by the evaluation, how they go about their analysis, how change is implemented in the system, and examples of past changes that document responsiveness

Alignment
– process: test blueprint, items, item categorizations, sampling processes
– product: a test form
– evaluation: alignment study
– improvement: review of study recommendations, plan for the future

Psychometric Adequacy of a Test Form
– process: the analyses that are performed
– product: technical manual
– evaluation: review by a group such as a technical advisory committee (TAC), recommendations for the manual as well as the testing program
– improvement: consideration of recommendations, plan for the future

Making Judgments About Processes
– two typically independent layers of judgment
  – the first layer is an evaluation that makes recommendations about improvement
  – the second layer considers those recommendations
– in many cases, the second layer would be an excellent way for a state to use its TAC

Judging Process Evidence
– process evidence by definition describes processes
– it should be judged by how well it describes processes that support interpretations based on future assessments
– it should also be judged on how well it describes processes that lead to improvements in the program

Possible Criteria for Process Evidence
– data are collected from all relevant sources
– data are reported completely and efficiently
– data are reviewed by persons with appropriate expertise
– the review is conducted fairly
– review results are reported completely and efficiently
– recommendations are suggested in the reports
– consideration is given to the recommendations
– past actions based on recommendations are presented as evidence that the process results in improvement