
Reporting Assessment Results in Times of Change: Guidance from the Joint Standards
Carole Gallagher, PhD
CCSSO National Conference on Student Assessment, June 26, 2015

Purpose of this Presentation
- Share reminders about responsible reporting practices in the current context
- Provide recommendations for the types of information to be communicated to stakeholders (including media) in times of change
- Focus on particular recommendations from the updated Standards for Educational and Psychological Testing (2014)

Presentation Origins
- CCSSO sponsored the original work, with support from state members in the Accountability Systems and Reporting special interest group (ASR SCASS)
- The outcome was a white paper (Gallagher, 2012) intended to provide guidance to states and other jurisdictions to promote responsible reporting of findings from measures of teacher effectiveness

Seminal Resources at the Core of this Work
- Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999 and 2014)
- AERA Code of Ethics (2011)
- GRE Guide to the Use of Scores (ETS, 2011)
- Findings from NAEP validity studies (NCES, 2003)
- ED Information Quality Guidelines
- ED Peer Review Guidance (2009)
- Researchers such as Colmers, Goodman, Hambleton, Zenisky, Aschbacher, and Herman

Key Reminders from the Joint Standards
- When tests are revised, users should be informed of the changes, any adjustments made to the score scale, and the degree of comparability of scores from the original and revised tests. (Standard 4.25)
- When substantial changes to tests occur, scores should be reported on a new scale, or a clear statement should be provided to alert users that the scores are not directly comparable with those on earlier versions of the test. (Standard 5.20)

Key Reminders from the Joint Standards
- When substantial changes are made to a test, documentation should be amended, supplemented, or revised to provide stakeholders with useful information and appropriate cautions. (Standard 7.14)
- When an alteration to a test has occurred, users have the right to information about the rationale for that change and empirical evidence to support the validity of score interpretations from the revised test. (Standard 9.9)

Implications During Times of Change
Changes to a core assessment component must be transparent and communicated to stakeholders via reporting tools:
- Test purpose (same test, new purpose)
- Target population (same test, changing population)
- Content assessed (new test, same purpose)
- Item types (same content, new techniques)
- Delivery method/mode of administration (same content, new techniques)
- Scoring methods (new test, new techniques)
- Performance expectations (new test, new expectations)

Changes in Test Purpose
Examples of changes in test purpose:
- Measure growth as well as status
- Use for accountability at the state, school, or teacher levels
- Use to determine readiness for college and career
- Use for placement decisions at the K–12 or post-secondary levels
What should be communicated to stakeholders?
- Procedures for constructing indices of growth (e.g., gain score)
- Rationale for use for this purpose and how scores will be interpreted
- Resulting changes to tested content or frequency of testing
- Technical evidence to support test use for this purpose
- Findings from analyses of decision consistency if used for classification
- How consequences are anticipated and monitored (stakes may change)
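
For illustration only, the Python sketch below (not taken from the white paper) computes a simple gain-score index and a naive two-form decision-consistency rate. All scores, the cut score, and the parallel forms are hypothetical; operational programs rely on model-based consistency estimates.

    # Minimal sketch (hypothetical data): a gain-score index and a naive
    # decision-consistency check for a test newly used for growth and classification.
    import statistics

    prior_scores   = [412, 455, 478, 430, 501, 466]   # hypothetical year-1 scale scores
    current_scores = [430, 470, 475, 452, 520, 481]   # hypothetical year-2 scale scores
    cut_score = 460                                   # hypothetical proficiency cut

    # Gain score: simple difference between current and prior scale scores.
    gains = [c - p for p, c in zip(prior_scores, current_scores)]
    print("Mean gain:", statistics.mean(gains))

    # Naive decision consistency: proportion of students classified the same way
    # (proficient / not proficient) on two parallel forms. Operational programs
    # use model-based estimates such as Livingston-Lewis.
    form_a = [448, 462, 471, 455, 510, 480]           # hypothetical parallel-form scores
    form_b = [452, 458, 469, 461, 505, 483]
    same_class = sum((a >= cut_score) == (b >= cut_score) for a, b in zip(form_a, form_b))
    print("Decision consistency:", same_class / len(form_a))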

Changes in Population Tested
Examples of changes in target population:
- Census instead of self-select (e.g., ACT or SAT)
- Students with disabilities formerly known as the "2%" population
- Changes in the English learner population, new translations
What should be communicated to stakeholders?
- Documentation of development practices (e.g., universal design)
- Ways in which potential sources of construct-irrelevant variance were evaluated and emerging sub-group differences will be examined
- Norming decisions and practices
- Technical evidence to support use for this population
- Information about administration and scoring methods
- How consequences are anticipated and monitored
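
As a hypothetical illustration of monitoring a changing tested population, the sketch below compares participation rates by reported group before and after a shift from self-selected to census testing. The group names, enrollment counts, and tested counts are invented; real analyses would also examine score shifts, accommodations use, and construct-irrelevant variance.

    # Minimal sketch (hypothetical counts): how participation shifts by group
    # when an assessment moves from self-selected to census administration.
    enrollment   = {"all": 1200, "students_with_disabilities": 150, "english_learners": 90}
    tested_prior = {"all": 700,  "students_with_disabilities": 40,  "english_learners": 35}   # self-selected year
    tested_now   = {"all": 1180, "students_with_disabilities": 142, "english_learners": 88}   # census year

    for group in enrollment:
        before = tested_prior[group] / enrollment[group]
        after  = tested_now[group] / enrollment[group]
        print(f"{group}: participation {before:.0%} -> {after:.0%}")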

Changes in Content Assessed
Examples of changes in tested content:
- Any new or revised standards on which tests are based
- CCSS-like standards (focus on CCR, includes listening & speaking, includes practice or process standards)
- NGSS-like standards
What should be communicated to stakeholders?
- Documentation of stakeholder involvement in development
- Technical evidence related to content validity, e.g., findings from blueprint analyses or studies of alignment
- Plan for communicating shifts in what is assessed at each grade
- Plan for mitigating threats associated with opportunity to learn
- How consequences are anticipated and monitored
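
The following sketch illustrates one narrow piece of content-validity evidence: comparing the number of items per strand on a built form against blueprint targets. Strand codes, counts, and the tolerance are assumptions, and formal alignment studies examine far more than item counts.

    # Minimal sketch (hypothetical blueprint): flag strands where the built form
    # departs from the target item counts by more than a chosen tolerance.
    blueprint_targets = {"RL": 14, "RI": 14, "W": 10, "L": 6}   # target items per strand
    operational_form  = {"RL": 15, "RI": 12, "W": 10, "L": 7}   # items on the built form

    for strand, target in blueprint_targets.items():
        actual = operational_form.get(strand, 0)
        flag = "" if abs(actual - target) <= 1 else "  <-- outside tolerance"
        print(f"{strand}: target {target}, actual {actual}{flag}")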

Changes in Item Types
Examples of new item types used on an assessment:
- Performance tasks
- Technology-enhanced items
What should be communicated to stakeholders?
- Rationale for use/measurement theory of action
- Ways in which potential sources of construct-irrelevant variance were identified and addressed
- Documentation about development, administration, and scoring
- Findings from small-scale tryouts, pilot, and field testing (item-level data)
- Plan for analyzing and evaluating potential sub-group differences that may emerge
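
To make "item-level data" concrete, the sketch below computes two classical item statistics (proportion correct and corrected item-total correlation) for a tiny hypothetical response matrix. It is illustrative only and uses statistics.correlation, which requires Python 3.10 or later.

    # Minimal sketch (hypothetical 0/1 response matrix; rows = students, columns = items).
    import statistics

    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
        [1, 0, 0, 1],
    ]
    totals = [sum(row) for row in responses]

    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p_value = statistics.mean(item)                      # item difficulty (proportion correct)
        rest = [t - i for t, i in zip(totals, item)]         # total score excluding this item
        discrimination = statistics.correlation(item, rest)  # corrected item-total correlation
        print(f"Item {j + 1}: p = {p_value:.2f}, item-total r = {discrimination:.2f}")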

Changes in Item Delivery Method
Examples of new delivery methods:
- Mix of computer-supported and paper-pencil administration
- Computer adaptive testing
What should be communicated to stakeholders?
- Rationale for the new approach
- Ways in which potential sources of construct-irrelevant variance were considered and addressed
- Detailed administration and scoring guidance
- Findings from analyses of scores produced under each method
- Plan for analyzing and evaluating potential sub-group differences that may emerge
- Documentation of how the comparability of intended inferences will be evaluated and how findings will be used
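
As a hypothetical first look at comparability across delivery modes, the sketch below contrasts computer-based and paper-based score distributions with a standardized mean difference. The scores are invented; real comparability studies rely on matched samples, equating analyses, and item-level checks.

    # Minimal sketch (hypothetical scores): rough standardized difference between modes.
    import statistics

    computer_scores = [482, 455, 500, 469, 476, 490, 461]
    paper_scores    = [478, 458, 495, 472, 470, 485, 465]

    def pooled_sd(a, b):
        """Square root of the pooled sample variance of two groups."""
        pooled_var = ((len(a) - 1) * statistics.variance(a) +
                      (len(b) - 1) * statistics.variance(b)) / (len(a) + len(b) - 2)
        return pooled_var ** 0.5

    mean_diff = statistics.mean(computer_scores) - statistics.mean(paper_scores)
    effect = mean_diff / pooled_sd(computer_scores, paper_scores)
    print(f"Computer mean: {statistics.mean(computer_scores):.1f}")
    print(f"Paper mean:    {statistics.mean(paper_scores):.1f}")
    print(f"Standardized mode difference: {effect:.2f}")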

Changes in Scoring Practices
Examples of changes in scoring practices:
- Use of artificial intelligence (AI) scoring of text
- Teacher scoring of performance tasks
- Reporting at the claim or other subscore level
- Combining unlike measures into a composite
What should be communicated to stakeholders?
- Research supporting use of particular methods or rubrics
- Qualifications and training of scorers
- Reliability estimation procedures consistent with the test structure
- Precision and SEM for all scores
- Findings from studies of inter-rater reliability
- Documentation of the rationale/methods for assigning weights
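
The sketch below illustrates, with invented numbers, three quantities that often accompany documentation of new scoring practices: a standard error of measurement derived from a reliability estimate, exact and adjacent rater agreement on a rubric, and a weighted composite of unlike measures.

    # Minimal sketch (hypothetical values) of three commonly documented quantities.

    # 1. Standard error of measurement: SEM = SD * sqrt(1 - reliability).
    scale_sd, reliability = 35.0, 0.91
    sem = scale_sd * (1 - reliability) ** 0.5
    print(f"SEM: {sem:.1f} scale-score points")

    # 2. Exact and adjacent agreement between two raters on a 4-point rubric.
    rater_1 = [3, 2, 4, 1, 3, 2, 4, 3]
    rater_2 = [3, 2, 3, 1, 4, 2, 4, 3]
    exact    = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_1, rater_2)) / len(rater_1)
    print(f"Exact agreement: {exact:.2f}, exact-plus-adjacent: {adjacent:.2f}")

    # 3. Weighted composite of unlike measures (weights must be documented).
    weights = {"state_test": 0.6, "performance_task": 0.4}
    component_scores = {"state_test": 0.72, "performance_task": 0.65}  # already on a common 0-1 scale
    composite = sum(weights[k] * component_scores[k] for k in weights)
    print(f"Composite: {composite:.2f}")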

Changes in Performance Expectations
Examples of changing performance standards:
- Test now used for a new purpose (stakes have changed)
- Content rigor has changed
- Changes in scaling decisions or in the meaning of scale scores
- New cut scores
What should be communicated to stakeholders?
- New guidance for interpreting scores
- Information about scale rationale and properties
- Documentation about standard setting and SEMs in the vicinity of each cut score
- Evidence of the validity of score interpretations for subgroups
- How consequences are anticipated and monitored
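
As a hypothetical example of communicating measurement precision near a new cut score, the sketch below reports an uncertainty band of one conditional SEM (CSEM) around the cut. The cut score and CSEM value are invented; programs typically publish CSEM across the full score scale.

    # Minimal sketch (hypothetical cut score and CSEM): an uncertainty band near the cut.
    cut_score = 470
    csem_near_cut = 12   # hypothetical conditional SEM for scores near the cut

    low, high = cut_score - csem_near_cut, cut_score + csem_near_cut
    print(f"Scores within one CSEM of the cut ({low}-{high}) should be interpreted cautiously;")
    print("classification decisions in this band are the least stable.")

    student_score = 463
    within_band = low <= student_score <= high
    print(f"Score {student_score}: within uncertainty band = {within_band}")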

In Summary...
- Changes to a number of assessment elements can affect the validity of inferences drawn from scores
- Lack of transparency can undermine stakeholder trust in the assessment system
- Responsible testing practices in times of change call for states to keep stakeholders informed
- Evidence collected should inform test users about the technical quality of the new test, its fairness for the targeted population, how scores should be interpreted, and appropriate uses of test results

White Paper and Other Resources

Details in the White Paper
- Exemplary practices in a number of states and large districts
- Long-standing guidance on developing comprehensive reports, based on research and best practices
- The state of the states in terms of laws, policies, and regulations that can have an impact on reporting practices

Key Report Features Discussed
- Purpose and target audience(s)
- Measures from which results are reported
- Scoring, rating, and performance levels
- How the score was calculated and/or performance was rated (e.g., criteria used)
- Interpretation of results
- Use of the report or database