Consistency/Reliability

Presentation transcript:

Consistency/Reliability
Inclusive Assessment Seminar
Charlie DePascale

The Assessment Triangle and Validity Evaluation (Marion, Quenemoen, & Kearns, 2006)
[Diagram: the assessment triangle with vertices Cognition, Observation, and Interpretation, framed by a validity evaluation that draws on empirical evidence and theory/logic (argument). Labeled elements include consequential features, reporting, alignment, item analysis/DIF/bias, measurement error, scaling and equating, standard setting, the assessment system (test development, administration, scoring), the student population, academic content, and the theory of learning.]

Measurement Error
- All measurements contain error.
- Many sources contribute to measurement error:
  - the features/traits/characteristics being measured
  - the individuals or objects whose features are being measured
  - the measurement tool
  - the people using the measurement tool

Common Sources of Error
General education assessments:
- test content (items, length, format)
- administration conditions
- scoring
- the student (preparation, motivation, anxiety)
Added sources for alternate assessments:
- individual test administrators
- confounding medical conditions
- depth and breadth of the construct
- complexity of the assessment instruments

Interpreting Measurement Error: Two Key Conceptions
Classical Test Theory
- Measurement error is viewed as a unitary, global entity, though it is acknowledged to arise from a combination of sources (Feldt and Brennan, 1989).
- The concern is with estimating the total amount of error present.
Generalizability Theory
- Treats the errors from multiple sources as separate entities (Feldt and Brennan, 1989).
- Estimates the magnitude of each source of error.
- Pinpoints sources of error so that cost-efficient measurements can be built (Brennan, 1995).
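To make the contrast concrete, here is a minimal sketch of how the two frameworks carve up error. These are standard textbook formulas, not taken from the presentation, and they assume a design in which persons (p) are scored by raters (r) on tasks (t):

```latex
% Classical test theory: one global error term and a single reliability coefficient
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}

% Generalizability theory: relative error variance is split across facets
% (here raters r and tasks t, averaged over n_r raters and n_t tasks)
\sigma^2_\delta = \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{pt}}{n_t} + \frac{\sigma^2_{prt,e}}{n_r n_t},
\qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}
```

The single coefficient answers "how much total error is there?", while the variance components answer "how much of it comes from raters versus tasks?", which is the distinction drawn above.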

A little background on reliability
Reliability has traditionally been conceptualized as the tendency toward consistency from one set of measurements to another. “The fact that repeated sets of measurements never exactly duplicate one another is what is meant by unreliability” (Stanley, 1971). There are many approaches to interpreting measurement error and quantifying reliability.

How much should we worry?
Acceptable levels of reliability are contingent upon the uses of test scores. If scores are used only for school accountability purposes, reliability at the student level is not as critical. If scores are used to make decisions about students, then student-level reliability is much more of a concern. Test users need to be explicit about the intended purposes of the test and the acceptable uses of the results.

Alternate Assessment Reporting Requirements
Individual student-level results must be reported to students and parents (IDEA and NCLB). There is also an expectation that the results can be used to improve individual instruction. Therefore, although student test results might not be used for accountability, we must be concerned with their reliability and report the amount of measurement error associated with each score.
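One conventional way to express the measurement error attached to an individual score is the classical standard error of measurement. The sketch below is only an illustration of that idea; the formula is standard classical test theory, and the numbers are hypothetical rather than drawn from any alternate assessment:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD of observed scores times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(score: float, sem: float, z: float = 1.96) -> tuple:
    """Approximate 95% band around an observed score, assuming normally distributed error."""
    return (score - z * sem, score + z * sem)

# Hypothetical values for illustration only.
sem = standard_error_of_measurement(sd=15.0, reliability=0.80)
low, high = score_band(score=100.0, sem=sem)
print(f"SEM = {sem:.1f}; a score of 100 would be reported with a band of {low:.0f} to {high:.0f}")
```

A score band of this kind is one way to meet the reporting expectation without implying more precision than the assessment actually supports.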

Reliability of Alternate Assessments
- We need to find the most appropriate method to define reliability.
- We need to find the most accurate way to calculate and report measurement error.
- Each type of alternate assessment presents a unique set of challenges.

Reliability Challenges with AA-AAS
- Attaining and calculating reliability with small numbers of tasks, particularly when tasks are individualized for students (see the formula sketch after this list).
- Determining the impact of the test administrator.
- Evaluating the error associated with the student based on a single test administration; some assessment designs (e.g., portfolios, observation, checklists) build in multiple test occasions.
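The first challenge in the list above can be made concrete with the Spearman-Brown relationship from classical test theory (a standard result, not cited on this slide): the reliability of a composite of k tasks depends directly on k.

```latex
\rho_k = \frac{k\,\rho_1}{1 + (k-1)\,\rho_1}
```

With the per-task reliability held fixed, cutting the number of tasks from, say, ten to three lowers the composite reliability substantially, which is why small, individualized task sets are difficult to defend on consistency grounds alone.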

Characterizing Measurement Error
Ideally, we want to calculate and report the amount of error associated with each of the major sources. In practice, however, this is hard to accomplish. Reporting results from a single source of error (e.g., inter-rater agreement) is not nearly sufficient. This afternoon’s session will present some strategies for approaching these challenges.
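As one hedged example of attributing error to a specific source, rather than reporting only an agreement rate, the sketch below estimates variance components for a fully crossed persons-by-raters design (a single-facet generalizability study). The design, function names, and rating data are illustrative assumptions, not the seminar's method:

```python
import numpy as np

def one_facet_g_study(scores: np.ndarray) -> dict:
    """Variance components for a fully crossed persons x raters design.

    scores: 2-D array, rows = persons (students), columns = raters.
    """
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Two-way ANOVA sums of squares and mean squares (no replication)
    ss_p = n_r * np.sum((person_means - grand) ** 2)
    ss_r = n_p * np.sum((rater_means - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_pr = ss_total - ss_p - ss_r                      # person x rater interaction + error
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Estimated variance components (negative estimates truncated at zero)
    var_pr = ms_pr
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)

    # Generalizability (relative) coefficient for the mean of n_r raters
    g_coef = var_p / (var_p + var_pr / n_r)
    return {"persons": var_p, "raters": var_r, "residual": var_pr, "g_coefficient": g_coef}

# Hypothetical ratings: 5 students scored by 3 raters on a 0-4 rubric.
ratings = np.array([
    [3, 3, 2],
    [1, 2, 1],
    [4, 4, 3],
    [2, 2, 2],
    [0, 1, 1],
])
print(one_facet_g_study(ratings))
```

In a fuller analysis the same logic extends to additional facets (tasks, occasions, administrators), which is what would be needed to report the error associated with each of the major sources listed earlier.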