Jean-Guy Blais Université de Montréal


Methodological aspects related to establishing minimum standards for performance
Jean-Guy Blais, Université de Montréal
Neuchâtel, January 2008

Standards
What is a standard?
Just enough
Average
Excellence
All of the above

Educational standards
Manufacturing quality standards
Health standards
Environmental standards
Educational standards

Standards
Education: systems, schools, teachers and students
Nowadays:
Explicit and public standards
Large-scale assessment
Minimal standards for all: NCLB / AYP
Fairness and accommodation
Performance standards and tasks

Standards
Related terminology / research:
Mastery assessment
Criterion-referenced measurement
Cut-off scores
Classification

Standards
System's goals and students' competencies
Values
Reforms / trends
Standards
Test items / performance tasks
Ratings of accomplished tasks according to standards
Scoring model / scoring scale:
Compensatory model
Conjunctive model
Decision / consequences
Report
Press / TV…
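The contrast between the compensatory and conjunctive scoring models above can be sketched in a few lines; the task names, scores, and cut scores below are invented for illustration.

```python
# Hypothetical sketch: compensatory vs. conjunctive decision models
# for combining scores on several performance tasks.

def compensatory_pass(scores, total_cut):
    """Pass if the sum of task scores reaches the overall cut score:
    a strong task can compensate for a weak one."""
    return sum(scores.values()) >= total_cut

def conjunctive_pass(scores, task_cuts):
    """Pass only if every task score reaches its own cut score:
    no compensation between tasks."""
    return all(scores[t] >= task_cuts[t] for t in task_cuts)

student = {"essay": 14, "oral": 9, "listening": 17}
task_cuts = {"essay": 10, "oral": 10, "listening": 10}

print(compensatory_pass(student, total_cut=30))  # True: 14 + 9 + 17 = 40 >= 30
print(conjunctive_pass(student, task_cuts))      # False: oral 9 < 10
```

The same examinee can thus pass under one model and fail under the other, which is why the choice of scoring model belongs in the decision chain above.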

Standards
Hambleton 1980: 16 methods
Judgmental
Empirical
Combination
«…a point on a test score scale that is used to sort examinees into two categories that reflect different levels of proficiency…»
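Hambleton's definition of a cut score can be illustrated with a minimal sketch; the examinee labels, scores, and cut score are invented.

```python
# A cut score is a point on the score scale used to sort examinees
# into two categories reflecting different levels of proficiency.

def classify(score, cut_score):
    """Sort an examinee into one of two proficiency categories."""
    return "master" if score >= cut_score else "non-master"

scores = {"A": 42, "B": 65, "C": 58, "D": 71}
cut_score = 60

decisions = {name: classify(s, cut_score) for name, s in scores.items()}
print(decisions)
```

The mechanics are trivial; everything that follows in the deck is about how to justify where `cut_score` is placed.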

Standards
Jaeger 1989:
Examinee centered
Test centered
Kane 1994: «…the performance standard is defined as the minimally adequate level of performance, …, it is the conceptual version of the desired level of competence, and the passing score is the operational version.»
Berk 1995: 20 methods
Applied Measurement in Education, 1995, 8(1): 50 methods

Hambleton & Pitoniak 2006: 25 methods
Review of items and scoring rubrics
Review of candidates
Review of candidates' work
Review of score profiles
Cizek & Bunch 2007: 15 methods
«…the process of establishing one or more cut scores on examinations.»
Procedural process / technically sound
Substantive process / fair decision

Standards
Generic steps (Cizek & Bunch 2007):
Choose a method
Performance level labels / descriptions
Select a panel
Train participants
Compile ratings / more than one round
Review / consensus
Document the process / validity
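The "compile ratings over more than one round" step can be sketched with an Angoff-style procedure, where panelists estimate, per item, the probability that a minimally competent examinee answers correctly; all numbers below are invented for illustration.

```python
# Hedged sketch of compiling panelist ratings across two rounds
# (Angoff-style); the ratings are invented.
from statistics import mean

# ratings[item][panelist] = estimated probability that a minimally
# competent examinee answers the item correctly
round1 = [
    [0.6, 0.8, 0.7],  # item 1, three panelists
    [0.4, 0.5, 0.3],  # item 2
    [0.9, 0.7, 0.8],  # item 3
]
# After discussion of round-1 results, panelists revise toward consensus
round2 = [
    [0.7, 0.7, 0.7],
    [0.4, 0.4, 0.5],
    [0.8, 0.8, 0.8],
]

def angoff_cut(ratings):
    """Cut score = expected test score of the minimally competent
    examinee: sum over items of the mean panelist rating."""
    return sum(mean(item) for item in ratings)

print(round(angoff_cut(round1), 2))
print(round(angoff_cut(round2), 2))
```

A second round typically narrows the spread of panelist ratings without moving the cut score much, which is exactly what the review/consensus step is meant to achieve.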

Standards
Many studies, many reviews, 1978-2007:
Regression and correlation studies
Generalizability studies
IRT studies
Rasch studies
Main feature: human judgment

Standards
Many methods… Different methods, different results!

Standards
Brennan 1996: performance tasks / G-study
Task reliability is relatively small
Equating scores on different performance tasks is difficult
Rater reliability/consistency is not always good
Haertel and Linn 1996: «Equating test scores when examinees choose which problems to attempt depends on strong untestable assumptions.»
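Brennan's point about rater consistency can be illustrated with a toy check on two raters; the ratings below are invented, and a real G-study would decompose score variance across persons, tasks, and raters rather than compute these two summary indices.

```python
# Toy rater-consistency check on invented 1-5 ratings of eight
# performances by two raters.
import math
from statistics import mean

rater_a = [3, 4, 2, 5, 3, 4, 1, 4]
rater_b = [3, 3, 2, 4, 3, 4, 2, 5]

# Exact-agreement rate: fraction of performances scored identically
agreement = mean(a == b for a, b in zip(rater_a, rater_b))

# Pearson correlation: consistency of the raters' rank orderings
ma, mb = mean(rater_a), mean(rater_b)
cov = sum((a - ma) * (b - mb) for a, b in zip(rater_a, rater_b))
var_a = sum((a - ma) ** 2 for a in rater_a)
var_b = sum((b - mb) ** 2 for b in rater_b)
r = cov / math.sqrt(var_a * var_b)

print(agreement)  # 0.5: exact agreement on only half the performances
print(round(r, 2))
```

The two raters can correlate well while agreeing exactly on few performances, which is why "consistency" alone understates the equating problem on performance tasks.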

Standards
Resnick and Resnick 1996: «Because the learning of skills and concepts is partly constrained by social contingencies and partly constrained by the curriculum and the instructional process, definition of standards will always be a mixture of our understanding of the learning process and our values.»

Standards
«Much of the research and at least 30 years of operational standard setting studies lead to one conclusion: making judgements about item difficulties is neither natural nor can panellists be trained readily to make these judgments.»
Hand 1997: «What is the best classification rule? The answer is: it depends.»

Standards
Cizek & Bunch 2007: «The same methods used with equivalent groups of participants can produce different cut scores, sometimes very different.»
The challenge of vertical scaling (equating, linking): is there a continuous developmental construct across grades? The further apart the grades being linked, the more hazardous the results.
The challenge of alternate assessments.

Standards
G. Stone 1996-2004-2006: Rasch model
«Individuals are not very good at establishing what examinees should know or be able to do.»
Theoretical inconsistencies
Standards should be about content, not scores.
«Traditional standards cannot be expressed qualitatively, confronting the validity of meaning and the validity of score.»

Standards
«(Methods…) fail to meet goals of judge agreement and fail to produce reproducible standards.»
«Judges are asked to perform a task that is too difficult and confusing»: estimate the probability that a minimally competent person will do something successfully.
«Standards defined by judge panels are inexorably connected to their normative experiences and are therefore wholly sample dependent.»

Standards
Blais 2004-2007: qualitative standards
Qualitative standards are more intuitive, but they overlap, like tectonic plates in a way. Like in the real world of evaluation/assessment.
Personal development is not linear and does not occur at the same rate for everyone.
Yearly standards should overlap, but yearly programme content does not overlap much.
There is no free lunch.

Standards / Conclusion
Relative standards, contextual standards: are they fixed for life? How long will they stand in a world moving fast forward? When do we have to review them? Each year? Every five years?

Standards / Conclusion
Much of the controversy over standard setting in education centers on disputes over what is, or should be, in the best interest of the public.

Standards / Conclusion
«How much does the past shape the future?» (B. Mandelbrot)