A Case for STANAG 6001 Rater Training & Norming
David Oglesby, Defense Language Institute English Language Center

The Way I See It…
"Two individuals can witness the same external event, but interpret and perceive it totally differently, depending on how they think about or cognitively process it." (Stephen Diamond Magura)

Testing the Productive Skills
- We want to set tasks that form a representative sample of the population of oral tasks that we expect candidates to be able to perform.
- The tasks should elicit behavior which truly represents the candidate's ability.
- The samples of behavior can and will be scored validly and reliably.
(from Hughes, Testing for Language Teachers)

Reliability in Testing
A reliable test should be a consistent measure of performance.
[Cartoon: a "Levels of Boredom" rating scale running from "Immune to Boredom," "Grin & Bear It," "Bored," "Can't Read My Stoic Face," "OMG! I'm Bored," and "Miserable Fog of Ennui" to "Bored to Tears"]

Fair & Balanced Testing? Rating Productive Skills
[Cartoon captions: "I object to your subjectivity." / "I won't be subjected to your objectivity."]

Objective vs Subjective Scoring
Objective item: Given this undirected graph [figure not reproduced], what would be the result of a depth-first iterative traversal starting at node E?
a. EABCFDG
b. EDBFCG
c. EDBGFCA
d. EADBCFG
e. EGDCFBA
Subjective item: Write a fictional account of this young woman's life up to the moment reflected in the picture. [photo not reproduced]
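The objective item is machine-scorable because the traversal is fully determined once a tie-breaking rule is fixed. As a minimal sketch of that determinism: the Python routine below performs an iterative depth-first traversal; the graph is a hypothetical stand-in (the slide's figure is not reproduced), and neighbors are visited in alphabetical order so that exactly one answer string results.

```python
# Minimal sketch of an iterative depth-first traversal (explicit stack).
# The graph is a hypothetical stand-in for the slide's figure.
def dfs_iterative(graph, start):
    visited, stack = [], [start]
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.append(node)
            # Push neighbors in reverse alphabetical order so the
            # alphabetically first unvisited neighbor is explored next.
            stack.extend(sorted(graph[node], reverse=True))
    return "".join(visited)

graph = {
    "A": ["B", "E"], "B": ["A", "C", "D"], "C": ["B", "F"],
    "D": ["B", "G"], "E": ["A"], "F": ["C"], "G": ["D"],
}
print(dfs_iterative(graph, "E"))  # prints "EABCFDG" for this graph
```

With the rule fixed, any grader (or machine) produces the same scoring key; that single defensible answer is exactly what the essay prompt beside it lacks.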

Training STANAG Raters
1. Background & Overview: Introduce scale, testing protocol & scoring conventions
2. Initial Norming: Review & discuss benchmark language samples
3. Guided Practice: Rate samples using rating scale & rubric
4. Mock Interviews: For speaking assessment, complete practice interviews
5. Certification: Select raters who provide reliable and efficient ratings

Background & Overview
- Discuss the STANAG testing mission and purpose
- Introduce the modality-specific specifications and testing protocol
- Demonstrate/preview a model interview
- Issue training materials (handbook) and discuss contents

Business of Testing Protocols
1. General: Unwritten rules or guidelines that are peculiar to every culture or organization, and are supposed to be observed by all parties in the conduct of business, entertaining, negotiating, politics, etc.
2. Product development: Statement of attributes (features and benefits) that a new product must be designed to have. A product protocol is prepared by consulting all parties to the project.
3. Technology: Set of agreed-upon, openly published and distributed standards that enables different firms to manufacture compatible devices to the same specifications. All devices made under the same protocol work with one another without any adjustment or modification.

Initial Norming
- Present calibrated (benchmark) performances at each base/plus level
- As a group, analyze the performance considering content, task & accuracy
- Participants identify the parts of the performance that constitute a ratable sample
- Discuss the rating with respect to the rubric

Benchmarks Reduce the Likelihood of Rater Drift
[Figure: benchmark samples anchoring ratings at Level 1, Level 2, and Level 3]

Guided Practice
- Present speaking/writing performances for individual practice rating
- Participants report scores and criteria applied
- Group discussion leads to consensus (or not)
- Compare group rating with expert rating

Evaluating at the Microlevel
- Purpose of evaluating = Select level
- Collecting information = Elicit sample
- Interpreting information = Apply criteria
- Decision making = Determine rating

Mock Interviews for Speaking Assessment
- Pairs of trainees conduct mock interviews
- Other trainees observe & take notes
- After the interview, all trainees assign a rating
- The group discusses the conduct of the interview, the effectiveness of elicitation, and the ratings
- Trainees are paired anew and the process repeats
- Certify reliable & efficient raters

The Norming Conquest
Construct-irrelevant variance can cause rater drift over time:
- Moral dimension (severe to lenient)
- Halo effect
- Central tendency
Analyze inter-rater and intra-rater reliability (one common agreement index is sketched below)
Renorm raters using benchmarks
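"Analyze inter-rater reliability" can be made concrete with an agreement statistic. The sketch below computes Cohen's kappa, which discounts the agreement two raters would reach by chance; the ratings are hypothetical, and kappa is offered here as one common index, not as the program's prescribed statistic.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same samples."""
    n = len(rater_a)
    # Observed agreement: proportion of samples rated identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal rating frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical STANAG-style base/plus levels for ten samples.
a = ["1", "1+", "2", "2", "2+", "3", "1", "2", "2+", "3"]
b = ["1", "2",  "2", "2", "2+", "3", "1", "2", "2",  "3"]
print(round(cohens_kappa(a, b), 2))  # 0.73 for these ratings
```

A low kappa across the rater pool signals drift in the group; a persistently low kappa for one rater against the expert ratings flags that individual for renorming against the benchmarks.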

People are Dying for Change
"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." (Max Planck)

Questions?