A Tale of Two Tests STANAG and CEFR Comparing the Results of side-by-side testing of reading proficiency BILC Conference May 2010 Istanbul, Turkey Dr.

Presentation transcript:

A Tale of Two Tests STANAG and CEFR Comparing the Results of side-by-side testing of reading proficiency BILC Conference May 2010 Istanbul, Turkey Dr. Elvira Swender, ACTFL

With apologies to the author

We had a “Dickens of a time” with this study.

Overview
- Two systems: STANAG and CEFR
- Two tests of reading proficiency
  - BAT-Reading
  - Leipzig Test of Reading Proficiency (LTRP)
- The side-by-side study
- Observations
- Questions

Two Systems

Why is there a need to relate STANAG and CEFR?
- To recognize linguistic abilities of military personnel in civilian society
- To provide a framework to military institutions in nation states operating STANAG qualifications who need to equate them with CEFR for the purpose of gaining civilian recognition of military qualifications
- To provide guidance to employers, trainers, and non-language experts on how to interpret/evaluate CEFR qualifications
- To identify competence gaps and thereby determine whether an individual is capable of undertaking a job requiring a given SLP
- To allow informed decisions to be made on appropriate linguistic competence

“Birds of a Feather”

Broad Questions?
- Can the two systems be compared?
- Are the two systems related?
- Can the two systems be aligned?
- Can the two systems be equated?

Comparing CEFR and STANAG: Similarities
Feature | CEFR | STANAG
Describe language abilities on a scale from little or no ability to that of a highly articulate speaker | A1, A2, B1, B2, C1, C2 | 0+, 1, 1+, 2, 2+, 3, 3+, 4, 4+, 5
Criterion referenced
Address speaking, listening, reading, and writing
Describe tasks (functions), contexts, and expectations for accuracy
Contain can-do statements
All criteria, some of the time | All criteria, all of the time

A Summary of the Major Contrasts
CEFR
- The primary purpose is to check learners’ progress in developing communicative competence within a specific course of study.
- The primary users of the information are the teachers and students.
- By design, the CEFR is under-specified for testing of general, real-world proficiency.
STANAG
- The primary purpose is to test individuals’ general proficiency across a wide range of topics regardless of their course of study.
- The primary users of the information are teachers and administrators, employers.
- By design, STANAG is under-specified for measuring step-by-step progress within a specific curriculum.

About this Study
- University of Leipzig
- April 19-23, 2010
- Proctored on-line tests in computer lab
- Goal was to involve five groups with 20 participants each
  - Levels A1, A2, B1, B2, C1 according to course enrolled
- Split test design
  - Half of the participants in each group took the BAT-R test first, the other half took the RPT-E first
- Tests taken on different days
  - 2 to 3 days apart depending on group
- 90 minutes per test

Characteristics of Participants
- Gender
  - Female: 65%; Male: 35%
- Age
  - Average 25 (Range: 19-63)
- First language
  - German (85%)
  - Arabic, Russian, Polish, Brazilian, Chinese, Thai
- Mean # of years of English study in school:
  - German students: 8.7 years
  - Foreign students: 5.1 years
- Enrolled in 1 of 5 different levels
  - English Language Institute to English teacher trainees

BAT Reading Test
- Test of English reading proficiency
- Advisory scores for calibrating national proficiency tests
- STANAG 6001 (version 3), Levels 1, 2, 3
- Internet-delivered and computer scored
- Developed by BILC Test Working Group
- Delivered by ACTFL

Format
- Criterion-referenced tests
  - Allow for direct application of the STANAG Proficiency Scale
- Texts and tasks are aligned by level
- Each proficiency level is tested separately
- Test takers take all items for Levels 1, 2, 3
- 20 texts at each level
- One item with 4 multiple choice responses per text

Scoring Criteria
- The proficiency rating is assigned based on two separate scores
  - “Floor”: sustained ability across a range of tasks and contexts specific to one level
  - “Ceiling”: non-sustained ability at the next higher proficiency level
- Must show “mastery” at a level to be assigned that level
- Non-compensatory scoring
- Performance at the next higher level provides evidence of random, emerging, or developing proficiency at the next higher level.
- Developing proficiency at the next higher level indicates a + rating.
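A minimal sketch of how floor-and-ceiling, non-compensatory scoring could work. The slides do not give the actual BAT-R cut-offs, so the `MASTERY` and `DEVELOPING` thresholds and the function name `bat_style_rating` are assumptions made only for illustration.

```python
# Hypothetical cut-offs: the slides do not state the real BAT-R thresholds.
MASTERY = 0.70      # assumed share of items needed to "sustain" a level
DEVELOPING = 0.55   # assumed share at the next level that earns a "+"


def bat_style_rating(correct_by_level, items_per_level=20):
    """Return a rating such as "0", "1+", "2" or "3".

    correct_by_level maps a level (1, 2, 3) to the number of correct items.
    """
    floor = 0
    for level in (1, 2, 3):
        share = correct_by_level.get(level, 0) / items_per_level
        if share >= MASTERY:
            floor = level   # this level is sustained; try the next one
        else:
            break           # non-compensatory: higher levels cannot rescue a failed level

    rating = str(floor)
    next_level = floor + 1
    if next_level <= 3:
        share_next = correct_by_level.get(next_level, 0) / items_per_level
        if share_next >= DEVELOPING:
            rating += "+"   # developing proficiency at the next higher level
    return rating


print(bat_style_rating({1: 18, 2: 12, 3: 15}))
```

In this sketch a strong Level 3 score does not offset a failed Level 2, which is the point of non-compensatory scoring.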

Leipzig Test of Reading Proficiency
- Test of English reading proficiency for entering and exiting students at universities in the state of Saxony/Germany
- To determine proficiency levels from A1 to C1 according to the CEFR
- For placement and certification purposes
  - Entrance and exit requirements in all subjects
- Developed by the University of Leipzig under a grant from the state of Saxony

Format
- 5 texts with 3 questions each per level
  - 15 items per level
- Multiple choice questions
  - one correct answer and three distracters
- Entire series of tests
  - Combine 2 or 3 adjoining levels
  - A1-B1 or B1-B2 or B1-C1
- Version of the test used in this study
  - B1-C1

Level A1
- 5 texts: words each
- Major tasks and functions
  - Topic recognition and comprehension of simple single facts
- Content
  - Basic personal and social needs
- Text type
  - Very short, simple straight-forward texts: notes, post cards, simple instructions and directions
- 3 MC questions per text
  - Global, selective, detail

Screen shot of A1 item: to come (requested from Helen)

Level C1
- 5 texts: words each
- Major tasks and functions
  - Complex information processing including inferences, hypotheses, and nuances
- Content
  - Academic, professional, and literary material
- Text type
  - Op/ed pieces, analyses and commentaries, detailed technical reports, literary texts
- 3 MC questions per text
  - Global, detail, inference

Scoring Criteria
- Total number of points
- Rate highest levels that have a combined total of at least 18 points with the lower level with at least 11 points (70%)
  - points (60-80%) = lower level
  - points (81-100%) = higher level
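One possible reading of this total-points rule, sketched for the B1-C1 version with 15 items per level. The pairing of adjoining levels, the top-down search order, and the name `ltrp_style_rating` are assumptions; the slide's exact point ranges are not preserved, so the sketch works from the percentages of the 30 combined points instead.

```python
LEVELS = ["B1", "B2", "C1"]   # the B1-C1 test version used in the study
ITEMS_PER_LEVEL = 15


def ltrp_style_rating(points):
    """points maps a level name to points earned (0-15).

    Returns the awarded level, or None if no pair of levels qualifies.
    """
    # Check adjoining pairs from the top down so the highest qualifying pair wins.
    pairs = list(zip(LEVELS, LEVELS[1:]))   # [("B1", "B2"), ("B2", "C1")]
    for lower, higher in reversed(pairs):
        combined = points[lower] + points[higher]
        if combined >= 18 and points[lower] >= 11:
            share = combined / (2 * ITEMS_PER_LEVEL)
            # 60-80% of the combined points -> lower level, above 80% -> higher level
            return higher if share > 0.80 else lower
    return None   # below the range this test version can certify


print(ltrp_style_rating({"B1": 14, "B2": 12, "C1": 4}))
```

Unlike a floor-and-ceiling scheme, points from the higher level here add to the combined total, so the rule is partly compensatory.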

Findings

[Table: columns A1, A2, B1, B2, C1, TOTAL; final row TOTAL; cell values not preserved]

Scatter Plot of Total Raw Scores
[Figure: scatter plot; axes: LTRP Total Score and BAT-R Total Score]
Correlation of total raw scores: r = .905, p < .001
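For readers unfamiliar with the statistic, a Pearson correlation between two sets of total raw scores can be computed as sketched below. The scores are invented for six hypothetical test takers and are not the study's data; the function name `pearson_r` is likewise illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented totals for six hypothetical test takers (NOT the study's data).
ltrp_totals = [12, 18, 23, 27, 31, 40]
bat_totals = [15, 22, 30, 33, 41, 52]
print(round(pearson_r(ltrp_totals, bat_totals), 3))
```

A high r on raw totals shows that the two tests rank people similarly; it does not by itself show that the level boundaries of the two scales line up.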

With the current data, one could say
- At the lowest and highest ends of the scales there is alignment
  - No one who was rated 1 was also rated B2 or C1
  - No one who was rated 3 was rated A1, A2, or B1.
- The middle ranges are where there is the least amount of alignment
  - A BAT-R 2 can be anything from A2 to C1

[Table: columns A1, A2, B1, B2, C1, TOTAL; final row TOTAL; cell values not preserved]

With the current data, one could say
BAT-R   LTRP
0       0 or A1
1       A1 or A2 (Mostly A2)
1+      A2 or B1 (Mostly B1)
2       A2, B1, B2, or C1 (Mostly B1)
2+      B2 or C1 (Mostly B2)
3       B2 or C1 (Mostly C1)

With the current data, one could say
LTRP    BAT-R
A1      0 or 1 (Mostly 1)
A2      1, 1+ or 2 (Mostly 1)
B1      1+ or 2 (Mostly 2)
B2      2, 2+ or 3 (Mostly 2)
C1      2, 2+ or 3 (Mostly 3)

Estimated Probability of a BAT-R Rating Based on LTRP Rating
[Table: rows are LTRP ratings (A1, A2, B1, B2, C1), columns are BAT-R ratings; cell values not preserved]
Shaded values are highest probability on the row.

What is the probability?
That a BAT-R 2 is also an LTRP:
- A2: 9%
- B1: 74%
- B2: 57%
- C1: 5%

What is the probability?
That a BAT-R 3 is also an LTRP:
- B1: 9%
- B2: 18%
- C1: 88%

What is the probability?
That an LTRP B1 is also a BAT-R:
- 1: 3%
- 1+: 21%
- 2: 74%
- 2+: 1%
- 3: 1%

What is the probability?
That an LTRP B2 is also a BAT-R:
- 1+: 1%
- 2: 57%
- 2+: 23%
- 3: 18%
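Estimated probabilities of this kind are typically obtained by dividing each cell of a crosstabulation by its row total. The sketch below assumes that approach; the counts are invented for illustration (the study's crosstab values are not preserved here), and the helper names `row_probabilities` and `transpose` are hypothetical.

```python
def row_probabilities(crosstab):
    """Turn {row: {col: count}} into {row: {col: share of the row total}}."""
    result = {}
    for row, counts in crosstab.items():
        total = sum(counts.values())
        result[row] = {col: n / total for col, n in counts.items()} if total else {}
    return result


def transpose(crosstab):
    """Swap rows and columns so the other test becomes the conditioning variable."""
    flipped = {}
    for row, counts in crosstab.items():
        for col, n in counts.items():
            flipped.setdefault(col, {})[row] = n
    return flipped


# Invented counts (NOT the study's data): rows are LTRP levels, columns are BAT-R ratings.
counts = {
    "B1": {"1+": 4, "2": 14, "2+": 2},
    "B2": {"2": 11, "2+": 5, "3": 4},
}

bat_given_ltrp = row_probabilities(counts)             # P(BAT-R rating | LTRP level)
ltrp_given_bat = row_probabilities(transpose(counts))  # P(LTRP level | BAT-R rating)

print(bat_given_ltrp["B1"]["2"])   # share of LTRP B1 test takers rated BAT-R 2
print(ltrp_given_bat["2"]["B1"])   # share of BAT-R 2 test takers rated LTRP B1
```

The two directions generally give different numbers, so it matters which test's rating is treated as given when reading a probability table.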

Answering the Broad Questions
- Can the two systems be compared? YES
- Are the two systems related? YES
- Can the two systems be aligned? Somewhat
- Can the two systems be equated? Probably not

“Heat Chart” CEFR STANAG 6001

When comparing testing systems
- Ask about the purpose of the test
  - Placement, progress, prove a level, etc.
- Ask about what the test is testing
  - Is it a test of achievement, performance, proficiency?
  - Does it test spontaneous abilities or rehearsed performance?
- Ask about how the test scores are determined
  - Non-compensatory: prove a floor and ceiling
  - Total points
- Ask if research exists

Answers from a CEFR Expert
- CEFR is not one system.
- It is NOT intended to be used to transfer scores from one country to the next or from one language to another but rather to set a framework within which educators can build curricula.
- Not a harmonisation project
- Alignment is problematic because we do not know what we are aligning.
- Not a matter of alignment or equivalency but a matter of relationship
- The scale is an origin for comparison.
- The scale functions as exemplars and activities.
- The scale is a meta-framework for learning and teaching.
Conversation with Nick Saville, Cambridge, England, April 15, 2010

In Closing It is a far, far better thing that we do than we have ever done to know how to use test scores.

Questions? Contact:

Extra slides

Crosstabulation of Test Results