BILC Standardization Efforts & BAT, Round 2

Presentation transcript:

BILC Standardization Efforts & BAT, Round 2
Peggy Garza and Roxane Harrison

From NATO Standardization Office emails: "Be wise – STANDARDIZE"
Why standardize? Testing is a national responsibility, but linguistic interoperability cannot occur without standardization. NATO and Partner nations must have a common interpretation of the STANAG 6001 descriptors and of how to accurately assess the language skills of their armed forces IAW STANAG 6001.
Is the STANAG 6001 document alone enough to ensure standardization of language assessment across the Alliance?

BILC Standardization Efforts
- Language Testing Seminars
- BILC Assistance to Nations
- Annual STANAG 6001 Testing Workshops
- Best Practices in STANAG 6001 Testing
- A Toolbox for Language Testers
- Working Group on Level 4
- Benchmark Advisory Test (BAT)

Language Testing Seminars
Language Testing Seminar (LTS)
- 2-week foundational course offered twice a year
- Since 2000: 540 participants from 53 nations/NATO offices; 40 facilitators from 20 nations
Advanced Language Testing Seminar (ALTS)
- 3-week highly technical course for experienced language testers, offered once a year
- Since 2009: 105 participants from 31 nations/NATO offices; 14 facilitators from 9 nations

LTS
- Introduction to the scale: STANAG 6001
- Familiarization with L1-L3
- BAT protocol for testing writing
- Test production process
- Introduction to the Validity Roadmap
- Introduction to criterion-referenced testing
ALTS
- Standardization exercises: STANAG 6001
- Focus on L3
- BAT protocols for testing speaking and writing
- Validity Roadmap: collect evidence/documentation
- Practical activities in statistical analysis for criterion-referenced tests

Assistance to national testing organizations
- Azerbaijan
- Bosnia-Herzegovina
- Georgia
- Macedonia
- Moldova
- Ukraine

STANAG 6001 Testing Workshops
Who attends? LTS graduates who are current STANAG 6001 testers.
Goals: To promote standardization of STANAG 6001 testing by providing a forum for the exchange of information, best practices and new approaches in the field of language testing.
Workshop findings are used to update the testing seminars and can lead to new projects and collaboration.

STANAG 6001 Testing Workshops
Introduction to BILC
- 2010 Sarajevo, Bosnia and Herzegovina: Testing speaking
- 2011 Stockholm, Sweden: Testing writing
- 2012 Copenhagen, Denmark: Testing listening
- 2013 SHAPE Mons, Belgium: Integrating test development stages
- 2014 Reichenau, Austria: Item development workshop
- 2015 Vilnius, Lithuania: Challenges in testing (text editing, authenticity & Level 4)
- 2016 Brno, Czech Republic: Testing the test – How valid is our validity argument?
- 2017 Skopje, Macedonia: Tester success

Findings from Brno
- Validity Roadmap
- Criterion-referenced testing
- Alignment activities
- Angoffing
- Statistical analysis
- jMetrik
- Item Response Theory (IRT)

Findings from Brno
Building the argument:
- Evidence
- Documentation

Findings from Brno
Tools and techniques to check test item quality & alignment:
- Content-Tasks-Accuracy (CTA) statements
- Alignment worksheets
- Angoffing procedures (illustrated in the sketch below)
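"Angoffing" refers to the Angoff standard-setting method, in which a panel of judges estimates, item by item, the probability that a minimally competent candidate at the target level would answer correctly; the totals, averaged across judges, become the recommended cut score. A minimal sketch in Python, using hypothetical judge data:

```python
# Minimal sketch of a (modified) Angoff cut-score calculation.
# The ratings are hypothetical: each value is one judge's estimate of the
# probability that a borderline candidate answers that item correctly.
judge_ratings = {
    "judge_1": [0.90, 0.70, 0.60, 0.80, 0.50],
    "judge_2": [0.80, 0.60, 0.70, 0.90, 0.40],
    "judge_3": [0.85, 0.65, 0.60, 0.85, 0.50],
}

# Each judge's implied cut score is the sum of their per-item estimates.
judge_cuts = {judge: sum(r) for judge, r in judge_ratings.items()}

# The panel's cut score is the mean across judges.
cut_score = sum(judge_cuts.values()) / len(judge_cuts)
print(f"Recommended cut score: {cut_score:.2f} out of 5 items")
```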

Findings from Brno
IRT Workshop – 11 nations attended
Objectives:
- Introduction to IRT concepts and the advantages of using Rasch IRT analysis (see the sketch after this list)
- Using jMetrik for Rasch IRT analysis
- Interpretation of the data for informed decisions about tests and items
- Individualized consultation on national test datasets
IRT is the industry standard for computer-delivered and computer-adaptive testing.
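For orientation, the Rasch model at the heart of these analyses expresses the probability of a correct response as a logistic function of the gap between person ability and item difficulty, both on the same logit scale. A minimal sketch, independent of jMetrik, with hypothetical values:

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Rasch (one-parameter IRT) model: P(correct) from ability and difficulty in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values: one test taker against items of increasing difficulty.
ability = 0.5
for difficulty in (-1.0, 0.0, 0.5, 1.5):
    p = rasch_probability(ability, difficulty)
    print(f"item difficulty {difficulty:+.1f} logits -> P(correct) = {p:.2f}")
# An item whose difficulty matches the ability gives P = 0.50, which is what
# makes Rasch analysis useful for targeting items at a proficiency level.
```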

Findings from Skopje
- Best Practices
- Norming raters of speaking and writing tests
- Tools for tester success

Findings from Skopje
Best Practices in STANAG 6001 Testing
A best practice is a method or technique that has consistently shown results superior to those achieved with other means, and that is used as a benchmark. Topics covered:
- Purpose of STANAG 6001 language proficiency tests
- Test design
- Test development
- Test administration
- Hiring, training and norming testers
- Stakeholders' rights and responsibilities

Findings from Skopje
Norming Raters of Speaking & Writing Tests
- Nations submitted speaking and writing test samples for possible use in norming sessions
- BILC STANAG 6001 testing experts reviewed the tests and provided feedback
Observations/Findings:
- Elicitation techniques affect a rater's ability to rate
- Non-alignment of tasks with the elicited level affects rating and test-takers' performance
- Failure to establish a floor and to test at the level of the probes affects rating – it leaves a lack of "can do" evidence versus overwhelming "can't do" evidence
- Challenges in distinguishing between a 2+ and a threshold 3, and between a 0+ and a 1
- Under-representation of tasks and topics affects rating reliability
- Rigid testing protocols (not adapted/tailored to the test-taker) put stress on the test-taker

Findings from Skopje
BILC Website - www.natobilc.org
- BILC Policy Recommendations
- Toolbox for Testers
- Self-assessment of testing organizations
- Speaking and writing samples at different levels
- Paper on Level 4: A Conceptual Model and Implications for Testing
Soon to be added:
- Template for job descriptions
- Content-task-accuracy alignment worksheets
- Test item moderation guidelines and checklists
- Evidence for building a validity argument

Level 4: Introduction and Background
Formation of the BILC Working Group on Level 4 Proficiency, Istanbul 2010
Aims:
- to assist nations feeling ill-equipped for testing at Level 4
- to amplify the STANAG 6001 Level 4 descriptors
- to discuss testing-related implications

Level 4 Construct
Level 4 – Expert
- typically only achieved by individuals who use the foreign language extensively on a daily basis as part of their profession or specialization
- usually more characteristic of individual ability than of job requirements
- evaluative comprehension – reading/listening "beyond the lines"
- prerequisite: higher-order thinking skills (e.g., deductive and inductive reasoning, analyzing, and synthesizing)

Level 4 Language Use
- Language use is highly precise, nuanced and effective
- Readily adapts and tailors language to suit the purpose and situation
- Firm grasp of various levels of style and register
- Ability to hypothesize and persuade
- Effective in very demanding academic and professional settings: challenging and high-stakes situations; negotiating and persuading in international environments
Examples of military-related tasks at Level 4:
- serve as spokesperson responsible for press releases and press conferences requiring nuanced, culturally appropriate communications
- act as an arbiter between warring factions during a sensitive peace-keeping assignment
- analyze the hidden communicative intent of diplomatic pronouncements

Level 4 WG Products
- Paper: "NATO STANAG 6001 Level 4 Language Proficiency – A Conceptual Model and Implications for Testing" (April 2013)
- Article: "Defining and Assessing STANAG 6001 Level 4 Language Proficiency" (Chapter 10, Language in Uniform, Cambridge Scholars, 2015)
- Level 4 Reading Test Specifications
- Tutorial (principles of text rating, differences between Level 3 and Level 4 texts, sample texts, sample test development procedure, etc.)
- Level 4 Test Prototype (Tester and Examinee Booklets for the speaking/writing modality; administration and rating procedure)
- Test Familiarization Guide
- Feedback Questionnaire

Familiarization Workshop on L4 Reading Proficiency
The workshop will cover the interpretation of the L4 reading descriptor, the differences between L3 and L4 reading texts and skills, testing techniques, rating criteria and scoring procedures.
When: 23-27 October 2017
Where: Partner Language Training Center Europe (PLTCE), Garmisch-Partenkirchen, Germany
POC: Jana Vasilj-Begovic

Benchmark Advisory Test (BAT) Then & Now
- Round 1 (2009): an advisory measure for nations
- Round 2 (2018): evidence of standardization

BAT Purpose
- To provide an external measure against which nations can compare their national STANAG 6001 test results
- To promote relative parity of scale interpretation and application across national testing programs
- To standardize what is tested and how it is tested in the Alliance

BAT History
- Launched as a volunteer, collaborative project
- The BILC Test Working Group: 13 members from 8 nations, with contributions received from many other nations
- The original goal was to develop a reading test for Levels 1-3

BAT History (continued)
- Later awarded as a competitive contract by ACT, with ACTFL working with the BILC Working Group to develop tests in the 4 skill modalities
- Reading and Listening tests piloted and validated
- Speaking and Writing tests developed
- Testers and raters trained and certified
- Test administration and reporting protocols developed
- 200 BAT 4-skills tests allocated under the contract
- Tests administered and rated, and scores reported to nations

BAT Reading and Listening Tests
- Internet-delivered and computer-scored
- Criterion-referenced tests: each proficiency level is tested separately
- Test takers take all items for Levels 1, 2 and 3
- 20 texts at each level, with one multiple-choice item per text
- The proficiency rating is based on (see the sketch after this list):
  - "Floor" – sustained ability across a range of tasks and contexts specific to one level
  - "Ceiling" – non-sustained ability at the next higher proficiency level
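To make the floor/ceiling logic concrete, here is a minimal sketch of how such a rating rule can be operationalized. The 70% "sustained ability" threshold and the plus-level rule are assumptions for illustration only; the slides do not publish the actual BAT scoring rules.

```python
# Illustrative floor/ceiling rating logic for a criterion-referenced test.
# Assumption (hypothetical): "sustained ability" means >= 70% correct of the
# 20 items at a level; the real BAT thresholds are not given in the slides.

SUSTAINED = 0.70
ITEMS_PER_LEVEL = 20

def proficiency_rating(correct_by_level: dict) -> str:
    """Return a STANAG-style rating from per-level raw scores (Levels 1-3)."""
    floor = 0
    for level in (1, 2, 3):
        if correct_by_level[level] / ITEMS_PER_LEVEL >= SUSTAINED:
            floor = level          # sustained ability at this level
        else:
            break                  # ability is not sustained; stop climbing
    # A "plus" could reflect strong but non-sustained ability one level up;
    # this 50% plus-level rule is illustrative only.
    ceiling = floor + 1
    if ceiling <= 3 and correct_by_level[ceiling] / ITEMS_PER_LEVEL >= 0.50:
        return f"{floor}+"
    return str(floor)

print(proficiency_rating({1: 18, 2: 15, 3: 8}))   # -> "2" (floor 2, weak L3)
print(proficiency_rating({1: 19, 2: 16, 3: 12}))  # -> "2+" (strong, non-sustained L3)
```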

BAT Speaking Test
- Telephonic Oral Proficiency Interview
- The goal is to produce a speech sample that best demonstrates the speaker's highest level of spoken language ability across the tasks and contexts for the level
- The interview consists of a standardized structure of "level checks" and "probes" plus a NATO-specific role-play situation
- Conducted and rated by one certified BAT-S Tester, and independently second-rated by a separate certified tester or rater
- Ratings must agree exactly; level and plus-level scores are assigned; discrepancies are arbitrated

BAT Writing Test
- Internet-delivered
- Open constructed response
- Four multi-level prompts targeting the tasks and contexts of STANAG Levels 1, 2 and 3
- NATO-specific prompt
- Rated by a minimum of two certified BAT-W Raters
- Ratings must agree exactly; level and plus-level scores are assigned; discrepancies are arbitrated (see the sketch after this list)
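Both productive-skills tests use the same double-rating rule: two independent ratings that must agree exactly, with disagreements sent to arbitration. A minimal sketch of that flow; the details of the arbitration step are an assumption, since the slides state only that discrepancies are arbitrated.

```python
# Illustrative double-rating flow for BAT speaking/writing scores.
# The arbitration mechanism shown here is hypothetical.

def final_score(rating_1: str, rating_2: str, arbitrate) -> str:
    """Report a score only when two independent ratings agree exactly."""
    if rating_1 == rating_2:                 # plus levels must match too
        return rating_1
    return arbitrate(rating_1, rating_2)     # e.g., a third certified rater

third_rater = lambda r1, r2: "2+"            # hypothetical arbitration outcome
print(final_score("2+", "2+", third_rater))  # exact agreement -> "2+"
print(final_score("2+", "3", third_rater))   # discrepancy -> arbitrated "2+"
```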

2009 BAT Administration
- Tests allocated to 11 nations; all nations completed testing
- Testing began in May 2009
- Tests administered by LTI, the ACTFL Testing Office

Overview of all 2009 tests

  Skill       BAT tests administered
  Listening   177
  Speaking    157
  Reading     176
  Writing     178

Comparing Scores by Level: Listening - 2009

  Level     BAT-L   National Test
  Level 0   2       -
  Level 1   35      20
  Level 2   47      46
  Level 3   67      74
  Level 4   -       11

Observations – Listening Scores
- Exact agreement of BAT and national scores: 59% (the computation is illustrated in the sketch below)
- Where there is disagreement:
  - in 52 cases, the disagreement is across one contiguous level
  - in 10 cases, the disagreement is across two levels (1/3 and 2/4)
- The national score is HIGHER in 81% of the cases
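As a worked illustration of how agreement figures of this kind are computed from paired results: the score pairs below are hypothetical; only the percentages reported in the slides come from the 2009 data.

```python
# Illustrative computation of exact agreement and disagreement statistics.
# Each pair is (BAT level, national level); plus levels are ignored here
# for simplicity, and the data is made up.

pairs = [(2, 2), (2, 3), (1, 2), (3, 3), (1, 3), (2, 2)]

exact = sum(1 for bat, nat in pairs if bat == nat)
one_level = sum(1 for bat, nat in pairs if abs(bat - nat) == 1)
two_levels = sum(1 for bat, nat in pairs if abs(bat - nat) == 2)
disagreements = [(bat, nat) for bat, nat in pairs if bat != nat]
national_higher = sum(1 for bat, nat in disagreements if nat > bat)

print(f"exact agreement: {exact / len(pairs):.0%}")
print(f"one contiguous level apart: {one_level}, two levels apart: {two_levels}")
print(f"national higher in {national_higher / len(disagreements):.0%} of disagreements")
```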

Comparing Scores by Level: Speaking - 2009

  Level     BAT-S   National Test
  Level 1   41      21
  Level 2   76      50
  Level 3   20      60
  Level 4   -       6

Observations – Speaking Scores
- Exact agreement of BAT and national scores: 50%
- Where there is disagreement:
  - in 62 cases, the disagreement is across one contiguous level
  - in 8 cases, the disagreement is across two levels (1/3 and 2/4)
- The national score is HIGHER in 97% of the cases

Comparing Scores by Level: Reading - 2009

  Level     BAT-R   National Test
  Level 1   32      13
  Level 2   42      56
  Level 3   76      70
  Level 4   -       11

Observations – Reading Scores
- Exact agreement of BAT and national scores: 65%
- Where there is disagreement:
  - in 49 cases, the disagreement is across one contiguous level
  - in 3 cases, the disagreement is across two levels (1/3 and 2/4)
- The national score is HIGHER in 81% of the cases

Comparing Scores by Level: Writing - 2009

  Level     BAT-W   National Test
  Level 1   56      21
  Level 2   93      80
  Level 3   4       41
  Level 4   -       11

Observations – Writing Scores
- Exact agreement of BAT and national scores: 47%
- Where there is disagreement:
  - in 66 cases, the disagreement is across one contiguous level
  - in 15 cases, the disagreement is across two levels (1/3 and 2/4)
- The national score is HIGHER in 98% of the cases

Accounting for Divergence
For the receptive skills:
- Compensatory cut score setting (contrasted with conjunctive scoring in the sketch below)
- Lack of alignment of author purpose, text type, and task in some test items
- Inadequate item response alternatives
For the productive skills:
- Inadequate tester/rater norming
- Inconsistencies in the interpretation of STANAG 6001
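The first point is worth unpacking: a compensatory cut score pools items across levels, so strong lower-level performance can offset weak higher-level performance, whereas the BAT's floor/ceiling approach is conjunctive. A hypothetical sketch of how the two approaches can diverge on the same answer pattern; the thresholds are illustrative, not BAT or national rules:

```python
# Hypothetical contrast between compensatory and conjunctive (floor/ceiling)
# score interpretation on the same response pattern.

correct = {1: 19, 2: 16, 3: 9}   # correct answers out of 20 per level
ITEMS = 20

# Compensatory: one overall cut across all 60 items.
total = sum(correct.values())            # 44 of 60
compensatory_pass_L3 = total >= 42       # e.g., a 70% overall cut
print("compensatory Level 3:", compensatory_pass_L3)   # True - L1/L2 offset L3

# Conjunctive: sustained ability (>= 70%) must be shown at each level in turn.
floor = 0
for level in (1, 2, 3):
    if correct[level] / ITEMS >= 0.70:
        floor = level
    else:
        break
print("conjunctive rating:", floor)      # 2 - Level 3 is not sustained
```

This difference alone could explain why national receptive-skills scores tended to come out higher than the BAT's.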

Benchmark Advisory Test (BAT) Round 2
Aim: Provide evidence of the standardization of STANAG 6001 testing across the nations.
- 21 nations interested in participating
- 10 tests per nation
- Will take place in 2018
First steps:
- Application process
- BAT Speaking & Writing norming sessions

2018 BAT Application Process
Nations complete a questionnaire describing their national STANAG 6001 tests:
- Test design
- Administration procedures
Why? To minimize technical problems in test administration, and to provide insights into possible reasons for misalignment, if any.

2018 BAT Speaking & Writing Norming Sessions
What: Norming sessions with a maximum of 10 participants
Who: Experienced STANAG 6001 testers
When: January 2018 & July 2018
Where: Partner Language Training Center Europe

Standardization
- Language Testing Seminars
- BILC Assistance to Nations
- Annual STANAG 6001 Testing Workshops
- Best Practices in STANAG 6001 Testing
- A Toolbox for Language Testers
- Norming speaking and writing raters
- BAT

Thank you.