Chief of English Testing, Language Programs

Similar presentations
Quality Control in Evaluation and Assessment

An introduction to the Literacy and Numeracy unit standards
Academy 2: Using Data to Assess Student Progress and Inform Educational Decisions in Culturally Responsive RTI Models Academy 2: Culturally Responsive.
A Tale of Two Tests STANAG and CEFR Comparing the Results of side-by-side testing of reading proficiency BILC Conference May 2010 Istanbul, Turkey Dr.
Collecting data Chapter 5
Testing What You Teach: Eliminating the “Will this be on the final
Achievement Level Descriptors & College Content-Readiness Webinar 1 November 15, 2012.
Alternative Assessment There is no single definition of ‘alternative assessment’ in the relevant literature. For some educators, alternative assessment.
1 Assessment Use Argument Nancy Powers Chief of English Testing Section SHAPE, Mons, Belgium Sept 2013.
Evaluating tests and examinations What questions to ask to make sure your assessment is the best that can be produced within your context. Dianne Wall.
1 The New Adaptive Version of the Basic English Skills Test Oral Interview Dorry M. Kenyon Funded by OVAE Contract: ED-00-CO-0130 The BEST Plus.
Consistency of Assessment
Overview of the CCSSO Criteria– Content Alignment in English Language Arts/Literacy Student Achievement Partners June 2014.
BILC Standardization Initiatives and Conference Objectives
Assessment Literacy for Language Teachers by Peggy Garza Partner Language Training Center Europe Associate BILC Secretary for Testing Programs.
Lessons from the moderation of controlled assessment in 2013
An Introduction to Argumentative Writing
Evaluating the Validity of NLSC Self-Assessment Scores Charles W. Stansfield Jing Gao Bill Rivers.
Validity & Practicality
Course on Data Analysis and Interpretation Presented by B. Unmar Sponsored by GGSU PART 2 Date: 5 July
Including Quality Assurance Within The Theory of Action Presented to: CCSSO 2012 National Conference on Student Assessment June 27, 2012.
WELNS 670: Wellness Research Design Chapter 5: Planning Your Research Design.
Action Plans for Test Development and Administration STANAG 6001 BILC Conference, Mons, SHAPE, Belgium, September 2013 Ludmila Ianovici-Pascal Head, Language.
European and North Atlantic Office ICAO Regional Workshop Language Proficiency Requirements Paris September 2006 ELPAC English language Proficiency for.
Assessment. Workshop Outline Testing and assessment Why assess? Types of tests Types of assessment Some assessment task types Backwash Qualities of a.
Investigating the Impact of Vocabulary Strategy Training and E-Portfolios on Vocabulary Strategy Use and the Acquisition of Academic Vocabulary by Saudi.
A test is said to be valid if it measures accurately what it is supposed to measure and nothing else. For example: “Is photography an art or a science?
VALIDITY, RELIABILITY & PRACTICALITY Prof. Rosynella Cardozo Prof. Jonathan Magdalena.
Module 7- Evaluation: Quality and Standards. 17/02/20162 Overview of the Module How the evaluation will be done Questions and criteria Methods and techniques.
Helmingham Community Primary School Assessment Information Evening 10 February 2016.
Improve Own Learning and Performance. Progression from levels 1-3 Progression from levels 1-3 At all levels, candidates are required to show they can.
Monitoring and Assessment Presented by: Wedad Al –Blwi Supervised by: Prof. Antar Abdellah.
Argument-Driven Inquiry is designed to make reading, writing, and speaking a central component of doing science. It therefore enables science teachers.
ESTABLISHING RELIABILITY AND VALIDITY OF RESEARCH TOOLS Prof. HCL Rawat Principal UCON,BFUHS Faridkot.
Relating Foreign Language Curricula to the CEFR in the Maltese context
EVALUATING EPP-CREATED ASSESSMENTS
School – Based Assessment – Framework
Test Validation Topics in the BILC Testing Seminars
50 Years of BILC: The Evolution of STANAG – 2016 and the first Benchmark Advisory Test Ray Clifford 24 May 2016.
English language Proficiency for Aeronautical Communication
Introduction to the Workshop
Fullerton College SLOA Workshop:
QUESTIONNAIRE DESIGN AND VALIDATION
Chief of English Testing, Language Programs
Reliability and Validity in Research
1.13 Writing an Argument.
STANAG 6001 Testing Update and Introduction to the 2017 Workshop
Test Validity.
GLoCALL & PCBET 2017 Joint Conference, 7-9 September 2017 at Universiti Teknologi Brunei, Brunei Darussalam, Presented at Room 1, 11:00-11:30. Effect of.
BILC Standardization Efforts & BAT, Round 2
IB Environmental Systems and Societies
VALIDITY Ceren Çınar.
Evaluate the effectiveness of the implementation of change plans
Next Generation (ACCUPLACER)
Assessing learners’ needs
Roadmap Towards a Validity Argument
Evaluation tools.
Best Practices in STANAG 6001 Testing
Basic Statistics for Non-Mathematicians: What do statistics tell us
STANAG 6001 Testing Workshop
STANAG 6001 Testing Workshop
BILC Professional Seminar - Zagreb, October 16, 2018 Maria Vargova
Assessment Use Argument
Measurement Concepts and scale evaluation
jot down your thoughts re:
BiH Test Piloting Mary Jo DI BIASE.
THE CURRICULUM STUDY CHAP 3 – METHODOLOGY.
A POCKET GUIDE TO PUBLIC SPEAKING 5TH EDITION Chapter 24
EDUC 2130 Quiz #10 W. Huitt.
Successful trialling: from trial and error to best practices
Presentation transcript:

The Evolution of STANAG 6001 Test Validation Practices
Nancy Powers, Chief of English Testing, Language Programs, Department of National Defence, Canada
BILC Testing Workshop, Brno, Czech Republic, September 6-8, 2016

Introduction
- Traditional validation
- Reading test
- Listening test

Traditional validation
Traditionally, validation was seen as a separate step (usually the final one) in the test development cycle:
- Create the item; the level of the item is based on the experience of the item writer
- Review the item and revise it if necessary; the text and the item are not necessarily at the same level
- Trial it

Traditional validation (cont’d)
- Use an external criterion, a test which has already been validated, to group the candidates
- Determine the Discrimination Index (DI); the DI dictates at what level the item is working, i.e. where in the test the item will be placed (see the sketch below)
- Score by total raw score
- Determine the correlation between the new test and the external criterion; if it is high, the new test is considered to be valid (concurrent validity)
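The slides name these statistics without giving formulas. Below is a minimal Python sketch, assuming the classical upper/lower-group definition of the DI (27% groups are the textbook convention; the slides do not say which variant was used) and Pearson's r for the concurrent-validity correlation:

```python
"""Sketch of the traditional validation statistics described above.
Assumptions (not from the slides): DI is the difference between the
proportions of high and low scorers answering the item correctly,
with groups formed from scores on the external criterion test."""
import statistics

def discrimination_index(item_correct, criterion_scores, group_fraction=0.27):
    """DI = p(upper group correct) - p(lower group correct).

    item_correct     : list of 0/1 responses to one item
    criterion_scores : candidates' scores on the validated external test
    group_fraction   : size of the upper/lower groups (27% is conventional)
    """
    ranked = sorted(range(len(criterion_scores)),
                    key=lambda i: criterion_scores[i])
    n = max(1, round(len(ranked) * group_fraction))
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower

def pearson_r(xs, ys):
    """Concurrent validity: correlation of new-test and criterion scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```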

Reading test
After attending LTS and several BILC workshops, we incorporated validation at each step of test development, not just the final one:
- STANAG 6001 level descriptors and C/T/A statements
- Clifford’s text typology (criterion-referenced validity)
- Held several IRBs to review and revise the items

Reading test (cont’d)
- Angoffed the items to ensure that the text and the item were at the same level, i.e. aligned (sketched below)
- Ensured we had a proper sampling of a level (content validity)
- Trialled/piloted
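A minimal sketch of an Angoff-style alignment check, assuming the standard modified-Angoff procedure: each judge estimates the probability that a borderline candidate at the target level answers the item correctly, and the mean across judges is compared with the difficulty expected at that level. The `expected` and `tolerance` values are illustrative, not taken from the slides:

```python
"""Modified-Angoff alignment check (illustrative thresholds)."""
import statistics

def angoff_item_rating(judge_probabilities):
    """Mean of the judges' borderline-candidate probability estimates."""
    return statistics.fmean(judge_probabilities)

def item_aligned(judge_probabilities, expected=0.6, tolerance=0.15):
    """Flag an item as aligned if its Angoff rating falls near the
    difficulty expected for the target level (assumed values)."""
    rating = angoff_item_rating(judge_probabilities)
    return abs(rating - expected) <= tolerance

# Example: five judges rate one item intended for a given STANAG level
print(item_aligned([0.55, 0.60, 0.70, 0.50, 0.65]))  # True (mean = 0.60)
```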

Reading (cont’d)
- Measured it against the BAT (concurrent validity); this time, if the DI was not acceptable at a certain level, the item was discarded
- Changed the way we scored as well: instead of total raw score, we looked at each level independently (REDS scoring; illustrated below)
- REDS scoring supports criterion-referenced validity
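The slides do not spell out the REDS algorithm; the sketch below only illustrates the general idea of level-independent scoring. The grouping of items by level is from the slides; the 70% mastery threshold and the rule "highest level with every lower level also mastered" are assumptions for illustration only:

```python
"""Level-independent scoring, the general idea behind REDS above
(threshold and award rule are assumed, not from the slides)."""

def level_scores(responses, item_levels):
    """Percent correct per STANAG level, each level scored independently."""
    totals, correct = {}, {}
    for answer_ok, level in zip(responses, item_levels):
        totals[level] = totals.get(level, 0) + 1
        correct[level] = correct.get(level, 0) + int(answer_ok)
    return {lvl: correct[lvl] / totals[lvl] for lvl in sorted(totals)}

def awarded_level(responses, item_levels, mastery=0.70):
    """Highest level mastered with every lower level also mastered."""
    awarded = 0
    for lvl, pct in level_scores(responses, item_levels).items():
        if pct >= mastery:
            awarded = lvl
        else:
            break
    return awarded

# Example: nine items, three per level 1/2/3
resp = [1, 1, 1,  1, 1, 0,  0, 0, 1]
lvls = [1, 1, 1,  2, 2, 2,  3, 3, 3]
print(level_scores(resp, lvls))   # {1: 1.0, 2: 0.67, 3: 0.33} (rounded)
print(awarded_level(resp, lvls))  # 1
```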

Correlation
- Final correlation between the BAT and Version 1 = 0.88
- Final correlation between the BAT and Version 2 = 0.84
- Final correlation between Version 1 and Version 2 = 0.89

Correlation??
Taken from a university statistics class:

Size of correlation   Interpretation
0.90 to 1.00          very high positive
0.70 to 0.90          high positive
0.50 to 0.70          moderate positive
0.30 to 0.50          low positive
0.00 to 0.30          little, if any
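Illustrative only: a small helper that maps a computed coefficient onto the bands in this table. The table's band boundaries overlap, so the helper assigns a boundary value to the higher band by assumption:

```python
def interpret_correlation(r):
    """Map a correlation coefficient onto the bands in the table above."""
    bands = [(0.90, "very high positive"),
             (0.70, "high positive"),
             (0.50, "moderate positive"),
             (0.30, "low positive"),
             (0.00, "little, if any")]
    for lower_bound, label in bands:
        if r >= lower_bound:
            return label
    return "negative (outside the table's range)"

# The correlations reported on the previous slide:
for name, r in [("BAT vs Version 1", 0.88),
                ("BAT vs Version 2", 0.84),
                ("Version 1 vs Version 2", 0.89)]:
    print(f"{name}: {r} -> {interpret_correlation(r)}")  # all high positive
```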

Listening test
- The majority of our MTCP students are between levels 1 and 2 in all skills, so it is very important for us to be able to discriminate between levels 0/0+/1
- The current listening test does not discriminate well at the lower levels; it was designed more for proficiency levels 2-3
- Lower-level students were leaving the course with very low listening scores

Listening test (cont’d)
We identified several issues with the current listening test; the new listening test is very different, a paradigm shift:
- Items that are aligned to the appropriate level
- Computer-delivered with headphones
- Allows for repetition of texts, flagging items, and time management
- Uses visuals to help focus attention
- Uses authentic/semi-authentic audio and video recordings

Listening (cont’d)
- We could not follow a traditional validation technique: there was no external criterion against which to compare the new listening test
- We needed to convince stakeholders and justify that the change is good; how to do that?
- Bachman and Palmer’s Assessment Use Argument framework

Assessment Use Argument
Basically, it comprises four parts:
1. Claims
- The beneficial consequences of an assessment
- The test will be equitable and will reflect existing educational and societal values
- The interpretations that are made will be meaningful, relevant and generalizable
- Adequate assessment records will be kept
2. Warrant: what do these claims mean?

3. Rebuttal: a counterclaim; not everyone will agree with us
4. Backing: evidence supporting the warrants, including feedback from stakeholders through questionnaires, verbal protocols, observations, interviews, previous research, statistical analyses, qualitative and quantitative data, and the washback effect
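Purely as an illustration of the four-part structure just listed, one could track an AUA as structured data. The field names and example entries below are ours, not Bachman and Palmer's notation:

```python
"""Illustrative record structure for tracking an Assessment Use
Argument; the four parts come from the slides, the rest is assumed."""
from dataclasses import dataclass, field

@dataclass
class AUAEntry:
    claim: str                    # intended consequence or interpretation
    warrant: str                  # what the claim means in practice
    rebuttals: list[str] = field(default_factory=list)  # counterclaims
    backing: list[str] = field(default_factory=list)    # evidence gathered

argument = [
    AUAEntry(
        claim="Score interpretations are meaningful and generalizable",
        warrant="Items are aligned to STANAG 6001 level descriptors",
        rebuttals=["Alignment judgments may vary across raters"],
        backing=["IRB reviews", "Angoff ratings", "trialling statistics"],
    ),
]
```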

Evidence collected
What have we done to collect evidence?
- Will administer the BAT to control for various factors, i.e. repetition of texts and computer delivery
- Look at results from all three tests: current, BAT and new
- Hypothesize that scores will get better
- Triangulate these results with speaking results, as listening skills are demonstrated in the OPI (a sketch of one such check follows)
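A hypothetical sketch of one triangulation check: how often the new listening level agrees with the level demonstrated in the OPI. The candidate data below is invented, and the "within half a step" tolerance is our assumption:

```python
"""Hypothetical triangulation check: agreement between new listening
levels and OPI speaking levels (invented data, assumed tolerance)."""

def agreement_rate(listening_levels, opi_levels, tolerance=0.5):
    """Share of candidates whose listening and OPI levels differ by no
    more than `tolerance` (plus-levels coded as .5, e.g. 1+ -> 1.5)."""
    pairs = list(zip(listening_levels, opi_levels))
    agree = sum(abs(l - o) <= tolerance for l, o in pairs)
    return agree / len(pairs)

listening = [1.0, 1.5, 2.0, 1.0, 2.5]
opi       = [1.0, 1.0, 2.0, 1.5, 2.0]
print(f"agreement: {agreement_rate(listening, opi):.0%}")  # 100%
```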

Conclusion
We have expanded our view of validation from merely looking at numbers to building an argument that addresses all concerns. I believe that building a validity argument strengthens the claim that a test is a valid measure.

THANK YOU!