Automatic Assessment of Spoken Modern Standard Arabic. Pearson Knowledge Technologies, Palo Alto, California. NAACL 2009, Boulder, Colorado.

Automatic Assessment of Spoken Modern Standard Arabic
NAACL, Boulder, Colorado, 5 June 2009
Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki
Pearson Knowledge Technologies, Palo Alto, California

Outline
1. Pearson Knowledge Technologies
2. How Versant tests operate
3. Versant Arabic Test (development)
4. Validation evidence
5. Predictive accuracy

Pearson Knowledge Technologies (PKT)
KAT and Ordinate are now PKT.
KAT ≈ {LSA, Essay Scoring, Write-to-Learn, PTE, etc.}
Ordinate ≈ {Versant, ORF for NCES, VersaReader, PTE, etc.}
PKT is part of Pearson: Pearson ≈ {FT, Economist, Penguin, Longman, PsychCorp, etc.}
PKT is located in Boulder, Colorado, and Palo Alto, California.

Test delivery (diagram): a database of tests, prompts, and responses; a communication network; a delivery interface; a scoring system. Locations: California and anywhere. Languages: English, Spanish, Dutch, Arabic. Flows: speech in, score report out.

How Versant tests operate (diagram): Versant database, test delivery server, scoring. Example spoken item: “The train’s been delayed by one hour.”

Versant Arabic Test: DLI purpose
About 1,000 students at DLI need predictive speaking tests.
Requirements:
Accurate test of Arabic listening and speaking
Convenient to use at DLI and worldwide (ILR oral proficiency interviews are costly)
Suitable for repeated formative testing
High peak capacity for mass screening

Construct comparison
OPI construct: oral proficiency, as manifested in an Oral Proficiency Interview, is compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced.
Versant construct: facility in spoken language, that is, the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.

Versant Arabic Test structure
Part A: Reading
Part B: Repeat 1
Part C: Short Answers
Part D: Sentence Builds
Part E: Repeat 2
Part F: Passage Retelling

Versant scoring (diagram)
Item types: Read, Repeat Sentence 1, Sentence Builds, Repeat Sentence 2, Short-Answer Questions (SAQ), Passage; human scoring.
Subscores: Vocabulary, Sentence Mastery, Fluency, Pronunciation. Weights shown: 20%, 30%, 20%.
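Combining subscores into an Overall score can be sketched as a weighted average. This is a minimal sketch under assumptions: the slide only shows some of the weights, so `PLACEHOLDER_WEIGHTS` below uses equal weights purely for illustration (they are not the Versant weights), and the names `overall_score` and `PLACEHOLDER_WEIGHTS` are hypothetical.

```python
# Hypothetical equal weights for illustration only; NOT the published Versant weights.
PLACEHOLDER_WEIGHTS = {
    "sentence_mastery": 0.25,
    "vocabulary": 0.25,
    "fluency": 0.25,
    "pronunciation": 0.25,
}


def overall_score(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of subscores; weights are renormalized to sum to 1."""
    missing = weights.keys() - subscores.keys()
    if missing:
        raise KeyError(f"missing subscores: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * subscores[name] for name, w in weights.items()) / total


example = {"sentence_mastery": 60, "vocabulary": 50, "fluency": 70, "pronunciation": 80}
print(overall_score(example, PLACEHOLDER_WEIGHTS))  # 65.0
```

Renormalizing by the weight total means percentages such as 20 and 30 can be passed directly instead of fractions.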

How Versants are developed (1), diagram for the Versant Arabic Test. Labels: test spec, native test developers, item text, recorded items, Arabic learners, Arabic natives, native scribes (transcripts), native judges (scale scores), scale estimates, Ordinate system, Versant scores, validation criteria (internal and external), concurrent ILR interviews (ILR scores).

Arabic challenges: voweling
Short vowels are not written, so nothing in the script disambiguates them, yet vowels carry both phonetic and grammatical information.
kutubu al-waladi: "the books of the boy"
kataba al-waladu: "wrote the boy (subject)", i.e. "the boy wrote"
Without vowel marks, both phrases are written identically (كتب الولد).
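The ambiguity can be demonstrated by removing the short-vowel diacritics (harakat) from the two vowelled phrases: both collapse to the same written string. This is a minimal sketch assuming the diacritics are the Unicode Arabic combining marks U+064B to U+0652 (tanwin, short vowels, shadda, sukun); `strip_harakat` is a hypothetical helper name, not part of any toolkit mentioned in the talk.

```python
# Unicode Arabic harakat: U+064B (fathatan) through U+0652 (sukun), inclusive.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Drop short-vowel and related diacritics, as in ordinary unvowelled script."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# kutubu al-waladi, "the books of the boy" (fully vowelled)
books_of_boy = "\u0643\u064f\u062a\u064f\u0628\u064f \u0627\u0644\u0652\u0648\u064e\u0644\u064e\u062f\u0650"
# kataba al-waladu, "the boy wrote" (fully vowelled)
boy_wrote = "\u0643\u064e\u062a\u064e\u0628\u064e \u0627\u0644\u0652\u0648\u064e\u0644\u064e\u062f\u064f"

print(books_of_boy == boy_wrote)                                # False
print(strip_harakat(books_of_boy) == strip_harakat(boy_wrote))  # True
```

A speech recognizer or scorer working from unvowelled text therefore has to recover the vowels (and with them case and verb forms) from context or from added diacritic markings.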

Arabic challenges: complex morphology
li ziyaarat naa: "for visit of us", i.e. "for our visit", written as a single orthographic word (لزيارتنا)
This complicates lexicon lookup, frequency estimates, and so on.
"Short" Arabic items are harder than English items with the same number of words.
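To see why lexicon lookup and word-frequency counts get harder, here is a toy clitic splitter over the transliteration used on the slide. It is a minimal sketch, not the method used in the Versant system: the affix lists `PREFIXES` and `SUFFIXES` and the `MIN_STEM` guard are small hypothetical choices, and real Arabic segmentation needs a morphological analyzer to avoid false splits.

```python
# Hypothetical, deliberately tiny affix lists in the slide's transliteration.
PREFIXES = ("wa", "fa", "bi", "li", "al")
SUFFIXES = ("naa", "haa", "hum", "ka", "hu")
MIN_STEM = 3  # refuse splits that would leave a very short stem


def split_clitics(token: str) -> list[str]:
    """Greedily peel at most one proclitic and one enclitic off a token."""
    parts = []
    for p in sorted(PREFIXES, key=len, reverse=True):
        if token.startswith(p) and len(token) - len(p) >= MIN_STEM:
            parts.append(p)
            token = token[len(p):]
            break
    suffix = None
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(s) and len(token) - len(s) >= MIN_STEM:
            suffix = s
            token = token[: -len(s)]
            break
    parts.append(token)
    if suffix:
        parts.append(suffix)
    return parts


print(split_clitics("liziyaaratnaa"))  # ['li', 'ziyaarat', 'naa']
```

One Arabic orthographic word here corresponds to three English words, which is why an item's word count understates its difficulty relative to English.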

Development and run-time processes (diagram): compilation of expectations and the run-time flow.

Prompt voices and training samples (tables)
Training data sources: native data (Egypt, Syria, Iraq, Palestine, other, total); learner data (DLI, non-DLI, total).
Prompt voices by country: Egypt, Iraq, Jordan, Morocco, Lebanon, Palestine, Syria, with female (F) and male (M) voices.

Validation criteria
Reliability: scores are consistent.
Validity:
Native and non-native speakers should be clearly distinct.
MSA and dialect speakers should be distinct (since we are testing MSA).
Machine scores should predict human scores.

Reliability (table): split-half reliability (N = 134) and test-retest reliability (N = 100) for the Overall score and for the Sentence Mastery, Vocabulary, Fluency, and Pronunciation subscores.
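The slide does not say how the split-half figures were computed. A common approach, assumed here, is to score two halves of each test separately, correlate the half scores across test takers, and step the half-test correlation up to full length with the Spearman-Brown formula; test-retest reliability is simply the correlation between two administrations. The helper names below are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Project a half-test correlation to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(half_a, half_b):
    """Correlate half-test scores, then apply the Spearman-Brown step-up."""
    return spearman_brown(pearson_r(half_a, half_b))


# Toy half-test scores for four test takers (hypothetical).
odd_items = [1, 2, 3, 4]
even_items = [1, 3, 2, 4]
print(pearson_r(odd_items, even_items))               # 0.8
print(split_half_reliability(odd_items, even_items))  # about 0.889
```

For test-retest reliability, `pearson_r(first_sitting, second_sitting)` is used directly, with no step-up, because each sitting is already a full-length test.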

Native vs. non-native scores (figure)

Native speakers by country (figure)

Educated vs. uneducated speakers (figure): cumulative density of the Arabic Overall score.
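A cumulative density plot like this one shows, for each score x, the fraction of a group scoring at or below x (the empirical CDF). The sketch below computes it and, as one possible way to put a number on how far apart two such curves are, the two-sample Kolmogorov-Smirnov statistic (the largest vertical gap). The KS statistic is our addition, not something reported in the talk, and the scores are hypothetical toy data.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return F with F(x) = fraction of scores <= x (the empirical CDF)."""
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("need at least one score")
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F


def ks_statistic(group_a, group_b):
    """Largest vertical gap between two empirical CDFs (two-sample KS statistic)."""
    fa, fb = ecdf(group_a), ecdf(group_b)
    # Both step functions only jump at observed scores, so checking the pooled
    # scores is enough to find the supremum of |fa - fb|.
    return max(abs(fa(x) - fb(x)) for x in set(group_a) | set(group_b))


# Hypothetical toy scores, not data from the talk.
educated = [62, 70, 74, 78, 80]
uneducated = [41, 50, 58, 66, 72]
print(ecdf(uneducated)(60))  # 0.6
print(ks_statistic(educated, uneducated))
```

Curves that are well separated along the score axis give a KS statistic near 1; heavily overlapping groups give a value near 0.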

Machine vs. human comparison (N = 134)
Score: correlation
Overall: 0.97
Sentence Mastery: 0.97
Vocabulary: 0.96
Fluency: 0.84
Pronunciation: 0.83

How Versants compare to OPIs (scatter plot): Versant Arabic Overall score vs. ILR OPI score (logits); N = 118, r = 0.87.
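How precisely a sample correlation such as r = 0.87 with N = 118 pins down the underlying relationship can be gauged with the standard Fisher z transformation. The talk does not report confidence intervals; this is a minimal sketch assuming approximately bivariate-normal scores, and `fisher_ci` is a hypothetical helper name.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z transform."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)            # z is approximately normal ...
    se = 1 / math.sqrt(n - 3)    # ... with this standard error
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lo, hi = fisher_ci(0.87, 118)
print(f"{lo:.2f} to {hi:.2f}")  # 0.82 to 0.91
```

The default `z_crit=1.96` gives an approximate 95% interval. The same call applies to the machine-human correlations (N = 134) or the Spanish and English comparisons.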

Spanish and English: Versant vs. human (scatter plots of Versant score against ILR OPI score in logits)
Spanish: N = 37, r = 0.92
English: N = 151, r = 0.86

Summary
The Versant Arabic Test (VAT) is in operation.
It is based on a large and broad body of transcribed spoken material.
VAT is available on demand.
It returns consistent, accurate scores that reflect real-time skill in MSA.
VAT can triage or screen candidates for OPI testing.

The End (النهاية)
Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.