Automatic Assessment of Spoken Modern Standard Arabic
Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki
Pearson Knowledge Technologies, Palo Alto, California
NAACL, Boulder, Colorado, 5 June 2009
Outline
1. Pearson Knowledge Technologies
2. How Versant tests operate
3. Versant Arabic Test (development)
4. Validation evidence
5. Predictive accuracy
Pearson Knowledge Technologies (PKT)
KAT + Ordinate are now PKT
KAT ≈ {LSA, Essay Scoring, Write-to-Learn, PTE, etc.}
Ordinate ≈ {Versant, ORF for NCES, VersaReader, PTE, etc.}
PKT is part of Pearson
Pearson ≈ {FT, Economist, Penguin, Longman, PsychCorp, etc.}
PKT is in Boulder, Colorado and Palo Alto, California.
Test delivery
Diagram labels: database (tests, prompts, responses); scoring system (California); communication network; delivery interface (anywhere); speech; report
Test languages: English, Spanish, Dutch, Arabic
How Versant tests operate
Diagram labels: Versant database; test delivery server; scoring
Example utterance: “The train’s been delayed by one hour”
Versant Arabic Test
DLI purpose:
~1000 students at DLI need predictive speaking tests
Requirements:
Accurate test of Arabic listening & speaking
Convenient to use at DLI and worldwide (ILR is costly)
Suitable for repeated formative testing
High peak capacity for mass screening
Construct Comparison
OPI construct: oral proficiency, as manifest in an Oral Proficiency Interview, is compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced.
Versant construct: facility in spoken language, that is, the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.
Versant Arabic Test: Test Structure
Part A: Reading
Part B: Repeat-1
Part C: Short Answers
Part D: Sentence Builds
Part E: Repeat-2
Part F: Passage Retelling
Versant Scoring
Item types: Read; Repeat Sentence 1; Sent Build; Repeat Sentence 2; SAQ; Passage (human scoring)
Subscores: Vocabulary; Sentence Mastery; Fluency; Pronunciation
Weights as shown: 20%, 30%, 20%
How Versants are developed (1) (Versant Arabic Test)
Diagram labels: test spec; native test developers; item text; recorded items; Arabic natives; Arabic learners; native scribes; transcripts; native judges; criteria; scale estimates; scale scores; Ordinate system; Versant scores; concurrent ILR interviews; ILR scores; validation (internal, external)
Arabic Challenges: Voweling
kutubu al-waladi: "the books of the boy"
kataba al-waladu: "wrote the boy (subj.)"
No disambiguating short vowels are written
Vowels carry phonetic information
Vowels carry grammar information
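The ambiguity on this slide can be reproduced in a few lines. The sketch below assumes the two example words are written with full diacritics in Unicode and that dropping all combining marks is an acceptable stand-in for ordinary unvowelled spelling; the helper name `strip_diacritics` is hypothetical and is not part of the Versant system.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the
    Arabic short-vowel signs fatha, damma and kasra."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Fully vowelled spellings of the slide's two words (kaf, ta, ba + vowel signs).
kutubu = "\u0643\u064F\u062A\u064F\u0628\u064F"  # kutubu, "books (of)"
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "wrote"

print(kutubu != kataba)                                      # True: distinct when vowelled
print(strip_diacritics(kutubu) == strip_diacritics(kataba))  # True: identical as usually written
```

Both words reduce to the same three consonants, so a recognizer or lexicon working from ordinary text has to recover the vowels from context.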
Complex Morphology
li ziyaarat naa: "for visit of us", i.e. "for our visit"
Complicates lexicon lookup, frequency estimates…
“Short” Arabic items are harder than English items with the same number of words
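A toy illustration of why attached particles complicate lexicon lookup: the surface word is not in the lexicon until the clitics are split off. Everything here is an assumption made for the example (the one-entry lexicon, the clitic lists, the transliterated spelling, and the function name `segment`); it is not how the Versant system segments Arabic.

```python
# Hypothetical, deliberately tiny resources for illustration only.
PROCLITICS = ("li",)               # "for"
ENCLITICS = ("naa",)               # "our / us"
LEXICON = {"ziyaarat": "visit"}    # stems only

def segment(word):
    """Return (proclitic, stem, enclitic) if some split leaves a stem
    that is in LEXICON; otherwise return None."""
    for pre in ("",) + PROCLITICS:
        if not word.startswith(pre):
            continue
        rest = word[len(pre):]
        for suf in ("",) + ENCLITICS:
            if suf and not rest.endswith(suf):
                continue
            stem = rest[:len(rest) - len(suf)]
            if stem in LEXICON:
                return pre, stem, suf
    return None

surface = "liziyaaratnaa"          # one written word, three morphemes
print(surface in LEXICON)          # False: direct lookup fails
print(segment(surface))            # ('li', 'ziyaarat', 'naa')
```

The same effect distorts word-frequency estimates, because counts are spread across many surface forms of one stem.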
Development & Run-time Processes
Diagram: compilation of expectation and runtime flow
Prompt Voices and Training Samples
Training data sources (table values not preserved)
Native data, by country: Egypt; Syria; Iraq; Palestine; Other; Total
Learner data: DLI; Non-DLI; Total
Prompt voices, by country: Egypt; Iraq; Jordan; Morocco; Lebanon; Palestine; Syria
Voices: F, M MFM
Validation Criteria
Reliability: scores are consistent
Validity:
Native and non-native speakers should be clearly distinct
MSA and dialect speakers should be distinct (since we’re testing MSA)
Machine scores should predict human scores
Reliability
Table columns: Score; Split-Half Reliability (N = 134); Test-Retest Reliability (N = 100)
Table rows: Overall; Sentence Mastery; Vocabulary; Fluency; Pronunciation
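For readers unfamiliar with split-half reliability, here is a minimal sketch of one common way to compute it: correlate each candidate's score on the odd-numbered items with the score on the even-numbered items, then apply the Spearman-Brown correction to estimate full-length reliability. The slide does not say how the halves were formed or whether a correction was applied, so the odd/even split, the correction, and the simulated data are all assumptions.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per candidate.
    Returns the Spearman-Brown corrected odd/even split-half estimate."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Simulated data only: 134 candidates, 20 items, score = ability + noise.
random.seed(0)
data = []
for _ in range(134):
    ability = random.gauss(0, 1)
    data.append([ability + random.gauss(0, 0.5) for _ in range(20)])

print(round(split_half_reliability(data), 2))
```

Test-retest reliability, the slide's other column, is simply the correlation between scores from two administrations to the same candidates, so `pearson` alone covers it.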
Native ~ Non-Native Scores
Natives by Countries
Educated ~ Uneducated Speakers
Figure: cumulative density plotted against Arabic Overall Score
Machine – Human Comparison
Score               Correlation (N = 134)
Overall             0.97
Sentence Mastery    0.97
Vocabulary          0.96
Fluency             0.84
Pronunciation       0.83
How Versants Compare to OPIs
Figure: Versant Arabic Overall Score plotted against ILR OPI Score (logits)
N = 118, r = 0.87
Spanish & English: Versant ~ Human
Figure: Versant score plotted against ILR OPI Score (logits), one panel per language
Spanish: N = 37, r = 0.92
English: N = 151, r = 0.86
Summary
Versant Arabic Test (VAT) is in operation
Based on a large and wide body of transcribed spoken material
VAT is available on demand
Returns consistent, accurate scores that reflect real-time skills with MSA
VAT can triage or screen for OPI tests
النهاية (The End)
Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.