Automatic Fluency Assessment
Suma Bhat
The Problem
- Language fluency
  - a component of oral proficiency
  - indicative of the effort of speech production
  - indicates the effectiveness of speech
- Language proficiency testing: automated methods of language assessment
- Automatic assessment of language fluency is therefore of fundamental importance
Why is it hard?
- Fluency is a subjective quantity
- Measuring fluency requires
  - choosing the right quantifiers
  - a means of measuring those quantifiers
- Automatic scores should
  - correlate well with human assessment
  - be interpretable
Automatic Speech Scoring
- Automatic scoring of predictable speech
  - factual information in short answers (Leacock & Chodorow, 2003)
  - read speech: PhonePass (Bernstein, 1999)
- Automatic scoring of unpredictable speech
  - spontaneous speech: SpeechRater (Zechner, 2009)
State of the art
- SpeechRater from Educational Testing Service (2008, 2009)
- Uses ASR for automatic assessment of English speaking proficiency
- In use in the online practice test for TOEFL internet-based test (iBT) takers since 2006
Proficiency assessment in SpeechRater
- Tested aspects of language competence
  - Delivery (fluency, pronunciation)
  - Language use (vocabulary and grammar)
  - Topical development (content, coherence, and organization)
- Current system scores fluency and language use
- Overall proficiency score: a combination of measures of fluency and language use
- Multiple regression and CART scoring modules (see the sketch below)
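The slide above names two scoring modules; the toy sketch below (not ETS's implementation, with made-up feature values and score scale) shows how fluency and language-use measures could be combined by a multiple-regression model and by a CART regression tree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Toy feature matrix: each row is one response, with hypothetical measures
# (mean silence duration, articulation rate, word types per second).
X = np.array([[0.95, 2.2, 1.1],
              [0.70, 2.7, 1.4],
              [0.55, 3.0, 1.7],
              [0.42, 3.3, 1.9],
              [0.30, 3.6, 2.1]])
y = np.array([2.0, 2.5, 3.0, 3.5, 4.0])  # hypothetical human proficiency scores

# Multiple-regression scoring module
mlr = LinearRegression().fit(X, y)

# CART (regression tree) scoring module
cart = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

X_new = np.array([[0.50, 2.9, 1.6]])
print("Multiple-regression score:", mlr.predict(X_new))
print("CART score:", cart.predict(X_new))
```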
SpeechRater Architecture
System
- Speech recognizer
  - trained on 40 hours of non-native speech
  - evaluation set: 10 hours of non-native speech
  - word accuracy: 50%
- Feature set
  - Fluency: mean silence duration, articulation rate
  - Vocabulary: word types per second
  - Pronunciation: global acoustic model score
  - Grammar: global language model score
Performance
- Measured as human-computer score correlation
  - Multiple-regression-based scoring: 0.57
  - CART-based scoring: 0.57
- Compared with inter-human agreement: 0.74
Requirements
- Superior-quality audio recordings for ASR training
- Tens of hours of language-specific speech
- Tens of hours of transcription
- Language-specific resources
Is this the end?
- What if language-specific resources are scarce?
  - superior-quality audio recordings for ASR training
  - hours of language-specific speech
  - hours of transcription
- When the tested language is a minority language, ASR performance suffers
- Alternative methods are sought
Alternative method
- Our approach (autorater) makes signal-level measurements to obtain quantifiers of fluency
- Constructs a classifier based on 20-second segments of speech
- Requires no transcription
Autorater
- Pipeline: speech signal → preprocessor → feature extractor → classifier (scorer) → fluency score
Measurements
- Convert stereo to mono and downsample to 16 kHz (using sox)
- Extract pitch and intensity information (using Praat)
- Segment the signal into speech and silence
- Feature extraction (a preprocessing sketch follows)
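A minimal sketch of these preprocessing steps. The slide names sox and Praat; here sox is invoked through a subprocess call and Praat through the parselmouth Python interface, which is a substitution on my part, and the file names and the 50 dB silence threshold are assumptions.

```python
import subprocess
import numpy as np
import parselmouth

# Convert to mono and downsample to 16 kHz with sox.
subprocess.run(
    ["sox", "response.wav", "-c", "1", "-r", "16000", "response_16k.wav"],
    check=True,
)

snd = parselmouth.Sound("response_16k.wav")
pitch = snd.to_pitch()          # fundamental-frequency contour
intensity = snd.to_intensity()  # intensity contour in dB

# Crude speech/silence segmentation: frames below an assumed 50 dB
# intensity threshold are treated as silent.
frame_db = intensity.values[0]
is_speech = frame_db > 50.0
print(f"speech frames: {is_speech.sum()} / {len(is_speech)}")
```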
Feature Extractor
dur1 = duration of speech without silent pauses; dur2 = total duration of speech
- Articulation rate: number of syllable nuclei / dur1
- Rate of speech: number of syllable nuclei / dur2
- Phonation/time ratio: dur1 / dur2
- Mean length of silent pauses: mean duration of the silent pauses
- Number of silent pauses per second: number of silent pauses / dur2
- Number of filled pauses per second: number of filled pauses / dur2
(computed as in the sketch below)
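A small sketch of these definitions, assuming the syllable nuclei, filled pauses, and silent-pause intervals for a 20-second segment have already been detected; the function and variable names are illustrative.

```python
def fluency_features(n_syllable_nuclei, silent_pauses, n_filled_pauses, total_duration):
    """silent_pauses: list of silent-pause durations in seconds."""
    dur2 = total_duration             # total duration of speech
    dur1 = dur2 - sum(silent_pauses)  # duration of speech without silent pauses
    return {
        "articulation_rate":      n_syllable_nuclei / dur1,
        "rate_of_speech":         n_syllable_nuclei / dur2,
        "phonation_time_ratio":   dur1 / dur2,
        "mean_len_silent_pause":  sum(silent_pauses) / len(silent_pauses) if silent_pauses else 0.0,
        "silent_pauses_per_sec":  len(silent_pauses) / dur2,
        "filled_pauses_per_sec":  n_filled_pauses / dur2,
    }

# Example for one 20-second segment with three silent pauses and two filled pauses.
print(fluency_features(42, [0.6, 1.1, 0.8], 2, 20.0))
```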
Classifier
- Logistic regression model (sketched below)
- Target scores: human-rated scores
- Variables: measurements of the quantifiers PTR, ROS, MLS
- Observed scores: real values between 0 and 1 (inclusive)
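A minimal sketch of such a classifier on toy PTR, ROS, and MLS values. Fitting logistic regression with fractional targets as a binomial GLM via statsmodels is my choice of tooling, not necessarily the authors'; the data below are invented.

```python
import numpy as np
import statsmodels.api as sm

# Toy segment-level measurements: columns are PTR, ROS, MLS.
X = np.array([[0.85, 3.2, 0.4],
              [0.60, 2.1, 1.2],
              [0.75, 2.8, 0.7],
              [0.50, 1.9, 1.5],
              [0.90, 3.5, 0.3],
              [0.65, 2.4, 1.0]])
# Human fluency ratings rescaled to [0, 1].
y = np.array([0.9, 0.3, 0.7, 0.2, 0.95, 0.5])

# Logistic regression with fractional targets, fit as a binomial GLM.
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial())
result = model.fit()
print(result.params)                       # intercept and coefficients
print(result.predict(sm.add_constant(X)))  # predicted fluency scores in (0, 1)
```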
Experiments
- Three configurations of the classifier
  - Rater-independent model: most general form; rated utterances are treated as independent of the raters; does not account for individual rater bias
  - Rater-biased model: additional binary features, one per rater, indicate individual rater bias (see the sketch below)
  - Rater-tuned model: one model per rater
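A sketch of how the rater-biased configuration's indicator features might be built: one binary column per rater is appended to the quantifier vector so the model can absorb each rater's bias. The function name and toy values are illustrative, not the authors' code.

```python
import numpy as np

def add_rater_indicators(X, rater_ids, n_raters):
    """X: (n_segments, n_features); rater_ids: id of the rater who scored each segment."""
    indicators = np.zeros((len(rater_ids), n_raters))
    indicators[np.arange(len(rater_ids)), rater_ids] = 1.0  # one-hot rater columns
    return np.hstack([X, indicators])

# Two segments (PTR, ROS, MLS), scored by rater 0 and rater 2 out of 3 raters.
X = np.array([[0.85, 3.2, 0.4],
              [0.60, 2.1, 1.2]])
print(add_rater_indicators(X, rater_ids=[0, 2], n_raters=3))
```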
Quantifier-Score Correlation (significant)
Correlation with fluency, by quantifier:
- Rate of speech (ROS): 0.24
- Phonation/time ratio (PTR): 0.36
- Mean length of silent pauses (MLS): -0.22
- Number of silent pauses (SIL): -0.30
- Number of silent pauses per second (SPS): -0.32
- Total length of silence (LOS): -0.41
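For illustration, quantifier-score correlations with a significance test can be computed as below; the sketch assumes Pearson correlation (the slide does not name the statistic) and uses toy values rather than the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Toy values: PTR per segment and the corresponding human fluency ratings.
ptr = np.array([0.85, 0.60, 0.75, 0.50, 0.90, 0.65])
human_fluency = np.array([0.9, 0.3, 0.7, 0.2, 0.95, 0.5])

r, p = pearsonr(ptr, human_fluency)  # correlation and two-sided p-value
print(f"r = {r:.2f}, p = {p:.3f}")
```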
Results: Pilot Rating (Trained Models)
Mean-squared error by model:
- Rater-biased: 0.33
- Rater-tuned: 0.38
- Rater-independent: 0.44
- Rater-tuned: best individual model 0.197, worst 0.5
- Inter-rater agreement: 48.2%
ASR for our data
Input data      LM: training data only    LM: training data + part of the newspaper data
Training data   12.82%                    6.59%
Testing data    11.23%                    4.75%
Summary
- Quantifiers obtained from low-level acoustic measurements are good indicators of fluency
- Logistic regression models are appropriate for automated scoring of spontaneous speech
- Main contribution: an alternative method of automatic fluency assessment, useful in resource-scarce testing
- Main result: a rater-biased logistic regression model for scoring fluency