Automatic Fluency Assessment

Suma Bhat

The Problem
  Language fluency
    Component of oral proficiency
    Indicative of effort of speech production
    Indicates effectiveness of speech
  Language proficiency testing
    Automated methods of language assessment
    Fundamental importance
  Automatic assessment of language fluency

Why is it hard?
  Fluency is a subjective quantity
  Measurement of fluency requires
    Choice of the right quantifiers
    Means of measuring the quantifiers
  Automatic scores should
    Correlate well with human assessment
    Be interpretable

Automatic Speech Scoring
  Automatic scoring of predictable speech
    Factual information in short answers (Leacock & Chodorow, 2003)
    Read speech: PhonePass (Bernstein, 1999)
  Automatic scoring of unpredictable speech
    Spontaneous speech: SpeechRater (Zechner, 2009)

State of the art
  SpeechRater from Educational Testing Service (2008, 2009)
  Uses ASR for automatic assessment of English speaking proficiency
  In use as an online practice test for TOEFL internet-based test (iBT) takers since 2006

Proficiency assessment in SpeechRater
  Test aspects of language competence
    Delivery (fluency, pronunciation)
    Language use (vocabulary and grammar)
    Topical development (content, coherence and organization)
  Current system
    Scores fluency and language use
  Overall proficiency score
    Combination of measures of fluency and language use
    Multiple regression and CART scoring module

SpeechRater Architecture

System
  Speech recognizer
    Trained on 40 hours of non-native speech
    Evaluation set: 10 hours of non-native speech
    Word accuracy: 50%
  Feature set
    Fluency features: mean silence duration, articulation rate
    Vocabulary: word types per second
    Pronunciation: global acoustic model score
    Grammar: global language model score

Performance
  Measured in human-computer score correlation
    Multiple regression based scoring: 0.57
    CART based scoring: 0.57
  Compared with inter-human agreement: 0.74
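
The correlation figures above compare machine scores with human scores. The slide does not say which correlation coefficient was used; the sketch below assumes the Pearson coefficient, and the score lists are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores (not from the presentation).
human_scores = [1, 2, 2, 3, 4, 4]
machine_scores = [1.4, 1.9, 2.6, 2.8, 3.1, 3.9]
print(round(pearson(human_scores, machine_scores), 2))
```

The same function can be applied to two human raters' scores to obtain an inter-human agreement figure on the same scale.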

Requirements
  Superior quality audio recordings for ASR training
    Tens of hours of language-specific speech
    Tens of hours of transcription
  Language-specific resources

Is this the end?
  What if language-specific resources are scarce?
    Superior quality audio recordings for ASR training
    Hours of language-specific speech
    Hours of transcription
  Tested language is a minority language
    ASR performance affected
    Alternative methods sought

Alternative method
  Our approach (autorater)
    Makes signal-level measurements to obtain quantifiers of fluency
    Constructs a classifier based on 20-second segments of speech
    Requires no transcription

Autorater
  [Diagram: Speech signal -> Preprocessor -> Feature Extractor -> Classifier -> Fluency score; the figure also carries the label "Scorer"]

Measurements
  Using sox
    Convert stereo to mono
    Downsample to 16kbps
  Using Praat
    Extract pitch and intensity information
    Segment signal into speech and silence
    Feature extraction
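
The presentation performs the speech/silence segmentation in Praat and does not give the settings. As a rough illustration of the idea only, the sketch below labels fixed-length frames by an intensity threshold and treats very short dips as part of speech. The function name, the 10 ms frame length, the 50 dB threshold, and the 0.2 s minimum pause are all assumptions, and this is not Praat's algorithm.

```python
def segment_speech_silence(intensity_db, frame_s=0.01,
                           threshold_db=50.0, min_silence_s=0.2):
    """Split per-frame intensity values into speech and silence runs.

    Returns a list of (label, start_seconds, end_seconds) tuples.
    All default parameter values are illustrative assumptions.
    """
    min_frames = round(min_silence_s / frame_s)

    # Pass 1: group consecutive frames that share a label.
    runs = []  # each entry: [label, first_frame, end_frame_exclusive]
    for i, value in enumerate(intensity_db):
        label = "speech" if value >= threshold_db else "silence"
        if runs and runs[-1][0] == label:
            runs[-1][2] = i + 1
        else:
            runs.append([label, i, i + 1])

    # Pass 2: silences shorter than the minimum are absorbed into speech.
    merged = []
    for label, first, end in runs:
        if label == "silence" and end - first < min_frames:
            label = "speech"
        if merged and merged[-1][0] == label:
            merged[-1][2] = end
        else:
            merged.append([label, first, end])

    return [(label, first * frame_s, end * frame_s)
            for label, first, end in merged]


# Hypothetical intensity track: speech, a 0.25 s pause, speech with a brief dip.
frames = [60.0] * 30 + [40.0] * 25 + [60.0] * 10 + [40.0] * 5 + [60.0] * 30
for run in segment_speech_silence(frames):
    print(run)
```

The durations of the silence runs are what the pause-based quantifiers are computed from.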

Feature Extractor
  dur1 = duration of speech without silent pauses
  dur2 = total duration of speech

  Name                                 Definition
  Articulation rate                    Number of syllable nuclei / dur1
  Rate of speech                       Number of syllable nuclei / dur2
  Phonation/time ratio                 dur1 / dur2
  Mean length of silent pauses         Mean length of silent pauses
  Number of silent pauses per second   Number of silent pauses / dur2
  Number of filled pauses per second   Number of filled pauses / dur2
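
The definitions in the table translate directly into arithmetic. The following is a minimal sketch, assuming the syllable-nucleus count, the silent-pause durations, and the filled-pause count have already been measured; the function name, argument names, and example numbers are hypothetical.

```python
def fluency_features(n_syllable_nuclei, silent_pauses, n_filled_pauses,
                     total_duration):
    """Compute the fluency quantifiers for one speech segment.

    silent_pauses:  list of silent-pause durations in seconds
    total_duration: dur2, the total duration of the segment in seconds
    """
    total_silence = sum(silent_pauses)
    dur2 = total_duration
    dur1 = dur2 - total_silence  # speech time without silent pauses
    if dur2 <= 0 or dur1 <= 0:
        raise ValueError("segment must contain some speech")
    n_silent = len(silent_pauses)
    return {
        "articulation_rate": n_syllable_nuclei / dur1,
        "rate_of_speech": n_syllable_nuclei / dur2,
        "phonation_time_ratio": dur1 / dur2,
        # Defined as 0.0 here when there are no pauses (an assumption).
        "mean_silent_pause_length": total_silence / n_silent if n_silent else 0.0,
        "silent_pauses_per_second": n_silent / dur2,
        "filled_pauses_per_second": n_filled_pauses / dur2,
    }


# Hypothetical 20-second segment: 40 syllable nuclei, three silent pauses,
# two filled pauses.
for name, value in fluency_features(40, [0.5, 1.0, 0.5], 2, 20.0).items():
    print(f"{name}: {value:.3f}")
```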

Classifier
  Logistic regression model
    Target scores: human-rated scores
    Variables: measurements of the quantifiers PTR (phonation/time ratio), ROS (rate of speech), MLS (mean length of silent pauses)
    Observed scores: real values between 0 and 1 (inclusive)
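
Because the observed scores are real values in [0, 1] rather than class labels, one way to fit such a model is to minimise cross-entropy against the fractional targets. The sketch below does this with plain batch gradient descent. The training procedure, learning rate, epoch count, and example rows are assumptions for illustration, not the presentation's actual setup.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, x):
    """Predicted fluency score in (0, 1) for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, targets, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on cross-entropy loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            # Gradient of cross-entropy w.r.t. the linear output is (p - y).
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical rows of [PTR, ROS, MLS] with human scores scaled to [0, 1].
rows = [[0.9, 3.5, 0.4], [0.8, 3.0, 0.6], [0.6, 2.2, 0.9], [0.5, 1.8, 1.2]]
scores = [0.9, 0.7, 0.4, 0.2]
weights, bias = fit_logistic(rows, scores)
print(predict(weights, bias, [0.85, 3.2, 0.5]))
```

The output of the sigmoid stays strictly between 0 and 1, which matches the range of the observed scores.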

Experiments
  3 configurations of the classifier
    Rater-independent model
      Most general form
      Rated utterances are considered independent of the raters
      Does not take into account individual rater bias
    Rater-biased model
      Additional binary features, one per rater
      Indicate individual rater bias
    Rater-tuned model
      One model per rater
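
The rater-biased configuration can be pictured as appending one binary indicator per rater to each feature vector, so that the model learns a separate offset for each rater. This is a sketch of that reading; the function name, rater identifiers, and feature values are hypothetical.

```python
def add_rater_indicators(features, rater_id, rater_ids):
    """Append one binary feature per rater; exactly one of them is 1.0."""
    if rater_id not in rater_ids:
        raise ValueError(f"unknown rater: {rater_id!r}")
    indicators = [1.0 if rater_id == r else 0.0 for r in rater_ids]
    return list(features) + indicators


# Hypothetical example: three raters, one utterance rated by rater "B".
raters = ["A", "B", "C"]
acoustic = [0.8, 3.0, 0.6]  # e.g. PTR, ROS, MLS for the utterance
print(add_rater_indicators(acoustic, "B", raters))
```

Under this reading, the rater-independent model uses the acoustic features alone, and the rater-tuned model instead trains a separate model on each rater's utterances.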

Quantifier-Score correlation (sig.)
  Name                                         Correlation with fluency
  Rate of speech (ROS)                          0.24
  Phonation/time ratio (PTR)                    0.36
  Mean length of silent pauses (MLS)           -0.22
  Number of silent pauses (SIL)                -0.30
  Number of silent pauses per second (SPS)     -0.32
  Total length of silence (LOS)                -0.41

Results – Pilot rating (Trained)
  Model     Mean-squared error
  R-bias    0.33
  R-tuned   0.38
  R-ind     0.44

  R-tuned: Best model worst model 0.5
  Inter-rater agreement: 48.2%
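
The models are compared by mean-squared error between predicted and human scores. A minimal sketch of that metric follows; the score lists are invented, and the scale on which the presentation's errors were computed is not stated.

```python
def mean_squared_error(predicted, observed):
    """Average of squared differences between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical model outputs and human ratings.
model_scores = [0.5, 0.5, 0.8]
human_scores = [1.0, 0.0, 0.8]
print(mean_squared_error(model_scores, human_scores))
```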

ASR for our data
                  Language model
  Input data      Based on training data   Based on training data and part of the newspaper data
  Training data   12.82%                   6.59%
  Testing data    11.23%                   4.75%

Summary
  Quantifiers obtained from low-level acoustic measurements are good indicators of fluency
  Logistic regression models are appropriate for automated scoring of spontaneous speech
  Main contribution
    Alternative method of automatic fluency assessment
    Useful in resource-scarce testing
  Main result: rater-biased logistic regression model for scoring fluency
