Detecting misrecognitions
  Predicting with prosody
Misrecognitions - papers
  “Predicting automatic speech recognition performance using prosodic cues” (TooT corpus)
  “Generalizing prosodic prediction of speech recognition errors” (W99 corpus)
Misrecognitions - generalities
  What are they?
    WER: word error rate
    CA: concept accuracy
  Why is it important to detect them?
    Users have difficulty correcting system misunderstandings
    Users are frustrated by unnecessary confirmations or rejections
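As a reminder of what WER measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example utterances are illustrative assumptions, not taken from the papers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical utterance: one substituted word out of five.
    print(word_error_rate("leave from boston at nine", "leave from austin at nine"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.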
Prosody to the rescue!!!
  Prosodic features used:
    Fundamental frequency (f0)
    Energy (rms)
    Duration of speaker turn (dur)
    Pause preceding turn (ppau)
    Speaking rate (tempo)
    Silence in speaker turn (zeros)
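To make a few of these features concrete, the sketch below computes rough stand-ins for `dur`, `rms` and `zeros` from the raw samples of one speaker turn. It is only an illustration under stated assumptions: the function name `turn_features`, the frame length, and the silence threshold are hypothetical, and the papers may define these features differently. The remaining features (f0, ppau, tempo) need a pitch tracker, turn timing, and a transcript, so they are left out.

```python
import math
from typing import Dict, Sequence


def _rms(values: Sequence[float]) -> float:
    """Root mean square of a non-empty sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def turn_features(
    samples: Sequence[float],
    sample_rate: int,
    frame_len: int = 160,       # assumed frame size (10 ms at 16 kHz)
    silence_rms: float = 0.01,  # assumed threshold below which a frame is silent
) -> Dict[str, float]:
    """Approximate dur, rms and zeros for one speaker turn."""
    if not samples or sample_rate <= 0 or frame_len <= 0:
        raise ValueError("need samples and positive sample_rate and frame_len")

    # Split the turn into consecutive frames; the last one may be shorter.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    silent_frames = sum(1 for frame in frames if _rms(frame) < silence_rms)

    return {
        "dur": len(samples) / sample_rate,     # turn duration in seconds
        "rms": _rms(samples),                  # overall energy of the turn
        "zeros": silent_frames / len(frames),  # fraction of silent frames
    }


if __name__ == "__main__":
    # Synthetic turn: one silent frame followed by one loud frame.
    print(turn_features([0.0] * 160 + [0.5] * 160, sample_rate=16000))
```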
Predicting misrecognitions - results
  Rule-based learner (RIPPER)
  Characteristics of misrecognitions:
    Higher in pitch
    Louder, longer
    Less internal silence
  Improved prediction with prosody
    TooT: 6.53% vs 22.23%
    W99: 22.77% vs 26.14%
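To put the two pairs of numbers on a common scale, the short calculation below computes the relative reduction between them. It assumes that, in each pair, the first figure is the prediction error with prosodic features and the second is the comparison error; the slide does not state this, so treat the reading as an assumption. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline_pct: float, improved_pct: float) -> float:
    """Fraction of the baseline error removed by the improved predictor."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_pct - improved_pct) / baseline_pct


if __name__ == "__main__":
    # (corpus, assumed error with prosody, assumed comparison error), in percent
    for corpus, improved, baseline in [("TooT", 6.53, 22.23), ("W99", 22.77, 26.14)]:
        share = relative_reduction(baseline, improved)
        print(f"{corpus}: {share:.1%} relative reduction")
```

Under that reading, the gain is much larger on TooT (roughly 71% relative) than on W99 (roughly 13% relative).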
Predicting misrecognitions - comments
  Is WER an adequate measure?
  Do we model the ASR capabilities or its training set?
  Is comparing against learning from ASR confidence scores OK?
Detecting user corrections
  Predicting with prosody
User corrections - papers
  “Corrections in spoken dialog systems”
  “Identifying user corrections automatically in spoken dialog systems”
User corrections - generalities
  What are they?
  Why is it important to detect them?
    Recognized much more poorly
    Tuning dialog strategies
    ASR for hyperarticulated speech
    Change of initiative and confirmation strategy
User corrections - insights
  Types:
    REP: repetition
    PAR: paraphrase
    ADD: content added
    OMIT: content omitted
    ADD/OMIT
  Characterized by prosodic features associated with hyperarticulation, but not the same
Predicting user corrections
  Rule-based learner on TooT corpus
  Features: PROS, ASR, SYS, POS, DIA
  15.72% error rate on Raw+ASR+SYS+POS+PreTurn