ASRA: Automatic Speech Recognition & Assessment
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University
Introduction to ASRA
- ASRA: Automatic speech recognition & assessment
- Functionality
  - Speech assessment (speech scoring)
  - Voice-command-based speech recognition
- Languages
  - Mandarin, English, Taiwanese, Japanese
- Required toolboxes
  - Utility toolbox
  - SAP toolbox
  - ASR toolbox
Examples of Speech Assessment
- Test examples
  - saEnglish01.m
  - saChinese01.m
  - saTaiwanese01.m
  - goSaDemo.m
- Applications
  - 唸唸不忘, also called 背書機 (Recital machine)
  - Read & Say
- [Demo screenshot: word score, phone score, and pitch curve; click a phone to play it]
Approach to Speech Assessment
- Text to phonetic alphabets
- Forced alignment
- Phone-based scoring
- Pitch tracking
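Texts to Phonetic Alphabets (1/2)
- Chinese
  - Exhaustive method: enumerate every reading of each heteronym, e.g. for 朝辭白帝彩雲間 ("At dawn I leave Baidi amid colored clouds"):
    - 朝(ㄓㄠ)辭白(ㄅㄞˊ)帝彩雲間
    - 朝(ㄓㄠ)辭白(ㄅㄛˊ)帝彩雲間
    - 朝(ㄔㄠˊ)辭白(ㄅㄞˊ)帝彩雲間
    - 朝(ㄔㄠˊ)辭白(ㄅㄛˊ)帝彩雲間
  - Word segmentation: the correct reading depends on where the word boundaries fall, e.g.
    - 基隆廟口吃小吃 ("Eating snacks at Keelung Miaokou"; 口吃 must not be grouped as the word for "stutter")
    - 三人參加會議 ("Three people attend the meeting"; 人參 must not be grouped as the word for "ginseng")
- Taiwanese
  - No text, no pronunciation dictionary, no word corpus
  - Everything is much harder!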
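The exhaustive method above can be sketched as a Cartesian product over the candidate readings of each character. This is only a minimal illustration: the `READINGS` table and the `enumerate_readings` helper are hypothetical names, and the table holds just the two heteronyms from the slide rather than a real pronunciation dictionary.

```python
from itertools import product

# Hypothetical toy table of heteronym readings (Zhuyin); a real system
# would draw these from a full pronunciation dictionary.
READINGS = {
    "朝": ["ㄓㄠ", "ㄔㄠˊ"],
    "白": ["ㄅㄞˊ", "ㄅㄛˊ"],
}


def enumerate_readings(text, readings=READINGS):
    """Return every candidate pronunciation sequence for `text`.

    Characters missing from the table are passed through unchanged,
    which is a simplification made only for this sketch.
    """
    options = [readings.get(ch, [ch]) for ch in text]
    return [tuple(seq) for seq in product(*options)]


candidates = enumerate_readings("朝辭白帝")
for seq in candidates:
    print(" ".join(seq))
```

Each candidate sequence would then become one path for the aligner to choose from, so the number of paths grows multiplicatively with the number of heteronyms in the sentence.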
Texts to Phonetic Alphabets (2/2)
- English
  - Exhaustive method based on the CMU Pronouncing Dictionary
    - Example: Multimedia
  - Grapheme-to-phoneme conversion: using machine learning or statistical approaches to generate the most probable phone list for a word that is not in the pronunciation dictionary
    - Examples: Arnold Schwarzenegger, Genre classification
- Japanese
  - Exhaustive search
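A rough sketch of the dictionary-first strategy: look each word up, keep all of its listed pronunciations, and set aside the out-of-dictionary words for grapheme-to-phoneme conversion. `TOY_DICT` is a hypothetical hand-written lexicon in the general style of a pronouncing dictionary (stress marks omitted), not an excerpt from the CMU data, and `lookup` is an illustrative helper name.

```python
# Hypothetical toy lexicon; entries are illustrative and omit stress marks.
TOY_DICT = {
    "read": ["R IY D", "R EH D"],
    "and": ["AE N D"],
    "say": ["S EY"],
}


def lookup(words, lexicon=TOY_DICT):
    """Split words into (known pronunciations, out-of-dictionary words)."""
    known, oov = {}, []
    for word in words:
        key = word.lower()
        if key in lexicon:
            known[key] = lexicon[key]   # keep every listed pronunciation
        else:
            oov.append(key)             # would be handed to a G2P model
    return known, oov


known, oov = lookup(["Read", "and", "say", "Schwarzenegger"])
print(known)
print(oov)
```

Words with several entries, such as "read" here, contribute parallel branches to the lexicon net, while the `oov` list marks where a trained G2P model would be needed.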
Forced Alignment
- Align a given utterance to a sequence of phones represented as a lexicon net
- [Figure: lexicon net for “What are you allergic to”, showing optional silence and heteronyms (破音字)]
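To make the idea concrete, here is a minimal dynamic-programming sketch of forced alignment for the simplest case: a single linear phone sequence with no optional silence or alternative branches. It assumes the per-frame log-likelihoods are already available as a matrix; `forced_align`, `to_intervals`, and the toy numbers are illustrative and do not come from the ASR toolbox.

```python
def forced_align(loglik):
    """Align frames to a fixed phone sequence.

    loglik[t][p] is the log-likelihood of frame t under the p-th phone.
    Each phone must cover at least one frame and phones appear in order.
    Returns the phone index assigned to every frame.
    """
    T, P = len(loglik), len(loglik[0])
    if T < P:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    best = [[NEG] * P for _ in range(T)]
    back = [[0] * P for _ in range(T)]
    best[0][0] = loglik[0][0]
    for t in range(1, T):
        for p in range(min(t + 1, P)):
            stay = best[t - 1][p]                       # remain in phone p
            move = best[t - 1][p - 1] if p > 0 else NEG  # enter from p-1
            if move > stay:
                best[t][p] = move + loglik[t][p]
                back[t][p] = p - 1
            else:
                best[t][p] = stay + loglik[t][p]
                back[t][p] = p
    # Trace back from the last phone at the last frame.
    path = [P - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_intervals(path):
    """Collapse a per-frame path into (phone_index, first_frame, last_frame)."""
    intervals, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            intervals.append((path[start], start, t - 1))
            start = t
    return intervals


# Toy scores: 6 frames, 3 phones (made-up numbers for illustration).
toy = [
    [-1, -5, -9],
    [-1, -4, -9],
    [-6, -1, -7],
    [-7, -1, -5],
    [-8, -6, -1],
    [-9, -7, -1],
]
path = forced_align(toy)
print(to_intervals(path))
```

A full lexicon net generalizes this by allowing several predecessor nodes per state, which is how optional silence and alternative readings are accommodated.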
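Lexicon Net for Detecting Confusing Syllables
- Native-language (L1) interference when Japanese speakers speak Mandarin
  - 打哈欠 (qian) mispronounced as 打哈見 (jian)
  - 一次 (ci) 旅行 mispronounced as 一字 (zi) 旅行
  - 晚安 (an) mispronounced as 晚ㄤ (ang)
- [Figure: lexicon net for 「天氣熱、打哈欠」 ("The weather is hot; yawning")]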
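One simplified way to picture such a net is as a sequence of slots, where a slot holds the canonical syllable plus any confusable alternatives placed in parallel. The sketch below assumes each alternative has already been given an acoustic log-likelihood for its slot (in practice alignment and scoring happen jointly inside the decoder); `detect_errors` and all numbers are hypothetical.

```python
def detect_errors(net, scores):
    """Report slots where a confusable alternative beats the canonical syllable.

    net:    list of slots; each slot lists syllables, canonical one first.
    scores: list of dicts mapping syllable -> log-likelihood for that slot
            (assumed to be precomputed for this sketch).
    Returns (slot_index, expected_syllable, detected_syllable) triples.
    """
    errors = []
    for i, (alternatives, slot_scores) in enumerate(zip(net, scores)):
        best = max(alternatives, key=lambda syl: slot_scores[syl])
        if best != alternatives[0]:
            errors.append((i, alternatives[0], best))
    return errors


# 打哈欠 with the confusable "jian" placed in parallel with "qian".
net = [["da"], ["ha"], ["qian", "jian"]]
scores = [
    {"da": -10.0},
    {"ha": -12.0},
    {"qian": -30.0, "jian": -18.0},   # made-up scores favoring the error
]
print(detect_errors(net, scores))
```

Because the alternatives are explicit branches of the net, the best path identifies which specific mispronunciation occurred, not merely that the syllable scored poorly.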
Error Pattern Detection
- Goal: detect utterances that start or stop anywhere
Scoring Computation
- Phone-based scoring
  - Identify the interval of each phone by forced alignment
  - Compare the phone utterance to its competing phone models to get a ranking; the ranking is then converted to a score
  - Example: The 38 competing phone models of “w+uh” are k+uh g+uh l+uh b+uh p+uh t+uh w+uh d+uh jh+uh f+uh sh+uh hh+uh y+uh ch+uh r+uh zh+uh th+uh n+uh z+uh er+uh ey+uh m+uh ih+uh ae+uh aw+uh iy+uh eh+uh ao+uh uw+uh ay+uh ah+uh oy+uh aa+uh v+uh ow+uh s+uh ng+uh. The 0-based ranking of “w+uh” is converted to a score between 0 and 100.
- Higher-level scoring
  - Word score: time-weighted average of phone scores
  - Sentence score: time-weighted average of word scores, with discount factors derived from unusually short/long phones
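The scoring steps above might look roughly like the sketch below. The slides do not specify how a rank is mapped to a score, so the linear mapping in `rank_to_score` is an assumption made for illustration, as are the function names and the toy log-likelihoods. The sentence-level discount factors are not modeled here.

```python
def rank_to_score(rank, n_models):
    """Map a 0-based rank to [0, 100]; linear mapping is an assumption."""
    if n_models < 2:
        return 100.0
    return 100.0 * (1 - rank / (n_models - 1))


def phone_score(target, loglik_by_model):
    """Rank the target among its competing models by log-likelihood."""
    ranked = sorted(loglik_by_model, key=loglik_by_model.get, reverse=True)
    return rank_to_score(ranked.index(target), len(ranked))


def time_weighted_average(scores, durations):
    """Average scores weighted by duration (e.g. number of frames)."""
    total = sum(durations)
    return sum(s * d for s, d in zip(scores, durations)) / total


# Made-up log-likelihoods for a small subset of competing models.
logliks = {"w+uh": -40, "y+uh": -38, "l+uh": -45, "r+uh": -50, "m+uh": -47}
w_uh = phone_score("w+uh", logliks)                      # rank 1 of 5
word = time_weighted_average([w_uh, 100.0], [10, 30])    # durations in frames
print(w_uh, word)
```

Using ranks rather than raw likelihoods keeps the score comparable across phones and speakers, since only the ordering against the competing models matters.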
Examples of Voice Command Recognition
- Test example
  - goVcDemo.m
- Applications
  - 成語接龍 (Speech-enabled Chinese idiom relay)
  - 一語中的 (Speech-enabled Chinese idiom riddle)
- No optional silence between words