ASRA: Automatic Speech Recognition & Assessment
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University
Introduction to ASRA
ASRA: Automatic Speech Recognition & Assessment
Functionality: speech assessment (speech scoring); voice-command-based speech recognition
Languages: Mandarin, English, Taiwanese, Japanese
Required toolboxes: Utility toolbox, SAP toolbox, ASR toolbox
Examples of Speech Assessment
Test examples: saEnglish01.m, saChinese01.m, saTaiwanese01.m, goSaDemo.m
Applications: 唸唸不忘 (a pun on 念念不忘, "never forget") or 背書機 (recital machine); Read & Say
[Screenshot: word scores, phone scores (click to play each phone), and pitch curve]
Approach to Speech Assessment
Text to phonetic alphabets
Forced alignment
Phone-based scoring
Pitch tracking
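The pitch-tracking step is not detailed on these slides. As one common technique (not necessarily the one ASRA uses), the pitch of a frame can be estimated from the peak of its autocorrelation within a plausible lag range. A minimal Python sketch; the function name and the 50 to 500 Hz search range are illustrative assumptions:

```python
import math
from typing import Sequence

def autocorr_pitch(frame: Sequence[float], fs: float,
                   f_min: float = 50.0, f_max: float = 500.0) -> float:
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Only lags corresponding to [f_min, f_max] Hz are searched.
    Returns 0.0 for a frame with no variation (e.g. digital silence).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove DC offset
    if all(s == 0.0 for s in x):
        return 0.0
    lag_min = max(1, int(fs / f_max))          # shortest period considered
    lag_max = min(n - 1, int(fs / f_min))      # longest period considered
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Usage: a 200 Hz sine sampled at 8 kHz (period = 40 samples).
frame = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(512)]
print(autocorr_pitch(frame, 8000))  # expected to be close to 200 Hz
```

Applying this to successive overlapping frames gives a pitch curve; practical pitch trackers add voicing decisions and smoothing on top of such a per-frame estimate.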
Texts to Phonetic Alphabets (1/2)
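Chinese
Exhaustive method: enumerate all readings of the heteronyms, e.g. 朝 and 白 in 朝辭白帝彩雲間:
朝(ㄓㄠ)辭白(ㄅㄞˊ)帝彩雲間
朝(ㄓㄠ)辭白(ㄅㄛˊ)帝彩雲間
朝(ㄔㄠˊ)辭白(ㄅㄞˊ)帝彩雲間
朝(ㄔㄠˊ)辭白(ㄅㄛˊ)帝彩雲間
Word segmentation: the segmentation decides some readings, e.g. 基隆廟口吃小吃 (eating snacks at Keelung Miaokou) must not be split as 廟/口吃 ("stutter"), and 三人參加會議 (three people attend a meeting) must not be split as 三/人參 ("ginseng"), which would read 參 as shēn instead of cān.
Taiwanese
No standard written text, no pronunciation dictionary, no word corpus: everything is much harder!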
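The exhaustive method amounts to a Cartesian product over the candidate readings of each character. A minimal Python sketch, assuming a hypothetical `readings` table that maps each heteronym to its Zhuyin readings (characters missing from the table are kept as placeholders for their single reading):

```python
from itertools import product
from typing import Dict, List

def expand_readings(sentence: str, readings: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every transcription of `sentence` (the exhaustive method).

    Characters absent from `readings` are treated as having one reading and
    are kept as-is in the output.
    """
    choices = [readings.get(ch, [ch]) for ch in sentence]
    return [list(combo) for combo in product(*choices)]

# Usage with the two heteronyms in 朝辭白帝彩雲間 (illustrative table).
readings = {"朝": ["ㄓㄠ", "ㄔㄠˊ"], "白": ["ㄅㄞˊ", "ㄅㄛˊ"]}
for r in expand_readings("朝辭白帝彩雲間", readings):
    print(r[0], r[2])
```

The number of candidates is the product of the per-character reading counts, so it grows quickly with the number of heteronyms; word segmentation is one way to prune it.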
Texts to Phonetic Alphabets (2/2)
English
Exhaustive method based on the CMU Pronouncing Dictionary, e.g. "multimedia"
Grapheme-to-phoneme (G2P) conversion: using machine learning or statistical methods to generate the most probable phone sequence for a word not in the pronunciation dictionary, e.g. "Arnold Schwarzenegger", "genre classification"
Japanese
Exhaustive search
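For English, dictionary lookup and G2P fallback can be combined as below. This is a sketch under assumptions: the tiny `lexicon` uses CMU-style ARPAbet phones (stress marks dropped) for illustration only, and `g2p` stands for any grapheme-to-phoneme model supplied by the caller; no particular G2P library is implied.

```python
from itertools import product
from typing import Callable, Dict, List

Pron = List[str]  # one pronunciation = a list of phones

def sentence_pronunciations(words: List[str],
                            lexicon: Dict[str, List[Pron]],
                            g2p: Callable[[str], Pron]) -> List[Pron]:
    """Enumerate phone sequences for a sentence.

    In-vocabulary words contribute all their dictionary pronunciations
    (exhaustive method); out-of-vocabulary words fall back to a single
    pronunciation predicted by `g2p`.
    """
    per_word = []
    for w in words:
        key = w.lower()
        per_word.append(lexicon[key] if key in lexicon else [g2p(key)])
    # Concatenate one pronunciation choice per word, for every combination.
    return [[ph for pron in combo for ph in pron] for combo in product(*per_word)]

def toy_g2p(word: str) -> Pron:
    """Hypothetical stand-in for a real G2P model: one 'phone' per letter."""
    return list(word)

# Illustrative entries only; a real system would load the full dictionary.
lexicon = {"what": [["w", "ah", "t"], ["hh", "w", "ah", "t"]],
           "are": [["aa", "r"], ["er"]]}
for p in sentence_pronunciations(["What", "are"], lexicon, toy_g2p):
    print(" ".join(p))
```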
Forced Alignment
Align a given utterance to a sequence of phones represented as a lexicon net.
[Figure: lexicon net for "What are you allergic to", with optional silence and heteronyms (破音字)]
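The core of forced alignment is a Viterbi search constrained to the given phone order. The Python sketch below handles only a single linear phone sequence and assumes frame-level log-likelihoods `loglik[t][n]` have already been computed by some acoustic model; the optional silences and heteronym branches of a lexicon net, and the multiple HMM states per phone used in practice, are left out.

```python
import math
from typing import List, Tuple

def forced_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment of T frames to N phones in fixed order.

    loglik[t][n] is the log-likelihood of frame t under the n-th phone of
    the sequence. Each phone gets at least one frame and the path only moves
    left to right. Returns (start_frame, end_frame_exclusive) per phone.
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]   # 0 = stayed in phone, 1 = entered it
    score[0][0] = loglik[0][0]           # the path must start in the first phone
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else NEG
            if advance > stay:
                score[t][n], back[t][n] = advance + loglik[t][n], 1
            else:
                score[t][n], back[t][n] = stay + loglik[t][n], 0
    # Backtrack from the last phone at the last frame.
    bounds, n, end = [], N - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][n] == 1:              # phone n starts at frame t
            bounds.append((t, end))
            end, n = t, n - 1
    bounds.append((0, end))
    return bounds[::-1]

# Usage: 6 frames, 3 phones; frames 0-1 favour phone 0, 2-4 phone 1, 5 phone 2.
ll = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
      [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
print(forced_align(ll))  # [(0, 2), (2, 5), (5, 6)]
```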
Lexicon Net for Detecting Confusing Syllables
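Native-language interference (母語干擾) when Japanese speakers speak Mandarin:
「打哈欠(qian)」 (to yawn) is mispronounced as 「打哈見(jian)」
「一次(ci)旅行」 (one trip) is mispronounced as 「一字(zi)旅行」
「晚安(an)」 (good night) is mispronounced as 「晚ㄤ(ang)」
Lexicon net for 「天氣熱、打哈欠」 ("it's hot; yawning"):
[Figure: lexicon net]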
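One way to express this idea in code: each syllable becomes a slot holding the correct syllable plus its known confusable variants, and the path chosen through the net (e.g. by forced alignment) reveals the error. In this Python sketch the confusion table holds only the three examples above, the recognized path is assumed to be given, and the function names and pinyin input are illustrative.

```python
from typing import Dict, List, Tuple

def build_confusion_net(syllables: List[str],
                        confusions: Dict[str, List[str]]) -> List[List[str]]:
    """One slot per syllable: the correct syllable first, then its variants."""
    return [[s] + confusions.get(s, []) for s in syllables]

def report_errors(net: List[List[str]],
                  recognized: List[str]) -> List[Tuple[int, str, str]]:
    """List (position, expected, chosen) wherever a confusable variant won."""
    errors = []
    for i, (slot, chosen) in enumerate(zip(net, recognized)):
        if chosen not in slot:
            raise ValueError(f"slot {i}: {chosen!r} is not in the net")
        if chosen != slot[0]:
            errors.append((i, slot[0], chosen))
    return errors

# Confusion pairs taken from the examples above.
confusions = {"qian": ["jian"], "ci": ["zi"], "an": ["ang"]}
net = build_confusion_net(["tian", "qi", "re", "da", "ha", "qian"], confusions)
print(report_errors(net, ["tian", "qi", "re", "da", "ha", "jian"]))
# [(5, 'qian', 'jian')]
```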
Error Pattern Detection
To detect utterances which start/stop anywhere:
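Assuming "start/stop anywhere" means the speaker may begin and end at any word of the reference text, one way to build such a net is to add an arc from the start node to every word and from every word to the end node, while keeping the normal left-to-right arcs. A minimal Python sketch over word positions; the names are hypothetical:

```python
from typing import List, Set, Tuple, Union

Node = Union[str, int]
START, END = "<s>", "</s>"

def anywhere_net(n_words: int) -> Set[Tuple[Node, Node]]:
    """Arcs of a net over word positions 0..n_words-1 that can be entered at
    any word and left after any word, but otherwise follows the text order."""
    arcs: Set[Tuple[Node, Node]] = set()
    for i in range(n_words):
        arcs.add((START, i))          # may start at any word
        arcs.add((i, END))            # may stop after any word
        if i + 1 < n_words:
            arcs.add((i, i + 1))      # normal left-to-right transition
    return arcs

def accepts(arcs: Set[Tuple[Node, Node]], path: List[int]) -> bool:
    """True if the word-position path is a legal route through the net."""
    nodes: List[Node] = [START, *path, END]
    return len(path) > 0 and all((a, b) in arcs for a, b in zip(nodes, nodes[1:]))

arcs = anywhere_net(5)
print(accepts(arcs, [1, 2, 3]), accepts(arcs, [0, 2]))  # True False
```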
Scoring Computation
Phone-based scoring
Identify the interval of each phone by forced alignment.
Compare each phone segment against its competing phone models to get a ranking, and convert the ranking to a score.
Example: the 38 competing phone models of "w+uh" are k+uh, g+uh, l+uh, b+uh, p+uh, t+uh, w+uh, d+uh, jh+uh, f+uh, sh+uh, hh+uh, y+uh, ch+uh, r+uh, zh+uh, th+uh, n+uh, z+uh, er+uh, ey+uh, m+uh, ih+uh, ae+uh, aw+uh, iy+uh, eh+uh, ao+uh, uw+uh, ay+uh, ah+uh, oy+uh, aa+uh, v+uh, ow+uh, s+uh, ng+uh. The 0-based ranking of "w+uh" among them is converted to a score between 0 and 100.
Higher-level scoring
Word score: time-weighted average of phone scores
Sentence score: time-weighted average of word scores, with discount factors derived from unusually short or long phones
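The scoring steps can be sketched as follows. The slides give neither the rank-to-score formula nor how the discount factors are computed, so this Python sketch assumes a linear mapping (rank 0 gives 100, the last rank gives 0) and takes the discount factors as given multipliers in (0, 1]; all function names are hypothetical.

```python
from typing import Sequence

def rank_to_score(rank: int, n_models: int) -> float:
    """ASSUMPTION: linear map of a 0-based rank among n_models onto [0, 100]."""
    if n_models < 2:
        return 100.0
    return 100.0 * (1.0 - rank / (n_models - 1))

def phone_rank(target_ll: float, competitor_lls: Sequence[float]) -> int:
    """0-based rank of the target model = number of competitors scoring higher."""
    return sum(1 for ll in competitor_lls if ll > target_ll)

def time_weighted_average(scores: Sequence[float], durations: Sequence[float]) -> float:
    """Average of scores weighted by segment durations (e.g. in frames)."""
    return sum(s * d for s, d in zip(scores, durations)) / sum(durations)

def sentence_score(word_scores: Sequence[float], word_durations: Sequence[float],
                   discounts: Sequence[float] = ()) -> float:
    """Time-weighted word average times the product of discount factors.

    ASSUMPTION: discounts (one per unusually short/long phone) are given
    multipliers in (0, 1]; how they are derived is not modelled here.
    """
    score = time_weighted_average(word_scores, word_durations)
    for factor in discounts:
        score *= factor
    return score

# Usage: a phone beaten by two competitors (rank 2) among 38 models,
# and a two-word sentence with one discount applied.
print(rank_to_score(phone_rank(-10.0, [-8.0, -9.5, -12.0]), 38))
print(sentence_score([80.0, 60.0], [30, 10], discounts=[0.9]))
```

The word score uses the same `time_weighted_average`, applied to phone scores and phone durations.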
Examples of Voice Command Recognition
Test example: goVcDemo.m
Applications: 成語接龍 (speech-enabled Chinese idiom relay); 一語中的 (speech-enabled Chinese idiom riddle)
No optional silence between words
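In the idiom relay, each recognition turn only has to distinguish idioms that may legally follow the previous one, so the active vocabulary can be narrowed before recognition. A Python sketch, assuming a hypothetical idiom list and strict same-character matching:

```python
from typing import List

def relay_candidates(previous: str, idioms: List[str]) -> List[str]:
    """Idioms that may follow `previous` in 成語接龍: each must start with
    the last character of the previous idiom (strict character match)."""
    last = previous[-1]
    return [w for w in idioms if w[0] == last and w != previous]

# Hypothetical vocabulary for illustration.
idioms = ["一心一意", "意氣風發", "發揚光大", "意想不到", "大公無私"]
print(relay_candidates("一心一意", idioms))  # ['意氣風發', '意想不到']
```

Each remaining candidate would then become one path of the lexicon net, with no optional silence between words.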