ASRA: Automatic Speech Recognition & Assessment

1 ASRA: Automatic Speech Recognition & Assessment
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University

2 Introduction to ASRA
ASRA: Automatic speech recognition & assessment
Functionality: speech assessment (speech scoring); voice-command-based speech recognition
Languages: Mandarin, English, Taiwanese, Japanese
Required toolboxes: Utility toolbox, SAP toolbox, ASR toolbox

3 Examples of Speech Assessment
Test examples: saEnglish01.m, saChinese01.m, saTaiwanese01.m, goSaDemo.m
Applications: 唸唸不忘 or 背書機 (Recital machine); Read & Say
[Demo screenshot labels: "Click to play each phone!", word score, phone score, pitch curve]

4 Approach to Speech Assessment
Text to phonetic alphabets
Forced alignment
Phone-based scoring
Pitch tracking

5 Texts to Phonetic Alphabets (1/2)
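Chinese
Exhaustive method: enumerate every combination of readings for the heteronyms in the text, e.g. for 朝辭白帝彩雲間 ("At dawn I leave Baidi amid colored clouds"):
朝(ㄓㄠ)辭白(ㄅㄞˊ)帝彩雲間
朝(ㄓㄠ)辭白(ㄅㄛˊ)帝彩雲間
朝(ㄔㄠˊ)辭白(ㄅㄞˊ)帝彩雲間
朝(ㄔㄠˊ)辭白(ㄅㄛˊ)帝彩雲間
Word segmentation: needed for ambiguous strings such as
基隆廟口吃小吃 ("eating snacks at Keelung's Miaokou market": 廟口 + 吃 + 小吃, not 口吃 "stutter")
三人參加會議 ("three people attend the meeting": 三人 + 參加, not 人參 "ginseng")
Taiwanese
No text, no pronunciation dictionary, no word corpus, so everything is much harder!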
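The exhaustive method for Chinese can be sketched as a Cartesian product over the candidate readings of each character. This is only an illustration: `HETERONYMS` is a hypothetical mini-dictionary holding just the two heteronyms shown on the slide, and `enumerate_readings` is a name chosen here, not a function from the ASRA toolboxes.

```python
from itertools import product

# Hypothetical mini-dictionary: only the two heteronyms shown on the slide.
# A real system would load a full pronunciation dictionary instead.
HETERONYMS = {
    "朝": ["ㄓㄠ", "ㄔㄠˊ"],
    "白": ["ㄅㄞˊ", "ㄅㄛˊ"],
}


def enumerate_readings(text, heteronyms=HETERONYMS):
    """Return every annotated reading of `text`, one per combination of
    heteronym pronunciations."""
    # Each character contributes a list of alternatives. Characters that are
    # not in the dictionary pass through unchanged (a single alternative).
    options = [
        [f"{ch}({reading})" for reading in heteronyms[ch]] if ch in heteronyms else [ch]
        for ch in text
    ]
    return ["".join(combo) for combo in product(*options)]


readings = enumerate_readings("朝辭白帝彩雲間")
for r in readings:
    print(r)
```

With two heteronyms of two readings each, this yields the four candidates listed on the slide. The number of candidates grows multiplicatively with each heteronym, which is why a compact network representation is attractive for longer texts.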

6 Texts to Phonetic Alphabets (2/2)
English
Exhaustive method based on the CMU Pronouncing Dictionary. Example: Multimedia
Grapheme-to-phoneme conversion: the process of using machine learning or statistical approaches to generate the most probable phone list for a word not in the pronunciation dictionary. Examples: Arnold Schwarzenegger, Genre classification
Japanese
Exhaustive search
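A minimal sketch of the dictionary-based exhaustive method for English follows. It assumes the plain-text CMUdict layout (a word, then its phones, with a suffix such as `(1)` marking an alternate pronunciation). The entries in `SAMPLE_DICT` are illustrative lines written in that layout and should be checked against the real dictionary file; the function names are chosen here for illustration. Words missing from the dictionary raise an error, which is the point where a grapheme-to-phoneme model would take over.

```python
import re
from collections import defaultdict
from itertools import product

# Illustrative lines in the CMUdict text layout (assumption: verify the
# phones against the real dictionary before relying on them).
SAMPLE_DICT = """\
;;; comment lines start with three semicolons
READ  R EH1 D
READ(1)  R IY1 D
BOOK  B UH1 K
"""


def load_pronunciations(text):
    """Map each word to the list of its pronunciations (lists of phones)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head)  # READ(1) -> READ
        entries[word].append(phones)
    return dict(entries)


def enumerate_phone_sequences(sentence, entries):
    """Return one flat phone list per combination of word pronunciations."""
    words = sentence.upper().split()
    missing = [w for w in words if w not in entries]
    if missing:
        # These words would need grapheme-to-phoneme conversion.
        raise KeyError(f"not in dictionary: {missing}")
    return [
        [phone for pron in combo for phone in pron]
        for combo in product(*(entries[w] for w in words))
    ]


sequences = enumerate_phone_sequences("read book", load_pronunciations(SAMPLE_DICT))
for seq in sequences:
    print(" ".join(seq))
```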

7 Forced Alignment
Align a given utterance to a sequence of phones represented as a lexicon net
Lexicon net for “What are you allergic to”
[Figure labels: optional silence; heteronym (破音字)]
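The core of forced alignment can be sketched as a Viterbi search over a left-to-right phone sequence. The sketch below is a simplification under stated assumptions: it aligns to a single linear phone sequence rather than a lexicon net (so no optional silence or heteronym branches), and it assumes a matrix `loglik[t][j]` of per-frame log-likelihoods for each phone is already available from an acoustic model. The function names and the toy numbers are made up for illustration.

```python
def forced_align(loglik):
    """Assign each frame to a phone so that phones appear in order, every
    phone gets at least one frame, and the summed log-likelihood is maximal.

    loglik[t][j] is the log-likelihood of frame t under phone j.
    Returns a list with the phone index chosen for each frame.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone
    for t in range(1, n_frames):
        for j in range(n_phones):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                best, back[t][j] = advance, j - 1
            else:
                best, back[t][j] = stay, j
            score[t][j] = best + loglik[t][j]
    # The path must end in the last phone; trace the back-pointers.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def phone_intervals(path):
    """Collapse a frame-level path into (phone, start_frame, end_frame)
    tuples, with end_frame exclusive."""
    intervals, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            intervals.append((path[start], start, t))
            start = t
    return intervals


# Toy log-likelihoods for 6 frames and 3 phones (made-up numbers).
toy = [
    [-1, -5, -5],
    [-1, -5, -5],
    [-5, -1, -5],
    [-5, -1, -5],
    [-5, -1, -5],
    [-5, -5, -1],
]
alignment = forced_align(toy)
print(alignment)
print(phone_intervals(alignment))
```

The intervals returned by `phone_intervals` are the kind of per-phone time spans that a later scoring step can work on.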

8 Lexicon Net for Detecting Confusing Syllables
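Native-language interference (「母語干擾」) in Mandarin spoken by Japanese speakers:
「打哈欠(qian)」 mispronounced as 「打哈見(jian)」
「一次(ci)旅行」 mispronounced as 「一字(zi)旅行」
「晚安(an)」 mispronounced as 「晚ㄤ(ang)」
Lexicon net for 「天氣熱、打哈欠」: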
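One way to picture such a lexicon net is as a sequence of slots, where each slot holds the canonical syllable plus any confusable alternatives running in parallel. The sketch below assumes syllable-level confusion pairs taken from the three examples on the slide; a real system would more likely define confusions on initials and finals, and the decoded path would come from the recognizer. `CONFUSIONS`, `build_net`, and `find_errors` are hypothetical names.

```python
# Confusion pairs taken from the slide's three examples (syllable level only).
CONFUSIONS = {"qian": ["jian"], "ci": ["zi"], "an": ["ang"]}


def build_net(syllables, confusions=CONFUSIONS):
    """Return one slot per syllable: [canonical, alternative, ...]."""
    return [[s] + confusions.get(s, []) for s in syllables]


def find_errors(net, decoded):
    """Compare the decoded path with the canonical branch of each slot.

    Returns (position, expected, heard) for every slot where the recognizer
    chose a confusable alternative.
    """
    if len(decoded) != len(net):
        raise ValueError("decoded path must have one syllable per slot")
    errors = []
    for i, (slot, heard) in enumerate(zip(net, decoded)):
        if heard not in slot:
            raise ValueError(f"{heard!r} is not a branch of slot {i}")
        if heard != slot[0]:
            errors.append((i, slot[0], heard))
    return errors


# 天氣熱、打哈欠 in pinyin without tones.
net = build_net("tian qi re da ha qian".split())
errors = find_errors(net, ["tian", "qi", "re", "da", "ha", "jian"])
print(net[-1])
print(errors)
```

Because the alternatives sit inside the net, the aligner itself decides which branch fits the audio best, and the chosen branch names the specific error.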

9 Error Pattern Detection
To detect utterances which start/stop anywhere:

10 Scoring Computation
Phone-based scoring
Identify the interval of each phone by forced alignment.
Compare the phone utterance to its competing phone models to get a ranking; the ranking is then converted to a score.
Example: The 38 competing phone models of “w+uh” are k+uh g+uh l+uh b+uh p+uh t+uh w+uh d+uh jh+uh f+uh sh+uh hh+uh y+uh ch+uh r+uh zh+uh th+uh n+uh z+uh er+uh ey+uh m+uh ih+uh ae+uh aw+uh iy+uh eh+uh ao+uh uw+uh ay+uh ah+uh oy+uh aa+uh v+uh ow+uh s+uh ng+uh. The 0-based ranking of “w+uh” is converted to a score between 0 and 100.
Higher-level scoring
Word score: time-weighted average of phone scores
Sentence score: time-weighted average of word scores, with discount factors derived from unusually short/long phones
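The scoring steps above can be sketched as follows. The slide does not give the formula that maps a rank to a score, so the linear mapping in `rank_to_score` is an assumption, as are the function names and the toy log-likelihoods. The discount factors for unusually short or long phones are not specified in the slides and are left out.

```python
def rank_of_target(loglik_by_model, target):
    """0-based rank of `target` among the competing models (0 = best fit)."""
    target_ll = loglik_by_model[target]
    return sum(
        1 for name, ll in loglik_by_model.items() if name != target and ll > target_ll
    )


def rank_to_score(rank, n_models):
    """Assumed linear mapping: rank 0 gives 100, the last rank gives 0."""
    if n_models < 2:
        return 100.0
    return 100.0 * (1 - rank / (n_models - 1))


def time_weighted_average(scores, durations):
    """Average of `scores`, each weighted by its duration."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(s * d for s, d in zip(scores, durations)) / total


# Toy log-likelihoods of one phone segment under five competing models.
logliks = {"w+uh": -10.0, "k+uh": -12.0, "g+uh": -9.0, "l+uh": -15.0, "b+uh": -20.0}
rank = rank_of_target(logliks, "w+uh")
phone_score = rank_to_score(rank, len(logliks))

# Word score from two phone scores with durations of 10 and 30 frames.
word_score = time_weighted_average([phone_score, 100.0], [10, 30])
print(rank, phone_score, word_score)
```

A sentence score could reuse `time_weighted_average` on word scores and word durations before any discount is applied.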

11 Examples of Voice Command Recognition
Test example: goVcDemo.m
Applications: 成語接龍 (Speech-enabled Chinese idiom relay); 一語中的 (Speech-enabled Chinese idiom riddle)
No optional silence between words
