ASRA: Automatic Speech Recognition & Assessment


ASRA: Automatic Speech Recognition & Assessment J.-S. Roger Jang (張智星) jang@mirlab.org http://mirlab.org/jang MIR Lab, CSIE Dept. National Taiwan University

Introduction to ASRA
- ASRA: Automatic Speech Recognition & Assessment
- Functionality
  - Speech assessment (speech scoring)
  - Voice-command-based speech recognition
- Languages: Mandarin, English, Taiwanese, Japanese
- Required toolboxes
  - Utility toolbox
  - SAP toolbox
  - ASR toolbox

Examples of Speech Assessment
- Test examples
  - saEnglish01.m
  - saChinese01.m
  - saTaiwanese01.m
  - goSaDemo.m
- Applications
  - 唸唸不忘 or 背書機 (Recital machine)
  - Read & Say
- [Screenshot labels: Click to play each phone! / Word score / Phone score / Pitch curve]

Approach to Speech Assessment
- Text to phonetic alphabets
- Forced alignment
- Phone-based scoring
- Pitch tracking

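Texts to Phonetic Alphabets (1/2)
- Chinese
  - Exhaustive method: list every combination of readings for the heteronyms in the text, e.g. for 朝辭白帝彩雲間 ("At dawn I left Baidi amid colored clouds"):
    - 朝(ㄓㄠ)辭白(ㄅㄞˊ)帝彩雲間
    - 朝(ㄓㄠ)辭白(ㄅㄛˊ)帝彩雲間
    - 朝(ㄔㄠˊ)辭白(ㄅㄞˊ)帝彩雲間
    - 朝(ㄔㄠˊ)辭白(ㄅㄛˊ)帝彩雲間
  - Word segmentation, e.g.:
    - 基隆廟口吃小吃 ("Eating snacks at Keelung Miaokou")
    - 三人參加會議 ("Three people attend the meeting")
- Taiwanese
  - No text, no pronunciation dictionary, no word corpus → everything is much harder!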
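The exhaustive method for Chinese can be illustrated with a minimal sketch that expands each heteronym into all of its readings and takes the Cartesian product. The `READINGS` table and the function name `enumerate_pronunciations` are hypothetical and exist only for this example; they are not part of the ASRA toolboxes.

```python
from itertools import product

# Hypothetical mini-dictionary (an assumption for illustration):
# each heteronym maps to its candidate Zhuyin readings.
READINGS = {
    "朝": ["ㄓㄠ", "ㄔㄠˊ"],
    "白": ["ㄅㄞˊ", "ㄅㄛˊ"],
}

def enumerate_pronunciations(text, readings):
    """Return every combination of readings for the characters in `text`.

    Characters without an entry in `readings` are kept unchanged,
    so they contribute exactly one option each.
    """
    options = [readings.get(ch, [ch]) for ch in text]
    return [list(combo) for combo in product(*options)]

candidates = enumerate_pronunciations("朝辭白帝", READINGS)
for combo in candidates:
    print(" ".join(combo))
```

With two heteronyms of two readings each, the sketch yields four candidate sequences, in the same order as the four lines on the slide. The number of candidates grows multiplicatively with each additional heteronym.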

Texts to Phonetic Alphabets (2/2)
- English
  - Exhaustive method based on the CMU Pronouncing Dictionary
    - Example: Multimedia
  - Grapheme-to-phoneme conversion: the process of using machine learning or statistical approaches to generate the most probable phone list for a word that is not in the pronunciation dictionary
    - Examples: Arnold Schwarzenegger, Genre classification
- Japanese
  - Exhaustive search

Forced Alignment
- Align the given utterance to a sequence of phones represented as a lexicon net
- Lexicon net for "What are you allergic to" [figure not reproduced], annotated with:
  - Optional silence
  - Heteronym (破音字)
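The core of forced alignment can be sketched as a dynamic-programming (Viterbi-style) search that assigns each frame to one phone of a known sequence. This is a simplified illustration, not the ASR toolbox implementation: it assumes a single linear phone sequence (no optional silence or alternative pronunciations, which the lexicon net allows), one state per phone, and a precomputed table `loglik[t][j]` giving the log-likelihood of frame `t` under phone `j`. The function name `forced_align` is hypothetical.

```python
def forced_align(loglik):
    """Align T frames to N phones, left to right, each phone getting >= 1 frame.

    loglik[t][j] is the log-likelihood of frame t under the j-th phone.
    Returns a list of length T holding the phone index assigned to each frame.
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per phone")
    NEG = float("-inf")
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first phone
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]                       # remain in phone j
            move = score[t - 1][j - 1] if j > 0 else NEG  # advance from phone j-1
            if move > stay:
                score[t][j], back[t][j] = move + loglik[t][j], j - 1
            else:
                score[t][j], back[t][j] = stay + loglik[t][j], j
    # Backtrack from the last phone at the last frame.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy input (made-up numbers): 5 frames, 2 phones.
toy = [[-1, -5], [-1, -5], [-4, -1], [-5, -1], [-5, -1]]
alignment = forced_align(toy)
print(alignment)
```

The phone boundaries are the positions where the index in the returned path changes; those intervals are what the later scoring step operates on.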

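Lexicon Net for Detecting Confusing Syllables
- Native-language interference when Japanese speakers speak Mandarin (日本人說華語的「母語干擾」現象)
  - 「打哈欠(qian)」 ("to yawn") mispronounced as 「打哈見(jian)」
  - 「一次(ci)旅行」 ("one trip") mispronounced as 「一字(zi)旅行」
  - 「晚安(an)」 ("good night") mispronounced as 「晚ㄤ(ang)」
- Lexicon net for 「天氣熱、打哈欠」 ("The weather is hot; yawning"): [figure not reproduced]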

Error Pattern Detection
- To detect utterances which start/stop anywhere: [figure not reproduced]

Scoring Computation
- Phone-based scoring
  - Identify the interval of each phone by forced alignment.
  - Compare the phone utterance to its competing phone models to get a ranking, then convert the ranking to a score.
  - Example: The 38 competing phone models of "w+uh" are k+uh g+uh l+uh b+uh p+uh t+uh w+uh d+uh jh+uh f+uh sh+uh hh+uh y+uh ch+uh r+uh zh+uh th+uh n+uh z+uh er+uh ey+uh m+uh ih+uh ae+uh aw+uh iy+uh eh+uh ao+uh uw+uh ay+uh ah+uh oy+uh aa+uh v+uh ow+uh s+uh ng+uh. The 0-based ranking of "w+uh" is converted to a score between 0 and 100.
- Higher-level scoring
  - Word score: time-weighted average of phone scores
  - Sentence score: time-weighted average of word scores, with discount factors derived from unusually short/long phones
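The scoring steps above can be sketched as follows. The slides do not state how the 0-based ranking is mapped to the 0-100 range, so the linear mapping in `rank_to_score` is an assumption made for illustration; the function names and the sample numbers are also hypothetical. The discount factors for unusually short or long phones are not specified in the slides and are left out.

```python
def rank_to_score(rank, n_models):
    """Map a 0-based rank among n_models competitors to a score in [0, 100].

    Assumed linear mapping: rank 0 (best) -> 100, rank n_models-1 (worst) -> 0.
    """
    if not 0 <= rank < n_models:
        raise ValueError("rank must lie in [0, n_models)")
    if n_models == 1:
        return 100.0
    return 100.0 * (1 - rank / (n_models - 1))

def time_weighted_average(scores, durations):
    """Average the scores, weighting each by its duration (e.g. in frames)."""
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(s * d for s, d in zip(scores, durations)) / total

# Made-up example: a word with two phones lasting 10 and 30 frames.
phone_scores = [rank_to_score(0, 38), rank_to_score(37, 38)]
word_score = time_weighted_average(phone_scores, [10, 30])
print(phone_scores, word_score)
```

The same `time_weighted_average` helper applies one level up: a sentence score would average word scores weighted by word durations, before any discount is applied.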

Examples of Voice Command Recognition
- Test example: goVcDemo.m
- Applications
  - 成語接龍 (Speech-enabled Chinese idiom relay)
  - 一語中的 (Speech-enabled Chinese idiom riddle)
- No optional silence between words