Speaker: Chia Hua. Authors: Long Qin, Ming Sun, Alexander Rudnicky

OOV Detection and Recovery Using Hybrid Models with Different Fragments
Speaker: Chia Hua. Authors: Long Qin, Ming Sun, Alexander Rudnicky

Outline Introduction Method Experiment Setup Results Conclusion

Introduction. Most speech recognition systems are closed-vocabulary recognizers. On average, one OOV word introduces 1.2 word errors. OOV words are usually content words, such as names and locations.



Introduction. There are several approaches for detecting OOVs:
a hybrid language model;
confidence scores and other metrics;
combining a hybrid LM with confidence metrics.
We report the performance of OOV detection and recovery using a fragment-word hybrid LM.

Fragments: phone, subword, graphone.

Introduction. Phone and subword fragments model only the phonetic level; graphones also consider orthography.

Phone hybrid system. An N-gram phone LM was trained and combined with a word LM to generate the hybrid LM. During decoding, OOVs were then represented by phone sequences.

Subword hybrid system. First, all phones are added to the subword inventory to ensure full coverage of all possible words. Then, in each iteration, the most frequent subword bigram is merged and added to the inventory.
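The iterative merging described above resembles byte-pair encoding applied to phone sequences. A minimal sketch (the corpus, merge count, and joining convention below are illustrative assumptions, not the paper's actual setup):

```python
from collections import Counter

def build_subword_inventory(phone_corpus, num_merges):
    """BPE-style inventory growth: repeatedly merge the most frequent
    adjacent unit pair and add the merged subword to the inventory."""
    # Seed with single phones so every word remains coverable.
    inventory = {p for seq in phone_corpus for p in seq}
    corpus = [list(seq) for seq in phone_corpus]
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + "_" + b          # joining with "_" is an assumption
        inventory.add(merged)
        # Re-segment the corpus using the newly merged subword.
        new_corpus = []
        for seq in corpus:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return inventory, corpus
```

For example, over pronunciations ["HH AH L OW", "HH AH T"], the first merge would produce the subword "HH_AH", which then replaces that bigram throughout the corpus.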

Graphone hybrid system. A graphone is a grapheme-phoneme pair of English letters and phones. A trigram joint-sequence model is trained from a dictionary and used to segment in-vocabulary (IV) words into graphone sequences. Tokens have a minimum and maximum number of letters and phones; graphones with from 1 up to L letters and phones were allowed.

Method OOV Detection OOV Recovery

OOV Detection. A fragment-word hybrid LM is applied during decoding to detect the presence of OOVs. We trained an open-vocabulary word LM from a large text corpus and a closed-vocabulary fragment LM from the pronunciations of all words in a dictionary. When training the word LM, all OOVs were mapped to the same unknown token ⟨unk⟩.

OOV Detection. The unigram probability of a fragment f_i in the hybrid LM is calculated as

P_H(f_i) = P_W(⟨unk⟩) · P_F(f_i) · C_OOV

where P_W is the word LM, P_F is the fragment LM, and C_OOV is an OOV cost. In the same way, we can compute N-gram probabilities in the hybrid LM.
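The combination rule above is a simple product of three factors. A toy illustration (all probability values below are made up for demonstration; a real system would work in log space over full N-grams):

```python
def hybrid_unigram(p_unk_word_lm, p_fragment_lm, oov_cost):
    """P_H(f_i) = P_W(<unk>) * P_F(f_i) * C_OOV.

    p_unk_word_lm: word-LM probability of the unknown token <unk>
    p_fragment_lm: fragment-LM probability of fragment f_i
    oov_cost:      tunable OOV cost factor C_OOV
    """
    return p_unk_word_lm * p_fragment_lm * oov_cost

# e.g. P_W(<unk>) = 0.01, P_F(f_i) = 0.2, C_OOV = 0.5  ->  P_H(f_i) = 0.001
p_h = hybrid_unigram(0.01, 0.2, 0.5)
```

Lowering C_OOV penalizes fragment paths during decoding, trading OOV recall against false alarms.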

OOV Recovery. After OOV detection, appropriate spellings are generated for OOVs from the recognized fragment sequences. For the graphone hybrid system, we can simply concatenate the letters from the detected graphone sequence to restore the OOV's written form. For the phone and subword hybrid systems, phoneme-to-grapheme conversion is applied. To achieve good phoneme-to-grapheme performance, we trained a 6-gram joint-sequence model with short units.
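For the graphone system, recovery reduces to letter concatenation. A sketch, assuming a hypothetical token format in which each graphone is written as "letters:phones" (this notation is an illustration, not the paper's):

```python
def recover_spelling(graphone_seq):
    """Restore an OOV's written form by concatenating the letter
    (grapheme) part of each detected graphone token.

    Tokens are assumed to look like 'jo:JH_OW' (letters ':' phones).
    """
    return "".join(tok.split(":")[0] for tok in graphone_seq)

# A detected graphone sequence for an OOV name spells out its letters:
spelling = recover_spelling(["jo:JH_OW", "hn:N"])  # "john"
```

The phone and subword systems cannot do this directly, since their tokens carry no orthography; they need a separate phoneme-to-grapheme model.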

Experiment Setup. The WSJ0 text corpus was used for word LM training. The top 5k and 20k words in this corpus were used as vocabularies, yielding an OOV rate of 2% for both tasks. An open-vocabulary 5k-word LM and 20k-word LM were then trained. The recognition dictionary was generated using CMUdict (v0.7a).

Evaluation method
Recall = (#correctly detected OOVs / #OOVs in reference) × 100%
Precision = (#correctly detected OOVs / #OOVs reported) × 100%
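The two metrics above can be computed directly from the detection counts; the example counts in the comment are hypothetical:

```python
def detection_metrics(num_correct, num_reference, num_reported):
    """Recall and precision for OOV detection, in percent.

    num_correct:   OOVs correctly detected by the system
    num_reference: OOVs in the reference transcript
    num_reported:  OOVs the system reported (correct + false alarms)
    """
    recall = num_correct / num_reference * 100.0
    precision = num_correct / num_reported * 100.0
    return recall, precision

# e.g. 8 correct detections, 10 reference OOVs, 16 reported OOVs
r, p = detection_metrics(8, 10, 16)  # recall 80.0%, precision 50.0%
```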

OOV detection results: phone hybrid system, subword hybrid system, graphone hybrid system (results shown as figures in the original slides).

OOV detection: comparing the three systems.

OOV recovery

Conclusion. We found that the subword and graphone hybrid systems are significantly better than the phone hybrid system for both OOV detection and recovery. Furthermore, for the subword and graphone hybrid systems:
more training data allows more fragments and thus better coverage;
given sufficient training data, longer subwords or graphones are preferable to shorter ones;
the graphone hybrid system provides a larger relative improvement on the recovery task.