Handing Uncertain Observations in Unsupervised Topic-Mixture Language Model Adaptation Ekapol Chuangsuwanich 1, Shinji Watanabe 2, Takaaki Hori 2, Tomoharu.

Slides:



Advertisements
Similar presentations
Combining Heterogeneous Sensors with Standard Microphones for Noise Robust Recognition Horacio Franco 1, Martin Graciarena 12 Kemal Sonmez 1, Harry Bratt.
Advertisements

Adaption Adjusting Model’s parameters for a new speaker. Adjusting all parameters need a huge amount of data (impractical). The solution is to cluster.
Improved Neural Network Based Language Modelling and Adaptation J. Park, X. Liu, M.J.F. Gales and P.C. Woodland 2010 INTERSPEECH Bang-Xuan Huang Department.
Measuring the Influence of Long Range Dependencies with Neural Network Language Models Le Hai Son, Alexandre Allauzen, Franc¸ois Yvon Univ. Paris-Sud and.
歡迎 IBM Watson 研究員 詹益毅 博士 蒞臨 國立台灣師範大學. Hai-Son Le, Ilya Oparin, Alexandre Allauzen, Jean-Luc Gauvain, Franc¸ois Yvon ICASSP 2011 許曜麒 Structured Output.
Hidden Markov Models Reading: Russell and Norvig, Chapter 15, Sections
Bag-Of-Word normalized n-gram models ISCA 2008 Abhinav Sethy, Bhuvana Ramabhadran IBM T. J. Watson Research Center Yorktown Heights, NY Presented by Patty.
Confidence Measures for Speech Recognition Reza Sadraei.
Effective Gaussian mixture learning for video background subtraction Dar-Shyang Lee, Member, IEEE.
Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models C. J. Leggetter and P. C. Woodland Department of.
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST
A Survey of ICASSP 2013 Language Model Department of Computer Science & Information Engineering National Taiwan Normal University 報告者:郝柏翰 2013/06/19.
Large Language Models in Machine Translation Conference on Empirical Methods in Natural Language Processing 2007 報告者:郝柏翰 2013/06/04 Thorsten Brants, Ashok.
Discriminative Syntactic Language Modeling for Speech Recognition Michael Collins, Brian Roark Murat, Saraclar MIT CSAIL, OGI/OHSU, Bogazici University.
Hierarchical Dirichlet Process (HDP) A Dirichlet process (DP) is a discrete distribution that is composed of a weighted sum of impulse functions. Weights.
REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ram’ırez, Jos’e C. Segura and J.M. G’orriz Dept. of Signal Theory Networking and Communications.
Ekapol Chuangsuwanich and James Glass MIT Computer Science and Artificial Intelligence Laboratory,Cambridge, Massachusetts 02139,USA 2012/07/2 汪逸婷.
Classification and Ranking Approaches to Discriminative Language Modeling for ASR Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın 報告者:郝柏翰 2013/01/28.
Recognition of spoken and spelled proper names Reporter : CHEN, TZAN HWEI Author :Michael Meyer, Hermann Hild.
COMPARISON OF A BIGRAM PLSA AND A NOVEL CONTEXT-BASED PLSA LANGUAGE MODEL FOR SPEECH RECOGNITION Md. Akmal Haidar and Douglas O’Shaughnessy INRS-EMT,
1 Sentence-extractive automatic speech summarization and evaluation techniques Makoto Hirohata, Yosuke Shinnaka, Koji Iwano, Sadaoki Furui Presented by.
Real Time Appearance Based Hand Tracking The 19th International Conference on Pattern Recognition (ICPR) December 7-11, 2008, Tampa Convention Center,
1 Hidden Markov Model 報告人:鄒昇龍. 2 Outline Introduction to HMM Activity of HMM Problem and Solution Conclusion Reference.
Algoritmi e Programmazione Avanzata
Online Kinect Handwritten Digit Recognition Based on Dynamic Time Warping and Support Vector Machine Journal of Information & Computational Science, 2015.
Boosting Training Scheme for Acoustic Modeling Rong Zhang and Alexander I. Rudnicky Language Technologies Institute, School of Computer Science Carnegie.
Round-Robin Discrimination Model for Reranking ASR Hypotheses Takanobu Oba, Takaaki Hori, Atsushi Nakamura INTERSPEECH 2010 Min-Hsuan Lai Department of.
Efficient Language Model Look-ahead Probabilities Generation Using Lower Order LM Look-ahead Information Langzhou Chen and K. K. Chin Toshiba Research.
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
UNSUPERVISED CV LANGUAGE MODEL ADAPTATION BASED ON DIRECT LIKELIHOOD MAXIMIZATION SENTENCE SELECTION Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa.
Recurrent neural network based language model Tom´aˇs Mikolov, Martin Karafia´t, Luka´sˇ Burget, Jan “Honza” Cˇernocky, Sanjeev Khudanpur INTERSPEECH 2010.
Learning Long-Term Temporal Feature in LVCSR Using Neural Networks Barry Chen, Qifeng Zhu, Nelson Morgan International Computer Science Institute (ICSI),
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
Latent Topic Modeling of Word Vicinity Information for Speech Recognition Kuan-Yu Chen, Hsuan-Sheng Chiu, Berlin Chen ICASSP 2010 Hao-Chin Chang Department.
A DYNAMIC APPROACH TO THE SELECTION OF HIGH ORDER N-GRAMS IN PHONOTACTIC LANGUAGE RECOGNITION Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-
Relevance-Based Language Models Victor Lavrenko and W.Bruce Croft Department of Computer Science University of Massachusetts, Amherst, MA SIGIR 2001.
1 Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus Gakuto KURATA, Shinsuke MORI, Masafumi NISHIMURA IBM Research, Tokyo.
 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.
Copyright © 2013 by Educational Testing Service. All rights reserved. Evaluating Unsupervised Language Model Adaption Methods for Speaking Assessment ShaSha.
A New Approach to Utterance Verification Based on Neighborhood Information in Model Space Author :Hui Jiang, Chin-Hui Lee Reporter : 陳燦輝.
Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure Sherif Abdou, Michael Scordilis Department of Electrical and Computer.
Detection of Vowel Onset Point in Speech S.R. Mahadeva Prasanna & Jinu Mariam Zachariah Department of Computer Science & Engineering Indian Institute.
Using Conversational Word Bursts in Spoken Term Detection Justin Chiu Language Technologies Institute Presented at University of Cambridge September 6.
Quiz Week 8 Topical. Topical Quiz (Section 2) What is the difference between Computer Vision and Computer Graphics What is the difference between Computer.
Statistical Models for Automatic Speech Recognition Lukáš Burget.
Using Neural Network Language Models for LVCSR Holger Schwenk and Jean-Luc Gauvain Presented by Erin Fitzgerald CLSP Reading Group December 10, 2004.
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences Recurrent Neural Network-based Language Modeling for an Automatic.
A Maximum Entropy Language Model Integrating N-grams and Topic Dependencies for Conversational Speech Recognition Sanjeev Khudanpur and Jun Wu Johns Hopkins.
1 Voicing Features Horacio Franco, Martin Graciarena Andreas Stolcke, Dimitra Vergyri, Jing Zheng STAR Lab. SRI International.
Reinforcement Learning for Mapping Instructions to Actions S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, Regina Barzilay Computer Science and Artificial.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
Lec. 19 – Hypothesis Testing: The Null and Types of Error.
COMPUTER SYSTEM FUNDAMENTAL Genetic Computer School INTRODUCTION TO ARTIFICIAL INTELLIGENCE LESSON 11.
Recent Paper of Md. Akmal Haidar Meeting before ICASSP 2013 報告者:郝柏翰 2013/05/23.
Pruning Analysis for the Position Specific Posterior Lattices for Spoken Document Search Jorge Silva University of Southern California Ciprian Chelba and.
Christoph Prinz / Automatic Speech Recognition Research Progress Hits the Road.
Speaker Recognition UNIT -6. Introduction  Speaker recognition is the process of automatically recognizing who is speaking on the basis of information.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Tight Coupling between ASR and MT in Speech-to-Speech Translation Arthur Chan Prepared for Advanced Machine Translation Seminar.
Voice Activity Detection Based on Sequential Gaussian Mixture Model Zhan Shen, Jianguo Wei, Wenhuan Lu, Jianwu Dang Tianjin Key Laboratory of Cognitive.
Xiaolin Wang Andrew Finch Masao Utiyama Eiichiro Sumita
Online Multiscale Dynamic Topic Models
Statistical Models for Automatic Speech Recognition
Parallelizing Incremental Bayesian Segmentation (IBS)
RECURRENT NEURAL NETWORKS FOR VOICE ACTIVITY DETECTION
Mohamed Kamel Omar and Lidia Mangu ICASSP 2007
Statistical Models for Automatic Speech Recognition
Anthor: Andreas Tsiartas, Prasanta Kumar Ghosh,
Combination of Feature and Channel Compensation (1/2)
Presentation transcript:

Handing Uncertain Observations in Unsupervised Topic-Mixture Language Model Adaptation Ekapol Chuangsuwanich 1, Shinji Watanabe 2, Takaaki Hori 2, Tomoharu Iwata 2, James Glass 1 報告者:郝柏翰 2013/03/05 ICASSP MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA 2 NTT Communication Science Laboratories, NTT Corporation, Japan

Outline Introduction Topic Tracking Language Model(TTLM) TTLM Using Confusion Network Inputs(TTLMCN) Experiments Conclusion 2

Introduction In a real environment, acoustic and language features often vary depending on the speakers, speaking styles and topic changes. To accommodate these changes, speech recognition approaches that include the incremental tracking of changing environments have attracted attention. This paper proposes a topic tracking language model that can adaptively track changes in topics based on current text information and previously estimated topic models in an on-line manner. 3

TTLM Tracking temporal changes in language environments 4

TTLM A long session of speech input is divided into chunks Each chunk is modeled by different topic distributions The current topic distribution depends on the topic distribution of the past H chunks and precision parameters α as follows: 5

TTLM With the topic distribution, the unigram probability of a word w m in the chunk can be recovered using the topic and word probabilities Where θ is the unigram probabilities of word w m in topic k The adapted n-gram can be used for a 2nd pass recognition for better results. 6

TTLMCN Consider a confusion network with M word slots. Each word slot m can contain different number of arcs A m with each arc containing a word w ma and a corresponding arc posterior d ma. S m is binary selection parameter, where s m = 1 indicates that the arc is selected. 7 chunk 1 chunk 2 chunk 3 slot 1 slot 2 slot 3 A 1 =3 …

TTLMCN 8 For each chunk t, we can write the joint distribution of words, latent topics and arc selections conditioned on the topic probabilities, unigram probabilities, and arc posteriors as follows:

TTLMCN Graphical representation of TTLMCN 9

Experiments(MIT-OCW) MIT-OCW is mainly composed of lectures given at MIT. Each lecture is typically two hours long. We segmented the lectures using Voice Activity Detectors into utterances averaging two seconds each. 10

Compare with TTLM and TTLMCN We can see that the topic probability of TTLMCNI is more similar to the oracle experiment than TTLM, especially in the low probability regions. KL between TTLM and ORACLE was 3.3, TTLMCN was

Conclusion We described an extension for the TTLM in order to handle errors in speech recognition. The proposed model used a confusion network as input instead of just one ASR hypothesis which improved performance even in high WER situations. The gain in word error rate was not very large since the LM typically contributed little to the performance of LVCSR. 12

Significance Test (T-Test) H0 :實驗組與對照組的常態分佈一致 H1 :實驗組與對照組的常態分佈不一致 13

Significance Test (T-Test) 14 Example X Y