UNSUPERVISED CV LANGUAGE MODEL ADAPTATION BASED ON DIRECT LIKELIHOOD MAXIMIZATION SENTENCE SELECTION Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa.

Presentation transcript:

UNSUPERVISED CV LANGUAGE MODEL ADAPTATION BASED ON DIRECT LIKELIHOOD MAXIMIZATION SENTENCE SELECTION Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa Division of Information Sciences, Graduate School of Advanced Integration Science, Chiba University ICASSP 2012 Presenter: 陳思澄 2014/04/14

Outline: Introduction; Direct likelihood maximization selection; Unsupervised self adaptation; Unsupervised CV adaptation; Experiments; Conclusions

Introduction In deploying speech recognition systems, it is often necessary to train a high-performance language model from existing domain-independent data. Since such training data contains sentences that are not relevant to the recognition task, it is important to select a subset of the data to build a domain-matched language model. When in-domain development data is available, one strategy is to build a model on the development data and select the training sentences that receive high likelihood under that model.
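As an illustration of this baseline likelihood-based selection, here is a minimal sketch. It assumes a simple smoothed unigram model and hypothetical helper names; the paper itself does not specify an implementation.

```python
# Minimal sketch of likelihood-based sentence selection (not the paper's code):
# score each out-of-domain training sentence by its average log-probability
# under a unigram model estimated on the in-domain development data.
import math
from collections import Counter

def unigram_model(sentences, smoothing=1e-6):
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    vocab_size = max(len(counts), 1)
    return lambda w: (counts[w] + smoothing) / (total + smoothing * vocab_size)

def select_by_dev_likelihood(train_sents, dev_sents, keep_ratio=0.3):
    p_dev = unigram_model(dev_sents)
    def score(sent):
        # Average log-probability per word under the development-data model.
        return sum(math.log(p_dev(w)) for w in sent) / max(len(sent), 1)
    ranked = sorted(train_sents, key=score, reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]
```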

Introduction However, this strategy has an inherent problem: the most frequent patterns in the development data are excessively emphasized in the selected training subset. To avoid this problem, another selection strategy has been proposed that selects a subset of the training data so as to directly maximize the development-set likelihood. We apply DLMS (direct likelihood maximization selection) to iterative unsupervised adaptation for presentation speech recognition.

Introduction A problem with iterative unsupervised adaptation is that the adapted models are estimated from hypotheses that contain recognition errors, which limits the adaptation performance. To address this problem, we introduce the framework of unsupervised cross-validation (CV) adaptation, which was originally proposed for acoustic model adaptation.

Direct likelihood maximization selection DLMS selects articles in the training set so as to maximize the adaptation-data likelihood, based on the objective function in Equation (1): F = Σ_w C_A(w) · log P_T(w), where C_A(w) is the count of word w in the adaptation set and P_T is a language model estimated on the selected training subset. First, the importance of each article is scored with objective (1) by building a model on the whole training set with that article removed. A lower score means that the article is important for the adaptation set. After all articles have been scored independently, the N articles with the lowest scores are selected and the adapted language model is built from them.
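The leave-one-out scoring described above can be sketched as follows. This is an assumption-laden illustration: it uses a unigram P_T for simplicity (the paper uses n-gram models), and `make_model` is a hypothetical stand-in for language model estimation.

```python
# Sketch of DLMS article scoring under objective (1), assuming a unigram P_T.
import math

def adaptation_log_likelihood(adapt_counts, p_t):
    # Objective (1): sum over words of C_A(w) * log P_T(w).
    return sum(c * math.log(p_t(w)) for w, c in adapt_counts.items())

def dlms_select(articles, adapt_counts, n_keep, make_model):
    scores = []
    for i in range(len(articles)):
        # Build P_T on the whole training set with article i removed.
        held_out = articles[:i] + articles[i + 1:]
        p_t = make_model(held_out)
        scores.append(adaptation_log_likelihood(adapt_counts, p_t))
    # A low score means that removing the article hurt the adaptation-set
    # likelihood, i.e. the article is important: keep the N lowest-scored.
    order = sorted(range(len(articles)), key=lambda i: scores[i])
    return [articles[i] for i in order[:n_keep]]
```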

Direct likelihood maximization selection The relative entropy method proposed by Sethy et al. [2] is instead based on minimizing the relative entropy in Equation (2): D(P_A || P_T) = Σ_w P_A(w) · log(P_A(w) / P_T(w)), where P_A is a language model estimated on the adaptation set.
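For comparison, the relative-entropy criterion of Equation (2) can be computed as below; this sketch assumes unigram distributions, whereas the original method operates on n-gram statistics.

```python
# KL divergence D(P_A || P_T) between the adaptation-set model and the
# model of the selected training subset (unigram assumption).
import math

def relative_entropy(p_a, p_t, vocabulary):
    return sum(p_a(w) * math.log(p_a(w) / p_t(w)) for w in vocabulary if p_a(w) > 0)
```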

Unsupervised self adaptation In unsupervised self adaptation, M is the model, T is the recognition hypothesis, and D is the recognition/adaptation data. Hypothesis T is produced from data D using the initial model M. Using that hypothesis, an adapted model M is estimated, and the adapted model is used to recognize the same data in the next iteration.

Unsupervised self adaptation A widely used unsupervised adaptation framework is to first decode the recognition data and use the decoder output to estimate an adapted model. Since the decoder output serves as the adaptation data, this has the advantage of not requiring separate in-domain development data. A disadvantage is that recognition errors in the decoder output are reinforced over the iterations, because the same data is used both in the decoding step and in the model update step. This reduces the effectiveness of the adaptation.
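A compact outline of this self-adaptation loop, with `decode` and `adapt_lm` passed in as hypothetical callables standing for the recognizer and the DLMS-based model update:

```python
# Iterative unsupervised self adaptation: the model's own hypotheses are used
# to re-estimate it, so recognition errors can be reinforced across iterations.
def self_adaptation(initial_lm, data, decode, adapt_lm, n_iterations=4):
    lm = initial_lm
    for _ in range(n_iterations):
        hypotheses = decode(data, lm)  # recognize the data with the current model
        lm = adapt_lm(hypotheses)      # update the model from its own output
    return lm
```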

Unsupervised CV adaptation M is the model and D is the recognition/adaptation data. M(k) is the k-th CV model, D(k) is the k-th exclusive data subset, and T(k) is the recognition hypothesis for the k-th subset obtained with M(k). The data used for the decoding step and for the model update step are separated. M(0) denotes a global CV model and T(0) denotes the hypothesis produced by M(0).

Unsupervised CV adaptation With this procedure, the data used for the decoding step and for the model update step are effectively separated minimizing the undesired effect of reinforcing the errors. Because the utterances used for model estimation are not decoded by that model, it is unlikely that the same recognition error is repeated.

EXPERIMENTAL SETUP Corpus: Corpus of Spontaneous Japanese (CSJ). The speech recognition system was based on the T3 WFST decoder. The initial language model was a trigram model estimated on all the training data. Dictionary size: 30k words.

EXPERIMENTAL RESULTS Tables 1 and 2 show properties of the language models produced by self adaptation and by five-fold CV adaptation, respectively. About 1/4 to 1/3 of the sentences were selected by the selective adaptations.

Table 2. Properties of language models estimated with the proposed CV adaptation. The zero-th iteration shows the presentation-independent initial model estimated on all training data. Test-set perplexities of the CV models are evaluated according to the CV partitioning and averaged in the log domain.
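As a note on the caption, "averaged in the log domain" corresponds to a geometric mean of the per-fold perplexities; a tiny sketch of that computation (an interpretation, not code from the paper):

```python
import math

def average_perplexity_log_domain(perplexities):
    # Geometric mean: average the log-perplexities, then exponentiate.
    return math.exp(sum(math.log(p) for p in perplexities) / len(perplexities))
```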

Figure 3 shows word error rates as a function of the number of iterations. The first iteration of self adaptation corresponds to the conventional two-pass method. (Fig. 3: Number of iterations and word error rates. The zero-th iteration indicates the initial model; the number of CV folds K is five for CV adaptation; GCV indicates the global CV model.)

Figure 4 shows word error rates as a function of the number of CV folds K at the fourth iteration. (Fig. 4: Number of CV folds K and word error rates at the fourth iteration. CV-fold 1 corresponds to self adaptation.)

Conclusions Direct likelihood maximization selection (DLMS) has been applied to unsupervised language model adaptation for presentation speech recognition. To reduce the disadvantage of the self adaptation strategy, an unsupervised CV adaptation framework, originally proposed for acoustic model adaptation, has been introduced. Experimental results show that the adaptation performance of DLMS is further improved by incorporating the CV adaptation framework.