COMPARISON OF A BIGRAM PLSA AND A NOVEL CONTEXT-BASED PLSA LANGUAGE MODEL FOR SPEECH RECOGNITION
Md. Akmal Haidar and Douglas O'Shaughnessy
INRS-EMT, 6900-800 de la Gauchetiere Ouest, Montreal (Quebec), H5A 1K6, Canada
2014/03/28  Presenter: 陳思澄

Outline
- Introduction
- Review of PLSA and UBPLSA models
- CPLSA model
- Comparison of UBPLSA & CPLSA models
- Experiments
- Conclusions

Introduction
In this paper, we propose a novel context-based probabilistic latent semantic analysis (PLSA) language model (CPLSA) for speech recognition. We compare our approach with a recently proposed unsmoothed bigram PLSA model in which only the seen bigram probabilities are calculated; as a result, the topic probabilities for the history contexts present in an unseen document are computed incorrectly. In contrast, the proposed model can compute all possible bigram probabilities for every history context seen in training, so it properly computes the topic probability of an unseen document for each history context present in the document.

Review of PLSA and UBPLSA models
PLSA model: First a document d is selected with probability P(d), a topic z is then chosen with probability P(z|d), and finally a word w is generated with probability P(w|z). The probability of a word given a document can then be estimated as shown in the sketch below.
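The equation on the original slide is an image and does not appear in the transcript; the standard PLSA decomposition it describes is the following (notation d, z_k, w as introduced above):

```latex
% Standard PLSA mixture: the word distribution of a document is a
% convex combination of K topic-specific unigram distributions.
P(w \mid d) = \sum_{k=1}^{K} P(w \mid z_k)\, P(z_k \mid d)
```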

UBPLSA model
The bigram PLSA model computes the probability of a word given its bigram history and the document, with the topic probability conditioned on the document only. The updated bigram PLSA (UBPLSA) model instead models the topic probability of the document given a history context, using the word co-occurrences in the document. In this model, the probability of a word w_i given the document d_l and the word history w_{i-1} is computed as shown in the sketch below.
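The formulas on these slides are images and are missing from the transcript. Reconstructed from the description above (a reading of the slide text, not a verbatim copy of the paper's equations), the two factorizations would be:

```latex
% Bigram PLSA: topic weights depend only on the document.
P(w_i \mid w_{i-1}, d_l) = \sum_{k=1}^{K} P(w_i \mid w_{i-1}, z_k)\, P(z_k \mid d_l)

% Updated bigram PLSA (UBPLSA): topic weights also depend on the
% history context w_{i-1}.
P(w_i \mid w_{i-1}, d_l) = \sum_{k=1}^{K} P(w_i \mid w_{i-1}, z_k)\, P(z_k \mid w_{i-1}, d_l)
```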

The EM procedure for training the model alternates between two steps: an E-step that computes the posterior topic probabilities under the current parameters, and an M-step that re-estimates the word and topic probabilities from those posteriors (the update equations on the original slide are images; a sketch for the simpler unigram PLSA case follows).
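As an illustration, here is a minimal NumPy sketch of the EM updates for the standard (unigram) PLSA model; the UBPLSA updates extend these by additionally indexing the word and topic distributions by the history context w_{i-1}. This is a reference sketch, not the paper's code, and the array shapes and names are assumptions:

```python
import numpy as np

def plsa_em(counts, K, iters=50, seed=0):
    """EM for standard PLSA.

    counts: (D, V) matrix of word counts n(d, w).
    Returns P(w|z) of shape (K, V) and P(z|d) of shape (D, K).
    """
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    p_w_z = rng.random((K, V)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(iters):
        # E-step: posterior P(z | d, w) for every (d, w) pair.
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # (D, K, V)
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)

        # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
        weighted = counts[:, None, :] * post                    # n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)

    return p_w_z, p_z_d
```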

Proposed CPLSA model
The CPLSA model is similar to the original PLSA model, except that the topic is further conditioned on the history context, as in the UBPLSA model.

Proposed CPLSA model
Using this model, we can compute the bigram probability from the unigram word probabilities of the topics. The bigram probability and the corresponding E-step and M-step update equations appear on the original slide as images; a reconstructed form of the bigram probability follows.
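Reconstructed from the description above (the word depends only on the topic, while the topic depends on both the history context and the document), the CPLSA bigram probability would take the following form; this is inferred from the slide text, not copied from the paper's Equation 8:

```latex
% CPLSA: topic weights are context-dependent, but each topic only
% requires a unigram word distribution P(w_i | z_k).
P(w_i \mid w_{i-1}, d_l) = \sum_{k=1}^{K} P(w_i \mid z_k)\, P(z_k \mid w_{i-1}, d_l)
```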

Proposed CPLSA model
From Equations 8 and 10, we can see that the model can compute all possible bigram probabilities for every history context seen in the training set. The model therefore overcomes the UBPLSA model's problem of computing the topic probabilities of a test document, a problem that in turn corrupts the bigram probabilities assigned to the test document.

COMPARISON OF UBPLSA & CPLSA MODELS
UBPLSA model: The topic weights of an unseen test document cannot be computed properly, because for some history contexts the topic probabilities are assigned zeros; the bigram probabilities in the trained model are not smoothed.
CPLSA model: Solves the problem of finding the topic probabilities for the test set, since the model can assign probabilities to all possible bigrams of every history context seen in training. Moreover, it only needs unigram word probabilities per topic, which greatly reduces the memory requirements and the complexity compared with the UBPLSA model (see the sketch below).
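As a rough back-of-the-envelope illustration of the memory argument, the snippet below counts the topic-conditioned word parameters of both models under assumed values of the vocabulary size V and topic count K (hypothetical numbers, not from the paper):

```python
# Rough parameter-count comparison (illustrative assumptions only).
V = 20_000   # assumed vocabulary size
K = 40       # assumed number of topics

# UBPLSA stores bigram word distributions per topic: P(w_i | w_{i-1}, z_k).
ubplsa_word_params = V * V * K

# CPLSA stores unigram word distributions per topic: P(w_i | z_k).
cplsa_word_params = V * K

print(f"UBPLSA word parameters: {ubplsa_word_params:,}")   # 16,000,000,000
print(f"CPLSA  word parameters: {cplsa_word_params:,}")    # 800,000
print(f"Reduction factor: {ubplsa_word_params // cplsa_word_params:,}x")  # 20,000x
```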

EXPERIMENTS
Experimental setup: We randomly selected 500 documents from the '87-89 WSJ corpus for training the UBPLSA and CPLSA models; the documents contain 224,995 words in total. We tested the proposed approach for various topic sizes. Each experiment was performed five times and the results were averaged.

EXPERIMENTS
The perplexity results are reported in Table 1. We can see that perplexity decreases as the number of topics increases. The proposed CPLSA model outperforms both the PLSA and UBPLSA models, and both the UBPLSA and CPLSA models outperform the PLSA model significantly.
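For reference, perplexity here refers to the standard language-model perplexity over the N test words (the slide does not show its exact adaptation-specific form), where h_i is the history of w_i:

```latex
% Standard LM perplexity over a test set w_1, ..., w_N.
\mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid h_i) \right)
```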

We performed a paired t-test on the perplexity results of the UBPLSA and CPLSA models and of their interpolated forms. The p-values for different topic sizes are reported in Table 2. All p-values are less than the chosen significance level; therefore, the perplexity improvements of the CPLSA model over the UBPLSA model are statistically significant.
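A paired t-test of this kind can be run with SciPy as sketched below; the perplexity lists are placeholders standing in for the per-run measurements (they are not the paper's numbers):

```python
from scipy.stats import ttest_rel

# Placeholder per-run perplexities for the two models (hypothetical values,
# one entry per experimental run; replace with the actual measurements).
ppl_ubplsa = [310.2, 305.8, 312.4, 308.1, 306.9]
ppl_cplsa  = [289.5, 284.1, 291.0, 286.7, 285.3]

# Paired test: the runs are matched (same data split for both models).
statistic, p_value = ttest_rel(ppl_ubplsa, ppl_cplsa)
print(f"t = {statistic:.3f}, p = {p_value:.4f}")
```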

In the first pass, we used the back-off trigram background language model for lattice generation. In the second pass, we applied the interpolated form of each LM adaptation approach for lattice rescoring (a sketch of the interpolation follows the table).

WER results for different topic sizes:

Topic size      | Background | Background+PLSA | Background+UBPLSA | Background+CPLSA
Topic 20 (WER)  | 8.11%      | 7.84%           | 7.62%             | 7.41%
Topic 40 (WER)  | 8.11%      | 7.84%           | 7.66%             | 7.34%
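The interpolation used for rescoring is presumably the usual linear mixture of the background trigram and the adapted topic model; a minimal sketch, where the weight lam and the function names are assumptions rather than the paper's notation:

```python
import math

def interpolated_logprob(word, history, background_lm, adapted_lm, lam=0.5):
    """Linearly interpolate a background n-gram LM with an adapted topic LM.

    background_lm(word, history) and adapted_lm(word, history) are assumed
    to return probabilities P(word | history); lam is the interpolation weight.
    """
    p = lam * background_lm(word, history) + (1.0 - lam) * adapted_lm(word, history)
    return math.log(max(p, 1e-12))  # log-probability used when rescoring lattice arcs
```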

We also performed a paired t-test on the WER results for the interpolated forms of the UBPLSA and CPLSA models. The p-values of the test are reported in Table 3. The p-values are smaller than the chosen significance level; therefore, the WER improvements are statistically significant.

Conclusions
In this paper, we proposed a new context-based PLSA model in which, relative to the original PLSA model, the topic is further conditioned on the history context. The proposed model provides a way to compute all possible bigram probabilities for every history context seen in the training set, which allows the topic weights of unseen test documents to be found correctly and thereby yields the correct bigram probabilities for the test documents. Moreover, the proposed approach reduces complexity and memory requirements compared with the UBPLSA approach, since it uses unigram rather than bigram word probabilities per topic.