Relevance Language Modeling for Speech Recognition
Kuan-Yu Chen and Berlin Chen
National Taiwan Normal University, Taipei, Taiwan
ICASSP 2011
Reporter: 陳思澄, 2014/1/17

Outline
- Introduction
- Basic Relevance Model (RM)
- Topic-based Relevance Model
- Modeling Pairwise Word Association
- Experiments
- Conclusion

Introduction
In relevance modeling (RM) for IR, each query Q is assumed to be associated with an unknown relevance class R_Q, and documents that are relevant to the information need expressed in the query are samples drawn from R_Q. When RM is applied to language modeling in speech recognition, we can conceptually regard the search history H as a query and each of its immediately succeeding words w as a document, and estimate a relevance model for modeling the relationship between H and w.
[Figure: the relationship among the query, its relevance class, and the relevant documents]

Basic Relevance Model
The task of language modeling in speech recognition can be interpreted as calculating the conditional probability P(w|H), where H = w_1, w_2, ..., w_{L-1} is a search history, usually expressed as a word sequence, and w is one of its possible immediately succeeding words. Because the relevance class R_H of each search history is not known in advance, a local feedback-like procedure can be used to obtain a set of relevant documents D_H to estimate the joint probability P(H, w).

Basic Relevance Model
The joint probability of observing H together with w is:

    P(H, w) = Σ_{D ∈ D_H} P(D) P(H, w | D)

where P(D) is the probability that we would randomly select D, and P(H, w | D) is the joint probability of simultaneously observing H and w in D.
Bag-of-words assumption: the words are conditionally independent given D, and their order is of no importance:

    P(H, w | D) = P(w | D) Π_{l=1}^{L-1} P(w_l | D)

Basic Relevance Model
The conditional probability is then obtained by normalization:

    P_RM(w | H) = P(H, w) / Σ_{w'} P(H, w')

The background n-gram language model, trained on a large general corpus, can provide the generic constraint information of lexical regularities, and the relevance model can be combined (e.g., linearly interpolated) with it for rescoring.
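As a rough illustration of the estimation above, here is a minimal Python sketch, assuming uniform document priors P(D) and a simple smoothing floor for unseen words; all function and variable names are illustrative, not from the paper:

```python
from collections import Counter

def doc_unigram(doc_tokens):
    """Maximum-likelihood unigram model P(w|D) for one document."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def rm_joint(history, word, retrieved_docs, floor=1e-6):
    """P(H, w) = sum_D P(D) * P(w|D) * prod_l P(w_l|D), uniform P(D)."""
    p_doc = 1.0 / len(retrieved_docs)
    total = 0.0
    for doc in retrieved_docs:
        model = doc_unigram(doc)
        joint = model.get(word, floor)
        for w_l in history:
            joint *= model.get(w_l, floor)
        total += p_doc * joint
    return total

def rm_conditional(history, word, retrieved_docs, vocab):
    """P_RM(w|H) = P(H, w) / sum_{w'} P(H, w')."""
    denom = sum(rm_joint(history, v, retrieved_docs) for v in vocab)
    return rm_joint(history, word, retrieved_docs) / denom
```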

Topic-based Relevance Model
TRM takes a step forward by incorporating latent topic information into RM modeling. The relevant documents of each search history are assumed to share the same set of latent topic variables describing the “word-document” co-occurrence characteristics.

Topic-based Relevance Model
TRM can be represented by:

    P_TRM(H, w) = Σ_{D ∈ D_H} P(D) Σ_{k=1}^{K} P(T_k | D) P(w | T_k) Π_{l=1}^{L-1} P(w_l | T_k)

(The words of the document are all assumed to come from the same topic.)
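A corresponding sketch, assuming topic mixtures P(T_k|D) and topic unigrams P(w|T_k) have already been trained (e.g., by a PLSA-style procedure); the data-structure choices here are assumptions for illustration:

```python
def trm_joint(history, word, retrieved_docs, doc_topic, topic_word, floor=1e-6):
    """P_TRM(H, w): one topic is shared by all words of a document.

    doc_topic[d]  -- list of P(T_k | D=d) over K topics
    topic_word[k] -- dict giving P(w | T_k)
    """
    p_doc = 1.0 / len(retrieved_docs)
    total = 0.0
    for d in retrieved_docs:
        per_doc = 0.0
        for k, p_topic in enumerate(doc_topic[d]):
            joint = topic_word[k].get(word, floor)
            for w_l in history:
                joint *= topic_word[k].get(w_l, floor)
            per_doc += p_topic * joint
        total += p_doc * per_doc
    return total
```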

Modeling Pairwise Word Association
Instead of using RM to model the association between an entire search history and a newly decoded word, we can also use RM to render the pairwise word association between each word in the history and the newly decoded word.

Modeling Pairwise Word Association
A “composite” conditional probability for the search history H to predict w can be obtained by linearly combining the pairwise probabilities of all words in the history:

    P_PRM(w | H) = Σ_{l=1}^{L-1} α_l P_RM(w | w_l)

where the values of the nonnegative weighting coefficients α_l are empirically set to be exponentially decayed.
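One plausible reading of the exponentially decayed weights, sketched in Python; the decay rate and the direction of decay (favoring history words nearer the prediction point) are assumptions, not taken from the paper:

```python
def prm_conditional(history, word, pairwise_rm, decay=0.5):
    """P_PRM(w|H) = sum_l alpha_l * P_RM(w | w_l).

    pairwise_rm(w_l, w) -- a pairwise relevance model estimate P_RM(w|w_l)
    Weights decay exponentially with the distance of w_l from the
    prediction point and are normalized to sum to one.
    """
    raw = [decay ** (len(history) - 1 - l) for l in range(len(history))]
    norm = sum(raw)
    return sum((a / norm) * pairwise_rm(w_l, word)
               for a, w_l in zip(raw, history))
```

For example, with decay=0.5 and a three-word history, the normalized weights come out to about 0.14, 0.29, and 0.57, so the most recent history word dominates.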

Modeling Pairwise Word Association
By the same token, a set of latent topics can be used to describe the word-word co-occurrence relationships in a relevant document, and the pairwise word association between a history word w_l and the decoded word w is thus modeled by:

    P_TPRM(w_l, w) = Σ_{D ∈ D_H} P(D) Σ_{k=1}^{K} P(T_k | D) P(w | T_k) P(w_l | T_k)
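Under the sketches above, this is just the topic-based joint restricted to a single history word; a self-contained version with the same assumed data structures:

```python
def tprm_pair(w_l, word, retrieved_docs, doc_topic, topic_word, floor=1e-6):
    """P_TPRM(w_l, w) = sum_D P(D) sum_k P(T_k|D) P(w|T_k) P(w_l|T_k);
    equivalent to trm_joint with the one-word history [w_l]."""
    p_doc = 1.0 / len(retrieved_docs)
    return p_doc * sum(
        p_topic * topic_word[k].get(word, floor) * topic_word[k].get(w_l, floor)
        for d in retrieved_docs
        for k, p_topic in enumerate(doc_topic[d]))
```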

Experimental Setup
- Speech corpus: 196 hours (MATBN)
- Vocabulary size: 72 thousand words
- A trigram language model was estimated from a background text corpus consisting of 170 million Chinese characters.
- The baseline rescoring procedure with the background trigram language model results in a character error rate (CER) of 20.08% on the test set.

Experiments
1. We assess the effectiveness of RM and PRM with respect to different numbers of retrieved documents being used to approximate the relevance class.
2. We measure the goodness of RM and PRM when a set of latent topics is additionally employed to describe the word-word co-occurrence relationships in a relevant document; the resulting models are TRM and TPRM.
3. We compare the proposed methods with several well-practiced language model adaptation methods.
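For reference, CER is conventionally the Levenshtein edit distance between the reference and recognized character strings divided by the reference length; a minimal sketch (standard metric, not code from the paper):

```python
def cer(reference, hypothesis):
    """Character error rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i          # all deletions
    for j in range(n + 1):
        dist[0][j] = j          # all insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[m][n] / m
```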

Experiments
[Table: Results of RM and PRM (in CER (%)) for different numbers of retrieved documents]
This reveals that only a small subset of relevant documents retrieved from the contemporaneous corpus is sufficient for dynamic language model adaptation. PRM shows its superiority over RM for almost all adaptation settings.

Experiments
[Table: Results of TRM and TPRM (in CER (%)) for different numbers of topics, with uniform priors and Dirichlet priors]
Simply assuming that the model parameters are uniformly distributed tends to perform slightly worse than the Dirichlet prior assumption at its best setting.

Experiments
[Table: Results of PLSA, LDA, WTM, and WVM (in CER (%)) for different numbers of topics]
These results are at the same performance level as that obtained by TPRM. On the other hand, TBLM has its best CER of 19.32%, for which the corresponding number of trigger pairs was determined using the development set. Our proposed methods seem to be good surrogates for the existing language model adaptation methods in terms of CER reduction.

Conclusion
We have studied a novel use of relevance information for dynamic language model adaptation in speech recognition. Our methods not only inherit the merits of several existing techniques but also provide a flexible and systematic way to render the lexical and topical relationships between a search history and an upcoming word. Empirical results on large vocabulary continuous speech recognition seem to demonstrate the utility of the presented models. These methods can also be used to expand query models for spoken document retrieval (SDR) tasks.