Jun Wu and Sanjeev Khudanpur, Center for Language and Speech Processing

Presentation transcript:

Combining Non-local, Syntactic and N-gram Dependencies in Language Modeling
Jun Wu and Sanjeev Khudanpur
Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218
September 9, 1999
NSF STIMULATE Grant No. IRI-9618874

Motivation
- Example: "Analysts and financial officials in the former British colony consider the contract essential to the revival of the Hong Kong futures exchange."
- N-gram models take only local correlations between words into account.
- Natural language also contains dependencies with longer, sentence-structure-dependent spans; modeling them may compensate for this deficiency.
- We therefore need a model that exploits both topic and syntax.

Training a Topic-Sensitive Model
- Cluster the training data by topic: TF-IDF document vectors (excluding stop words), cosine similarity, K-means clustering.
- Select the topic-dependent words of each topic t:
  $f_t(w) \log \frac{f_t(w)}{f(w)} > \text{threshold}$
- Estimate an ME model with topic unigram constraints:
  $P(w_i \mid w_{i-2}, w_{i-1}, \text{topic}) = \frac{e^{\lambda(w_i)}\, e^{\lambda(w_{i-1}, w_i)}\, e^{\lambda(w_{i-2}, w_{i-1}, w_i)}\, e^{\lambda(\text{topic}, w_i)}}{Z(w_{i-2}, w_{i-1}, \text{topic})},$
  where the topic-unigram features are constrained so that the model's marginal $\sum_{w_{i-2},\, w_{i-1}} P(w_{i-2}, w_{i-1}, \text{topic}, w_i)$ matches the empirical relative frequency of the count $\#[\text{topic}, w_i]$.
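The clustering and word-selection steps can be sketched in code. This is a minimal illustration, not the authors' implementation: the scikit-learn components, the unit-normalization trick to make Euclidean K-means behave like cosine clustering, and the `num_topics` and `threshold` parameters are assumptions made here for concreteness.

```python
# Sketch of topic clustering (TF-IDF + cosine K-means) and topic-word selection
# by the f_t(w) * log(f_t(w)/f(w)) > threshold criterion. Illustrative only.
import math
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize


def cluster_by_topic(conversations, stop_words, num_topics):
    """conversations: one training conversation per string."""
    vectorizer = TfidfVectorizer(stop_words=list(stop_words))
    X = normalize(vectorizer.fit_transform(conversations))  # unit rows => cosine K-means
    labels = KMeans(n_clusters=num_topics, n_init=10).fit_predict(X)
    return vectorizer, labels


def topic_dependent_words(tokenized_convs, labels, threshold):
    """Keep word w for topic t when f_t(w) * log(f_t(w) / f(w)) > threshold."""
    global_counts, topic_counts = Counter(), {}
    for tokens, t in zip(tokenized_convs, labels):
        global_counts.update(tokens)
        topic_counts.setdefault(t, Counter()).update(tokens)
    n_total = sum(global_counts.values())
    selected = {}
    for t, counts in topic_counts.items():
        n_t = sum(counts.values())
        selected[t] = {
            w for w, c in counts.items()
            if (c / n_t) * math.log((c / n_t) / (global_counts[w] / n_total)) > threshold
        }
    return selected
```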

Recognition Using a Topic-Sensitive Model
- Detect the current topic from the recognizer's N-best hypotheses. Using N-best hypotheses (rather than reference transcripts) causes little degradation in perplexity and WER.
- Assign a new topic to each utterance: per-utterance topic assignment works better than assigning one topic to the whole conversation.
- Rescore recognition lattices using the topic-sensitive model.
- See Khudanpur and Wu (ICASSP'99) for details.
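A hedged sketch of the per-utterance topic detection step: pool the N-best hypotheses into one pseudo-document, project it with the TF-IDF vectorizer fitted on training data, and pick the closest topic centroid by cosine similarity. The centroid matrix and the fallback when only stop words are recognized are illustrative assumptions, not details from the talk.

```python
import numpy as np


def assign_topic(nbest_hypotheses, vectorizer, topic_centroids):
    """nbest_hypotheses: list of hypothesis strings for one utterance.
    topic_centroids: (num_topics, vocab_size) array with unit-length rows."""
    v = vectorizer.transform([" ".join(nbest_hypotheses)]).toarray().ravel()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return None  # only stop words recognized: fall back to a topic-independent model
    return int(np.argmax(topic_centroids @ (v / norm)))
```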

Exploiting Syntactic Dependencies
[Figure: partial parse of "The contract ended with a loss of 7 cents after ...", with POS tags (DT NN VBD IN CD NNS ...) and exposed head words h_{i-2} = "contract" (NP) and h_{i-1} = "ended" (VP) for predicting the word w_i = "after".]
- All sentences in the training set are parsed by a left-to-right parser.
- For each sentence prefix, a stack S_i of partial parse trees T_i is generated.
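A toy illustration (not the actual left-to-right parser) of the conditioning information a partial parse exposes when predicting the next word: the two preceding words and the two preceding head words. The PartialParse container and its field names are invented here; the sentence and heads follow the slide's figure.

```python
from dataclasses import dataclass


@dataclass
class PartialParse:
    words: list           # words of the sentence prefix W_{i-1}
    exposed_heads: list   # head words of completed constituents, left to right
    weight: float         # rho(T_i | W_{i-1}), the parser's weight for this parse


def conditioning_context(parse):
    w2, w1 = parse.words[-2], parse.words[-1]
    h2, h1 = parse.exposed_heads[-2], parse.exposed_heads[-1]
    return w2, w1, h2, h1


# Just before predicting "after" in the slide's example, the exposed heads are
# "contract" (NP) and "ended" (VP), while the trigram context is only ("7", "cents").
parse = PartialParse(words="The contract ended with a loss of 7 cents".split(),
                     exposed_heads=["contract", "ended"], weight=1.0)
print(conditioning_context(parse))   # ('7', 'cents', 'contract', 'ended')
```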

Exploiting Syntactic Dependencies (Cont.)
- A probability is assigned to each word as
  $P(w_i \mid W_{i-1}) = \sum_{T_i \in S_i} P(w_i \mid w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}) \cdot \rho(T_i \mid W_{i-1}),$
  where $S_i$ is the stack of partial parses $T_i$ of the prefix $W_{i-1}$ and $\rho(T_i \mid W_{i-1})$ is the parser's weight for $T_i$.
- It is assumed that most of the useful information is carried by the two preceding words and the two preceding head words.
- See Chelba and Jelinek (Eurospeech'99) for details.
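The probability assignment above is just a weighted sum over the parse stack. A minimal sketch, reusing the hypothetical PartialParse container from the previous snippet and assuming an already-trained word-probability function `me_word_prob`:

```python
def word_probability(word, parse_stack, me_word_prob):
    """parse_stack: list of PartialParse objects whose weights rho sum to 1."""
    total = 0.0
    for parse in parse_stack:
        w2, w1, h2, h1 = conditioning_context(parse)
        total += me_word_prob(word, w2, w1, h2, h1) * parse.weight
    return total
```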

Training a Syntactic ME Model
- Estimate an ME model with syntactic (head-word) constraints:
  $P(w_i \mid w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}) = \frac{e^{\lambda(w_i)}\, e^{\lambda(w_{i-1}, w_i)}\, e^{\lambda(w_{i-2}, w_{i-1}, w_i)}\, e^{\lambda(h_{i-1}, w_i)}\, e^{\lambda(h_{i-2}, h_{i-1}, w_i)}}{Z(w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1})},$
  where the head-word features are constrained so that the model's marginals match the empirical counts $\#[h_{i-1}, w_i]$ and $\#[h_{i-2}, h_{i-1}, w_i]$.
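The log-linear form above can be evaluated directly: sum the weights of the features that fire, exponentiate, and normalize over the vocabulary. A sketch under stated assumptions; the feature-key tags ("u", "b", "t", "h1", "h2"), the weight dictionary `lam`, and the brute-force computation of Z are illustrative choices, not the original implementation.

```python
import math


def syntactic_me_prob(w, w2, w1, h2, h1, lam, vocab):
    def unnorm(word):
        # n-gram features plus head-word bigram/trigram features
        feats = [("u", word), ("b", w1, word), ("t", w2, w1, word),
                 ("h1", h1, word), ("h2", h2, h1, word)]
        return math.exp(sum(lam.get(f, 0.0) for f in feats))
    z = sum(unnorm(v) for v in vocab)   # Z(w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1})
    return unnorm(w) / z
```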

Combining Topic, Syntactic and N-gram Dependencies in an ME Framework
- Probabilities are assigned as
  $P(w_i \mid W_{i-1}, \text{topic}) = \sum_{T_i \in S_i} P(w_i \mid w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}, \text{topic}) \cdot \rho(T_i \mid W_{i-1}).$
- The composite ME model is trained as
  $P(w_i \mid w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}, \text{topic}) = \frac{e^{\lambda(w_i)}\, e^{\lambda(w_{i-1}, w_i)}\, e^{\lambda(w_{i-2}, w_{i-1}, w_i)}\, e^{\lambda(h_{i-1}, w_i)}\, e^{\lambda(h_{i-2}, h_{i-1}, w_i)}\, e^{\lambda(\text{topic}, w_i)}}{Z(w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}, \text{topic})}.$
- Only marginal counts are required to constrain the model.
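The composite model differs from the syntactic sketch above only by the additional topic-unigram feature and the correspondingly larger normalizer; the same illustrative assumptions apply.

```python
import math


def composite_me_prob(w, w2, w1, h2, h1, topic, lam, vocab):
    def unnorm(word):
        feats = [("u", word), ("b", w1, word), ("t", w2, w1, word),
                 ("h1", h1, word), ("h2", h2, h1, word), ("topic", topic, word)]
        return math.exp(sum(lam.get(f, 0.0) for f in feats))
    z = sum(unnorm(v) for v in vocab)   # Z(w_{i-2}, w_{i-1}, h_{i-2}, h_{i-1}, topic)
    return unnorm(w) / z
```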

Experimental Setup (Switchboard)
- American English speakers; conversational (human-to-human) telephone speech.
- 22K-word vocabulary.
- 2-hour test set (18K words).
- State-of-the-art speaker-independent systems: 30-35% WER.
- Results presented here do not use speaker adaptation.

Experimental Results
- Baseline trigram WER is 38.5%.
- Topic-dependent constraints alone reduce perplexity by 7% and WER by 0.7% absolute.
- Head-word constraints alone reduce perplexity by 7% and WER by 0.8% absolute.
- Topic-dependent and syntactic constraints together reduce perplexity by 12% and WER by 1.3% absolute.
- The gains from topic and syntactic dependencies are nearly additive.
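For readers comparing numbers across papers, the absolute WER reductions quoted above can be converted into relative reductions over the 38.5% trigram baseline; a quick arithmetic check:

```python
baseline = 38.5
for name, absolute in [("topic", 0.7), ("syntactic", 0.8), ("composite", 1.3)]:
    print(f"{name}: {baseline - absolute:.1f}% WER, "
          f"{100 * absolute / baseline:.1f}% relative reduction")
# topic: 37.8% WER, 1.8% relative reduction
# syntactic: 37.7% WER, 2.1% relative reduction
# composite: 37.2% WER, 3.4% relative reduction
```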

Content Words vs. Stop Words
- About 1/5 of test tokens are content-bearing words.
- The topic-sensitive model reduces WER on content words by 1.4% absolute, twice the overall improvement (0.7%).
- The syntactic model improves WER on content words and stop words about evenly.
- The composite model combines the advantages of both and reduces WER on content words by 1.8% absolute.

Head Words Inside vs. Outside Trigram Range
- The baseline trigram model's WER is relatively high when the head words lie beyond trigram range.
- The topic model helps where the trigram context is inadequate.
- When head words are outside trigram range, the syntactic model's WER reduction (1.5% absolute) exceeds its overall reduction (0.8%).
- When head words are outside trigram range, the composite model's WER reduction (2.3% absolute) exceeds its overall reduction (1.3%).

Further Insight into the Performance
- The composite model reduces the WER of content words by 3.5% absolute when the syntactic predictors (head words) are beyond trigram range.

Concluding Remarks
- The topic LM reduces perplexity by 7% and WER by 0.7% absolute.
- The syntactic LM reduces perplexity by 7% and WER by 0.8% absolute.
- The composite LM reduces perplexity by 12% and WER by 1.3% absolute.
- The non-local dependencies are complementary, and their gains are almost additive.
- WER on content words drops by 1.8% absolute, most of it due to topic dependence.
- WER on head words beyond trigram range drops by 2.3% absolute, most of it due to syntactic dependence.

Ongoing and Future Work
- Further improve the model by using non-terminal labels in the partial parse.
- Apply this model to lattice rescoring.
- Apply this method to other tasks (Broadcast News).