Smoothing Issues in the Structured Language Model

Presentation transcript:

Smoothing Issues in the Structured Language Model

Woosung Kim, Sanjeev Khudanpur, and Jun Wu
The Center for Language and Speech Processing, The Johns Hopkins University
3400 N. Charles Street, Barton Hall, Baltimore, MD 21218
{woosung, sanjeev, junwu}@clsp.jhu.edu

Introduction

The Structured Language Model (SLM)
- An attempt to exploit the syntactic structure of natural language
- Consists of a predictor, a tagger and a parser
- Jointly assigns a probability to a word sequence and its parse structure
- Still suffers from the data sparseness problem; Deleted Interpolation (DI) has been used so far
=> Use Kneser-Ney smoothing to improve performance

The Structured Language Model (SLM)

Example of a partial parse (figure): the sentence "The contract ended with a loss of 7 cents after", part-of-speech tagged DT NN VBD IN DT NN IN CD NNS, with partial constituents headed by contract (NP), loss (NP), cents (NP), of/with (PP) and ended (VP). Probability estimation in the SLM chains predictor, tagger and parser probabilities over the sentence to obtain the parse-tree probability (a sketch of the standard factorization is appended at the end of this transcript).

Kneser-Ney Smoothing

Two variants are applied to the SLM: Kneser-Ney backoff (KN-BO) and nonlinear interpolation (NI); their standard forms are also sketched at the end of this transcript.

Experimental Setup

Two corpora:
- Wall Street Journal (WSJ), with the UPenn Treebank, for LM PPL tests
- Switchboard (SWB), for ASR WER tests as well as LM PPL

Tokenization:
- Original SWB tokenization (examples: They're, It's, etc.) is not suitable for syntactic analysis
- Treebank tokenization is used instead (examples: They 're, It 's, etc.)

Database size specifications (in words):

  Item                     WSJ           SWB
  Word vocabulary          10K (open)    21K (closed)
  Part-of-speech tags      40            49
  Non-terminal tags        54            64
  Parser operations        136           112
  LM development set       885K          2.07M
  LM check set             117K          216K
  LM test set              82K           20K
  ASR test set             -

Experiment Results

N-best rescoring (figure): the speech recognizer, run with the baseline LM, produces a 100-best hypothesis list; the new LM rescores the list and a single best hypothesis is selected (a minimal rescoring sketch is appended at the end of this transcript).

Test-set PPL as a function of the interpolation weight λ (figure).

Language model perplexity (values are 3gram / SLM / 3gram+SLM interpolation, at EM iterations 0 and 3):

  Smoothing                WSJ corpus                 SWB corpus
  Deleted Intpl            162, 166, 154, 149, 146    70, 73, 72, 67, 66
  KN-BO (Predictor)        152, 139, 137              64, 63, 60
  KN-BO (All Modules)      170, 153, 141, 140
  Nonlinear Intpl          132, 131                   65, 61
  NI w/ Deleted Est.       145, 150, 130

ASR WER for SWB (values are 3gram / SLM / 3gram+SLM interpolation):

  Deleted Intpl            39.1%, 38.6%, 38.2%
  KN-BO (Predictor)        38.3%, 37.7%, 37.5%
  KN-BO (All Modules)      37.8%
  Nonlinear Intpl          38.1%, 37.6%
  NI w/ Deleted Est.

Concluding Remarks

- KN smoothing of the SLM shows modest but consistent improvements, in both PPL and WER.

Future work
- SLM with Maximum Entropy models; Maximum Entropy model training, however, requires heavy computation.
- The selection of features for the Maximum Entropy models is expected to yield fruitful results.

This research was partially supported by the U.S. National Science Foundation via STIMULATE grant No. 9618874.
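
Appendix: sketches referenced above

The poster's "Probability estimation in the SLM" panel is a figure whose equation does not survive in this transcript. For reference, the following is a sketch of the standard SLM factorization introduced by Chelba and Jelinek; the notation (exposed heads h_0 and h_{-1}, parser operations p_i^k) is assumed here rather than copied from the poster.

  P(W, T) = \prod_{k=1}^{n+1} \Big[
      \underbrace{P(w_k \mid h_0, h_{-1})}_{\text{predictor}} \;
      \underbrace{P(t_k \mid w_k,\, h_0.\mathrm{tag},\, h_{-1}.\mathrm{tag})}_{\text{tagger}} \;
      \underbrace{\textstyle\prod_{i=1}^{N_k} P(p_i^k \mid h_0, h_{-1})}_{\text{parser}}
  \Big]

where h_0 and h_{-1} are the two most recently exposed head words of the partial parse, t_k is the POS tag of w_k, and p_1^k, ..., p_{N_k}^k are the parser operations performed after w_k is predicted. The word-level probability used for perplexity and rescoring is obtained by summing P(W, T) over the parses T retained in the SLM's beam.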
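The "Kneser-Ney Smoothing" panel likewise names only the two variants (backoff and nonlinear interpolation); the formulas themselves are images in the original poster. The standard forms, written here for a generic conditional distribution P(w | h) with history h, absolute discount D and shortened history h' (the poster applies the same scheme to the predictor, tagger and parser components), are assumed to be:

  Backoff (KN-BO):
  P_{KN}(w \mid h) =
    \begin{cases}
      \dfrac{\max\{c(h,w) - D,\, 0\}}{c(h)} & \text{if } c(h,w) > 0,\\[1ex]
      \gamma(h)\, P_{KN}(w \mid h') & \text{otherwise,}
    \end{cases}

  Nonlinear interpolation (NI):
  P_{KN}(w \mid h) = \frac{\max\{c(h,w) - D,\, 0\}}{c(h)}
      + \frac{D\, N_{1+}(h\,\cdot)}{c(h)}\, P_{KN}(w \mid h'),

where N_{1+}(h ·) is the number of distinct events observed after history h, γ(h) normalizes the backoff distribution, and the lower-order distribution P_{KN}(· | h') is estimated from continuation counts rather than raw counts, as usual for Kneser-Ney.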
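Finally, a minimal sketch of the 100-best rescoring setup from the "N-Best Rescoring" panel. The data layout, function name and the log-linear score combination below are illustrative assumptions; the poster only states that the baseline recognizer produces 100-best lists which the new LM rescores down to a single hypothesis.

  # Minimal 100-best rescoring sketch (illustrative; not the poster's actual code).
  # Each hypothesis carries an acoustic log-score from the baseline recognizer and
  # a log-probability from the new language model (e.g. the KN-smoothed SLM
  # interpolated with the trigram).

  def rescore_nbest(nbest, lm_scale=10.0, word_penalty=0.0):
      """Pick the best hypothesis under a log-linear combination of scores.

      nbest: list of (words, acoustic_logscore, new_lm_logprob) tuples.
      """
      def total_score(hyp):
          words, acoustic, lm = hyp
          # acoustic score + scaled LM score - per-word insertion penalty
          return acoustic + lm_scale * lm - word_penalty * len(words)

      return max(nbest, key=total_score)

  # Example: three toy hypotheses for one utterance.
  hyps = [
      (["the", "contract", "ended"], -120.0, -9.2),
      (["a", "contract", "ended"], -119.5, -10.1),
      (["the", "contracts", "ended"], -121.0, -9.0),
  ]
  best_words, _, _ = rescore_nbest(hyps)
  print(" ".join(best_words))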