Discriminative Syntactic Language Modeling for Speech Recognition. Michael Collins, Brian Roark, Murat Saraclar. MIT CSAIL, OGI/OHSU, Bogazici University.


Reporter: 郝柏翰, 2013/02/26. Paper from the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005).

Outline
1) Introduction
2) Parse Tree Features
3) Experiments
4) Conclusion

Introduction
1) SLM (syntactic language modeling): an overview
2) p-value: statistical significance

Introduction
Word n-gram models have been extremely successful as language models (LMs) for speech recognition. However, modeling long-span dependencies can help the language model better predict words. This paper describes a method for incorporating syntactic features into the language model in a reranking approach, using discriminative parameter estimation techniques.

Introduction
Our approach differs from previous work in two important respects.
1) Through the feature-vector representation we can incorporate essentially arbitrary sources of information from the string or parse tree into the model. We would argue that our method allows considerably more flexibility in the choice of features.
2) The second contrast between our work and previous work is the use of discriminative parameter estimation techniques.
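The feature-vector view can be sketched as a sparse map from a (string, parse) pair to counts, combined linearly with the baseline recognizer's log-probability. A minimal illustration (the function names, feature-string format, and `alpha` scaling parameter are ours, not the authors' code):

```python
def linear_score(log_p_baseline, features, weights, alpha=1.0):
    """Score one candidate transcription under a global linear model:
    alpha * (baseline log-probability) + <weights, feature vector>.
    features is a sparse dict of counts, e.g. {"POS:PRP_VBD": 1}.
    """
    return alpha * log_p_baseline + sum(
        weights.get(f, 0.0) * count for f, count in features.items())


def best_candidate(candidates, weights, alpha=1.0):
    """Rerank an n-best list; each candidate is (log_p_baseline, features)."""
    return max(candidates,
               key=lambda c: linear_score(c[0], c[1], weights, alpha))
```

Because the features are an arbitrary sparse dict, any property of the string or parse tree can be dropped in without changing the scoring code, which is the flexibility the slide argues for.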

Parameter Estimation
We report results using just the perceptron algorithm. This has allowed us to explore more of the potential feature space than we could have using the more costly global conditional log-linear model (GCLM) estimation techniques.
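The perceptron for n-best reranking can be sketched as follows: whenever the highest-scoring candidate differs from the gold (lowest-error) candidate, add the gold candidate's features to the weights and subtract the predicted candidate's. The data layout and names below are illustrative, not the authors' implementation:

```python
def perceptron_epoch(examples, weights, alpha=1.0):
    """One training pass of the reranking perceptron (sketch).

    examples: list of (candidates, gold_index); each candidate is a pair
    (log_p_baseline, sparse_feature_dict). weights is updated in place.
    Returns the number of update steps made this pass.
    """
    def score(cand):
        log_p, feats = cand
        return alpha * log_p + sum(
            weights.get(f, 0.0) * v for f, v in feats.items())

    mistakes = 0
    for candidates, gold in examples:
        pred = max(range(len(candidates)), key=lambda i: score(candidates[i]))
        if pred != gold:
            mistakes += 1
            # Promote the gold candidate's features, demote the prediction's.
            for f, v in candidates[gold][1].items():
                weights[f] = weights.get(f, 0.0) + v
            for f, v in candidates[pred][1].items():
                weights[f] = weights.get(f, 0.0) - v
    return mistakes
```

Each update touches only the features active in two candidates, which is why the perceptron is cheap enough to explore a large feature space.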

Parse Tree Features
Figure 2 shows a Penn Treebank style parse tree of the sort produced by the parser.

Parse Tree Features
Sequences derived from a parse tree:
1) POS-tag sequence: we/PRP helped/VBD her/PRP paint/VB the/DT house/NN
2) Shallow parse tag sequence: we/NPb helped/VPb her/NPb paint/VPb the/NPb house/NPc
3) Shallow parse tag plus POS tag: we/PRP-NPb helped/VBD-VPb her/PRP-NPb paint/VB-VPb the/DT-NPb house/NN-NPc
4) Shallow category with lexical head sequence: we/NP helped/VP her/NP paint/VP house/NP
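The four sequences can be derived mechanically from per-token (word, POS, chunk-tag) triples. A hedged sketch: the chunk-tag convention ("NPb" begins a chunk, "NPc" continues one) is taken from the slide, but the lexical-head sequence is approximated here by the chunk-final word, which happens to suffice for this example; a real system would use head-finding rules on the full parse.

```python
def tag_sequences(tokens):
    """Derive the slide's four sequences from (word, POS, chunk-tag) triples."""
    pos_seq = ["%s/%s" % (w, p) for w, p, _ in tokens]
    chunk_seq = ["%s/%s" % (w, c) for w, _, c in tokens]
    both_seq = ["%s/%s-%s" % (w, p, c) for w, p, c in tokens]
    heads = []
    for i, (w, _, c) in enumerate(tokens):
        # A token is chunk-final if it is last, or the next tag begins a chunk.
        chunk_final = i + 1 == len(tokens) or tokens[i + 1][2].endswith("b")
        if chunk_final:
            heads.append("%s/%s" % (w, c[:-1]))  # strip the b/c position flag
    return pos_seq, chunk_seq, both_seq, heads
```

Running this on the slide's example sentence reproduces all four sequences shown above.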

Context-free rule feature legend:
P : parent node
HC : head child
C : non-head child
+ : right  - : left
1 : adjacent  2 : non-adjacent
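The legend describes features over context-free rules: the parent category, its head child, and each non-head child annotated with its side (+/-) and adjacency (1/2) relative to the head. A hedged sketch of how such feature strings might be composed; the exact string format and function name are ours:

```python
def rule_features(parent, head_child, siblings):
    """Compose feature strings for one context-free rule.

    siblings: list of (label, side, adjacent) for each non-head child,
    e.g. ("NP", "+", True) = an NP to the right of the head, adjacent.
    """
    feats = ["RULE:P=%s,HC=%s" % (parent, head_child)]
    for label, side, adjacent in siblings:
        feats.append("RULE:P=%s,HC=%s,C=%s%s%d" % (
            parent, head_child, label, side, 1 if adjacent else 2))
    return feats
```

For a VP headed by a VBD with an adjacent NP to its right, this yields features like `RULE:P=VP,HC=VBD,C=NP+1`.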

Experiments
The training set consists of transcribed utterances ( words). The oracle score for the 1000-best lists was 16.7%. For perceptron training of the n-gram model, the data was partitioned into 28 sets.

Experiments

Experiments
This yielded 35.2% WER, a reduction of 0.3% absolute over what was achieved with just n-grams (significant at p < 0.001), for a total reduction of 1.2% over the baseline recognizer.
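The quoted deltas imply the following word error rates, derived arithmetically from the slide's numbers rather than copied from the paper's tables:

```python
syntactic_wer = 35.2                  # n-gram + syntactic features
ngram_wer = syntactic_wer + 0.3       # 0.3% absolute gained by syntax
baseline_wer = syntactic_wer + 1.2    # 1.2% total gain over the baseline
```

That is, the baseline recognizer is at roughly 36.4% WER, the n-gram reranker at 35.5%, and the syntactic model at 35.2%.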

Conclusion
The results presented in this paper are a first step in examining the potential utility of syntactic features for discriminative language modeling for speech recognition. The best of these features gave a small but significant improvement beyond what was provided by the n-gram features. Future work will include further investigation of parser-derived features. In addition, we plan to explore alternative parameter estimation methods.