Experiments in Adaptive Language Modeling
Lidia Mangu & Geoffrey Zweig

Motivation
- Multi-domain recognition
- IBM Superhuman Recognition Program
  - Switchboard / Fisher
  - Voicemail
  - Call Center
  - ICSI Meetings
- One-size LM may not fit all
  - Even a gigantic LM

Lots of Past Work
- Kneser & Steinbiss '93, "On the Dynamic Adaptation of Stochastic Language Modeling": tune mixing weights to suit the particular text
- Chen, Gauvain, Lamel, Adda & Adda '01, "Language Model Adaptation for Broadcast News Transcription": build and add new LMs from relevant training data
- Florian & Yarowsky '99: hierarchical LMs
- Gao, Li & Lee '00: upweight training counts whose frequency is similar to that in the test data
- Seymore & Rosenfeld '97: interpolate topic LMs
- Bacchiani & Roark '03: MAP adaptation for voicemail
- Many others

Plan of Attack
- No adaptation: the Superhuman LM, an 8-way interpolated LM built from multiple domains
- Baseline adaptation: adjust the interpolation weights per conversation (sketched below)
- Extended adaptation: build a new LM from relevant training data
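The slides show no code; as a minimal sketch of the N-way linear interpolation behind the Superhuman LM (the function name and the component-LM interface are illustrative assumptions, not from the slides):

```python
def interpolated_prob(word, history, components, weights):
    """Linearly interpolated LM: P(w|h) = sum_k lambda_k * P_k(w|h),
    with the lambdas summing to 1. `components` is a list of functions
    mapping (word, history) to a probability under one atomic LM."""
    return sum(lam * p(word, history) for lam, p in zip(weights, components))
```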

Description of Atomic LMs
- SWB + CallHome: 3.4M words, 1.4M 3-grams
- Broadcast News: 148M words, 38M 3-grams
- Financial Call Centers: 655K words, 303K 3-grams
- UW Web data (conversational-like): 192M words, 48M 3-grams
- SWB Cellular: 244K words, 134K 3-grams
- UW Web data (meeting-like): 28M words, 12M 3-grams
- UW Newsgroup data: 102M words, 34M 3-grams
- Voicemail: 1.1M words, 551K 3-grams
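One common way to judge which atomic corpora are "relevant" to a test conversation (an assumption for illustration; the slides do not specify the selection criterion) is to score adaptation text with each component LM and rank by perplexity. A sketch, using the same illustrative component interface as above:

```python
import math

def perplexity(pairs, lm_prob):
    """Perplexity of one component LM on adaptation text, where `pairs`
    is a list of (history, word) tuples and `lm_prob(word, history)`
    returns that LM's conditional probability (names are illustrative)."""
    log_prob = sum(math.log(lm_prob(w, h)) for h, w in pairs)
    return math.exp(-log_prob / len(pairs))
```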

Description of Lattice-Building Models & Process
- Generate lattices with a bigram LM
  - Word-internal acoustic context
  - 3.6K acoustic units; 142K Gaussians
  - PLP + VTLN + FMLLR + MMI
- LM rescoring with the 8-way interpolated LM
- Acoustic rescoring with a cross-word AM
  - 10K acoustic units; 589K Gaussians
  - PLP + VTLN + FMLLR + ML
- Adapt on the scripts of the last step
  - Adjust interpolation weights to minimize perplexity on the decoded scripts (see the EM sketch below)
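Minimizing perplexity of a linear mixture over fixed component LMs has a standard EM solution; a minimal sketch (again assuming the illustrative component interface above, not the authors' actual implementation):

```python
def em_tune_weights(pairs, components, weights, iters=20):
    """Re-estimate mixture weights on decoded text via EM.
    Each iteration increases the likelihood of `pairs`, i.e. lowers
    the mixture's perplexity on the adaptation data."""
    K = len(components)
    for _ in range(iters):
        counts = [0.0] * K
        for h, w in pairs:
            # E-step: posterior responsibility of each component for this token
            joint = [lam * p(w, h) for lam, p in zip(weights, components)]
            total = sum(joint)
            for k in range(K):
                counts[k] += joint[k] / total
        # M-step: new weights are the normalized expected counts
        weights = [c / sum(counts) for c in counts]
    return weights
```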

Baseline Adaptation Results (WER)

                Unadapted    Supervised    Unsupervised
Meetings
Call-center
Call-center
Swb
Voicemail
Ave. Benefit                   -1.1%         -0.9%

Results on RT03 (WER)

                Unadapted    Supervised    Unsupervised
Fisher
SWB
All
Benefit: -0.2%

Conclusions
- Simple adaptation is effective for a multi-domain system
  - Contrasts with some previous results on Broadcast News
- Not very sensitive to initial decoding errors
- Dynamic LM construction remains to be explored