Improved Neural Network Based Language Modelling and Adaptation. J. Park, X. Liu, M.J.F. Gales and P.C. Woodland, INTERSPEECH 2010. Presented by Bang-Xuan Huang, Department of Computer Science & Information Engineering, National Taiwan Normal University.

Presentation transcript:

Improved Neural Network Based Language Modelling and Adaptation. J. Park, X. Liu, M.J.F. Gales and P.C. Woodland, INTERSPEECH 2010. Presenter: Bang-Xuan Huang, Department of Computer Science & Information Engineering, National Taiwan Normal University.

Outline: Introduction; Neural Network Language Model; Neural Network LM Adaptation; Experiments; Conclusion.

Introduction. To reduce computational cost, existing forms of NNLMs model the probabilities of only a small, more frequent subset of the whole vocabulary, commonly referred to as the shortlist. This introduces two problems. First, NNLM parameters are trained using only the statistics of in-shortlist words, which introduces an undue bias towards them; the resulting model may poorly represent the properties of complete word sequences found in the training data, because the n-gram sequences have "gaps" in them. Secondly, as there is no explicit modelling of the probabilities of out-of-shortlist (OOS) words in the output layer, the statistics associated with them are also discarded during network optimisation.
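As an illustration (not taken from the paper), the following sketch shows the conventional shortlist treatment that this slide criticises: the NNLM covers only shortlist words, probabilities for OOS words fall back to an n-gram model, and the NNLM distribution is rescaled by the n-gram mass of the shortlist so the combined distribution still sums to one. All function names and toy numbers here are hypothetical.

```python
import numpy as np

def shortlist_lm_prob(word, context, nnlm_softmax, ngram_prob, shortlist):
    """Conventional shortlist NNLM probability (Schwenk-style combination sketch).

    nnlm_softmax(context) -> dict mapping each shortlist word to its NNLM probability
    ngram_prob(word, context) -> back-off n-gram probability (hypothetical helpers)
    """
    p_nn = nnlm_softmax(context)                      # distribution over shortlist words only
    # Total n-gram mass assigned to the shortlist; used to rescale the NNLM distribution
    shortlist_mass = sum(ngram_prob(w, context) for w in shortlist)
    if word in shortlist:
        return p_nn[word] * shortlist_mass            # NNLM handles in-shortlist words
    return ngram_prob(word, context)                  # OOS words fall back to the n-gram LM

# Minimal toy usage with stand-in components
shortlist = {"the", "cat", "sat"}
toy_nnlm = lambda ctx: {"the": 0.5, "cat": 0.3, "sat": 0.2}
toy_ngram = lambda w, ctx: {"the": 0.4, "cat": 0.2, "sat": 0.1, "mat": 0.3}.get(w, 1e-4)
print(shortlist_lm_prob("mat", ("sat", "on"), toy_nnlm, toy_ngram, shortlist))
```

The sketch makes the two criticised properties visible: the network itself is never trained on OOS statistics, and OOS probabilities are handled entirely outside the network.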

Introduction. Due to the previously mentioned data sparsity issue, directly adapting n-gram probabilities is impractical on limited amounts of data. In this paper, an NNLM adaptation scheme is proposed that cascades an additional layer between the projection layer and the hidden layer. This scheme provides direct adaptation of NNLMs to a new domain via a non-linear, discriminative transformation.

Neural Network Language Model

(Slides 5-7 showed the NNLM architecture and equations (1)-(4) as images; they are not reproduced in this transcript.)
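Since the equations themselves did not survive extraction, the following is a hedged reconstruction of the standard feedforward NNLM computation (following Bengio and Schwenk) that slides of this kind typically present; the exact notation and equation numbering in the original paper may differ. For a history of the previous $n-1$ words, each history word $w$ (in 1-of-$K$ encoding $\mathbf{e}_w$) is mapped by a shared projection matrix $\mathbf{R}$ and the projections are concatenated:

$$\mathbf{p} = \left[ \mathbf{R}\,\mathbf{e}_{w_{i-n+1}} ;\; \ldots ;\; \mathbf{R}\,\mathbf{e}_{w_{i-1}} \right]$$

$$\mathbf{h} = \tanh\!\left( \mathbf{W}_{ph}\,\mathbf{p} + \mathbf{b}_h \right)$$

$$\mathbf{a} = \mathbf{W}_{ho}\,\mathbf{h} + \mathbf{b}_o, \qquad P\!\left(w_i = k \mid w_{i-n+1}^{\,i-1}\right) = \frac{\exp(a_k)}{\sum_{j} \exp(a_j)}$$

Here the output softmax normalises over the shortlist vocabulary; the improved architecture discussed in this paper additionally accounts for OOS words at the output layer.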

Neural Network LM Adaptation

The precise location at which to introduce the adaptation layer is determined by two factors. First, because only very limited amounts of adaptation data are available, for example a few hundred words per audio snippet, the adaptation layer should have a compact structure. The number of input and output layer nodes is often very large, for example tens of thousands of words, for the Arabic speech recognition task considered in this paper. In contrast, far fewer nodes, in the range of a few hundred, are typically used for the projection and hidden layers.
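To make the compactness argument concrete, consider hypothetical sizes (not taken from the slides): a concatenated projection of $(n-1)P = 600$ nodes, an adaptation layer of $A = 200$ nodes, a hidden layer of $H = 500$ nodes, and an output shortlist of $|V_{\text{sl}}| = 20{,}000$ words. The adaptation transform then involves roughly two orders of magnitude fewer parameters than the output layer:

$$(n-1)P \times A = 600 \times 200 = 1.2 \times 10^{5} \quad \ll \quad H \times |V_{\text{sl}}| = 500 \times 20{,}000 = 10^{7}$$

so adapting a layer placed between the projection and hidden layers touches far fewer parameters than anything involving the large input or output layers.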

Neural Network LM Adaptation. Second, non-linear activation functions are used in the hidden and output layers of standard NNLMs, and it is preferable to retain the same discriminative power and non-linearity during NNLM adaptation. For these reasons, the proposed adaptation layer is cascaded between the projection and hidden layers. It acts as a linear input transformation to the hidden layer of a task-independent NNLM. With this network architecture, NNLM probabilities can be adapted to the target domain via a non-linear, discriminative mapping function. During adaptation, only the part of the network connecting the projection layer to the adaptation layer is updated (shown as dashed lines in Figure 2), while all other parameters (shown as solid lines in Figure 2) are kept fixed.
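To illustrate the cascade described above, here is a minimal NumPy sketch of the adapted forward pass. The layer sizes and all names are assumptions for illustration, not values from the paper; the key point it encodes is that only the projection-to-adaptation weights would receive gradient updates during adaptation, while the task-independent weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
P, A, H, V = 600, 200, 500, 20000   # hypothetical sizes: concatenated projection, adaptation, hidden, shortlist

# Task-independent NNLM weights: frozen during adaptation (solid lines in Figure 2)
W_hid = rng.normal(scale=0.01, size=(H, A))
W_out = rng.normal(scale=0.01, size=(V, H))

# Adaptation layer cascaded between projection and hidden layers:
# the only parameters updated during adaptation (dashed lines in Figure 2)
W_adapt = rng.normal(scale=0.01, size=(A, P))

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def adapted_forward(p):
    """Forward pass of the adapted NNLM for one projected history vector p (length P)."""
    a = W_adapt @ p                 # adaptation layer: linear input transform of the projection
    h = np.tanh(W_hid @ a)          # hidden layer of the task-independent NNLM (frozen)
    return softmax(W_out @ h)       # shortlist output distribution (frozen)

p = rng.normal(size=P)              # stand-in for a concatenated projection of the word history
probs = adapted_forward(p)
print(probs.shape, round(probs.sum(), 6))   # (20000,) 1.0
# During adaptation, gradients would be applied to W_adapt only; W_hid and W_out stay fixed.
```

The downstream tanh and softmax remain in place, which is how the overall mapping to the target domain stays non-linear and discriminative even though the inserted transform itself is linear.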

Experiments

(The experimental setup and results on slides 11-13 were shown as tables and figures and are not reproduced in this transcript.)

Conclusion. This paper investigated an improved NNLM architecture and an NNLM adaptation method using a cascaded network. Consistent WER reductions obtained on a state-of-the-art LVCSR task suggest that the improved network architecture and the proposed adaptation scheme are useful. Future research will focus on more sophisticated modelling of OOS words in NNLMs and on improving the robustness of NNLM adaptation.