Deep Neural Network Language Models

Deep Neural Network Language Models
Ebru Arısoy, Tara N. Sainath, Brian Kingsbury, Bhuvana Ramabhadran (NAACL 2012)
Presenter: 郝柏翰, 2012/07/24

Outline
- Introduction
- Neural Network Language Models
- Experimental Set-up
- Experimental Results
- Conclusion and Future Work

Introduction
Most NNLMs are trained with one hidden layer. Deep neural networks (DNNs) with more hidden layers have been shown to capture higher-level discriminative information about input features. Motivated by the success of DNNs in acoustic modeling, we explore deep neural network language models (DNN LMs) in this paper. Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single-hidden-layer NNLM.

Neural Network Language Models

$$d_j = \tanh\!\left(\sum_{l=1}^{(n-1)\times P} M_{jl}\, C_l + b_j\right), \qquad \forall j = 1,\dots,H$$

$$o_i = \sum_{j=1}^{H} V_{ij}\, d_j + k_i, \qquad \forall i = 1,\dots,N$$

$$p_i = \frac{\exp(o_i)}{\sum_{r=1}^{N} \exp(o_r)} = P(w_j = i \mid h_j)$$

A shortlist (a restricted output vocabulary) is used to reduce the complexity of the output layer.
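The forward pass above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the sizes N, P, H, and n below, and the random initialization, are assumptions, not the paper's exact configuration.

```python
import numpy as np

# Illustrative sizes: shortlist size, feature dim, hidden units, n-gram order (assumptions).
N, P, H, n = 10000, 120, 500, 4

rng = np.random.default_rng(0)
lookup = rng.normal(scale=0.01, size=(N, P))        # word feature (projection) table
M = rng.normal(scale=0.01, size=(H, (n - 1) * P))   # projection-to-hidden weights M_{jl}
b = np.zeros(H)                                     # hidden biases b_j
V = rng.normal(scale=0.01, size=(N, H))             # hidden-to-output weights V_{ij}
k = np.zeros(N)                                     # output biases k_i

def nnlm_forward(history_ids):
    """Return P(w_j = i | h_j) over the shortlist, given the n-1 history word ids."""
    C = lookup[history_ids].reshape(-1)             # concatenate the (n-1) feature vectors C_l
    d = np.tanh(M @ C + b)                          # hidden layer d_j
    o = V @ d + k                                   # output activations o_i
    o = o - o.max()                                 # numerical stability for the softmax
    return np.exp(o) / np.exp(o).sum()              # softmax p_i

probs = nnlm_forward([12, 345, 678])                # arbitrary history word ids
print(probs.shape, probs.sum())                     # (10000,) and ~1.0
```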

The differences between DNN LM and RNNLM
- RNNLM does not have a projection layer: the DNN LM has N×P parameters in its look-up table plus a weight matrix containing (n−1)×P×H parameters, whereas the RNNLM has a weight matrix containing (N+H)×H parameters.
- RNNLM uses the full vocabulary at the output layer: the RNNLM uses 20K words, while the DNN LM uses a 10K-word shortlist.
- Note that each additional hidden layer introduces extra parameters.
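To make the comparison above concrete, a small back-of-the-envelope calculation follows; the values of N, P, H, and n are assumptions chosen only to exercise the formulas, not the exact figures from the paper.

```python
# Illustrative parameter counts for the DNN LM vs. RNNLM comparison (assumed sizes).
N, P, H, n = 10000, 120, 500, 4   # shortlist size, feature dim, hidden units, n-gram order

dnnlm_lookup = N * P                    # look-up table: N x P
dnnlm_first_weights = (n - 1) * P * H   # projection-to-hidden matrix: (n-1) x P x H
rnnlm_weights = (N + H) * H             # RNNLM input + recurrent matrix: (N+H) x H
extra_hidden_layer = H * H              # each additional DNN hidden layer adds roughly H x H weights

print(f"DNN LM look-up table:      {dnnlm_lookup:,}")
print(f"DNN LM projection->hidden: {dnnlm_first_weights:,}")
print(f"RNNLM (N+H) x H matrix:    {rnnlm_weights:,}")
print(f"Extra DNN hidden layer:    {extra_hidden_layer:,}")
```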

Experimental Set-up
- Corpus: 1993 WSJ, 900K sentences (23.5M words).
- We also experimented with different numbers of dimensions for the features, namely 30, 60, and 120.

Experimental Results
Our best result on the test set is obtained with a 3-layer DNN LM with 500 hidden units and 120-dimensional features. However, to conclude that the DNN LM is better than the NNLM, we need to compare deep networks with shallow networks that have the same number of parameters.

Experimental Results
We trained different NNLM architectures with varying projection and hidden layer dimensions. All of these models have roughly the same number of parameters (8M) as our best DNN LM model, the 3-layer DNN LM with 500 hidden units and 120-dimensional features.

Experimental Results
We also tested the performance of the NNLM and the DNN LM with 500 hidden units and 120-dimensional features after linearly interpolating each with the 4-gram baseline language model. After interpolation, both perplexity and WER improve for the NNLM and the DNN LM.
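A minimal sketch of this kind of linear interpolation is shown below, assuming per-word probabilities from the two models are already available; the interpolation weight and the toy probabilities are illustrative (in practice the weight is tuned on held-out data).

```python
import numpy as np

def interpolate(p_neural, p_ngram, lam=0.5):
    """Linearly interpolate neural-LM and n-gram probabilities for the same word/context.

    lam is the weight on the neural LM; 0.5 is only an illustrative value.
    """
    return lam * p_neural + (1.0 - lam) * p_ngram

# Hypothetical probabilities of three reference words under each model.
p_nnlm  = np.array([0.012, 0.080, 0.004])
p_4gram = np.array([0.020, 0.050, 0.010])
p_mix   = interpolate(p_nnlm, p_4gram, lam=0.5)
ppl     = np.exp(-np.mean(np.log(p_mix)))   # perplexity of the interpolated model on this toy data
print(p_mix, ppl)
```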

Experimental Results
One problem with deep neural networks, especially those with more than 2 or 3 hidden layers, is that training can easily get stuck in local minima, resulting in poor solutions. Therefore, it may be important to apply pre-training instead of randomly initializing the weights.
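One common scheme is greedy layer-wise (discriminative) pre-training, which the conclusions below also mention: train a network with a single hidden layer, then repeatedly insert a new randomly initialized hidden layer beneath a fresh output layer and continue training. The following is only a toy NumPy sketch of that schedule on random data; all sizes, data, and hyperparameters are assumptions, not the authors' recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    # Small random weights and zero biases for one fully connected layer.
    return rng.normal(scale=0.01, size=(fan_out, fan_in)), np.zeros(fan_out)

def forward(x, hidden, out):
    # tanh hidden layers followed by a softmax output layer.
    acts = [x]
    for W, b in hidden:
        acts.append(np.tanh(W @ acts[-1] + b))
    Wo, bo = out
    o = Wo @ acts[-1] + bo
    o = o - o.max()                                # numerical stability
    return acts, np.exp(o) / np.exp(o).sum()

def sgd_step(x, target, hidden, out, lr=0.1):
    # One cross-entropy SGD step with plain backpropagation; weights are updated in place.
    acts, p = forward(x, hidden, out)
    Wo, bo = out
    delta = p.copy()
    delta[target] -= 1.0                           # gradient w.r.t. output pre-activations
    back = Wo.T @ delta                            # gradient w.r.t. last hidden activations
    Wo -= lr * np.outer(delta, acts[-1])
    bo -= lr * delta
    for i in range(len(hidden) - 1, -1, -1):
        W, b = hidden[i]
        delta = back * (1.0 - acts[i + 1] ** 2)    # tanh derivative
        back = W.T @ delta
        W -= lr * np.outer(delta, acts[i])
        b -= lr * delta

def pretrain(X, y, n_in, n_hidden, n_out, depth, epochs_per_stage=5):
    # Greedy layer-wise discriminative pre-training: train with one hidden layer,
    # then insert a new hidden layer (and a fresh output layer) and train again.
    hidden = [init_layer(n_in, n_hidden)]
    out = init_layer(n_hidden, n_out)
    for stage in range(depth):
        for _ in range(epochs_per_stage):
            for x, t in zip(X, y):
                sgd_step(x, t, hidden, out)
        if stage < depth - 1:
            hidden.append(init_layer(n_hidden, n_hidden))  # grow by one hidden layer
            out = init_layer(n_hidden, n_out)              # new output layer on top
    return hidden, out

# Toy data standing in for (history features, next-word id) pairs.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 5, size=200)
hidden, out = pretrain(X, y, n_in=20, n_hidden=16, n_out=5, depth=3)
print(len(hidden), "hidden layers after pre-training")
```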

Conclusions
Our preliminary experiments on WSJ data showed that deeper networks can also be useful for language modeling. We also compared shallow networks with deep networks that have the same number of parameters. One important observation in our experiments is that the perplexity and WER improvements are more pronounced with an increased projection layer dimension in the NNLM than with an increased number of hidden layers in the DNN LM. We also investigated discriminative pre-training for DNN LMs, but did not see consistent gains.