Using Neural Network Language Models for LVCSR. Holger Schwenk and Jean-Luc Gauvain. Presented by Erin Fitzgerald, CLSP Reading Group, December 10, 2004.

Presentation transcript:

Slide 1: Using Neural Network Language Models for LVCSR
Holger Schwenk and Jean-Luc Gauvain
Presented by Erin Fitzgerald, CLSP Reading Group, December 10, 2004

Slide 2: Introduction
- Build and use neural networks to estimate LM posterior probabilities for ASR tasks.
- Idea:
  - Project word indices onto a continuous space.
  - The resulting smooth probability functions over word representations generalize better to unseen n-grams.
  - Still an n-gram approach, but posteriors are interpolated for any possible context; no backing off.
- Result: significant WER reduction at a small computational cost.

Slide 3: Architecture
Standard fully connected multilayer perceptron. The diagram shows the context words w_{j-n+1}, ..., w_{j-1} entering the input/projection layer (activations c_k), feeding a hidden layer (d_j) and an output layer (o_i) whose i-th output is p_i = P(w_j = i | h_j) for i = 1, ..., N. Typical sizes: vocabulary N = 51k, projection dimension P = 50, hidden layer H ≈ 1k; M and V are the weight matrices, b and k the biases.

Slide 4: Architecture (forward pass)
Hidden layer: d = tanh(M·c + b). Output layer: o = V·d + k, followed by a softmax so that p_i = exp(o_i) / Σ_r exp(o_r) = P(w_j = i | h_j) for each vocabulary word i. (A minimal sketch of this forward pass follows below.)
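The following is a minimal NumPy sketch of this forward pass, not the authors' code. The parameter names (R, M, b, V, k) follow the slide's notation; the sizes are shrunk for illustration (the paper uses roughly N = 51k, P = 50, H ≈ 1k).

```python
import numpy as np

N, P, H, CTX = 10_000, 50, 200, 3            # vocab, projection dim, hidden dim, n-1
rng = np.random.default_rng(0)

R = rng.normal(scale=0.01, size=(N, P))      # shared projection (lookup) table
M = rng.normal(scale=0.01, size=(H, CTX * P))  # projection layer -> hidden layer
b = np.zeros(H)
V = rng.normal(scale=0.01, size=(N, H))      # hidden layer -> output layer
k = np.zeros(N)

def nnlm_probs(history):
    """history: list of n-1 word indices; returns P(w_j = i | h_j) for all i."""
    c = R[history].reshape(-1)               # concatenate the projected context words
    d = np.tanh(M @ c + b)                   # hidden layer
    o = V @ d + k                            # output activations
    o -= o.max()                             # stabilize the softmax
    e = np.exp(o)
    return e / e.sum()                       # softmax over the vocabulary

p = nnlm_probs([12, 345, 6789])
print(p.shape, p.sum())                      # (10000,) ~= 1.0
```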

Slide 5: Training
- Train with the standard back-propagation algorithm.
- Error function: cross entropy.
- Weight decay regularization is used.
- Targets are set to 1 for the observed next word w_j and to 0 for all other words.
- These outputs have been shown to converge to the posterior probabilities.
- Back-propagation continues through the projection layer, so the network learns the projection of words onto the continuous space that is best for the probability estimation task.
(A sketch of one such training step appears below.)
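Continuing the sketch above, here is one hedged example of a back-propagation step with cross-entropy error, 1-of-N targets and weight decay, with the gradient pushed all the way back into the projection table R. The learning rate and decay constant are placeholders, not values from the paper.

```python
# one training example: predict `target` given `history`
history, target = [12, 345, 6789], 42
lr, decay = 0.005, 1e-5                      # hypothetical hyper-parameters

# forward pass (same as nnlm_probs, kept explicit for the backward pass)
c = R[history].reshape(-1)
d = np.tanh(M @ c + b)
o = V @ d + k
p = np.exp(o - o.max()); p /= p.sum()

# cross-entropy error with a 1-of-N target: dE/do = p - t
g_o = p.copy(); g_o[target] -= 1.0
g_d = V.T @ g_o
g_a = g_d * (1.0 - d ** 2)                   # derivative of tanh
g_c = M.T @ g_a                              # error reaches the projection layer

# gradient descent with weight decay (the regularizer mentioned above)
V -= lr * (np.outer(g_o, d) + decay * V); k -= lr * g_o
M -= lr * (np.outer(g_a, c) + decay * M); b -= lr * g_a
R[history] -= lr * g_c.reshape(len(history), P)   # the projections are learned too
```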

Optimizations

Slide 7: Fast Recognition Techniques
1) Lattice rescoring
2) Shortlists
3) Regrouping
4) Block mode
5) CPU optimization

Slide 8: Fast Recognition Techniques
1) Lattice rescoring: decode with a standard backoff LM to build lattices, then rescore the lattices with the neural network LM.
2) Shortlists
3) Regrouping
4) Block mode
5) CPU optimization

Slide 9: Fast Recognition Techniques
1) Lattice rescoring
2) Shortlists: the neural network only predicts a high-frequency subset of the vocabulary; the probability mass of the shortlist words is redistributed over the network's predictions (see the sketch after this list).
3) Regrouping
4) Block mode
5) CPU optimization
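The sketch below shows one way to realize the redistribution mentioned above: the network's softmax over the shortlist is scaled by the probability mass the backoff LM gives to shortlist words in the same context. The names `shortlist` (a word-to-output-index map), `nn_probs` and `backoff_prob` are illustrative stand-ins, not the authors' code.

```python
def shortlist_prob(word, history, shortlist, nn_probs, backoff_prob):
    """P(word | history), redistributing the backoff mass of the shortlist words."""
    if word in shortlist:
        # probability mass the backoff LM gives to shortlist words in this context
        # (in practice this would be cached per context, not recomputed per word)
        mass = sum(backoff_prob(v, history) for v in shortlist)
        return nn_probs(history)[shortlist[word]] * mass
    return backoff_prob(word, history)       # out-of-shortlist words fall back entirely
```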

Slide 10: Shortlist optimization
Same network as before, but the output layer is reduced to the shortlist of the S most frequent words, so only p_1 = P(w_j = 1 | h_j), ..., p_S = P(w_j = S | h_j) are computed; the projection and hidden layers (sizes P and H) are unchanged.

Slide 11: Fast Recognition Techniques
1) Lattice rescoring
2) Shortlists
3) Regrouping (an optimization of lattice rescoring):
   - Collect and sort the LM probability requests.
   - For all requests that share the same context h, only one forward pass is necessary (see the sketch after this list).
4) Block mode
5) CPU optimization
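A small sketch of the regrouping idea: bucket the (history, word) probability requests gathered from the lattice by history, so each distinct context triggers a single forward pass whose softmax already contains the probability of every requested word. `requests` is a hypothetical list of (history_tuple, word_index) pairs.

```python
from collections import defaultdict

def regrouped_probs(requests, nn_probs):
    """Serve many LM probability requests with one forward pass per distinct context."""
    by_context = defaultdict(list)
    for history, word in requests:
        by_context[history].append(word)

    probs = {}
    for history, words in by_context.items():
        p = nn_probs(list(history))          # a single forward pass for this context
        for w in words:
            probs[(history, w)] = p[w]
    return probs
```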

Slide 12: Fast Recognition Techniques
1) Lattice rescoring
2) Shortlists
3) Regrouping
4) Block mode:
   - Several examples are propagated through the network at once.
   - This takes advantage of faster matrix-matrix operations.
5) CPU optimization

Slide 13: Block mode calculations
For a single example: d = tanh(M·c + b) and o = V·d + k, as in the basic architecture.

Slide 14: Block mode calculations
Stacking the examples as the columns of matrices C, D and O turns the per-example vector operations into matrix-matrix products: D = tanh(M·C + B) and O = V·D + K, where B and K repeat the bias vectors b and k across the columns. (A batched sketch follows below.)
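Continuing the earlier NumPy sketch, the batched forward pass below propagates several histories at once; the two matrix-matrix products are what BLAS libraries execute much faster than the equivalent sequence of matrix-vector products. Again an illustrative sketch, not the authors' code.

```python
def nnlm_probs_block(histories):
    """histories: list of (n-1)-word index lists; returns one probability column per history."""
    C = np.stack([R[h].reshape(-1) for h in histories], axis=1)   # (CTX*P, B)
    D = np.tanh(M @ C + b[:, None])          # hidden layer for all examples at once
    O = V @ D + k[:, None]                   # output activations
    O -= O.max(axis=0, keepdims=True)        # stabilize the softmax per column
    E = np.exp(O)
    return E / E.sum(axis=0, keepdims=True)  # column i = P(. | history i)

P_block = nnlm_probs_block([[12, 345, 6789], [7, 7, 7]])
print(P_block.shape)                         # (N, 2)
```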

Slide 15: Fast Recognition: Test Results
1) Lattice rescoring: 511 lattice nodes on average.
2) Shortlists (size 2000): 90% prediction coverage; 3.8M 4-grams requested, 3.4M processed by the neural network.
3) Regrouping: only 1M forward passes required.
4) Block mode: bunch size = 128.
5) CPU optimization.
Total processing time is under 9 minutes (0.03x real time); without these optimizations it would be 10x slower.

Slide 16: Fast Training Techniques
1) Parallel implementations: the full connections require low-latency communication, which is very costly.
2) Resampling techniques: floating-point operations run best on contiguous memory locations.

Slide 17: Fast Training Techniques
1) Floating-point precision: 1.5x faster.
2) Suppressing internal calculations: 1.3x faster.
3) Bunch mode: more than 10x faster; forward and back-propagation are done for many examples at once (see the sketch after this list).
4) Multiprocessing: 1.5x faster.
Overall, training time drops from 47 hours to 1 hour 27 minutes with a bunch size of 128.
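The sketch below extends the batched forward pass to a full bunch-mode training step: forward- and back-propagate a whole bunch before applying one averaged weight update. It reuses the parameters from the earlier sketches; the learning rate, decay and bunch size are hypothetical, not the paper's settings.

```python
def bunch_step(histories, targets, lr=0.005, decay=1e-5):
    """One weight update from a bunch of (history, target) examples."""
    B = len(histories)
    C = np.stack([R[h].reshape(-1) for h in histories], axis=1)   # (CTX*P, B)
    D = np.tanh(M @ C + b[:, None])
    O = V @ D + k[:, None]
    Pr = np.exp(O - O.max(axis=0, keepdims=True))
    Pr /= Pr.sum(axis=0, keepdims=True)

    G_O = Pr.copy()
    G_O[targets, np.arange(B)] -= 1.0                # dE/dO = P - T (1-of-N targets)
    G_A = (V.T @ G_O) * (1.0 - D ** 2)               # back through the tanh hidden layer
    G_C = M.T @ G_A                                  # gradient w.r.t. the projections

    V[...] -= lr * (G_O @ D.T / B + decay * V)       # averaged gradients + weight decay
    k[...] -= lr * G_O.mean(axis=1)
    M[...] -= lr * (G_A @ C.T / B + decay * M)
    b[...] -= lr * G_A.mean(axis=1)
    for i, h in enumerate(histories):                # the projections are learned too
        R[h] -= lr * G_C[:, i].reshape(len(h), -1)

bunch_step([[12, 345, 6789], [7, 7, 7]], [42, 99])   # one bunch of two examples
```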

Application to CTS and BN LVCSR

Slide 19: Application to ASR
- Neural network LM techniques have focused on CTS because:
  - There is far less in-domain training data, hence data sparsity.
  - The neural network can only handle a small amount of training data.
- New Fisher CTS data: 20M words (vs. 7M previously).
- BN data: 500M words.

Slide 20: Application to CTS
- Baseline: train standard backoff LMs for each domain and then interpolate them.
- Experiment #1: interpolate the CTS neural network LM with the in-domain backoff LM.
- Experiment #2: interpolate the CTS neural network LM with the full-data backoff LM.
(The interpolation itself is sketched below.)
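For concreteness, a hedged sketch of the linear interpolation used in these experiments. `nn_prob` and `backoff_prob` are hypothetical callables returning the two models' probabilities, and the weight `lam` is only a placeholder (in practice it would be tuned on held-out data).

```python
def interpolated_prob(word, history, nn_prob, backoff_prob, lam=0.5):
    """P(word | history) = lam * P_NN + (1 - lam) * P_backoff."""
    return lam * nn_prob(word, history) + (1.0 - lam) * backoff_prob(word, history)
```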

Slide 21: Application to CTS: Perplexity
- Baseline (standard backoff LMs, interpolated): in-domain PPL 50.1; full-data PPL 47.5.
- Experiment #1 (CTS neural network LM + in-domain backoff LM): in-domain PPL 45.5.
- Experiment #2 (CTS neural network LM + full-data backoff LM): full-data PPL 44.2.

Slide 22: Application to CTS: WER
- Baseline (standard backoff LMs, interpolated): in-domain WER 19.9%; full-data WER 19.3%.
- Experiment #1 (CTS neural network LM + in-domain backoff LM): in-domain WER 19.1%.
- Experiment #2 (CTS neural network LM + full-data backoff LM): full-data WER 18.8%.

Slide 23: Application to BN
- Only a subset of the 500M available words could be used for neural network training: a 27M-word training set.
- Still useful:
  - The NN LM gave a 12% perplexity gain over the backoff LM on the small 27M-word set.
  - The NN LM gave a 4% perplexity gain over the backoff LM on the full 500M-word training set.
- Overall WER reduction of 0.3% absolute.

Slide 24: Conclusion
- The neural network LM provides significant improvements in perplexity and WER.
- Optimizations speed up neural network training by about 20x and keep lattice rescoring under 0.05x real time.
- While the neural network LM was developed for and works best on CTS, gains were found on the BN task too.