Published by Norman Blake. Modified over 9 years ago.
Recurrent neural network based language model
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan "Honza" Černocký, Sanjeev Khudanpur
INTERSPEECH 2010
Presenter: 郝柏翰, 2012/06/29
Outline
Introduction
Model description
WSJ experiments
NIST RT05 experiments
Conclusion and future work
Model description
WSJ experiments
Training an RNN LM on large data is very time consuming: the most complex models take several weeks to train.
We report results for combined models: linear interpolation with weight 0.75 for the RNN LM and 0.25 for the backoff LM is used in all these experiments.
To correctly rescore n-best lists with backoff models that are trained on a subset of the data used by the recognizer, we use open-vocabulary language models.
To improve results, outputs from various RNN LMs with different architectures can be linearly interpolated.
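The interpolation described on this slide can be sketched as follows. This is a minimal illustration, assuming both models supply a probability for the same word in the same context; the function names and the example probabilities are hypothetical and do not come from the paper. Only the 0.75 / 0.25 weights are taken from the slide.

```python
import math


def interpolate(p_rnn, p_backoff, rnn_weight=0.75):
    """Linearly interpolate two probabilities of the same word.

    rnn_weight=0.75 leaves 0.25 for the backoff LM, as on the slide.
    """
    if not 0.0 <= rnn_weight <= 1.0:
        raise ValueError("rnn_weight must lie in [0, 1]")
    return rnn_weight * p_rnn + (1.0 - rnn_weight) * p_backoff


def sentence_log_prob(rnn_probs, backoff_probs, rnn_weight=0.75):
    """Log-probability of a sentence under the combined model.

    Both lists hold one probability per word, in the same order.
    """
    if len(rnn_probs) != len(backoff_probs):
        raise ValueError("both models must score the same words")
    return sum(
        math.log(interpolate(p_r, p_b, rnn_weight))
        for p_r, p_b in zip(rnn_probs, backoff_probs)
    )


# Hypothetical per-word probabilities for one three-word hypothesis.
rnn_probs = [0.2, 0.05, 0.4]
backoff_probs = [0.4, 0.01, 0.3]
score = sentence_log_prob(rnn_probs, backoff_probs)
```

When rescoring an n-best list, a score like this would typically be computed for every hypothesis and combined with the acoustic score; how the two are weighted is not stated on the slide.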
WSJ experiments
By mixing static and dynamic RNN LMs, with a larger learning rate used when processing the test data (α = 0.3), the best perplexity result was 112.
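For reference, perplexity is the exponential of the average negative log-probability per word. The sketch below assumes that the static and dynamic models are mixed by interpolating their per-word probabilities; the mixing weight and all function names are hypothetical, since the slide does not give them.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test set given the model probability of each word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


def mixed_perplexity(static_probs, dynamic_probs, static_weight=0.5):
    """Perplexity after mixing two models word by word.

    static_weight is an assumed parameter, not a value from the slides.
    """
    if len(static_probs) != len(dynamic_probs):
        raise ValueError("both models must score the same words")
    mixed = [
        static_weight * p_s + (1.0 - static_weight) * p_d
        for p_s, p_d in zip(static_probs, dynamic_probs)
    ]
    return perplexity(mixed)
```

A model that assigns every test word probability 1/112 has perplexity 112, which gives a feel for what the reported figure means.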
WSJ experiments
We also tried combining RNN models with discriminatively trained LMs, with no significant improvement.
We can conclude that RNN-based models reduce WER by around 12% relative, compared to a backoff model trained on 5x more data.
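"Relative" here means the reduction is measured as a fraction of the baseline WER, not in absolute percentage points. A small sketch of the arithmetic follows; the WER values in it are made up for illustration and are not results from the paper.

```python
def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative numbers only: a drop from 10.0% to 8.8% WER is
# 1.2 points absolute, which is about 12% relative.
example = relative_reduction(10.0, 8.8)
```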
NIST RT05 experiments
The amount of training data was 115 hours of meeting speech from the ICSI, NIST, ISL and AMI training corpora.
RT05: The four-gram LM used in the AMI system was trained on various data sources. The total amount of LM training data was more than 1.3G words.
RT09: The RT09 LM was extended with additional CHIL and web data. A further change was lowering the cut-offs, e.g. the minimum count for 4-grams was set to 3 instead of 4.
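A count cut-off discards rare n-grams before the backoff model is estimated, so lowering it keeps more 4-grams in the model. The sketch below assumes that "minimum count" means an n-gram is kept when its count is at least that value; the helper names and the toy corpus are hypothetical, and real toolkits may define the threshold slightly differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield all n-grams of a token list as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(sentences, n):
    """Count n-grams over a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(ngrams(tokens, n))
    return counts


def apply_cutoff(counts, min_count):
    """Keep only n-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus: the same 4-gram occurs three times.
corpus = [["we", "meet", "on", "monday"]] * 3
four_grams = count_ngrams(corpus, 4)
kept_at_3 = apply_cutoff(four_grams, 3)  # survives the lowered cut-off
kept_at_4 = apply_cutoff(four_grams, 4)  # dropped under the old cut-off
```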
NIST RT05 experiments
To train the RNN LM, we selected in-domain data consisting of meeting transcriptions and the Switchboard corpus, for a total of 5.4M words.
RNN training was too time consuming with more data.
This means that the RNNs are trained on a tiny subset of the data used to construct the RT05 and RT09 LMs.
Conclusion and future work
In the WSJ experiments, word error rate reduction is around 18% for models trained on the same amount of data, and 12% when the backoff model is trained on 5 times more data than the RNN model.
For NIST RT05, we can conclude that models trained on just 5.4M words of in-domain data can outperform big backoff models trained on hundreds of times more data.
These results break the myth that language modeling is just about counting n-grams, and that the only reasonable way to improve results is to acquire new training data.