Summary of 2015
Mengyuan Zhao, CSLT, RIIT, Tsinghua University
2016-01-06
Contents
Research
  DAE/CDAE
  Speaker adapted ASR / Language vector in ASR
  Dark knowledge
Project
  Bi-lingual AM
  Tag-LM
  Phone number / Car number recognition
  LSTM AM (based on nnet1)
  Parallel AM training (based on nnet3)
Other
  Server purchase / maintenance
Research
Research: DAE/CDAE
Paper published (APSIPA 2015 Best Paper Award).
Research: Speaker adapted ASR / Language vector in ASR / Dark knowledge
Speaker adapted ASR: WER 6.42% → 6.11% (Aurora4).
Language vector in ASR: WER (CN) 20.83% → 20.26%; WER (EN) 57.50% → 47.17%.
Dark knowledge: knowledge transfer has become a standard step in our AM training (20% improvement in car number recognition); a minimal distillation sketch follows this slide.
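The dark-knowledge item refers to teacher-student knowledge transfer (distillation). Below is a minimal sketch of that idea, not the project code: the teacher's temperature-softened outputs serve as soft targets for the student AM, interpolated with the usual hard-label cross-entropy. The temperature, interpolation weight, and toy batch are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax: higher T spreads probability mass
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, soft_weight=0.5):
    """Cross-entropy against hard labels, interpolated with
    cross-entropy against the teacher's soft targets."""
    soft_targets = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    student_hard = softmax(student_logits, 1.0)

    n = student_logits.shape[0]
    hard_ce = -np.log(student_hard[np.arange(n), hard_labels] + 1e-12).mean()
    soft_ce = -(soft_targets * np.log(student_soft + 1e-12)).sum(axis=1).mean()

    # T^2 scaling keeps the soft-target term comparable in magnitude
    return (1 - soft_weight) * hard_ce + soft_weight * (temperature ** 2) * soft_ce

# Toy usage: batch of 3 frames, 5 output classes (values made up)
rng = np.random.default_rng(0)
t_logits = rng.normal(size=(3, 5))
s_logits = rng.normal(size=(3, 5))
labels = np.array([0, 2, 4])
print(distillation_loss(s_logits, t_logits, labels))
```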
Project
Project: Bi-lingual AM / Tag-LM
Bi-lingual AM: Chinese, OK; English, OK; Chinese + a few English words, OK.
Tag-LM: handles new words outside the LM word list, e.g. actor names and movie names; a minimal sketch follows this slide.
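A common way to realize a tag-LM is a class-based factorization: the n-gram LM scores a class tag (e.g. a movie-name tag), and a separately maintained entity list scores the word within that tag, so new names can be added without retraining the LM. The sketch below only illustrates that factorization; the tag names, word lists, and probabilities are made up and this is not the project's actual implementation.

```python
# Minimal tag-LM sketch: P(word | context) = P(tag | context) * P(word | tag)
import math

ngram_logprob = {            # log P(tag_or_word | context), toy values
    ("play", "<MOVIE>"): math.log(0.05),
    ("play", "music"):   math.log(0.10),
}

tag_members = {              # log P(word | tag), from an updatable entity list
    "<MOVIE>": {"Interstellar": math.log(0.01), "Mermaid": math.log(0.02)},
}

def score(context, word):
    """Score a word directly if the LM knows it, otherwise through its tag."""
    if (context, word) in ngram_logprob:
        return ngram_logprob[(context, word)]
    for tag, members in tag_members.items():
        if word in members:
            return ngram_logprob[(context, tag)] + members[word]
    return float("-inf")     # truly unknown word

print(score("play", "Mermaid"))  # scored via the <MOVIE> tag
```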
Project: Phone number / Car number recognition
Phone number recognition: SER < 10%.
Car number recognition: SER 50% → 20% (better than Baidu).
Project: LSTM AM (based on nnet1)
LSTM has strong learning ability.
LSTM performs very well with a small LM, but not as well with a big LM.
LSTM is prone to divergence and over-fitting.
Training is very slow (7× slower than DNN).
Project: Parallel AM training (based on nnet3)
Speed-up of 4-5× (with 8 GPUs); a minimal model-averaging sketch follows this slide.
Many investigations and experiments:
  Network type (DNN, TDNN, LSTM, CTC)
  Activation function (ReLU, sigmoid, p-norm)
  Network structure (TDNN, 7×2048, ReLU)
  Big-LM decoder
  Discriminative training (MPE)
Best model WER: 16k: 9.04% → 8.45%; 8k: 14.61% → 12.66%.
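Kaldi's nnet3 parallel training runs several SGD jobs on separate GPUs over different data shards and periodically averages their parameters into a single model for the next iteration. Below is a minimal sketch of that averaging step only, assuming each job's parameters are exposed as a dict of numpy arrays; the layer name and shapes are illustrative.

```python
import numpy as np

def average_models(job_params):
    """Average parameters from parallel training jobs (one per GPU).

    job_params: list of dicts mapping layer name -> weight array,
    all with identical shapes. Returns the element-wise average,
    which becomes the starting model for the next iteration.
    """
    averaged = {}
    for name in job_params[0]:
        averaged[name] = np.mean([p[name] for p in job_params], axis=0)
    return averaged

# Toy usage: 8 jobs, one small weight matrix each
jobs = [{"affine1.w": np.random.randn(4, 4)} for _ in range(8)]
model = average_models(jobs)
print(model["affine1.w"].shape)  # (4, 4)
```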
Other
Other: Purchase / Install / Maintain
Computers: 2 servers, 3 PCs
GPUs: 2× GTX 970
Hard disk: /work3
Several CPU fans
Server maintenance
Psychology lecture
Thanks