
Slide 1: M4 speech recognition
University of Sheffield
Vincent Wan, Martin Karafiát

Slide 2: The Recogniser
Front end
First pass (n-best lattice generation):
– Best-first decoding (Ducoder)
– Trigram language model (SRILM)
– Word-internal triphone models
– MLLR adaptation (HTK)
– Recognition output
Second pass (lattice rescoring):
– Time-synchronous decoding (HTK)
– Cross-word triphone models
– MLLR adaptation (HTK)
– Recognition output
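
The slide gives the pipeline as a component diagram; in effect the second pass re-ranks the first-pass hypotheses under a weighted combination of acoustic and language-model scores. A minimal sketch of that combination, assuming a grammar scale factor and word insertion penalty as the tunable weights (the Hypothesis fields and the default values below are illustrative, not taken from the slides):

```python
from typing import List
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: List[str]          # word sequence of one n-best entry
    acoustic_logprob: float   # log-likelihood from the second-pass acoustic models
    lm_logprob: float         # log-probability from the trigram language model

def rescore(nbest: List[Hypothesis],
            lm_scale: float = 12.0,      # assumed grammar scale factor
            word_penalty: float = -0.5   # assumed word insertion penalty
            ) -> Hypothesis:
    """Return the hypothesis with the best combined score:
    acoustic + lm_scale * LM + word_penalty * number_of_words."""
    def combined(h: Hypothesis) -> float:
        return (h.acoustic_logprob
                + lm_scale * h.lm_logprob
                + word_penalty * len(h.words))
    return max(nbest, key=combined)
```

In the actual system the hypotheses come from the Ducoder lattices and the second-pass acoustic scores from the cross-word triphone models; the scale factor and penalty are presumably among the hyper-parameters the next slide says must be tuned manually.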

Slide 3: System limitations
– N-best list rescoring is not optimal
– Adaptation must be performed on two sets of acoustic models
– Many more hyper-parameters to tune manually
– SRILM is not efficient on very large language models (greater than 10e+9 words)

Slide 4: Advances since last meeting
Models trained on two databases:
– SWITCHBOARD recogniser: acoustic & language models trained on 200 hours of speech
– ICSI meetings recogniser: acoustic models trained on 40 hours of speech; the language model is a combination of SWB and ICSI (see the interpolation sketch below)
Improvements mainly affect the Switchboard models
16 kHz sampling rate used throughout
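
The slide does not say how the SWB and ICSI language models are combined; a common choice is static linear interpolation of the two n-gram models, which is what the sketch below assumes (the 0.5 weight is a placeholder, not a value from the talk). SRILM can do this kind of mixing directly via the ngram tool's -mix-lm and -lambda options.

```python
import math

def interpolate_logprob(logp_swb: float, logp_icsi: float, lam: float = 0.5) -> float:
    """Static linear interpolation of two n-gram models, in log10 space:
    p(w | h) = lam * p_swb(w | h) + (1 - lam) * p_icsi(w | h)."""
    p = lam * 10.0 ** logp_swb + (1.0 - lam) * 10.0 ** logp_icsi
    return math.log10(p)

# A word that is frequent in ICSI meeting data but rare in Switchboard
# gets pulled up by the interpolation (the numbers are made up).
print(interpolate_logprob(logp_swb=-4.2, logp_icsi=-2.1))
```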

Slide 5: Advances since last meeting
Adaptation of word-internal context-dependent models
Unified the phone sets and pronunciation dictionaries:
– Improved the pronunciation dictionary for Switchboard
– Now using the ICSI dictionary with missing pronunciations imported from the ISIP dictionary (a merge sketch follows this slide)
Better handling of multiple pronunciations during acoustic model training
General bug fixes
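
Importing missing pronunciations from one dictionary into another is a simple merge once the phone sets agree. A rough sketch under assumed conventions (the dictionary format, example words and phone mapping are all illustrative; real HTK-style dictionaries also carry output symbols and pronunciation probabilities):

```python
def load_pron_dict(path):
    """Read a simple pronunciation dictionary: one 'WORD ph1 ph2 ...' entry per line."""
    prons = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                prons.setdefault(fields[0], []).append(fields[1:])
    return prons

def merge_pron_dicts(primary, backup, phone_map):
    """Copy words missing from `primary` out of `backup`, mapping the
    backup dictionary's phones onto the unified phone set first."""
    merged = dict(primary)
    for word, variants in backup.items():
        if word not in merged:
            merged[word] = [[phone_map.get(p, p) for p in v] for v in variants]
    return merged

# Illustration only: ICSI dictionary as primary, ISIP as backup for missing words.
icsi = {"MEETING": [["m", "iy", "t", "ih", "ng"]]}
isip = {"PROJECTOR": [["p", "r", "ax", "jh", "eh", "k", "t", "axr"]]}
unified = merge_pron_dicts(icsi, isip, phone_map={"axr": "er"})
print(unified["PROJECTOR"])  # [['p', 'r', 'ax', 'jh', 'eh', 'k', 't', 'er']]
```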

Slide 6: Results overview (% word error rates)
Columns (training/adaptation conditions): SWB trn; ICSI trn; SWB trn + ICSI adpt; ICSI trn + ICSI adpt; ICSI trn + M4 adpt; SWB trn + M4 adpt; SWB trn + ICSI adpt + M4 adpt
Test set SWB: 55.05, 45.41
Test set ICSI: 52.36, 53.99, 49.27
Test set M4: 73.47 *, 79.17 †, 84.67 *, 81.27 †
* Results from lapel mics; † Results from beam former
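
All numbers on this slide and the next are word error rates: the substitutions, deletions and insertions in the best word-level alignment of the recogniser output against the reference transcript, divided by the number of reference words (in an HTK setup this is typically computed with HResults). A minimal sketch of the metric itself:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the meeting starts at noon", "a meeting starts noon"))  # 40.0
```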

Slide 7: Results: adaptation vs. direct training on ICSI (% word error rates)
                                                           ICSI trained   SWB trained, ICSI adapted
Monophone models *                                             73.37           78.89
Context-dependent word-internal models *                       66.08           70.59
Lattice rescoring (no or speaker-independent adaptation)       52.34           53.99
Lattice rescoring (speaker adaptation)                         49.27           51.18
* Results from Ducoder using all pruning

Slide 8: Acoustic model adaptation issue
Acoustic models are presently not very adaptive:
– Better MLLR code required (next slide)
– More training data required
Need to make better use of the combined ICSI/SWB training data for M4.
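
MLLR adapts the Gaussian means of the acoustic models with a shared affine transform, mu' = A * mu + b, estimated from a small amount of adaptation data; estimating A and b is what the HTK adaptation code mentioned on these slides does. The sketch below only shows the transform being applied, with made-up values (NumPy, illustrative only, no estimation step):

```python
import numpy as np

def apply_mllr(means: np.ndarray, A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply a single MLLR mean transform to every Gaussian mean:
    mu' = A @ mu + b, with `means` of shape (n_gaussians, dim)."""
    return means @ A.T + b

# Toy example: 3 Gaussians over 4-dimensional features (values are illustrative).
rng = np.random.default_rng(0)
means = rng.standard_normal((3, 4))
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # in practice estimated by maximum likelihood
b = 0.05 * rng.standard_normal(4)
print(apply_mllr(means, A, b).shape)  # (3, 4)
```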

Slide 9: Other news
The next version of HTK's adaptation code will be made available to M4 before the official public release.
Sheffield to acquire the HTK LVCSR decoder:
– Licensing issues to be resolved
– May be able to make binaries available to M4 partners

