ASR – effect of five parameters on the WER performance of HMM SR system
Sanjay Patil, Jun-Won Suh
Human and Systems Engineering
Experimental Study: Effect of Parameter Variation on WER Performance
Automatic Speech Recognition System
Details of the experiment

Details of the system:
  HMM Speech Recognition System
  TIDigits Database (41300 utterances, sentences), 11 words: zero to nine, plus "oh"
  Cross-word, loop grammar

Objective: To study the ASR performance as a function of the following parameters:
  WER = fn(Frame, Window, IP, State-tying)
  Frame = 5 ms to 50 ms
  Window = 5 ms to 50 ms
  IP = -10 to -200
  State-tying = {split, merge, occupancy} => total number of tied states
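To make the Frame and Window parameters concrete, here is a minimal Python sketch of how the two durations determine the number of feature vectors per utterance and the overlap between successive analysis windows. It assumes that "Frame" is the shift between successive windows and that only windows lying entirely inside the utterance are counted (no padding); the function names are hypothetical and are not taken from the recognizer used in the experiment.

```python
def count_frames(utterance_ms: float, frame_ms: float, window_ms: float) -> int:
    """Count analysis windows that fit entirely inside the utterance.

    Assumption: the first window starts at 0 ms and each following window
    starts frame_ms later; the signal is not padded at the end.
    """
    if frame_ms <= 0 or window_ms <= 0:
        raise ValueError("frame and window durations must be positive")
    if utterance_ms < window_ms:
        return 0
    return int((utterance_ms - window_ms) // frame_ms) + 1


def overlap_fraction(frame_ms: float, window_ms: float) -> float:
    """Fraction of each window shared with the next one (0.0 if no overlap)."""
    return max(0.0, 1.0 - frame_ms / window_ms)


if __name__ == "__main__":
    # A one-second utterance with a few (Frame, Window) pairs from the studied range.
    for frame_ms, window_ms in [(5, 25), (10, 25), (15, 25), (50, 50)]:
        n = count_frames(1000, frame_ms, window_ms)
        ov = overlap_fraction(frame_ms, window_ms)
        print(f"frame={frame_ms} ms window={window_ms} ms -> {n} frames, overlap={ov:.2f}")
```

Under these assumptions, halving the frame duration roughly doubles the number of feature vectors the decoder has to process, which is one reason the frame setting can be expected to influence run time as well as WER.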
Test Results: Effect of Frame-Window Variation on WER
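For readers unfamiliar with the metric, the following Python sketch shows the usual way WER is computed: the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. This is a generic illustration, not the scoring tool used in the experiment, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)

    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One inserted word against a two-word reference.
    print(word_error_rate("oh one", "oh oh one"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.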
Test Results: Effect of Frame-Window Variation on Time
Training Schedule
Command line to run the experiment

  tidigit_decode -model_type xwrd_triphone -train_mode baum_welch -decode_mode loop_grammar

Options:
  -model_type : specifies the type of model to build
      xwrd_triphone : context-dependent cross-word triphone models
  -train_mode : specifies the training algorithm to use
      baum_welch : the standard Baum-Welch (forward-backward) algorithm
  -decode_mode : specifies the type of decoding to perform
      loop_grammar : decodes using a grammar where any digit can follow any other digit with equal probability
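The loop grammar described above can be illustrated with a short Python sketch. It assumes an 11-word vocabulary (zero through nine plus "oh") in which every word is equally likely at every position, and it ignores any probability of ending the utterance; the names below are hypothetical and do not reflect how the decoder represents its grammar internally.

```python
import math

# Assumed vocabulary: the 11 TIDigits words.
DIGITS = ["zero", "oh", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def loop_grammar_logprob(words: list) -> float:
    """Log probability of a digit string under a uniform loop grammar.

    Every word can follow every other word with probability 1/11, so the
    score depends only on the length of the string, not on its content.
    """
    for word in words:
        if word not in DIGITS:
            raise ValueError(f"word not in vocabulary: {word}")
    return len(words) * math.log(1.0 / len(DIGITS))


if __name__ == "__main__":
    print(loop_grammar_logprob(["one", "two", "three"]))
    print(loop_grammar_logprob(["nine", "nine", "oh"]))  # same length, same score
```

Since such a grammar assigns the same probability to every string of a given length, it cannot help the decoder choose between competing digit strings of equal length; that choice is left to the acoustic models.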
Language Model

Combining Acoustic and Language Models

  Language model contribution = $P(W)^{LM} \cdot IP^{N(W)}$

  LM: language model scale (we did not observe any change in WER)
  IP: insertion penalty, the penalty for inserting a new word; it is determined empirically to optimize the recognition performance
  N(W): number of words in the hypothesis W
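The effect of the insertion penalty can be sketched in the log domain, where the product above becomes a sum. The Python example below is an illustration only: it assumes that the IP values used in this study (-10 to -200) act as an additive log-domain penalty per word, and the acoustic scores are invented for the example rather than taken from the experiment.

```python
import math


def combined_log_score(acoustic_logprob: float, lm_logprob: float,
                       num_words: int, lm_scale: float,
                       insertion_penalty: float) -> float:
    """log P(A|W) + LM * log P(W) + IP * N(W), with IP already in the log domain."""
    return acoustic_logprob + lm_scale * lm_logprob + insertion_penalty * num_words


def best_hypothesis(hypotheses: list, lm_scale: float, insertion_penalty: float) -> str:
    """Return the label of the hypothesis with the highest combined score.

    Each hypothesis is a tuple (label, acoustic_logprob, lm_logprob, num_words).
    """
    best = max(
        hypotheses,
        key=lambda h: combined_log_score(h[1], h[2], h[3], lm_scale, insertion_penalty),
    )
    return best[0]


if __name__ == "__main__":
    word_logprob = math.log(1.0 / 11.0)  # uniform loop grammar over 11 words
    # Invented scores: the longer hypothesis fits the audio slightly better.
    hypotheses = [
        ("three words", -1000.0, 3 * word_logprob, 3),
        ("four words", -950.0, 4 * word_logprob, 4),
    ]
    for ip in (-10.0, -200.0):
        print(ip, best_hypothesis(hypotheses, lm_scale=1.0, insertion_penalty=ip))
```

With the mild penalty the four-word hypothesis wins, while the strong penalty favors the three-word one. This is the trade-off tuned empirically: a penalty that is too weak lets insertion errors through, and one that is too strong causes deletions.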
Test Results: Effect of Insertion Penalty on WER
The same will be true for the other (Frame, Window) combinations. The remaining two pairs are (10, 25) and (15, 25).
State-Tying Results
These results are from Naveen Parihar's MS thesis.
References
J. Picone, “Lecture,” [online]. Available:
X. Huang, A. Acero, H. Hon, Spoken Language Processing (Prentice Hall, 2001).
F. Jelinek, Statistical Methods for Speech Recognition (The MIT Press, 1999).
N. Parihar and J. Picone, “Aurora Working Group: DSR Front End LVCSR Evaluation – Baseline recognition System Description,” [online]. Available: reports/aurora_frontend/2001/report_072101_v7.pdf
N. Parihar, “Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation,” MS Thesis, Dec. 2003, [online].
J. Zhao, X. Zhang, A. Ganapathiraju, N. Deshmukh, J. Picone, “Tutorial for Decision Tree-Based State Tying for Acoustic Modeling,” June 1999, [online].
S. J. Young, J. J. Odell, P. C. Woodland, “Tree-Based State Tying for high accuracy acoustic modelling,” May [online]. Available:
Questions
State-Tying (reference 3)