
Page 1 of 10
ASR – Effect of Five Parameters on the WER Performance of an HMM Speech Recognition System
Sanjay Patil, Jun-Won Suh
Human and Systems Engineering
Experimental Study: Effect of Parameter Variation on the WER Performance of an Automatic Speech Recognition System

Page 2 of 10
Details of the Experiment

Details of the system:
- HMM speech recognition system
- TIDigits database (41,300 utterances/sentences)
- 11-word vocabulary: the digits "zero" through "nine", plus "oh"
- Cross-word (triphone) models, loop grammar

Objective: to study ASR performance as a function of the parameters below:
WER = f(frame, window, IP, state-tying)
- Frame duration: 5 ms to 50 ms
- Window duration: 5 ms to 50 ms
- Insertion penalty (IP): -10 to -200
- State-tying: {split, merge, occupancy} thresholds, which determine the total number of tied states
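Since every configuration is scored by WER, here is a minimal sketch of the standard WER computation (Levenshtein alignment of reference and hypothesis word strings). It is only an illustration, not the ISIP scoring tool actually used for these experiments.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("nine oh two one", "nine two two one"))  # 0.25 (one substitution)
```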

Page 3 of 10
Test Results: Effect of Frame-Window Variation on WER
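For reference, "frame" here is the step between successive feature vectors and "window" is the analysis length over which each vector is computed. The sketch below assumes a plain NumPy framing routine (not the actual MFCC front end used in the experiments) and only shows how the two durations determine the number and overlap of analysis windows; the 8 kHz sample rate is an arbitrary example value.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms, window_ms):
    """Slice a signal into (possibly overlapping) analysis windows.

    frame_ms  -- step between successive frames (e.g. 10 ms)
    window_ms -- length of each analysis window (e.g. 25 ms)
    """
    step = int(sample_rate * frame_ms / 1000)
    length = int(sample_rate * window_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - length) // step)
    return np.stack([signal[i * step : i * step + length] for i in range(n_frames)])

# Example: 1 second of audio at 8 kHz, 10 ms frame, 25 ms window -> 98 windows of 200 samples.
audio = np.random.randn(8000)
windows = frame_signal(audio, 8000, frame_ms=10, window_ms=25)
print(windows.shape)
```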

Page 4 of 10
Test Results: Effect of Frame-Window Variation on Time

Page 5 of 10
Training Schedule

Page 6 of 10
Command Line to Run the Experiment

tidigit_decode -model_type xwrd_triphone -train_mode baum_welch -decode_mode loop_grammar

Options:
-model_type  : the type of model to build
    xwrd_triphone : context-dependent cross-word triphone models
-train_mode  : the training algorithm to use
    baum_welch : the standard Baum-Welch (forward-backward) algorithm
-decode_mode : the type of decoding to perform
    loop_grammar : decodes using a grammar in which any digit can follow any other digit with equal probability
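Running the full grid of experiments amounts to repeating this command over every (frame, window, IP) combination. The driver below is a hypothetical sketch: the slides do not show how the frame, window, and insertion-penalty values are passed to tidigit_decode (config file, environment, or extra flags), so `run_experiment` only records each combination rather than overriding the real parameters, and the 5 ms grid spacing is an assumption (the slides give only the 5-50 ms and -10 to -200 ranges).

```python
import itertools

FRAMES_MS = range(5, 55, 5)        # assumed 5 ms steps over the 5-50 ms range
WINDOWS_MS = range(5, 55, 5)
PENALTIES = (-10, -50, -100, -150, -200)

BASE_CMD = ("tidigit_decode -model_type xwrd_triphone "
            "-train_mode baum_welch -decode_mode loop_grammar")

def run_experiment(frame_ms, window_ms, insertion_penalty):
    """Hypothetical wrapper: prints the command plus the parameter combination
    that a site-specific configuration step would have to apply."""
    print(f"{BASE_CMD}  # frame={frame_ms} ms, window={window_ms} ms, IP={insertion_penalty}")

for frame, window, ip in itertools.product(FRAMES_MS, WINDOWS_MS, PENALTIES):
    run_experiment(frame, window, ip)
```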

Page 7 of 10
Language Model

Combining acoustic and language models:
Language model contribution = P(W)^LM · IP^N(W)
- LM: language model scale (we did not observe a change in WER when varying it)
- IP: insertion penalty, the cost of inserting a new word; N(W) is the number of words in hypothesis W
- IP is determined empirically to optimize recognition performance
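In the log domain this contribution becomes additive. The sketch below follows the standard formulation (as in the Huang, Acero, and Hon text cited in the references), not the ISIP decoder internals; the LM scale and insertion-penalty values are arbitrary example settings.

```python
import math

def combined_score(log_acoustic, log_prob_words, n_words,
                   lm_scale=12.0, insertion_penalty=-50.0):
    """Total path score in the log domain:
    acoustic + LM * log P(W) + N(W) * IP.

    A more negative insertion penalty discourages the decoder from hypothesizing
    extra words (fewer insertions, at the risk of more deletions), and vice versa.
    """
    return log_acoustic + lm_scale * log_prob_words + n_words * insertion_penalty

# Example: a 4-word digit string under a loop grammar over 11 equally likely words,
# so P(W) = (1/11)**4.
score = combined_score(log_acoustic=-2500.0,
                       log_prob_words=4 * math.log(1.0 / 11.0),
                       n_words=4)
print(score)
```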

Page 8 of 10
Test Results: Effect of Insertion Penalty Variation on WER

The same trend holds for the other (frame, window) combinations. The remaining two pairs are (frame, window) = (10 ms, 25 ms) and (15 ms, 25 ms).

Page 9 of 10
State-Tying Results

Ref.: these results are taken from Naveen Parihar's MS thesis (see References).
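For context, the split, merge, and occupancy values listed on slide 2 act as stopping thresholds for decision-tree state tying (see the Zhao et al. tutorial and the Young, Odell, and Woodland paper in the references). The sketch below is a simplified illustration of that stopping logic with made-up threshold values and a toy data structure; it is not the ISIP implementation.

```python
from dataclasses import dataclass

@dataclass
class SplitCandidate:
    """Best phonetic-question split found for one decision-tree node."""
    likelihood_gain: float   # increase in log likelihood if the split is applied
    left_occupancy: float    # expected state occupancy of the left child
    right_occupancy: float   # expected state occupancy of the right child

def accept_split(c, split_threshold=350.0, min_occupancy=100.0):
    """A node is split only if the likelihood gain exceeds the split threshold and
    both children retain at least the minimum occupancy (enough data to train)."""
    return (c.likelihood_gain > split_threshold and
            c.left_occupancy >= min_occupancy and
            c.right_occupancy >= min_occupancy)

def accept_merge(likelihood_loss, merge_threshold=100.0):
    """After splitting, two leaves are tied (merged) if doing so loses less
    likelihood than the merge threshold; a lower threshold means fewer merges
    and therefore more distinct tied states."""
    return likelihood_loss < merge_threshold

# Tightening the thresholds changes the total number of tied states.
print(accept_split(SplitCandidate(400.0, 150.0, 120.0)))                         # True
print(accept_split(SplitCandidate(400.0, 150.0, 120.0), split_threshold=500.0))  # False
```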

Page 10 of 10
References

J. Picone, "Lecture." [Online]. Available:
X. Huang, A. Acero, and H. Hon, Spoken Language Processing, Prentice Hall, 2001.
F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, 1999.
N. Parihar and J. Picone, "Aurora Working Group: DSR Front End LVCSR Evaluation – Baseline Recognition System Description." [Online]. Available: reports/aurora_frontend/2001/report_072101_v7.pdf.
N. Parihar, "Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation," MS Thesis, Dec. 2003. [Online].
J. Zhao, X. Zhang, A. Ganapathiraju, N. Deshmukh, and J. Picone, "Tutorial for Decision Tree-Based State Tying for Acoustic Modeling," June 1999. [Online].
S. J. Young, J. J. Odell, and P. C. Woodland, "Tree-Based State Tying for High Accuracy Acoustic Modelling," May. [Online]. Available:

Page 11 of 10
Questions

Page 12 of 10
State-Tying (reference 3)