Sphinx 3.X (X=4) Four-Layer Categorization Scheme of Fast GMM Computation Techniques in Large Vocabulary Continuous Speech Recognition Systems http://sourceforge.net/projects/cmusphinx/


Arthur Chan, Jahanzeb Sherwani, Ravishankar Mosur and Alex Rudnicky
Computer Science Department, Carnegie Mellon University

Sphinx 3.X (X=4)

Sphinx:
-Speaker-independent large vocabulary speech recognition system
-Open source under a Berkeley-style license: one can distribute, modify and use it freely

Sphinx 3.X: Reengineering of Sphinx 3 to create a real-time large vocabulary speech recognizer
-S3.3: Tree lexicon, histogram pruning and live-mode decoder (R. Mosur 1999)
-S3.4 (released Jul 04): Fast GMM computation, phoneme look-ahead (A. Chan 2004, this paper)
-S3.5 (to be released soon): MLLR-based speaker adaptation, live-mode APIs, alignment, phoneme recognition, lattice rescoring, best path search in lattice

4 Levels of GMM Computation
[Figure: the four levels, labeled Frames, GMMs, Gaussians, Feature Component]

Gaussian-level:
-VQ-based Gaussian selection (Bocchieri 93)
-SVQ-based Gaussian selection (Mosur 99)

Feature-level:
-Sub-vector quantization or SDCHMM method (Mosur 97, Mak 97)
-LDA, PCA

Frame-level:
-Discount alternative frames
-Down sampling (Wycesna 95)

GMM-level:
-Only compute important GMMs (e.g. those with a high CI score)
-CI-GMM selection (Lee 01)

Sphinx 3.4: Fast GMM Computation

Our approach:
-Divide GMM computation into 4 levels
-Implement representative techniques in each level
-Inspired by 4-level state tying (Sagayama 95)

Observation: In each level, the full computation can be approximated by computing only part of the components.

Advantages: Provides a general framework for understanding fast GMM computation.

Experiment Results

Experiment Summary:
1, CI-based GMM selection seems to be the most effective.
2, Many Gaussian-level techniques seem to have too much overhead.
Algorithm            WER     Total   GMM     Search   Overhead
BL                   18.65   6.9     5.85    0.85     -
Down Sampling        19.10   4.35    3.99    0.96
CIGMMS               18.82   3.25    1.18    2.06
Gaussian Selection   18.95   3.95    2.84    0.89     0.22
SVQ                  18.69   4.20    2.04    0.98     1.08

Note: Results combined with pruning can be found in the paper.
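
The two cheapest levels in the scheme above, frame-level down sampling and GMM-level CI-based selection, can be illustrated with a small sketch. This is not the Sphinx 3.4 code: the function names, the `beam` and `skip` parameters, the toy 1-D models, and the choice to back off to the parent CI score for unselected GMMs are all assumptions made for illustration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_score(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def score_frames(frames, ci_gmms, cd_gmms, cd_to_ci, beam=None, skip=1):
    """Score all context-dependent (CD) GMMs on all frames.

    skip > 1 (frame level, hypothetical parameter): recompute scores only
        on every skip-th frame and reuse the previous scores otherwise.
    beam (GMM level, hypothetical parameter): fully compute a CD GMM only
        if its parent CI GMM scores within `beam` of the best CI score;
        otherwise back off to the parent CI score (an assumption here).
    Returns (scores, number of GMM evaluations, CI evaluations included).
    """
    scores, evaluated, last = [], 0, None
    for t, x in enumerate(frames):
        if last is not None and t % skip != 0:
            scores.append(list(last))      # reuse scores of the last computed frame
            continue
        ci = [gmm_log_score(x, g) for g in ci_gmms]
        evaluated += len(ci_gmms)
        best_ci = max(ci)
        row = []
        for j, g in enumerate(cd_gmms):
            parent = ci[cd_to_ci[j]]
            if beam is None or parent >= best_ci - beam:
                row.append(gmm_log_score(x, g))
                evaluated += 1
            else:
                row.append(parent)         # approximate with the CI score
        scores.append(row)
        last = row
    return scores, evaluated

# Toy 1-D models: two CI GMMs, each with two CD children (all hypothetical).
ci_gmms = [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])]]
cd_gmms = [[(1.0, [-0.5], [1.0])], [(1.0, [0.5], [1.0])],
           [(1.0, [4.5], [1.0])], [(1.0, [5.5], [1.0])]]
cd_to_ci = [0, 0, 1, 1]
frames = [[0.1], [0.2], [4.9], [5.1]]

full, n_full = score_frames(frames, ci_gmms, cd_gmms, cd_to_ci)
sel, n_sel = score_frames(frames, ci_gmms, cd_gmms, cd_to_ci, beam=3.0)
skipped, n_skip = score_frames(frames, ci_gmms, cd_gmms, cd_to_ci, skip=2)
print(n_full, n_sel, n_skip)
```

On this toy data the selected GMMs keep their exact scores while the evaluation count drops, which mirrors the observation that each level approximates the full computation by computing only part of it. The extra CI evaluations are the kind of cost the table reports in its overhead column.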