Sphinx Recognizer Progress Q2 2004

Speed
  Combination of fast GMM computation techniques with various types of pruning:
    0.48xRT on the 2k task (Communicator), 0.6xRT on the 5k task (WSJ).
  Phoneme look-ahead research completed.
  15-20% gain when fast GMM computation techniques and pruning were applied.
  Details can be found in the ICSLP 2004 paper (Chan et al.).
  Compilation optimization (~8% gain).
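For readers unfamiliar with the "xRT" notation, here is a minimal sketch of how a real-time factor is usually computed, assuming xRT means decoding time divided by audio duration. The function name and the example timings are hypothetical, not taken from the Sphinx code base.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Return decoding time as a multiple of the audio duration (xRT).

    Values below 1.0 mean the recognizer runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


# Hypothetical timings: 100 s of audio decoded in 48 s of processing time.
rtf = real_time_factor(48.0, 100.0)
print(f"{rtf:.2f}xRT")
```

Under this reading, 0.48xRT means an utterance is decoded in roughly half the time it took to speak.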

Accuracy
  S3.4 is much better than S2 in the Communicator task:
    S2: 17% WERR, ~0.3xRT
    S3.4 (32 mix, not tuned for speed): 14% WERR, 1.1xRT
    S3.4 (64 mix, not tuned for speed): 12% WERR, 1.6xRT
  Continuous HMM performs much better than semi-continuous HMM (~30% improvement).
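The "~30% improvement" can be checked against the error rates listed on this slide, assuming it refers to the relative reduction from the S2 figure to the S3.4 figures. That reading is an assumption; the slide does not say which pair of systems the 30% compares. A short sketch of the arithmetic:

```python
def relative_reduction(baseline_error, new_error):
    """Relative error reduction, as a fraction of the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Error rates (in percent) quoted on the slide for the Communicator task.
s2_error = 17.0
s34_32mix_error = 14.0
s34_64mix_error = 12.0

rel_32 = relative_reduction(s2_error, s34_32mix_error)
rel_64 = relative_reduction(s2_error, s34_64mix_error)
print(f"S3.4 32 mix vs S2: {rel_32:.1%} relative reduction")
print(f"S3.4 64 mix vs S2: {rel_64:.1%} relative reduction")
```

Under that assumption, the 64-mixture system gives a relative reduction of about 29%, which is consistent with the rounded figure on the slide.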

Interface
  Sphinx 3.4 Release Candidate II has been distributed since Jun 10.
  http://cmusphinx.sourceforge.net/downloads/sphinx3-0.4-rc2.tgz
  New feature list:
    Fast Gaussian Mixture Model (GMM) computation; the following techniques are supported:
      a. Down-sampling.
      b. CI-based GMM selection.
      c. VQ-based and SVQ-based Gaussian selection.
      d. Arbitrary configuration of sub-vectors in sub-vector quantization.
    Fast match based on phoneme look-ahead.
    Class-based LMs and dynamic LM selection.
    Command-line configuration of many front-end parameters and the feature type.
    Bug fixes to make "make test" work on a wider range of platforms.
    Bug fixes in live mode recognition.
    Instructions for compilation using Intel compilers, if desired.
    Better compilation support on Windows.
    More documentation.
    Batch mode recognizer supported on Windows/Linux/FreeBSD/MacOS/Solaris.
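As an illustration of item (a) in the feature list, the sketch below shows one common form of frame down-sampling: the GMM is scored only on every `ratio`-th frame and that score is reused for the frames in between. This is a simplified sketch under stated assumptions, not the Sphinx 3.4 implementation. The function names, the `ratio` parameter, and the toy model are all hypothetical.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM."""
    component_scores = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_scores.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_scores)
    return peak + math.log(sum(math.exp(s - peak) for s in component_scores))


def downsampled_scores(frames, weights, means, variances, ratio=2):
    """Score every `ratio`-th frame and reuse that score for skipped frames.

    Returns the per-frame scores and the number of full GMM evaluations.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    scores = []
    evaluations = 0
    last_score = None
    for t, frame in enumerate(frames):
        if t % ratio == 0:
            last_score = gmm_log_likelihood(frame, weights, means, variances)
            evaluations += 1
        scores.append(last_score)
    return scores, evaluations


# Toy two-component, two-dimensional model and five frames (made-up values).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]
frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.1], [1.0, 1.0], [0.4, 0.6]]

scores, evaluations = downsampled_scores(frames, weights, means, variances, ratio=2)
print(f"{evaluations} GMM evaluations for {len(frames)} frames")
```

The trade-off is the usual one for this family of techniques: fewer Gaussian evaluations per second of speech, at the cost of stale scores on the skipped frames.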

Outlook of Sphinx in this year (Sphinx 3.5)
  Full support for the live mode recognition API (still being tested, not in s3.4's distribution). ETA: end of July.
  Full support for speaker adaptation techniques such as MLLR and VTLN. ETA: mid-August.
  Sphinx trainer (SphinxTrain): better packaging and improvements to the training algorithm. ETA: beginning of July.
  SphinxDoc: a document on using Sphinx to build speech recognition software. ETA: end of October.

Meeting Corpus Transcription and Training
  ICSI transcription conversion completed.
    Processing XML as well as human transcribers' errors.
  CHMM training completed.
  We chose a very difficult situation to decode: the Transcriber meeting.
    Many speakers and much cross talk.
    Gives us a sense of the worst-case performance.
    53% WERR
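For context on the error figure above, the sketch below shows the standard way a word error rate is computed, assuming "WERR" on these slides denotes word error rate: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences are hypothetical, and the scoring tool actually used for these results is not stated in the slides.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion against four reference words.
wer = word_error_rate("the meeting starts now", "the meeting start")
print(f"{wer:.0%}")
```

Because insertions count as errors, this measure can exceed 100% on heavily overlapped speech, which is one reason cross-talk-heavy meetings make a useful worst case.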