Brief Overview of Different Versions of Sphinx Arthur Chan.


Introduction: The software aspect of the recognizer is very important. Research always requires correct use of the software. Sphinx II + III + IV + SphinxTrain ~= 100k lines of code. Each of them is fairly complex.

This presentation: Introduction. History of Sphinx (Sphinx I, Sphinx II, Sphinx III, SphinxTrain, Sphinx IV). How do I get the source code? (Versioning; three rules for not getting lost among the different recognizers.) Where can I get “official” information? Outlook for each recognizer. Conclusion.

Brief history of Sphinx: Largely adapted from Rita’s “The Sphinx Speech Recognition Systems” and Kevin et al.’s “Speech Recognition: Past, Present and Future” (final.html).

Before Sphinx: Dragon: one of the first uses of HMMs in speech recognition; one of the first uses of a “purely statistical model” in speech; expresses the knowledge using an HMM network. Harpy: one of the first uses of beam search; uses phonemes to represent words.

Sphinx I: Before Sphinx: according to AT&T’s literature, the concept of speaker independence was proposed in 1979. Most systems were either speaker dependent, or speaker independent but in a very small domain (<100 words). Sphinx I was therefore outstanding: its accuracy was 90% on Resource Management.

Sphinx I (1987): By Kai-Fu Lee and Roberto Bisiani. Key developers included Hsiao-Wuen Hon and Fil Alleva. Written in C. A continuous speech recognizer using discrete HMMs with 3 codebooks of size 256. Used a simple word-pair grammar. Generalized triphones. Real-time on a Sun 3 or DEC 3000. Where is the source code? A good antique!
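A discrete HMM emits codebook indices rather than real-valued vectors, so each frame's feature vector is first mapped to its nearest codeword. The following is a minimal sketch of that quantization step, not Sphinx I's actual code: the slide gives only the number and size of the codebooks, so the squared Euclidean distance, the function name `quantize`, and the toy 2-D codebook are all assumptions made for illustration.

```python
def quantize(vector, codebook):
    """Return the index of the codeword nearest to `vector`.

    Assumption: squared Euclidean distance. A real system would use
    one codebook of 256 codewords per feature stream; this toy
    example uses a single 3-entry codebook.
    """
    best_index = 0
    best_distance = float("inf")
    for index, codeword in enumerate(codebook):
        distance = sum((x - c) ** 2 for x, c in zip(vector, codeword))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


# Hypothetical toy codebook with three 2-D codewords.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
symbol = quantize([0.9, 1.2], toy_codebook)
```

The resulting index is the discrete observation symbol that the HMM scores for that frame.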

Sphinx II (1992): By Xuedong Huang. Hardwired to a 5-state Bakis topology. 3-gram language models. Decision-tree tying of HMMs (by Mei-Yuh Hwang). 90% on the WSJ task (WSJ0 or WSJ1?).
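A Bakis topology is a left-to-right HMM: a state may loop on itself or move forward, but never backward. The sketch below builds the allowed-transition mask for a 5-state model. The slide does not say which forward skips Sphinx II permits, so allowing a skip over one state (`max_skip=1`) is an assumption, as is the function name.

```python
def bakis_transition_mask(num_states=5, max_skip=1):
    """Return mask[i][j] == True where a left-to-right HMM allows i -> j.

    Allowed moves from state i: the self-loop (j == i), the next state
    (j == i + 1), and forward skips over at most `max_skip` states.
    Backward transitions are never allowed.
    """
    mask = [[False] * num_states for _ in range(num_states)]
    for i in range(num_states):
        last_target = min(i + max_skip + 1, num_states - 1)
        for j in range(i, last_target + 1):
            mask[i][j] = True
    return mask


mask = bakis_transition_mask()
for row in mask:
    # Print one row per source state: 1 = allowed, 0 = forbidden.
    print(" ".join("1" if allowed else "0" for allowed in row))
```

"Hardwired" in the slide means this structure is fixed in the decoder, in contrast to the arbitrary topologies that Sphinx III supports.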

Fast Beam Search v. X: FBS-6: flat-lexicon decoder. FBS-7: lexicon-tree-based decoder. FBS-8: decoder written by Ravi Mosur (see his 1996 thesis). Supports multiple types of beam pruning; lexical tree; tricks in GMM computation; machine optimization (loop unrolling); predictive codebook computation; phoneme lookahead; best-path search.
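Beam pruning keeps the search tractable by discarding, at every frame, any hypothesis whose score falls too far below the best one. The sketch below shows a single beam applied to log-domain scores. It is illustrative only: the FBS decoders use several beams of different kinds, and the names `beam_prune` and `beam_width` are hypothetical.

```python
def beam_prune(hypotheses, beam_width):
    """Keep hypotheses whose log score is within `beam_width` of the best.

    `hypotheses` maps a state identifier to its log-domain path score
    (higher is better). Assumption: one global beam per frame.
    """
    if not hypotheses:
        return {}
    threshold = max(hypotheses.values()) - beam_width
    return {state: score for state, score in hypotheses.items() if score >= threshold}


# Hypothetical scores for three active states at one frame.
active = {"a": -10.0, "b": -12.0, "c": -30.0}
survivors = beam_prune(active, beam_width=5.0)
```

A wider beam keeps more hypotheses and lowers the risk of pruning the correct path, at the cost of more computation per frame.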

Other facts about Sphinx II: It was licensed at the beginning (this seems to date back to around 1995). In 2000, it was open-sourced on SourceForge under a Berkeley-style license: you can incorporate Sphinx’s source code, and you don’t need to open your own source code (no recursive legal binding). Similar to the LGPL. In 2001, Kevin made a major alpha release that ensured portability across several platforms.

Sphinx III flat lexicon decoder (“s3”, “s3flat”, “s3slow”): Sphinx III (by Ravi Mosur). Flat lexicon. Supports both CHMM and SCHMM. “Poor-man” trigram: uses only the most likely first word, which avoids the D^2 expansion of the word lattice. Arbitrary topology. Very accurate; used in the BN evaluation and others. Derivatives of the search include an N-best generator, an aligner, and a phone recognizer.

Sphinx III tree lexicon decoder (“s3.x”, “s3fast”, “s3inaccurate”): What is s3.x actually? A “spin-off” of the Sphinx III flat-lexicon decoder’s source code. First used in the BN 10xRT evaluation in 1999. From s3.0 -> s3.2: uses a tree lexicon with unigram lookahead; lexical tree with approximation to avoid memory problems; one of the first in the world to use sub-vector quantization to speed up GMM computation.
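The difference between a flat and a tree lexicon is that words sharing a pronunciation prefix share search states in the tree. The sketch below builds such a prefix tree from a toy pronunciation dictionary. It is a simplification under stated assumptions: it uses context-independent phones, ignores triphone context and the approximations the slide mentions, and the dictionary entries and function names are invented for illustration.

```python
def build_lexical_tree(lexicon):
    """Build a phone-prefix tree from {word: [phone, ...]}.

    Each node is a dict with "children" (phone -> node) and "words"
    (the words whose pronunciation ends at that node).
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(phone, {"children": {}, "words": []})
        node["words"].append(word)
    return root


def count_nodes(node):
    """Count the nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node["children"].values())


# Hypothetical toy dictionary; the three words share the prefix S T AA.
toy_lexicon = {
    "start": ["S", "T", "AA", "R", "T"],
    "star": ["S", "T", "AA", "R"],
    "stop": ["S", "T", "AA", "P"],
}
tree = build_lexical_tree(toy_lexicon)
flat_phone_count = sum(len(phones) for phones in toy_lexicon.values())
tree_phone_count = count_nodes(tree) - 1  # exclude the root
```

In this toy case the flat lexicon needs 13 phone instances while the tree needs 6, which is where the speed advantage of the tree decoder comes from. The trade-off is that a word's identity is known only at the end of a path, which is why the language model has to be applied through lookahead.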

(cont.) From s3.2 -> s3.3 (Rita, Ricky): live-mode recognizer (livedecode) and simulator (livepretend). From s3.3 -> s3.4 (Evandro, Arthur C, Jahanzeb): 4 levels of speed-up of GMM computation, phoneme lookahead, and bug fixes in live mode. From s3.4 -> s3.5 (Evandro, Arthur C, Yitao) (tentative): speaker adaptation + documentation.

Facts about S3: A Java version exists -> sphinx3j. Open-sourced around 2002. Maintained continuously by Evandro from 2001 to now. s3.5 is the current active branch in S3 development.

SphinxTrain: Equally important and very complex, but not well understood. What is SphinxTrain? A collection of ~40 tools for Sphinx 2, 3 and 4 acoustic model training, and a set of Perl scripts to do the training. Sphinx 2 and 3 have slightly different model formats.

Mini-history: The Baum-Welch trainer and the Viterbi trainer have existed for a very long time. The training tools in general were not systematic and not structured. From the chaos, Eric Thayer first pulled everything together to create the package SphinxTrain. Rita did numerous bug fixes and modifications to the current trainer, innovated the use of automatic question generation (make_quest), built a set of training scripts for RM (the 0*/ scripts), and wrote the first systematic tutorial on training. Ricky refined the code and wrote the first set of Perl scripts for training. He made a PHD out of it too. (PHD = Push Here Dummy!) Alan and Kevin put the code on SourceForge. Alan built a set of training scripts that can “run through”.

Sphinx IV: Why Sphinx IV? Too many limitations in SphinxTrain and Sphinx III: only N-grams; approximation of triphones; fast GMM computation can be very troublesome to understand; bw doesn’t skip silence; we rely heavily on forced alignment in training.

Sphinx IV (cont.) (by no means complete......): Lead design: Bhiksha (MERL). Lead team developer: Willie Walker (Sun). Key developers: Evandro, Rita, Philip Kwok and Paul Lamere. Many heavyweight speech advisors: Evandro, Rita, Ravi, Bhiksha, Pedro Moreno ......

Is Sphinx IV good? A very accurate, very fast, very versatile and very nicely packaged Java-based speech recognizer. In some internal benchmarks on RM and WSJ 5k, it is shown to be faster and more accurate than s3.3 (under 1xRT and 10% better). Supports N-gram, FSM and FSG. Will provide facilities like confidence scoring. Still under development (the first alpha release has just come out). The trainer is not stable.

Summary of the recognizers and trainers: Sphinx I -> obsolete. Sphinx II -> we are using the fast recognizer now. Sphinx III -> the following coexist: S3 flat; S3 fast (s3.4 stable, s3.5 devel). SphinxTrain (0.92 in CVS). Sphinx IV -> the recognizer is alpha released; the trainer is not yet stable.

How can I get version X of Sphinx? The official web page of Sphinx gives announcements and development news; some documentation is there. For the tarballs, the releases are: sphinx2-0.4.tgz (s2), sphinx3-0.1.tgz (s3.3), sphinx3-0.4-rc2.tgz (s3.4 release candidate II), sphinx4-0.1alpha-src.zip (s4).

Rule 2: If it doesn’t exist in CVS, officially it doesn’t exist. Simply speaking, no one actually supports and maintains it. Software that falls into this category: the CMU LM Toolkit (we haven’t touched it for a while; we may do so in the future); Phoenix (distributed somewhere else); training scripts in csh (Rita still actively supports them).

Rule 1: If there are no tarballs, they are in CVS. ANYONE can get the following modules through CVS by using the following command: cvs -z3 - co modulename, where modulename is one of: SphinxTrain -> SphinxTrain; archive_s3 -> s3 + s3.0 + s3.2 + s3.3; sphinx2 -> development version of sphinx2; sphinx3 =~ s3.4 (we will base the development of s3.5 on this); share =~ cepview + lm3g2dmp; sphinx3j = the Java version of sphinx3; Sphinx4 = development version of sphinx4.
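The checkout command in Rule 1 can be assembled programmatically when several modules are needed. The sketch below only builds the argument list and does not run anything. The repository location is missing from the slide as transcribed, so `EXAMPLE_CVSROOT` is a made-up placeholder that must be replaced with the real CVSROOT; `-z` (compression level), `-d` (repository root) and `co` (checkout) are standard CVS options.

```python
# Hypothetical placeholder: the real repository root is not given in the slide.
EXAMPLE_CVSROOT = ":pserver:anonymous@cvs.example.org:/cvsroot/project"


def cvs_checkout_command(module, cvsroot, compression=3):
    """Return the argument list for checking out one CVS module."""
    return ["cvs", f"-z{compression}", "-d", cvsroot, "co", module]


# Module names as listed on the slide.
modules = ["SphinxTrain", "archive_s3", "sphinx2", "sphinx3", "share", "sphinx3j"]
commands = [cvs_checkout_command(name, EXAMPLE_CVSROOT) for name in modules]

for command in commands:
    print(" ".join(command))
```

Each list could be passed to `subprocess.run` once a real repository root is supplied.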

Rule 3: You may need other modules to complete your task. SphinxTrain relies heavily on forced alignment, so you also need s3-align. Using any s3 recognizer requires the LM in DMP format, so you need the tool lm3g2dmp, which can be found in sphinx2 or share.

Where can I get more information on the recognizers? People to ask: s2: Evandro, Ravi. S3 flat: Evandro, Ravi, ArthurC. S3 tree: Evandro, Ravi, ArthurC. SphinxTrain: Rita, Evandro, Ravi, ArthurC, Rong, Ziad, Murali. S4: S4’s developers on SourceForge (Willie, Paul, Philip, Bhiksha, Rita, Evandro).

Web pages to look up: Rita’s web page: contains the training manual. TWiki web page for Sphinx 4 design: bin/cmusphinx/twiki/view/Sphinx4/WebHome/. ArthurC’s web page: he risked his life to write a manual for Sphinx 3.4; it also collects some information on each Sphinx.

Outlook for all recognizers: Sphinx II: sorry, we won’t support it much. Reason: s3.4 and s4 have proved to have very good speed and accuracy. Sphinx III: the only active branch is s3.5; moderate changes in s3flat; motivated by project CALO; this quarter: make adaptation work. SphinxTrain: write a set of scripts for continuous HMM training; the silence deletion problem will be fixed.

(cont.) sphinxDoc: Chapters 1 and 2 completed (*sigh*, still 7 left). Only being written when Arthur C is procrastinating and doesn’t want to read or play video games. Will be available around Sep or Oct. Sphinx IV: alpha release; the trainer will be fixed. Argus: incorporates the advantages of many speech recognizers together; not yet started.

Conclusion: This presentation summarizes the current code status of Sphinx and SphinxTrain. We still have a lot of work to do...... Next presentation: s3 or s3.4, from main to the search.