Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus Arthur Kunkle ECE 5526 Fall 2008.


1 Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus Arthur Kunkle ECE 5526 Fall 2008

2 Framework Introduction
CMU Sphinx
- Developed by Carnegie Mellon University.
- Has been supported by sponsors such as DARPA, IBM, and Sun Microsystems.
- Notable applications that use Sphinx include:
  - Roomline, a conference room reservation system at CMU.
  - Let’s Go, a spoken dialog system in use at Pittsburgh’s transit system.
HTK
- Originally developed in 1989 by the Speech Vision and Robotics Group of Cambridge University.
- HTK was purchased by Entropic Laboratories in 1993 and then again by Microsoft during its acquisition of Entropic in 1999.
- The HTK source code was then licensed back to Cambridge University for continued development. The source has been freely available since then, under a restrictive license.

3 Phase 1: Performance-Based Areas of Comparison
Training and decoding using the AN4 Corpus
Same procedure used in Homework #5
Provides the following metrics:
- Decoder time to completion.
- Decoder accuracy on the sentence level.
- Decoder accuracy on the word level.
- Types and quantities of decoding errors encountered during the decoding process.
- Notable trends of errors.
- Memory requirements for the recognizer at runtime.
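The word-level and sentence-level metrics listed above can be illustrated with a short Python sketch. It is not the scoring code of either toolkit; it assumes the usual definition of word error rate, (substitutions + insertions + deletions) divided by the number of reference words, and uses a plain edit-distance alignment. The names `ErrorCounts`, `count_errors`, and `score_corpus` are hypothetical, and the sample transcripts in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Error tallies for one or more utterances (hypothetical helper type)."""
    substitutions: int = 0
    insertions: int = 0
    deletions: int = 0
    reference_words: int = 0

    @property
    def word_error_rate(self) -> float:
        """WER in percent: (S + I + D) / N, assuming the usual definition."""
        errors = self.substitutions + self.insertions + self.deletions
        if self.reference_words == 0:
            return 0.0
        return 100.0 * errors / self.reference_words


def count_errors(reference: list[str], hypothesis: list[str]) -> ErrorCounts:
    """Align two word sequences by edit distance and classify each error."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # match or substitution
                             cost[i - 1][j] + 1,             # deletion
                             cost[i][j - 1] + 1)             # insertion

    # Walk back through the table to recover the error types.
    counts = ErrorCounts(reference_words=n)
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                counts.substitutions += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts.deletions += 1
            i -= 1
        else:
            counts.insertions += 1
            j -= 1
    return counts


def score_corpus(pairs: list[tuple[str, str]]) -> tuple[ErrorCounts, float]:
    """Return pooled word-level counts and the sentence error rate in percent."""
    total = ErrorCounts()
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        c = count_errors(ref, hyp)
        total.substitutions += c.substitutions
        total.insertions += c.insertions
        total.deletions += c.deletions
        total.reference_words += c.reference_words
        if ref != hyp:
            wrong_sentences += 1
    sentence_error_rate = 100.0 * wrong_sentences / len(pairs) if pairs else 0.0
    return total, sentence_error_rate
```

For example, `score_corpus([("ONE TWO THREE", "ONE TWO THREE"), ("FOUR FIVE", "FOUR OH FIVE"), ("SIX SEVEN", "SIX ELEVEN")])` pools the counts over three made-up utterances. Because a single inserted word makes the whole sentence wrong, a decoder can have a high sentence error rate alongside a much lower word error rate.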

4 Phase 2: Other Notable Areas of Comparison
- Coded data feature format support
- Language modeling support
- Overall ease of training and decoding corpora
- Notable features of the software baseline for each toolkit
- Operating system support
- Available documentation and community support
- Licensing and usage rights
- Future toolkit development plans

5 Training and Testing Procedure for AN4 in HTK

6 Training Procedure
Developed in a “tutorial” format: HTKTrainingDecoding_tutorial.doc
An example of a fully developed tutorial directory, including results, is also included on the CD: htktut

7 Training Results Comparison
- 8 Gaussians per HMM state
- Context-dependent triphone state models
- Tied states
- Finite State Grammar Language Model

Metric                        Sphinx3    HTK
Peak Memory Usage (MB)        8.2        5.9
Time to Completion (sec)      63         93
Sentence Error Rate (%)       59.2       69.0
Word Error Rate (%)           21.3       9.0
Word Substitution Errors      9          2
Word Insertion Errors         71         154
Word Deletion Errors          2          0

8 Front-End Data Feature Support
Sphinx provides wave2feat for limited conversion to MFCC (used in a previous homework).
However, “Sphinx trainer and decoder are compatible with many other data formats”
- More research is needed into which formats specifically.
HTK provides HCopy to do many different conversions:

9 Language Modeling
Both frameworks support statistical N-gram language models as well as fixed, context-free grammars (defined by BNF-type networks).
HTK includes two separate modules, HLMLib and HLMTools, to provide N-gram language model training, class-based models, and perplexity calculations.
- NOTE: The HTK Book also includes a thorough tutorial on building and training such a model using phrases from Sherlock Holmes.
Sphinx relies on other tools for LM generation (see the CMU Statistical Language Model toolkit).
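To make the N-gram training and perplexity calculation mentioned above concrete, here is a minimal Python sketch of a bigram model. It does not use HLMTools or the CMU toolkit and does not reproduce their discounting schemes; it assumes simple add-one smoothing, and the function names (`train_bigram`, `bigram_prob`, `perplexity`) and the `<s>` / `</s>` boundary markers are choices made for this illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories over whitespace-tokenized sentences."""
    histories = Counter()  # how often each word appears as a bigram history
    bigrams = Counter()    # how often each (previous, current) pair appears
    vocab = {"</s>"}       # words the model can predict, plus end-of-sentence
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams, vocab


def bigram_prob(prev, word, model):
    """P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def perplexity(sentences, model):
    """Perplexity = exp of the negative mean log-probability per predicted token."""
    log_sum = 0.0
    predicted = 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log(bigram_prob(prev, word, model))
            predicted += 1
    if predicted == 0:
        raise ValueError("no tokens to score")
    return math.exp(-log_sum / predicted)
```

A model trained with `train_bigram(["a b", "a b"])` gives a lower perplexity on `"a b"` than on `"b a"`, since the second word order was never seen in training. Words outside the training vocabulary still receive a small smoothed probability here, which a production toolkit would handle more carefully.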

10 Notable Software Baseline Characteristics
Sphinx
- Organized across three components.
- Huge amount of code.
- Uses Unix-style directory organization.
- Source files averaged 1200 LOC.
- Includes automated tests.
HTK
- All in one distribution.
- Organized into HTKLib, HTKTools, HLMLib, and HLMTools.
- Average LOC: 1400.
- Only one level of dependency between *Tools and *Lib.

11 Documentation
HTK has a wealth of excellent information available through the HTKBook.
1. Part one of the book gives enough background theory for relatively inexperienced readers to understand the mechanics of the toolkit.
2. Part two provides extensive details about the core architecture of HTK through the major phases of model training and testing.
3. Part three provides an in-depth look at the language modeling features that HTK provides as a part of its framework.
4. Part four provides a detailed reference to each application that is provided with the framework.
No comparably detailed information exists for Sphinx. (It does, however, have automatically maintained Doxygen and JavaDoc.)

12 Licensing (IMPORTANT!)
MAJOR difference in the restrictions.
HTK – “The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.”
- This makes the intended application a very important question when deciding whether to use HTK.
Sphinx – Licensed by CMU; may be redistributed.

13 Recent Release Activity and Future Plans
Sphinx
- Last release of a major Sphinx component (Sphinx3) was in 06/2007.
- PocketSphinx, an embedded decoder.
- Sphinx-4, a pure Java implementation.
HTK
- Last release of HTK3 was in 12/2006.
- Lack of public announcements.

14 Comparison Matrix
Developed to summarize results across many areas of comparison: comparison_matrix.xls

15 References
- Main HTK Website -- http://htk.eng.cam.ac.uk/
- Sourceforge Sphinx -- http://cmusphinx.sourceforge.net/html/cmusphinx.php
- Brief Sphinx/HTK Comparison -- http://lima.lti.cs.cmu.edu/moinmoin/SphinxHTK
- HTKBook -- http://htk.eng.cam.ac.uk/prot-docs/htk_book.shtml
- ASR System Review -- http://www.cis.hut.fi/Opinnot/T-61.6040/pellom-2004/lecture-09.pdf
- Arthur Chan Sphinx Presentation -- http://www.cs.cmu.edu/~archan/sphinxPresentation.html
- Sphinx-3 Decoder Wiki -- http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#lm_dumpfile

