Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus Arthur Kunkle ECE 5526 Fall 2008.



Framework Introduction
CMU Sphinx
- Developed by Carnegie Mellon University.
- Has been supported by sponsors such as DARPA, IBM, and Sun Microsystems.
- Notable applications that use Sphinx include:
  - Roomline, a conference room reservation system at CMU.
  - Let's Go, a spoken dialog system in use at Pittsburgh's transit system.
HTK
- Originally developed in 1989 by the Speech Vision and Robotics Group of Cambridge University.
- Purchased by Entropic Laboratories in 1993, and then by Microsoft during its acquisition of Entropic.
- The HTK source code was then licensed back to Cambridge University for further development. Open source since then.

Phase 1: Performance-Based Areas of Comparison
Training and decoding using the AN4 Corpus (same procedure used in Homework #5) provides the following metrics:
- Decoder time to completion.
- Decoder accuracy on the sentence level.
- Decoder accuracy on the word level.
- Types and quantities of errors encountered during the decoding process.
- Notable trends of errors.
- Memory requirements for the recognizer at runtime.
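The word-level and sentence-level metrics listed above can be illustrated with a small scoring sketch. This is not the scoring tool shipped with either toolkit; it is a plain minimum-edit-distance alignment with unit costs, and the function names `align_counts` and `score` are hypothetical. Real scorers may weight the error types differently, so counts can differ from theirs on ambiguous alignments.

```python
def align_counts(ref, hyp):
    """Return (substitutions, insertions, deletions) for a minimum-edit
    alignment of the hypothesis word list against the reference word list."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, ins, dels) for ref[:i] versus hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, 0, i)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, j, 0)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, a, d = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, a, d)        # match, no new error
            else:
                diag = (t + 1, s + 1, a, d)  # substitution
            t, s, a, d = cost[i - 1][j]
            dele = (t + 1, s, a, d + 1)    # reference word missing
            t, s, a, d = cost[i][j - 1]
            ins = (t + 1, s, a + 1, d)     # extra hypothesis word
            cost[i][j] = min(diag, dele, ins)
    return cost[n][m][1:]


def score(pairs):
    """Compute word and sentence error rates (in percent) over
    (reference, hypothesis) sentence pairs given as strings."""
    subs = ins = dels = ref_words = wrong_sentences = 0
    for ref, hyp in pairs:
        s, i, d = align_counts(ref.split(), hyp.split())
        subs, ins, dels = subs + s, ins + i, dels + d
        ref_words += len(ref.split())
        wrong_sentences += 1 if (s + i + d) else 0
    return {
        "substitutions": subs,
        "insertions": ins,
        "deletions": dels,
        "wer": 100.0 * (subs + ins + dels) / ref_words,
        "ser": 100.0 * wrong_sentences / len(pairs),
    }
```

Word error rate here is (substitutions + insertions + deletions) divided by the number of reference words, so it can exceed 100% when the decoder inserts many words.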

Phase 2: Other Notable Areas of Comparison
- Coded data feature format support
- Language modeling support
- Overall ease of training and decoding corpora
- Notable features of the software baseline for each toolkit
- Operating system support
- Available documentation and community support
- Licensing and usage rights
- Future toolkit development plans

Training and Testing Procedure for AN4 in HTK

Training Procedure
- Developed in a "tutorial" format: HTKTrainingDecoding_tutorial.doc
- An example tutorial directory with full results is also included on the CD: htktut

Training Results Comparison
Model configuration:
- 8 Gaussians per HMM state
- Context-dependent tri-phone state models
- Tied states
- Finite State Grammar language model
Metric / Sphinx3 / HTK
- Peak Memory Usage (MB)
- Time to Completion (sec) 6393
- Sentence Error Rate (%)
- Word Error Rate (%)
- Word Substitution Errors 92
- Word Insertion Errors 71154
- Word Deletion Errors 20

Front-End Data Feature Support
Sphinx
- Provides wave2feat for limited conversion to MFCC (used in a previous homework).
- However, "Sphinx trainer and decoder are compatible with many other data formats."
- More research is needed into which formats specifically.
HTK
- Provides HCopy to do many different conversions:
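To make the notion of a coded feature format concrete, the sketch below reads the 12-byte header that the HTK Book documents for HTK parameter files (such as those written by HCopy): number of samples, sample period in 100 ns units, bytes per sample, and a parameter-kind code. The function name `parse_htk_header` is hypothetical, big-endian byte order is assumed (the HTK default), and the order in which qualifiers are printed is cosmetic and may not match HTK's own naming.

```python
import struct

# Base parameter kinds, indexed by the low 6 bits of the parameter-kind code.
BASE_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
              "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP"]

# Qualifier bits (octal) and their suffixes.
QUALIFIERS = [(0o100, "_E"), (0o20000, "_0"), (0o400, "_D"), (0o1000, "_A"),
              (0o200, "_N"), (0o2000, "_C"), (0o4000, "_Z"), (0o10000, "_K")]


def parse_htk_header(data):
    """Decode the first 12 bytes of an HTK parameter file."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihh", data[:12])
    kind = BASE_KINDS[parm_kind & 0o77]
    kind += "".join(name for bit, name in QUALIFIERS if parm_kind & bit)
    compressed = bool(parm_kind & 0o2000)
    return {
        "n_samples": n_samples,
        "frame_ms": samp_period / 10000.0,   # 100 ns units -> milliseconds
        "bytes_per_sample": samp_size,
        "kind": kind,
        # Uncompressed vectors are stored as 4-byte floats; left as None
        # for compressed files, which this sketch does not handle.
        "n_coeffs": None if compressed else samp_size // 4,
    }
```

In practice the header would be read with something like `open(path, "rb").read(12)`; the file path is left to the reader.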

Language Modeling
- Both frameworks use N-Gram statistical grammar models as well as fixed, context-free grammars (defined by BNF-type networks).
- HTK includes two separate modules, HLMLib and HLMTools, to provide N-Gram language model training, class-based models, and perplexity calculations.
  - NOTE: The HTK Book also includes a thorough tutorial on building and training such a model using phrases from Sherlock Holmes.
- Sphinx relies on other tools for LM generation (see the CMU Statistical Language Model toolkit).
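As a rough illustration of what N-Gram training and a perplexity calculation involve, here is a minimal bigram sketch. It is not how HLMTools or the CMU toolkit work internally: add-one smoothing is swapped in for the discounting and back-off schemes real toolkits use, out-of-vocabulary words are not handled, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over whitespace-tokenized sentences with <s> and </s> markers."""
    history = Counter()   # how often each word appears as a bigram history
    bigrams = Counter()
    vocab = {"</s>"}      # tokens that can be predicted
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        history.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return history, bigrams, vocab


def bigram_prob(model, prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    history, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))


def perplexity(model, sentences):
    """Perplexity of the model on the given sentences (lower is better)."""
    log_sum, n = 0.0, 0
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words[:-1], words[1:]):
            log_sum += math.log2(bigram_prob(model, prev, word))
            n += 1
    return 2 ** (-log_sum / n)
```

A model trained on `["yes no", "yes yes no"]` assigns a lower perplexity to "yes no" than to "no yes", since the latter uses only unseen bigrams.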

Notable Software Baseline Characteristics
Sphinx
- Organized across three components
- Huge amount of code
- Uses Unix-style directory organization
- Source files average 1200 LOC
- Includes automated tests
HTK
- All in one distribution
- Organized into HTKLib, HTKTools, HLMLib, and HLMTools
- Average LOC: 1400
- Only one level of dependency between *Tools and *Lib
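The slides do not say how the average LOC figures were measured. One plausible way to obtain a comparable number is sketched below; the function name `average_loc`, the choice of file suffixes, and the decision to count only non-blank lines are all assumptions.

```python
from pathlib import Path


def average_loc(root, suffixes=(".c", ".h", ".java")):
    """Average number of non-blank lines per source file under root."""
    counts = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps the count going on oddly encoded files
            with path.open(errors="replace") as handle:
                counts.append(sum(1 for line in handle if line.strip()))
    return sum(counts) / len(counts) if counts else 0.0
```

Comments are counted as code here, so the result is an upper bound on executable lines per file.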

Documentation
HTK has an excellent wealth of information available through the HTKBook.
1. The first part of the book gives enough background theory to equip relatively unversed individuals with enough knowledge to understand the mechanics of the toolkit.
2. Section two of the book provides extensive details about the core architecture of HTK through the major phases of model training and testing.
3. Section three provides an in-depth look into the language modeling features that HTK provides as a part of its framework.
4. Section four provides a detailed reference to each application that is provided with the framework.
No comparably detailed information exists for Sphinx. (It does, however, have automatically maintained Doxygen and JavaDoc documentation.)

Licensing (IMPORTANT!)
MAJOR difference in the restrictions.
- HTK: "The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form." This makes the intended application a very important question when deciding whether to use HTK.
- Sphinx: licensed by CMU; may be redistributed.

Recent Release Activity and Future Plans
Sphinx
- Last release of a major Sphinx component (Sphinx3) was in 06/2007.
- PocketSphinx, an embedded decoder.
- Sphinx-4, a pure Java implementation.
HTK
- Last release of HTK3 was in 12/2006.
- Lack of public announcements.

Comparison Matrix
Developed to summarize results across many areas of comparison: comparison_matrix.xls

References
- Main HTK Website
- Sourceforge Sphinx
- Brief Sphinx/HTK Comparison
- HTKBook
- ASR System Review: /pellom-2004/lecture-09.pdf
- Arthur Chan Sphinx Presentation
- Sphinx-3 Decoder Wiki: l#lm_dumpfile