2nd Progress Meeting for Sphinx 3.6 Development. Arthur Chan, David Huggins-Daines, Yitao Sun. Carnegie Mellon University, Jun 7, 2005.


This meeting (2nd Progress Meeting of 3.6)
- Purpose of this meeting
  - A working progress report on various aspects of the development
  - A briefing on embedded Sphinx 2 (by David)
  - A briefing on Sphinx 3’s “crazy branch” (by Arthur)
    - Kept as a branch in CVS
    - Includes several interesting features
    - Includes a number of mild changes
    - Discussion before another check-in

Outline of this talk
- Review of 1st Progress Meeting
- Progress of the embedded version of Sphinx 2 (by Dave, 7-10 pages)
- Progress of Sphinx 3’s crazy branch (15-20 pages)
  - Architecture diagram of Sphinx 3.6
  - Changes in search abstraction (7 pages)
  - Progress on search implementation (8 pages)
    - GMM computation
    - FSG mode, Word Switching Tree Search mode
  - Mild re-factoring (not “gentle” any more) (3 pages)
    - LM
    - S3.0 family of tools
- Hieroglyph (1 page)

Review of 1st Progress Meeting
- Last time:
  - Two separate layers were defined: the low-level implementation of search, and the possible abstractions of search
    - Just introduced; its advantage had not yet been shown
  - Implementation of Mode 5 was still under development (only 10% complete)
  - libs3decoder had just been modularized into 8 sub-modules

Progress of Architecture in Sphinx 3.6

Motivation for Architecting Sphinx 3.X
- Need for new search algorithms
  - Developing a new search algorithm carries risk
    - We don’t want to throw away the old one
    - Mere replacement could cause backward-compatibility problems
- The code has grown to a stage where
  - Some changes could be very hard
  - Multiple programmers are active at the same time
    - CVS conflicts could become frequent if things are controlled by “if-else” structures

Architecture of Sphinx 3.X (X<6)
- Batch sequential architecture (Shaw 96)
- Each executable customizes its own sub-routines
[Diagram: the executables (decode, livepretend, decode_anytopo, align, allphone) run four separate pipelines, each with its own command line, initialization, GMM computation, search and process controller. Pipeline 1 uses kb and kbcore for initialization and approx_cont_mgau for GMM computation; pipelines 2 to 4 each use their own GMM computation built on gauden and senone (Methods 1 to 3).]

Pros/Cons of Batch Sequential Architecture
Pros:
- Great flexibility for individual programmers
  - No shared assumptions; data structures are usually optimized for the application
  - align and allphone have their own optimizations
- Each individual application is crafted to a high quality
Cons:
- Tremendous difficulty in maintenance
  - Most changes need to be carried out 5-6 times
- Spreads the disease of code duplication
  - Code with the same functionality was duplicated multiple times
- Scared off a lot of programmers in the past
  - Beginners tend to love a general architecture

Big Picture of Software Architecture in Sphinx 3.6
- Layered and object-oriented, implemented in C
- Major high-level routines
  - Initializer (kb.c or kbcore.c)
    - A kind of clipboard for the other controllers
  - Process controller (corpus.c)
    - Governs the protocol for processing a sentence
  - Search abstraction routine (srch.c)
    - Governs how search is done
    - Implemented as pipelines and filters with shared memory
    - Each filter can be overridden, similar to what OO languages do
  - Command-line processor (cmd_ln_macro.c and cmd_ln.c), implemented as macros

Software Architecture Diagram of Sphinx 3.6
[Diagram with four layers:]
- Applications: decode, decode (anytopo), livepretend, align, allphone, dag, astar, livedecode API, User Defined Applications
- Controllers/Abstractions: Search Controller, Process Controller, Search Initializer, Command Line Processor
- Implementations: Fast Single Stream GMM Computation, Multi Stream GMM Computation, Mode 0: Align, Mode 1: Allphone, Mode 2: FSG, Mode 3: Anytopo, Mode 4: Magic Wheel, Mode 5: WSFT
- Libraries: Dictionary Library, Search Library, LM Library, AM Library, Utility Library, Feature Library, Miscellaneous Library

Search Abstraction
- Search abstraction is implemented as objects
- Search operations are implemented as filters with shared memory
  - Each filter is a distinct search operation
  - Ideally, each filter or set of filters can be replaced
[Diagram: “Search For One Frame”, made up of these filters:]
- Select Active CD Senone
- Compute Approx. GMM Score (CI senone)
- Compute Detail GMM Score (CD senone)
- Compute Detail HMM Score (CD)
- Propagate Graph (Phone-Level)
- Rescoring at Word End using High-Level KS (e.g. LM)
- Propagate Graph (Word-Level)
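The filters-with-shared-memory idea can be sketched roughly as follows. This is an illustration in Python, not the srch.c code (Sphinx 3 is written in C): the filter names are hypothetical transliterations of the slide labels, each filter is a placeholder that only records that it ran, and the order simply follows the order in which the labels appear on the slide, which may not be the exact execution order.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, List[str]]      # shared memory passed to every filter
Filter = Callable[[State], None]


def make_filter(name: str) -> Filter:
    """Build a placeholder filter that only records that it ran."""
    def run(state: State) -> None:
        state["trace"].append(name)
    return run


# Default filters, keyed by hypothetical names derived from the slide labels.
DEFAULT_FILTERS: Dict[str, Filter] = {
    name: make_filter(name)
    for name in (
        "select_active_cd_senone",
        "compute_approx_gmm_score",
        "compute_detail_gmm_score",
        "compute_detail_hmm_score",
        "propagate_graph_phone_level",
        "rescore_at_word_end",
        "propagate_graph_word_level",
    )
}


def search_one_frame(state: State,
                     overrides: Optional[Dict[str, Filter]] = None) -> None:
    """Run every filter in order on the shared state, honoring overrides."""
    filters = dict(DEFAULT_FILTERS)
    filters.update(overrides or {})   # a replaced filter keeps its position
    for run in filters.values():
        run(state)
```

Replacing one filter then amounts to passing, for example, `{"compute_approx_gmm_score": my_filter}` as `overrides`; the other six filters are left untouched.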

Different Ways to Implement a Search
1. Use the default implementation
   - Just specify all the atomic search operations (ASOs) provided
2. Override “search_one_frame”
   - Only need to specify the GMM computation and how to “search_one_frame”
3. Override the whole mechanism
   - For people who dislike the default that much
   - Override how to “search”
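The three options can be pictured with a small class hierarchy. This is a toy Python sketch under stated assumptions: the class names and the arithmetic stand-ins are invented for illustration, and the real decoder does this in C with function pointers rather than classes.

```python
class DefaultSearch:
    """Option 1: supply the atomic search operations, keep the default loops."""

    def compute_gmm(self, frame):
        return frame * 2              # stand-in for GMM scoring

    def propagate(self, score):
        return score + 1              # stand-in for graph propagation

    def search_one_frame(self, frame):
        # Default per-frame mechanism: score the frame, then propagate.
        return self.propagate(self.compute_gmm(frame))

    def search(self, frames):
        # Default utterance-level mechanism: process frames in order.
        return [self.search_one_frame(f) for f in frames]


class FrameOverrideSearch(DefaultSearch):
    """Option 2: share compute_gmm but replace search_one_frame."""

    def search_one_frame(self, frame):
        return -self.compute_gmm(frame)


class WholeOverrideSearch(DefaultSearch):
    """Option 3: replace the whole search mechanism."""

    def search(self, frames):
        return list(reversed(frames))
```

The point of the layering is that option 2 still inherits the GMM computation and the utterance loop, while option 3 inherits nothing it does not explicitly call.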

Concrete Examples
- Mode 4 (Magic Wheel) and Mode 5 (WST) use the default implementation
- Mode 2 (FSG) overrides the “search_one_frame” implementation
  - But shares the GMM implementation
- Mode 0 (align), Mode 1 (allphone) and Mode 3 (flat lexicon decoding) will likely do the same

Future work
- The re-factoring of align, allphone and decode_anytopo is not yet complete
- The search abstraction needs to consider more flexible mechanisms
  - Doing the search backward (for backward search)
  - Approximate search in a first stage (for phoneme and word look-ahead)
  - (Optional) Parallel and distributed decoding
- Command-line options and internal modules could still be mismatched
  - Might learn from the mechanisms of Sphinx 2 and Sphinx 4
- Controlling an utterance could require 5 different files
  - A better control format?
- Have not yet fully anticipated the fixed-point front-end and GMM computation in Sphinx 2

Progress of Search Implementation in Sphinx 3.6

GMM Computation
- decode can now use SCHMMs, specified by .semi.
  - Implemented and tested by Dave
- GMM computation in align, allphone, decode and livepretend is now common
- Does not yet incorporate the Sphinx 2 fixed-point version of GMM computation
  - It looks very delicious.
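For readers unfamiliar with SCHMMs (semi-continuous HMMs): all senones share one codebook of Gaussians and differ only in their mixture weights, so the score of senone s for a frame x is log sum_k w[s][k] * N(x; mean[k], var[k]). The sketch below shows that computation in Python for one-dimensional features. It is a minimal illustration, not the decoder's routine: the function names are hypothetical, weights are assumed strictly positive, and floating-point logs are used for simplicity.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def schmm_senone_scores(x, codebook, mixture_weights):
    """Score every senone against one frame using a shared codebook.

    codebook: list of (mean, var) pairs shared by all senones.
    mixture_weights: one list of positive weights per senone.
    """
    # Evaluate the shared codebook once per frame...
    log_dens = [log_gaussian(x, m, v) for m, v in codebook]
    # ...then each senone only needs a weighted sum over those densities.
    return [
        logsumexp([math.log(w) + d for w, d in zip(weights, log_dens)])
        for weights in mixture_weights
    ]
```

The saving comes from the first step: the Gaussian densities are computed once per frame regardless of how many senones are active.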

Finite State Machine Search (Mode 2): Implementation
- Largely completed (completion 70%)
- Recipe:
  - Search: function-pointer implementation adapted from the Sphinx 2 FSG_* family of routines
  - GMM computation: uses the Sphinx 3 GMM computation
    - Already allows CIGMMS

Finite State Machine Search (Mode 2): Problems for the Users
- Not yet seriously tested
  - Finding test cases is hard
- Still no way to write a grammar
  - Yitao’s goal in Q3 and Q
    - Either directly incorporate the CFG’s score into the search
    - Or implement an approximate converter from CFG to FSM (HTK’s method)

Finite State Machine Search (Mode 2): Other Problems
- Problems inherited from Sphinx 2 (copied from Ravi’s slide)
  - No lextree implementation (What?)
  - Static allocation of all HMMs; not allocated “on demand” (Oh, no!)
  - FSG transitions represented by an NxN matrix (You can’t be serious!!)
- Other wish-list items
  - No histogram pruning (Houston, we’ve got a problem.)
  - No state-based implementation (Wilson! I am sorry!!)
    - We need it for the unification of BW, alignment, allphone and FSG search.
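Why the NxN matrix hurts: a grammar with N states needs N*N cells even when only a handful of arcs exist. The Python sketch below contrasts a dense matrix with a per-state arc list. Both functions and the (source, destination, word) arc format are assumptions made for illustration; they are not the Sphinx 2 data structures.

```python
from collections import defaultdict


def dense_transitions(num_states, arcs):
    """NxN matrix: one cell per (source, destination) pair, mostly empty."""
    matrix = [[None] * num_states for _ in range(num_states)]
    for src, dst, word in arcs:
        matrix[src][dst] = word       # only one arc fits per state pair
    return matrix


def sparse_transitions(arcs):
    """Adjacency list: storage grows with the number of arcs, not N*N."""
    table = defaultdict(list)
    for src, dst, word in arcs:
        table[src].append((dst, word))
    return table
```

For a three-state grammar with two arcs, the dense form allocates nine cells while the sparse form stores two entries; the gap widens quadratically as the grammar grows.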

Time Switching Tree Search (Mode 4)
- Name change: it was “lucky wheel”, now it is “magic wheel”
- In the last check-in, after test-full, results are exactly the same for 6 corpora
  - We could sleep.
- Future work: change the word-end triphone implementation from composite triphones to full triphones

Word Switching Tree Search (Mode 5)
- Can now run on the Communicator task
  - With the same performance as Mode 4
- Major reasons why it doesn’t approach decode_anytopo’s result
  - Bigram probability is not yet factored
    - Not an easy task; still considering how to do it
  - The triphone implementation is not yet exact
- Completion 30%
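As background on what “factoring” means here: in a lexical tree the word identity is only known at a leaf, so a common approach is to store at each node the best LM score of any word reachable below it and apply that score early. The Python sketch below shows this for a flat table of log-probabilities. It is a generic illustration of the technique, not the planned Mode 5 design; the tree layout and function names are hypothetical, and for bigrams the table would depend on the previous word, which is part of what makes the task hard.

```python
def build_lextree(lexicon):
    """Build a phone-prefix tree from a {word: [phones]} lexicon."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)    # the word ends at this node
    return root


def factor_lm(node, logprob):
    """Annotate each node with the best LM log-prob of any word below it."""
    best = max((logprob[w] for w in node["words"]), default=float("-inf"))
    for child in node["children"].values():
        best = max(best, factor_lm(child, logprob))
    node["best"] = best
    return best
```

During search, the score applied on entering a node would be the difference between its `best` and its parent's, so the full word score has been paid by the time a leaf is reached.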

Future work on Mode 5
- N-gram look-ahead
- Full trigram tree implementation
- Phoneme and word look-ahead
- Share the full triphone implementation with Mode 4 in future

Big Picture of All Search Implementations
- A finite state machine data structure could unify align, allphone, Baum-Welch and FSG search
  - Time will show whether it is also applicable to tree search
- Search implementation has more short-term demand
  - Mode 5 will be our new flagship
  - By Oct, 3 out of 4 goals in Mode 5 should be completed
- Between different searches, code should be shared as much as possible

Some other mild refactorings

Summary of Re-factorings
- Not gentle any more, but it is mild
- Several useful things to know
  - Language model routine revamping
  - S3.0 family of tools
  - Overall status of merging

LM routine
- Current capability
  - Reads both text-based and DMP-based LMs
  - Allows switching of LMs
  - Allows inter-conversion between the text and DMP formats of an LM
  - Provides a single interface to all applications
- Tool of the month: lm_convert
  - lm3g2dmp++
  - Will be the application for future language model inter-conversion
  - Other formats? CMULMTK’s format?

S3.0 family of tools
- Architecture drives many changes in the code
- align, allphone and decode_anytopo now use
  - kbcore
  - The same version of the multi-stream GMM computation routine
  - A simplified search structure
  - The ctl_process mechanism
- Next step is to use the srch.c interface
- All tools now share sets of common command-line macros

Code Merging
- Sphinx 3.0, Sphinx 3.X and share are now unified.
  - Alex: “It’s time to fix the training algorithms!”
  - Ravi: “It’s time to add full n-gram and full n-phones to the recognizer!!”
  - Dave: “It’s time to work on pronunciation modeling!”
  - Yitao: “It’s time to implement a CFG-based search!!”
  - Evandro: “It’s time to do more regression tests!”
  - Alan: “Don’t merge Sphinx with festival!!”
- Next step: It’s time to clean up SphinxTrain.
- We will keep the pace to fewer than 4 tool check-ins per month.

Hieroglyphs
- Half of each of Chapters 3 and 5 is finished
  - Chapter 3: “Introduction to Speech Recognition”
    - Missing: description of DTW, HMM and LM
  - Chapter 5: “Roadmap of building speech recognition system”
    - Missing: How to evaluate the system? How to train a system? (Evandro’s tutorial will be perfect)
- Still ~4 chapters (out of 12) of material to go before the 1st draft is written

Conclusion
- We have done something.
- Embedded Sphinx 2
  - Its completion will benefit both Sphinx 2 and Sphinx 3
- Sphinx 3.6
  - Its completion will benefit long-term development
  - Short-term need in funded projects
  - Tentative deadline: beginning of October