Sphinx 3.4 Development Progress
Arthur Chan, Jahanzeb Sherwani
Carnegie Mellon University
Mar 4, 2004

This seminar
- Overview of the Sphinxes (5 mins)
- Report on Sphinx 3.4 development progress (40 mins)
  - Speed-up algorithms
  - Language model facilities
- User/developer forum (20 mins)

Sphinxes
- Sphinx 2
  - Semi-continuous HMM-based
  - Real-time performance: 0.5-1.5xRT
  - Tree lexicon
  - Ideal for application development
- Sphinx 3
  - Fully continuous HMM-based
  - Significantly slower than Sphinx 2: 14-17xRT (tested on a 1 GHz Pentium 4)
  - Flat lexicon
  - Ideal for researchers
- Sphinx 3.3
  - Significant modification of Sphinx 3
  - Close to real-time performance: 4-7xRT
  - Tree lexicon

Sphinx 3.4
- Descendant of Sphinx 3.3, with improved speed
- Already achieves real-time performance (1.3xRT) on the Communicator task
- Target users are application developers
- Motivated by the CALO project

Overview of S3 and S3.3: computations at every frame
[Diagram: senone (GMM) computation passes scores to the search; the search passes back information for pruning the GMM computation]
- S3: flat lexicon; all senones are computed every frame
- S3.3: tree lexicon; senones are computed only when active in the search

Current system specifications (without Gaussian selection)

                        Sphinx 3                      Sphinx 3.3
Speed (P4 1 GHz,        ERR 17.2%;                    ERR 18.6%;
Communicator task)      11xRT GMM, 3xRT search        6xRT GMM, 1xRT search
GMM computation         Not optimized (little         Can apply sub-VQ-based
                        code optimization)            Gaussian selection
Lexicon                 Flat                          Tree
Search                  Beam on search,               Beam on search,
                        no beam on GMM                beam on GMM

Our plan in Q1 2004: upgrade s3.3 to s3.4
- Fast senone computation
  - four levels of optimization
- Other improvements
  - Phoneme look-ahead: reduce the search space by determining the active phoneme list at word begins
  - Multiple and dynamic LM facilities

Fast Senone Computation
- More than 100 techniques can be found in the literature
- Most techniques claim a 50-80% reduction in computation with "negligible" degradation
- In practice, that translates to 5-30% relative degradation
- Our approach:
  - categorize the techniques into 4 types
  - implement representative techniques of each type
  - tune the system to <5% degradation
- Users can choose which types of techniques to use

Fast GMM Computation: Level 1: Frame Selection
- Compute the GMMs on only every other frame
- Improvement: recompute the GMMs only when the current frame is dissimilar to the last computed frame; otherwise reuse its scores

Algorithms
- The simple way (naive down-sampling): compute senone scores only once every N frames
- Implemented in Sphinx 3.4:
  - the simple way
  - an improved version (conditional down-sampling):
    - train a set of VQ codebooks
    - if a frame's feature vector is clustered to the same codeword as the previous frame's, the computation is skipped
- Naive down-sampling: ~10% relative degradation, 40-50% reduction
- Conditional down-sampling: 2-3% relative degradation, 20-30% reduction
- (Both policies are sketched below.)
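A minimal Python sketch of the two policies, assuming a per-frame scoring function `compute_senone_scores` and a VQ `codebook`; all names here are illustrative stand-ins, not the actual Sphinx 3.4 internals:

```python
import numpy as np

def decode_naive(frames, compute_senone_scores, n=2):
    """Naive down-sampling: fully score only every n-th frame and
    reuse the last computed senone scores in between."""
    scores = None
    for t, feat in enumerate(frames):
        if t % n == 0 or scores is None:
            scores = compute_senone_scores(feat)  # full GMM evaluation
        yield scores

def decode_conditional(frames, codebook, compute_senone_scores):
    """Conditional down-sampling: recompute only when the frame falls
    into a different VQ cell than the previously scored frame."""
    prev_cw, scores = None, None
    for feat in frames:
        cw = int(np.argmin(np.linalg.norm(codebook - feat, axis=1)))
        if cw != prev_cw or scores is None:  # new cell: frame is "dissimilar"
            scores = compute_senone_scores(feat)
            prev_cw = cw
        yield scores
```

The conditional variant trades a cheap codebook lookup per frame for the ability to skip full GMM evaluation whenever consecutive frames are acoustically similar.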

Fast GMM Computation: Level 2: Senone Selection
- Compute a GMM only when its base phone is highly likely
- Others are backed off to the base phone scores
- Similar to:
  - Julius (Akinobu 1999)
  - Microsoft's "Rich Get Richer" (RGR) heuristics

Algorithm: CI-based Senone Selection
- If the base CI senone of a CD senone has a high score, compute the CD senone
  - e.g., aa (base CI senone) of t_aa_b (CD senone)
- Else, back off to the CI senone score
- Known problem: the back-off causes many senone scores to be identical, which makes the search less efficient
- Very effective: 75-80% reduction in senone computation with <5% degradation
- Worthwhile in systems that spend a large portion of their time on GMM computation
- (A sketch follows below.)
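A minimal sketch of the selection rule, assuming per-frame scoring functions for the CI and CD senones and a table mapping each CD senone to its base CI senone (all names are illustrative, not the Sphinx code):

```python
import numpy as np

def score_with_ci_backoff(feat, score_ci, score_cd, base_ci, beam):
    """score_ci(feat)    -> scores of all CI senones (cheap: the CI set is small)
    score_cd(feat, s) -> full GMM score of CD senone s
    base_ci[s]        -> index of the base CI senone of CD senone s"""
    ci_scores = score_ci(feat)  # always computed
    best = ci_scores.max()
    cd_scores = np.empty(len(base_ci))
    for s, ci in enumerate(base_ci):
        if ci_scores[ci] >= best - beam:
            cd_scores[s] = score_cd(feat, s)  # base phone likely: full eval
        else:
            cd_scores[s] = ci_scores[ci]      # back off to the CI score
    return cd_scores
```

The else branch is the source of the inefficiency noted above: every CD senone that backs off to the same CI senone receives an identical score.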

Fast GMM Computation: Level 3: Gaussian Selection
[Diagram: a GMM and its component Gaussians; only selected Gaussians are evaluated]

Algorithm: VQ-based Gaussian Selection (Bocchieri 93)
- In training:
  - pre-compute a VQ codebook from all the Gaussian means
  - for each codeword, compute the neighbors within each senone: if the mean of a Gaussian is close to the codeword, consider it a neighbor
- At run time:
  - find the codeword closest to the feature vector
  - compute a Gaussian only if it is a neighbor of that codeword
- Quite effective: 40-50% reduction, <5% degradation
- (A sketch follows below.)
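A minimal sketch under diagonal-covariance assumptions (parameter names are illustrative; each Gaussian's normalization constant is assumed folded into `log_w`):

```python
import numpy as np

def build_neighbor_lists(codebook, means, radius):
    """Offline: a Gaussian is a neighbor of a codeword if its mean
    lies within `radius` of that codeword."""
    return [np.flatnonzero(np.linalg.norm(means - cw, axis=1) <= radius)
            for cw in codebook]

def gmm_log_score(feat, codebook, neighbors, means, inv_var, log_w):
    """Runtime: evaluate only the Gaussians neighboring the codeword
    closest to the feature vector."""
    cw = int(np.argmin(np.linalg.norm(codebook - feat, axis=1)))
    idx = neighbors[cw]
    if idx.size == 0:  # back-off: use the single closest Gaussian
        idx = np.array([np.argmin(np.linalg.norm(means - feat, axis=1))])
    maha = ((feat - means[idx]) ** 2 * inv_var[idx]).sum(axis=1)
    return np.logaddexp.reduce(log_w[idx] - 0.5 * maha)
```

The empty-neighbor back-off here is one instance of the schemes discussed on the next slide (e.g., always treating the closest Gaussian as a neighbor).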

Issues
- Back-off schemes are required:
  - a minimal number of neighbors
  - always use the closest Gaussian as a neighbor (Douglas 99)
- Further constraints can reduce computation:
  - dual-ring constraints (Knill and Gales 97)
- The overhead is quite significant

Other approaches
- Tree-based algorithms:
  - k-d trees
  - decision trees
- Issue: how can these models be adapted?
  - not a problem for the VQ-based technique
  - an open research problem for the tree-based ones

Fast GMM Computation: Level 4: Sub-vector Quantization

Algorithm (Ravi 98)
- In training:
  - partition all the means into sub-vectors
  - for each set of sub-vectors, train a VQ codebook
- At run time, for each mean:
  - for each sub-vector, compute the closest codeword index
  - compute an approximate Gaussian score by combining the sub-vector scores
- (A sketch follows below.)
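A minimal sketch of the idea, assuming Euclidean sub-vector scores and scipy's k-means as the VQ trainer (the actual trainer and distance differ); `splits` is an assumed list of (lo, hi) column ranges partitioning the feature dimensions:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def svq_train(means, splits, k=256):
    """Offline: quantize each sub-vector stream of the Gaussian means with
    its own codebook; labels[j][g] is Gaussian g's codeword in stream j."""
    codebooks, labels = [], []
    for lo, hi in splits:
        cb, lab = kmeans2(means[:, lo:hi], k, minit='points')
        codebooks.append(cb)
        labels.append(lab)
    return codebooks, labels

def svq_approx_scores(feat, splits, codebooks, labels):
    """Runtime: score every codeword once per stream, then each Gaussian's
    approximate score is just a sum of table lookups."""
    approx = np.zeros(len(labels[0]))
    for (lo, hi), cb, lab in zip(splits, codebooks, labels):
        cw_scores = -0.5 * ((cb - feat[lo:hi]) ** 2).sum(axis=1)
        approx += cw_scores[lab]  # one lookup per Gaussian per stream
    return approx
```

With k codewords per stream, each frame costs k distance computations per stream plus one lookup per Gaussian, instead of a full evaluation of every Gaussian. The approximate scores can either stand in for the real ones or rank Gaussians for exact evaluation, the two uses contrasted on the next slide.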

Issues
- Can be used for Gaussian selection: use the approximate score to decide which Gaussians to compute exactly
- Used as the approximate score itself:
  - requires a large number of sub-vectors (13)
  - the overhead is huge
- Used for Gaussian selection:
  - requires a small number of sub-vectors (3)
  - the overhead is still larger than VQ-based selection
- There are also machine-related issues

Summary of the work on GMM computation
- Four levels of algorithmic optimization
- However, 2 x 2 != 4: the savings at different levels overlap rather than multiply, so there is a lower limit on the total computation that can be removed

Work on improving the search: Phoneme Look-ahead
- Use approximate senone scores of future frames to determine whether a phone arc should be extended
- Current algorithm: if any senone of a phone HMM is active in any of the next N frames, the phone is active
  - similar to Sphinx 2
- Results are not very promising so far
- Next step: try adding the path score to the decision
- (The test is sketched below.)
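A sketch of the current activation test (all names are illustrative; `active_senones[t]` would come from the approximate future-frame scores):

```python
def phone_arc_active(phone_senones, active_senones, t, n_lookahead):
    """Extend a phone arc only if any of its senones is active
    in any of the next n_lookahead frames."""
    horizon = min(t + n_lookahead, len(active_senones) - 1)
    return any(s in active_senones[u]
               for u in range(t + 1, horizon + 1)
               for s in phone_senones)
```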

Speed-up Facilities in s3.3

GMM computation:
- Frame level: not implemented
- Senone level: not implemented
- Gaussian level: SVQ-based GMM selection
- Component level: sub-vectors constrained to 3; SVQ code removed

Search:
- Lexicon structure: tree
- Pruning: standard
- Heuristic search speed-up: not implemented

Summary of Speed-up Facilities in s3.4

GMM computation:
- Frame level: (New) naive down-sampling; (New) conditional down-sampling
- Senone level: (New) CI-based GMM selection
- Gaussian level: (New) VQ-based GMM selection
- Component level: (New) unconstrained number of sub-vectors in SVQ-based GMM selection; (New) SVQ code enabled

Search:
- Lexicon structure: tree
- Pruning: (New) improved word-end pruning
- Heuristic search speed-up: (New) phoneme look-ahead

Language Model Facilities
- S3 and S3.3:
  - accept only a non-class-based LM in DMP format
  - only one LM can be specified for the whole test set
- S3.4:
  - basic facilities for accepting class-based LMs in DMP format
  - supports dynamic LMs
  - not yet thoroughly tested; may be disabled until it is stable

Availability
- Internal release to CMU initially
  - will be put on Arthur's web page next week
  - will include the speed-up code and the LM facilities (?)
- Once it is more stable, it will be put on SourceForge

Sphinx 3.5?
- Better interfaces
  - a streamlined recognizer
- Enable Sphinx 3 to learn (AM and LM adaptation)
- Further speed-up and improved accuracy
  - improved lexical tree search
  - machine-level optimization
  - multiple recognizer combination?
- Your ideas:

Your help is appreciated
- Current team:
  - Arthur = (Maintainer + Developer) * Regression Tester ^ (Support)
  - Jahanzeb = Developer in Search + Regression Tester
  - Ravi = Developer + Consultant
- We need:
  - developers
  - regression testers
  - test scenarios
  - extensions of the current code
  - suggestions
  - comments/feedback
- Talk to Alex if you are interested.