Progress Report of Sphinx in Q4 2004 (Sep 1st to Dec 30th)


Progress Report of Sphinx in Q4 2004 (Sep 1st to Dec 30th) By Arthur Chan

Highlights
  Release of Sphinx 3.5
  Further speed-up of Sphinx 3.5
  Adaptation progress
  New tools:
    decode_anytopo: the slow decoder
    ep: high-performance model-based end-pointer
    cepview: versatile cepstral viewer
    wave2feat: stand-alone tool for converting a wave file to its features

Release of Sphinx 3.5
  Release candidates progress
    Sphinx 3.5 RC II released on Oct 8, 2004
    Sphinx 3.5 RC V released on Dec 19
    Expected official release: beginning of January
  Features
    Stable live-mode APIs
    MLLR
    Further speed-up from Sphinx 3.4
      Absolute CI GMM selection
      Approximate computation of CI senones
    Complete merging of the s3.0 and s3.5 codebases: both slow and fast decoders are available in the same codebase.

Further technical details of Sphinx 3.5
  Portable across OSes: Linux/Solaris/Mac OS X/Windows/BSD
  Platforms: Alpha/x86/Solaris/PPC
  Performance was tested extensively, with vocabulary sizes ranging from 10 to 10K
    Tests were carried out on Linux only
    Results are repeatable on Windows/Linux

Performance so far...
  Live-mode recognition results:
    TIDIGITS: 0.651%
    WSJ 5K: 7.73%
    Communicator: 13.0%

Further Speed-up of Sphinx 3.5
  Absolute CI GMM selection
    Instead of using a beam, only compute a certain number of CD senones
    Results in a 20% speed gain on top of the 3.4 improvement, without loss of accuracy
  Speed-up of CI GMM computation
    Uses the fast GMM computation techniques on CI GMMs as well
    Gives a better lower bound on speed
  Outlook
    Implement tricks such as best GMM index and LDA (ETA: February)
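The difference between beam-based and absolute selection can be sketched as follows. This is only an illustration under stated assumptions, not the Sphinx 3.5 implementation: the function names, the `cd_parent` mapping, and the senone labels and scores are all hypothetical, and the sketch assumes each CD senone is ranked by the log score of its parent CI senone.

```python
# Hypothetical per-frame log scores of context-independent (CI) senones.
ci_scores = {"AA": -10.0, "B": -40.0, "K": -12.0}

# Hypothetical mapping from each context-dependent (CD) senone to its CI parent.
cd_parent = {"AA_1": "AA", "AA_2": "AA", "B_1": "B", "K_1": "K", "K_2": "K"}


def beam_selection(ci_scores, cd_parent, beam):
    """Select CD senones whose parent CI score is within `beam` of the best.

    The number of selected senones varies from frame to frame.
    """
    best = max(ci_scores.values())
    return [cd for cd, ci in cd_parent.items() if ci_scores[ci] >= best - beam]


def absolute_selection(ci_scores, cd_parent, max_cd):
    """Select at most `max_cd` CD senones, ranked by parent CI score.

    The per-frame cost is bounded regardless of how the scores are spread.
    """
    ranked = sorted(cd_parent, key=lambda cd: ci_scores[cd_parent[cd]], reverse=True)
    return ranked[:max_cd]


beam_selected = beam_selection(ci_scores, cd_parent, beam=5.0)
absolute_selected = absolute_selection(ci_scores, cd_parent, max_cd=3)
```

With a beam, the amount of work depends on how many CI scores fall close to the best one; with an absolute count, the work per frame is capped. CD senones that are not selected would fall back to an approximate score in a real decoder; that step is left out here.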

Adaptation Progress
  MLLR is shown to work on
    RM1, with 10-20% gain
    WSJ (NAB), with 10-20% gain
  An experiment in unsupervised speaker adaptation was also performed on the RM1 task.
  For details, please read David Huggins-Daines' report.
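For readers unfamiliar with MLLR, the core idea is that each Gaussian mean is moved by an affine transform estimated from the adaptation data. The sketch below shows only how such a transform is applied to one mean vector. The function name and the example values of `A` and `b` are hypothetical, and estimating the transform, which is the hard part, is not shown; this is not taken from the Sphinx code.

```python
def mllr_adapt_mean(mean, A, b):
    """Apply an affine MLLR-style transform to one Gaussian mean.

    Returns A * mean + b, where A is a square matrix (list of rows)
    and b is a bias vector of the same dimension as `mean`.
    """
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical 2-dimensional example values.
mean = [1.0, 2.0]
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -1.0]

adapted = mllr_adapt_mean(mean, A, b)
```

In practice one transform is usually shared by many Gaussians, which is what lets a small amount of adaptation data update the whole acoustic model.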

New tools for Sphinx 3.5
  Merging of Sphinx 3.0 and Sphinx 3.5 is complete.
  S3.0 family:
    Very accurate batch-mode decoder: decode_anytopo
      5% better than the non-optimized fast decoder
      Very slow: 4xRT to 10xRT
      Useful for batch-mode processing, as in the CALO scenario
    Auxiliary tools:
      align: aligner
      allphone: phoneme recognition
      dag: best-path finder in a lattice
      astar: n-best generator
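The "xRT" figures above are real-time factors: decoding time divided by the duration of the audio. A minimal sketch of the arithmetic, with a hypothetical helper name and made-up timings:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return decoding time as a multiple of audio duration (xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: a 60-second utterance that takes 240 seconds to decode.
rtf = real_time_factor(processing_seconds=240.0, audio_seconds=60.0)
```

A factor above 1 means the decoder cannot keep up with live audio, which is why a 4xRT to 10xRT decoder is suited to batch processing rather than live mode.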

New tools for Sphinx 3.5 (cont.)
  Other tools (new!):
    ep: sped-up version of Ziad's end-pointer implementation
      For details of ep, please read Ziad's report.
    cepview: cepstral viewer
      Originally a standalone tool, now distributed with s3.5
    wave2feat: stand-alone feature extraction routine
      Exactly the same as the one in the live-mode decoder

Outlook for Q1 2005
  Official release of s3.5 (ETA January)
  Further improvement in speed (ETA February)
  Refactoring of Sphinx 3.X and SphinxTrain (ETA March)
  It is time to turn our focus to accuracy.