CALO Decoder Progress Report for June Arthur (Decoder, Trainer, ICSI Training) Yitao (Live-mode Decoder) Ziad (ICSI Training) Carnegie Mellon University.

Presentation transcript:

CALO Decoder Progress Report for June Arthur (Decoder, Trainer, ICSI Training) Yitao (Live-mode Decoder) Ziad (ICSI Training) Carnegie Mellon University July 6, 2004

This Presentation
- Progress report for June (15 pages)
  - Review and Highlight (2 pages)
  - ICSI AM training (4 pages)
  - Infrastructure (2 pages)
  - Decoder (8 pages)
  - Summary and Outlook (1 page)
- Review of Q
  - Live-mode APIs not completed
  - Sphinx not yet tested on a task with vocabulary > 2k
  - ICSI training just started

June highlights
- They are completed! (to some extent)
- Live-mode API prototype is completed
  - A demo is built.
- Sphinx 3.4 went through the WSJ 5k task successfully
  - Without pruning
- First two phases of ICSI training are completed

ICSI Training - Grand Plan
- By Ziad and ArthurC
- Transcript conversion is completed
- 4 phases
  - Phase I - Replication of Rita's training
  - Phase II - Fixing resources
    - Use corrected train/test/dev sets
    - Fixed transcriptions and dictionary
  - Phase III - Tuning
    - Training: topology / #senones / #mixtures
    - Recognition: parameter tuning
  - Phase IV - Further improvement
    - Use SCHMM to generate trees?
    - Automatic question generation?
    - Others?

ICSI Training - Current Status
- Phase I completed
  - Within 0.5% of Rita's results
  - Tested on the transcriber's meeting
    - 47.3% WER (45.2% WER when equivalence pairs were considered)
- Phase II completed
  - On the development set and the testing set
    - Results varied from 47% to 29%
  - Clipped speech deletion found to be ineffective.
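
The word error rates quoted above can be illustrated with a minimal sketch of the usual edit-distance scoring: (substitutions + deletions + insertions) divided by the number of reference words. The `equivalents` mapping is an assumption about what "equivalence pairs" means here (spelling variants scored as the same word); the function name and the sample sentences are hypothetical and do not come from the ICSI scoring setup.

```python
def word_error_rate(reference, hypothesis, equivalents=None):
    """Return (substitutions + deletions + insertions) / number of reference words.

    `equivalents` maps a word to its canonical form, so that variants such as
    "ok" and "okay" are scored as the same word (an assumed reading of
    "equivalence pairs").
    """
    equivalents = equivalents or {}
    ref = [equivalents.get(w, w) for w in reference.split()]
    hyp = [equivalents.get(w, w) for w in hypothesis.split()]
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one substitution in four words, which disappears
# once "ok" is treated as equivalent to "okay".
plain = word_error_rate("okay let us start", "ok let us start")
relaxed = word_error_rate("okay let us start", "ok let us start", {"ok": "okay"})
```

Scoring with an equivalence table can only lower or preserve the error count, which is consistent with the lower figure reported when equivalence pairs were considered.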

ICSI Training - Before we go to Phase III
- From the last two phases
  - We have some results that look good.
  - BUT, results vary with meeting conditions
    - Number of speakers?
    - Speaker speaking rate entropy?
    - Cross talk?
- Understanding is more important than typing!
- Plan for next month
  - Understand why recognition results vary
  - Complete Phases III and IV with current test sets.
  - Obtain standard test set from NIST

Infrastructure (2 pages) - Workshops and Presentations
- 2 CVS workshops
  - Had great discussions in the workshops
  - Slides can be found on ArthurC's web page
  - Will redo them in the new semester.
- 2 Speech Developers' meetings
  - Next meeting this Thursday:
    - "From main() to GMM computation"

Infrastructure - CVS
- What's in CVS?
  - MRCP source code (v1 and v2)
  - Standard training scripts:
    - ICSI conversion scripts
    - Communicator training scripts
      - Guaranteed to give you 100% satisfaction and 12% WER.
    - WSJ 5k training scripts
      - Guaranteed to give you 100% satisfaction and 8% WER.
- Outlook
  - Need to migrate to other machines.
  - Next: ICSI training scripts (P1 to P4), Communicator/WSJ testing scripts.

Decoder work (7 pages) - Interface
- By Yitao (he didn't even get hurt!)
- Prototype of the Sphinx 2-like APIs is completed
  - Functions completed: initialization
- A demo is also built.
- Will be officially included in Sphinx 3.5.
  - Latest code already available in CVS
- Plan for July
  - Let the APIs go through their ultimate challenge: being used in an application.
  - Enable logging in the recognizer

Decoder work - Speed
- With big help from Evandro
- WSJ 5k task evaluation completed
  - NVP, perplexity ~= 90
  - Tested on a 2G machine
  - None of the results are tuned (very wide beam width, no fast GMM computation)
    - S3 (s3flat): WER 6.5%, speed 2.7xRT
    - S3.4 (s3fast): WER 6.65%, speed 0.94xRT
- Conclusion: the WSJ 5k task is not our challenge.
- Plan for July: it is time to try a 20k task (ICSI or WSJ 20k).
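
For readers unfamiliar with the "xRT" notation, the sketch below shows how a real-time factor is conventionally computed (decoding time divided by audio duration) and how the two decoders above compare. The function names are hypothetical; only the figures 2.7, 0.94, 6.5 and 6.65 come from the slide.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """xRT: seconds of computation spent per second of audio (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline` (0.10 means 10% higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Figures reported on the slide for the WSJ 5k task.
s3flat_xrt, s3fast_xrt = 2.7, 0.94
s3flat_wer, s3fast_wer = 6.5, 6.65

speedup = s3flat_xrt / s3fast_xrt                       # how many times faster s3fast runs
wer_penalty = relative_change(s3flat_wer, s3fast_wer)   # relative WER increase paid for the speed
```

On these numbers s3fast is a little under three times faster than s3flat, at a relative WER increase of roughly 2%.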

SphinxTrain work
- In the current Baum-Welch trainer of SphinxTrain (v0.92)
  - Silence is not optionally deleted in Baum-Welch
  - Multiple pronunciations are not allowed in Baum-Welch
  - We rely on forced alignment to get the correct alignment

SphinxTrain 0.93 progress
- Silence modeling
  - Optional silence deletion is now allowed
  - Progress: completed
- Multiple pronunciations
  - To be allowed in Baum-Welch
  - Progress: nearly completed (needs 2-3 days)
- Correct triphone expansion
  - May not have time to finish it in Q3.
- Plan for July
  - Enable multiple pronunciations in Baum-Welch
  - Legacy is a problem! (We could fix the Sphinx 4 trainer instead.)
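
To make the two trainer features concrete, here is a minimal sketch of what optional silence and multiple pronunciations add to the set of phone sequences a transcript can match. It enumerates the alternatives explicitly; a Baum-Welch trainer would instead represent them as a graph and sum over all paths, and this is not SphinxTrain code. The lexicon, the `SIL` symbol, and the choice to allow silence only between words are assumptions made for illustration.

```python
from itertools import product


def expand_transcript(words, lexicon, silence="SIL"):
    """Enumerate every phone sequence a word transcript may correspond to.

    Each word contributes one of its pronunciations, and each word boundary
    contributes either nothing or a single silence phone.
    """
    slots = []
    for index, word in enumerate(words):
        if index > 0:
            # Optional silence: the boundary is either empty or one SIL.
            slots.append([[], [silence]])
        # Multiple pronunciations: every variant listed for the word.
        slots.append(lexicon[word])
    return [
        [phone for part in choice for phone in part]
        for choice in product(*slots)
    ]


# Hypothetical two-word lexicon with two variants of "the".
lexicon = {
    "the": [["DH", "AH"], ["DH", "IY"]],
    "cat": [["K", "AE", "T"]],
}
alternatives = expand_transcript(["the", "cat"], lexicon)
```

With a single pronunciation per word and mandatory silence placement, the trainer sees exactly one sequence, which is why forced alignment is needed beforehand to pick the right one.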

Decoder work - Adaptation
- Mainly code-tracing in this part
- Situation:
  - Two versions of MLLR adaptation (Sam Joo's and SphinxTrain's)
  - Some code needs to be refined before we expose it
  - S3flat has MLLR but S3fast does not
- Plan for this month
  - After finishing the trainer job, we will tackle it.
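
As background for the MLLR item, the sketch below shows the step an MLLR-capable decoder has to perform: applying an already-estimated affine transform to each Gaussian mean, mu' = A mu + b. Estimating A and b from adaptation data is not shown, and the code reflects neither of the two implementations mentioned above; the function name and the toy numbers are hypothetical.

```python
def mllr_adapt_mean(mean, A, b):
    """Return the adapted mean A @ mean + b using plain Python lists.

    `A` is a square matrix given as a list of rows and `b` is a bias vector;
    both are assumed to have been estimated elsewhere.
    """
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("A must be len(b) x len(mean)")
    return [
        sum(a * m for a, m in zip(row, mean)) + bias
        for row, bias in zip(A, b)
    ]


# Toy 2-dimensional example: scale the first dimension and shift both.
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [1.0, -1.0]
adapted = mllr_adapt_mean([1.0, 2.0], A, b)
```

Because one transform is shared by many Gaussians, a small amount of adaptation speech can move the whole acoustic model toward a new speaker.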

Decoder work - Packaging and Distribution
- Official web page: cmusphinx.sourceforge.net/
- Release process
  1. Set n = 1
  2. Loop
     - Distribute Release Candidate n
     - See whether anyone yells within one week (calm-down period)
     - If yes, n = n + 1, loop again.
     - If no, break
  3. Copy the RC to SourceForge's standard distribution web site.
- Current status:
  - People yelled at RC II during the calm-down period (Yitao fixed the problems)
  - Create RC III this week.
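
The release process above is a simple loop, sketched here in Python. The `has_complaints` callback stands in for "distribute the candidate and wait one week for complaints"; it, the function name, and the `max_candidates` safety limit are hypothetical additions for illustration.

```python
def run_release_process(has_complaints, max_candidates=10):
    """Return the number of the release candidate that survives the calm-down period.

    `has_complaints(n)` should distribute release candidate n, wait out the
    one-week calm-down period, and return True if anyone reported a problem.
    """
    n = 1                                    # step 1: set n = 1
    while n <= max_candidates:               # step 2: loop
        if not has_complaints(n):
            return n                         # no one yelled: break, then publish RC n (step 3)
        n += 1                               # someone yelled: fix and cut the next candidate
    raise RuntimeError("no release candidate passed the calm-down period")


# Hypothetical run in which problems are reported against the first two candidates.
problem_candidates = {1, 2}
released = run_release_process(lambda n: n in problem_candidates)
```

The published release is always identical to a candidate that went a full week without complaints, which is the point of copying the RC rather than rebuilding it.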

Decoder work - Miscellaneous
- Continuous HMM for the Communicator model is also completed.
  - Ready for combination (Do we want to?)
  - Possibly we want to combine the ICSI model and the CMU model.
- Training script is still a big headache to use
  - Still have no time to fix it.

Decoder work - Documentation (aka sphinxDoc)
- Only makes progress when ArthurC procrastinates and doesn't want to read or play video games
- Draft I of Chapters I and II is completed.
  - Chapter I: License agreement and user responsibility
  - Chapter II:
    - What is speech recognition, for dummies
    - History of speech recognition
    - History of Sphinx
    - Versions of Sphinx (when to use what)

Summary and Outlook
- We have done something in June
- We had better do more in the next 3 months.
- Priorities - we have to deal with the "CALO Grand Challenge"
  - Recorder/Classifier/Recognizer integration
  - Improvement of acoustic/language modeling
  - Speaker adaptation
- Non-completed tasks are always on the list and will pop up at the right time.