CALO Decoder Progress Report for June Arthur (Decoder, Trainer, ICSI Training) Yitao (Live-mode Decoder) Ziad (ICSI Training) Carnegie Mellon University.


1 CALO Decoder Progress Report for June
Arthur (Decoder, Trainer, ICSI Training), Yitao (Live-mode Decoder), Ziad (ICSI Training)
Carnegie Mellon University
July 6, 2004

2 This Presentation
- Progress report for June (15 pages)
  - Review and Highlight (2 pages)
  - ICSI AM training (4 pages)
  - Infrastructure (2 pages)
  - Decoder (8 pages)
  - Summary and Outlook (1 page)
- Review of Q2 2004
  - Live-mode APIs not completed
  - Sphinx not yet tested on a task with vocabulary > 2k
  - ICSI training just started

3 June highlights
- They are completed! (to some extent)
- Live-mode API prototype is completed
  - A demo is built.
- Sphinx 3.4 went through the WSJ 5k task successfully
  - Without pruning
- First two phases of ICSI training are completed

4 ICSI Training - Grand Plan
- By Ziad and ArthurC
- Transcript conversion is completed
- 4 Phases
  - Phase I - Replication of Rita's training
  - Phase II - Fixing Resources
    - Use corrected train/test/dev sets
    - Fixed transcriptions and dictionary
  - Phase III - Tuning
    - Training: topology / #senones / #mix
    - Recognition: parameter tuning
  - Phase IV - Further Improvement
    - Use SCHMM to generate trees?
    - Automatic question generation?
    - Others?

5 ICSI Training - Current Status
- Phase I completed
  - Within 0.5% difference from Rita's results
  - Tested on the transcriber's meeting
    - 47.3% WERR (45.2% WERR when equivalence pairs were considered)
- Phase II completed
  - On the development set and testing set
    - Results varied from 47% to 29%
  - Clipped speech deletion found to be ineffective.
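As a rough illustration of how a score like the ones on this slide can be computed, here is a minimal sketch. It assumes that "WERR" on the slides denotes the usual word error rate (word-level edit distance divided by reference length); the slides do not define it. The function name `word_error_rate` and the `equivalents` mapping are hypothetical and are not part of any Sphinx tool; the mapping only mimics the idea of scoring with equivalence pairs.

```python
def word_error_rate(reference, hypothesis, equivalents=None):
    """Word-level Levenshtein distance divided by the reference length.

    `equivalents` is a hypothetical mapping (not from the slides) that
    rewrites a word to a canonical form before scoring, e.g. {"ok": "okay"}.
    """
    equivalents = equivalents or {}
    ref = [equivalents.get(w, w) for w in reference.split()]
    hyp = [equivalents.get(w, w) for w in hypothesis.split()]
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


strict = word_error_rate("okay we start", "ok we start")
relaxed = word_error_rate("okay we start", "ok we start", {"ok": "okay"})
```

Scoring with the equivalence mapping can only lower or preserve the error count for pairs it covers, which is consistent with the drop from 47.3% to 45.2% reported above.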

6 ICSI Training - Before we go to Phase III
- From the last two phases
  - We have some results that look good.
  - BUT, results vary with meeting conditions
    - # of speakers?
    - Speaker speaking rate entropy?
    - Cross talk?
- Understanding is more important than typing!
- Plan for next month
  - Understand why recognition results vary
  - Complete Phase III and IV with current test sets.
  - Obtain standard test set from NIST

7 Infrastructure (2 pages) - Workshops and Presentations
- 2 CVS Workshops
  - We had great discussions in the workshops
  - Slides can be found at ArthurC's web page
  - Will re-do it in the new semester.
- 2 Speech Developer's meetings
  - Next meeting this Thursday:
    - "From main() to GMM computation"

8 Infrastructure - CVS
- What's in CVS?
  - MRCP source code (v1 and v2)
  - Standard training scripts:
    - ICSI Conversion Scripts
    - Communicator Training Scripts
      - Guaranteed to give you 100% satisfaction and 12% WERR.
    - WSJ 5k Training Scripts
      - Guaranteed to give you 100% satisfaction and 8% WERR.
- Outlook
  - Need to migrate to other machines.
  - Next:
    - ICSI training scripts (P1 to P4)
    - Communicator/WSJ testing scripts.

9 Decoder work (7 pages) - Interface
- By Yitao (he didn't even get hurt!)
- Sphinx 2-like API prototype is completed, functions completed
  - Initialization
- A demo is also built.
- Will be officially included in Sphinx 3.5.
  - Latest code already available in CVS
- Plan for July
  - Let the APIs go through their ultimate challenge: being used in an application.
  - Enable logging of the recognizer

10 Decoder work - Speed
- With big help from Evandro
- WSJ 5k task evaluation completed
  - NVP, perplexity ~= 90
  - Tested on a 2G machine
  - None of the results are tuned (very wide beam width, no fast GMM computation)
    - S3 (s3flat): WERR 6.5%, Speed 2.7xRT
    - S3.4 (s3fast): WERR 6.65%, Speed 0.94xRT
- Conclusion: WSJ 5k task is not our challenge.
- Plan for July -> It is time to try a 20k task. (ICSI or WSJ 20k)
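To make the trade-off between the two decoders concrete, the small sketch below works through the arithmetic on the reported figures. It assumes the usual reading of "xRT" as processing time divided by audio duration, and of "WERR" as word error rate; neither is defined on the slide. The helper name `real_time_factor` is hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds


# Figures as reported on the slide.
s3flat_xrt, s3flat_wer = 2.7, 6.5
s3fast_xrt, s3fast_wer = 0.94, 6.65

# How many times faster s3fast is than s3flat on this task.
speedup = s3flat_xrt / s3fast_xrt

# Relative increase in error rate paid for that speedup.
relative_wer_increase = (s3fast_wer - s3flat_wer) / s3flat_wer

print(f"speedup: {speedup:.2f}x, relative error increase: {relative_wer_increase:.1%}")
```

On these numbers s3fast is a little under three times faster than s3flat for a relative error increase of roughly 2%, which supports the slide's conclusion that the 5k task is no longer the hard case.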

11 SphinxTrain work
- In the current Baum-Welch trainer of SphinxTrain (v0.92)
  - Silence is not optionally deleted in Baum-Welch
  - Multiple pronunciations are not allowed in Baum-Welch
  - We rely on forced alignment to get the correct alignment

12 SphinxTrain 0.93 progress
- Silence Modeling
  - Optional silence deletion is now allowed
  - Progress: Completed
- Multiple Pronunciations
  - To be allowed in Baum-Welch
  - Progress: nearly completed (need 2-3 days)
- Correct Triphone Expansion
  - May not have time to finish it in Q3.
- Plan for July
  - Enable multiple pronunciations in Baum-Welch
  - Legacy is a problem! (We could fix the Sphinx 4 trainer instead.)

13 Decoder work - Adaptation
- Mainly code-tracing in this part
- Situation:
  - Two versions of MLLR adaptation (Sam Joo's and SphinxTrain's)
  - Some code needs to be refined before we expose it
  - S3flat has MLLR but S3fast does not
- Plan for this month
  - After finishing the trainer work, we will tackle it.

14 Decoder work - Packaging and Distribution
- Official web page: cmusphinx.sourceforge.net/
- Release Process
  1. Set n = 1
  2. Loop
     - Distribute Release Candidate n
     - See whether anyone yells within one week (calm-down period)
     - If yes, n = n + 1, loop again.
     - If no, break
  3. Copy the RC to SourceForge's standard distribution web site.
- Current status:
  - People yelled at RC II in the calm-down period (Yitao fixed the issues)
  - Create RC III this week.
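The release loop on this slide can be sketched as a few lines of code. This is only an illustration of the control flow: the function `first_quiet_candidate`, the `anyone_yelled` callback, and the `max_candidates` safety limit are hypothetical names introduced here, and the real process involves people and a one-week wait rather than a function call.

```python
def first_quiet_candidate(anyone_yelled, max_candidates=10):
    """Return the number of the first release candidate nobody complains about.

    `anyone_yelled(n)` stands in for distributing RC n and waiting out the
    one-week calm-down period; it returns True if someone reported a problem.
    `max_candidates` is an assumed safety limit, not part of the slide.
    """
    n = 1                              # step 1: set n = 1
    while n <= max_candidates:         # step 2: loop
        if anyone_yelled(n):           # complaints during the calm-down period
            n += 1                     # fix the issues, cut the next candidate
        else:
            return n                   # step 3: this RC gets copied to the release site
    raise RuntimeError("no release candidate survived the calm-down period")


# Hypothetical scenario: complaints on RC 1 and RC 2, none on RC 3.
released = first_quiet_candidate(lambda n: n < 3)
```

The scenario at the end is made up for the example; the slide only says that RC II drew complaints and that RC III is about to be created.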

15 Decoder work - Miscellaneous
- Continuous HMM for the Communicator model is also completed.
  - Ready for combination (Do we want to?)
  - Possibly we want to combine the ICSI model and the CMU model.
- Training script is still a big headache for us
  - Still have no time to fix it.

16 Decoder work - Documentation (aka sphinxDoc)
- Only makes progress when ArthurC procrastinates and doesn't want to read or play video games
- Draft I of Chapters I and II is completed.
  - Chapter I: License agreement and user responsibility
  - Chapter II:
    - What is speech recognition (for dummies)
    - History of speech recognition
    - History of Sphinx
    - Versions of Sphinx (when to use what)

17 Summary and Outlook
- We have done something in June
- We had better do more in the next 3 months.
- Priorities - We have to deal with the "CALO Grand Challenge"
  - Recorder/Classifier/Recognizer integration
  - Improvement of acoustic/language modeling
  - Speaker adaptation
- Non-completed tasks are always on the list and will pop up at the right time.

