1
CALO Recorder/Decoder Progress Report for Summer 2004 (July and August)
Yitao Sun (Recorder/Decoder)
Jason Cohen (Recorder/End-pointer)
Thomas Quisel (Recorder)
Ziad Al Bawab (Recorder/End-pointer)
Rong Zhang (ICSI Training)
Arthur Chan (Recorder/End-pointer/Decoder/Trainer)
Carnegie Mellon University
Aug 30, 2004
2
Summer Highlight
This presentation (15 pages)
  Review of June and Highlight (2 pages)
  Recorder (3 pages)
  Decoder (6 pages)
  ICSI Training (1 page)
  Trainer (1 page)
  Documentation (1 page)
  Conclusion (1 page)
3
Review of June
Three goals we set in June:
  1. Recorder/Classifier/Decoder integration
  2. Further improvement of ICSI training
  3. Speaker adaptation
Summer highlight
  We solved 2 (1 + 0.5 + 0.5) of the 3 problems, plus more.
4
Problems we faced in the Summer
Summer is a nice season; many of us were on vacation or away.
  Alex: in Spain for the last three weeks of July
  Jason: left and went to Texas
  ThomasQ, Yash, Moss: internships in other states
  Ziad: back in Lebanon from Aug 1 to Aug 21
  Mock: back in Thailand from Aug 1 to Aug 15
  (Evandro): on vacation from Aug 1 to Aug 15
  Arthur: broke down from Aug 12 to Aug 22
Lack of manpower was a big problem.
5
Recorder (Integration)
Ziad/Yitao/Arthur
Recorder + Classifier + Decoder code integration is completed.
  The classifier and end-pointer are now modularized and incorporated into the CALO Recorder.
  The "FSM" of the end-pointer is now implemented.
  Classifier + Decoder had a hard time: we were trapped by a feature mismatch, which is now fixed.
  Yitao also separated the classifier and decoder into separate threads.
Outlook:
  Before code check-in, we may need to fix speed-up problems.
  (Our weakness) The 3 components are closely coupled.
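The slide does not describe the states of the end-pointer FSM. As a minimal sketch only, assuming a hypothetical four-state machine driven by per-frame speech/non-speech decisions from the classifier, with hypothetical thresholds `min_speech` and `min_silence` (none of these names come from the CALO Recorder code):

```python
from enum import Enum


class State(Enum):
    SILENCE = 0
    MAYBE_SPEECH = 1    # speech seen, not yet long enough to confirm
    SPEECH = 2
    MAYBE_SILENCE = 3   # silence seen, not yet long enough to end the utterance


def endpoint(frames, min_speech=3, min_silence=5):
    """Return (start, end) frame indices of utterances; end is exclusive.

    frames: iterable of truthy (speech) / falsy (non-speech) frame labels.
    The states and thresholds are illustrative assumptions.
    """
    state = State.SILENCE
    segments = []
    start = silence_start = run = 0
    n = 0
    for i, is_speech in enumerate(frames):
        n = i + 1
        # Enter a tentative state on the first contradicting frame.
        if state is State.SILENCE and is_speech:
            state, start, run = State.MAYBE_SPEECH, i, 0
        elif state is State.SPEECH and not is_speech:
            state, silence_start, run = State.MAYBE_SILENCE, i, 0

        if state is State.MAYBE_SPEECH:
            if is_speech:
                run += 1
                if run >= min_speech:
                    state = State.SPEECH
            else:
                state = State.SILENCE          # too short: discard as noise
        elif state is State.MAYBE_SILENCE:
            if is_speech:
                state = State.SPEECH           # short pause inside an utterance
            else:
                run += 1
                if run >= min_silence:
                    segments.append((start, silence_start))
                    state = State.SILENCE

    # Close an utterance that is still open when the input ends.
    if state is State.SPEECH:
        segments.append((start, n))
    elif state is State.MAYBE_SILENCE:
        segments.append((start, silence_start))
    return segments
```

The two tentative states give hysteresis: a one-frame click does not open an utterance, and a short pause does not close one.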
6
Recorder (Portability)
By Jason/Arthur
We are not yet "CP".
On Windows, Cygwin, Linux and Mac OS X, our codebase in CVS compiles and links.
It now works on the following platforms:
  Windows: fully functioning, with extra functions specific to Windows
  Cygwin: small problems in the GUI; NTP works now
  Mac OS X: fully functioning; we just need to fix some memory leaks and invalid memory reads/writes
On Linux, the AD97 chipset still confuses the PortAudio library.
7
Recorder Outlook in Q4
What should we do?
  Linux: focus on the Linux port
    Fix the PortAudio problem
    Fix the offline classifier
  We are barely able to support more feature requests without Thomas.
    We need to implement a switch for processing routines.
  Reduce the gap between release and development.
    After we fix the portability problem, it's time to move to SRI's CVS.
  Memory management can be an issue.
    We need to scan the code with memory-checking tools.
8
Decoder (Live Mode APIs)
More robust than in June
  Fixed a couple of memory problems
  Now going through an in-depth code review
Documented and commented
  An advantage for our partners
9
Decoder (Speed)
We finally have an s3.x setup for ICSI.
  A quick hack without careful tuning
  0.6xRT on a 2G machine with a relative 20% degradation (from 69% -> 63%)
Outlook: speed becomes an important Q4 goal again.
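The slide does not say what the 69% and 63% measure. Assuming they are word accuracies, the quoted "relative 20%" matches the relative increase in error rate (31% -> 37%), not the relative drop in accuracy. A minimal sketch of both figures, plus the usual definition of the real-time factor (function names are illustrative):

```python
def real_time_factor(processing_seconds, audio_seconds):
    """xRT: decoding time divided by audio duration (below 1 is faster than real time)."""
    return processing_seconds / audio_seconds


def relative_error_increase(acc_before, acc_after):
    """Relative increase in error rate, given accuracies in percent (assumed meaning)."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_after - err_before) / err_before


def relative_accuracy_drop(acc_before, acc_after):
    """Relative drop in accuracy, given accuracies in percent."""
    return (acc_before - acc_after) / acc_before


# Error rate goes from 31% to 37%: 6 / 31, roughly 19%, i.e. the slide's "20%".
print(round(100 * relative_error_increase(69, 63), 1))
# Measured on accuracy instead, the same change is 6 / 69, roughly 9%.
print(round(100 * relative_accuracy_drop(69, 63), 1))
```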
10
Decoder (Speaker Adaptation)
Single-regression-class MLLR is now fully supported.
  It produces exactly the same result as Sam-Joo's package.
  We lack test cases for now.
Outlook: in Q4, we need to
  Test the current package with more test cases.
  If time allows, enable multiple regression classes and MAP.
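In MLLR mean adaptation, each Gaussian mean mu is mapped through an affine transform, mu' = A mu + b; with a single regression class, one (A, b) pair is shared by every Gaussian in the model. The sketch below only applies a given transform. Estimating A and b from adaptation data is not shown, and the function name and toy values are hypothetical, not taken from the Sphinx code or Sam-Joo's package.

```python
def apply_mllr(means, A, b):
    """Apply mu' = A mu + b to every mean vector (single regression class).

    means: list of mean vectors (lists of floats)
    A:     square matrix as a list of rows
    b:     bias vector
    """
    adapted = []
    for mu in means:
        adapted.append([
            sum(A[r][c] * mu[c] for c in range(len(mu))) + b[r]
            for r in range(len(b))
        ])
    return adapted


# Toy 2-dimensional example with made-up numbers.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]
print(apply_mllr(means, A, b))  # [[3.0, 0.0], [1.0, -1.5]]
```

Multiple regression classes would keep one (A, b) per class and select the transform by the class each Gaussian belongs to.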
11
Decoder (s3.0/s3.x code merging)
align, astar, allphone, dag and decode-anytopo are now in the s3.5 codebase.
  Thanks to Carl Quillen
Merging is 80% completed.
  The code compiles, links and runs.
  align and allphone are fixed.
    Small differences in output remain because of small differences between s3.0 and s3.x.
  astar/dag/decode-anytopo are in progress.
12k lines of code are saved, from s3.0 + s3.2 (63k) to s3.5 (51k).
Only a slight increase in package size, from 0.3 M to 0.5 M
12
Decoder (s3.0/s3.x code merging) (cont.)
As a consequence of merging, it will be possible to use 3.x to
  Generate alignments
  Generate n-best lists
  Do phoneme recognition
  Search for the best path in the lattice
  Do flat-lexicon search
An interface for reading N-best lists is also available, but it is not exposed yet.
Outlook: more code merging will happen in the next two quarters.
13
Decoder (Release)
We need to provide our partners with a recognizer that has
  State-of-the-art technology
  High performance
Sphinx 3.5 will be released at the beginning of September.
We still need to
  Write two more chapters of documentation
  Polish the live-mode APIs
  Do some small code clean-ups
We will also announce the corresponding tag for SphinxTrain.
  A simultaneous release of s3.5 + ST
14
ICSI Training (Phase III)
By Rong
Phases I and II were completed in May and June.
Now in Phase III: tuning
  We have already tuned parameters such as the number of senones and the number of mixtures.
  Ziad and Arthur were too busy in the summer.
Outlook: an area that was under-worked in the summer. We need to do more in Q4.
15
Trainer (Clean-up)
Unification of the front-end
  Sphinx 2/Sphinx 3/SphinxTrain
  Thanks to Evandro
  No need to worry about code-level mismatch
Unification of the command-line interface
  36 out of 37 tools now have a standard command-line interface.
  All support the options -example and -help.
Appendix A.2 of Hieroglyph
  A comprehensive, formatted 94-page document can now be found on-line.
16
Documentation
Project Hieroglyph
  An effort to build a set of comprehensive documentation on using Sphinx, SphinxTrain and the CMU LM Toolkit to build speech applications
In Summer
  1st draft of "Speaker Adaptation" (Chapter 9) is completed.
  1st draft of "SphinxTrain command line reference" (Chapter A.2) is completed.
  2nd draft of "License of Sphinx" is completed.
All can be found at www.cs.cmu.edu/~archan/sphinxDoc.html
17
Conclusion
We got something done in the summer, but with great pain.
We need to put more stress on some weak areas in Q4.
Outlook for September and Q4
  September: ICASSP 2005 and ICSLP 2004 preparation
  October: polish speaker adaptation
  November: complete dynamic LM addition/deletion
  December: search refinement, further speed-up