CALO Decoder Progress Report for March. Arthur (Decoder and ICSI Training), Jahanzeb (Decoder), Ziad (ICSI Training), Moss (ICSI Training). Carnegie Mellon University, Apr 13, 2004
This Presentation: Progress report for March. In February: batch-mode recognizer completed; live-mode recognizer didn't work. In March: more decoder work (speed, accuracy, interface); ICSI transcription conversion task (resources, conversion scripts); miscellaneous efforts to improve the decoder (contact with other groups, web page(s), manual).
Decoder work (Speed), by Arthur and Jahanzeb. Sphinx 3.4 is starting to work reasonably on the Communicator task (1G: 1.1xRT, 2G: 0.48xRT). Phoneme look-ahead research is completed, with a 15-20% gain when CIGMMS is applied; it will be incorporated as a feature. Outlook for April: machine optimization (still on the list!), WSJ evaluation, and publishing a technical-report version of the results.
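Phoneme look-ahead is only summarized above. As a rough illustration of the general idea (not the Sphinx 3.4 code), the sketch below assumes context-independent (CI) phone log-likelihood scores are already available per frame, and keeps only the phones whose summed score over a short look-ahead window lies within a beam of the best one. The function name and the window and beam parameters are hypothetical.

```python
from typing import Dict, List, Set


def lookahead_active_phones(
    ci_scores: List[Dict[str, float]], t: int, window: int, beam: float
) -> Set[str]:
    """Return phones worth entering at frame t.

    ci_scores[f] maps each CI phone to its log-likelihood at frame f
    (higher is better). Scores are summed over frames t .. t+window-1;
    phones within `beam` of the best total survive.
    """
    totals: Dict[str, float] = {}
    for frame in ci_scores[t:t + window]:
        for phone, score in frame.items():
            totals[phone] = totals.get(phone, 0.0) + score
    if not totals:  # past the end of the utterance
        return set()
    best = max(totals.values())
    return {phone for phone, total in totals.items() if total >= best - beam}
```

Phones outside the returned set would not be entered for the coming frames, which is where a speed-up of this kind comes from; a wider beam trades speed back for safety.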
Decoder work (Accuracy). First comparison between S2 and S3.4: S3.0 ~ S2 > S3.3 > S3.4. This is not the fairest comparison: the S3 model was trained on female speakers only, and it is less tuned. Outlook for April: learn how to do training, do a fairer comparison, and change the search structure.
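The slide does not say which metric the ranking uses. Assuming word error rate (WER), a fairer comparison would at least score every decoder's output against the same references with the same scoring code. A minimal WER sketch based on word-level edit distance (the function name is illustrative):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    r, h = ref.split(), hyp.split()
    if not r:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between r[:i-1] and h[:j].
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            substitution = prev[j - 1] + (r[i - 1] != h[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[len(h)] / len(r)
```

For example, scoring the hypothesis "three six four" against the reference "three six two four" gives one deletion out of four words, i.e. a WER of 0.25.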
Decoder work (Interface). The live-mode decoder works, but its interface is still poorer than S2's: there is no config file yet. Many users complained (well, actually 2-3 of them). Outlook for April: focus on building a better API and command-line interface. Jahanzeb will work on this while Arthur is working on training.
ICSI Training Transcription Conversion Task, by Moss, Ziad and Arthur. Completion: resource mapping (100%), OOV (~20%), conversion script (90%).
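The OOV task amounts to finding the transcription words that the pronunciation dictionary does not cover, so that they can be given pronunciations. A minimal sketch, assuming a Sphinx-style dictionary with one `WORD PHONE ...` entry per line and alternate pronunciations written as `WORD(2)`; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set


def load_dict_words(dict_text: str) -> Set[str]:
    """Collect dictionary headwords, folding alternates like WORD(2) into WORD."""
    words = set()
    for line in dict_text.splitlines():
        fields = line.split()
        if fields:
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def find_oovs(
    transcripts: Iterable[str], vocab: Set[str], ignore: Set[str] = frozenset()
) -> Counter:
    """Count transcription tokens missing from the dictionary.

    Filler tokens (e.g. ++GARBAGE++) can be excluded through `ignore`.
    """
    counts: Counter = Counter()
    for line in transcripts:
        for word in line.split():
            if word not in vocab and word not in ignore:
                counts[word] += 1
    return counts
```

Calling `most_common()` on the result puts the most frequent OOVs first, which is a sensible order for writing pronunciations by hand.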
ICSI Transcription: What does it look like? three six two four three zero seven
XML tags conversion. The transcription is more detailed than necessary. Current treatment: : Ignore the whole sentence (too many occurrences, too many varieties). : Ignore. : Replace with ++GARBAGE++. : Ignore the whole sentence (too few occurrences; not worth handling). : Replace with ++GARBAGE++. &: Use mapping.
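The per-tag rules above can be expressed as a lookup table. The sketch below uses Python's `xml.etree.ElementTree`; since the actual ICSI tag names for each rule are not listed here, the tag names, the default policy for unknown tags, and the word mapping are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical tag names; the real ICSI tag-to-treatment table is not reproduced here.
TAG_POLICY = {
    "DropSegmentTag": "drop",  # ignore the whole sentence
    "MarkupTag": "ignore",     # drop the markup, keep any words inside it
    "NoiseTag": "garbage",     # replace the tag with a filler token
}
GARBAGE = "++GARBAGE++"
WORD_MAP = {"&": "and"}  # hypothetical mapping applied to the final tokens


class _DropSegment(Exception):
    """Raised when a tag requires discarding the whole segment."""


def _tokens(elem: ET.Element) -> List[str]:
    out = (elem.text or "").split()
    for child in elem:
        policy = TAG_POLICY.get(child.tag, "ignore")  # unknown tags: keep inner words
        if policy == "drop":
            raise _DropSegment(child.tag)
        if policy == "garbage":
            out.append(GARBAGE)
        else:
            out.extend(_tokens(child))
        out.extend((child.tail or "").split())  # text following the tag
    return out


def convert_segment(xml_text: str) -> Optional[str]:
    """Flatten one XML segment into a word string, or return None to drop it."""
    try:
        tokens = _tokens(ET.fromstring(xml_text))
    except _DropSegment:
        return None
    return " ".join(WORD_MAP.get(tok, tok) for tok in tokens)
```

Note that ElementTree already decodes standard entities such as `&amp;` to `&`, so the mapping step only has to deal with the decoded characters.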
Plain-text Normalization After XML Conversion. Example: “I – I am no-, I mean C-zero”. ‘-’ can mean: “ - ”: interruption/interjection marks; “-XXX” or “XXX-”: broken words; “XXX-XXX”: hyphenated words. AM transcription: get rid of all punctuation and leave broken words alone. LM transcription: interruption marks and broken words will be removed (optionally, interruption marks can be left in).
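The three uses of ‘-’ can be told apart token by token. A minimal sketch, assuming whitespace tokenization, that the AM pass strips punctuation and stand-alone interruption marks but keeps broken words, and that the LM pass drops broken words and (by default) interruption marks; the function and constant names are hypothetical.

```python
PUNCT = ',.?!;:"“”'        # punctuation stripped from token edges (not '-')
DASHES = {"-", "–", "--"}  # stand-alone interruption/interjection marks


def normalize(text: str, target: str, keep_interruptions: bool = False) -> str:
    """Normalize one utterance for the 'am' or 'lm' transcription."""
    if target not in ("am", "lm"):
        raise ValueError("target must be 'am' or 'lm'")
    out = []
    for raw in text.split():
        tok = raw.strip(PUNCT)
        if not tok:
            continue
        if tok in DASHES:  # interruption mark
            if target == "lm" and keep_interruptions:
                out.append("-")
            continue
        if tok.startswith("-") or tok.endswith("-"):  # broken word: -XXX or XXX-
            if target == "lm":
                continue
        out.append(tok)  # ordinary or hyphenated (XXX-XXX) word, kept whole
    return " ".join(out)
```

On the example above, the AM pass gives "I I am no- I mean C-zero" and the LM pass gives "I I am I mean C-zero". Hyphenated words such as "C-zero" are kept whole in both, so they must appear in the dictionary as single entries.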
XML conversion script. Functionality: optional conversion; resource (dict/mapping/rules) read-in; XML parser; generation of both the transcription and the control file for close-talking microphones; generation of both LM and AM transcriptions. TODO: incorporate Ziad's script, correct the timing information, generate the far-field channels, fix small bugs.
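Generating the transcription and control file together keeps utterance IDs consistent between them. A minimal sketch, assuming a Sphinx-style control file with one `audio-file start-frame end-frame utterance-id` entry per line, a 10 ms frame shift, and SphinxTrain-style transcription lines of the form `<s> WORDS </s> (utterance-id)`; the `Segment` type, the ID prefix and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAMES_PER_SEC = 100  # assumed 10 ms frame shift


@dataclass
class Segment:
    audio_file: str  # hypothetical per-channel (close-talking mic) file id
    start_sec: float
    end_sec: float
    text: str        # already-normalized AM or LM transcription


def to_ctl_and_transcript(
    segments: List[Segment], prefix: str = "icsi"
) -> Tuple[List[str], List[str]]:
    """Return parallel control-file and transcription lines sharing utterance IDs."""
    ctl, trans = [], []
    for i, seg in enumerate(segments):
        uttid = f"{prefix}_{i:05d}"
        start = int(round(seg.start_sec * FRAMES_PER_SEC))
        end = int(round(seg.end_sec * FRAMES_PER_SEC))
        ctl.append(f"{seg.audio_file} {start} {end} {uttid}")
        trans.append(f"<s> {seg.text} </s> ({uttid})")
    return ctl, trans
```

Keeping the seconds-to-frames conversion in one place makes it easier to apply a timing correction later, which is one of the TODO items above.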
Outlook for the ICSI training task in April: complete the OOV transcriptions (Arthur, Moss and Ziad); fix bugs in the conversion script (Arthur); learn AM training (Ziad and Arthur); LM training (Moss); fix potential problems in SphinxTrain.
Miscellaneous (Contact with other groups). We want to find a better interface for Sphinx, so we are trying to contact other groups to see what's up. XVoice-sphinx: a “command-and-control” application that tried to use Sphinx (it actually does dictation); not very happy with Sphinx after using Sphinx's default AM and LM for command-and-control. OSSRI: no clear goal yet; starting to gather funding; doesn't really like Sphinx because “Sphinx is poorer than ViaVoice in C&C”.
We need to help them more …… We need better …… Release (to replace S3.3): after the WSJ evaluation, S3.4 will be officially released to replace the current S3.3. Sphinx web page (also the CMU web page): the Sphinx web pages need a more unified theme; a task force will be gathered after ICSLP. Manual: we need to provide basic education to developers and “hard-core” hackers. The first outline of the manual has been written; the first draft will appear within a quarter.
Summary. We still need to build a good model for ICSI first (Arthur/Ziad/Moss). Training is also critical to understanding why S2 > S3.3. Better everything for the decoder: Arthur/Jahanzeb -> 50/50. Others: always on my “priority queue”; they will pop up at the right time.