3rd Progress Meeting For Sphinx 3.6 Development. Arthur Chan, David Huggins-Daines, Yitao Sun. Carnegie Mellon University, Jan 25, 2006.

This meeting: 3rd progress report on 3.6 development (40 pages). Agenda: What happened in Fall 2005? (4 slides); Progress of Sphinx development in Fall 2005 (17 slides); Summary of progress in 2005 (10 slides); Discussion: Should we create one release candidate? (1 slide).

What happened in FALL 2005?

What happened in Fall 2005? Major events in Sphinx development: We participate in GALE in Oct 2006. Conformance of the recognizers (Sphinx 3 and Sphinx 4) becomes an issue. The lack of advanced acoustic modeling techniques becomes very glaring. Sphinx 3 and 4 have gone through bug fixes. The CALO effort is now split in two. Off-line recognizer: requires major improvement in LM and AM; the AM issue is shared with GALE. On-line recognizer (CALO jargon: Smartnote): now has a new LM and AM; requires significant development work.

Time distribution (estimated): Arthur: 50% on GALE, 20% on CALO, 30% on Sphinx. Dave: 65% on CALO, 30% on PocketSphinx, 5% on Sphinx. Yitao: 90% on CALO, 10% on Sphinx.

The Two Funded Projects. Upside: They point to issues that need to be solved. They need significant reprioritization of tasks. Balance of effort on the 2 projects is now achieved. Downside: Code development of Sphinx becomes a slower process. Also, we haven’t released s3 for a while => Should we release the code now? Tired students and staff can be found everywhere.

Progress of Sphinx 3.6 in FALL 2005

Overview. Work on the second stage: merging of best-path search into the 2nd stage of tree search; IBM lattice generation; word confidence estimation. Behavior changes and bug fixes: treatment of acoustic scores; assertion in vithist.c. Attempts at search algorithm improvements: Mode 3 (flat-lexicon decoding); Mode 4 (tree-lexicon decoding). Sphinx on Mandarin and coded language. New tools: conf, dp.

Work Schedule. Sep 1 to Oct 1: implementation of triphones in the flat-lexicon decoder. Oct 1 to Nov 1: implementation of triphones in the tree-lexicon decoder (incomplete). Nov 1 to Dec 8: IBM lattice generation; confidence score generation; fixed issues in scores. Dec 8 to Jan 3: the concept of “vacation” was tried. Jan 3 to now: fixed bugs, prepared the release.

Second-stage Processing. Best-path search can now be specified in decode; the implementation requires write-back (urgh). The recognizer can now generate lattices in IBM format, where the word is attached to the link; the Sphinx format attaches the word to the node. Scores are normalized with the best senone scores. Rong’s confidence-based routine is now in Sphinx (conf). Goodies: it uses the Sphinx logs3 routines -> significantly reduces the alpha-beta score mismatch.
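The node-versus-link distinction above can be illustrated with a minimal Python sketch. It only shows the labeling difference, not the actual IBM or Sphinx lattice file layout. The function name, the tuple layout, and the artificial start state `-1` are all assumptions made for illustration; the choice to copy each word onto the links entering its node is also an assumption.

```python
from typing import Dict, List, Tuple

START = -1  # hypothetical extra start state, not part of either format


def node_to_link_lattice(
    node_words: Dict[int, str], edges: List[Tuple[int, int]]
) -> List[Tuple[int, int, str]]:
    """Convert a node-labeled lattice into a link-labeled one.

    node_words maps node id -> word; edges are (src, dst) pairs.
    Each word is copied onto every link that enters its node.
    Nodes without predecessors get a link from START so that
    their word is not lost.
    """
    has_pred = {dst for _, dst in edges}
    links = [
        (START, node, word)
        for node, word in sorted(node_words.items())
        if node not in has_pred
    ]
    links += [(src, dst, node_words[dst]) for src, dst in edges]
    return links


example = node_to_link_lattice(
    {0: "<s>", 1: "hello", 2: "hollow", 3: "world", 4: "</s>"},
    [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)],
)
```

Note that "world" ends up on two links, (1, 3) and (2, 3): a link-labeled lattice repeats a word once per incoming link.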

Second-stage Processing (cont.). Further work: Best-path generation doesn’t conform to past 3.5 -> bugs caused by 3.6 development. Also, the best path is not always in the lattice -> legacy bug. Confidence-based method: lattice-based, so it can currently only be used off-line; 10% of the data still has an alpha-beta mismatch. Consensus network generation needs special focus.
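As a rough illustration of what an alpha-beta mismatch means, the sketch below runs a forward-backward pass over a small link-labeled lattice. It is not the code in conf: it uses floating-point natural logs rather than the integer logs3 domain, and the function names and the assumption that node ids are already in topological order are hypothetical. In exact arithmetic the forward score at the end node equals the backward score at the start node; the returned `mismatch` is the difference between the two.

```python
import math
from collections import defaultdict


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def link_posteriors(links, start, end):
    """links: list of (src, dst, log_score), with src < dst for every link.

    Returns (posteriors in input order, |alpha[end] - beta[start]|).
    """
    alpha = defaultdict(lambda: -math.inf)
    beta = defaultdict(lambda: -math.inf)
    alpha[start] = 0.0
    beta[end] = 0.0
    # Forward pass: visiting links by increasing src guarantees
    # alpha[src] is final before it is used.
    for src, dst, score in sorted(links, key=lambda l: l[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visiting links by decreasing dst guarantees
    # beta[dst] is final before it is used.
    for src, dst, score in sorted(links, key=lambda l: l[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    total = alpha[end]
    mismatch = abs(alpha[end] - beta[start])
    posteriors = [
        math.exp(alpha[src] + score + beta[dst] - total)
        for src, dst, score in links
    ]
    return posteriors, mismatch
```

A link posterior computed this way is one common basis for a lattice-based word confidence; whether conf uses exactly this quantity is not stated in the slides.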

Scores we see (Change 1). Tree search now truly generates unnormalized scores; previously the score was normalized by the ending frame only, caused by a bug introduced in mid-2005. All 1st-stage searches use the same score logging functions, including align, allphone, decode_anytopo and decode. matchseg_write and match_write are the current versions; log_* is still used but will soon be totally replaced.

Scores we see (Change 2). Multi-stream GMM computation (ms_gauden): by default, it no longer quantizes the log pdf to 8 bits. Single-stream GMM computation: vectors with zero means and variances are removed (-remove_zero_var_gau). Scores and performance will change. Testing resources have changed. (Evandro grins at this point.)
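The removal of degenerate Gaussians can be pictured with the short sketch below. It assumes that "zero means and variances" means a component whose mean vector and variance vector are both entirely zero; the function name and the list-of-lists layout are hypothetical and do not reflect the Sphinx model file structures.

```python
from typing import List, Tuple


def remove_zero_gaussians(
    means: List[List[float]], variances: List[List[float]]
) -> Tuple[List[List[float]], List[List[float]], List[int]]:
    """Drop components whose mean and variance vectors are all zero.

    Returns the filtered means, the filtered variances, and the
    original indices of the components that were kept.
    """
    kept = [
        i
        for i, (mean, var) in enumerate(zip(means, variances))
        if any(x != 0.0 for x in mean) or any(x != 0.0 for x in var)
    ]
    return [means[i] for i in kept], [variances[i] for i in kept], kept
```

Dropping a component changes which densities contribute to each senone score, which is consistent with the slide's warning that scores and performance will change.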

Scores we see (Change 3). Sphinx now supports generation of different hypseg formats (-hypseg_fmt): SPHINX 2 format, SPHINX 3 format and ctm format. They always require more processing, but it is better than nothing.
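For readers unfamiliar with ctm, the sketch below formats word segments as NIST-style ctm lines (file, channel, start time, duration, word). It is not the decoder's output routine. The function name, the inclusive end frame, the channel label and the 100 frames per second rate are assumptions for illustration.

```python
from typing import List, Tuple


def to_ctm(
    utt_id: str,
    segments: List[Tuple[str, int, int]],
    frame_rate: int = 100,  # assumed: 10 ms frames
    channel: str = "1",     # assumed channel label
) -> List[str]:
    """Format (word, start_frame, end_frame) segments as ctm lines.

    end_frame is treated as inclusive, so a segment covering frames
    0..49 lasts 50 frames.
    """
    lines = []
    for word, start_frame, end_frame in segments:
        start = start_frame / frame_rate
        duration = (end_frame - start_frame + 1) / frame_rate
        lines.append(f"{utt_id} {channel} {start:.2f} {duration:.2f} {word}")
    return lines


for line in to_ctm("utt1", [("hello", 0, 49), ("world", 50, 119)]):
    print(line)
# prints:
# utt1 1 0.00 0.50 hello
# utt1 1 0.50 0.70 world
```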

Scores: a summary. Unnormalized (true) acoustic and language scores are generated (-hypsegscore_unscale) by the 1st-stage search and by best-path search right after the 1st stage. Normalized acoustic scores are generated by lattice generation. If developers want true scores in the lattice, they can get the best scores from the decoder (-bestsenscrdir) and do their own processing.
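The "do their own processing" step might look like the sketch below. It assumes that normalization means subtracting the best senone score of each frame from the acoustic score in the log domain, and that the per-frame best scores have already been loaded into a list; the file format written by -bestsenscrdir is not covered, and both function names are hypothetical.

```python
from typing import Sequence


def unnormalize_score(
    normalized_score: int,
    best_senone_scores: Sequence[int],
    start_frame: int,
    end_frame: int,
) -> int:
    """Add back the per-frame best senone scores over [start_frame, end_frame]."""
    return normalized_score + sum(best_senone_scores[start_frame:end_frame + 1])


def normalize_score(
    true_score: int,
    best_senone_scores: Sequence[int],
    start_frame: int,
    end_frame: int,
) -> int:
    """Inverse operation: subtract the per-frame best senone scores."""
    return true_score - sum(best_senone_scores[start_frame:end_frame + 1])


best = [-100, -120, -90, -110]           # made-up per-frame best scores
true_score = unnormalize_score(-35, best, 1, 2)
print(true_score)  # prints -245
```

Because the normalizer depends only on the frames a segment spans, it can be applied per word segment or per lattice link without re-running the decoder.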

Other important bug fixes. Bug in vithist.c: it caused an assertion failure and stopped the recognizer. It is now fixed and returns an error message to the search abstraction routine.

Attempts in search algorithm improvements (Mode 3). Flat-lexicon decoder: the search implementation is complete; decode can now use flat-lexicon decoding (-op_mode 3). Decoder revamping is complete: Mode 2 (FST), Mode 3 (flat lexicon), Mode 4 (Ravi’s tree lexicon), Mode 5 (Arthur’s tree lexicon). decode_anytopo is still there for backward compatibility: decode_anytopo = decode in mode 3.

No Further Re-factoring. Avoid re-factoring before the next check-in. align and allphone have different input/output file formats, so it doesn’t make sense to stuff them into a single executable. Using an XML configuration and control file would be an option, but it takes too much time to implement.

Algorithmic Work - Flat Lexicon Decoder. Full triphones completed in flat-lexicon decoding: 2.5% relative improvement in accuracy, but it requires 100xRT (urgh); useful for debugging. Also considered a full trigram implementation: it would result in another 5-10 times slowdown. Conclusion: flat-lexicon search has come to its limit.
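The two figures quoted above follow the usual definitions, sketched below. The slides give no absolute error rates or timings, so the numbers in the example are hypothetical and chosen only to reproduce a 2.5% relative gain and a 100xRT factor.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative error reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """xRT: seconds of computation per second of audio."""
    return processing_seconds / audio_seconds


# Hypothetical numbers: a WER drop from 20.0% to 19.5% is a 2.5% relative gain.
gain = relative_improvement(20.0, 19.5)
# Hypothetical numbers: 1000 s of compute for a 10 s utterance is 100xRT.
xrt = real_time_factor(1000.0, 10.0)
# A further 5-10x slowdown on top of 100xRT would mean 500xRT to 1000xRT.
trigram_xrt_range = (xrt * 5, xrt * 10)
```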

Algorithmic Work - Tree Lexicon Decoder. Current full triphone implementation: has flaws in score propagation. Tree copies -> no time to do it at all; Q4’s workload nearly killed AC. Benchmarking results: GALE results: Full Lexicon = Tree Lexicon; CALO/Communicator results: Tree Lexicon is 5% relative poorer. Conclusion: half a year on search is expected to give us another 5%.

Conclusion on Search. Need to seriously consider: is working on search a good idea? In both CALO and GALE, gains come from SAT and cross adaptation; second-stage processing (confusion network, confidence annotation); first-stage SD -> second-stage SA. VTLN also only gives 5% relative, but it only takes 5 days to implement.

Sphinx on Different Text Encodings. There is already non-CMU work for Spanish and French. Big question mark: could it work on other encodings?

Sphinx on Mandarin (gb2312)

Sphinx on Mandarin (cont.). Thanks to Ravi. Bugs we fixed to get it through: libutil\str2words special character bug (special characters weren't supported). This should give us a fairly good foundation to start on most languages.

Summary of Sphinx in Fall 2005. We have done something. A strong focus on search research doesn’t seem to get us far. There are fires to fight on the modeling side. Sounds like the time to check in and move on.

Progress of Sphinx 3.X (From X=5 to X=6)

New Features (4 slides); Items that are significant; Gentle, mild and simple re-factoring and its consequence (4 slides); Documentation (1 slide); Regression testing (1 slide); Pruned features?

New Features (Search). Speed: further enhancement of CIGMMS; BBI tree implementation (by Dave, in SphinxTrain). Search: FST search; full triphone implementation in decode_anytopo; separation of search abstraction/implementation in 3.X.

New Features (Adaptation). Multiple classes for MLLR (by Dave); MAP adaptation (by Dave, in SphinxTrain).

New Features (Others). New executables: lm_convert; lm3g2dmp++; dp (if Evandro asks, “Why do we need dp in sphinx 3?”, say this: “I don’t know, we found the executable at ./s3/src/misc/dp.c”); conf, an off-line word-level confidence annotation program. Dict-LM mismatch: unmatched entries can be generated automatically (-lts_mismatch).

Gentle, mild and simple re-factoring (GMM computation). GMM computation is now shared among decode, decode_anytopo, align and allphone. So, e.g., decode_anytopo can use fast GMM computation and decode can use SCHMM.

Gentle, mild and simple re-factoring (Search). Its consequence in search programming: FST, flat and tree search now share the same interface (decode), just like Sphinx 2 and 4. Writing a new search no longer means replacing an existing search. The 2nd stage now works for decode. Alright, not for FST search.

Gentle, mild and simple re-factoring (Others). Score output is now rationalized. Several bugs causing seg faults are eliminated. vithist.c bugs. Class-based LM is now working correctly. Command lines among applications are now synchronized and re-factored.

Documentation/Tutorial. Hieroglyph: now writing the 2nd draft. Doxygen documentation (by Evandro). Tutorial now works. archive_s3 Sphinx 2 Sphinx 3 Sphinx 4

Regression Testing. Our weakest link. Now daily: standard regression test is done; performance check on Communicator/TIDIGITS/TI46; doxygen documentation will be made and tested. make check now has 50 tests (3.5: 11). Fairly robust to careless mistakes.

Expected Trimmed Features. Search: Mode 0: alignment (?); Mode 1: allphone; Mode 5: word tree copies. If full triphones in Ravi’s tree search can't be done quickly, trim that as well (?). Yitao’s PCFG rescoring.

Conclusion of Sphinx 3.X (From X=5 to X=6). We have done something. Development last year has enriched the code and nicified a lot of things internal to the code. There are hiccups in our development. Not perfect. Well, compare this with NASDAQ.

Discussion: What should we do now? Option 1: keep on working without a release. Option 2: merge the crazy branch with the trunk without a release. Option 3: merge the crazy branch with the trunk and create release candidate Sphinx 3.6 RCI.

End