Development of CMU Sphinx From 2004 to 2006 Jul An Observer’s Perspective Arthur Chan Evandro Gouvea David Huggins-Daines Mosur Ravishankar Alex Rudnicky.

Development of CMU Sphinx From 2004 to 2006 Jul An Observer’s Perspective Arthur Chan Evandro Gouvea David Huggins-Daines Mosur Ravishankar Alex Rudnicky Yitao Sun

What is the role of Software in Speech Recognition? The Main Theme for Today:

[For the off-line viewer] [This is Arthur Chan’s conclusion: Joint consideration of 3 software components is crucial.] Read on, you’ll see his argument.

Perspective Mainly Arthur Chan’s observation –Two Roles As a developer –“The Grand Janitor” As an observer of events –A historian.

What is CMU Sphinx? Definition 1 : –Large vocabulary speech recognizers with high accuracy and speed performance. Definition 2 : –A collection of tools and resources that enables developers/researchers to build successful speech recognition systems

Family of CMU Sphinx Decoders –Sphinx {II – IV} –PocketSphinx (by Dave since Oct 2005) Acoustic Model Trainer –SphinxTrain Language Model Trainer (new) Documentation –Hieroglyphs –Robust/SphinxTrain Tutorial

The Sphinx Developers Sphinx is maintained by –Volunteer programmers/researchers who like speech recognition –All contributions go to the same codebase –Goal: Sustainable development of Sphinx Sphinx Developer Meetings are held –Regularly (as in an aperiodic function) –Secretly (in the sense that everyone knows) –to decide the way to go in Sphinx

Outline (~30 pages) Software of Speech Recognition –How should we develop? What should a comprehensive software do? CMU Sphinx, Before/After Lessons Learned (Optional) Team and Structure.

Software of Speech Recognition Systems

The Old Black Box [Diagram: Acoustic Signal → Speech Recognizer (the black box) → Word Sequence]

What It Means to Software Philosophy behind the old black box: “When you don’t know, search.” The old Black Box: –Strongly focuses on the decoder –Tends to ignore other important components (e.g. models)

The Noisy Channel Point of View Decoder

What it means to software We need to represent and estimate the parameters of the acoustic model. We need to represent and estimate the parameters of the language model. Given the models, we need to search through all possible word sequences, i.e. decoding.
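The three requirements above correspond to the usual noisy-channel decision rule: choose the word sequence W that maximizes P(A|W) · P(W), where the acoustic model supplies P(A|W) and the language model supplies P(W). What follows is a minimal sketch of that rule, not Sphinx code. The candidate sentences, their scores, and the names `decode` and `lm_weight` are all hypothetical, and a real decoder searches a far larger hypothesis space than a two-entry table.

```python
import math

# Hypothetical toy scores; a real system computes these from trained models.
ACOUSTIC_LOGPROB = {  # log P(acoustics | words), from the acoustic model
    "recognize speech": math.log(0.20),
    "wreck a nice beach": math.log(0.25),
}
LM_LOGPROB = {  # log P(words), from the language model
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.001),
}


def decode(candidates, am_logprob, lm_logprob, lm_weight=1.0):
    """Return the candidate maximizing log P(A|W) + lm_weight * log P(W)."""
    return max(
        candidates,
        key=lambda words: am_logprob[words] + lm_weight * lm_logprob[words],
    )


best = decode(list(ACOUSTIC_LOGPROB), ACOUSTIC_LOGPROB, LM_LOGPROB)
print(best)
```

With `lm_weight=0.0` the language model is ignored and the acoustically better but less plausible candidate wins, which is one way to see why the models matter as much as the search.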

A New Black Box [Diagram: Acoustic Signal → Speech Recognizer → Word Sequence; the recognizer uses an Acoustic Model from the AM Trainer and a Language Model from the LM Trainer]

The New Black Box Philosophy of the New Black Box “When you don’t know, search with your knowledge.” Advantages of the New Black Box –Programmers tend to consider the problems jointly –Reduces communication issues between module owners

The New Black Box vs The Old Black Box The Old Black Box –Narrows the ways we think about the problem –Motivates research solely on search algorithms The New Black Box –Doesn’t ignore the fact that search is important –But gives the correct emphasis to all the necessary components

Current CMU Sphinx thinks The New Black Box

Before : CMU Sphinx (2004 Jan)

Sphinx and Friends (2004 Jan) [Diagram: Acoustic Signal → Sphinx Siblings → Word Sequence; Acoustic Model from SphinxTrain; Language Model from CMU-C LM Toolkit V2]

Issues at the time Sleeping Decoders: (Sphinx Siblings) –Strength: Comprehensive product line –Issues: Decoders came in many versions, and code tended to be duplicated Sphinx 2 -> fast but not accurate Sphinx 3.0 -> very accurate but very slow Sphinx 3.3 -> accurate, faster than Sphinx 3.0 but slower than 1xRT Sphinx 4 not yet completed

Issues at the time (cont.) AM Trainer (SphinxTrain) –Strength: it works –Weakness: what we supported was simple Where is speaker adaptation? LM Trainer (CMU-Cambridge LM Toolkit V2) –Strength: it works –Weakness: software was sleeping -> development has stopped Important functionalities weren’t in the package: e.g. LM Interpolation

General Comments at the time: “Sphinx cannot do feature Y.” “You have no idea what you are up to.” “No one is working on Sphinx any more.” “Our job is not difficult but very challenging” –Prof. Alex Rudnicky “Sphinx is cursed.” (I made this one up.) “The riddle of Sphinx couldn’t be solved” –Made up by Arthur Toth in SphinxLunch

After : CMU Sphinx (now)

Sphinx and Friends (now) [Diagram: Acoustic Signal → Sphinx Brothers → Word Sequence; Acoustic Model from SphinxTrain (debugged); Language Model from CMU-C LM Toolkit V3 alpha]

Sphinx Brothers now Sphinx 2 –Can now use CDHMM –Can now use FST Sphinx 3.X (gimmicky name of Sphinx 3) –Can run faster if given a magic tuning string –Merges Sphinx 3.0 and Sphinx 3.3 –Supports speaker adaptation –Re-architected

Sphinx Brothers now (cont.) Sphinx 4 –With great effort of Sun Developers and mainly super speech advisors –Beta completed –Quite popular with users and new startups PocketSphinx (by Dave) –Newly added member of the family –First open source embedded LVCSR

Project L: Project Ladon Goal: Extensions and Re-development of CMU-Cambridge LM Toolkit V2 Final product: CMU-Cambridge LM Toolkit Version 3 (alpha)

Story of V3: 3 “Young” Persons and their Inspiring Stories Young Student - wrote the perl scripts –Utterly frustrated by training LMs, decided to write a set of new perl scripts Young Faculty - convinced us to license the code under BSD –Wanted to see the LM toolkit be BSD again but had no time. Young Staff – added 32-bit LM support –Had nothing to do on the flight back to HK. –Wanted to do something he thought was useless.

Functions of V3 alpha Supports more than 65k words (32-bit LM) Perl wrapper by Young Student –One-step LM training –Simplified process of LM interpolation and class-based LM training New functionalities –LM interpolation (lm_combine) (by Wen, Moss, Dave) –Random text generation in 3-gram (by Arthur Toth) –Modified Kneser-Ney smoothing (by Prof. Yannick Estève from LIUM)
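As an illustration of what LM interpolation means in general, the sketch below linearly mixes two probability tables with a single weight. It is a generic textbook form under stated assumptions, not the `lm_combine` implementation, whose options and file formats are not described in these slides. The function name `interpolate`, the toy unigram tables, and the weight 0.7 are hypothetical.

```python
def interpolate(p_a, p_b, weight_a):
    """Linearly interpolate two probability tables defined over word events.

    Returns weight_a * p_a(e) + (1 - weight_a) * p_b(e) for every event e
    seen in either table; an event missing from a table counts as 0 there.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    events = set(p_a) | set(p_b)
    return {
        e: weight_a * p_a.get(e, 0.0) + (1.0 - weight_a) * p_b.get(e, 0.0)
        for e in events
    }


# Hypothetical toy unigram distributions from two domains.
lm_news = {"the": 0.5, "market": 0.3, "sphinx": 0.2}
lm_speech = {"the": 0.4, "decoder": 0.4, "sphinx": 0.2}

mixed = interpolate(lm_news, lm_speech, 0.7)
print(sorted(mixed.items()))
```

Because both inputs sum to one and the weights sum to one, the mixture is again a proper distribution. In practice the weight is usually tuned on held-out text rather than fixed by hand.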

Blessing for this change Support by the license Permissions from all copyright owners –Prof. Rosenfeld (also makes decisions on licensing issues for CMU) –Dr. Robinson (also makes decisions on licensing issues for Cambridge) –Dr. Clarkson –Blessing mails sent to public mailing list V3 will be re-licensed under BSD

SphinxTrain now Now supports speaker adaptation –MLLR, –MAP, –VTLN Fixed many bugs –Still have many to go Integrated into the tutorials. NR code finally removed, so we can distribute it now.
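To give a feel for one of the adaptation methods listed above, here is a minimal sketch of the textbook MAP update for a single Gaussian mean: the adapted mean is a count-weighted blend of the speaker-independent prior mean and the speaker's own data. This is an assumption-laden simplification, not SphinxTrain's code. It uses hard frame assignments instead of posterior occupancies, and `map_adapt_mean` and the prior weight `tau` are hypothetical names.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP-adapt one Gaussian mean vector.

    prior_mean: speaker-independent mean (list of floats).
    frames: adaptation feature vectors assigned to this Gaussian.
    tau: prior weight; larger values trust the prior mean more.
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Ten identical hypothetical frames pull the mean halfway when tau == 10.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0]] * 10, tau=10.0)
print(adapted)
```

With no adaptation data the function returns the prior mean unchanged, and with much more data than `tau` it approaches the sample mean, which is the behaviour that makes MAP safe for small amounts of speaker data.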

Technology explored in last few years Search / GMM Computation Speaker Adaptation and Normalization Embedded Speech Recognition AM Training LM Training

Future Opportunities - Think the Three Modules Together Technology –N-gram (N>3) (LMtk + SX) –On-line adaptation (SX + ST) –On-line training (SX + ST) Software –Integrated package with comprehensive support on SR (SX + ST + LMtk) –Dictation (SX + ST + LMtk)

Before/After, the difference Spent more time securing training (both AM and LM) Architecture has been re-thought within and across modules. Our food-chain is secured in the repository –AM, LM and Decoder code are under one code-base (cmusphinx)

Some Good Signs Sourceforge’s Project of the Month (March 2006) Starting to be decently competitive again Someone used our decoder(s) and they look happy –Users actually say “Thank You”. Some companies used our recognizer –(Some of them dare to make profits.)

Some Observations We still need to catch up in accuracy. –Mainly through better algorithmic support for domain-specific development –Today’s 10xRT system becomes a 1xRT system 5 years later –Today’s most accurate system becomes the baseline (BL) of next year’s most accurate system Now seems to be just another starting point.
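Reading "xRT" as the real-time factor (processing time divided by audio duration), the 10xRT-to-1xRT observation can be turned into simple arithmetic: a tenfold speedup in five years corresponds to the effective speed doubling roughly every 1.5 years. The sketch below only restates the slide's rule of thumb. Both function names are hypothetical, and steady exponential improvement is an assumption.

```python
import math


def real_time_factor(processing_seconds, audio_seconds):
    """xRT: how many seconds of compute are spent per second of audio."""
    return processing_seconds / audio_seconds


def doubling_period_years(speedup, years):
    """Years per 2x speedup, assuming steady exponential improvement."""
    return years / math.log2(speedup)


# A 60-second utterance decoded in 600 seconds runs at 10xRT.
print(real_time_factor(600.0, 60.0))
# Going from 10xRT to 1xRT in 5 years implies a doubling about every 1.5 years.
print(round(doubling_period_years(10.0, 5.0), 2))
```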

Conclusion on Our Technology CMU Sphinx = Open Source SR in BSD

Lessons Learned

Lesson 1 Anyone who tries to solve a legacy problem becomes a legacy problem –Corollary 1: Many legacy decisions could actually be clever –Corollary 2: Not every change is good

Lesson 2: on Research Most of the WER decline comes from better acoustic models and language models –Corollary 1: Actually the trainers are the key piece of development. –Corollary 2: We should now focus on 1) acoustic segmenter, 2) speaker adaptation and 3) discriminative training.
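WER (word error rate) is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a minimal version of that standard definition, not the scoring tool the Sphinx team used. The function name `word_error_rate` is hypothetical, and it applies no text normalization beyond whitespace splitting.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Since insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.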

Lesson 3: on Development Why does some of our code never go into Sphinx? –Code without source control is close to useless –Corollary 1: If you want your code to survive, check it in. –Corollary 2: If you don’t know what source control is, you probably need to learn it.

Lesson 4: My Favorite, the current Sphinx Motto “Never Over/Under-estimate yourself, you never know what kind of mess you could make.” –Dr. Evandro Gouvêa

Acknowledgement – Current Team: Arthur, David, Evandro, Yitao

Hiring: The Grand Janitor 2nd – Mixture of Several Jobs. Release Manager - Kick other people to fix various things Speech Scientist – Tell users to give up when they randomly read some useless papers. System Architect - Rewrite the code in many different ways but do the same thing Mediator of Conflicts - Write pseudo-philosophical comments Core Developer - Write crappy code and occasionally debug it Advisor – Do what Dr. Phil does on your friends, your users and most importantly, your boss(es) and ex-bosses

Acknowledgement – Advisors: Alex, Rich, Alan, Ravi

Acknowledgement – CMU-Cambridge LM Toolkit Contributors: –David Huggins-Daines, –Ananlada Chotimongkol, –Arthur Toth, –Xu Wen, –Prof. Yannick Estève at LIUM.

Discussion

Thanks

Backup

The Organization of the team

How does it work? The Wrong Model 1, A leader yells: “Sphinx Team Assemble!!” 2, The team then assembles and follows the commands of the leader. 3, Things get done. 4, Once again Sphinx Team has saved the day!

How does it really work? 1-3/10 steps 1, Someone in the team dreams up a new feature. 2, He communicates with the team: –“What do you guys think?” 3, Developers start to give their “two cents” on the problem, e.g. –Arthur: “According to Harry G. Frankfurt, what you talk about is B.S.” –Evandro: “Don’t underestimate yourself, you don’t know what kind of mess you will make.” –Dave: “That doesn’t sound like the best idea……”

The guy doesn’t give up and others give OK (4-6/10 steps) 4, He goes on to implement the code. 5, Checks the code in. 6, Peer review happens right after the code is checked in, example comments: –Arthur: “That is not the right balance according to Yin and Yang.” –Evandro: “I wonder whether you know C programming.” –Dave: “What is the rationale behind your change?” –Yitao: “*Sigh*, I need to recompile Speechalyzer and Smartnote again.”

Automatic Tests (the final tests) 7, Run make check –Make sure there is no FAIL in testing –Requires passing 70 to 80 tests. 8, Standard regression tests (make perf-std) –Run tests on 3 corpora and make sure the results match past results 9, Machines automate both 7 and 8 –mails are sent to everyone daily 10, The code could finally screw up people around the world!
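The idea behind step 8, comparing today's results on fixed corpora against stored past results, can be sketched in a few lines. This is an illustration of the concept only, not the actual `make perf-std` target, whose internals are not described in these slides. The corpus names, WER figures, and the function `find_regressions` are hypothetical.

```python
def find_regressions(baseline_wer, current_wer, tolerance=0.0):
    """Return (corpus, reason) pairs where current results differ from the baseline.

    baseline_wer / current_wer: dicts mapping corpus name to WER in percent.
    tolerance: largest absolute WER difference still counted as a match.
    """
    problems = []
    for corpus in sorted(baseline_wer):
        if corpus not in current_wer:
            problems.append((corpus, "missing result"))
            continue
        diff = current_wer[corpus] - baseline_wer[corpus]
        if abs(diff) > tolerance:
            problems.append((corpus, f"WER changed by {diff:+.2f}"))
    return problems


# Hypothetical stored results for three corpora, and a run where one drifted.
baseline = {"corpus_a": 12.5, "corpus_b": 20.0, "corpus_c": 8.0}
today = {"corpus_a": 12.5, "corpus_b": 21.5, "corpus_c": 8.0}

for corpus, reason in find_regressions(baseline, today):
    print(f"FAIL {corpus}: {reason}")
```

Flagging any change, including an improvement, mirrors the slide's wording that results should match the past; a nightly job could mail this list to the team.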

“The Sphinx Developers” Members are all funded by CMU. –for different purposes, but check in to the same code-base Common goal priority: –Accuracy –Speed & Accuracy trade-off –Memory –Interface –Features –User-Friendliness

Characteristics of our Development The role of manager/lead developer is significantly weakened Releases could take some time –requires good release management Good architecture is very important Requires skillful and knowledgeable programmers Highly practical: results are worth more than words and opinions

Missions of the team Take care of CMU’s daily needs for quality SR Continue to improve the system Bridge industry and academia.

Conclusion on Team Current development is –Decentralized –Automated –Skill-demanding We probably want to keep it this way