FSG Implementation in Sphinx2 (Mosur Ravishankar, Jul 15, 2004)


FSG Implementation in Sphinx2
Mosur Ravishankar
Jul 15, 2004

Outline
 Input specification
 FSG related API
 Application examples
 Implementation issues

FSG Specification
 “Assembly language” for specifying FSGs
 Low-level; most standards should compile down to this level
 Set of N states, numbered 0..N-1
 Transitions:
 Emitting or non-emitting (aka null or epsilon)
 Each emitting transition emits one word
 Fixed probability 0 < p <= 1
 One start state, and one final state
 Null transitions can effectively give you as many as needed
 Goal: find the highest-likelihood path from the start state to the final state, given some input speech

An FSG Example

FSG_BEGIN leg
NUM_STATES 10
START_STATE 0
FINAL_STATE 9
# Transitions (each T line also carries numeric source-state,
# destination-state, and probability fields, lost in transcription)
T … to
T … city1
  …
T … cityN
T … from
T … city1
  …
T … cityN
T …            # null (non-emitting) transition
T … from
T … city1
  …
T … cityN
T … to
T … city1
  …
T … cityN
T …            # null (non-emitting) transition
FSG_END

(diagram: the ten-state “leg” FSG with its to/from/city transitions)

A Better Representation
 Composition of FSGs
(diagram: main FSG with to, from, and [city] transitions; the [city] sub-FSG expands to pittsburgh | chicago | boston | buffalo | seattle)

Multiple Pronunciations and Filler Words
 Alternative pronunciations added automatically
 Filler word transitions (silence and noise) added automatically
 A filler self-transition at every state
 Noise words added only if noise penalty (probability) > 0
(diagram: the to/from/[city] FSG with a [filler] self-loop at each state)

FSG Related API
 Loading during initialization (i.e., fbs_init()):
 -fsgfn flag specifying an FSG file to load (similar to the -lmfn flag)
 Difference: the FSG name is contained in the file
 Dynamic loading:
 char *uttproc_load_fsgfile(char *fsgfile); returns the FSG string name contained in the file
 Switching to an FSG:
 uttproc_set_fsg(char *fsgname);
 Deleting a previously loaded FSG:
 uttproc_del_fsg(char *fsgname);
 Old demos could be run with FSGs, simply by recompiling with the new libraries

Mixed LM/FSG Decoding Example
 (See lm_fsg_test.c)

Another Example: Garbage Models
 Extraneous speech could be absorbed using an allphone “garbage model”
(diagram: the to/from/[city] FSG with an [allphone] self-loop)

B/W Training and Forced Alignment
 Consolidate code for FSGs, Baum-Welch training, and forced alignment?
 Sentence HMMs for training and alignment are essentially linear FSGs
 Alternative pronunciations and filler words handled automatically
 Differences:
 B/W uses the forward (and backward) algorithm instead of Viterbi
 Alignment has to produce phone and state segmentation as well

Implementation
 Straightforward expansion of the word-level FSG into a triphone HMM network
 Viterbi beam search over this HMM network
 No major optimizations attempted (so far)
 No lextree implementation (What?)
 Static allocation of all HMMs; not allocated “on demand” (Oh, no!)
 FSG transitions represented by an NxN matrix (You can’t be serious!!)
 Speed/memory usage profile needs to be evaluated
 Mostly a new set of data structures, separate from the existing ones
 Should be easily ported to Sphinx3

Implementation: FSG Expansion to HMMs
(diagram: word1 and word2 transitions between FSG states 0 and 1, expanded into their phone HMM chains p1 p2 p3 p4 and q1 q2 q3)

Implementation: Triphone HMMs
 Multiple root HMMs for different left contexts
 Multiple leaf HMMs for different right contexts
 Special case for 2-phone words
 1-phone words use SIL as right context
(diagram: word1 = p1 p2 p3 p4 between FSG states 0 and 1, with context-dependent root variants p1' p1'' and leaf variants p4' p4''; in the 2-phone case both phones get variants p1' p1'' and p2' p2'')

Possible Optimization: Lextrees
 Lextree (associated with the source state)
(diagram: words word1 … wordN leaving one state, their phone chains p1 p2 p3 p4 and q1 q2 q3 merged into a prefix tree)

Possible Optimization: Path Pruning
 If there are two transitions with the same label into the same state, the one starting out with a worse score can be pruned
 But reconciling this with lextrees is tricky, since labels are now blurred
(diagram: two transitions labeled w converging on the same state)

Other Issues Pending
 Dynamic allocation and management of HMMs
 Implementation of absolute pruning
 Lattice generation
 N-best list generation
 …

Where Is It?
 My copy of the open-source version of Sphinx2
 Someone needs to update the SourceForge copy
 HTML documentation has been updated