
Author: K. Thambiratnam and S. Sridharan. DYNAMIC MATCH PHONE-LATTICE SEARCHES FOR VERY FAST AND ACCURATE UNRESTRICTED VOCABULARY KEYWORD SPOTTING. Reporter: CHEN, TZAN HWEI

2 Reference K. Thambiratnam and S. Sridharan, "Dynamic match phone-lattice searches for very fast and accurate unrestricted vocabulary keyword spotting", ICASSP 2005. S. J. Young, M. G. Brown, J. T. Foote, G. J. F. Jones, and K. Sparck Jones, "Acoustic indexing for multimedia retrieval and browsing", ICASSP 1997.

3 Outline Introduction Dynamic match phone-lattice keyword spotting (DMPLS) Experimental procedure and result Conclusions

4 Introduction HMM-based keyword spotting uses an HMM-based speech recognizer to postulate candidate occurrences of a target keyword in continuous speech. All non-target words in the target domain's vocabulary are represented by a 'garbage' model.

5 Introduction (cont) Phone-lattice keyword spotting (PLS) The PLS approach achieves significantly faster query speeds than HMM-based spotting by first indexing speech files for subsequent rapid searching. A phone-lattice representation of the speech is generated by performing an N-best Viterbi recognition pass. Figure 1. Phone lattice for the word "manage" (m ae n ih jh)
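The lattice figure itself does not survive in this transcript, but the index-then-search idea can be sketched with a toy lattice loosely mirroring Figure 1. The node ids, lattice structure, and function name below are made up for illustration, not taken from the paper:

```python
# A phone lattice as an adjacency list: node -> list of (phone, next_node).
# Parallel arcs (e.g. "ae" vs "eh") model competing N-best hypotheses.
lattice = {
    0: [("m", 1)],
    1: [("ae", 2), ("eh", 2)],
    2: [("n", 3)],
    3: [("ih", 4), ("ax", 4)],
    4: [("jh", 5)],
    5: [],
}

def pls_match(lattice, start, target):
    """Exact PLS check: does the target phone sequence label a path
    through the lattice beginning at `start`?"""
    if not target:
        return True
    return any(phone == target[0] and pls_match(lattice, nxt, target[1:])
               for phone, nxt in lattice.get(start, []))
```

Because the lattice is built once per speech file, queries only walk this small graph rather than re-decoding the audio, which is where the speed advantage over HMM-based spotting comes from.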

6 Dynamic match phone-lattice keyword spotting (DMPLS) The shortcoming of PLS is its requirement that the keyword phone sequence appear in its entirety within the phone lattice. DMPLS is an extension of PLS that uses the Minimum Edit Distance (MED) during lattice searching to compensate for recognizer insertion, deletion and substitution errors. In DMPLS, each observed lattice phone sequence is scored against the target phone sequence using the MED.

7 DMPLS (cont) Lattice sequences are then accepted or rejected by thresholding on the MED score. PLS is the special case of DMPLS in which a threshold of 0 is used.

8 DMPLS (cont) Basic algorithm:
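The basic algorithm appeared only as a figure on this slide. As a rough sketch of what the surrounding slides describe (the function names are assumptions, not from the paper), the MED scoring and thresholded acceptance could be written as:

```python
def med(target, observed):
    """Minimum Edit Distance (Levenshtein) between two phone sequences,
    using unit insertion, deletion and substitution costs."""
    m, n = len(target), len(observed)
    prev = list(range(n + 1))  # cost row for the empty target prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (target[i - 1] != observed[j - 1])
            curr[j] = min(prev[j] + 1,      # delete target[i-1]
                          curr[j - 1] + 1,  # insert observed[j-1]
                          substitute)       # substitute (free on a match)
        prev = curr
    return prev[n]

def dmpls_search(target, lattice_sequences, threshold):
    """Accept each observed lattice phone sequence whose MED score
    against the target phone sequence is within the threshold.
    A threshold of 0 degenerates to exact phone-lattice search (PLS)."""
    return [seq for seq in lattice_sequences if med(target, seq) <= threshold]
```

For example, `dmpls_search("m ae n ih jh".split(), sequences, 1)` would tolerate one recognizer error per putative occurrence of "manage".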

9 DMPLS (cont) Optimized version: Successive columns of the MED matrix are only calculated while the minimum element of the current column is less than the MED threshold. Lattice traversal is removed from query-time processing: it is only necessary to store the observed phone sequences of a fixed maximum length at each node. Query-time processing then reduces to simply calculating the MED between each stored observed phone sequence and the target phone sequence.
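The column-wise early termination can be sketched as below. The slide's threshold symbol did not survive the transcript, so it appears here as an explicit parameter, and unit edit costs are assumed:

```python
def med_early_abandon(target, observed, threshold):
    """Column-wise MED that stops as soon as no alignment can come in
    under `threshold`.  Returns the MED if it is within the threshold,
    otherwise None (illustrative sketch, not the paper's exact code)."""
    m = len(target)
    col = list(range(m + 1))  # column for the empty observed prefix
    for j, obs_phone in enumerate(observed, start=1):
        new = [j] + [0] * m
        for i in range(1, m + 1):
            substitute = col[i - 1] + (target[i - 1] != obs_phone)
            new[i] = min(col[i] + 1, new[i - 1] + 1, substitute)
        col = new
        if min(col) > threshold:  # no completion can lower the cost
            return None           # reject early: MED must exceed threshold
    return col[m] if col[m] <= threshold else None
```

The early stop is safe because every entry of a column is derived from the previous column with non-negative increments, so the column minimum can never decrease as more observed phones are consumed.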

10 Experimental procedure and result Test corpora: the Switchboard-1 (SWB1) conversational telephone speech corpus and the TIMIT microphone speech database. 16-mixture triphone HMMs and a 256-mixture background model were trained on a 150-hour subset of SWB1 speech for use with the HMM and DMPLS systems. Similar models were also trained on the training subset of the Wall Street Journal 1 microphone speech database for the experiments on TIMIT.

11 Experimental procedure and result (cont) All speech was parameterized using Perceptual Linear Prediction coefficient feature extraction and Cepstral Mean Subtraction. Lattices were generated for each utterance by performing a U-token Viterbi decoding pass with a bigram phone-level LM. The resultant lattices were then expanded using a 4-gram phone-level LM and pruned with a beam-width of W to reduce their complexity.

12 Experimental procedure and result (cont) A second V-token traversal was performed to generate the top 10 scoring observed phone sequences of length 11 at each node. MED calculations used a fixed deletion cost of 1; the insertion cost was also fixed at 1. The substitution cost, however, was allowed to vary with the phone substitution: 0 for same-letter consonant substitutions; 1 for vowel, closure and stop substitutions; and a higher fixed cost for all other substitutions.
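The slide's cost value for "all other" substitutions did not survive the transcript, and the exact phone classes are not listed, so the phone sets and the `other_cost` default below are purely illustrative:

```python
# Hypothetical phone classes; the paper's exact class tables are not
# reproduced on the slide, so these sets are illustrative only.
VOWELS = {"aa", "ae", "ah", "ax", "eh", "ih", "iy", "uh", "uw"}
CLOSURES_STOPS = {"b", "d", "g", "k", "p", "t",
                  "bcl", "dcl", "gcl", "kcl", "pcl", "tcl"}

def substitution_cost(a, b, other_cost=2.0):
    """Rule-based substitution cost following the slide: 0 for same-letter
    consonant confusions, 1 for vowel or closure/stop confusions, and an
    assumed higher fixed cost for everything else."""
    if a == b:
        return 0.0
    if a not in VOWELS and b not in VOWELS and a[0] == b[0]:
        return 0.0  # same-letter consonant substitution, e.g. 't' vs 'tcl'
    if a in VOWELS and b in VOWELS:
        return 1.0  # vowel substitution
    if a in CLOSURES_STOPS and b in CLOSURES_STOPS:
        return 1.0  # closure/stop substitution
    return other_cost
```

Plugging such a cost function into the MED substitution term makes the lattice search forgiving of acoustically likely recognizer confusions while still penalizing implausible ones.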

13 Experimental procedure and result (cont) The choice of query words was constrained to words with a phone length of 6 to reduce the scope of the experiments. The systems were evaluated by performing single-word keyword spotting for each query word across all utterances in a given test set. A shorthand notation is used to specify DMPLS configurations, and a corresponding notation is used for HMM configurations, where the parameter is the duration-normalized output likelihood threshold.

14 Experimental procedure and result (cont) TIMIT: Table 1. Baseline keyword spotting results evaluated on TIMIT. Table 2. TIMIT performance when isolating various DP rules.

15 Experimental procedure and result (cont) TIMIT (cont): Table 3. Effect of DMPLS parameters, evaluated on TIMIT

16 Experimental procedure and result (cont) TIMIT (cont): Table 4. Optimized DMPLS configurations, evaluated on TIMIT (baseline shown for comparison).

17 Experimental procedure and result (cont) SWB1 Table 5. Keyword spotting results on SWB1

18 Conclusions The reported experiments show that the DMPLS method provides very fast keyword spotting while maintaining a respectable miss rate. Utterance verification?