Rapid and Accurate Spoken Term Detection
Owen Kimball, BBN Technologies
15 December 2006



Overview of Talk
BBN Levantine system description
Evaluation results
Diacritics
Out-of-vocabulary issues

BBN Evaluation Team
Core Team: Chia-lin Kao, Owen Kimball, Michael Kleber, David Miller
Additional assistance: Thomas Colthurst, Herb Gish, Steve Lowe, Rich Schwartz

BBN System Overview
[Diagram: audio → Byblos STT → lattices and phonetic transcripts → indexer → index → detector (with search terms) → scored detection lists → decider (with ATWV cost parameters) → final output with YES/NO decisions]

Levantine STT Configuration
STT generates a lattice of hypotheses and a phonetic transcript for each input file.
Word-based system:
  – Orthography based on Modern Standard Arabic (MSA), no short vowel diacritics
  – Acoustic: 57.3 hours LDC (noise words, no mixture exponents)
  – Language: 250 hours of data, 1.3M words
38.5K dictionary, grapheme-as-phoneme based plus 100 manual pronunciations
  – unknown short vowel (U), 39 phonemes
42.32% WER on STD Dev06 CTS data

Levantine CTS Results
Data      ATWV
Dev06     0.515
DryRun
Eval
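The slides report ATWV without defining it. As a rough reference, the sketch below computes the term-weighted value in the form usually given for the NIST 2006 STD evaluation: one minus the per-term average of P_miss + beta * P_FA. The weight beta = 999.9, the use of seconds of speech as the number of trials, and the function and argument names are assumptions not taken from the slides.

```python
from typing import Dict, Tuple

BETA = 999.9  # assumed NIST STD 2006 weight (cost/value ratio 0.1, term prior 1e-4)


def atwv(counts: Dict[str, Tuple[int, int, int]],
         speech_seconds: float,
         beta: float = BETA) -> float:
    """Term-weighted value at the system's actual YES/NO decisions.

    counts maps each term to (n_true, n_correct, n_false_alarm).
    speech_seconds is the total duration of the searched audio.
    """
    total = 0.0
    n_terms = 0
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            continue  # terms with no reference occurrence are left out of the average
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_false_alarm / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no term has a reference occurrence")
    return 1.0 - total / n_terms
```

Because beta is large, a single false alarm on a rare term can cost as much as several misses, which is one reason a per-term cap on hits can help this metric.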

OOV Pipeline: Detector
Word-based STT produces 1-best transcript: pronounce it → 1-best phonetic transcript.
Query is OOV if it contains any OOV word.
OOV query detection:
  – Pronounce query (grapheme-as-phoneme)
  – Find minimal edit-distance alignments (agrep)
  – Score = % error =
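The detection step can be illustrated with a small approximate-matching sketch. It is not the agrep implementation the slide mentions. It is a plain dynamic-programming search (in the style of Sellers' algorithm) for the places in a phonetic transcript where the query matches with the fewest edits. The score formula is cut off in the slide, so the normalization used here (edits divided by query length) is an assumption, as are all function names.

```python
from typing import List, Sequence, Tuple


def best_matches(query: Sequence[str],
                 transcript: Sequence[str]) -> Tuple[int, List[int]]:
    """Return (minimum edit distance, end positions achieving it).

    End positions are 1-based indices into the transcript. A match may
    start anywhere, so the top row of the DP table is all zeros.
    """
    m = len(query)
    prev = list(range(m + 1))  # column for the empty transcript prefix
    best, ends = m, []
    for j, symbol in enumerate(transcript, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: free start position
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == symbol else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra symbol in transcript
                         cur[i - 1] + 1)      # query symbol missing
        if cur[m] < best:
            best, ends = cur[m], [j]
        elif cur[m] == best:
            ends.append(j)
        prev = cur
    return best, ends


def error_rate(query: Sequence[str], transcript: Sequence[str]) -> float:
    """Assumed score: minimum number of edits divided by query length."""
    distance, _ = best_matches(query, transcript)
    return distance / len(query)
```

With grapheme-as-phoneme pronunciation, both the query and the 1-best transcript can simply be passed as lists of characters, for example `error_rate(list("kat"), list("thekatsat"))`.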

OOV Pipeline: Decider
Need different Yes/No decision procedure: IV-decider requires posterior probabilities.
Simple OOV decision procedure:
  – Constant threshold on score (~ 0.7)
  – Cap on maximum number of hits (0-3)
  – Values set to maximize ATWV on Dev06 data.
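A minimal sketch of such a decision rule follows. It assumes that a higher score means a better match and that the cap applies per query term. Neither detail is stated on the slide, and the names `decide`, `threshold`, and `max_hits` are illustrative only.

```python
from typing import List, Tuple


def decide(hits: List[Tuple[float, float]],
           threshold: float = 0.7,
           max_hits: int = 3) -> List[Tuple[float, float, bool]]:
    """Assign YES/NO to candidate hits for one query term.

    hits is a list of (start_time, score). A hit is marked YES only if
    its score reaches the threshold and fewer than max_hits better
    hits have already been accepted.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    decisions = []
    accepted = 0
    for start_time, score in ranked:
        is_yes = score >= threshold and accepted < max_hits
        if is_yes:
            accepted += 1
        decisions.append((start_time, score, is_yes))
    return decisions
```

Both parameters would be tuned on held-out data, for instance by a small grid search over thresholds and caps that keeps the pair giving the highest ATWV.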

OOV Pipeline: Results
ATWV remained good (IV vs. OOV).
OOV searches take longer: roughly 10-15 times the IV search time on Dev06 and DryRun06, with no attempt at indexing.

OOV Directions for Improvement
Score substitutions using phoneme confusion matrix instead of flat edit distance
Speed: indexing phonetic transcripts for approximate matching
Search lattices beyond 1-best transcripts
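The first of these directions can be sketched as a weighted edit distance in which each substitution has its own cost. The sketch assumes that costs in the range 0 to 1 have already been derived from a phoneme confusion matrix (for example, lower cost for frequently confused pairs). The slides do not say how such costs would be derived, and the function name and cost table are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def weighted_edit_distance(query: Sequence[str],
                           hypothesis: Sequence[str],
                           sub_cost: Dict[Tuple[str, str], float],
                           indel_cost: float = 1.0) -> float:
    """Edit distance with per-pair substitution costs.

    sub_cost maps (query_phoneme, hypothesis_phoneme) to a cost.
    Pairs not listed cost 1.0, and identical phonemes cost 0.0.
    """
    m, n = len(query), len(hypothesis)
    prev = [j * indel_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i * indel_cost] + [0.0] * n
        for j in range(1, n + 1):
            q, h = query[i - 1], hypothesis[j - 1]
            sub = 0.0 if q == h else sub_cost.get((q, h), 1.0)
            cur[j] = min(prev[j - 1] + sub,      # substitution or match
                         prev[j] + indel_cost,   # query phoneme deleted
                         cur[j - 1] + indel_cost)  # extra hypothesis phoneme
        prev = cur
    return prev[n]
```

With an empty cost table this reduces to the flat edit distance, so the same code can serve as a baseline for comparison.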

Levantine Diacritic Issues
Originally looked at diacritized Levantine
Trained STT engine using LDC 45 hour set
Ran STD without knowing WER (no diacritized STT test set to measure WER).
  – Found very high false alarm rate
Examining FAs found hits that were legitimate alternate spellings

Levantine Diacritics: Alternate Spellings
Examining query words found more of same:
  – In first 22 terms of dry run term list, 14 are “alternate diacritic” spellings of 5 underlying words, i.e. there were just 13 unique words in the first 22 terms
  – Min~ahumo v Minohumo
  – AlHayaApi v AlHayaAp
  – Waliko v Walika
  – qabilo v qabola v qabolo
LDC training and STD test set had additional pervasive differences
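One way to surface such alternate spellings in a term list is to strip the diacritic symbols and group terms that collapse to the same skeleton. The sketch below assumes the terms are written in Buckwalter transliteration, where a, i, u, o, ~, F, N and K denote short vowels, sukun, shadda and tanween. The slides do not name the transliteration scheme, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed Buckwalter diacritic symbols: short vowels, sukun, shadda, tanween.
DIACRITICS = set("aiuo~FNK")


def strip_diacritics(word: str) -> str:
    """Remove diacritic symbols, leaving the consonant/long-vowel skeleton."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def group_spellings(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Group terms that become identical once diacritics are removed."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term in terms:
        groups[strip_diacritics(term)].append(term)
    return dict(groups)


terms = ["Min~ahumo", "Minohumo", "AlHayaApi", "AlHayaAp",
         "Waliko", "Walika", "qabilo", "qabola", "qabolo"]
for skeleton, spellings in group_spellings(terms).items():
    print(skeleton, spellings)
```

Applied to the nine spellings listed on the slide, this yields four skeletons, which matches the slide's pairing of the variants.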

No-Diacritic Levantine Issues
A quick look turned up a smaller number of problems for no-diacritic Levantine
  – Looking at 7 top-FA terms in dev set, found “bHky” vs “b>Hky” but no other spelling confusions
One ref instance of term with 0 duration
It would be interesting to QC test sets for inconsistent spellings and other issues