Rapid and Accurate Spoken Term Detection Michael Kleber BBN Technologies 15 December 2006.

Overview of Talk
BBN Mandarin system description
Evaluation results
  – Corpus size effects
Detecting English terms
Segmentation and transcript problems

BBN Evaluation Team
Core team: Chia-lin Kao, Owen Kimball, Michael Kleber, David Miller
Additional assistance: Thomas Colthurst, Herb Gish, Steve Lowe, Rich Schwartz

BBN System Overview
[Diagram: audio → Byblos STT → lattices and phonetic transcripts → indexer → index → detector (input: search terms) → scored detection lists → decider (input: ATWV cost parameters) → final output with YES/NO decisions]

Mandarin STT Configuration
STT generates a lattice of hypotheses and a phonetic transcript for each input audio file.
Word-based system:
  – Acoustic model training: HKUST (226 hr), Callhome + Callfriend (35 hr)
  – Language model training: acoustic transcripts (2.2M words), UW-web (117M words), BN (123M words)
  – PLP, pitch, and probability-of-voicing features; no mixture exponents
Word segmentation:
  – 60K dictionary, longest-first algorithm to segment HKUST
  – 23K dictionary to segment out-of-domain LM data
Dictionaries:
  – 26K word dictionary + 6.2K single characters
  – Experiment: 1.6K English words spelled using Mandarin 82 phonemes
58.37% WER on STD Dev06 CTS data (44.91% CER)
40.94% WER on internal BBN-dev06 (30.97% CER)
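The slide names a "longest-first" segmentation algorithm without giving details. A minimal sketch of one common reading (greedy left-to-right maximum matching) follows. The function name, the toy dictionary, and the single-character fallback are assumptions for illustration, not BBN's implementation.

```python
def segment_longest_first(text, dictionary):
    """Greedy left-to-right segmentation: at each position, take the
    longest dictionary word; fall back to a single character."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted (assumed fallback).
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary, not the 60K or 23K dictionaries from the talk.
toy_dictionary = {"机械", "工程", "机械工程", "最后", "然后"}

print(segment_longest_first("机械工程", toy_dictionary))
# ['机械工程']
print(segment_longest_first("机械工程", toy_dictionary - {"机械工程"}))
# ['机械', '工程']
```

The two calls show how the same character string segments differently depending on whether the compound is in the dictionary, which is the kind of disagreement the later word segmentation slides describe.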

A Better STD Dev Set
Original NIST Dev and DryRun sets were based on Eval03, which was drawn from the CallFriend corpus
  – Previous experience showed results were not predictive of performance on the HKUST data to be used in the Eval
Created new BBN Dev set
  – Used EARS Dev04 audio (HKUST data)
  – Generated query terms using NIST software

Mandarin CTS Results
Data      ATWV
Dev06     0.257
DryRun    0.241
BBNdev    [not preserved]
Eval      [not preserved]

Why so much worse than English?
Word error rate
Effect of corpus size on ATWV
Cross-language queries
Word segmentation

Corpus Size Effect
ATWV is sensitive to corpus size:
  – Penalty for a false alarm (FA) ≈ 0.3 / #hours of speech
  – Value of a hit = 1 / #occurrences of term
  – Averaged over all terms which occur
[Figure: worked example comparing per-term hit values and FA penalties for hour 1, hour 2, hour 3, and the combined 3 hours, with averages; numeric entries not recoverable]
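The three rules above can be sketched numerically. The code below uses the term-weighted value definition from the NIST 2006 STD evaluation plan (weight β = 999.9 on false-alarm probability, non-target trials approximated by seconds of speech); that definition does not appear on the slide itself, and the function names and example counts are hypothetical.

```python
BETA = 999.9  # weight on false-alarm probability in the NIST STD 2006 metric


def fa_penalty(hours, beta=BETA):
    """Approximate TWV cost of one false alarm for one term."""
    return beta / (hours * 3600.0)


def term_weighted_value(terms, speech_seconds, beta=BETA):
    """terms: list of (n_true, n_correct, n_false_alarm), one per query term.
    Returns 1 - mean(P_miss + beta * P_FA) over terms that occur."""
    values = []
    for n_true, n_correct, n_fa in terms:
        if n_true == 0:
            continue  # terms with no reference occurrences are not averaged
        p_miss = 1.0 - n_correct / n_true        # each hit is worth 1 / n_true
        p_fa = n_fa / (speech_seconds - n_true)  # non-target trials ~ seconds
        values.append(1.0 - p_miss - beta * p_fa)
    if not values:
        raise ValueError("no term occurs in the reference")
    return sum(values) / len(values)


# Hypothetical term: 3 true occurrences, 2 found, 1 false alarm.
counts = [(3, 2, 1)]
print(round(fa_penalty(1.0), 3))               # about 0.278 per FA at 1 hour
print(term_weighted_value(counts, 1 * 3600))   # same detections, 1 hour
print(term_weighted_value(counts, 3 * 3600))   # same detections, 3 hours
```

With identical hits and false alarms, the score is higher on the 3-hour corpus because each false alarm costs roughly a third as much, which is the sensitivity the slide points out.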

Corpus Size Effect
Actual effect on English DryRun06:
[Table: ATWV for hour 1, hour 2, hour 3, their average, and the full 3 hours; only one value, .790, survives and its column is unclear]

Cross-Language Queries
Many Mandarin query terms appearing in CTS contained English words or letters (23% of Dev06, 12% of DryRun06).
Training data mismatch:
  – AM training: added English words, with pronunciations
  – Added English to LM training, but the Dev set used to estimate LM weights contained no English
  – Result: STT emitted no English words on Dev sets
  – Mandarin decider logic fixed to give 0 hits for queries with English
English is OOV, so ATWV on IV is immune.
ATWV-IV = (vs overall)

Word Segmentation Problems
If match by characters, many FAs result
  – Character 后 appears in words like “afterwards” (之后 or 以后), “last” (最后), “then” (然后).
  – Also can appear alone in abbreviated speech.
  – Match by character: 1 true hit, 91 FAs.
Match by words: miss due to disagreements with reference transcript segmentation.
  – detection: 机械工程 “mechanical engineering”
  – reference: 机械 “mechanical” 工程 “engineering”
Happens in English too: many FAs for “five” where transcript contained “twenty-five.”
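The trade-off on this slide can be illustrated with a small sketch. The two matching functions and the toy transcripts are assumptions made for illustration; they are not the BBN detector or the NIST scoring tool.

```python
def match_by_words(term, segmented):
    """Count tokens in a segmented transcript that equal the term exactly."""
    return sum(1 for word in segmented if word == term)


def match_by_characters(term, segmented):
    """Count occurrences of the term in the unsegmented character string."""
    text = "".join(segmented)
    count = 0
    start = text.find(term)
    while start != -1:
        count += 1
        start = text.find(term, start + 1)
    return count


# Toy transcript: 后 occurs once as a word and three times inside other words.
transcript = ["之后", "然后", "最后", "后"]
print(match_by_words("后", transcript))       # 1
print(match_by_characters("后", transcript))  # 4

# Segmentation disagreement: the reference splits the compound in two.
reference = ["机械", "工程"]
print(match_by_words("机械工程", reference))       # 0
print(match_by_characters("机械工程", reference))  # 1
```

Character matching finds the compound despite the split but over-counts 后; word matching avoids the false alarms but misses the compound.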

Word Segmentation Problems
Can’t “just segment correctly”: the reference transcript is inconsistent.
  – DryRun search term 多, “many”
      Transcription 1: 太 “too” 多 “many”
      Transcription 2: 太多 “too many”
  – DryRun search term 双 “pair” 特 “especially”
      Transcription 1: 几双 “a few pair”
      Transcription 2: 几 “a few” 双 “pair” 特 “especially”
Happens in English too: “day care” vs “day-care.”

Directions for Improvement
Cross-language queries, if important to task
More robust word segmentation logic
Consistent transcripts or lenient scoring