Rapid and Accurate Spoken Term Detection. Michael Kleber, BBN Technologies, 15 December 2006.


1 Rapid and Accurate Spoken Term Detection. Michael Kleber, BBN Technologies, 15 December 2006

2 Overview of Talk
– BBN Mandarin system description
– Evaluation results: corpus size effects
– Detecting English terms
– Segmentation and transcript problems

3 BBN Evaluation Team
– Core team: Chia-lin Kao, Owen Kimball, Michael Kleber, David Miller
– Additional assistance: Thomas Colthurst, Herb Gish, Steve Lowe, Rich Schwartz

4 BBN System Overview
Pipeline: audio goes into the Byblos STT engine, which produces lattices and phonetic transcripts. The indexer builds an index from these; the detector searches the index for the search terms and produces scored detection lists; the decider applies the ATWV cost parameters to produce the final output with YES/NO decisions.

5 Mandarin STT Configuration
STT generates a lattice of hypotheses and a phonetic transcript for each input audio file.
Word-based system:
– Acoustic training: HKUST (226 hr), CallHome + CallFriend (35 hr)
– Language training: acoustic transcripts (2.2M words), UW-web (117M words), BN (123M words)
– PLP, pitch, and probability-of-voicing features; no mixture exponents
Word segmentation:
– 60k dictionary, longest-first algorithm to segment HKUST
– 23k dictionary to segment out-of-domain LM data
Dictionaries:
– 26K-word dictionary + 6.2K single characters
– Experiment: 1.6K English words spelled using Mandarin
82 phonemes
58.37% WER on STD Dev06 CTS data (44.91% CER)
40.94% WER on internal BBN-dev06 (30.97% CER)
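The longest-first segmentation mentioned above can be sketched as a greedy longest-match scan over the input. This is an illustrative reconstruction, not BBN's implementation: the toy dictionary and maximum word length below are assumptions.

```python
# Sketch of longest-first (greedy longest-match) word segmentation.
# The dictionary and max_word_len are toy assumptions for illustration,
# not the 60k-entry dictionary described on the slide.

def segment_longest_first(text, dictionary, max_word_len=4):
    """At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing longer matches."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

print(segment_longest_first("机械工程", {"机械", "工程", "机械工程"}))
# → ['机械工程']
print(segment_longest_first("机械工程", {"机械", "工程"}))
# → ['机械', '工程']
```

Note how the output depends on the dictionary: with the compound 机械工程 listed, the greedy scan emits one token, which is exactly the kind of mismatch against a differently segmented reference transcript that slide 12 describes.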

6 A Better STD Dev Set
The original NIST Dev and DryRun sets were based on Eval03, which was drawn from the CallFriend corpus.
– Previous experience showed those results were not predictive of performance on the HKUST data to be used in the Eval.
Created a new BBN Dev set:
– Used EARS Dev04 audio (HKUST data)
– Generated query terms using NIST software

7 Mandarin CTS Results

    Data     ATWV
    Dev06    0.257
    DryRun   0.241
    BBNdev   0.343
    Eval06   0.3809

8 Why so much worse than English?
– Word error rate
– Effect of corpus size on ATWV
– Cross-language queries
– Word segmentation

9 Corpus Size Effect
ATWV is sensitive to corpus size:
– Penalty for an FA is roughly 0.3 / #hours of speech
– Value of a hit = 1 / #occurrences of the term
– Averaged over all terms that occur
[Slide table: a worked example comparing 1 FA scored against the full 3-hour corpus with 3 FAs scored hour by hour; the per-term hit fractions are garbled in this transcript.]

10 Corpus Size Effect
ATWV is sensitive to corpus size:
– Penalty for an FA is roughly 0.3 / #hours of speech
– Value of a hit = 1 / #occurrences of the term
– Averaged over all terms that occur
Actual effect on English DryRun06, scored per hour vs. as one corpus:

    hour 1    0.758
    hour 2    0.785
    hour 3    0.828
    average   0.790
    3 hours   0.852
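The corpus size effect follows directly from the slide's approximation: the hit value is fixed per term, but the FA penalty shrinks as the corpus grows. A minimal sketch of that calculation, with invented per-term counts (this simplification omits details of the official NIST ATWV formula):

```python
# Simplified ATWV per the slide's approximation: each hit is worth
# 1/#occurrences of its term, each false alarm costs about
# 0.3/#hours of speech, averaged over all terms that occur.
# The per-term counts below are made up for illustration.

def simple_atwv(terms, hours):
    """terms: list of (hits, occurrences, false_alarms) per query term."""
    fa_penalty = 0.3 / hours
    scores = [hits / occ - fa_penalty * fas
              for hits, occ, fas in terms if occ > 0]
    return sum(scores) / len(scores)

terms = [(2, 3, 1), (1, 1, 0), (4, 7, 2)]

# The same detections score higher on a larger corpus, because the
# FA penalty per false alarm is smaller.
print(round(simple_atwv(terms, hours=1), 3))  # → 0.446
print(round(simple_atwv(terms, hours=3), 3))  # → 0.646
```

This is the effect the DryRun06 numbers above show: scoring the corpus hour by hour (average 0.790) is harsher than scoring all 3 hours together (0.852).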

11 Cross-Language Queries
Many Mandarin query terms appearing in CTS contained English words or letters (23% of Dev06, 12% of DryRun06).
Training data mismatch:
– AM training: added English words, with pronunciations
– Added English to LM training, but the Dev set used to estimate LM weights contained no English
– Result: STT emitted no English words on the Dev sets
– Mandarin decider logic was fixed to give 0 hits for queries containing English
English is OOV, so ATWV on in-vocabulary terms is unaffected: ATWV-IV = 0.4113 (vs. 0.3809 overall)

12 Word Segmentation Problems
Matching by characters produces many FAs:
– The character 后 appears in words such as 之后 or 以后 "afterwards", 最后 "last", and 然后 "then".
– It can also appear alone in abbreviated speech.
– Matching by character: 1 true hit, 91 FAs.
Matching by words misses hits when the reference transcript is segmented differently:
– detection: 机械工程 "mechanical engineering"
– reference: 机械 "mechanical" 工程 "engineering"
This happens in English too: many FAs for "five" where the transcript contained "twenty-five".
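The character-match vs. word-match tradeoff can be illustrated with the slide's English example. The transcript string and the boundary rule below are hypothetical, chosen only to show how substring matching fires inside a larger word:

```python
import re

# Two matching strategies for the query "five", per the slide's example.
# Substring matching fires inside "twenty-five" (a false alarm);
# whole-word matching, bounded by non-word/non-hyphen characters,
# only fires on the standalone word.

transcript = "she turned twenty-five last week and bought five apples"

substring_hits = transcript.count("five")
word_hits = len(re.findall(r"(?<![\w-])five(?![\w-])", transcript))

print(substring_hits)  # → 2 (one true hit plus the FA inside "twenty-five")
print(word_hits)       # → 1 (only the standalone "five")
```

The Mandarin case is harder than adding a boundary rule, because written Mandarin has no spaces: the word boundaries themselves come from a segmenter, and, as the next slide shows, even the reference transcripts disagree about where they fall.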

13 Word Segmentation Problems
We can't "just segment correctly": the reference transcript itself is inconsistent.
– DryRun search term 多 "many":
  Transcription 1: 太 "too" 多 "many"
  Transcription 2: 太多 "too many"
– DryRun search term 双 "pair" 特 "especially":
  Transcription 1: 几双 "a few pair"
  Transcription 2: 几 "a few" 双 "pair" 特 "especially"
This happens in English too: "day care" vs. "day-care".

14 Directions for Improvement
– Handle cross-language queries, if important to the task
– More robust word segmentation logic
– Consistent reference transcripts, or lenient scoring

