Rapid and Accurate Spoken Term Detection
Owen Kimball
BBN Technologies
15 December 2006

Overview of Talk
BBN Levantine system description
Evaluation results
Diacritics
Out-of-vocabulary issues

BBN Evaluation Team
Core team:
–Chia-lin Kao
–Owen Kimball
–Michael Kleber
–David Miller
Additional assistance:
–Thomas Colthurst
–Herb Gish
–Steve Lowe
–Rich Schwartz

BBN System Overview
[Block diagram] audio → Byblos STT → lattices and phonetic transcripts → indexer → index → detector (input: search terms) → scored detection lists → decider (input: ATWV cost parameters) → final output with YES/NO decisions

Levantine STT Configuration
STT generates a lattice of hypotheses and a phonetic transcript for each input file.
Word-based system:
–Orthography based on Modern Standard Arabic (MSA), no short-vowel diacritics
–Acoustic model: 57.3 hours of LDC data (noise words, no mixture exponents)
–Language model: 250 hours of data, 1.3M words
38.5K-word dictionary, grapheme-as-phoneme based, plus 100 manual pronunciations
–Unknown short vowel (U), 39 phonemes
42.32% WER on STD Dev06 CTS data

Levantine CTS Results
[Table of ATWV by data set (Dev06, DryRun, Eval); only the Dev06 value is preserved]
Data      ATWV
Dev06     0.515

OOV Pipeline: Detector
Word-based STT produces a 1-best transcript; pronouncing it yields a 1-best phonetic transcript.
A query is OOV if it contains any OOV word.
OOV query detection:
–Pronounce the query (grapheme-as-phoneme)
–Find minimal edit-distance alignments (agrep)
–Score = % error =
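
The OOV detection steps above can be sketched as follows. This is only an illustration, not the BBN implementation: it uses a small dynamic program for approximate substring matching in place of agrep, and it assumes the error score is the edit count divided by the query length, because the slide's score formula is not preserved. The function name and the toy phoneme strings are hypothetical.

```python
def best_match(query, transcript):
    """Find the best approximate occurrence of `query` inside `transcript`.

    Both arguments are lists of phoneme symbols. Returns a tuple
    (end_position, edits, error_rate), where end_position is the 1-based
    transcript position at which the best match ends.
    """
    # prev[j] = fewest edits needed to align the query prefix seen so far
    # with some transcript substring ending at position j.
    # Row 0 is all zeros, so a match may start anywhere in the transcript.
    prev = [0] * (len(transcript) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # i deletions if nothing in the transcript is consumed
        for j, t in enumerate(transcript, start=1):
            curr.append(min(
                prev[j - 1] + (q != t),  # match or substitution
                prev[j] + 1,             # query phoneme missing in transcript
                curr[j - 1] + 1,         # extra phoneme in transcript
            ))
        prev = curr
    edits = min(prev)
    end = prev.index(edits)
    # ASSUMPTION: normalize by query length; the source formula is missing.
    return end, edits, edits / len(query)


if __name__ == "__main__":
    # Toy example with made-up phoneme symbols.
    print(best_match(["k", "a", "t"], ["s", "k", "u", "t", "s"]))
```

Running the dynamic program once per query over the whole 1-best phonetic transcript is a linear scan, which is consistent with OOV searches being slower than indexed IV searches.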

OOV Pipeline: Decider
A different YES/NO decision procedure is needed: the IV decider requires posterior probabilities.
Simple OOV decision procedure:
–Constant threshold on score (~0.7)
–Cap on maximum number of hits (0-3)
–Values set to maximize ATWV on Dev06 data
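
A decision rule of this shape might look like the sketch below. It is an illustration under assumptions, not the BBN decider: it assumes larger scores indicate better matches (if the score is an error rate, the comparison would be flipped) and that the cap applies per query. The function name, the default values, and the hit tuples are hypothetical.

```python
def decide(hits, threshold=0.7, max_hits=3):
    """Assign YES/NO decisions to candidate detections for one query.

    `hits` is a list of (location, score) pairs.
    ASSUMPTION: a higher score means a better match.
    Returns (location, score, decision) tuples, best score first.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    decisions = []
    yes_count = 0
    for location, score in ranked:
        # YES only if the score clears the threshold and the cap is not used up.
        is_yes = score >= threshold and yes_count < max_hits
        if is_yes:
            yes_count += 1
        decisions.append((location, score, "YES" if is_yes else "NO"))
    return decisions


if __name__ == "__main__":
    candidates = [("a", 0.90), ("b", 0.65), ("c", 0.80), ("d", 0.75)]
    for row in decide(candidates, threshold=0.7, max_hits=2):
        print(row)
```

In practice the threshold and cap would be swept over a grid and the pair giving the highest ATWV on held-out data kept, which matches the tuning on Dev06 described above.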

OOV Pipeline: Results
ATWV remained good: [table comparing IV and OOV ATWV; values not preserved]
Searches take longer: about 10-15x the IV search time on Dev06 and DryRun06, with no attempt at indexing.

OOV Directions for Improvement
Score substitutions using a phoneme confusion matrix instead of flat edit distance
Speed: index phonetic transcripts for approximate matching
Search lattices, going beyond 1-best transcripts
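
The first direction, confusion-weighted substitution costs, can be illustrated with a small sketch. It is a generic weighted edit distance, not a proposed BBN design: the cost table, its values, and the function name are hypothetical, and a real system would derive the costs from measured phoneme confusions.

```python
def weighted_edit_distance(query, hyp, sub_cost, indel_cost=1.0):
    """Edit distance between two phoneme lists with per-pair substitution costs.

    `sub_cost` maps (query_phoneme, hyp_phoneme) to a cost; pairs that are
    not listed fall back to 1.0, which reproduces flat edit distance.
    """
    prev = [j * indel_cost for j in range(len(hyp) + 1)]
    for i, q in enumerate(query, start=1):
        curr = [i * indel_cost]
        for j, h in enumerate(hyp, start=1):
            # Identical phonemes cost nothing; confusable pairs cost less than 1.
            sub = 0.0 if q == h else sub_cost.get((q, h), 1.0)
            curr.append(min(
                prev[j - 1] + sub,         # match or weighted substitution
                prev[j] + indel_cost,      # deletion
                curr[j - 1] + indel_cost,  # insertion
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical cost: "b" and "p" are treated as easily confused.
    costs = {("b", "p"): 0.3}
    print(weighted_edit_distance(["b", "a", "t"], ["p", "a", "t"], costs))
    print(weighted_edit_distance(["b", "a", "t"], ["p", "a", "t"], {}))
```

With an empty cost table the function reduces to ordinary edit distance, so the two scoring schemes can be compared on the same data.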

Levantine Diacritic Issues
Originally looked at diacritized Levantine
Trained the STT engine on the LDC 45-hour set
Ran STD without knowing the WER (there was no diacritized STT test set on which to measure it)
–Found a very high false alarm rate
Examining the FAs turned up hits that were legitimate alternate spellings

Levantine Diacritics: Alternate Spellings
Examining the query words turned up more of the same:
–In the first 22 terms of the dry run term list, 14 are “alternate diacritic” spellings of 5 underlying words, i.e. there were just 13 unique words in the first 22 terms
–Min~ahumo vs Minohumo
–AlHayaApi vs AlHayaAp
–Waliko vs Walika
–qabilo vs qabola vs qabolo
The LDC training set and the STD test set had additional pervasive differences
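
One way to surface such alternate spellings is to strip the diacritic symbols and group the terms that collapse to the same string. The sketch below assumes the terms are written in Buckwalter transliteration, which the spellings above suggest but the slides do not state; the symbol set and function names are illustrative, not part of the BBN system.

```python
from collections import defaultdict

# ASSUMPTION: Buckwalter transliteration. These symbols stand for short
# vowels (a, i, u), sukun (o), shadda (~), tanween (F, N, K) and dagger alif (`).
BUCKWALTER_DIACRITICS = set("aiuo~FNK`")


def strip_diacritics(term):
    """Remove diacritic symbols, leaving only the consonantal skeleton."""
    return "".join(ch for ch in term if ch not in BUCKWALTER_DIACRITICS)


def group_variants(terms):
    """Group terms whose undiacritized forms are identical."""
    groups = defaultdict(list)
    for term in terms:
        groups[strip_diacritics(term)].append(term)
    # Keep only skeletons that have more than one diacritized spelling.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    terms = ["Min~ahumo", "Minohumo", "AlHayaApi", "AlHayaAp",
             "Waliko", "Walika", "qabilo", "qabola", "qabolo"]
    for skeleton, spellings in group_variants(terms).items():
        print(skeleton, spellings)
```

Grouping by skeleton will also merge distinct words that differ only in their vowels, so the output is a list of candidates for manual review rather than a definitive set of duplicates.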

No-Diacritic Levantine Issues
A quick look turned up fewer problems for no-diacritic Levantine
–Among the 7 top-FA terms in the dev set, found “bHky” vs “b>Hky” but no other spelling confusions
One reference instance of a term had 0 duration
It would be worthwhile to QC the test sets for inconsistent spellings and other issues