1 Effects of Explicitly Modeling Noise Words
Chia-lin Kao, Owen Kimball, Spyros Matsoukas
2 Outline
Motivation
BBN’s standard training procedure without noise words
Effect of noise words in ML training
Effect of noise words in discriminative training
Conclusions
3 Motivation
BBN’s English CTS system does not train with noise words in transcripts
For the RT04 non-English CTS systems, we found that using noise words helped
– [LAUGH], [NOISE], [SIGH], etc., appear in the transcripts used to train the non-English ML models
– Levantine system: 1.6% gain on the unadapted LevAr.Dev04 test
– Mandarin system: 1.0% gain on the unadapted Man.CTS.Dev04 test
Do these results hold for English? For discriminative training?
– Success would simplify the preparation of Fisher training transcripts: no need to change transcripts and re-segment
4 Noise Words in English Transcripts
The MSU Jan 2000 Switchboard I transcripts include [laughter], [noise], [vocalized-noise]
For RT02, BBN switched to CU-HTK training transcripts, in which the explicit noise words were removed from the MSU transcripts
– Found no significant difference in performance compared with the previous BBN transcripts
– Assumed noise words were a no-op, but there were other differences and we did not test which ones helped or hurt
The WordWave Fisher transcripts include [LAUGH], [NOISE], [MN], [COUGH], [LIPSMACK], [SIGH]
The BBN RT04 CTS English system removes noise words from the transcripts and relies on the silence HMM to model them
5 Training Procedure without Noise Words
Process the training transcripts
– Drop utterances containing only noise words
– Map noise words to silence
Train initial ML models and generate word alignments
Remove long silences
– Using the alignment information, chop utterances containing silences longer than two seconds
Train final ML models using the processed transcripts and segmentation (a sketch of the transcript processing and silence chopping follows below)
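A minimal sketch of the two processing steps above, assuming noise words appear as bracketed tokens and that the ML word alignment provides per-token start and end times. The token inventory, the <sil> symbol, the two-second threshold wiring, and the function names are illustrative assumptions, not BBN's actual tools or formats.

```python
# Hypothetical noise-word inventory and silence symbol (assumptions for illustration).
NOISE_WORDS = {"[LAUGH]", "[LAUGHTER]", "[NOISE]", "[MN]", "[COUGH]", "[LIPSMACK]", "[SIGH]"}
SILENCE_TOKEN = "<sil>"
MAX_SILENCE_SEC = 2.0  # "silences longer than two seconds" from the slide

def preprocess_utterance(words):
    """Drop utterances that contain only noise words; otherwise map noise words to silence."""
    if words and all(w in NOISE_WORDS for w in words):
        return None  # noise-only utterance: drop it from training
    return [SILENCE_TOKEN if w in NOISE_WORDS else w for w in words]

def chop_long_silences(aligned_words, max_sil=MAX_SILENCE_SEC):
    """Split an aligned utterance wherever a silence segment lasts longer than max_sil seconds.

    aligned_words: list of (token, start_sec, end_sec) tuples from the ML word alignment.
    Returns a list of shorter token sequences, one per new utterance segment.
    """
    segments, current = [], []
    for token, start, end in aligned_words:
        if token == SILENCE_TOKEN and (end - start) > max_sil:
            if current:
                segments.append(current)
            current = []  # cut the utterance at the long silence
        else:
            current.append((token, start, end))
    if current:
        segments.append(current)
    return segments
```

For example, preprocess_utterance(["[LAUGH]", "[NOISE]"]) returns None (the utterance is dropped), while preprocess_utterance(["yeah", "[LAUGH]", "right"]) maps the noise word to the silence token.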
6 Effect of Noise Words in ML Training
Comparison experiments, Fisher training
– Train ML models using 330 hours of automatically segmented Fisher data, with and without noise words in the transcripts
Validation experiments, Switchboard training
– Train ML models using 180 hours of Switchboard data
– With noise words: MSU’s original transcripts
– Without noise words: CU’s processed transcripts
Test the models on the combined Eval03 and Dev04 test set
7 Fisher Training Experiments
Without noise words: train as described two slides back (Training Procedure without Noise Words)
With noise words: use four phonemes to model the six noise words; the transcripts and segmentation are left unaltered (a lexicon sketch follows the table below)

Noise word      Phonetic spelling
[COUGH]         COF-COF
[LAUGHTER]      LAF-LAF
[NOISE]         AMN-AMN
[MN]            BRN-BRN
[SIGH]          BRN-BRN
[LIPSMACK]      BRN-BRN
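A minimal sketch of how the table above could be expressed as pronunciation-lexicon entries, assuming a plain "word followed by its phone sequence" dictionary line format; the file layout and function name are assumptions, not BBN's actual dictionary format.

```python
# Noise-word pronunciations from the table: six noise words covered by four
# noise phonemes (COF, LAF, AMN, BRN), each spelled as a two-phone sequence.
NOISE_LEXICON = {
    "[COUGH]":    ["COF", "COF"],
    "[LAUGHTER]": ["LAF", "LAF"],
    "[NOISE]":    ["AMN", "AMN"],
    "[MN]":       ["BRN", "BRN"],
    "[SIGH]":     ["BRN", "BRN"],
    "[LIPSMACK]": ["BRN", "BRN"],
}

def write_lexicon_entries(path, lexicon=NOISE_LEXICON):
    """Append the noise-word entries to a text pronunciation dictionary,
    one 'WORD phone phone ...' line per word (assumed format)."""
    with open(path, "a", encoding="utf-8") as f:
        for word, phones in lexicon.items():
            f.write(f"{word} {' '.join(phones)}\n")
```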
8 Fisher Experiment Results

Noise words in transcripts    Unadapted WER, Eval03+Dev04
No                            26.2
Yes                           25.4

Noise words in the acoustic modeling (AM) and language modeling (LM) transcripts give a 0.8% absolute WER gain
9 Diagnostic Experiments
Is the gain with noise words due to better acoustic modeling or better language modeling?
Expt I: explicit noise words in the transcripts but modeled as silences: spell all noise words using the silence phoneme (see the sketch after the table)
Expt II: test the acoustic models from Expt I using LMs trained on transcripts without noise words

Expt    Noise words in AM transcripts?    Noise phones    Noise words in LM transcripts?    Unadapted WER, Eval03+Dev04
--      No                                --              No                                26.2
--      Yes                               Noise           Yes                               25.4
I       Yes                               Silence         Yes                               25.5
II      Yes                               Silence         No                                25.6
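A short sketch of the Expt I lexicon variant, continuing the earlier illustrative format: the noise words stay in the transcripts, but every one of them is spelled with the silence phone, so the acoustic model treats them exactly like silence. "SIL" is an assumed silence-phone symbol, not necessarily BBN's.

```python
# Expt I: keep noise words in the transcripts, but spell each with the silence phone.
NOISE_WORDS = ["[COUGH]", "[LAUGHTER]", "[NOISE]", "[MN]", "[SIGH]", "[LIPSMACK]"]
SIL_PHONE = "SIL"  # assumed silence-phone symbol

expt1_lexicon = {word: [SIL_PHONE] for word in NOISE_WORDS}
# e.g. expt1_lexicon["[COUGH]"] == ["SIL"]
```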
10 Diagnostic Experiments, cont’d
Including or excluding noise words from the LM training has no significant effect on performance
Noise words in the transcripts improve performance whether they are trained as noise models or as silence
==> Acoustic model initialization improves when noise words are explicitly marked in the transcripts
11 ML Training on the Switchboard Corpus
1. Use 2385 Swbd I conversations (160 hours), processed and segmented by CU, from the Eval03 training set
2. Use the same 2385 conversations from the original MSU Swbd I Jan 2000 release (180 hours)
3. Apply the auto-segmentation process to the MSU version of the conversations, producing 180 hours

Noise words in training?    Segmentation    Unadapted WER, Eval03+Dev04
No                          CU              28.4
Yes                         MSU manual      27.7
Yes                         BBN auto        27.6
12 Effects with Discriminative Training
Trained SI-MPE models using the baseline 330-hour Fisher ML models as the seed models

Noise words in transcripts    Unadapted WER, Eval03+Dev04
No                            23.6
Yes                           23.4

Noise words still yield better models, but the gain is just 0.2%
13 Conclusions
Including noise words in the transcripts results in better model initialization in acoustic training
The discriminative training procedure overcomes most of the poor initial estimates when noise words are not explicitly marked in the transcripts
We can directly use the Fisher transcripts as output by BBN / WordWave, i.e., no need to map noise words and re-segment