Published by Benedict Fletcher. Modified over 9 years ago.
1 Update on WordWave Fisher Transcription
Owen Kimball, Chia-lin Kao, Jeff Ma, Rukmini Iyer, Rich Schwartz, John Makhoul
2 Outline
Schedule update
Investigating WordWave + auto segmentation quality
  – Updated evaluation method
  – Separating effect of transcripts and segmentation
  – Improved segmentation algorithm
Plans
Update on using Fisher data in training
3 Data Schedule
BBN has received 925 hours from WordWave (WWave)
Processed and released 478 hours via LDC
  – 91 hrs on 8/1/03
  – 300 hrs on 9/24/03
  – 87 hrs on 10/21/03
WWave is currently running more slowly than planned
  – Reason: CTS transcription is hard!
  – They will complete 1600 hrs by the end of Jan 04, with the remaining 200 hrs to follow as quickly as possible.
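As a quick consistency check on the schedule figures (the 447-hour backlog and the 1800-hour total are derived here from the slide's numbers, not stated on it):

```latex
\begin{aligned}
\text{released via LDC} &= 91 + 300 + 87 = 478 \text{ hrs} \\
\text{received but not yet released} &= 925 - 478 = 447 \text{ hrs} \\
\text{planned total from WWave} &= 1600 + 200 = 1800 \text{ hrs}
\end{aligned}
```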
4 Segmentation Quality as of Sept 03
Auto segmentation goals: given audio and a transcript with no timing info, break the audio into fairly short segments and align the correct text to each segment
In September, we compared transcription and segmentation approaches on a 20-hour Swbd set:
  – LDC/MSU careful transcription and manual segmentation vs.
  – LDC fast transcription and manual segmentation vs.
  – WWave transcripts + BBN automatic segmentation
Compared 2 different segmentation algorithms
  – Alg I: run the recognizer and segment at "reliable" silences; decode using this segmentation and reject segments based on sclite alignment errors
  – Alg II: use the recognizer to get a coarse initial segmentation; then run forced alignment within the coarse segments to find finer segments; final rejection pass as before
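The rejection step can be pictured as a per-segment word error check: decode each segment, align the decoded words against the transcript words assigned to it, and drop segments with too many errors. This is a minimal sketch, not BBN's code. The slide only says rejection is based on sclite alignment errors, so the plain Levenshtein word alignment below stands in for sclite, and the function names and the 0.1 `max_err_rate` threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions needed
    to turn ref into hyp (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))  # empty ref vs. each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # ref prefix vs. empty hyp: i deletions
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def segments_to_keep(
    segments: List[Tuple[List[str], List[str]]], max_err_rate: float = 0.1
) -> List[int]:
    """Return the indices of segments that survive rejection.

    Each segment is (transcript_words, decoded_words). A segment is rejected
    when its word error rate against its own transcript exceeds max_err_rate
    (hypothetical threshold)."""
    keep = []
    for idx, (ref, hyp) in enumerate(segments):
        rate = word_edit_distance(ref, hyp) / max(len(ref), 1)
        if rate <= max_err_rate:
            keep.append(idx)
    return keep
```

Segments whose decoded words closely match their assigned text are kept; segments with many alignment errors, for example because words landed in the wrong segment, are dropped from training.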
5 Performance Comparison in Sept
Unadapted recognition; acoustic models trained on the 20-hour Swbd1 set, LM trained on full Switchboard
ML, GI, VTL, HLDA-trained models

Transcripts / Segmentation    Training hours    Eval01 WER (%)
Manual LDC+MSU                19.9              41.1
CTRAN / Alg I                 19.4              41.8
Fast Manual LDC               17.9              41.2
WWave / Alg I                 19.2              41.4
WWave / Alg II                19.5              41.4
6 Improving the Evaluation Method
There were a number of issues and shortcuts in the training and test that clouded comparisons. We therefore
  – Adopted an improved training sequence, including new binaries
  – Reduced pruning errors in decoding
  – Converted from fast approximate VTL length estimation to a more careful approach
  – Adopted more stable VTL models
VTL models trained on 20 hours differed dramatically for small changes in segmentation
  – This is a bug in our VTL model estimation that we need to fix
  – For the following experiments, used stable VTL models from the RT03 eval
Switched from our historic LDC+MSU baseline to all-MSU for simplicity.
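For readers unfamiliar with VTL normalization, a common way to estimate a speaker's warp factor is a grid search: warp the frequency axis by each candidate factor and keep the factor that scores best under the acoustic model. The slide does not describe either BBN's fast approximate estimator or the more careful one, so this is a generic, hypothetical sketch. The piecewise-linear warp, the 4 kHz Nyquist frequency (8 kHz telephone audio), the 0.85 breakpoint, the 0.88 to 1.12 grid, and the `score_fn` placeholder are all assumptions.

```python
from typing import Callable, Iterable


def warp_frequency(f: float, alpha: float, f_nyq: float = 4000.0,
                   break_ratio: float = 0.85) -> float:
    """Piecewise-linear frequency warp (hypothetical parameters).

    Below the breakpoint f0 the frequency is scaled by alpha; above it a
    straight line joins (f0, alpha * f0) to (f_nyq, f_nyq), so the warped
    axis still ends at the Nyquist frequency. Requires alpha * f0 < f_nyq,
    which holds for warp factors in the assumed 0.88 to 1.12 range.
    """
    f0 = break_ratio * f_nyq
    if f <= f0:
        return alpha * f
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (f - f0)


def pick_warp(score_fn: Callable[[float], float],
              grid: Iterable[float]) -> float:
    """Return the candidate warp factor with the highest score.

    score_fn is a placeholder for, e.g., the log-likelihood of one
    speaker's warped features under the VTL acoustic model."""
    return max(grid, key=score_fn)


# Hypothetical candidate grid: 0.88 to 1.12 in steps of 0.02.
WARP_GRID = [round(0.88 + 0.02 * k, 2) for k in range(13)]
```

Because the chosen factor depends on how well each warped version fits the model, models that shift a lot with small changes in training segmentation would also make the per-speaker warp choices shift, which is one reason stable VTL models matter for these comparisons.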
7 Comparison with Better Train and Test

Transcripts / Segmentation    Training hours    Eval01 WER (%)
LDC+MSU                       19.9              38.5
MSU                           23.4              38.0
Fast LDC                      17.9              39.4
WWave / Alg I                 19.6              38.8
WWave / Alg II                19.5              38.8
8 Separating Effect of Segmentation
Compare segmentations using identical (MSU) transcripts
Alg I WER is the same for WWave vs. MSU transcripts (38.8 in both cases)
Segmentation may be the biggest (or only) problem.

Transcripts / Segmentation    Training hours    Eval01 WER (%)
MSU / MSU                     23.4              38.0
MSU / Alg I                   20.2              38.8
9 Segmentation Algorithm III
Algorithm II used forced alignment within coarse segments provided by an initial recognition pass, but examination revealed unrecoverable errors (words in the wrong segment) from the coarse initial segmentation.
Tried forced alignment of complete conversation sides
Overcame initial problems of failed alignments by
  – Pre-chopping out long silences, where our system tends to get confused
      Used the auto-segmenter developed for the RT03 CTS eval for this
  – Changing the forced alignment program to do much less pruning at the beginning and end of the conversation
      This accommodated things like beeps, line noise, and words cut off by recording start and stop
Forced alignment is followed by a script that breaks segments at silences, then a rejection pass
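The last step of Alg III, breaking a force-aligned conversation side into segments at silences, might look like the sketch below. It assumes the forced aligner reports a start and end time for every word, with silences showing up as gaps between words. The `Word` and `Segment` types and the 0.5-second `min_gap` threshold are hypothetical, not values from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds, from forced alignment
    end: float    # seconds, from forced alignment


@dataclass
class Segment:
    start: float
    end: float
    words: List[str] = field(default_factory=list)


def split_at_silences(words: List[Word], min_gap: float = 0.5) -> List[Segment]:
    """Group time-aligned words into segments, starting a new segment
    whenever the silence before a word is at least min_gap seconds."""
    segments: List[Segment] = []
    for w in words:
        if segments and w.start - segments[-1].end < min_gap:
            # Short pause: extend the current segment.
            segments[-1].end = w.end
            segments[-1].words.append(w.text)
        else:
            # First word, or a long enough silence: open a new segment.
            segments.append(Segment(w.start, w.end, [w.text]))
    return segments
```

The resulting segments would then go through the same kind of rejection pass used for Alg I and II.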
10 Algorithm III with MSU Transcripts

Transcripts / Segmentation    Training hours    Eval01 WER (%)
MSU / MSU                     23.4              38.0
MSU / Alg I                   20.2              38.8
MSU / Alg III                 21.8              38.2

Manually comparing MSU and Alg III segmentations showed that Alg III:
  – had more, shorter segments
  – had less silence padding around utterances
  – allowed utterances > 15 seconds when the speaker did not pause
Modified Alg III to approximate MSU's statistics
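The manual comparison above (segment count and length, silence padding, utterances over 15 seconds) can be automated with a small summary function. This hypothetical helper assumes each segment is given as its boundaries plus the start of its first word and the end of its last word, in seconds; only the 15-second limit comes from the slide.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (segment start, segment end, first-word start, last-word end), in seconds
Seg = Tuple[float, float, float, float]


def segment_stats(segs: List[Seg], long_utt: float = 15.0) -> Dict[str, float]:
    """Summarize a segmentation so two segmenters can be compared."""
    durations = [end - start for start, end, _, _ in segs]
    # Silence padding = leading silence + trailing silence inside the segment.
    padding = [(w0 - start) + (end - w1) for start, end, w0, w1 in segs]
    return {
        "count": len(segs),
        "mean_duration": mean(durations),
        "mean_padding": mean(padding),
        "n_over_limit": sum(d > long_utt for d in durations),
    }
```

Running it over the MSU and Alg III segmentations of the same conversations would put numbers on "more, shorter segments" and "less silence padding".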
11 Improved Algorithm III

Transcripts / Segmentation      Training hours    Eval01 WER (%)
MSU / MSU                       23.4              38.0
MSU / Alg I                     20.2              38.8
MSU / Original Alg III          21.8              38.2
MSU / Improved Alg III          22.5              38.1

Matching MSU's utterance lengths and silence padding improves WER slightly
Alg III seems good enough, at least for this task
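The slide does not say how Alg III was modified to match MSU's statistics. As one hypothetical illustration of the silence-padding part only, each segment can be widened by a fixed amount on each side, clipped so that it never extends past the midpoint of the gap to a neighbouring segment or past the ends of the recording. The `pad_segments` name and the midpoint rule are assumptions made for this sketch.

```python
from typing import List, Tuple


def pad_segments(segs: List[Tuple[float, float]], pad: float,
                 total_dur: float) -> List[Tuple[float, float]]:
    """Extend each (start, end) segment by up to `pad` seconds per side.

    Assumes segs are sorted by time and do not overlap. A segment may
    grow only up to the midpoint of the gap to its neighbour (so padded
    segments never overlap) and never beyond [0, total_dur]."""
    out = []
    for i, (start, end) in enumerate(segs):
        lo = 0.0 if i == 0 else (segs[i - 1][1] + start) / 2
        hi = total_dur if i == len(segs) - 1 else (end + segs[i + 1][0]) / 2
        out.append((max(lo, start - pad), min(hi, end + pad)))
    return out
```

Splitting the occasional utterance longer than 15 seconds would need word timings as well, for example cutting at the longest internal pause.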
12 Results with WordWave Transcripts
WWave transcripts seem fine given the improved segmentation

Transcripts / Segmentation      Training hours    Eval01 WER (%)
MSU / MSU                       23.4              38.0
Fast LDC                        17.9              39.4
WWave / Alg I                   19.6              38.8
WWave / Original Alg III        21.2              38.1
13 Plans
Confirm quality of WWave with Alg III segmentation
  – On the Swbd 20-hour set, train MMI models to compare all-MSU vs. WWave/Alg III
  – On the Swbd + 150-hour Fisher experiment (where we got gains using Alg I segmented data), performance should not degrade
Improve speed of Alg III
Resegment and redistribute all data that has been released so far
Catch up with and continue segmenting the latest WWave transcript deliveries
14 Update on Adding Fisher Data
In Martigny, showed a 1.4% gain from adding 150 hrs of Fisher data (Alg I segmented) to RT03 training
Hoped to have results with 350 hours, but we had bugs in our initial runs
Did train MMI on RT03 (sw370) vs. RT03+Fisher150
Results on 2nd adaptation pass with POS LM rescoring
CAVEAT: non-rigorous comparison! The Fisher150 system was optimized (worth 0.1-0.2%), but used a different phone set and faster training (which cost 0.2% in other comparisons).

Training            Eval03 WER (%)
RT03: SW370         23.1
+ Fisher 150        22.5
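For reference, the Eval03 gain in the table works out as below. Given the caveat, the two confounding effects (roughly 0.1-0.2% in favour of the Fisher150 system from optimization, 0.2% against it from the phone set and faster training) appear to be of similar size and opposite sign, so this is only an approximate figure.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 23.1 - 22.5 = 0.6 \text{ (absolute WER)} \\
\Delta_{\text{rel}} &= \frac{0.6}{23.1} \approx 2.6\%
\end{aligned}
```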