Published by Ami Norman. Modified over 9 years ago.
1
8 October 2002, DARTS 2002
ATraNoS: Automatic Transcription and Normalisation of Speech
Jacques Duchateau, Patrick Wambacq, Johan Depoortere, Jean-Pierre Martens, Vincent Vandeghinste, Frank Van Eynde, Erik Tjong Kim Sang, Walter Daelemans
2
Outline
• Project overview
• Tasks + results
• Conclusions
3
ATraNoS
• Automatic Transcription and Normalisation of Speech
• IWT-STWW TOP project, 2x2 years, €1.25M
• Started 1 October 2000
• Partners: ESAT/KULeuven, ELIS/UGent, CCL/KULeuven, CNTS/UIA
4
Project aims
• Automatic transcription of spontaneous speech
• Conversion of transcriptions according to the application, e.g. subtitling (the test vehicle in this project)
5
Work packages
• WP1: segmentation of the audio stream into homogeneous segments (ELIS):
  – preprocessor for the speech decoder
  – segments containing a single type of signal (wideband speech, telephone speech, background, etc.)
  – label segments, cluster speakers
  – induce only a small delay
6
WP1 Results
Speech/non-speech segmentation using GMMs (Gaussian Mixture Models)
• 65% of the non-speech removed while preserving more than 98% of the speech
• mean duration of the speech segments is 40 seconds (already easy to handle)
• performance in accordance with the literature
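The slide does not give the model details, so the following is only a minimal sketch of how a GMM-based speech/non-speech decision is commonly made: each frame is scored under a speech GMM and a non-speech GMM, and the log-likelihood ratio is compared with a threshold. The one-dimensional feature, the mixture parameters, and the `bias` parameter are all assumptions made for illustration; they are not taken from the project.

```python
import math

def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D GMM.

    components: list of (weight, mean, variance) tuples.
    """
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for w, mean, var in components
    ]
    # log-sum-exp for numerical stability
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

# Hypothetical, hand-set models (a real system would train these on labelled audio).
SPEECH_GMM = [(0.6, 2.0, 1.0), (0.4, 4.0, 1.5)]
NONSPEECH_GMM = [(0.7, -2.0, 1.0), (0.3, 0.0, 0.5)]

def is_speech(x, bias=0.0):
    """Label a frame as speech if the log-likelihood ratio plus bias is positive.

    A positive bias keeps more frames as speech, at the cost of removing
    less non-speech (assumed tuning knob, not from the slides).
    """
    llr = gmm_log_likelihood(x, SPEECH_GMM) - gmm_log_likelihood(x, NONSPEECH_GMM)
    return llr + bias > 0
```

Raising `bias` trades non-speech removal for speech preservation, which is the kind of trade-off the 65% / 98% figures describe.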
7
WP1 Results
Segmentation of speech segments using BIC (Bayesian Information Criterion)
• Recall = 65%: detection of 72.5% of the speaker changes and 24.3% of the acoustic condition changes, with 19.0% false alarms
• Recall = 72%: detection of 78.5% of the speaker changes and 37.4% of the acoustic condition changes, with 41.3% false alarms
• Results competitive with the literature
• Very fast algorithm (1 minute per hour of audio)
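For readers unfamiliar with BIC segmentation, here is a minimal sketch of the standard ΔBIC test for a single candidate change point. It compares modelling a window with one Gaussian against modelling it with two Gaussians (one on each side of the split) and subtracts a complexity penalty. To stay short it assumes one-dimensional features; real systems use multivariate cepstral features with full covariance matrices. The function names and the `penalty_weight` parameter are illustrative assumptions, not the project's implementation.

```python
import math

def _variance(xs, floor=1e-6):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return max(var, floor)

def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index `split` (1-D Gaussian models).

    A positive value favours the two-segment model, i.e. a change point.
    """
    left, right = frames[:split], frames[split:]
    n, n1, n2 = len(frames), len(left), len(right)
    gain = 0.5 * (
        n * math.log(_variance(frames))
        - n1 * math.log(_variance(left))
        - n2 * math.log(_variance(right))
    )
    # The two-segment model has 2 extra parameters (one mean, one variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty

# Deterministic toy data: same statistics throughout vs. a mean shift halfway.
same = [(-1.0) ** i for i in range(200)]
changed = [(-1.0) ** i for i in range(100)] + [5.0 + (-1.0) ** i for i in range(100)]

print(delta_bic(same, 100) > 0)     # False: no change detected
print(delta_bic(changed, 100) > 0)  # True: change detected
```

Lowering `penalty_weight` makes the detector more eager, which raises recall and false alarms together, the same direction of trade-off as the two operating points on the slide.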
8
Work packages (cont’d)
• WP2: detection and handling of OOV words:
  – extension of the lexicon (CCL): compounding module to reduce the OOV rate
  – augment recognition results with confidence measures (ESAT): OOV detection
  – phoneme-to-grapheme conversion (CNTS): transcribe OOV words
9
Architecture
[Diagram] Speech Recognizer (input: speech; output: text) → Confidence threshold → Suspected OOV → Phoneme Recognizer → Phoneme string → P2G Converter → Spelling → Spelling correction with large vocabulary. Additional box: Training Data.
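The control flow suggested by the architecture labels can be sketched as follows. This is an illustration only: the `Hypothesis` record, the table-lookup `p2g` function, and the use of `difflib` for spelling correction are assumptions standing in for the project's actual recognizers, its machine-learned P2G converter, and its spelling corrector.

```python
import difflib
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    word: str            # word proposed by the speech recognizer
    confidence: float    # confidence measure for that word
    phonemes: list = field(default_factory=list)  # phoneme recognizer output for the same span

def p2g(phonemes, table):
    """Toy phoneme-to-grapheme conversion: one table lookup per phoneme."""
    return "".join(table.get(p, p) for p in phonemes)

def correct_spelling(spelling, vocabulary, cutoff=0.7):
    """Snap a spelling to the closest vocabulary entry, if any is close enough."""
    matches = difflib.get_close_matches(spelling, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else spelling

def transcribe(hypotheses, threshold, table, vocabulary):
    """Keep confident words; re-spell suspected OOVs from their phonemes."""
    output = []
    for hyp in hypotheses:
        if hyp.confidence >= threshold:
            output.append(hyp.word)
        else:
            # Low confidence: treat as a suspected OOV word.
            output.append(correct_spelling(p2g(hyp.phonemes, table), vocabulary))
    return output

hyps = [
    Hypothesis("the", 0.9, ["d", "e"]),
    Hypothesis("cut", 0.3, ["k", "a", "t", "z"]),
]
print(transcribe(hyps, 0.5, {"k": "c"}, ["cats", "dog", "the"]))
```

In this toy run the low-confidence word is re-spelled from its phoneme string as "catz" and then snapped to the vocabulary entry "cats"; the confident word passes through unchanged.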
10
WP2 Results
Detection and handling of Out-Of-Vocabulary (OOV) words
• Compounding module in combination with ASR: recognition accuracy does not drop because of shorter lexical units; after recomposition: 10 to 20% relative improvement in OOV rate compared with the baseline
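To make the effect of a compounding module on the OOV rate concrete, here is a small sketch. It treats a word as covered if it is in the lexicon or can be split into two lexicon entries. The two-part splitter is a deliberate simplification (it ignores linking morphemes and compounds of more than two parts), and the lexicon and tokens are invented for the example; none of this reflects the actual CCL module.

```python
def split_compound(word, lexicon, min_len=3):
    """Return [head, tail] if word is a concatenation of two lexicon entries, else None."""
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return None

def oov_rate(tokens, lexicon, use_compounding=False):
    """Fraction of tokens not covered by the lexicon (optionally via compounding)."""
    oov = 0
    for token in tokens:
        if token in lexicon:
            continue
        if use_compounding and split_compound(token, lexicon) is not None:
            continue
        oov += 1
    return oov / len(tokens)

# Hypothetical data for illustration.
lexicon = {"de", "voetbal", "wedstrijd"}
tokens = ["de", "voetbalwedstrijd", "xyz", "de"]

baseline = oov_rate(tokens, lexicon)                          # 0.5
with_compounds = oov_rate(tokens, lexicon, use_compounding=True)  # 0.25
relative_improvement = (baseline - with_compounds) / baseline
```

The relative improvement is computed as the reduction in OOV rate divided by the baseline OOV rate, which is the usual reading of a "relative improvement" figure.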
11
WP2 Results
Detection and handling of Out-Of-Vocabulary (OOV) words
• Confidence measures with ASR: based on a combination of measures from the literature, plus own work
• Phoneme-to-grapheme conversion based on machine learning methods
12
P2G converter results
Performance:       all words   OOVs
grapheme-level     75.9        63.8
word-level         44.0         7.6
Spelling correction: net effect: 8.6 (OOVs)
(Simulated) interaction with speech recognizer: increases WER, but improves readability
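The slide does not say how WER was computed. As a reminder of the standard definition, word error rate is the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. A minimal sketch, with a hypothetical function name:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because the metric counts whole words, a nearly correct spelling such as "voetbalwedstrijt" for "voetbalwedstrijd" costs a full word error even though a reader can still understand it. That is one way a transcript can read better without scoring better.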
13
Work packages (cont’d)
• WP3: spontaneous speech problems:
  – detection of disfluencies (ELIS): use acoustic/prosodic features; supply info to the HMM recognizer
  – statistical language model (ESAT): extend the traditional trigram LM to incorporate hesitations, filled pauses, self-corrections, repetitions → sequence of clean speech islands
14
Work packages (cont’d)
• WP4: subtitling:
  – data collection and automatic alignment (CNTS)
  – input/output specifications (CCL): linguistic characteristics
  – subtitling: statistical approach (CNTS)
  – subtitling: linguistic approach (CCL)
  – hybrid system possible?
15
Data collection and alignment
[Diagram labels: News autocues; Subtitles; (semi-)automatic alignment; (semi-)automatic data capture; Machine Learner; Training Data; Linguistic Annotation; Classifier; autocues; subtitles]
16
Conclusions