1
Robust Methods for Automatic Transcription and Alignment of Speech Signals
Course presentation: Speech Recognition
Stockholm, 6 Feb 2004
Leif Grönqvist (leifg@ling.gu.se)
Växjö University (Mathematics and Systems Engineering)
GSLT (Graduate School of Language Technology)
Göteborg University (Department of Linguistics)
2
Introduction: GSLC
GSLC (Göteborg Spoken Language Corpus):
  A multimodal corpus: video and/or audio recordings
  GTS (Göteborg Transcription Standard)
    Overlaps on word level, background information, and comments relevant for interaction
  MSO (Modified Standard Orthography)
    Closer to speech than written language
    NOT phonetic
    Keeps the possibility of comparison with written language
  Designed for studies of natural speech in various activities
  25 social activity types, 200 hours, 360 recordings, 1.3 million running words
  Recording and transcription are aligned for only a few recordings, and not word by word
3
Transcription example
$L: he:ej
$G: heej
$L: heej hur haru haft d{et} i veckan @
$G: jättebra ja{g} tycker inte att du är ett svart hål
$L: tycker [4 du inte d{et} va bra ]4
$V: [4 jo:e kolla lungan ]4
$G: ja{g} tycker att du e0 [5 blå å0 (...) ]5 jo d{et} kan du väl ändå [6 tycka tycker ja ]6
$L: [5 ja tycker inte att du e0 en röd stjärna ]5
$L: [6 nä ]6
$V: va{d} tycker [7 ni själva rå1 ]7
$L: [7 röd ja ]7 stjärna ja
$G: kan du ö{h} sluta avbryta [8 oss vi håller ]8
$V: [8 va{d} e0 ni va{d} tycke{r} ni själva att ni ]8 e0
$L: (...)
$G: va
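The notation in the example above can be read mechanically. The following is a minimal sketch, not an official GTS parser: it assumes that `$X:` introduces a speaker, that `[n ... ]n` brackets overlapped stretch number n, and that braces in MSO tokens such as `d{et}` enclose letters present in the written form but not spoken. These readings are inferred from the example, and the function names `parse_line`, `spoken_form`, and `written_form` are hypothetical. Other markers in the example (`e0`, `rå1`, `(...)`, `@`) are passed through untouched.

```python
import re

# Assumed notation (inferred from the example, not taken from a GTS specification):
#   $X:        speaker label
#   [n ... ]n  overlapped stretch number n
#   d{et}      "d" was spoken, "det" is the standard written form
LINE_RE = re.compile(r"^\$(\w+):\s*(.*)$")
OVERLAP_RE = re.compile(r"\[(\d+)\s+(.*?)\s*\](\d+)")


def parse_line(line):
    """Split one utterance line into speaker, plain tokens, and overlaps."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an utterance line: {line!r}")
    speaker, text = match.groups()
    overlaps = {}
    for m in OVERLAP_RE.finditer(text):
        if m.group(1) == m.group(3):  # opening and closing index must agree
            overlaps[int(m.group(1))] = m.group(2).split()
    # Remove the overlap brackets to get the plain token sequence.
    plain = re.sub(r"\[\d+\s*|\s*\]\d+", " ", text)
    return {"speaker": speaker, "tokens": plain.split(), "overlaps": overlaps}


def spoken_form(token):
    """Drop the braced letters: d{et} becomes d."""
    return re.sub(r"\{[^}]*\}", "", token)


def written_form(token):
    """Keep the braced letters: d{et} becomes det."""
    return token.replace("{", "").replace("}", "")


utterance = parse_line("$L: tycker [4 du inte d{et} va bra ]4")
print(utterance["speaker"], utterance["overlaps"])
print([written_form(t) for t in utterance["tokens"]])
```

Keeping both forms available is what makes the comparison with written language possible while the transcription stays close to speech.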
4
MultiTool
Prerelease 0.7
Browsing, searching, coding, counting
Easy navigation through recordings
Search in transcription, partiture, media file, or time scale
Only manual alignment
Partial alignment of specific events would help a lot!
6
What can speech technology do for MultiTool?
A lot of research I didn't know about…
Question: should we use the transcription or not?
  Yes: automatic forced alignment on word level
  No: speech recognition + alignment
Yes, find the time for:
  Utterance start and end points
  Non-speech annotations (coughing, whispering, click, loud, high pitch, glottalization, etc.) and silent sections
  Easy-to-recognize speech sounds or words
Find out whether two utterances are spoken by the same person
7
Challenging task…
Speech recognition and alignment work best with high-quality sound signals
Recordings of spontaneous speech in natural situations have some unwanted properties:
  Long distance between microphone and speaker
  Many speakers in the same signal
  Overlapped speech
  Unlimited vocabulary
  Whatever you call it: disfluencies, repairs, repetitions, deletions, fragmentary speech
  Various kinds of background noise
Will any of the existing methods work here?
8
Existing research
The Production of Speech Corpora (Schiel et al.): fully automatic methods with usable results:
  Segmentation into words, given a known vocabulary and not very spontaneous speech
  Markup of prosodic features
  Time alignment of phonemes (with probabilistic pronunciation rules, this gives word alignment)
9
Research, cont.
Sentence boundary tagging (Stolcke & Shriberg 1996)
  Probabilities for boundaries between words
  HMM + Viterbi
  POS tags improve the results
  Good sound quality
  Interesting, but sentences are not utterances
Inter-word event tagging (Stolcke et al. 1998)
  Events are disfluencies in general
  Input is forced alignment + acoustic features
  Not directly usable, but a similar model and acoustic features may be useful for other events as well
10
HMM-based segmentation and alignment
Find the most probable alignment for a sequence of words
Sjölander (2003) describes an interesting system
  Very interesting!
  Reports correct alignment for 85.5% of boundaries within 20 ms
  Will it work on noisy signals?
  A result of, say, 5% would be very useful
I have tried to get the system…
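To make "the most probable alignment for a sequence of words" concrete, here is a minimal sketch of Viterbi-style forced alignment. It is not Sjölander's system: it assumes the acoustic model has already produced a log-likelihood for every frame under every unit (word or phone) of the transcript, and it ignores transition probabilities. The names `forced_align` and `boundary_accuracy` are hypothetical, and the second function is only one plausible reading of a "boundaries within 20 ms" score.

```python
import math


def forced_align(log_likes):
    """Return the start frame of each unit on the best monotonic path.

    log_likes[t][s] is the log-likelihood of frame t under unit s, with the
    units already in transcript order. Each frame either stays in the
    current unit or advances to the next one.
    """
    n_frames, n_units = len(log_likes), len(log_likes[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    neg_inf = -math.inf
    score = [[neg_inf] * n_units for _ in range(n_frames)]
    advanced = [[False] * n_units for _ in range(n_frames)]
    score[0][0] = log_likes[0][0]  # the path must start in the first unit
    for t in range(1, n_frames):
        for s in range(n_units):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else neg_inf
            best = max(stay, move)
            if best == neg_inf:
                continue  # unit s is not reachable at frame t
            advanced[t][s] = move > stay
            score[t][s] = best + log_likes[t][s]
    # Backtrace from the last unit at the last frame.
    starts = [0] * n_units
    s = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][s]:
            starts[s] = t
            s -= 1
    return starts


def boundary_accuracy(predicted, reference, tolerance):
    """Fraction of predicted boundaries within `tolerance` of the reference."""
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return hits / len(reference)


# Invented toy scores: 6 frames, 3 units; each frame clearly favours one unit.
HIGH, LOW = -0.1, -3.0
favoured = [0, 0, 1, 1, 1, 2]
toy = [[HIGH if s == f else LOW for s in range(3)] for f in favoured]
print(forced_align(toy))
```

With overlapped or noisy speech the per-frame likelihoods become unreliable, which is exactly where this kind of search is expected to break down.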
11
Related tasks
Intensity discrimination
  Easy to measure
  Useful as an indicator of phoneme changes, etc.
Voicing determination and fundamental frequency
  Many methods: cepstrum, probabilities based on weighted features
  Voicing patterns could give good hints about when specific words occur
Glottalization and impulse detection
  Intensity and a sudden f0 decrease could be used
  Glottalization is marked in the transcription!
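The two simplest measurements named above can be sketched in a few lines. This is an illustration under stated assumptions, not one of the methods from the slide: it uses plain autocorrelation rather than the cepstrum or weighted-feature approaches, and the function names, the 75 to 400 Hz search range, and the 0.5 voicing threshold are all hypothetical choices.

```python
import math


def intensity_db(frame):
    """Root-mean-square level of one frame in dB relative to amplitude 1.0."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0, threshold=0.5):
    """Return an f0 estimate in Hz, or None if the frame looks unvoiced.

    A frame counts as voiced when its normalised autocorrelation exceeds
    `threshold` at some lag inside the allowed pitch range.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None  # pure silence
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = None, threshold
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag]
                   for i in range(len(frame) - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag is not None else None


# Synthetic test signal: 50 ms of a 100 Hz sine sampled at 8 kHz.
rate = 8000
sine = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]
print(intensity_db(sine), estimate_f0(sine, rate))
```

Tracking these two values frame by frame would give the intensity curve and voicing pattern that the slide suggests as cues; a sudden drop in the f0 estimate is one possible hint of glottalization.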
12
Robust alignment
How could the algorithm used by Sjölander be revised for more robustness?
  f0 (voicing) and glottalization detection + ordinary probabilities for phonemes could help
  Problem: the speech models will not give probabilities for phonemes in simultaneous speech
  Problem #2: GSLC does not contain phonetic transcription
    Would training on letters work?
My guess: this will not work well enough
A better approach: identify things that could be recognized, since word-by-word alignment is not necessary
13
Conclusion
First thing to try: Sjölander's aligner
Second: spoken event tagger
  Identify events that could be recognized
  Identify useful acoustic features
  Could, for example, a decision tree help to recognize the events?
Lots of tests and experiments will be needed if the forced alignment doesn't give usable results
14
The End!
Thank you for listening
??? !!!