Robust Methods for Automatic Transcription and Alignment of Speech Signals
Course presentation: Speech Recognition. Leif Grönqvist. Stockholm, 6 Feb 2004.

1. Course presentation: Speech Recognition
Leif Grönqvist (leifg@ling.gu.se)
- Växjö University (Mathematics and Systems Engineering)
- GSLT (Graduate School of Language Technology)
- Göteborg University (Department of Linguistics)

2. Introduction: GSLC
- GSLC (Göteborg Spoken Language Corpus) is a multimodal corpus of video and/or audio recordings
- GTS (Göteborg Transcription Standard)
  - Marks overlaps at the word level, background information, and comments relevant to the interaction
- MSO (Modified Standard Orthography)
  - Closer to speech than to written language
  - NOT phonetic
  - Keeps it possible to compare with written language
- Designed for studies of natural speech in various activities
- 25 social activity types, 200 hours, 360 recordings, 1.3 million running words
- Recordings and transcriptions are aligned only for a few recordings, and not word by word

3. Transcription example
$L: he:ej
$G: heej
$L: heej hur haru haft d{et} i veckan @
$G: jättebra ja{g} tycker inte att du är ett svart hål
$L: tycker [4 du inte d{et} va bra ]4
$V: [4 jo:e kolla lungan ]4
$G: ja{g} tycker att du e0 [5 blå å0 (...) ]5 jo d{et} kan du väl ändå [6 tycka tycker ja ]6
$L: [5 ja tycker inte att du e0 en röd stjärna ]5
$L: [6 nä ]6
$V: va{d} tycker [7 ni själva rå1 ]7
$L: [7 röd ja ]7 stjärna ja
$G: kan du ö{h} sluta avbryta [8 oss vi håller ]8
$V: [8 va{d} e0 ni va{d} tycke{r} ni själva att ni ]8 e0
$L: (...)
$G: va
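The numbered brackets in the example mark stretches of overlapping speech: segments that carry the same index were spoken at the same time. A minimal sketch of how such lines could be read programmatically follows. The regular expressions and the function names `parse_line` and `overlap_groups` are assumptions made for illustration; they are not part of GTS or MultiTool, and they cover only the notation visible in this example.

```python
import re

# Hypothetical helpers for illustration; not part of GTS or MultiTool.
UTTERANCE = re.compile(r"^\$(\w+):\s*(.*)$")       # "$L: some text"
OVERLAP = re.compile(r"\[(\d+)\s+(.*?)\s*\]\1")     # "[4 some text ]4"


def parse_line(line):
    """Split one '$Speaker: text' line into speaker, text and overlaps."""
    match = UTTERANCE.match(line.strip())
    if match is None:
        raise ValueError(f"not an utterance line: {line!r}")
    speaker, text = match.groups()
    # Map overlap index -> overlapped segment within this utterance.
    overlaps = {int(n): segment for n, segment in OVERLAP.findall(text)}
    return speaker, text, overlaps


def overlap_groups(lines):
    """Map each overlap index to the (speaker, segment) pairs sharing it."""
    groups = {}
    for line in lines:
        speaker, _, overlaps = parse_line(line)
        for index, segment in overlaps.items():
            groups.setdefault(index, []).append((speaker, segment))
    return groups


sample = [
    "$L: tycker [4 du inte d{et} va bra ]4",
    "$V: [4 jo:e kolla lungan ]4",
]
groups = overlap_groups(sample)
print(groups[4])
```

Grouping by overlap index gives, for each overlap, the speakers and word sequences that an aligner would have to place in the same time span.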

4. MultiTool
- Prerelease 0.7
- Browsing, searching, coding, counting
  - Easy navigation through recordings
  - Search in transcription, partiture, media file, or time scale
- Only manual alignment
- Partial alignment of specific events would help a lot!

5. [Slide contains no transcribed text]

6. What can speech technology do for MultiTool?
- A lot of research I didn’t know about…
- Question: should we use the transcription or not?
  - Yes: automatic forced alignment at the word level
  - No: speech recognition + alignment
- Yes, and find the times for:
  - Utterance start and end points
  - Non-speech annotations (coughing, whispering, click, loud, high pitch, glottalization, etc.) and silent sections
  - Easy-to-recognize speech sounds or words
- Find out whether two utterances are uttered by the same person

7. Challenging task…
- Speech recognition and alignment work best with high-quality sound signals
- Recordings of spontaneous speech in natural situations have some unwanted properties:
  - Long distance between microphone and speaker
  - Many speakers in the same signal
  - Overlapped speech
  - Unlimited vocabulary
  - Whatever you call it: disfluencies, repairs, repetitions, deletions, fragmentary speech
  - Various kinds of background noise
- Will any of the existing methods work here?

8. Existing research
- The Production of Speech Corpora (Schiel et al.): fully automatic methods with usable results:
  - Segmentation into words, if the vocabulary is known and the speech is not very spontaneous
  - Markup of prosodic features
  - Time alignment of phonemes (+ probabilistic pronunciation rules give word alignment)

9. Research, cont.
- Sentence boundary tagging (Stolcke & Shriberg 1996)
  - Probabilities for boundaries between words
  - HMM + Viterbi
  - POS tags improve the results
  - Good sound quality
  - Interesting, but sentences are not utterances
- Inter-word event tagging (Stolcke et al. 1998)
  - Events are disfluencies in general
  - Input is forced alignment + acoustic features
- Not directly usable, but a similar model and acoustic features may be useful for other events as well
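To make the "HMM + Viterbi" idea concrete, here is a minimal sketch of Viterbi decoding over a two-state model in which each word is tagged as followed by a boundary ("B") or not ("N"), with POS tags as observations. This is a generic textbook decoder, not the model from the cited papers, and every probability below is an invented toy value chosen only to make the example run.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for the observations."""
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t - 1][s]: predecessor of state s at time t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Toy model (all numbers are assumptions, not estimates from any corpus).
states = ["N", "B"]  # N: no boundary after the word, B: boundary after it
log_start = logs({"N": 0.8, "B": 0.2})
log_trans = {"N": logs({"N": 0.7, "B": 0.3}), "B": logs({"N": 0.9, "B": 0.1})}
log_emit = {
    "N": logs({"PRON": 0.4, "VERB": 0.4, "NOUN": 0.2}),
    "B": logs({"PRON": 0.1, "VERB": 0.2, "NOUN": 0.7}),
}

pos_tags = ["PRON", "VERB", "NOUN", "PRON", "VERB", "NOUN"]
path = viterbi(pos_tags, states, log_start, log_trans, log_emit)
print(path)
```

With these toy numbers the decoder places a boundary after each noun. In a real tagger the emission scores would also draw on acoustic features, which is the part the slide suggests may carry over to other events.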

10. HMM-based segmentation and alignment
- Find the most probable alignment for a sequence of words
- Sjölander (2003) describes an interesting system
  - Very interesting!
  - Reports correct alignment for 85.5% of boundaries within 20 ms
  - Will it work on noisy signals?
  - A result of, say, 5% would be very useful
  - I have tried to get the system…
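The idea of finding the most probable alignment for a known sequence can be sketched as a small dynamic program: every frame is assigned to one unit (word or phoneme), units appear in the given order, and the total score is maximized. The sketch below is a generic illustration and makes no claim about how Sjölander's system is implemented. The frame scores are toy values, the 10 ms frame shift is an assumption, and `boundary_accuracy` is a hypothetical helper showing how a "within 20 ms" figure could be computed.

```python
def forced_align(frame_scores):
    """frame_scores[t][k]: log-likelihood of frame t under unit k.

    Returns the start frame of each unit in the best monotonic alignment.
    """
    n_frames, n_units = len(frame_scores), len(frame_scores[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    neg_inf = float("-inf")
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    stayed = [[True] * n_units for _ in range(n_frames)]
    best[0][0] = frame_scores[0][0]
    for t in range(1, n_frames):
        for k in range(n_units):
            stay = best[t - 1][k]                       # remain in unit k
            advance = best[t - 1][k - 1] if k > 0 else neg_inf  # enter unit k
            stayed[t][k] = stay >= advance
            best[t][k] = max(stay, advance) + frame_scores[t][k]
    # Backtrack from the last unit at the last frame.
    starts = [0] * n_units
    k = n_units - 1
    for t in range(n_frames - 1, 0, -1):
        if not stayed[t][k]:
            starts[k] = t
            k -= 1
    return starts


def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Fraction of predicted boundaries within the tolerance of the reference."""
    hits = sum(abs(p - r) <= tolerance_ms
               for p, r in zip(predicted_ms, reference_ms))
    return hits / len(reference_ms)


# Toy scores: 6 frames, 3 units; 0 where a frame fits a unit, -5 otherwise.
frame_scores = [
    [0, -5, -5], [0, -5, -5],
    [-5, 0, -5], [-5, 0, -5], [-5, 0, -5],
    [-5, -5, 0],
]
starts = forced_align(frame_scores)
frame_shift_ms = 10.0  # assumed frame shift
predicted_ms = [s * frame_shift_ms for s in starts]
accuracy = boundary_accuracy(predicted_ms, [0.0, 45.0, 60.0])
print(starts, accuracy)
```

On noisy or overlapped signals the per-frame scores become unreliable, which is exactly the robustness question the slide raises.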

11. Related tasks
- Intensity discrimination
  - Easy to measure
  - Useful as an indicator of phoneme changes, etc.
- Voicing determination and fundamental frequency
  - Many methods: cepstrum, probabilities based on weighted features
  - Voicing patterns could give good hints about when specific words occur
- Glottalization and impulse detection
  - Intensity and a sudden f0 decrease could be used
  - Glottalization is marked in the transcription!
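Two of these measurements are simple enough to sketch directly. The first function computes short-time intensity as an RMS level in dB per frame. The second makes a voicing decision and estimates f0, using plain autocorrelation rather than the cepstrum mentioned on the slide, because it is shorter to write down. The function names, the 75 to 400 Hz search range, and the voicing threshold of 0.5 are all assumptions for illustration.

```python
import math


def frame_rms_db(samples, frame_len, hop):
    """Short-time intensity: RMS level in dB for each frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # floor avoids log(0)
    return levels


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0,
                voicing_threshold=0.5):
    """Return an f0 estimate in Hz, or None if the frame looks unvoiced."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        corr = sum(frame[i] * frame[i + lag]
                   for i in range(len(frame) - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag is None or best_corr < voicing_threshold:
        return None
    return sample_rate / best_lag


# Synthetic test signal: a 200 Hz sine sampled at 8 kHz, 50 ms long.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
levels = frame_rms_db(tone, frame_len=80, hop=80)
f0 = estimate_f0(tone, sample_rate)
silence_f0 = estimate_f0([0.0] * 400, sample_rate)
print(levels, f0, silence_f0)
```

A sudden drop in both the intensity track and the f0 track between neighbouring frames is the kind of cue the slide proposes for glottalization detection.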

12. Robust alignment
- How could the algorithm used by Sjölander be revised for more robustness?
- f0 (voicing) and glottalization detection + ordinary probabilities for phonemes could help
- Problem: the speech models will not give probabilities for phonemes in simultaneous speech
- Problem #2: GSLC does not contain phonetic transcription
  - Would training on letters work?
- My guess: this will not work well enough
- A better approach: identify things that can be recognized, since word-by-word alignment is not necessary

13. Conclusion
- First thing to try: Sjölander’s aligner
- Second: a spoken event tagger
  - Identify events that could be recognized
  - Identify useful acoustic features
  - Could, for example, a decision tree help to recognize the events?
- Lots of tests and experiments will be needed if the forced alignment doesn’t give usable results
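As a rough illustration of the decision-tree question, the sketch below fits the simplest possible tree, a single split (a "stump"), on two hypothetical acoustic features: intensity drop in dB and f0 drop in Hz. The feature choice, the function names, and all data values are invented for illustration. A real experiment would use features measured from the recordings and labels taken from the transcription.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            impurity = (len(left) * gini(left)
                        + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                # Each side predicts its majority label.
                left_label = int(sum(left) * 2 >= len(left))
                right_label = int(sum(right) * 2 >= len(right))
                best = (impurity, feature, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best[1:]


def predict(stump, row):
    """Classify one feature row with a fitted stump."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


# Invented toy data: (intensity drop in dB, f0 drop in Hz); 1 = event present.
rows = [(1.0, 5.0), (2.0, 40.0), (8.0, 45.0), (9.0, 3.0), (1.5, 50.0), (7.0, 4.0)]
labels = [0, 1, 1, 0, 1, 0]
stump = fit_stump(rows, labels)
print(stump, predict(stump, (3.0, 60.0)))
```

A full tree would apply the same split search recursively to each side, which lets several acoustic features combine when deciding whether an event is present.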

14. The End!
Thank you for listening
??? !!!
