Automatic Transcript Generation
Helmer Strik
A²RT, Dept. of Language & Speech
University of Nijmegen
Problem & Solution
Problem:
–We have audio from radio & TV
–We need transcripts
Solution:
–ASR: Automatic Speech Recognition
History of ASR
It all started more than 100 years ago.
–Alexander Graham Bell: make speech visible, for the hearing impaired
–AT&T Bell Laboratories: first ASR system, for the ten English digits
ASR is 'everywhere':
–PC: dictation + 'Command & Control'
–mobile phones (hands-free)
–call centers
–tapping phone calls
First: A/D conversion
Before ASR: A/D conversion (microphone + sound card)
–Speech: analogue & continuous
–WAV file: digital & discrete
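A/D conversion consists of two steps: sampling (continuous time becomes discrete time) and quantization (continuous amplitude becomes a discrete integer). The minimal Python sketch below shows both steps and stores the result as a WAV file. The 440 Hz tone standing in for the microphone signal, the 16 kHz sample rate, the 16-bit depth and the function names are all assumptions chosen for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second (assumed value)
BITS = 16            # bits per sample (assumed value)

def analogue_signal(t: float) -> float:
    """Hypothetical stand-in for the continuous microphone signal: a 440 Hz tone."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

def a_d_convert(duration_s: float) -> list[int]:
    """Sample the signal at t = n / SAMPLE_RATE and quantize each value to a signed integer."""
    max_int = 2 ** (BITS - 1) - 1  # 32767 for 16 bits
    n_samples = round(duration_s * SAMPLE_RATE)
    samples = []
    for n in range(n_samples):
        x = analogue_signal(n / SAMPLE_RATE)  # sampling: continuous time -> discrete time
        q = round(x * max_int)                # quantization: continuous amplitude -> integer
        samples.append(max(-max_int - 1, min(max_int, q)))  # clip to the 16-bit range
    return samples

def write_wav(path: str, samples: list[int]) -> None:
    """Store the samples as a mono 16-bit PCM WAV file ("h" = signed 16-bit)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(BITS // 8)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

if __name__ == "__main__":
    samples = a_d_convert(1.0)  # one second of "speech"
    write_wav("tone.wav", samples)
```

At 16 kHz, 10 ms of signal yields 160 integer samples. A real sound card does the same job in hardware, usually with an anti-aliasing filter before the sampler.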
What is ASR?
Answer: conversion from speech to text
X → ASR → W
–X: unknown speech signal
–W: a string of words
How: probabilistic approach
Find the W that maximizes P(W|X)
P(W|X) = P(X|W) * P(W) / P(X)
P(X) does not depend on W, so: Ŵ = argmax_W P(X|W) * P(W)
–P(W): language model
–P(X|W): acoustic model, using either
  –whole-word models, or
  –phoneme models + lexicon
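The decision rule can be sketched in a few lines of Python. Because P(X) is identical for every candidate W, only P(X|W) * P(W) needs comparing. The comparison is done with log-probabilities to avoid underflow when many small probabilities are multiplied. The two candidate transcripts and their probabilities are invented for illustration. A real recognizer searches a huge hypothesis space rather than scoring a fixed list.

```python
import math

# Hypothetical scores for one utterance X. In a real recognizer P(X|W) comes from
# the acoustic model and P(W) from the language model; these numbers are made up.
CANDIDATES = {
    # W: (P(X|W), P(W))
    "recognize speech": (1.0e-5, 1.0e-4),
    "wreck a nice beach": (2.0e-5, 1.0e-7),
}

def log_score(p_x_given_w: float, p_w: float) -> float:
    """log P(X|W) + log P(W), i.e. the log of the numerator of Bayes' rule."""
    return math.log(p_x_given_w) + math.log(p_w)

def decode(candidates: dict[str, tuple[float, float]]) -> str:
    """Return argmax_W P(X|W) * P(W); P(X) is the same for every W, so it is dropped."""
    return max(candidates, key=lambda w: log_score(*candidates[w]))

if __name__ == "__main__":
    print(decode(CANDIDATES))  # recognize speech
```

Here the acoustic model slightly prefers "wreck a nice beach", but its much lower language-model probability makes "recognize speech" the winner.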
ASR
ASR = phoneme models (HMMs) + lexicon + language model
–Phoneme models (HMMs) + lexicon: P(X|W)
–Language model: P(W)
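The following sketch shows how phoneme models and a lexicon together can yield P(X|W). The lexicon expands a word into phonemes, the phoneme HMMs are concatenated into one left-to-right word model, and the forward algorithm sums the probability over all state paths. It assumes one emitting state per phoneme, a shared self-loop probability, and discrete observation symbols (such as vector-quantized frames) in place of continuous feature vectors. The phoneme set, lexicon entries and all probabilities are hypothetical.

```python
# Hypothetical phoneme HMMs: one emitting state per phoneme, a shared self-loop
# probability, and emission probabilities over discrete frame labels "a", "b", "c".
SELF_LOOP = 0.6
PHONE_EMISSIONS = {
    "j": {"a": 0.7, "b": 0.2, "c": 0.1},
    "E": {"a": 0.1, "b": 0.8, "c": 0.1},
    "s": {"a": 0.1, "b": 0.1, "c": 0.8},
    "n": {"a": 0.6, "b": 0.3, "c": 0.1},
    "o": {"a": 0.2, "b": 0.2, "c": 0.6},
}

# Hypothetical lexicon: word -> phoneme sequence.
LEXICON = {"yes": ["j", "E", "s"], "no": ["n", "o"]}

def word_likelihood(word: str, frames: list[str]) -> float:
    """P(X|W) via the forward algorithm over the concatenated phoneme states of `word`."""
    states = [PHONE_EMISSIONS[p] for p in LEXICON[word]]  # lexicon -> chain of phoneme models
    alpha = [0.0] * len(states)
    alpha[0] = states[0][frames[0]]  # paths must start in the first state
    for x in frames[1:]:
        new_alpha = []
        for j, emit in enumerate(states):
            stay = alpha[j] * SELF_LOOP                                # self-loop in state j
            enter = alpha[j - 1] * (1 - SELF_LOOP) if j > 0 else 0.0  # move on from state j-1
            new_alpha.append((stay + enter) * emit[x])
        alpha = new_alpha
    return alpha[-1]  # paths must end in the last state
```

With these toy numbers, `word_likelihood("yes", ["a", "b", "c"])` (about 0.0717) is higher than `word_likelihood("no", ["a", "b", "c"])` (about 0.0432). Multiplying each value by P(W) gives the quantity the recognizer maximizes.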
Training
HMMs & LMs are trained:
Speech + manual transcripts (+ lexicon) → training procedure → ASR: HMMs (Hidden Markov Models) + language models
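On the language-model side, training from manual transcripts can be as simple as counting. The sketch below estimates an unsmoothed maximum-likelihood bigram model, P(w_i | w_{i-1}) = count(w_{i-1} w_i) / count(w_{i-1}), using sentence markers `<s>` and `</s>`. The bigram order, the markers and the example sentences are assumptions. A practical LM would add smoothing for unseen word pairs. HMM training (e.g. Baum-Welch re-estimation) is not shown.

```python
from collections import Counter

def train_bigram_lm(transcripts: list[str]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood bigram estimates from manual transcripts (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for line in transcripts:
        words = ["<s>"] + line.lower().split() + ["</s>"]
        history_counts.update(words[:-1])                 # every word that precedes another
        bigram_counts.update(zip(words[:-1], words[1:]))  # adjacent word pairs
    return {(h, w): c / history_counts[h] for (h, w), c in bigram_counts.items()}

def sentence_prob(lm: dict[tuple[str, str], float], sentence: str) -> float:
    """P(W) as a product of bigram probabilities; unseen bigrams get probability 0."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for h, w in zip(words[:-1], words[1:]):
        p *= lm.get((h, w), 0.0)
    return p

if __name__ == "__main__":
    lm = train_bigram_lm(["the news starts now", "the weather starts now"])
    print(sentence_prob(lm, "the news starts now"))  # 0.5
```

The zero probability for any sentence containing an unseen word pair, such as "the sports starts now", shows why smoothing is needed in practice.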
Decoding
Automatic transcript generation:
X → ASR → W
–X: unknown speech signal
–W: the automatic transcripts
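Decoding means finding the W that maximizes P(X|W) * P(W). One standard technique is Viterbi search. For each frame it keeps only the best path into each HMM state, rather than summing over all paths, and the word whose final state ends the best path becomes the transcript. The sketch below is limited to single-word utterances and uses made-up toy models: one state per phoneme, discrete frame labels, a hypothetical two-word lexicon and a hypothetical unigram P(W). Real decoders handle word sequences, continuous features and beam pruning.

```python
import math

NEG_INF = -math.inf

# Hypothetical toy models (illustrative values, not trained).
SELF_LOOP = 0.6
PHONE_EMISSIONS = {
    "j": {"a": 0.7, "b": 0.2, "c": 0.1},
    "E": {"a": 0.1, "b": 0.8, "c": 0.1},
    "s": {"a": 0.1, "b": 0.1, "c": 0.8},
    "n": {"a": 0.6, "b": 0.3, "c": 0.1},
    "o": {"a": 0.2, "b": 0.2, "c": 0.6},
}
LEXICON = {"yes": ["j", "E", "s"], "no": ["n", "o"]}
LM = {"yes": 0.5, "no": 0.5}  # hypothetical unigram P(W)

def build_network():
    """One left-to-right chain of phoneme states per word; a state is (word, index)."""
    states, start, trans, emit, finals = [], {}, {}, {}, []
    for word, phones in LEXICON.items():
        for i, ph in enumerate(phones):
            s = (word, i)
            states.append(s)
            emit[s] = {o: math.log(p) for o, p in PHONE_EMISSIONS[ph].items()}
            trans[s] = {s: math.log(SELF_LOOP)}
            if i + 1 < len(phones):
                trans[s][(word, i + 1)] = math.log(1 - SELF_LOOP)
        start[(word, 0)] = math.log(LM[word])  # P(W) enters at the word's first state
        finals.append((word, len(phones) - 1))
    return states, start, trans, emit, finals

def viterbi_decode(frames: list[str]):
    """Return (best word, best state path) for a single-word utterance."""
    states, start, trans, emit, finals = build_network()
    delta = {s: start.get(s, NEG_INF) + emit[s][frames[0]] for s in states}
    backptrs = []
    for x in frames[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # best predecessor of s (max instead of the forward algorithm's sum)
            prev = max(states, key=lambda p: delta[p] + trans[p].get(s, NEG_INF))
            new_delta[s] = delta[prev] + trans[prev].get(s, NEG_INF) + emit[s][x]
            ptr[s] = prev
        delta = new_delta
        backptrs.append(ptr)
    best = max(finals, key=lambda s: delta[s])  # must end in a word-final state
    path = [best]
    for ptr in reversed(backptrs):  # trace the back-pointers to recover the path
        path.append(ptr[path[-1]])
    path.reverse()
    return best[0], path
```

With these numbers, `viterbi_decode(["a", "b", "c"])` returns `"yes"` with the path `("yes", 0)`, `("yes", 1)`, `("yes", 2)`, while `["a", "a", "c"]` decodes as `"no"`.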
C-3PO - 6 million languages
MUMIS