1
Automatic Speech Recognition: A summary of contributions from multiple disciplines
Mark D. Skowronski
Computational Neuro-Engineering Lab
Electrical and Computer Engineering, University of Florida
October 6, 2004
2
What is ASR? Automatic Speech Recognition is:
–A system that converts a raw acoustic signal into phonetically meaningful text.
–A combination of engineering, linguistics, statistics, psychoacoustics, and computer science.
3
What is ASR? Processing pipeline for an utterance such as “seven”: Feature extraction → Classification → Language model.
Psychoacousticians provide expert knowledge about human acoustic perception.
Engineers provide efficient algorithms and hardware.
Linguists provide language rules.
Computer scientists and statisticians provide optimum modeling.
4
Feature extraction
Acoustic-phonetic paradigm (pre-1980):
–Holistic features (voicing and frication measures, durations, formants and bandwidths)
–Difficult to construct robust classifiers
Frame-based paradigm (1980 to today):
–Short (20 ms) sliding analysis window; assumes each speech frame is quasi-stationary
–Relies on the classifier to account for speech nonstationarity
–Allows for the inclusion of expert knowledge about speech perception
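A minimal Python sketch of the frame-based paradigm, assuming numpy, 20 ms frames, and a 10 ms hop (the hop size and Hamming window are assumptions, not taken from the slide):

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal x (sample rate fs) into overlapping, windowed frames.

    Each frame is treated as quasi-stationary; assumes len(x) covers at
    least one full frame.
    """
    frame_len = int(fs * frame_ms / 1000.0)
    hop_len = int(fs * hop_ms / 1000.0)
    n_frames = 1 + (len(x) - frame_len) // hop_len
    window = np.hamming(frame_len)  # taper each frame before spectral analysis
    frames = np.stack([
        x[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```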
5
Feature extraction algorithms
Cepstrum (1962)
Linear prediction (1967)
Mel frequency cepstral coefficients (Davis & Mermelstein, 1980)
Perceptual linear prediction (Hermansky, 1990)
Human factor cepstral coefficients (Skowronski & Harris, 2002)
6
MFCC algorithm
Pipeline for an input utterance such as “seven”: x(t) → Fourier transform → Mel-scaled filter bank → Log energy → DCT → cepstral-domain coefficients.
[Slide figure: waveform plotted against time and filter-bank outputs plotted against filter number.]
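A hedged numpy/scipy sketch of this pipeline for one windowed frame; the filter count, FFT size, and the mel mapping mel(f) = 2595 * log10(1 + f/700) are common choices assumed here rather than values given on the slide:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13, n_fft=512):
    """Fourier -> mel-scaled filter bank -> log energy -> DCT for one frame."""
    # Power spectrum of the (already windowed) frame
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Triangular filters spaced uniformly on the mel scale
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

    # Log filter-bank energies, then decorrelate with a DCT
    log_energy = np.log(fbank @ spectrum + 1e-10)
    return dct(log_energy, type=2, norm='ortho')[:n_ceps]
```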
7
Classification
Operates on frame-based features
Accounts for time variations of speech
Uses training data to transform features into symbols (phonemes, bi-/tri-phones, words)
Non-parametric: Dynamic time warp (DTW)
–No parameters to estimate
–Computationally expensive, scaling issues
Parametric: Hidden Markov model (HMM)
–State-of-the-art model, complements features
–Data-intensive, scales well
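A minimal sketch of the non-parametric DTW approach, assuming numpy and a Euclidean local distance: the test utterance's feature sequence is aligned to each stored template, and the template with the lowest alignment cost wins.

```python
import numpy as np

def dtw_distance(A, B):
    """Alignment cost between two feature sequences A (n, d) and B (m, d).

    Classic O(n*m) dynamic program; allowed moves are diagonal,
    horizontal, and vertical steps.
    """
    n, m = len(A), len(B)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(A[i - 1] - B[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # insertion
                                 cost[i, j - 1])      # deletion
    return cost[n, m]

# Usage: recognize by nearest template (templates: dict word -> feature matrix)
# best_word = min(templates, key=lambda w: dtw_distance(test_features, templates[w]))
```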
8
HMM classification
A Hidden Markov Model is a piecewise-stationary model of a nonstationary signal.
Model characteristics:
–States: represent domains of piecewise stationarity
–Interstate connections: define the model architecture
–Parameters: pdf means and covariances
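A small numpy sketch of how an HMM scores a feature sequence via the scaled forward algorithm; a discrete-output HMM is assumed here for brevity, whereas the models described above use continuous pdfs (means and covariances):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | model) for a discrete-output HMM via the forward algorithm.

    pi:  (N,)   initial state probabilities
    A:   (N, N) state transition matrix (the interstate connections)
    B:   (N, K) per-state observation probabilities
    obs: sequence of integer symbols in [0, K)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()              # rescale each step to avoid underflow
    alpha /= scale
    log_like = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        alpha /= scale
        log_like += np.log(scale)
    return log_like

# Classification picks the word model whose log-likelihood is highest.
```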
9
HMM diagram
[Slide figure: the same utterance viewed in the time domain, in state space, and in feature space.]
10
HMM output symbols

Symbol     # Models   Positive         Negative
Word       <1000      Coarticulation   Scaling
Phoneme    40         pdf estimation   Coarticulation
Biphone    1400       (intermediate)   (intermediate)
Triphone   40K        Coarticulation   pdf estimation

TRADEOFF: finer context-dependent symbols capture coarticulation, but more models leave less training data per model for pdf estimation.
11
Language models
Consider multiple output-symbol hypotheses
Delay making a hard decision on the classifier output
Use language-based expert knowledge to predict meaningful words/phrases from the classifier's output symbols (N-phones/words)
Major research topic since the early 1990s, with the advent of large speech corpora
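A toy bigram language model sketch in Python, assuming add-one smoothing, showing how language knowledge can rescore competing classifier hypotheses before a hard decision is made:

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed log probability of a hypothesis sentence."""
    V = len(unigrams)
    tokens = ["<s>"] + sentence + ["</s>"]
    lp = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        lp += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))
    return lp

# Usage: rescore competing classifier outputs and keep the most probable one
# best = max(hypotheses, key=lambda h: log_prob(h, unigrams, bigrams))
```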
12
ASR problems
Test/train mismatch
Speaker variations (gender, accent, mood)
Weak model assumptions
Noise: energetic or informational (babble)
The current state of the art neither models the human brain nor functions with the accuracy or reliability of human listeners
Most recent progress comes from faster computers, not new ideas
13
Conclusions
Automatic speech recognition technology emerges from several diverse disciplines:
–Acousticians describe how speech is produced and perceived by humans
–Computer scientists create machine learning models for signal-to-symbol conversion
–Linguists provide language information
–Engineers optimize the algorithms, provide the hardware, and put the pieces together