1
Automatic Speech Recognition: A summary of contributions from multiple disciplines
Mark D. Skowronski
Computational Neuro-Engineering Lab
Electrical and Computer Engineering, University of Florida
October 6, 2004
2
What is ASR? Automatic Speech Recognition is:
–A system that converts a raw acoustic signal into phonetically meaningful text.
–A combination of engineering, linguistics, statistics, psychoacoustics, and computer science.
3
What is ASR? Processing pipeline for an utterance such as “seven”: Feature extraction → Classification → Language model.
Psychoacousticians provide expert knowledge about human acoustic perception.
Engineers provide efficient algorithms and hardware.
Linguists provide language rules.
Computer scientists and statisticians provide optimum modeling.
4
Feature extraction
Acoustic-phonetic paradigm (pre-1980):
–Holistic features (voicing and frication measures, durations, formants and bandwidths)
–Difficult to construct robust classifiers
Frame-based paradigm (1980 to today):
–Short (20 ms) sliding analysis window; assumes each speech frame is quasi-stationary
–Relies on the classifier to account for speech nonstationarity
–Allows for the inclusion of expert knowledge about speech perception
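A minimal Python sketch of the frame-based paradigm, assuming numpy, 20 ms frames, and a 10 ms hop (the hop size and Hamming window are assumptions, not taken from the slide):

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal x (sample rate fs) into overlapping, windowed frames.

    Each frame is treated as quasi-stationary; assumes len(x) covers at
    least one full frame.
    """
    frame_len = int(fs * frame_ms / 1000.0)
    hop_len = int(fs * hop_ms / 1000.0)
    n_frames = 1 + (len(x) - frame_len) // hop_len
    window = np.hamming(frame_len)  # taper each frame before spectral analysis
    frames = np.stack([
        x[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```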
5
Feature extraction algorithms
Cepstrum (1962)
Linear prediction (1967)
Mel frequency cepstral coefficients (Davis & Mermelstein, 1980)
Perceptual linear prediction (Hermansky, 1990)
Human factor cepstral coefficients (Skowronski & Harris, 2002)
6
MFCC algorithm
Pipeline for an input utterance such as “seven”: x(t) → Fourier transform → Mel-scaled filter bank → Log energy → DCT → cepstral-domain coefficients.
[Slide figure: waveform plotted against time and filter-bank outputs plotted against filter number.]
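A hedged numpy/scipy sketch of this pipeline for one windowed frame; the filter count, FFT size, and the mel mapping mel(f) = 2595 * log10(1 + f/700) are common choices assumed here rather than values given on the slide:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13, n_fft=512):
    """Fourier -> mel-scaled filter bank -> log energy -> DCT for one frame."""
    # Power spectrum of the (already windowed) frame
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Triangular filters spaced uniformly on the mel scale
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fbank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

    # Log filter-bank energies, then decorrelate with a DCT
    log_energy = np.log(fbank @ spectrum + 1e-10)
    return dct(log_energy, type=2, norm='ortho')[:n_ceps]
```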
7
Classification
Operates on frame-based features
Accounts for time variations of speech
Uses training data to transform features into symbols (phonemes, bi-/tri-phones, words)
Non-parametric: Dynamic time warp (DTW)
–No parameters to estimate
–Computationally expensive, scaling issues
Parametric: Hidden Markov model (HMM)
–State-of-the-art model, complements features
–Data-intensive, scales well
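A minimal sketch of the non-parametric DTW approach, assuming numpy and a Euclidean local distance: the test utterance's feature sequence is aligned to each stored template, and the template with the lowest alignment cost wins.

```python
import numpy as np

def dtw_distance(A, B):
    """Alignment cost between two feature sequences A (n, d) and B (m, d).

    Classic O(n*m) dynamic program; allowed moves are diagonal,
    horizontal, and vertical steps.
    """
    n, m = len(A), len(B)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(A[i - 1] - B[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j - 1],  # match
                                 cost[i - 1, j],      # insertion
                                 cost[i, j - 1])      # deletion
    return cost[n, m]

# Usage: recognize by nearest template (templates: dict word -> feature matrix)
# best_word = min(templates, key=lambda w: dtw_distance(test_features, templates[w]))
```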
8
HMM classification
A Hidden Markov Model is a piecewise-stationary model of a nonstationary signal.
Model characteristics:
–States: represent domains of piecewise stationarity
–Interstate connections: define the model architecture
–Parameters: pdf means and covariances
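A small numpy sketch of how an HMM scores a feature sequence via the scaled forward algorithm; a discrete-output HMM is assumed here for brevity, whereas the models described above use continuous pdfs (means and covariances):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | model) for a discrete-output HMM via the forward algorithm.

    pi:  (N,)   initial state probabilities
    A:   (N, N) state transition matrix (the interstate connections)
    B:   (N, K) per-state observation probabilities
    obs: sequence of integer symbols in [0, K)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()              # rescale each step to avoid underflow
    alpha /= scale
    log_like = np.log(scale)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        alpha /= scale
        log_like += np.log(scale)
    return log_like

# Classification picks the word model whose log-likelihood is highest.
```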
9
HMM diagram
[Slide figure: the same utterance viewed in the time domain, in state space, and in feature space.]
10
HMM output symbols

Symbol     # Models   Positive         Negative
Word       <1000      Coarticulation   Scaling
Phoneme    40         pdf estimation   Coarticulation
Biphone    1400       (intermediate)   (intermediate)
Triphone   40K        Coarticulation   pdf estimation

TRADEOFF: finer context-dependent symbols capture coarticulation, but more models leave less training data per model for pdf estimation.
11
Language models
Consider multiple output-symbol hypotheses
Delay making a hard decision on the classifier output
Use language-based expert knowledge to predict meaningful words/phrases from the classifier's output symbols (N-phones/words)
Major research topic since the early 1990s, with the advent of large speech corpora
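A toy bigram language model sketch in Python, assuming add-one smoothing, showing how language knowledge can rescore competing classifier hypotheses before a hard decision is made:

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed log probability of a hypothesis sentence."""
    V = len(unigrams)
    tokens = ["<s>"] + sentence + ["</s>"]
    lp = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        lp += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))
    return lp

# Usage: rescore competing classifier outputs and keep the most probable one
# best = max(hypotheses, key=lambda h: log_prob(h, unigrams, bigrams))
```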
12
ASR problems
Test/train mismatch
Speaker variations (gender, accent, mood)
Weak model assumptions
Noise: energetic or informational (babble)
The current state of the art neither models the human brain nor functions with the accuracy or reliability of human listeners
Most recent progress comes from faster computers, not new ideas
13
Conclusions
Automatic speech recognition technology emerges from several diverse disciplines:
–Acousticians describe how speech is produced and perceived by humans
–Computer scientists create machine learning models for signal-to-symbol conversion
–Linguists provide language information
–Engineers optimize the algorithms, provide the hardware, and put the pieces together