Automatic Speech Recognition

Presentation transcript:

Automatic Speech Recognition

The Malevolent HAL

We're Not Quite There Yet (and Lucky for Us)
But what is an error?

The Model
Speech is variable: an acoustic utterance will never match any model exactly. Conclusion: speech recognition is a special case of Bayesian inference.

Goal of a Probabilistic Noisy Channel Architecture
What is the most likely sequence of words W, out of all word sequences in the language L, given some acoustic input O?
Where O is a sequence of observations, O = o1, o2, o3, ..., ot: each oi is a vector of floating-point values representing ~10 ms of energy in that slice of O.
And W = w1, w2, w3, ..., wn: each wi is a word in L.

ASR as a Conditional Probability
Ŵ = argmax over W in L of P(W | O)
We have to invoke you-know-who.

Bayes' Rule lets us transform
Ŵ = argmax over W in L of P(W | O)
into
Ŵ = argmax over W in L of P(O | W) P(W) / P(O)
In fact, since P(O) is the same for every candidate W, we can drop the denominator:
Ŵ = argmax over W in L of P(O | W) P(W)
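To make the decision rule concrete, here is a minimal sketch of noisy-channel decoding in Python. The candidate list and the two scoring callables are hypothetical placeholders, not part of the lecture: a real recognizer scores P(O|W) frame by frame with HMMs rather than with a single function call.

```python
import math

def decode(observations, candidates, acoustic_logprob, lm_logprob):
    """Pick the word sequence W maximizing P(O|W) * P(W).

    Works in log space to avoid underflow. P(O) is ignored because it
    is constant across candidates. `acoustic_logprob` and `lm_logprob`
    are placeholder callables returning log probabilities.
    """
    best, best_score = None, -math.inf
    for words in candidates:
        score = acoustic_logprob(observations, words) + lm_logprob(words)
        if score > best_score:
            best, best_score = words, score
    return best
```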

New Terms
Feature Extraction
- The acoustic waveform is sampled into frames (10, 15, or 20 ms)
- Each frame is transformed into a vector of 39 features
Acoustic Model, or Phone Recognition (likelihoods)
- Computes the likelihood of the observed features given linguistic units (words, phones, triphones): P(O | W)
- Output is a sequence of probabilities, one for each time frame, containing the likelihoods that each linguistic unit generated the acoustic feature vector
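As an illustration of the 39-dimensional feature vector, here is a sketch using the librosa library (one common choice, not the one assumed by these slides); the file name utterance.wav is a placeholder. The classic 39 features are 13 cepstral coefficients plus their first- and second-order time derivatives.

```python
import numpy as np
import librosa

# Load audio at 16 kHz, a common ASR sampling rate.
y, sr = librosa.load("utterance.wav", sr=16000)

# 25 ms analysis windows every 10 ms: one feature vector per 10 ms frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, win_length=400, hop_length=160)

# Append first- and second-order deltas: 13 + 13 + 13 = 39 features.
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)
features = np.vstack([mfcc, delta, delta2])  # shape: (39, n_frames)
```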

New Wine in Old Bottles
Language Modeling (priors)
- bigrams/trigrams/quadrigrams of words in a lexicon: P(W)
Lexicon
- A list of words, each with a pronunciation expressed as a phone sequence
Decoding (Viterbi)
- Combines the acoustic model, language model, and lexicon to produce the most probable sequence of words; see the sketch after this list
Training
- Estimating the HMM parameters (filling in the lattice) using the Baum-Welch (forward-backward) algorithm
- Observations: acoustic signals, information about the waveform at that point in time
- Hidden states: phones/triphones
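Since decoding is where these pieces meet, a generic Viterbi sketch may help. It assumes the state graph has already been built (in ASR, by composing the lexicon and language model with per-phone HMMs) and works on plain log-probability arrays.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most probable hidden-state path for one observation sequence.

    log_init:  (S,)   log initial-state probabilities
    log_trans: (S, S) log transition probabilities between states
    log_emit:  (T, S) log emission scores, one row per time frame
               (in ASR these come from the acoustic model)
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]       # best score ending in each state
    back = np.zeros((T, S), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + log_trans  # score of every i -> j move
        back[t] = np.argmax(cand, axis=0)  # best predecessor of each j
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]       # trace the best path backwards
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```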

The Architecture

It's More Complex
A phone can extend beyond 1 s: at 10 ms per frame, that's more than 100 frames. But those frames are not acoustically identical.

[ay k], ~0.45 s. Notice that F2 rises and F1 falls during the vowel, and the difference between the silence (closure) and release portions of [k].

Conclusion
Phones are non-homogeneous over time, so each phone is modeled with three subphone states: beginning, middle, and end. Example: six = [s ih k s].
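A left-to-right transition matrix makes the beginning/middle/end structure explicit; the probability values below are illustrative, not trained.

```python
import numpy as np

# Left-to-right ("Bakis") topology for one phone: a state may only
# loop on itself or advance, so the zeros below the diagonal enforce
# the beginning -> middle -> end ordering.
transitions = np.array([
    [0.6, 0.4, 0.0],   # beginning: self-loop or advance to middle
    [0.0, 0.7, 0.3],   # middle:    self-loop or advance to end
    [0.0, 0.0, 1.0],   # end:       self-loop (phone exit handled separately)
])
assert np.allclose(transitions.sum(axis=1), 1.0)  # each row is a distribution
```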

The Formal Model
B = bi(ot) = p(ot | qi): the probability of feature vector ot being generated by (emitted from) subphone state qi.
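As one concrete, deliberately simplified reading of bi(ot), here is a subphone state with a single diagonal-covariance Gaussian; classical systems use Gaussian mixtures, and modern ones use neural networks. The mean and variance values are illustrative, not trained.

```python
import numpy as np
from scipy.stats import multivariate_normal

class SubphoneState:
    """One HMM state q_i with a single diagonal-covariance Gaussian."""

    def __init__(self, mean, var):
        self.dist = multivariate_normal(mean=mean, cov=np.diag(var))

    def log_b(self, o_t):
        """log b_i(o_t) = log p(o_t | q_i) for one 39-dim feature vector."""
        return self.dist.logpdf(o_t)

# Score a random 39-dimensional observation against an untrained state.
state = SubphoneState(mean=np.zeros(39), var=np.ones(39))
print(state.log_b(np.random.randn(39)))
```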