Speech Recognition. What makes speech recognition hard?

Presentation transcript:

Speech Recognition

What makes speech recognition hard?

Speech Recognition Task: Identify the sequence of words uttered by a speaker, given the acoustic waveform. Uncertainty is introduced by noise, speaker error, variation in pronunciation, homonyms, etc. Speech recognition is therefore viewed as a problem of probabilistic inference.

Example: “I’m firsty, um, can I haf somefing to dwink?” From Russell and Norvig, Artificial Intelligence
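The inference view above can be made concrete with a toy Bayes-rule calculation: given the mispronounced sound "firsty", we weigh how likely each candidate word is to produce that sound against how likely the word is in the first place. All numbers below are made up for illustration.

```python
# Toy illustration (hypothetical numbers): choose between candidate words
# for the observed sound "firsty" using Bayes' rule,
#   P(word | sound) proportional to P(sound | word) * P(word).
acoustic = {"thirsty": 0.30, "first": 0.05}   # P("firsty" | word), assumed
prior    = {"thirsty": 0.02, "first": 0.10}   # P(word), assumed

posterior = {w: acoustic[w] * prior[w] for w in acoustic}
total = sum(posterior.values())
posterior = {w: p / total for w, p in posterior.items()}  # normalize

best = max(posterior, key=posterior.get)
```

With these numbers "thirsty" wins even though "first" has a higher prior, because its acoustic match is much better.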

Speech Recognition System Architecture (from the Buchsbaum & Giancarlo paper). Here, “lattice” means “Hidden Markov Model.” Components: acoustic feature extraction; acoustic features–>phones model; phones–>word pronunciation model; language model.
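The architecture can be sketched as a composition of stages. All stubs below are hypothetical placeholders; real systems search over all stages jointly rather than running them strictly one after another.

```python
# Sketch of the pipeline stages as function composition (hypothetical stubs).
def extract_features(waveform):
    # acoustic feature extraction: waveform -> frames of feature vectors
    return ["f1", "f2"]

def features_to_phones(frames):
    # acoustic model: feature frames -> phone sequence (a lattice in practice)
    return ["dh", "ax"]

def phones_to_words(phones):
    # pronunciation model: phone sequence -> word hypotheses
    return ["the"]

def rescore_with_lm(words):
    # language model: reweight word hypotheses (identity stub here)
    return words

def recognize(waveform):
    return rescore_with_lm(phones_to_words(features_to_phones(extract_features(waveform))))
```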

Acoustic feature extraction From Russell and Norvig, Artificial Intelligence
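Feature extraction typically slices the waveform into short overlapping windows and computes a spectral representation per window. A minimal sketch, assuming common parameters (25 ms windows, 10 ms hop, 16 kHz sampling); real front ends go on to compute MFCCs or similar features.

```python
# Minimal sketch of frame-based acoustic feature extraction (assumed
# parameters: 25 ms windows, 10 ms hop, 16 kHz sample rate).
import numpy as np

def frame_signal(signal, sample_rate=16000, win_ms=25, hop_ms=10):
    win = int(sample_rate * win_ms / 1000)   # 400 samples per window
    hop = int(sample_rate * hop_ms / 1000)   # 160 samples between windows
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop : i * hop + win] for i in range(n_frames)])
    return frames * np.hamming(win)          # taper each window

def log_power_spectrum(frames):
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spectrum + 1e-10)          # small floor avoids log(0)

signal = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
feats = log_power_spectrum(frame_signal(signal))
```

One second of audio yields 98 frames of 201 spectral bins each under these settings.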

Hidden Markov Models. Markov model: given state X_t, what is the probability of transitioning to the next state X_{t+1}? E.g., word bigram probabilities give P(word_{t+1} | word_t). Hidden Markov model: there are observable states (e.g., the signal S) and “hidden” states (e.g., the words). The HMM represents the probabilities of the hidden states given the observable states.
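An HMM is fully specified by start, transition, and emission probabilities; the joint probability of one hidden path and one observation sequence is the product of the corresponding terms. All states, symbols, and numbers below are made up for illustration.

```python
# Minimal HMM sketch (illustrative numbers): hidden phone states emit
# observable acoustic symbols.
start      = {"th": 0.6, "ax": 0.4}                      # P(first state)
transition = {"th": {"th": 0.2, "ax": 0.8},
              "ax": {"th": 0.3, "ax": 0.7}}              # P(X_{t+1} | X_t)
emission   = {"th": {"o1": 0.9, "o2": 0.1},
              "ax": {"o1": 0.2, "o2": 0.8}}              # P(obs | state)

def joint_prob(states, observations):
    # P(states, observations) = start * emission * prod(transition * emission)
    p = start[states[0]] * emission[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

p = joint_prob(["th", "ax"], ["o1", "o2"])  # 0.6 * 0.9 * 0.8 * 0.8
```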

Phone model: P(phone | frame features) = α P(frame features | phone) P(phone). P(frame features | phone) is often represented by a Gaussian mixture model.
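A Gaussian mixture density is a weighted sum of Gaussian components. A one-dimensional sketch with illustrative parameters (real acoustic models use many feature dimensions and many components per phone):

```python
# Sketch of a Gaussian mixture emission model for one phone
# (1-D, illustrative parameters).
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_likelihood(x, weights, means, variances):
    # P(feature | phone) as a weighted sum of Gaussian components
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

lik = gmm_likelihood(0.5, weights=[0.6, 0.4],
                     means=[0.0, 1.0], variances=[1.0, 0.5])
```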

From Russell and Norvig, Artificial Intelligence Acoustic Features–>Phones model

Word pronunciation model. Now we want P(words | phones_{1:t}) = α P(phones_{1:t} | words) P(words). Represent P(phones_{1:t} | words) as an HMM. Phones–>word pronunciation model.

Example of phones–>word pronunciation model. From Russell and Norvig, Artificial Intelligence
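A pronunciation model can be sketched as a dictionary mapping each word to its phone-sequence variants with probabilities, combined with a word prior via Bayes' rule. All entries and numbers below are illustrative.

```python
# Sketch of a pronunciation model (illustrative entries and numbers).
pron = {
    "tomato": {("t", "ah", "m", "ey", "t", "ow"): 0.7,
               ("t", "ah", "m", "aa", "t", "ow"): 0.3},  # two variants
    "the":    {("dh", "ax"): 0.8, ("dh", "iy"): 0.2},
}
word_prior = {"tomato": 0.001, "the": 0.05}              # P(word), assumed

def posterior_over_words(phones):
    # P(word | phones) proportional to P(phones | word) * P(word)
    scores = {w: variants.get(phones, 0.0) * word_prior[w]
              for w, variants in pron.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else scores

post = posterior_over_words(("dh", "ax"))
```

Here only "the" has a variant matching the observed phones, so it takes all the posterior mass.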

Language model
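The language model slide (an image in the original) assigns probabilities to word sequences; the standard approach factors them into n-gram terms. A minimal bigram sketch with made-up probabilities:

```python
# Minimal bigram language model sketch (probabilities are made up):
# P(word sequence) = product of P(word_t | word_{t-1}).
bigram = {("<s>", "i"): 0.2, ("i", "am"): 0.3,
          ("am", "thirsty"): 0.05, ("i", "is"): 0.001}

def sentence_prob(words):
    p = 1.0
    for prev, cur in zip(["<s>"] + words, words):
        p *= bigram.get((prev, cur), 1e-6)   # tiny floor for unseen bigrams
    return p

good = sentence_prob(["i", "am", "thirsty"])
bad  = sentence_prob(["i", "is", "thirsty"])
```

The grammatical sentence scores far higher than the ungrammatical one, which is how the language model helps the recognizer pick among acoustically similar hypotheses.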

To build a speech recognition system, we need: lots of data; acoustic signal processing tools; methods for learning the various probability models; and methods for “maximum likelihood” calculation (i.e., search or “decoding”). Suppose we have observations (features from the acoustic signal) O = (o_1 o_2 o_3 … o_n). We want to find W* = (w_1 w_2 w_3 … w_n) such that

W* = argmax_W P(W | O) = argmax_W P(O | W) P(W)

Components: language model; combined phone, segmentation, and word pronunciation models; a search or “decoding” method.

Emotion recognition in speech (by OES high-school students!)