CS 188: Artificial Intelligence, Fall 2009
Lecture 21: Speech Recognition
11/10/2009
Dan Klein – UC Berkeley

Announcements
 Written 3 due on Thursday night
   Extra OHs before then: see web page
   Review session? TBA
 Project 4 up! Due 11/19
 Course contest update
   You can qualify for the final tournament starting tonight!

Today
 HMMs: most likely explanation queries
 Speech recognition
   A massive HMM!
   Details of this section not required
 Start machine learning

Speech and Language
 Speech technologies
   Automatic speech recognition (ASR)
   Text-to-speech synthesis (TTS)
   Dialog systems
 Language processing technologies
   Machine translation
   Information extraction
   Web search, question answering
   Text classification, spam filtering, etc.

HMMs: MLE Queries
 HMMs are defined by
   States X
   Observations E
   Initial distribution: P(X_1)
   Transitions: P(X_t | X_{t-1})
   Emissions: P(E_t | X_t)
 Query: most likely explanation: argmax_{x_{1:T}} P(x_{1:T} | e_{1:T})
[Figure: HMM chain X_1 ... X_5 with observations E_1 ... E_5]

State Path Trellis
 State trellis: graph of states and transitions over time
 Each arc represents some transition
 Each arc has a weight (transition probability times emission probability)
 Each path is a sequence of states
 The product of weights on a path is that sequence's probability
 Can think of the Forward (and now Viterbi) algorithms as computing sums of all paths (best paths) in this graph
[Figure: trellis over states {sun, rain} at successive time steps]

Viterbi Algorithm
 Compute the best (highest-probability) path through the trellis with a dynamic program:
   m_t(x_t) = max_{x_{1:t-1}} P(x_{1:t-1}, x_t, e_{1:t}) = P(e_t | x_t) max_{x_{t-1}} P(x_t | x_{t-1}) m_{t-1}(x_{t-1})
[Figure: trellis over states {sun, rain}]
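To make the recursion concrete, here is a minimal Viterbi sketch in Python for a two-state weather HMM; the state names, observation names, and all probabilities are made up for illustration and are not from the slides.

```python
# Minimal Viterbi sketch for a tiny weather HMM (illustrative numbers only).
states = ["sun", "rain"]
init = {"sun": 0.5, "rain": 0.5}                       # P(X_1)
trans = {"sun": {"sun": 0.9, "rain": 0.1},             # P(X_t | X_{t-1})
         "rain": {"sun": 0.3, "rain": 0.7}}
emit = {"sun": {"no-umbrella": 0.8, "umbrella": 0.2},  # P(E_t | X_t)
        "rain": {"no-umbrella": 0.1, "umbrella": 0.9}}

def viterbi(observations):
    # m[x] = probability of the best state sequence ending in x, given evidence so far
    m = {x: init[x] * emit[x][observations[0]] for x in states}
    backpointers = []
    for e in observations[1:]:
        prev = {x: max(states, key=lambda xp: m[xp] * trans[xp][x]) for x in states}
        m = {x: m[prev[x]] * trans[prev[x]][x] * emit[x][e] for x in states}
        backpointers.append(prev)
    # Trace the best path backwards from the most likely final state.
    best = max(states, key=lambda x: m[x])
    path = [best]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    return list(reversed(path)), m[best]

print(viterbi(["umbrella", "umbrella", "no-umbrella"]))
```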

Example

Digitizing Speech

Speech in an Hour
 Speech input is an acoustic waveform
[Figure: waveform of "speech lab" segmented into the phones s, p, ee, ch, l, a, b, with a close-up of the "l" to "a" transition. Graphs from Simon Arnfield's web tutorial on speech, Sheffield.]

Spectral Analysis
 Frequency gives pitch; amplitude gives volume
   Sampling at ~8 kHz for telephone, ~16 kHz for microphone (kHz = 1000 cycles/sec)
 Fourier transform of the wave displayed as a spectrogram
   Darkness indicates energy at each frequency
[Figure: waveform and spectrogram of "speech lab" (s, p, ee, ch, l, a, b), amplitude vs. time above, frequency vs. time below]

Adding 100 Hz + 1000 Hz Waves
[Figure: the sum of a 100 Hz and a 1000 Hz sine wave]

Spectrum
[Figure: spectrum of the summed wave, amplitude vs. frequency in Hz, with the two frequency components (100 and 1000 Hz) visible as peaks on the x-axis]
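As a small added sketch (not from the slides), the two components can be synthesized and recovered with a discrete Fourier transform; the sampling rate and relative amplitudes here are arbitrary choices.

```python
import numpy as np

# Synthesize one second of the summed 100 Hz + 1000 Hz wave at a 16 kHz sampling rate.
fs = 16000
t = np.arange(fs) / fs
wave = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# The magnitude spectrum shows peaks at exactly the two component frequencies.
spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1 / fs)
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))   # -> [100.0, 1000.0]
```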

Part of [ae] from "lab"
 Note the complex wave repeating nine times in the figure
 Plus smaller waves which repeat 4 times for every large pattern
 The large wave has a frequency of 250 Hz (9 cycles in 0.036 seconds)
 The small wave is roughly 4 times this, or roughly 1000 Hz
 Two tiny waves ride on top of each peak of the 1000 Hz waves

Back to Spectra
 The spectrum represents these frequency components
 Computed by the Fourier transform, an algorithm which separates out each frequency component of a wave
 The x-axis shows frequency, the y-axis shows magnitude (in decibels, a log measure of amplitude)
 Peaks at 930 Hz, 1860 Hz, and 3020 Hz
[demo]

Resonances of the vocal tract
 The human vocal tract as an open tube
 Air in a tube of a given length will tend to vibrate at the resonance frequencies of the tube
 Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end
[Figure: tube closed at one end and open at the other, length 17.5 cm. Figure from W. Barry's Speech Science slides]
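A quick check of the numbers (an added worked example, not from the slides): a tube closed at one end and open at the other resonates at odd multiples of c / 4L. Taking the speed of sound c ≈ 350 m/s and L = 17.5 cm:

  f_n = (2n - 1) · c / (4L)
  f_1 = 350 / (4 × 0.175) = 500 Hz,  f_2 = 1500 Hz,  f_3 = 2500 Hz

which is roughly the formant pattern of a neutral vowel.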

[Figure from Mark Liberman's website] [demo]

Acoustic Feature Sequence
 Time slices are translated into acoustic feature vectors (~39 real numbers per slice)
 These are the observations; now we need the hidden states X
[Figure: spectrogram with frequency on the y-axis, sliced into a sequence of observation vectors ..., e_12, e_13, e_14, e_15, e_16, ...]
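The slides don't spell out the feature pipeline; as a rough illustration only, here is a sketch that cuts a waveform into short overlapping slices and uses a log-magnitude spectrum of each slice as a stand-in for the ~39-dimensional MFCC-style features real front ends compute. Frame and step sizes are common but assumed values.

```python
import numpy as np

def acoustic_features(wave, fs=16000, frame_ms=25, step_ms=10):
    """Cut a waveform into overlapping slices; return one feature vector per slice."""
    frame = int(fs * frame_ms / 1000)   # samples per slice (400 at 16 kHz)
    step = int(fs * step_ms / 1000)     # samples between slice starts (160)
    window = np.hanning(frame)
    features = []
    for start in range(0, len(wave) - frame + 1, step):
        spectrum = np.abs(np.fft.rfft(wave[start:start + frame] * window))
        features.append(np.log(spectrum + 1e-8))   # log energy per frequency bin
    return np.array(features)                      # shape: (num_slices, frame // 2 + 1)

# e.g. one second of audio -> about 98 observation vectors e_1, e_2, ...
print(acoustic_features(np.random.randn(16000)).shape)
```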

State Space
 P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
 P(X|X') encodes how sounds can be strung together
 We will have one state for each sound in each word
 From some state x, we can only:
   Stay in the same state (e.g. speaking slowly)
   Move to the next position in the word
   At the end of the word, move to the start of the next word
 We build a little state graph for each word and chain them together to form our state space X (a rough sketch follows below)
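A rough sketch (not the course's code) of how per-word state chains with self-loops could be wired into transition probabilities; the word list, phone inventories, and stay/advance probabilities are made up.

```python
# Toy left-to-right state graph: one state per sound position in each word,
# with a self-loop (stay) and an advance transition (illustrative probabilities).
words = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
stay, advance = 0.6, 0.4

transitions = {}   # transitions[state] = {next_state: probability}
for word, phones in words.items():
    for i, phone in enumerate(phones):
        state = (word, i, phone)
        if i + 1 < len(phones):
            nxt = (word, i + 1, phones[i + 1])
            transitions[state] = {state: stay, nxt: advance}
        else:
            # Last state of the word: spread the advance mass over word starts
            # (uniformly here; a bigram language model would reweight this).
            starts = [(w, 0, ps[0]) for w, ps in words.items()]
            transitions[state] = {state: stay}
            for s in starts:
                transitions[state][s] = advance / len(starts)

print(transitions[("yes", 2, "s")])
```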

HMMs for Speech

Transitions with Bigrams
[Figure from Huang et al.]

Decoding
 While there are some practical issues, finding the words given the acoustics is an HMM inference problem
 We want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}:
   x*_{1:T} = argmax_{x_{1:T}} P(x_{1:T} | e_{1:T}) = argmax_{x_{1:T}} P(x_{1:T}, e_{1:T})
 From the sequence x, we can simply read off the words

End of Part II!
 Now we're done with our unit on probabilistic reasoning
 Last part of class: machine learning

Parameter Estimation
 Estimating the distribution of a random variable
 Elicitation: ask a human!
   Usually need domain experts, and sophisticated ways of eliciting probabilities (e.g. betting games)
   Trouble calibrating
 Empirically: use training data
   For each outcome x, look at the empirical rate of that value: P_ML(x) = count(x) / total samples
   e.g. for the sample r, g, g: P_ML(r) = 1/3, P_ML(g) = 2/3
   This is the estimate that maximizes the likelihood of the data (see the sketch below)
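A tiny sketch of the empirical (maximum likelihood) estimate, applied to the r, g, g sample above:

```python
from collections import Counter

def ml_estimate(samples):
    # Relative frequency of each outcome: count(x) / total number of samples.
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

print(ml_estimate(["r", "g", "g"]))   # -> {'r': 0.333..., 'g': 0.666...}
```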

Estimation: Smoothing
 Relative frequencies are the maximum likelihood estimates: they choose the parameters that maximize the probability of the observed data
 In Bayesian statistics, we think of the parameters as just another random variable, with its own distribution, and estimate by maximizing the posterior over parameters given the data

Estimation: Laplace Smoothing
 Laplace's estimate: pretend you saw every outcome once more than you actually did
   P_LAP(x) = (count(x) + 1) / (N + |X|)
   e.g. for the sample H, H, T: P_ML(H) = 2/3, but P_LAP(H) = 3/5
 Can derive this as a MAP estimate with Dirichlet priors (see cs281a)
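A minimal sketch of the basic (add-one) Laplace estimate, checked on the H, H, T sample:

```python
from collections import Counter

def laplace_estimate(samples, outcomes):
    # Pretend every outcome was seen once more than it actually was.
    counts = Counter(samples)
    total = len(samples) + len(outcomes)
    return {x: (counts[x] + 1) / total for x in outcomes}

print(laplace_estimate(["H", "H", "T"], outcomes=["H", "T"]))   # -> {'H': 0.6, 'T': 0.4}
```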

Estimation: Laplace Smoothing
 Laplace's estimate (extended): pretend you saw every outcome k extra times
   P_LAP,k(x) = (count(x) + k) / (N + k|X|)
 What's Laplace with k = 0? The maximum likelihood estimate
 k is the strength of the prior
 Laplace for conditionals: smooth each condition independently
   P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|)
A sketch of the extended and conditional versions follows below.
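A small sketch of the k-extended estimate and of smoothing each condition independently; the coin names and k values are illustrative only.

```python
from collections import Counter

def laplace_k(samples, outcomes, k=1):
    # Pretend every outcome was seen k extra times; k = 0 recovers the ML estimate.
    counts = Counter(samples)
    total = len(samples) + k * len(outcomes)
    return {x: (counts[x] + k) / total for x in outcomes}

def laplace_conditional(pairs, x_outcomes, k=1):
    # Smooth P(x | y) separately for each observed condition y.
    by_y = {}
    for x, y in pairs:
        by_y.setdefault(y, []).append(x)
    return {y: laplace_k(xs, x_outcomes, k) for y, xs in by_y.items()}

print(laplace_k(["H", "H", "T"], ["H", "T"], k=100))   # large k pulls toward uniform
print(laplace_conditional([("H", "coin1"), ("T", "coin1"), ("H", "coin2")], ["H", "T"]))
```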