Probabilistic reasoning over time Ch. 15, 17. Probabilistic reasoning over time So far, we’ve mostly dealt with episodic environments –Exceptions: games.

Slides:

Advertisements

Similar presentations

Lecture 16 Hidden Markov Models. HMM Until now we only considered IID data. Some data are of sequential nature, i.e. have correlations have time. Example:

Advertisements

Particle Filtering Sometimes |X| is too big to use exact inference

Speech Recognition Part 3 Back end processing. Speech recognition simplified block diagram Speech Capture Speech Capture Feature Extraction Feature Extraction.

Hidden Markov Models. Room Wandering I’m going to wander around my house and tell you objects I see. Your task is to infer what room I’m in at every point.

Dynamic Bayesian Networks (DBNs)

Lirong Xia Approximate inference: Particle filter Tue, April 1, 2014.

Hidden Markov Models Reading: Russell and Norvig, Chapter 15, Sections

Hidden Markov Models. A Hidden Markov Model consists of 1.A sequence of states {X t |t  T } = {X 1, X 2,..., X T }, and 2.A sequence of observations.

Chapter 15 Probabilistic Reasoning over Time. Chapter 15, Sections 1-5 Outline Time and uncertainty Inference: ltering, prediction, smoothing Hidden Markov.

Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.

Ch-9: Markov Models Prepared by Qaiser Abbas ( )

Hidden Markov Models Theory By Johan Walters (SR 2003)

1 Hidden Markov Models (HMMs) Probabilistic Automata Ubiquitous in Speech/Speaker Recognition/Verification Suitable for modelling phenomena which are dynamic.

Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.

Advanced Artificial Intelligence

Bayes Filters Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics TexPoint fonts used in EMF. Read the.

1 Reasoning Under Uncertainty Over Time CS 486/686: Introduction to Artificial Intelligence Fall 2013.

HMMs Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew.

Application of HMMs: Speech recognition “Noisy channel” model of speech.

Probabilistic reasoning over time So far, we’ve mostly dealt with episodic environments –One exception: games with multiple moves In particular, the Bayesian.

… Hidden Markov Models Markov assumption: Transition model:

Probabilistic inference

PatReco: Hidden Markov Models Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall

CS 188: Artificial Intelligence Fall 2009 Lecture 20: Particle Filtering 11/5/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.

Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.

CS 188: Artificial Intelligence Fall 2009 Lecture 21: Speech Recognition 11/10/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.

Hidden Markov Models K 1 … 2. Outline Hidden Markov Models – Formalism The Three Basic Problems of HMMs Solutions Applications of HMMs for Automatic Speech.

1 Hidden Markov Model Instructor : Saeed Shiry  CHAPTER 13 ETHEM ALPAYDIN © The MIT Press, 2004.

CPSC 322, Lecture 31Slide 1 Probability and Time: Markov Models Computer Science cpsc322, Lecture 31 (Textbook Chpt 6.5) March, 25, 2009.

Markov Models. Markov Chain A sequence of states: X 1, X 2, X 3, … Usually over time The transition from X t-1 to X t depends only on X t-1 (Markov Property).

Hidden Markov Models David Meir Blei November 1, 1999.

Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.

Rutgers CS440, Fall 2003 Introduction to Statistical Learning Reading: Ch. 20, Sec. 1-4, AIMA 2 nd Ed.

CS 188: Artificial Intelligence Fall 2009 Lecture 19: Hidden Markov Models 11/3/2009 Dan Klein – UC Berkeley.

CSE 515 Statistical Methods in Computer Science Instructor: Pedro Domingos.

CHAPTER 15 SECTION 3 – 4 Hidden Markov Models. Terminology.

A brief overview of Speech Recognition and Spoken Language Processing Advanced NLP Guest Lecture August 31 Andrew Rosenberg.

Recap: Reasoning Over Time  Stationary Markov models  Hidden Markov models X2X2 X1X1 X3X3 X4X4 rainsun X5X5 X2X2 E1E1 X1X1 X3X3 X4X4 E2E2 E3E3.

Sequence Models With slides by me, Joshua Goodman, Fei Xia.

22CS 338: Graphical User Interfaces. Dario Salvucci, Drexel University. Lecture 10: Advanced Input.

CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov

UIUC CS 498: Section EA Lecture #21 Reasoning in Artificial Intelligence Professor: Eyal Amir Fall Semester 2011 (Some slides from Kevin Murphy (UBC))

CS 188: Artificial Intelligence Fall 2007 Lecture 22: Viterbi 11/13/2007 Dan Klein – UC Berkeley.

Speech, Perception, & AI Artificial Intelligence CMSC February 13, 2003.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

Hidden Markov Models & POS Tagging Corpora and Statistical Methods Lecture 9.

CS Statistical Machine learning Lecture 24

1 CONTEXT DEPENDENT CLASSIFICATION  Remember: Bayes rule  Here: The class to which a feature vector belongs depends on:  Its own value  The values.

CHAPTER 8 DISCRIMINATIVE CLASSIFIERS HIDDEN MARKOV MODELS.

The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)

CPSC 422, Lecture 15Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 15 Oct, 14, 2015.

QUIZ!!  In HMMs...  T/F:... the emissions are hidden. FALSE  T/F:... observations are independent given no evidence. FALSE  T/F:... each variable X.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Elements of a Discrete Model Evaluation.

Maximum Entropy Model, Bayesian Networks, HMM, Markov Random Fields, (Hidden/Segmental) Conditional Random Fields.

Statistical Models for Automatic Speech Recognition Lukáš Burget.

CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Reestimation Equations Continuous Distributions.

Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.

Instructor: Eyal Amir Grad TAs: Wen Pu, Yonatan Bisk Undergrad TAs: Sam Johnson, Nikhil Johri CS 440 / ECE 448 Introduction to Artificial Intelligence.

Automatic Speech Recognition

Statistical Models for Automatic Speech Recognition

CS 188: Artificial Intelligence Spring 2007

CSE 515 Statistical Methods in Computer Science

CS 188: Artificial Intelligence Fall 2008

Statistical Models for Automatic Speech Recognition

LECTURE 15: REESTIMATION, EM AND MIXTURES

Speech recognition, machine learning

Speech Recognition: Acoustic Waves

Speech recognition, machine learning

Presentation transcript:

Probabilistic reasoning over time Ch. 15, 17

Probabilistic reasoning over time So far, we’ve mostly dealt with episodic environments –Exceptions: games with multiple moves, planning In particular, the Bayesian networks we’ve seen so far describe static situations –Each random variable gets a single fixed value in a single problem instance Now we consider the problem of describing probabilistic environments that evolve over time –Examples: robot localization, tracking, speech, …

Hidden Markov Models At each time slice t, the state of the world is described by an unobservable variable X t and an observable evidence variable E t Transition model: distribution over the current state given the whole past history: P(X t | X 0, …, X t-1 ) = P(X t | X 0:t-1 ) Observation model: P(E t | X 0:t, E 1:t-1 ) X0X0 E1E1 X1X1 E t-1 X t-1 EtEt XtXt … E2E2 X2X2

Hidden Markov Models Markov assumption (first order) –The current state is conditionally independent of all the other states given the state in the previous time step –What does P(X t | X 0:t-1 ) simplify to? P(X t | X 0:t-1 ) = P(X t | X t-1 ) Markov assumption for observations –The evidence at time t depends only on the state at time t –What does P(E t | X 0:t, E 1:t-1 ) simplify to? P(E t | X 0:t, E 1:t-1 ) = P(E t | X t ) X0X0 E1E1 X1X1 E t-1 X t-1 EtEt XtXt … E2E2 X2X2

state evidence Example

state evidence Example Transition model Observation model

An alternative visualization R t = TR t = F R t-1 = T R t-1 = F U t = TU t = F R t = T R t = F Transition probabilities Observation (emission) probabilities R=TR=F U=T: 0.9 U=F: 0.1 U=T: 0.9 U=F: 0.1 U=T: 0.2 U=F: 0.8 U=T: 0.2 U=F: 0.8

Another example States: X = {home, office, cafe} Observations: E = {sms, facebook, } Slide credit: Andy White

The Joint Distribution Transition model: P(X t | X 0:t-1 ) = P(X t | X t-1 ) Observation model: P(E t | X 0:t, E 1:t-1 ) = P(E t | X t ) How do we compute the full joint P(X 0:t, E 1:t )? X0X0 E1E1 X1X1 E t-1 X t-1 EtEt XtXt … E2E2 X2X2

Review: Bayes net inference Computational complexity Special cases Parameter learning

Review: HMMs Transition model: P(X t | X 0:t-1 ) = P(X t | X t-1 ) Observation model: P(E t | X 0:t, E 1:t-1 ) = P(E t | X t ) How do we compute the full joint P(X 0:t, E 1:t )? X0X0 E1E1 X1X1 E t-1 X t-1 EtEt XtXt … E2E2 X2X2

HMM inference tasks Filtering: what is the distribution over the current state X t given all the evidence so far, e 1:t ? –The forward algorithm X0X0 E1E1 X1X1 E t-1 X t-1 EtEt XtXt … EkEk XkXk Query variable Evidence variables …

HMM inference tasks Filtering: what is the distribution over the current state X t given all the evidence so far, e 1:t ? Smoothing: what is the distribution of some state X k given the entire observation sequence e 1:t ? –The forward-backward algorithm X0X0 E1E1 X1X1 E t-1 X t-1 EtEt … EkEk XkXk … XtXt

HMM inference tasks Filtering: what is the distribution over the current state X t given all the evidence so far, e 1:t ? Smoothing: what is the distribution of some state X k given the entire observation sequence e 1:t ? Evaluation: compute the probability of a given observation sequence e 1:t X0X0 E1E1 X1X1 E t-1 X t-1 EtEt … EkEk XkXk … XtXt

HMM inference tasks Filtering: what is the distribution over the current state X t given all the evidence so far, e 1:t Smoothing: what is the distribution of some state X k given the entire observation sequence e 1:t ? Evaluation: compute the probability of a given observation sequence e 1:t Decoding: what is the most likely state sequence X 0:t given the observation sequence e 1:t ? –The Viterbi algorithm X0X0 E1E1 X1X1 E t-1 X t-1 EtEt … EkEk XkXk … XtXt

HMM Learning and Inference Inference tasks –Filtering: what is the distribution over the current state X t given all the evidence so far, e 1:t –Smoothing: what is the distribution of some state X k given the entire observation sequence e 1:t ? –Evaluation: compute the probability of a given observation sequence e 1:t –Decoding: what is the most likely state sequence X 0:t given the observation sequence e 1:t ? Learning –Given a training sample of sequences, learn the model parameters (transition and emission probabilities) EM algorithm

Applications of HMMs Speech recognition HMMs: –Observations are acoustic signals (continuous valued) –States are specific positions in specific words (so, tens of thousands) Machine translation HMMs: –Observations are words (tens of thousands) –States are translation options Robot tracking: –Observations are range readings (continuous) –States are positions on a map (continuous) Source: Tamara Berg

Application of HMMs: Speech recognition “Noisy channel” model of speech

Speech feature extraction Acoustic wave form Sampled at 8KHz, quantized to 8-12 bits Spectrogram Time Frequency Amplitude Frame (10 ms or 80 samples) Feature vector ~39 dim.

Speech feature extraction Acoustic wave form Sampled at 8KHz, quantized to 8-12 bits Spectrogram Time Frequency Amplitude Frame (10 ms or 80 samples) Feature vector ~39 dim.

Phonetic model Phones: speech sounds Phonemes: groups of speech sounds that have a unique meaning/function in a language (e.g., there are several different ways to pronounce “t”)

Phonetic model

HMM models for phones HMM states in most speech recognition systems correspond to subphones –There are around 60 phones and as many as 60 3 context-dependent triphones

HMM models for words

Putting words together Given a sequence of acoustic features, how do we find the corresponding word sequence?

Decoding with the Viterbi algorithm

Reference D. Jurafsky and J. Martin, “Speech and Language Processing,” 2 nd ed., Prentice Hall, 2008