CSCI 5822 Probabilistic Models of Human and Machine Learning

CSCI 5822 Probabilistic Models of Human and Machine Learning Mike Mozer Department of Computer Science and Institute of Cognitive Science University of Colorado at Boulder

Hidden Markov Models

Room Wandering I’m going to wander around my house and tell you objects I see. Your task is to infer what room I’m in at every point in time.

Observations
Objects seen: Sink, Toilet, Towel, Bed, Bookcase, Bench, Television, Couch, Pillow, …
Rooms consistent with each observation: {bathroom, kitchen, laundry room}, {bathroom}, {bedroom}, {bedroom, living room}, {bedroom, living room, entry}, {living room}, {living room, bedroom, entry}, …

Another Example: The Occasionally Corrupt Casino
A casino uses a fair die most of the time, but occasionally switches to a loaded one.
Observation probabilities
Fair die: Prob(1) = Prob(2) = … = Prob(6) = 1/6
Loaded die: Prob(1) = Prob(2) = … = Prob(5) = 1/10, Prob(6) = 1/2
Transition probabilities
Prob(Loaded | Fair) = 0.01
Prob(Fair | Loaded) = 0.2
Transitions between states obey a Markov process.
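To make the generative story concrete, here is a minimal Python sketch that simulates the occasionally corrupt casino as an HMM using the probabilities above. The function name, the assumption that the casino starts with the fair die, and the random-number setup are illustrative assumptions, not part of the slides.

```python
import numpy as np

def simulate_casino(num_rolls, rng=np.random.default_rng(0)):
    """Simulate the occasionally corrupt casino HMM; returns (states, rolls)."""
    states = ["Fair", "Loaded"]
    # Transition matrix: rows = current state, columns = next state.
    # Values follow the slide: P(Loaded | Fair) = 0.01, P(Fair | Loaded) = 0.2.
    trans = np.array([[0.99, 0.01],
                      [0.20, 0.80]])
    # Emission probabilities over die faces 1..6.
    emit = np.array([[1/6] * 6,                         # fair die
                     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])   # loaded die
    s = 0  # assumption: the casino starts with the fair die
    state_seq, rolls = [], []
    for _ in range(num_rolls):
        state_seq.append(states[s])
        rolls.append(int(rng.choice(6, p=emit[s])) + 1)  # faces are 1..6
        s = int(rng.choice(2, p=trans[s]))
    return state_seq, rolls

states, rolls = simulate_casino(20)
print(rolls)
print(states)
```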

Another Example: The Occasionally Corrupt Casino
Suppose we know how the casino operates, and we observe a series of die tosses:
3 4 1 5 2 5 6 6 6 4 6 6 6 1 5 3
Can we infer which die was used at each toss?
F F F F F F L L L L L L L F F F
Inference requires examining the whole sequence, not individual trials: your best guess about the current instant can be informed by future observations.

Formalizing This Problem
Observations over time: Y(1), Y(2), Y(3), …
Hidden (unobserved) state: S(1), S(2), S(3), …
The hidden state is discrete.
Here the observations are also discrete, but in general they can be continuous.
Y(t) depends on S(t); S(t+1) depends on S(t).

Hidden Markov Model
Markov process: given the present state, earlier observations provide no information about the future; given the present state, the past and the future are independent.

Application Domains Character recognition Word / string recognition

Application Domains Speech recognition

Application Domains
Action/Activity Recognition (uses a factorial HMM – we'll discuss).
Figures courtesy of B. K. Sin.

HMM Is A Probabilistic Generative Model
[Figure: graphical model showing the chain of hidden states and the observation emitted at each time step.]
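Written out, the generative model corresponds to the standard HMM factorization of the joint distribution over hidden states S_1…S_T and observations Y_1…Y_T (the initial distribution, transitions, and emissions are the parameters π, θ, ε used later in these slides):

```latex
P(S_{1:T}, Y_{1:T}) \;=\; P(S_1)\,\prod_{t=2}^{T} P(S_t \mid S_{t-1})\,\prod_{t=1}^{T} P(Y_t \mid S_t)
```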

Inference on HMM
Like all probabilistic generative models, an HMM can be used for several sorts of inference.
State inference and estimation
P(S(t) | Y(1), …, Y(t)): given a series of observations, what is the current hidden state?
P(S | Y): given a series of observations, what is the joint distribution over hidden states?
argmax_S P(S | Y): given a series of observations, what are the most likely values of the hidden states? (the decoding problem)
Prediction
P(Y(t+1) | Y(1), …, Y(t)): given a series of observations, what observation will come next?
Evaluation and learning
P(Y | θ, ε, π): given a series of observations, what is the probability that the observations were generated by the model?
argmax_{θ, ε, π} P(Y | θ, ε, π): what model parameters maximize the likelihood of the data?

Is Inference Hopeless?
Naive inference enumerates all possible state sequences, so its complexity is O(N^T).
[Figure: trellis of hidden states S1 … ST over observations X1 … XT, with N candidate states at each time step.]
Dynamic programming over the trellis reduces the complexity to O(T · N²).

State Inference: Forward Algorithm
Goal: compute P(S_t | Y_{1…t}) ∝ P(S_t, Y_{1…t}) ≐ α_t(S_t)
Computational complexity: O(T · N²)
Derivation on the next slide.

Deriving The Forward Algorithm
Notation change warning: n ≅ current time (was t)
[Derivation shown on the slide as an image; see the sketch below for the resulting recursion.]
Slide stolen from Dirk Husmeier
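As a concrete companion to the derivation, here is a minimal numpy sketch of the standard forward recursion, α_t(s) = P(Y_t | s) · Σ_{s'} α_{t-1}(s') P(s | s'), with per-step normalization so the values stay in range. The function name, argument layout, and normalization scheme are assumptions for illustration, not the slide's notation.

```python
import numpy as np

def forward(obs, pi, trans, emit):
    """Forward pass for a discrete-observation HMM.

    obs:   observation indices, length T
    pi:    initial state distribution, shape (N,)
    trans: trans[i, j] = P(S_t = j | S_{t-1} = i), shape (N, N)
    emit:  emit[i, k] = P(Y_t = k | S_t = i), shape (N, K)

    Returns normalized alphas (T, N) and the log-likelihood log P(Y_{1..T}).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    loglik = 0.0
    alpha[0] = pi * emit[:, obs[0]]
    for t in range(T):
        if t > 0:
            # alpha_t(s) = P(Y_t | s) * sum_{s'} alpha_{t-1}(s') * P(s | s')
            alpha[t] = emit[:, obs[t]] * (alpha[t - 1] @ trans)
        norm = alpha[t].sum()
        loglik += np.log(norm)
        alpha[t] /= norm   # after normalizing, alpha[t] is P(S_t | Y_{1..t})
    return alpha, loglik
```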

What Can We Do With α?
Notation change warning: n ≅ current time (was t)
[Equations shown on the slide as an image; the standard results are given below.]
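The equations on this slide are not in the transcript; the standard results they refer to, written in the slide's α notation with n as the current time and N as the sequence length, are the filtering distribution and the likelihood of the observed data:

```latex
P(S_n \mid Y_{1:n}) \;=\; \frac{\alpha_n(S_n)}{\sum_{s}\alpha_n(s)},
\qquad
P(Y_{1:N}) \;=\; \sum_{s}\alpha_N(s)
```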

State Inference: Forward-Backward Algorithm
Goal: compute P(S_t | Y_{1…T}) (note the capital T: we now condition on the entire observation sequence)
Derivation sketch:
The conditional is proportional to the joint, P(S_t | Y_{1…T}) ∝ P(S_t, Y_{1…T}), because Y_{1…T} is a constant with respect to S_t.
Use the chain rule to break out Y_{1…t}: P(S_t, Y_{1…T}) = P(S_t, Y_{1…t}) · P(Y_{t+1…T} | S_t, Y_{1…t}).
Because Y_{1…t} and Y_{t+1…T} are conditionally independent given S_t, the second factor simplifies to P(Y_{t+1…T} | S_t).
The first factor is α_t(S_t) from the forward algorithm; the second is computed by an analogous backward recursion.
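A complementary sketch of the backward pass and the resulting smoothed posterior γ_t(s) = P(S_t = s | Y_{1…T}). It assumes the forward() function from the earlier sketch is in scope; as before, the names and the per-step rescaling are illustrative assumptions.

```python
import numpy as np

def backward(obs, trans, emit):
    """Backward pass: beta[t, s] proportional to P(Y_{t+1..T} | S_t = s)."""
    T, N = len(obs), trans.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        # beta_t(s) = sum_{s'} P(s' | s) * P(Y_{t+1} | s') * beta_{t+1}(s')
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()   # rescale only to avoid underflow
    return beta

def smoothed_posterior(obs, pi, trans, emit):
    """gamma[t, s] = P(S_t = s | Y_{1..T}), via forward-backward."""
    alpha, _ = forward(obs, pi, trans, emit)   # forward() from the earlier sketch
    beta = backward(obs, trans, emit)
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```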

Optimal State Estimation
[Equations shown on the slide as an image.]

Viterbi Algorithm: Finding The Most Likely State Sequence
Notation change warning: n ≅ current time step (previously t); N ≅ total number of time steps (previously T)
γ (gamma): the probability of the best state sequence up to the current time that ends in a given state; it is computed like α, except with a max in place of the sum over previous states.
Slide stolen from Dirk Husmeier

Viterbi Algorithm
Relation between the Viterbi and forward algorithms: Viterbi uses a max operator where the forward algorithm uses a summation operator.
The state sequence can be recovered by remembering the best S at each step n.
Practical issue: a long chain of probabilities leads to numerical underflow; compute with logarithms (see next slide).

Practical Trick: Operate With Logarithms
Notation change warning: n ≅ current time step (previously t); N ≅ total number of time steps (previously T)
Working in log space prevents numerical underflow, and the math works out so that log γ can be computed incrementally.
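A compact sketch combining the two preceding slides: the Viterbi recursion (max instead of sum, with backpointers) carried out entirely in log space to prevent underflow. Function and variable names are illustrative assumptions.

```python
import numpy as np

def viterbi(obs, pi, trans, emit):
    """Most likely state sequence, argmax_S P(S | Y), computed with log probabilities."""
    T, N = len(obs), len(pi)
    log_trans = np.log(trans)
    log_emit = np.log(emit)
    log_gamma = np.zeros((T, N))        # best log prob of a path ending in state s at step t
    back = np.zeros((T, N), dtype=int)  # backpointers used to recover the best path
    log_gamma[0] = np.log(pi) + log_emit[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = log_gamma[t-1, i] + log P(j | i); maximize over previous state i
        scores = log_gamma[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        log_gamma[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    # Backtrace from the best final state
    path = [int(log_gamma[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))
```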

Training HMM Parameters
The Baum-Welch algorithm, a special case of Expectation-Maximization (EM):
1. Make an initial guess at the model parameters {π, θ, ε}.
2. Given the observation sequence, compute the hidden state posteriors P(S_t | Y_{1…T}, π, θ, ε) for t = 1 … T.
3. Update the model parameters {π, θ, ε} based on the inferred states, then repeat from step 2 until convergence.
Each iteration is guaranteed to move uphill in the total probability of the observation sequence, P(Y_{1…T} | π, θ, ε), but the algorithm may get stuck in local optima.
Model parameter updates: next slide.

Updating Model Parameters
[Update equations shown on the slide as an image; the standard forms are given below.]
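The equations on this slide are not in the transcript. The standard Baum-Welch M-step, written with the slide's parameters (assuming π is the initial-state distribution, θ the transition probabilities, and ε the discrete emission probabilities) and the posteriors γ_t(i) = P(S_t = i | Y_{1:T}) and ξ_t(i, j) = P(S_t = i, S_{t+1} = j | Y_{1:T}), is:

```latex
\pi_i \leftarrow \gamma_1(i), \qquad
\theta_{ij} \leftarrow \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\epsilon_{ik} \leftarrow \frac{\sum_{t:\,Y_t = k} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}
```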

Using HMM For Classification
Suppose we want to recognize spoken digits 0, 1, …, 9.
Each HMM is a model of the production of one digit, and specifies P(Y | M_i), where Y is the observed acoustic sequence (note: Y can be a continuous random variable) and M_i is the model for digit i.
We want to compute the model posteriors P(M_i | Y), which we obtain with Bayes' rule.
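Written out, the model posterior combines the per-model likelihoods (each computable with the forward algorithm) with the model priors:

```latex
P(M_i \mid Y) \;=\; \frac{P(Y \mid M_i)\,P(M_i)}{\sum_{j} P(Y \mid M_j)\,P(M_j)}
```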

Factorial HMM

Tree-Structured HMM
Has an input as well as an output (e.g., the control signal is X, the response is Y).

The Landscape
Discrete state space: HMM
Continuous state space, linear dynamics: Kalman filter (exact inference)
Continuous state space, nonlinear dynamics: particle filter (approximate inference)