Hidden Markov Model 11/28/07

Bayes Rule
The posterior distribution is $p(k \mid x) = \pi_k\, p(x \mid k) / \sum_l \pi_l\, p(x \mid l)$, where $\pi_k$ is the prior probability of class $k$. Select the class $k$ with the largest posterior probability; this minimizes the average misclassification rate. The maximum likelihood rule is equivalent to the Bayes rule with a uniform prior. The decision boundary between classes $k$ and $l$ is the set $\{x : p(k \mid x) = p(l \mid x)\}$.

Naïve Bayes approximation
When $x$ is high dimensional, it is difficult to estimate the class-conditional density $p(x \mid k)$. But if we assume the components of $x$ are independent given the class, $p(x \mid k) = \prod_j p(x_j \mid k)$, then each factor is a one-dimensional estimation problem.

Naïve Bayes Classifier
The independence assumption is usually not valid, but the naïve Bayes classifier can still be a good classifier; in practice, simple models often perform surprisingly well.
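As a concrete illustration, here is a minimal sketch of a Gaussian naïve Bayes classifier in Python (numpy only). The data, class means, and variable names are all made up for illustration; it simply applies the Bayes rule under the independence assumption above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: each class is Gaussian in 5 dimensions.
X0 = rng.normal(0.0, 1.0, size=(100, 5))
X1 = rng.normal(1.5, 1.0, size=(100, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Fit: class priors plus per-dimension Gaussian parameters.
# The independence assumption turns one 5-D density into five 1-D fits.
classes = np.unique(y)
priors = np.array([np.mean(y == k) for k in classes])
means = np.array([X[y == k].mean(axis=0) for k in classes])
stds = np.array([X[y == k].std(axis=0) for k in classes])

def log_posterior(x):
    """Unnormalized log posterior: log pi_k + sum_j log p(x_j | k)."""
    log_lik = -0.5 * np.sum(((x - means) / stds) ** 2
                            + np.log(2 * np.pi * stds ** 2), axis=1)
    return np.log(priors) + log_lik

# Bayes rule: pick the class with the largest posterior.
x_new = np.array([0.2, 0.1, -0.3, 0.8, 0.5])
print("predicted class:", classes[np.argmax(log_posterior(x_new))])
```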

Hidden Markov Model

A coin toss example
Scenario: you are betting with your friend on coin tosses, and you observe the sequence (H, T, T, H, …). But your friend is cheating: he occasionally switches from a fair coin to a biased coin (the switch happens under the table, of course). (Diagram: two states, Fair and Biased.)

A coin toss example
This is what is really happening: (H, T, H, T, H, H, H, H, T, H, H, T, …), where each toss comes from either the fair or the biased coin. Of course, you cannot see which coin produced each toss. So how can you tell that your friend is cheating?
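To make the example concrete, the sketch below simulates such a cheating-coin process in Python. The transition matrix, emission probabilities, and prior are assumed values for illustration only; the slides do not specify them.

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[0.9, 0.1],    # transition matrix: rows index the current coin;
              [0.2, 0.8]])   # e.g., the fair coin tends to stay in play
E = np.array([[0.5, 0.5],    # emission probabilities over (H, T): fair coin
              [0.8, 0.2]])   # the biased coin favors heads
prior = np.array([0.5, 0.5])

def simulate(L):
    """Draw L tosses from the hidden Markov model above."""
    xs, ys = [], []
    x = rng.choice(2, p=prior)
    for _ in range(L):
        xs.append(x)
        ys.append(rng.choice(2, p=E[x]))  # 0 = H, 1 = T
        x = rng.choice(2, p=A[x])         # switch coins (or not)
    return np.array(xs), np.array(ys)

xs, ys = simulate(20)
print("hidden:  ", "".join("FB"[x] for x in xs))
print("observed:", "".join("HT"[y] for y in ys))
```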

Hidden Markov Model
(Graphical model: a chain of hidden states, each emitting one observation. Hidden state: the coin in use. Observed variable: H or T.)

Markov Property
The hidden states form a Markov chain: $p(x_t \mid x_1, \dots, x_{t-1}) = p(x_t \mid x_{t-1})$. (Hidden state: the coin; observed variable: H or T.)

Markov Property
(State diagram: the Fair and Biased states, with transition probabilities between them and a prior distribution over the initial state.)

Observation independence
Given the current hidden state, the observation is independent of all other states and observations: $p(y_t \mid x_1, \dots, x_t, y_1, \dots, y_{t-1}) = p(y_t \mid x_t)$. The distribution $p(y_t \mid x_t)$ is called the emission probability.

Model parameters
$A = (a_{ij})$ (transition matrix); $p(y_t \mid x_t)$ (emission probabilities); $p(x_1)$ (prior distribution).

Model inference
Two settings: (1) infer the hidden states when the model parameters are known; (2) infer the hidden states when the model parameters are also unknown.

Viterbi algorithm
(Trellis diagram: the states along the vertical axis, time along the horizontal axis; arrows connect the column at time $t-1$ to the column at time $t$.)

Viterbi algorithm
Most probable path: $\hat{x}_{1:L} = \arg\max_{x_{1:L}} p(x_{1:L} \mid y_{1:L}) = \arg\max_{x_{1:L}} p(x_{1:L}, y_{1:L})$. Since $p(x_{1:t}, y_{1:t}) = p(y_t \mid x_t)\, p(x_t \mid x_{t-1})\, p(x_{1:t-1}, y_{1:t-1})$, the maximization decomposes over time. Therefore, the path can be found iteratively.

Viterbi algorithm
Let $v_k(i)$ be the probability of the most probable path ending in state $k$ at position $i$. Then $v_k(i) = p(y_i \mid x_i = k)\, \max_j \left[ v_j(i-1)\, a_{jk} \right]$.

Viterbi algorithm
Initialization ($i = 1$): $v_k(1) = p(x_1 = k)\, p(y_1 \mid x_1 = k)$.
Recursion ($i = 2, \dots, L$): $v_k(i) = p(y_i \mid x_i = k)\, \max_j [v_j(i-1)\, a_{jk}]$; record $\mathrm{ptr}_i(k) = \arg\max_j [v_j(i-1)\, a_{jk}]$.
Termination: $p(y, \hat{x}) = \max_j v_j(L)$; $\hat{x}_L = \arg\max_j v_j(L)$.
Traceback ($i = L, \dots, 2$): $\hat{x}_{i-1} = \mathrm{ptr}_i(\hat{x}_i)$.
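A direct implementation of these steps, sketched in Python with the same assumed toy coin parameters as before; working in log space avoids numerical underflow on long sequences.

```python
import numpy as np

# Assumed toy coin HMM: states 0 = Fair, 1 = Biased; symbols 0 = H, 1 = T.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.5, 0.5], [0.8, 0.2]])
prior = np.array([0.5, 0.5])
ys = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1])

L, K = len(ys), 2
logA, logE, logp = np.log(A), np.log(E), np.log(prior)

v = np.zeros((L, K))              # v[i, k] = log prob of best path ending in k at i
ptr = np.zeros((L, K), dtype=int)

v[0] = logp + logE[:, ys[0]]                  # initialization
for i in range(1, L):                         # recursion
    scores = v[i - 1][:, None] + logA         # scores[j, k] = v_j(i-1) + log a_jk
    ptr[i] = scores.argmax(axis=0)
    v[i] = scores.max(axis=0) + logE[:, ys[i]]

# Termination and traceback.
path = np.zeros(L, dtype=int)
path[-1] = v[-1].argmax()
for i in range(L - 1, 0, -1):
    path[i - 1] = ptr[i, path[i]]
print("Viterbi path:", path)  # 0 = Fair, 1 = Biased
```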

Advantage of the Viterbi path
It identifies the most probable path very efficiently, in time linear in the sequence length. The most probable path is also legitimate, i.e., it is realizable by the HMM process.

Issue with the Viterbi path
The most probable path does not come with a confidence level for each state estimate, and it may not be much more probable than other paths.

Posterior distribution
Estimate $p(x_i = k \mid y_1, \dots, y_L)$. Strategy: decompose the joint probability as $p(x_i = k, y_{1:L}) = p(y_1, \dots, y_i, x_i = k)\; p(y_{i+1}, \dots, y_L \mid x_i = k) = f_k(i)\, b_k(i)$. This is done by the forward-backward algorithm.

Forward algorithm
Estimate $f_k(i) = p(y_1, \dots, y_i, x_i = k)$.
Initialization: $f_k(1) = p(x_1 = k)\, p(y_1 \mid x_1 = k)$.
Recursion ($i = 2, \dots, L$): $f_k(i) = p(y_i \mid x_i = k) \sum_j f_j(i-1)\, a_{jk}$.
Termination: $p(y_1, \dots, y_L) = \sum_j f_j(L)$.
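The forward recursion in Python, again with the assumed toy parameters; for long sequences one would scale each step or work in log space.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed transition matrix
E = np.array([[0.5, 0.5], [0.8, 0.2]])   # assumed emission probabilities (H, T)
prior = np.array([0.5, 0.5])
ys = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1])  # observed tosses, 0 = H, 1 = T
L, K = len(ys), 2

f = np.zeros((L, K))
f[0] = prior * E[:, ys[0]]        # initialization: f_k(1) = p(x_1=k) p(y_1|k)
for i in range(1, L):             # recursion: f_k(i) = p(y_i|k) sum_j f_j(i-1) a_jk
    f[i] = E[:, ys[i]] * (f[i - 1] @ A)

print("p(y) =", f[-1].sum())      # termination: p(y) = sum_j f_j(L)
```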

Backward algorithm
Estimate $b_k(i) = p(y_{i+1}, \dots, y_L \mid x_i = k)$.
Initialization: $b_k(L) = 1$.
Recursion ($i = L-1, \dots, 1$): $b_k(i) = \sum_j a_{kj}\, p(y_{i+1} \mid x_{i+1} = j)\, b_j(i+1)$.
Termination: $p(y_1, \dots, y_L) = \sum_j p(x_1 = j)\, p(y_1 \mid x_1 = j)\, b_j(1)$.
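The matching backward recursion, as a sketch under the same assumed toy parameters; its termination step recovers the same $p(y)$ as the forward algorithm, which is a useful sanity check.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed transition matrix
E = np.array([[0.5, 0.5], [0.8, 0.2]])   # assumed emission probabilities (H, T)
prior = np.array([0.5, 0.5])
ys = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
L, K = len(ys), 2

b = np.ones((L, K))               # initialization: b_k(L) = 1
for i in range(L - 2, -1, -1):    # recursion: b_k(i) = sum_j a_kj p(y_{i+1}|j) b_j(i+1)
    b[i] = A @ (E[:, ys[i + 1]] * b[i + 1])

# Termination: p(y) = sum_j p(x_1=j) p(y_1|j) b_j(1); matches the forward result.
print("p(y) =", np.sum(prior * E[:, ys[0]] * b[0]))
```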

Probability of fair coin
(Plot: the posterior probability P(fair) at each position in the toss sequence; the vertical axis runs from 0 to 1.)

Posterior distribution
The posterior distribution predicts the confidence level of each state estimate, and it combines information from all paths. But the sequence of pointwise most probable states may not be a legitimate path (it can use transitions of probability zero).
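Combining the two recursions gives the posterior state probabilities. The sketch below, under the same assumed toy model, computes the P(fair) curve shown in the plot above.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed toy parameters, as before
E = np.array([[0.5, 0.5], [0.8, 0.2]])
prior = np.array([0.5, 0.5])
ys = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
L, K = len(ys), 2

f = np.zeros((L, K))                      # forward pass
f[0] = prior * E[:, ys[0]]
for i in range(1, L):
    f[i] = E[:, ys[i]] * (f[i - 1] @ A)

b = np.ones((L, K))                       # backward pass
for i in range(L - 2, -1, -1):
    b[i] = A @ (E[:, ys[i + 1]] * b[i + 1])

# Posterior p(x_i = k | y) = f_k(i) b_k(i) / p(y); each row sums to 1.
gamma = f * b / f[-1].sum()
print("P(fair):", np.round(gamma[:, 0], 3))
```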

Estimating parameters when the state sequence is known
Given the state sequence $\{x_i\}$, define $A_{jk}$ = number of transitions from state $j$ to state $k$, and $E_k(b)$ = number of emissions of symbol $b$ from state $k$. The maximum likelihood estimates of the parameters are $a_{jk} = A_{jk} / \sum_l A_{jl}$ and $e_k(b) = E_k(b) / \sum_{b'} E_k(b')$.
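These counting formulas in Python, for a hypothetical known state/observation pair; in practice pseudocounts are added to the counts to avoid zero estimates.

```python
import numpy as np

# Hypothetical known sequences: states 0 = Fair, 1 = Biased; symbols 0 = H, 1 = T.
xs = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1])
ys = np.array([0, 1, 0, 0, 0, 0, 1, 1, 0, 0])
K, M = 2, 2  # number of states, number of symbols

A_counts = np.zeros((K, K))      # A_jk = # transitions j -> k
for j, k in zip(xs[:-1], xs[1:]):
    A_counts[j, k] += 1

E_counts = np.zeros((K, M))      # E_k(b) = # emissions of b from state k
for k, b in zip(xs, ys):
    E_counts[k, b] += 1

# Maximum likelihood estimates: normalize each row of counts.
a_hat = A_counts / A_counts.sum(axis=1, keepdims=True)
e_hat = E_counts / E_counts.sum(axis=1, keepdims=True)
print(a_hat)
print(e_hat)
```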

Inferring hidden states together with model parameters
Two approaches: Viterbi training and the Baum-Welch algorithm.

Viterbi training
Main idea: use an iterative procedure. (1) Estimate the states for fixed parameters using the Viterbi algorithm. (2) Estimate the model parameters for the fixed states, using the counting formulas above. Repeat until the path no longer changes.

Baum-Welch algorithm
Instead of using the Viterbi path to estimate the states, consider the expected numbers of transitions and emissions under the current parameters:
$A_{kl} = \sum_i p(x_i = k, x_{i+1} = l \mid y) = \frac{1}{p(y)} \sum_i f_k(i)\, a_{kl}\, p(y_{i+1} \mid x_{i+1} = l)\, b_l(i+1)$
$E_k(b) = \sum_{i:\, y_i = b} p(x_i = k \mid y) = \frac{1}{p(y)} \sum_{i:\, y_i = b} f_k(i)\, b_k(i)$
Re-estimate the parameters from these expected counts as in the known-state case, and iterate.
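One Baum-Welch iteration for the assumed toy model, as a sketch: the E-step computes the expected counts above from the forward and backward tables, and the M-step renormalizes them. In practice this is repeated until the likelihood converges.

```python
import numpy as np

# Assumed toy coin HMM: states 0 = Fair, 1 = Biased; symbols 0 = H, 1 = T.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
E = np.array([[0.5, 0.5], [0.8, 0.2]])
prior = np.array([0.5, 0.5])
ys = np.array([0, 0, 1, 0, 0, 0, 0, 1, 1, 0])
L, K = len(ys), 2

def forward(A, E, prior, ys):
    f = np.zeros((L, K))
    f[0] = prior * E[:, ys[0]]
    for i in range(1, L):
        f[i] = E[:, ys[i]] * (f[i - 1] @ A)
    return f

def backward(A, E, ys):
    b = np.ones((L, K))
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, ys[i + 1]] * b[i + 1])
    return b

f, b = forward(A, E, prior, ys), backward(A, E, ys)
py = f[-1].sum()  # p(y_1 ... y_L)

# E-step: expected transition counts A_kl and emission counts E_k(b).
A_exp = np.zeros((K, K))
for i in range(L - 1):
    A_exp += np.outer(f[i], E[:, ys[i + 1]] * b[i + 1]) * A / py
gamma = f * b / py                  # gamma[i, k] = p(x_i = k | y)
E_exp = np.zeros((K, 2))
for sym in range(2):
    E_exp[:, sym] = gamma[ys == sym].sum(axis=0)

# M-step: re-estimate parameters from the expected counts.
A_new = A_exp / A_exp.sum(axis=1, keepdims=True)
E_new = E_exp / E_exp.sum(axis=1, keepdims=True)
print(A_new, E_new, sep="\n")
```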

Baum-Welch is a special case of the EM algorithm
Given an estimate of the parameters $\theta_t$, try to find a better $\theta$. Choose $\theta$ to maximize $Q(\theta \mid \theta_t) = \sum_x p(x \mid y, \theta_t) \log p(x, y \mid \theta)$.

Baum-Welch is a special case of the EM algorithm
E-step: calculate the Q function $Q(\theta \mid \theta_t)$. M-step: maximize $Q(\theta \mid \theta_t)$ with respect to $\theta$.

Issue with EM
EM only finds a local maximum of the likelihood. Remedies:
- Run EM multiple times from different initial guesses.
- Use a more sophisticated algorithm, such as MCMC.

Dynamic Bayesian Network
(Figure credit: Kevin Murphy.) The HMM is a special case of a dynamic Bayesian network.

Software
Kevin Murphy's Bayes Net Toolbox for Matlab (BNT/bnt.html).

Applications
Copy number changes (Yi Li).

Applications Protein-binding sites

Applications Sequence alignment

Reading list
- Hastie et al. (2001), The Elements of Statistical Learning.
- Durbin et al. (1998), Biological Sequence Analysis, Chapter 3.