CSE5230 Data Mining, 2002 - Lecture 10.1
Data Mining - CSE5230
Hidden Markov Models (HMMs)
CSE5230/DMS/2002/10

CSE5230 Data Mining, 2002 - Lecture 10.2: Lecture Outline
- Time- and space-varying processes
- First-order Markov models
- Hidden Markov models
- Examples: coin toss experiments
- Formal definition
- Use of HMMs for classification
- References

CSE5230 Data Mining, 2002 - Lecture 10.3: Time- and Space-varying Processes (1)
- The data mining techniques we have discussed so far have focused on the classification, prediction or characterization of single data points, e.g.:
  - Assigning a record to one of a set of classes
    - Decision trees, back-propagation neural networks, Bayesian classifiers, etc.
  - Predicting the value of a field in a record given the values of the other fields
    - Regression, back-propagation neural networks, etc.
  - Finding regions of feature space where data points are densely grouped
    - Clustering, self-organizing maps

CSE5230 Data Mining, 2002 - Lecture 10.4: Time- and Space-varying Processes (2)
- In the methods we have considered so far, we have assumed that each observed data point is statistically independent of the observation that preceded it, e.g.:
  - Classification: the class of data point x_t is not influenced by the class of x_{t-1} (or indeed any other data point)
  - Prediction: the value of a field for a record depends only on the values of the other fields of that record, not on values in any other records
- Several important real-world data mining problems cannot be modeled in this way.

CSE5230 Data Mining, 2002 - Lecture 10.5: Time- and Space-varying Processes (3)
- We often encounter sequences of observations, where each observation may depend on the observations which preceded it
- Examples:
  - Sequences of phonemes (fundamental sounds) in speech (speech recognition)
  - Sequences of letters or words in text (text categorization, information retrieval, text mining)
  - Sequences of web page accesses (web usage mining)
  - Sequences of bases (C, G, A, T) in DNA (genome projects: human, fruit fly, etc.)
  - Sequences of pen-strokes (handwriting recognition)
- In all these cases, the probability of observing a particular value in the sequence can depend on the values which came before it

CSE5230 Data Mining, 2002 - Lecture 10.6: Example: web log
- Consider the following extract from a web log:

    xxx - - [16/Sep/2002:14:50: ] "GET /courseware/cse5230/ HTTP/1.1"
    xxx - - [16/Sep/2002:14:50: ] "GET /courseware/cse5230/html/research_paper.html HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/html/tutorials.html HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/citation.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/clustering.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:51: ] "GET /courseware/cse5230/assets/images/NeuralNetworksTute.pdf HTTP/1.1"
    xxx - - [16/Sep/2002:14:52: ] "GET /courseware/cse5230/html/lectures.html HTTP/1.1"
    xxx - - [16/Sep/2002:14:52: ] "GET /courseware/cse5230/assets/images/week03.ppt HTTP/1.1"
    xxx - - [16/Sep/2002:14:52: ] "GET /courseware/cse5230/assets/images/week06.ppt HTTP/1.1"

- Clearly, the URL which is requested depends on the URL which was requested before
  - If the user uses the "Back" button in his/her browser, the requested URL may depend on earlier URLs in the sequence too
- Given a particular observed URL, we can calculate the probabilities of observing each of the other possible URLs next
  - Note that we may even observe the same URL next.
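As an illustration (not part of the original slides), the following Python sketch estimates first-order transition probabilities between URLs from an ordered list of requested paths; the function name and the short example sequence are hypothetical.

```python
from collections import defaultdict

def estimate_transitions(requests):
    """Estimate P(next URL | current URL) from an ordered list of requested URLs."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(requests, requests[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {url: n / total for url, n in nexts.items()}
    return probs

# Hypothetical sequence of page requests from one session
requests = [
    "/courseware/cse5230/",
    "/courseware/cse5230/html/tutorials.html",
    "/courseware/cse5230/html/lectures.html",
    "/courseware/cse5230/html/tutorials.html",
]
print(estimate_transitions(requests))
```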

CSE5230 Data Mining, 2002 - Lecture 10.7: First-Order Markov Models (1)
- In order to model processes such as these, we make use of the idea of states. At any time t, we consider the system to be in state w(t).
- We can consider a sequence of successive states of length T:
    w^T = {w(1), w(2), ..., w(T)}
- We model the production of such a sequence using transition probabilities:
    a_ij = P(w(t+1) = w_j | w(t) = w_i)
  i.e. the probability that the system will be in state w_j at time t+1 given that it was in state w_i at time t
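A minimal sketch (not from the original slides) of how such transition probabilities can be represented and used to generate a state sequence; the state names and probability values are illustrative only.

```python
import random

# Illustrative transition probabilities a_ij = P(next state j | current state i)
states = ["A", "B"]
A = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}

def sample_sequence(start, length):
    """Generate a state sequence w(1), ..., w(T) from a first-order Markov model."""
    seq = [start]
    for _ in range(length - 1):
        current = seq[-1]
        nxt = random.choices(states, weights=[A[current][s] for s in states])[0]
        seq.append(nxt)
    return seq

print(sample_sequence("A", 10))
```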

CSE5230 Data Mining, 2002 - Lecture 10.8: First-Order Markov Models (2)
- A model of states and transition probabilities, such as the one we have just described, is called a Markov model.
- Since we have assumed that the transition probabilities depend only on the previous state, this is a first-order Markov model
  - Higher-order Markov models are possible, but we will not consider them here.
- For example, Markov models for human speech could have states corresponding to phonemes
  - A Markov model for the word "cat" would have states for /k/, /a/, /t/ and a final silent state

CSE5230 Data Mining, 2002 - Lecture 10.9: Example: Markov model for "cat"
[State diagram: states /k/ -> /a/ -> /t/ -> /silent/]

CSE5230 Data Mining, 2002 - Lecture 10.10: Hidden Markov Models
- In the preceding example, we have said that the states correspond to phonemes
- In a speech recognition system, however, we don't have access to phonemes - we can only measure properties of the sound produced by a speaker
- In general, our observed data does not correspond directly to a state of the model: the data corresponds to the visible states of the system
  - The visible states are directly accessible for measurement.
- The system can also have internal "hidden" states, which cannot be observed directly
  - For each hidden state, there is a probability of observing each visible state.
- This sort of model is called a Hidden Markov Model (HMM)

CSE5230 Data Mining, 2002 - Lecture 10.11: Example: coin toss experiments
- Let us imagine a scenario where we are in a room which is divided in two by a curtain.
- We are on one side of the curtain, and on the other is a person who will carry out a procedure using coins resulting in a head (H) or a tail (T).
- When the person has carried out the procedure, they call out the result, H or T, which we record.
- This system will allow us to generate a sequence of Hs and Ts, e.g.

    HHTHTHTTHTTTTTHHTHHHHTHHHTTHHHHHHTTT
    TTTTTHTHHTHTTTTTHHTHTHHHTHTHHTTTTHHT
    TTHHTHHTTTHTHTHTHTHHHTHHTTHT ...

CSE5230 Data Mining, 2002 - Lecture 10.12: Example: single fair coin
- Imagine that the person behind the curtain has a single fair coin (i.e. it has equal probabilities of coming up heads or tails)
- We could model the process producing the sequence of Hs and Ts as a Markov model with two states, and equal transition probabilities:
  [State diagram: states H and T, all transition probabilities 0.5, including self-transitions]
- Note that here the visible states correspond exactly to the internal states - the model is not hidden
- Note also that states can transition to themselves

CSE5230 Data Mining, 2002 - Lecture 10.13: Example: a fair and a biased coin
- Now let us imagine a more complicated scenario. The person behind the curtain has two coins, one fair and one biased (for example, P(T) = 0.9)
  1. The person starts by picking a coin at random
  2. The person tosses the coin, and calls out the result (H or T)
  3. If the result was H, the person switches coins
  4. Go back to step 2, and repeat.
- This process generates sequences like:

    TTTTTTTTTTTTTTTTTTTTTTTTHHTTTTTTTHHTTTTTTT
    TTTTTTTTTTTTTTTHHTTTTTTTTTHTTHTTHHTTTTTHHT
    TTTTTTTTTHHTTTTTTTTHTHHHTTTTTTTTTTTTTTHHTT
    TTTTTHTHTTTTTTTHHTTTTT...

- Note this looks quite different from the sequence for the fair coin example.
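A small simulation sketch of this procedure (not from the original slides); the bias P(T) = 0.9 follows the example above, and the function name is ours.

```python
import random

def generate_sequence(length, p_tail_biased=0.9):
    """Simulate the two-coin procedure: toss, call out H/T, switch coins after an H."""
    coins = {"fair": 0.5, "biased": p_tail_biased}   # P(T) for each coin
    coin = random.choice(list(coins))                # step 1: pick a coin at random
    output = []
    for _ in range(length):
        result = "T" if random.random() < coins[coin] else "H"  # step 2: toss, call out result
        output.append(result)
        if result == "H":                            # step 3: switch coins after an H
            coin = "biased" if coin == "fair" else "fair"
    return "".join(output)

print(generate_sequence(80))
```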

CSE5230 Data Mining, 2002 - Lecture 10.14: Example: a fair and a biased coin
- In this scenario, the visible state no longer corresponds exactly to the hidden state of the system:
  - Visible state: output of H or T
  - Hidden state: which coin was tossed
- We can model this process using an HMM:
  [State diagram: hidden states Fair and Biased, visible states H and T, with transition and emission probabilities]

CSE5230 Data Mining, 2002 - Lecture 10.15: Example: a fair and a biased coin
- We see from the diagram on the preceding slide that we have extended our model
  - The visible states are shown in blue, and the emission probabilities are shown too.
- As well as internal states w(t) and state transition probabilities a_ij, we have visible states v(t) and emission probabilities b_jk
  - Note that the b_jk do not need to be related to the a_ij as they are in the example above.
- A full model such as this is called a Hidden Markov Model

CSE5230 Data Mining, 2002 - Lecture 10.16: HMM: formal definition
- We can now give a more formal definition of a first-order Hidden Markov Model (adapted from [RaJ1986]):
  - There is a finite number of (internal) states, N
  - At each time t, a new state is entered, based upon a transition probability distribution which depends on the state at time t - 1. Self-transitions are allowed
  - After each transition is made, a symbol is output, according to a probability distribution which depends only on the current state. There are thus N such probability distributions.
- Estimating the number of states N and the transition and emission probabilities are complex issues, but solutions do exist.
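To make the definition concrete, here is a minimal sketch (not from the original slides) of how an HMM's parameters - the N internal states, the transition probabilities a_ij, the emission probabilities b_jk and an initial state distribution - might be represented. The state names echo the coin example above, but the probability values are made up for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HMM:
    states: List[str]       # N internal (hidden) states
    symbols: List[str]      # visible output symbols
    pi: List[float]         # initial state distribution
    A: List[List[float]]    # A[i][j] = a_ij = P(state j at t+1 | state i at t)
    B: List[List[float]]    # B[j][k] = b_jk = P(symbol k | state j)

# Illustrative two-state HMM over outputs H and T (numbers are made up)
coin_hmm = HMM(
    states=["Fair", "Biased"],
    symbols=["H", "T"],
    pi=[0.5, 0.5],
    A=[[0.5, 0.5],
       [0.1, 0.9]],
    B=[[0.5, 0.5],    # "Fair" state: P(H) = 0.5, P(T) = 0.5
       [0.1, 0.9]],   # "Biased" state: P(H) = 0.1, P(T) = 0.9
)
```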

CSE5230 Data Mining, 2002 - Lecture 10.17: Use of HMMs
- We have now seen what sorts of processes can be modeled using HMMs, and how an HMM is specified mathematically.
- We now consider how HMMs are actually used.
- Consider the two H and T sequences we saw in the previous examples:
  - How could we decide which coin-toss system was most likely to have produced each sequence?
  - To which system would you assign these sequences?
    1: TTTHHTTTTTTTTTTTTTHHTTTTTTHHTTTHH
    2: THHTTTHHHTTHTHTTHTHHTTHHHTTHTHTHT
    3: THHTHTHTHTHHHTTHTTTHHTTHTTTTTHHHT
    4: HTTTHTTHTTTTHTTTHHTTHTHTTTTTTTTHT
- We can answer this question using a Bayesian formulation (see last week's lecture)

CSE5230 Data Mining, 2002 - Lecture 10.18: Use of HMMs for classification
- HMMs are often used to classify sequences
- To do this, a separate HMM is built and trained (i.e. the parameters are estimated) for each class of sequence in which we are interested
  - e.g. we might have an HMM for each word in a speech recognition system. The hidden states would correspond to phonemes, and the visible states to measured sound features
- For a given observed sequence v^T, we estimate the probability that each HMM M_l generated it:
    P(M_l | v^T) proportional to P(v^T | M_l) P(M_l)
- We assign the sequence to the model with the highest posterior probability.
- The algorithms for calculating these probabilities are beyond the scope of this unit, but can be found in the references.
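One standard way of computing the likelihood P(v^T | M_l) is the forward algorithm (covered in the references, not in these slides). The sketch below is ours: it assumes the simple list-based parameterization used earlier (initial distribution pi, transition matrix A, emission matrix B), and the probability values are illustrative only.

```python
def forward_probability(pi, A, B, symbols, observations):
    """Compute P(observations | model) with the forward algorithm.
    alpha[j] holds P(v(1..t), state j at time t | model) for the current t."""
    obs = [symbols.index(o) for o in observations]
    N = len(pi)
    # Initialisation: alpha_1(j) = pi_j * b_j(v(1))
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(v(t+1))
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
    # Termination: P(v^T | model) = sum over final states of alpha_T(j)
    return sum(alpha)

# Illustrative two-state model (hidden states: fair coin, biased coin); numbers are made up
pi = [0.5, 0.5]
A = [[0.5, 0.5], [0.1, 0.9]]
B = [[0.5, 0.5], [0.1, 0.9]]      # rows: states, columns: symbols H, T
symbols = ["H", "T"]
print(forward_probability(pi, A, B, symbols, "TTTHHTTTTTTTTTTTTTHHTTTTTTHHTTTHH"))
```

A sequence would then be assigned to whichever trained model M_l gives the highest posterior probability, as described above.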

CSE5230 Data Mining, 2002 - Lecture 10.19: References
- [DHS2000] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification (2nd Edn), Wiley, New York, NY, 2000, pp.
- [RaJ1986] L. R. Rabiner and B. H. Juang, An introduction to hidden Markov models, IEEE Magazine on Acoustics, Speech and Signal Processing, 3, 1, pp. 4-16, January 1986.