1 CS6825: Recognition 8. Hidden Markov Models
2 Hidden Markov Model (HMM) HMMs allow you to estimate probabilities of unobserved events. E.g., in speech recognition, the observed data is the acoustic signal and the words are the hidden parameters you are trying to figure out.
3 HMMs and their Usage HMMs are very common in Computational Linguistics:
- Speech recognition (observed: acoustic signal, hidden: words)
- Handwriting recognition (observed: image, hidden: words)
- Part-of-speech tagging (observed: words, hidden: part-of-speech tags)
- Machine translation (observed: foreign words, hidden: words in target language)
4 Hidden Markov Models In each state the model emits a measurement, with probability depending on the state and the measurement. We observe these measurements, not the states.
5 Hidden Markov Models... example Elements of sign language understanding:
- the speaker makes a sequence of signs
- some signs are more common than others
- the next sign depends (roughly, and probabilistically) only on the current sign
- there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values
Many problems share these properties; tracking, for example, is like this.
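The properties above can be sketched as a generative model. This is a minimal, self-contained sketch with hypothetical signs, probabilities, and measurement symbols (none of them from the lecture): the next sign depends only on the current sign, and each sign emits a noisy measurement.

```python
import random

random.seed(0)

# Toy two-sign HMM (hypothetical values): states are signs, and each
# state emits a noisy measurement with a state-dependent distribution.
states = ["HELLO", "THANKS"]
# P(next sign | current sign): the next sign depends only on the current one.
transition = {"HELLO": {"HELLO": 0.3, "THANKS": 0.7},
              "THANKS": {"HELLO": 0.6, "THANKS": 0.4}}
# P(measurement | sign): measurements are inaccurate, so each sign
# spreads probability over several observation symbols.
emission = {"HELLO": {"wave": 0.8, "flat": 0.2},
            "THANKS": {"wave": 0.1, "flat": 0.9}}
initial = {"HELLO": 0.5, "THANKS": 0.5}

def sample(dist):
    """Draw one outcome from a {value: probability} dictionary."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value  # numerical slack

def generate(length):
    """Sample a hidden sign sequence and its observed measurements."""
    state = sample(initial)
    hidden, observed = [], []
    for _ in range(length):
        hidden.append(state)
        observed.append(sample(emission[state]))
        state = sample(transition[state])
    return hidden, observed

hidden, observed = generate(5)
print(hidden)    # the signs (hidden in a real system)
print(observed)  # the measurements we actually see
```

In a real recognizer only `observed` is available; inferring `hidden` from it is exactly the HMM inference problem the following slides address.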
6 HMMs - dynamics
7 HMMs - the Joint and Inference
8 Trellises Each column corresponds to a measurement in the sequence. The trellis makes the collection of legal paths obvious. We would like the path with the smallest negative log-posterior (i.e., the most probable path), and the trellis makes this easy, as follows.
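The dynamic program over the trellis is the Viterbi algorithm. A minimal sketch on a hypothetical two-state HMM (the states, probabilities, and observation symbols are invented for illustration): each column of the trellis is one measurement, each node a state, and we minimise the negative log-posterior of the path.

```python
import math

# Hypothetical two-state HMM for illustration.
states = ["A", "B"]
initial = {"A": 0.6, "B": 0.4}
transition = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emission = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

def viterbi(observations):
    """DP over the trellis: minimise the negative log-posterior."""
    # cost[s] = smallest negative log-probability of any path ending in
    # state s after the observations seen so far.
    cost = {s: -math.log(initial[s]) - math.log(emission[s][observations[0]])
            for s in states}
    back = []  # back-pointers, one trellis column per later observation
    for obs in observations[1:]:
        pointers, new_cost = {}, {}
        for s in states:
            # Best predecessor for state s at this column.
            prev = min(states,
                       key=lambda p: cost[p] - math.log(transition[p][s]))
            pointers[s] = prev
            new_cost[s] = (cost[prev] - math.log(transition[prev][s])
                           - math.log(emission[s][obs]))
        back.append(pointers)
        cost = new_cost
    # Trace the best path backwards through the trellis.
    state = min(states, key=cost.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

print(viterbi(["x", "x", "y", "y"]))  # -> ['A', 'A', 'B', 'B']
```

Working in negative log space turns products of probabilities into sums, which avoids numerical underflow on long sequences and makes the path cost additive along trellis edges.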
9
10 Fitting an HMM I have: a sequence of measurements, a collection of states, and a topology. I want: state transition probabilities and measurement emission probabilities. This is a straightforward application of EM: the discrete variables give a state for each measurement, and the M step is just averaging, etc.
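The "M step is just averaging" remark can be made concrete. A sketch with hypothetical state and observation sequences: here we use hard state assignments so the averaging is visible (in full EM the E step would supply expected, fractional counts instead, but the normalisation is the same).

```python
from collections import Counter, defaultdict

# Hypothetical hard assignment of a state to each measurement.
states_seq = ["rain", "rain", "sun", "sun", "sun", "rain"]
obs_seq    = ["wet",  "wet",  "dry", "dry", "wet", "wet"]

def m_step(states_seq, obs_seq):
    """Estimate transition and emission probabilities by counting
    and normalising -- the 'just averaging' of the M step."""
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for prev, nxt in zip(states_seq, states_seq[1:]):
        trans_counts[prev][nxt] += 1
    for s, o in zip(states_seq, obs_seq):
        emit_counts[s][o] += 1
    transition = {s: {t: c / sum(cs.values()) for t, c in cs.items()}
                  for s, cs in trans_counts.items()}
    emission = {s: {o: c / sum(cs.values()) for o, c in cs.items()}
                for s, cs in emit_counts.items()}
    return transition, emission

transition, emission = m_step(states_seq, obs_seq)
print(transition["rain"])  # P(next state | rain)
print(emission["sun"])     # P(observation | sun)
```

EM alternates this step with an E step that re-estimates the (soft) state assignments given the current probabilities, until convergence.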
11 HMMs for sign language understanding - 1 Build an HMM for each word.
12 HMMs for sign language understanding - 2 Build an HMM for each word, then build a language model.
13 Figure from "Real time American sign language recognition using desk and wearable computer based video," T. Starner et al., Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE. User gesturing. For both isolated word recognition tasks and for recognition using a language model with five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland's system displays a word accuracy on the order of 90%. Values are slightly larger or smaller, depending on the features and the task.
14 Example - American Sign Language Detection gri.gallaudet.edu/~cvogler/research/data/cvdm-iccv98.pdf
15 HMMs can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents a correspondence between an image token and a body part, and do DP on this trellis.
16
17 Figure from "Efficient Matching of Pictorial Structures," P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE
18 Another Example - Emotion Detection ations/mlhmmemotions.pdf
19 Advantage of HMM It does not just use the current state to do recognition; it looks at previous states to understand what is going on. This is a powerful idea when such temporal dependencies exist.