GS 540 week 6

HMM basics
Given a sequence and state parameters:
– Each possible path through the states has a certain probability of emitting the sequence, P(O|M).
[Figure: trellis of state paths over the sequence A C G T A G C T T T (the emissions). The probability of taking a given state path is determined by the transition probabilities (t-probs); the probability of emitting the sequence from that state path is determined by the emission probabilities (e-probs); their product is the joint probability.]
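
A minimal Python sketch of this joint probability: at each position you multiply in one transition probability and one emission probability. The two-state model ("H"/"L"), all the numbers, and the example path are made up for illustration; they are not the course's parameters.

```python
start = {"H": 0.5, "L": 0.5}                       # initial state probabilities (toy values)
trans = {("H", "H"): 0.9, ("H", "L"): 0.1,         # transition probabilities a(k, k')
         ("L", "H"): 0.2, ("L", "L"): 0.8}
emit = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},   # emission probabilities e(k, s)
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def joint_prob(path, seq):
    """P(path, seq | M): product of the transition probs along the path
    and the emission probs at each position."""
    p = start[path[0]] * emit[path[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= trans[(path[i - 1], path[i])] * emit[path[i]][seq[i]]
    return p

print(joint_prob("HHLL", "ACGT"))   # one particular state path for the sequence ACGT
```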

Viterbi Algorithm
[Figure: trellis of states along the sequence A C G T A G C T T T, highlighting the highest-weight path and its joint probability.]

Viterbi Algorithm
[Figure: trellis of states along the sequence A C G T A G C T T T.]
Store at each node:
– Likelihood of the highest-likelihood path ending here
– Traceback pointer to the previous node in that path

Viterbi Algorithm
[Figure: trellis of states along the sequence A C G T A G C T T T.]

Pseudocode for Viterbi Algorithm
– Iterate over positions in the sequence.
– At each position, fill in the likelihood and traceback for each state.
– Compute emission and transition counts from the traced-back path:
 – number of times state k emits letter s
 – number of times state k transitions to state k'
– Compute new parameter values from these counts.
(A sketch of the fill and traceback follows below.)
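
A hedged Python sketch of the Viterbi fill and traceback, using the same dictionary-style toy tables as the example above. The function and variable names are mine, not those of the HW5 template.

```python
import math

def viterbi(seq, states, start, trans, emit):
    """Return (log-likelihood of the best path, best state path), computed in log space."""
    # V[i][k]: log-likelihood of the best path that ends in state k at position i.
    V = [{k: math.log(start[k]) + math.log(emit[k][seq[0]]) for k in states}]
    back = [{}]                         # back[i][k]: previous state on that best path
    for i in range(1, len(seq)):
        V.append({})
        back.append({})
        for k in states:
            prev = max(states, key=lambda j: V[i - 1][j] + math.log(trans[(j, k)]))
            V[i][k] = V[i - 1][prev] + math.log(trans[(prev, k)]) + math.log(emit[k][seq[i]])
            back[i][k] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return V[-1][last], list(reversed(path))

# e.g. score, path = viterbi("ACGTTT", ("H", "L"), start, trans, emit)   # toy tables above
```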

Likelihood function for Viterbi training
Likelihood function for Viterbi: the joint probability of the sequence and the single best state path, P(O, π* | M), where π* is the Viterbi path.
Different than the likelihood function for EM, which sums over all state paths: P(O | M) = Σ_π P(O, π | M).

Machine learning debugging tips
Debugging ML algorithms is much harder than debugging other code. The problem: it is hard to verify the output after each step.

Machine learning debugging tips
Strategies:
– Work out a simple example by hand.
– Implement a simpler, less efficient algorithm and compare.
– Compute your objective function (i.e. likelihood) at each iteration.
– Stare hard at your code.
– Don't test on your real data first; use small, simple inputs.
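
A small illustration of the "compute your objective function at each iteration" tip: soft EM's log-likelihood should never decrease, so a drop is a strong hint of a bug. The numbers below are made up.

```python
def check_monotone(log_likelihoods, tol=1e-8):
    """Given the log-likelihood recorded after each EM iteration, flag any decrease.
    Soft EM's objective should never go down; a drop almost always means a bug."""
    for it in range(1, len(log_likelihoods)):
        if log_likelihoods[it] < log_likelihoods[it - 1] - tol:
            raise AssertionError(
                f"log-likelihood decreased at iteration {it}: "
                f"{log_likelihoods[it - 1]:.4f} -> {log_likelihoods[it]:.4f}")

check_monotone([-120.4, -118.9, -118.5, -118.4])   # passes: non-decreasing
# check_monotone([-120.4, -118.9, -119.2])         # would raise: the likelihood dropped
```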

HW5 Tips
– The template is not a real solution!
– Calculate L(M|O) by hand for the first few observations and compare to your results: for each site, what's the likelihood of each state?
[Figure: trellis of states over the sequence A C G T.]

HW5 Tips
– It is not necessary to explicitly create the graph structure (although you may find it helpful).
[Figure: trellis of states.]

HW5 Tips
– The template is not a real solution!
– Calculate L(M|O) by hand for the first few observations and compare to your results: for each site, what's the likelihood of each state?
– Make a toy case, e.g. AAAACCCCCCCCCCCCCCC, for which L(M|O) is easy to calculate by hand.
– Use log-space computations (see the sketch below).
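
Products of many small probabilities underflow ordinary floating point, which is why log space matters. Below is a generic log-sum-exp helper (a sketch, not part of the HW5 template) for adding probabilities whose logs you have.

```python
import math

def log_sum_exp(log_values):
    """Compute log(sum(exp(v) for v in log_values)) without underflow/overflow."""
    m = max(log_values)
    if m == float("-inf"):          # all terms are zero probability
        return float("-inf")
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Example: adding two tiny probabilities via their logs.
log_p = math.log(1e-300)
print(log_sum_exp([log_p, log_p]))  # == log(2e-300), still finite in log space
```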

HW6: Baum-Welch
Goal: learn HMM parameters taking all paths into account, via expectation maximization:
– Forward-backward algorithm to compute expected counts (see the sketch below).
– Re-estimate parameter values based on the expected counts.
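
A sketch of the forward-backward recursions used in the Baum-Welch E-step, written in plain probability space for readability (a real implementation should use scaling or log space). It assumes the same dictionary-style tables as the earlier sketches.

```python
def forward_backward(seq, states, start, trans, emit):
    """Forward (alpha) and backward (beta) tables plus P(O | M), summed over all paths.
    Plain probability space for readability; use scaling or log space on real data."""
    n = len(seq)
    alpha = [{k: start[k] * emit[k][seq[0]] for k in states}]
    for i in range(1, n):
        alpha.append({k: emit[k][seq[i]] *
                         sum(alpha[i - 1][j] * trans[(j, k)] for j in states)
                      for k in states})
    beta = [None] * n
    beta[n - 1] = {k: 1.0 for k in states}
    for i in range(n - 2, -1, -1):
        beta[i] = {k: sum(trans[(k, j)] * emit[j][seq[i + 1]] * beta[i + 1][j]
                          for j in states)
                   for k in states}
    total = sum(alpha[n - 1][k] for k in states)        # P(O | M)
    # Posterior of being in state k at position i: alpha[i][k] * beta[i][k] / total
    return alpha, beta, total

# e.g. alpha, beta, total = forward_backward("ACGTTT", ("H", "L"), start, trans, emit)
```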

Bayesian networks, dynamic Bayesian networks, hidden Markov models

Hidden Markov model
[Figure: chain of hidden state variables, each emitting one observation.]

Dynamic Bayesian network: more than one (hidden) random variable per position.
[Figure.]

Bayesian network: Arbitrary structure of random variables

Probabilistic model inference algorithms
Problem: Given a model, what is the probability that a variable X has value x?
– Belief propagation
– For HMMs: the forward-backward algorithm
Problem: Given a model, what is the most likely assignment of variables?
– Maximum a posteriori (MAP) inference
– For HMMs: the Viterbi algorithm

Probabilistic model learning algorithms
Problem: Learn model parameters.
EM:
– E step: Use inference to get an estimate of the hidden variable values.
– M step: Re-estimate parameter values.
For HMMs: the Baum-Welch algorithm.
– Use belief propagation for inference: "soft EM"
– Use Viterbi inference: "hard EM"
(A skeleton contrasting the two count styles follows below.)
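
A small sketch of the difference: hard EM (Viterbi training) counts emissions and transitions along one decoded path, while soft EM replaces those counts with expected counts from the forward-backward posteriors. The function below is illustrative, not the homework interface.

```python
from collections import defaultdict

def path_counts(path, seq):
    """Hard-EM (Viterbi training) E-step: count emissions and transitions
    along a single decoded state path."""
    emit_counts = defaultdict(float)
    trans_counts = defaultdict(float)
    for i, (state, symbol) in enumerate(zip(path, seq)):
        emit_counts[(state, symbol)] += 1
        if i > 0:
            trans_counts[(path[i - 1], state)] += 1
    return emit_counts, trans_counts

# Soft EM (Baum-Welch) replaces these hard counts with expected counts: the emission
# count for (k, s) becomes the sum of P(state = k at position i | O) over positions i
# that emit s, and the transition counts are weighted by the pairwise posteriors, both
# computed from forward-backward. The M-step then normalizes either kind of count
# into new parameter values.
```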

What type of inference should you use to get an estimate of latent variable values?
1. Belief propagation, then choose the most likely value for each variable separately.
 – Might give an impossible set of values.
2. Viterbi inference (the single most likely joint assignment).
 – Might give unrealistic values.
(A toy comparison of the two decodings follows below.)
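
A toy comparison of the two decodings, reusing the viterbi and forward_backward sketches and the toy tables from above (illustrative only).

```python
states = ("H", "L")
seq = "AAAACCCCCC"

_, map_path = viterbi(seq, states, start, trans, emit)         # single best joint path
alpha, beta, total = forward_backward(seq, states, start, trans, emit)
marginal_path = [max(states, key=lambda k: alpha[i][k] * beta[i][k])
                 for i in range(len(seq))]                      # per-position argmax

print("Viterbi decoding:  ", "".join(map_path))
print("Posterior decoding:", "".join(marginal_path))
```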

Example

Step 1: Convert dependencies to distribution hyperedges
[Figure: Bayesian network with factors such as P(1), P(8|7,9), and P(15|12) attached as hyperedges.]

Step 2: Form junction tree

Step 1: Moralize the Graph

Step 2: Remove Arrows

The Clique Graph

Junction Trees
A junction tree is a subgraph of the clique graph that:
1. Is a tree
2. Contains all the nodes of the clique graph
3. Satisfies the running intersection property
Running intersection property: for each pair U, V of cliques with intersection S, all cliques on the path between U and V contain S.
(A small checker for this property is sketched below.)
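
A small checker for the running intersection property (my own sketch), using the equivalent condition that, for every variable, the cliques containing it must form a connected subtree of the junction tree.

```python
from collections import deque

def has_running_intersection(cliques, tree_edges):
    """cliques: list of frozensets of variables; tree_edges: (i, j) index pairs forming
    a tree over the cliques. Returns True iff, for every variable, the cliques that
    contain it induce a connected subtree (equivalent to the running intersection property)."""
    adj = {i: set() for i in range(len(cliques))}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in set().union(*cliques):
        holding = {i for i, c in enumerate(cliques) if v in c}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:                              # BFS restricted to cliques containing v
            i = queue.popleft()
            for j in (adj[i] & holding) - seen:
                seen.add(j)
                queue.append(j)
        if seen != holding:
            return False
    return True

# Chain of cliques {A,B} - {B,C} - {C,D} satisfies the property.
cliques = [frozenset("AB"), frozenset("BC"), frozenset("CD")]
print(has_running_intersection(cliques, [(0, 1), (1, 2)]))   # True
```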

Step 5: Build the Junction Tree

Step 7: Populate the Cliques
– Place each potential from the original network in a clique containing all the variables it references.
– For each clique node, form the product of the distributions placed in it (as in variable elimination).

Step 7: Populate the Cliques

Step 8: Belief Propagation
1. Incorporate evidence
2. Upward pass: send messages toward the root
3. Downward pass: send messages toward the leaves

Step 8.1: Incorporate Evidence For each evidence variable, go to one table that includes that variable. Set to 0 all entries in that table that disagree with the evidence.
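
A toy example of incorporating evidence by zeroing disagreeing entries, using a NumPy table over two binary variables (all numbers are arbitrary potentials).

```python
import numpy as np

# Clique table over two binary variables (B, C).
table = np.array([[0.5, 0.5],
                  [0.2, 0.8]])               # table[b, c]

observed_b = 1                               # evidence: B = 1
table[np.arange(2) != observed_b, :] = 0.0   # zero every entry that disagrees with B = 1
print(table)
```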

Step 8.2: Upward Pass For each leaf in the junction tree, send a message to its parent. The message is the marginal of its table, summing out any variable not in the separator. When a parent receives a message from a child, it multiplies its table by the message table to obtain its new table. When a parent receives messages from all its children, it repeats the process (acts as a leaf). This process continues until the root receives messages from all its children.
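
A toy upward-pass message between two cliques with separator {B}, again with arbitrary NumPy tables: the child sums out everything not in the separator, and the parent multiplies the message into its own table.

```python
import numpy as np

# Child clique over (A, B) and parent clique over (B, C); separator = {B}.
# All variables binary; the numbers are arbitrary unnormalized potentials.
child = np.array([[0.3, 0.7],
                  [0.9, 0.1]])        # child[a, b]
parent = np.array([[0.5, 0.5],
                   [0.2, 0.8]])       # parent[b, c]

message = child.sum(axis=0)           # sum out A (not in the separator) -> vector over B
parent = parent * message[:, None]    # parent multiplies its table by the message along B
print(message)
print(parent)
```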

Step 8.3: Downward Pass Reverses the upward pass, starting at the root. The root sends a message to each of its children. More specifically, the root divides its current table by the message received from that child, marginalizes the resulting table to the separator, and sends the result to the child. Each child multiplies its table by the message it receives, then repeats the process (acts as a root) until the leaves are reached. The table at each clique is then the joint marginal of its variables; sum out as needed. We're done!

Inference Example: Going Up (No evidence)

Status After Upward Pass

Going Back Down

Status After Downward Pass

Why Does This Work? The junction tree algorithm is just a way to do variable elimination in all directions at once, storing intermediate results at each step.

The Link Between Junction Trees and Variable Elimination To eliminate a variable at any step, we combine all remaining tables involving that variable. A node in the junction tree corresponds to the variables in one of the tables created during variable elimination (the variables needed to eliminate one variable). An arc in the junction tree shows the flow of data in the elimination computation.

Junction Tree Savings
– Avoids redundancy in repeated variable elimination.
– The junction tree only needs to be built once.
– Belief propagation only needs to be repeated when new evidence is received.

Loopy Belief Propagation
– Inference is efficient if the graph is a tree.
– Inference cost is exponential in the treewidth (size of the largest clique in the graph, minus 1).
– What if the treewidth is too high? Solution: do belief propagation on the original graph.
– It may not converge, or may converge to a bad approximation.
– In practice, it is often fast and a good approximation.

Loopy Belief Propagation
[Figure: factor graph with variable nodes (x) and factor nodes (f).]