Viterbi, Forward, and Backward Algorithms for Hidden Markov Models
Prof. Carolina Ruiz
Computer Science Department
Bioinformatics and Computational Biology Program
WPI

Resources used for these slides:
- Durbin, Eddy, Krogh, and Mitchison. "Biological Sequence Analysis". Cambridge University Press.
- Prof. Moran's Algorithms in Computational Biology course (Technion Univ.)
- Ydo Wexler & Dan Geiger's Markov Chain Tutorial
- Hidden Markov Models (HMMs) Tutorial

HMM: Coke/Pepsi Example

[Slide diagram: a fake start state feeding the three hidden states A, R, and B, each of which can emit C or P.]

Hidden states:
- start: fake start state
- A: the prices of Coke and Pepsi are the same
- R: "Red sale": Coke is on sale (cheaper than Pepsi)
- B: "Blue sale": Pepsi is on sale (cheaper than Coke)

Emissions:
- C: the person buys Coke
- P: the person buys Pepsi
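
The slide's diagram does not survive in this transcript, but the transition and emission probabilities can be read off the worked calculations on the later slides. For the code sketches that follow, here they are collected as Python dictionaries; the names TRANS, EMIT, and STATES are introduced here for illustration and are not part of the original slides.

```python
# Transition probabilities p(next state | current state), reconstructed from
# the numbers used in the worked example (e.g. p(A|start)=0.6, p(B|A)=0.7, ...).
TRANS = {
    "start": {"A": 0.6, "R": 0.1, "B": 0.3},
    "A":     {"A": 0.2, "R": 0.1, "B": 0.7},
    "R":     {"A": 0.1, "R": 0.1, "B": 0.8},
    "B":     {"A": 0.4, "R": 0.3, "B": 0.3},
}

# Emission probabilities p(observation | state); C = buys Coke, P = buys Pepsi.
EMIT = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}

STATES = ["A", "R", "B"]   # the fake start state is handled separately
```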

1. Finding the most likely trajectory

Given an HMM and a sequence of observables x_1, x_2, ..., x_L, determine the most likely sequence of states that generated x_1, x_2, ..., x_L:

S* = (s*_1, s*_2, ..., s*_L)
   = argmax_{s_1,...,s_L} p(s_1,...,s_L | x_1,...,x_L)
   = argmax_{s_1,...,s_L} p(s_1,...,s_L ; x_1,...,x_L) / p(x_1,...,x_L)
   = argmax_{s_1,...,s_L} p(s_1,...,s_L ; x_1,...,x_L)

since the denominator p(x_1,...,x_L) does not depend on the choice of states.

S* = argmax_{s_1,...,s_L} p(s_1,...,s_L ; x_1,...,x_L)
   = argmax_{s_1,...,s_L} p(s_1,...,s_{L-1} ; x_1,...,x_{L-1}) p(s_L | s_{L-1}) p(x_L | s_L)

This inspires a recursive formulation of S*. Viterbi's idea: this can be calculated using dynamic programming. Let

v(k,t) = max p(s_1,...,s_t = k ; x_1,...,x_t)

that is, the probability of a most probable path up to time t that ends in state k. By the above derivation:

v(k,t) = max p(s_1,...,s_{t-1} ; x_1,...,x_{t-1}) p(s_t = k | s_{t-1}) p(x_t | s_t = k)
       = max_j v(j,t-1) p(s_t = k | s_{t-1} = j) p(x_t | s_t = k)
       = p(x_t | s_t = k) max_j v(j,t-1) p(s_t = k | s_{t-1} = j)
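
A minimal Python sketch of this recursion (an illustration added to this write-up, not code from the original slides). It assumes the TRANS, EMIT, and STATES dictionaries defined after the model slide above, records a "parent" (backpointer) for each cell, and reads the most likely path back from the table, exactly as the worked example on the following slides does. Time is 0-based in the code, while the slides count columns from 1.

```python
def viterbi(obs, trans, emit, states):
    """Return (most likely state path, v table) for the observation string obs."""
    # Initialization: one step out of the fake start state.
    v = [{k: trans["start"][k] * emit[k][obs[0]] for k in states}]
    parent = [{k: "start" for k in states}]
    # Recursion: v(k,t) = p(x_t|k) * max_j v(j,t-1) p(k|j).
    for t in range(1, len(obs)):
        v.append({})
        parent.append({})
        for k in states:
            best_j = max(states, key=lambda j: v[t - 1][j] * trans[j][k])
            v[t][k] = emit[k][obs[t]] * v[t - 1][best_j] * trans[best_j][k]
            parent[t][k] = best_j
    # Traceback: best state in the last column, then follow the parents.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(parent[t][path[-1]])
    return list(reversed(path)), v

# viterbi("CPC", TRANS, EMIT, STATES)[0]  ->  ['A', 'B', 'R']
```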

Viterbi's Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

Initialization (column t = 0): v(start,0) = 1, and v(A,0) = v(R,0) = v(B,0) = 0. The start row is 0 in every later column.

v        t=0   x_1=C   x_2=P   x_3=C
start     1      0       0       0
A         0
R         0
B         0

Viterbi's Algorithm - Example (column x_1 = C, t = 1)

v(A,1) = p(x_1|s_1=A) max_j v(j,0) p(s_1=A|j)
       = p(C|A) max{v(start,0) p(A|start), 0, 0, 0} = 0.6 * 1 * 0.6 = 0.36     Parent: start
v(R,1) = p(C|R) max{v(start,0) p(R|start), 0, 0, 0} = 0.9 * 1 * 0.1 = 0.09     Parent: start
v(B,1) = p(C|B) max{v(start,0) p(B|start), 0, 0, 0} = 0.5 * 1 * 0.3 = 0.15     Parent: start

Viterbi's Algorithm - Example (column x_2 = P, t = 2)

v(A,2) = p(P|A) max{v(start,1)p(A|start), v(A,1)p(A|A), v(R,1)p(A|R), v(B,1)p(A|B)}
       = 0.4 * max{0, 0.36*0.2, 0.09*0.1, 0.15*0.4} = 0.4 * 0.072 = 0.0288     Parent: A
v(R,2) = p(P|R) max{v(start,1)p(R|start), v(A,1)p(R|A), v(R,1)p(R|R), v(B,1)p(R|B)}
       = 0.1 * max{0, 0.36*0.1, 0.09*0.1, 0.15*0.3} = 0.1 * 0.045 = 0.0045     Parent: B
v(B,2) = p(P|B) max{v(start,1)p(B|start), v(A,1)p(B|A), v(R,1)p(B|R), v(B,1)p(B|B)}
       = 0.5 * max{0, 0.36*0.7, 0.09*0.8, 0.15*0.3} = 0.5 * 0.252 = 0.126      Parent: A

Viterbi's Algorithm - Example (column x_3 = C, t = 3)

v(A,3) = p(C|A) max{v(start,2)p(A|start), v(A,2)p(A|A), v(R,2)p(A|R), v(B,2)p(A|B)}
       = 0.6 * max{0, 0.0288*0.2, 0.0045*0.1, 0.126*0.4} = 0.6 * 0.0504 = 0.03024    Parent: B
v(R,3) = p(C|R) max{v(start,2)p(R|start), v(A,2)p(R|A), v(R,2)p(R|R), v(B,2)p(R|B)}
       = 0.9 * max{0, 0.0288*0.1, 0.0045*0.1, 0.126*0.3} = 0.9 * 0.0378 = 0.03402    Parent: B
v(B,3) = p(C|B) max{v(start,2)p(B|start), v(A,2)p(B|A), v(R,2)p(B|R), v(B,2)p(B|B)}
       = 0.5 * max{0, 0.0288*0.7, 0.0045*0.8, 0.126*0.3} = 0.5 * 0.0378 = 0.0189     Parent: B

Viterbi's Algorithm - Example (complete table)

v        t=0   x_1=C            x_2=P             x_3=C
start     1      0                0                 0
A         0    0.36 (start)     0.0288 (A)        0.03024 (B)
R         0    0.09 (start)     0.0045 (B)        0.03402 (B)
B         0    0.15 (start)     0.126  (A)        0.0189  (B)

Each entry shows v(k,t) followed, in parentheses, by the Parent recorded for that cell.

Hence, the most likely path that generated CPC is: start A B R.

This maximum likelihood path is extracted from the table as follows:
- The last state of the path is the one with the highest value in the right-most column (here R, with 0.03402).
- The previous state in the path is the one recorded as the Parent of that cell.
- Keep following the Parent trail backwards until you arrive at start.

2. Calculating the probability of a sequence of observations

Given an HMM and a sequence of observations x_1, x_2, ..., x_L, determine p(x_1, x_2, ..., x_L):

p(x_1,...,x_L) = Σ_{s_1,...,s_L} p(s_1,...,s_L ; x_1,...,x_L)
               = Σ_{s_1,...,s_L} p(s_1,...,s_{L-1} ; x_1,...,x_{L-1}) p(s_L | s_{L-1}) p(x_L | s_L)
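
For a sequence of length L over m hidden states this sum ranges over m^L paths, so computing it by brute force is only feasible for tiny examples; the forward algorithm on the next slides computes the same quantity with dynamic programming. As a sanity check on the worked example, here is a small enumeration sketch (an illustration added here, assuming the TRANS, EMIT, and STATES dictionaries defined earlier):

```python
from itertools import product

def prob_sequence_bruteforce(obs, trans, emit, states):
    """p(x_1..x_L): sum the joint p(path, obs) over every possible state path."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p, prev = 1.0, "start"
        for state, x in zip(path, obs):
            p *= trans[prev][state] * emit[state][x]   # p(s_t|s_{t-1}) * p(x_t|s_t)
            prev = state
        total += p
    return total

# prob_sequence_bruteforce("CPC", TRANS, EMIT, STATES)  ->  about 0.1583
```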

Let f(k,t) = p(s_t = k ; x_1,...,x_t), that is, the joint probability of emitting x_1,...,x_t and being in state k at time t. In other words, it is the sum of the probabilities of all the paths that emit (x_1,...,x_t) and end in state s_t = k.

f(k,t) = p(s_t = k ; x_1,...,x_t)
       = Σ_j p(s_{t-1} = j ; x_1,...,x_{t-1}) p(s_t = k | s_{t-1} = j) p(x_t | s_t = k)
       = p(x_t | s_t = k) Σ_j p(s_{t-1} = j ; x_1,...,x_{t-1}) p(s_t = k | s_{t-1} = j)
       = p(x_t | s_t = k) Σ_j f(j,t-1) p(s_t = k | s_{t-1} = j)
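
This recursion fills the same kind of table column by column. A minimal sketch, again assuming the TRANS, EMIT, and STATES dictionaries defined earlier (time is 0-based in the code, 1-based on the slides):

```python
def forward(obs, trans, emit, states):
    """f[t][k] = p(s_t = k, x_1..x_t), filled column by column."""
    # Initialization: one step out of the fake start state.
    f = [{k: trans["start"][k] * emit[k][obs[0]] for k in states}]
    # Recursion: f(k,t) = p(x_t|k) * sum_j f(j,t-1) p(k|j).
    for t in range(1, len(obs)):
        f.append({k: emit[k][obs[t]] * sum(f[t - 1][j] * trans[j][k] for j in states)
                  for k in states})
    return f

# f = forward("CPC", TRANS, EMIT, STATES)
# sum(f[-1].values())  ->  p(CPC), about 0.1583
```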

Forward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

Initialization (column t = 0): f(start,0) = 1, and f(A,0) = f(R,0) = f(B,0) = 0. The start row is 0 in every later column.

f        t=0   x_1=C   x_2=P   x_3=C
start     1      0       0       0
A         0
R         0
B         0

Forward Algorithm - Example (column x_1 = C, t = 1)

f(A,1) = p(x_1|s_1=A) Σ_j f(j,0) p(s_1=A|j)
       = p(C|A) f(start,0) p(A|start) = 0.6 * 1 * 0.6 = 0.36
f(R,1) = p(C|R) f(start,0) p(R|start) = 0.9 * 1 * 0.1 = 0.09
f(B,1) = p(C|B) f(start,0) p(B|start) = 0.5 * 1 * 0.3 = 0.15

(only the start term contributes, since all other entries of column t = 0 are 0)

Forward Algorithm - Example (column x_2 = P, t = 2)

f(A,2) = p(P|A) [f(start,1)p(A|start) + f(A,1)p(A|A) + f(R,1)p(A|R) + f(B,1)p(A|B)]
       = 0.4 * (0 + 0.36*0.2 + 0.09*0.1 + 0.15*0.4) = 0.4 * 0.141 = 0.0564
f(R,2) = p(P|R) [f(start,1)p(R|start) + f(A,1)p(R|A) + f(R,1)p(R|R) + f(B,1)p(R|B)]
       = 0.1 * (0 + 0.36*0.1 + 0.09*0.1 + 0.15*0.3) = 0.1 * 0.09 = 0.009
f(B,2) = p(P|B) [f(start,1)p(B|start) + f(A,1)p(B|A) + f(R,1)p(B|R) + f(B,1)p(B|B)]
       = 0.5 * (0 + 0.36*0.7 + 0.09*0.8 + 0.15*0.3) = 0.5 * 0.369 = 0.1845

Forward Algorithm - Example (column x_3 = C, t = 3)

f(A,3) = p(C|A) [f(start,2)p(A|start) + f(A,2)p(A|A) + f(R,2)p(A|R) + f(B,2)p(A|B)]
       = 0.6 * (0 + 0.0564*0.2 + 0.009*0.1 + 0.1845*0.4) = 0.6 * 0.08598 = 0.051588
f(R,3) = p(C|R) [f(start,2)p(R|start) + f(A,2)p(R|A) + f(R,2)p(R|R) + f(B,2)p(R|B)]
       = 0.9 * (0 + 0.0564*0.1 + 0.009*0.1 + 0.1845*0.3) = 0.9 * 0.06189 = 0.055701
f(B,3) = p(C|B) [f(start,2)p(B|start) + f(A,2)p(B|A) + f(R,2)p(B|R) + f(B,2)p(B|B)]
       = 0.5 * (0 + 0.0564*0.7 + 0.009*0.8 + 0.1845*0.3) = 0.5 * 0.10203 = 0.051015

Forward Algorithm - Example (complete table)

f        t=0   x_1=C   x_2=P    x_3=C
start     1      0       0        0
A         0    0.36    0.0564   0.051588
R         0    0.09    0.009    0.055701
B         0    0.15    0.1845   0.051015

Hence, the probability of CPC being generated by this HMM is:
p(CPC) = Σ_j f(j,3) = 0.051588 + 0.055701 + 0.051015 = 0.158304 ≈ 0.158

3. Calculating the probability of s_t = k given a sequence of observations

Given an HMM and a sequence of observations x_1, x_2, ..., x_L, determine the probability that the state visited at time t was k: p(s_t = k | x_1, x_2, ..., x_L), where 1 <= t <= L.

p(s_t = k | x_1,...,x_L) = p(x_1,...,x_L ; s_t = k) / p(x_1,...,x_L)

Note that p(x_1,...,x_L) can be found using the forward algorithm. We'll focus now on determining p(x_1,...,x_L ; s_t = k).

p(x_1,...,x_t,...,x_L ; s_t = k)
  = p(x_1,...,x_t ; s_t = k) p(x_{t+1},...,x_L | x_1,...,x_t ; s_t = k)
  = p(x_1,...,x_t ; s_t = k) p(x_{t+1},...,x_L | s_t = k)
  = f(k,t) * b(k,t)

where f(k,t) comes from the forward algorithm and b(k,t) from the backward algorithm:

b(k,t) = p(x_{t+1},...,x_L | s_t = k)
       = Σ_j p(s_{t+1} = j | s_t = k) p(x_{t+1} | s_{t+1} = j) p(x_{t+2},...,x_L | s_{t+1} = j)
       = Σ_j p(s_{t+1} = j | s_t = k) p(x_{t+1} | s_{t+1} = j) b(j,t+1)
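
The backward table is filled from the right-most column to the left. A minimal sketch under the same assumptions as before (TRANS, EMIT, and STATES as defined earlier; 0-based time indices in the code):

```python
def backward(obs, trans, emit, states):
    """b[t][k] = p(x_{t+1}..x_L | s_t = k), filled right to left."""
    L = len(obs)
    b = [dict() for _ in range(L)]
    b[L - 1] = {k: 1.0 for k in states}        # initialization: empty future suffix
    for t in range(L - 2, -1, -1):
        for k in states:
            # b(k,t) = sum_j p(j|k) p(x_{t+1}|j) b(j,t+1)
            b[t][k] = sum(trans[k][j] * emit[j][obs[t + 1]] * b[t + 1][j]
                          for j in states)
    return b

# b = backward("CPC", TRANS, EMIT, STATES)
# b[1]["B"]  ->  0.66   (the slides' b(B,2))
```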

Backward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits x_{t+1},...,x_L given that s_t = k: b(k,t) = p(x_{t+1},...,x_L | s_t = k).

Initialization (right-most column): b(A,3) = b(R,3) = b(B,3) = 1.

b        x_1=C   x_2=P   x_3=C
A                          1
R                          1
B                          1

Backward Algorithm - Example (column x_2 = P, t = 2)

b(A,2) = Σ_j p(s_3=j|A) p(C|s_3=j) b(j,3)
       = p(A|A)p(C|A)b(A,3) + p(R|A)p(C|R)b(R,3) + p(B|A)p(C|B)b(B,3)
       = 0.2*0.6*1 + 0.1*0.9*1 + 0.7*0.5*1 = 0.56
b(R,2) = Σ_j p(s_3=j|R) p(C|s_3=j) b(j,3)
       = p(A|R)p(C|A)b(A,3) + p(R|R)p(C|R)b(R,3) + p(B|R)p(C|B)b(B,3)
       = 0.1*0.6*1 + 0.1*0.9*1 + 0.8*0.5*1 = 0.55
b(B,2) = Σ_j p(s_3=j|B) p(C|s_3=j) b(j,3)
       = p(A|B)p(C|A)b(A,3) + p(R|B)p(C|R)b(R,3) + p(B|B)p(C|B)b(B,3)
       = 0.4*0.6*1 + 0.3*0.9*1 + 0.3*0.5*1 = 0.66

Backward Algorithm - Example (column x_1 = C, t = 1)

b(A,1) = Σ_j p(s_2=j|A) p(P|s_2=j) b(j,2)
       = p(A|A)p(P|A)b(A,2) + p(R|A)p(P|R)b(R,2) + p(B|A)p(P|B)b(B,2)
       = 0.2*0.4*0.56 + 0.1*0.1*0.55 + 0.7*0.5*0.66 = 0.2813
b(R,1) = Σ_j p(s_2=j|R) p(P|s_2=j) b(j,2)
       = p(A|R)p(P|A)b(A,2) + p(R|R)p(P|R)b(R,2) + p(B|R)p(P|B)b(B,2)
       = 0.1*0.4*0.56 + 0.1*0.1*0.55 + 0.8*0.5*0.66 = 0.2919
b(B,1) = Σ_j p(s_2=j|B) p(P|s_2=j) b(j,2)
       = p(A|B)p(P|A)b(A,2) + p(R|B)p(P|R)b(R,2) + p(B|B)p(P|B)b(B,2)
       = 0.4*0.4*0.56 + 0.3*0.1*0.55 + 0.3*0.5*0.66 = 0.2051

Backward Algorithm - Example (complete table)

b        x_1=C    x_2=P   x_3=C
A        0.2813   0.56     1
R        0.2919   0.55     1
B        0.2051   0.66     1

We can also calculate the probability of CPC being generated by this HMM from the Backward table:
p(CPC) = Σ_j b(j,1) p(j|start) p(C|j)
       = (0.2813*0.6*0.6) + (0.2919*0.1*0.9) + (0.2051*0.3*0.5)
       = 0.101268 + 0.026271 + 0.030765 = 0.158304
which is the same probability we obtained from the Forward table on a previous slide.

3. (cont.) Using the Forward and Backward tables to calculate the probability of s_t = k given a sequence of observations

Example: Given the Coke/Pepsi HMM and the sequence of observations CPC, find the probability that the state visited at time 2 was B, that is, p(s_2 = B | CPC). In other words, given that the person drank C, P, C over the three weeks, what is the probability that Pepsi was on sale during the 2nd week?

Based on the calculations we did on the previous slides:

p(s_2 = B | CPC) = p(CPC ; s_2 = B) / p(CPC)
  = [ p(x_1=C, x_2=P ; s_2=B) p(x_3=C | x_1=C, x_2=P ; s_2=B) ] / p(x_1=C, x_2=P, x_3=C)
  = [ p(x_1=C, x_2=P ; s_2=B) p(x_3=C | s_2=B) ] / p(CPC)
  = [ f(B,2) b(B,2) ] / p(CPC)
  = [ 0.1845 * 0.66 ] / 0.158304 = 0.769

Here p(CPC) was calculated by summing up the last column of the Forward table. So there is a high probability that Pepsi was on sale during week 2, which fits the observation that the person drank Pepsi that week!
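
Putting the two tables together in code, a small sketch of this posterior computation, assuming the forward and backward functions and the TRANS, EMIT, and STATES dictionaries sketched earlier:

```python
def posterior(obs, t, k, trans, emit, states):
    """p(s_t = k | obs) = f(k,t) * b(k,t) / p(obs); t is 1-based as on the slides."""
    f = forward(obs, trans, emit, states)
    b = backward(obs, trans, emit, states)
    p_obs = sum(f[-1].values())              # p(x_1..x_L) from the last forward column
    return f[t - 1][k] * b[t - 1][k] / p_obs

# posterior("CPC", 2, "B", TRANS, EMIT, STATES)  ->  about 0.769
```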