
1 Markov Chains Lecture #5
Background readings: Durbin et al., Section 3.1; Polanski & Kimmel, Section 2.8. Prepared by Shlomo Moran, based on material by Danny Geiger and Nir Friedman.

2 Dependencies along the genome
In previous classes we assumed that every letter in a sequence is sampled independently from some distribution q() over the alphabet {A,C,T,G}. This is too restrictive for real genomes. There are special subsequences in the genome, such as TATA within the regulatory region upstream of a gene, and the pair CG is less common than expected under independent sampling. We model such dependencies by Markov chains and hidden Markov models, which we define next.

3 Finite Markov Chain
An integer-time stochastic process, consisting of:
a domain D of m>1 states {s1,…,sm},
an m-dimensional initial distribution vector (p(s1),…,p(sm)),
an m×m transition probability matrix M = (m_si sj).
For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the 1st letter in a sequence, and m_AG the probability that G follows A in a sequence.
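As an illustrative sketch (not part of the original slides), such a chain over {A, C, G, T} can be stored as an initial distribution vector and a transition matrix. The specific probability values and the sampling helper below are assumptions chosen only for demonstration.

```python
import numpy as np

states = ["A", "C", "G", "T"]                      # the domain D
p0 = np.array([0.25, 0.25, 0.25, 0.25])            # initial distribution (p(s1),...,p(sm))
M = np.array([                                     # transition matrix; each row sums to 1
    [0.30, 0.20, 0.30, 0.20],   # from A
    [0.25, 0.30, 0.10, 0.35],   # from C
    [0.20, 0.30, 0.30, 0.20],   # from G
    [0.25, 0.25, 0.25, 0.25],   # from T
])

def sample_sequence(n, rng=np.random.default_rng(0)):
    """Sample a length-n sequence (x1,...,xn) from the chain."""
    seq = [rng.choice(len(states), p=p0)]
    for _ in range(n - 1):
        seq.append(rng.choice(len(states), p=M[seq[-1]]))
    return "".join(states[i] for i in seq)

print(sample_sequence(10))
```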

4 Markov Chain (cont.)
(Diagram: the chain X1 → X2 → … → Xn-1 → Xn.)
For each integer n, a Markov chain assigns a probability to every sequence (x1,…,xn) in D^n (i.e., xi ∈ D) as follows:
P(x1,…,xn) = p(x1) · ∏_{i=2..n} m_{x_{i-1} x_i}
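A minimal sketch of this computation; the small chain at the bottom (states, p0, M) uses made-up uniform values purely to make the snippet runnable, not values from the lecture.

```python
import numpy as np

def sequence_probability(x, states, p0, M):
    """P(x1,...,xn) = p(x1) * prod_{i>=2} m_{x_{i-1} x_i}."""
    idx = [states.index(c) for c in x]
    prob = p0[idx[0]]
    for a, b in zip(idx, idx[1:]):
        prob *= M[a, b]
    return prob

# Tiny illustrative chain (assumed values, not from the slides):
states = ["A", "C", "G", "T"]
p0 = np.array([0.25, 0.25, 0.25, 0.25])
M = np.full((4, 4), 0.25)          # uniform transitions for the sake of the example
print(sequence_probability("ACGT", states, p0, M))   # 0.25 * 0.25**3
```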

5 Markov Chain (cont.)
Similarly, each Xi is a probability distribution over D, determined by the initial distribution (p(s1),…,p(sm)) and the transition matrix M. There is a rich theory that studies the properties of such “Markov sequences” (X1,…, Xi ,…). A bit of this theory is presented next.

6 Matrix Representation
(The slide shows an example 4×4 transition matrix over the states A, B, C, D.)
The transition probability matrix M = (m_st). M is a stochastic matrix: each of its rows sums to 1, i.e., Σ_t m_st = 1 for every state s.
The initial distribution vector (u1,…,um) defines the distribution of X1 (p(X1=si)=ui). After one move, the distribution changes to X2 = X1M.

7 Matrix Representation
(Same example matrix over the states A, B, C, D as on the previous slide.)
Example: if X1 = (0, 1, 0, 0) then X2 = (0.2, 0.5, 0, 0.3), and if X1 = (0, 0, 0.5, 0.5) then X2 = (0, 0.1, 0.5, 0.4).
The i-th distribution is Xi = X1·M^(i-1).
Basic question: what can we say about the distributions Xi?
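A small numeric sketch of Xi = X1·M^(i-1). Only the B row (0.2, 0.5, 0, 0.3) is taken from the example above; the other rows of the matrix are made-up placeholders, since the slide's full matrix is not recoverable from this transcript.

```python
import numpy as np

# Hypothetical 4x4 stochastic matrix over states A, B, C, D.
# Only the B row matches the slide's example; the rest is illustrative.
M = np.array([
    [0.95, 0.05, 0.00, 0.00],   # from A (placeholder)
    [0.20, 0.50, 0.00, 0.30],   # from B (matches the example)
    [0.00, 0.00, 0.80, 0.20],   # from C (placeholder)
    [0.00, 0.00, 0.30, 0.70],   # from D (placeholder)
])

X1 = np.array([0.0, 1.0, 0.0, 0.0])          # all mass on B
X2 = X1 @ M                                   # one step: (0.2, 0.5, 0.0, 0.3)
Xi = X1 @ np.linalg.matrix_power(M, 9)        # distribution after 10 steps
print(X2, Xi)
```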

8 Representation of a Markov Chain as a Digraph
(The slide shows the digraph of the example chain on states A, B, C, D, side by side with its transition matrix.)
Each directed edge A→B is labeled with the positive transition probability from A to B.

9 Properties of Markov Chain states
States of a Markov chain are classified by its digraph representation (omitting the actual probability values).
In the digraph shown, A, C and D are recurrent states: they lie in strongly connected components which are sinks in the graph. B is not recurrent – it is a transient state.
Alternative definition: a state s is recurrent if it can be reached from every state that is reachable from s; otherwise it is transient.
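A sketch of the alternative definition in code: a state s is recurrent iff every state reachable from s can reach s back. The digraph below is a hypothetical one chosen so that A, C, D come out recurrent and B transient, as on the slide; its edges are not the slide's.

```python
def reachable(graph, start):
    """Set of states reachable from `start` (including start itself)."""
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_recurrent(graph, s):
    return all(s in reachable(graph, t) for t in reachable(graph, s))

# Hypothetical digraph: B is transient (it can leak into the sinks {A} and {C, D}).
graph = {"A": ["A"], "B": ["A", "B", "D"], "C": ["D"], "D": ["C"]}
for s in graph:
    print(s, "recurrent" if is_recurrent(graph, s) else "transient")
```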

10 Another example of Recurrent and Transient States
(The slide shows a digraph on states A, B, C, D.)
A and B are transient states; C and D are recurrent states. Once the process moves from B to D, it will never come back.

11 Irreducible Markov Chains
A Markov chain is irreducible if the corresponding graph is strongly connected (and thus all its states are recurrent).
(The slide shows two example digraphs, one on states A–E and one on states A–D.)

12 Periodic States
A state s has period k if k is the GCD of the lengths of all cycles that pass through s (in the graph shown, on states A–E, the period of A is 2).
Exercise: all states in the same strongly connected component have the same period.
A Markov chain is periodic if all of its states have a period k > 1; it is aperiodic otherwise.
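A naive sketch for computing a state's period as the GCD of the lengths of closed walks through it (equivalently, of cycle lengths), checked up to a fixed bound. The adjacency matrix is a hypothetical graph of period 2, not the slide's graph.

```python
import math
import numpy as np

def period(adj, s, max_len=50):
    """GCD of the lengths of closed walks through state s, up to max_len steps."""
    g = 0
    step = np.eye(len(adj), dtype=int)
    for n in range(1, max_len + 1):
        step = np.minimum(step @ adj, 1)   # 1 iff there is a walk of length n from i to j
        if step[s, s]:
            g = math.gcd(g, n)
    return g

# Hypothetical graph: cycles through state 0 have lengths 2 and 4, so its period is 2.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
print(period(adj, 0))   # 2
```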

13 Ergodic Markov Chains
A Markov chain is ergodic if:
the corresponding graph is strongly connected, and
it is not periodic.
(The slide shows an example digraph on states A, B, C, D.)
Ergodic Markov chains are important since they guarantee that the corresponding Markovian process converges to a unique distribution, in which each state has a strictly positive probability.

14 Stationary Distributions for Markov Chains
Let M be a Markov chain with m states, and let V = (v1,…,vm) be a probability distribution over the m states.
V = (v1,…,vm) is a stationary distribution for M if VM = V (i.e., one step of the process does not change the distribution).
V is a stationary distribution ⇔ V is a non-negative left (row) eigenvector of M with eigenvalue 1.
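A quick numeric sketch of this characterization, using a made-up 3-state matrix: the stationary distribution is found as the left eigenvector of M with eigenvalue 1.

```python
import numpy as np

# Hypothetical ergodic 3-state transition matrix (rows sum to 1).
M = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
])

# Left eigenvectors of M are right eigenvectors of M.T.
vals, vecs = np.linalg.eig(M.T)
k = np.argmin(np.abs(vals - 1.0))          # eigenvalue closest to 1
v = np.real(vecs[:, k])
v = v / v.sum()                            # normalize into a probability distribution
print(v, np.allclose(v @ M, v))            # VM = V
```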

15 Stationary Distributions for a Markov Chain M
Exercise: a stochastic matrix always has a real left eigenvector with eigenvalue 1. (Hint: show that a stochastic matrix has a right eigenvector with eigenvalue 1, and note that the left eigenvalues of a matrix are the same as its right eigenvalues.) It can be shown that the above eigenvector V can be chosen to be non-negative. Hence every Markov chain has a stationary distribution.

16 “Good” Markov chains
A Markov chain is good if the distributions Xi, as i → ∞:
(1) converge to a unique distribution, independent of the initial distribution;
(2) in that unique distribution, each state has a positive probability.
The Fundamental Theorem of Finite Markov Chains: a Markov chain is good ⇔ the corresponding graph is ergodic.
We will prove the ⇒ direction (good ⇒ ergodic), by showing that non-ergodic Markov chains are not good.

17 Examples of “Bad” Markov Chains
A Markov chain is “bad” if either:
it does not converge to a unique distribution (independent of the initial distribution), or
it does converge to a unique distribution, but some states in this distribution have zero probability.

18 Bad case 1: Mutual Unreachability
Consider two initial distributions:
a) p(X1=A)=1 (p(X1=x)=0 if x≠A);
b) p(X1=C)=1.
In case a), the sequence will stay at A forever. In case b), it will stay in {C,D} forever.
Fact 1: if the graph G has two states which are unreachable from each other, then {Xi} cannot converge to a distribution which is independent of the initial distribution.

19 Bad case 2: Transient States
Once the process moves from B to D, it will never come back.

20 Bad case 2: Transient States
Fact 2: for each initial distribution, with probability 1 a transient state will be visited only a finite number of times.
Proof: let A be a transient state, and let X be the set of states from which A is unreachable. It is enough to show that, starting from any state, with probability 1 a state in X is reached after a finite number of steps. (Exercise: complete the proof.)

21 Corollary: A good Markov Chain is irreducible

22 Bad case 3: Periodic Markov Chains
Recall: a Markov chain is periodic if all of its states have a period k > 1. The chain shown on the slide (on states A–E) has period 2. In that chain, consider the initial distribution p(B)=1: then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} only in even steps.

23 Bad case 3: Periodic States
Fact 3: in a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i → ∞.
Corollary: a good Markov chain is not periodic.

24 The Fundamental Theorem of Finite Markov Chains:
We have proved that non-ergodic Markov chains are not good. The proof of the opposite direction, stated below, is based on Perron–Frobenius theory of nonnegative matrices and is beyond the scope of this course:
If a Markov chain is ergodic, then
it has a unique stationary distribution vector V > 0, which is a left (row) eigenvector of the transition matrix, and
for any initial distribution, the distributions Xi converge to V as i → ∞.
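The theorem can be checked numerically; here is a hedged sketch using the same made-up ergodic 3-state matrix as before, started from two different initial distributions.

```python
import numpy as np

# Hypothetical ergodic (strongly connected, aperiodic) 3-state chain.
M = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
])

Mk = np.linalg.matrix_power(M, 100)           # many steps
x_a = np.array([1.0, 0.0, 0.0]) @ Mk          # start surely in state 0
x_b = np.array([0.0, 0.0, 1.0]) @ Mk          # start surely in state 2
print(x_a, x_b, np.allclose(x_a, x_b))        # both converge to the same V > 0
```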

25 Use of Markov Chains in Genome search: Modeling CpG Islands
In human genomes the pair CG is often transformed to (methyl-C)G, which in turn is often transformed to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (C-phosphate-G).

26 Example: CpG Island (Cont.)
We consider two questions:
Question 1: given a short stretch of genomic data, does it come from a CpG island?
Question 2: given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We “solve” the first question by modeling strings with and without CpG islands as Markov chains over the same states {A,C,G,T} but with different transition probabilities:

27 Example: CpG Island (Cont.)
The “+” model: use transition matrix M+ = (m+st), where m+st = the probability that t follows s in a CpG island.
The “-” model: use transition matrix M- = (m-st), where m-st = the probability that t follows s in a non-CpG island.

28 Example: CpG Island (Cont.)
With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “–” model. This is done using the Markov chain probabilities defined above, with parameters estimated from known data, together with the log odds-ratio test.

29 Question 1: Using two Markov chains
M+ (for CpG islands): we need to specify p+(xi | xi-1), where + stands for CpG island. From Durbin et al. we have:

            Xi:  A      C        G        T
Xi-1 = A        0.18   0.27     0.43     0.12
Xi-1 = C        0.17   p+(C|C)  0.274    p+(T|C)
Xi-1 = G        0.16   p+(C|G)  p+(G|G)  p+(T|G)
Xi-1 = T        0.08   p+(C|T)  p+(G|T)  p+(T|T)

(Recall: rows must add up to one; columns need not.)

30 Question 1: Using two Markov chains
M- (for non-CpG islands): …and for p-(xi | xi-1) (where “-” stands for non-CpG island) we have:

            Xi:  A      C        G        T
Xi-1 = A        0.3    0.2      0.29     0.21
Xi-1 = C        0.32   p-(C|C)  0.078    p-(T|C)
Xi-1 = G        0.25   p-(C|G)  p-(G|G)  p-(T|G)
Xi-1 = T        0.18   p-(C|T)  p-(G|T)  p-(T|T)

31 Discriminating between the two models
Given a string x=(x1,…,xL), we compute the ratio
RATIO = p+(x1,…,xL) / p-(x1,…,xL) = ∏_{i=1..L} p+(xi|xi-1) / ∏_{i=1..L} p-(xi|xi-1).
If RATIO > 1, a CpG island is more likely. In practice, the log of this ratio is computed.
Note: p+(x1|x0) is defined for convenience as p+(x1), and p-(x1|x0) as p-(x1).

32 Log Odds-Ratio test
Taking the logarithm yields
log Q = log [ p+(x1,…,xL) / p-(x1,…,xL) ] = Σ_{i=1..L} log [ p+(xi|xi-1) / p-(xi|xi-1) ].
If log Q > 0, then + is more likely (CpG island).
If log Q < 0, then - is more likely (non-CpG island).
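A sketch of the log odds-ratio test in code. The dictionaries p_plus / p_minus should hold the “+” and “-” transition tables shown above; because the transcript gives only part of those tables, the values below are uniform placeholders for illustration only.

```python
import math

def log_odds(x, p_plus, p_minus, p1_plus, p1_minus):
    """log Q = sum_i log( p+(xi|xi-1) / p-(xi|xi-1) ), with p(x1|x0) := p(x1)."""
    score = math.log(p1_plus[x[0]] / p1_minus[x[0]])
    for a, b in zip(x, x[1:]):
        score += math.log(p_plus[a][b] / p_minus[a][b])
    return score

# Placeholder tables (uniform); replace with the "+" / "-" tables above.
p_plus  = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
p_minus = {a: {b: 0.25 for b in "ACGT"} for a in "ACGT"}
p1_plus = p1_minus = {b: 0.25 for b in "ACGT"}
print(log_odds("CGCG", p_plus, p_minus, p1_plus, p1_minus))   # 0.0 with these placeholders
```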

33 Where do the parameters (transition probabilities) come from?
Learning from complete data, namely when the label is given and every xi is measured:
Source: a collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: tuples of the form (x1, …, xL, h), where h is + or -.
Output: maximum likelihood estimates (MLE) of the parameters.
Count all pairs (Xi = a, Xi-1 = b) with label + and with label -; say the counts are Nba,+ and Nba,-.

34 Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)
The needed parameters are: p+(x1), p+(xi | xi-1), p-(x1), p-(xi | xi-1).
The ML estimates are given by:
p+(x1 = x) = Nx,+ / Σy Ny,+ , where Nx,+ is the number of times letter x appears in CpG islands in the dataset;
p+(xi = x | xi-1 = b) = Nbx,+ / Σy Nby,+ , where Nbx,+ is the number of times letter x appears after letter b in CpG islands in the dataset.
(The “-” parameters are estimated analogously from the non-CpG-island sequences.)
Using MLE is justified when we have a large sample. The numbers appearing in our tables (taken from Durbin et al., p. 50) are based on 60,000 nucleotides.
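A small sketch of the MLE counting step for one class (say “+”); the two training sequences at the bottom are made up and stand in for real CpG-island data.

```python
from collections import Counter

def mle_transitions(sequences):
    """Estimate p(x_i | x_{i-1}) from labeled sequences of one class (+ or -)."""
    pair_counts = Counter()                       # N_{bx}: x follows b
    for seq in sequences:
        for b, x in zip(seq, seq[1:]):
            pair_counts[(b, x)] += 1
    p = {}
    for b in "ACGT":
        total = sum(pair_counts[(b, y)] for y in "ACGT")
        p[b] = {x: pair_counts[(b, x)] / total if total else 0.0 for x in "ACGT"}
    return p

# Tiny made-up training set standing in for real CpG-island sequences.
plus_model = mle_transitions(["ACGCGT", "CGCGGC"])
print(plus_model["C"]["G"])
```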

35 Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, A+, C+, G+, T+, A-, C-, G-, T-, all interconnected (hence the chain is ergodic). The problem is that we don’t know the sequence of states which is traversed, but only the sequence of letters. Therefore we use a Hidden Markov Model.

36 Hidden Markov Model
(The slide shows a hidden chain S1 → S2 → … → SL emitting the observed symbols x1, x2, …, xL.)
A Markov chain (s1,…,sL), and for each state s and symbol x we have p(Xi=x|Si=s).
Application in communication: the message sent is (s1,…,sm) but we receive (x1,…,xm); compute the most likely message sent.
Application in speech recognition: the word said is (s1,…,sm) but we recorded (x1,…,xm); compute the most likely word said.

37 Hidden Markov Model
Notations:
Markov chain transition probabilities: p(Si+1 = t | Si = s) = a_st.
Emission probabilities: p(Xi = b | Si = s) = e_s(b).
For Markov chains we know: p(s1,…,sL) = p(s1) · ∏_{i=2..L} a_{s_{i-1} s_i}.
What is p(s,x) = p(s1,…,sL; x1,…,xL)?

38 Hidden Markov Model (cont.)
p(Xi = b | Si = s) = e_s(b) means that the probability of xi depends only on the state si. Formally, this is equivalent to the conditional independence assumption:
p(Xi=xi | x1,…,xi-1,xi+1,…,xL, s1,…,si,…,sL) = e_si(xi)
Thus
p(s,x) = p(s1,…,sL; x1,…,xL) = p(s1) e_s1(x1) · ∏_{i=2..L} a_{s_{i-1} s_i} e_si(xi).
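A sketch of this joint probability in code (in log space, to avoid underflow on long sequences). The two-state “+”/“-” HMM at the bottom uses made-up numbers purely for illustration.

```python
import math

def hmm_log_joint(s, x, p1, a, e):
    """log p(s,x) = log p(s1) + log e_{s1}(x1) + sum_{i>=2} [log a_{s_{i-1} s_i} + log e_{s_i}(x_i)]."""
    lp = math.log(p1[s[0]]) + math.log(e[s[0]][x[0]])
    for i in range(1, len(s)):
        lp += math.log(a[s[i - 1]][s[i]]) + math.log(e[s[i]][x[i]])
    return lp

# Toy two-state HMM ("+" = CpG island, "-" = background); all numbers are made up.
p1 = {"+": 0.5, "-": 0.5}
a  = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.1, "-": 0.9}}
e  = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
      "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
print(hmm_log_joint("++--", "CGAT", p1, a, e))
```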

