1 Odds and Ends Tutorial #13 © Ilan Gronau
2 The Noisy Transmission Model
3 [Figure: state diagram of the model, with states 0, 1, I0 and I1]
Transitions: [transition matrix not recoverable from the extracted text]
Stationary distribution: (8, 24, 1, 3)/36
4 Questions
  - Given an output sequence (including blanks), what is the most probable path that yields this sequence? (1.c) Answer: the Viterbi algorithm.
  - Given an output sequence, what is the most probable path yielding it that passes through M non-noise states (0/1)? (1.d)
  - Given an output sequence, what is the most probable path yielding it? (bonus)
  - Given an output sequence, what is the most probable transmission? Problem: each transmission corresponds to multiple paths!
5 Answer to 1d
Given an output sequence X_1,…,X_n and M, we calculate the following values for all states S, i=0..n and j=0..M:
  v_S(i,j) = log-probability of the most probable path that yields output X_1,…,X_i, passes through j non-noise states, and ends in state S.
Initialize: v_S(0,0) = initial log-probability of S (stationary distribution).
For i,j>0 and a=0/1: [recursion formulae not recoverable from the extracted text]
Notes:
  - Hold update-pointers.
  - Most values are -∞.
  - t(∙,∙) (transitions) and e(∙,∙) (emissions) are log-probabilities.
6 Answer to 1d (continued)
Recursion formulae (for i,j>0 and a=0/1): [formulae not recoverable from the extracted text]
At the end choose the state S that maximizes v_S(n,M), and follow the update-pointers back to recover the path.
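The table v_S(i,j) described on this slide can be sketched in Python as follows. This is only an illustration under a simplifying assumption that is not taken from the slides: every state on the path emits exactly one symbol of the sequence (blanks, if present, are treated as ordinary observed symbols), so silent deletions would need an extra case. The function name `constrained_viterbi` and its arguments (`non_noise`, `log_init`, `log_t`, `log_e`) are hypothetical.

```python
NEG_INF = float("-inf")

def constrained_viterbi(obs, states, non_noise, log_init, log_t, log_e, M):
    """Best path emitting obs (non-empty) through exactly M non-noise states.

    ASSUMPTION (not from the slides): each state emits one symbol per step.
    log_init[s], log_t[r][s], log_e[s][x] are log-probabilities.
    Returns (log-probability, list of the n emitting states), or
    (-inf, None) if no such path exists.
    """
    n = len(obs)
    # v[i][j][s]: best log-prob of emitting obs[:i] with j non-noise states, ending in s
    v = [[{s: NEG_INF for s in states} for _ in range(M + 1)] for _ in range(n + 1)]
    ptr = [[{s: None for s in states} for _ in range(M + 1)] for _ in range(n + 1)]
    for s in states:
        v[0][0][s] = log_init[s]  # initialization, as on the slide
    for i in range(1, n + 1):
        x = obs[i - 1]
        for j in range(M + 1):
            for s in states:
                # entering a non-noise state uses up one unit of the budget j
                jp = j - 1 if s in non_noise else j
                if jp < 0:
                    continue
                prev = max(states, key=lambda r: v[i - 1][jp][r] + log_t[r][s])
                v[i][j][s] = v[i - 1][jp][prev] + log_t[prev][s] + log_e[s][x]
                ptr[i][j][s] = prev  # update-pointer
    last = max(states, key=lambda s: v[n][M][s])
    if v[n][M][last] == NEG_INF:
        return NEG_INF, None
    # follow the pointers back from (n, M, last)
    path, j = [last], M
    for i in range(n, 1, -1):
        s = path[-1]
        path.append(ptr[i][j][s])
        if s in non_noise:
            j -= 1
    path.reverse()
    return v[n][M][last], path
```

Most entries stay at -∞, as the slide notes, because a cell (i,j) is reachable only when j non-noise states can fit into i steps.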
7 Bonus
Given an output sequence, what is the most probable path to yield it?
Approach 1: If we don't know M, we can fill in the tables column by column. Eventually the probabilities in the columns start to deteriorate.
Approach 2: an a-priori bound on M. Note that an optimal path doesn't have 2 consecutive deletions (-).
[Figure: two path fragments compared, S_i → S_{i+1} → S_{i+2} versus S_i → S_{i+2}, with Pr[first] < Pr[second]]
Conclusion: M < 2n+2
8 2-species Evolution
Observe the following evolution model for binary-character vectors:
  - Each species corresponds to a binary vector in {0,1}^n
  - Two species Y, Z evolve from a common ancestor X
  - Each bit in X is chosen uniformly at random
  - Each bit in X is flipped w.p. θ during evolution towards Y, and independently w.p. θ during evolution towards Z
Given binary vectors for Y and Z, calculate the most probable value for θ:
  1. Define the sufficient statistics of the problem
  2. Give a formula for L(θ)
  3. Formulate an EM algorithm for the problem
  4. Give an analytic solution (if one exists) for the MLE
[Figure: tree with hidden root X and observed leaves Y, Z; each edge is labeled θ]
9 2-species Evolution
Define the sufficient statistics of the problem:
  Given Y = y_1,…,y_n and Z = z_1,…,z_n define n_0 = |{i | y_i = z_i}|, n_1 = |{i | y_i ≠ z_i}|
Give formula for L(θ):
  L(θ) = Pr[Y,Z | θ] = Π_{i=1..n} Pr[Y_i,Z_i | θ] = [½((1-θ)² + θ²)]^{n_0} · [θ(1-θ)]^{n_1}

  Y_i  Z_i  X_i  Pr[X_i,Y_i,Z_i]
  0    0    0    ½(1-θ)²
  0    0    1    ½θ²
  0    1    0    ½θ(1-θ)
  0    1    1    ½θ(1-θ)
  (Similarly if Y_i = 1)
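As a small illustration of this slide, the sufficient statistics and the log-likelihood can be computed as below. This is a sketch, not part of the tutorial: the function names are hypothetical, the per-site probabilities come from summing the table rows over the hidden bit X_i, and θ is assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def sufficient_stats(y, z):
    """Return (n0, n1): number of sites where y and z agree / disagree."""
    n0 = sum(a == b for a, b in zip(y, z))
    return n0, len(y) - n0

def log_likelihood(theta, n0, n1):
    """log L(theta) for 0 < theta < 1, summing the hidden bit X_i out per site."""
    p_same = 0.5 * ((1 - theta) ** 2 + theta ** 2)  # Pr[Y_i = Z_i = 0], same for 1,1
    p_diff = theta * (1 - theta)                    # Pr[Y_i = 0, Z_i = 1], same for 1,0
    return n0 * math.log(p_same) + n1 * math.log(p_diff)
```

For example, `sufficient_stats("0011", "0101")` counts two agreeing and two disagreeing sites, and the resulting pair can be passed directly to `log_likelihood`.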
10 2-species Evolution
Formulate EM algorithm for the problem
E – Given θ, calculate the expected number of flips from X to Y and Z (#flips = sum of indicator variables):
  E(#flips) = Σ_{i=1..n} ( Pr[x_i ≠ y_i | y_i,z_i,θ] + Pr[x_i ≠ z_i | y_i,z_i,θ] ) = n_0 · 2θ² / ((1-θ)² + θ²) + n_1

  Y  Z  X  Pr[X,Y,Z]   Pr[X|Y,Z]
  0  0  0  ½(1-θ)²     (1-θ)² / ((1-θ)² + θ²)
  0  0  1  ½θ²         θ² / ((1-θ)² + θ²)
  0  1  0  ½θ(1-θ)     ½
  0  1  1  ½θ(1-θ)     ½
M – Given the expected number of flips from X to Y and Z, calculate θ’:
  θ’ = E(#flips) / 2n
E+M – θ’ = ( n_0 · 2θ² / ((1-θ)² + θ²) + n_1 ) / 2n
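The E and M steps on this slide can be iterated as in the sketch below. It assumes the expected flip count derived from the model: a site where Y and Z agree contributes 2·Pr[X differs from both], and a site where they disagree always contributes exactly one flip. The names `em_step` and `run_em`, the tolerance, and the iteration cap are illustrative choices, not from the tutorial.

```python
def em_step(theta, n0, n1):
    """One combined E+M update of theta."""
    d = (1 - theta) ** 2 + theta ** 2
    # E-step: agreeing sites need 0 or 2 flips, disagreeing sites exactly 1
    expected_flips = n0 * 2 * theta ** 2 / d + n1
    # M-step: fraction of flipped bits among the 2n edge-bits
    return expected_flips / (2 * (n0 + n1))

def run_em(theta0, n0, n1, tol=1e-12, max_iter=10000):
    """Iterate em_step from theta0 until the update moves less than tol."""
    theta = theta0
    for _ in range(max_iter):
        new_theta = em_step(theta, n0, n1)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta
```

Note that θ = ½ is always a fixed point of the update, so a run started exactly at ½ never moves; starting from any other value in (0,1) avoids this.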
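11 2-species Evolution
Give analytic solution (if exists) for MLE
Find extreme-points of log-likelihood:
  log L(θ) = n_0 · log(½((1-θ)² + θ²)) + n_1 · log(θ(1-θ))
  d/dθ log L(θ) = (1-2θ) · [ n_1/(θ(1-θ)) - 2n_0/((1-θ)² + θ²) ] = 0
  ⇒ θ = ½, or 2nθ² - 2nθ + n_1 = 0, i.e. θ = ½ ± ½·√(1 - 2n_1/n)
When n_1 < n/2, θ = ½ is a minimum and the two roots θ = ½ ± ½·√(1 - 2n_1/n) are maxima (L(θ) = L(1-θ), so both attain the same likelihood).
When n_1 ≥ n/2, there are no other extreme points in (0,1) and θ = ½ is the maximum.
[Figure: plot of the log-likelihood with the minimum and the two maxima marked]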
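The extreme points can be checked numerically with the short sketch below. It assumes the log-likelihood log L(θ) = n_0·log(½((1-θ)² + θ²)) + n_1·log(θ(1-θ)) that follows from the model, and it returns the smaller of the two symmetric maximizers; the function names are hypothetical.

```python
import math

def log_likelihood(theta, n0, n1):
    """log L(theta) for 0 < theta < 1."""
    same = 0.5 * ((1 - theta) ** 2 + theta ** 2)
    diff = theta * (1 - theta)
    return n0 * math.log(same) + n1 * math.log(diff)

def mle_theta(n0, n1):
    """Closed-form maximizer of L(theta); the smaller of theta and 1 - theta."""
    n = n0 + n1
    disc = 1 - 2 * n1 / n
    if disc <= 0:
        # no stationary points besides theta = 1/2, which is then the maximum
        return 0.5
    return 0.5 - 0.5 * math.sqrt(disc)
```

Comparing `log_likelihood` at `mle_theta(n0, n1)` against a grid of other θ values is a quick sanity check of the closed form.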
12 Generalizing The Model
  - Alphabet of size k:
      - Uniform transition model: [formula not recoverable from the extracted text]
      - More complex transition models
  - Evolution of n species (given the phylogenetic topology):
      [Figure: tree with hidden internal nodes X_1,…,X_5, observed leaves Y_1,…,Y_n, and edge parameters θ_1,…,θ_4]
      - θ_i correlates to evolutionary distance along the edge
      - solves ‘small’ likelihood problem