Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions. Shay Mozes, Oren Weimann (MIT), Michal Ziv-Ukelson (Tel-Aviv U.)

Presentation transcript:

1 Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions. Shay Mozes, Oren Weimann (MIT), Michal Ziv-Ukelson (Tel-Aviv U.)

2 Shortly:
Hidden Markov Models are extensively used to model processes in many fields.
The runtime of HMM algorithms is usually linear in the length of the input.
We show how to exploit repetitions in the input to obtain a speedup:
the first provable speedup of Viterbi's algorithm;
can use different compression schemes;
applies to several decoding and training algorithms.

3 Markov Models
States q_1, …, q_k; transition probabilities P_{i←j}; emission probabilities e_i(σ), σ ∈ Σ.
The model is time-independent, discrete, and finite.
Example with two states:
P_{1←1} = 0.9, P_{2←1} = 0.1, P_{2←2} = 0.8, P_{1←2} = 0.2
e_1(A) = 0.3, e_1(C) = 0.2, e_1(G) = 0.2, e_1(T) = 0.3
e_2(A) = 0.2, e_2(C) = 0.3, e_2(G) = 0.3, e_2(T) = 0.2
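The two-state example above can be written down directly. A minimal sketch (the array layout P[i][j] = P_{i←j} and the 0-indexing of states are our own conventions, not from the talk):

```python
# The two-state model from this slide.  P[i][j] holds P_{i<-j}: the
# probability of moving TO state i FROM state j (0-indexed here,
# 1-indexed on the slide).
P = [[0.9, 0.2],
     [0.1, 0.8]]

# e[i][sigma] holds e_i(sigma): the probability that state i emits sigma.
e = [{'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3},
     {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2}]

# Sanity checks: each column of P (the outgoing transitions of one state)
# and each emission distribution sums to 1.
for j in range(2):
    assert abs(sum(P[i][j] for i in range(2)) - 1.0) < 1e-12
for i in range(2):
    assert abs(sum(e[i].values()) - 1.0) < 1e-12
```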

4 Hidden Markov Models
[Figure: trellis of the k states over time steps 1…n, emitting the observed string x_1 x_2 x_3 … x_n]
We are only given the description of the model and the observed string.
Decoding: find the hidden sequence of states that is most likely to have generated the observed string.

5 Decoding – Viterbi's Algorithm
v_t[j] = probability of the best sequence of states that emits the first t characters and ends in state j.
Recurrence (e.g., for cell v_6[4], where x_6 = c): v_6[4] = max_j { e_4(c) · P_{4←j} · v_5[j] }
[Figure: dynamic-programming table, states by time 1…n, over the string aacgacggt]
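The recurrence can be sketched directly. This is an illustrative implementation, not the paper's code; the layout P[i][j] = P_{i←j}, e[i][c] = e_i(c) and the small test model are our own:

```python
def viterbi(x, P, e, v0):
    """Plain Viterbi: v_t[i] = max_j e_i(x_t) * P[i][j] * v_{t-1}[j].
    Returns the best-path probability and the best state sequence."""
    k = len(v0)
    v, back = list(v0), []
    for c in x:
        scores = [[e[i][c] * P[i][j] * v[j] for j in range(k)]
                  for i in range(k)]
        ptr = [max(range(k), key=lambda j, i=i: scores[i][j])
               for i in range(k)]          # maximizing predecessor of each i
        v = [scores[i][ptr[i]] for i in range(k)]
        back.append(ptr)
    best = max(range(k), key=lambda i: v[i])
    path, i = [], best
    for ptr in reversed(back):             # trace the stored pointers back
        path.append(i)
        i = ptr[i]
    return v[best], path[::-1]

# Example with the two-state model from the earlier slide (lowercase alphabet):
P = [[0.9, 0.2], [0.1, 0.8]]
e = [{'a': 0.3, 'c': 0.2, 'g': 0.2, 't': 0.3},
     {'a': 0.2, 'c': 0.3, 'g': 0.3, 't': 0.2}]
prob, path = viterbi("aa", P, e, [0.5, 0.5])  # best path stays in state 0
```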

6 Outline
Overview
Exploiting repetitions
Using LZ78
Using Run-Length Encoding
Summary of results

7 VA in Matrix Notation
Viterbi's algorithm: v_1[i] = max_j { e_i(x_1) · P_{i←j} · v_0[j] } = max_j { M_ij(x_1) · v_0[j] }, where M_ij(σ) = e_i(σ) · P_{i←j}.
With the product (A ⊗ B)_ij = max_k { A_ik · B_kj }:
v_1 = M(x_1) ⊗ v_0
v_2 = M(x_2) ⊗ M(x_1) ⊗ v_0
v_n = M(x_n) ⊗ M(x_n−1) ⊗ ··· ⊗ M(x_1) ⊗ v_0
Evaluated right to left as matrix-vector products this costs O(k²n); evaluated with matrix-matrix products it costs O(k³n).

8 Exploiting Repetitions
For the string c a t g a a c t g a a c, plain VA takes 12 steps:
v_n = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0
The substring W = gaac repeats: compute M(W) = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) once, then use it twice!
v_n = M(W) ⊗ M(t) ⊗ M(W) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0   (6 steps)
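Because ⊗ is associative, the precomputed M(W) yields exactly the same v_n. A small check on the slide's string, reusing a toy two-state model (the model numbers are our own choice):

```python
from functools import reduce

def otimes(A, B):  # (A ⊗ B)_ij = max_k A_ik * B_kj
    return [[max(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Toy two-state model (same numbers as the earlier slide, lowercase alphabet).
P = [[0.9, 0.2], [0.1, 0.8]]
e = [{'a': 0.3, 'c': 0.2, 'g': 0.2, 't': 0.3},
     {'a': 0.2, 'c': 0.3, 'g': 0.3, 't': 0.2}]
M = lambda s: [[e[i][s] * P[i][j] for j in range(2)] for i in range(2)]

X, W, v0 = "catgaactgaac", "gaac", [[0.5], [0.5]]

# 12 steps: v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0.
naive = reduce(otimes, [M(s) for s in reversed(X)] + [v0])

# 6 steps: precompute M(W) = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) once, use it twice.
M_W = reduce(otimes, [M(s) for s in reversed(W)])
fast = reduce(otimes, [M_W, M('t'), M_W, M('t'), M('a'), M('c'), v0])

# Associativity of ⊗ makes the shortcut exact (up to float rounding).
assert all(abs(naive[i][0] - fast[i][0]) < 1e-12 for i in range(2))
```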

9 Exploiting Repetitions (cont.)
Let ℓ be the length of the repeated substring W, and λ the number of times W repeats in the string.
Computing M(W) costs (ℓ−1)k³, paid once (matrix-matrix multiplications).
Each time W appears we save (ℓ−1)k² (matrix-vector multiplications).
W is good if λ(ℓ−1)k² > (ℓ−1)k³, i.e., if the number of repeats λ > k, the number of states.

10 General Scheme
I. Dictionary selection: choose the set D = {W_i} of good substrings.
II. Encoding: compute M(W_i) for every W_i in D.
III. Parsing: partition the input X into good substrings, X = W_i1 W_i2 … W_in′, giving X′ = i_1, i_2, …, i_n′.
IV. Propagation: run Viterbi's algorithm on X′ using the M(W_i).
(Dictionary selection and encoding can be done offline.)
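Steps I–IV can be sketched end to end on a toy instance. Here the dictionary is chosen by hand rather than by a real compression scheme, and the model numbers are ours; note λ = 4 > k = 2, so the word is good:

```python
from functools import reduce

def otimes(A, B):  # (A ⊗ B)_ij = max_k A_ik * B_kj
    return [[max(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.9, 0.2], [0.1, 0.8]]
e = [{'a': 0.3, 'c': 0.2}, {'a': 0.2, 'c': 0.3}]
M = lambda s: [[e[i][s] * P[i][j] for j in range(2)] for i in range(2)]

X = "acacacac"          # toy input: W = "ac" repeats lambda = 4 > k = 2 times
D = ["ac"]              # I.  dictionary selection: the good substrings
enc = {W: reduce(otimes, [M(c) for c in reversed(W)]) for W in D}
                        # II. encoding: M(W) computed once per word
X_prime = ["ac"] * 4    # III. parsing: X partitioned into dictionary words
v0 = [[0.5], [0.5]]
v = reduce(otimes, [enc[W] for W in reversed(X_prime)] + [v0])
                        # IV. propagation: VA on X' using the M(W_i)

# Agrees with plain character-by-character VA.
v_naive = reduce(otimes, [M(s) for s in reversed(X)] + [v0])
assert all(abs(v[i][0] - v_naive[i][0]) < 1e-12 for i in range(2))
```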

11 Outline
Overview
Exploiting repetitions
Using LZ78
Using Run-Length Encoding
Summary of results

12 LZ78
The next LZ-word is the longest LZ-word previously seen, plus one character.
Use a trie. Example: aacgacg parses into a, ac, g, acg.
The number of LZ-words is asymptotically smaller than n ∕ log n.
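The parse rule above can be sketched with a dict-of-dicts trie (an illustrative helper; the function name is ours):

```python
def lz78_words(x):
    """Parse x into LZ78 words: each new word is the longest previously
    seen word plus one character.  The trie is a dict of dicts."""
    trie, words = {}, []
    node, word = trie, ""
    for c in x:
        if c in node:                # extend the current match in the trie
            node, word = node[c], word + c
        else:                        # longest seen word + one new character
            node[c] = {}
            words.append(word + c)
            node, word = trie, ""
    if word:                         # trailing word (prefix of an earlier one)
        words.append(word)
    return words

# The slide's example string:
assert lz78_words("aacgacg") == ["a", "ac", "g", "acg"]
```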

13 Using LZ78
I. Dictionary selection: D = the words in the LZ78 parse of X.  Cost: O(n)
II. Encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W).  Cost: O(k³n ∕ log n)
III. Parsing: X′ = the LZ78 parse of X.  Cost: O(n)
IV. Propagation: run VA on X′ using the M(W_i).  Cost: O(k²n ∕ log n)
Speedup: k²n ∕ (k³n ∕ log n) = log n ∕ k

14 Improvement
Remember the speedup condition: λ > k.
Use just the LZ-words that appear more than k times; these are the trie nodes with more than k descendants.
Step III must now parse X differently.
This ensures graceful degradation with increasing k. Speedup: max(1, log n ∕ k), i.e., never slower than plain VA.

15 Experimental Results
Short input: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast).
Long input: the 22 Mbp human Y chromosome.
Approximately 5x faster in practice.

16 Outline
Overview
Exploiting repetitions
Using LZ78
Using Run-Length Encoding
Summary of results

17 Run-Length Encoding
aaaccggggg → a³ c² g⁵
Each run can also be split into sub-runs of power-of-two lengths: aaaccggggg → a² a¹ c² g⁴ g¹
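Both parses on this slide are easy to sketch. The power-of-two split of each run is our reading of the second line (a³ c² g⁵ → a² a¹ c² g⁴ g¹); the function names are ours:

```python
def rle(s):
    """Run-length encode s into (char, run_length) pairs:
    aaaccggggg -> a3 c2 g5."""
    runs = []
    for c in s:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs

def pow2_runs(runs):
    """Split each run into sub-runs of power-of-two lengths (the slide's
    second parse): a3 c2 g5 -> a2 a1 c2 g4 g1."""
    out = []
    for c, n in runs:
        p = 1 << (n.bit_length() - 1)  # largest power of two <= n
        while n:
            if n >= p:
                out.append((c, p))
                n -= p
            p >>= 1
    return out

assert rle("aaaccggggg") == [('a', 3), ('c', 2), ('g', 5)]
assert pow2_runs(rle("aaaccggggg")) == [('a', 2), ('a', 1), ('c', 2),
                                        ('g', 4), ('g', 1)]
```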

18 Summary of Results
A general framework; speedups by compression scheme:
LZ78: log(n) ∕ k
RLE: r ∕ log(r)
Byte-Pair Encoding: r
Path reconstruction: O(n)
Forward-Backward algorithms: same speedups, with standard matrix multiplication
Viterbi training: the same speedups apply
Baum-Welch training: speedup possible, with many details
Parallelization

19 Thank you! Any questions?

20 Path Traceback
In VA this is easy to do in O(n) time by keeping track of the maximizing states during the computation.
The problem: we run VA on X′, so we get the sequence of states for X′, not for X; we only get the states on the boundaries of the good substrings of X.
Solution: also keep track of the maximizing states when computing the matrices M(W). This takes O(n) time and O(nk²) space.

21 Training
Estimate the unknown parameters P_{i←j} and e_i(σ).
Use Expectation-Maximization:
1. Decoding
2. Recalculate the parameters
Viterbi training: each iteration costs O(VA + n + k²), i.e., decoding (the bottleneck, to which our speedup applies) plus path traceback and updating P_{i←j}, e_i(σ).

22 Baum-Welch Training
Each iteration costs O(FB + nk²): decoding by the Forward-Backward algorithm, plus O(nk²) for path traceback and updating P_{i←j}, e_i(σ).
If a substring w of length ℓ, repeating λ times, satisfies the required condition, then the entire process can be sped up by precalculation.

