
1 Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions
Yuri Lifshits (Caltech), Shay Mozes (Brown Uni.), Oren Weimann (MIT), Michal Ziv-Ukelson (Ben-Gurion Uni.)

2 Hidden Markov Models (HMMs)
Hidden states q_1, …, q_k: in the example, sunny, rainy, cloudy.
Transition probabilities: the probability of the weather given the previous day's weather. P_{i←j} = the probability of making a transition to state i from state j.
[State diagram: Sunny, Rainy, Cloudy connected by transition arrows]

3 Hidden Markov Models (HMMs)
Observable states: the states of the process that are 'visible'. In the example (humidity at IBM), Σ = {soggy, damp, dryish, dry}.
Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state. e_i(σ) = the probability of observing σ ∈ Σ given that the state is i.
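To make the model concrete, here is a minimal sketch (Python/NumPy) of how the weather/humidity model above might be laid out in code. The states and alphabet are the slide's; the probability values are illustrative placeholders, not numbers from the presentation.

```python
# Minimal sketch of the weather HMM; probability values are placeholders.
import numpy as np

states = ["sunny", "rainy", "cloudy"]            # hidden states q_1..q_k
alphabet = ["soggy", "damp", "dryish", "dry"]    # observable symbols Sigma

# P[i, j] = probability of moving to state i from state j (P_{i<-j})
P = np.array([[0.6, 0.2, 0.3],
              [0.1, 0.6, 0.3],
              [0.3, 0.2, 0.4]])

# e[i, s] = probability of emitting symbol s while in state i
e = np.array([[0.05, 0.15, 0.30, 0.50],
              [0.50, 0.30, 0.15, 0.05],
              [0.20, 0.30, 0.30, 0.20]])

assert np.allclose(P.sum(axis=0), 1.0)   # columns of P are distributions
assert np.allclose(e.sum(axis=1), 1.0)   # rows of e are distributions
```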

4 In short: Hidden Markov Models are used extensively to model processes in many fields (error correction, speech recognition, computational linguistics, bioinformatics). We show how to exploit repetitions to obtain a speedup of HMM algorithms. The approach can use different compression schemes and applies to several decoding and training algorithms.

5 HMM Decoding and Training
Decoding: given the model (emission and transition probabilities) and an observed sequence X, find the hidden sequence of states that is most likely to have generated the observed sequence, e.g., X = dry, dry, damp, soggy, …
Training: given the number of hidden states and an observed sequence, estimate the emission and transition probabilities P_{i←j}, e_i(σ).

6 Decoding
Decoding: given the model and the observed string, find the hidden sequence of states that is most likely to have generated the observed string.
[Trellis figure: the k states replicated at each time step, with the observed string x_1, x_2, x_3, …, x_n along the time axis]

7 Decoding – Viterbi's Algorithm (VA)
v_t[j] = the probability of the best sequence of states that emits the first t characters and ends in state j (e.g., v_5[2]: emits the first 5 characters and ends in state 2).
Building the recurrence for the example cell v_6[4] with x_6 = c: take v_5[2], multiply by the transition P_{4←2} and the emission e_4(c), giving e_4(c) · P_{4←2} · v_5[2]; maximizing over all possible previous states j gives
v_6[4] = max_j { e_4(c) · P_{4←j} · v_5[j] }
[Trellis figure: states 1–6 by times 1–n over the observed string aacgacggt]
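A minimal sketch of this recurrence in Python/NumPy, assuming P[i, j] = P_{i←j} and e[i, s] = e_i(s) as set up earlier; the function and variable names are illustrative. The back-pointer part anticipates the path-traceback discussion on slide 22.

```python
import numpy as np

def viterbi(P, e, x, v0):
    """x: observed symbols as indices into the emission table;
    v0: initial state vector. Returns the final Viterbi vector and the
    most likely state sequence (one state per observed symbol)."""
    v = v0.astype(float).copy()
    back = []                                       # argmax_j at every step
    for sym in x:
        # scores[i, j] = e_i(x_t) * P_{i<-j} * v_{t-1}[j]
        scores = e[:, sym, None] * P * v[None, :]
        back.append(scores.argmax(axis=1))
        v = scores.max(axis=1)                      # new v_t[i]
    states = [int(v.argmax())]                      # best final state
    for ptrs in reversed(back[1:]):                 # walk the pointers back
        states.append(int(ptrs[states[-1]]))
    return v, states[::-1]
```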

8 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

9 VA in Matrix Notation
Viterbi's algorithm: v_1[i] = max_j { e_i(x_1) · P_{i←j} · v_0[j] }
Define M_{ij}(σ) = e_i(σ) · P_{i←j}, so v_1[i] = max_j { M_{ij}(x_1) · v_0[j] }.
With the (max, ·) product (A ⊗ B)_{ij} = max_k { A_{ik} · B_{kj} }:
v_1 = M(x_1) ⊗ v_0
v_2 = M(x_2) ⊗ M(x_1) ⊗ v_0
v_n = M(x_n) ⊗ M(x_{n-1}) ⊗ ··· ⊗ M(x_1) ⊗ v_0
Evaluated right to left as matrix-vector products this costs O(k²n); evaluated with matrix-matrix products it would cost O(k³n).
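The (max, ·) product is easy to express directly. A small sketch, using the same P and e conventions as before, that reproduces v_n = M(x_n) ⊗ ··· ⊗ M(x_1) ⊗ v_0 by multiplying from the right, i.e., with O(k²) work per symbol:

```python
import numpy as np

def maxtimes(A, B):
    # (A ⊗ B)[i, j] = max over k of A[i, k] * B[k, j]
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def M(P, e, sym):
    # M_{ij}(sigma) = e_i(sigma) * P_{i<-j}
    return e[:, sym, None] * P

def viterbi_matrix_form(P, e, x, v0):
    # v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0, evaluated right to left
    v = v0.reshape(-1, 1).astype(float)
    for sym in x:
        v = maxtimes(M(P, e, sym), v)
    return v.ravel()
```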

10 Exploiting Repetitions
For the observed string catgaactgaac:
v_n = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0   (12 steps)
The substring W = gaac appears twice, so compute M(W) = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) once and use it twice:
v_n = M(W) ⊗ M(t) ⊗ M(W) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0   (6 steps)

11 Exploiting Repetitions
ℓ = length of the repeated substring W; λ = number of times W repeats in the string.
Computing M(W) costs (ℓ-1) matrix-matrix multiplications, i.e., (ℓ-1)k³.
Each time W appears, ℓ matrix-vector multiplications are replaced by one, saving (ℓ-1)k².
W is good if λ(ℓ-1)k² > (ℓ-1)k³, i.e., if λ, the number of repeats, is greater than k, the number of states.

12 General Scheme
I. Dictionary selection: choose the set D = {W_i} of good substrings.
II. Encoding: compute M(W_i) for every W_i in D.
III. Parsing: partition the input X into good substrings, X' = W_{i_1} W_{i_2} … W_{i_n'}.
IV. Propagation: run Viterbi's algorithm on X' using the matrices M(W_i).
(Depending on the compression scheme, steps I and II can be performed offline.)
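A generic sketch of these four phases, with dictionary selection and parsing left as pluggable functions (the following slides instantiate them with Four-Russians, LZ78, and RLE). Substrings are represented as tuples of symbol indices; maxtimes and M are the (max, ·) helpers from the previous sketch, repeated so the snippet stands alone; path traceback (slide 22) is omitted. The name compressed_viterbi is illustrative.

```python
import numpy as np
from functools import reduce

def maxtimes(A, B):
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def M(P, e, sym):
    return e[:, sym, None] * P

def compressed_viterbi(P, e, x, v0, select_dictionary, parse):
    # I.  dictionary selection: choose the set D of good substrings
    D = select_dictionary(x)
    # II. encoding: M(W) = M(w_l) ⊗ ... ⊗ M(w_1) for every W in D
    enc = {W: reduce(maxtimes, (M(P, e, s) for s in reversed(W)))
           for W in D}
    # III. parsing: partition x into dictionary words X' = W_1 W_2 ... W_n'
    words = parse(x, D)
    # IV. propagation: run VA on X' using the precomputed matrices
    v = v0.reshape(-1, 1).astype(float)
    for W in words:
        v = maxtimes(enc[W], v)
    return v.ravel()
```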

13 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

14 Using the Four-Russians Method
I. Dictionary selection: D = all strings over Σ of length ≤ ℓ. Cost: O(1).
II. Encoding: incremental construction, M(Wσ) = M(σ) ⊗ M(W). Cost: O(2|Σ|^ℓ k³).
III. Parsing: X' = X split into words of length ℓ. Cost: O(n).
IV. Propagation: run VA on X' using the M(W_i). Cost: O(k²n / ℓ).
Speedup: k²n / O(2|Σ|^ℓ k³ + k²n/ℓ) = Θ(log n) for a suitable ℓ = Θ(log n).
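A sketch of the two Four-Russians-specific pieces that plug into the generic compressed_viterbi sketch above. For simplicity the dictionary is enumerated directly rather than built incrementally (which costs an extra factor of ℓ in the encoding step but keeps the code short); the parameter value ℓ = 3 in the usage comment is purely hypothetical.

```python
from itertools import product

def four_russians_dictionary(alphabet_size, ell):
    # D = all strings over the alphabet of length at most ell,
    # represented as tuples of symbol indices
    words = []
    for length in range(1, ell + 1):
        words.extend(product(range(alphabet_size), repeat=length))
    return words

def four_russians_parse(x, ell):
    # X' = X cut into consecutive blocks of length ell
    # (the last, possibly shorter, block is also in the dictionary)
    return [tuple(x[i:i + ell]) for i in range(0, len(x), ell)]

# usage with the generic scheme (hypothetical parameters):
# v = compressed_viterbi(P, e, x, v0,
#                        lambda x: four_russians_dictionary(e.shape[1], 3),
#                        lambda x, D: four_russians_parse(x, 3))
```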

15 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

16 Lempel-Ziv 78
The next LZ-word is the longest LZ-word previously seen, plus one character. The LZ-words are stored in a trie.
Example: aacgacg is parsed into the LZ-words a, ac, g, acg.
[Trie figure: root with children a and g; a has child c; ac has child g]
The number of LZ-words is asymptotically smaller than n / log n.
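A sketch of LZ78 parsing with the trie stored as nested dictionaries; each parsed word is a tuple of symbols, so the result plugs directly into the generic scheme above.

```python
def lz78_parse(x):
    trie = {}            # child maps: symbol -> subtrie
    words = []
    i = 0
    while i < len(x):
        node, j = trie, i
        # walk down the trie as long as the prefix read so far is a known word
        while j < len(x) and x[j] in node:
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}          # new LZ-word = known word + one character
            j += 1
        words.append(tuple(x[i:j]))
        i = j
    return words

print(lz78_parse("aacgacg"))   # [('a',), ('a', 'c'), ('g',), ('a', 'c', 'g')]
```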

17 Using LZ78
I. Dictionary selection: D = all LZ-words in X. Cost: O(n).
II. Encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W). Cost: O(k³n / log n).
III. Parsing: X' = the LZ78 parse of X. Cost: O(n).
IV. Propagation: run VA on X' using the M(W_i). Cost: O(k²n / log n).
Speedup: k²n / (k³n / log n) = (log n) / k.

18 Improvement
Recall the speedup condition: λ > k. Use only the LZ-words that appear more than k times; these are exactly the words represented by trie nodes with more than k descendants. Step III must now parse X differently, using only these words. This ensures graceful degradation with increasing k. Speedup: min(1, (log n) / k).
[Trie figure, as on slide 16]
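A sketch of the descendant-count filter: rebuild the trie from the LZ-words and keep only the words whose trie node has more than k nodes in its subtree (each node in the subtree corresponds to a later LZ-word extending the word, hence to another occurrence of it). The function names are illustrative.

```python
def build_trie(words):
    trie = {}
    for w in words:
        node = trie
        for sym in w:
            node = node.setdefault(sym, {})
    return trie

def subtree_sizes(trie):
    """Map each LZ-word (as a tuple) to the number of trie nodes in its
    subtree, itself included."""
    sizes = {}
    def walk(node, path):
        total = 1
        for sym, child in node.items():
            total += walk(child, path + (sym,))
        if path:                       # skip the root (the empty word)
            sizes[path] = total
        return total
    walk(trie, ())
    return sizes

def frequent_words(words, k):
    sizes = subtree_sizes(build_trie(words))
    return {w for w in words if sizes[w] > k}
```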

19 Experimental Results – CpG Islands
Short sequence: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast). Long sequence: the 22 Mbp human Y chromosome. Roughly 5× faster.

20 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

21 Run-Length Encoding
aaaccggggg → a^3 c^2 g^5
Splitting each run into powers of two: aaaccggggg → a^2 a^1 c^2 g^4 g^1
Offline: the run-length encoded form of the input can be computed (or given) in advance.
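A sketch of both views: plain run-length encoding, and the variant that splits every run length into powers of two (3 = 2 + 1, 5 = 4 + 1), so that only matrices for runs of length 1, 2, 4, 8, … need precomputing.

```python
from itertools import groupby

def rle(x):
    # collapse maximal runs: "aaaccggggg" -> [('a', 3), ('c', 2), ('g', 5)]
    return [(sym, len(list(grp))) for sym, grp in groupby(x)]

def rle_powers_of_two(x):
    # split each run length into powers of two (order within a run is irrelevant)
    out = []
    for sym, run in rle(x):
        p = 1
        while run:
            if run & p:
                out.append((sym, p))
                run ^= p
            p <<= 1
    return out

print(rle("aaaccggggg"))                # [('a', 3), ('c', 2), ('g', 5)]
print(rle_powers_of_two("aaaccggggg"))  # [('a', 1), ('a', 2), ('c', 2), ('g', 1), ('g', 4)]
```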

22 Path Traceback
In VA this is easy to do in O(n) time by keeping track of the maximizing states during the computation. The problem: with the compressed scheme we only get the states on the boundaries of the good substrings of X. Solution: also keep track of the maximizing states when computing the matrices M(W) = M(W_1) ⊗ M(W_2). This takes O(n) time and O(n'k²) space.

23 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

24 Training
Estimate the model θ = {P_{i←j}, e_i(σ)} given X: find the θ that maximizes P(X | θ). Use Expectation Maximization:
1. Decode using the current θ.
2. Use the decoding result to update θ.

25 VA Training
A_{ij} = the number of times state i follows state j in the most likely sequence of states.
E_i(σ) = the number of times the letter σ is emitted by state i in the most likely sequence.
Each iteration costs O(VA + n + k²): decoding (the bottleneck, where the speedup applies) plus path traceback and updating P_{i←j}, e_i(σ).
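A sketch of the update step of Viterbi training, given the most likely state path produced by decoding (for example by the viterbi sketch on slide 7); the renormalization here is the simplest possible one, with no pseudocounts, and the function name is illustrative.

```python
import numpy as np

def viterbi_training_update(path, x, k, num_symbols):
    """path: most likely state per observed symbol; x: observed symbol indices."""
    A = np.zeros((k, k))               # A[i, j] = times state i follows state j
    E = np.zeros((k, num_symbols))     # E[i, s] = times state i emits symbol s
    for t in range(1, len(path)):
        A[path[t], path[t - 1]] += 1
    for state, sym in zip(path, x):
        E[state, sym] += 1
    # renormalize counts into probabilities (columns of P, rows of e)
    P_new = A / np.maximum(A.sum(axis=0, keepdims=True), 1)
    e_new = E / np.maximum(E.sum(axis=1, keepdims=True), 1)
    return P_new, e_new
```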

26 The Forward-Backward Algorithm
The forward algorithm computes f_t[i], the probability of observing the sequence x_1, x_2, …, x_t with the t'th state being i.
The backward algorithm computes b_t[i], the probability of observing the sequence x_{t+1}, x_{t+2}, …, x_n given that the t'th state is i.
In matrix notation, with the ordinary (sum, ·) matrix product in place of (max, ·):
f_t = M(x_t) · M(x_{t-1}) · … · M(x_1) · f_0
b_t = b_n · M(x_n) · M(x_{n-1}) · … · M(x_{t+2}) · M(x_{t+1})
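A sketch of both passes with ordinary matrix-vector products, reusing M_{ij}(σ) = e_i(σ) · P_{i←j}. In the code, x[t] (0-indexed) plays the role of x_{t+1}, b is stored as a column vector (so the slide's right-to-left product becomes a transposed multiply), and b_n is taken as the all-ones vector; those representation choices are assumptions of this sketch.

```python
import numpy as np

def M(P, e, sym):
    return e[:, sym, None] * P          # M_{ij}(sigma) = e_i(sigma) * P_{i<-j}

def forward_backward(P, e, x, f0):
    k = P.shape[0]
    n = len(x)
    f = [f0.astype(float)]                         # f[t] holds f_t
    for sym in x:
        f.append(M(P, e, sym) @ f[-1])             # f_t = M(x_t) . f_{t-1}
    b = [None] * (n + 1)
    b[n] = np.ones(k)                              # b_n: nothing left to observe
    for t in range(n - 1, -1, -1):
        b[t] = M(P, e, x[t]).T @ b[t + 1]          # b_t = M(x_{t+1})^T . b_{t+1}
    return f, b
```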

27 Baum-Welch Training (in a nutshell)
A_{ij} = Σ_t f_t[j] · P_{i←j} · e_i(x_{t+1}) · b_{t+1}[i]
Each iteration costs O(FB + nk²): the forward-backward (decoding) pass plus the accumulation above, path traceback, and updating P_{i←j}, e_i(σ).
If a substring W of length ℓ that repeats λ times satisfies the condition given in the paper, then the entire process can be sped up by precalculation.
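A sketch of accumulating this sum with the f, b lists from the forward-backward sketch above (so x[t] again stands for x_{t+1}); normalizing A into updated transition probabilities, and the analogous emission counts, is omitted.

```python
import numpy as np

def expected_transitions(P, e, x, f, b):
    """A[i, j] = sum over t of f_t[j] * P_{i<-j} * e_i(x_{t+1}) * b_{t+1}[i]."""
    k = P.shape[0]
    A = np.zeros((k, k))
    for t in range(len(x)):
        # outer(u, v)[i, j] = e_i(x_{t+1}) * b_{t+1}[i] * f_t[j]
        A += np.outer(e[:, x[t]] * b[t + 1], f[t]) * P
    return A
```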

28 Outline: Overview, Exploiting repetitions, Using the Four-Russians speedup, Using LZ78, Using Run-Length Encoding, Training, Summary of results

29 Summary of Results
General framework; speedups obtained:
Four-Russians: log n
LZ78: (log n) / k
RLE: r / log r
Byte-Pair Encoding: r
SLP, LZ77: r / k
Path reconstruction: O(n)
Forward-Backward: same speedups
Viterbi training: same speedups
Baum-Welch training: speedup, many details
Parallelization

30 Thank you!

