Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions. Yuri Lifshits (Caltech), Shay Mozes (Brown University), Oren Weimann (MIT), Michal Ziv-Ukelson (Ben-Gurion University)
Hidden Markov Models (HMMs). Hidden states q_1, …, q_k: in the weather example, sunny, rainy, cloudy. Transition probabilities: the probability of the weather given the previous day's weather. P_{i←j} = the probability of making a transition to state i from state j.
Hidden Markov Models (HMMs). Observable states: the states of the process that are 'visible', e.g. the observed humidity, Σ = {soggy, damp, dryish, dry}. Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state. e_i(σ) = the probability of observing σ ∈ Σ given that the hidden state is i.
In short: Hidden Markov Models are extensively used to model processes in many fields (error correction, speech recognition, computational linguistics, bioinformatics). We show how to exploit repetitions in the observed sequence to speed up HMM algorithms. The approach can use different compression schemes and applies to several decoding and training algorithms.
HMM Decoding and Training. Decoding: given the model (emission and transition probabilities) and an observed sequence X, find the hidden sequence of states that is most likely to have generated X, e.g. X = dry, dry, damp, soggy, … Training: given the number of hidden states and an observed sequence, estimate the emission and transition probabilities P_{i←j}, e_i(σ).
Decoding: given the model and the observed string x_1 x_2 … x_n, find the hidden sequence of states that is most likely to have generated the observed string. (Figure: trellis with the k states listed at each of the n time steps of the observed string.)
Decoding – Viterbi's Algorithm (VA). v_t[j] = the probability of the best sequence of states that emits the first t characters and ends in state j. Illustrated on the observed string a a c g a c g g t with x_6 = c: for a particular predecessor state j = 2 the candidate value is e_4(c) · P_{4←2} · v_5[2]; maximizing over all predecessors gives v_6[4] = max_j { e_4(c) · P_{4←j} · v_5[j] }.
Outline: Overview; Exploiting repetitions; Using the Four-Russians speedup; Using LZ78; Using Run-Length Encoding; Training; Summary of results.
VA in Matrix Notation. Viterbi's algorithm computes v_1[i] = max_j { e_i(x_1) · P_{i←j} · v_0[j] }. Define M_ij(σ) = e_i(σ) · P_{i←j} and the max-times matrix product (A ⊗ B)_ij = max_k { A_ik · B_kj }. Then v_1[i] = max_j { M_ij(x_1) · v_0[j] }, i.e. v_1 = M(x_1) ⊗ v_0, v_2 = M(x_2) ⊗ M(x_1) ⊗ v_0, and in general v_n = M(x_n) ⊗ M(x_{n-1}) ⊗ ··· ⊗ M(x_1) ⊗ v_0. Evaluated right to left as matrix-vector products this is the usual O(k²n) time; evaluating the matrix-matrix products explicitly would cost O(k³n).
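To make the notation concrete, here is a minimal Python sketch (not the authors' code) of the max-times product ⊗ and of Viterbi evaluated as matrix-vector steps. The array layout is an assumption for illustration: `P[i][j]` plays the role of P_{i←j}, `e[i][σ]` of e_i(σ), and symbols are integer indices.

```python
import numpy as np

def otimes(A, B):
    """(A ⊗ B)[i, j] = max_k A[i, k] * B[k, j]  (max-times matrix product)."""
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def M(sigma, P, e):
    """M[i, j](sigma) = e_i(sigma) * P_{i<-j}; sigma is an integer symbol index."""
    return e[:, sigma][:, None] * P

def viterbi_value(x, P, e, v0):
    """v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0, using one O(k^2) matrix-vector
    step per character, i.e. O(k^2 n) overall."""
    v = v0
    for sigma in x:                                      # x = (x_1, ..., x_n) as symbol indices
        v = (M(sigma, P, e) * v[None, :]).max(axis=1)    # v_t[i] = max_j M_ij(x_t) * v_{t-1}[j]
    return v
```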
Exploiting Repetitions. For X = c a t g a a c t g a a c, the naive evaluation v_n = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0 takes 12 steps. Instead, compute M(W) = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g) once for the repeated substring W = gaac and use it twice: v_n = M(W) ⊗ M(t) ⊗ M(W) ⊗ M(t) ⊗ M(a) ⊗ M(c) ⊗ v_0, only 6 steps.
Exploiting Repetitions. Let ℓ be the length of a repeated substring W and λ the number of times W repeats in the string. Computing M(W) once costs (ℓ-1)k³ (matrix-matrix multiplications); each time W appears we save (ℓ-1)k² (matrix-vector multiplications). So W is good if λ(ℓ-1)k² > (ℓ-1)k³, i.e. if the number of repeats λ exceeds the number of states k.
General Scheme.
I. Dictionary selection: choose the set D = {W_i} of good substrings.
II. Encoding: compute M(W_i) for every W_i in D.
III. Parsing: partition the input X into good substrings, X' = W_{i_1} W_{i_2} … W_{i_n'}.
IV. Propagation: run Viterbi's algorithm on X' using the M(W_i).
(Steps I and II can be done offline.)
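A sketch of the four steps, reusing the `M` and `otimes` helpers from the snippet above. The names `select_dictionary` and `parse` are placeholders (assumptions) for whatever compression scheme is plugged in (Four-Russians, LZ78, RLE); words are represented as tuples of symbol indices.

```python
def speedup_viterbi_value(x, P, e, v0, select_dictionary, parse):
    # I. dictionary selection: choose the set D of good substrings
    D = select_dictionary(x)
    # II. encoding: compute M(W) once per dictionary word
    #     (M(W) = M(w_l) ⊗ ... ⊗ M(w_1) for W = w_1 ... w_l)
    MW = {}
    for W in D:
        acc = M(W[0], P, e)
        for sigma in W[1:]:
            acc = otimes(M(sigma, P, e), acc)
        MW[W] = acc
    # III. parsing: X' = W_{i1} W_{i2} ... W_{in'}
    X_prime = parse(x, D)
    # IV. propagation: one O(k^2) matrix-vector step per word of X'
    v = v0
    for W in X_prime:
        v = (MW[W] * v[None, :]).max(axis=1)
    return v
```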
Using the Four-Russians Method.
I. Dictionary selection: D = all strings over Σ of length at most l. Cost: O(1).
II. Encoding: incremental construction, M(Wσ) = M(σ) ⊗ M(W). Cost: O(2|Σ|^l k³).
III. Parsing: X' = X split into words of length l. Cost: O(n).
IV. Propagation: run VA on X' using the M(W_i). Cost: O(k²n / l).
Speedup: k²n / O(2|Σ|^l k³ + k²n / l) = Θ(log n) for l = Θ(log n).
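A sketch of steps I to III for the Four-Russians variant, building on the earlier `M`/`otimes` helpers (function names here are illustrative, not from the paper): every word over Σ of length at most l is encoded incrementally, each costing a single ⊗ over its one-character-shorter prefix, and X is then cut into blocks of length l.

```python
from itertools import product

def four_russians_encode(sigma_size, l, P, e):
    """M(W) for every word W over {0, ..., sigma_size-1} of length <= l."""
    MW = {}
    for length in range(1, l + 1):
        for W in product(range(sigma_size), repeat=length):
            if length == 1:
                MW[W] = M(W[0], P, e)
            else:
                # extend the already-encoded prefix by one character:
                # M(W sigma) = M(sigma) ⊗ M(W)
                MW[W] = otimes(M(W[-1], P, e), MW[W[:-1]])
    return MW

def four_russians_parse(x, l):
    """Split X into words of length l (the last block may be shorter)."""
    return [tuple(x[i:i + l]) for i in range(0, len(x), l)]
```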
Lempel-Ziv 78. The next LZ-word is the longest LZ-word previously seen plus one character. The LZ-words are stored in a trie. (Figure: the trie for the parse of aacgacg.) The number of LZ-words is O(n ∕ log n).
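A minimal LZ78 parser (a sketch, not the paper's implementation), using a nested-dict trie; it returns the LZ-words in creation order. Symbols can be any hashable values here; when combined with the matrix helpers above they would be integer indices.

```python
def lz78_parse(x):
    trie = {}                  # maps the next character to a child trie node
    words = []
    i = 0
    while i < len(x):
        node, j = trie, i
        # walk down the trie as long as the prefix has been seen before
        while j < len(x) and x[j] in node:
            node = node[x[j]]
            j += 1
        if j < len(x):
            node[x[j]] = {}    # new LZ-word = longest previous word + x[j]
            j += 1
        words.append(tuple(x[i:j]))
        i = j
    return words

# lz78_parse("aacgacg") -> [('a',), ('a', 'c'), ('g',), ('a', 'c', 'g')]
```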
Using LZ78.
I. Dictionary selection: D = all LZ-words in X. Cost: O(n).
II. Encoding: use the incremental nature of LZ, M(Wσ) = M(σ) ⊗ M(W). Cost: O(k³n ∕ log n).
III. Parsing: X' = the LZ parse of X. Cost: O(n).
IV. Propagation: run VA on X' using the M(W_i). Cost: O(k²n ∕ log n).
Speedup: k²n / (k³n ∕ log n) = log n ∕ k.
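Step II exploits the incremental structure: every LZ-word is an earlier LZ-word plus one character, so its matrix costs a single ⊗ on top of its parent's. A sketch, reusing the earlier `M`/`otimes` helpers and the tuple words returned by `lz78_parse` (symbols assumed to be integer indices):

```python
def lz78_encode(words, P, e):
    """M(W) for every LZ-word, one ⊗ per word: M(Wσ) = M(σ) ⊗ M(W)."""
    MW = {}
    for W in words:                # creation order: a word's parent precedes it
        parent, sigma = W[:-1], W[-1]
        if parent == ():
            MW[W] = M(sigma, P, e)
        else:
            MW[W] = otimes(M(sigma, P, e), MW[parent])
    return MW
```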
Improvement. Recall the speedup condition: λ > k. Use only LZ-words that appear more than k times; these words are represented by trie nodes with more than k descendants. X must then be parsed (step III) differently. This ensures graceful degradation as k grows. Speedup: min(1, log n ∕ k).
Experimental Results – CpG Islands. Short sequence: 1.5 Mbp, chromosome 4 of S. cerevisiae (yeast). Long sequence: 22 Mbp, the human Y chromosome. Roughly 5x faster. (Figure: running-time comparison.)
Run-Length Encoding. aaaccggggg → a³c²g⁵. For the speedup, each run is further split into runs whose lengths are powers of two: aaaccggggg → a²a¹c²g⁴g¹, so that the matrices M(σ^(2^j)) can be computed offline by repeated squaring.
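One way to realize this, as a sketch under the assumption that the bookkeeping works as described above (not necessarily the authors' exact code): split every run σ^r into power-of-two runs, and precompute M(σ^(2^j)) by squaring with the earlier `otimes`/`M` helpers.

```python
def rle_power_split(x):
    """aaaccggggg -> [('a', 2), ('a', 1), ('c', 2), ('g', 4), ('g', 1)]"""
    runs, i = [], 0
    while i < len(x):
        j = i
        while j < len(x) and x[j] == x[i]:
            j += 1
        r, p = j - i, 1
        while p * 2 <= j - i:          # largest power of two <= run length
            p *= 2
        while r > 0:                   # greedy binary decomposition of the run
            if r >= p:
                runs.append((x[i], p))
                r -= p
            p //= 2
        i = j
    return runs

def power_matrices(sigma, max_j, P, e):
    """mats[j] = M(sigma^(2^j)), built offline by squaring:
    M(s^(2^(j+1))) = M(s^(2^j)) ⊗ M(s^(2^j))."""
    mats = [M(sigma, P, e)]
    for _ in range(max_j):
        mats.append(otimes(mats[-1], mats[-1]))
    return mats
```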
Path Traceback. In VA this is easy to do in O(n) time by keeping track of the maximizing states during the computation. The problem: with the speedup we only get the states at the boundaries of the good substrings of X. Solution: keep track of the maximizing states when computing the matrices M(W) = M(W_1) ⊗ M(W_2). This takes O(n) time and O(n'k²) space.
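A sketch of the bookkeeping (hypothetical helper, same conventions as the earlier snippets): the ⊗ product additionally records, for every entry (i, j), which intermediate state achieved the maximum, so the states inside a good substring can be recovered once the boundary states are known.

```python
def otimes_with_argmax(A, B):
    """C = A ⊗ B together with arg[i, j] = the intermediate state m
    attaining C[i, j] = max_m A[i, m] * B[m, j]."""
    prod = A[:, :, None] * B[None, :, :]     # prod[i, m, j] = A[i, m] * B[m, j]
    return prod.max(axis=1), prod.argmax(axis=1)

# If M(W) = M(W_1) ⊗ M(W_2) and the boundary states of W are known to be
# j (entering) and i (leaving), then arg[i, j] is the hidden state at the
# split point between the two halves; recursing into both halves recovers
# the whole path inside W.
```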
Training. Estimate the model θ = {P_{i←j}, e_i(σ)} given X: find the θ that maximizes P(X | θ). Use Expectation-Maximization: 1. Decode using the current θ. 2. Use the decoding result to update θ.
VA (Viterbi) Training. A_ij = the number of times state i follows state j in the most likely sequence of states. E_i(σ) = the number of times the letter σ is emitted by state i in the most likely sequence. Each iteration costs O(VA + n + k²): decoding (the bottleneck, which is exactly what our speedup accelerates), plus path traceback and updating P_{i←j}, e_i(σ).
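A sketch of one Viterbi-training iteration; the names are illustrative, and `path` is assumed to be the most likely state sequence obtained from decoding plus traceback. Transitions and emissions are counted along the path and renormalized.

```python
import numpy as np

def viterbi_training_step(x, path, k, sigma_size):
    A = np.zeros((k, k))            # A[i, j] = # times state i follows state j
    E = np.zeros((k, sigma_size))   # E[i, s] = # times state i emits symbol s
    for t in range(1, len(x)):
        A[path[t], path[t - 1]] += 1
    for t in range(len(x)):
        E[path[t], x[t]] += 1
    # renormalize into probabilities (columns of P sum to 1, rows of e sum to 1);
    # a real implementation would guard against states that never occur
    P_new = A / A.sum(axis=0, keepdims=True)
    e_new = E / E.sum(axis=1, keepdims=True)
    return P_new, e_new
```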
The Forward-Backward Algorithm. The forward algorithm calculates f_t[i], the probability of observing the sequence x_1, x_2, …, x_t and having i as the t'th state. The backward algorithm calculates b_t[i], the probability of observing the sequence x_{t+1}, x_{t+2}, …, x_n given that the t'th state is i. In matrix notation, with ● denoting the ordinary matrix product: f_t = M(x_t) ● M(x_{t-1}) ● … ● M(x_1) ● f_0 and b_t = b_n ● M(x_n) ● M(x_{n-1}) ● … ● M(x_{t+2}) ● M(x_{t+1}).
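The same matrix machinery with ⊗ replaced by the ordinary matrix product gives the two recursions. A sketch, reusing the `M` helper from above; taking `f0` as the initial distribution and b_n as the all-ones vector is an assumption of this illustration.

```python
import numpy as np

def forward_vectors(x, P, e, f0):
    """f_t = M(x_t) · M(x_{t-1}) · ... · M(x_1) · f_0, for t = 0, ..., n."""
    f, out = f0, [f0]
    for sigma in x:
        f = M(sigma, P, e) @ f
        out.append(f)
    return out                         # out[t] = f_t

def backward_vectors(x, P, e, k):
    """b_t = b_n · M(x_n) · ... · M(x_{t+1}), for t = 0, ..., n (b_n = all ones)."""
    b = np.ones(k)
    out = [b]
    for sigma in reversed(x):
        b = b @ M(sigma, P, e)
        out.append(b)
    return list(reversed(out))         # result[t] = b_t
```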
Baum-Welch Training (in a nutshell). A_ij = Σ_t f_t[j] ● P_{i←j} ● e_i(x_{t+1}) ● b_{t+1}[i]. Each iteration costs O(FB + nk²): decoding by forward-backward plus O(nk²) for updating P_{i←j}, e_i(σ). If a substring W of length ℓ repeats λ times and satisfies the appropriate speedup condition, then the entire process can be sped up by precalculation.
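A sketch of accumulating the A_ij sums from the vectors above; `f` and `b` are assumed to be the lists returned by `forward_vectors` and `backward_vectors`, and `x[t]` corresponds to x_{t+1} in the slide's 1-based indexing. Emission counts and the normalization by P(X) are omitted.

```python
import numpy as np

def baum_welch_transition_counts(x, P, e, f, b):
    """A[i, j] = sum_t f_t[j] * P_{i<-j} * e_i(x_{t+1}) * b_{t+1}[i]
    (unnormalized expected transition counts)."""
    k = P.shape[0]
    A = np.zeros((k, k))
    for t in range(len(x)):            # x[t] is x_{t+1} in the slide's notation
        A += (e[:, x[t]] * b[t + 1])[:, None] * P * f[t][None, :]
    return A
```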
Summary of Results.
General framework, speedups by compression scheme:
Four-Russians: log(n)
LZ78: log(n) ∕ k
RLE: r ∕ log(r)
Byte-Pair Encoding: r
SLP, LZ77: r ∕ k
Path reconstruction: O(n)
Forward-Backward: same speedups
Viterbi training: same speedups
Baum-Welch training: speedup, many details
Parallelization
Thank you!