1
Suffix arrays
2
Suffix array
We lose some of the functionality of a suffix tree but we save space.
Let s = abab
Sort the suffixes lexicographically: ab, abab, b, bab
The suffix array gives the indices of the suffixes in sorted order: 2 0 3 1
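The definition can be checked with a few lines of Python. This is a minimal sketch, not an efficient construction: the function name `suffix_array` is chosen here for illustration, and sorting the suffixes directly costs O(n^2 log n) time in the worst case.

```python
def suffix_array(s):
    """Return the start indices of the suffixes of s in lexicographic order."""
    # Naive construction: compare whole suffixes. Fine for small examples only.
    return sorted(range(len(s)), key=lambda i: s[i:])


print(suffix_array("abab"))  # [2, 0, 3, 1]
```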
3
How do we build it?
Build a suffix tree.
Traverse the tree in DFS, picking the edges outgoing from each node in lexicographic order, and fill the suffix array.
O(n) time
4
How do we search for a pattern?
If P occurs in T then all its occurrences are consecutive in the suffix array.
Do a binary search on the suffix array.
Takes O(m log n) time, where m = |P| and n = |T|.
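A sketch of this binary search, assuming the suffix array is already available (here it is built naively). The name `find_occurrences` is hypothetical. Each of the O(log n) steps compares at most m characters, which gives the O(m log n) bound.

```python
def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text, using suffix array sa."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All occurrences are the consecutive entries sa[start:lo].
    return sorted(sa[start:lo])


text = "mississippi"
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(find_occurrences(text, sa, "issi"))  # [1, 4]
print(find_occurrences(text, sa, "issa"))  # []
```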
5
Example
Let S = mississippi
Let P = issa
L, R and M mark the left end, right end and midpoint of the binary search interval.

index  suffix
10     i
7      ippi
4      issippi
1      ississippi
0      mississippi
9      pi
8      ppi
6      sippi
3      sissippi
5      ssippi
2      ssissippi
6
How do we accelerate the search?
Maintain ℓ = LCP(P, L)
Maintain r = LCP(P, R)
Assume ℓ ≥ r
(Figure: suffixes L, M, R with the matched prefix lengths ℓ and r marked.)
7
If ℓ = r then start comparing M to P at position ℓ + 1
8
Otherwise ℓ > r
9
ℓ > r: someone whispers LCP(L, M)
Case LCP(L, M) > ℓ
10
LCP(L, M) > ℓ: continue in the right half
(M agrees with L on more than ℓ characters, so P > M; ℓ is unchanged.)
11
Case LCP(L, M) < ℓ
12
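LCP(L, M) < ℓ: continue in the left half
(M is larger than L at position LCP(L, M) + 1, where P still agrees with L, so P < M; the new r is LCP(L, M).)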
13
LCP(L, M) = ℓ: start comparing M to P at position ℓ + 1
14
Analysis
If we do more than a single comparison in an iteration then max(ℓ, r) grows by 1 for each comparison.
O(m + log n) time
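The sketch below shows only the bookkeeping of ℓ and r, in a simpler variant than the one on the slides: it skips the first min(ℓ, r) characters at each step and does not use the whispered LCP(L, M) values, so its worst case remains O(m log n). The function name and the use of virtual sentinels at ranks -1 and n are assumptions made for this illustration.

```python
def lower_bound_with_lcp(text, sa, pattern):
    """Return (rank of first suffix >= pattern, whether pattern occurs in text)."""
    n, m = len(text), len(pattern)
    lo, hi = -1, n   # virtual sentinels: suffix[lo] < pattern <= suffix[hi]
    l = r = 0        # l = LCP(pattern, suffix lo), r = LCP(pattern, suffix hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Every suffix between lo and hi shares min(l, r) characters with pattern.
        k = min(l, r)
        start = sa[mid]
        while k < m and start + k < n and text[start + k] == pattern[k]:
            k += 1
        if k < m and (start + k == n or text[start + k] < pattern[k]):
            lo, l = mid, k   # suffix mid is smaller than pattern: go right
        else:
            hi, r = mid, k   # pattern is a prefix of, or smaller than, suffix mid
    return hi, (hi < n and r == m)


text = "mississippi"
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(lower_bound_with_lcp(text, sa, "issa"))  # (2, False)
```

Reaching the O(m + log n) bound stated above additionally needs LCP(L, M) for each interval, which lets the search skip the comparison entirely in two of the three cases.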
15
Construct the suffix array without the suffix tree
16
Linear time construction
Recursively?
Say we want to sort only the suffixes that start at even positions?
17
Change the alphabet
Every pair of characters is now a character.
You in fact sort the suffixes of a string shorter by a factor of 2!
18
Change the alphabet

pair  new character
a$    0
aa    1
ab    2
b$    3
ba    4
bb    5

Example: a b a a a b $ becomes 2 1 2 (pairs ab, aa, ab)
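The renaming can be sketched as follows. The function name `rename_pairs` is hypothetical, the input string abaaab is read off the slide's figure and is therefore an assumption, and `$` is assumed to sort before every letter.

```python
def rename_pairs(s, alphabet):
    """Replace every pair of characters of s by its rank among all possible pairs."""
    # All pairs: first character from the alphabet, second from '$' + alphabet.
    pairs = sorted(a + b for a in alphabet for b in "$" + alphabet)
    name = {p: i for i, p in enumerate(pairs)}
    if len(s) % 2:
        s += "$"  # pad so that the last character still forms a pair
    return [name[s[i:i + 2]] for i in range(0, len(s), 2)]


print(rename_pairs("abaaab", "ab"))  # [2, 1, 2]
```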
19
But we do not gain anything…
20
Divide into triples
String: y a b b a d a b b a d o $
Triples starting at position 1: abb ada bba do$
21
Divide into triples
String: y a b b a d a b b a d o $
Triples starting at position 1: abb ada bba do$
Triples starting at position 2: bba dab bad o$$
22
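Sort recursively 2/3 of the suffixes
String: y a b b a d a b b a d o $ (positions 0 to 12)
Triples at positions 1 (mod 3): abb ada bba do$
Triples at positions 2 (mod 3): bba dab bad o$$
Triples renamed by rank: 1 2 4 6 4 5 3 7 (indices 0 to 7)
Suffix array of the renamed string: 0 1 6 4 2 5 3 7
The same order as positions in the original string: 1 4 8 2 7 5 10 11
Ranks of the suffixes at positions 1, 2, 4, 5, 7, 8, 10, 11: 1 4 2 6 5 3 7 8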
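A sketch of the renaming step on the slide's string. The name `name_triples` is hypothetical, and the call to `sorted` on the renamed string only stands in for the recursive call of the linear-time algorithm.

```python
def name_triples(s):
    """Name the triples at positions 1 and 2 (mod 3) by their sorted rank (from 1)."""
    padded = s + "$$"  # so that every triple has three characters
    positions = list(range(1, len(s), 3)) + list(range(2, len(s), 3))
    triples = [padded[i:i + 3] for i in positions]
    rank = {t: k + 1 for k, t in enumerate(sorted(set(triples)))}
    return positions, [rank[t] for t in triples]


positions, names = name_triples("yabbadabbado$")
print(names)  # [1, 2, 4, 6, 4, 5, 3, 7]

# Stand-in for the recursive call: sort the suffixes of the renamed string.
order = sorted(range(len(names)), key=lambda i: names[i:])
print([positions[i] for i in order])  # [1, 4, 8, 2, 7, 5, 10, 11]
```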
23
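Sort the remaining third
Ranks of the suffixes at positions 1, 2, 4, 5, 7, 8, 10, 11: 1 4 2 6 5 3 7 8
Each suffix at a position 0 (mod 3) is represented by (first character, rank of the next suffix):
position 0: (y, 1)
position 3: (b, 2)
position 6: (a, 5)
position 9: (a, 7)
Sorted pairs: (a, 5) (a, 7) (b, 2) (y, 1), that is positions 6 9 3 0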
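This step can be sketched as a sort on (character, rank) pairs. The name `sort_mod0` is hypothetical, the `rank` dictionary holds the ranks from the recursive step, and Python's comparison sort stands in for the radix sort that a linear-time implementation would use.

```python
def sort_mod0(s, rank):
    """Sort the suffixes at positions 0 (mod 3) by (first char, rank of next suffix)."""
    # A position past the end of the string gets rank 0, lower than any real rank.
    return sorted(range(0, len(s), 3), key=lambda i: (s[i], rank.get(i + 1, 0)))


# Ranks of the suffixes at positions 1, 2 (mod 3) of yabbadabbado.
rank = {1: 1, 2: 4, 4: 2, 5: 6, 7: 5, 8: 3, 10: 7, 11: 8}
print(sort_mod0("yabbadabbado", rank))  # [6, 9, 3, 0]
```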
24
Merge
String: y a b b a d a b b a d o $ (positions 0 to 12)
Sorted suffixes at positions 1, 2 (mod 3): 1 4 8 2 7 5 10 11
Sorted suffixes at positions 0 (mod 3): 6 9 3 0
Repeatedly move the smaller of the two front suffixes to the output:

step  output so far
1     1
2     1 6
3     1 6 4
4     1 6 4 9
5     1 6 4 9 3
6     1 6 4 9 3 8
7     1 6 4 9 3 8 2
8     1 6 4 9 3 8 2 7
9     1 6 4 9 3 8 2 7 5
10    1 6 4 9 3 8 2 7 5 10
11    1 6 4 9 3 8 2 7 5 10 11
12    1 6 4 9 3 8 2 7 5 10 11 0
35
Summary
Suffix array of yabbadabbado$: 1 6 4 9 3 8 2 7 5 10 11 0
When comparing a suffix with index 0 (mod 3) to a suffix with index 1 (mod 3), we compare the char and break ties by the ranks of the following suffixes.
When comparing to a suffix with index 2 (mod 3), we compare the char, the next char if there is a tie, and finally the ranks of the suffixes that follow those two chars.
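The two comparison rules can be sketched as a merge routine. The names `merge` and `b_first` are hypothetical, and the `rank` dictionary is assumed to hold the ranks of the suffixes at positions 1 and 2 (mod 3) from the recursive step.

```python
def merge(s, rank, a, b):
    """Merge sorted suffix lists a (positions 1, 2 mod 3) and b (positions 0 mod 3)."""
    def r(i):
        return rank.get(i, 0)  # positions past the end rank lowest

    def ch(i):
        return s[i] if i < len(s) else "$"

    def b_first(j, i):
        """Does suffix j (0 mod 3) precede suffix i (1 or 2 mod 3)?"""
        if i % 3 == 1:
            # One char, then ranks: j + 1 and i + 1 are both ranked positions.
            return (ch(j), r(j + 1)) < (ch(i), r(i + 1))
        # Two chars, then ranks: j + 2 and i + 2 are both ranked positions.
        return (ch(j), ch(j + 1), r(j + 2)) < (ch(i), ch(i + 1), r(i + 2))

    out, x, y = [], 0, 0
    while x < len(a) and y < len(b):
        if b_first(b[y], a[x]):
            out.append(b[y])
            y += 1
        else:
            out.append(a[x])
            x += 1
    return out + a[x:] + b[y:]


rank = {1: 1, 2: 4, 4: 2, 5: 6, 7: 5, 8: 3, 10: 7, 11: 8}
print(merge("yabbadabbado", rank, [1, 4, 8, 2, 7, 5, 10, 11], [6, 9, 3, 0]))
# [1, 6, 4, 9, 3, 8, 2, 7, 5, 10, 11, 0]
```

Each comparison looks at a constant number of characters and ranks, so the merge takes time linear in the number of suffixes.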
36
Compute LCP's
String: y a b b a d a b b a d o $ (positions 0 to 12)
Rank of the suffix at each position 0 to 11: 12 1 7 5 3 9 2 8 6 4 10 11

rank  position  suffix
1     1         abbadabbado$
2     6         abbado$
3     4         adabbado$
4     9         ado$
5     3         badabbado$
6     8         bado$
7     2         bbadabbado$
8     7         bbado$
9     5         dabbado$
10    10        do$
11    11        o$
12    0         yabbadabbado$
37
Crucial observation
(Same sorted-suffix table as on the previous slide.)
For suffixes of ranks i < j in the suffix array:
LCP(i, j) = min {LCP(i, i+1), LCP(i+1, i+2), ..., LCP(j-1, j)}
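The observation translates into a range minimum over the LCP values of consecutive suffixes. In this sketch the name `lcp_of_ranks` is hypothetical, ranks are 0-based, and `lcp[k]` is assumed to hold the LCP of the suffixes of ranks k-1 and k (with `lcp[0] = 0`).

```python
def lcp_of_ranks(lcp, i, j):
    """LCP of the suffixes of ranks i < j, given LCPs of consecutive ranks."""
    return min(lcp[i + 1:j + 1])


# LCPs of consecutive sorted suffixes of yabbadabbado (ranks 0 to 11).
lcp = [0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0]
# Ranks 0 and 2 are abbadabbado and adabbado, which share only "a".
print(lcp_of_ranks(lcp, 0, 2))  # 1
```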
38
Find LCP's of consecutive suffixes
Go over the suffixes in order of their starting position. For each one, compute its LCP with the suffix just before it in sorted order. Here LCP(a, b) takes starting positions a and b. The suffix at position 1 is first in sorted order, so it has no predecessor.
LCP(11, 0) = 0
LCP(8, 2) = 1
LCP(9, 3) = 0
LCP(6, 4) = 1
LCP(7, 5) = 0
LCP(1, 6) = 5
LCP(2, 7) = 4
LCP(3, 8) = 3
LCP(4, 9) = 2
LCP(5, 10) = 1
LCP(10, 11) = 0
LCP's of consecutive suffixes in sorted order (1 6 4 9 3 8 2 7 5 10 11 0): 5 1 2 0 3 1 4 0 1 0 0
49
Analysis
The position at which we start comparing decreases by at most 1 in every iteration, and it never exceeds n. So in total it cannot increase more than O(n) times, and all the LCP's of consecutive suffixes are found in O(n) time.
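The procedure analysed here can be sketched as below. This is the standard linear-time LCP computation, usually attributed to Kasai et al.; the slides do not name it, so that attribution and the function name `lcp_array` are assumptions. The variable `h` is the starting position of the comparison.

```python
def lcp_array(text, sa):
    """lcp[k] = LCP of the suffixes of ranks k-1 and k (lcp[0] = 0)."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):  # suffixes in order of starting position
        if rank[i] == 0:
            h = 0  # first suffix in sorted order has no predecessor
            continue
        j = sa[rank[i] - 1]  # predecessor of suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix keeps all but the first matched character
    return lcp


text = "yabbadabbado"
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(lcp_array(text, sa))  # [0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0]
```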
50
We need more LCPs for search
The search needs the LCP of the two endpoints of every interval the binary search can visit.
Linearly many; calculate them all bottom up.
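A sketch of the bottom-up calculation, assuming the binary search always splits the interval (L, R) at M = (L + R) // 2. The name `interval_lcps` is hypothetical, ranks are 0-based, and `lcp[k]` is assumed to hold the LCP of the suffixes of ranks k-1 and k.

```python
def interval_lcps(lcp):
    """Map every interval (L, R) the binary search can visit to LCP(L, R)."""
    table = {}

    def solve(lo, hi):
        if hi - lo == 1:
            value = lcp[hi]  # consecutive ranks: read off the LCP array
        else:
            mid = (lo + hi) // 2
            # LCP(L, R) is the minimum over the two halves.
            value = min(solve(lo, mid), solve(mid, hi))
        table[(lo, hi)] = value
        return value

    if len(lcp) > 1:
        solve(0, len(lcp) - 1)
    return table


lcp = [0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0]
table = interval_lcps(lcp)
print(len(table))  # 21
```

The intervals form a binary tree with n - 1 leaves, so there are 2n - 3 of them (21 for n = 12), which is the "linearly many" above.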