Reachability on Suffix Tree Graphs Yasuto Higa, Hideo Bannai, Shunsuke Inenaga, and Masayuki Takeda Department of Informatics, Kyushu University
Suffix Tree (P. Weiner, 1973) A suffix trie is a kind of index structure. The suffix trie of a text T is a trie representing the set of suffixes of T. A suffix tree is a compacted suffix trie in which the nodes of out-degree 1 are removed. We assume that the last character of T is "$", which occurs nowhere else in T. Space (n: length of T): suffix trie O(n^2), suffix tree O(n). First, I'll talk about the suffix tree. Because the nodes of out-degree 1 are removed, the space of a suffix tree is O(n).
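To make the space comparison concrete, here is a minimal Python sketch that counts the nodes of both structures by brute force. The function names are hypothetical, and the construction is a naive enumeration of substrings, not Weiner's algorithm; it only illustrates how much compaction removes.

```python
def suffix_trie_nodes(text):
    """Every distinct substring of text (including the empty one) is a trie node."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i, n + 1)}


def suffix_tree_nodes(text):
    """Keep only the root, the leaves (suffixes) and the branching nodes.

    Assumes text ends with a unique terminator such as "$".
    """
    following = {}  # substring -> set of characters that follow it in text
    for i in range(len(text)):
        for j in range(i, len(text)):
            following.setdefault(text[i:j], set()).add(text[j])
    branching = {s for s, chars in following.items() if len(chars) >= 2}
    leaves = {text[i:] for i in range(len(text))}
    return {""} | branching | leaves


T = "ababbabbba$"
trie_size = len(suffix_trie_nodes(T))
tree_size = len(suffix_tree_nodes(T))
print(trie_size, tree_size)
```

For the running example T = ababbabbba$, the suffix tree has 20 nodes (11 leaves and 9 branching nodes including the root), which matches the 20 post order numbers used in the later figures.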
Reachable on suffix tree T = a b a b b a b b b a $ [Figure: suffix tree of T with two nodes u and v marked.] A leaf node of the suffix tree represents a suffix of the text T. An internal node of the suffix tree represents a substring of T. Important property: reachability on the suffix tree corresponds to the prefix relation.
Suffix Links T = a b a b b a b b b a $ [Figure: suffix tree of T with suffix links drawn as dotted red lines; for example, the node babb links to the node abb.] Next I'll talk about suffix links. The dotted red lines represent suffix links. The suffix link of a node points to the node that represents the suffix obtained by removing the first character of its string.
Suffix Link Tree T = a b a b b a b b b a $ [Figure: the suffix links of the suffix tree of T.] The suffix links also form a tree. We call it the "Suffix Link Tree".
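Reachable on suffix link tree T = a b a b b a b b b a $ [Figure: suffix link tree of T with two nodes u and v marked.] This is the root node and these are leaf nodes. Now you can see that this is a tree. Important property: reachability on the suffix link tree corresponds to the suffix relation.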
Suffix Tree Graph = Suffix Tree + Suffix Link Tree [Figure: the suffix tree of T overlaid with its suffix link tree.]
Substring inclusion problem T = a b a b b a b b b a $ Input: two nodes u, v of a suffix tree. Output: whether or not the string of u is a substring of that of v. [Figure: suffix tree of T with nodes u and v marked; node strings shown include ab, bba and babb.] Next, I'll define the substring inclusion problem. Two nodes are given, and the answer is whether or not the string of u is a substring of that of v. In this case, the answer is "yes". In this case, the answer is "no". This is the substring inclusion problem.
Reachability on Suffix Tree Graph T = a b a b b a b b b a $ The substring inclusion problem is equivalent to the reachability problem on the Suffix Tree Graph. [Figure: Suffix Tree Graph of T with nodes u and v marked; node strings shown include ab, bba and babb.] The substring inclusion problem becomes a reachability problem on this graph. I'll show two examples. In this case, u is reachable from v: the string of u is a substring of that of v. In the other case, u is not reachable from v: the string of u is not a substring of that of v.
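The equivalence can be checked on the running example with a small Python sketch. It is an illustration under stated assumptions: the node set is computed by brute force, the helper names are hypothetical, and edges are directed from a node to its suffix tree parent and to its suffix link target, so that "u is reachable from v" has the same orientation as in the talk.

```python
from collections import deque


def suffix_tree_nodes(text):
    """Root, leaves and branching nodes, found by brute force (assumes a unique final "$")."""
    following = {}
    for i in range(len(text)):
        for j in range(i, len(text)):
            following.setdefault(text[i:j], set()).add(text[j])
    nodes = {s for s, chars in following.items() if len(chars) >= 2}
    return nodes | {""} | {text[i:] for i in range(len(text))}


def suffix_tree_graph(text):
    """Map each node to its successors: suffix tree parent and suffix link target."""
    nodes = suffix_tree_nodes(text)
    edges = {}
    for s in nodes:
        if s == "":
            edges[s] = set()
            continue
        parent = s[:-1]
        while parent not in nodes:  # longest proper prefix that is a node
            parent = parent[:-1]
        edges[s] = {parent, s[1:]}  # s[1:] drops the first character (suffix link)
    return edges


def reachable(edges, v, u):
    """Breadth-first search: is u reachable from v?"""
    seen, queue = {v}, deque([v])
    while queue:
        w = queue.popleft()
        if w == u:
            return True
        for x in edges[w] - seen:
            seen.add(x)
            queue.append(x)
    return False


T = "ababbabbba$"
G = suffix_tree_graph(T)
# Compare graph reachability with Python's substring test for every pair of nodes.
agree = all(reachable(G, v, u) == (u in v) for u in G for v in G)
print(agree)
```

Each query here costs time linear in the size of the graph, which corresponds to the "no preprocessing" option of the reachability approach.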
Naïve solutions to substring inclusion problem

  approach             method                              preprocessing   query processing
  String matching      no preprocessing                    -               O(n)
  String matching      precomputing all possible queries   O(n^3)          O(1)
  Graph reachability   no preprocessing                    -               O(n)
  Graph reachability   precomputing transitive closure     O(n^2)          O(1)

Goal: an efficient algorithm for the case where the text T is fixed and many substring inclusion queries are performed. We consider the case where the text T is fixed and a lot of substring inclusion queries are performed. We can solve this problem, but the question is how efficiently.
Interval labeling R. Agrawal et al. (1989) An algorithm for reachability on DAGs. Query processing time is proportional to the number of interval labels of the node u. The total number of interval labels is O(n^2) in the worst case for general DAGs. [Figure: a DAG whose nodes carry interval labels such as (1,6), (1,5), (15,16), (15,15) and (18,18), with nodes u and v marked.] To do so, we use Agrawal's interval labeling algorithm for reachability on DAGs. This algorithm assigns interval labels to each node of the DAG. One of these labels, drawn in red, is special: the special label represents the node itself. To check whether or not u is reachable from v, we have only to check whether or not the special label of v is subsumed by some interval label of u. The number of labels of each node depends on the structure of the suffix tree. Let's use this algorithm on the Suffix Tree Graph.
Agrawal labeling algorithm on Suffix Tree Graph
Input: Suffix Tree Graph
Output: labeled Suffix Tree Graph
  foreach node v in post order on the suffix link tree do
    v.special := [minimum post order number in the subtree of v, post order number of v];
    v.labels := { v.special };
  endfor
  foreach node v in post order on the suffix tree do
    v.labels := merge of v.labels and the labels of the children of v;
    remove every label in v.labels that is subsumed by another label in v.labels;
  endfor
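The two passes of this pseudocode can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the suffix tree is built by brute force, all function names are hypothetical, and the second pass visits nodes by decreasing string length, which is one valid way of handling children before parents on the suffix tree.

```python
def suffix_tree_nodes(text):
    """Root, leaves and branching nodes, found by brute force (assumes a unique final "$")."""
    following = {}
    for i in range(len(text)):
        for j in range(i, len(text)):
            following.setdefault(text[i:j], set()).add(text[j])
    nodes = {s for s, chars in following.items() if len(chars) >= 2}
    return nodes | {""} | {text[i:] for i in range(len(text))}


def drop_subsumed(intervals):
    """Remove every interval that is contained in another one."""
    kept = []
    for lo, hi in sorted(set(intervals), key=lambda iv: (iv[0], -iv[1])):
        if not kept or hi > kept[-1][1]:
            kept.append((lo, hi))
    return kept


def label_suffix_tree_graph(text):
    nodes = suffix_tree_nodes(text)
    link_children = {v: [] for v in nodes}  # children in the suffix link tree
    tree_children = {v: [] for v in nodes}  # children in the suffix tree
    for v in nodes:
        if v == "":
            continue
        link_children[v[1:]].append(v)
        parent = v[:-1]
        while parent not in nodes:
            parent = parent[:-1]
        tree_children[parent].append(v)

    # Pass 1: post order numbering on the suffix link tree gives the special labels.
    special, counter = {}, 0

    def visit(v):
        nonlocal counter
        lowest = counter + 1  # smallest number that will be used in v's subtree
        for child in sorted(link_children[v]):
            visit(child)
        counter += 1
        special[v] = (lowest, counter)

    visit("")

    # Pass 2: merge labels from suffix tree children into their parents.
    labels = {v: [special[v]] for v in nodes}
    for v in sorted(nodes, key=len, reverse=True):  # longer strings (children) first
        merged = labels[v] + [iv for c in tree_children[v] for iv in labels[c]]
        labels[v] = drop_subsumed(merged)
    return special, labels


def is_substring_node(special, labels, u, v):
    """True if the special label of v is subsumed by some label of u."""
    lo_v, hi_v = special[v]
    return any(lo <= lo_v and hi_v <= hi for lo, hi in labels[u])


T = "ababbabbba$"
special, labels = label_suffix_tree_graph(T)
consistent = all(is_substring_node(special, labels, u, v) == (u in v)
                 for u in labels for v in labels)
total_labels = sum(len(ls) for ls in labels.values())
print(consistent, total_labels)
```

The concrete post order numbers depend on the order in which children are visited, so they need not coincide with the numbers drawn in the figures; the answers to queries do not depend on that choice.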
Answering reachability query by interval labels [Figure: the Suffix Tree Graph of T = ababbabbba$ with post order numbers 1 to 20 and the interval labels of every node, from (1,1) at node 1 up to (1,20) at node 20.] Sorry, I don't have enough time to explain the algorithm, so I'll show only the result. Now we can answer the reachability query efficiently.
Time complexity Query time Each node has at most n labels, because the number of leaves of the suffix link tree is at most n. The labels of each node can be sorted during preprocessing without increasing the time complexity. Therefore, query time is O(log n) by using binary search.
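A minimal sketch of such a query in Python, using the standard `bisect` module. It assumes that the labels of a node are pairwise disjoint once subsumed labels have been removed and that they are stored sorted by left endpoint; the function name `covers` and the sample labels are hypothetical. For simplicity it tests whether a single post order number falls inside a label.

```python
from bisect import bisect_right


def covers(sorted_labels, number):
    """Binary search over disjoint (lo, hi) intervals sorted by lo."""
    # Index of the last interval whose left endpoint is <= number.
    i = bisect_right(sorted_labels, (number, float("inf"))) - 1
    return i >= 0 and sorted_labels[i][0] <= number <= sorted_labels[i][1]


example_labels = [(1, 6), (12, 13), (15, 16)]
print(covers(example_labels, 13), covers(example_labels, 14))
```

With at most n labels per node, the search inspects O(log n) intervals.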
Time complexity Preprocessing time Preprocessing time is proportional to the total number of interval labels (Lemma 2). Therefore, we have only to count the total number of interval labels.
The expected total number of interval labels Theorem 1: The expected total number of interval labels is O(n log n) for random strings. (Lemma) The total number of interval labels is at most O(n · (height of the tree)). The expected height of the suffix tree of a random string is O(log n) (A. Apostolico et al., 1992). The expected height of the suffix tree of a random string is O(log n), and the total number of interval labels is at most of order n times the height of the tree. Therefore, the expected total number of interval labels is O(n log n). This is Theorem 1.
Worst case
Lower bound: The following sequence of strings Xi gives a lower bound on the total number of labels.
  X1 = ab^1 ab^2 ab^1 ab^1 a$ (length 11)
  X2 = ab^2 ab^3 ab^2 ab^1 ab^2 ab^2 a$ (length 20)
  X3 = ab^3 ab^4 ab^3 ab^1 ab^3 ab^2 ab^3 ab^3 a$ (length 32)
  … (length …)
Next we consider the lower bound of the algorithm. This sequence of strings Xi gives a lower bound on the total number of labels.
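Structure of Xi
  X1 = ab^1 ab^2 ab^1 ab^1 a$
  X2 = ab^2 ab^3 ab^2 ab^1 ab^2 ab^2 a$
  X3 = ab^3 ab^4 ab^3 ab^1 ab^3 ab^2 ab^3 ab^3 a$
  X4 = ab^4 ab^5 ab^4 ab^1 ab^4 ab^3 ab^4 ab^2 ab^4 ab^4 a$
  X5 = ab^5 ab^6 ab^5 ab^1 ab^5 ab^4 ab^5 ab^2 ab^5 ab^3 ab^5 ab^5 a$
  X6 = ab^6 ab^7 ab^6 ab^1 ab^6 ab^5 ab^6 ab^2 ab^6 ab^4 ab^6 ab^3 ab^6 ab^6 a$
[Figure: Xi drawn as a row of boxes, each box standing for ab^i a, separated by runs of b whose lengths follow two sequences: 1, 2, …, k, …, i/2 and i-1, i-2, …, i-k, …, i/2, with k + (i-k) = i.] Let's look at Xi in a little more detail. Xi is made of two sequences of strings. One is … the other is … Because of this regular order, the suffix tree of Xi has a very beautiful structure.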
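The strings X1 to X6 listed above can be generated with a short Python sketch. The interleaving rule used here (1, i-1, 2, i-2, … followed by i) is an assumption read off from the six listed strings, and the function names are hypothetical.

```python
def block(k):
    """The block written ab^k on the slide: one a followed by k copies of b."""
    return "a" + "b" * k


def exponent_order(i):
    """Exponents of the blocks that alternate with ab^i: 1, i-1, 2, i-2, ..., then i."""
    lo, hi, order = 1, i - 1, []
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order + [i]


def x_string(i):
    parts = [block(i), block(i + 1)]
    for k in exponent_order(i):
        parts += [block(i), block(k)]
    return "".join(parts) + "a$"


lengths = [len(x_string(i)) for i in (1, 2, 3)]
print(x_string(1), lengths)
```

The lengths of the first three generated strings are 11, 20 and 32, in agreement with the lengths given for X1, X2 and X3.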
The number of interval labels of each node on the suffix tree of Xi [Figure: suffix tree of Xi, with edges labeled by runs b^i and b^k; legend: leaf node, internal node.] The orange node is the root and the green squares are leaves. The suffix links are…
The number of interval labels of each node on the suffix tree of Xi [Figure: Suffix Tree Graph of Xi.] The suffix links of leaves are omitted. Since this Suffix Tree Graph is well structured, it is easy for us to count the interval labels on each node.
The number of interval labels of each node on the suffix tree of Xi [Figure: the nodes of the suffix tree of Xi, each annotated with its number of interval labels (values such as 2, 3, 4, …, i+1), grouped into an increasing zone, a constant zone and a decreasing zone.] Let's leave only the nodes. This zone is the increasing zone: the numbers increase like this. The root and each leaf have one interval label.
#Xi: the total number of labels for Xi. #Xi is the sum of the contributions of the constant zone, the increasing zone, the decreasing zone, and the root & leaves. So the total number of labels for Xi is this sum, where i is of order √n. Therefore the total number of labels for Xi is of order n√n.
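The step "i is of order √n" can be checked by counting characters. The following derivation is a sketch that assumes the block structure read off from the listed strings X1 to X6: two leading blocks ab^i and ab^{i+1}, then i pairs ab^i ab^k with k running over 1, …, i in some order, then the two characters a$.

```latex
\begin{align*}
n = |X_i| &= (i+1) + (i+2) + \sum_{k=1}^{i} \bigl((i+1) + (k+1)\bigr) + 2 \\
          &= (2i+3) + i(i+2) + \frac{i(i+1)}{2} + 2 \\
          &= \frac{3i^2 + 9i + 10}{2}.
\end{align*}
```

For i = 1, 2, 3 this gives 11, 20 and 32, the lengths stated for X1, X2 and X3, and since n grows quadratically in i, i grows like √n.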
Theorem 2 The total number of interval labels is Ω(n√n) in the worst case. So we can conclude that the total number of interval labels is Ω(n√n) in the worst case. It is not difficult to prove that the upper bound is O(n^2).
Upper bound A trivial upper bound is O(n^2). …but is this bound tight?
Computational Experiments
n: the length of strings. Un: the maximum total number of labels for n.
We exhaustively enumerated all strings of length n consisting of a and b, and ending with $. For each n, the number of labels in the worst case was recorded. (Values for n = 2 to 6 are not listed.)

   n    7    8    9   10   11   12   13   14   15
  Un   18   22   26   30   34   39   44   49   54

   n   16   17   18   19   20   21   22   23   24
  Un   59   64   69   74   79   85   91   97  103

   n   25   26   27   28   29   30   31   32   33
  Un  109  115  121  127  133  139  145  151  158

F(n) = Un. F(n) = #Xi (for all i). Can you see a pattern? It's easy to see! F(n) is equal to Un for n less than 33. F(n) has an amazing property: F(n) is equal to the total number of labels of the strings Xi.
Summary of worst case bounds Un = F(n) = #Xi. Un: the exact upper bound on the total number of interval labels, for n less than 33. F(n): the inductively determined function for Un. #Xi: a lower bound on the total number of interval labels. Conjecture: the upper bound is O(n√n). Let's sum up the results. Un is the exact upper bound on the total number of interval labels for n less than 33. F(n) is the inductively determined function for Un; Un is equal to F(n) between n = 7 and n = 33. And amazingly, F(n) is equal to the total number of interval labels of Xi for all i. So we conjecture that the upper bound is also of order n√n. The complexity of the algorithm is …
Conclusion
We considered the substring inclusion problem and showed that it can be reduced to the reachability problem on Suffix Tree Graphs. We showed bounds for Agrawal's interval labeling algorithm when applied to Suffix Tree Graphs.

  method                            preprocessing                                  query processing
  no preprocessing                  -                                              O(n)
  precomputing transitive closure   O(n^2)                                         O(1)
  our results                       O(n log n) expected; order n√n for Xi          O(log n)
Future work Is the upper bound O(n√n)? Future work is to prove that the upper bound of this algorithm is of order n√n.
Optimal pattern from a pattern set If pattern A is a substring of pattern B, then set B is a subset of set A, where set A is the set of strings that include pattern A and set B is the set of strings that include pattern B. [Figure: a text set in which set B, the strings containing pattern B, lies inside set A, the strings containing pattern A.]
Suffix Link Tree T = a b a b b a b b b a $ (positions 1 to 11) [Figure: suffix link tree of T.]
Suffix Link Tree 1. Assign a number to each node in post order. [Figure: suffix link tree of T with post order numbers 1 to 20.]
Suffix Tree Graph 1. Assign a number to each node in post order. [Figure: Suffix Tree Graph of T with the post order numbers 1 to 20 taken from the suffix link tree.]
Suffix Tree with Suffix Links [Figure: the same graph with a second numbering of the nodes from 1 to 20.]
Agrawal [Figure: the Suffix Tree Graph of T with interval labels such as (1,11), (1,10) and (10,11) attached to the nodes.]
Motivation
FAQ What is the Suffix Tree Graph? I'm going to answer these questions.
Outline Background: distinguish the sets of strings; more skillful pattern. Main discussion: Substring and Suffix Tree; Problem establishment; Suffix Tree Graph (ST-Graph); How to use ST-Graph; The complexity of the algorithm.
Outline Technical terms; Pattern discovery problem (background); Motivation; Problem establishment; Suffix Tree Graph; Labeling algorithm; Complexity of the algorithm; Future work.
Technical term (1/2) Substring, Suffix Tree. For any string s ∈ Σ*, write s = uvw; then u is a prefix, v is a substring, and w is a suffix of s.
Technical term (2/2) Substring, Suffix Tree. Example: nonno$ [Figure: suffix tree of nonno$ with leaves numbered 1 to 6 and edge labels such as n, o, no$, nno$ and $.]
Pattern Discovery problem
Find a pattern string that occurs in all strings of A and in no strings of B.

  A                                   B
  AKEBONO MUSASHIMARU                 WAKANOHANA TAKANOHANA
  CONTRIBUTIONS OF AI                 CONTRIBUTIONS OF UN
  BEYOND MESSY LEARNING               TRADITIONAL APPROACHES
  BASED ON LOCAL SEARCH ALGORITHMS    GENETIC ALGORITHMS
  BOOLEAN CLASSIFICATION              PROBABILISTIC RULE
  SYMBOLIC TRANSFORMATION             NUMERIC TRANSFORMATION
  BACON SANDWICH                      PLAIN OMELETTE
  PUBLICATION OF DISSERTATION         TOY EXAMPLES
Pattern Discovery problem More skillful pattern
Motivation To build a pruning algorithm for pattern discovery algorithms.
Problem establishment Input : Output :
Theory of graphs Graph theory offers no algorithm designed specifically for reachability on the Suffix Tree Graph, so we have to build a new algorithm. Our strategy is to put labels on the nodes.
Labeling algorithm
Suffix Tree Graph Example: nonno$ [Figure: Suffix Tree Graph of nonno$ with interval labels such as (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (7,7), (7,8) and (9,9) on its nodes.] Naive algorithm. Preprocessing time is proportional to the number of labels. So time complexity is …
[Figure: suffix tree of Xi, with edges labeled by the runs b^i and b^k.]
Summary of the background ○ (positive examples) × (negative examples)
Problem establishment
Suffix Tree Graph (ST-Graph)
How to use ST-Graph
Labeling algorithm
The complexity of the algorithm
Suffix Tree with Suffix Links T = a b a b b a b b b a $ (positions 1 to 11) [Figure: suffix tree of T together with its suffix links.]
Suffix Tree T = a b a b b a b b b a $ (positions 1 to 11) Every node has a substring. For example… [Figure: suffix tree of T with the strings of some nodes shown, such as ba, abb and babb.]
Suffix Link Tree T = a b a b b a b b b a $ (positions 1 to 11) [Figure: suffix link tree of T with the strings of some nodes shown, such as b, abb and babb.]
Suffix Link Tree T = a b a b b a b b b a $ (positions 1 to 11) [Figure: suffix link tree of T.]
Suffix Link Tree 1. Assign a number to each node in post order. [Figure: suffix link tree of T with post order numbers 1 to 20.]
Suffix Link Tree [Figure: suffix link tree of T with post order numbers and one interval per node, from (1,1) at node 1 up to (1,20) at the root; later steps add the merged interval labels of the Suffix Tree Graph.]
Suffix Tree of Xi [Figure: suffix tree of Xi, with edges labeled by the runs b^i and b^k.]
Suffix Link Tree Assign an interval label to each node. One end is the number of the node; the other is the minimum number in its subtree. [Figure: suffix link tree of T = ababbabbba$ with post order numbers 1 to 20 and the interval of each node, for example (1,20) at the root, (1,11), (12,14) and (15,19) below it.]
Suffix Link Tree [Figure: the same numbering shown on the Suffix Tree Graph of T, with the merged interval labels on every node.]
General case T = a b a b b a b b b a $ (positions 1 to 11) [Figure: suffix tree of T together with its suffix links.] This graph is a DAG. The name is "Suffix Tree Graph".
General case T = a b a b b a b b b a $ [Figure: suffix tree of T.]