1
Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics
2
Suffix Trees [1,2]
String T = ababac$ (positions 1234567)
[Figure: suffix tree of T, annotated with edge labels, depths of nodes, leaf indexes, pointers to children, and suffix links]
3
Operations on Suffix Trees
root(): returns the root node
isleaf(v): returns Yes if v is a leaf
child(v,c): returns the child w of v whose edge label from v to w begins with the letter c
firstchild(v): returns the first child of v
sibling(v): returns the immediate sibling of v
parent(v): returns the parent of v
4
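edge(v,d): returns the d-th letter of the label of the edge to v
depth(v): returns the string depth of v
lca(v,w): returns the lowest common ancestor of v and w
sl(v): returns the node pointed to by the suffix link of v
[Figure: suffix tree of ababac$ with edges labeled $, a, b, c and leaves 7 1 3 5 2 4 6]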
5
Components of Suffix Trees [3]
String: n lg |A| bits
Tree structure: O(n lg n) bits
String depths of nodes: n lg n bits
Edge labels: n lg n bits
Suffix links: n lg n bits
6
Representation of Tree Structure
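Represent the tree by a BP (balanced parentheses) sequence P
Internal nodes: (...), at most n-1 of them
Leaves: (), n of them
At most 4n+o(n) bits
Nodes are represented by positions of (
Example: P = (()((()())())(()())()) for the suffix tree of ababac$, whose leaves from left to right are 7 1 3 5 2 4 6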
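The navigation operations from slide 3 can be tried directly on the BP sequence of the example. The following is a minimal sketch, not the constant-time structure itself: findclose and enclose are implemented as naive linear scans, positions are 1-based as on the slides, and the helper children is a hypothetical convenience added here for illustration.

```python
# Naive sketch: linear scans stand in for the o(n)-bit constant-time indexes.
P = "(()((()())())(()())())"  # BP sequence from the slide; positions are 1-based

def findclose(P, v):
    """Position of the ')' matching the '(' at position v."""
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    """Position of the '(' of the closest enclosing pair (the parent), or None."""
    pending = 0  # ')' seen so far whose matching '(' has not been passed yet
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            pending += 1
        elif pending == 0:
            return k
        else:
            pending -= 1
    return None

def isleaf(P, v):
    return P[v] == ")"  # 0-based index v is 1-based position v+1

def firstchild(P, v):
    return None if isleaf(P, v) else v + 1

def sibling(P, v):
    nxt = findclose(P, v) + 1
    return nxt if nxt <= len(P) and P[nxt - 1] == "(" else None

def parent(P, v):
    return enclose(P, v)

def children(P, v):
    """Hypothetical helper: enumerate children with firstchild and sibling."""
    out, w = [], firstchild(P, v)
    while w is not None:
        out.append(w)
        w = sibling(P, w)
    return out

leaves = P.count("()")
internal = P.count("(") - leaves
print(leaves, internal, len(P))  # 7 4 22
print(children(P, 1))            # [2, 4, 14, 20]
```

With n = 7 leaves the example has 4 internal nodes and uses 22 parentheses, within the 4n bound.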
7
Representation of Nodes
v: position of ( in the BP sequence
j: preorder of node
j = rank_{(}(P,v)
v = select_{(}(P,j)
i: inorder of node
Example: P = (()((()())())(()())()) has 11 nodes, with preorders 1 to 11 in the order of their ( positions
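The conversion between a node position v and its preorder j is just rank and select on open parentheses. A small sketch with naive scans (the names rank_open and select_open are chosen here for illustration; a succinct implementation would answer both in constant time):

```python
P = "(()((()())())(()())())"  # BP sequence from the slide; positions are 1-based

def rank_open(P, v):
    """Number of '(' in P[1..v]."""
    return P[:v].count("(")

def select_open(P, j):
    """Position of the j-th '(' in P."""
    seen = 0
    for k, ch in enumerate(P, start=1):
        if ch == "(":
            seen += 1
            if seen == j:
                return k
    raise ValueError("fewer than j open parentheses")

nodes = [k for k, ch in enumerate(P, start=1) if ch == "("]
print(nodes)               # [1, 2, 4, 5, 6, 8, 11, 14, 15, 17, 20]
print(rank_open(P, 14))    # 8
print(select_open(P, 8))   # 14
```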
8
Inorder of Nodes
Defined only for internal nodes
Number of internal nodes visited from below during the DFS traversal from the root to v
An internal node may have more than one inorder (a node with degree k has exactly k-1 inorders)
[Figure: example tree in which the root has inorders 1, 4, 6 and the other internal nodes have inorders 3, 5, 2]
9
Computation of inorder
A node v and its smallest inorder i can be converted to each other in constant time
i = rank_{()}(P, findclose(P,v+1))
v = enclose(P, select_{)(}(P,i)+1)
Example: P = (()((()())())(()())())
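Both conversion formulas can be checked on the example sequence. This is a sketch under the assumption that rank_{()}(P,x) counts occurrences of "()" inside P[1..x] and that select_{)(}(P,i) returns the position of the ")" of the i-th ")("; the function names and the naive linear scans are illustrative only.

```python
P = "(()((()())())(()())())"  # BP sequence from the slide; positions are 1-based

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    pending = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            pending += 1
        elif pending == 0:
            return k
        else:
            pending -= 1
    return None

def rank_leaf(P, x):
    """rank_{()}: number of '()' pairs contained in P[1..x]."""
    return P[:x].count("()")

def select_close_open(P, i):
    """select_{)(}: position of the ')' of the i-th occurrence of ')('."""
    seen = 0
    for k in range(1, len(P)):
        if P[k - 1] == ")" and P[k] == "(":
            seen += 1
            if seen == i:
                return k
    raise ValueError("fewer than i occurrences of ')('")

def smallest_inorder(P, v):
    return rank_leaf(P, findclose(P, v + 1))

def node_of_inorder(P, i):
    return enclose(P, select_close_open(P, i) + 1)

print([smallest_inorder(P, v) for v in (1, 4, 5, 14)])  # [1, 3, 2, 5]
print([node_of_inorder(P, i) for i in range(1, 7)])     # [1, 5, 4, 1, 14, 1]
```

The root (degree 4) is returned for inorders 1, 4 and 6, matching the statement that a node of degree k has k-1 inorders.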
10
Proof: i = rank_{()}(P, findclose(P,v+1))
v+1 is the first child w of v.
u = findclose(P,v+1) is the last position of the subtree rooted at w.
An inorder is defined once on the path from a leaf to the next leaf.
There is a one-to-one correspondence between leaves and inorders.
The value of the inorder is the number of leaves on the tour from the root to v.
Thus, i = rank_{()}(P,u)
11
Proof: v = enclose(P, select_{)(}(P,i)+1)
i is the number of times during the DFS traversal that a node w is visited from below and a child of w is visited next.
This action is represented by “)(” in P.
x = select_{)(}(P,i)+1 represents a child of v.
Its parent is the answer.
12
String Depths of Nodes
[Figure: suffix tree of ababac$ with leaves 7 1 3 5 2 4 6 and internal nodes at string depths 1, 2, 3]
String depths are represented by the lengths of the common prefixes of adjacent leaves.
The Hgt array stores these lengths.
13
Hgt Array
Hgt[i] = lcp(SA[i], SA[i+1])
Size: n log n bits
Hgt  SA  suffix
0    7   $
3    1   ababac$
1    3   abac$
0    5   ac$
2    2   babac$
0    4   bac$
0    6   c$
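The table can be reproduced with a few lines of code. This sketch builds SA by plain sorting and Hgt by direct character comparison, which is far from the efficient constructions but enough to check the values; setting the last entry to 0 follows the table and is otherwise an assumption, since SA[n+1] does not exist.

```python
T = "ababac$"  # '$' sorts before the letters in ASCII

def suffix_array(T):
    """1-based start positions of the suffixes in lexicographic order (naive sort)."""
    return sorted(range(1, len(T) + 1), key=lambda p: T[p - 1:])

def lcp(T, p, q):
    """Length of the longest common prefix of the suffixes starting at p and q."""
    n = 0
    while p - 1 + n < len(T) and q - 1 + n < len(T) and T[p - 1 + n] == T[q - 1 + n]:
        n += 1
    return n

def hgt_array(T, SA):
    """Hgt[i] = lcp(SA[i], SA[i+1]); the last entry is set to 0 as in the table."""
    return [lcp(T, SA[i], SA[i + 1]) for i in range(len(SA) - 1)] + [0]

SA = suffix_array(T)
Hgt = hgt_array(T, SA)
print(SA)   # [7, 1, 3, 5, 2, 4, 6]
print(Hgt)  # [0, 3, 1, 0, 2, 0, 0]
```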
14
Hgt[i] is equal to the string depth of the node with inorder i
One-to-one correspondence between internal nodes and leaves.
The inorder i of v can be computed in constant time.
i = rank_{()}(P, findclose(P,v+1))
depth(v) = Hgt[i]
Example: P = (()((()())())(()())()), leaves 7 1 3 5 2 4 6
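As a quick check of depth(v) = Hgt[i], the sketch below looks up the string depth of every internal node of the example. The Hgt values are copied from the Hgt Array slide, a plain Python list stands in for the compressed representation, and rank_{()} and findclose are naive scans with illustrative names.

```python
P = "(()((()())())(()())())"   # BP sequence of the suffix tree of ababac$
Hgt = [0, 3, 1, 0, 2, 0, 0]    # Hgt[1..7] from the slide, stored 0-based

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def rank_leaf(P, x):
    """rank_{()}: number of '()' pairs contained in P[1..x]."""
    return P[:x].count("()")

def depth(P, Hgt, v):
    """String depth of an internal node v (position of its '(' in P)."""
    i = rank_leaf(P, findclose(P, v + 1))  # smallest inorder of v
    return Hgt[i - 1]

internal_nodes = [1, 4, 5, 14]  # root, "a", "aba", "ba"
print([depth(P, Hgt, v) for v in internal_nodes])  # [0, 1, 3, 2]
```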
15
Computation of Edge Labels
Let i be the inorder of node v
The i-th leaf is a descendant of v
The i-th leaf represents the suffix starting at SA[i]
The label of the edge into v is a substring of that suffix
Edge length = d2 - d1, where d1 = depth(parent(v)) and d2 = depth(v)
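One way to read this slide is edge(v,d) = T[SA[i] + d1 + d - 1] with d1 = depth(parent(v)). The sketch below follows that reading. It is an illustration under assumptions: the treatment of leaves (using the leaf's own lexicographic rank as i and n - SA[i] + 1 as its depth) and the helpers leaf_index and edge_label are additions made here, not taken from the slides, and all primitives are naive scans.

```python
T = "ababac$"
SA = [7, 1, 3, 5, 2, 4, 6]
Hgt = [0, 3, 1, 0, 2, 0, 0]
P = "(()((()())())(()())())"

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    pending = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            pending += 1
        elif pending == 0:
            return k
        else:
            pending -= 1
    return None

def rank_leaf(P, x):
    return P[:x].count("()")

def isleaf(P, v):
    return P[v] == ")"

def leaf_index(P, v):
    """Rank i of a suffix below v: smallest inorder of v, or a leaf's own rank (assumption)."""
    return rank_leaf(P, v + 1) if isleaf(P, v) else rank_leaf(P, findclose(P, v + 1))

def depth(P, v):
    i = leaf_index(P, v)
    return len(T) - SA[i - 1] + 1 if isleaf(P, v) else Hgt[i - 1]

def edge(P, v, d):
    """d-th letter of the label of the edge from parent(v) to v (v must not be the root)."""
    d1 = depth(P, enclose(P, v))
    return T[SA[leaf_index(P, v) - 1] + d1 + d - 2]  # -2: 1-based text and 1-based d

def edge_label(P, v):
    length = depth(P, v) - depth(P, enclose(P, v))  # d2 - d1
    return "".join(edge(P, v, d) for d in range(1, length + 1))

print([edge_label(P, v) for v in (4, 5, 14)])      # ['a', 'ba', 'ba']
print([edge_label(P, v) for v in (2, 6, 11, 20)])  # ['$', 'bac$', 'c$', 'c$']
```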
16
Computation of Hgt Array
Given i and SA[i], Hgt[i] is computed in constant time using an index of 2n+o(n) bits
17
Permuting Hgt Array
Hgt[i] = lcp(SA[i], SA[i+1])
Values of SA+Hgt become increasing if they are sorted with respect to values of SA
n increasing numbers in [1,n] are represented in 2n bits
18
Lemma: Let SA[i]=p, SA[j]=p+1. Then Hgt[j] ≥ Hgt[i] - 1
Example: suffix p = ababac$ and its successor in SA, q = abac$, share a prefix of length d; dropping the first letter gives p+1 = babac$ and q+1 = bac$, which share a prefix of length d-1.
Hence Hgt[SA^{-1}[p+1]] ≥ Hgt[SA^{-1}[p]] - 1
19
Hgt[SA^{-1}[k]] + k (k = 1,2,...,n) are monotone increasing
and in the range [1, n]
20
Computation of Hgt[i]
Compute k = SA[i]
  constant time using the suffix array
  O(log^ε n) time using the compressed suffix array (0<ε<2)
Decode the k-th element v in the monotone sequence
  constant time by select
Hgt[i] = v - k
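The 2n-bit representation and the decoding step can be sketched as follows. The encoding chosen here (each element written as its gap from the previous one in unary zeros, followed by a one) is one common way to store n non-decreasing numbers from [1,n] in at most 2n bits; it is an assumption that this matches the layout intended on the slides. select1 is a naive scan standing in for a constant-time select index, and SA and Hgt are the example values.

```python
SA = [7, 1, 3, 5, 2, 4, 6]
Hgt = [0, 3, 1, 0, 2, 0, 0]
n = len(SA)

# inv[k] = SA^{-1}[k]: lexicographic rank of the suffix starting at text position k
inv = [0] * (n + 1)
for i, p in enumerate(SA, start=1):
    inv[p] = i

# The permuted sequence Hgt[SA^{-1}[k]] + k for k = 1..n
seq = [Hgt[inv[k] - 1] + k for k in range(1, n + 1)]

def encode(seq):
    """Unary gap encoding: (v_k - v_{k-1}) zeros followed by a one, with v_0 = 0."""
    bits, prev = [], 0
    for v in seq:
        bits.append("0" * (v - prev) + "1")
        prev = v
    return "".join(bits)

def select1(B, k):
    """Position (1-based) of the k-th '1' in B (naive scan)."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        if bit == "1":
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k ones")

def hgt(B, SA, i):
    k = SA[i - 1]           # text position of the i-th suffix
    v = select1(B, k) - k   # zeros before the k-th one = k-th element of seq
    return v - k

B = encode(seq)
print(seq)                                      # [4, 4, 4, 4, 5, 6, 7]
print(B, len(B))                                # 00001111010101 14
print([hgt(B, SA, i) for i in range(1, n + 1)]) # [0, 3, 1, 0, 2, 0, 0]
```

In this example the sequence repeats values, so it is non-decreasing rather than strictly increasing, and the bit vector has exactly 2n = 14 bits.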
21
Computation of lca
lca = lowest common ancestor
u = lca(v,w)
Constant time
22
Let E[i] = rank_{(}(P,i) - rank_{)}(P,i). Then u = parent(RMQ_E(v,w)+1)
m = RMQ_E(v,w): the index of the minimum value in E[v..w]
[Figure: P = (()((()())())(()())()) with its excess array E, marking v, m, w and the answer u]
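The formula u = parent(RMQ_E(v,w)+1) can be tried on the example. In this sketch the range minimum is found by a linear scan over E (taking the leftmost minimum) instead of a constant-time RMQ index, and the case v = w is handled separately, which is an assumption added here because the formula is stated for two distinct nodes.

```python
P = "(()((()())())(()())())"  # positions are 1-based

def enclose(P, v):
    pending = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            pending += 1
        elif pending == 0:
            return k
        else:
            pending -= 1
    return None

def excess(P):
    """E[i] = (number of '(') - (number of ')') in P[1..i]; E[0] is padding."""
    E, e = [0], 0
    for ch in P:
        e += 1 if ch == "(" else -1
        E.append(e)
    return E

def lca(P, v, w):
    if v > w:
        v, w = w, v
    if v == w:
        return v  # assumption: a node is its own lca
    E = excess(P)
    m = min(range(v, w + 1), key=lambda k: E[k])  # leftmost minimum of E[v..w]
    return enclose(P, m + 1)                      # parent of the node opening at m+1

print(lca(P, 6, 11))   # 4  (leaves ababac$ and ac$ meet at node "a")
print(lca(P, 15, 17))  # 14 (leaves babac$ and bac$ meet at node "ba")
print(lca(P, 2, 20))   # 1  (root)
```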
23
Representing Suffix links
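sl(node(cα)) = node(α), where c is a letter and α is a string
Use the Ψ function of the compressed suffix array
[Figure: node v with descendant leaves x, y, and node w = sl(v) with descendant leaves x’, y’]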
24
Proof: Leaves are represented by () and appear in P in the lexicographic order of the suffixes.
Therefore x = rank_{()}(P,v-1)+1 is the smallest suffix in lexicographic order among the descendant leaves of v
y = rank_{()}(P,findclose(P,v)) is the largest suffix in lexicographic order among the descendant leaves of v
x, y represent T[SA[x]..n], T[SA[y]..n].
x’, y’ represent T[SA[x]+1..n], T[SA[y]+1..n].
25
x is the leftmost leaf, y is the rightmost leaf
Let l = lcp(x,y). Then l is identical to the string depth of v
It holds that lcp(x’,y’) = l-1
lca(x’,y’) represents a string one letter shorter than v. That is, sl(v).
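Putting the last three slides together, the suffix link of an internal node v other than the root can be sketched as sl(v) = lca(leaf(Ψ[x]), leaf(Ψ[y])), with Ψ[i] = SA^{-1}[SA[i]+1]. The code below is an illustration with naive primitives: Ψ is tabulated from an explicit suffix array rather than taken from a compressed suffix array, and select_leaf (position of the i-th leaf) is a helper name chosen here.

```python
T = "ababac$"
P = "(()((()())())(()())())"
n = len(T)
SA = sorted(range(1, n + 1), key=lambda p: T[p - 1:])  # [7, 1, 3, 5, 2, 4, 6]
inv = {p: i for i, p in enumerate(SA, start=1)}
# Psi[i] = SA^{-1}[SA[i] + 1], left undefined for the suffix "$" (SA[i] = n)
Psi = {i: inv[SA[i - 1] + 1] for i in range(1, n + 1) if SA[i - 1] < n}

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    pending = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            pending += 1
        elif pending == 0:
            return k
        else:
            pending -= 1
    return None

def rank_leaf(P, x):
    return P[:x].count("()")

def select_leaf(P, i):
    """Position of the '(' of the i-th leaf '()'."""
    seen = 0
    for k in range(1, len(P)):
        if P[k - 1] == "(" and P[k] == ")":
            seen += 1
            if seen == i:
                return k
    raise ValueError("fewer than i leaves")

def lca(P, v, w):
    if v > w:
        v, w = w, v
    if v == w:
        return v
    E, e = [0], 0
    for ch in P:
        e += 1 if ch == "(" else -1
        E.append(e)
    m = min(range(v, w + 1), key=lambda k: E[k])
    return enclose(P, m + 1)

def sl(P, v):
    """Suffix link of an internal node v other than the root."""
    x = rank_leaf(P, v - 1) + 1        # leftmost leaf below v
    y = rank_leaf(P, findclose(P, v))  # rightmost leaf below v
    return lca(P, select_leaf(P, Psi[x]), select_leaf(P, Psi[y]))

print(sl(P, 5), sl(P, 14), sl(P, 4))  # 14 4 1
```

In the example, node "aba" (position 5) links to "ba" (14), "ba" links to "a" (4), and "a" links to the root (1).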
26
Going to a Child Node
w = child(v,c): a child w of v with edge label starting with letter c
By enumerating children of v
  enumerate a child u by firstchild and sibling
  find u such that edge(u,1) = c
By binary search on children of v
  use the operation to find the i-th child of v
By binary search on SA
  find lex. orders l, r of leftmost/rightmost leaves of v
  binary search on SA[l..r] according to the (d+1)-th letter of suffixes (d = depth(v))
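The third strategy, binary search on SA[l..r], can be sketched as below for an internal node v. The function child_range is a hypothetical name; it returns the range of lexicographic ranks of the leaves below the wanted child rather than the node itself (the node is then the leaf with that rank if the range has one element, and otherwise the lca of the two end leaves). Plain arrays replace the compressed suffix array, and rank_{()} and findclose are naive scans.

```python
T = "ababac$"
SA = [7, 1, 3, 5, 2, 4, 6]
Hgt = [0, 3, 1, 0, 2, 0, 0]
P = "(()((()())())(()())())"

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def rank_leaf(P, x):
    return P[:x].count("()")

def letter(i, d):
    """(d+1)-th letter of the suffix with lexicographic rank i."""
    return T[SA[i - 1] - 1 + d]

def child_range(v, c):
    """Ranks (first, last) of the leaves below child(v, c), or None if no such child."""
    l = rank_leaf(P, v - 1) + 1                     # leftmost leaf below v
    r = rank_leaf(P, findclose(P, v))               # rightmost leaf below v
    d = Hgt[rank_leaf(P, findclose(P, v + 1)) - 1]  # depth(v) via the smallest inorder
    lo, hi = l, r + 1
    while lo < hi:                                  # first rank whose letter is >= c
        mid = (lo + hi) // 2
        if letter(mid, d) < c:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    lo, hi = first, r + 1
    while lo < hi:                                  # first rank whose letter is > c
        mid = (lo + hi) // 2
        if letter(mid, d) <= c:
            lo = mid + 1
        else:
            hi = mid
    last = lo - 1
    return (first, last) if first <= last else None

print(child_range(1, "a"))   # (2, 4)
print(child_range(4, "b"))   # (2, 3)
print(child_range(14, "a"))  # None
```

The search is valid because all suffixes in SA[l..r] share their first d letters, so their (d+1)-th letters appear in sorted order.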
27
Data Structure of Compressed Suffix Trees
It consists of the following components
Compressed Suffix Array: |CSA| bits
BP sequence of the tree: 4n+o(n) bits
Hgt array: 2n+o(n) bits
The size of the compressed suffix tree is |CSA|+6n+o(n) bits
28
Time Complexities of Operations
root, isleaf, firstchild, sibling, parent, lca: O(1) time
depth, edge: O(t_SA) time
sl: O(t_Ψ) time
child: O(t_SA log |A|) time
t_SA: time to compute SA[i]
t_Ψ: time to compute Ψ[i]
29
References
[1] P. Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th IEEE Symposium on Switching and Automata Theory, pages 1–11, 1973.
[2] E. M. McCreight. A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM, 23(2):262–272, 1976.
[3] K. Sadakane. Compressed Suffix Trees with Full Functionality. Theory of Computing Systems, 41(4), 2007.