Succinct Data Structures

Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics

Suffix Trees [1,2]
String T = ababac$ (positions 1234567)
[Figure: suffix tree of T, annotated with its components: edge labels, depths of nodes, leaf indexes, pointers to children, suffix links, and the string T]

Operations on Suffix Trees
root(): returns the root node
isleaf(v): returns Yes if v is a leaf
child(v,c): returns the child w of v whose edge label from v to w begins with the letter c
firstchild(v): returns the first child of v
sibling(v): returns the next sibling of v
parent(v): returns the parent of v

edge(v,d): returns the d-th letter of the label of the edge into v
depth(v): returns the string depth of v
lca(v,w): returns the lowest common ancestor of v and w
sl(v): returns the node pointed to by the suffix link of v
[Figure: suffix tree of ababac$ with edges labelled $, a, b, c and leaves 7 1 3 5 2 4 6 from left to right]

Components of Suffix Trees [3]
String: n lg |A| bits
Tree structure: O(n lg n) bits
String depths of nodes: n lg n bits
Edge labels: n lg n bits
Suffix links: n lg n bits

Representation of Tree Structure
Represent the tree by a balanced parentheses (BP) sequence
Internal nodes: (...), at most n-1 of them
Leaves: (), n of them
At most 4n+o(n) bits
Nodes are represented by the positions of their (
Example (T = ababac$, leaves 7 1 3 5 2 4 6 from left to right): P = (()((()())())(()())())
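
The BP encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' implementation: the nested-list tree encoding and the names bp_sequence and findclose (here a linear scan) are assumptions, and positions are 1-based as on the slides. A real succinct structure answers findclose in constant time with o(n) extra bits.

```python
# Sketch only: the nested-list tree encoding and all function names are assumptions.

def bp_sequence(node):
    """Encode a tree (a list of child subtrees; a leaf is []) as parentheses."""
    return "(" + "".join(bp_sequence(child) for child in node) + ")"

def findclose(P, v):
    """1-based position of the ')' matching the '(' at position v (linear scan)."""
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def isleaf(P, v):
    """A node is a leaf when its '(' is immediately followed by ')'."""
    return P[v] == ")"  # Python index v is 1-based position v+1

def firstchild(P, v):
    """The first child starts right after the parent's '('."""
    return None if isleaf(P, v) else v + 1

def sibling(P, v):
    """The next sibling starts right after the ')' that closes v, if any."""
    w = findclose(P, v) + 1
    return w if w <= len(P) and P[w - 1] == "(" else None

# Suffix tree of ababac$: root -> $, a (-> aba (-> 1, 3), 5), ba (-> 2, 4), c
LEAF = []
tree = [LEAF, [[LEAF, LEAF], LEAF], [LEAF, LEAF], LEAF]
P = bp_sequence(tree)
print(P)
```

Enumerating the children of the root with firstchild and sibling visits positions 2, 4, 14 and 20.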

Representation of Nodes
v: position of ( in the BP sequence
j: preorder of the node
  j = rank_{(}(P,v)
  v = select_{(}(P,j)
i: inorder of the node
Example: P = (()((()())())(()())()); the 11 open parentheses receive preorders 1 2 3 4 5 6 7 8 9 10 11 from left to right
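
As a small illustration of the position/preorder conversion, the following Python sketch implements rank and select on ( by linear scans. The function names rank_open and select_open are assumptions; a succinct implementation would answer both in constant time using an o(n)-bit index.

```python
# Sketch only: linear-time stand-ins for constant-time rank/select.
P = "(()((()())())(()())())"

def rank_open(P, v):
    """Number of '(' in P[1..v] (1-based, inclusive): the preorder of node v."""
    return P[:v].count("(")

def select_open(P, j):
    """1-based position of the j-th '(': the node with preorder j."""
    seen = 0
    for k, ch in enumerate(P, start=1):
        if ch == "(":
            seen += 1
            if seen == j:
                return k
    raise ValueError("fewer than j open parentheses")

# Preorder of every node, keyed by its position in P
preorders = {v: rank_open(P, v) for v in range(1, len(P) + 1) if P[v - 1] == "("}
print(preorders)
```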

Inorder of Nodes
Defined only for internal nodes
The number of internal nodes visited from below during the DFS traversal from the root to v
An internal node may have more than one inorder (a node with degree k has exactly k-1 inorders)
Example: in the tree for ababac$, the root has the inorders 1, 4 and 6

Computation of inorder
v and its smallest inorder i are converted to each other in constant time
  i = rank_{()}(P, findclose(P, v+1))
  v = enclose(P, select_{)(}(P, i)+1)
[Figure: the example tree with its inorders, above P = (()((()())())(()())()) with the position v marked]

Proof: i = rank_{()}(P, findclose(P, v+1))
v+1 is the first child w of v.
u = findclose(P, v+1) is the last position of the subtree rooted at w.
An inorder is defined once on the path from a leaf to the next leaf.
There is a one-to-one correspondence between leaves and inorders.
The value of the inorder is the number of leaves on the tour from the root to v.
Thus, i = rank_{()}(P, u)
[Figure: the example tree with v and w marked, above P = (()((()())())(()())()) with the positions v, w and u marked]

Proof: v = enclose(P, select_{)(}(P, i)+1)
i is the number of times that, during the DFS traversal, a node w is visited from below and a child of w is visited next.
This action is represented by ")(" in P.
x = select_{)(}(P, i)+1 represents a child of v. Its parent is the answer.
[Figure: the example tree with v marked, above P = (()((()())())(()())()) with the positions v and x marked]
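
The two conversions proved above can be checked on the example sequence with a short Python sketch. The helper names (findclose, enclose, rank_pat, select_pat) are assumptions, all positions are 1-based, and every helper is a linear scan standing in for the constant-time operations of a succinct structure.

```python
# Sketch only: naive versions of findclose/enclose/rank/select on patterns.
P = "(()((()())())(()())())"

def findclose(P, v):
    """Position of the ')' matching the '(' at position v."""
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    """Position of the '(' of the parent of the node whose '(' is at v."""
    skipped = 0  # number of closed subtrees still to be skipped
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            skipped += 1
        elif skipped == 0:
            return k
        else:
            skipped -= 1
    raise ValueError("the root has no parent")

def rank_pat(P, pat, i):
    """Number of occurrences of the 2-character pattern lying inside P[1..i]."""
    return sum(1 for k in range(1, i) if P[k - 1:k + 1] == pat)

def select_pat(P, pat, j):
    """Start position of the j-th occurrence of the 2-character pattern."""
    seen = 0
    for k in range(1, len(P)):
        if P[k - 1:k + 1] == pat:
            seen += 1
            if seen == j:
                return k
    raise ValueError("fewer than j occurrences")

def inorder(P, v):
    """Smallest inorder of the internal node v."""
    return rank_pat(P, "()", findclose(P, v + 1))

def node_of_inorder(P, i):
    """Internal node that owns inorder i."""
    return enclose(P, select_pat(P, ")(", i) + 1)

print([node_of_inorder(P, i) for i in range(1, 7)])
```

In the example, the root (position 1) owns the inorders 1, 4 and 6, while the nodes at positions 5, 4 and 14 own 2, 3 and 5.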

String Depths of Nodes
String depths are represented by the lengths of the common prefixes between two adjacent leaves. The Hgt array stores them.
Example: T = ababac$, leaves 7 1 3 5 2 4 6, Hgt = 0 3 1 0 2 0 0
[Figure: suffix tree of ababac$ with the string depths 1, 2, 3 written at its internal nodes]

Hgt Array
Hgt[i] = lcp(SA[i], SA[i+1])
Size: n log n bits

  i  Hgt  SA  suffix
  1   0    7  $
  2   3    1  ababac$
  3   1    3  abac$
  4   0    5  ac$
  5   2    2  babac$
  6   0    4  bac$
  7   0    6  c$
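
For illustration, the table above can be recomputed directly from the definition. The sketch below is deliberately naive (sorting whole suffixes, comparing character by character), the function names are assumptions, and setting the last entry to 0 simply follows the slide's example.

```python
# Sketch only: quadratic-time construction, meant to mirror the definition.
T = "ababac$"

def suffix_array(T):
    """1-based starting positions of the suffixes of T in lexicographic order."""
    return sorted(range(1, len(T) + 1), key=lambda p: T[p - 1:])

def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    length = 0
    while length < min(len(a), len(b)) and a[length] == b[length]:
        length += 1
    return length

def hgt_array(T, SA):
    """Hgt[i] = lcp of the suffixes SA[i] and SA[i+1]; the last entry is 0."""
    H = [lcp(T[SA[i] - 1:], T[SA[i + 1] - 1:]) for i in range(len(SA) - 1)]
    return H + [0]

SA = suffix_array(T)
Hgt = hgt_array(T, SA)
print(SA, Hgt)
```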

Hgt[i] is equal to the string depth of the node with inorder i
There is a one-to-one correspondence between the inorders of internal nodes and the leaves.
The depth can be computed in constant time:
  i = rank_{()}(P, findclose(P, v+1))
  depth(v) = Hgt[i]
Example: Hgt = 0 3 1 0 2 0 0, P = (()((()())())(()())())

Computation of Edge Labels
Let i be the inorder of node v
The i-th leaf is a descendant of v
The i-th leaf represents the suffix SA[i]
The label of the edge into v is a substring of the suffix SA[i]
Edge length = d2 - d1, where d1 and d2 are the string depths of parent(v) and v
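
A hedged Python sketch of depth and edge labels for internal nodes of the running example follows. The arrays are copied from the slides; the helper names and the explicit slice T[SA[i]+d1 .. SA[i]+d2-1] are my reading of the figure, not a formula stated on the slide. Leaves and the root are left out to keep the sketch short.

```python
# Sketch only: naive helpers, internal non-root nodes only.
T = "ababac$"
SA = [7, 1, 3, 5, 2, 4, 6]
Hgt = [0, 3, 1, 0, 2, 0, 0]
P = "(()((()())())(()())())"

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    skipped = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            skipped += 1
        elif skipped == 0:
            return k
        else:
            skipped -= 1
    raise ValueError("the root has no parent")

def rank_leaf(P, i):
    """Number of leaves "()" lying inside P[1..i]."""
    return sum(1 for k in range(1, i) if P[k - 1:k + 1] == "()")

def inorder(P, v):
    return rank_leaf(P, findclose(P, v + 1))

def depth(v):
    """String depth of the internal node v: Hgt at its inorder."""
    return Hgt[inorder(P, v) - 1]

def edge_label(v):
    """Label of the edge from parent(v) to the internal, non-root node v."""
    start = SA[inorder(P, v) - 1]       # a suffix that passes through v
    d1, d2 = depth(enclose(P, v)), depth(v)
    return T[start - 1 + d1:start - 1 + d2]

print(depth(5), edge_label(5))
```

Here edge(v,d) is the d-th character of edge_label(v), and the edge length is d2 - d1 as on the slide.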

Computation of Hgt Array
Given i and SA[i], Hgt[i] is computed in constant time using an index of 2n+o(n) bits

Permuting Hgt Array
Hgt[i] = lcp(SA[i], SA[i+1])
Values of SA+Hgt become increasing if they are sorted with respect to the values of SA

  Hgt     0 3 1 0 2 0 0
  SA      7 1 3 5 2 4 6
  SA+Hgt  7 4 4 5 4 4 6

Sorted by SA:

  SA      1 2 3 4 5 6 7
  SA+Hgt  4 4 4 4 5 6 7

n increasing numbers in [1,n] are represented in 2n bits: 00001 1 1 1 01 01 01

Lemma: Let SA[i] = p and SA[j] = p+1. Then Hgt[j] ≥ Hgt[i] - 1
Equivalently, Hgt[SA^{-1}[p+1]] ≥ Hgt[SA^{-1}[p]] - 1
Example: p = 1 (ababac$) and its lexicographic successor q = 3 (abac$) share d = 3 letters, so p+1 (babac$) and q+1 (bac$) share d-1 = 2 letters

Hgt[SA^{-1}[k]]+k (k = 1,2,...,n) are monotone increasing and in the range [1, n]
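
The monotone sequence and its 2n-bit encoding can be reproduced for the running example as follows. This is a sketch under the assumption that the encoding is the unary gap code shown on the slide (each value contributes as many 0s as its increase over the previous value, followed by a 1); the function name encode_hgt is hypothetical.

```python
# Sketch only: builds the sequence Hgt[SA^{-1}[k]] + k and its unary gap code.
SA = [7, 1, 3, 5, 2, 4, 6]
Hgt = [0, 3, 1, 0, 2, 0, 0]

def encode_hgt(SA, Hgt):
    n = len(SA)
    inv = [0] * (n + 1)                 # inv[p] = SA^{-1}[p], 1-based
    for i, p in enumerate(SA, start=1):
        inv[p] = i
    D = [Hgt[inv[k] - 1] + k for k in range(1, n + 1)]
    bits, prev = [], 0
    for d in D:
        bits.append("0" * (d - prev) + "1")  # d - prev >= 0 by the lemma
        prev = d
    return D, "".join(bits)

D, B = encode_hgt(SA, Hgt)
print(D, B)
```

The bit string has n ones and as many zeros as the last value, so it never exceeds 2n bits.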

Computation of Hgt[i]
Compute k = SA[i]
  constant time using the suffix array
  O(log^ε n) time using the compressed suffix array (0 < ε < 2)
Decode the k-th element v in the monotone sequence
  constant time by select
Hgt[i] = v - k
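
The decoding steps above can be sketched in Python on the example bit string. The names select1 and hgt are assumptions, select is a linear scan here rather than the constant-time operation, and SA is a plain array rather than a compressed suffix array.

```python
# Sketch only: decodes Hgt[i] from the unary-coded monotone sequence.
SA = [7, 1, 3, 5, 2, 4, 6]
B = "00001111010101"    # encoding of 4 4 4 4 5 6 7 from the slide

def select1(B, k):
    """1-based position of the k-th 1 in B."""
    seen = 0
    for pos, bit in enumerate(B, start=1):
        if bit == "1":
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k ones")

def hgt(i):
    k = SA[i - 1]               # step 1: k = SA[i]
    v = select1(B, k) - k       # step 2: zeros before the k-th 1 = k-th value
    return v - k                # step 3: Hgt[i] = v - k

print([hgt(i) for i in range(1, 8)])
```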

Computation of lca
lca = lowest common ancestor
u = lca(v,w)
Constant time
[Figure: nodes v and w with their lowest common ancestor u]

Let E[i] = rank_{(}(P,i) - rank_{)}(P,i). Then u = parent(RMQ_E(v,w)+1)
m = RMQ_E(v,w): the index of the minimum value in E[v..w]

  P  (()((()())())(()())())
  E  1212343432321232321210

[Figure: the example tree with u, v and w marked, and the positions u, v, m, w marked on P]
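
The lca formula can be tried out on the example with the sketch below. It assumes v < w, takes the leftmost minimum, and replaces the constant-time RMQ structure with a linear scan; the function names are hypothetical.

```python
# Sketch only: lca through a range minimum over the excess array E.
P = "(()((()())())(()())())"

def excess(P, i):
    """E[i] = (number of '(') - (number of ')') in P[1..i]."""
    return 2 * P[:i].count("(") - i

def enclose(P, v):
    """Parent of the node whose '(' is at position v."""
    skipped = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            skipped += 1
        elif skipped == 0:
            return k
        else:
            skipped -= 1
    raise ValueError("the root has no parent")

def lca(P, v, w):
    """Lowest common ancestor of the nodes at positions v < w."""
    m = min(range(v, w + 1), key=lambda i: excess(P, i))  # leftmost minimum
    return enclose(P, m + 1)

E = "".join(str(excess(P, i)) for i in range(1, len(P) + 1))
print(E, lca(P, 6, 11))
```

For the leaves at positions 6 and 11 (suffixes 1 and 5) the minimum of E[6..11] is at m = 10, and parent(11) = 4 is the node for the string a.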

Representing Suffix Links
sl(node(cα)) = node(α)
Use the Ψ function of the compressed suffix array
[Figure: node v with leftmost and rightmost descendant leaves x and y, and node w = sl(v) with the corresponding leaves x' and y']

Proof:
Leaves are represented by () and appear in P in the lexicographic order of the suffixes.
Therefore x = rank_{()}(P, v-1)+1 is the smallest suffix in lex. order among the descendant leaves of v
y = rank_{()}(P, findclose(P,v)) is the largest suffix in lex. order among the descendant leaves of v
x, y represent T[SA[x]..n], T[SA[y]..n].
x', y' represent T[SA[x]+1..n], T[SA[y]+1..n], that is, x' = Ψ[x] and y' = Ψ[y].

x is the leftmost leaf, y is the rightmost leaf
Let l = lcp(x,y). Then l is identical to the string depth of v
It holds that lcp(x',y') = l-1
lca(x',y') represents a string one letter shorter than that of v. That is, sl(v).
[Figure: node v with its leftmost leaf x (suffix SA[x]) and rightmost leaf y (suffix SA[y])]
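
Putting the pieces of the proof together, a suffix link for an internal node of the running example can be sketched as follows. Ψ is computed here from a plain suffix array as Ψ[i] = SA^{-1}[SA[i]+1] (the wrap-around for SA[i] = n is an assumption and is not exercised), and all helpers are naive linear scans with hypothetical names.

```python
# Sketch only: sl(v) = lca(leaf(psi[x]), leaf(psi[y])) for internal nodes v.
SA = [7, 1, 3, 5, 2, 4, 6]
P = "(()((()())())(()())())"
n = len(SA)
inv = {p: i for i, p in enumerate(SA, start=1)}          # SA^{-1}
psi = {i: inv[SA[i - 1] % n + 1] for i in range(1, n + 1)}

def findclose(P, v):
    excess = 0
    for k in range(v, len(P) + 1):
        excess += 1 if P[k - 1] == "(" else -1
        if excess == 0:
            return k
    raise ValueError("unbalanced sequence")

def enclose(P, v):
    skipped = 0
    for k in range(v - 1, 0, -1):
        if P[k - 1] == ")":
            skipped += 1
        elif skipped == 0:
            return k
        else:
            skipped -= 1
    raise ValueError("the root has no parent")

def rank_leaf(P, i):
    """Number of leaves "()" lying inside P[1..i]."""
    return sum(1 for k in range(1, i) if P[k - 1:k + 1] == "()")

def select_leaf(P, j):
    """Position of the j-th leaf "()" in P."""
    starts = [k for k in range(1, len(P)) if P[k - 1:k + 1] == "()"]
    return starts[j - 1]

def lca(P, v, w):
    m = min(range(v, w + 1), key=lambda i: 2 * P[:i].count("(") - i)
    return enclose(P, m + 1)

def sl(P, v):
    x = rank_leaf(P, v - 1) + 1              # leftmost leaf below v
    y = rank_leaf(P, findclose(P, v))        # rightmost leaf below v
    xs, ys = select_leaf(P, psi[x]), select_leaf(P, psi[y])
    return lca(P, min(xs, ys), max(xs, ys))

print(sl(P, 5), sl(P, 14), sl(P, 4))
```

In the example, the node for aba (position 5) links to the node for ba (position 14), which links to the node for a (position 4), which links to the root.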

Going to a Child Node
w = child(v,c): the child w of v whose edge label starts with the letter c
By enumerating the children of v
  enumerate each child u by firstchild and sibling
  find u such that edge(u,1) = c
By binary search on the children of v
  use the operation to find the i-th child of v
By binary search on SA
  find the lex. orders l, r of the leftmost/rightmost leaves of v
  binary search on SA[l..r] according to the (d+1)-th letter of the suffixes (d = depth(v))

Data Structure of Compressed Suffix Trees
It consists of the following components:
  Compressed suffix array: |CSA| bits
  BP sequence of the tree: 4n+o(n) bits
  Hgt array: 2n+o(n) bits
The size of the compressed suffix tree is |CSA|+6n+o(n) bits

Time Complexities of Operations
root, isleaf, firstchild, sibling, parent, lca: O(1) time
depth, edge: O(t_SA) time
sl: O(t_Ψ) time
child: O(t_SA log |A|) time
t_SA: time to compute SA[i]
t_Ψ: time to compute Ψ[i]

References
[1] P. Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th IEEE Symposium on Switching and Automata Theory, pages 1–11, 1973.
[2] E. M. McCreight. A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM, 23(2):262–272, 1976.
[3] K. Sadakane. Compressed Suffix Trees with Full Functionality. Theory of Computing Systems, 41(4):589–607, 2007.