An Improved Succinct Dynamic k-Ary Tree Representation (work in progress) Diego Arroyuelo Department of Computer Science, Universidad de Chile.

Roadmap
Succinct data structures
- Static tree representations
- Dynamic tree representations
Our basic dynamic tree representation
- Representing blocks
- Representing the frontier of blocks
- Representing inter-block pointers
Solving operations
- Basic operations
- Specialized operations
Discussion

Succinct data structures
In a k-ary tree each node has at most k children, each child labeled with a symbol in the set {1, …, k} (tries).
A succinct data structure requires space close to the information-theoretic lower bound.
There are $\frac{1}{kn+1}\binom{kn+1}{n}$ different k-ary trees with n nodes.
Therefore, the information-theoretic lower bound is $\log\left(\frac{1}{kn+1}\binom{kn+1}{n}\right)$ bits, which is about n log k + n log e bits if k is not a constant with respect to n.
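The lower bound can be checked numerically. The following is a minimal sketch, assuming the standard count of k-ary (cardinal) trees with n nodes, binom(kn+1, n) / (kn+1); the function names are illustrative and do not come from the slides.

```python
from math import comb, e, log2

def count_kary_trees(n, k):
    # Number of k-ary (cardinal) trees with n nodes: binom(kn+1, n) / (kn+1).
    return comb(k * n + 1, n) // (k * n + 1)

def lower_bound_bits(n, k):
    # Information-theoretic lower bound: log2 of the number of distinct trees.
    return log2(count_kary_trees(n, k))

# For k = 2 the count is the Catalan number sequence: 1, 2, 5, 14, ...
small_counts = [count_kary_trees(n, 2) for n in range(1, 5)]

# For a larger alphabet the bound sits between n log k and n (log k + log e).
n, k = 1000, 256
bound = lower_bound_bits(n, k)
approximation = n * (log2(k) + log2(e))
```

Comparing `bound` with `approximation` for growing k shows how the n log k term comes to dominate, which is why the space of a succinct k-ary tree is usually stated relative to n log k.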

Succinct data structures
We are interested in succinct representations that can be navigated.
We are interested in the following operations:
- parent(x): parent of node x
- child(x, i): i-th child of node x
- child(x, a): child of node x by label a
- depth(x)
- degree(x)
- subtree-size(x)
- preorder(x)
- is-ancestor(x, y): is node x an ancestor of node y?
- insertions (assumed to happen at the leaves)
- deletions (only for unary nodes and leaves)
The traditional representation of trees requires n log n bits for (almost) each operation.

Succinct tree representations
Succinct representations for static trees:
- LOUDS [Jacobson, FOCS’89]
- Balanced Parentheses [MR, STOC’97]
- DFUDS [Benoit et al., Algorithmica 2005]
- xbw [Ferragina et al., FOCS’05]
- Ultra succinct trees [Jansson et al., SODA’07]
These must be rebuilt from scratch upon insertion or deletion of nodes.

Succinct tree representations
The case of succinct dynamic trees has been studied only for binary trees:
- Munro, Raman, and Storm [SODA’01]
  - 2n + o(n) bits
  - parent and child in constant time
  - Updates and subtree-size in O(polylog(n)) time
- Raman and Rao [ICALP’03]
  - 2n + o(n) bits
  - parent, child, preorder, and subtree-size in O(1) time
  - Updates in O((loglog n)^(1+ε)) amortized time (O(log n loglog n) worst case)
k-ary trees: basic navigation in O(k) time (assume k is not a constant).

Dynamic balanced parentheses
Chan et al. [TALG 2007] define a dynamic representation for balanced parentheses.
This can be used to represent a dynamic k-ary tree using O(n) bits of space.
The time for all operations depends on the number of nodes in the tree rather than on k (O(log n) time).
This data structure cannot take advantage of the case where k is asymptotically smaller than n (e.g., k = O(polylog(n))).
We aim to achieve o(log n) time whenever log k = o(log n).

Motivations
This work is motivated by previous work on LZ-indices:
- Space-efficient construction of LZ-index [AN, ISAAC’05]
  - Very preliminary representation: n log n bits for pointers; child operation and insertions in O(k) worst-case time
- LZ-index on disk [AN, CPM’07]
  - Basic operations in O(1) CPU time, yet n log n bits are needed for pointers, and neither insertions nor deletions are supported

Our basic tree representation
We incrementally divide the tree into disjoint blocks [MRS, RR, AN].
Every block represents a subtree of N nodes such that N_min ≤ N ≤ N_max.
We arrange these blocks in a tree by adding inter-block pointers (the entire tree is a tree of subtrees).

Our basic tree representation
[Figure: a tree divided into blocks. Labels: frontier of the block, duplicated nodes.]

Our basic tree representation
We define N_min (minimum block size) as follows:
- Inter-block pointers should require o(n) bits
- Therefore we define N_min = Θ(log^2 n) (in general, N_min = Θ(log n · f(n)), for f(n) = ω(1))
- In this way we have (in the worst case) one pointer out of every Θ(log^2 n) nodes
- And hence o(n) bits for pointers

Our basic tree representation
We define N_max (maximum block size) as follows:
- In case of block overflow we should be able to create a new block of size at least N_min from the full block
- In the worst case, the root of the block has its k children, all of them having a subtree of the same size
- By choosing N_max = Θ(k log^2 n) we solve this problem
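The two size bounds can be illustrated with concrete numbers. This is only a sketch: the slides give asymptotic bounds, so the constants below (N_min = ⌈log n⌉^2, N_max = k · N_min + 1, and ⌈log n⌉ bits per pointer) are assumptions made for the sake of the arithmetic.

```python
from math import ceil, log2

def block_parameters(n, k):
    # Assumed constants; the slides only state Theta(log^2 n) and Theta(k log^2 n).
    pointer_width = ceil(log2(n))        # bits per inter-block pointer
    n_min = pointer_width ** 2           # minimum block size
    n_max = k * n_min + 1                # root plus k subtrees of n_min nodes each
    max_pointers = n // n_min            # at most one pointer per n_min nodes
    pointer_bits = max_pointers * pointer_width
    return n_min, n_max, pointer_bits

n, k = 2 ** 20, 4
n_min, n_max, pointer_bits = block_parameters(n, k)

# Worst case for a split: the root has k children with equally sized subtrees.
# Each subtree then holds (n_max - 1) // k nodes, which must reach n_min.
equal_subtree_size = (n_max - 1) // k

# Pointer overhead per node, roughly 1 / log n, which vanishes as n grows.
overhead_per_node = pointer_bits / n
```

With these assumed constants, a full block whose root has k equally sized subtrees can always hand one of them over as a new block of at least N_min nodes, and the pointers cost about n / log n bits in total.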

Our basic tree representation
The blocks cannot be as small as we would like.
We support dynamic operations on the tree by:
- Dividing the tree into blocks (we only need to rebuild a block upon updates)
- Making these smaller trees dynamic (unlike other approaches)
We represent the blocks using a dynamic DFUDS representation on top of the structure of Chan et al. [TALG 2007]:
- We solve the basic navigation inside blocks in O(log N) = O(log k + loglog n) time
- Insertions can also be handled in the same time
- We require 2n + o(n) bits overall
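For readers unfamiliar with DFUDS, the encoding itself is short: visit the nodes in preorder and write, for a node with d children, d opening parentheses followed by one closing parenthesis, with one extra opening parenthesis in front to balance the sequence. The sketch below builds a static DFUDS string from a nested-list tree; it does not implement the dynamic structure of Chan et al., and the nested-list input format is an assumption for illustration.

```python
def dfuds(tree):
    # A node is represented as the list of its children; a leaf is [].
    out = ["("]                      # extra opening parenthesis
    stack = [tree]
    while stack:
        node = stack.pop()
        out.append("(" * len(node) + ")")   # degree in unary, then a terminator
        stack.extend(reversed(node))        # keep preorder, leftmost child first
    return "".join(out)

# Root with three children: a leaf, a node with two leaves, a node with three leaves.
example_tree = [[], [[], []], [[], [], []]]
encoded = dfuds(example_tree)
```

For this 9-node tree the result is the 18-parenthesis sequence `(((())(()))((())))`, which is the same string shown as T_p in the figures of this talk: 2 bits per node.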

Representing the blocks
We represent the symbols S_p labeling the arcs of the trie with a data structure for rank and select [GN, submitted]:
- We compute child_p(x, a) by using rank and select on S_p, followed by child_p(x, i) on p
child_p(x, a) can be computed in O(log N log k / loglog N) = O((log^2 k + loglog n) / log(log k + loglog n)) time.
The space requirement is n log k + o(n log k) bits.
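The reduction from child-by-label to child-by-rank can be sketched as follows. The rank and select below are naive linear scans standing in for the dynamic structure cited on the slide, and the layout (the labels of each node's children stored contiguously in S_p, located by a start offset and a degree) is an assumption for illustration.

```python
def rank(S, a, i):
    # Number of occurrences of symbol a in S[0:i].
    return S[:i].count(a)

def select(S, a, j):
    # Position of the j-th occurrence (1-based) of symbol a in S.
    pos = -1
    for _ in range(j):
        pos = S.index(a, pos + 1)
    return pos

def child_rank_by_label(S, start, degree, a):
    # The labels of the node's children occupy S[start : start + degree].
    before = rank(S, a, start)
    if rank(S, a, start + degree) == before:
        return None                      # no child labeled a
    pos = select(S, a, before + 1)       # first occurrence of a inside the range
    return pos - start + 1               # i such that child(x, i) is the answer

# Labels of a 9-node trie in preorder: root -> a, c, t; third node -> b, d;
# sixth node -> a, g, t; all other nodes are leaves.
S = "actbdagt"
```

Once the rank i is known, child(x, a) is answered by the ordinary child(x, i) operation on the parentheses, which is why the label sequence can be maintained separately from the tree shape.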

Representing the frontier of a block
We need to indicate which nodes in a block have a pointer to a child block.
This can be done by using a bit vector:
- However, this would require 3n + o(n) bits overall for the tree structure
We define an array F_p storing the preorders of the nodes having a child pointer:
- Since there are O(n / log^2 n) pointers, this requires o(n) bits

Representing the frontier of a block
[Figure: block p with T_p = (((())(()))((()))) and its array F_p. After an update, all the preorders in F_p from the affected position onward must change: the entries (3) (8) (16) (20) become (3) (9) (17) (21).]
Array F_p is represented in differential form with a data structure for Searchable Partial Sums, so this update takes O(log N) time.
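The effect of the differential form can be sketched with a Fenwick tree over the gaps between consecutive frontier preorders: shifting every preorder from some position onward costs a single point update, and looking up a preorder is a search over prefix sums. This is a stand-in under stated assumptions, not the Searchable Partial Sums structure used in the talk; in particular it has a fixed number of entries, whereas the real structure also supports inserting and deleting entries. The class and method names are hypothetical.

```python
class FrontierGaps:
    """Fenwick tree over the gaps of a sorted array of positive preorders."""

    def __init__(self, preorders):
        self.n = len(preorders)
        self.tree = [0] * (self.n + 1)
        previous = 0
        for j, value in enumerate(preorders, start=1):
            self.add(j, value - previous)   # store the difference, not the value
            previous = value

    def add(self, j, delta):
        # Adding delta to gap j shifts entries j, j+1, ... by delta.
        while j <= self.n:
            self.tree[j] += delta
            j += j & -j

    def value(self, j):
        # j-th frontier preorder = sum of the first j gaps.
        total = 0
        while j > 0:
            total += self.tree[j]
            j -= j & -j
        return total

    def search(self, preorder):
        # Smallest j whose value is >= preorder (n + 1 if there is none).
        pos, remaining = 0, preorder
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < remaining:
                pos, remaining = nxt, remaining - self.tree[nxt]
            step >>= 1
        return pos + 1

frontier = FrontierGaps([3, 8, 16, 20])
frontier.add(2, 1)    # a node was inserted before the second frontier node
after_update = [frontier.value(j) for j in range(1, 5)]
```

A node with preorder v is in the frontier exactly when `frontier.value(frontier.search(v)) == v`; both the shift and the lookup take time logarithmic in the number of entries here.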

Representing inter-block pointers
Pointers to child blocks:
- We store the pointers to child blocks in array PTR_p
- Sorted in increasing order of the preorders of the nodes in the frontier
Pointers to the parent block:
- In each block p we need a pointer to the representation of the root of p in the parent block
- However, the position of a node changes upon updates
- A parent pointer is therefore composed of:
  - A pointer to the parent block q
  - If p is the j-th child of q, then we store the value j in p

Representing inter-block pointers
[Figure: block p with T_p = (((())(()))((()))), its arrays F_p and PTR_p, and four child blocks whose parent pointers are (p,1), (p,2), (p,3), (p,4).]

Solving the basic operations
child(x, i):
- Look for the preorder of x in F_p
- If we find it, follow the child pointer to block q and apply child_q on the root of q
- Otherwise, use the child_p operation
- This takes O(log N) = O(log k + loglog n) time
child(x, a) is solved in the same way, but using child_p(x, a) instead.
parent(x): if x is the root of its block, follow the parent pointer to block p; then apply parent_p(x).
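The navigation rules above can be sketched on a toy model. Everything here is an illustrative assumption: blocks store explicit child lists keyed by local preorder instead of a DFUDS sequence, the frontier is a plain sorted list, the in-block parent is found by a linear scan, and the `Block` class and its `attach` helper are hypothetical names.

```python
from bisect import bisect_left

class Block:
    def __init__(self, children):
        self.children = children   # local preorder -> local preorders of its children
        self.frontier = []         # F_p: sorted preorders of nodes with a child block
        self.ptr = []              # PTR_p: child blocks, in the same order as F_p
        self.parent = None         # (q, j): this block is the j-th child block of q

    def attach(self, x, child_block):
        j = bisect_left(self.frontier, x)
        self.frontier.insert(j, x)
        self.ptr.insert(j, child_block)
        for pos in range(j, len(self.ptr)):       # renumber the blocks that shifted
            self.ptr[pos].parent = (self, pos + 1)

def child(block, x, i):
    # i-th child (1-based) of node x of block, as a (block, preorder) pair.
    pos = bisect_left(block.frontier, x)
    if pos < len(block.frontier) and block.frontier[pos] == x:
        block, x = block.ptr[pos], 1              # x is duplicated as the root of q
    kids = block.children.get(x, [])
    return (block, kids[i - 1]) if 1 <= i <= len(kids) else None

def parent(block, x):
    if x == 1:                                    # root of its block
        if block.parent is None:
            return None                           # root of the whole tree
        q, j = block.parent
        block, x = q, q.frontier[j - 1]           # position of the duplicate in q
    for u, kids in block.children.items():        # stand-in for parent_p
        if x in kids:
            return (block, u)
    return None

# Block p: root 1 with children 2 and 3; node 3 is in the frontier.
# Block q: its root (the duplicate of p's node 3) has children 2 and 3.
p = Block({1: [2, 3]})
q = Block({1: [2, 3]})
p.attach(3, q)
```

Storing the pair (q, j) rather than a preorder means a parent pointer stays valid when preorders inside q shift, since only the frontier array of q has to absorb the change.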

Solving the basic operations
Insert:
- We use the corresponding insertion operation on the block
- When a block p becomes full:
  1. Choose a node z in block p
  2. Reinsert the nodes in the subtree of z into a new block q (along with the corresponding part of the frontier of p)
  3. Delete the subtree of z from p
- Total cost is O(log k + loglog n) amortized (if we are able to spend time proportional to the size of the subtree of z)
- List of candidate subtrees in each block (o(n) bits overall)
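The three steps of a block split can be sketched on an explicit tree. This is a simplified model under assumptions: nodes have names instead of preorders, a block is a plain dictionary, and z is passed in by the caller, since the slides do not specify how the candidate subtree is chosen. The function names are hypothetical.

```python
def subtree(kids, z):
    # All nodes in the subtree rooted at z, z included.
    nodes, stack = [], [z]
    while stack:
        v = stack.pop()
        nodes.append(v)
        stack.extend(kids.get(v, []))
    return nodes

def split_block(p, z):
    # Step 2: move the subtree of z, and its part of the frontier, to a new block q.
    q = {"root": z, "kids": {}, "ptr": {}, "parent": p}
    for v in subtree(p["kids"], z):
        if v in p["kids"]:
            q["kids"][v] = p["kids"].pop(v)   # step 3: the subtree leaves p
        if v in p["ptr"]:
            moved = p["ptr"].pop(v)           # child blocks below z now hang from q
            moved["parent"] = q
            q["ptr"][v] = moved
    # z stays in p as a leaf (duplicated node) and joins the frontier of p.
    p["ptr"][z] = q
    return q

r = {"root": "f", "kids": {"f": ["g"]}, "ptr": {}, "parent": None}
p = {
    "root": "a",
    "kids": {"a": ["b", "c"], "c": ["d", "e"], "e": ["f"]},
    "ptr": {"f": r},
    "parent": None,
}
r["parent"] = p
q = split_block(p, "c")   # step 1: here z = "c" is simply given
```

The work is proportional to the size of the subtree of z, which is the cost the amortized bound above has to absorb.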

Solving specialized operations
We can solve other operations by using this representation:
- degree(x)
- depth(x)
- subtree-size(x)
[Figure labels: node x, Size_p.]

Solving specialized operations
We can solve other operations by using this representation:
- preorder(x)
- is-ancestor(x, y)
- lca(x, y)

Conclusions
We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretic lower bound.
We can profit from smaller alphabets:
- o(log n) time for operations whenever log k = o(log n)
- In particular, O(loglog n) time for k = O(polylog(n))
- Versus the O(log n) time of Chan et al. for any alphabet size
We need an extra o(n log k) bits of space.

Discussion
- What happens if we have external pointers to the tree nodes?
- Can we compress the dynamic DFUDS representation of blocks (just as in [JSS, SODA’07])?
- Suffix links in little space (assuming a suffix-closed trie)?