Succinct Data Structures

Succinct Data Structures
Kunihiko Sadakane
National Institute of Informatics

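BP Representation [3]
Each node is represented by a pair of matching open and close parentheses: a depth-first traversal writes ( when it enters a node and ) when it leaves it.
2n bits for n nodes. This size essentially matches the information-theoretic lower bound of 2n − Θ(log n) bits.
Example (8-node tree: the root has two children, the first with three leaf children and the second with two): P = ((()()())(()()))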
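A minimal Python sketch of the depth-first construction, assuming the tree is given as a dict `children` mapping each node to its ordered list of children (an input format and 0-based labels chosen here for illustration). An explicit stack avoids recursion limits on deep trees.

```python
def bp(children, root=0):
    """Balanced-parentheses (BP) encoding of an ordered tree.

    A depth-first traversal writes '(' on entering a node and ')' on leaving it,
    so every node contributes exactly one matching pair: 2n symbols for n nodes.
    """
    out = []
    stack = [(root, False)]
    while stack:
        v, leaving = stack.pop()
        if leaving:
            out.append(')')
            continue
        out.append('(')
        stack.append((v, True))            # close v after all of its children
        for c in reversed(children[v]):    # reversed, so the leftmost child is visited first
            stack.append((c, False))
    return ''.join(out)


# The 8-node example: root 0 has children 1 and 5;
# node 1 has three leaf children, node 5 has two.
example = {0: [1, 5], 1: [2, 3, 4], 2: [], 3: [], 4: [], 5: [6, 7], 6: [], 7: []}
print(bp(example))  # ((()()())(()()))
```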

Data Structure for findclose [4]
Divide the parentheses sequence into blocks of length B = ½ log n.
b(p): number of the block containing position p
μ(p): position of the parenthesis matching p
Parenthesis p is far ⇔ b(p) ≠ b(μ(p))
A far open parenthesis p is an opening pioneer ⇔ b(μ(p)) ≠ b(μ(q)), where q is the far open parenthesis immediately preceding p
The positions of the opening pioneers and of their matching parentheses are represented by a 0,1 vector
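The definitions can be checked on small inputs with the sketch below. It assumes 0-indexed positions, takes the block length `B` as a parameter (a real structure uses B = ½ log n; tiny blocks are used here so that far parentheses appear), and treats the first far open parenthesis as a pioneer, since it has no preceding q. The names `match` and `far_and_pioneers` are illustrative, not from the source.

```python
def match(P):
    """mu[i] = position of the parenthesis matching P[i] (0-indexed), via a stack."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):
        if c == '(':
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def far_and_pioneers(P, B):
    """Return (far open parentheses, opening pioneers) of P for block length B."""
    mu = match(P)

    def b(i):
        return i // B             # block number of position i

    far, pioneers = [], []
    prev = None                   # far open parenthesis immediately preceding p
    for p, c in enumerate(P):
        if c != '(' or b(p) == b(mu[p]):
            continue              # a ')' or a pair matched inside one block
        far.append(p)
        if prev is None or b(mu[p]) != b(mu[prev]):
            pioneers.append(p)    # matches into a different block than prev does
        prev = p
    return far, pioneers


print(far_and_pioneers("((((()))))", 2))   # ([0, 1, 2, 3], [0, 2])
```

For P = ((((())))) and B = 2, the four outer open parentheses are all far, but position 1 matches into the same block as position 0, and position 3 into the same block as position 2, so only positions 0 and 2 are pioneers.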

Lemma: Let β denote the number of blocks. Then the number of opening pioneers is at most 2β − 3.
Proof: The graph whose nodes correspond to the blocks and whose edges are (b(p), b(μ(p))) for the opening pioneers p is an outer-planar graph without multiple edges, and an outer-planar graph with β nodes has at most 2β − 3 edges.
The pioneers, together with their matching parentheses, form a BP again.
β = 2n/B = 4n/log n ⇒ the length of this BP is O(n/log n)

Representing Recursive Structure
The opening pioneers and their matching parentheses are represented by a 0,1 vector B
B is a sparse vector of length 2n with O(n/log n) 1’s
It can be represented in O(n log log n/log n) bits (a vector of length N with m 1’s takes O(m log(N/m)) bits; here m = O(n/log n) and N/m = O(log n))
Example (figure): B = 0100 0101 0000 0000 0010 1001; the parentheses of P at the 1-positions of B form P1 = ((()))
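A sketch of how the vector B and the pioneer sequence P1 could be extracted. It assumes 0-indexed positions, an explicit block length `block_len` (named differently from the vector B to avoid the clash in the slides' notation), and that the first far open parenthesis counts as a pioneer. The vector is returned as a plain string; a real implementation would store it in a compressed sparse form to reach O(n log log n/log n) bits. `pioneer_subsequence` is a hypothetical name.

```python
def pioneer_subsequence(P, block_len):
    """Return (the 0/1 vector B as a string, the extracted sequence P1)."""
    mu, stack = [0] * len(P), []
    for i, c in enumerate(P):               # match parentheses with a stack
        if c == '(':
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    blk = [i // block_len for i in range(len(P))]
    bits, prev = [0] * len(P), None
    for p, c in enumerate(P):
        if c == '(' and blk[p] != blk[mu[p]]:                 # far open parenthesis
            if prev is None or blk[mu[p]] != blk[mu[prev]]:   # opening pioneer
                bits[p] = bits[mu[p]] = 1                     # mark it and its mate
            prev = p
    P1 = ''.join(c for c, bit in zip(P, bits) if bit)
    return ''.join(map(str, bits)), P1


print(pioneer_subsequence("((((()))))", 2))   # ('1010000101', '(())')
```

The extracted P1 is itself balanced, which is what allows the same structure to be applied recursively to it.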

Let S(n) denote the size of the BP representation for an n-node tree:
S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
Once the number of nodes becomes O(n/log² n), a naïve data structure that stores all the answers uses only O(n/log n) bits.
Therefore S(n) = 2n + O(n log log n/log n).
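Unrolling the recurrence two levels shows where each term goes. Here n_1 and n_2 are names introduced for the sizes at the first and second recursion levels; at the last level every answer is stored explicitly in O(log n) bits.

```latex
\begin{aligned}
S(n)   &= 2n + O\!\left(\tfrac{n \log\log n}{\log n}\right) + S(n_1),
        && n_1 = O\!\left(\tfrac{n}{\log n}\right)\\
S(n_1) &= 2n_1 + O\!\left(\tfrac{n_1 \log\log n_1}{\log n_1}\right) + S(n_2)
        = O\!\left(\tfrac{n}{\log n}\right) + S(n_2),
        && n_2 = O\!\left(\tfrac{n}{\log^2 n}\right)\\
S(n_2) &= O(n_2 \log n) = O\!\left(\tfrac{n}{\log n}\right)\\
\Rightarrow\; S(n) &= 2n + O\!\left(\tfrac{n \log\log n}{\log n}\right)
\end{aligned}
```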

Algorithm for findclose
To compute μ(p) = findclose(P,p):
If p is not far, μ(p) is found by table lookup.
Otherwise, find the pioneer p* that immediately precedes p (p* = p if p is itself a pioneer).
Find μ(p*) using the BP for pioneers.
If p is not a pioneer, b(μ(p)) = b(μ(p*)), i.e., μ(p) lies in the same block as μ(p*).
The position of μ(p) within that block is determined from the difference between the depths of p and p*.
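A minimal sketch of this two-level scheme, for illustration only: the O(1) table lookups are replaced by linear scans within a single block, and the recursive BP for pioneers is replaced by a dict from each pioneer to its mate. Positions are 0-indexed, `E[i]` is the excess after position i, and μ(p) is the first position after p where the excess drops to E[p] − 1. The class and helper names are hypothetical.

```python
import bisect


def excess(P):
    """E[i] = (number of '(' minus number of ')') in P[0..i]."""
    E, e = [], 0
    for c in P:
        e += 1 if c == '(' else -1
        E.append(e)
    return E


class FindClose:
    """Block scans stand in for tables; a dict stands in for the pioneer BP."""

    def __init__(self, P, B):
        self.P, self.B, self.E = P, B, excess(P)
        mu, stack = [0] * len(P), []       # reference matching, used only for set-up
        for i, c in enumerate(P):
            if c == '(':
                stack.append(i)
            else:
                j = stack.pop()
                mu[i], mu[j] = j, i
        self.pioneer_mate, prev = {}, None
        for p, c in enumerate(P):
            if c == '(' and p // B != mu[p] // B:               # far
                if prev is None or mu[p] // B != mu[prev] // B:  # opening pioneer
                    self.pioneer_mate[p] = mu[p]
                prev = p
        self.pioneers = sorted(self.pioneer_mate)

    def _first_drop(self, lo, hi, target):
        """First i in [lo, hi) with E[i] == target, or None."""
        for i in range(lo, hi):
            if self.E[i] == target:
                return i
        return None

    def findclose(self, p):
        B, target = self.B, self.E[p] - 1   # excess just after the matching ')'
        # Near case: the match lies in p's own block.
        block_end = min((p // B + 1) * B, len(self.P))
        i = self._first_drop(p + 1, block_end, target)
        if i is not None:
            return i
        # Far case: p* = last pioneer at or before p; mu(p) is in mu(p*)'s block.
        k = bisect.bisect_right(self.pioneers, p) - 1
        q = self.pioneer_mate[self.pioneers[k]]
        return self._first_drop((q // B) * B, q + 1, target)


print(FindClose("((((()))))", 2).findclose(1))   # 8
```

Each query scans at most one block, which is the part the lookup tables make O(1). The scan in the far case is correct because every position strictly between p and μ(p) has excess at least E[p].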

enclose
Let q = enclose(P,p), the open parenthesis of the closest pair enclosing p.
If b(q) = b(p), q is found from a table.
If b(q) ≠ b(p), store those positions, and also the positions of their matching parentheses; if there is more than one such pair of parentheses, store only the outermost one.
Recur for the extracted parentheses.
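The slide explains how to answer enclose in O(1) time; the linear-scan reference below only pins down what the answer is, which can be handy for testing a real implementation. It assumes 0-indexed positions and an open parenthesis at p; for a node v (the position of its open parenthesis), parent(v) = enclose(P, v). The function name follows the slide, but the code is an illustrative sketch, not the succinct structure.

```python
def enclose(P, p):
    """Open parenthesis of the closest pair strictly enclosing the pair that
    starts at p (0-indexed), or None if p is the outermost pair."""
    assert P[p] == '('
    pending = 0                    # ')' seen so far whose '(' is still further left
    for q in range(p - 1, -1, -1):
        if P[q] == ')':
            pending += 1
        elif pending:
            pending -= 1           # this '(' opens one of the pending pairs
        else:
            return q               # unmatched '(' to the left: the enclosing pair
    return None


P = "(()((()())())(()())())"
print(enclose(P, 10))   # 3
```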

Additional Basic Operations on BP
rank_p(P,i): number of occurrences of the pattern p in P[1..i]
select_p(P,i): position of the i-th occurrence of p in P
If the length of p is constant, rank and select take O(1) time
Example: P = (()((()())())(()())()), rank_()(P,10) = 3
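A sketch of rank_p and select_p for one fixed pattern, using the slide's 1-indexed positions. The precomputed Python lists stand in for the o(n)-bit directories that give O(1) time in a real structure; here rank is a table lookup and select indexes the list of occurrence starts. `PatternRankSelect` is a hypothetical name.

```python
class PatternRankSelect:
    """rank_p / select_p for a constant-length pattern over P (1-indexed)."""

    def __init__(self, P, pat):
        k = len(pat)
        # 1-indexed start positions of all occurrences of pat
        self.starts = [j + 1 for j in range(len(P) - k + 1) if P[j:j + k] == pat]
        ends = {s + k - 1 for s in self.starts}
        # prefix[i] = number of occurrences lying entirely inside P[1..i]
        self.prefix = [0] * (len(P) + 1)
        for i in range(1, len(P) + 1):
            self.prefix[i] = self.prefix[i - 1] + (1 if i in ends else 0)

    def rank(self, i):
        return self.prefix[i]

    def select(self, i):
        return self.starts[i - 1]


rs = PatternRankSelect("(()((()())())(()())())", "()")
print(rs.rank(10), rs.select(3))   # 3 8
```

With the slide's example, rank(10) = 3 because the occurrences of () starting at positions 2, 6 and 8 end inside P[1..10].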

Operations on Leaves [5]
Each leaf is represented by () in BP
Position of the i-th leaf = select_()(P, i)
The number of leaves in a subtree and the leftmost/rightmost leaf in a subtree are also found
Figure: P = (()((()())())(()())()), highlighting the subtree rooted at node 3
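A naive sketch of the leaf operations in 1-indexed positions, where linear scans stand in for O(1) rank_(), select_() and findclose. The leaves of the subtree of node v (whose open parenthesis is at position v) are exactly the occurrences of () inside P[v..findclose(v)], so their count is rank_()(P, findclose(P,v)) − rank_()(P, v − 1), and the leftmost and rightmost ones follow from select_(). All function names are hypothetical.

```python
def findclose(P, v):
    """1-indexed position of the ')' matching the '(' at position v (naive scan)."""
    depth = 0
    for i in range(v - 1, len(P)):
        depth += 1 if P[i] == '(' else -1
        if depth == 0:
            return i + 1


def rank_leaf(P, i):
    """Number of occurrences of '()' lying entirely inside P[1..i]."""
    return sum(1 for j in range(1, i) if P[j - 1:j + 1] == "()")


def select_leaf(P, k):
    """1-indexed position of the k-th occurrence of '()' (the k-th leaf)."""
    starts = [j for j in range(1, len(P)) if P[j - 1:j + 1] == "()"]
    return starts[k - 1]


def subtree_leaves(P, v):
    """(number of leaves, leftmost leaf, rightmost leaf) in the subtree of v."""
    before = rank_leaf(P, v - 1)                      # leaves ending before v
    inside = rank_leaf(P, findclose(P, v)) - before
    return inside, select_leaf(P, before + 1), select_leaf(P, before + inside)


P = "(()((()())())(()())())"
print(subtree_leaves(P, 4))   # (3, 6, 11)
```

For node 3 of the example (its ( is at position 4), the subtree contains three leaves, the leftmost at position 6 and the rightmost at position 11.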

Node Depths
Define the excess array E[i] = rank_((P,i) − rank_)(P,i)
depth(v) = E[v]
E is not stored explicitly; it can be computed with the rank index on P
P = (()((()())())(()())())
E = 1212343432321232321210
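A short check of the example. For illustration, E is computed here explicitly as a prefix sum of +1/−1, whereas a succinct structure would evaluate E[i] from the rank index on P. Nodes are identified with the 1-indexed positions of their open parentheses, listed in preorder.

```python
from itertools import accumulate

P = "(()((()())())(()())())"
# E[i] = rank_((P, i) - rank_)(P, i) for i = 1..len(P); E[0] = 0 is padding.
E = [0] + list(accumulate(1 if c == '(' else -1 for c in P))
print(''.join(map(str, E[1:])))      # 1212343432321232321210

node_pos = [i for i, c in enumerate(P, start=1) if c == '(']   # preorder
print([E[v] for v in node_pos])      # [1, 2, 2, 3, 4, 4, 3, 2, 3, 3, 2]
```

The root has depth 1 under this definition, and the final excess E[2n] is 0 because P is balanced.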

Lowest Common Ancestor (lca)
u = lca(v,w): the common ancestor of v and w that is furthest from the root
Found in O(1) time

u = lca(v,w) = parent(RMQ_E(v,w) + 1), for v < w
E is the excess array, which represents node depths
m = RMQ_E(v,w): the index of a minimum value in E[v..w] (RMQ = Range Minimum Query)
P = (()((()())())(()())())
E = 1212343432321232321210
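A naive sketch of the formula with 1-indexed positions. `rmq` is a linear scan returning the leftmost minimum (standing in for an O(1) RMQ structure), and `parent` scans left for the last ( whose depth is one less than that of v. Nodes are given by the positions of their open parentheses, and v < w is assumed; the helper names are hypothetical.

```python
from itertools import accumulate

P = "(()((()())())(()())())"
E = [0] + list(accumulate(1 if c == '(' else -1 for c in P))   # 1-indexed excess


def parent(v):
    """Position of the parent of node v: last '(' before v with depth E[v] - 1."""
    for q in range(v - 1, 0, -1):
        if P[q - 1] == '(' and E[q] == E[v] - 1:
            return q
    return None


def rmq(i, j):
    """Index of the leftmost minimum of E[i..j] (naive linear scan)."""
    return min(range(i, j + 1), key=lambda k: E[k])


def lca(v, w):
    """Lowest common ancestor of nodes v < w."""
    return parent(rmq(v, w) + 1)


print(lca(6, 11), lca(2, 15), lca(4, 6))   # 4 1 4
```

In the example, nodes 5 and 7 (positions 6 and 11) meet at node 3 (position 4), nodes 2 and 9 meet at the root, and lca(4, 6) returns 4 because node 3 is an ancestor of node 5.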

DFUDS Representation [6]
Encodes the degrees of the nodes in unary, visiting the nodes in depth-first order (DFUDS = Depth First Unary Degree Sequence)
Degree d ⇒ d ('s followed by a )
Add a dummy ( at the beginning
2n bits
Example (root 1 with children 2 and 6; node 2 has children 3, 4, 5; node 6 has children 7, 8): U = ((()((())))(()))
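A minimal sketch of building DFUDS with an explicit stack, assuming the tree is given as a dict mapping each node to its ordered list of children (0-based labels chosen for illustration).

```python
def dfuds(children, root=0):
    """DFUDS: a dummy '(' then, for every node in preorder,
    d '('s followed by one ')', where d is the node's degree."""
    out = ['(']                              # leading dummy parenthesis
    stack = [root]
    while stack:
        v = stack.pop()
        kids = children[v]
        out.append('(' * len(kids) + ')')    # degree in unary
        stack.extend(reversed(kids))         # leftmost child is popped next
    return ''.join(out)


# 0-based version of the example: 0 -> (1, 5), 1 -> (2, 3, 4), 5 -> (6, 7)
example = {0: [1, 5], 1: [2, 3, 4], 2: [], 3: [], 4: [], 5: [6, 7], 6: [], 7: []}
print(dfuds(example))   # ((()((())))(()))
```

This reproduces the sequence U of the 8-node example.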

Lemma: The DFUDS of an n-node ordered tree is a balanced parentheses sequence of length 2n.
Proof (by induction on n): For n = 1, the root has no children (degree 0), so its DFUDS is ().
Assume that the lemma holds for every tree with at most n − 1 nodes. Let U1, U2, ..., Up denote the DFUDS of p trees whose numbers of nodes sum to n − 1, so the total length of their DFUDS's is 2n − 2.
Consider the tree whose root has these p trees as its children. Its DFUDS U consists of the head dummy parenthesis (, then p ('s and a ) for the root of degree p, then each Ui with its head dummy parenthesis removed.

By the induction hypothesis each Ui is balanced, so once its head open parenthesis is removed it lacks exactly one open parenthesis to be balanced. The head dummy open parenthesis of U and the root's p ('s and one ) together leave exactly p open parentheses unmatched, one for each Ui; since they all precede U1, ..., Up, no prefix of U contains more )'s than ('s. Therefore U is balanced. The tree has n nodes, and the length of U is 1 + (p + 1) + (2n − 2 − p) = 2n. This proves the lemma.
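The inductive step can be replayed directly. In this sketch, `compose` (a hypothetical helper) builds the DFUDS of a tree from its subtrees' DFUDS exactly as in the proof, and `is_balanced` checks the claim; building the 8-node example bottom-up reproduces its sequence U.

```python
def compose(child_dfuds):
    """DFUDS of a root whose subtrees have the given DFUDS strings:
    dummy '(' + the root's p '('s and ')' + each U_i without its head '('."""
    p = len(child_dfuds)
    return '(' + '(' * p + ')' + ''.join(u[1:] for u in child_dfuds)


def is_balanced(s):
    """True if no prefix has more ')' than '(' and the totals are equal."""
    depth = 0
    for c in s:
        depth += 1 if c == '(' else -1
        if depth < 0:
            return False
    return depth == 0


leaf = compose([])                   # "()": the base case n = 1
u2 = compose([leaf, leaf, leaf])     # node with three leaf children
u6 = compose([leaf, leaf])           # node with two leaf children
U = compose([u2, u6])                # the 8-node example tree
print(U, is_balanced(U), len(U))     # ((()((())))(())) True 16
```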