CPM 2004, 5th July 2004. A Simple Optimal Representation for Balanced Parentheses. Richard Geary, Naila Rahman, Rajeev Raman (University of Leicester, UK)


A Simple Optimal Representation for Balanced Parentheses
Richard Geary, Naila Rahman, Rajeev Raman (University of Leicester, UK) and Venkatesh Raman (Institute of Mathematical Sciences, Chennai, India)

A Parentheses Data Structure
Given: balanced string of 2n parentheses.
( ( ( ( ) ) ) ( ) ( ) )
Support operations:
– ENCLOSE(i)
– FINDCLOSE(i), FINDOPEN(i)
– EXCESS(i)
Applications to suffix trees, ordinal trees and stack-sortable permutations.
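
As a point of reference for the four operations, here is a minimal sketch of the naive 2n-bit, O(n)-time approach: each query simply scans the string. Positions are 1-based to match the slides' examples; the function names and the `None` result for the outermost pair are assumptions of this sketch, not taken from the talk.

```python
def excess(s, i):
    """Number of '(' minus number of ')' among the first i symbols (1-based)."""
    prefix = s[:i]
    return prefix.count("(") - prefix.count(")")


def findclose(s, i):
    """Position of the ')' matching the '(' at 1-based position i."""
    if s[i - 1] != "(":
        raise ValueError("position i must hold an opening parenthesis")
    depth = 0
    for j in range(i, len(s) + 1):
        depth += 1 if s[j - 1] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced string")


def findopen(s, i):
    """Position of the '(' matching the ')' at 1-based position i."""
    if s[i - 1] != ")":
        raise ValueError("position i must hold a closing parenthesis")
    depth = 0
    for j in range(i, 0, -1):
        depth += 1 if s[j - 1] == ")" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced string")


def enclose(s, i):
    """Opening position of the pair most tightly enclosing the pair opened at i.

    Returns None for an outermost pair (a convention chosen for this sketch).
    """
    if s[i - 1] != "(":
        raise ValueError("position i must hold an opening parenthesis")
    depth = 0
    for j in range(i - 1, 0, -1):
        if s[j - 1] == ")":
            depth += 1          # entering a pair that closes before i
        elif depth == 0:
            return j            # first unmatched '(' to the left encloses i
        else:
            depth -= 1
    return None


example = "(((()))()())"        # the slide's string with spaces removed
print(findclose(example, 2), findopen(example, 7), enclose(example, 8))
```

Every query here costs time linear in the string length, which is the baseline the succinct structure improves to O(1).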

Parentheses Representation
2n bits, O(n) time.
Θ(n lg n) bits, O(1) time.
O(n) bits, O(1) time. [Jacobson, '89]
2n + o(n) bits, O(1) time. [Munro, Raman, '01]
2n + o(n) bits, O(1) time. New data structure.
Our new DS
– is simpler (no perfect hash tables),
– has a smaller o(n) term,
– has a uniform o(n) time and space construction algorithm.
Implemented and shown to be quite practical:
– far more compact than D/S using naïve representation,
– speed comparable to D/S using naïve representation.

XML
XML: eXtensible Markup Language
– de facto standard for electronic data interchange.
Document Object Model (DOM): standard API for manipulating XML documents
– holds all data in memory,
– large memory usage.

Example XML document
Text content: Bill Bloggs, 1 April 1961.
Element tree: person has children name and dob; name has children firstname and surname; dob has children day, month and year.
DOM NODE interface has methods PARENT(x), NEXTSIB(x), PREVSIB(x), FIRSTCHILD(x), LASTCHILD(x).

Obvious representation
2n pointers
– DOM: 3n.
Ω(n log n) bits.

Using parentheses
Example document: Bill Bloggs, 1 April 1961.
Parentheses representation: ( ( ( ) ( ) ) ( ( ) ( ) ( ) ) )
2n + o(n) bits for tree structure.
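
One way to see where the parentheses string comes from is a depth-first walk that writes "(" on entering a node and ")" on leaving it. The sketch below assumes the example tree has the shape person(name(firstname, surname), dob(day, month, year)), which is inferred from the element names and the parentheses string on the slides; the tuple encoding and the name `to_parens` are illustrative only.

```python
def to_parens(node):
    """Encode a (label, children) tree as balanced parentheses via DFS."""
    _label, children = node
    return "(" + "".join(to_parens(child) for child in children) + ")"


# Assumed shape of the example document's element tree.
person = ("person", [
    ("name", [("firstname", []), ("surname", [])]),
    ("dob", [("day", []), ("month", []), ("year", [])]),
])

print(to_parens(person))  # ((()())(()()()))
```

Each of the 8 nodes contributes exactly one pair, so the string has 16 symbols, i.e. 2 bits per node before the o(n) overhead.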

Node interface ops using Parentheses DS
Node interface    Parentheses DS
PARENT            ENCLOSE
NEXTSIB           FINDCLOSE
PREVSIB           FINDOPEN
LASTCHILD         FINDCLOSE, FINDOPEN
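
The mapping in the table can be sketched as follows, identifying each node with the 1-based position of its opening parenthesis. This is a hedged illustration: the matches are precomputed with a plain stack rather than a succinct structure, the class and method names are hypothetical, and FIRSTCHILD as "the next symbol, if it is an opening parenthesis" is an assumption (the slide's table does not list it).

```python
class ParenTree:
    """Tree navigation on a balanced parentheses string (1-based positions)."""

    def __init__(self, s):
        self.s = s
        self.match = [0] * (len(s) + 1)       # match[i] = partner of position i
        self.encl = [None] * (len(s) + 1)     # encl[i] = enclosing '(' of '(' at i
        stack = []
        for i, ch in enumerate(s, start=1):
            if ch == "(":
                self.encl[i] = stack[-1] if stack else None
                stack.append(i)
            else:
                j = stack.pop()
                self.match[i], self.match[j] = j, i

    def _is(self, i, ch):
        return i in range(1, len(self.s) + 1) and self.s[i - 1] == ch

    def parent(self, x):                      # ENCLOSE
        return self.encl[x]

    def first_child(self, x):                 # assumption: symbol right after x
        return x + 1 if self._is(x + 1, "(") else None

    def next_sib(self, x):                    # FINDCLOSE, then step right
        nxt = self.match[x] + 1
        return nxt if self._is(nxt, "(") else None

    def prev_sib(self, x):                    # step left, then FINDOPEN
        return self.match[x - 1] if self._is(x - 1, ")") else None

    def last_child(self, x):                  # FINDCLOSE, step left, FINDOPEN
        c = self.match[x] - 1
        return self.match[c] if self._is(c, ")") else None


tree = ParenTree("((()())(()()()))")          # the example tree from the slides
print(tree.next_sib(2), tree.last_child(1), tree.parent(3))
```

In the 16-symbol example, position 2 opens the first child of the root and position 8 opens the second, so `next_sib(2)` and `last_child(1)` both land on 8.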

Succinct DOM
Succinct DOM:
– uses far less space than standard DOM,
– performance competitive with DOM.
Node interface implemented by natural parentheses ops.
Operations supported by parentheses data structures:
– Jacobson '89,
– Munro and Raman '01,
– our new data structure.

Our new D/S
Input: balanced string of 2n parentheses.
Assume recursive data structure to store balanced string of 2N ≤ 2n parentheses.
If N is O(n / lg² n), store answers explicitly for every pair of parentheses.
Otherwise:
– divide into blocks of size B,
– number of blocks is β.

FINDCLOSE(x)
( ( ( ) ( ( ( ) ) ) ( ) ( ( ) ) ( ) ) )
FINDCLOSE(3)? Matching parenthesis is inside the block: near parenthesis.
Pre-computed table stores position of matching parenthesis for all near parentheses.
– O(1) time if near parenthesis.
– Table size: [formula missing in transcript].
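
A pre-computed table for near parentheses could look like the sketch below: for every possible block content it records, per offset, where the match lies inside the block, or `None` when the match is outside. The block size of 4, the dictionary keyed by the block's text, and the function names are all assumptions made for illustration; the talk's actual table layout is not preserved in the transcript. The sketch also assumes the string length is a multiple of the block size.

```python
from itertools import product


def build_near_table(block_size):
    """Map each possible block content to its in-block matches per offset."""
    table = {}
    for pattern in product("()", repeat=block_size):
        match = [None] * block_size           # None means "match is outside"
        stack = []
        for k, ch in enumerate(pattern):
            if ch == "(":
                stack.append(k)
            elif stack:
                j = stack.pop()
                match[j], match[k] = k, j
        table["".join(pattern)] = match
    return table


def findclose_near(s, i, block_size, table):
    """1-based FINDCLOSE using only the table; None if position i is far."""
    start = ((i - 1) // block_size) * block_size   # 0-based start of the block
    block = s[start:start + block_size]
    offset = table[block][i - 1 - start]
    return None if offset is None else start + offset + 1


s = "((()((()))()(())()))"                     # the slide's 20-symbol string
table = build_near_table(4)
print(findclose_near(s, 3, 4, table))          # 4: match lies in the same block
print(findclose_near(s, 5, 4, table))          # None: far parenthesis
```

With blocks of 4, FINDCLOSE(3) is answered from the table alone, while FINDCLOSE(5) falls through to the far-parenthesis machinery.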

Pioneer Parentheses
( ( ( ) ( ( ( ) ) ) ( ) ( ( ) ) ( ) ) )
FINDCLOSE(5)? Matching parenthesis is outside the block: far parenthesis.
b(p) = block# of parenthesis at position p.
μ(p) = position of match of p.
q is the 1st far parenthesis before p.
p is a pioneer if b(μ(p)) ≠ b(μ(q)).
At most 2β-3 open pioneers. Similarly at most 2β-3 close pioneers.

Pioneer Family
Pioneer family: set of all opening and closing pioneers along with their matching parentheses.
Balanced string of size at most 4β-6.
Parentheses string: ( ( ( ) ( ( ( ) ) ) ( ) ( ( ) ) ( ) ) )
Pioneer family: ( ( ) )
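
The pioneer idea can be checked on the slide's string with a small sketch. It uses the usual definition, stated here as an assumption: a far opening parenthesis is a pioneer when its match lies in a different block from the match of the previous far opening parenthesis (the first far one is always a pioneer). For brevity only opening pioneers and their matches are collected; closing pioneers would be handled symmetrically. The block size of 4 and all names are hypothetical.

```python
def matches(s):
    """match[i] = 1-based position of the partner of position i."""
    match = [0] * (len(s) + 1)
    stack = []
    for i, ch in enumerate(s, start=1):
        if ch == "(":
            stack.append(i)
        else:
            j = stack.pop()
            match[i], match[j] = j, i
    return match


def opening_pioneers(s, block_size):
    """Far '(' whose match lands in a new block compared with the last far '('."""
    match = matches(s)

    def block(p):
        return (p - 1) // block_size

    pioneers, prev_far = [], None
    for p in range(1, len(s) + 1):
        if s[p - 1] != "(" or block(match[p]) == block(p):
            continue                           # skip ')' and near parentheses
        if prev_far is None or block(match[p]) != block(match[prev_far]):
            pioneers.append(p)
        prev_far = p
    return pioneers


def pioneer_family(s, block_size):
    """Positions of opening pioneers and their matches, plus their substring."""
    match = matches(s)
    positions = set()
    for p in opening_pioneers(s, block_size):
        positions.update((p, match[p]))
    ordered = sorted(positions)
    return ordered, "".join(s[p - 1] for p in ordered)


s = "((()((()))()(())()))"
print(opening_pioneers(s, 4))                  # [1, 5]
print(pioneer_family(s, 4))                    # ([1, 5, 10, 20], '(())')
```

Under these assumptions the family is the 4-symbol balanced string shown on the slide, well within the 4β-6 bound for β = 5 blocks.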

Our D/S
Parentheses string (length 2N): ( ( ( ) ( ( ( ) ) ) ( ) ( ( ) ) ( ) ) )
NND (bit vector marking pioneers).
Pioneer family (length O(N / lg N)): ( ( ) )
Two levels of recursion.
When the pioneer family has size O(N / lg² N) we store explicit answers.

Space usage
NND uses O(N lg lg N / lg N) bits.
Tables use O(N lg lg N / lg N) bits.
S(n) = 2n + O(n lg lg n / lg n) = 2n + o(n) bits.

Pseudo-pioneers
Near blocks: blocks which have no pioneers.
Insert pseudo-pioneers at start and end of every near block.
– Pseudo-pioneers do not affect FINDOPEN(x), FINDCLOSE(x), ENCLOSE(x).
Gap between pioneers now at most 2B = O(lg N).

NND
( ( ( ) ( ( ( ) ) ) ( ) ( ( ) ) ( ) ) )
2n-bit vector used to find the pioneer for a far parenthesis.
If pioneer at pos i in parentheses string then 1 at i in NND.
Operations we need:
– Find address of most recent 1 at position i:
    r = Rank(i)
    p = Select(r)
– Find i-th 1 in bit vector:
    p = Select(i)
We want succinct representation. D/S should be simple and fast.
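
The "most recent 1" query composes the two primitives exactly as the slide writes it: Rank, then Select. The sketch below uses plain linear scans purely to show the semantics; the bit vector with 1s at positions 1, 5, 10 and 20 is a hypothetical example, and the function names are not from the talk.

```python
def rank(bits, i):
    """Number of 1s among the first i bits (1-based, inclusive)."""
    return sum(bits[:i])


def select(bits, r):
    """1-based position of the r-th 1, or None if there are fewer than r."""
    seen = 0
    for pos, bit in enumerate(bits, start=1):
        seen += bit
        if bit and seen == r:
            return pos
    return None


def most_recent_one(bits, i):
    """Position of the last 1 at or before position i: Select(Rank(i))."""
    r = rank(bits, i)
    return select(bits, r) if r else None


# Hypothetical marker vector of length 20 with 1s at positions 1, 5, 10, 20.
bits = [1 if pos in (1, 5, 10, 20) else 0 for pos in range(1, 21)]
print(most_recent_one(bits, 6))                # 5
```

A succinct implementation would answer both primitives in O(1) time; only the composition is the point here.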

NND
Bit vector of length M with N 1s.
Gap between 1s at most (lg M)^c.
t = lg M / (2c lg lg M).

Select(i)
Find i-th 1 in bit vector.
Array A1 stores position of every t-th 1
– Space is [formula missing in transcript].
Array A2 stores gaps between consecutive 1s
– Space is O(N lg lg M) or O(M lg lg M / lg M) bits.
Table T1 allows us to look up sum of up to t gaps.
– Space is [formula missing in transcript].
SELECT(i)
  i' = [formula missing in transcript]
  i'' = (i+1) mod t
  y = concat of A2[i'+1], .., A2[i'+i'']
  return A1[x] + T1[y]
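
The sampled-positions-plus-gaps scheme can be sketched as below. The indexing convention here is an assumption of this sketch and may differ from the slide's (whose exact formulas are not preserved): `a1[k]` holds the position of the (k·t + 1)-th 1, and `a2` holds the gaps between consecutive 1s. The table T1 is replaced by directly summing at most t-1 gaps, which gives the same value without the O(1) lookup.

```python
class SampledSelect:
    """Select via every t-th position (A1) plus gaps between 1s (A2)."""

    def __init__(self, bits, t):
        ones = [pos for pos, bit in enumerate(bits, start=1) if bit]
        self.t = t
        self.count = len(ones)
        self.a1 = ones[::t]                               # 1st, (t+1)-th, ... 1s
        self.a2 = [b - a for a, b in zip(ones, ones[1:])]  # consecutive gaps

    def select(self, i):
        """1-based position of the i-th 1."""
        if i not in range(1, self.count + 1):
            raise IndexError("no such 1 in the bit vector")
        k, r = divmod(i - 1, self.t)           # sample index, gaps still to add
        first_gap = k * self.t                 # gap right after the sampled 1
        # Stand-in for the T1 lookup: add the next r gaps explicitly.
        return self.a1[k] + sum(self.a2[first_gap:first_gap + r])


# Hypothetical bit vector with 1s at positions 2, 3, 7, 8, 12, 15, 16.
ones = [2, 3, 7, 8, 12, 15, 16]
bits = [1 if pos in ones else 0 for pos in range(1, 17)]
sel = SampledSelect(bits, t=3)
print([sel.select(i) for i in range(1, 8)])
```

Because each gap is small (at most (lg M)^c on the slides), t consecutive gaps fit in about half a machine word, which is what makes a table lookup for their sum feasible.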

Rank(i)
Prefix sum at position i.
Need two more arrays and tables of size at most O(M lg lg M / lg M) bits.
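
The slide does not spell out the Rank arrays, so the following is only the generic textbook idea of rank as a sampled prefix sum, not the authors' structure: store the running count of 1s at every block boundary and count the remainder inside one block. The class name and block size are hypothetical.

```python
class SampledRank:
    """Rank via per-block prefix counts plus a short in-block scan."""

    def __init__(self, bits, block_size):
        self.bits = bits
        self.block_size = block_size
        self.prefix = [0]                      # prefix[k] = 1s in first k blocks
        for start in range(0, len(bits), block_size):
            block_ones = sum(bits[start:start + block_size])
            self.prefix.append(self.prefix[-1] + block_ones)

    def rank(self, i):
        """Number of 1s among the first i bits (1-based, inclusive)."""
        k, r = divmod(i, self.block_size)      # full blocks, leftover bits
        start = k * self.block_size
        return self.prefix[k] + sum(self.bits[start:start + r])


# Hypothetical marker vector of length 20 with 1s at positions 1, 5, 10, 20.
bits = [1 if pos in (1, 5, 10, 20) else 0 for pos in range(1, 21)]
ranker = SampledRank(bits, block_size=4)
print(ranker.rank(6), ranker.rank(20))         # 2 4
```

In a succinct setting the in-block scan would itself be a table lookup, so the whole query takes O(1) time.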

Implementation Details
C++ on Sun UltraSparc-III and Pentium 4.
Implemented new and optimised Jacobson D/S.
CenterPoint XML for DOM.
Sample of 12 XML documents of varying sizes and node counts.
Block sizes 32, 64, 128 and 256.
Test was depth-first tree walk, counting nodes of a given XML type.
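
The benchmark task, a depth-first walk that counts nodes of one type, can be sketched over a parentheses string as follows. This is an illustration in Python rather than the authors' C++ code; it assumes tags are stored in a separate list in preorder, finds children by hopping over matched pairs, and uses a stack-built match array in place of a succinct FINDCLOSE. The sample document and all names are hypothetical.

```python
def build_match(s):
    """match[i] = 1-based position of the partner of position i."""
    match = [0] * (len(s) + 1)
    stack = []
    for i, ch in enumerate(s, start=1):
        if ch == "(":
            stack.append(i)
        else:
            j = stack.pop()
            match[i], match[j] = j, i
    return match


def count_tag(s, labels, tag):
    """Depth-first walk counting nodes whose preorder label equals tag."""
    match = build_match(s)
    count = 0
    stack = [1]                                # position of the root's '('
    while stack:
        x = stack.pop()
        preorder = s[:x].count("(") - 1        # a Rank query in a real structure
        if labels[preorder] == tag:
            count += 1
        children = []
        c = x + 1                              # first child, if s[c] is '('
        while s[c - 1] == "(":
            children.append(c)
            c = match[c] + 1                   # hop to the next sibling
        stack.extend(reversed(children))       # visit children left to right
    return count


# Hypothetical document: people(person(name), person(name)).
s = "((())(()))"
labels = ["people", "person", "name", "person", "name"]
print(count_tag(s, labels, "person"))          # 2
```

The walk touches each node once, so its cost is dominated by the per-node FINDCLOSE and Rank queries, which is what the timing comparison against DOM measures.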

Space usage and performance
Space usage for tree structure:
– Std DOM: 96 bits per node.
– Jacobson: 3.3 to 16 bits per node.
– New D/S: 2.9 to 12.8 bits per node.
Avg performance for succinct D/S relative to std DOM:
– UltraSparc: 1 to 2.5 times slower.
– Pentium 4: 1.7 to 4 times slower.

Conclusions and Future work
Conceptually simple succinct representation for balanced parentheses with O(1) time ops.
o(n) time and space construction algorithm.
Improved lower-order term for space bound.
Relative performance very good on UltraSparc but poorer on Pentium 4, which has a small cache
– Cache optimisation is an interesting problem.
Complete set of D/S for succinct DOM.