Mike 66 Sept 20081 Succinct Data Structures: Techniques and Lower Bounds Ian Munro University of Waterloo Joint work with/ work of Arash Farzan, Alex Golynski,

Slides:



Advertisements
Similar presentations
I/O and Space-Efficient Path Traversal in Planar Graphs Craig Dillabaugh, Carleton University Meng He, University of Waterloo Anil Maheshwari, Carleton.
Advertisements

Succinct Representations of Dynamic Strings Meng He and J. Ian Munro University of Waterloo.
Succinct Data Structures for Permutations, Functions and Suffix Arrays
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Chapter 4: Trees Part II - AVL Tree
5th July 2004CPM A Simple Optimal Representation for Balanced Parentheses Richard Geary, Naila Rahman, Rajeev Raman (University of Leicester, UK)
Succinct Representation of Balanced Parentheses, Static Trees and Planar Graphs J. Ian Munro & Venkatesh Raman.
An Improved Succinct Dynamic k-Ary Tree Representation (work in progress) Diego Arroyuelo Department of Computer Science, Universidad de Chile.
MS 101: Algorithms Instructor Neelima Gupta
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
SODA Jan 11, 2004Partial Sums1 Tight Bounds for the Partial-Sums Problem Mihai PǎtraşcuErik Demaine (presenting) MIT CSAIL.
Suffix Sorting & Related Algoritmics Martin Farach-Colton Rutgers University USA.
1 Suffix Trees and Suffix Arrays Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto Addison-Wesley, (Chapter 8)
1 Prof. Dr. Th. Ottmann Theory I Algorithm Design and Analysis (12 - Text search: suffix trees)
Succinct Data Structures Ian Munro University of Waterloo Joint work with David Benoit, Andrej Brodnik, D, Clark, F. Fich, M. He, J. Horton, A. López-Ortiz,
Tries Standard Tries Compressed Tries Suffix Tries.
Succinct Representations of Trees S. Srinivasa Rao Seoul National University.
Advanced Algorithm Design and Analysis (Lecture 4) SW5 fall 2004 Simonas Šaltenis E1-215b
Modern Information Retrieval Chapter 8 Indexing and Searching.
Modern Information Retrieval
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
BTrees & Bitmap Indexes
Lists A list is a finite, ordered sequence of data items. Two Implementations –Arrays –Linked Lists.
Lecture 5: Master Theorem and Linear Time Sorting
Data Structures – LECTURE 10 Huffman coding
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
CS 206 Introduction to Computer Science II 12 / 10 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Compact Representations of Separable Graphs From a paper of the same title submitted to SODA by: Dan Blandford and Guy Blelloch and Ian Kash.
CSE 373 Data Structures Lecture 15
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
Binary Trees A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees.
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
Succinct Representations of Trees
Space Efficient Data Structures for Dynamic Orthogonal Range Counting Meng He and J. Ian Munro University of Waterloo.
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
Introduction n – length of text, m – length of search pattern string Generally suffix tree construction takes O(n) time, O(n) space and searching takes.
Summer School '131 Succinct Data Structures Ian Munro.
Succinct Orthogonal Range Search Structures on a Grid with Applications to Text Indexing Prosenjit Bose, Carleton University Meng He, Unversity of Waterloo.
Integer Representations and Counting in the Bit Probe Model M. Zaiur Rahman and J. Ian Munro Cheriton School of Computer Science University of Waterloo.
Succinct Data Structures Ian Munro University of Waterloo Joint work with David Benoit, Andrej Brodnik, D, Clark, F. Fich, M. He, J. Horton, A. López-Ortiz,
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
An experimental study of priority queues By Claus Jensen University of Copenhagen.
Succinct Dynamic Cardinal Trees with Constant Time Operations for Small Alphabet Pooya Davoodi Aarhus University May 24, 2011 S. Srinivasa Rao Seoul National.
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Succinct Ordinal Trees Based on Tree Covering Meng He, J. Ian Munro, University of Waterloo S. Srinivasa Rao, IT University of Copenhagen.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Exam 3 Review Data structures covered: –Hashing and Extensible hashing –Priority queues and binary heaps –Skip lists –B-Tree –Disjoint sets For each of.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Discrete Methods in Mathematical Informatics Kunihiko Sadakane The University of Tokyo
Tries 4/16/2018 8:59 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Tries 07/28/16 11:04 Text Compression
Succinct Data Structures
Tries 5/27/2018 3:08 AM Tries Tries.
Succinct Data Structures
Decision trees Polynomial-Time
Succinct Data Structures: Upper, Lower & Middle Bounds
Paolo Ferragina Dipartimento di Informatica, Università di Pisa
Tries 9/14/ :13 AM Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia, and.
Paolo Ferragina Dipartimento di Informatica, Università di Pisa
Paolo Ferragina Dipartimento di Informatica, Università di Pisa
Succinct Data Structures
Switching Lemmas and Proof Complexity
Presentation transcript:

Mike 66 Sept Succinct Data Structures: Techniques and Lower Bounds Ian Munro University of Waterloo Joint work with/ work of Arash Farzan, Alex Golynski, Meng He How do we encode a combinatorial object (e.g. a tree or a permutation) or a text file … even a static one in a small amount of space & still perform queries in constant time ???

Mike 66 Sept An Early Focus on Trees A Big Patricia Trie / Suffix Trie  Given a large text file; treat it as bit vector  Construct a trie with leaves pointing to unique locations in text that “match” path in trie (paths must start at character boundaries)  Skip the nodes where no branching (n-1 internal nodes)

Mike 66 Sept Abstract data type: binary tree Size: n-1 internal nodes, n leaves Operations: child, parent, subtree size, leaf data Motivation: “Obvious” representation of an n node tree takes about 6 n lg n words (up, left, right, size, memory manager, leaf reference) i.e. full suffix tree takes about 5 or 6 times the space of suffix array (i.e. leaf references only) Space for Trees

Mike 66 Sept Start with Jacobson, then others: There are about 4 n /(πn) 3/2 ordered rooted trees, and same number of binary trees Lower bound on specifying is about 2n bits What are the natural representations? Succinct Representations of Trees

Mike 66 Sept  Use parenthesis notation  Represent the tree  As the binary string (((())())((())()())): traverse tree as “(“ for node, then subtrees, then “)”  Each node takes 2 bits Arbitrary Ordered Trees

Mike 66 Sept Add external nodes Enumerate level by level Store vector length 2n+1 (Here don’t know size of subtrees; can be overcome. Could use isomorphism to flip between notations) Heap-like Notation for a Binary Tree

Mike 66 Sept Recent work (i.e. 2008)  Other classes of trees … e.g. binary but unordered … n bits (optimal + o(n))  Arbitrary graphs, n nodes, m edges (optimal + o(n))  Fast updates to trees Farzan & M Representing other Trees and Graphs in General

Mike 66 Sept How do we Navigate? Jacobson’s key suggestion: Operations on a bit vector rank(x) = # 1’s up to & including x select(x) = position of x th 1 So in the binary tree leftchild(x) = 2 rank(x) rightchild(x) = 2 rank(x) + 1 parent(x) = select(x/2)

Mike 66 Sept Rank & Select Rank: Auxiliary storage ~ 2nlglg n / lg n bits #1’s up to each (lg n) 2 rd bit #1’s within these too each lg n th bit Table lookup after that Select: More complicated (especially to get this lower order term) but similar notions Key issue: Rank & Select take O(1) time with lg n bit word (M. et al)

Mike 66 Sept Lower Bound: for Rank & for Select Theorem (Golynski): Given a bit vector of length n and an “index” (extra data) of size r bits, let t be the number of bits probed to perform rank (or select) then:r=Ω(n (lg t)/t). Proof idea: Argue to reconstructing the entire string with too few rank queries (similarly for select) Corollary (Golynski): Under the lg n bit RAM model, an index of size (n lglg n/ lg n) is necessary and sufficient to perform the rank and the select operations.

Mike 66 Sept Let P be a simple array giving π; P[i] = π[i] Also have B[i] be a pointer t positions back in (the cycle of) the permutation; B[i]= π -t [i].. But only define B for every t th position in cycle. (t is a constant; ignore cycle length “round-off”) So array representation P = [ x x 3 x 2 x 10 1] Permutations: a Shortcut Notation

Mike 66 Sept In a cycle there is a B every t positions … But these positions can be in arbitrary order Which i’s have a B, and how do we store it? Keep a vector of all positions: 0 = no B1 = B Rank gives the position of B[“i”] in B array So: π(i) & π -1 (i) in O(1) time & (1+ε)n lg n bits Representing Shortcuts

Mike 66 Sept Extension (M & Rao): Iterated Evaluation: This can be used to perform π k (i) (k in [-(n-1),n-1], in constant time Or Arbitrary Functions [n] → [n] “A function is just a hairy permutation” Aside.. extending

Mike 66 Sept A Lower Bound Theorem (a bunch of us): Under a pointer machine model with space (1+ ε) n references, we need time 1/ε to answer π and π -1 queries; i.e. this is as good as it gets … in the pointer model. A Lower Bound

Mike 66 Sept This is the best we can do for O(1) operations But using Benes networks: 1-Benes network is a 2 input/2 output switch r+1-Benes network … join tops to tops #bits(n)=2#bits(n/2)+n=n lg n-n+1=min+(n) R-Benes Network Getting n lg n Bits

Mike 66 Sept Realizing the permutation (std π (i) notation) ( ) Note: (n) bits more than “necessary” A Benes Network

Mike 66 Sept Divide into blocks of lg lg n gates … encode their actions in a word. Taking advantage of regularity of address mechanism and also Modify approach to avoid power of 2 issue Can trace a path in time O(lg n/(lg lg n)) This is the best time we are able get for π and π -1 in nearly minimum space. What can we do with it?

Mike 66 Sept Observe: This method “violates” the pointer machine lower bound by using “micropointers”. But … More general Lower Bound (Golynski): Both methods are optimal for their respective extra space constraints Both are Best

Mike 66 Sept Operations: a reciprocal property, a bijection between operations (e.g. π,π -1 or A[i] and find(a,j); or rank and select Tree program: restricted on how much of the data input can be read. Manipulation: if operations can be performed “too quickly”, we can reconstruct all the “raw data” with too few cell probes. Approach of Golynski Proof(s)

Mike 66 Sept Interesting, and useful, combinatorial objects can be: Stored succinctly … lower bound +o() So that Natural queries are performed in O(1) time (or at least very close) And We are starting to understand the +o(); Perhaps we will understand situations when the extra term is not +o() Conclusion