Compressed Rank & Select on general strings. Paolo Ferragina, Dipartimento di Informatica, Università di Pisa.


Generalised Rank and Select
Rank(c,i) = number of occurrences of c in L[1,i]
Select(c,i) = position of the i-th occurrence of c in L
L = a b a a a c b c d a b e c d ...
Rank(a,7) = 4, Select(a,2) = 3
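The two definitions can be checked with a minimal linear-scan sketch. This is only a reference implementation of the definitions with 1-based positions, as on the slide; the function names are illustrative and nothing here is compressed or constant-time.

```python
def rank(L, c, i):
    """Number of occurrences of symbol c in L[1..i] (1-based, inclusive)."""
    return L[:i].count(c)


def select(L, c, i):
    """1-based position of the i-th occurrence of c in L, or None if absent."""
    seen = 0
    for pos, sym in enumerate(L, start=1):
        if sym == c:
            seen += 1
            if seen == i:
                return pos
    return None


# The visible prefix of the slide's example string (spaces removed).
L = "abaaacbcdabecd"
print(rank(L, "a", 7))    # 4
print(select(L, "a", 2))  # 3
```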

Generalised Rank and Select
- If Σ is small (i.e. of constant size): build a binary Rank data structure for each symbol of Σ. Rank takes O(1) time and small space.
- If Σ is large (words?): a smarter solution is needed, the Wavelet Tree data structure.
Algorithmic reduction: reduce Rank & Select over arbitrary strings to Rank & Select over binary strings.
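For the small-alphabet case, the idea of one binary Rank structure per symbol might be sketched as follows. As a simplifying assumption, each symbol's indicator bitmap is replaced by a plain prefix-count array, which answers Rank in O(1) but uses far more space than a succinct binary Rank structure would. The class name `PerSymbolRank` is hypothetical.

```python
from itertools import accumulate


class PerSymbolRank:
    """One prefix-count array per distinct symbol (stand-in for a binary Rank structure)."""

    def __init__(self, text):
        # prefix[c][i] = number of occurrences of c in text[1..i]
        self.prefix = {
            c: [0] + list(accumulate(1 if s == c else 0 for s in text))
            for c in set(text)
        }

    def rank(self, c, i):
        # Symbols that never occur have rank 0 everywhere.
        return self.prefix[c][i] if c in self.prefix else 0


index = PerSymbolRank("abaaacbcdabecd")
print(index.rank("a", 7))  # 4
```

The number of arrays grows with the alphabet size, which is why this approach stops being attractive once Σ is large.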

The Wavelet Tree
Example on abracadabra, with an (alphabetic?) tree over the symbols a, b, c, d, r:
- root: abracadabra, split into aacaaa (symbols a, c) and brdbr (symbols b, r, d)
- aacaaa: split into the leaves a (aaaaa ?) and c
- brdbr: split into brbr (symbols b, r) and the leaf d (d ?)
- brbr: split into the leaves b (bb ?) and r (rr ?)
Each internal node stores one bit per symbol of its string, telling whether that symbol goes left or right.
Fact. Given the tree and the binary strings, we can recover the original string!
In any case, O(|Σ| log |Σ|) bits. Easier: alphabetic order + heap structure.

The Wavelet Tree: computing Rank(b,8) on abracadabra
Every step can be turned into a binary Rank:
- At the root (abracadabra), b is a right symbol: reduce to the right symbols. Rank_1(8) = 3, so Rank(b,8) becomes Rank(b,3) on brdbr. Right move = Rank_1.
- At brdbr, b is a left symbol: reduce to the left symbols. Rank_0(3) = 3 - Rank_1(3) = 2, so Rank(b,3) becomes Rank(b,2) on brbr. Left move = Rank_0.
- At brbr (bits 0101), it's binary: Rank_0(2) = 2 - Rank_1(2) = 1. Left move = Rank_0.
Generalised R&S is implemented with log |Σ| binary R&S. Select is similar.
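The descent for Rank(b,8) can be traced with a small sketch. It assumes the tree shape of the abracadabra example (a, c on the left of the root; b, r, d on the right), passed in as nested pairs, and it replaces each node's binary Rank structure with a plain prefix-count array. The names `Node`, `symbols`, `rank` and `SHAPE` are illustrative, not taken from the slides.

```python
def symbols(shape):
    """Set of symbols under a shape: a single symbol or a pair (left, right)."""
    if isinstance(shape, tuple):
        return symbols(shape[0]) | symbols(shape[1])
    return {shape}


class Node:
    def __init__(self, text, shape):
        self.leaf = not isinstance(shape, tuple)
        if self.leaf:
            self.symbol = shape
            return
        self.left_syms = symbols(shape[0])
        # Bit 0 = symbol goes left, bit 1 = symbol goes right.
        self.bits = [0 if s in self.left_syms else 1 for s in text]
        # prefix[i] = number of 1s among the first i bits, i.e. Rank_1(i).
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        self.left = Node([s for s in text if s in self.left_syms], shape[0])
        self.right = Node([s for s in text if s not in self.left_syms], shape[1])


def rank(root, c, i):
    """Occurrences of c in the first i symbols, using one binary rank per level."""
    node = root
    while not node.leaf:
        if c in node.left_syms:
            i = i - node.prefix[i]   # left move: Rank_0(i) = i - Rank_1(i)
            node = node.left
        else:
            i = node.prefix[i]       # right move: Rank_1(i)
            node = node.right
    # A symbol outside the alphabet ends at some other leaf and has rank 0.
    return i if node.symbol == c else 0


SHAPE = (("a", "c"), (("b", "r"), "d"))   # assumed shape, read off the example
root = Node("abracadabra", SHAPE)
print(rank(root, "b", 8))  # 1
```

Only the bit vectors at the internal nodes are needed to answer the query; the strings themselves are used during construction and then discarded.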

Representing Trees. Paolo Ferragina, Dipartimento di Informatica, Università di Pisa.

Standard representation
Binary tree: each node has two pointers, to its left and right children.
- An n-node tree takes 2n pointers, or 2n lg n bits.
- Supports finding the left child or right child of a node in constant time.
- Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.

Can we improve the space bound?
- There are fewer than 2^{2n} distinct binary trees on n nodes.
- So 2n bits are enough to distinguish between any two different binary trees.
- Can we represent an n-node binary tree using 2n bits?
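The counting claim can be checked numerically. The sketch below relies on the standard fact that the number of distinct binary trees on n nodes is the Catalan number C_n = binom(2n, n) / (n + 1); the function name is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def count_binary_trees(n):
    """Catalan number C_n: the number of distinct binary trees on n nodes."""
    return comb(2 * n, n) // (n + 1)


for n in (1, 5, 10, 100):
    trees = count_binary_trees(n)
    # Bits needed to give every tree a distinct code: ceil(log2(trees)).
    bits_needed = (trees - 1).bit_length()
    assert trees < 2 ** (2 * n)
    print(n, bits_needed, 2 * n)
```

The printed pairs show how close the information-theoretic minimum is to 2n as n grows.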

Binary tree representation
A binary tree on n nodes can be represented using 2n+o(n) bits while supporting parent, left child and right child in constant time.

Heap-like notation for a binary tree
- Add external nodes.
- Label internal nodes with a 1 and external nodes with a 0.
- Write the labels in level order.
One can reconstruct the tree from this sequence. An n-node binary tree can be represented in 2n+1 bits. What about the operations?
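The encoding steps above can be sketched as follows, together with a decoder showing that the tree is recoverable from the bits. As an assumption made for illustration, a tree is written as `None` (empty, i.e. an external node) or a pair `(left, right)`; the function names are hypothetical.

```python
from collections import deque


def encode(tree):
    """Level-order labels: 1 for an internal node, 0 for an external node."""
    bits = []
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append(0)            # external node, no children to visit
        else:
            bits.append(1)            # internal node
            queue.append(node[0])
            queue.append(node[1])
    return bits


def decode(bits):
    """Rebuild the tree (as nested [left, right] lists) from its level-order bits."""
    it = iter(bits)
    if next(it) == 0:
        return None
    root = [None, None]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for side in (0, 1):           # the next two bits describe this node's children
            if next(it) == 1:
                child = [None, None]
                node[side] = child
                queue.append(child)
    return root


# A 4-node tree: the root has two children, and the right child has a right child.
tree = ((None, None), (None, (None, None)))
bits = encode(tree)
print(bits)                           # [1, 1, 1, 0, 0, 0, 1, 0, 0]
print(len(bits) == 2 * 4 + 1)         # True
```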

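Heap-like notation for a binary tree
The slide uses two color-coded numberings: red numbers count the internal nodes only (the 1s), while green numbers are positions in the whole level-order sequence.
- parent(x) = on red (⌊x/2⌋)
- left child(x) = on green (2x)
- right child(x) = on green (2x+1)
- green x → red x: number of 1s up to x (Rank)
- red x → green x: position of the x-th 1 (Select)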
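Put together, navigation works on positions in the level-order bit sequence: a node at position x whose rank among the 1s is r has its children at positions 2r and 2r+1, and its parent is the ⌊x/2⌋-th 1. A minimal sketch, assuming 1-based positions and using plain lookup tables in place of o(n)-bit Rank/Select structures; the class name `LevelOrderTree` is hypothetical.

```python
class LevelOrderTree:
    def __init__(self, bits):
        self.bits = [None] + list(bits)        # shift to 1-based positions
        self.rank1 = [0] * len(self.bits)      # rank1[x] = number of 1s in positions 1..x
        self.select1 = [None]                  # select1[k] = position of the k-th 1
        for x in range(1, len(self.bits)):
            self.rank1[x] = self.rank1[x - 1] + self.bits[x]
            if self.bits[x] == 1:
                self.select1.append(x)

    def is_internal(self, x):
        return self.bits[x] == 1

    def left_child(self, x):
        # x must be an internal node; the result may be an external node (bit 0).
        return 2 * self.rank1[x]

    def right_child(self, x):
        return 2 * self.rank1[x] + 1

    def parent(self, x):
        # The root (position 1) has no parent.
        return self.select1[x // 2] if x > 1 else None


# Level-order bits of a 4-node tree: root, two children, and a right grandchild.
t = LevelOrderTree([1, 1, 1, 0, 0, 0, 1, 0, 0])
print(t.left_child(1), t.right_child(1))   # 2 3
print(t.right_child(3), t.parent(7))       # 7 3
```

A child position whose bit is 0 denotes an external node, i.e. the child is absent in the original tree.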