Compressed Rank & Select on general strings
Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Generalised Rank and Select
Rank(c, i) = number of occurrences of c in L[1, i]
Select(c, i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Rank(a, 7) = 4
Select(a, 2) = 3
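The two definitions can be checked directly against the example string. The following is a minimal sketch that scans the string in linear time; the function names `rank` and `select` are illustrative, and positions are 1-based as on the slide.

```python
def rank(L, c, i):
    """Number of occurrences of c in L[1..i] (1-based, inclusive)."""
    return L[:i].count(c)


def select(L, c, i):
    """1-based position of the i-th occurrence of c in L, or None if absent."""
    seen = 0
    for pos, ch in enumerate(L, start=1):
        if ch == c:
            seen += 1
            if seen == i:
                return pos
    return None


L = "abaaacbcdabecd"
print(rank(L, "a", 7))    # 4
print(select(L, "a", 2))  # 3
```

This naive version only fixes the semantics; it does not give the time or space bounds discussed in the slides.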
Generalised Rank and Select
If Σ is small (i.e. constant):
Build a binary Rank data structure for each symbol of Σ.
Rank takes O(1) time and small space.
If Σ is large (words?):
Need a smarter solution: the Wavelet Tree data structure.
Algorithmic reduction: reduce Rank & Select over arbitrary strings to Rank & Select over binary strings.
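The small-alphabet approach can be sketched as one indicator bitmap per symbol, each answering binary Rank. In this sketch a plain prefix-count array stands in for the binary Rank structure; that is an assumption made for brevity, since a real succinct structure would store the bitmap plus o(n) extra bits rather than a full array of counts. The class name `SymbolRank` is hypothetical.

```python
from itertools import accumulate


class SymbolRank:
    """One binary Rank structure per symbol (prefix counts as a stand-in)."""

    def __init__(self, text):
        self.prefix = {}
        for c in set(text):
            # Indicator bitmap of c: bit j is 1 iff text[j] == c.
            bits = [1 if ch == c else 0 for ch in text]
            # prefix[c][i] = number of 1s among the first i bits.
            self.prefix[c] = [0] + list(accumulate(bits))

    def rank(self, c, i):
        """Occurrences of c in text[1..i] (1-based), in O(1) time."""
        counts = self.prefix.get(c)
        return counts[i] if counts is not None else 0


sr = SymbolRank("abaaacbcdabecd")
print(sr.rank("a", 7))  # 4
```

The space grows linearly with the number of distinct symbols, which is why this only suits a small alphabet.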
The Wavelet Tree
[Figure: an (alphabetic?) tree over the symbols of abracadabra, with leaves a, c, b, r, d.]
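The Wavelet Tree
[Figure: each node holds the subsequence of abracadabra formed by the symbols in its subtree. Root: abracadabra. Left child: aacaaa (leaves a, c). Right child: brdbr, whose left child is brbr (leaves b, r) and whose right child is the leaf d. The leaf strings aaaaa, bb, rr and d are marked with "?".]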
The Wavelet Tree
[Figure: the tree with the strings abracadabra, aacaaa, brdbr and brbr at its internal nodes.]
Fact. Given the tree and the binary strings, we can recover the original string.
In any case, the tree takes O(|Σ| log |Σ|) bits.
Easier: alphabetic order + heap structure.
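The recoverability fact can be checked with a small sketch. It assumes the tree shape from the figure is given as nested pairs, `(("a", "c"), (("b", "r"), "d"))`, and that each internal node stores one bit per symbol: 0 if the symbol belongs to the left subtree, 1 if it belongs to the right. The helper names `symbols`, `build` and `recover` are illustrative.

```python
def symbols(shape):
    """Set of symbols at the leaves of a tree shape given as nested pairs."""
    if isinstance(shape, str):
        return {shape}
    return symbols(shape[0]) | symbols(shape[1])


def build(text, shape):
    """Return (bits, left, right) for an internal node, None for a leaf."""
    if isinstance(shape, str):
        return None  # leaves store nothing
    right = symbols(shape[1])
    bits = [1 if ch in right else 0 for ch in text]
    left_text = [ch for ch in text if ch not in right]
    right_text = [ch for ch in text if ch in right]
    return (bits, build(left_text, shape[0]), build(right_text, shape[1]))


def recover(node, shape, n):
    """Rebuild the n symbols of a node from the binary strings alone."""
    if isinstance(shape, str):
        return [shape] * n
    bits, left, right = node
    lo = iter(recover(left, shape[0], bits.count(0)))
    hi = iter(recover(right, shape[1], bits.count(1)))
    # Interleave the two children according to the node's bits.
    return [next(hi) if b else next(lo) for b in bits]


text = "abracadabra"
shape = (("a", "c"), (("b", "r"), "d"))
tree = build(text, shape)
print(tree[0])  # root bits: [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
print("".join(recover(tree, shape, len(text))))  # abracadabra
```

Only the bits and the shape are used by `recover`, so the strings shown at the nodes in the figure never need to be stored.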
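The Wavelet Tree
[Figure: the tree over a, c, b, r, d with node strings abracadabra, aacaaa, brdbr and brbr = 0101.]
Rank(b, 8) at the root abracadabra reduces to the right symbols: it becomes Rank(b, 3) on brdbr.
Rank(b, 3) on brdbr reduces to the left symbols: it becomes Rank(b, 2) on brbr.
The node brbr = 0101 is binary.
Every step can be turned to binary.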
The Wavelet Tree
Rank(b, 8) on abracadabra, step by step:
Root abracadabra, right move = Rank 1: $\text{Rank}_1(8) = 3$
Node brdbr, left move = Rank 0: $\text{Rank}_0(3) = 3 - \text{Rank}_1(3) = 2$
Node brbr = 0101, left move = Rank 0: $\text{Rank}_0(2) = 2 - \text{Rank}_1(2) = 1$
Hence Rank(b, 8) = 1.
Generalised R&S is implemented with log |Σ| binary R&S operations.
Select is similar.
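The descent for Rank(b, 8) can be traced in code. This is a sketch under the same assumptions as the figure: the tree shape is supplied as nested pairs, bit 1 means "right subtree", and binary Rank at each node is answered from a prefix-count array (a stand-in for a constant-time succinct Rank structure). The names `leaves` and `WaveletNode` are illustrative, and the queried symbol is assumed to occur in the shape.

```python
def leaves(shape):
    """Set of symbols below a tree shape given as nested pairs."""
    if isinstance(shape, str):
        return {shape}
    return leaves(shape[0]) | leaves(shape[1])


class WaveletNode:
    def __init__(self, text, shape):
        self.right_syms = leaves(shape[1])
        # ones[i] = number of 1 bits among the first i positions of this node.
        self.ones = [0]
        for ch in text:
            self.ones.append(self.ones[-1] + (ch in self.right_syms))
        parts = ([ch for ch in text if ch not in self.right_syms],
                 [ch for ch in text if ch in self.right_syms])
        # A child is None when the corresponding subtree is a single leaf.
        self.child = [None if isinstance(s, str) else WaveletNode(t, s)
                      for t, s in zip(parts, shape)]

    def rank(self, c, i):
        """Occurrences of c in the first i symbols (1-based prefix length)."""
        node = self
        while node is not None:
            go_right = 1 if c in node.right_syms else 0
            rank1 = node.ones[i]
            # Right move uses Rank_1(i); left move uses Rank_0(i) = i - Rank_1(i).
            i = rank1 if go_right else i - rank1
            node = node.child[go_right]
        return i


root = WaveletNode("abracadabra", (("a", "c"), (("b", "r"), "d")))
print(root.rank("b", 8))  # 1, via 8 -> 3 -> 2 -> 1
```

Each loop iteration performs exactly one binary Rank, so the number of binary operations equals the depth of the symbol's leaf.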
Representing Trees
Standard representation
Binary tree: each node has two pointers, to its left and right children.
An n-node tree takes 2n pointers, or 2n lg n bits.
Supports finding the left child or right child of a node in constant time.
Each extra operation (e.g. parent, subtree size) costs an additional n lg n bits.
Can we improve the space bound?
There are fewer than $2^{2n}$ distinct binary trees on n nodes.
2n bits are enough to distinguish between any two different binary trees.
Can we represent an n-node binary tree using 2n bits?
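The counting bound can be checked numerically. The sketch below relies on a fact not stated on the slide: the number of distinct binary trees on n nodes is the Catalan number C(2n, n) / (n + 1). The function name `num_binary_trees` is illustrative.

```python
from math import comb


def num_binary_trees(n):
    """Catalan number: how many distinct binary trees have n nodes."""
    return comb(2 * n, n) // (n + 1)


for n in (1, 5, 10, 20):
    count = num_binary_trees(n)
    # Bits needed to give every tree on n nodes a distinct code.
    bits_needed = (count - 1).bit_length()
    print(n, count, bits_needed, 2 * n)
```

For every n printed, the count stays below 2^(2n), so the number of bits needed never exceeds 2n.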
Binary tree representation
A binary tree on n nodes can be represented using 2n + o(n) bits while supporting, in constant time:
parent
left child
right child
Heap-like notation for a binary tree
Add external nodes.
Label internal nodes with a 1 and external nodes with a 0.
Write the labels in level order.
One can reconstruct the tree from this sequence.
An n-node binary tree can be represented in 2n+1 bits.
What about the operations?
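The encoding steps can be sketched as a breadth-first traversal. This sketch assumes a tree is given as nested two-element lists `[left, right]`, with `None` standing for an external node; that input format and the names `encode` and `decode` are illustrative choices, not part of the slides.

```python
from collections import deque


def encode(root):
    """Level-order bits: 1 for an internal node, 0 for an external node."""
    bits = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append(0)
        else:
            bits.append(1)
            queue.extend(node)  # enqueue left child, then right child
    return bits


def decode(bits):
    """Rebuild the nested-list tree from its level-order bits."""
    it = iter(bits)
    root = [None, None] if next(it) else None
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        for side in (0, 1):
            if next(it):  # the next bit describes this child
                node[side] = [None, None]
                queue.append(node[side])
    return root


# Root with a left child A and a right child B; A has only a right child C.
tree = [[None, [None, None]], [None, None]]
bits = encode(tree)
print(bits)       # [1, 1, 1, 0, 1, 0, 0, 0, 0]
print(len(bits))  # 9 = 2 * 4 + 1
```

The round trip `decode(encode(tree)) == tree` illustrates that the sequence determines the tree.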
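Heap-like notation for a binary tree
Two numberings are used (shown in red and green on the slide): the internal nodes alone, numbered in level order (red), and all nodes, internal and external, numbered by position in the bit sequence (green).
parent(x) = ⌊x/2⌋, read in the internal-node numbering (red)
left child(x) = 2x, read in the all-node numbering (green)
right child(x) = 2x + 1, read in the all-node numbering (green)
From a position x to an internal-node number: number of 1's up to x (Rank)
From an internal-node number x to a position: position of the x-th 1 (Select)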
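The navigation rules can be tried on a concrete bit sequence. In this sketch every node is identified by its 1-based position in the level-order sequence, Rank converts a position into an internal-node number, and Select converts back. `rank1` and `select1` are naive linear scans here, an assumption made for brevity; the 2n + o(n) bound relies on constant-time versions. The sample sequence encodes a 4-node tree: a root with children A and B, where A has only a right child C.

```python
BITS = [1, 1, 1, 0, 1, 0, 0, 0, 0]  # positions 1..9; 1 = internal, 0 = external


def rank1(bits, p):
    """Number of 1s among positions 1..p."""
    return sum(bits[:p])


def select1(bits, k):
    """Position of the k-th 1, or None if there are fewer than k ones."""
    seen = 0
    for pos, b in enumerate(bits, start=1):
        seen += b
        if b and seen == k:
            return pos
    return None


def left_child(bits, p):
    """Position of the left child of the internal node at position p."""
    return 2 * rank1(bits, p)


def right_child(bits, p):
    """Position of the right child of the internal node at position p."""
    return 2 * rank1(bits, p) + 1


def parent(bits, p):
    """Position of the parent of the node at position p (None for the root)."""
    return select1(bits, p // 2) if p > 1 else None


print(left_child(BITS, 2), right_child(BITS, 2))  # 4 5
print(BITS[4 - 1], BITS[5 - 1])                   # 0 1: A has no left child
print(parent(BITS, 5))                            # 2
```

A returned child position whose bit is 0 denotes an external node, which means the child is absent.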