1
Rank and Select data structures
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
2
A basic problem
How do you retrieve the k-th string?
D = Abaco$Battle$Car$Cold$Cod ....
An array of n pointers to strings of total length m takes (n log m) bits = 32n bits, taking log m = 32. The space depends on the number of strings and is independent of string length.
Alternatively, pair D with a bit vector B:
D = Abaco Battle Car Cold Cod ....
[Figure: bit vector B aligned with D]
Spaces are introduced for simplicity. You could drop the $.
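The pointer-array solution can be sketched as follows. This is a minimal illustration, not the slide's own code: the names `offsets`, `kth_string` and `pointer_bits` are hypothetical, and `pointer_bits` rounds log m up to a whole number of bits.

```python
import math

strings = ["Abaco", "Battle", "Car", "Cold", "Cod"]
text = "".join(s + "$" for s in strings)   # D = Abaco$Battle$Car$Cold$Cod$

# One pointer (offset into D) per string.
offsets = []
pos = 0
for s in strings:
    offsets.append(pos)
    pos += len(s) + 1                      # +1 for the $ separator


def kth_string(text, offsets, k):
    """Return the k-th string (k is 1-based) by following its pointer."""
    start = offsets[k - 1]
    end = text.index("$", start)           # the string ends at the next $
    return text[start:end]


def pointer_bits(n, m):
    """Space of n pointers into a text of length m: n * ceil(log2 m) bits."""
    return n * math.ceil(math.log2(m))
```

The space grows with n only: doubling the length of every string adds at most one bit per pointer.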
3
Rank/Select
Wish to index the bit vector B[1,m] (possibly compressed), where m = |B| and n = #1s.
Rank_b(i) = number of b in B[1,i]
Select_b(i) = position of the i-th b in B
Example (from the figure): Rank1(6) = 2
Two approaches:
(1) Takes |B| + o(|B|) bits of space.
(2) Aims at achieving n log(m/n) bits, by deploying Elias-Fano + point (1).
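The two definitions can be checked with a naive, linear-time sketch. The bit vector `B` below is a hypothetical one chosen so that Rank1(6) = 2, since the slide's figure is not recoverable. The string example further assumes that the bit vector marks the first character of each string, which is an assumption and not something the transcript states.

```python
def rank(bits, b, i):
    """Number of occurrences of bit b in bits[1..i] (1-based, inclusive)."""
    return sum(1 for x in bits[:i] if x == b)


def select(bits, b, i):
    """1-based position of the i-th occurrence of bit b in bits."""
    count = 0
    for pos, x in enumerate(bits, start=1):
        if x == b:
            count += 1
            if count == i:
                return pos
    raise ValueError("fewer than i occurrences of b")


# Hypothetical bit vector with Rank1(6) = 2.
B = [0, 0, 1, 0, 1, 0, 1, 1, 0, 0]

# Assumed layout: D without separators, start_bits has a 1 at each string start.
D = "AbacoBattleCarColdCod"
start_bits = [0] * len(D)
for start in (0, 5, 11, 14, 18):           # 0-based starts of the five strings
    start_bits[start] = 1


def kth_string(k, n=5):
    """k-th string of D (k is 1-based), located with two Select1 calls."""
    begin = select(start_bits, 1, k)
    end = select(start_bits, 1, k + 1) if k < n else len(D) + 1
    return D[begin - 1:end - 1]
```

Rank and Select are inverse in the sense that rank(B, 1, select(B, 1, i)) == i for every valid i.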
4
The Bit-Vector Index: |B| + o(|B|)
m = |B|, n = #1s
Goal. B is read-only, and the additional index takes o(m) bits.
[Figure: Rank over B. B is split into buckets of Z bits, each storing the (absolute) Rank1 at its start (e.g. 8, 18); each bucket is split into blocks of z bits, each storing the (bucket-relative) Rank1 at its start; a precomputed table with columns (block, pos, #1) answers Rank1 inside a block, with rows such as (0000, 1, ...) and (1011, 2, ...).]
Setting Z = poly(log m) and z = (1/2) log m:
Extra space is (m/Z) log m + (m/z) log Z + o(m) = O(m loglog m / log m) = o(m) bits
Rank time is O(1)
Term o(m) is crucial in practice. B is untouched (not compressed).
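A minimal sketch of the two-level directory, under stated assumptions: the class name `RankIndex` is hypothetical, Z and z are passed in as plain parameters (with Z a multiple of z) rather than derived from log m, and the scan inside one block of z bits stands in for the precomputed table lookup.

```python
class RankIndex:
    """Two-level Rank1 directory over a read-only list of bits."""

    def __init__(self, bits, Z, z):
        assert Z % z == 0, "bucket size Z must be a multiple of block size z"
        self.bits, self.Z, self.z = bits, Z, z
        self.bucket_rank = []   # absolute Rank1 before each bucket of Z bits
        self.block_rank = []    # bucket-relative Rank1 before each block of z bits
        total = in_bucket = 0
        for start in range(0, len(bits), z):
            if start % Z == 0:              # a new bucket begins here
                self.bucket_rank.append(total)
                in_bucket = 0
            self.block_rank.append(in_bucket)
            ones = sum(bits[start:start + z])
            total += ones
            in_bucket += ones

    def rank1(self, i):
        """Number of 1s in bits[1..i] (1-based, inclusive); rank1(0) == 0."""
        if i == 0:
            return 0
        last = i - 1                        # 0-based index of the last bit counted
        bucket, block = last // self.Z, last // self.z
        # Stand-in for the table lookup: at most z bits are scanned.
        inside = sum(self.bits[block * self.z:i])
        return self.bucket_rank[bucket] + self.block_rank[block] + inside


bits = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1] * 3
index = RankIndex(bits, Z=8, z=4)
```

A query adds three numbers: one absolute counter, one bucket-relative counter, and one in-block count, which is why Rank takes constant time once the in-block count comes from a table.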
5
The Select operation
m = |B|, n = #1s
[Figure: B is split into subarrays of variable size r; each subarray is extended until it includes k = (log m)^2 1s.]
Sparse case: if r > k^2 = (log m)^4, we store explicitly the positions of the k = (log m)^2 1s. We have at most m/r subarrays of this type, each taking k log m bits, so in total (m/r) * k * log m bits = O(m / log m) = o(m) bits.
Dense case: k ≤ r ≤ k^2. Recurse by repeating the argument, now with k' = (log log m)^2. If the subarray of size r' including k' 1s is longer than log m bits, then store the k' positions explicitly using O(log log m) bits each, thus O(m / log log m) = o(m) bits in total. Otherwise r' < log m, and thus a precomputed table is enough.
Extra space is + o(m), and B is not touched!
Select time is O(1)
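The sparse-case bound can be spelled out step by step. This is a worked restatement of the slide's argument, using only r > k^2 and k = (log m)^2:

```latex
\[
\underbrace{\frac{m}{r}}_{\text{sparse subarrays}}
\cdot \underbrace{k}_{\text{positions each}}
\cdot \underbrace{\log m}_{\text{bits per position}}
\;<\; \frac{m}{k^{2}} \cdot k \cdot \log m
\;=\; \frac{m \log m}{k}
\;=\; \frac{m \log m}{(\log m)^{2}}
\;=\; \frac{m}{\log m}
\;=\; o(m).
\]
```

The inequality holds because a sparse subarray is longer than k^2 bits, so fewer than m/k^2 of them fit in B.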
6
Via Elias-Fano (|L| + |H| + o(|H|))
Recall that we set w = log (m/n) and z = log n, where m = |B| and n = #1s. Then:
Space = n log (m/n) bits + 2n bits
(Build Select1 on H, so we need extra |H| + o(|H|) bits = 2n + o(n) bits.)
[Figure: example with z = 3, w = 2]
Select1(i) on B uses L and (Select1(H,i) - i), in +o(n) space. Therefore B is not needed.
Rank1(i) on B needs a binary search over B (run through Select1, since B itself is not stored).
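A minimal sketch of Select1 and Rank1 over the Elias-Fano parts L and H, using the slide's parameters z = 3 and w = 2 (so m = 32, n = 8). The positions are hypothetical, since the slide's example is not recoverable, and they are 0-based values in [0, m) rather than the 1-based B[1,m] of the slides. The linear scan in `select1` stands in for the o(|H|)-bit Select index.

```python
def ef_encode(positions, w, z):
    """Split each sorted value into w low bits (L) and z high bits (unary in H)."""
    L = [x & ((1 << w) - 1) for x in positions]
    H = []
    highs = [x >> w for x in positions]
    for bucket in range(1 << z):
        H.extend([1] * highs.count(bucket))   # one 1 per value in this bucket
        H.append(0)                           # a 0 closes the bucket
    return L, H


def select1(bits, i):
    """1-based position of the i-th 1 (linear scan, stand-in for the index)."""
    count = 0
    for pos, bit in enumerate(bits, start=1):
        count += bit
        if bit and count == i:
            return pos
    raise ValueError("fewer than i ones")


def ef_select(L, H, w, i):
    """i-th smallest stored value: high = Select1(H, i) - i, low = L[i]."""
    high = select1(H, i) - i                  # number of 0s before the i-th 1
    return (high << w) | L[i - 1]


def ef_rank(L, H, w, x):
    """Number of stored values <= x, by binary search over ef_select."""
    lo, hi = 0, len(L)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ef_select(L, H, w, mid) <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


positions = [1, 4, 7, 18, 24, 26, 30, 31]     # hypothetical 1-positions, n = 8
L, H = ef_encode(positions, w=2, z=3)
```

Here |L| = n * w = 16 bits and |H| = n + 2^z = 16 bits = 2n, matching the space formula above.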
7
If you wish to play with Rank and Select
Space: m/10 + n log (m/n) bits, vs 32n bits of explicit pointers
Time: Rank in 0.4 msec, Select in < 1 msec
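To get a feel for the two space figures, they can be evaluated on concrete sizes. The values of m and n below are hypothetical, chosen only for illustration, and the function names are not from any library.

```python
import math


def rank_select_bits(m, n):
    """Space quoted on the slide: m/10 + n log2(m/n) bits."""
    return m / 10 + n * math.log2(m / n)


def explicit_pointer_bits(n):
    """Space of n explicit 32-bit pointers."""
    return 32 * n


m, n = 10_000_000, 1_000_000    # hypothetical: 1M strings, 10 characters on average
compressed = rank_select_bits(m, n)
pointers = explicit_pointer_bits(n)
```

With these sizes the formula gives roughly 4.3 bits per string against 32 bits per explicit pointer.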