
1 Succinct Data Structures
Kunihiko Sadakane, National Institute of Informatics

2 Dynamic Data Structures
Bit vectors, strings, ordered trees. Operations: access, rank, select, insert, delete.

3 Memory Model Memory consists of a bit array (vector) M[0..]
Consecutive w bits can be read/written in O(1) time (w is the word length of the CPU). The memory consumption of an algorithm is defined as the maximum memory address accessed by the algorithm.
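As a toy illustration of this model (not part of the slides), the following Python sketch treats the memory M as one big Python integer, with bit 0 playing the role of M[0]; the helper names read_word/write_word are our own, and big-integer shifts are of course not literally O(1).

    def read_word(M, pos, w):
        """Return bits M[pos..pos+w-1] as an integer (bit 0 of M is M[0])."""
        return (M >> pos) & ((1 << w) - 1)

    def write_word(M, pos, w, value):
        """Return a copy of M with bits M[pos..pos+w-1] replaced by value."""
        mask = ((1 << w) - 1) << pos
        return (M & ~mask) | ((value & ((1 << w) - 1)) << pos)

    # Example: write 0b1011 at position 8 of an all-zero memory, then read it back.
    M = write_word(0, 8, 4, 0b1011)
    assert read_word(M, 8, 4) == 0b1011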

4 Dynamic Memory Management [1]
Consider a data structure for the following problem.
B: an array of m variable-length bit strings; B[i] is called block i. Each block has length at most b bits.
address(i): returns the address of block i.
realloc(i, b’): changes the length of block i to b’ (address(i) may change).
Computation model: word RAM with word length w.
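A naive reference for this interface (our own sketch, not the structure of [1]) may help fix the semantics of address and realloc; it stores each block as a Python bit string and ignores the space bound entirely.

    class NaiveBlockStore:
        """Blocks B[1..m] as bit strings; no attempt at succinctness."""

        def __init__(self, m, b):
            self.b = b                                   # maximum block length
            self.blocks = {i: "" for i in range(1, m + 1)}

        def address(self, i):
            # The real structure returns where block i lies in bit memory;
            # here the block is simply returned by value.
            return self.blocks[i]

        def realloc(self, i, b_new):
            # The slide does not specify what happens to the contents when a
            # block grows; we keep a prefix and pad with 0s.
            assert 0 <= b_new <= self.b
            self.blocks[i] = self.blocks[i][:b_new].ljust(b_new, "0")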

5 Theorem: Assume b ≤ m and log m ≤ w.
Let s be the sum of the lengths of the blocks B[1..m]. Then B can be stored in s + O(m log m + b^2) bits so that address takes O(1) time and realloc takes O(b/w) time.
Proof: Let p = Θ(log(mb)); p is the number of bits needed for a pointer into the memory storing B. Divide the memory into segments of b + 4p bits. The unit of memory allocation and deallocation is a segment. (A middle segment is never deallocated; allocation and deallocation happen only at the last segment.)

6 Store segments in doubly-linked lists.
List Lx stores all blocks of length x (1 ≤ x ≤ b).
pred, succ (p bits each): addresses of the preceding/succeeding segments.
offset (log b ≤ p bits): position of the first block in the segment.
block_data (b + p bits): space to store the blocks.
[Figure: layout of a segment of list Lx — the pred, succ and offset fields followed by block_data holding the blocks.]
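The segment layout can be pictured as the following record (a sketch with toy field widths; in the real structure everything is packed into b + 4p bits of raw memory):

    from dataclasses import dataclass

    @dataclass
    class Segment:
        pred: int = -1       # address of the preceding segment in its list L_x (p bits)
        succ: int = -1       # address of the succeeding segment in L_x (p bits)
        offset: int = 0      # position of the first block inside block_data (log b <= p bits)
        block_data: int = 0  # b + p bits holding the packed blocks themselves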

7 A block stores the bit string of B[i] together with i (x + log m ≤ x + p bits).
Blocks are stored in a segment from right to left. If a block cannot be stored in one segment, a new segment is allocated, the block is divided into two, and each piece is stored in one of the two segments. Hence a block is stored in one or two segments ((block length) ≤ b + p = (length of block_data)).
To enumerate all blocks in a segment, traverse block_data from position offset. The blocks of a list can be stored in arbitrary order; separate arrays are used to store pointers to the blocks.

8 For each block B[i] (1 ≤ i ≤ m), store:
Len[i] (log b ≤ p bits): the length of B[i].
Pos[i] (log(b+p) ≤ p bits): the position of block B[i] inside its segment.
Seg[i] (p bits): the segment storing B[i]; Ind[j] (p bits): the address of segment j.
[Figure: example with Seg = 1 for blocks 1, 5, 100 and Seg = 2 for blocks 3, 10, 25; one segment holds B[1], B[100], B[5] at positions Pos[1], Pos[100], Pos[5], the other holds B[3], B[25], B[10].]

9 Ind[j] represents the actual address of the segment.
For all blocks stored in the same segment, the values of Seg are identical (Seg[i1] = Seg[i2] = … = j), and Ind[j] gives the actual address of that segment.
[Figure: same example as on slide 8.]

10 Implementing address(i)
Because a block is stored in one or two segments, address(i) returns up to two pairs (addr, len). The first pair is determined by Ind[Seg[i]], Pos[i], and Len[i]. If the block does not fit in its segment, the second pair is found through the segment's succ pointer: the rest of the block is stored as the first block of the segment pointed to by succ. O(1) time. A sketch is given below.
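A sketch of address(i) over the directory arrays (our own code, not from [1]): succ and offset are exposed here as arrays indexed by segment id rather than read out of the segment headers, blocks are assumed to grow left-to-right inside block_data (the slide packs them right-to-left), and segment header widths are ignored. DATA_BITS = b + p is an assumed toy constant.

    B, P = 64, 16                      # toy values for b and p
    DATA_BITS = B + P                  # usable bits per segment (block_data)

    def address(i, Seg, Ind, Pos, Len, succ, offset):
        """Return block i as one or two (addr, length) pieces of bit memory."""
        first = Ind[Seg[i]] + Pos[i]                   # where the block starts (headers ignored)
        first_len = min(Len[i], DATA_BITS - Pos[i])    # how much fits in this segment
        pieces = [(first, first_len)]
        if first_len < Len[i]:                         # block spills into the next segment
            j = succ[Seg[i]]                           # succeeding segment in the same list L_x
            # the remainder is stored as the first block of segment j
            pieces.append((Ind[j] + offset[j], Len[i] - first_len))
        return pieces

Each of the per-block directories Len, Pos, Seg costs at most p = O(log m) bits under the assumptions of slide 5, which is where the O(m log m) term of the space bound comes from.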

11 Implementing realloc(i, b’)
Find the current address and length b of block i, and copy the contents of block i to temporary space. Move the front block j of Lb into the emptied space, updating Pos[j] and Seg[j]. If the head segment of Lb becomes empty, delete it and move the last segment z in memory into the emptied space, updating Ind[z]. Finally, insert block i at the head of Lb’; if the head segment does not have enough space, allocate a new segment (the memory region in use is extended). A toy simulation follows below.
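The sequence of moves described above can be simulated with plain Python lists. This is a sketch under strong simplifications: no bit packing, no splitting of a block across segments, linear scans in place of the Pos/Seg/Ind directories, and the final compaction step (moving the last segment in memory into a freed hole and updating Ind) is omitted because flat memory is not simulated; so only the choreography is illustrated, not the O(b/w) bound.

    class ToyStore:
        SEG_CAP = 4                       # toy capacity; really about (b+p)/x blocks for L_x

        def __init__(self, blocks):
            self.bits = dict(blocks)      # block id -> bit string
            self.lists = {}               # x -> list of segments; a segment = list of block ids
            for i in self.bits:
                self._insert(i)

        def _insert(self, i):
            x = len(self.bits[i])
            segs = self.lists.setdefault(x, [])
            if not segs or len(segs[0]) == self.SEG_CAP:
                segs.insert(0, [])        # head segment full (or absent): allocate a new one
            segs[0].append(i)             # insert block i at the head of L_x

        def _remove(self, i):
            x = len(self.bits[i])
            segs = self.lists[x]
            seg = next(s for s in segs if i in s)
            seg.remove(i)                             # empty block i's old slot
            head = segs[0]
            if seg is not head and head:
                seg.append(head.pop())                # fill the hole with a front block of L_x
            if not segs[0]:
                segs.pop(0)                           # head segment became empty: free it

        def realloc(self, i, new_bits):
            self._remove(i)               # vacate block i's slot and refill it from the front of L_b
            self.bits[i] = new_bits       # block i now has its new length b'
            self._insert(i)               # insert block i at the head of L_{b'}

    # Example: shrink block 3 from 4 bits to 2 bits.
    store = ToyStore({1: "0101", 2: "1111", 3: "0011"})
    store.realloc(3, "01")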

12 Movement of blocks and segments takes O(b/w) time
Updating the pointers takes O(1) time.
Required space: the sum of the block lengths is s; for each list Lx, at most one segment has empty space; the number of segments needed to store all blocks is at most s/b + b; hence the required space is s + O(b^2 + m log m) bits (see the accounting below). This data structure is denoted D(b, m).
Note: if the length of a block is only about w bits, a redundancy of log m bits per block is too large.
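A hedged sanity check of this accounting, taking the segment count s/b + b from this slide at face value and using s ≤ mb, b ≤ m and p = Θ(log(mb)) = O(log m) from slide 5:

    \[
      \Bigl(\tfrac{s}{b} + b\Bigr)(b + 4p)
        \;=\; s + \tfrac{4ps}{b} + b^2 + 4pb
        \;\le\; s + 4pm + b^2 + 4pm
        \;=\; s + O(b^2 + m \log m).
    \]

The directories Len, Pos, Seg (at most p bits per block) add another O(m log m) bits.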

13 Theorem: Assume b = O(w) and log m ≤ w.
Let s be the sum of the lengths of all blocks B[1..m]. Then B can be stored in s + O(m log w + w^4) bits so that address and realloc take O(1) time.
Proof: Divide B into pieces of w^3 blocks each and store each piece using D(1+w, w^3); let Di denote the structure for the i-th piece. The segments used by the Di are managed by another data structure D.

14 Each Di stores w^3 blocks, hence at most w^3 (1+w) ≈ w^4 bits. Divide it into w^2 pages.
Each page holds w or zero segments. The number of pages used by all the Di is m/w. These pages are stored in the data structure D(O(w^2), m/w).
address takes O(1) time. realloc in D takes O(w) time, but since it happens only once per w occurrences of realloc in a Di, the time complexity can be reduced to O(1).
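A brief check of the page counts (the pointer width inside each Di is p = Θ(log(w^3 (1+w))) = Θ(log w)):

    \[
      \text{segment size in } D_i = (1+w) + 4p = O(w), \qquad
      \text{page size} \approx w \cdot O(w) = O(w^2), \qquad
      \#\text{pages} = \frac{m}{w^3}\cdot w^2 = \frac{m}{w}.
    \]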

15 Required space
Each Di uses (its block sizes) + O(w^3 log w) bits; summed over all Di this is s + O(m log w) bits. D uses O(w^4 + (m/w) log(m/w)) bits. In total: s + O(w^4 + m log w) bits.
If a 0,1-vector of length n is stored with this data structure: s = n, w = log n, b = log n (length of a block), m = s/b (number of blocks). The space is nH_0 + O(n log log n / log n + log^4 n) bits.
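Substituting the bit-vector parameters above (w = log n, m = n / log n) into the redundancy terms gives, as a quick check:

    \[
      O(m \log w) = O\!\Bigl(\frac{n \log\log n}{\log n}\Bigr), \qquad
      O(w^4) = O(\log^4 n).
    \]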

16 Dynamic Bit Vectors [2] Store a bit-vector B[1..n] of length n
Operations: access, rank, select; insert(i, c) inserts a bit c between B[i] and B[i+1]; delete(i) deletes B[i].

17 A Simple Data Structure
Divide the vector into blocks of length between L and 2L (L = Θ(log^2 n)). Store the blocks in a balanced binary search tree in the order of their positions in the vector; the blocks are stored in the leaves, and there are O(n/log^2 n) of them. Each internal node stores the number of 1s in the blocks stored in the subtree rooted at that node. The space to store the tree is O(n/log n) bits. If an insert makes the length of a block exceed 2L, the block is partitioned into two. (A small sketch of this layout follows below.)
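A minimal static sketch of this layout (our own code, ignoring balancing, insert/delete and block splitting/merging): blocks sit at the leaves of a binary tree, every node knows how many bits and how many 1s its subtree holds, and rank descends the tree in time proportional to its height.

    class Node:
        def __init__(self, left=None, right=None, block=None):
            self.left, self.right, self.block = left, right, block
            if block is not None:                    # leaf: holds one block as a bit string
                self.nbits, self.nones = len(block), block.count("1")
            else:                                    # internal node: aggregate its children
                self.nbits = left.nbits + right.nbits
                self.nones = left.nones + right.nones

    def build(blocks):
        """Build a binary tree over the blocks, in vector order."""
        nodes = [Node(block=b) for b in blocks]
        while len(nodes) > 1:                        # pair up nodes level by level
            nodes = [Node(nodes[k], nodes[k + 1]) if k + 1 < len(nodes) else nodes[k]
                     for k in range(0, len(nodes), 2)]
        return nodes[0]

    def rank1(root, i):
        """Number of 1s among the first i bits of the stored vector."""
        node, ones = root, 0
        while node.block is None:
            if i < node.left.nbits:                  # answer lies entirely in the left subtree
                node = node.left
            else:                                    # skip the left subtree, count its 1s
                i -= node.left.nbits
                ones += node.left.nones
                node = node.right
        return ones + node.block[:i].count("1")      # finish inside a single block

    assert rank1(build(["0110", "1001", "11"]), 6) == 3   # vector 0110100111

With blocks of Θ(log^2 n) bits the tree has O(n/log^2 n) leaves, and if it is kept balanced its height is O(log n), matching the bounds above.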

18 If, by a delete, the length of a block becomes less than L,
then if one of its adjacent blocks has length more than L, move one bit from that block; if both adjacent blocks have length exactly L, merge the block with one of them.
An operation on the balanced binary search tree takes O(log n) time, and the operations on the blocks also take O(log n) time.
Note: if the value of n doubles, "log n" changes, and if log n changes, all indexes must be rebuilt.

19 Change of “log n”: partition the vector into three parts
left: uses word length w = log n − 1; middle: uses word length w = log n; right: uses word length w = log n + 1.
On an insert, move the rightmost bit of left to middle and the rightmost bit of middle to right. On a delete, move the leftmost bit of right to middle and the leftmost bit of middle to left.
When the value of n doubles, left becomes empty, and middle and right become the new left and middle. No reconstruction of the indexes is needed.

20 Theorem: A bit-vector of length n is stored in
nH_0 + O(n/log n) bits, and access, rank, select, insert, delete are done in O(log n) time.

21 Dynamic Ordered Trees [2]
The tree structure is represented by BP (balanced parentheses) and stored like the dynamic bit vector. Each node of the balanced binary search tree stores not only the number of 1s but also the values of the corresponding nodes of the range min-max tree.
Theorem: An n-node dynamic ordered tree can be represented in 2n + O(n/log n) bits so that all operations take O(log n) time.

22 Faster Data Structures
As the range min-max tree, use a B-tree with an appropriately chosen branching factor, so that the depth of the range min-max tree becomes O(log n / log log n). L = log^2 n / log log n.

23 Time for the Operations
P[i], findclose, findopen, enclose, rmq, pre_rank, pre_select, isleaf, isancestor, depth, parent, first_child, last_child, next_sibling, prev_sibling, subtree_size, lca, deepest_node, height, in_rank, in_select, leaf_rank, leaf_select, leftmost_leaf, rightmost_leaf: O(log n / log log n) time.
level_ancestor, level_next, level_leftmost, level_prev, level_rightmost: O(log n) time.
insert, delete: O(log n / log log n) or O(log n) time.
degree, child, child_rank: O(q log n / log log n) or O(log n) time (q = degree).

24 References
[1] Jesper Jansson, Kunihiko Sadakane, Wing-Kin Sung. Compressed Random Access Memory. arXiv preprint, v1.
[2] Kunihiko Sadakane, Gonzalo Navarro. Fully-Functional Succinct Trees. Proc. SODA 2010.
[3] Jacob Ziv, Abraham Lempel. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory, September 1978.
[4] S. Rao Kosaraju, Giovanni Manzini. Compression of Low Entropy Strings with Lempel-Ziv Algorithms. SIAM J. Comput. 29(3), 1999.
[5] Rodrigo González, Gonzalo Navarro. Statistical Encoding of Succinct Data Structures. Proc. CPM 2006, LNCS 4009.
[6] P. Ferragina, R. Venturini. A Simple Storage Scheme for Strings Achieving Entropy Bounds. Theoretical Computer Science, 372(1):115–121, 2007.

