Published by Colton Sinkey. Modified over 9 years ago.
1
Succinct Representations of Dynamic Strings
Meng He and J. Ian Munro
University of Waterloo
2
Background: Succinct Data Structures
- What are succinct data structures? (Jacobson 1989)
  - Represent data structures using, ideally, the information-theoretic minimum space
  - Support efficient navigational operations
- Why succinct data structures?
  - Large data sets in modern applications: textual, genomic, spatial or geometric
3
Strings: Definitions
- Notation
  - Alphabet: [σ] = {1, 2, …, σ}
  - String: S[1..n]
- Operations
  - access(i): S[i]
  - rank(α, i): number of occurrences of α in S[1..i]
  - select(α, i): position of the i-th occurrence of α in S
4
Strings: An Example
S = a a b a c c c d a d d a b b b c
- string_access(8) = d
- string_rank(a, 8) = 3
- string_select(b, 3) = 14
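The three operations can be checked against the example with a naive sketch. This is only a linear-time reference for the definitions, not the succinct structure the talk describes; the function names mirror the slide, and the 1-based indexing follows its notation.

```python
def access(s, i):
    """Return S[i], using the slides' 1-based positions."""
    return s[i - 1]


def rank(s, alpha, i):
    """Count the occurrences of alpha in S[1..i]."""
    return s[:i].count(alpha)


def select(s, alpha, i):
    """Return the 1-based position of the i-th occurrence of alpha, or None."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == i:
                return pos
    return None


S = "aabacccdaddabbbc"
print(access(S, 8), rank(S, "a", 8), select(S, "b", 3))  # prints: d 3 14
```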
5
Succinct Representations of Strings
- Information-theoretic minimum: $n \lg \sigma$ bits
- Succinct representation (Grossi et al. 2003)
  - Space: $nH_0 + o(n)\cdot\lg\sigma$ bits
  - Time: $O(\lg \sigma)$
- There are many more results.
- The case in which σ = 2 (bit vector) is even more fundamental! (Jacobson 1989)
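To make the two space terms concrete, the sketch below compares $n \lg \sigma$ with $nH_0$ for the 16-character example string from the earlier slide, taking $H_0$ to be the zeroth-order empirical entropy. The helper name `n_h0` is hypothetical, and the lower-order $o(n)\cdot\lg\sigma$ term is ignored.

```python
from collections import Counter
from math import log2


def n_h0(s):
    """n * H_0(s): sum over characters c of count(c) * lg(n / count(c))."""
    n = len(s)
    return sum(c * log2(n / c) for c in Counter(s).values())


S = "aabacccdaddabbbc"
sigma = len(set(S))               # 4 distinct characters
plain_bits = len(S) * log2(sigma)  # n lg sigma
entropy_bits = n_h0(S)             # n H_0, never larger than n lg sigma
print(plain_bits, round(entropy_bits, 2))
```

The gap is small here because the four characters are almost equally frequent; skewed distributions give a much larger saving.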
6
Applications of Strings and Bit Vectors
- Ordinal trees on n nodes
  - Standard approach: 3n lg n bits
  - Succinct data structures: 2n + o(n) bits (Jacobson 1989, Munro & Raman 1997, Benoit et al. 1999, …)
- Full-text indexes for a text string from $[\sigma]^n$
  - Suffix trees can use as much as 4n lg n to 6n lg n bits!
  - Succinct data structures: n lg σ + o(n lg σ) bits (Grossi et al. 2003, González and Navarro 2009, …)
- Labeled trees, planar graphs, binary relations, permutations, functions, …
7
Our Problem: Dynamic Strings
- Motivation: in many applications, data are also updated frequently
- For strings, we also consider the following update operations:
  - insert(α, i), which inserts character α between S[i-1] and S[i]
  - delete(i), which deletes S[i] from S
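The semantics of the two update operations can be pinned down with a naive sketch built on a Python list. The class name `NaiveDynamicString` is hypothetical, and each update costs linear time, which is exactly what the structures in this talk are designed to avoid.

```python
class NaiveDynamicString:
    """Reference semantics only: 1-based positions, O(n) per update."""

    def __init__(self, text=""):
        self.chars = list(text)

    def insert(self, alpha, i):
        # alpha lands between the old S[i-1] and S[i], so it becomes the new S[i].
        self.chars.insert(i - 1, alpha)

    def delete(self, i):
        # Remove S[i]; later characters shift one position to the left.
        del self.chars[i - 1]

    def access(self, i):
        return self.chars[i - 1]

    def rank(self, alpha, i):
        return self.chars[:i].count(alpha)

    def __str__(self):
        return "".join(self.chars)


s = NaiveDynamicString("abc")
s.insert("x", 2)   # "axbc"
s.delete(1)        # "xbc"
print(s)
```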
8
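Comparisons

| | Space (bits) | Access, rank and select | Insert and delete |
|---|---|---|---|
| Gupta et al. 2007 | $n\lg\sigma + \lg\sigma\cdot(o(n)+O(1))$ | $O(\lg\lg n)$ | $O(n^{\varepsilon})$ amortized |
| Mäkinen & Navarro 2008 | $nH_0 + o(n)\cdot\lg\sigma$ | $O(\lg n \lg\sigma)$ | $O(\lg n \lg\sigma)$ |
| Lee & Park 2009 | $n\lg\sigma + o(n)\cdot\lg\sigma$ | $O\left(\lg n\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ | $O\left(\lg n\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ amortized |
| González and Navarro 2009 | $nH_0 + o(n)\cdot\lg\sigma$ | $O\left(\lg n\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ | $O\left(\lg n\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ |
| This paper | $nH_0 + o(n)\cdot\lg\sigma$ | $O\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ | $O\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$ |

For the special cases in which σ = polylog(n) or σ = 2 (bit vector!), our results also improve on previous results.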
9
Searchable Partial Sums
- Data: a sequence Q of n nonnegative integers
- Operations
  - sum(i): Q[1] + Q[2] + … + Q[i]
  - search(x): the smallest i such that sum(i) ≥ x
  - update(i, δ): Q[i] ← Q[i] + δ
- Raman et al. 2001
  - Assumptions: $|Q| = O(\lg^{\varepsilon} n)$, $|\delta| \le \lg n$
  - Space: $O(\lg^{1+\varepsilon} n)$ bits, with a universal table of size $O(n^{\varepsilon'})$ bits
  - Operations: O(1) time
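The interface can be illustrated with a Fenwick (binary indexed) tree. This is a swapped-in textbook technique with O(lg n) time per operation, not the constant-time table-lookup structure of Raman et al.; the class name `PartialSums` and the sample sequence are made up for illustration.

```python
class PartialSums:
    """Fenwick tree over nonnegative integers, 1-based as on the slide."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values, start=1):
            self.update(i, v)

    def update(self, i, delta):
        """Q[i] <- Q[i] + delta (the caller keeps entries nonnegative)."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sum(self, i):
        """Q[1] + ... + Q[i]."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def search(self, x):
        """Smallest i with sum(i) >= x, or None if x exceeds the total."""
        if x > self.sum(self.n):
            return None
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            # Extend the prefix while its sum stays strictly below x.
            if nxt <= self.n and self.tree[nxt] < x:
                pos = nxt
                x -= self.tree[nxt]
            step >>= 1
        return pos + 1


Q = PartialSums([3, 0, 5, 2])
print(Q.sum(3), Q.search(4))  # prints: 8 3
```

The search relies on the entries being nonnegative, so that prefix sums are monotone.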
10
Collections of Searchable Partial Sums
- Data: d sequences of k-bit nonnegative integers, each of length n
- Operations
  - sum, search, update: supported on each sequence
  - insert, delete: operate simultaneously on the same position of all the sequences, but only 0's can be inserted or deleted
- González and Navarro 2009 (CSPSI)
- Example (three sequences of length 10):
    Q1:  8  2  9  5 11  9  0  7  3  6
    Q2:  1  5  3 12  4  5 12  0  3  1
    Q3: 19  0  4  2  8  3  5  4  1  0
  - sum(2, 5) = 25
  - insert(6) inserts a 0 at position 6 of every sequence
  - delete(6) deletes the entry at position 6 of every sequence (each must be 0)
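A naive sketch of the CSPSI interface, run on the slide's example. It assumes the thirty numbers on the slide are three sequences of length 10, a reading that is consistent with sum(2, 5) = 25. The class name `NaiveCSPSI` is hypothetical, and every operation here takes linear time.

```python
class NaiveCSPSI:
    """Reference semantics for a collection of searchable partial sums."""

    def __init__(self, sequences):
        self.seqs = [list(q) for q in sequences]

    def sum(self, j, i):
        """Sum of the first i entries of sequence j (both 1-based)."""
        return sum(self.seqs[j - 1][:i])

    def search(self, j, x):
        """Smallest i with sum(j, i) >= x, or None if there is none."""
        running = 0
        for i, value in enumerate(self.seqs[j - 1], start=1):
            running += value
            if running >= x:
                return i
        return None

    def update(self, j, i, delta):
        self.seqs[j - 1][i - 1] += delta

    def insert(self, i):
        """Insert a 0 at position i of every sequence."""
        for q in self.seqs:
            q.insert(i - 1, 0)

    def delete(self, i):
        """Delete position i of every sequence; only 0's may be deleted."""
        if any(q[i - 1] != 0 for q in self.seqs):
            raise ValueError("only 0's can be deleted")
        for q in self.seqs:
            del q[i - 1]


E = NaiveCSPSI([
    [8, 2, 9, 5, 11, 9, 0, 7, 3, 6],
    [1, 5, 3, 12, 4, 5, 12, 0, 3, 1],
    [19, 0, 4, 2, 8, 3, 5, 4, 1, 0],
])
print(E.sum(2, 5))  # prints: 25
```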
11
Our Results on CSPSI
- Assumptions
  - $d = O(\lg^{\eta} n)$
  - $|\delta| \le \lg n$
- Space
  - $O(kdn + w)$ bits, where w is the word size
  - Buffer: $O(n \lg n)$ bits
- Time
  - All operations: $O\left(\frac{\lg n}{\lg\lg n}\right)$
12
Data Structures for Dynamic Strings over a Small Alphabet of Size $O(\lg^{1/2} n)$
- Main data structure: a B-tree constructed over S
- Leaf
  - Each leaf stores a superblock of at most 2L bits, which encodes a substring of S ($L = \frac{\lg^2 n}{\lg\lg n}$)
  - For each character, the numbers of occurrences of that character in the superblocks form an integer sequence
  - Maintain these sequences, one per character of the alphabet, in a CSPSI structure E
- Internal node v ($\lg^{1/2} n \le \mathrm{degree}(v) \le 2\lg^{1/2} n$)
  - U(v): U(v)[i] = number of leaves of the subtree rooted at the i-th child of v
  - I(v): I(v)[i] = number of characters stored in the subtree rooted at the i-th child of v
13
Supporting Queries
- rank(α, i)
  - Perform a top-down traversal with the help of the I(v)'s
  - Locate the superblock, j, containing S[i] with the help of the U(v)'s
  - Perform sum(α, j-1) on E to count the occurrences of α in superblocks 1, 2, …, j-1
  - Read superblock j in blocks of (lg n) / 2 bits
- The support for access and select is similar
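The shape of the rank procedure can be sketched under heavy simplification. The assumptions here are not in the slides: a flat list of fixed-length blocks stands in for the B-tree leaves, a dict of per-character count lists stands in for the CSPSI structure E, and `block_len` is a hypothetical parameter. Nothing in this sketch is succinct or dynamic; it only shows the split into "count whole superblocks before j" plus "scan inside superblock j".

```python
class BlockedRank:
    """Static two-level rank: per-block counts plus a scan inside one block."""

    def __init__(self, text, block_len=4):
        self.block_len = block_len
        self.blocks = [text[k:k + block_len]
                       for k in range(0, len(text), block_len)]
        # counts[alpha][j] = occurrences of alpha in block j (0-based);
        # this plays the role of the sequences stored in E.
        self.counts = {a: [b.count(a) for b in self.blocks]
                       for a in set(text)}

    def rank(self, alpha, i):
        """Occurrences of alpha in S[1..i], for 0 <= i <= len(S)."""
        if i <= 0:
            return 0
        # j: 0-based index of the block holding S[i]; offset: position inside it.
        j, offset = divmod(i - 1, self.block_len)
        before = sum(self.counts.get(alpha, [])[:j])      # whole blocks before j
        inside = self.blocks[j][:offset + 1].count(alpha)  # scan within block j
        return before + inside


S = "aabacccdaddabbbc"
R = BlockedRank(S)
print(R.rank("a", 8))  # prints: 3
```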
14
Insert, Delete and Deamortization
- Supporting insert and delete requires traversing and updating the B-tree and updating E
- It is, however, much more complicated than supporting queries
  - Merging and splitting B-tree nodes
  - Deamortization
15
Succinct Global Rebuilding
- A key technique for deamortizing operations on B-trees is global rebuilding (Overmars and van Leeuwen 1981)
- Global rebuilding
  - Rebuild the B-tree once the number of update operations performed exceeds half the initial length of the string
  - A new copy and an old copy of the B-tree: more space
  - A buffer of O(n lg n) bits is required
- Succinct global rebuilding
  - Only one copy of the data: no duplication
  - During rebuilding, queries and updates are performed on either the new part or the old part
  - No buffer required
16
Putting Everything Together
- Dynamic strings over an alphabet of size $O(\lg^{1/2} n)$
  - Space: $nH_0 + o(n)\cdot\lg\sigma$ bits
  - Time: $O\left(\frac{\lg n}{\lg\lg n}\right)$
- This can be extended to general alphabets using wavelet trees
  - Space: $nH_0 + o(n)\cdot\lg\sigma$ bits
  - Time: $O\left(\frac{\lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$
- When σ = polylog(n) or σ = 2 (bit vectors)
  - Space: $nH_0 + o(n)\cdot\lg\sigma$ bits
  - Time: $O\left(\frac{\lg n}{\lg\lg n}\right)$
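How a wavelet tree reduces rank over a general alphabet to rank over smaller alphabets can be sketched as follows. This is a static, binary wavelet tree with naive bit counting, chosen for brevity. The $\lg\sigma / \lg\lg n$ factor in the bounds suggests the talk's construction uses a higher fan-out per node, so treat this as an illustration of the idea rather than the authors' structure. The class name `WaveletTree` is hypothetical.

```python
class WaveletTree:
    """Static binary wavelet tree over the sorted alphabet of the text."""

    def __init__(self, text, alphabet=None):
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.left = self.right = None
        self.bits = []
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            self.left_syms = set(self.alphabet[:mid])
            # Bit 0: symbol goes to the left child; bit 1: to the right child.
            self.bits = [0 if ch in self.left_syms else 1 for ch in text]
            self.left = WaveletTree(
                [ch for ch in text if ch in self.left_syms],
                self.alphabet[:mid])
            self.right = WaveletTree(
                [ch for ch in text if ch not in self.left_syms],
                self.alphabet[mid:])

    def rank(self, alpha, i):
        """Occurrences of alpha among the first i symbols of this node."""
        if alpha not in self.alphabet:
            return 0
        if self.left is None:
            return i  # leaf: every symbol routed here equals alpha
        ones = sum(self.bits[:i])  # naive binary rank on this node's bit vector
        if alpha in self.left_syms:
            return self.left.rank(alpha, i - ones)
        return self.right.rank(alpha, ones)


S = "aabacccdaddabbbc"
wt = WaveletTree(S)
print(wt.rank("a", 8))  # prints: 3
```

Each level performs one rank on a bit vector, so replacing the naive `sum(self.bits[:i])` with a dynamic bit-vector structure is what makes the per-level cost, and hence the total, small.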
17
Applications
- Dynamic text collections
  - Data: a collection of text strings
  - Operations
    - Pattern search
    - Display a substring
    - Insert/delete a text string
- Compressed construction of full-text indexes
  - Working space: $nH_k + o(n)\cdot\lg\sigma$ bits
  - Time: $O\left(\frac{n \lg n}{\lg\lg n}\left(\frac{\lg\sigma}{\lg\lg n} + 1\right)\right)$
18
Conclusions
- We designed a succinct representation of dynamic strings that provides more efficient operations than previous results
- This structure can be directly applied to improve previous results on text indexing
- We expect our results to play an important role in the design of dynamic succinct data structures
- We expect succinct global rebuilding to be useful for the deamortization of algorithms on dynamic succinct data structures
19
Thank you!