Download presentation
Presentation is loading. Please wait.
Published byMelvyn Cole Modified over 9 years ago
1
Succinct Data Structures Ian Munro University of Waterloo Joint work with David Benoit, Andrej Brodnik, D, Clark, F. Fich, M. He, J. Horton, A. López-Ortiz, S. Srinivasa Rao, Rajeev Raman, Venkatesh Raman, Adam Storm … How do we encode a large tree or other combinatorial object of specialized information … even a static one in a small amount of space and still perform queries in constant time ???
2
Example of a Succinct Data Structure: The (Static) Bounded Subset Given: Universe of n elements [0,...n-1] and m arbitrary elements from this universe Create: a static structure to support search in constant time (lg n bit word and usual operations) Using: Essentially minimum possible # bits... Operation: Member query in O(1) time (Brodnik & M.)
3
Focus on Trees.. Because Computer Science is.. Arbophilic - Directories (Unix, all the rest) - Search trees (B-trees, binary search trees, digital trees or tries) - Graph structures (we do a tree based search) - Search indices for text (including DNA)
4
A Big Patricia Trie / Suffix Trie Given a large text file; treat it as bit vector Construct a trie with leaves pointing to unique locations in text that “match” path in trie (paths must start at character boundaries) Skip the nodes where there is no branching ( n-1 internal nodes) 1 0 0 0 1 1 01 0 1
5
Space for Trees Abstract data type: binary tree Size: n-1 internal nodes, n leaves Operations: child, parent, subtree size, leaf data Motivation: “Obvious” representation of an n node tree takes about 6 n lg n words (up, left, right, size, memory manager, leaf reference) i.e. full suffix tree takes about 5 or 6 times the space of suffix array (i.e. leaf references only)
6
Succinct Representations of Trees Start with Jacobson, then others: There are about 4 n /(πn) 3/2 ordered rooted trees, and same number of binary trees Lower bound on specifying is about 2n bits What are the natural representations?
7
Arbitrary Ordered Trees Use parenthesis notation Represent the tree As the binary string (((())())((())()())): traverse tree as “(“ for node, then subtrees, then “)” Each node takes 2 bits
8
Heap-like Notation for a Binary Tree Add external nodes Enumerate level by level Store vector 11110111001000000 length2n+1 (Here don’t know size of subtrees; can be overcome. Could use isomorphism to flip between notations) 1 1 1 1 11 1 1 0 0 0 0 0 0 0 0 0
9
How do we Navigate? Jacobson’s key suggestion: Operations on a bit vector rank(x) = # 1’s up to & including x select(x) = position of x th 1 So in the binary tree leftchild(x) = 2 rank(x) rightchild(x) = 2 rank(x) + 1 parent(x) = select(x/2)
10
Rank & Select Rank -Auxiliary storage ~ 2nlglg n / lg n bits #1’s up to each (lg n) 2 rd bit #1’s within these too each lg n th bit Table lookup after that Select -more complicated but similar notions Key issue: Rank & Select take O(1) time with lg n bit word (M. et al) Aside: Interesting data type by itself
11
Other Combinatorial Objects Planar Graphs (Lu et al) Permutations [n] → [n] Or more generally Functions [n] → [n] But what are the operations? Clearly π(i), but also π -1 (i) And then π k (i) and π -k (i) Suffix Arrays (special permutations) in linear space
12
Permutations: a Shortcut Notation Let P be a simple array giving π; P[i] = π[i] Also have B[i] be a pointer t positions back in (the cycle of) the permutation; B[i]= π -t [i].. But only define B for every t th position in cycle. (t is a constant; ignore cycle length “round-off”) So array representation P = [8 4 12 5 13 x x 3 x 2 x 10 1] 1 2 3 4 5 6 7 8 9 10 11 12 13 245131831210
13
Representing Shortcuts In a cycle there is a B every t positions … But these positions can be in arbitrary order Which i’s have a B, and how do we store it? Keep a vector of all positions 0 indicates no B1 indicates a B Rank gives the position of B[“i”] in B array So: π(i) and π -1 (i) in O(1) time & (1+ε)n lg n bits Theorem: Under a pointer machine model with space (1+ ε) n references, we need time 1/ε to answer π and π -1 queries; i.e. this is as good as it gets.
14
Getting n lg n Bits: an Aside This is the best we can do for O(1) operations But using Benes networks: 1-Benes network is a 2 input/2 output switch r+1-Benes network … join tops to tops 1234567812345678 3578164235781642 R-Benes Network
15
A Benes Network Realizing the permutation (3 5 7 8 1 6 4 2) 1234567812345678 3578164235781642
16
What can we do with it? Divide into blocks of lg lg n gates … encode their actions in a word. Taking advantage of regularity of address mechanism and also Modify approach to avoid power of 2 issue Can trace a path in time O(lg n/(lg lg n) This is the best time we are able get for π and π -1 in minimum space. Observe: This method “violates” the pointer machine lower bound by using “micropointers”.
17
Back to the main track: Powers of π Consider the cycles of π ( 2 6 8)( 3 5 9 10)( 4 1 7) Keep a bit vector to indicate the start of each cycle ( 2 6 8 3 5 9 10 4 1 7) Ignoring parentheses, view as new permutation, ψ. Note: ψ -1 (i) is position containing i … So we have ψ and ψ -1 as before Use ψ -1 (i) to find i, then bit vector (rank, select) to find π k or π -k
18
Functions Now consider arbitrary functions [n] → [n] “A function is just a hairy permutation” All tree edges lead to a cycle
19
Challenges here Essentially write down the components in a convenient order and use the n lg n bits to describe the mapping (as per permutations) To get f k (i): Find the level ancestor (k levels up) in a tree Or Go up to root and apply f the remaining number of steps around a cycle
20
Level Ancestors There are several level ancestor techniques using O(1) time and O(n) WORDS. Adapt Bender & Farach-Colton to work in O(n) bits But going the other way …
21
f -k is a set Moving Down the tree requires care f -3 ( ) = ( ) The trick: Report all nodes on a given level of a tree in time proportional to the number of nodes, and Don’t waste time on trees with no answers
22
Final Function Result Given an arbitrary function f: [n] → [n] With an n lg n + O(n) bit representation we can compute f k (i) in O(1) time and f -k (i) in time O(1 + size of answer).
23
Back to Text … And Suffix Arrays Text T[1..n] over (a,b)*# (a<#<b) There are 2 n-1 such texts, which of the n! suffix arrays are valid? 12345678 SA=4 75 1 8 3 6 2 is abbaaba# SA -1 =48613725 M=47158236 isn’t..why?
24
Ascending to Max M is a permutation so M -1 is its inverse i.e. M -1 [i] says where i is in M Ascending-to-Max: 1 i n-2 i) M -1 [i] < M -1 [n] and M -1 [i+1] < M -1 [n] M -1 [i] < M -1 [i+1] ii) M -1 [i] > M -1 [n] and M -1 [i+1] > M -1 [n] M -1 [i] > M -1 [i+1] 4751 8 3 6 2OK 47158236NO
25
Non-Nesting Non-Nesting: 1 i,j n-1 and M -1 [i]<M -1 [j] i) M -1 [i] < M -1 [i+1] and M -1 [j] < M -1 [j+1] M -1 [i+1] < M -1 [j+1] ii) M -1 [i] > M -1 [i+1] and M -1 [j] > M -1 [j+1] M -1 [i+1] < M -1 [j+1] 4751 8 3 6 2OK 47158236NO
26
Characterization Theorem for Suffix Arrays on Binary Texts Theorem: Ascending to Max & Non-nesting Suffix Array Corollary: Clean method of breaking SA into segments Corollary: Linear time algorithm to check whether SA is valid
27
Cardinality Queries T= a b a a a b b a a a b a a b b # Remember lengths longest run of a’s and of b’s SA (broken by runs, but not stored explicitly ) 8 3 | 9 4 12| 1 10 5 13|16 | 7 2 11 15|6 14 B a, bit vector.. If SA -1 [i-1] in an “a” section store 1 in B a,[SA -1 [i]], else 0 B a 0 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 Create rank structure on B a, and similarly B b, (Note these are reversed except at #) Algorithm Count(T,P) s ← 1; e ← n; i ← m; while i>0 and se do if P[i[=a then s ← rank 1 (B a,s-1)+1; e ← rank 1 (B a,e) else s ← n a + 2 + rank 1 (B b,s-1); e ← n a + 1 +rank 1 (B b,e) i ← i-1 Return max(e-s+1,0) Time: O(length of query)
28
Listing Queries Complex methods Key idea: for queries of length at least d, index every d th position.. For T and forT(reversed) So we have matches for T[i..n] and T[1,i-1] View these as points in 2 space (Ferragina & Manzini and Grossi & Vitter) Do a range query (Alstrup et al) Variety of results follow
29
General Conclusion Interesting, and useful, combinatorial objects can be: Stored succinctly … O(lower bound) +o() So that Natural queries are performed in O(1) time (or at least very close) This can make the difference between using them and not …
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.