Discrete Methods in Mathematical Informatics
Kunihiko Sadakane, The University of Tokyo
BP Representation
Each node is represented by a pair of matching open and close parentheses, so a tree with $n$ nodes takes $2n$ bits. This size matches the information-theoretic lower bound up to lower-order terms.
Example (figure: an 8-node tree and its BP sequence): $P$ = ((()()())(()()))
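Here is a minimal Python sketch of how a BP sequence can be produced. A depth-first traversal writes "(" when it enters a node and ")" when it leaves it. The child-list input format and the node labels 1 to 8 in the usage example are hypothetical, because the labels in the original figure are not recoverable. The tree shape was chosen so that the output is the slide's example $P$.

```python
from typing import Dict, List


def to_bp(children: Dict[int, List[int]], root: int) -> str:
    """BP sequence of an ordered tree given as child lists.

    Each node emits '(' on entry and ')' on exit, so n nodes give 2n symbols.
    """
    out: List[str] = []
    # Iterative DFS over (node, already_entered) pairs avoids recursion limits.
    stack = [(root, False)]
    while stack:
        node, entered = stack.pop()
        if entered:
            out.append(")")
            continue
        out.append("(")
        stack.append((node, True))
        # Push children in reverse so that the leftmost child is visited first.
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return "".join(out)
```

As an example, take a root whose two children have three and two leaf children respectively. Then `to_bp({1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}, 1)` returns `"((()()())(()()))"`, which is 16 parentheses for 8 nodes.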
Data Structure for findclose
Divide the parentheses sequence into blocks of length $B = \frac{1}{2}\log n$.
$b(p)$: the number of the block containing $p$.
$\mu(p)$: the position of the parenthesis matching $p$.
A parenthesis $p$ is said to be far $\Leftrightarrow b(p) \neq b(\mu(p))$.
A far open parenthesis $p$ is said to be an opening pioneer $\Leftrightarrow$ for the far open parenthesis $q$ which immediately precedes $p$, $b(\mu(p)) \neq b(\mu(q))$.
The positions of the parentheses which match opening pioneers are represented by a 0,1 vector.
(Figure: far parentheses $p$, $q$, $r$ and their matching parentheses $\mu(p)$, $\mu(q)$, $\mu(r)$.)
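The definitions above translate into a short Python sketch. It uses 0-based positions and treats $b(p)$ as `p // B`. The slide does not say how to handle the first far open parenthesis, which has no preceding $q$. Counting it as a pioneer is our assumption.

```python
from typing import List


def match_positions(P: str) -> List[int]:
    """mu[p]: position of the parenthesis matching p (stack-based, linear time)."""
    mu = [-1] * len(P)
    stack: List[int] = []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P: str, B: int) -> List[int]:
    """Positions of the opening pioneers of P for block length B.

    Assumption: the first far open parenthesis (no preceding q) is a pioneer.
    """
    mu = match_positions(P)

    def b(i: int) -> int:  # block number containing position i
        return i // B

    pioneers: List[int] = []
    prev = None  # b(mu(q)) for the far open parenthesis q seen last
    for p, c in enumerate(P):
        if c != "(" or b(p) == b(mu[p]):
            continue  # not a far open parenthesis
        if prev is None or b(mu[p]) != prev:
            pioneers.append(p)
        prev = b(mu[p])
    return pioneers
```

With $B = 4$ on ((()()())(()())), the far open parentheses are at 0, 1 and 9. Their matches fall in blocks 3, 2 and 3. Each one changes the match block, so all three are pioneers.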
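Lemma: Let $\kappa$ denote the number of blocks. Then the number of opening pioneers is at most $2\kappa - 3$.
Proof: Consider the graph whose nodes correspond to the blocks and whose edges are the pairs $(b(p), b(\mu(p)))$. Because matching pairs never cross, this graph is outer-planar, and an outer-planar graph with $\kappa$ nodes has at most $2\kappa - 3$ edges. Distinct opening pioneers give distinct edges.
The opening/closing pioneers form a BP sequence again. Since $\kappa = 2n/B = 4n/\log n$, the length of this BP is $O(n/\log n)$.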
Representing Recursive Structure
Opening pioneers and their matching parentheses are represented by a 0,1 vector $B$. $B$ is a sparse vector of length $2n$ with $O(n/\log n)$ 1's, so it can be represented in $O(n \log\log n/\log n)$ bits.
Example (figure, blocks of length 4): $B$ = 0100 0101 0000 0000 0010 1001, and the marked parentheses form the pioneer BP $P_1$ = ((())).
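This is a sketch of how $B$ and the pioneer BP $P_1$ could be extracted from $P$. A plain Python list stands in for the sparse $O(n\log\log n/\log n)$-bit encoding of $B$, which is not shown. As before, counting the first far open parenthesis as a pioneer is an assumption.

```python
from typing import List, Optional, Tuple


def pioneer_bp(P: str, block_len: int) -> Tuple[List[int], str]:
    """Return (B, P1): 0/1 marks of opening pioneers and their matches, and the
    parentheses at the marked positions read left to right (the pioneer BP)."""
    mu = [-1] * len(P)
    stack: List[int] = []
    for i, c in enumerate(P):  # stack-based matching
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i

    marks = [0] * len(P)
    prev: Optional[int] = None  # block of mu(q) for the preceding far open q
    for p, c in enumerate(P):
        if c == "(" and p // block_len != mu[p] // block_len:  # far open
            if prev is None or mu[p] // block_len != prev:      # opening pioneer
                marks[p] = marks[mu[p]] = 1
            prev = mu[p] // block_len
    P1 = "".join(P[i] for i in range(len(P)) if marks[i])
    return marks, P1
```

$B$ marks whole matching pairs, so $P_1$ is always balanced. For ((()()())(()())) with blocks of length 4, this gives three pioneer pairs. That is within the lemma's bound $2\kappa - 3 = 5$.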
Let $S(n)$ denote the size of the BP representation for an $n$-node tree. Then
$S(n) = 2n + O(n \log\log n/\log n) + S(O(n/\log n))$.
Once the number of nodes becomes $O(n/\log^2 n)$, a naïve data structure which stores all the answers uses only $O(n/\log n)$ bits.
Therefore $S(n) = 2n + O(n \log\log n/\log n)$.
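Unrolling the recurrence two levels makes the final bound explicit. This follows the slide in assuming that each level shrinks the number of nodes by a $\log n$ factor. It also assumes that one stored answer takes $O(\log n)$ bits.

```latex
\begin{aligned}
S(n)   &= 2n + O\!\left(\frac{n\log\log n}{\log n}\right) + S(n_1),
        && n_1 = O\!\left(\frac{n}{\log n}\right)\\
S(n_1) &= 2n_1 + O\!\left(\frac{n_1\log\log n_1}{\log n_1}\right) + S(n_2)
        = O\!\left(\frac{n}{\log n}\right) + S(n_2),
        && n_2 = O\!\left(\frac{n}{\log^2 n}\right)\\
S(n_2) &= O(n_2)\cdot O(\log n) = O\!\left(\frac{n}{\log n}\right)
        && \text{(all answers stored explicitly)}\\
\Rightarrow\quad S(n) &= 2n + O\!\left(\frac{n\log\log n}{\log n}\right)
\end{aligned}
```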
Algorithm for findclose
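To compute $\mu(p) = \mathrm{findclose}(P,p)$:
If $p$ is not far, $\mu(p)$ is computed by a table lookup.
Otherwise, find the pioneer $p^*$ that immediately precedes $p$, and find $\mu(p^*)$ using the BP for the pioneers. If $p$ is itself a pioneer, this already gives $\mu(p)$.
If $p$ is not a pioneer, then $b(\mu(p)) = b(\mu(p^*))$. The position of $\mu(p)$ inside that block is determined from the difference between the depths of $p$ and $p^*$.
(Figure: $p^* < p < \mu(p) < \mu(p^*)$.)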
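This Python sketch follows the steps above under simplifying assumptions, so it is not the compact structure itself. Linear scans inside one block stand in for the table lookups. A dictionary from each pioneer to its match stands in for the recursive BP of pioneers. Inside the block of $\mu(p^*)$, the sketch locates $\mu(p)$ as the first position whose excess equals $E[p]-1$. That is one way to use the depth information the slide mentions. The helper names `build` and `findclose` are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


def build(P: str, B: int) -> Tuple[List[int], List[int], Dict[int, int]]:
    """Preprocessing: excess array E, opening pioneers, and mu() of each pioneer.

    The pioneer -> match dictionary stands in for the recursive BP of pioneers.
    Assumption: the first far open parenthesis counts as a pioneer.
    """
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    mu, stack = [-1] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    pioneers: List[int] = []
    pmatch: Dict[int, int] = {}
    prev: Optional[int] = None
    for p, c in enumerate(P):
        if c == "(" and p // B != mu[p] // B:
            if prev is None or mu[p] // B != prev:
                pioneers.append(p)
                pmatch[p] = mu[p]
            prev = mu[p] // B
    return E, pioneers, pmatch


def findclose(n: int, B: int, E: List[int], pioneers: List[int],
              pmatch: Dict[int, int], p: int) -> int:
    """mu(p) for an open parenthesis p of a sequence of length n."""
    target = E[p] - 1  # excess right after the matching ')'

    def first_at(lo: int, hi: int) -> int:
        # In-block scan; stands in for the precomputed table.
        for j in range(lo, hi):
            if E[j] == target:
                return j
        return -1

    j = first_at(p + 1, min((p // B + 1) * B, n))
    if j != -1:
        return j  # p is not far
    p_star = pioneers[bisect_right(pioneers, p) - 1]  # nearest pioneer at or before p
    m_star = pmatch[p_star]
    if p_star == p:
        return m_star  # p is itself a pioneer
    # b(mu(p)) = b(mu(p*)) and mu(p) < mu(p*): scan that block up to mu(p*).
    return first_at((m_star // B) * B, m_star + 1)
```

The scan in the last line is sound for the following reason. $p$ lies in an earlier block, so every position between the block start and $\mu(p)$ is inside $p$ and has excess at least $E[p]$.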
enclose
Let enclose(P,p) denote the open parenthesis of the pair that most tightly encloses $p$.
If enclose(P,p) is in the same block as $p$, it is found from a table.
If it is in a different block, store those positions, and also store the positions of their matching parentheses. If there is more than one such pair of parentheses, store only the outermost one.
Recur on the extracted parentheses.
Range Min-Max Trees
Existing succinct data structures for trees add a new index for each operation to be supported. The $o(n)$-bit terms therefore accumulate and cannot be ignored. For example, the recursive method [6] uses $3.73n$ bits to support only findopen, findclose and enclose. It is preferable if various operations can be supported by a single index.
Definitions
For a vector $P[0..2n-1]$ and a function $g$:
RMQ and RMQi are defined similarly, using the range maximum.
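The formal definitions did not survive extraction. As a hedged stand-in, this sketch uses the conventions that later slides imply for the excess array $E$. fwd_search(E, i, d) returns the first $j > i$ with $E[j] = E[i] + d$, as on the fwd_search slide. The following choices are assumptions: bwd_search as the mirror image of fwd_search, the leftmost tie-breaking in rmqi/RMQi, and working on $E$ instead of on $(P, g)$.

```python
from typing import List, Optional


def excess(P: str) -> List[int]:
    """E[i] = g(P[0]) + ... + g(P[i]) with g('(') = 1 and g(')') = -1."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def fwd_search(E: List[int], i: int, d: int) -> Optional[int]:
    """First j > i with E[j] == E[i] + d (None if there is none)."""
    for j in range(i + 1, len(E)):
        if E[j] == E[i] + d:
            return j
    return None


def bwd_search(E: List[int], i: int, d: int) -> Optional[int]:
    """Last j < i with E[j] == E[i] + d; -1 denotes a virtual prefix with E[-1] = 0."""
    for j in range(i - 1, -1, -1):
        if E[j] == E[i] + d:
            return j
    return -1 if E[i] + d == 0 else None


def rmqi(E: List[int], i: int, j: int) -> int:
    """Leftmost position of the minimum of E[i..j]."""
    return min(range(i, j + 1), key=E.__getitem__)


def RMQi(E: List[int], i: int, j: int) -> int:
    """Leftmost position of the maximum of E[i..j]."""
    return max(range(i, j + 1), key=E.__getitem__)
```

These are linear-time reference versions. The range min-max tree on the following slides is what makes them fast.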
How to support operations on a balanced parentheses sequence
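Lemma: Let $\pi$ be a function s.t. $\pi(\mathtt{(}) = 1$ and $\pi(\mathtt{)}) = -1$, and let $E$ be the excess array, $E[i] = \sum_{k=0}^{i} \pi(P[k])$.
(Figure: $P$ = (()((()())())(()())()) and its excess array $E$, with findclose and enclose illustrated on $E$.)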
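The lemma's conclusion was lost in extraction. This hedged sketch shows how findclose, findopen and enclose can be reduced to searches on $E$, in the spirit of [5]. We derived the offsets $-1$, $0$ and $-2$ from the definition of excess; they are not taken from the slide. Searches are linear scans here.

```python
from typing import List, Optional


def excess(P: str) -> List[int]:
    """E[i] = pi(P[0]) + ... + pi(P[i]) with pi('(') = 1, pi(')') = -1."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def fwd_search(E: List[int], i: int, d: int) -> Optional[int]:
    """First j > i with E[j] == E[i] + d."""
    return next((k for k in range(i + 1, len(E)) if E[k] == E[i] + d), None)


def bwd_search(E: List[int], i: int, d: int) -> Optional[int]:
    """Last j < i with E[j] == E[i] + d; -1 is a virtual prefix with excess 0."""
    j = next((k for k in range(i - 1, -1, -1) if E[k] == E[i] + d), None)
    if j is None and E[i] + d == 0:
        return -1
    return j


def findclose(E: List[int], i: int) -> Optional[int]:
    # '(' at i: the first later position where the excess falls below E[i].
    return fwd_search(E, i, -1)


def findopen(E: List[int], i: int) -> Optional[int]:
    # ')' at i: one past the last earlier position with the same excess.
    j = bwd_search(E, i, 0)
    return None if j is None else j + 1


def enclose(E: List[int], i: int) -> Optional[int]:
    # '(' at i: one past the last earlier position two levels lower (the parent).
    j = bwd_search(E, i, -2)
    return None if j is None else j + 1
```

On the slide's sequence, findclose of position 3 is 12 and enclose of position 5 is 4. The root (position 0) has no enclosing pair, so enclose returns `None`.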
Implementing rank/select
Let $\varphi$, $\psi$ be functions s.t. $\varphi(0) = 0$, $\varphi(1) = 1$, $\psi(0) = 1$, $\psi(1) = 0$. Then rank/select and parentheses operations can be handled in a unified manner.
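This sketch shows one reading of the unification. Prefix sums of $\varphi$ count 1s and prefix sums of $\psi$ count 0s. So rank is a prefix sum, and select$(k)$ is the first position where that prefix sum reaches $k$, which is a forward search. We assume inclusive ranks and 1-based $k$. The linear scan in `select` stands in for a fast search structure.

```python
from typing import Callable, List, Optional


def prefix_sums(bits: List[int], g: Callable[[int], int]) -> List[int]:
    """S[i] = g(bits[0]) + ... + g(bits[i])."""
    S, s = [], 0
    for b in bits:
        s += g(b)
        S.append(s)
    return S


def phi(b: int) -> int:  # phi(0) = 0, phi(1) = 1: counts 1s
    return b


def psi(b: int) -> int:  # psi(0) = 1, psi(1) = 0: counts 0s
    return 1 - b


def rank(S: List[int], i: int) -> int:
    """Number of counted symbols in bits[0..i] (inclusive)."""
    return S[i]


def select(S: List[int], k: int) -> Optional[int]:
    """Position of the k-th counted symbol: first j with S[j] == k.

    Prefix sums grow by 0 or 1, so this forward search finds the k-th occurrence.
    """
    for j, s in enumerate(S):
        if s == k:
            return j
    return None
```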
Range Min-Max Tree
Divide the excess array $E$ into blocks of length $s$.
Each leaf of the range min-max tree corresponds to a block and stores the min/max values in the block. Internal nodes have $\ell$ children and store the min/max values of their children.
Example ($s = \ell = 3$, $P$ = (()((()())())(()())())): the leaves store m/M = 1/2, 2/4, 3/4, 2/3, 1/3, 2/3, 1/2, 0/0; the internal nodes store 1/4, 1/3, 0/2; the root stores 0/4.
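Here is a small Python sketch of building the min/max levels of a range min-max tree, leaves first. Storing each level as a list of `(min, max)` tuples is an illustrative layout, not the compact encoding. Every group of `ell` consecutive nodes becomes one parent, and the last group may be shorter.

```python
from typing import List, Tuple

MinMax = Tuple[int, int]


def excess(P: str) -> List[int]:
    """E[i] = (#'(' minus #')') in P[0..i]."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def build_rmm(E: List[int], s: int, ell: int) -> List[List[MinMax]]:
    """Levels of a range min-max tree, leaves first; each entry is (min, max).

    Leaves cover blocks of s consecutive entries of E; each internal node
    combines up to ell consecutive children.
    """
    level = [(min(E[k:k + s]), max(E[k:k + s])) for k in range(0, len(E), s)]
    levels = [level]
    while len(level) > 1:
        level = [
            (min(m for m, _ in level[k:k + ell]), max(M for _, M in level[k:k + ell]))
            for k in range(0, len(level), ell)
        ]
        levels.append(level)
    return levels
```

With `s = ell = 3` on the slide's sequence, this reproduces the example values. The leaves are 1/2, 2/4, 3/4, 2/3, 1/3, 2/3, 1/2, 0/0, the internal nodes are 1/4, 1/3, 0/2, and the root is 0/4.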
Properties of Range Min-Max Trees
Each node corresponds to a range of the array. Any range of the array is represented by a disjoint union of $O(\ell h)$ ranges corresponding to internal nodes and at most two ranges corresponding to leaves ($h$: tree height).
Properties of Excess Array
For each $i$, $E[i+1] = E[i] - 1$ or $E[i] + 1$.
Let the min/max of $E[u,v]$ be $a$ and $b$. Then every integer $e$ with $a \le e \le b$ appears in the range, and no other value does.
In a range of length $\ell$, the difference between min and max is at most $\ell - 1$ ⇒ the values can be stored in fewer bits.
(Figure: $P$ = (()((()())())(()())()) with its positions and its excess array $E$.)
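The check below makes the three properties concrete on a window of $E$. `offset_bits` is an illustrative assumption about what "fewer bits" can mean. It assumes each value of a length-$\ell$ range is stored relative to the range's first value. The offsets then lie in $[-(\ell-1), \ell-1]$, which is $2\ell - 1$ choices instead of a full $\log n$-bit value.

```python
from typing import List, Tuple


def excess(P: str) -> List[int]:
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def range_min_max(E: List[int], u: int, v: int) -> Tuple[int, int]:
    """Min/max of E[u..v], after sanity-checking the properties on the slide."""
    window = E[u:v + 1]
    a, b = min(window), max(window)
    if any(abs(x - y) != 1 for x, y in zip(window, window[1:])):
        raise ValueError("adjacent excess values must differ by exactly 1")
    # Steps of +-1 cannot skip a value, so exactly the integers a..b occur.
    assert set(window) == set(range(a, b + 1))
    assert b - a <= len(window) - 1
    return a, b


def offset_bits(length: int) -> int:
    """ceil(lg(2*length - 1)): bits for a value stored relative to the first one."""
    return (2 * length - 2).bit_length()
```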
Computation of fwd_search(E,i,d)
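Divide the range $E[i+1, N-1]$ ($N$: array length) into ranges of the range min-max tree (parts of leaves and internal nodes).
Scan the divided ranges from left to right to find the first range whose min/max bracket the value $E[i]+d$. By the property of the excess array, the answer lies in that range.
This takes $O(\ell h + s)$ time.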
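This sketch of fwd_search on a range min-max tree assumes the list-of-levels layout used in the earlier tree sketch, repeated here so the block runs on its own. It first scans the rest of $i$'s leaf. It then climbs, checking right siblings level by level in left-to-right order. At the first node whose $[m, M]$ contains the target, it descends to the leaf holding the answer.

```python
from typing import List, Optional, Tuple

MinMax = Tuple[int, int]


def excess(P: str) -> List[int]:
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def build_rmm(E: List[int], s: int, ell: int) -> List[List[MinMax]]:
    """Levels of the range min-max tree (leaves first); entries are (min, max)."""
    level = [(min(E[k:k + s]), max(E[k:k + s])) for k in range(0, len(E), s)]
    levels = [level]
    while len(level) > 1:
        level = [(min(m for m, _ in level[k:k + ell]), max(M for _, M in level[k:k + ell]))
                 for k in range(0, len(level), ell)]
        levels.append(level)
    return levels


def fwd_search(E: List[int], levels: List[List[MinMax]], s: int, ell: int,
               i: int, d: int) -> Optional[int]:
    """First j > i with E[j] == E[i] + d, visiting node ranges left to right."""
    t = E[i] + d

    def scan_leaf(leaf: int, start: int) -> Optional[int]:
        for j in range(start, min((leaf + 1) * s, len(E))):
            if E[j] == t:
                return j
        return None

    def descend(h: int, node: int) -> Optional[int]:
        # Enter the first child whose [m, M] contains t until a leaf is reached.
        while h > 0:
            h -= 1
            first = node * ell
            node = next(c for c in range(first, min(first + ell, len(levels[h])))
                        if levels[h][c][0] <= t <= levels[h][c][1])
        return scan_leaf(node, node * s)

    hit = scan_leaf(i // s, i + 1)  # 1) rest of the leaf containing i
    if hit is not None:
        return hit
    node = i // s
    for h, level in enumerate(levels):  # 2) right siblings, level by level
        last = min((node // ell + 1) * ell, len(level))
        for sib in range(node + 1, last):
            m, M = level[sib]
            if m <= t <= M:  # by the +-1 property, t occurs in this range
                return descend(h, sib)  # 3) go down to the answer
        node //= ell
    return None
```

With $d = -1$ this is findclose. For example, from position 3 of the slide's sequence it returns 12.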
The case where the array is short (polylogarithmic length)
Let $w$ be the word length (in bits) of the CPU.
Lemma: If $N < w^c$, fwd_search is done in $O(c^2)$ time, and the data structure size is $N + O(Nc/w) + \exp(w)$ bits.
Proof: Excess values are between $-w^c$ and $w^c$ ⇒ each takes $O(c \log w)$ bits, so $w/(c \log w)$ values can be read simultaneously. If the branching factor $\ell$ of the range min-max tree is $w/\log w$, the height of the tree is $O(c)$. Searching a child takes $O(c)$ time, because the $\ell$ values of a node occupy $O(c)$ words. Hence the total time is $O(c^2)$.
Computation of LCA
$\mathrm{lca}(v,w) = \mathrm{parent}(\mathrm{rmqi}(v,w)+1)$ for $v < w$
rmqi: the position of the minimum value in $E[v,w]$, computed in constant time using the range min-max tree. The maximum-depth node is found similarly (using the range maximum).
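Here is a linear-time Python sketch of the LCA formula, with open-parenthesis positions as node identifiers. Two assumptions apply: `parent` is implemented as enclose on the excess array, and plain scans stand in for the constant-time range min-max tree queries. Swapping $v$ and $w$ so that $v < w$, and the $v = w$ case, are our additions.

```python
from typing import List, Optional


def excess(P: str) -> List[int]:
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def parent(E: List[int], x: int) -> Optional[int]:
    """Open parenthesis enclosing the node opened at x (None for the root)."""
    target = E[x] - 2
    for j in range(x - 1, -1, -1):
        if E[j] == target:
            return j + 1
    return 0 if target == 0 else None  # j = -1: virtual prefix with excess 0


def rmqi(E: List[int], i: int, j: int) -> int:
    """Position of the (leftmost) minimum of E[i..j]; linear scan for clarity."""
    return min(range(i, j + 1), key=E.__getitem__)


def lca(E: List[int], v: int, w: int) -> Optional[int]:
    """lca(v, w) = parent(rmqi(v, w) + 1), with v < w open positions."""
    if v == w:
        return v
    if v > w:
        v, w = w, v
    return parent(E, rmqi(E, v, w) + 1)
```

The minimum of $E[v,w]$ sits on the closing parenthesis of a child of the LCA, so the next position opens another child of the LCA. If $v$ is an ancestor of $w$, the minimum is at $v$ itself and the formula returns $v$.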
Sparse Table Algorithm
For each interval $[i, i+2^k-1]$ in the array $E[1,m]$, store its minimum value in $M[i,k]$ ($i = 1, \dots, m$; $k = 0, 1, \dots, \lfloor \lg m \rfloor$).
For a given query interval $[s,b]$, let $k = \lfloor \lg(b-s+1) \rfloor$. Compare $M[s,k]$ and $M[b-2^k+1,k]$, and output the minimum.
This takes $O(1)$ time and $O(m \lg^2 m)$ bits of space.
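A compact Python sketch of the sparse table follows. It uses 0-based indices instead of the slide's $E[1,m]$ and stores row $k$ of $M$ as a list. A query covers $[s,b]$ with two overlapping intervals of length $2^k$.

```python
from typing import List


class SparseTableRMQ:
    """Sparse table: M[k][i] = min of E[i .. i + 2**k - 1] (0-based)."""

    def __init__(self, E: List[int]) -> None:
        self.M = [list(E)]  # k = 0: intervals of length 1
        k = 1
        while (1 << k) <= len(E):
            prev, half = self.M[k - 1], 1 << (k - 1)
            # Each length-2^k interval is the union of two length-2^(k-1) ones.
            self.M.append([min(prev[i], prev[i + half])
                           for i in range(len(E) - (1 << k) + 1)])
            k += 1

    def query(self, s: int, b: int) -> int:
        """Minimum of E[s..b] (inclusive) from two table lookups."""
        k = (b - s + 1).bit_length() - 1  # floor(lg(b - s + 1))
        return min(self.M[k][s], self.M[k][b - (1 << k) + 1])
```

The table has about $m \lg m$ entries of $\lg m$ bits each, which is where the $O(m \lg^2 m)$-bit bound comes from.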
This data structure is used once the length of $B$ becomes $O(n/\lg^3 n)$ ⇒ $o(n)$-bit space.
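The $o(n)$ claim follows from plugging $m = O(n/\lg^3 n)$ into the sparse table's $O(m \lg^2 m)$-bit bound, using $\lg m \le \lg n$:

```latex
m = O\!\left(\frac{n}{\lg^3 n}\right),\quad \lg m \le \lg n
\;\Rightarrow\;
O(m \lg^2 m) = O\!\left(\frac{n}{\lg^3 n}\cdot \lg^2 n\right) = O\!\left(\frac{n}{\lg n}\right) = o(n).
```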
Computation of Degree
Let $[v,w]$ be the range of $E$ corresponding to a node $v$, where $w$ is the position of its closing parenthesis. Then $\deg(v)$ = the number of minimum values in $E[v+1, w-1]$.
In each node of the range min-max tree, also store the number of minimum values in its range. The $i$-th child can then be found in the same way.
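This Python sketch computes the degree and the $i$-th child by scanning the interior of $v$ directly. The scans stand in for the per-node minimum counts that the slide stores in the range min-max tree. The function names are hypothetical. Each child's closing parenthesis brings the excess back to the interior minimum $E[v]$, so the $(i-1)$-th minimum is followed by child $i$.

```python
from typing import List, Optional


def excess(P: str) -> List[int]:
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def findclose(E: List[int], v: int) -> int:
    """Closing position w of the node opened at v: first j > v with E[j] = E[v] - 1."""
    return next(j for j in range(v + 1, len(E)) if E[j] == E[v] - 1)


def degree(E: List[int], v: int) -> int:
    """Number of positions holding the minimum of E[v+1 .. w-1]."""
    inner = E[v + 1:findclose(E, v)]
    return inner.count(min(inner)) if inner else 0  # a leaf has an empty interior


def child(E: List[int], v: int, i: int) -> Optional[int]:
    """Open position of the i-th child (1-based); the first child starts at v + 1."""
    if not 1 <= i <= degree(E, v):
        return None
    if i == 1:
        return v + 1
    w = findclose(E, v)
    lo = min(E[v + 1:w])
    minima = [j for j in range(v + 1, w) if E[j] == lo]
    return minima[i - 2] + 1  # child i starts right after the (i-1)-th minimum
```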
The case where the array is long
Divide the sequence into blocks of length $w^c$. Let $M_1, \dots, M_t$ and $m_1, \dots, m_t$ be the max/min values of the blocks.
To compute fwd_search$(E,i,d)$: if $E[i]+d$ is smaller than the minimum value of the block containing $i$, the block containing the answer is the first block $j$ after it with $m_j \le E[i]+d$.
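This Python sketch shows the two-level idea with explicit per-block $(m_j, M_j)$ values. Here `blen` plays the role of $w^c$. A linear scan over the blocks stands in for the constant-time machinery of the full method. As a generalisation of our own, the sketch tests $m_j \le t \le M_j$, which also handles targets above the current excess. When the target lies below every remaining value, this reduces to the slide's test $m_j \le E[i]+d$.

```python
from typing import List, Optional, Tuple


def excess(P: str) -> List[int]:
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def block_summaries(E: List[int], blen: int) -> List[Tuple[int, int]]:
    """(m_j, M_j): min and max of each block of length blen."""
    return [(min(E[k:k + blen]), max(E[k:k + blen])) for k in range(0, len(E), blen)]


def fwd_search(E: List[int], blocks: List[Tuple[int, int]], blen: int,
               i: int, d: int) -> Optional[int]:
    """First j > i with E[j] == E[i] + d, using per-block min/max values."""
    t = E[i] + d
    b = i // blen
    for j in range(i + 1, min((b + 1) * blen, len(E))):  # rest of i's block
        if E[j] == t:
            return j
    # First later block whose [m_j, M_j] contains t (linear scan as a stand-in).
    for k in range(b + 1, len(blocks)):
        m, M = blocks[k]
        if m <= t <= M:
            start = k * blen
            return next(j for j in range(start, min(start + blen, len(E))) if E[j] == t)
    return None
```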
Other Queries
RMQ is done by the sparse table algorithm. Because the number of blocks is small ($n/w^c$), its space can be ignored.
Theorem: There exists a data structure supporting all known operations on ordered trees in $O(1)$ time using $2n + O(n/\log n)$ bits.
Further Reducing the Space
Use the augmented B-tree of “Succincter” [7]. An augmented B-tree for an array $A[1..n]$ adds a value to each node. The values are computed from those of the child nodes and the subtree size. The range min-max tree is an augmented B-tree.
Theorem: $2n + O(n/\log^c n)$ bits ($c > 0$ is an arbitrary constant).
References
[1] Kunihiko Sadakane, Daisuke Watanabe. Practical data structures for the document listing problem (文書列挙問題に対する実用的なデータ構造). DBSJ Letters (日本データベース学会Letters), Vol. 2, No. 1, pp.
[2] Michael A. Bender, Martin Farach-Colton. The LCA Problem Revisited. LATIN 2000: 88-94.
[3] Kunihiko Sadakane. Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1): (2007).
[4] Johannes Fischer. Optimal Succinctness for Range Minimum Queries. LATIN 2010:
[5] Kunihiko Sadakane, Gonzalo Navarro. Fully-Functional Succinct Trees. SODA 2010:
[6] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu. Succincter. Proc. FOCS, 2008.