Discrete Methods in Mathematical Informatics

1 Discrete Methods in Mathematical Informatics
Kunihiko Sadakane The University of Tokyo

2 BP Representation
Each node is represented by a pair of matching open and close parentheses.
2n bits for n nodes.
The size matches the lower bound.
[Figure: an 8-node ordered tree with nodes numbered 1 to 8 and its BP sequence P = ((()()())(()()))]
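As a minimal sketch of how the BP sequence arises from a depth-first traversal: the `children` dictionary, the function name `to_bp`, and the node numbering below are assumptions made for illustration; only the resulting sequence is taken from the slide.

```python
def to_bp(children, root):
    """Return the balanced-parentheses string of an ordered tree.

    `children` maps a node to the ordered list of its children
    (nodes without an entry are leaves).
    """
    out = []
    stack = [(root, False)]          # (node, is_closing_visit)
    while stack:
        node, closing = stack.pop()
        if closing:
            out.append(")")          # leaving the node
            continue
        out.append("(")              # entering the node
        stack.append((node, True))
        # push children right to left so the leftmost child is visited first
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return "".join(out)


# Hypothetical numbering of an 8-node tree with the shape shown on the slide.
tree = {1: [2, 6], 2: [3, 4, 5], 6: [7, 8]}
bp = to_bp(tree, 1)
print(bp, len(bp))   # 2 characters (bits) per node
```

Each node contributes exactly one "(" and one ")", which is where the 2n-bit size comes from.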

3 Data Structure for findclose
Divide the parentheses sequence into blocks of length B = ½ log n.
b(p): the number of the block containing p.
μ(p): the position of the parenthesis matching p.
A parenthesis p is said to be far ⇔ b(p) ≠ b(μ(p)).
A far open parenthesis p is said to be an opening pioneer ⇔ for the far open parenthesis q that immediately precedes p, b(μ(p)) ≠ b(μ(q)).
The positions of the parentheses that match opening pioneers are represented by a 0,1 vector.
[Figure: far open parentheses q, p, r and their matches μ(q), μ(p), μ(r)]
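The definitions of far parentheses and opening pioneers can be checked with a brute-force sketch. The function names, the 0-based positions, the block length 4, and the rule that the first far open parenthesis counts as a pioneer are assumptions chosen for illustration; the example sequence is the one used in the later slides.

```python
def match_table(P):
    """mu[p] = position of the parenthesis matching position p."""
    mu, stack = [None] * len(P), []
    for i, c in enumerate(P):
        if c == "(":
            stack.append(i)
        else:
            j = stack.pop()
            mu[i], mu[j] = j, i
    return mu


def opening_pioneers(P, B):
    """Positions of opening pioneers for block length B (naive scan)."""
    mu = match_table(P)
    pioneers, prev_far = [], None
    for p, c in enumerate(P):
        if c != "(" or p // B == mu[p] // B:
            continue                      # not a far open parenthesis
        # pioneer: its match lies in a different block than the match
        # of the previous far open parenthesis (or there is none)
        if prev_far is None or mu[p] // B != mu[prev_far] // B:
            pioneers.append(p)
        prev_far = p
    return pioneers


P = "(()((()())())(()())())"
print(opening_pioneers(P, 4))
```

With 6 blocks the count stays below the 2·6 − 3 = 9 bound stated in the lemma that follows.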

4 Lemma
Lemma: Let τ denote the number of blocks. Then the number of opening pioneers is at most 2τ − 3.
Proof: The graph whose nodes correspond to the blocks and whose edges are (b(p), b(μ(p))) is an outerplanar graph.
Opening/closing pioneers form a BP again.
τ = n/B = 2n/log n ⇒ the length of this BP is O(n/log n).

5 Representing Recursive Structure
Opening pioneers and their matching parentheses are represented by a 0,1 vector B.
B is a sparse vector of length 2n with O(n/log n) 1's.
It can be represented in O(n log log n/log n) bits.
[Figure: sequence P with far parentheses q, p, r and their matches μ(q), μ(p), μ(r); marker vector B = 0100 0101 0000 0000 0010 1001; extracted pioneer sequence P1 = ((()))]

6 Let S(n) denote the size of the BP representation for an n-node tree
S(n) = 2n + O(n log log n/log n) + S(O(n/log n))
If the number of nodes becomes O(n/log² n), a naïve data structure that stores all the answers uses only O(n/log n) bits.
Therefore S(n) = 2n + O(n log log n/log n).
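The recurrence closes after two levels of recursion. A sketch of the unrolling, where n_1 and n_2 are names introduced here for the sizes of the first and second pioneer levels, and the naïve structure is assumed to spend O(log n) bits per stored answer:

```latex
\begin{align*}
n_1 &= O\!\left(\frac{n}{\log n}\right), \qquad
n_2 = O\!\left(\frac{n_1}{\log n_1}\right) = O\!\left(\frac{n}{\log^2 n}\right),\\
S(n) &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right) + S(n_1),\\
S(n_1) &= 2n_1 + O\!\left(\frac{n_1 \log\log n_1}{\log n_1}\right)
        + \underbrace{O(n_2 \log n)}_{\text{na\"ive table}}
      = O\!\left(\frac{n}{\log n}\right),\\
S(n) &= 2n + O\!\left(\frac{n \log\log n}{\log n}\right).
\end{align*}
```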

7 Algorithm for findclose
To compute μ(p) = findclose(P, p):
If p is not far, μ(p) is computed by table lookup.
Otherwise, find the pioneer p* that immediately precedes p.
Find μ(p*) using the BP for pioneers.
If p is not a pioneer, b(μ(p)) = b(μ(p*)).
The position of μ(p) within that block is determined from the difference between the depths of p and p*.
[Figure: pioneer p* preceding p, with their matches μ(p) and μ(p*)]

8 enclose
To compute enclose(P, p):
If b(enclose(P, p)) = b(p), the answer is found from a table.
If b(enclose(P, p)) ≠ b(p), store those positions, and also store the positions of their matching parentheses.
If there is more than one such pair of parentheses, store only the outermost one.
Recur on the extracted parentheses.

9 Range Min-Max Trees
In existing succinct data structures for trees, a new index is added for each operation to be supported.
The o(n) term cannot be ignored.
The recursive method [6] uses 3.73n bits to support only findopen, findclose, and enclose.
It is preferable if various operations can be supported by a single index.

10 Definitions
For a vector P[0..2n−1] and a function g
RMQ and RMQi are defined similarly (range maximum).
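The formulas of this slide are not preserved in the transcript. A plausible reading, following the conventions of [5], is sketched below; the exact index ranges are an assumption and may differ from the original slide.

```latex
\begin{align*}
\mathrm{sum}(P,g,i,j) &= \sum_{k=i}^{j} g(P[k])\\
\mathrm{fwd\_search}(P,g,i,d) &= \min\{\, j \ge i \mid \mathrm{sum}(P,g,i,j) = d \,\}\\
\mathrm{bwd\_search}(P,g,i,d) &= \max\{\, j \le i \mid \mathrm{sum}(P,g,j,i) = d \,\}\\
\mathrm{rmq}(P,g,i,j) &= \min_{i \le k \le j} \mathrm{sum}(P,g,i,k)\\
\mathrm{rmqi}(P,g,i,j) &= \operatorname*{argmin}_{i \le k \le j} \mathrm{sum}(P,g,i,k)
\end{align*}
```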

11 How to support operations on balanced parentheses sequence
Lemma: Let π be a function s.t. π(() = 1, π()) = −1
[Figure: P = (()((()())())(()())()) with its excess array E; findclose and enclose are illustrated on the sequence]
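A naive sketch of how parentheses operations reduce to searches over sums of π. It assumes the conventions of [5]: fwd_search returns the smallest j ≥ i with π(P[i]) + … + π(P[j]) = d, and bwd_search the largest j ≤ i with π(P[j]) + … + π(P[i]) = d. Positions are 0-based, and the linear scans stand in for the indexed versions described later.

```python
def pi(c):
    """pi('(') = 1, pi(')') = -1."""
    return 1 if c == "(" else -1


def fwd_search(P, i, d):
    """Smallest j >= i with pi(P[i]) + ... + pi(P[j]) == d, else None."""
    total = 0
    for j in range(i, len(P)):
        total += pi(P[j])
        if total == d:
            return j
    return None


def bwd_search(P, i, d):
    """Largest j <= i with pi(P[j]) + ... + pi(P[i]) == d, else None."""
    total = 0
    for j in range(i, -1, -1):
        total += pi(P[j])
        if total == d:
            return j
    return None


def findclose(P, i):
    return fwd_search(P, i, 0)    # sum returns to 0 at the matching ')'


def findopen(P, i):
    return bwd_search(P, i, 0)    # symmetric, scanning leftwards


def enclose(P, i):
    return bwd_search(P, i, 2)    # two unmatched '(' : i itself and its parent


P = "(()((()())())(()())())"
print(findclose(P, 3), findopen(P, 18), enclose(P, 5))
```

All three operations are the same search with different targets, which is the point of treating them through a single index.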

12 Implementing rank/select
Let φ, ψ be functions s.t. φ(0)=0, φ(1)=1, ψ(0)=1, ψ(1)=0.
rank/select and parentheses operations can then be handled in a unified manner.
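A small sketch of that unification: rank becomes a prefix sum and select becomes a forward search, with φ counting 1s and ψ counting 0s. The helper names (`g_sum`, `rank1`, `select1`, and so on) and the inclusive, 0-based indexing are assumptions for illustration.

```python
def phi(b):
    return b          # phi(0) = 0, phi(1) = 1: counts ones


def psi(b):
    return 1 - b      # psi(0) = 1, psi(1) = 0: counts zeros


def g_sum(bits, g, i, j):
    """g(bits[i]) + ... + g(bits[j]), both ends inclusive."""
    return sum(g(b) for b in bits[i:j + 1])


def fwd_search(bits, g, i, d):
    """Smallest j >= i whose running sum from i reaches d, else None."""
    total = 0
    for j in range(i, len(bits)):
        total += g(bits[j])
        if total == d:
            return j
    return None


def rank1(bits, i):
    return g_sum(bits, phi, 0, i)        # number of 1s in bits[0..i]


def rank0(bits, i):
    return g_sum(bits, psi, 0, i)        # number of 0s in bits[0..i]


def select1(bits, k):
    return fwd_search(bits, phi, 0, k)   # position of the k-th 1 (k >= 1)


def select0(bits, k):
    return fwd_search(bits, psi, 0, k)   # position of the k-th 0 (k >= 1)


bits = [0, 1, 0, 0, 0, 1, 0, 1]
print(rank1(bits, 5), select1(bits, 2), rank0(bits, 5), select0(bits, 3))
```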

13 Range Min-Max Tree
Divide the excess array E into blocks of length s.
Each leaf of the range min-max tree corresponds to a block and stores the min/max values in the block.
Each internal node has l children and stores the min/max values of its children.
[Figure: range min-max tree over the excess array E of P = (()((()())())(()())()) with s = l = 3; nodes are labeled m/M, with values such as 1/2, 2/4, 3/4, 2/3, 1/3, 0/0 at the lower level and 1/4, 0/2, 0/4 above]
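The construction can be sketched level by level. The function names and the list-of-levels layout are assumptions for illustration, not the representation used in the actual data structure; the input is the slide's example with s = l = 3.

```python
def excess_array(P):
    """E[i] = (# of '(') - (# of ')') in P[0..i]."""
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def build_rmm_levels(E, s, l):
    """levels[0] holds (min, max) per block of length s;
    each further level groups l consecutive nodes of the level below."""
    level = [(min(E[k:k + s]), max(E[k:k + s])) for k in range(0, len(E), s)]
    levels = [level]
    while len(level) > 1:
        level = [(min(m for m, _ in level[k:k + l]),
                  max(M for _, M in level[k:k + l]))
                 for k in range(0, len(level), l)]
        levels.append(level)
    return levels


P = "(()((()())())(()())())"
levels = build_rmm_levels(excess_array(P), s=3, l=3)
for depth, nodes in enumerate(levels):
    print(depth, ["%d/%d" % node for node in nodes])
```

The printed m/M labels can be compared with those in the slide's figure; the last level is the root, 0/4.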

14 Properties of Range Min-Max Trees
Each node corresponds to a range of the array.
Any range of the array is represented by a disjoint union of O(lh) ranges corresponding to internal nodes and at most two ranges corresponding to leaves (h: tree height).
[Figure: range min-max tree over the excess array E of P = (()((()())())(()())()) with s = l = 3]

15 Properties of Excess Array
For each i, E[i+1] = E[i]−1 or E[i]+1.
Let the min/max of E[u,v] be a and b; then every integer e s.t. a ≤ e ≤ b occurs in the range, and no other value does.
In a range of length l, the difference between min and max is at most l−1 ⇒ values can be stored in fewer bits.
[Figure: P = (()((()())())(()())()) with its nodes numbered 1 to 11 and its excess array E]

16 Computation of fwd_search(E,i,d)
Divide the range E[i+1, N−1] (N: array length) into ranges corresponding to nodes of the tree.
Scan the divided ranges from left to right to find the range containing the value E[i]+d.
O(lh+s) time.
[Figure: range min-max tree over the excess array E of P = (()((()())())(()())())]
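The search can be sketched as follows, using the convention stated on this slide: find the smallest j > i with E[j] = E[i] + d, so that findclose(i) corresponds to d = −1. The class name, the list-of-levels layout, and the recursive descent are assumptions for illustration; a node is skipped when it lies entirely left of the search range or when the target is outside its [min, max].

```python
def excess_array(P):
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


class RangeMinMaxTree:
    def __init__(self, E, s=3, l=3):
        self.E, self.s, self.l = E, s, l
        level = [(min(E[k:k + s]), max(E[k:k + s]))
                 for k in range(0, len(E), s)]
        self.levels = [level]                 # levels[0] = leaves
        while len(level) > 1:
            level = [(min(m for m, _ in level[k:k + l]),
                      max(M for _, M in level[k:k + l]))
                     for k in range(0, len(level), l)]
            self.levels.append(level)

    def fwd_search(self, i, d):
        """Smallest j > i with E[j] == E[i] + d, or None."""
        E, s = self.E, self.s
        target = E[i] + d
        # 1. scan the remainder of the block containing i
        for j in range(i + 1, min(len(E), (i // s + 1) * s)):
            if E[j] == target:
                return j
        # 2. search the blocks to the right through the tree
        return self._descend(len(self.levels) - 1, 0, i // s + 1, target)

    def _descend(self, level, idx, first_leaf, target):
        span = self.l ** level                # leaves covered by this node
        if (idx + 1) * span <= first_leaf:
            return None                       # entirely left of the range
        lo, hi = self.levels[level][idx]
        if not lo <= target <= hi:
            return None                       # target cannot occur here
        if level == 0:
            start = idx * self.s
            for j in range(start, min(len(self.E), start + self.s)):
                if self.E[j] == target:
                    return j
            return None
        below = self.levels[level - 1]
        for child in range(idx * self.l, min((idx + 1) * self.l, len(below))):
            found = self._descend(level - 1, child, first_leaf, target)
            if found is not None:
                return found
        return None


P = "(()((()())())(()())())"
E = excess_array(P)
tree = RangeMinMaxTree(E, s=3, l=3)
print(tree.fwd_search(3, -1))   # closing parenthesis of the node opened at 3
```

Because consecutive excess values differ by ±1, any node lying fully inside the search range whose [min, max] contains the target must contain the answer, so only the nodes along one boundary path are explored without success.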

17 The case where the array is short (polylog)
Let w be the word length (in bits) of the CPU.
Lemma: If N < w^c, fwd_search is done in O(c²) time, and the data structure size is N + O(Nc/w) + exp(w) bits.
Proof:
Excess values are between −w^c and w^c ⇒ O(c log w) bits each.
w/(c log w) values can be read simultaneously.
If the branching factor l of the range min-max tree is w/log w, the height of the tree is O(c).
Searching for a child takes O(c) time.

18 Computation of LCA
lca(v,w) = parent(rmqi(v,w)+1)
rmqi: the position of the minimum value in E[v,w].
Constant time using the range min-max tree.
The maximum-depth node is found similarly.
[Figure: range min-max tree over the excess array E of P = (()((()())())(()())())]
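A naive sketch of the formula, with linear scans standing in for the constant-time rmqi and parent operations. It assumes v and w are the 0-based positions of two distinct open parentheses and takes the leftmost minimum; the function names are chosen for illustration.

```python
def excess_array(P):
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def parent(P, E, x):
    """Open parenthesis of the node enclosing the node opened at x."""
    for j in range(x - 1, -1, -1):
        if P[j] == "(" and E[j] == E[x] - 1:
            return j
    return None                      # x is the root


def lca(P, v, w):
    """lca(v, w) = parent(rmqi(v, w) + 1) for distinct open positions."""
    E = excess_array(P)
    if v > w:
        v, w = w, v
    # rmqi: position of the (leftmost) minimum of E[v..w]
    k = min(range(v, w + 1), key=lambda j: E[j])
    return parent(P, E, k + 1)


P = "(()((()())())(()())())"
print(lca(P, 5, 7), lca(P, 5, 10), lca(P, 3, 7))
```

The last call shows the case where v is itself an ancestor of w: the minimum is at v, and the parent of position v + 1 is v.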

19 Sparse Table Algorithm
For each interval [i, i+2^k−1] in array E[1,m], store the minimum value in M[i,k] (i = 1,...,m; k = 1,2,...,lg m).
For a given query interval [s,b]:
Let k = ⌊lg(b−s)⌋.
Compare M[s,k] and M[b−2^k+1,k], and output the minimum.
O(1) time, O(m lg² m) bit space.
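A minimal sketch of the sparse table. It deviates from the slide in two labeled ways: positions are 0-based, and level k = 0 (intervals of length 1) is stored as well, so the query uses k = ⌊lg(b − s + 1)⌋ and also handles s = b. The class and method names are assumptions.

```python
class SparseTable:
    def __init__(self, E):
        m = len(E)
        # M[k][i] = min(E[i .. i + 2**k - 1])
        self.M = [list(E)]
        k = 1
        while (1 << k) <= m:
            prev, half = self.M[k - 1], 1 << (k - 1)
            self.M.append([min(prev[i], prev[i + half])
                           for i in range(m - (1 << k) + 1)])
            k += 1

    def range_min(self, s, b):
        """Minimum of E[s..b] (inclusive, 0-based, s <= b)."""
        k = (b - s + 1).bit_length() - 1      # floor(lg(b - s + 1))
        # two overlapping intervals of length 2**k cover [s, b]
        return min(self.M[k][s], self.M[k][b - (1 << k) + 1])


E = [1, 2, 1, 2, 3, 4, 3, 4, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 0]
table = SparseTable(E)
print(table.range_min(3, 11))
```

There are O(lg m) levels of at most m entries, each entry taking O(lg m) bits, which gives the O(m lg² m)-bit space bound.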

20 This data structure is used when the length of B becomes O(n/lg³ n)
⇒ o(n) bit space

21 Computation of Degree
Let [v,w] be the range of E corresponding to a node v.
deg(v) = (# of minimum values in E[v+1, w−1])
In each node of the range min-max tree, store the number of minimum values in the range.
The i-th child is also found.
[Figure: P = (()((()())())(()())()) with its excess array E]
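The degree formula can be checked directly with a naive sketch; the function names and 0-based positions are assumptions, and the linear scans replace the per-node minimum counts stored in the tree.

```python
def excess_array(P):
    E, e = [], 0
    for c in P:
        e += 1 if c == "(" else -1
        E.append(e)
    return E


def degree(P, v):
    """Number of children of the node whose open parenthesis is at v."""
    E = excess_array(P)
    # w: matching close parenthesis, the first position where the excess drops below E[v]
    w = next(j for j in range(v + 1, len(P)) if E[j] == E[v] - 1)
    inner = E[v + 1:w]               # E[v+1 .. w-1]
    if not inner:
        return 0                     # leaf: nothing between '(' and ')'
    return inner.count(min(inner))   # one minimum per child (at its ')')


P = "(()((()())())(()())())"
print(degree(P, 0), degree(P, 3), degree(P, 1))
```

Inside the range, the excess returns to E[v] exactly once per child, at the child's closing parenthesis, and never goes lower.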

22 The case where the array is long
Divide the sequence into blocks of length w^c.
Let M_1,…,M_t and m_1,…,m_t be the max/min values of the blocks.
To compute fwd_search(E,i,d): if E[i]+d < (the minimum value of the block containing i), the block containing the answer is the first later block j with m_j ≤ E[i]+d.
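A two-level sketch of this idea: scan the rest of the current block, then locate the first later block whose [min, max] contains the target and scan only that block. The linear scan over block minima and maxima is a stand-in for the constant-time block lookup, and `block_len` and the function name are assumptions.

```python
def fwd_search_blocks(E, i, d, block_len):
    """Smallest j > i with E[j] == E[i] + d, or None."""
    target = E[i] + d
    b = i // block_len
    # in-block part: the rest of the block containing i
    for j in range(i + 1, min(len(E), (b + 1) * block_len)):
        if E[j] == target:
            return j
    # per-block minima m_j and maxima M_j (precomputed in a real structure)
    mins = [min(E[k:k + block_len]) for k in range(0, len(E), block_len)]
    maxs = [max(E[k:k + block_len]) for k in range(0, len(E), block_len)]
    for j in range(b + 1, len(mins)):
        if mins[j] <= target <= maxs[j]:
            # consecutive values differ by 1, so the target occurs in block j
            start = j * block_len
            for k in range(start, min(len(E), start + block_len)):
                if E[k] == target:
                    return k
    return None


E = [1, 2, 1, 2, 3, 4, 3, 4, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 0]
print(fwd_search_blocks(E, 3, -1, block_len=4))
```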

23 Other Queries
RMQ is done by the sparse table algorithm.
Because the number of blocks is small (n/w^c), the space can be ignored.
Theorem: There exists a data structure supporting all known operations on ordered trees in O(1) time using 2n + O(n/log n) bits.

24 Further Reducing the Space
Use the augmented B-tree of “Succincter” [7].
B-tree for array A[1..n].
For each node, a value is added.
Values are computed from those of child nodes and subtree size.
The range min-max tree is an augmented B-tree.
Theorem: 2n + O(n/log^c n) bits (c > 0 is an arbitrary constant).

25 References
[1] Kunihiko Sadakane, Daisuke Watanabe. Practical data structures for the document listing problem (in Japanese). DBSJ Letters Vol. 2, No. 1, pp
[2] Michael A. Bender, Martin Farach-Colton. The LCA Problem Revisited. LATIN 2000: 88-94
[3] Kunihiko Sadakane. Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1): (2007)
[4] Johannes Fischer. Optimal Succinctness for Range Minimum Queries. LATIN 2010:
[5] Kunihiko Sadakane, Gonzalo Navarro. Fully-Functional Succinct Trees. SODA 2010:
[6] R. F. Geary, N. Rahman, R. Raman, V. Raman. A simple optimal representation for balanced parentheses. Theoretical Computer Science, 368:231–246, December 2006.
[7] Mihai Pătraşcu. Succincter. Proc. FOCS, 2008.

