Published by Tamsin Francis. Modified over 9 years ago.
1
Succinct Indexes for Strings, Binary Relations and Multi-labeled Trees
  Jérémy Barbay, Meng He, J. Ian Munro (University of Waterloo)
  S. Srinivasa Rao (IT University of Copenhagen)
2
Background: Succinct Data Structures
  What are succinct data structures?
    Jacobson 1989
  Why succinct data structures?
    Large data sets in modern applications: textual, genomic, spatial or geometric
  An implementation: Delpratt et al. 2006
  Succinct integrated encodings
    Main data and auxiliary data structures
3
Our Problem: Succinct Indexes
  Use of the concept in previous work
    Compact PAT trees: Clark & Munro 1996
    Lower bounds: Demaine & López-Ortiz 2001; Miltersen 2005
    Upper bounds: Sadakane & Grossi 2006
  Definition of succinct indexes in data structure design
    ADT: primitive access operators
    Succinct index: more powerful operators
4
Succinct Integrated Encodings
  [Diagram: Main Data and Auxiliary Data Structures, together supporting Navigational Operations]
5
Succinct Indexes
  [Diagram: Succinct Index + Main Data, supporting Navigational Operations]
6
Succinct Indexes vs. Integrated Encodings
  Maximizing the freedom of the encoding of the main data
  Allowing incremental design
  Supporting implicit data
7
Strings: Definitions
  Notation
    Alphabet: [σ] = {1, 2, …, σ}
    String: S[1..n]
  Operations
    string_access(x): S[x]
    string_rank(α, x): number of occurrences of α in S[1..x]
    string_select(α, r): position of the r-th occurrence of α in S
8
Strings: An Example
  S = a a b a c c c d a d d a b b b c
  string_access(8) = d
  string_rank(a, 8) = 3
  string_select(b, 3) = 14
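The three operations on this slide can be checked with a naive linear-scan sketch. This is only a reference implementation of the definitions (1-based positions, as on the slides), not a succinct structure; the function bodies are illustrative and not taken from the talk.

```python
def string_access(s, x):
    """Return S[x], using 1-based positions as on the slides."""
    return s[x - 1]


def string_rank(s, alpha, x):
    """Number of occurrences of alpha in S[1..x]."""
    return s[:x].count(alpha)


def string_select(s, alpha, r):
    """Position of the r-th occurrence of alpha in S, or None if absent."""
    seen = 0
    for pos, ch in enumerate(s, start=1):
        if ch == alpha:
            seen += 1
            if seen == r:
                return pos
    return None


# The example string from the slide, written without spaces.
S = "aabacccdaddabbbc"

print(string_access(S, 8))       # d
print(string_rank(S, "a", 8))    # 3
print(string_select(S, "b", 3))  # 14
```

Each call scans the string, so the cost is linear in n; the results on the following slides are about answering the same queries much faster with little extra space.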
9
Strings: Previous Results
  Succinct integrated encodings
    Wavelet trees: Grossi et al. 2003
      Space: nH_0 + o(n)∙lg σ bits
      Time: O(lg σ) for all three operations
    Golynski et al. 2006
      Space: n (lg σ + o(lg σ)) bits
      Time: O(lglg σ) for string_access and string_rank, O(1) for string_select
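To make the two space bounds concrete, the sketch below computes the zero-order empirical entropy H_0 of the example string from the earlier slide and compares the leading terms nH_0 and n lg σ. It uses the standard definition H_0 = Σ (n_c / n) lg (n / n_c); the lower-order o(·) terms are ignored, and the variable names are illustrative.

```python
import math
from collections import Counter


def zero_order_entropy(s):
    """Zero-order empirical entropy H_0 of s, in bits per symbol."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


S = "aabacccdaddabbbc"   # example string from the slides
n = len(S)               # 16 symbols
sigma = len(set(S))      # 4 distinct symbols

entropy_bits = n * zero_order_entropy(S)  # leading term n * H_0
plain_bits = n * math.log2(sigma)         # leading term n * lg(sigma)

print(f"n*H_0     = {entropy_bits:.2f} bits")
print(f"n*lg(sig) = {plain_bits:.2f} bits")
```

For this nearly uniform string the two quantities are close; the gap grows as the symbol distribution becomes more skewed.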
10
Strings: Our Results
  Succinct indexes
    ADT: string_access in f(n, σ) time
    Space: n∙o(lg σ) bits
  Operations
    string_rank: O(lglg σ lglglg σ (f(n, σ) + lglg σ))
    string_select: O(lglglg σ (f(n, σ) + lglg σ))
    Other operations: negations
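The separation between the ADT and the index can be illustrated with a toy sketch. The class below reads the main data only through a caller-supplied `access(x)` function and keeps its own auxiliary counts. It is not the construction from the talk and is not succinct (it stores sampled counts for every symbol); the class name `SampledRankIndex`, the `step` parameter, and the sampling scheme are assumptions made purely for illustration.

```python
class SampledRankIndex:
    """Toy index: rank/select built only on top of an access(x) ADT."""

    def __init__(self, access, n, alphabet, step=4):
        self.access = access  # the only way the index reads the main data
        self.n = n
        self.step = step
        # samples[a][j] = number of occurrences of a in S[1..j*step]
        self.samples = {a: [0] for a in alphabet}
        counts = {a: 0 for a in alphabet}
        for x in range(1, n + 1):
            counts[access(x)] += 1
            if x % step == 0:
                for a in alphabet:
                    self.samples[a].append(counts[a])

    def rank(self, alpha, x):
        """Occurrences of alpha in S[1..x]: one sample plus a short scan."""
        block = x // self.step
        count = self.samples[alpha][block]
        for pos in range(block * self.step + 1, x + 1):
            if self.access(pos) == alpha:
                count += 1
        return count

    def select(self, alpha, r):
        """Position of the r-th alpha, by binary search on rank."""
        if r < 1 or self.rank(alpha, self.n) < r:
            return None
        lo, hi = 1, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(alpha, mid) >= r:
                hi = mid
            else:
                lo = mid + 1
        return lo


# Two different encodings of the same main data behind the same ADT.
as_text = "aabacccdaddabbbc"
as_codes = [ord(c) - ord("a") for c in as_text]

idx_text = SampledRankIndex(lambda x: as_text[x - 1], 16, "abcd")
idx_codes = SampledRankIndex(lambda x: "abcd"[as_codes[x - 1]], 16, "abcd")

print(idx_text.rank("a", 8), idx_codes.rank("a", 8))      # 3 3
print(idx_text.select("b", 3), idx_codes.select("b", 3))  # 14 14
```

The index code is unchanged when the encoding of the main data changes, which is the design freedom the slides emphasize; the query time depends on the cost of `access`, mirroring the f(n, σ) term in the bounds.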
11
Binary Relations: Definitions
  Notation
    Binary relation: R ⊆ [n] × [σ]
    Number of objects: n; number of labels: σ
    Number of object-label pairs: t
  Operations
    object_access(x, r): r-th label associated with x
    label_access(x, α): whether x is associated with α
    label_rank(α, x): number of objects labeled α up to object x
    label_select(α, r): r-th object labeled α
12
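Binary Relations: An Example
  [Figure: 0/1 matrix with σ = 4 rows (labels) and n = 5 columns (objects)]
             object: 1 2 3 4 5
    label 1:         0 1 0 1 0
    label 2:         0 0 0 1 0
    label 3:         1 0 1 1 0
    label 4:         1 1 0 0 1
  object_access(1, 2) = 4
  label_access(2, 3) = false
  label_rank(3, 4) = 3
  label_select(4, 3) = 5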
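The four example answers can be reproduced with a naive sketch, assuming the 20 matrix entries on this slide read row by row as 4 labels (rows) by 5 objects (columns). That reading is an assumption about the lost figure layout, although it does agree with all four stated answers. The function bodies are illustrative and scan the matrix directly.

```python
# Assumed layout: MATRIX[label - 1][object - 1], 4 labels x 5 objects.
MATRIX = [
    [0, 1, 0, 1, 0],  # label 1
    [0, 0, 0, 1, 0],  # label 2
    [1, 0, 1, 1, 0],  # label 3
    [1, 1, 0, 0, 1],  # label 4
]


def label_access(x, alpha):
    """Whether object x is associated with label alpha."""
    return MATRIX[alpha - 1][x - 1] == 1


def object_access(x, r):
    """The r-th label associated with object x, or None if there is none."""
    labels = [a for a in range(1, len(MATRIX) + 1) if label_access(x, a)]
    return labels[r - 1] if 1 <= r <= len(labels) else None


def label_rank(alpha, x):
    """Number of objects labeled alpha among objects 1..x."""
    return sum(MATRIX[alpha - 1][:x])


def label_select(alpha, r):
    """The r-th object labeled alpha, or None if there is none."""
    objects = [x for x in range(1, len(MATRIX[0]) + 1) if label_access(x, alpha)]
    return objects[r - 1] if 1 <= r <= len(objects) else None


print(object_access(1, 2))  # 4
print(label_access(2, 3))   # False
print(label_rank(3, 4))     # 3
print(label_select(4, 3))   # 5
```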
13
Binary Relations: Previous Results
  Succinct integrated encodings
    Barbay et al. 2006
      Space: t (lg σ + o(lg σ)) bits
      Time: O(lglg σ) for object_access, label_rank and label_access, O(1) for label_select
14
Binary Relations: Our Results
  Succinct indexes
    ADT: object_access in f(n, σ, t) time
    Space: t∙o(lg σ) bits
    Time:
      label_rank and label_access: O(lglg σ lglglg σ (f(n, σ, t) + lglg σ))
      label_select: O(lglglg σ (f(n, σ, t) + lglg σ))
15
Multi-labeled Trees: Definitions
  Notation
    Number of nodes: n
    Number of labels: σ
    Number of node-label pairs: t
  Operations
    α-descendant
    α-child
    α-ancestor
16
Multi-labeled Trees: An Example
  [Figure: a tree with nodes numbered 1 to 11, each carrying a set of labels drawn from {a, b, c, d}]
  Node 2 is a c-ancestor of node 6
  Node 6 is a b-descendant of node 2
  Node 10 is a d-child of node 8
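The three label-based queries can be sketched on a small tree. Because the tree drawn on this slide cannot be recovered from the transcript, the 5-node tree below (`PARENT`, `LABELS`) is entirely hypothetical, and the functions are naive traversals written only to illustrate the definitions.

```python
# Hypothetical tree (NOT the one from the slide):
#        1 {a}
#       /     \
#   2 {a,c}   3 {b}
#    /    \
# 4 {c}  5 {b,c}
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
LABELS = {1: {"a"}, 2: {"a", "c"}, 3: {"b"}, 4: {"c"}, 5: {"b", "c"}}


def ancestors(x):
    """Proper ancestors of x, nearest first."""
    result = []
    p = PARENT[x]
    while p is not None:
        result.append(p)
        p = PARENT[p]
    return result


def alpha_ancestors(x, alpha):
    """Ancestors of x that carry label alpha, nearest first."""
    return [y for y in ancestors(x) if alpha in LABELS[y]]


def alpha_children(x, alpha):
    """Children of x that carry label alpha."""
    return sorted(y for y, p in PARENT.items() if p == x and alpha in LABELS[y])


def alpha_descendants(x, alpha):
    """Proper descendants of x that carry label alpha."""
    return sorted(y for y in PARENT if x in ancestors(y) and alpha in LABELS[y])


print(alpha_ancestors(5, "a"))    # [2, 1]
print(alpha_children(2, "c"))     # [4, 5]
print(alpha_descendants(1, "b"))  # [3, 5]
```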
17
Multi-labeled Trees: Previous Results
  Labeled trees
    Geary et al. 2004
    Ferragina et al. 2005
    Barbay et al. 2006
  Multi-labeled trees
    Barbay et al. 2006
18
Multi-labeled Trees: Our Approach
  Traversal orders
    Preorder
    DFUDS order
  Ordinal trees: DFUDS
    Benoit et al. 1999 & 2005
    Jansson et al. 2007
  Binary relations
    Nodes in preorder & labels
    Nodes in DFUDS order & labels
  [Figure: the example tree with its nodes numbered under the two traversal orders]
19
Multi-labeled Trees: Our Results
  Succinct indexes
    ADT: node_label(x, r)
    Supporting α-child/descendant queries: t∙o(lg σ) bits
    Supporting α-child/descendant/ancestor queries: t∙(lg ρ + o(lg ρ) + o(lg σ)) bits (ρ: recursivity)
    Supporting α-child/descendant/ancestor queries of node x after another node y
20
Applications
  Compressed succinct encodings
    Strings
      Space: nH_k + o(n lg σ) bits
      Operations:
        string_access: O(1)
        string_rank: O((lglg σ)^2 lglglg σ)
        string_select: O(lglg σ lglglg σ)
      First high-order entropy-compressed encoding supporting rank/select efficiently
    Other data structures
21
Applications (Continued)
  High-order entropy-compressed text indexes for large alphabets
    Notation: n: text size, σ: alphabet size, m: pattern length, occ: number of occurrences
    Our results
      Space: nH_k + o(n lg σ) bits
      Pattern searching: O(m lglg σ + occ lg^{1+ε} n lglg σ)
    Previous results: a lg σ factor instead of lglg σ, or incompressible
22
Conclusions
  We showed the importance of succinct indexes in the design of succinct data structures by designing:
    A succinct representation of multi-labeled trees that supports efficient retrieval of ancestors / children / descendants by label
    The first high-order entropy-compressed representation of strings supporting rank/select
    High-order entropy-compressed text indexes for large alphabets
23
Conclusions (Continued)
  The concept of succinct indexes is useful in designing succinct data structures: it maximizes the freedom of the encoding of the main data and leads to a rich choice of design tradeoffs.
24
Thank you!