PhD Thesis Iwona Bialynicka-Birula Ranked Queries in Index Data Structures.


21 September 2008 Iwona Bialynicka-Birula – PhD Thesis

Outline
- Background
- The problem
- State of the art
- Rank-sensitivity
- Making suffix trees rank-sensitive
- Experimental results
- A general framework
- Dynamic Cartesian trees

Part I Introduction and background

Rank-sensitivity
- Output-sensitive
  - l – size of output set
  - Query time: O(s(n) + l), s(n) = o(n)
- Rank-sensitive
  - k – runtime parameter
  - Query time: O(s(n) + k), k ≤ l
  - Results in rank order

Motivation
- Output-sensitive data structures can still be too costly
- Most often additional criteria exist
- Examples
  - Web pages – PageRank or similar
  - Geometrical objects – Z-order
  - Various databases – physical location
  - News items – time stamp
  - Biological databases – biological relevance
- Real-time systems

Suffix trees
[Figure: suffix tree of the string senselessness$ (positions 1–14), with leaves labeled by suffix start positions]

Range trees

Priority search trees
- Heap with respect to y coordinate
- Left subtree < right subtree
- Inorder not necessarily x order (balanced)
- Three-sided query
  - Input: x0, x1, y1
  - Output: 〈x, y〉 : x0 ≤ x ≤ x1, y ≤ y1
- Dynamic version
  - All points in leaves + possibly on root path
  - Red-black rotations + "pushing down"

Dynamic priority search tree
[Figure: dynamic priority search tree over a set of 〈x, y〉 points, with the three-sided query 4 ≤ x ≤ 13, y ≤ 11]

Cartesian trees
- Heap with respect to y coordinate
- Inorder matches x order
- Set of points uniquely determines tree shape
- Dynamic version presented in Part IV

Cartesian tree example
[Figure: Cartesian tree over the points 〈2, 22〉 〈21, 20〉 〈18, 19〉 〈6, 17〉 〈20, 16〉 〈7, 15〉 〈8, 13〉 〈5, 12〉 〈9, 11〉 〈17, 10〉 〈3, 9〉 〈16, 8〉 〈10, 7〉 〈22, 6〉 〈15, 4〉 〈12, 3〉 〈1, 2〉 〈11, 1〉 〈19, 21〉 〈4, 18〉 〈14, 5〉]

Part II Adding rank-sensitivity to suffix trees

The problem
[Figure: example tree whose leaves are labeled with ranks]

Naive solution (1)
- For each node, store best-ranking descendant
- For each leaf–ancestor pair, store successor
- Problem: quadratic space

Naive solution (2)
- Store only distinct successors
- Space is now O(n log n)
- Problem: non-constant time, so not rank-sensitive

Ranked tree
- Store predecessors, not successors
  - There is now a need to store the 1st, 2nd, 4th, 8th, ..., 2^l-th best-ranking descendants instead of just the first
  - O(log n) per node
- Store only distinct predecessors
  - O(log n) per node
- Augment list with pointers to quickly access any light depth
  - O(log n) per node

Example
[Figure: ranked-tree query on the example tree with k = 5]

Complexity
- Space: O(n log n)
- Query time: O(k)
  - Amortized over the k elements reported
  - No additional search cost if pointer to node given (e.g. suffix trees)

Experimental results
- Used various texts and queries
  - Random, English, DNA
  - Up to 2×10^6 characters long
- Query time depends only on k
- For total results < 10000
  - Even faster than unsorted subtree traversal
  - 4–5 times faster than traversal + sorting
- For all values tested
  - Faster than traversal + sorting

Part III Rank-sensitivity – a general framework

Generic solution
- Tree data structures
- Result set is obtained from
  - An interval of consecutive leaves
  - or O(polylog n) such disjoint intervals
- Examples
  - Suffix trees
  - Range trees
  - Hierarchy

Our results in this model
- Static version
  - Query time: O(t(n) + k)
  - Space: |D| + O(s(n) log^ε n) for any 0 < ε < 1
- Dynamic version
  - Query time: O(t(n) + k) + O(log n / log log n) per interval
  - Space: |D| + O(s(n) log n / log log n)
  - Update: O(log n) per copy
- D – output-sensitive data structure
  - Query time: O(t(n) + l)
  - Space: |D| in memory words
  - s(n) – number of items stored in D (incl. copies)
- D′ – rank-sensitive version of D

Basic idea
[Figure: balanced tree over leaves with ranks 7 6 8 1 4 3 5 2; each node stores the sorted list of the ranks below it: 67, 18, 34, 25; 1678, 2345; 12345678]
- O(n log n) space

Query
[Figure: the same tree of sorted rank lists, with the nodes covering the query interval highlighted]
- Reduced to merging O(log n) lists
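The basic idea and the query reduction can be sketched in a few lines. This is only an illustration under stated assumptions: the leaf ranks are the ones read from the garbled figure, the names `build_levels`, `canonical_lists` and `top_k_in_range` are hypothetical, and the merge uses an ordinary binary heap rather than the machinery of the thesis.

```python
import heapq
from itertools import islice

# Leaf ranks as read from the slide's figure (an assumption about the garbled figure).
RANKS = [7, 6, 8, 1, 4, 3, 5, 2]

def build_levels(ranks):
    """levels[0] holds one singleton list per leaf; each higher level merges
    sibling pairs. Assumes len(ranks) is a power of two. Size: O(n log n)."""
    levels = [[[r] for r in ranks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([list(heapq.merge(prev[i], prev[i + 1]))
                       for i in range(0, len(prev), 2)])
    return levels

def canonical_lists(levels, lo, hi):
    """Sorted lists of the O(log n) nodes that exactly cover leaves lo..hi-1."""
    out, level = [], 0
    while lo < hi:
        if lo & 1:                 # lo is a right child: take it and move right
            out.append(levels[level][lo])
            lo += 1
        if hi & 1:                 # hi-1 is a left child: take it and move left
            hi -= 1
            out.append(levels[level][hi])
        lo >>= 1
        hi >>= 1
        level += 1
    return out

def top_k_in_range(levels, lo, hi, k):
    """The k best (smallest) ranks among leaves lo..hi-1, in rank order."""
    lists = canonical_lists(levels, lo, hi)
    return list(islice(heapq.merge(*lists), k))   # lazy merge, stops after k items

levels = build_levels(RANKS)
print(top_k_in_range(levels, 1, 7, 3))  # three best ranks among leaves 1..6
```

With a plain heap over O(log n) lists, each reported item costs O(log log n), so this sketch does not reach the O(t(n) + k) bound stated for the framework.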

Space reduction in static case
[Figure: the same tree of sorted rank lists, with each list replaced by a bit vector, e.g. 10101010, 10011001, 01111000]
- Chazelle 1988
- O(n log^ε n) space

Dynamic case
- Store explicit values in lists
- Weight-balanced B-tree
  - Degree proportional to log n / log log n
- Dynamic fractional cascading
- Multi-Q-heaps
  - Constant-depth hierarchical pipeline of heaps

Multi-Q-heaps
- Similar to Q-heap
- Stores up to O(log N / log log N) integers
- The integers are from 0...O(N)
- Search, find-min, insert, delete takes O(1)
- Requires lookup tables of O(N) space
- Performs operations on any subset of items
- Simple implementation, no special instructions

Multi-Q-heaps in our solution
[Figure: hierarchy of multi-Q-heaps; labels: constant depth, O(log N), log N / log log N per multi-Q-heap]

Multi-Q-heaps in our solution
- Nodes have non-constant degree

Part IV Dynamic Cartesian Trees

Cartesian trees
- Vuillemin 1980
- Nodes store points 〈x, y〉
  - y value can be viewed as priority
- Recursive definition
  - Root stores point with greatest y value
  - x value partitions remaining points (left and right subtrees)
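The recursive definition translates almost directly into code. The following is a minimal sketch that follows the definition literally (quadratic time in the worst case), using the points of the example slide; the names `Node`, `build` and `inorder` are hypothetical, and distinct x and y values are assumed.

```python
from dataclasses import dataclass
from typing import Optional

# Points of the example slide, as (x, y) pairs.
POINTS = [(2, 22), (21, 20), (18, 19), (6, 17), (20, 16), (7, 15), (8, 13),
          (5, 12), (9, 11), (17, 10), (3, 9), (16, 8), (10, 7), (22, 6),
          (15, 4), (12, 3), (1, 2), (11, 1), (19, 21), (4, 18), (14, 5)]

@dataclass
class Node:
    x: int
    y: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points):
    """Root = point with greatest y; its x splits the rest into two subtrees."""
    if not points:
        return None
    rx, ry = max(points, key=lambda p: p[1])
    return Node(rx, ry,
                build([p for p in points if p[0] < rx]),
                build([p for p in points if p[0] > rx]))

def inorder(node):
    """x values in inorder; for a Cartesian tree this is sorted x order."""
    if node is None:
        return []
    return inorder(node.left) + [node.x] + inorder(node.right)

root = build(POINTS)
print((root.x, root.y))   # the point with the greatest y value
print(inorder(root))      # x values in increasing order
```

Because the root is forced at every step, the same point set always yields the same tree shape, regardless of the order in which the points are listed.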

Cartesian tree example
[Figure: Cartesian tree over the points 〈2, 22〉 〈21, 20〉 〈18, 19〉 〈6, 17〉 〈20, 16〉 〈7, 15〉 〈8, 13〉 〈5, 12〉 〈9, 11〉 〈17, 10〉 〈3, 9〉 〈16, 8〉 〈10, 7〉 〈22, 6〉 〈15, 4〉 〈12, 3〉 〈1, 2〉 〈11, 1〉 〈19, 21〉 〈4, 18〉 〈14, 5〉]

Applications
- Priority queue
- Randomized searching (treaps)
- Range and dominance searching
- RMQ (Range Maximum Query)
- LCA (Least Common Ancestor)
- Integer sorting
- Memory management
- Suffix trees
- ...

From RMQ to LCA
- Array: 2, 22, 9, 18, 12, 17, 15, 13, 11, 7, 1, 3, 5, 4, 8, 10, 19, 21, 16, 20, 6
- Query range: 9, 18, 12, 17, 15, 13, 11, 7, 1
[Figure: Cartesian tree of the array, with the subtree spanning the query range highlighted; its root is 18]
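The reduction rests on a standard fact: in the Cartesian tree of an array, the maximum of a range is the lowest common ancestor of the range's two endpoints. The sketch below checks this on the array from the slide. It assumes distinct values, builds the tree with the usual stack-based linear-time scan, and finds the LCA by naively walking parent pointers; `cartesian_parents` and `lca` are hypothetical names.

```python
# Array from the slide (y values of the example points in x order).
A = [2, 22, 9, 18, 12, 17, 15, 13, 11, 7, 1, 3, 5, 4, 8, 10, 19, 21, 16, 20, 6]

def cartesian_parents(a):
    """Parent index of every position in the max-Cartesian tree of a.
    Stack-based scan, O(n) total; the root gets parent -1."""
    parent = [-1] * len(a)
    stack = []                      # indices whose values decrease bottom to top
    for i, v in enumerate(a):
        last = -1
        while stack and a[stack[-1]] < v:
            last = stack.pop()      # smaller values end up below position i
        if last != -1:
            parent[last] = i        # last popped node becomes i's left child
        if stack:
            parent[i] = stack[-1]   # i becomes the right child of the stack top
        stack.append(i)
    return parent

def lca(parent, u, v):
    """Naive LCA by walking parent pointers (time proportional to the depth)."""
    ancestors = set()
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v

parent = cartesian_parents(A)
i, j = 2, 10                        # the query range 9, 18, ..., 1 from the slide
print(A[lca(parent, i, j)], max(A[i:j + 1]))
```

A constant-time LCA structure would replace the naive `lca` in practice; the walk is used here only to keep the example short.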

From LCP array to suffix tree
- Sorted suffixes of MISSISSIPPI$: $, I$, IPPI$, ISSIPPI$, ISSISSIPPI$, MISSISSIPPI$, PI$, PPI$, SIPPI$, SISSIPPI$, SSIPPI$, SSISSIPPI$
- Suffix start positions: 12 11 8 5 2 1 10 9 7 4 6 3
- LCP array: 0 1 1 4 0 0 1 0 2 1 3
[Figure: suffix tree of MISSISSIPPI$ drawn above the LCP array; the root has edges starting with $, I, M, P and S]
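The numbers on this slide can be reproduced directly. The sketch below sorts the suffixes naively, computes the LCP of each adjacent pair, and then splits the sorted suffixes at the minima of the LCP array, which is the first step of building the tree from the array: the groups correspond to the children of the root. All function names are hypothetical, and the quadratic suffix sort is chosen only for brevity.

```python
TEXT = "MISSISSIPPI$"   # '$' sorts before every letter

def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def root_children(suffixes, lcps):
    """Split the sorted suffixes wherever the LCP equals the overall minimum.
    lcps[i] is the LCP of suffixes[i] and suffixes[i + 1]."""
    m = min(lcps)
    groups, current = [], [suffixes[0]]
    for s, l in zip(suffixes[1:], lcps):
        if l == m:
            groups.append(current)
            current = [s]
        else:
            current.append(s)
    groups.append(current)
    return groups

# Naive suffix sorting; sa holds 0-based start positions.
sa = sorted(range(len(TEXT)), key=lambda i: TEXT[i:])
suffixes = [TEXT[i:] for i in sa]
lcps = [common_prefix_len(suffixes[i], suffixes[i + 1])
        for i in range(len(suffixes) - 1)]
groups = root_children(suffixes, lcps)

print([i + 1 for i in sa])          # 1-based start positions
print(lcps)
print([g[0][0] for g in groups])    # first letters of the root's edges
```

Applying the same split recursively inside each group, with the group's own minimum LCP, yields the deeper internal nodes; this recursion is exactly a Cartesian tree (with minimum instead of maximum) over the LCP array.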

History
- Static setting
  - O(n) construction time, provided elements already sorted
- Randomized
  - Random priority values – treaps
  - O(log n) expected height
  - O(log n) expected update time
  - Non-uniform probability distributions yield O(√n) or even O(n) height
- Dynamic and deterministic
  - ???

Our result
- Dynamic Cartesian tree
  - Supports insertion
  - Supports weak deletion
  - Maintains actual tree structure between each operation
  - O(log n) amortized time per operation

Solution outline
- Combinatorial analysis
  - How many tree elements change due to n insertions?
  - Notion of entropy is exploited
- Auxiliary structure for accessing tree
  - Needed to quickly access tree elements which need to change
  - Based on the interval tree

Insertion
[Figure: the example Cartesian tree before and after inserting the point 〈13, 14〉]
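For comparison, here is the textbook way to insert a point into a Cartesian tree: descend by x as in a binary search tree, then rotate the new node up while its y exceeds its parent's. This is the treap-style baseline, not the structure developed in the thesis; its cost per insertion is proportional to the depth reached, which is not bounded by O(log n) for arbitrary y values. The names `Node`, `insert`, `find` and `shape` are hypothetical.

```python
# Points of the example slide, as (x, y) pairs; (13, 14) is inserted afterwards.
POINTS = [(2, 22), (21, 20), (18, 19), (6, 17), (20, 16), (7, 15), (8, 13),
          (5, 12), (9, 11), (17, 10), (3, 9), (16, 8), (10, 7), (22, 6),
          (15, 4), (12, 3), (1, 2), (11, 1), (19, 21), (4, 18), (14, 5)]

class Node:
    def __init__(self, x, y):
        self.x, self.y, self.left, self.right = x, y, None, None

def insert(root, x, y):
    """Insert (x, y) and return the new subtree root.
    Keeps search-tree order on x and max-heap order on y via rotations."""
    if root is None:
        return Node(x, y)
    if x < root.x:
        root.left = insert(root.left, x, y)
        if root.left.y > root.y:            # heap order violated: rotate right
            child = root.left
            root.left, child.right = child.right, root
            return child
    else:
        root.right = insert(root.right, x, y)
        if root.right.y > root.y:           # heap order violated: rotate left
            child = root.right
            root.right, child.left = child.left, root
            return child
    return root

def find(root, x):
    """Locate the node with the given x by ordinary binary search."""
    while root is not None and root.x != x:
        root = root.left if x < root.x else root.right
    return root

def shape(node):
    """Nested-tuple encoding of the tree, used to compare two trees."""
    return None if node is None else (node.x, shape(node.left), shape(node.right))

root = None
for px, py in POINTS:
    root = insert(root, px, py)
root = insert(root, 13, 14)

new = find(root, 13)
print(find(root, 7).right.x, new.left.x, new.right.x)
```

In this example the new point ends up as the right child of 〈7, 15〉, with 〈8, 13〉 and 〈17, 10〉 as its children, which is where the definition of the Cartesian tree forces it to be.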

Insertion – worst case
[Figure: worst-case insertion in a Cartesian tree over the points 〈1, 16〉 〈7, 4〉 〈8, 2〉 〈17, 15〉 〈16, 13〉 〈15, 11〉 〈14, 9〉 〈13, 7〉 〈12, 5〉 〈11, 3〉 〈10, 1〉 〈2, 14〉 〈3, 12〉 〈4, 10〉 〈5, 8〉 〈6, 6〉 〈9, 17〉]

Analysis – main idea
- Inserting new elements does not require comparing y coordinates of existing points
- In turn, deleting points does
- Conclusion: insertions reduce tree information content...
- ...so information entropy can be used as a potential function in an amortized analysis

Insertion revisited
[Figure: the insertion example with the known order relations (>) between y values marked]

Insertion reversed (deletion)
[Figure: the same example run backwards, with the order relations between y values unknown (?)]

Formally...
- Tree T induces partial order ≺_T on nodes
  - Defined by the heap condition
- Partial order ≺_T has ℒ(T) linear extensions
  - Linear extensions are permutations satisfying the order, i.e. P[i] ≺_T P[j] ⇒ i < j
- We define missing entropy: ℋ(T) = log ℒ(T)
  - Information needed to sort nodes given tree topology
[Figure: a tree on the nodes A–J and several of its linear extensions]

Missing entropy
- Can be zero: ℋ(T) = 0
- Or can be up to ℋ(T) = O(n log n)
- When an insertion affects k edges, ℋ(T) increases by Ω(k)
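For a heap-ordered tree, the number of linear extensions has a well-known closed form: n! divided by the product of all subtree sizes. The sketch below uses that formula to compute ℒ(T) and ℋ(T) for small trees, which makes the two extremes concrete. It is an illustration only: trees are encoded as nested `(left, right)` tuples, the function names are hypothetical, and the logarithm is taken base 2.

```python
from math import factorial, log2

def size_and_product(tree):
    """Return (subtree size, product of the sizes of all subtrees).
    A tree is None or a pair (left, right)."""
    if tree is None:
        return 0, 1
    left_size, left_prod = size_and_product(tree[0])
    right_size, right_prod = size_and_product(tree[1])
    size = left_size + right_size + 1
    return size, left_prod * right_prod * size

def linear_extensions(tree):
    """Number of orderings of the nodes that respect the heap condition:
    n! divided by the product of all subtree sizes."""
    n, prod = size_and_product(tree)
    return factorial(n) // prod

def missing_entropy(tree):
    """H(T) = log L(T), here in bits."""
    return log2(linear_extensions(tree))

LEAF = (None, None)
PATH4 = (((LEAF, None), None), None)          # a path of 4 nodes
CHERRY = (LEAF, LEAF)                         # a root with two leaf children
COMPLETE7 = (CHERRY, CHERRY)                  # complete binary tree, 7 nodes

for name, t in [("path", PATH4), ("cherry", CHERRY), ("complete", COMPLETE7)]:
    print(name, linear_extensions(t), missing_entropy(t))
```

A path has a single linear extension, so its missing entropy is zero: the tree already determines the full y order. A balanced tree leaves most of the order undetermined, and its missing entropy grows as n log n.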

So what now?
- Amortized number of edge modifications is O(log n) per insertion into an initially empty tree
- Node modifications are always constant
- But how to access the edges to modify?
  - Without increasing the complexity

Implementation overview
- Companion interval tree stores tree edges
- Edges in Cartesian tree are either disjoint or nested
  - So the interval tree has additional properties
- Operations are tailored to the special case of the Cartesian tree

Insertion once again
1. Find parent
2. Edges affected
3a. Delete 2 edges
3b. Insert 3 edges
4. Shrink k edges

Action implementations
1. Find parent: O(log n)
   - Uses the interval tree as a search tree
2. Edges affected: O(log n + k)
   - Special kind of stabbing query
3. Insert and delete: O(1)∗O(log n)
   - Standard interval tree operations
4. Shrink: k∗O(1)
   - Emulating using inserts and deletes would yield O(k∗log n)
   - Amortized argument based on the fact that a shrinking edge travels down
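Adding up the four costs gives the cost of a single insertion. The sum below is a worked step under the assumption that the four bounds on the slide pair with the four actions in the order listed, and that k is the number of affected edges.

```latex
\begin{align*}
T_{\text{insert}}
  &= \underbrace{O(\log n)}_{\text{find parent}}
   + \underbrace{O(\log n + k)}_{\text{edges affected}}
   + \underbrace{O(1)\cdot O(\log n)}_{\text{insert and delete}}
   + \underbrace{k\cdot O(1)}_{\text{shrink (amortized)}} \\
  &= O(\log n + k)
\end{align*}
```

Since the amortized number of edge modifications is O(log n) per insertion, k contributes O(log n) amortized, which is consistent with the O(log n) amortized time per operation stated as the result.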

Summary
- Rank-sensitivity
- Rank-sensitive suffix trees + experimental results
- A general framework
- Dynamic Cartesian trees

Thank you! Questions?
