PhD Thesis Iwona Bialynicka-Birula Ranked Queries in Index Data Structures.

21 September 2008 Iwona Bialynicka-Birula – PhD Thesis

Outline
- Background
- The problem
- State of the art
- Rank-sensitivity
- Making suffix trees rank-sensitive
- Experimental results
- A general framework
- Dynamic Cartesian trees

Part I Introduction and background

Rank-sensitivity
- Output-sensitive
  - l – size of the output set
  - Query time: O(s(n) + l), where s(n) = o(n)
- Rank-sensitive
  - k – runtime parameter, k ≤ l
  - Query time: O(s(n) + k)
  - Results are reported in rank order

Motivation
- Output-sensitive data structures can still be too costly
- Most often, additional ranking criteria exist
- Examples:
  - Web pages – PageRank or similar
  - Geometrical objects – Z-order
  - Various databases – physical location
  - News items – time stamp
  - Biological databases – biological relevance
  - Real-time systems

Suffix trees
[Figure: suffix tree of the string senselessness$]
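In a suffix tree, the occurrences of a pattern are the leaves below a single node, and those leaves are consecutive in left-to-right order. A minimal sketch of that property, assuming a naively built suffix array stands in for the suffix tree; suffix_array and leaf_interval are hypothetical names, and the string is the slide's senselessness$.

```python
from bisect import bisect_left, bisect_right

def suffix_array(text):
    # Naive construction for illustration: sort start positions by their suffixes.
    return sorted(range(len(text)), key=lambda i: text[i:])

def leaf_interval(text, sa, pattern):
    # Suffixes beginning with `pattern` are contiguous in sorted order, just as the
    # leaves below the locus of `pattern` are consecutive in the suffix tree.
    # Truncating sorted suffixes to len(pattern) characters keeps them sorted.
    heads = [text[i:i + len(pattern)] for i in sa]
    return bisect_left(heads, pattern), bisect_right(heads, pattern)

text = "senselessness$"
sa = suffix_array(text)
lo, hi = leaf_interval(text, sa, "ess")
print(sorted(sa[lo:hi]))  # [6, 10]
```

The half-open range [lo, hi) is exactly the kind of "interval of consecutive leaves" that the general framework in Part III starts from.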

Range trees

Priority search trees
- Heap with respect to the y coordinate
- Left subtree < right subtree in x, but the inorder is not necessarily x order; the tree is balanced
- Three-sided query
  - Input: x0, x1, y1
  - Output: all 〈x, y〉 with x0 ≤ x ≤ x1 and y ≤ y1
- Dynamic version
  - All points in leaves, plus possibly on the root path
  - Red-black rotations + "pushing down"
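A minimal sketch of a static priority search tree answering the three-sided query, assuming a min-heap on y, a balanced split on the median x, and no updates (the rotations and pushing down of the dynamic version are not shown). PSTNode, build and three_sided are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, int]

@dataclass
class PSTNode:
    point: Point                      # the point with the smallest y in this subtree
    split: int                        # left subtree: x <= split; right subtree: x >= split
    left: Optional["PSTNode"] = None
    right: Optional["PSTNode"] = None

def build(points: List[Point]) -> Optional[PSTNode]:
    if not points:
        return None
    i = min(range(len(points)), key=lambda j: points[j][1])
    top = points[i]                                  # heap order on y
    rest = sorted(points[:i] + points[i + 1:])       # remaining points in x order
    mid = len(rest) // 2
    split = rest[mid][0] if rest else top[0]         # median x keeps the tree balanced
    return PSTNode(top, split, build(rest[:mid]), build(rest[mid:]))

def three_sided(root: Optional[PSTNode], x0: int, x1: int, y1: int) -> List[Point]:
    """All points with x0 <= x <= x1 and y <= y1."""
    out: List[Point] = []

    def visit(node: Optional[PSTNode]) -> None:
        if node is None or node.point[1] > y1:
            return                                   # heap order: nothing below has y <= y1
        if x0 <= node.point[0] <= x1:
            out.append(node.point)
        if x0 <= node.split:
            visit(node.left)
        if x1 >= node.split:
            visit(node.right)

    visit(root)
    return out

points = [(11, 1), (1, 2), (12, 3), (15, 4), (14, 5), (10, 7), (3, 9), (5, 12), (8, 13), (4, 18)]
print(sorted(three_sided(build(points), 4, 13, 11)))  # [(10, 7), (11, 1), (12, 3)]
```

The example uses the points and query from the dynamic priority search tree slide.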

Dynamic priority search tree
[Figure: dynamic priority search tree on the points 〈1, 2〉, 〈3, 9〉, 〈4, 18〉, 〈5, 12〉, 〈8, 13〉, 〈10, 7〉, 〈11, 1〉, 〈12, 3〉, 〈14, 5〉, 〈15, 4〉, with nodes labeled (x) and (y), answering the query 4 ≤ x ≤ 13, y ≤ 11]

Cartesian trees
- Heap with respect to the y coordinate
- Inorder matches x order
- The set of points uniquely determines the tree shape
- Dynamic version presented in Part IV

Cartesian tree example
[Figure: Cartesian tree of the points 〈2, 22〉, 〈21, 20〉, 〈18, 19〉, 〈6, 17〉, 〈20, 16〉, 〈7, 15〉, 〈8, 13〉, 〈5, 12〉, 〈9, 11〉, 〈17, 10〉, 〈3, 9〉, 〈16, 8〉, 〈10, 7〉, 〈22, 6〉, 〈15, 4〉, 〈12, 3〉, 〈1, 2〉, 〈11, 1〉, 〈19, 21〉, 〈4, 18〉, 〈14, 5〉]

Part II Adding rank-sensitivity to suffix trees

The problem

Naive solution (1)
- For each node, store the best-ranking descendant
- For each leaf–ancestor pair, store the successor
- Problem: quadratic space
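A minimal sketch of naive solution (1), assuming the tree is given as a children map and leaves carry ranks where smaller is better; preprocess and top_k are hypothetical names. Once the query node is known, each reported leaf costs a single lookup, but the succ table holds one entry per leaf–ancestor pair.

```python
def preprocess(children, rank, root):
    """children: node -> list of child nodes; rank: leaf -> rank (smaller is better).
    Returns best[u] (best leaf below u) and succ[(leaf, u)] (next-best leaf below u)."""
    below = {}

    def collect(u):
        kids = children.get(u, [])
        below[u] = [u] if not kids else [leaf for c in kids for leaf in collect(c)]
        return below[u]

    collect(root)
    best, succ = {}, {}
    for u, leaves in below.items():
        order = sorted(leaves, key=rank.__getitem__)
        best[u] = order[0]
        for a, b in zip(order, order[1:]):
            succ[(a, u)] = b        # one entry per leaf-ancestor pair: quadratic in the worst case
    return best, succ

def top_k(u, k, best, succ):
    """Report the k best leaves below u in rank order, one pointer hop per item."""
    out, leaf = [], best[u]
    while leaf is not None and len(out) < k:
        out.append(leaf)
        leaf = succ.get((leaf, u))
    return out

children = {"r": ["a", "b"], "a": ["x", "y"], "b": ["z", "w"]}
rank = {"x": 3, "y": 1, "z": 2, "w": 4}
best, succ = preprocess(children, rank, "r")
print(top_k("r", 3, best, succ))  # ['y', 'z', 'x']
```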

Naive solution (2)
- Store only distinct successors
- Space is now O(n log n)
- Problem: non-constant time per reported item, so the query is not rank-sensitive

Ranked tree
- Store predecessors, not successors
  - The 1st, 2nd, 4th, 8th, ..., 2^l-th best-ranking descendants must now be stored instead of just the first: O(log n) per node
- Store only distinct predecessors: O(log n) per node
- Augment the list with pointers to quickly access any light depth: O(log n) per node

Example
[Figure: example query for a given k]

Complexity
- Space: O(n log n)
- Query time: O(k), amortized over the k elements reported
- No additional search cost if a pointer to the node is given (e.g. suffix trees)

Experimental results
- Used various texts and queries (random, English, DNA), up to 2×10^6 characters long
- Query time depends only on k
- For total results < …, even faster than unsorted subtree traversal
- 4–5 times faster than traversal + sorting
- For all values tested, faster than traversal + sorting

Part III Rank-sensitivity – a general framework

Generic solution
- Tree data structures
- The result set is obtained from an interval of consecutive leaves, or from O(polylog n) such disjoint intervals
- Examples: suffix trees, range trees, hierarchy

Our results in this model
- Setting: D is an output-sensitive data structure with query time O(t(n) + l) and space |D| in memory words; s(n) is the number of items stored in D (including copies)
- The rank-sensitive version of D achieves:
  - Static version: query time O(t(n) + k); space |D| + O(s(n) log^ε n) for any 0 < ε < 1
  - Dynamic version: query time O(t(n) + k) + O(log n / log log n) per interval; space |D| + O(s(n) log n / log log n); update O(log n) per copy

Basic idea
- O(n log n) space

Query
- Reduced to merging O(log n) lists
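One possible reading of the basic idea and query slides, sketched under assumptions: a balanced binary tree over leaf positions in which every node keeps the ranks of the leaves below it in sorted order (O(n log n) space), so an interval of consecutive leaves splits into O(log n) canonical nodes whose lists are merged lazily. The class and method names are hypothetical, and the merge uses an ordinary binary heap (O(log log n) per reported item), not the Multi-Q-heaps of the dynamic case.

```python
import heapq
from typing import Dict, List

class RankedIntervalTree:
    """Segment tree over leaf positions 0..n-1; each node stores the sorted ranks below it."""

    def __init__(self, ranks: List[int]):
        self.n = len(ranks)
        self.lists: Dict[int, List[int]] = {}
        if self.n:
            self._build(1, 0, self.n - 1, ranks)

    def _build(self, node, lo, hi, ranks):
        if lo == hi:
            self.lists[node] = [ranks[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, ranks)
        self._build(2 * node + 1, mid + 1, hi, ranks)
        # Each level stores every rank once, so the whole tree uses O(n log n) space.
        self.lists[node] = list(heapq.merge(self.lists[2 * node], self.lists[2 * node + 1]))

    def _canonical(self, node, lo, hi, a, b, out):
        # Collect the O(log n) maximal nodes whose leaf range lies inside [a, b].
        if b < lo or hi < a:
            return
        if a <= lo and hi <= b:
            out.append(self.lists[node])
            return
        mid = (lo + hi) // 2
        self._canonical(2 * node, lo, mid, a, b, out)
        self._canonical(2 * node + 1, mid + 1, hi, a, b, out)

    def top_k(self, a: int, b: int, k: int) -> List[int]:
        """The k smallest ranks among leaves a..b, in rank order."""
        lists: List[List[int]] = []
        if self.n:
            self._canonical(1, 0, self.n - 1, a, b, lists)
        heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]
        heapq.heapify(heap)
        out: List[int] = []
        while heap and len(out) < k:
            r, i, j = heapq.heappop(heap)
            out.append(r)
            if j + 1 < len(lists[i]):
                heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
        return out

tree = RankedIntervalTree([5, 3, 8, 1, 9, 2, 7])
print(tree.top_k(1, 5, 3))  # [1, 2, 3]
```

Only the first k merged items are ever produced, so the merge work depends on k rather than on the interval's size.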

Space reduction in the static case
- Chazelle 1988: O(n log^ε n) space

Dynamic case
- Store explicit values in lists
- Weight-balanced B-tree with degree proportional to log n / log log n
- Dynamic fractional cascading
- Multi-Q-heaps: a constant-depth hierarchical pipeline of heaps

Multi-Q-heaps
- Similar to the Q-heap
- Stores up to O(log N / log log N) integers
- The integers are from the range 0...O(N)
- Search, find-min, insert, and delete take O(1) time
- Requires lookup tables of O(N) space
- Performs operations on any subset of items
- Simple implementation, no special instructions

Multi-Q-heaps in our solution
[Figure: constant-depth hierarchy of Multi-Q-heaps; labels: O(log N), log N / log log N]

Multi-Q-heaps in our solution (continued)
- Nodes have non-constant degree
[Figure: Multi-Q-heap at a node]

Part IV Dynamic Cartesian Trees

Cartesian trees
- Vuillemin 1980
- Nodes store points 〈x, y〉; the y value can be viewed as a priority
- Recursive definition:
  - The root stores the point with the greatest y value
  - Its x value partitions the remaining points into the left and right subtrees
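The recursive definition translates directly into code. A minimal sketch assuming distinct x and y values and a nested-tuple tree representation (point, left, right); cartesian and inorder are hypothetical names. This takes O(n^2) time in the worst case and is meant only to mirror the definition.

```python
def cartesian(points):
    """Cartesian tree straight from the recursive definition (assumes distinct x and y)."""
    if not points:
        return None
    root = max(points, key=lambda p: p[1])              # greatest y goes to the root
    left = [p for p in points if p[0] < root[0]]        # the root's x splits the rest
    right = [p for p in points if p[0] > root[0]]
    return (root, cartesian(left), cartesian(right))

def inorder(tree):
    if tree is None:
        return []
    point, left, right = tree
    return inorder(left) + [point] + inorder(right)

points = [(2, 22), (21, 20), (18, 19), (6, 17), (20, 16), (7, 15), (8, 13), (5, 12),
          (9, 11), (17, 10), (3, 9), (16, 8), (10, 7), (22, 6), (15, 4), (12, 3),
          (1, 2), (11, 1), (19, 21), (4, 18), (14, 5)]
tree = cartesian(points)
print(tree[0], tree[1][0], tree[2][0])  # (2, 22) (1, 2) (19, 21)
```

On the example points, inorder(tree) lists the points in x order, matching "inorder matches x order".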

Cartesian tree example
[Figure: the Cartesian tree example from Part I]

Applications
- Priority queue
- Randomized searching (treaps)
- Range and dominance searching
- RMQ (Range Maximum Query)
- LCA (Least Common Ancestor)
- Integer sorting
- Memory management
- Suffix trees
- ...

From RMQ to LCA
[Figure: the y values of the example points in x order, 2, 22, 9, 18, 12, 17, 15, 13, 11, 7, 1, 3, 5, 4, 8, 10, 19, 21, 16, 20, …, with the query range 18, 12, 17, 15, 13, 11, 7 highlighted]
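The reduction behind this slide: in the Cartesian tree of an array (position as x, value as y), the maximum of a[i..j] sits at the lowest common ancestor of positions i and j. A sketch assuming distinct values; the build is the usual left-to-right stack construction, while the LCA is a naive ancestor walk rather than a constant-time LCA structure. Function names are hypothetical, and the array is the part of the slide's sequence that survived extraction.

```python
def cartesian_parents(a):
    """Parent index of each position in the max-Cartesian tree of a (-1 for the root)."""
    parent = [-1] * len(a)
    stack = []                                  # right spine of the tree built so far
    for i, v in enumerate(a):
        last = -1
        while stack and a[stack[-1]] < v:
            last = stack.pop()                  # smaller spine nodes move below i...
        if last != -1:
            parent[last] = i                    # ...as i's left subtree
        if stack:
            parent[i] = stack[-1]               # i becomes the right child of the spine end
        stack.append(i)
    return parent

def range_max_via_lca(parent, i, j):
    """Index of the maximum of a[i..j]: the lowest common ancestor of positions i and j."""
    ancestors = set()
    u = i
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    u = j
    while u not in ancestors:
        u = parent[u]
    return u

a = [2, 22, 9, 18, 12, 17, 15, 13, 11, 7, 1, 3, 5, 4, 8, 10, 19, 21, 16, 20]
parent = cartesian_parents(a)
print(a[range_max_via_lca(parent, 3, 9)])  # 18
```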

From LCP array to suffix tree
[Figure: the sorted suffixes of MISSISSIPPI$ ($, I$, IPPI$, ISSIPPI$, ISSISSIPPI$, MISSISSIPPI$, PI$, PPI$, SIPPI$, SISSIPPI$, SSIPPI$, SSISSIPPI$) and the corresponding suffix tree]
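A sketch of how the LCP array yields the internal nodes of the suffix tree, using the standard stack-based enumeration of lcp-intervals: each internal node at string depth d is a maximal range of sorted suffixes sharing a d-character prefix, and the nesting of these ranges is a Cartesian-tree-like structure over the LCP array. The suffix sorting and LCP computation are naive and for illustration only; all names are hypothetical.

```python
def suffix_array(text):
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    """lcp[i - 1] = longest common prefix of the (i-1)-th and i-th sorted suffixes."""
    def lcp(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n
    return [lcp(text[sa[i - 1]:], text[sa[i]:]) for i in range(1, len(sa))]

def internal_nodes(lcp, n):
    """(depth, lb, rb): sorted suffixes lb..rb form one internal node at string depth `depth`."""
    nodes, stack = [], [(0, 0)]           # open intervals: (depth, left boundary)
    for i in range(1, n + 1):
        h = lcp[i - 1] if i < n else 0    # a final 0 closes every interval except the root
        lb = i - 1
        while h < stack[-1][0]:
            depth, lb = stack.pop()
            nodes.append((depth, lb, i - 1))
        if h > stack[-1][0]:
            stack.append((h, lb))
    nodes.append((0, 0, n - 1))           # the root spans all suffixes
    return nodes

text = "MISSISSIPPI$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(lcp)  # [0, 1, 1, 4, 0, 0, 1, 0, 2, 1, 3]
labels = sorted(text[sa[lb]:sa[lb] + d] for d, lb, rb in internal_nodes(lcp, len(sa)))
print(labels)  # ['', 'I', 'ISSI', 'P', 'S', 'SI', 'SSI']
```

The empty label is the root; the other six labels are the path labels of the internal nodes shown in the figure.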

History
- Static setting
  - O(n) construction time, provided the elements are already sorted
- Randomized
  - Random priority values: treaps
  - O(log n) expected height
  - O(log n) expected update time
  - Non-uniform probability distributions yield O(√n) or even O(n) height
- Dynamic and deterministic: ???
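The randomized approach can be sketched as a treap: a standard BST insertion followed by rotations that restore the heap order on priorities. Assumptions: a max-heap on priority (matching "the root stores the greatest y"), distinct keys and priorities, and hypothetical names throughout. Passing the y coordinate as the priority turns the same routine into deterministic Cartesian-tree insertion, but without random priorities the tree can degenerate into a path, so a single insertion may cost Θ(n).

```python
import random
from typing import Optional

class Node:
    def __init__(self, key, prio):
        self.key, self.prio = key, prio
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None

def rotate_right(t: Node) -> Node:
    l = t.left
    t.left, l.right = l.right, t       # l's right subtree moves under t; t moves under l
    return l

def rotate_left(t: Node) -> Node:
    r = t.right
    t.right, r.left = r.left, t
    return r

def insert(t: Optional[Node], key, prio=None) -> Node:
    """BST insertion on key, then rotations restore the max-heap order on prio."""
    if prio is None:
        prio = random.random()         # treap: a random priority per key
    if t is None:
        return Node(key, prio)
    if key < t.key:
        t.left = insert(t.left, key, prio)
        if t.left.prio > t.prio:
            t = rotate_right(t)
    else:
        t.right = insert(t.right, key, prio)
        if t.right.prio > t.prio:
            t = rotate_left(t)
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def heap_ordered(t):
    return t is None or all(c is None or (c.prio < t.prio and heap_ordered(c))
                            for c in (t.left, t.right))

# With y as the priority, the result is the Cartesian tree of the example points.
points = [(2, 22), (21, 20), (18, 19), (6, 17), (20, 16), (7, 15), (8, 13), (5, 12),
          (9, 11), (17, 10), (3, 9), (16, 8), (10, 7), (22, 6), (15, 4), (12, 3),
          (1, 2), (11, 1), (19, 21), (4, 18), (14, 5)]
root = None
for x, y in points:
    root = insert(root, x, y)
print(root.key, root.left.key, root.right.key)  # 2 1 19
```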

Our result
- Dynamic Cartesian tree
- Supports insertion
- Supports weak deletion
- Maintains the actual tree structure after each operation
- O(log n) amortized time per operation

Solution outline
- Combinatorial analysis
  - How many tree elements change due to n insertions?
  - The notion of entropy is exploited
- Auxiliary structure for accessing the tree
  - Needed to quickly access the tree elements that need to change
  - Based on the interval tree

Insertion
[Figure: the example Cartesian tree after inserting 〈13, 14〉]

Insertion – worst case
[Figure: worst-case insertion on the points 〈1, 16〉, 〈7, 4〉, 〈8, 2〉, 〈17, 15〉, 〈16, 13〉, 〈15, 11〉, 〈14, 9〉, 〈13, 7〉, 〈12, 5〉, 〈11, 3〉, 〈10, 1〉, 〈2, 14〉, 〈3, 12〉, 〈4, 10〉, 〈5, 8〉, 〈6, 6〉, 〈9, 17〉]
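To make the worst case concrete, the sketch compares the edge sets of the Cartesian tree before and after one insertion, assuming (as the point listing suggests) that 〈9, 17〉 is inserted into the tree of the other 16 points. The tree is rebuilt from the recursive definition purely for counting; cartesian_edges is a hypothetical helper. None of the 15 old edges survives, which is why only an amortized bound is possible.

```python
def cartesian_edges(points):
    """Set of (parent, child) edges of the Cartesian tree (greatest y at the root)."""
    edges = set()

    def build(pts):
        if not pts:
            return None
        root = max(pts, key=lambda p: p[1])
        for side in ([p for p in pts if p[0] < root[0]], [p for p in pts if p[0] > root[0]]):
            child = build(side)
            if child is not None:
                edges.add((root, child))
        return root

    build(points)
    return edges

before_points = [(1, 16), (2, 14), (3, 12), (4, 10), (5, 8), (6, 6), (7, 4), (8, 2),
                 (10, 1), (11, 3), (12, 5), (13, 7), (14, 9), (15, 11), (16, 13), (17, 15)]
before = cartesian_edges(before_points)
after = cartesian_edges(before_points + [(9, 17)])
print(len(before), len(before - after), len(after - before))  # 15 15 16
```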

Analysis – main idea
- Inserting new elements does not require comparing the y coordinates of existing points
- Deleting points, in turn, does
- Conclusion: insertions reduce the tree's information content, so information entropy can be used as a potential function in an amortized analysis

Insertion revisited
[Figure: an insertion annotated with the > relations between nodes]

Insertion reversed (deletion)
[Figure: the same operation reversed, with the unknown comparisons between nodes marked ?]

Formally...
- Tree T induces a partial order ≺_T on its nodes, defined by the heap condition
- The partial order ≺_T has ℒ(T) linear extensions
- Linear extensions are the permutations P satisfying the order, i.e. P[i] ≺_T P[j] ⇒ i < j
- We define the missing entropy ℋ(T) = log ℒ(T): the information needed to sort the nodes given the tree topology
[Figure: a tree on nodes A–J together with several orderings of its nodes]
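ℒ(T) can be computed without enumeration: for a tree poset, the number of linear extensions is n! divided by the product of all subtree sizes (the hook length formula for trees). A sketch assuming a tree given as a children map, logarithms in base 2 (bits), and hypothetical function names, with a brute-force cross-check for small trees. A path gives ℋ(T) = 0, and a root with n − 1 leaf children gives ℋ(T) = log (n − 1)!, which is Θ(n log n).

```python
import math
from itertools import permutations

def subtree_sizes(children, root):
    size = {}
    def visit(u):
        size[u] = 1 + sum(visit(c) for c in children.get(u, []))
        return size[u]
    visit(root)
    return size

def linear_extensions(children, root):
    """L(T): orderings of the nodes in which every parent precedes its children."""
    size = subtree_sizes(children, root)
    return math.factorial(len(size)) // math.prod(size.values())

def missing_entropy(children, root):
    """H(T) = log2 L(T), in bits."""
    return math.log2(linear_extensions(children, root))

def brute_force_extensions(children, root):
    parent = {c: u for u, kids in children.items() for c in kids}
    nodes = list(subtree_sizes(children, root))
    return sum(all(pos[parent[v]] < pos[v] for v in parent)
               for pos in ({v: i for i, v in enumerate(p)} for p in permutations(nodes)))

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
print(linear_extensions(tree, "A"), brute_force_extensions(tree, "A"))  # 20 20
```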

Missing entropy
- Can be zero: ℋ(T) = 0
- Or can be as large as ℋ(T) = O(n log n)
- When an insertion affects k edges, ℋ(T) increases by at least Ω(k)

So what now?
- The amortized number of edge modifications is O(log n) per insertion into an initially empty tree
  - ℋ(T) starts at 0, never exceeds O(n log n), and grows by Ω(k) whenever an insertion changes k edges, so n insertions change O(n log n) edges in total
- Node modifications are always constant
- But how to access the edges to modify without increasing the complexity?

Implementation overview
- A companion interval tree stores the tree edges
- Edges in a Cartesian tree, viewed as x-intervals, are either disjoint or nested
- So the interval tree has additional properties
- Operations are tailored to the special case of the Cartesian tree
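A brute-force check of the "disjoint or nested" property, viewing each edge as the x-interval between its endpoints and allowing intervals to touch at an endpoint. The reason it holds: any node strictly inside an edge's interval lies in the child's inner subtree, which connects to the rest of the tree only through the child itself. This sketch assumes distinct coordinates and uses hypothetical names; it tests the property and is not the companion interval tree.

```python
def cartesian_edges(points):
    """(parent, child) edges of the Cartesian tree (greatest y at the root)."""
    edges = []

    def build(pts):
        if not pts:
            return None
        root = max(pts, key=lambda p: p[1])
        for side in ([p for p in pts if p[0] < root[0]], [p for p in pts if p[0] > root[0]]):
            child = build(side)
            if child is not None:
                edges.append((root, child))
        return root

    build(points)
    return edges

def crosses(i, j):
    """True if the intervals properly overlap (neither nested nor disjoint)."""
    (a, b), (c, d) = i, j
    return a < c < b < d or c < a < d < b

def edges_non_crossing(points):
    intervals = [tuple(sorted((p[0], c[0]))) for p, c in cartesian_edges(points)]
    return not any(crosses(u, v) for k, u in enumerate(intervals) for v in intervals[k + 1:])

points = [(x, (7 * x) % 23) for x in range(1, 23)]   # distinct x and y values
print(edges_non_crossing(points))  # True
```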

Insertion once again
[Figure: the steps of an insertion: 1. Find parent; 2. Edges affected; 3a. Delete 2; 3b. Insert 3; 4. Shrink k]

Action implementations
1. Find parent: uses the interval tree as a search tree; O(log n)
2. Edges affected: a special kind of stabbing query; O(log n + k)
3. Insert and delete: standard interval tree operations; O(1)∗O(log n)
4. Shrink: emulating it with inserts and deletes would yield O(k∗log n); an amortized argument based on the fact that a shrinking edge travels down O(log n) gives k∗O(1)

Summary
- Rank-sensitivity
- Rank-sensitive suffix trees + experimental results
- A general framework
- Dynamic Cartesian trees

Thank you! Questions?