Bin Jiang, Jian Pei. • Problem Definition • An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm.

Online Interval Skyline Queries on Time Series (ICDE 2009)

Bin Jiang, Jian Pei

• Problem Definition
• An On-the-fly Method
  ◦ Interval Skyline Query Answering Algorithm
  ◦ Online Interval Skyline Query Algorithm
• Radix Priority Search Tree
• A View-Materialization Method
  ◦ Non-redundant skyline time series: NRSky[i:j]
• Experiments

Some Notions in This Paper
• Notions
  ◦ Time Series: a time series s consists of a set of (value, timestamp) pairs. We denote the value of s at timestamp i by s[i], and write s as the sequence of values s[1], s[2], ...
  ◦ Time Interval: a range in time, denoted [i:j]. We write [i':j'] ⊆ [i:j] if i ≤ i' ≤ j' ≤ j, and [i':j'] ⊂ [i:j] if, in addition, [i':j'] ≠ [i:j].

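• Interval Skyline
  ◦ Given a set S of time series and an interval [i:j], the interval skyline, denoted Sky[i:j], is the set of time series that are not dominated by any other time series in [i:j]. A time series s dominates a time series q in [i:j] if s[k] ≥ q[k] at every timestamp k in [i:j] and s[k] > q[k] at one or more of them.
  ◦ Example: suppose S = {s1, s2, s3}. Then s1 and s2 are in Sky[16:22], while s3 is dominated by s2.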

• Interval Skyline Property 1: If there exist timestamps k1, ..., kl (i ≤ k1 < ... < kl ≤ j) such that s[km] is the maximum value among all time series in S at timestamp km for every m = 1, ..., l, and s is the only such time series, then s is in Sky[i:j].

• Problem Definition
  ◦ Given a set S of time series such that each time series is in the base interval W, we want to maintain a data structure D such that any interval skyline query in an interval [i:j] ⊆ W can be answered efficiently using D.
• Methods
  ◦ An On-The-Fly Method
      • Original Interval Skyline Query Algorithm
      • Online Interval Skyline Query Algorithm
  ◦ A View-Materialization Method


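• Idea
  Using the maximum and minimum values of each time series in the query interval, we can determine some dominance relationships without comparing the time series timestamp by timestamp. For example, if the maximum value of s is smaller than the minimum value of q, then q dominates s.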

• Algorithm
  1. Set the current skyline set Sky to empty.
  2. Sort the time series into a list L in descending order of their maximum values in [i:j].
  3. Let maxmin be the largest of the minimum values of the time series in Sky.
  4. For each time series s in L that satisfies s.max ≥ maxmin, determine whether s dominates or is dominated by the time series in Sky. If s is not dominated:
  5.   add s to Sky;
  6.   delete the time series in Sky that s dominates;
  7.   update maxmin.
  8. Return Sky.
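The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: larger values are taken to be better, timestamps are 1-based, and the function names and the five small series in the usage example are hypothetical (they were chosen to mimic the trace on the next slide and are not the data from the paper).

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a >= b at every timestamp and a > b at one or more of them."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def interval_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> List[str]:
    """Names of the series in Sky[i:j]; i and j are 1-based and inclusive."""
    window = {name: list(vals[i - 1:j]) for name, vals in series.items()}
    # Step 2: visit the series in descending order of their maximum in [i:j].
    order = sorted(window, key=lambda name: max(window[name]), reverse=True)
    sky: List[str] = []           # Step 1: empty skyline
    maxmin = float("-inf")        # Step 3: largest minimum seen in Sky
    for name in order:
        s = window[name]
        if max(s) < maxmin:
            # A skyline series is larger than s at every timestamp, and the
            # same holds for all remaining series because of the sort order.
            break
        if any(dominates(window[q], s) for q in sky):
            continue              # s is dominated: discard it
        sky = [q for q in sky if not dominates(s, window[q])]  # Step 6
        sky.append(name)                                       # Step 5
        maxmin = max(maxmin, min(s))                           # Step 7
    return sky


# Hypothetical data over timestamps 1..3.
data = {
    "s1": [1, 4, 4],
    "s2": [2, 1, 7],
    "s3": [3, 6, 2],
    "s4": [5, 3, 2],
    "s5": [4, 5, 4],
}
print(interval_skyline(data, 2, 3))  # ['s2', 's3', 's5']
```

Updating maxmin with max(maxmin, min(s)) stays correct after step 6, because any series that s removes from Sky has a minimum no larger than the minimum of s.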

• Example
  Goal: compute the skyline in interval [2:3].
  Steps:
  1. s2 is added to Sky; maxmin = 1.
  2. s3 is added to Sky; maxmin = 2.
  3. s5 is added to Sky; maxmin = 4.
  4. s5 dominates s1, so s1 is discarded; maxmin = 4.
  5. s4.max = 3 < 4 = maxmin, so s4 is discarded.
  Return Sky = {s2, s3, s5}.

• Disadvantage
  Checking the maximum value of each time series and the minimum value min[i:j] in the query interval [i:j] is costly.
• Improvement Idea
  ◦ Use a radix priority search tree to maintain min[i:j].
  ◦ Use a sketch to keep the maximum value of each time series.

• Radix Priority Search Tree
  A radix priority search tree is a two-dimensional data structure: a hybrid of a heap on one dimension and a binary search tree on the other dimension.
  Advantages (h is the height of the tree):
  ◦ Insertion in O(h)
  ◦ Deletion in O(h)
  ◦ Query in O(h)

• Radix Priority Search Tree
  ◦ Build
      Use the timestamps as the binary search tree dimension X and the data values as the heap dimension Y.
      Map W into a fixed domain of X, {0, 1, ..., w-1}.
      The height of the tree is O(log w).
  ◦ Update
      When the base interval slides forward, perform one insertion, for the value at the most recent timestamp, and one deletion, for the value at the expired timestamp.
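A minimal sketch of such a tree, for illustration only. It assumes a min-heap on the value (so that the smallest value in a timestamp range can be found), at most one point per slot of the fixed domain {0, ..., w-1}, and a timestamp-to-slot mapping of t mod w. The class and method names are hypothetical, and queries on ranges that wrap around the fixed domain are left out.

```python
class Node:
    def __init__(self, point):
        self.point = point        # (x, y) = (slot, value)
        self.left = None
        self.right = None


class RadixPST:
    """Radix priority search tree: radix split on x, min-heap on y."""

    def __init__(self, domain_size):
        self.size = domain_size   # x must lie in {0, ..., domain_size - 1}
        self.root = None

    def insert(self, x, y):
        self.root = self._insert(self.root, (x, y), 0, self.size)

    def _insert(self, node, point, lo, hi):
        if node is None:
            return Node(point)
        if point[1] < node.point[1]:
            node.point, point = point, node.point   # keep the smaller y on top
        mid = (lo + hi) // 2      # the x-range [lo, hi) is always halved
        if point[0] < mid:
            node.left = self._insert(node.left, point, lo, mid)
        else:
            node.right = self._insert(node.right, point, mid, hi)
        return node

    def delete(self, x):
        self.root = self._delete(self.root, x, 0, self.size)

    def _delete(self, node, x, lo, hi):
        if node is None:
            return None
        if node.point[0] == x:
            return self._pull_up(node)
        mid = (lo + hi) // 2
        if x < mid:
            node.left = self._delete(node.left, x, lo, mid)
        else:
            node.right = self._delete(node.right, x, mid, hi)
        return node

    def _pull_up(self, node):
        """Fill the hole at node with the smaller child point, recursively."""
        left, right = node.left, node.right
        if left is None and right is None:
            return None
        if right is None or (left is not None and left.point[1] <= right.point[1]):
            node.point = left.point
            node.left = self._pull_up(left)
        else:
            node.point = right.point
            node.right = self._pull_up(right)
        return node

    def min_in_range(self, a, b):
        """Smallest y among points with a <= x <= b, or None if there is none."""
        return self._query(self.root, a, b, 0, self.size)

    def _query(self, node, a, b, lo, hi):
        if node is None or hi <= a or lo > b:
            return None
        if a <= node.point[0] <= b:
            return node.point[1]  # heap order: nothing below is smaller
        mid = (lo + hi) // 2
        found = [v for v in (self._query(node.left, a, b, lo, mid),
                             self._query(node.right, a, b, mid, hi))
                 if v is not None]
        return min(found) if found else None


w = 4
tree = RadixPST(w)
for t, value in enumerate([5, 3, 8, 6]):   # timestamps 0..3
    tree.insert(t % w, value)
print(tree.min_in_range(2, 3))             # 6
# The window slides: timestamp 0 expires and timestamp 4 (value 1) arrives.
tree.delete(0 % w)
tree.insert(4 % w, 1)
print(tree.min_in_range(0, 3))             # 1
```

Because the split points depend only on the fixed domain and not on the data, the height stays at O(log w) without any rebalancing, which is what makes the one-insertion, one-deletion update cheap.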

• Sketches
  ◦ A pair (v, t) is maintained if there is no other pair (v1, t1) such that v1 > v and t1 > t.
  ◦ These pairs form the skyline of the points in the interval.
  ◦ The expected number of points in the skyline is O(log w).
  ◦ With the sketches, finding the maximum value in W costs O(1) time.
  Example:
  W = [1:3], sketches: (4,1), (3,2), (2,3)
  W = [1:4], sketches: (5,4)
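One way to maintain such a sketch is a double-ended queue, as in the classic sliding-window maximum technique. The sketch below is an assumption about the implementation, not the authors' code; the class name MaxSketch and its methods are hypothetical. It reproduces the example on this slide.

```python
from collections import deque


class MaxSketch:
    """Keeps the pairs (v, t) for which no later pair has a larger value."""

    def __init__(self, w):
        self.w = w                # window length in timestamps
        self.pairs = deque()      # values non-increasing, timestamps increasing

    def append(self, value, t):
        # Older pairs with a smaller value can never be the maximum again.
        while self.pairs and self.pairs[-1][0] < value:
            self.pairs.pop()
        self.pairs.append((value, t))
        # Drop pairs whose timestamp has left the window (t - w, t].
        while self.pairs[0][1] <= t - self.w:
            self.pairs.popleft()

    def maximum(self):
        return self.pairs[0][0]   # O(1): the front pair holds the maximum


sketch = MaxSketch(w=4)
for t, v in [(1, 4), (2, 3), (3, 2)]:
    sketch.append(v, t)
print(list(sketch.pairs))   # [(4, 1), (3, 2), (2, 3)]
sketch.append(5, 4)
print(list(sketch.pairs))   # [(5, 4)]
```

Each pair is appended once and removed at most once, so the amortized cost per update is constant in this variant.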

• Complexity
  ◦ Space (per time series)
      • Radix priority search tree: O(w)
      • Sketch of the max values: O(log w)
      Total for n time series: O(nw)
  ◦ Time (per time series)
      • Radix priority search tree: O(log w)
      • Sketch of the max values: O(log w)
      Total for n time series: O(n log w)


• Non-redundant interval skylines
  A time series s is called a non-redundant skyline time series in interval [i:j] if
  1) s is in the skyline in interval [i:j], and
  2) s is not in the skyline in any subinterval [i':j'] ⊂ [i:j].
  By the pigeonhole principle, a time series is a non-redundant skyline time series in at most w intervals: if there were more than w such intervals, at least two of them would share the same starting timestamp, and then one of them would not be a minimum skyline interval.

• Idea
  Suppose all non-redundant interval skylines are materialized. To answer a query on [i:j], we can take the union of these skylines over all intervals in [i:j] and remove the time series that fail Lemma 2.
• Algorithm
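The idea can be illustrated with a brute-force sketch. Several things here are assumptions: Lemma 2 is not reproduced in this transcript, so the removal step below is a plain dominance check among the candidates; the non-redundant skylines are materialized naively rather than maintained incrementally; and all function names and the four small series are hypothetical.

```python
def dominates(a, b):
    """True if a >= b at every timestamp and a > b at one or more of them."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(series, i, j):
    """Sky[i:j] by pairwise comparison (1-based, inclusive timestamps)."""
    win = {n: v[i - 1:j] for n, v in series.items()}
    return {n for n in win
            if not any(dominates(win[m], win[n]) for m in win if m != n)}


def subintervals(i, j):
    return [(a, b) for a in range(i, j + 1) for b in range(a, j + 1)]


def materialize_nrsky(series, lo, hi):
    """NRSky[i:j] for every interval [i:j] inside the base interval [lo:hi]."""
    sky = {iv: skyline(series, *iv) for iv in subintervals(lo, hi)}
    return {
        (i, j): {s for s in members
                 if not any(s in sky[sub]
                            for sub in subintervals(i, j) if sub != (i, j))}
        for (i, j), members in sky.items()
    }


def query(series, nrsky, i, j):
    """Answer Sky[i:j] from the materialized non-redundant skylines."""
    # Union of the non-redundant skylines over all intervals inside [i:j].
    candidates = set().union(*(nrsky[sub] for sub in subintervals(i, j)))
    win = {n: series[n][i - 1:j] for n in candidates}
    # Remove candidates that another candidate dominates in [i:j].
    return {n for n in candidates
            if not any(dominates(win[m], win[n]) for m in candidates if m != n)}


# Hypothetical data over timestamps 1..4.
data = {
    "s1": [1, 2, 5, 4],
    "s2": [2, 1, 5, 1],
    "s3": [3, 3, 2, 6],
    "s4": [1, 4, 4, 5],
}
nrsky = materialize_nrsky(data, 1, 4)
print(sorted(query(data, nrsky, 3, 4)))   # ['s1', 's3', 's4']
```

In this data s2 ties with s1 at timestamp 3, so it is a candidate from NRSky[3:3], but it is dominated by s1 in [3:4] and is therefore removed.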

• Example
  W = [2:4]
  Goal: compute the interval skyline in [3:4].
  Steps:
  1. s3 is added to Sky.
  2. s4 is added to Sky.
  3. s1 is added to Sky (s2 is dominated by s1).
  Return Sky = {s1, s3, s4}.
  How can the non-redundant skylines be maintained?

• Steps

• Step 1
  ◦ Use the on-the-fly algorithm to obtain the interval skyline in the new interval W'.
  ◦ Find possible false negatives.

• Step 2: Shared Divide-and-Conquer Algorithm (SDC)
  ◦ This algorithm is an extension of the divide-and-conquer algorithm (DC).
  ◦ In SDC, a space is defined as a time interval. Each timestamp represents a dimension.
  ◦ The related spaces (intervals) are organized as a path, e.g. [j:j], [j-1:j], ..., [i:j] (i < j).
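For orientation, the plain divide-and-conquer skyline computation that SDC extends can be sketched as below. This is a simplified version of basic DC, not SDC itself: it treats each time series restricted to an interval as a point with one dimension per timestamp, and it does not share work across the path of intervals. The function names are hypothetical.

```python
def dominates(a, b):
    """True if a >= b in every dimension and a > b in one or more of them."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dc_skyline(points):
    """Skyline (non-dominated points) of a list of equal-length tuples."""
    if len(points) <= 1:
        return list(points)
    # Divide: split around the median of the first dimension.
    ordered = sorted(points, key=lambda p: p[0], reverse=True)
    half = len(ordered) // 2
    upper = dc_skyline(ordered[:half])
    lower = dc_skyline(ordered[half:])
    # Merge: keep a point only if nothing in the other half dominates it.
    return ([p for p in upper if not any(dominates(q, p) for q in lower)] +
            [p for p in lower if not any(dominates(q, p) for q in upper)])


# Each tuple holds the values of one hypothetical time series at two timestamps.
points = [(4, 4), (1, 7), (6, 2), (3, 2), (5, 4)]
print(sorted(dc_skyline(points)))   # [(1, 7), (5, 4), (6, 2)]
```

The merge step filters in both directions so that ties on the first dimension are handled correctly, at the price of some comparisons that a tuned DC implementation would avoid.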

[Figure: Divide Step and Merge Step. Labels: points P1 to P5; regions A and B; mA, mB; S1, S2; S11, S12, S21, S22.]

• Comparisons
• Results

• Step 3: Remove “redundant time series”


• Parameters

• Synthetic Data Sets
  ◦ Data Set Properties
  ◦ Query Efficiency

• Synthetic Data Sets
  ◦ Update Efficiency
  ◦ Space Cost

• Stock Data Sets
  ◦ Query Time