Bin Jiang, Jian Pei.  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm.


1 Bin Jiang, Jian Pei

2  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm  Radix Priority Search Tree  A View-Materialization Method ◦ Non-redundant skyline time series---NRSky[i:j]  Experiments

3  Some Notions in This Paper
◦ Time Series: A time series s consists of a set of (value, timestamp) pairs. Here we denote the value of s at timestamp i by s[i], and s as a sequence of values s[1], s[2], …
◦ Time Interval: a range in time, denoted as [i:j]. We write k ∈ [i:j] if i ≤ k ≤ j; [i′:j′] ⊆ [i:j] if i ≤ i′ ≤ j′ ≤ j.

4  Interval Skyline
◦ Given a set S of time series and an interval [i:j], the interval skyline is the set of time series that are not dominated by any other time series in [i:j], denoted by Sky[i:j].
◦ Example: suppose S = {s1, s2, s3}. s1 and s2 are in Sky[16:22], while s3 is dominated by s2.
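The definition above can be checked directly with a brute-force sketch. The transcript does not preserve the formal definition of domination, so the code assumes the usual one with larger values preferred: a dominates b in [i:j] if a[k] ≥ b[k] at every timestamp k in [i:j] and a[k] > b[k] at one or more of them. The function names and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float], i: int, j: int) -> bool:
    """True if a dominates b in [i:j] (1-based, inclusive timestamps).

    Assumption: larger values are better.
    """
    seg_a, seg_b = a[i - 1:j], b[i - 1:j]
    no_worse = all(x >= y for x, y in zip(seg_a, seg_b))
    better_somewhere = any(x > y for x, y in zip(seg_a, seg_b))
    return no_worse and better_somewhere


def interval_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> List[str]:
    """Names of the series not dominated by any other series in [i:j]."""
    return sorted(
        name
        for name, s in series.items()
        if not any(dominates(t, s, i, j)
                   for other, t in series.items() if other != name)
    )


# Made-up data: s3 is dominated by s2 over [1:3].
data = {"s1": [4, 1, 3], "s2": [2, 5, 2], "s3": [1, 1, 2]}
print(interval_skyline(data, 1, 3))  # ['s1', 's2']
```

This compares every pair of series at every timestamp, which is the cost the methods on the following slides try to avoid.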

5  Interval Skyline Property 1: If there exist timestamps k1, …, kl (i ≤ k1 < … < kl ≤ j) such that … and s is the only such time series, then time series s is in Sky[i:j].

6  Problem Definition
◦ Given a set of time series S such that each time series is in the base interval W, we want to maintain a data structure D such that any interval skyline query in an interval [i:j] ⊆ W can be answered efficiently using D.
 Methods
◦ An On-The-Fly Method
 Original Interval Skyline Query Algorithm
 Online Interval Skyline Query Algorithm
◦ A View-Materialization Method

7  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm  Radix Priority Search Tree  A View-Materialization Method ◦ Non-redundant skyline time series---NRSky[i:j]  Experiments

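8  Idea Using the maximum and minimum values of the time series in the query interval, we can decide some domination relationships without comparing the series timestamp by timestamp: if the minimum value of s is greater than the maximum value of s′, then s dominates s′.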

9  Algorithm
1. Set the current skyline set Sky to empty;
2. Sort the time series into a list L in descending order of their maximum values;
3. Let maxmin be the maximum of the minimum values of the time series in Sky;
4. For each time series s in L that satisfies s.max ≥ maxmin, determine whether it dominates or is dominated by time series in Sky; if it is not dominated:
5. add it to Sky;
6. delete the time series in Sky that it dominates;
7. update maxmin;
8. Return Sky.

10  Example
Goal: compute the skyline in interval [2:3]
Steps:
1. s2 -> Sky, maxmin = 1
2. s3 -> Sky, maxmin = 2
3. s5 -> Sky, maxmin = 4
4. s5 dominates s1, s1 is discarded, maxmin = 4
5. s4.max = 3 < 4 = maxmin, s4 is discarded.
Return Sky = {s2, s3, s5}
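The algorithm and example can be sketched in a few lines. This is one reading of the steps, not the authors' code: it assumes larger values are preferred, that a series is skipped once its maximum in [i:j] falls below maxmin, and it recomputes the max and min per query instead of maintaining them. The function names are hypothetical, and the sample values are made up so that the run follows the trace in the example; the slide's actual data is not in the transcript.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: never smaller, and strictly larger at least once."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def on_the_fly_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> List[str]:
    # Restrict every series to the query interval [i:j] (1-based, inclusive).
    seg = {name: list(s[i - 1:j]) for name, s in series.items()}
    # Step 2: list L, sorted by maximum value in the interval, descending.
    order = sorted(seg, key=lambda n: max(seg[n]), reverse=True)
    sky: List[str] = []
    maxmin = float("-inf")  # largest minimum among series currently in Sky
    for name in order:
        cand = seg[name]
        if max(cand) < maxmin:
            # Some series in Sky beats cand at every timestamp; the same
            # holds for all remaining series because L is sorted by max.
            break
        if any(dominates(seg[k], cand) for k in sky):
            continue  # dominated: discard
        sky = [k for k in sky if not dominates(cand, seg[k])]  # step 6
        sky.append(name)                                       # step 5
        maxmin = max(min(seg[k]) for k in sky)                 # step 7
    return sorted(sky)


# Hypothetical values; only timestamps 2 and 3 matter for the query [2:3].
data = {
    "s1": [5, 4, 4],
    "s2": [3, 1, 7],
    "s3": [2, 6, 2],
    "s4": [1, 3, 2],
    "s5": [0, 4, 5],
}
print(on_the_fly_skyline(data, 2, 3))  # ['s2', 's3', 's5']
```

With these values the series are visited in the order s2, s3, s5, s1, s4, and the loop stops at s4 without any pairwise comparison.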

11  Disadvantage
Checking the max value of each time series and min[i:j] for the query interval [i:j] is costly.
 Improvement Idea
◦ Use a radix priority search tree to maintain min[i:j];
◦ Use a sketch to keep the max value for each time series.

12  Radix Priority Search Tree
A radix priority search tree is a two-dimensional data structure, a hybrid of a heap on one dimension and a binary search tree on the other dimension.
Advantages (h: the height of the tree):
◦ Insertion in O(h)
◦ Deletion in O(h)
◦ Query in O(h)

13  Radix Priority Search Tree
◦ Build
Use the timestamps as the binary tree dimension X and the data values as the heap dimension Y;
Map W into a fixed domain of X, {0, 1, ..., w-1};
The height of the tree is O(log w).
◦ Update → One insertion s[ ] One deletion s[ ] : the most recent timestamp
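A minimal sketch of such a tree may help make the build and update steps concrete. It is an illustration under stated assumptions, not the structure used in the paper: the class and method names are hypothetical, the heap is taken to be a min-heap on the value (so that min[i:j] can be read off), x coordinates are assumed to be distinct integers in {0, ..., w-1}, and emptied nodes are left in place after a deletion.

```python
from typing import Optional, Tuple


class _Node:
    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi  # this node covers x in [lo, hi)
        self.point: Optional[Tuple[int, float]] = None  # (x, y)
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


class RadixPST:
    """Radix split on x (halving a fixed domain), min-heap order on y."""

    def __init__(self, w: int) -> None:
        self.root = _Node(0, w)

    def insert(self, x: int, y: float) -> None:
        node, carried = self.root, (x, y)
        while node.point is not None:
            if carried[1] < node.point[1]:
                # Keep the smaller y here and push the other point down.
                node.point, carried = carried, node.point
            if node.hi - node.lo == 1:
                raise ValueError("duplicate x coordinate")
            mid = (node.lo + node.hi) // 2
            if carried[0] < mid:
                if node.left is None:
                    node.left = _Node(node.lo, mid)
                node = node.left
            else:
                if node.right is None:
                    node.right = _Node(mid, node.hi)
                node = node.right
        node.point = carried

    def delete(self, x: int) -> bool:
        node = self.root
        while node is not None and node.point is not None and node.point[0] != x:
            mid = (node.lo + node.hi) // 2
            node = node.left if x < mid else node.right
        if node is None or node.point is None:
            return False
        while True:
            # Fill the hole with the child point that has the smaller y.
            kids = [c for c in (node.left, node.right)
                    if c is not None and c.point is not None]
            if not kids:
                node.point = None
                return True
            best = min(kids, key=lambda c: c.point[1])
            node.point = best.point
            node = best

    def min_in_range(self, a: int, b: int) -> Optional[float]:
        """Smallest y among points with a <= x <= b, or None."""
        def visit(node: Optional[_Node]) -> Optional[float]:
            if node is None or node.point is None or node.hi <= a or node.lo > b:
                return None
            if a <= node.point[0] <= b:
                return node.point[1]  # heap order: nothing below is smaller
            found = [v for v in (visit(node.left), visit(node.right))
                     if v is not None]
            return min(found) if found else None
        return visit(self.root)
```

For a sliding window, one plausible mapping (an assumption, not stated on the slide) is x = t mod w, deleting the expired timestamp before inserting the newest one so that x coordinates stay distinct.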

14  Sketches
◦ A pair (v, t) is maintained if there is no other pair (v1, t1) such that v1 > v and t1 > t;
◦ These pairs form the skyline of points in the interval;
◦ The expected number of points in the skyline is O(log w);
◦ With the sketches, finding the maximum value in W costs O(1) time.
Example:
W = [1,3], Sketches: (4,1), (3,2), (2,3)
W = [1,4], Sketches: (5,4)
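The maintenance rule for the sketch matches the familiar monotonic-deque technique for sliding-window maxima, which the following sketch uses. The class name, the window-length parameter w, and the expiry rule (a pair leaves once its timestamp is at most t - w) are assumptions made for illustration; the values come from the example on the slide.

```python
from collections import deque
from typing import Deque, Tuple


class MaxSketch:
    """Keeps only the (value, timestamp) pairs not beaten by a later, larger value."""

    def __init__(self, w: int) -> None:
        self.w = w  # window length (assumed parameter)
        self.pairs: Deque[Tuple[float, int]] = deque()

    def push(self, value: float, t: int) -> None:
        # Drop pairs whose timestamp has left the window [t-w+1 : t].
        while self.pairs and self.pairs[0][1] <= t - self.w:
            self.pairs.popleft()
        # Drop older pairs with a strictly smaller value: the new pair
        # is both later and larger, so they can never be the maximum.
        while self.pairs and self.pairs[-1][0] < value:
            self.pairs.pop()
        self.pairs.append((value, t))

    def max(self) -> float:
        # Values are non-increasing from front to back, so the front is the max.
        return self.pairs[0][0]


sketch = MaxSketch(w=4)
for t, v in enumerate([4, 3, 2], start=1):
    sketch.push(v, t)
print(list(sketch.pairs))  # [(4, 1), (3, 2), (2, 3)]
sketch.push(5, 4)
print(list(sketch.pairs))  # [(5, 4)]
```

Each pair is appended once and removed at most once, so the amortized cost per update is constant, and the window maximum is read from the front of the deque.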

15  Complexity
◦ Space
 Radix priority search tree: O(w)
 Sketch of the max values: O(log w)
Total: O(nw)
◦ Time
 Radix priority search tree: O(log w)
 Sketch of the max values: O(log w)
Total: O(n log w)

16  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm  Radix Priority Search Tree  A View-Materialization Method ◦ Non-redundant skyline time series---NRSky[i:j]  Experiments

17  Non-redundant interval skylines
A time series s is called a non-redundant skyline time series in interval [i:j] if
1) s is in the skyline in interval [i:j];
2) s is not in the skyline in any subinterval [i′:j′] ⊂ [i:j].
A time series has at most w such minimum skyline intervals. This can be proved by the pigeonhole principle: if there were more than w of them, at least two would share the same starting timestamp, and then one of them would not be a minimum skyline interval.
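The two conditions can be turned into a brute-force check, shown here only to make the definition concrete. It is not the maintenance method of the paper: it recomputes the skyline of every proper subinterval, assumes larger values are preferred, and uses hypothetical function names and made-up data.

```python
from typing import Dict, Sequence, Set


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: never smaller, and strictly larger at least once."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> Set[str]:
    """Sky[i:j] by pairwise comparison (1-based, inclusive timestamps)."""
    seg = {name: s[i - 1:j] for name, s in series.items()}
    return {
        name for name in seg
        if not any(dominates(seg[other], seg[name])
                   for other in seg if other != name)
    }


def non_redundant_skyline(series: Dict[str, Sequence[float]], i: int, j: int) -> Set[str]:
    """Series in Sky[i:j] that are in no skyline of a proper subinterval."""
    result = skyline(series, i, j)                      # condition 1
    for i2 in range(i, j + 1):
        for j2 in range(i2, j + 1):
            if (i2, j2) != (i, j):
                result -= skyline(series, i2, j2)       # condition 2
    return result


# Made-up data: c is in the skyline only when both timestamps are considered.
data = {"a": [3, 1], "b": [1, 3], "c": [2, 2]}
print(non_redundant_skyline(data, 1, 2))  # {'c'}
print(non_redundant_skyline(data, 1, 1))  # {'a'}
```

Here a and b already appear in Sky[1:1] and Sky[2:2], so only c is non-redundant in [1:2].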


19  Idea Suppose all non-redundant interval skylines are materialized. To answer a query on [i:j], we can take the union of these skylines over all subintervals of [i:j] and remove the time series that fail Lemma 2.  Algorithm

20  Example
W = [2:4]
Goal: compute the interval skyline in [3:4]
Steps:
1. s3 -> Sky
2. s4 -> Sky
3. s1 -> Sky (s2 is dominated by s1)
Return Sky = {s1, s3, s4}
How to maintain the non-redundant skylines?

21  Steps

22  Step 1 ◦ Use the on-the-fly algorithm to obtain the interval skyline in the new interval W′. ◦ Find possible false negatives.

23  Step 2: Shared Divide-and-Conquer Algorithm (SDC) ◦ This algorithm is an extension of the divide-and-conquer algorithm (DC). ◦ In SDC, a space is defined as a time interval. Each timestamp represents a dimension. ◦ The related spaces (intervals) are organized as a path, e.g. [j:j], [j-1:j], ..., [i:j] (i < j).

24 [Figure: Divide Step and Merge Step]

25  Comparisons  Results

26  Step 3: Remove “redundant time series”

27  Problem Definition  An On-the-fly Method ◦ Interval Skyline Query Answering Algorithm ◦ Online Interval Skyline Query Algorithm  Radix Priority Search Tree  A View-Materialization Method ◦ Non-redundant skyline time series---NRSky[i:j]  Experiments

28  Parameters

29  Synthetic Data Sets ◦ Data Sets Properties ◦ Query Efficiency

30  Synthetic Data Sets ◦ Update Efficiency ◦ Space Cost

31  Stock Data Sets ◦ Query Time


