A Sampling-based Estimator for Top-k Selection Query. Chung-Min Chen, Yibei Ling. ICDE 2002. Presented by Kan Kin Fai.


1 A Sampling-based Estimator for Top-k Selection Query. Chung-Min Chen, Yibei Ling. ICDE 2002. Presented by Kan Kin Fai

2 Outline: Introduction; Histogram-based Method; Sampling-based Method; Experimental Results; Conclusion

3 Introduction Given a distance function and a query point q, a top-k query finds the k points in the dataset that are closest to q. Example: searching for an apartment by specifying a price and a location

4 Introduction Goal: find a good approximation of the top-k points quickly. Approach: translate a top-k query into a range query. Distance functions: –Euclidean distance (L2-norm distance) –Summation distance (L1-norm distance) –Maximum distance (L∞-norm distance)
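The three distance functions and the exact top-k query they define can be sketched as follows. This is only an illustrative brute-force baseline, not the method from the paper; the function names `l1`, `l2`, `linf` and `exact_top_k` are made up for this example.

```python
import heapq

def l1(p, q):
    """Summation distance (L1 norm)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def l2(p, q):
    """Euclidean distance (L2 norm)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def linf(p, q):
    """Maximum distance (L-infinity norm)."""
    return max(abs(a - b) for a, b in zip(p, q))

def exact_top_k(data, q, k, dist):
    """Brute-force top-k: scan every point and keep the k closest to q."""
    return heapq.nsmallest(k, data, key=lambda p: dist(p, q))

# Hypothetical toy data, e.g. (price, location) pairs.
data = [(0, 0), (1, 1), (3, 0), (0, 5)]
nearest_two = exact_top_k(data, (0, 0), 2, l2)
```

The full scan is what the range-query translation tries to avoid: a good range lets the database use its ordinary indexes instead of computing every distance.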

5 Histogram-based Method Uses histograms to determine the range query for a top-k query with query point q. Drawbacks: –poor scalability of histograms with data dimensionality –non-trivial maintenance overhead of multidimensional histograms

6 Histogram-based Method Strategies: NoRestart, Start, Inter1 and Inter2

7 Sampling-based Method Main idea: –take a random sample S of size s from the dataset D of size n (sampling rate r = s / n); –given a query point q, compute the distances between q and all the points in S, and sort the sample points in ascending order of distance; –take the first l points from the sorted sequence, where l = k · r, and determine the range query from them.
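A minimal sketch of these three steps, under some assumptions the slide does not state: l = k · r is rounded up and clamped to at least 1, the sample is drawn without replacement, and the function name `nearest_sample_points` and the `seed` parameter are hypothetical.

```python
import math
import random

def linf(p, q):
    """Maximum (L-infinity) distance, used here only as an example metric."""
    return max(abs(a - b) for a, b in zip(p, q))

def nearest_sample_points(data, q, k, rate, dist, seed=0):
    """Return the l sample points closest to q, where l is about k * rate."""
    rng = random.Random(seed)
    s = max(1, round(rate * len(data)))      # sample size s = r * n
    sample = rng.sample(data, s)             # random sample S, no replacement
    sample.sort(key=lambda p: dist(p, q))    # ascending distance to q
    l = max(1, math.ceil(k * rate))          # assumption: round up, at least 1
    return sample[:l]

# Hypothetical one-dimensional layout embedded in 2-D.
data = [(i, 0) for i in range(100)]
points = nearest_sample_points(data, (0, 0), k=4, rate=0.5, dist=linf)
```

The intuition is that the l-th nearest point in a sample of rate r sits roughly where the k-th nearest point of the full dataset sits, so the returned points outline a region expected to hold about k data points.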

8 Sampling-based Method Determining the range query –the Minimum Bounding Rectangle (MBR) –Sym: set the side length on the i-th dimension to 2δ_i, where δ_i = max{ |q_i − x_i| : (x_1, …, x_m) ∈ the l points } –Squ: set the side length on the i-th dimension to 2δ, where δ = max(δ_i) for 1 ≤ i ≤ m –the Minimum Bounding Square on Shape (MBSS)
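The Sym and Squ rules translate directly into code. The sketch below represents a range query as a list of (low, high) intervals, one per dimension, centered at q; that representation and the names `sym_range` and `squ_range` are assumptions made for illustration.

```python
def sym_range(q, points):
    """Sym: side length 2 * delta_i on dimension i, centered at q."""
    deltas = [max(abs(qi - p[i]) for p in points) for i, qi in enumerate(q)]
    return [(qi - d, qi + d) for qi, d in zip(q, deltas)]

def squ_range(q, points):
    """Squ: side length 2 * delta on every dimension, delta = max(delta_i)."""
    delta = max(abs(qi - p[i]) for p in points for i, qi in enumerate(q))
    return [(qi - delta, qi + delta) for qi in q]

# Hypothetical query point and the l nearest sample points.
q = (5, 5)
nearest = [(4, 7), (6, 5)]
sym_box = sym_range(q, nearest)   # [(4, 6), (3, 7)]
squ_box = squ_range(q, nearest)   # [(3, 7), (3, 7)]
```

Squ always contains the Sym rectangle, so it trades a larger result size for a better chance of covering the true top-k points.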

9 Sampling-based Method

10 –Para: use the L∞ distance to sort the sample points regardless of the distance function; take l = c · r · k + 1 points from the sorted sequence, where c is the magnification factor (MF); set the range query to be the smallest square centered at q that encloses the l points. Pros: gives an accurate result size
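The Para strategy can be sketched as below. The slide does not say how c · r · k is rounded, so this example assumes it is rounded down before adding 1, and it caps l at the sample size; the name `para_range` and the interval representation of the square are likewise assumptions.

```python
import math

def para_range(sample, q, k, rate, c):
    """Smallest square centered at q enclosing the l nearest sample points."""
    if not sample:
        raise ValueError("sample must not be empty")

    def linf(p):
        # Para always sorts by L-infinity, whatever the query's own metric is.
        return max(abs(a - b) for a, b in zip(p, q))

    ordered = sorted(sample, key=linf)
    # Assumption: floor c * r * k, then add 1; never exceed the sample size.
    l = min(len(ordered), math.floor(c * rate * k) + 1)
    half_side = linf(ordered[l - 1])
    return [(qi - half_side, qi + half_side) for qi in q]

# Hypothetical sample already drawn at rate r = 0.5.
sample = [(1, 0), (0, 2), (3, 3), (5, 1)]
box = para_range(sample, (0, 0), k=2, rate=0.5, c=1.0)     # l = 2
bigger = para_range(sample, (0, 0), k=2, rate=0.5, c=2.0)  # l = 3
```

Because a square centered at q is exactly an L∞ ball, the half-side is just the L∞ distance of the l-th point, and raising c grows the square monotonically.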

11 Sampling-based Method Let Q(D) be the result of the range query Q and top(D,q,k) be the set containing the k closest points to q.
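The formulas on this slide did not survive in the transcript. The sketch below assumes the usual definitions, precision = |Q(D) ∩ top(D,q,k)| / |Q(D)| and recall = |Q(D) ∩ top(D,q,k)| / k, which may differ in detail from the paper; the function names and the interval representation of the range query are hypothetical.

```python
def in_range(p, box):
    """True if point p lies inside the axis-aligned box (closed intervals)."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))

def precision_recall(data, box, top_k):
    """Assumed definitions: hits / |Q(D)| and hits / k."""
    result = {p for p in data if in_range(p, box)}   # Q(D)
    hits = len(result & set(top_k))                  # |Q(D) ∩ top(D,q,k)|
    precision = hits / len(result) if result else 0.0
    recall = hits / len(top_k)
    return precision, recall

# Hypothetical data, range query and exact top-3 set for q = (0, 0).
data = [(0, 0), (1, 1), (2, 2), (5, 5)]
box = [(-1, 1), (-1, 1)]
top3 = [(0, 0), (1, 1), (2, 2)]
precision, recall = precision_recall(data, box, top3)
```

Here the range query returns two points, both of them true top-3 members, so precision is 1 while recall is 2/3.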

12 Sampling-based Method Deciding the magnification factor c for a given recall –with k fixed, plot recall against MF –use linear interpolation on the plotted curve to compute the magnification factor c needed for the target recall
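The interpolation step might look like the following sketch. It assumes the recall-vs-MF curve is available as measured (MF, recall) pairs with recall non-decreasing in MF; the numbers below are invented for illustration and are not results from the paper.

```python
def magnification_for_recall(curve, target):
    """Linearly interpolate the MF that reaches the target recall.

    curve: list of (mf, recall) pairs measured for a fixed k.
    """
    pts = sorted(curve)
    for (c0, r0), (c1, r1) in zip(pts, pts[1:]):
        if r0 <= target <= r1:
            if r1 == r0:          # flat segment: the smaller MF suffices
                return c0
            return c0 + (target - r0) * (c1 - c0) / (r1 - r0)
    raise ValueError("target recall lies outside the measured curve")

# Hypothetical measurements: (magnification factor, observed recall).
curve = [(1.0, 0.5), (2.0, 0.75), (3.0, 1.0)]
c = magnification_for_recall(curve, 0.625)   # halfway between 1.0 and 2.0
```

Raising an error for targets outside the measured range avoids extrapolating beyond the data that was actually plotted.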

13 Experimental Results

17 Conclusions This paper presents a sampling-based method to process approximate top-k queries. Experimental results show that –the proposed method outperforms the histogram-based method; –the mapping scheme scales well for high-dimensional data. Easy to implement and maintain!

