A Sampling-based Estimator for Top-k Selection Query. Chung-Min Chen, Yibei Ling. ICDE 2002. Presented by Kan Kin Fai.

Outline
– Introduction
– Histogram-based Method
– Sampling-based Method
– Experimental Results
– Conclusion

Introduction
Given a distance function and a query point q, a top-k query finds the k points in the dataset that are closest to q.
Example: searching for an apartment by specifying a price and a location.

Introduction
Goal: quickly find a good approximation of the top-k points.
Approach: translate a top-k query into a range query.
Distance functions:
– Euclidean distance (L2-norm distance)
– Summation distance (L1-norm distance)
– Maximum distance (L∞-norm distance)
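As a reference point, here is a Python sketch of the exact answer that the range query tries to approximate. It does a full scan under each of the three distance functions. This is an illustrative baseline and not the authors' code. The function names are hypothetical.

```python
import heapq
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def l1(p: Point, q: Point) -> float:
    """Summation distance (L1 norm)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2(p: Point, q: Point) -> float:
    """Euclidean distance (L2 norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def linf(p: Point, q: Point) -> float:
    """Maximum distance (L-infinity norm)."""
    return max(abs(a - b) for a, b in zip(p, q))


def exact_top_k(data: List[Point], q: Point, k: int,
                dist: Callable[[Point, Point], float]) -> List[Point]:
    """Exact top-k by full scan: the k points of data closest to q under dist."""
    return heapq.nsmallest(k, data, key=lambda p: dist(p, q))


data = [(0, 0), (1, 1), (3, 0), (0, 2)]
print(exact_top_k(data, (0, 0), 2, l2))  # [(0, 0), (1, 1)]
```

A full scan costs one distance computation per point. The sampling-based estimator is meant to avoid that cost.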

Histogram-based Method
Uses histograms to determine the range query for a top-k query with query point q.
Drawbacks:
– poor scalability of histograms with data dimensionality
– non-trivial maintenance overhead of multidimensional histograms

Histogram-based Method
Strategies: NoRestart, Restart, Inter1 and Inter2

Sampling-based Method
Main idea:
– Take a random sample S of size s from the dataset D of size n (sampling rate r = s / n).
– Given a query point q, compute the distance between q and every point in S, then sort the sample points in ascending order of distance.
– Take the first l points from the sorted sequence, where l = k · r, and determine the range query from them.
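Below is a minimal sketch of the main idea. It is an illustration and not the paper's implementation, and the function name is hypothetical. L∞ stands in for whichever distance function the query uses. The slide does not say how a fractional k · r is rounded, so the sketch assumes it is rounded up, with at least one point kept.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def linf(p: Sequence[float], q: Sequence[float]) -> float:
    """Maximum (L-infinity) distance; any other distance function could be swapped in."""
    return max(abs(a - b) for a, b in zip(p, q))


def first_l_sample_points(data: List[Point], q: Point, k: int, s: int,
                          seed: Optional[int] = None) -> List[Point]:
    """Draw a random sample S of size s, sort it by distance to q, keep the first l."""
    n = len(data)
    sample = random.Random(seed).sample(data, s)  # uniform, without replacement
    r = s / n                                     # sampling rate r = s / n
    # Assumption: k * r is rarely an integer, so round it up and keep at least one point.
    l = max(1, math.ceil(k * r))
    sample.sort(key=lambda p: linf(p, q))         # ascending distance to q
    return sample[:l]


grid = [(float(x), float(y)) for x in range(50) for y in range(50)]
pts = first_l_sample_points(grid, (25.0, 25.0), k=100, s=1250, seed=1)
print(len(pts))  # 50, since r = 1250 / 2500 = 0.5 and l = 100 * 0.5
```

The sample is drawn once, ahead of time. Per query, the work is a sort of s points instead of a scan of all n.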

Sampling-based Method
Determining the range query:
– MBR: the Minimum Bounding Rectangle of the l points.
– Sym: a box centered at q whose side length on the i-th dimension is $2\delta_i$, where $\delta_i = \max\{\,|q_i - x_i| : (x_1, \ldots, x_m) \text{ among the } l \text{ points}\,\}$.
– Squ: a square centered at q whose side length on every dimension is $2\delta$, where $\delta = \max_{1 \le i \le m} \delta_i$.
– MBSS: the Minimum Bounding Square on Shape.
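Here is a minimal sketch of the MBR, Sym and Squ boxes, plus evaluation of the resulting range query. It assumes the MBR covers only the l points, as the slide words it. A range is represented as one inclusive (low, high) interval per dimension. The function names are hypothetical. MBSS is not sketched, since the slide gives no construction for it.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = List[Tuple[float, float]]  # one inclusive (low, high) interval per dimension


def mbr(points: List[Point]) -> Box:
    """Minimum bounding rectangle of the l points."""
    m = len(points[0])
    return [(min(p[i] for p in points), max(p[i] for p in points)) for i in range(m)]


def deltas(q: Point, points: List[Point]) -> List[float]:
    """delta_i = max |q_i - x_i| over the l points, for each dimension i."""
    return [max(abs(q[i] - p[i]) for p in points) for i in range(len(q))]


def sym(q: Point, points: List[Point]) -> Box:
    """Sym: box centered at q with side length 2 * delta_i on dimension i."""
    return [(qi - d, qi + d) for qi, d in zip(q, deltas(q, points))]


def squ(q: Point, points: List[Point]) -> Box:
    """Squ: square centered at q with side length 2 * delta, delta = max_i delta_i."""
    delta = max(deltas(q, points))
    return [(qi - delta, qi + delta) for qi in q]


def range_query(data: List[Point], box: Box) -> List[Point]:
    """Evaluate the range query: every point inside the box (bounds inclusive)."""
    return [p for p in data
            if all(lo <= x <= hi for x, (lo, hi) in zip(p, box))]


q = (0.0, 0.0)
l_points = [(1.0, 2.0), (3.0, -1.0)]
print(squ(q, l_points))  # [(-3.0, 3.0), (-3.0, 3.0)]
```

Squ always contains Sym, because it widens every dimension to the largest $\delta_i$. It therefore returns at least as many points.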

Sampling-based Method
– Para:
  – use L∞ to sort the sample points, regardless of the distance function;
  – take the first $l = c \cdot r \cdot k + 1$ points from the sorted sequence, where c is the magnification factor (MF);
  – set the range query to be the smallest square centered at q that encloses the l points.
  – Pro: gives an accurate result size.
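Para can be sketched the same way. None of the following assumptions come from the slide: c · r · k is floored before 1 is added, l is capped at the sample size s, and `para_range` is a hypothetical name. The half-side of the smallest square centered at q that encloses the l points is the L∞ distance of the l-th sorted point.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def linf(p: Sequence[float], q: Sequence[float]) -> float:
    """Maximum (L-infinity) distance."""
    return max(abs(a - b) for a, b in zip(p, q))


def para_range(data: List[Point], q: Point, k: int, c: float, s: int,
               seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Para: square range query centered at q, sized from an L-infinity-sorted sample."""
    n = len(data)
    sample = random.Random(seed).sample(data, s)
    r = s / n
    # Assumption: floor c*r*k before adding 1, and never take more than s points.
    l = min(s, math.floor(c * r * k) + 1)
    # Sort by L-infinity regardless of the query's own distance function.
    sample.sort(key=lambda p: linf(p, q))
    # The smallest q-centered square enclosing the first l points has half-side
    # equal to the L-infinity distance of the l-th point.
    half = linf(sample[l - 1], q)
    return [(qi - half, qi + half) for qi in q]


line = [(float(i), 0.0) for i in range(100)]
print(para_range(line, (0.0, 0.0), k=10, c=1.0, s=100, seed=0))
# [(-10.0, 10.0), (-10.0, 10.0)]
```

Sorting by L∞ means the boundary of the square passes exactly through the l-th sample point. This is presumably why Para's result size tracks the intended count closely.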

Sampling-based Method
Let Q(D) be the result of the range query Q, and let top(D, q, k) be the set containing the k closest points to q.
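Recall and precision, used on the next slide, are presumably computed from these two sets. The sketch below assumes the usual definitions: recall = |Q(D) ∩ top(D,q,k)| / k and precision = |Q(D) ∩ top(D,q,k)| / |Q(D)|. The slide does not spell these out. Scoring an empty result as precision 0 is a further assumption.

```python
from typing import Hashable, Iterable


def recall(range_result: Iterable[Hashable], top_k: Iterable[Hashable]) -> float:
    """Fraction of the true top-k points that the range query returned."""
    top = set(top_k)
    return len(set(range_result) & top) / len(top)


def precision(range_result: Iterable[Hashable], top_k: Iterable[Hashable]) -> float:
    """Fraction of the range-query result that belongs to the true top-k."""
    result = set(range_result)
    if not result:
        return 0.0  # assumption: an empty result is scored as precision 0
    return len(result & set(top_k)) / len(result)


Q_D = [(0, 0), (1, 1), (5, 5)]
top = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(recall(Q_D, top))  # 0.5
```

Widening the range raises recall but usually lowers precision. That trade-off is what the magnification factor controls.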

Sampling-based Method
Deciding the magnification factor c for a given recall:
– with k fixed, plot recall against MF;
– use linear interpolation on this graph to compute the magnification factor c needed to reach the target recall.
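The interpolation step can be sketched as follows. It assumes recall has already been measured at a few magnification factors for the fixed k, for example by averaging over sample queries. The function name and the numbers in the usage lines are hypothetical, not results from the paper.

```python
from bisect import bisect_left
from typing import Sequence


def mf_for_recall(mfs: Sequence[float], recalls: Sequence[float], target: float) -> float:
    """Linearly interpolate the magnification factor c that reaches `target` recall.

    mfs must be ascending and recalls non-decreasing (recalls[j] measured at mfs[j]).
    """
    if target <= recalls[0]:
        return mfs[0]
    if target > recalls[-1]:
        raise ValueError("target recall is above every measured point")
    j = bisect_left(recalls, target)  # recalls[j - 1] < target <= recalls[j]
    r0, r1 = recalls[j - 1], recalls[j]
    c0, c1 = mfs[j - 1], mfs[j]
    return c0 + (target - r0) * (c1 - c0) / (r1 - r0)


# Hypothetical measurements, for illustration only.
c = mf_for_recall([1.0, 1.5, 2.0, 3.0], [0.62, 0.81, 0.90, 0.97], target=0.85)
```

Interpolating between measured points avoids rerunning the experiment for every target recall. The price is that the curve is assumed to be roughly linear between neighbouring MF values.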

Experimental Results

Conclusions
This paper presents a sampling-based method for processing approximate top-k queries. Experimental results show that:
– the proposed method outperforms the histogram-based method;
– the mapping scheme scales well to high-dimensional data.
Easy to implement and maintain!