Danzhou Liu Ee-Peng Lim Wee-Keong Ng

Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation
Center for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Nanyang Ave, Singapore 639798, Singapore

Outline
- Introduction
- Related work
- k-NN query algorithm based on range estimation
- Range estimation methods
- Experiments
- Conclusions
SSDBM2002

Introduction
- A spatial database provides persistent storage for spatial objects (e.g., points, polylines, polygons)
- A spatial database supports:
  - representation of spatial attributes
  - storage/indexing of spatial data values using spatial indices (e.g., R-tree and Quadtree)
  - queries involving spatial attributes

k-Nearest Neighbor Queries
- Definition: a k-nearest neighbor (k-NN) query locates the k spatial objects nearest to a given query point
- Wide range of applications:
  - Geographic Information Systems (GIS), e.g., finding the two nearest hospitals
  - Computer Aided Design (CAD), e.g., finding the three nearest resistors on a circuit board
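As a baseline for the definition above, here is a brute-force k-NN evaluation. It is a minimal sketch that assumes 2-D points and Euclidean distance. The function name `knn_brute_force` and the sample coordinates are illustrative only. The function scans every object, which presumes local access to the whole dataset.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn_brute_force(points: List[Point], q: Point, k: int) -> List[Point]:
    """Return the k points nearest to q (Euclidean distance), closest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


hospitals = [(0.0, 5.0), (1.0, 1.0), (4.0, 4.0), (-2.0, 0.0)]  # hypothetical data
print(knn_brute_force(hospitals, (0.0, 0.0), 2))  # [(1.0, 1.0), (-2.0, 0.0)]
```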

Motivation
- Large volume of spatial data on the WWW
  - Geospatial Data Clearinghouse (a collection of over 250 spatial database servers)
  - Yahoo, TIGER and other map services
- Limited Web-based query interfaces
  - support only simple spatial queries (e.g., window queries)
  - no support for remote index access

The Geospatial Data Clearinghouse
- A large amount of useful geospatial information on the WWW

The Geospatial Data Clearinghouse
- Limited Web-based query interface: supports only window queries

Objective
Develop efficient algorithms to evaluate k-NN queries on remote spatial databases using window queries:
- Propose a generic k-NN query processing algorithm that accommodates different range estimation methods
- Develop efficient range estimation methods
- Conduct experiments to evaluate the performance of the proposed range estimation methods
- Develop sampling methods to obtain the statistical knowledge of remote databases that the range estimation methods need

Related Work
Algorithms for simple k-NN queries fall into three major groups:
- Partition-based algorithms
- Graph-based algorithms
- Range-based algorithms

Partition-based Algorithms
- Retrieve the k nearest neighbors from spatial indices by pruning away nodes that cannot lead to the k nearest neighbors
- Examples:
  - branch-and-bound R-tree traversal algorithm
  - algorithm that returns neighbors in a pipelined fashion
- Not applicable to the Web environment:
  - spatial indices are usually not available to non-local applications
  - creating local indices is infeasible due to the large amount of data
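Pruning in branch-and-bound R-tree traversal is commonly driven by the MINDIST bound. MINDIST is the smallest possible distance from the query point to anything inside a node's bounding rectangle. The sketch below assumes 2-D axis-aligned rectangles, and `can_prune` is a hypothetical helper name. Both functions need the index nodes' rectangles, which a remote window-query interface does not expose.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: Point, r: Rect) -> float:
    """Smallest Euclidean distance from q to rectangle r; 0 when q lies inside r."""
    x, y = q
    xmin, ymin, xmax, ymax = r
    dx = max(xmin - x, 0.0, x - xmax)  # horizontal gap, 0 if x is within [xmin, xmax]
    dy = max(ymin - y, 0.0, y - ymax)  # vertical gap, 0 if y is within [ymin, ymax]
    return math.hypot(dx, dy)


def can_prune(q: Point, r: Rect, kth_best: float) -> bool:
    """A node farther away than the current k-th best distance cannot hold a better neighbor."""
    return mindist(q, r) > kth_best


print(mindist((0.0, 0.0), (3.0, 4.0, 5.0, 6.0)))  # 5.0
print(can_prune((0.0, 0.0), (3.0, 4.0, 5.0, 6.0), 4.5))  # True
```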

Graph-based Algorithms
- Pre-compute the nearest neighbors of spatial objects, and create new index structures over the pre-computed nearest neighbor information to support search
- Example: Voronoi-based algorithm
- Not applicable to the Web environment:
  - retrieving all spatial objects from remote database servers is sometimes impractical
  - creating local indices is infeasible due to the large amount of data

Range-based Algorithms
- Use range queries to retrieve the k nearest neighbors
- Examples:
  - use sampling for range estimation
  - use distance distributions for range estimation
  - use reference points for range estimation
- Not applicable to the Web environment:
  - determining the sample size and properly selecting samples of spatial objects remain challenging
  - creating local indices is infeasible due to the large amount of data
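To illustrate sampling-based range estimation, here is one simple heuristic. It is shown as an assumption, not as the method of any particular cited work. If a uniform random sample holds a fraction f of all objects, about k·f sampled objects are expected within the true k-NN distance. The distance to that many-th nearest sample point therefore serves as the estimate. The name `sample_range` and the data are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sample_range(sample: List[Point], fraction: float, q: Point, k: int) -> float:
    """Estimate the distance from q to its k-th nearest neighbor using a uniform
    random sample that holds `fraction` of all objects (heuristic, not a bound)."""
    if not sample:
        raise ValueError("sample must not be empty")
    # About k * fraction sampled objects are expected within the true k-NN
    # distance, so use the distance of that many-th nearest sample point.
    m = max(1, min(len(sample), round(k * fraction)))
    nearest = heapq.nsmallest(m, sample, key=lambda p: math.dist(p, q))
    return math.dist(nearest[-1], q)


sample = [(1.0, 0.0), (0.0, 3.0), (5.0, 0.0)]  # hypothetical 10% sample
print(sample_range(sample, 0.1, (0.0, 0.0), 20))  # 3.0
```

The estimate can be too small, so a caller still has to check that the resulting range query returns at least k objects.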

Proposed k-NN Algorithm Based on Range Estimation
- New strategies for k-NN query evaluation in the Web environment are required
- Use window queries to probe the spatial database
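The following sketches a generic window-query k-NN loop that accepts any range estimation method. It assumes a hypothetical server class, `WindowOnlyServer`, that answers only square windows centred on the query point. It also assumes a caller-supplied `enlarge` function standing in for a range estimator. Neither is the authors' API.

The loop grows the window until it returns at least k objects. It then checks the k-th candidate's distance d_k. If d_k exceeds the window's half-side, closer objects might lie outside the square, so one more window of half-side d_k is issued to cover the full circle of radius d_k.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


class WindowOnlyServer:
    """Hypothetical stand-in for a remote database that answers only window queries."""

    def __init__(self, points: List[Point]) -> None:
        self._points = list(points)
        self.queries = 0  # number of window queries issued so far

    def window_query(self, q: Point, half: float) -> List[Point]:
        """Return all points inside the axis-aligned square of half-side `half` centred on q."""
        self.queries += 1
        return [p for p in self._points
                if abs(p[0] - q[0]) <= half and abs(p[1] - q[1]) <= half]


def knn_by_windows(server: WindowOnlyServer, q: Point, k: int, initial_half: float,
                   enlarge: Callable[[float, int], float], max_rounds: int = 64) -> List[Point]:
    """k-NN of q using only window queries; `enlarge` proposes a larger half-side
    from the current half-side and the number of objects that window returned."""
    if k <= 0:
        return []
    half = initial_half
    for _ in range(max_rounds):
        found = server.window_query(q, half)
        if len(found) >= k:
            break
        half = enlarge(half, len(found))
    else:
        raise RuntimeError("no window returned k objects")

    found.sort(key=lambda p: math.dist(p, q))
    d_k = math.dist(found[k - 1], q)
    if d_k > half:
        # The circle of radius d_k sticks out of the square, so nearer objects may
        # lie outside it; one more window of half-side d_k covers that circle.
        found = server.window_query(q, d_k)
        found.sort(key=lambda p: math.dist(p, q))
    return found[:k]


server = WindowOnlyServer([(1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (-4.0, 1.0), (0.5, 0.5)])
result = knn_by_windows(server, (0.0, 0.0), 3, 0.5, lambda half, n: 2 * half)
print(result)          # [(0.5, 0.5), (1.0, 0.0), (0.0, 2.0)]
print(server.queries)  # 3
```

Doubling is used here only as a placeholder estimator; a density- or bucket-based estimator would be passed in its place.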

Density-based Range Estimation Method
- Based on the assumption that spatial objects are uniformly distributed
- Range estimated by the EstiRange1 function
- Ranges estimated by the EstiRange2 function, invoked when the estimated range is not large enough
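The EstiRange1 and EstiRange2 formulas themselves are not reproduced here. Under the same uniform-distribution assumption, one plausible derivation works as follows. With N objects spread uniformly over an area A, a square window of half-side r is expected to hold N(2r)²/A objects. Setting this equal to k gives r = sqrt(kA / (4N)).

If a window of half-side h returns only n < k objects, the density observed inside it suggests a new half-side of h·sqrt(k/n). The functions `density_initial_range` and `density_next_range` are hypothetical names for these assumed formulas, not the authors' functions.

```python
import math


def density_initial_range(n_objects: int, area: float, k: int) -> float:
    """Half-side r of a square window expected to hold k objects when n_objects
    are spread uniformly over `area` (solves n_objects * (2r)^2 / area = k)."""
    return math.sqrt(k * area / (4.0 * n_objects))


def density_next_range(half: float, found: int, k: int) -> float:
    """Enlarge a window of half-side `half` that returned `found` < k objects by
    reusing the density observed inside it; double it if it came back empty."""
    if found == 0:
        return 2.0 * half
    return half * math.sqrt(k / found)


print(density_initial_range(1_000_000, 10_000.0, 100))  # 0.5
print(density_next_range(0.5, 25, 100))                 # 1.0
```

Because real data is rarely uniform, such an estimate may fall short, which is why a second function is needed for the follow-up ranges.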

Bucket-based Range Estimation Method
- Uses summary information about partitions, or buckets, of spatial objects for range estimation
- Summary information: the bucket's MBB (minimum bounding box) and the number of spatial objects in the bucket
- Buckets are created using different strategies [1]
- Sort the buckets by their maximum distance to the query point
- The estimated range is the smallest such maximum distance for which the buckets within it together contain at least k objects
- Uses only one window query
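The bucket-based estimate described above can be sketched as follows. The sketch assumes each bucket is summarized as an `(MBB, count)` pair. The function names and the example buckets are hypothetical.

Every object in the accumulated buckets lies within the returned distance r of the query point. A square window of half-side r centred on the query therefore contains at least k objects and the entire circle of radius r, so a single window query suffices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # bucket MBB: (xmin, ymin, xmax, ymax)


def maxdist(q: Point, mbb: Rect) -> float:
    """Largest Euclidean distance from q to any point of the bucket's MBB (a corner)."""
    x, y = q
    xmin, ymin, xmax, ymax = mbb
    dx = max(abs(x - xmin), abs(x - xmax))
    dy = max(abs(y - ymin), abs(y - ymax))
    return math.hypot(dx, dy)


def bucket_range(q: Point, buckets: List[Tuple[Rect, int]], k: int) -> float:
    """Smallest max distance d such that the buckets lying entirely within
    distance d of q together hold at least k objects."""
    total = 0
    for d, count in sorted((maxdist(q, mbb), count) for mbb, count in buckets):
        total += count
        if total >= k:
            return d
    raise ValueError("the buckets hold fewer than k objects")


buckets = [((0.0, 0.0, 1.0, 1.0), 3), ((-2.0, -1.0, 0.0, 1.0), 4), ((2.0, 2.0, 3.0, 3.0), 10)]
print(bucket_range((0.0, 0.0), buckets, 5))  # about 2.236, the second bucket's max distance
```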

Example: k = 5

Experiments
- New Jersey road dataset from TIGER [30]

Performance measures: number of iterations (h) and accuracy (A)

Experimental Results
Minimum, maximum, and upper bounds on the number of iterations of the density-based range estimation method

Number of iterations and accuracy of the density-based range estimation method

Experimental Results
Efficiency of the density-based and bucket-based range estimation methods

Conclusions
- A window query approach to evaluating k-NN queries on remote spatial databases, motivated by:
  - the large amount of spatial information on the Web
  - limited query interfaces
- Proposed range estimation methods:
  - performance increases with k
  - no clear winner

Types of Range Estimation Methods
- Tight estimation methods: the estimated range may not be large enough, so both the EstiRange1 and EstiRange2 functions may be invoked (e.g., the density-based method)
- Loose estimation methods: the estimated range is large enough, so only the EstiRange1 function is invoked (e.g., the bucket-based method)

Future Work
- Extend range estimation methods with sampling techniques to determine the data distribution
  - current range estimation methods depend on statistical knowledge provided by database owners
  - investigate how this statistical knowledge can be approximated through sampling
- Develop strategies to select the appropriate range estimation method for evaluating k-NN queries
- Develop Web applications of k-NN queries

Four Strategies to Create Buckets
- Equi-Count, Equi-Area, Min-Skew, and Min-Overlap partitioning strategies [1]
- Figure panels (Charminar dataset): spatial densities in Charminar; Equi-Area partitioning; Equi-Count partitioning; Min-Skew partitioning; Min-Overlap partitioning
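Of the four strategies, Equi-Count is the simplest to sketch. The version below is one possible realization: it assumes recursive median cuts that alternate between x and y, a kd-tree-style choice made here for illustration. The slides do not specify how [1] performs the cuts. Min-Skew and Min-Overlap are not shown. The function names are hypothetical. Each resulting `(MBB, count)` pair is the kind of bucket summary the bucket-based method consumes.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Bucket = Tuple[Rect, int]                 # (MBB, object count)


def mbb(points: List[Point]) -> Rect:
    """Minimum bounding box of a non-empty point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def equi_count_buckets(points: List[Point], levels: int, axis: int = 0) -> List[Bucket]:
    """Cut at the median coordinate, alternating between x (axis 0) and y (axis 1),
    `levels` times, giving up to 2**levels buckets with (nearly) equal counts."""
    if not points:
        return []
    if levels == 0 or len(points) == 1:
        return [(mbb(points), len(points))]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (equi_count_buckets(ordered[:mid], levels - 1, 1 - axis)
            + equi_count_buckets(ordered[mid:], levels - 1, 1 - axis))


points = [(float(i), float(i % 3)) for i in range(8)]  # hypothetical data
buckets = equi_count_buckets(points, 2)
print([count for _, count in buckets])  # [2, 2, 2, 2]
```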