Efficient Progressive Processing of Skyline Queries in Peer-to-Peer Systems INFOSCALE’06.

Outline: Introduction, Algorithms, Evaluation, Conclusion

Introduction. Motivating example: finding a hotel that is as close to the beach and as cheap as possible (figure axes: Distance, Price).
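To make the hotel example concrete, here is a minimal, naive skyline sketch (an illustration of the skyline concept, not the paper's distributed algorithm). A hotel belongs to the skyline if no other hotel is at least as close and at least as cheap, and strictly better in one of the two. The hotel values are made up for illustration.

```python
from typing import List, Sequence

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly better
    in at least one. All dimensions are minimized (distance and price)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive O(n^2) skyline: keep each point that no other point dominates.
    A point never dominates itself, so no self-check is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (distance to beach in km, price per night); illustrative values only
hotels = [(1.0, 120), (2.0, 80), (3.0, 90), (0.5, 200), (2.5, 70)]
print(skyline(hotels))  # [(1.0, 120), (2.0, 80), (0.5, 200), (2.5, 70)]
```

The hotel at (3.0, 90) is dropped because (2.0, 80) is both closer and cheaper; every remaining hotel is a trade-off the user might prefer.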

Semantic Small World (SSW). Each peer chooses the centroid of its largest data cluster as its semantic label. Each node in the network knows its local neighbors, called short-range contacts, and a small number of randomly chosen nodes, called long-range contacts. Each peer manages its own data objects as well as the location information of data objects stored at other peers, referred to as foreign indexes.
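The per-peer state described above can be sketched as a small data structure. This is a hypothetical model: the class and field names (`Peer`, `short_contacts`, `long_contacts`, `foreign_index`) are illustrative, and the local data is assumed to be already clustered, since the slide does not say which clustering method is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]

def centroid(cluster: List[Vector]) -> Vector:
    """Component-wise mean of the vectors in one cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

@dataclass
class Peer:
    peer_id: str
    # Local data objects, assumed to be already grouped into clusters.
    clusters: List[List[Vector]]
    # Short-range contacts: neighbors in the same semantic neighborhood.
    short_contacts: List[str] = field(default_factory=list)
    # Long-range contacts: a few randomly chosen peers elsewhere in the overlay.
    long_contacts: List[str] = field(default_factory=list)
    # Foreign index (hypothetical layout): data object -> id of the peer storing it.
    foreign_index: Dict[Vector, str] = field(default_factory=dict)

    @property
    def semantic_label(self) -> Vector:
        """Centroid of the peer's largest local data cluster."""
        return centroid(max(self.clusters, key=len))

p = Peer("p1", clusters=[[(1.0, 2.0)], [(2.0, 4.0), (4.0, 6.0)]])
p.short_contacts.append("p2")
p.long_contacts.append("p17")
p.foreign_index[(3.0, 5.5)] = "p9"
print(p.semantic_label)  # (3.0, 5.0)
```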

Cont. (Figure: SSW overlay structure, showing foreign indexes and short-range and long-range contacts.)

Problem Definition. Consider a 4-dimensional SSW with attributes {a0, a1, a2, a3}. The skyline query Q = {a0: min, a2: max} minimizes a0 and maximizes a2, so Q involves only attribute dimensions a0 and a2.
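A subspace skyline query like Q can be sketched by restricting dominance to the queried dimensions, each with its own min/max preference. This is a minimal centralized illustration. The query encoding (a dict from dimension index to `"min"`/`"max"`) and the object values are assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Query: dimension index -> "min" or "max"; dimensions not listed are ignored.
Query = Dict[int, str]

def no_worse(a: float, b: float, pref: str) -> bool:
    return a <= b if pref == "min" else a >= b

def strictly_better(a: float, b: float, pref: str) -> bool:
    return a < b if pref == "min" else a > b

def dominates(p: Sequence[float], q: Sequence[float], query: Query) -> bool:
    """p dominates q on the query's subspace only."""
    return (all(no_worse(p[d], q[d], pref) for d, pref in query.items())
            and any(strictly_better(p[d], q[d], pref) for d, pref in query.items()))

def subspace_skyline(objects: List[Sequence[float]], query: Query) -> List[Sequence[float]]:
    return [o for o in objects if not any(dominates(x, o, query) for x in objects)]

# Objects in the 4-dimensional space (a0, a1, a2, a3); illustrative values.
objects = [(1, 9, 5, 0), (2, 0, 7, 3), (3, 5, 6, 1), (1, 1, 4, 8)]
Q = {0: "min", 2: "max"}  # Q = {a0: min, a2: max}
print(subspace_skyline(objects, Q))  # [(1, 9, 5, 0), (2, 0, 7, 3)]
```

Attributes a1 and a3 never affect the result. For example, (1, 1, 4, 8) is dropped only because (1, 9, 5, 0) has the same a0 and a larger a2.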

Algorithms. Exact Algorithm steps:
1. Locate the origin cluster.
2. Find the boundary value of the skyline query (v_bound).
3. Inter-cluster pruning: the query is forwarded to peers in a neighboring cluster as long as that cluster is not dominated by v_bound.
4. Intra-cluster pruning: prune irrelevant peers.
5. Skyline computing.
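The inter-cluster pruning test in step 3 can be sketched as follows, under two assumptions not stated on the slide. First, each cluster is summarized by per-dimension lower and upper bounds. Second, v_bound is the attribute vector of an object already found. A cluster is skipped when v_bound dominates the cluster's best possible corner (its lower bound on "min" dimensions, its upper bound on "max" dimensions), because then v_bound dominates every object the cluster could contain. The names `best_corner` and `clusters_to_visit` are hypothetical.

```python
from typing import Dict, List, Tuple

Query = Dict[int, str]                               # dimension -> "min" / "max"
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]    # (lower bounds, upper bounds)

def best_corner(box: Box, query: Query) -> Dict[int, float]:
    """Best value any object inside the box could have on each query dimension."""
    lows, highs = box
    return {d: (lows[d] if pref == "min" else highs[d]) for d, pref in query.items()}

def dominates(v: Dict[int, float], c: Dict[int, float], query: Query) -> bool:
    no_worse = all((v[d] <= c[d]) if p == "min" else (v[d] >= c[d]) for d, p in query.items())
    better = any((v[d] < c[d]) if p == "min" else (v[d] > c[d]) for d, p in query.items())
    return no_worse and better

def clusters_to_visit(clusters: Dict[str, Box], v_bound: Dict[int, float],
                      query: Query) -> List[str]:
    """Inter-cluster pruning: skip any cluster whose best corner v_bound dominates."""
    return [cid for cid, box in clusters.items()
            if not dominates(v_bound, best_corner(box, query), query)]

Q = {0: "min", 2: "max"}
clusters = {
    "c1": ((0, 0, 5, 0), (3, 9, 9, 9)),   # best case a0 = 0, a2 = 9
    "c2": ((4, 0, 0, 0), (8, 9, 6, 9)),   # best case a0 = 4, a2 = 6
}
v_bound = {0: 2.0, 2: 7.0}
print(clusters_to_visit(clusters, v_bound, Q))  # ['c1']
```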

Exact Algorithm (cont.)

Approximate Algorithm (Single-Path). This algorithm is used when a semantic overlay network does not exist. On receiving an incoming skyline query Q, the initial peer must decide, from its knowledge of its contacts, the next candidate peer to which Q is forwarded. Semantic distance: computed from the attribute values of the candidate peer and the attribute values of the current peer.
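The transcript does not preserve the semantic-distance formula. The sketch below assumes it is the per-dimension difference (candidate minus current). That assumption is consistent with the surviving row for peer E in the example table (price 103, distance 70, score 2, 4). The single-path choice rule (smallest summed difference, all query dimensions minimized) is a hypothetical stand-in, as are the contact values.

```python
from typing import Dict, Sequence, Tuple

def semantic_distance(candidate: Sequence[float], current: Sequence[float],
                      dims: Sequence[int]) -> Tuple[float, ...]:
    """Assumed definition: per-dimension difference, candidate minus current."""
    return tuple(candidate[d] - current[d] for d in dims)

def single_path_next(current: Sequence[float], candidates: Dict[str, Sequence[float]],
                     dims: Sequence[int]) -> str:
    """Hypothetical score: forward Q to the contact with the smallest summed
    difference (all query dimensions minimized, e.g. price and distance)."""
    return min(candidates,
               key=lambda pid: sum(semantic_distance(candidates[pid], current, dims)))

current = (101, 66)                                         # (price, distance to beach)
contacts = {"X": (90, 80), "Y": (110, 50), "Z": (95, 60)}   # illustrative values
print(semantic_distance((103, 70), current, (0, 1)))  # (2, 4)
print(single_path_next(current, contacts, (0, 1)))    # Z
```

Summing raw differences mixes units (price and distance). A real scoring function would likely normalize each dimension first.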

Single-Path example: the current peer has Price = 101 and Distance to Beach = 66.

Discussion and Improvement. Consider peers A and B in the candidate list, where A is cheaper and B is closer to the beach. Suppose the single-path algorithm chooses A while B contains many hotel records that are close to the beach. Then an important portion of a good skyline is neglected.

Multi-Path. The semantic distance is computed as before, but the score function returns a set of j candidate peers (a j-tuple) instead of a single result.
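Below is a sketch of a j-way selection. The surviving scores in the example table are too incomplete to recover the actual selection rule, so this rule is hypothetical. It keeps the contacts whose semantic-distance vectors are not dominated by another contact's, so a much cheaper peer and a much closer peer can both survive. It then returns up to j of them ranked by summed difference. The semantic distance is again assumed to be candidate minus current per dimension, and the contact values are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

def semantic_distance(candidate: Sequence[float], current: Sequence[float],
                      dims: Sequence[int]) -> Tuple[float, ...]:
    """Assumed definition: per-dimension difference, candidate minus current."""
    return tuple(candidate[d] - current[d] for d in dims)

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b when no component is larger and at least one is smaller."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def multi_path_next(current: Sequence[float], candidates: Dict[str, Sequence[float]],
                    dims: Sequence[int], j: int) -> List[str]:
    """Hypothetical rule: keep contacts whose semantic-distance vectors are not
    dominated by another contact's, then return up to j ranked by summed score."""
    scores = {pid: semantic_distance(attrs, current, dims) for pid, attrs in candidates.items()}
    front = [pid for pid, s in scores.items()
             if not any(dominates(t, s) for t in scores.values())]
    front.sort(key=lambda pid: (sum(scores[pid]), pid))
    return front[:j]

current = (101, 66)                                   # (price, distance to beach)
contacts = {"P": (80, 90), "Q": (120, 40), "R": (100, 65), "S": (110, 70)}
print(multi_path_next(current, contacts, (0, 1), j=3))  # ['Q', 'R', 'P']
```

S is filtered out because R is both cheaper and closer. P (much cheaper) and Q (much closer) are both retained, which is the situation the A/B discussion warns a single path can miss.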

Cont. Current peer: Price = 101, Distance to Beach = 66.

Peer ID   Price   Distance to Beach   Score
A                                     , -1
B         73                          -28, 7
C                                     , -7
D                                     , -1
E         103     70                  2, 4
F                                     , 18
G                                     , 2

Peers F and C are selected.

Evaluation. Result quality: measured by the area between an approximate skyline and the complete exact skyline, which takes all data objects in the network into consideration.
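For two minimized dimensions, the area metric can be sketched as the difference between the regions the two skylines dominate inside a bounding box. Two assumptions apply. The reference corner `ref` is chosen by the caller, since the slide does not say how the area is bounded. The approximate skyline consists of real data objects, so each of its points is in, or dominated by, the exact skyline, and its dominated region lies inside the exact one.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def dominated_area(skyline: List[Point], ref: Point) -> float:
    """Area inside the box bounded by ref that the skyline dominates (both
    dimensions minimized). Points are swept by the first coordinate; the
    running minimum of the second coordinate gives the staircase height."""
    rx, ry = ref
    pts = sorted(p for p in skyline if p[0] < rx and p[1] < ry)
    area, best_y = 0.0, ry
    for i, (x, y) in enumerate(pts):
        next_x = pts[i + 1][0] if i + 1 < len(pts) else rx
        best_y = min(best_y, y)
        area += (next_x - x) * (ry - best_y)
    return area

def area_between(exact: List[Point], approx: List[Point], ref: Point) -> float:
    """Gap between the exact and approximate staircases (0 means identical)."""
    return dominated_area(exact, ref) - dominated_area(approx, ref)

exact = [(1, 3), (2, 1)]
approx = [(1, 3), (3, 2)]   # (3, 2) is dominated by the exact point (2, 1)
print(area_between(exact, approx, ref=(4, 4)))  # 3.0
```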

Cont.

Conclusion