Probabilistic Data Management

Chapter 5: Probabilistic Query Answering (3)

Objectives
In this chapter, you will learn the definition of, and the query processing techniques for, one probabilistic query type: the probabilistic reverse nearest neighbor (PRNN) query.

Recall: Probabilistic Query Types
Queries over an uncertain/probabilistic database:
- Probabilistic spatial queries: probabilistic range query, probabilistic k-nearest neighbor query, probabilistic group nearest neighbor (PGNN) query, probabilistic reverse k-nearest neighbor query, probabilistic spatial join/similarity join
- Probabilistic preference queries: probabilistic top-k query (or ranked query), probabilistic skyline query, probabilistic reverse skyline query

Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases
Very Large Data Bases Journal (VLDBJ), 2009

Outline
- Introduction
- Related Work
- Problem Definition
- PRNN Query Processing
- Experimental Evaluation
- Summary

Reverse Nearest Neighbor Query (RNN)
Rescue tasks in oceans: in an emergency, a ship asks its nearest ship for help. A rescue ship therefore needs to monitor the ships that have it as their nearest neighbor. In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs).

Introduction: Reverse Nearest Neighbor Query (RNN)
Given a database D and a query object q, an RNN query retrieves the data objects o ∈ D that have q as their nearest neighbor.
[Figure: query object q and data objects o1 to o5]

RNN Processing on Certain Data Points
TPL approach [VLDB'04]: each RNN candidate found so far defines a pruning region; an object that lies closer to that candidate than to q cannot have q as its nearest neighbor and is pruned.
[Figure, two animation steps: query object q, data objects o1 to o5, the RNN candidates, and their pruning regions]
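The pruning region of a candidate c is the half-plane on c's side of the perpendicular bisector of segment qc. The sketch below is a simplified filter in that spirit, not the TPL algorithm itself: it scans a flat list (no R-tree, no node trimming) in ascending distance from q, and the function names are hypothetical. Pruning never loses a true RNN, because a pruned object is strictly closer to some candidate than to q.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def in_pruning_half_plane(x: Point, q: Point, c: Point) -> bool:
    """True if x is strictly closer to candidate c than to q, i.e. x lies on
    c's side of the perpendicular bisector of segment qc."""
    # |x - c|^2 < |x - q|^2  <=>  2 * x.(c - q) > |c|^2 - |q|^2
    lhs = 2 * sum(xi * (ci - qi) for xi, ci, qi in zip(x, c, q))
    rhs = sum(ci * ci for ci in c) - sum(qi * qi for qi in q)
    return lhs > rhs


def bisector_filter(q: Point, db: Dict[str, Point]) -> List[str]:
    """Visit objects in ascending distance from q; keep an object as an RNN
    candidate unless the half-plane of an earlier candidate prunes it."""
    candidates: List[str] = []
    for o_id in sorted(db, key=lambda k: math.dist(db[k], q)):
        if not any(in_pruning_half_plane(db[o_id], q, db[c]) for c in candidates):
            candidates.append(o_id)
    return candidates


db = {"o1": (1.0, 0.0), "o2": (2.0, 0.0), "o3": (0.0, 3.0),
      "o4": (0.0, 5.0), "o5": (-2.0, 0.0)}
print(bisector_filter((0.0, 0.0), db))  # ['o1', 'o5', 'o3']
```

The filter output is a superset of the answer: o3 survives even though its nearest neighbor is o4, so a refinement step that verifies each candidate against the other objects is still required.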

Probabilistic Reverse Nearest Neighbor Query (PRNN)
Due to the limited accuracy of positioning devices (e.g., GPS) and the ships' movement, the reported positions of ships are imprecise. It is therefore important to answer RNN queries over uncertain data effectively and efficiently.

Other Applications of PRNN
Mixed-reality game: each player tends to shoot his/her nearest neighbor, so a query player needs to monitor the players (RNNs) who have him/her as their nearest neighbor. Because players move, their positions can be imprecise and uncertain, so the RNN query is conducted over uncertain objects.

RNN Queries in Uncertain Databases

PRNN Definition
Probabilistic Reverse Nearest Neighbor (PRNN) Query
Given an uncertain database D, a query object q, and a probabilistic threshold α ∈ (0, 1], a PRNN query retrieves the uncertain objects o ∈ D that are RNNs of q with probability P_PRNN(q, o) greater than or equal to α, that is, P_PRNN(q, o) ≥ α. Here, r1 and r2 denote the minimum and maximum distances from q to o, respectively, so the possible distance between q and o always lies in [r1, r2].
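Assuming uncertain objects are mutually independent and o's position follows a density pdf_o over its uncertainty region UR(o) (the name pdf_o is introduced here for illustration), the probability follows directly from the definition: condition on a possible position x of o, then require every other object to be no closer to x than q. This is one way to write it and may differ in form from the formula on the original slide.

```latex
P_{PRNN}(q, o) = \int_{x \in UR(o)} pdf_o(x) \cdot \prod_{p \in D \setminus \{o\}} \Pr\{\, dist(x, p) \ge dist(x, q) \,\} \, dx,
\qquad
\text{answer set} = \{\, o \in D \mid P_{PRNN}(q, o) \ge \alpha \,\}
```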

A Straightforward Method
For every uncertain object o in the database:
- sequentially scan all the objects in the database and calculate the PRNN probability P_PRNN(q, o) that o is an RNN of q;
- if P_PRNN(q, o) ≥ α, report o as an answer; otherwise, discard o.
Analysis: the complexity is O(N²), where N is the database size, and each computation of P_PRNN(q, o) is itself very costly.
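A minimal sketch of the straightforward method follows. It assumes a discrete uncertainty model that the slides do not use: each uncertain object is a list of equally likely sample points, and objects are independent. Under that assumption P_PRNN(q, o) can be computed exactly by averaging, over o's samples x, the product of the probabilities that each other object is no closer to x than q. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def prnn_probability(q: Point, o: str, db: Dict[str, List[Point]]) -> float:
    """P_PRNN(q, o) when every uncertain object is a set of equally likely
    sample points and objects are independent (an assumed model)."""
    samples_o = db[o]
    total = 0.0
    for x in samples_o:  # one possible position of o
        d_q = math.dist(x, q)
        prob = 1.0
        for name, samples_p in db.items():  # scan all other objects
            if name == o:
                continue
            # Pr{dist(x, p) >= dist(x, q)}: share of p's samples not closer to x than q
            not_closer = sum(1 for y in samples_p if math.dist(x, y) >= d_q)
            prob *= not_closer / len(samples_p)
        total += prob / len(samples_o)
    return total


def prnn_linear_scan(q: Point, db: Dict[str, List[Point]], alpha: float) -> List[str]:
    # Straightforward method: compute P_PRNN for every object (O(N^2) object pairs).
    return [o for o in db if prnn_probability(q, o, db) >= alpha]


db = {"a": [(1.0, 0.0), (3.0, 0.0)], "b": [(2.0, 0.0)], "c": [(10.0, 0.0)]}
print(prnn_probability((0.0, 0.0), "a", db))  # 0.5
print(prnn_linear_scan((0.0, 0.0), db, alpha=0.5))  # ['a']
```

Object a is an RNN of q only when it sits at (1, 0); at (3, 0), object b is closer to it than q. Even in this toy model every object is compared with every other object, which is why pruning is needed.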

Pruning Techniques
Geometric Pruning (GP):
- GP0 method: the object distribution in the uncertainty region can be either known or unknown. It prunes the data objects that definitely cannot be RNNs of q.
- GPβ method (β ∈ (0, 1]): the object distribution in the uncertainty region is known, and pre-computation is allowed. It prunes the objects whose PRNN probability is smaller than β.

Heuristics of GP0 Method
Data objects always reside within their uncertainty regions.
[Figure: conservative pruning region (CPR)]

Heuristics of GP0 Method (cont.)
With the hypersphere approximation, no false dismissals are introduced.
[Figure: candidate o]

Conditions of GP0 Method
Pruning conditions:
- dist(P, q) - dist(P, Co) > ro
- mindist(P, D) ≥ rp
In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned.
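The first condition describes the pruning region: for any point P with dist(P, q) - dist(P, Co) > ro, every possible position o' of o satisfies dist(P, o') ≤ dist(P, Co) + ro < dist(P, q), so o is strictly closer to P than q. The sketch below tests whether a whole hypersphere (Cp, rp) lies inside that region. It is an assumed, conservative substitute for the slide's second condition, not the paper's test: it uses the fact that f(P) = dist(P, q) - dist(P, Co) changes by at most twice the distance moved, so checking the center with a 2·rp margin suffices. The function name is hypothetical.

```python
import math
from typing import Sequence


def gp0_can_prune(q: Sequence[float], c_o: Sequence[float], r_o: float,
                  c_p: Sequence[float], r_p: float) -> bool:
    """Return True if the whole sphere (c_p, r_p) provably lies inside
    {P : dist(P, q) - dist(P, c_o) > r_o}, so p cannot be an RNN of q."""
    # Each distance term is 1-Lipschitz, so f(P) = dist(P, q) - dist(P, c_o)
    # is 2-Lipschitz: every point of the sphere has f(P) >= f(c_p) - 2 * r_p.
    f_center = math.dist(c_p, q) - math.dist(c_p, c_o)
    return f_center - 2 * r_p > r_o


print(gp0_can_prune((0.0, 0.0), (10.0, 0.0), 1.0, (12.0, 0.0), 0.5))  # True
print(gp0_can_prune((0.0, 0.0), (10.0, 0.0), 1.0, (-5.0, 0.0), 0.5))  # False
```

Being conservative, the test may keep some objects that the exact region test would prune, but it never prunes an object that could still be an RNN of q.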

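Heuristics of GPβ Method (β ∈ (0, 1])
GPβ prunes the objects whose PRNN probability is smaller than β (β < α). Because β < α, every object pruned this way also falls below the query threshold α, so no true answer is lost.
[Figure: an object p that can be pruned by GPβ, and a candidate o]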

Refinement Phase
After the geometric pruning methods are applied, we obtain a candidate set. For each candidate o, we retrieve the uncertain objects p' that intersect PR and compute the probability that o is an RNN of q.

PRNN Query Processing
- Indexing phase: maintain a multidimensional index structure over the uncertain database.
- For each PRNN query:
  - Pruning phase: apply the geometric pruning methods during the index traversal.
  - Refinement phase: refine the candidates and return the answer set.

PRNN Query Processing
Index the uncertain data with an R-tree.

PRNN Query Procedure
Traverse the R-tree index using a min-heap keyed by the minimum distance from the query point to each node. For each node or object Ni encountered:
- check whether Ni can be pruned by the GP methods;
- if not, either examine the children of Ni further (if Ni is a node) or add Ni to the PRNN candidate set Scand (if Ni is an object).
After the index traversal, refine the candidates in Scand by calculating their actual PRNN probabilities.
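The traversal can be sketched as a best-first search with Python's heapq. This is a simplified illustration under stated assumptions, not the paper's implementation: it replaces R-tree MBRs with bounding spheres held in a hypothetical Entry class, prunes only with a conservative GP0-style test (checking an entry's center with a 2·radius margin against each candidate found so far), and leaves out the refinement phase, returning the candidate set Scand.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Entry:
    """Hypothetical index entry: a bounding sphere that is either an internal
    node (children is a list) or an uncertain object (children is None)."""

    def __init__(self, name: str, center: Point, radius: float,
                 children: Optional[List["Entry"]] = None):
        self.name = name
        self.center = center
        self.radius = radius
        self.children = children


def mindist(q: Point, e: Entry) -> float:
    # Heap key: smallest possible distance from q to e's bounding sphere.
    return max(0.0, math.dist(q, e.center) - e.radius)


def pruned_by(q: Point, e: Entry, cand: Entry) -> bool:
    # Conservative GP0-style test: f(P) = dist(P, q) - dist(P, C_cand) is
    # 2-Lipschitz, so if f(center) - 2 * radius > r_cand, every point of e is
    # strictly closer to every possible position of cand than to q.
    f = math.dist(e.center, q) - math.dist(e.center, cand.center)
    return f - 2 * e.radius > cand.radius


def prnn_candidates(q: Point, root: Entry) -> List[str]:
    heap = [(mindist(q, root), 0, root)]  # (key, tie-breaker, entry)
    counter = 1  # unique tie-breaker, so heapq never compares Entry objects
    candidates: List[Entry] = []
    while heap:
        _, _, e = heapq.heappop(heap)
        if any(pruned_by(q, e, cand) for cand in candidates):
            continue  # pruning phase: discard e (and its whole subtree)
        if e.children is None:
            candidates.append(e)  # an object that survives pruning
        else:
            for child in e.children:  # a node: visit its children later
                heapq.heappush(heap, (mindist(q, child), counter, child))
                counter += 1
    return [cand.name for cand in candidates]  # refinement phase not included


obj_a = Entry("a", (1.0, 0.0), 0.2)
obj_b = Entry("b", (3.0, 0.0), 0.2)
obj_c = Entry("c", (-4.0, 0.0), 0.2)
obj_d = Entry("d", (0.0, 10.0), 0.5)
n1 = Entry("n1", (2.0, 0.0), 1.2, [obj_a, obj_b])
n2 = Entry("n2", (-2.0, 5.0), 6.0, [obj_c, obj_d])
root = Entry("root", (0.0, 0.0), 12.0, [n1, n2])
print(prnn_candidates((0.0, 0.0), root))  # ['a', 'c', 'd']
```

Object b is pruned because every possible position of a is closer to it than q; the surviving candidates would then go to the refinement phase.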

PRNN Query Processing (cont'd)

Experimental Evaluation
Experimental settings:
- Real data sets: LB, MG, TCB, and CAR
- Synthetic data sets: generate the center location Co of each uncertain object o in the data space [0, 1,000]^d, and produce a radius ro ∈ [rmin, rmax] for its uncertainty region UR(o). Four types of data sets: lUrU, lUrG, lSrU, and lSrG
- Competitors: linear scan (worse than ours by 5-9 orders of magnitude) and naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
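The naïve pruning condition is easy to state for hypersphere-shaped regions. The sketch below assumes o and e are spheres (Co, ro) and (Ce, re) and q is a point; the function name is hypothetical. The condition is safe because, for any point x of e and any position o' of o, dist(x, o') ≤ maxdist(o, e) < mindist(q, e) ≤ dist(x, q), so o is always closer to x than q.

```python
import math
from typing import Sequence


def naive_can_prune(q: Sequence[float], c_o: Sequence[float], r_o: float,
                    c_e: Sequence[float], r_e: float) -> bool:
    # maxdist(o, e): largest possible distance between a point of o and a point of e
    max_o_e = math.dist(c_o, c_e) + r_o + r_e
    # mindist(q, e): smallest possible distance from q to a point of e
    min_q_e = max(0.0, math.dist(q, c_e) - r_e)
    return max_o_e < min_q_e


print(naive_can_prune((0.0, 0.0), (10.0, 0.0), 1.0, (12.0, 0.0), 0.5))  # True
print(naive_can_prune((0.0, 0.0), (4.0, 0.0), 0.5, (8.0, 0.0), 2.0))  # False
```

Because it compares worst-case distances for the whole of e at once, this condition fails as soon as e is large or close to q, which is consistent with its role as a weaker baseline.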

Performance vs. β
Data size N = 100K, dimensionality d = 3, radius range [rmin, rmax] = [0, 5], and probabilistic threshold α = 1

Summary
- We formulate the problem of probabilistic queries over uncertain databases.
- We propose effective pruning methods to reduce the search space of probabilistic queries.
- We integrate the pruning methods into an efficient query procedure.
- We verify the efficiency of the proposed approaches through extensive experiments.