Xu Zhou Kenli Li Yantao Zhou Keqin Li

Presentation transcript:

Adaptive Processing for Distributed Skyline Queries over Uncertain Data
Xu Zhou, Kenli Li, Yantao Zhou, Keqin Li

Motivation
- Data uncertainty is increasing.
- Distributed data storage systems are increasingly common.
- As a result, skyline queries over uncertain data in distributed environments (DSUD queries) are attracting more attention.
- DSUD (Distributed Skyline query over Uncertain Data) is an important research topic with many potential real-life applications.

Problem Statement
- Much research has addressed uncertain data, but most of it focuses on a single, centralized database.
- That work lacks adaptations or optimizations specific to distributed environments.
- Distributed skyline queries exist, but mostly for certain data.
- The DSUD query and the enhanced DSUD algorithm (e-DSUD) were first formulated by Ding and Jin [1], with minimized bandwidth consumption and progressiveness as goals.
- The DSUD query still needs improvement in three aspects: progressiveness, efficiency, and universality.

Contributions by the Authors
- Review the DSUD query and summarize its objectives.
- Propose an improved framework (IDSUD) for the DSUD query, with local site pruning.
- Present an adaptive algorithm (ADSUD) based on the IDSUD framework.
- Present an evaluation of the ADSUD algorithm, which showed much better efficiency and progressiveness than e-DSUD.

Skyline Query
- In a database, the skyline is the set of points that stand out among the others because they are of special interest to us.
- Formally, the skyline consists of the points that are not dominated by any other point.
- A point dominates another point if it is as good or better in all dimensions and strictly better in at least one dimension.
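
The dominance test and the skyline definition above can be sketched in a few lines of Python. This is a brute-force illustration for certain (non-probabilistic) data, not the algorithm from the slides; it assumes that smaller values are better in every dimension, and the `hotels` data (price, distance) is made up for the example.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension.
    p must be no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
result = skyline(hotels)
print(result)
```

Here (70, 5) is dominated by (60, 2) and (90, 9) is dominated by (50, 8), so neither appears in the skyline.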

Uncertain Data
- An uncertain tuple may or may not exist in the real world.
- Each tuple carries an existential probability, which states how likely it is to exist in the real world.
- Example (Fig. 1): world1 has an existential probability of 0.001, so there is a 0.1% chance that it exists in the actual database and a 99.9% chance that it does not.

DSUD Query
- First, each local site computes its own local skyline.
- Given two local skyline tuples $t_1$ and $t_2$ from local site $S_1$ with $t_1 \succ t_2$, the local skyline probability of tuple $t_2$ is
  $Pr_{LSky}(t_2) = Pr(t_2)\,(1 - Pr(t_1)) \prod_{t \succ t_2,\; t \in UDB_i \setminus \{t_1\}} (1 - Pr(t))$
- [Figure labels: representative skyline tuple; uncertain databases UDB1, UDB2, ..., UDBk]
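
As a hedged illustration of the local skyline probability above, the following sketch multiplies a tuple's existential probability by the non-existence probability of every tuple at the same site that dominates it. The dictionary layout, the function names, and the sample database `udb_i` are assumptions made for this example, and smaller values are assumed to be better.

```python
from typing import Dict, Sequence, Tuple

# (attribute values, existential probability); layout assumed for this sketch
UTuple = Tuple[Sequence[float], float]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline_prob(name: str, udb: Dict[str, UTuple]) -> float:
    """Pr_LSky(t) = Pr(t) * product of (1 - Pr(t')) over all t' in udb that dominate t."""
    values, prob = udb[name]
    result = prob
    for other, (other_values, other_prob) in udb.items():
        if other != name and dominates(other_values, values):
            result *= 1.0 - other_prob
    return result


# Hypothetical local uncertain database.
udb_i = {
    "t1": ((1, 1), 0.6),
    "t2": ((2, 3), 0.5),
    "t3": ((3, 2), 0.8),
    "t4": ((4, 4), 0.9),
}
probs = {name: local_skyline_prob(name, udb_i) for name in udb_i}
print(probs)
```

In this data t1 dominates t2, so Pr_LSky(t2) = 0.5 × (1 − 0.6) = 0.2, which matches the two-tuple case of the formula when no other tuple dominates t2.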

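DSUD Query
- At server H, suppose tuple $t_1 \succ t_1'$ in the priority queue. The local skyline probability of $t_1'$ is then refreshed as
  $Pr_{LSky}(t_1') \leftarrow Pr_{LSky}(t_1')\,(1 - Pr(t_1)) \prod_{t \succ t_1',\; t \in L \setminus \{t_1\}} (1 - Pr(t))$
- At the start of the second iteration, assume that tuple $t_2$ is sent to H and $t_2 \succ t_1'$. The approximate global skyline probability of $t_1'$ can then be computed by
  $Pr'_{GSky}(t_1') = Pr_{LSky}(t_1') \times (1 - Pr_{LSky}(t_1))^2 \times (1 - Pr(t_2)) \times \prod_{t \in L,\; t \succ t_1} (1 - Pr(t)) \times \prod_{t \succ t_2,\; t \in UDB_i \setminus \{t_1\}} (1 - Pr(t)) \times \prod_{t \succ t_1',\; t \in (UDB_x \cap L) \setminus \{t_2\}} \left[ Pr_{LSky}(t)\,\frac{1 - Pr(t)}{Pr(t)} \right]$
- [Figure label: representative skyline tuple]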

DSUD Query
- Finally, the DSUD query returns the tuples at H whose global skyline probabilities are not less than $\alpha$.
- Problems with DSUD:
  - It does not take total query time (TQT) into consideration.
  - Its progressiveness also has room for improvement.
- [Figure label: representative skyline tuple]
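
To make the final answer set concrete, here is a centralized brute-force reference, not the distributed protocol from the slides. It assumes the usual definition that a tuple's global skyline probability is its existential probability times the non-existence probability of every dominating tuple at any site. The function names and the two sample sites are hypothetical, and smaller values are assumed to be better.

```python
from typing import List, Sequence, Tuple

UTuple = Tuple[Sequence[float], float]  # (attribute values, existential probability)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def global_skyline_prob(target: UTuple, sites: List[List[UTuple]]) -> float:
    """Pr(t) times (1 - Pr(t')) for every dominating tuple t' at any site."""
    values, prob = target
    result = prob
    for site in sites:
        for other_values, other_prob in site:
            if dominates(other_values, values):
                result *= 1.0 - other_prob
    return result


def dsud_answer(sites: List[List[UTuple]], alpha: float):
    """Return (values, probability) pairs with probability >= alpha, best first."""
    answer = []
    for site in sites:
        for t in site:
            p = global_skyline_prob(t, sites)
            if p >= alpha:
                answer.append((t[0], p))
    return sorted(answer, key=lambda item: -item[1])


# Two hypothetical local sites.
sites = [
    [((1, 4), 0.7), ((3, 5), 0.9)],
    [((2, 2), 0.6), ((4, 1), 0.3)],
]
answer = dsud_answer(sites, alpha=0.3)
print(answer)
```

The tuple (3, 5) has a high existential probability, but it is dominated by (1, 4) and (2, 2), so its global skyline probability drops to 0.9 × 0.3 × 0.4 = 0.108 and it is excluded. A distributed algorithm aims to reach the same answer while shipping far fewer tuples to H.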

The Improved DSUD (IDSUD) Framework
- Goal: minimize total query time (TQT) and achieve better progressiveness for the DSUD query.
- Improvements:
  1. A Query-Routing Phase is introduced. It includes site pruning in the Query-Routing Phase and progressive pruning at each local site.
  2. An improved PR-tree (IPR-tree) is used to speed up the DSUD query.
  3. In the To-Server Phase, only one local-site representative tuple is sent each time. A new local tuple selection strategy (MPBR) is used; MPBR selects multiple representative tuples for each site.

Adaptive DSUD (ADSUD) Algorithm
- Local sorting strategies:
  - The old method of calculating GSky is effective for choosing the most dominant tuple.
  - It is less effective when the local skyline answers have dominance relationships among them.
- New method to calculate the approximate global skyline probability. Let UGPrune be the set of unqualified tuples at H:
  $Pr_{NewGSky}(t) = Pr_{LSky}(t, UDB_i) \times \prod_{t' \succ t} \left[ (1 - Pr(t')) \times Pr_{LSky}(t', UDB_k) \times \prod_{t'' \succ t'} \frac{1}{1 - Pr(t'')} \right]$
  where $t' \in UDB_k \cap L$ and $t'' \in UDB_i \cap (UGSky \cup UGPrune)$.
- If $Pr_{NewGSky}(t) < \alpha$, prune $t$, refresh the probabilities at H, and arrange them in descending order.
- [Figure: tuples t1, t2, t3, ..., tN at a local site, ordered by decreasing probability]

Minimum Probabilistic Bounding Rectangle (MPBR)
- A good skyline algorithm minimizes the transfer of unqualified skyline points, so the choice of local skyline algorithm is very important.
- MBR of an R-tree:
  1. Usually created by clustering nearby points.
  2. Used in the local-computation phase to improve pruning capacity.
- MPBR (the authors' local algorithm):
  1. Generated according to the probability threshold.
  2. Used to help choose the multiple local representative tuples and to gather the abstracted information for site pruning in the Query-Routing Phase.

MPBR
- Definition: an MPBR is a bounding rectangle BR over a set of tuples that satisfies
  $Pr_{nonexist}(BR) = \prod_{t_j \in BR} (1 - Pr(t_j)) < \alpha$
- MPBR-dominance: given two MPBRs $BR = (t_{min}, t_{max})$ and $BR' = (t'_{min}, t'_{max})$, we have $BR \succ BR'$ if $t_{max} \succ t'_{min}$, where $t_{min}$ and $t_{max}$ are the minimum and maximum corners of BR.
- Lemma 1: given an MPBR $BR = (t_{min}, t_{max})$ and a tuple $t$, if $t_{max} \succ t$, then $t$ can be safely pruned.
- Lemma 2: given two MPBRs $BR = (t_{min}, t_{max})$ and $BR' = (t'_{min}, t'_{max})$, if $BR \succ BR'$, then the tuples contained in $BR'$ can be safely pruned.
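
The MPBR condition, MPBR-dominance, and Lemma 1 can be sketched as follows. This is a minimal illustration rather than the authors' construction procedure: it only checks the definitions on groups of tuples that are already given. The class and function names and the sample groups are assumptions for the example, and smaller values are assumed to be better, so the maximum corner is the worst corner of the rectangle.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence, Tuple

UTuple = Tuple[Sequence[float], float]  # (attribute values, existential probability)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


@dataclass
class BoundingRect:
    t_min: tuple          # minimum corner
    t_max: tuple          # maximum corner
    pr_nonexist: float    # probability that none of the enclosed tuples exists


def bounding_rect(tuples: List[UTuple]) -> BoundingRect:
    dims = range(len(tuples[0][0]))
    t_min = tuple(min(v[d] for v, _ in tuples) for d in dims)
    t_max = tuple(max(v[d] for v, _ in tuples) for d in dims)
    return BoundingRect(t_min, t_max, prod(1.0 - p for _, p in tuples))


def is_mpbr(br: BoundingRect, alpha: float) -> bool:
    """Definition: Pr_nonexist(BR) < alpha."""
    return br.pr_nonexist < alpha


def br_dominates(a: BoundingRect, b: BoundingRect) -> bool:
    """MPBR-dominance: a's maximum corner dominates b's minimum corner."""
    return dominates(a.t_max, b.t_min)


def prunable_by_lemma1(br: BoundingRect, values: Sequence[float]) -> bool:
    """Lemma 1: a tuple dominated by the maximum corner of an MPBR can be pruned."""
    return dominates(br.t_max, values)


alpha = 0.1
br_a = bounding_rect([((1, 1), 0.7), ((2, 2), 0.8)])
br_b = bounding_rect([((3, 3), 0.5), ((4, 5), 0.9)])
print(is_mpbr(br_a, alpha), br_dominates(br_a, br_b))
print(prunable_by_lemma1(br_a, (3, 2)), prunable_by_lemma1(br_a, (1, 3)))
```

The intuition behind Lemma 1 is that every tuple inside the rectangle dominates anything its maximum corner dominates, so the skyline probability of such a tuple is at most Pr_nonexist(BR), which is below α by definition.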

MPBR Constrained Space (MPCS)
- Definition: for an MPBR set $BRS = \{ BR_i = (t^i_{min}, t^i_{max}) \mid 1 \le i \le |BRS| \}$, its MPCS consists of the union of all the regions that are dominated by some $t^i_{max}$.
- Based on the property of the MPCS:
  - Lemma 3: given the MPCS CS of an MPBR set $BRS = \{ BR_i = (t^i_{min}, t^i_{max}) \mid 1 \le i \le |BRS| \}$ and a tuple $t$, if $t \in CS$, then $t$ can be safely pruned.
  - Lemma 4: given an MPCS CS and an MPBR $BR = (t_{min}, t_{max})$, if $BR \in CS$, then the tuples contained in BR can be safely pruned.
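
Membership in the MPCS reduces to a dominance test against the maximum corners, which the sketch below illustrates. The function names and sample corners are assumptions for this example, and smaller values are assumed to be better. For Lemma 4, the sketch tests only the minimum corner of the rectangle: if some maximum corner dominates it, that corner also dominates every point inside the rectangle.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def in_mpcs(values: Sequence[float], max_corners: List[Sequence[float]]) -> bool:
    """Lemma 3 test: the point lies in the region dominated by some maximum corner."""
    return any(dominates(corner, values) for corner in max_corners)


def rect_in_mpcs(rect_min: Sequence[float], max_corners: List[Sequence[float]]) -> bool:
    """Lemma 4 test: the whole rectangle lies in the MPCS if its minimum corner does."""
    return in_mpcs(rect_min, max_corners)


# Hypothetical maximum corners of two MPBRs collected from other sites.
max_corners = [(2, 6), (5, 3)]

print(in_mpcs((3, 7), max_corners))       # dominated by (2, 6)
print(in_mpcs((4, 4), max_corners))       # dominated by neither corner
print(rect_in_mpcs((6, 4), max_corners))  # minimum corner dominated by (5, 3)
```

Because only the maximum corners are needed, a site can summarize its MPBRs with a handful of points, which is what makes this test cheap to apply during site pruning.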

The Local Algorithms
- Reducing the individual processing time at each local site reduces TQT.
- State-of-the-art centralized algorithm: the reuse mechanism, which reduces I/O operations and boosts performance.
- Traditional methods: apply a window query over the R-tree each time to find the skyline result, which visits the same nodes many times (more I/O operations).
- Reuse mechanism: maintains a reuse heap (the set of already examined nodes).

The IPR-Tree
- Indices are built on the UDB to improve query efficiency by reducing the processing time. This solution uses the IPR-tree.
- The skyline probability of a tuple depends on:
  - its existential probability;
  - the non-existential probability of each entry that dominates it.
- [Figure: example of an IPR-tree. Annotations include max(Prmax(PE3), Prmax(PE4)), Prnonexist(PE3) × Prnonexist(PE4), and existential probabilities.]
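
The figure annotations suggest that an entry covering PE3 and PE4 stores the maximum of their Prmax values and the product of their Prnonexist values. The sketch below shows that aggregation under this reading, which is an assumption; the `Entry` class, the helper names, and the sample probabilities are hypothetical and do not reproduce the actual IPR-tree node layout.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Entry:
    pr_max: float       # largest existential probability below this entry
    pr_nonexist: float  # probability that no tuple below this entry exists


def leaf_entry(probs: List[float]) -> Entry:
    """Summarize the existential probabilities of the tuples in one leaf."""
    return Entry(max(probs), prod(1.0 - p for p in probs))


def parent_entry(children: List[Entry]) -> Entry:
    """Aggregate child summaries: maximum of Prmax, product of Prnonexist."""
    return Entry(
        max(c.pr_max for c in children),
        prod(c.pr_nonexist for c in children),
    )


# Hypothetical leaves standing in for PE3 and PE4.
pe3 = leaf_entry([0.5, 0.2])
pe4 = leaf_entry([0.9])
parent = parent_entry([pe3, pe4])
print(parent)
```

Keeping both aggregates in every entry lets a search bound the skyline probability of a whole subtree without descending into it.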

The LUSQ Algorithm
- Computes the skyline at each local site.
- Uses a progressive pruning strategy to reduce the search space.
- Two phases: pruning and refining.
- Pruning uses several lemmas to prune tuples:
  - Lemma 5: given an entry $E$, if $Pr_{max}(E) < \alpha$, then the tuples within $E$ can be safely pruned.
  - Lemma 6: given two entries $E_i$ and $E_j$, if $E_i \succ E_j$ and $Pr_{max}(E_j) \times Pr_{nonexist}(E_i) < \alpha$, then the tuples contained in $E_j$ can be safely pruned.
  - Lemma 7: given an entry $E$, if $Pr_{UBLSky}(E) < \alpha$, then the tuples within $E$ can be safely pruned.
  - And others.
- Refining: use the reuse heap (IPR-tree) to find all tuples in the heap that dominate the tuples remaining after pruning, then recalculate the skyline probabilities of those remaining tuples.
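
Lemmas 5 and 6 can be sketched as a filter over index entries. This is an illustration of the two pruning tests only, not the LUSQ algorithm itself. It assumes that entry dominance follows the same corner rule as MPBR-dominance (the maximum corner of one entry dominates the minimum corner of the other) and that smaller values are better; the `Entry` fields and sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


@dataclass
class Entry:
    name: str
    t_min: tuple
    t_max: tuple
    pr_max: float       # largest existential probability inside the entry
    pr_nonexist: float  # probability that no tuple inside the entry exists


def entry_dominates(a: Entry, b: Entry) -> bool:
    """Assumed corner rule: a's maximum corner dominates b's minimum corner."""
    return dominates(a.t_max, b.t_min)


def prune(entries: List[Entry], alpha: float) -> List[Entry]:
    kept = []
    for ej in entries:
        # Lemma 5: no tuple inside can reach the threshold on its own.
        if ej.pr_max < alpha:
            continue
        # Lemma 6: a dominating entry pushes the upper bound below the threshold.
        if any(
            ei is not ej
            and entry_dominates(ei, ej)
            and ej.pr_max * ei.pr_nonexist < alpha
            for ei in entries
        ):
            continue
        kept.append(ej)
    return kept


entries = [
    Entry("E1", (1, 1), (2, 2), 0.9, 0.1),
    Entry("E2", (3, 3), (4, 4), 0.8, 0.2),
    Entry("E3", (0, 5), (1, 6), 0.2, 0.7),
    Entry("E4", (5, 0), (6, 1), 0.7, 0.3),
]
survivors = [e.name for e in prune(entries, alpha=0.3)]
print(survivors)
```

E3 falls to Lemma 5 because its Prmax of 0.2 is below α, and E2 falls to Lemma 6 because E1 dominates it and 0.8 × 0.1 = 0.08 is below α. The surviving entries would then go to the refining phase.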

The GUSQ Algorithm
- Used to find the global candidate skylines.
- [Figure labels: uncertain database UDB0; representative skyline tuple]

Performance Evaluation

Conclusion
- To accelerate the DSUD query, the authors propose an improved DSUD framework (IDSUD) and a new algorithm, ADSUD.
- ADSUD uses several efficient techniques:
  - IPR-tree
  - Reuse technique
  - MPBR, for collecting global abstract information and selecting local representative tuples
- Future work: DSUD queries under MapReduce.