Xu Zhou Kenli Li Yantao Zhou Keqin Li

Xu Zhou Kenli Li Yantao Zhou Keqin Li
Adaptive Processing for Distributed Skyline Queries over Uncertain Data Xu Zhou Kenli Li Yantao Zhou Keqin Li

Motivation Increased data uncertainty
Increased distributed data storage systems More attention towards skyline queries over uncertain data in distributed environments(DSUD query) Also, DSUD(Distributed Skyline query over Uncertain Data) is a vital research topic with many potential real-life application

Problem Statement Many Research achievement on Uncertain Data. But
Most of them focused on single and centralized storage database Lack adaptations or optimization specific to Distributed environment Distributed skyline queries are available but mostly in certain data DSUD query and enhanced-DSUD was first formulated by Ding and Jin[1] with minimized bandwidth consumption and progressiveness But DSUD query still needs to be improved in three aspects PROGRESSIVENESS, EFFICIENCY, AND UNIVERSALITY

Contribution by Authors
Review DSUD query and summarize its objectives Propose an improved framework for DSUD query with local site pruning Present an adaptive ( ADSUD) algorithm based on IDSUD framework Present evaluation of ADSUD algorithm which showed much better efficiency and progressiveness compared to e-DSUD.

Skyline query In a database, a Skyline = set of points which stand out among the others because are of special interest to us Skyline: those points which are not dominated by any other point. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension

Uncertain Data Data which either exist or doesn’t exist
in the real world Has an existential probability value based on which we decide whether it exist or not in the real world Here, world1 has existential probability of 0.001% chance it exists actual database 99.99% chance it does not exist in the actual database. Fig 1

DSUD Query At first, each local site computes its local skylines, respectively. Given two local skyline tuples t1 and t2 from local site S1 and t1≻t2, local skyline probability of tuple t2 is PrLSky(t2)= Pr(t2)(1- (Pr(t1))∏t≻t2,t∈UDBi-t1(1-Pr(t)) Representative Skyline tuple Uncertain DataBase UDB1 UDB2 UDBk

DSUD Query In server H, let tuple t1≻ t1’in priority queue,
then Local skyline probability of tuple t1’ is refreshed as PrLSky(t1’)= PrLSky(t1’)(1-(Pr(t1))∏t≻t1’,t∈L-t1(1-Pr(t)) At the start of the second iteration,assume that tuple t2 is sent to H and t2≻t1’. The approximate global probability of tuple t1’ can be computed by Pr’GSky(t1’)= PrLSky(t1’) x (1- PrLSky(t1))2 x (1-Pr(t2)) x ∏t∈L,t≻t1(1-Pr(t)) x ∏t≻t2,t∈UDBi-t1(1-Pr(t)) x ∏t≻t1’,t∈UDBx∩L-t2[PrLSky(t)(1 - Pr (t) / Pr(t))] Representative Skyline tuple

DSUD Query Finally, DSUD query returns the tuples at H
whose Global Skyline probabilities are not less than α. Problem with DSUD It doesn’t take Total Query Time into consideration Its progressiveness also has room to be improved Representative Skyline tuple

The Improved-DSUD(IDSUD) Framework
GOAL: Minimize TQT(Total Query Time) & Perform better progressiveness for DSUD query Improvements: First, Query-Routing Phase Introduced. Includes: Site pruning in Query-Routing Phase Progressive Pruning at each local site Second, improved PR-tree (IPR-tree) To boost DSUD Finally, in To-Server Phase, only one local site representative tuple each time. New local tuple choosing strategy(MPBR) MPBR selects multiple representative tuple for each site

Adaptive-DSUD(ADSUD) Algorithm
Local Sorting Strategies Old method to calculate GSky effective to choose most dominant tuple Less effective for Local Skyline answers have dominant relationship New method to calculate approx. global skyline probability Let UGPrune be the set of unqualified tuples at H PrNewGSky(t)= PrLSky(t,UDBi) x ∏t’≻t [(1-Pr(t’)) x PrLSky(t’,UDBk) x ∏t”≻t’ 1/(1-Pr(t”))] Where, t’ ∊ UDBk∩L, t" ∊ UDBi∩(UGSky ∪ UGPrune) t1 t2 t3 Probability Decreases If PrNewGSky(t) < α , prune t and refresh the probabilities at H and arrange in descending order. tN Local Site

Minimum Probabilistic Bounding Rectangle(MPBR)
Good skyline algorithm = minm. transfer of unqualified skyline points Therefore, selection of Local skyline algorithm is very important MBR of R-tree MPBR(their local algorithm) Created usually by clustering the near points Generated according to the probability threshold 2. Utilized in local-computation phase for improving pruning capacity 2. Used to help choosing the local multiple representative tuples & helps to gain the abstracted info for the site pruning in Query-Routing Phase

MPBR Definition: Set of tuples that satisfy the condition
Prnonexist(BR) = ∏tj∊BR(1-Pr(tj)) < α MPBR-Dominance: Given two MPBRs BR= (tmin,tmax) and BR’ = (t’min, t’max), we have BR ≻ BR’, if tmax ≻ t’min.(tmin,tmax are the minimum and maximum corner of BR) Lemma 1: Given a MPBR BR = (tmin,tmax) and a tuple t, if tmax ≻ t, then t can be safely pruned. Lemma 2: Given two MPBRs BR =(tmin, tmax) and BR’ =(t’min, t’max), if BR ≻BR’ , then the tuples contained in BR’ can be safely pruned.

MPBR Constrained Space(MPCS)
Definition: For a MPBR set BRS = {1 ≤ i ≤ |BRS||BRi = (timin, timax)} , its MPCS consists of the union of all the regions which are dominated by ti max. Based on the property of MPCS: Lemma 3: Given a MPCS CS of a MPBR set BRS = {1 ≤ i ≤ |BRS||BRi = (timin, timax)} and a tuple t, if t ∈ CS, the tuple t can be safely pruned Lemma 4: Given a MPCS CS and a MPBR BR = (tmin,tmax), if BR ∈ CS, the tuples contained in BR can be safely pruned

The Local Algorithms Reduced Local individual processing time = reduced TQT State-of-the-art Centralized algorithm The Reuse Mechanism Reduces I/O operation Boosts performance Traditional Methods: Applies window query over R-tree to find skyline result each time Results in visiting same node many times(larger I/O operation) Reuse Mechanism: Maintains a Reuse Heap (set of already examined node)

The IPR-Tree Indices are built on UDB to improve Query Efficiency by
max(Prmax(PE3),Prmax(PE4)) Prnonexist(PE3) x Prnonexist(PE4) The IPR-Tree Indices are built on UDB to improve Query Efficiency by reducing the Processing time In this solution, IPR-Tree is used Skyline Probability of tuple depends on: Its existential probability Non-existential probability of each entry that dominates it Existential Probabilities Fig: Example of an IPR-Tree

The LUSQ ALgorithm Computes Skyline at each local site
Uses Progressive pruning Strategy to reduce the Search Space Two Phases: Pruning & Refining Pruning: Uses different Lemma for pruning the tuples: Lemma 5 Given an entity E, if Prmax(E) < α, then tuples within E can be safely pruned. Lemma 6 Given two entries Ei and Ej, if Ei ≻ Ej and Prmax(Ej) Prnonexist(Ei) < α, the tuples contained in Ej can be safely pruned. Lemma 7. Given an entry E, in case PrUBLSky(E) < α, tuples within E can be safely pruned. And others. Refining: Use the Reuse-Heap(IPR-Tree) to Find all tuples in Heap that dominates the remaining tuples after pruning And recalculate their skyline probabilities

- used to find global candidate skylines
The GUSQ Algorithm - used to find global candidate skylines Uncertain database UDB0 Representative Skyline tuple

Performance Evaluation

Conclusion To accelerate DSUD, improved DSUD framework and new algorithm ADSUD In ADSUD, several efficient technologies: IPR-Tree Reuse Technology MPBR Collecting global abstract information Selecting local representative tuples Future work: DSUD queries under MapReduce

Xu Zhou Kenli Li Yantao Zhou Keqin Li

Similar presentations

Presentation on theme: "Xu Zhou Kenli Li Yantao Zhou Keqin Li"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Xu Zhou Kenli Li Yantao Zhou Keqin Li

Similar presentations

Presentation on theme: "Xu Zhou Kenli Li Yantao Zhou Keqin Li"— Presentation transcript:

Similar presentations

About project

Feedback