The $\tau$-Skyline for Uncertain Data
Haitao Wang (Utah State University) and Wuzhou Zhang (Duke University)
Presented by Matt Gibson (University of Texas at San Antonio)
Skyline
A point $p$ dominates a point $q$, denoted $p \succeq q$, if $x(p) \ge x(q)$ and $y(p) \ge y(q)$. Under this definition a point dominates itself, which simplifies the discussion.
Given a set $P$ of (exact) points, a point $p \in P$ is a skyline point of $P$ if $p$ is not dominated by any other point of $P$.
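The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming points are plain `(x, y)` tuples; the function name `skyline` is hypothetical and does not come from the talk.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def skyline(points: List[Point]) -> List[Point]:
    """Return the skyline points of an exact point set (duplicates collapsed)."""
    result: List[Point] = []
    best_y = float("-inf")
    # Visit points from right to left; among equal x, highest y first.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        # Every point already visited has x' >= x, so (x, y) is dominated
        # exactly when some visited point has y' >= y.
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result
```

Sorting dominates the cost, so the sketch runs in $O(n \log n)$ time for $n$ points.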
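Uncertain Data
$\mathcal{P} = \{P_1, \ldots, P_n\}$ is a set of $n$ uncertain points. Each $P_i = \{p_{i1}, \ldots, p_{ik}\}$ has $k$ possible locations, with $\Pr[P_i = p_{ij}] = w_{ij}$ and $\sum_{j=1}^{k} w_{ij} = 1$.
For a location $p \in P_i$, we also use $w(p)$ to denote the probability of $P_i$ being at $p$.
Assume that the $P_i$'s are pairwise independent. Set $m = nk$.
[Figure: example locations labeled with probabilities 0.2, 0.3, 0.4, 0.1]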
$\tau$-Skyline of $P_i$
Given a point $q$, the probability that $P_i$ dominates $q$ is
\[ \delta_i(q) = \sum_{p \in P_i,\; p \succeq q} w(p). \]
Given a threshold $\tau$ in the range from 0 to 1, the $\tau$-skyline region of $P_i$ is
\[ R_i = \{\, q \mid \delta_i(q) \le \tau \,\}, \]
i.e., the set of points $q$ such that the probability of $P_i$ dominating $q$ is at most $\tau$.
The $\tau$-skyline of $P_i$, denoted by $\pi_i$, is the boundary of $R_i$.
The 0-skyline of $P_i$ corresponds to the (conventional) skyline of $P_i$.
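As a small illustration of these two definitions, the sketch below evaluates the dominance probability of one uncertain point at a query point and tests membership in its $\tau$-skyline region. It assumes each uncertain point is stored as a list of `(x, y, weight)` triples; that layout and the names `delta` and `in_tau_region` are hypothetical.

```python
from typing import List, Tuple

Location = Tuple[float, float, float]  # (x, y, weight); assumed layout

def delta(locs: List[Location], q: Tuple[float, float]) -> float:
    """Probability that the uncertain point dominates q (non-strict dominance)."""
    return sum(w for x, y, w in locs if x >= q[0] and y >= q[1])

def in_tau_region(locs: List[Location], q: Tuple[float, float], tau: float) -> bool:
    """True if q lies in the tau-skyline region of this uncertain point."""
    return delta(locs, q) <= tau
```

For example, with locations `(1, 4)`, `(2, 3)`, `(3, 2)`, `(4, 1)` weighted 0.2, 0.3, 0.4, 0.1, the query point `(2, 2)` is dominated by the second and third locations only.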
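$\tau$-Skyline of $\mathcal{P}$
Given a threshold $\tau$ in the range from 0 to 1, the $\tau$-skyline region of $\mathcal{P} = \{P_1, \ldots, P_n\}$ is
\[ R = \{\, q \mid \delta_i(q) \le \tau \ \text{for all } i \le n \,\}, \]
i.e., the set of points $q$ such that, for every $P_i$, the probability of $P_i$ dominating $q$ is at most $\tau$.
The $\tau$-skyline of $\mathcal{P}$, denoted by $\pi$, is the boundary of $R$.
The 0-skyline of $\mathcal{P}$ corresponds to the (conventional) skyline of $\bigcup_{i=1}^{n} P_i$.
The $\tau$-skyline probability of each $P_i$ of $\mathcal{P}$ is defined to be the probability of $P_i$ lying inside the $\tau$-skyline region of $\mathcal{P}$.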
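The definitions on this slide can be checked by brute force. The sketch below is only a direct transcription of the definitions, not the $O(m \log m)$ algorithm of the talk; it assumes each uncertain point is a list of `(x, y, weight)` triples, and all function names are hypothetical.

```python
from typing import List, Tuple

Location = Tuple[float, float, float]  # (x, y, weight); assumed layout

def delta(locs: List[Location], q: Tuple[float, float]) -> float:
    """Probability that one uncertain point dominates q."""
    return sum(w for x, y, w in locs if x >= q[0] and y >= q[1])

def in_region(points: List[List[Location]], q: Tuple[float, float], tau: float) -> bool:
    """True if q is in the tau-skyline region R of the whole set."""
    return all(delta(locs, q) <= tau for locs in points)

def tau_skyline_probability(points: List[List[Location]], i: int, tau: float) -> float:
    """Total weight of the locations of points[i] that lie inside R."""
    return sum(w for x, y, w in points[i] if in_region(points, (x, y), tau))
```

Because a point dominates itself under the convention used here, a location $p$ of $P_i$ contributes $w(p)$ to $\delta_i(p)$, so the sketch counts $p$ only when $\tau \ge w(p)$.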
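Related Work
$\rho$-skyline: Given a point $q$ and $\mathcal{P} = \{P_1, \ldots, P_n\}$, the skyline probability of $q$ is
\[ \alpha(q, \mathcal{P}) = \prod_{i=1}^{n} \bigl(1 - \delta_i(q)\bigr). \]
The skyline probability of $P_i$ is
\[ \alpha(P_i, \mathcal{P}) = \sum_{j=1}^{k} w_{ij}\, \alpha(p_{ij}, \mathcal{P}_{\ne i}), \]
where $\mathcal{P}_{\ne i} = \mathcal{P} \setminus \{P_i\}$.
Given a parameter $\rho$, the goal is to compute the $\rho$-skyline of $\mathcal{P}$: $\{\, P_i \mid \alpha(P_i, \mathcal{P}) \ge \rho \,\}$.
First sub-quadratic algorithm: $O(m^{5/3}\, \mathrm{poly}(\log n))$ [Atallah et al. 2011]
$O(m^{3/2})$, and $O(mk \log m)$ when $k \ll n$ [Afshani et al. 2011]
Heuristic algorithms [Pei et al. 2007]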
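To make the two formulas concrete, here is a brute-force sketch that evaluates them directly. It is not one of the cited algorithms; it assumes each uncertain point is a list of `(x, y, weight)` triples, and the function names are hypothetical.

```python
from math import prod
from typing import List, Tuple

Location = Tuple[float, float, float]  # (x, y, weight); assumed layout

def delta(locs: List[Location], q: Tuple[float, float]) -> float:
    """Probability that one uncertain point dominates q."""
    return sum(w for x, y, w in locs if x >= q[0] and y >= q[1])

def alpha_point(q: Tuple[float, float], points: List[List[Location]]) -> float:
    """Probability that no uncertain point in `points` dominates q."""
    return prod(1 - delta(locs, q) for locs in points)

def alpha_uncertain(i: int, points: List[List[Location]]) -> float:
    """Skyline probability of points[i] against all the other uncertain points."""
    others = points[:i] + points[i + 1:]
    return sum(w * alpha_point((x, y), others) for x, y, w in points[i])

def rho_skyline(points: List[List[Location]], rho: float) -> List[int]:
    """Indices of the uncertain points whose skyline probability is at least rho."""
    return [i for i in range(len(points)) if alpha_uncertain(i, points) >= rho]
```

The product form relies on the independence assumption stated earlier; the sketch takes quadratic time in the total number of locations.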
Related Work (cont.)
Other variants of probabilistic skylines, $K$-skyband [Lian et al. 2008] [Zhang et al. 2013]
$K$-skyband: Given a set $P$ of $n$ (exact) points and a parameter $K \le n$, the $K$-skyband of $P$ is the set of points in $P$ that are dominated by at most $K$ other points of $P$. The 0-skyband of $P$ corresponds to the conventional skyline of $P$.
Our $\tau$-skyline of $P_i$ can be used to answer the weighted skyband of $P_i$.
As a byproduct, we obtain the first algorithm for computing the skyband of a set of weighted points!
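The (unweighted) $K$-skyband definition can be illustrated with a quadratic-time sketch. It assumes points are `(x, y)` tuples and counts only dominators other than the point itself; the name `k_skyband` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def k_skyband(points: List[Point], K: int) -> List[Point]:
    """Points dominated by at most K other points, in input order."""
    band: List[Point] = []
    for i, (x, y) in enumerate(points):
        # Count dominators by index so that a point never counts itself.
        dominators = sum(
            1
            for j, (px, py) in enumerate(points)
            if j != i and px >= x and py >= y
        )
        if dominators <= K:
            band.append((x, y))
    return band
```

With `K = 0` this returns the conventional skyline, as stated on the slide.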
Our Results
We first show that the $\tau$-skyline $\pi_i$ of $P_i$ has complexity $O(k)$ and can be computed in $O(k \log k)$ time.
Using this as a subroutine, the $\tau$-skyline $\pi$ of $\mathcal{P}$ has complexity $O(m)$ and can be computed in $O(m \log m)$ time, where $m = nk$.
After that, the $\tau$-skyline probabilities of all $P_i$'s can be computed in $O(m \log m)$ time.
Our method is very simple and easy to implement!
Our Algorithm
We first compute, for each $P_i$ of $\mathcal{P}$, the $\tau$-skyline $\pi_i$ of $P_i$ in $O(k \log k)$ time, by sweeping both horizontally and vertically. There are two movements, move downwards and move rightwards, so the sweep traces a zig-zag path.
We then show that $\pi$ is the upper envelope of $\pi_1, \ldots, \pi_n$, and can be computed in $O(m \log m)$ time, where $m = nk$.
Computing the $\tau$-Skyline $\pi_i$ of $P_i$
The goal is to find the set of points $q$ such that the probability of $P_i$ dominating $q$ is at most $\tau$.
We sweep in both the horizontal and vertical directions!
As a preprocessing step, we sort all the locations of $P_i$ into two sorted lists: $L_x$, by increasing $x$-coordinate, and $L_y$, by decreasing $y$-coordinate.
We maintain two invariants during the sweeping process.
Two Invariants During Sweeping
We say that $q'$ strictly dominates $q$, denoted $q' \succ q$, if $x(q') > x(q)$ and $y(q') > y(q)$.
Given a point $q$, the probability that $P_i$ strictly dominates $q$ is
\[ \delta_i^{+}(q) = \sum_{p \in P_i,\; p \succ q} w(p). \]
Let $q$ be any point on the $\tau$-skyline $\pi_i$ of $P_i$. The two invariants are:
$\delta_i(q) > \tau$
$\delta_i^{+}(q) \le \tau$
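The two invariants are easy to test at a single point. The sketch below computes both dominance probabilities by brute force and checks the pair of inequalities; it assumes locations are `(x, y, weight)` triples, and the function names are hypothetical.

```python
from typing import List, Tuple

Location = Tuple[float, float, float]  # (x, y, weight); assumed layout

def delta(locs: List[Location], q: Tuple[float, float]) -> float:
    """Probability of non-strict dominance of q."""
    return sum(w for x, y, w in locs if x >= q[0] and y >= q[1])

def delta_strict(locs: List[Location], q: Tuple[float, float]) -> float:
    """Probability of strict dominance of q (both coordinates strictly larger)."""
    return sum(w for x, y, w in locs if x > q[0] and y > q[1])

def invariants_hold(locs: List[Location], q: Tuple[float, float], tau: float) -> bool:
    """Check delta(q) > tau and delta_strict(q) <= tau."""
    return delta(locs, q) > tau and delta_strict(locs, q) <= tau
```

Intuitively, a point satisfying both inequalities sits on the border: counting the locations on its own horizontal and vertical lines pushes the probability above $\tau$, while ignoring them keeps it at or below $\tau$.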
Let's Do The Sweeping!
Initially, we sweep vertically along the line $x = -\infty$ by scanning the sorted list $L_y$ to find $q = (-\infty, y(p))$ for some $p \in L_y$ such that the two invariants hold (this step is trivial).
Then we sweep horizontally. An event happens if either $x(q) = x(p)$ for some $p \in L_x$, or $y(q) = y(p)$ for some $p \in L_y$.
At each event, we decide whether to move rightwards or downwards.
We terminate when either $L_x$ or $L_y$ becomes empty.
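One way the zig-zag sweep might be realized is sketched below. The slides do not spell out the rule for choosing between the two movements, so the rule used here is an assumption rather than the authors' procedure: move downwards along $L_y$ while the accumulated probability is at most $\tau$, and move rightwards by retiring the leftmost remaining locations of $L_x$. The sketch tracks a single running sum instead of both $\delta_i$ and $\delta_i^{+}$. Locations are assumed to be `(x, y, weight)` triples with positive weights, and the name `tau_skyline_steps` is hypothetical.

```python
from typing import List, Tuple

Location = Tuple[float, float, float]  # (x, y, weight); assumed layout

def tau_skyline_steps(locs: List[Location], tau: float) -> List[Tuple[float, float]]:
    """Return the outer corners (x_hi, h) of the staircase, left to right.

    The set of points q with delta(q) > tau is the union of the quadrants
    {x <= x_hi, y <= h} over the returned corners.
    """
    k = len(locs)
    by_x = sorted(range(k), key=lambda i: locs[i][0])    # L_x: increasing x
    by_y = sorted(range(k), key=lambda i: -locs[i][1])   # L_y: decreasing y
    rank_y = [0] * k
    for r, i in enumerate(by_y):
        rank_y[i] = r
    alive = [True] * k   # False once the sweep has passed a location's x
    s = 0                # weight of alive locations among by_y[:j]
    j = 0                # next position to scan in L_y
    a = 0                # next position to retire in L_x
    steps: List[Tuple[float, float]] = []
    while a < k:
        # Move downwards until the accumulated probability exceeds tau.
        while s <= tau and j < k:
            i = by_y[j]
            if alive[i]:
                s += locs[i][2]
            j += 1
        if s <= tau:
            break  # L_y is exhausted: nothing further right exceeds tau
        x_hi, h = locs[by_x[a]][0], locs[by_y[j - 1]][1]
        if steps and steps[-1][1] == h:
            steps[-1] = (x_hi, h)   # same height: extend the current step
        else:
            steps.append((x_hi, h))
        # Move rightwards: retire every location with x equal to x_hi.
        while a < k and locs[by_x[a]][0] == x_hi:
            i = by_x[a]
            alive[i] = False
            if rank_y[i] < j:
                s -= locs[i][2]
            a += 1
    return steps
```

Both pointers only advance, so the sweep itself is linear after the two sorts. With floating-point weights, a threshold that coincides with a partial sum may be misclassified by rounding; passing `fractions.Fraction` weights avoids that.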
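Analysis
We have $O(k)$ events, since $|L_x| + |L_y| = 2k$.
Each event takes $O(1)$ time: two very simple invariant checks, plus an update of $\delta_i(q)$ and $\delta_i^{+}(q)$.
$O(k)$ events also imply that $\pi_i$ has complexity $O(k)$.
The sweep therefore takes $O(k)$ time, and the preprocessing sort dominates the running time at $O(k \log k)$.
Correctness is also easy to verify (see our paper).
Theorem: The $\tau$-skyline $\pi_i$ of $P_i$ has complexity $O(k)$, and can be computed in $O(k \log k)$ time.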
Computing the $\tau$-Skyline $\pi$ of $\mathcal{P}$
An easy observation is that the $\tau$-skyline region $R$ of $\mathcal{P}$ is the common intersection of the $\tau$-skyline regions $R_1, \ldots, R_n$, i.e.,
\[ R = \bigcap_{i=1}^{n} R_i. \]
The $\tau$-skyline $\pi$ of $\mathcal{P}$ is then simply the upper envelope of $\pi_1, \ldots, \pi_n$.
Equivalently, $\pi$ is the (conventional) skyline of the $O(nk)$ turning points of all the $\pi_i$'s.
Theorem: The $\tau$-skyline $\pi$ of $\mathcal{P}$ has complexity $O(m)$, and can be computed in $O(m \log m)$ time, where $m = nk$.
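The "skyline of turning points" view suggests a short merging step, sketched below. It assumes each $\pi_i$ is given as a list of outer staircase corners `(x, y)`, meaning the region where $\delta_i$ exceeds $\tau$ is the union of the quadrants below and to the left of those corners. That representation and the name `merge_tau_skylines` are assumptions made for illustration.

```python
from typing import List, Tuple

Corner = Tuple[float, float]

def merge_tau_skylines(staircases: List[List[Corner]]) -> List[Corner]:
    """Corners of the upper envelope of several staircases, left to right."""
    # Pool the turning points of all individual tau-skylines.
    corners = {c for steps in staircases for c in steps}
    result: List[Corner] = []
    best_y = float("-inf")
    # Conventional skyline: scan right to left and keep each new maximum in y.
    for x, y in sorted(corners, key=lambda c: (-c[0], -c[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    result.reverse()
    return result
```

A corner that is dominated by a corner of another staircase lies inside that staircase's quadrant and cannot appear on the envelope, which is why keeping only the maximal corners suffices. Sorting the $O(m)$ pooled corners gives the $O(m \log m)$ bound.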