Stochastic Skyline Operator

Presentation transcript:

Stochastic Skyline Operator Xuemin Lin School of Computer Science University of New South Wales Australia Joint Work with: Ying Zhang (UNSW), Wenjie Zhang (UNSW), Muhammad Aamir Cheema (UNSW)

Introduction: Skyline A user preference ≺ is given on each dimension of R^d. For two points u and v in R^d, u dominates v (u ≺ v) if for every i (1 ≤ i ≤ d), u.i ⪯ v.i, and there exists j (1 ≤ j ≤ d) such that u.j ≺ v.j. Skyline: the points not dominated by any other point. Multi-criteria optimal decision making: the skyline is the minimum set of candidates for the best option under any monotonic function.
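The dominance test and the skyline definition above can be sketched in a few lines. This is a minimal quadratic-time illustration, assuming that smaller values are preferred on every dimension; the names `dominates` and `skyline` and the sample points are illustrative and do not come from the slides.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(u: Point, v: Point) -> bool:
    """True if u is no worse than v on every dimension and strictly
    better on at least one (assumption: smaller is better)."""
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 4), (3, 2), (2, 5), (4, 3), (5, 1)]
print(skyline(points))  # [(1, 4), (3, 2), (5, 1)]
```

Here (2, 5) is dominated by (1, 4) and (4, 3) is dominated by (3, 2), so only three points remain.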

Skyline of Uncertain Objects Probabilistic Skyline (VLDB07, PODS09, etc.): skyline probabilities computed over possible worlds, giving for each object the probability that it is not worse than any other object. Does it provide a minimal candidate set of optimal solutions? How should optimal options be defined? How can the minimum candidate set be characterized?

Expected Utility & Stochastic Order Expected Utility Principle: given a set 𝒰 of uncertain objects and a decreasing utility function f, select U in 𝒰 to maximize E[f(U)]. Stochastic Order: given a family ℱ of utility functions, U ≺ℱ V if E[f(U)] ≥ E[f(V)] for each f in ℱ. Decreasing Multiplicative Functions: ℱ = { f(x) = ∏_{i=1..d} f_i(x_i) }, where each f_i is nonnegative and decreasing. Lower orthant order: the stochastic order defined over the family of decreasing multiplicative functions.

Example
Athlete   Instance 1 / probability   Instance 2 / probability
A         (1,4) / 0.5                (3,2) / 0.5
B         (2,5) / 0.5                (4,3) / 0.5
C         (5,1) / 0.01               (3,4) / 0.99
Utility function: nonnegative decreasing.
1. B is never preferred by the expected utility principle!
2. Psky(A) = 1, Psky(B) = 0.5, Psky(C) = 0.01
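The numbers in this example can be reproduced with a short script. It is a sketch under stated assumptions: objects are independent, smaller values are better, and the two utility functions `f_exp` and `f_step` are hypothetical choices made here for illustration (the slide's own example functions are not preserved in the transcript). "Decreasing" is read as non-increasing.

```python
from math import exp
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
UncertainObject = List[Tuple[Point, float]]  # (instance, probability)

athletes: Dict[str, UncertainObject] = {
    "A": [((1, 4), 0.5), ((3, 2), 0.5)],
    "B": [((2, 5), 0.5), ((4, 3), 0.5)],
    "C": [((5, 1), 0.01), ((3, 4), 0.99)],
}


def dominates(u: Point, v: Point) -> bool:
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probability(name: str, objects: Dict[str, UncertainObject]) -> float:
    """Probability that no instance of another object dominates the
    realized instance of `name` (assumes independent objects)."""
    total = 0.0
    for point, prob in objects[name]:
        survive = 1.0
        for other, instances in objects.items():
            if other == name:
                continue
            dominated = sum(q for pt, q in instances if dominates(pt, point))
            survive *= 1.0 - dominated
        total += prob * survive
    return total


def expected_utility(obj: UncertainObject, f: Callable[[Point], float]) -> float:
    return sum(p * f(pt) for pt, p in obj)


def f_exp(pt: Point) -> float:
    # Hypothetical multiplicative utility: f1(x) * f2(y) with f_i(t) = exp(-t).
    return exp(-pt[0]) * exp(-pt[1])


def f_step(pt: Point) -> float:
    # Hypothetical multiplicative utility: f1 = 1, f2(y) = 1 if y <= 1 else 0.
    return 1.0 if pt[1] <= 1 else 0.0


def best(objects: Dict[str, UncertainObject], f: Callable[[Point], float]) -> str:
    return max(objects, key=lambda n: expected_utility(objects[n], f))


for name in athletes:
    print(name, skyline_probability(name, athletes))
print(best(athletes, f_exp), best(athletes, f_step))
```

Under `f_exp` the maximizer is A, and under `f_step` it is C, even though Psky(C) is far smaller than Psky(B). For both functions E[f(A)] ≥ E[f(B)], which is consistent with the slide's point that skyline probability alone does not identify the candidates for optimal solutions.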

Contributions Introduce a novel skyline operator: the stochastic skyline. It guarantees the minimal candidate set for the optimal solutions regarding decreasing multiplicative functions. NP-completeness of computing the stochastic skyline regarding the dimensionality d. Novel statistics-based pruning techniques. Efficient partition-based verification algorithms: polynomial if d is fixed.

Problem Statement Stochastic Order (lower orthant order): given U and V, U stochastically dominates V (U ≺sd V) if U.cdf(x) ≥ V.cdf(x) for every x and there exists y such that U.cdf(y) > V.cdf(y). U.cdf(x): the probability mass of U in the rectangular region R((0,0,…,0), x) (the shaded region in the slide's figure). Stochastic Skyline: the objects in 𝒰 that are not stochastically dominated by any other object. Problem Statement: efficiently compute the stochastic skyline for discrete cases.

Minimality of stochastic skyline The stochastic skyline removes all objects that are not preferred by any non-negative decreasing multiplicative function!

Framework Phase 1: filtering. Remove non-promising objects. Phase 2: verification. Test stochastic dominance between two objects. BBS combined with a heap: thanks to the "near" progressiveness, in most cases only one of U ≺sd V or V ≺sd U needs to be tested (not both).

Testing if U ≺sd V Violation point: a point x in R^d_+ is a violation point regarding U ≺sd V if U.cdf(x) < V.cdf(x). Testing algorithm: if there are no violation points, then U ≺sd V. Testing only at the instances is not enough.

Reduce to Grid Points Test whether U.cdf ≥ V.cdf at the grid points only (see (a)). It suffices to test the switching grid points only (see the solid lines in (b)).
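The grid reduction can be illustrated with a brute-force check. This is a sketch, not the algorithm from the talk: it assumes discrete objects given as lists of (instance, probability) pairs, builds the grid from the coordinates of both objects (a simplification chosen here so that the strictness condition can be checked on the same grid), and visits every grid point instead of only the switching points. The cdfs are step functions that change only at instance coordinates, which is why grid points suffice. All function names are illustrative.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

UncertainObject = List[Tuple[Tuple[float, ...], float]]  # (instance, probability)


def cdf(obj: UncertainObject, x: Sequence[float]) -> float:
    """Probability mass of obj inside the rectangle from the origin to x."""
    return sum(p for pt, p in obj if all(a <= b for a, b in zip(pt, x)))


def stochastically_dominates(u: UncertainObject, v: UncertainObject,
                             eps: float = 1e-12) -> bool:
    """Brute-force test of U <sd V on the grid induced by both objects."""
    d = len(u[0][0])
    axes = [sorted({pt[i] for pt, _ in u + v}) for i in range(d)]
    strict = False
    for g in product(*axes):
        cu, cv = cdf(u, g), cdf(v, g)
        if cu < cv - eps:
            return False  # g is a violation point
        if cu > cv + eps:
            strict = True
    return strict


def stochastic_skyline(objects: Dict[str, UncertainObject]) -> List[str]:
    return [n for n, o in objects.items()
            if not any(stochastically_dominates(q, o)
                       for m, q in objects.items() if m != n)]


athletes: Dict[str, UncertainObject] = {
    "A": [((1, 4), 0.5), ((3, 2), 0.5)],
    "B": [((2, 5), 0.5), ((4, 3), 0.5)],
    "C": [((5, 1), 0.01), ((3, 4), 0.99)],
}
print(stochastic_skyline(athletes))  # ['A', 'C']
```

On the athletes from the example slide, A stochastically dominates B, while C is not dominated because (5,1) is a violation point for A ≺sd C: A has no mass there and C has mass 0.01.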

Algorithm Given a rectangular region R(x, y), if U.cdf(x) ≥ V.cdf(y), then there is no violation point in R(x, y). Partition-based testing algorithm: (1) get the switching points; (2) perform an initial check; (3) iteratively partition the grid and throw away non-promising sub-grids.
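The pruning rule lends itself to a recursive search for a violation point. The sketch below is an assumption-laden simplification: it searches the grid induced by V's instance coordinates (a violation point, if one exists, can be moved down to such a grid point without changing V.cdf and without increasing U.cdf), splits the widest index range in half, and omits the switching-point step and the aggregate index structures mentioned in the talk. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

UncertainObject = List[Tuple[Tuple[float, ...], float]]  # (instance, probability)


def cdf(obj: UncertainObject, x: Sequence[float]) -> float:
    return sum(p for pt, p in obj if all(a <= b for a, b in zip(pt, x)))


def has_violation(u: UncertainObject, v: UncertainObject,
                  eps: float = 1e-12) -> bool:
    """True if some point x has U.cdf(x) < V.cdf(x)."""
    d = len(v[0][0])
    axes = [sorted({pt[i] for pt, _ in v}) for i in range(d)]

    def search(lo: Tuple[int, ...], hi: Tuple[int, ...]) -> bool:
        x = tuple(ax[i] for ax, i in zip(axes, lo))  # lower corner of the box
        y = tuple(ax[i] for ax, i in zip(axes, hi))  # upper corner of the box
        # Pruning rule: every g in the box satisfies
        # U.cdf(g) >= U.cdf(x) >= V.cdf(y) >= V.cdf(g).
        if cdf(u, x) >= cdf(v, y) - eps:
            return False
        if lo == hi:
            return True  # single grid point with U.cdf < V.cdf
        # Split the widest index range into two non-empty halves.
        k = max(range(d), key=lambda i: hi[i] - lo[i])
        mid = (lo[k] + hi[k]) // 2
        left_hi = hi[:k] + (mid,) + hi[k + 1:]
        right_lo = lo[:k] + (mid + 1,) + lo[k + 1:]
        return search(lo, left_hi) or search(right_lo, hi)

    return search((0,) * d, tuple(len(ax) - 1 for ax in axes))


A = [((1, 4), 0.5), ((3, 2), 0.5)]
B = [((2, 5), 0.5), ((4, 3), 0.5)]
C = [((5, 1), 0.01), ((3, 4), 0.99)]
print(has_violation(A, B), has_violation(A, C))  # False True
```

With the athletes from the example slide, no violation point exists for A ≺sd B, while (5,1) is found as a violation point for A ≺sd C.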

Complexity The algorithm runs in O(dm log m + m^d (T(U_artree) + T(V_artree))) time, where m is the number of instances in V. NP-complete regarding d: convert (the decision version of) the minimal set cover problem to a special case of the testing problem.

Filtering Techniques Pruning Rule 1: throw away fully dominated entries.

Filtering Techniques Pruning Rule 2: apply Cantelli's inequality to get upper bounds.
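For reference, Cantelli's inequality states that for a random variable X with mean μ and variance σ², P(X − μ ≤ −λ) ≤ σ² / (σ² + λ²) for any λ > 0. The sketch below only illustrates the inequality on a single coordinate of a discrete object; how the bound is combined into the pruning rule is not spelled out on the slide, so nothing here should be read as the talk's actual rule. The function names are illustrative.

```python
from typing import List, Tuple

Distribution = List[Tuple[float, float]]  # (value, probability)


def mean_and_variance(dist: Distribution) -> Tuple[float, float]:
    mu = sum(x * p for x, p in dist)
    var = sum(p * (x - mu) ** 2 for x, p in dist)
    return mu, var


def cantelli_lower_tail_bound(mu: float, var: float, a: float) -> float:
    """Upper bound on P(X <= a) using only the mean and variance.
    For a >= mu the inequality gives nothing, so return the trivial bound 1."""
    if a >= mu:
        return 1.0
    lam = mu - a
    return var / (var + lam * lam)


# First coordinate of athlete B from the example: 2 or 4, each with probability 0.5.
dist = [(2.0, 0.5), (4.0, 0.5)]
mu, var = mean_and_variance(dist)
for a in (1.0, 2.0, 2.5):
    exact = sum(p for x, p in dist if x <= a)
    print(a, exact, cantelli_lower_tail_bound(mu, var, a))
```

The appeal for filtering is that the bound needs only two summary statistics per object, so it can be evaluated without touching the individual instances.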

Size Estimation Expected size: the size of the stochastic skyline in R^d is bounded by that of the conventional skyline in R^(d+1), i.e., ln^d(n)/(d+1)!

Empirical Study Implementation: C++ with STL, compiled with GNU GCC, on a 2.4GHz machine running Debian. Real dataset: NBA players' game-by-game statistics. Synthetic datasets: anti-correlated, correlated, independent.

Summary A novel skyline operator, the stochastic skyline, which guarantees minimality. Testing the stochastic order (lower orthant order) is NP-complete. Novel efficient algorithms to compute the stochastic order. Future work: what if ℱ is the set of all decreasing functions?

Thank you!