Mining Favorable Facets Raymond Chi-Wing Wong (the Chinese University of Hong Kong) Jian Pei (Simon Fraser University) Ada Wai-Chee Fu (the Chinese University.

Presentation transcript:

Mining Favorable Facets
Raymond Chi-Wing Wong (the Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (the Chinese University of Hong Kong), Ke Wang (Simon Fraser University)
KDD ’07, August 12-15, 2007, San Jose, California, USA

Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion

1. Introduction
Suppose we want to look for a vacation package. There are 3 packages:

Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5

Suppose we compare packages a and b. We want a cheaper price and a higher hotel-class. We know that package a is "better" than package b because
1. the price of package a is lower, and
2. the hotel-class of package a is higher.
We say that package a "dominates" package b.

1. Introduction

Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5

Thus, we do not need to consider package b. We know that
1. package a has the lowest price, and
2. package c has the highest hotel-class.
Packages a and c are not dominated by any other package, so together they are all of the "best" possible choices. We call packages a and c skyline points.
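The dominance test and the skyline for the three packages can be sketched in a few lines of Python. This is only an illustration of the definitions on the slide; the names `dominates` and `skyline` are hypothetical and do not come from the paper.

```python
# Each package: (price, hotel_class). Lower price is better, higher class is better.
packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}


def dominates(q, p):
    """q dominates p: no worse on every attribute, strictly better on at least one."""
    no_worse = q[0] <= p[0] and q[1] >= p[1]
    strictly_better = q[0] < p[0] or q[1] > p[1]
    return no_worse and strictly_better


def skyline(points):
    """Names of the points that are not dominated by any other point."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


print(skyline(packages))  # ['a', 'c']
```

Package b drops out because a beats it on both attributes, while a and c each win on one attribute and so cannot dominate each other.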

Suppose we want to look for a vacation package. There are 6 packages:

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Different customers may have different preferences on Hotel-group. Here "x < y" means that hotel-group x is preferred to hotel-group y.
Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.

1. Introduction

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Customer   Preference on Hotel-group   Skyline
Alice      T < M                       {a, c}
Bob        No special preference       {a, c, e, f}
Chris      H < M                       {a, c, e}
David      H < M < T                   {a, c, e}
Emily      H < T < M                   {a, c}
Fred       M < T                       {a, c, e, f}

What preferences make package f a skyline point?
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers. Bob and Fred are the potential customers, because package f is a skyline point only under their preferences.
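The customer table can be reproduced with a small Python sketch. It assumes that a preference is a set of (preferred, less preferred) pairs on Hotel-group, that chains such as H < M < T are closed under transitivity, and that two different hotel-groups with no stated preference are incomparable. The helper names (`closure`, `dominates`, `skyline`) are hypothetical.

```python
# Packages from the slide: (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def closure(pref):
    """Transitive closure of a set of (preferred, less_preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for u, v in list(closed):
                if y == u and (x, v) not in closed:
                    closed.add((x, v))
                    changed = True
    return closed


def dominates(q, p, pref):
    """q dominates p under the (already closed) preference pref."""
    group_better = (q[2], p[2]) in pref
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    strictly_better = q[0] < p[0] or q[1] > p[1] or group_better
    return no_worse and strictly_better


def skyline(pref):
    closed = closure(pref)
    return sorted(
        name
        for name, p in PACKAGES.items()
        if not any(dominates(q, p, closed) for other, q in PACKAGES.items() if other != name)
    )


CUSTOMERS = {
    "Alice": {("T", "M")},
    "Bob": set(),
    "Chris": {("H", "M")},
    "David": {("H", "M"), ("M", "T")},
    "Emily": {("H", "T"), ("T", "M")},
    "Fred": {("M", "T")},
}

for customer, pref in CUSTOMERS.items():
    print(customer, skyline(pref))
```

Under these assumptions the sketch returns the same skylines as the table, and package f appears only for Bob and Fred.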

1. Introduction
Problem: Given a package, under which preferences or conditions is this package a skyline point? These preferences are its favorable facets.

1. Introduction
Problem: Given a package, under which preferences (its favorable facets) is this package a skyline point?
We can solve the problem by a naive method: Lattice Search.
[Figure: lattice of preferences on Hotel-group; each node is labelled with its skyline]
Bottom: {} SKY={a, c, e, f}
Level 1: {T < M} SKY={a, c}; {H < M} SKY={a, c, e}; {T < H} SKY={a, c, e, f}; {H < T} SKY={a, c, e, f}; {M < T} SKY={a, c, e, f}; {M < H} SKY={a, c, e, f}
Level 2: {T < M, H < M} SKY={a, c}; {H < T, H < M} SKY={a, c, e}; {T < H, M < H} SKY={a, c, e, f}; …
Top: SKY={}

1. Introduction
Lattice Search: consider package f.
The preferences under which package f is a skyline point (SKY={a, c, e, f}) are {}, {T < H}, {H < T}, {M < T}, {M < H}, …, {T < H, M < H}.

This approach has two disadvantages.
1. Computation is costly: we need to compute all skyline points for each possible preference.
2. It is difficult to interpret the results: there are many preferences which qualify package f as a skyline point.
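To make the cost of the naive method concrete, the following Python sketch enumerates every preference on the three hotel-groups and recomputes the skyline for each one. It assumes that a preference is a strict partial order stored as a transitively closed set of (preferred, less preferred) pairs; all function names are hypothetical.

```python
from itertools import combinations

# Packages from the slide: (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ["T", "H", "M"]
PAIRS = [(x, y) for x in GROUPS for y in GROUPS if x != y]


def closure(pref):
    """Transitive closure of a set of (preferred, less_preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for u, v in list(closed):
                if y == u and (x, v) not in closed:
                    closed.add((x, v))
                    changed = True
    return closed


def all_preferences():
    """Every transitively closed, conflict-free subset of PAIRS (a lattice node)."""
    for k in range(len(PAIRS) + 1):
        for subset in combinations(PAIRS, k):
            pref = frozenset(subset)
            # A conflicting pair x < y, y < x would force x < x, which is not in PAIRS.
            if closure(pref) == pref:
                yield pref


def dominates(q, p, pref):
    group_better = (q[2], p[2]) in pref
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)


def skyline(pref):
    return {
        name
        for name, p in PACKAGES.items()
        if not any(dominates(q, p, pref) for other, q in PACKAGES.items() if other != name)
    }


lattice = list(all_preferences())
favorable_for_f = [pref for pref in lattice if "f" in skyline(pref)]
print(len(lattice), "preferences,", len(favorable_for_f), "of them keep f in the skyline")
```

Even with only three hotel-groups the search computes one skyline per lattice node, and the answer is a list of preferences rather than a compact rule, which matches the two disadvantages listed above.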

1. Introduction
Lattice Search: consider package f.
[Figure: the same lattice with a border for f, separating the preferences with SKY={a, c, e, f}, such as {T < H}, {H < T}, {M < T}, {M < H} and {T < H, M < H}, from the preferences under which f is not a skyline point]
We find that whenever the preference contains "T < M" or "H < M", package f is not a skyline point.
We say that "T < M" and "H < M" are each a minimal disqualifying condition (MDC) of package f.

3. Algorithm
How do we find the MDCs of a point?
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?

3. Algorithm
Point q is said to quasi-dominate point p if all numeric attributes of point q are NOT worse than those of point p.
e.g., package a quasi-dominates package f because
1. package a has a lower (better) price than package f (1000 vs. 3000), and
2. package a has a higher (better) hotel-class than package f (4 vs. 3).
Since package a (Tulips) quasi-dominates package f (Mozilla), we define R(a→f) as the preference on Hotel-group under which package a dominates package f:
R(a→f) = {T < M}
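A minimal Python sketch of quasi-dominance and R(q→p) follows. It assumes a single nominal attribute (Hotel-group), compares only price and hotel-class in the quasi-dominance test, and represents R(q→p) as a set of (preferred, less preferred) pairs. The function names are hypothetical.

```python
# Packages from the slide: (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def quasi_dominates(q, p):
    """q is not worse than p on the numeric attributes (price, hotel-class)."""
    return q[0] <= p[0] and q[1] >= p[1]


def condition(q, p):
    """R(q -> p): the preference pairs needed for q to dominate p.

    Returns None when q does not quasi-dominate p, and an empty set when
    q and p belong to the same hotel-group (no preference is needed).
    """
    if not quasi_dominates(q, p):
        return None
    if q[2] == p[2]:
        return frozenset()
    return frozenset({(q[2], p[2])})


print(condition(PACKAGES["a"], PACKAGES["f"]))  # frozenset({('T', 'M')})
print(condition(PACKAGES["c"], PACKAGES["f"]))  # frozenset({('H', 'M')})
print(condition(PACKAGES["e"], PACKAGES["f"]))  # None
```

Package e does not quasi-dominate package f because its hotel-class is lower, so no preference on Hotel-group can make e dominate f.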

3. Algorithm
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?
Two algorithms:
  MDC-O: Computing MDC On-the-fly
    Does not store the MDCs of points
    Computes the MDC of a given point on the fly
  MDC-M: A Materialization Method
    Stores the MDCs of all points
Indexing method for speed-up: R*-tree

3.1 MDC-O: Computing MDC On-the-fly
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?
On-the-fly Algorithm
Given: data point p
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
  MDC(p) ← ∅
  For each data point q which quasi-dominates p
    if MDC(p) does not contain R(q→p)
      insert R(q→p) into MDC(p)
  Return MDC(p)
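The on-the-fly loop can be sketched in Python as follows. The sketch assumes a single nominal attribute and scans all points linearly rather than using an index. The subset check that keeps only minimal conditions is an assumption that goes slightly beyond the "does not contain" test on the slide, and `compute_mdc` is a hypothetical name.

```python
# Packages from the slide: (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def compute_mdc(name, data=PACKAGES):
    """Scan all points q that quasi-dominate p and collect the minimal R(q -> p)."""
    p = data[name]
    mdc = set()
    for other, q in data.items():
        if other == name:
            continue
        if not (q[0] <= p[0] and q[1] >= p[1]):
            continue  # q does not quasi-dominate p
        r = frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})
        if not r and q[:2] == p[:2]:
            continue  # identical point: it can never dominate p
        if any(existing <= r for existing in mdc):
            continue  # an equal or smaller condition is already stored
        # Assumption: drop stored conditions that the new, smaller one subsumes.
        mdc = {existing for existing in mdc if not r <= existing}
        mdc.add(r)
    return mdc


print(compute_mdc("f"))
print(compute_mdc("a"))
```

For package f the result contains {T < M} and {H < M}, matching the two MDCs found on the lattice. For package a it is empty, so a is a skyline point under every preference. An empty condition inside the result, as for package b, means the point is dominated regardless of the preference.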

3.2 MDC-M: A Materialization Method
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?
Materialization Algorithm
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
  For each data point p
    MDC(p) ← ∅
    For each data point q which quasi-dominates p
      if MDC(p) does not contain R(q→p)
        then insert R(q→p) into MDC(p)
    Store MDC(p)
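Materialization can be sketched by running the same loop once for every point and keeping the results in a dictionary, so that a later query is a lookup. As before, this assumes a single nominal attribute and a linear scan, the minimality filter is an assumption, and `compute_mdc`, `materialize` and `MDC_TABLE` are hypothetical names.

```python
# Packages from the slide: (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def compute_mdc(name):
    """Minimal disqualifying conditions of one point (same loop as MDC-O)."""
    p = PACKAGES[name]
    mdc = set()
    for other, q in PACKAGES.items():
        if other == name or not (q[0] <= p[0] and q[1] >= p[1]):
            continue
        r = frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})
        if (not r and q[:2] == p[:2]) or any(existing <= r for existing in mdc):
            continue
        mdc = {existing for existing in mdc if not r <= existing}
        mdc.add(r)
    return mdc


def materialize():
    """Precompute and store MDC(p) for every point p."""
    return {name: compute_mdc(name) for name in PACKAGES}


MDC_TABLE = materialize()

# A query is now a dictionary lookup instead of a scan over all points.
for name in sorted(MDC_TABLE):
    print(name, sorted(sorted(cond) for cond in MDC_TABLE[name]))
```

The trade-off is the one reported in the empirical study: queries become faster, at the price of storing the MDCs of all points.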

4. Empirical Study
Datasets:
  Synthetic dataset
  Real datasets (from UCI): Nursery dataset, Automobile dataset
Default values (synthetic):
  No. of tuples = 500K
  No. of numeric dimensions = 3
  No. of categorical dimensions = 1
  No. of values in a nominal dimension = 20

4. Empirical Study
Without indexing:
  MDC-O: slowest search time
  MDC-M: faster search time; storage of MDC: 8MB
With indexing:
  MDC-O and MDC-M: fast search time

4. Empirical Study
Automobile dataset: three car models

Car          MDC
Honda        "Toyota < Honda"
Mitsubishi   "Honda < Mitsubishi" or "Toyota < Mitsubishi"
Toyota       -

Honda: a salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
Toyota: a salesperson should promote this car to ANY customer.
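Once the MDCs are known, the advice for a salesperson reduces to a containment check. The Python sketch below uses the three MDC entries from the table; it assumes that a customer preference is a set of (preferred, less preferred) pairs closed under transitivity, and `should_promote` is a hypothetical helper.

```python
# MDCs of the three car models as listed on the slide.
CAR_MDC = {
    "Honda": [frozenset({("Toyota", "Honda")})],
    "Mitsubishi": [
        frozenset({("Honda", "Mitsubishi")}),
        frozenset({("Toyota", "Mitsubishi")}),
    ],
    "Toyota": [],
}


def closure(pref):
    """Transitive closure of a set of (preferred, less_preferred) pairs."""
    closed = set(pref)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for u, v in list(closed):
                if y == u and (x, v) not in closed:
                    closed.add((x, v))
                    changed = True
    return closed


def should_promote(car, pref):
    """True unless the customer's preference contains one of the car's MDCs."""
    closed = closure(pref)
    return not any(cond <= closed for cond in CAR_MDC[car])


prefers_toyota = {("Toyota", "Honda")}
print(should_promote("Honda", prefers_toyota))   # False
print(should_promote("Toyota", prefers_toyota))  # True
```

A chain such as Toyota < Honda < Mitsubishi also disqualifies Mitsubishi, because both of its MDCs are contained in that preference.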

5. Conclusion
Skyline
Favorable Facets
Minimal Disqualifying Condition
Algorithm
  On-the-fly
  Materialization
Empirical Study