Published by Eugenia Stewart. Modified over 9 years ago.
1
Mining Favorable Facets
Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
Jian Pei (Simon Fraser University)
Ada Wai-Chee Fu (The Chinese University of Hong Kong)
Ke Wang (Simon Fraser University)
KDD ’07, August 12-15, 2007, San Jose, California, USA
2
Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion
3
1. Introduction
Suppose we want to choose a vacation package from 3 packages:

Package ID   Price   Hotel-class
a            1000    4
b            2400    1
c            3000    5

Suppose we compare packages a and b. We want a cheaper price and a higher hotel-class. Package a is “better” than package b because:
1. the price of package a is lower, and
2. the hotel-class of package a is higher.
We say that package a “dominates” package b.
4
1. Introduction
Thus, we do not need to consider package b. We also know that:
1. package a has the cheapest price, and
2. package c has the highest hotel-class.
No other package dominates package a or package c, so both are among the “best” possible choices. We call packages a and c skyline points.
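The dominance test and the resulting skyline can be sketched in a few lines of Python. This sketch assumes that a lower price and a higher hotel-class are better. The function names are illustrative and do not come from the paper. The skyline is computed naively by comparing every pair of packages.

```python
from typing import Dict, Set, Tuple

Package = Tuple[int, int]  # (price, hotel_class): lower price and higher class are better


def dominates(q: Package, p: Package) -> bool:
    """q dominates p: q is no worse on every attribute and strictly better on at least one."""
    no_worse = q[0] <= p[0] and q[1] >= p[1]
    strictly_better = q[0] < p[0] or q[1] > p[1]
    return no_worse and strictly_better


def skyline(packages: Dict[str, Package]) -> Set[str]:
    """Naive O(n^2) skyline: keep every package that no other package dominates.

    A package never dominates itself (strictly_better is False), so no self-check is needed.
    """
    return {
        pid
        for pid, p in packages.items()
        if not any(dominates(q, p) for q in packages.values())
    }


packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}
print(dominates(packages["a"], packages["b"]))  # True
print(sorted(skyline(packages)))                # ['a', 'c']
```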
5
Suppose we want to choose a vacation package from 6 packages:

Package ID   Price   Hotel-class   Hotel-group
a            1000    4             T (Tulips)
b            2400    1             T (Tulips)
c            3000    5             H (Horizon)
d            3600    4             H (Horizon)
e            2400    2             M (Mozilla)
f            3000    3             M (Mozilla)

Different customers may have different preferences on hotel-group, where “x < y” means that hotel-group x is preferred to y. Suppose one customer has the preference H < T < M. The skyline points are then packages a and c. Suppose another customer has the preference H < M < T. The skyline points are then packages a, c and e. In other words, different preferences give different skyline points.
6
1. Introduction
Customer   Preference on hotel-group   Skyline
Alice      T < M                       {a, c}
Bob        No special preference       {a, c, e, f}
Chris      H < M                       {a, c, e}
David      H < M < T                   {a, c, e}
Emily      H < T < M                   {a, c}
Fred       M < T                       {a, c, e, f}

Which preferences make package f a skyline point? Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers. The potential customers are Bob and Fred, the only customers whose skyline contains f.
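The customer table can be reproduced with a short Python sketch. The sketch makes three assumptions, consistent with the examples above but not taken from the paper:
- “x < y” means hotel-group x is preferred to y.
- A preference is closed under transitivity.
- Two different hotel-groups with no stated preference are incomparable.

Names such as `skyline` and `CUSTOMERS` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

PACKAGES = {  # package id -> (price, hotel_class, hotel_group)
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def closure(pairs: Set[Tuple[str, str]]) -> FrozenSet[Tuple[str, str]]:
    """Transitive closure of a preference; (x, y) means hotel-group x is preferred to y."""
    order = set(pairs)
    while True:
        implied = {(x, z) for (x, y) in order for (y2, z) in order if y == y2} - order
        if not implied:
            return frozenset(order)
        order |= implied


def dominates(q, p, order) -> bool:
    """q is no worse on price, hotel-class and hotel-group, and strictly better on one."""
    (q_price, q_class, q_group), (p_price, p_class, p_group) = q, p
    group_better = (q_group, p_group) in order
    no_worse = (q_price <= p_price and q_class >= p_class
                and (q_group == p_group or group_better))
    strictly_better = q_price < p_price or q_class > p_class or group_better
    return no_worse and strictly_better


def skyline(preference: Set[Tuple[str, str]]) -> Set[str]:
    order = closure(preference)
    return {pid for pid, p in PACKAGES.items()
            if not any(dominates(q, p, order) for q in PACKAGES.values())}


CUSTOMERS = {
    "Alice": {("T", "M")},
    "Bob": set(),                     # no special preference
    "Chris": {("H", "M")},
    "David": {("H", "M"), ("M", "T")},  # H < M < T
    "Emily": {("H", "T"), ("T", "M")},  # H < T < M
    "Fred": {("M", "T")},
}

for name, pref in CUSTOMERS.items():
    print(name, sorted(skyline(pref)))
print([name for name, pref in CUSTOMERS.items() if "f" in skyline(pref)])  # ['Bob', 'Fred']
```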
7
1. Introduction
Problem: Given a package, under which preferences (conditions) is it a skyline point? We call such preferences favorable facets.
8
1. Introduction
[Figure: lattice of preference sets on hotel-group, each node labeled with its skyline: {} → SKY = {a, c, e, f}; {T < M} → SKY = {a, c}; {H < M} → SKY = {a, c, e}; {T < H}, {H < T}, {M < T}, {M < H} → SKY = {a, c, e, f}; {T < M, H < M} → SKY = {a, c}; {H < T, H < M} → SKY = {a, c, e}; {T < H, M < H} → SKY = {a, c, e, f}; …]
We can solve the problem with a naive method: lattice search.
9
1. Introduction
[Figure: the same lattice of preference sets.]
Consider package f. It is a skyline point (SKY = {a, c, e, f}) under preferences such as {}, {T < H}, {H < T}, {M < T}, {M < H} and {T < H, M < H}.
10
With lattice search, we need to compute all skyline points for every possible preference, and many preferences qualify package f as a skyline point. This approach has two disadvantages:
1. The computation is costly.
2. The results are difficult to interpret.
11
1. Introduction
[Figure: the same lattice, with the border for package f.]
We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point. We call “T < M” and “H < M” the minimal disqualifying conditions (MDC) of f.
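A brute-force Python sketch of lattice search for package f follows. It assumes that “x < y” means hotel-group x is preferred to y and that different hotel-groups with no stated preference are incomparable. The sketch works in three steps:
- It enumerates every consistent preference on the three hotel-groups, i.e. every asymmetric, transitively closed set of pairs.
- It tests whether f is a skyline point at each node.
- It extracts the border as the minimal nodes where f drops out.

This is an illustration of the naive method, not the paper's code.

```python
from itertools import combinations, permutations
from typing import FrozenSet, List, Tuple

Order = FrozenSet[Tuple[str, str]]  # (x, y) means hotel-group x is preferred to y

PACKAGES = {  # package id -> (price, hotel_class, hotel_group)
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = ("T", "H", "M")
PAIRS = list(permutations(GROUPS, 2))  # the 6 possible ordered pairs


def is_strict_partial_order(rel: Order) -> bool:
    """A lattice node: an asymmetric, transitively closed set of pairs."""
    asymmetric = all((y, x) not in rel for (x, y) in rel)
    transitive = all((x, z) in rel for (x, y) in rel for (y2, z) in rel if y == y2)
    return asymmetric and transitive


def lattice() -> List[Order]:
    """Enumerate every consistent preference on hotel-group."""
    nodes = []
    for k in range(len(PAIRS) + 1):
        for combo in combinations(PAIRS, k):
            rel = frozenset(combo)
            if is_strict_partial_order(rel):
                nodes.append(rel)
    return nodes


def dominates(q, p, order: Order) -> bool:
    (q_price, q_class, q_group), (p_price, p_class, p_group) = q, p
    group_better = (q_group, p_group) in order
    no_worse = (q_price <= p_price and q_class >= p_class
                and (q_group == p_group or group_better))
    return no_worse and (q_price < p_price or q_class > p_class or group_better)


def in_skyline(pid: str, order: Order) -> bool:
    p = PACKAGES[pid]
    return not any(dominates(q, p, order) for q in PACKAGES.values())


nodes = lattice()
favorable = [n for n in nodes if in_skyline("f", n)]  # favorable facets of f
disqualifying = [n for n in nodes if not in_skyline("f", n)]
# Border: disqualifying nodes that have no disqualifying proper subset.
border = [n for n in disqualifying if not any(m < n for m in disqualifying)]

print(len(nodes), len(favorable))         # 19 10
print(sorted(sorted(n) for n in border))  # [[('H', 'M')], [('T', 'M')]]
```

Even with only three hotel-groups, each of the 19 nodes requires its own skyline check. This illustrates why the naive approach is costly, and why reporting the two border conditions is easier to read than listing the 10 favorable nodes.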
12
3. Algorithm
How do we find the MDCs of a point?
Problem: Given a package, what are the minimal conditions under which it is NOT a skyline point?
13
3. Algorithm
Point q is said to quasi-dominate point p if no numeric attribute of q (here, price and hotel-class) is worse than that of p.
For example, package a quasi-dominates package f because:
1. package a has a lower (better) price than package f, and
2. package a has a higher (better) hotel-class than package f.
If package a quasi-dominates package f, we define $R_{af}$ as the condition on hotel-group under which a dominates f, i.e., a's hotel-group (T) is preferred to f's hotel-group (M):
$R_{af} = \{T < M\}$
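The definitions of quasi-dominance and $R_{qp}$ can be written down directly for the single nominal attribute in this example. This is a sketch under two assumptions:
- We additionally require q to differ from p somewhere, so that a package never quasi-dominates an identical copy of itself.
- $R_{qp}$ is empty when both packages share a hotel-group, because q then dominates p under every preference.

The names are illustrative.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Package(NamedTuple):
    price: int        # lower is better
    hotel_class: int  # higher is better
    group: str        # nominal attribute with no built-in order


def quasi_dominates(q: Package, p: Package) -> bool:
    """q is no worse than p on every numeric attribute (and q differs from p somewhere)."""
    no_worse = q.price <= p.price and q.hotel_class >= p.hotel_class
    differs = q.price < p.price or q.hotel_class > p.hotel_class or q.group != p.group
    return no_worse and differs


def condition(q: Package, p: Package) -> FrozenSet[Tuple[str, str]]:
    """R_qp: the hotel-group preference under which q dominates p (empty if groups match)."""
    return frozenset() if q.group == p.group else frozenset({(q.group, p.group)})


a = Package(1000, 4, "T")
e = Package(2400, 2, "M")
f = Package(3000, 3, "M")
print(quasi_dominates(a, f), condition(a, f))  # True frozenset({('T', 'M')})
print(quasi_dominates(e, f))                   # False: e has a lower hotel-class
```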
14
3. Algorithm
Two algorithms:
- MDC-O: computing MDC on-the-fly
  - Does not store the MDCs of points
  - Computes the MDC of a given point on-the-fly
- MDC-M: a materialization method
  - Stores the MDCs of all points
Indexing method for speed-up: R*-tree
15
3.1 MDC-O: Computing MDC On-the-fly
Input: data point p
Variable MDC(p): the minimal disqualifying conditions of p

Algorithm MDC(p)
    MDC(p) ← ∅
    for each data point q that quasi-dominates p
        if MDC(p) does not contain R_qp
            insert R_qp into MDC(p)
    return MDC(p)
16
3.2 MDC-M: A Materialization Method
Variable MDC(p): the minimal disqualifying conditions of p

Algorithm MDC-M
    for each data point p
        MDC(p) ← ∅
        for each data point q that quasi-dominates p
            if MDC(p) does not contain R_qp then
                insert R_qp into MDC(p)
        store MDC(p)
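Both algorithms above can be sketched in Python on the package example. MDC-O computes one point on demand. MDC-M runs the same loop for every point and stores the results in a dictionary.

The sketch rests on several assumptions:
- It reads “MDC(p) does not contain R_qp” as “no condition already in MDC(p) is a subset of R_qp”.
- It drops any stored condition that is a superset of a newly inserted one, which keeps MDC(p) minimal.
- It scans all points linearly instead of using the R*-tree index.

All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple

Condition = FrozenSet[Tuple[str, str]]  # (x, y) means hotel-group x is preferred to y


class Package(NamedTuple):
    price: int        # lower is better
    hotel_class: int  # higher is better
    group: str        # nominal attribute


PACKAGES: Dict[str, Package] = {
    "a": Package(1000, 4, "T"), "b": Package(2400, 1, "T"),
    "c": Package(3000, 5, "H"), "d": Package(3600, 4, "H"),
    "e": Package(2400, 2, "M"), "f": Package(3000, 3, "M"),
}


def quasi_dominates(q: Package, p: Package) -> bool:
    no_worse = q.price <= p.price and q.hotel_class >= p.hotel_class
    differs = q.price < p.price or q.hotel_class > p.hotel_class or q.group != p.group
    return no_worse and differs


def condition(q: Package, p: Package) -> Condition:
    """R_qp: the hotel-group preference under which q dominates p."""
    return frozenset() if q.group == p.group else frozenset({(q.group, p.group)})


def add_minimal(mdc: List[Condition], r: Condition) -> List[Condition]:
    """Skip r if a subset of it is already stored; otherwise drop supersets of r and add r."""
    if any(c <= r for c in mdc):
        return mdc
    return [c for c in mdc if not r <= c] + [r]


def mdc_on_the_fly(pid: str) -> List[Condition]:
    """MDC-O: scan every point q that quasi-dominates p and collect R_qp."""
    p = PACKAGES[pid]
    mdc: List[Condition] = []
    for qid, q in PACKAGES.items():
        if qid != pid and quasi_dominates(q, p):
            mdc = add_minimal(mdc, condition(q, p))
    return mdc


def mdc_materialize() -> Dict[str, List[Condition]]:
    """MDC-M: compute MDC(p) for every point once and store it."""
    return {pid: mdc_on_the_fly(pid) for pid in PACKAGES}


table = mdc_materialize()
print(sorted(sorted(c) for c in table["f"]))  # [[('H', 'M')], [('T', 'M')]]
print(table["a"], table["b"])                 # [] [frozenset()]
```

An empty list (package a) means the point is a skyline point under every preference. A list containing the empty condition (package b) means the point is never a skyline point, because a package in the same hotel-group already beats it on the numeric attributes.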
17
4. Empirical Study
Datasets:
- Synthetic dataset
- Real datasets (from UCI): Nursery dataset, Automobile dataset
Default values (synthetic):
- No. of tuples = 500K
- No. of numeric dimensions = 3
- No. of categorical dimensions = 1
- No. of values in a nominal dimension = 20
18
4. Empirical Study
Without indexing:
- MDC-O: slowest search time
- MDC-M: faster search time; storage of MDCs: 8MB
With indexing:
- MDC-O and MDC-M: fast search time
19
4. Empirical Study
Automobile dataset: three car models

Car          MDC
Honda        “Toyota < Honda”
Mitsubishi   “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota       (none)

Honda: a salesperson should NOT promote this car to customers who prefer Toyota to Honda.
Toyota: a salesperson can promote this car to ANY customer.
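The salesperson's rule can be sketched as a check of a customer's preference against the stored MDCs. The MDC table is copied from the Automobile example above. The rule is this: a car is worth promoting unless the customer's preference, closed under transitivity, contains one of the car's disqualifying conditions. The function `worth_promoting` is a hypothetical name for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Condition = FrozenSet[Tuple[str, str]]  # (x, y) means brand x is preferred to brand y

# MDC table from the Automobile example.
MDC: Dict[str, List[Condition]] = {
    "Honda": [frozenset({("Toyota", "Honda")})],
    "Mitsubishi": [
        frozenset({("Honda", "Mitsubishi")}),
        frozenset({("Toyota", "Mitsubishi")}),
    ],
    "Toyota": [],  # no MDC: a skyline point under every preference
}


def closure(pairs: Iterable[Tuple[str, str]]) -> Condition:
    """Transitive closure of a customer's stated brand preferences."""
    order = set(pairs)
    while True:
        implied = {(x, z) for (x, y) in order for (y2, z) in order if y == y2} - order
        if not implied:
            return frozenset(order)
        order |= implied


def worth_promoting(car: str, preference: Iterable[Tuple[str, str]]) -> bool:
    """True unless the customer's preference contains one of the car's MDCs."""
    order = closure(preference)
    return not any(cond <= order for cond in MDC[car])


customer = {("Toyota", "Honda")}  # a customer who prefers Toyota to Honda
for car in MDC:
    print(car, worth_promoting(car, customer))
# Honda False
# Mitsubishi True
# Toyota True
```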
20
5. Conclusion
- Skyline
- Favorable facets
- Minimal disqualifying condition
- Algorithms: on-the-fly (MDC-O) and materialization (MDC-M)
- Empirical study