
1 Mining Favorable Facets
Raymond Chi-Wing Wong (The Chinese University of Hong Kong), Jian Pei (Simon Fraser University), Ada Wai-Chee Fu (The Chinese University of Hong Kong), Ke Wang (Simon Fraser University)
KDD ’07, August 12-15, 2007, San Jose, California, USA

2 Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion

3 1. Introduction
Suppose we want to look for a vacation package. There are 3 packages:

Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5

We want a cheaper price and a higher hotel-class.
Suppose we compare packages a and b. We know that package a is “better” than package b because
1. The price of package a is lower.
2. The hotel-class of package a is higher.
Package a “dominates” package b.

4 1. Introduction

Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5

Package a dominates package b, so we do not need to consider package b.
We know that
1. Package a has the cheapest price.
2. Package c has the highest hotel-class.
Packages a and c are not dominated by any other package. Thus, packages a and c are all of the “best” possible choices. We call packages a and c skyline points.
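The dominance test and the skyline on these two slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `dominates` and `skyline` are assumptions, and dominance is taken to mean "no worse on every attribute and strictly better on at least one".

```python
# The three packages from the slide: (price, hotel_class).
# A lower price and a higher hotel-class are better.
packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}

def dominates(q, p):
    """True if q is no worse than p on both attributes and strictly better on one."""
    q_price, q_class = q
    p_price, p_class = p
    no_worse = q_price <= p_price and q_class >= p_class
    strictly_better = q_price < p_price or q_class > p_class
    return no_worse and strictly_better

def skyline(points):
    """IDs of the points that no other point dominates."""
    return {
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    }

print(sorted(skyline(packages)))  # prints ['a', 'c']
```

Package b drops out because package a beats it on both attributes, while a and c each win on one attribute and so cannot dominate each other.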

5 1. Introduction
Suppose we want to look for a vacation package. There are 6 packages:

Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)

Different customers may have different preferences on Hotel-group. Here “X < Y” means that hotel-group X is preferred to hotel-group Y.
Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
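To make the effect of a hotel-group preference concrete, the sketch below recomputes the skyline of the six packages under a given preference. It is an illustration under stated assumptions: a preference is modeled as a set of pairs (x, y) read as "x < y, x is preferred to y", two different groups are incomparable unless such a pair is present, and the helper names `chain`, `dominates` and `skyline` are hypothetical.

```python
# (price, hotel_class, hotel_group) for the six packages on the slide.
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def chain(*groups):
    """Expand a total order such as H < T < M into all implied (better, worse) pairs."""
    return {
        (groups[i], groups[j])
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    }

def dominates(q, p, prefs):
    """q dominates p: no worse on price, class and group, strictly better on one."""
    q_price, q_class, q_group = q
    p_price, p_class, p_group = p
    group_better = (q_group, p_group) in prefs
    no_worse = (
        q_price <= p_price
        and q_class >= p_class
        and (q_group == p_group or group_better)
    )
    strictly_better = q_price < p_price or q_class > p_class or group_better
    return no_worse and strictly_better

def skyline(prefs):
    """IDs of packages that no other package dominates under prefs."""
    return {
        pid
        for pid, p in PACKAGES.items()
        if not any(dominates(q, p, prefs) for qid, q in PACKAGES.items() if qid != pid)
    }

print(sorted(skyline(chain("H", "T", "M"))))  # prints ['a', 'c']
print(sorted(skyline(chain("H", "M", "T"))))  # prints ['a', 'c', 'e']
```

With an empty preference set the same function returns a, c, e and f, because packages from different hotel-groups can then never dominate one another.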

6 1. Introduction
Package table: the same six packages as on slide 5.

Customer  Preference on Hotel-group  Skyline
Alice     T < M                      {a, c}
Bob       No special preference      {a, c, e, f}
Chris     H < M                      {a, c, e}
David     H < M < T                  {a, c, e}
Emily     H < T < M                  {a, c}
Fred      M < T                      {a, c, e, f}

What preferences make package f a skyline point?
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers. Bob and Fred are the potential customers, because package f is in their skylines.

7 1. Introduction
Package table: the same six packages as on slide 5.
Problem: Given a package, under what preferences or conditions is this package a skyline point? These are its favorable facets.

8 1. Introduction
Package table: the same six packages as on slide 5.
Problem: Given a package, under what preferences (favorable facets) is this package a skyline point?
We can solve the problem by a naive method: lattice search. Each node of the lattice is a set of preferences with its skyline:

Level 0
    {}                        SKY = {a, c, e, f}
Level 1
    {T < M}                   SKY = {a, c}
    {H < M}                   SKY = {a, c, e}
    {T < H}                   SKY = {a, c, e, f}
    {H < T}                   SKY = {a, c, e, f}
    {M < T}                   SKY = {a, c, e, f}
    {M < H}                   SKY = {a, c, e, f}
Level 2
    {T < M, H < M}            SKY = {a, c}
    {T < M, T < H}            SKY = {a, c}
    {H < T, H < M}            SKY = {a, c, e}
    {T < H, M < H}            SKY = {a, c, e, f}
    …
Level 3
    {T < M, T < H, H < M}     SKY = {a, c}
    {T < M, T < H, M < H}     SKY = {a, c}
Top
    ⊤                         SKY = {}

9 1. Introduction
Package table: the same six packages as on slide 5. Lattice: the same lattice as on slide 8.
Problem: Given a package, under what preferences (favorable facets) is this package a skyline point?
We can solve the problem by a naive method: lattice search.
Consider package f. The preferences under which f is a skyline point (SKY = {a, c, e, f}) are:
{}, {T < H}, {H < T}, {M < T}, {M < H}, …, {T < H, M < H}

10 1. Introduction
This approach has two disadvantages.
1. Computation is costly: we need to compute all skyline points for each possible preference.
2. It is difficult to interpret the results: there are many preferences which qualify package f as a skyline point.
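The naive lattice search and its cost can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a preference is any set of (better, worse) pairs over the three hotel-groups whose transitive closure is free of contradictions, and the names `closure`, `is_consistent` and `favorable_preferences` are hypothetical.

```python
from itertools import combinations, permutations

# (price, hotel_class, hotel_group) for the six packages on the slides.
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
# Every ordered pair (x, y), read as "x < y": group x is preferred to group y.
ALL_PAIRS = list(permutations(("T", "H", "M"), 2))

def closure(prefs):
    """Transitive closure of a set of (better, worse) pairs."""
    closed = set(prefs)
    while True:
        implied = {(x, v) for (x, y) in closed for (u, v) in closed if y == u}
        if implied <= closed:
            return closed
        closed |= implied

def is_consistent(closed):
    """A valid preference never ranks a group above itself."""
    return all(x != y for x, y in closed)

def dominates(q, p, prefs):
    group_better = (q[2], p[2]) in prefs
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)

def skyline(prefs):
    return {
        pid
        for pid, p in PACKAGES.items()
        if not any(dominates(q, p, prefs) for qid, q in PACKAGES.items() if qid != pid)
    }

def favorable_preferences(target):
    """Visit every node of the lattice and recompute the full skyline at each one."""
    found = []
    for size in range(len(ALL_PAIRS) + 1):
        for prefs in combinations(ALL_PAIRS, size):
            closed = closure(prefs)
            if is_consistent(closed) and target in skyline(closed):
                found.append(set(prefs))
    return found

for prefs in favorable_preferences("f"):
    print(sorted(prefs))
```

Even with three hotel-groups the search walks all subsets of six pairs and recomputes a skyline for each consistent one, and the answer comes back as a long list of preference sets rather than a compact description.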

11 1. Introduction
Package table: the same six packages as on slide 5. Lattice: the same lattice as on slide 8.
Problem: Given a package, under what preferences (favorable facets) is this package a skyline point?
We can solve the problem by a naive method: lattice search.
Consider package f. The border for f separates the preferences under which f is a skyline point, namely {}, {T < H}, {H < T}, {M < T}, {M < H} and {T < H, M < H} (each with SKY = {a, c, e, f}), from the rest of the lattice.
We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point.
We say that “T < M” and “H < M” are the minimal disqualifying conditions (MDCs) of package f.

12 3. Algorithm
How do we find the MDCs of a point?
Problem: Given a package, under what minimal conditions is this package NOT a skyline point?

13 3. Algorithm
Package table: the same six packages as on slide 5.
Point q is said to quasi-dominate point p if all attributes of point q are NOT worse than those of point p.
e.g., package a quasi-dominates package f because
1. Package a has a lower (or better) price than package f.
2. Package a has a higher (or better) hotel-class than package f.
If package a quasi-dominates package f, we define R(a→f) as follows:
R(a→f) = {T < M}
That is, R(a→f) is the preference on hotel-group that package a (Tulips) needs in order to dominate package f (Mozilla).
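Quasi-dominance and the condition R can be sketched as below. Two points are assumptions of this sketch rather than statements from the slide: quasi-dominance is checked on the two numeric attributes only (price and hotel-class, as in the slide's example), and when both packages belong to the same hotel-group the condition is returned as an empty set, meaning no preference is needed. The names `quasi_dominates` and `required_preference` are hypothetical.

```python
# (price, hotel_class, hotel_group) for the six packages on the slides.
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """q is not worse than p on price (lower is better) and hotel-class (higher is better)."""
    return q[0] <= p[0] and q[1] >= p[1]

def required_preference(q, p):
    """Preference on hotel-group under which q dominates p.

    Returns None when q does not quasi-dominate p. Pairs are (better, worse).
    Assumption of this sketch: an empty set means q and p share a hotel-group.
    """
    if not quasi_dominates(q, p):
        return None
    if q[2] == p[2]:
        return frozenset()
    return frozenset({(q[2], p[2])})

r = required_preference(PACKAGES["a"], PACKAGES["f"])
print(", ".join(f"{better} < {worse}" for better, worse in sorted(r)))  # prints T < M
```

Package e, in contrast, is cheaper than f but has a lower hotel-class, so it does not quasi-dominate f and yields no condition.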

14 3. Algorithm
Problem: Given a package, under what minimal conditions is this package NOT a skyline point?
Two algorithms:
MDC-O: Computing MDC On-the-fly
    Does not store the MDCs of points.
    Computes the MDC of a given point on the fly.
MDC-M: A Materialization Method
    Stores the MDCs of all points.
Indexing method for speed-up: R*-tree

15 3.1 MDC-O: Computing MDC On-the-fly
Problem: Given a package, under what minimal conditions is this package NOT a skyline point?
On-the-fly algorithm
Input: a data point p
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
MDC(p) ← ∅
For each data point q which quasi-dominates p
    if MDC(p) does not contain R(q→p)
        insert R(q→p) into MDC(p)
Return MDC(p)

16 3.2 MDC-M: A Materialization Method
Problem: Given a package, under what minimal conditions is this package NOT a skyline point?
Materialization algorithm
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
For each data point p
    MDC(p) ← ∅
    For each data point q which quasi-dominates p
        if MDC(p) does not contain R(q→p)
            insert R(q→p) into MDC(p)
    Store MDC(p)
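A materialized variant can be sketched as a table that is filled once and then only looked up. As before, this is an illustration under assumptions: one categorical attribute, no index, an empty condition for a quasi-dominator from the same hotel-group, and preferences passed in as transitively closed sets of (better, worse) pairs. The names `materialize` and `is_skyline` are hypothetical.

```python
# (price, hotel_class, hotel_group) for the six packages on the slides.
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """q is not worse than p on price and hotel-class."""
    return q[0] <= p[0] and q[1] >= p[1]

def materialize(points):
    """Compute and store the disqualifying conditions of every point."""
    table = {}
    for pid, p in points.items():
        conditions = []
        for qid, q in points.items():
            if qid == pid or not quasi_dominates(q, p):
                continue
            r = frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})
            if r not in conditions:
                conditions.append(r)
        table[pid] = conditions
    return table

MDC_TABLE = materialize(PACKAGES)

def is_skyline(pid, prefs):
    """A point is a skyline point unless prefs contains one of its stored conditions."""
    return not any(condition <= prefs for condition in MDC_TABLE[pid])

alice = {("T", "M")}                                # Alice: T < M
print(sorted(p for p in PACKAGES if is_skyline(p, alice)))  # prints ['a', 'c']
```

After the one-time pass, answering a query for any customer reduces to subset tests against the stored conditions, which is the trade of storage for search time that the materialization method makes.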

17 4. Empirical Study
Datasets:
    Synthetic dataset
    Real datasets (from UCI): Nursery dataset, Automobile dataset
Default values (synthetic):
    No. of tuples = 500K
    No. of numeric dimensions = 3
    No. of categorical dimensions = 1
    No. of values in a nominal dimension = 20

18 4. Empirical Study
Without indexing:
    MDC-O: slowest search time
    MDC-M: faster search time; storage of MDC: 8MB
With indexing:
    MDC-O and MDC-M: fast search time

19 4. Empirical Study
Automobile dataset: three car models

Car         MDC
Honda       “Toyota < Honda”
Mitsubishi  “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota      -

Honda: a salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
Toyota: a salesperson should promote this car to ANY customer.

20 5. Conclusion
Skyline
    Favorable facets
    Minimal disqualifying condition
Algorithm
    On-the-fly
    Materialization
Empirical study

