1 Mining Favorable Facets Raymond Chi-Wing Wong (the Chinese University of Hong Kong) Jian Pei (Simon Fraser University) Ada Wai-Chee Fu (the Chinese University.

1 Mining Favorable Facets
Raymond Chi-Wing Wong (The Chinese University of Hong Kong)
Jian Pei (Simon Fraser University)
Ada Wai-Chee Fu (The Chinese University of Hong Kong)
Ke Wang (Simon Fraser University)
Prepared and presented by Raymond Chi-Wing Wong

2 Outline
1. Introduction
2. Skyline
3. Algorithm
4. Empirical Study
5. Conclusion

3 1. Introduction
Suppose we want to look for a vacation package. There are 3 packages:

Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5

We want a cheaper package and a higher hotel-class.
Suppose we compare package a and package b. We know that package a is “better” than package b because
1. the price of package a is lower, and
2. the hotel-class of package a is higher.
Thus, we do not need to consider package b. We say that package a “dominates” package b.

5 1. Introduction
Package ID  Price  Hotel-class
a           1000   4
b           2400   1
c           3000   5

Recall that package a “dominates” package b. Now suppose we compare package a and package c. We want a cheaper package and a higher hotel-class.
We know that
1. package a has a cheaper price, and
2. package c has a higher hotel-class.
So we cannot determine
1. whether package a is better than package c (i.e., package a dominates package c), or
2. whether package c is better than package a (i.e., package c dominates package a).
Package a is NOT dominated by any other package. Package c is NOT dominated by any other package. Thus, packages a and c are all of the “best” possible choices.
We call packages a and c skyline points: points that are not dominated by any other point.
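The dominance test and the skyline of the three packages can be sketched in a few lines of Python. This is only an illustration of the definitions on the slide; the names `dominates` and `skyline` are chosen here and do not come from the presentation.

```python
# Packages from the slide: name -> (price, hotel_class).
# Lower price is better, higher hotel-class is better.
packages = {"a": (1000, 4), "b": (2400, 1), "c": (3000, 5)}

def dominates(q, p):
    """q dominates p: q is no worse on both attributes and strictly better on one."""
    no_worse = q[0] <= p[0] and q[1] >= p[1]
    strictly_better = q[0] < p[0] or q[1] > p[1]
    return no_worse and strictly_better

def skyline(points):
    """Names of all points that are not dominated by any other point."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )

print(skyline(packages))  # ['a', 'c']
```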

6 1. Introduction
Suppose we want to look for a vacation package. There are now 6 packages, each with a Hotel-group:

Package ID  Price  Hotel-class  Hotel-group
a           1000   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)

We want a cheaper package and a higher hotel-class. How about Hotel-group? Different customers may have different preferences on Hotel-group. Here “X < Y” means that the customer prefers X to Y.
Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.

7 1. Introduction
Different preferences give different skyline points: H < T < M gives {a, c}, while H < M < T gives {a, c, e}.
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.

8 1. Introduction
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package f) to potential customers.

Customer  Preference on Hotel-group  Skyline
Alice     T < M                      {a, c}
Bob       No special preference      {a, c, e, f}
Chris     H < M                      {a, c, e}
David     H < M < T                  {a, c, e}
Emily     H < T < M                  {a, c}
Fred      M < T                      {a, c, e, f}

Bob and Fred are the potential customers.
What preferences make package f a skyline point?
Preferences: No special preference, M < T, …
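The customer table can be reproduced with a small sketch that extends dominance to the nominal Hotel-group attribute. It assumes that a preference is a set of (preferred, other) pairs closed under transitivity, and that a package is no worse on Hotel-group only if the groups are equal or its group is preferred; the function names are illustrative, not taken from the presentation.

```python
# name -> (price, hotel_class, hotel_group); the six packages from the slides.
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def closure(pairs):
    """Transitive closure of a set of (preferred, other) pairs."""
    closed = set(pairs)
    while True:
        extra = {(x, v) for (x, y) in closed for (u, v) in closed if y == u} - closed
        if not extra:
            return closed
        closed |= extra

def dominates(q, p, pref):
    """q dominates p under the (closed) Hotel-group preference pref."""
    group_better = (q[2], p[2]) in pref
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)

def skyline(pref_pairs):
    pref = closure(pref_pairs)
    return sorted(
        n for n, p in PACKAGES.items()
        if not any(dominates(q, p, pref) for m, q in PACKAGES.items() if m != n)
    )

# ("T", "M") encodes "T < M", i.e. Tulips is preferred to Mozilla.
CUSTOMERS = {
    "Alice": {("T", "M")},
    "Bob": set(),
    "Chris": {("H", "M")},
    "David": {("H", "M"), ("M", "T")},
    "Emily": {("H", "T"), ("T", "M")},
    "Fred": {("M", "T")},
}

for customer, pref in CUSTOMERS.items():
    print(customer, skyline(pref))

# Potential customers for package f are those whose skyline contains f.
print([c for c, pref in CUSTOMERS.items() if "f" in skyline(pref)])  # ['Bob', 'Fred']
```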

9 1. Introduction
Suppose hotel-group Mozilla wants to promote its own packages (e.g., package e) to potential customers.

Customer  Preference on Hotel-group  Skyline
Alice     T < M                      {a, c}
Bob       No special preference      {a, c, e, f}
Chris     H < M                      {a, c, e}
David     H < M < T                  {a, c, e}
Emily     H < T < M                  {a, c}
Fred      M < T                      {a, c, e, f}

Bob, Chris, David and Fred are the potential customers.
What preferences make package e a skyline point?
Preferences: No special preference, H < M, H < M < T, M < T, …
Problem: Given a package, under what preferences or conditions is this package a skyline point? These are its favorable facets.

12 1. Introduction
Problem: Given a package, under what preferences (favorable facets) is this package a skyline point?
We can solve the problem by a naive method: Lattice Search. Each node of the lattice is a set of preferences on Hotel-group, labeled with the skyline of the six packages under that set:

{}: SKY = {a, c, e, f}
{T < M}: SKY = {a, c}
{H < M}: SKY = {a, c, e}
{T < H}: SKY = {a, c, e, f}
{H < T}: SKY = {a, c, e, f}
{M < T}: SKY = {a, c, e, f}
{M < H}: SKY = {a, c, e, f}
{T < M, H < M}: SKY = {a, c}
{H < T, H < M}: SKY = {a, c, e}
{T < H, M < H}: SKY = {a, c, e, f}
…
Final node of the lattice: SKY = {}

13 1. Introduction
Lattice Search: consider package f.
Preferences in the lattice that qualify f as a skyline point: {}, {T < H}, {H < T}, {M < T}, {M < H}, …, {T < H, M < H}
This approach has two disadvantages.
1. Computation is costly: we need to compute all skyline points for each possible preference.
2. It is difficult to interpret the results: there are many preferences which qualify package f as a skyline point.
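A brute-force version of the lattice search might look like the sketch below. It assumes that the lattice nodes are all sets of ordered Hotel-group pairs whose transitive closure contains no contradiction (such as both T < M and M < T), which is one reading of the slide rather than the authors' exact construction. It recomputes the skyline at every node, which is exactly the cost the slide warns about.

```python
from itertools import combinations

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
GROUPS = "THM"
# (x, y) encodes "x < y": x is preferred to y.
PAIRS = [(x, y) for x in GROUPS for y in GROUPS if x != y]

def closure(pairs):
    closed = set(pairs)
    while True:
        extra = {(x, v) for (x, y) in closed for (u, v) in closed if y == u} - closed
        if not extra:
            return closed
        closed |= extra

def is_consistent(pref):
    """A closed preference is consistent if it never holds both x < y and y < x."""
    return not any((y, x) in pref for (x, y) in pref)

def dominates(q, p, pref):
    group_better = (q[2], p[2]) in pref
    no_worse = q[0] <= p[0] and q[1] >= p[1] and (q[2] == p[2] or group_better)
    return no_worse and (q[0] < p[0] or q[1] > p[1] or group_better)

def skyline(pref):
    return {n for n, p in PACKAGES.items()
            if not any(dominates(q, p, pref) for m, q in PACKAGES.items() if m != n)}

def favorable_preferences(name):
    """Visit every consistent node and keep those where `name` is a skyline point."""
    found, examined = [], 0
    for k in range(len(PAIRS) + 1):
        for subset in combinations(PAIRS, k):
            pref = closure(subset)
            if not is_consistent(pref):
                continue
            examined += 1              # one full skyline computation per node
            if name in skyline(pref):
                found.append(set(subset))
    return found, examined

found, examined = favorable_preferences("f")
print(len(found), "of", examined, "nodes keep f in the skyline")
```

Even with three hotel groups the answer is a long list of qualifying preference sets, which illustrates why the result is hard to interpret.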

14 1. Introduction
Lattice Search: consider package f.
The border for f separates the lattice nodes where f is a skyline point from the nodes where f is not a skyline point.
We find that whenever the preference contains “T < M” or “H < M”, package f is not a skyline point.

15 1. Introduction
Lattice Search: consider package f and its border in the lattice.
Whenever the preference contains “T < M” or “H < M”, package f is not a skyline point. We say that “T < M” and “H < M” are each a minimal disqualifying condition (MDC) of f.
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?

16 3. Algorithm
How do we find the MDCs of a point?
Problem: Given a package, what are the minimal conditions under which this package is NOT a skyline point?

17 3. Algorithm
Point q is said to quasi-dominate point p if all totally-ordered attributes of point q (here, price and hotel-class) are NOT worse than those of point p.
E.g., with the six packages above, package a quasi-dominates package f because
1. package a has a lower (or better) price than package f, and
2. package a has a higher (or better) hotel-class than package f.
If package a quasi-dominates package f, we define R_{a→f} = {T < M}, that is, the Hotel-group preference under which a actually dominates f.
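As a rough sketch, quasi-dominance and the condition R_{q→p} could be coded as follows for the single nominal attribute used in the example. The tuple layout and the function names are assumptions made for illustration; with several nominal attributes the condition would hold one pair per attribute on which the two points differ.

```python
# name -> (price, hotel_class, hotel_group)
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    """q is not worse than p on every totally ordered attribute."""
    return q[0] <= p[0] and q[1] >= p[1]

def condition(q, p):
    """R_{q->p}: the Hotel-group preference needed for q to dominate p.

    ("T", "M") encodes "T < M". The condition is empty when both
    packages already belong to the same hotel group.
    """
    if q[2] == p[2]:
        return frozenset()
    return frozenset({(q[2], p[2])})

a, f = PACKAGES["a"], PACKAGES["f"]
print(quasi_dominates(a, f), set(condition(a, f)))  # True {('T', 'M')}
```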

18 3. Algorithm
Two algorithms:
1. MDC-O: Computing MDC On-the-fly
   - Does not store the MDCs of points
   - Computes the MDC of a given point on-the-fly
2. MDC-M: A Materialization Method
   - Stores the MDCs of all points
Indexing method for speed-up: R*-tree

19 3.1 MDC-O: Computing MDC On-the-fly
On-the-fly Algorithm
Given: a data point p
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
  MDC(p) ← ∅
  For each data point q which quasi-dominates p
    if MDC(p) does not contain R_{q→p}
      insert R_{q→p} into MDC(p)
  Return MDC(p)
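The on-the-fly loop can be sketched directly from the pseudocode. The version below is a simplified reading: it uses the six example packages, a single nominal attribute, and adds an explicit minimality filter (a condition is skipped when a subset of it is already stored), which the slide only implies through the word "minimal". All names are illustrative.

```python
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    return q[0] <= p[0] and q[1] >= p[1]

def condition(q, p):
    """R_{q->p} for one nominal attribute; ("T", "M") encodes "T < M"."""
    return frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])})

def mdc_on_the_fly(name, points=PACKAGES):
    """Scan all points once and collect the minimal disqualifying conditions."""
    p = points[name]
    mdc = set()
    for other, q in points.items():
        if other == name or not quasi_dominates(q, p):
            continue
        r = condition(q, p)
        if any(existing <= r for existing in mdc):
            continue                                  # r is not minimal (or already stored)
        mdc = {existing for existing in mdc if not r < existing}
        mdc.add(r)
    return mdc

print(mdc_on_the_fly("f"))  # the two conditions {T < M} and {H < M}
print(mdc_on_the_fly("a"))  # set(): no preference disqualifies package a
```

An empty condition, as computed for package b, means the package is dominated under every preference.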

20 3.2 MDC-M: A Materialization Method
Materialization Algorithm
Variable: MDC(p), the minimal disqualifying conditions of p
Algorithm:
  For each data point p
    MDC(p) ← ∅
    For each data point q which quasi-dominates p
      if MDC(p) does not contain R_{q→p}
        then insert R_{q→p} into MDC(p)
    Store MDC(p)
Query Algorithm
Given: a data point p
Algorithm:
  Return MDC(p)
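A minimal sketch of the materialization idea: compute the conditions of every point once, store them in a table, and answer queries by lookup. The helper `is_skyline` is an addition for illustration; it assumes the customer's preference is already transitively closed and that no two packages are identical on all attributes.

```python
PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

def quasi_dominates(q, p):
    return q[0] <= p[0] and q[1] >= p[1]

def materialize(points):
    """Precompute and store the disqualifying conditions of every point."""
    table = {}
    for name, p in points.items():
        conds = set()                       # a set already ignores repeated conditions
        for other, q in points.items():
            if other != name and quasi_dominates(q, p):
                conds.add(frozenset() if q[2] == p[2] else frozenset({(q[2], p[2])}))
        table[name] = conds
    return table

MDC_TABLE = materialize(PACKAGES)

def query(name):
    """Query step: a plain lookup, no scan over the data."""
    return MDC_TABLE[name]

def is_skyline(name, pref):
    """name is disqualified as soon as one stored condition is contained in pref."""
    return not any(cond <= pref for cond in MDC_TABLE[name])

print(query("f"))
print(is_skyline("f", {("M", "T")}))  # True: Fred is a potential customer for f
print(is_skyline("f", {("T", "M")}))  # False: Alice is not
```

The trade-off is the one reported in the empirical study: faster queries in exchange for storing the table.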

21 4. Empirical Study
Datasets
- Synthetic dataset
- Real datasets (from UCI): Nursery dataset, Automobile dataset
Default values (synthetic)
- No. of tuples = 500K
- No. of numeric dimensions = 3
- No. of categorical dimensions = 1
- No. of values in a nominal dimension = 20

22 4. Empirical Study
Without indexing:
- MDC-O: slowest search time
- MDC-M: faster search time; storage of MDC: 8MB
With indexing:
- MDC-O and MDC-M: fast search time

23 4. Empirical Study
Automobile dataset: three car models

Car         MDC
Honda       “Toyota < Honda”
Mitsubishi  “Honda < Mitsubishi” or “Toyota < Mitsubishi”
Toyota      -

Honda: a salesperson should NOT promote this car to a customer who prefers Toyota to Honda.
Mitsubishi: a salesperson should promote this car to a customer who prefers Mitsubishi to the others.
Toyota: a salesperson should promote this car to ANY customer.

24 5. Conclusion
- Skyline
- Favorable Facets
- Minimal Disqualifying Condition
- Algorithm: On-the-fly, Materialization
- Empirical Study

25 Q&A
Poster Board
Title: Mining Favorable Facets
Date: Monday, 13th August
Place: Poster board carrying number 31

26 3.3 Speedup
Build an R*-tree based on the totally-ordered attributes.
For each point p:
  MDC(p) ← ∅
  Perform a range search from 0 to the value of dimension D of p, for each dimension D
  For each point q found in the range search
    insert R_{q→p} into MDC(p)
[Figure: points p and q in a plane with origin 0, where a smaller coordinate is a better value. All points (e.g., point q) in the region between 0 and p quasi-dominate point p.]
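The range search can be illustrated without an R*-tree. The sketch below swaps in a much simpler stand-in, a list sorted by price searched with `bisect`, purely to show that only points inside the range [0, p] on every dimension need to be examined. It is not the index used in the presentation; hotel-class is negated so that a smaller value is better on both dimensions, matching the figure.

```python
from bisect import bisect_right

PACKAGES = {
    "a": (1000, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}

# Rows sorted by price; hotel-class is stored negated so smaller is better.
ROWS = sorted((price, -cls, group, name) for name, (price, cls, group) in PACKAGES.items())
PRICES = [row[0] for row in ROWS]

def quasi_dominators(name):
    """Points inside the range [0, p] on both dimensions, excluding p itself."""
    price, cls, _ = PACKAGES[name]
    hi = bisect_right(PRICES, price)          # ROWS[:hi] all have price <= p's price
    return sorted(n for (_, neg_cls, _, n) in ROWS[:hi]
                  if n != name and neg_cls <= -cls)

def mdc(name):
    """Insert R_{q->p} for each q found by the range search."""
    group = PACKAGES[name][2]
    conds = set()
    for q in quasi_dominators(name):
        q_group = PACKAGES[q][2]
        conds.add(frozenset() if q_group == group else frozenset({(q_group, group)}))
    return conds

print(quasi_dominators("f"))  # ['a', 'c']
print(mdc("f"))
```

A real spatial index would prune on all dimensions at once, whereas this stand-in prunes on price only and filters the rest linearly.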

