Creating Competitive Products
Qian Wan [1], Raymond Chi-Wing Wong [1], Ihab F. Ilyas [2], M. Tamer Ozsu [2], Yu Peng [1]
[1] Hong Kong University of Science and Technology
[2] University of Waterloo
Presented by Qian Wan
Prepared by Qian Wan

Outline
Background
– Skyline, Related Work
Motivation
– Examples, Problem Definition
Algorithm
– Framework, Grouping, Pruning
Experiments
– Synthetic, Real data
– 6 factors
Conclusions

Skyline
Definition
– The skyline contains the points that are not dominated by any other point
Hotel searching problem
– Distance to beach vs. Price
– Dominance
– Skyline
[Figure: two plots with axes Dist and Price; the first shows hotels H1 to H9, the second shows H1 and H2]

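The dominance test and the skyline defined on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the presenters' implementation: the hotel coordinates are made-up values (the plotted values are not available here), and both attributes are assumed to be "smaller is better".

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse on every attribute, strictly better on at least one.

    Assumption: smaller values are better on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (distance to beach, price); not the values from the slide.
hotels = [(1.0, 90.0), (2.0, 60.0), (3.0, 80.0), (4.0, 40.0), (5.0, 70.0)]
print(skyline(hotels))
```

The nested scan is the simplest correct formulation; the algorithms listed under Related Work exist precisely because it does not scale to large tables.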
Related Work
Skyline Queries in DBMS [S. Borzsonyi, 2001]
Single-Table Skyline Queries
– Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
Multi-Table Skyline Queries
– Natural Join [W. Jin, 2007] [D. Sun, 2008]
– Our Work: join different source tables via a "Cartesian product"-like procedure

A Travel Agency's Database
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; four rows]
Source Tables
[Table: Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; three rows, the first being H1 (100, 3, …)]
[Table: Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100)]
[Table: Newly Created Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q24(F4, H6)]
1. Direct attributes
2. Indirect attributes
3. One indirect attribute characteristic, e.g. Travel Agency (Price), PC Manufacture (Price) and Logistic Transportation Service (Price)
Skyline tuples

Finding Competitive Products
Given
– a set of source tables
– market packages
– new packages
Then a tuple q in T_Q is said to be a competitive product if q is in the skyline with respect to the market packages and the new packages taken together

Naïve Solution
Source Tables
[Table: Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; six rows, including H1 (100, 3, …) and H4 (150, 2, …)]
[Table: Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100), F3 (2, 80), F4 (2, 90)]
[Table: Newly Created Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1), …, Q24(F4, H6)]
[Table: Existing Vacation Packages, same columns; four rows]
1. Intra-dominance checking
2. Inter-dominance checking
[Table: Competitive Products, same columns; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1)]

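The naïve solution on this slide can be sketched as follows: build every flight and hotel combination, then apply both dominance checks to each one. This is a hedged illustration, not the presenters' code. It assumes that Price is the sum of the flight cost and the hotel cost, that hotel class is stored negated so that smaller is better on every attribute, and that all rows are illustrative values; the function name `naive_competitive` is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# (no_of_stops, distance_to_beach, -hotel_class, price); smaller is better everywhere.
Package = Tuple[float, float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_competitive(flights, hotels, existing) -> List[Package]:
    # Materialise the full Cartesian product of the source tables.
    # Assumption: price = flight cost + hotel cost.
    candidates = [(f[0], h[0], h[1], f[1] + h[2]) for f, h in product(flights, hotels)]
    competitive = []
    for q in candidates:
        # Inter-dominance check: is q beaten by an existing market package?
        if any(dominates(p, q) for p in existing):
            continue
        # Intra-dominance check: is q beaten by another new package?
        if any(dominates(o, q) for o in candidates):
            continue
        competitive.append(q)
    return competitive


# Illustrative rows: flights are (stops, cost), hotels are (distance, -class, cost).
flights = [(0, 120), (1, 100)]
hotels = [(100, -3, 200), (50, -4, 300)]
existing = [(0, 100, -3, 300)]
print(naive_competitive(flights, hotels, existing))
```

The cost is quadratic in the number of combinations, which is what the later pruning steps are meant to avoid.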
Algorithm Overview
Intra-dominance checking (Framework)
– To find the skyline in the source tables
Inter-dominance checking
– Skyline in existing market packages
– R*-tree index on existing market packages
– Full pruning
– Partial pruning
Post-processing

Intra-dominance Checking
Source Tables
[Table: Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; six rows, including H1 (100, 3, …) and H4 (150, 2, …)]
[Table: Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100), F3 (2, 80), F4 (2, 90)]
Skyline Tuples of Source Tables
[Table: Hotel, same columns; five rows remain]
[Table: Flight, same columns; rows F1 (0, 120), F2 (1, 100), F3 (2, 80)]
[Table: Newly Created Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1), …, Q15(F3, H5)]
1. No intra-dominance checking is needed (one indirect attribute)
2. No competitive products are missing
[Table: Competitive Products, same columns; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1)]

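The two claims on this slide can be checked on a small example: combining only the skyline tuples of each source table yields exactly the skyline of the full Cartesian product, so no intra-dominance checking is needed afterwards. The sketch below is an illustration under stated assumptions, not the presenters' implementation: Price is taken to be the single indirect attribute, computed as flight cost plus hotel cost; hotel class is stored negated; the rows are illustrative; `combine` is a hypothetical helper.

```python
from itertools import product
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def combine(flights, hotels) -> List[Tuple[float, ...]]:
    """Cartesian product; assumed price = flight cost + hotel cost."""
    return [(f[0], h[0], h[1], f[1] + h[2]) for f, h in product(flights, hotels)]


# Illustrative rows: flights are (stops, cost), hotels are (distance, -class, cost).
flights = [(0, 120), (1, 100), (2, 80), (2, 90)]
hotels = [(100, -3, 200), (150, -2, 150), (150, -2, 180), (50, -4, 300)]

reduced = combine(skyline(flights), skyline(hotels))  # skyline tuples only
full = skyline(combine(flights, hotels))              # naive reference result

print(len(combine(flights, hotels)), len(reduced), set(reduced) == set(full))
```

The reasoning behind the equality: if one combination of skyline tuples dominated another, its flight and hotel would each have to be at least as good on the direct attributes, so each would have to cost strictly more (otherwise it would dominate a skyline tuple), making the summed price worse.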
Inter-dominance Checking
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; four rows]
[Table: Skyline in Existing Vacation Packages, same columns; three rows]
– No missing competitive products
– The R*-tree speeds up the inter-dominance checking
[Figure: R*-tree with nodes R0 to R5 over the existing vacation packages; inter-dominance checking is performed as a range query]

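A small sketch of why checking against the skyline of the existing packages loses nothing: any candidate dominated by some existing package is also dominated by a skyline package. The code is an illustration only. A linear scan stands in for the R*-tree range query (the packages that dominate q are those lying in the box between the origin and q), all rows are made-up values, hotel class is stored negated so that smaller is better, and `dominated_by_market` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominated_by_market(q, packages) -> bool:
    # Linear-scan stand-in for a range query on an R*-tree:
    # a package dominates q exactly when it lies in the box [origin, q] and differs from q.
    return any(dominates(p, q) for p in packages)


# Illustrative rows: (stops, distance, -class, price).
existing = [(0, 100, -3, 300), (1, 100, -3, 320), (1, 50, -4, 400), (2, 150, -2, 500)]
market_skyline = skyline(existing)

candidates = [(0, 50, -4, 420), (1, 50, -4, 410), (0, 100, -3, 320)]
for q in candidates:
    print(q, dominated_by_market(q, market_skyline), dominated_by_market(q, existing))
```

Both columns of the output agree for every candidate, while the skyline scan touches fewer packages.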
Grouping
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; three rows]
Skyline Tuples of Source Tables
[Table: Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; five rows, including H1 (100, 3, …) and H4 (150, 2, …)]
[Table: Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100), F3 (2, 80)]
[Table: Newly Created Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1), …, Q15(F3, H5)]
[Table: Competitive Products, same columns; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1)]
Groups: A1, A2 and B1, B2 in the source tables; C1 = {A1, B1}, C4 = {A2, B2}
Full Pruning

Best Representative
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; three rows]
Best representatives B1, B2, …, Bi, …, Bj, …, Bk
Groups C1, C2, …, Ci, …, Cj, …, Ck
[Table: packages in one group, same columns; rows Q(F2, H4) and Q'(F2, H5)]
[Table: best representative of the group, same columns; one row labeled Min]
Quality of the best representative: tightness of each group (clustering, e.g. KMeans)

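One way to read the "Min" row is that the best representative of a group takes the best (here, smallest) value of every attribute over the group's members. Under that reading, full pruning can be sketched as below: if an existing package dominates the combination of two best representatives, it dominates every real combination drawn from the two groups. This is an assumption-laden illustration, not the presenters' code; the groups are given by hand instead of being produced by clustering, Price is assumed to be flight cost plus hotel cost, hotel class is stored negated, and the helper names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def best_representative(group: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Attribute-wise minimum over the group; it need not be an actual tuple."""
    return tuple(min(column) for column in zip(*group))


def combine(f, h) -> Tuple[float, ...]:
    """Assumed package schema: (stops, distance, -class, flight cost + hotel cost)."""
    return (f[0], h[0], h[1], f[1] + h[2])


def fully_pruned(flight_group, hotel_group, market) -> bool:
    rep = combine(best_representative(flight_group), best_representative(hotel_group))
    return any(dominates(p, rep) for p in market)


# Illustrative groups; in the talk they would come from clustering each source table.
a1, a2 = [(0, 120), (1, 100)], [(2, 80), (2, 90)]
b1, b2 = [(100, -3, 200), (50, -4, 300)], [(150, -2, 150), (150, -2, 180)]
market = [(1, 100, -3, 200)]

print(fully_pruned(a1, b1, market), fully_pruned(a2, b2, market))
```

Tighter groups give representatives closer to the real members, which is presumably why the slide ties the quality of the representative to the tightness of each group.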
Partial Pruning
– Full pruning prunes all members in the group
– Partial pruning prunes some members in the group
– Partial pruning is used when full pruning cannot be applied
Idea
– Direct attributes do not change
– Estimate the best possible value for the indirect attributes
– Eliminate a combination if
  – it is dominated on all direct attributes
  – it is dominated on all indirect attributes according to their best estimation

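The idea above can be sketched for one group pair that survives full pruning. The reading used here is an assumption: each flight in a group is paired with the best representative of the hotel group, which keeps the flight's own attributes fixed and gives the best price that flight could reach within the group; if an existing package dominates even that optimistic combination, the flight is dropped from the pair. Rows are illustrative, hotel class is stored negated, Price is assumed to be the sum of the two costs, and `partial_prune` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def best_representative(group: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Attribute-wise minimum over the group."""
    return tuple(min(column) for column in zip(*group))


def partial_prune(flight_group, hotel_group, market) -> List[Tuple[float, ...]]:
    """Return the flights that may still form a competitive package with this hotel group."""
    rep_hotel = best_representative(hotel_group)
    kept = []
    for f in flight_group:
        # Optimistic package: the flight's own attributes plus the best
        # values any hotel in the group could contribute.
        bound = (f[0], rep_hotel[0], rep_hotel[1], f[1] + rep_hotel[2])
        if not any(dominates(p, bound) for p in market):
            kept.append(f)
    return kept


# Illustrative rows: flights are (stops, cost), hotels are (distance, -class, cost).
flight_group = [(0, 120), (1, 100)]
hotel_group = [(100, -3, 200), (150, -2, 150)]
market = [(1, 100, -3, 240)]

print(partial_prune(flight_group, hotel_group, market))
```

In this example the group pair as a whole cannot be fully pruned, because the flight with no stops is not beaten on stops, yet the one-stop flight is eliminated for every hotel in the group.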
Post-processing
More than one indirect attribute
– Calculation
Previous algorithm
Intra-dominance checking
– Any existing skyline algorithm
– Post-processing cost depends on the number of competitive products

Experiments
Pentium IV 2.4GHz PC with 4GB memory, Linux platform, C++
Synthetic anti-correlated datasets
Real datasets, Travel Agency A and Travel Agency B
– A: 296 packages, 1014 hotels and 4394 flights
– B: 149 packages, 995 hotels and 866 flights
Implementation
– Algorithm for Creating Competitive Products (ACCP)
– Baseline algorithm
– Naïve algorithm
[Table: columns Preprocessing, R* Tree, Pruning; rows ACCP (Yes), Baseline (Yes, No), Naïve (No)]

Synthetic Datasets
Parameters and default values
– No. of attributes in each source table: 4
– No. of indirect attributes in a product table: 1
– No. of source tables: 2
– No. of clusters in each source table: 2
– Size of existing packages: 5M
– Size of each source table: 100k
Schema is the same as in the example
Anti-correlated
6 factors
Measurement
– Execution time
– Pruning power
– Ratio of competitive products out of all combinations
– Memory usage

Experiments
[Table: one row per parameter (No. of attributes in each source table, No. of indirect attributes in a product table, No. of source tables, No. of clusters in each source table, Size of existing packages, Size of each source table) and one column per measurement (Execution time, Pruning Power, Ratio of Competitive Products, Memory Usage); the first row holds 1 to 4 and the second row 5 to 8]

Experiments
From 100k to 500k
– Full pruning & partial pruning
– T_Q, T_Q', and T_R
– Pruning power slightly increases
Parameters and default values
– No. of attributes in each source table: 4
– No. of indirect attributes in a product table: 1
– No. of source tables: 2
– No. of clusters in each source table: 6
– Size of existing packages: 5M
– Size of each source table: 100k

Conclusions
Creating Competitive Products
– Example
– Problem Definition
Algorithms
– Framework
– Intra-dominance checking
– Inter-dominance checking
– Post-processing
Experiments
– Synthetic anti-correlated datasets
– Real datasets

Thank you! Q&A

Appendix

Partial Pruning
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; three rows]
Skyline Tuples of Source Tables
[Table: Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; five rows, including H1 (100, 3, …) and H4 (150, 2, …)]
[Table: Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100), F3 (2, 80)]
[Table: Newly Created Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1), …, Q15(F3, H5)]
[Table: Competitive Products, same columns; rows Q1(F1, H1), Q2(F1, H2), Q3(F1, H3), …, Q7(F2, H1), …, Q13(F3, H1)]
Groups: A1 and B1 in the source tables; C1 = {A1, B1}
Full Pruning

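Meta Transformation
[Table: Existing Vacation Packages, with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; three rows, then a single package]
[Table: columns Package, No-of-stops, Price; row P2 (1, 170)]
[Table: columns Package, Distance-to-beach, Hotel-class, Price; one row]
[Table: Meta-Hotel, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; three rows]
[Table: Meta-Flight, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 200), F2 (1, 180)]
– No inter-dominance checking for {F2} × {H2}
[Table: Hotel group B1, with columns Hotel, Distance-to-beach, Hotel-class, Hotel-cost; three rows, the first being H1 (100, 3, …), with a Min row]
[Table: Flight group A1, with columns Flight, No-of-stops, Flight-cost; rows F1 (0, 120), F2 (1, 100), with a Min row]
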
Experiments
From 2.5M to 10M
– More competitive
– Slightly decreases
Parameters and default values
– No. of attributes in each source table: 4
– No. of indirect attributes in a product table: 1
– No. of source tables: 2
– No. of clusters in each source table: 6
– Size of existing packages: 5M
– Size of each source table: 100k

Experiments
Travel Agency A, Package Generation Set
1. A: 296 packages, 1014 hotels and 4394 flights. B: 149 packages, 995 hotels and 866 flights
2. Source tables from B, and packages from A
3. Vary discount from 0 to
4. Efficiency: ACCP (44.74s) and Baseline (84.47s)
5. |SKY|/|T_Q|
6. |DOM|/|T_E|
[Figure: plot with series DOM and SKY]