Published by Myles Jones, modified over 9 years ago

Creating Competitive Products
Qian Wan [1], Raymond Chi-Wing Wong [1], Ihab F. Ilyas [2], M. Tamer Ozsu [2], Yu Peng [1]
[1] Hong Kong University of Science and Technology
[2] University of Waterloo
Presented by Qian Wan; prepared by Qian Wan

Outline
Background – Skyline, Related Work
Motivation – Examples, Problem Definition
Algorithm – Framework, Grouping, Pruning
Experiments – Synthetic and real data, 6 factors
Conclusions
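
Skyline
Definition – The skyline contains the points that are not dominated by any other point. A point p dominates a point q if p is no worse than q on every attribute and strictly better on at least one.
Hotel search problem – Distance to beach vs. Price (smaller is better on both)
– Dominance
– Skyline
[Figure: hotels H1 to H9 plotted by Dist (x-axis) and Price (y-axis); a second panel with the same axes marks H1 and H2.]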
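The dominance test and a naive skyline fit in a few lines. This is a minimal sketch, not the authors' implementation: it assumes smaller values are preferred on every attribute, and `dominates` and `skyline` are hypothetical helper names. The sample points are the (No-of-stops, Flight-cost) pairs of the Flight source table used later in the talk.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every attribute, strictly better on at least one.
    Assumes smaller values are preferred on every attribute."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Naive O(n^2) skyline: keep each point that no other point dominates.
    A point never dominates itself, so comparing it with itself is harmless."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    ]


# (No-of-stops, Flight-cost) from the Flight source table.
flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
print(skyline(flights))  # ['F1', 'F2', 'F3']; F4 is dominated by F3
```

The quadratic scan is only for illustration; the related work listed next gives index-based skyline algorithms.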

Related Work
Skyline queries in DBMS [S. Borzsonyi, 2001]
Single-table skyline queries – Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
Multi-table skyline queries – Natural Join [W. Jin, 2007], [D. Sun, 2008]
– Our work: joins different source tables via a Cartesian-product-like procedure
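
A Travel Agency's Database
Existing Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150
P4 | 1 | 150 | 4 | 300

Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100

Newly Created Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q24 (F4, H6) | 2 | 200 | 3 | 210

1. Direct attributes: No-of-stops, Distance-to-beach, and Hotel-class are copied from the source tuples.
2. Indirect attributes: Price is computed from the source tuples; here it is Flight-cost + Hotel-cost (e.g., Q2: 120 + 90 = 210).
3. Characteristic: one indirect attribute, e.g., travel agency (Price), PC manufacturing (Price), and logistics transportation service (Price).
[Annotation on the slide: skyline tuples]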
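A small sketch of how one new package is assembled from one tuple per source table. It assumes, consistent with the package prices on the slide, that the single indirect attribute is Price = Flight-cost + Hotel-cost; `make_package` and the type aliases are hypothetical names.

```python
from typing import Dict, Tuple

Flight = Tuple[int, int]  # (No-of-stops, Flight-cost)
Hotel = Tuple[int, int, int]  # (Distance-to-beach, Hotel-class, Hotel-cost)
Package = Tuple[int, int, int, int]  # (No-of-stops, Distance-to-beach, Hotel-class, Price)


def make_package(flight: Flight, hotel: Hotel) -> Package:
    """Copy the direct attributes and derive the indirect attribute, Price.
    Assumption: Price = Flight-cost + Hotel-cost."""
    stops, flight_cost = flight
    distance, hotel_class, hotel_cost = hotel
    return (stops, distance, hotel_class, flight_cost + hotel_cost)


flights: Dict[str, Flight] = {"F1": (0, 120), "F2": (1, 100)}
hotels: Dict[str, Hotel] = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}

# One new package per (flight, hotel) pair, i.e. a Cartesian product of the source tables.
packages = {(f, h): make_package(flights[f], hotels[h]) for f in flights for h in hotels}
print(packages[("F1", "H2")])  # (0, 200, 2, 210), which is Q2
```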
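
Finding Competitive Products
Given a set of source tables, the existing market packages $T_E$, and the new packages $T_Q$ formed by combining one tuple from each source table, a tuple $q \in T_Q$ is said to be a competitive product if $q$ is in the skyline with respect to $T_E \cup T_Q$.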

Naïve Solution
Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80
H4 | 150 | 2 | (not legible)
H5 | 170 | 2 | 140
H6 | 200 | 3 | 120

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100
F3 | 2 | 80
F4 | 2 | 90

Newly Created Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180
… | … | … | … | …
Q24 (F4, H6) | 2 | 200 | 3 | 210

Existing Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150
P4 | 1 | 150 | 4 | 300

1. Intra-dominance checking
2. Inter-dominance checking

Competitive Products
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180
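The two-step naïve solution can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: smaller is better on every attribute, Price = Flight-cost + Hotel-cost, and `naive_competitive` is a hypothetical name. To keep the check small it uses hotels H1, H2, H3 and H6 only (H4's cost is not legible).

```python
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption; it matches the example's skylines).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_competitive(
    flights: Dict[str, Tuple[int, int]],
    hotels: Dict[str, Tuple[int, int, int]],
    existing: Dict[str, Vec],
) -> List[str]:
    # Enumerate every combination of the source tables.
    new = {
        f"{fn}:{hn}": (stops, dist, hclass, fcost + hcost)
        for (fn, (stops, fcost)), (hn, (dist, hclass, hcost)) in product(
            flights.items(), hotels.items()
        )
    }
    # Step 1, intra-dominance checking: drop new packages dominated by another new package.
    intra = {n: q for n, q in new.items() if not any(dominates(o, q) for o in new.values())}
    # Step 2, inter-dominance checking: drop survivors dominated by an existing package.
    return [n for n, q in intra.items() if not any(dominates(e, q) for e in existing.values())]


flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80), "H6": (200, 3, 120)}
existing = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170), "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300)}
print(naive_competitive(flights, hotels, existing))
# ['F1:H1', 'F1:H2', 'F1:H3', 'F2:H1', 'F3:H1'], i.e. Q1, Q2, Q3, Q7 and Q13
```

The result matches the competitive products listed on the slide; the cost is one dominance test per pair of combinations, which is what the framework below avoids.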

Algorithm Overview
Intra-dominance checking (Framework)
– To find the skyline in source tables
Inter-dominance checking
– Skyline in existing market packages
– R*-tree index on existing market packages
– Full pruning
– Partial pruning
Post-processing
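
Intra-dominance Checking
Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80
H4 | 150 | 2 | (not legible)
H5 | 170 | 2 | 140
H6 | 200 | 3 | 120

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100
F3 | 2 | 80
F4 | 2 | 90

Skyline Tuples of Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80
H4 | 150 | 2 | (not legible)
H5 | 170 | 2 | 140

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100
F3 | 2 | 80

Newly Created Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180
… | … | … | … | …
Q15 (F3, H5) | 2 | 170 | 3 | 200

1. No intra-dominance checking is needed (one indirect attribute): when Price is the sum of the source costs, no combination of source skyline tuples dominates another.
2. No competitive products are missing: a combination that uses a dominated source tuple is dominated by the same combination with that tuple replaced by its dominator.

Competitive Products
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180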
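Both claims can be checked on a small instance: with one additive indirect attribute, the skyline of all combinations equals the plain combination of the source skylines. A sketch under the same assumptions as before (smaller is better, Price = Flight-cost + Hotel-cost); the helper names are hypothetical, and H4 and H5 are left out.

```python
from itertools import product
from typing import Dict, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Vec]) -> Dict[str, Vec]:
    return {n: p for n, p in points.items() if not any(dominates(q, p) for q in points.values())}


def combine(flights: Dict[str, Vec], hotels: Dict[str, Vec]) -> Dict[str, Vec]:
    # Price, the only indirect attribute, is assumed to be Flight-cost + Hotel-cost.
    return {
        f"{fn}:{hn}": (f[0], h[0], h[1], f[1] + h[2])
        for (fn, f), (hn, h) in product(flights.items(), hotels.items())
    }


flights = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}
hotels = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80), "H6": (200, 3, 120)}

# Naive route: build all 16 combinations, then run intra-dominance checking on them.
full = skyline(combine(flights, hotels))
# Framework route: combine only the source skylines (F4 and H6 drop out), no intra-dominance check.
shortcut = combine(skyline(flights), skyline(hotels))
print(sorted(full) == sorted(shortcut), len(shortcut))  # True 9
```

The shortcut relies on Price being monotone in each source cost; the Post-processing slide covers the case where it no longer suffices.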
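
Inter-dominance Checking
Existing Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150
P4 | 1 | 150 | 4 | 300

Skyline in Existing Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150

– No competitive products are missed by checking only this skyline: if any existing package dominates q, some skyline package also dominates q, because dominance is transitive.
– An R*-tree speeds up inter-dominance checking: the packages that can dominate q lie in a box bounded above by q, which is answered as a range query.
[Figure: R*-tree over the existing packages with nodes R0 to R5; inter-dominance checking shown as a range query.]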
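A sketch of inter-dominance checking as a range query. Everything that could dominate a candidate q lies in the box bounded above by q; a spatial index such as an R*-tree would return just the entries inside that box, while here a linear scan stands in for the index. The function names are hypothetical and smaller is assumed better on every attribute.

```python
from typing import Iterable, List, Tuple

Vec = Tuple[int, ...]


def in_range(e: Vec, q: Vec) -> bool:
    # Range query box: a package that can dominate q is <= q on every attribute.
    return all(a <= b for a, b in zip(e, q))


def dominated_by_market(q: Vec, market: Iterable[Vec]) -> bool:
    # A point inside the box that differs from q must dominate q.
    # A linear scan stands in here for the R*-tree range query.
    return any(in_range(e, q) and e != q for e in market)


existing = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150), (1, 150, 4, 300)]
# Skyline of the existing packages: P1, P2, P3 (P4 is dominated by P2).
market_sky: List[Vec] = [p for p in existing if not dominated_by_market(p, existing)]

candidates = {"F1:H1": (0, 100, 3, 220), "F2:H2": (1, 200, 2, 190), "F3:H1": (2, 100, 3, 180)}
for name, q in candidates.items():
    print(name, dominated_by_market(q, market_sky))
# F1:H1 False, F2:H2 True, F3:H1 False (one per line)
```

Checking against the three skyline packages gives the same answers as checking against all four, which is the "no missing competitive products" point above.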
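
Grouping
Existing Vacation Packages (skyline)
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150

Skyline Tuples of Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80
H4 | 150 | 2 | (not legible)
H5 | 170 | 2 | 140

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100
F3 | 2 | 80

Newly Created Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180
… | … | … | … | …
Q15 (F3, H5) | 2 | 170 | 3 | 200

Competitive Products
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180

Source-table groups: A1, A2 and B1, B2
Combination groups: C1 = {A1, B1}, …, C4 = {A2, B2}
Full pruning: discards all members of a combination group at once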
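A sketch of full pruning over combination groups. The slide does not give the group memberships, so the clusters below (A1 = {H1, H3}, A2 = {H2}, B1 = {F1}, B2 = {F2, F3}) are hypothetical, as are the helper names. The group bound takes the best value reachable on each attribute, with the cheapest flight plus the cheapest hotel as the Price bound (assuming Price = Flight-cost + Hotel-cost and smaller is better).

```python
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption consistent with the example).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def group_bound(flight_group: List[Vec], hotel_group: List[Vec]) -> Vec:
    # Best value any member of the combined group can reach on each attribute,
    # so a package that dominates this vector dominates every member.
    return (
        min(f[0] for f in flight_group),  # No-of-stops
        min(h[0] for h in hotel_group),  # Distance-to-beach
        min(h[1] for h in hotel_group),  # Hotel-class
        min(f[1] for f in flight_group) + min(h[2] for h in hotel_group),  # Price lower bound
    )


def full_prune(
    flight_groups: Dict[str, List[Vec]],
    hotel_groups: Dict[str, List[Vec]],
    market: List[Vec],
) -> Dict[Tuple[str, str], Vec]:
    survivors = {}
    for (bn, fg), (an, hg) in product(flight_groups.items(), hotel_groups.items()):
        bound = group_bound(fg, hg)
        if any(dominates(e, bound) for e in market):
            continue  # full pruning: the whole combination group is discarded at once
        survivors[(an, bn)] = bound  # still needs partial pruning / member-level checks
    return survivors


# Hypothetical clusters of the source skyline tuples.
hotel_groups = {"A1": [(100, 3, 100), (400, 1, 80)], "A2": [(200, 2, 90)]}  # {H1, H3}, {H2}
flight_groups = {"B1": [(0, 120)], "B2": [(1, 100), (2, 80)]}  # {F1}, {F2, F3}
market = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150)]  # skyline P1, P2, P3

survivors = full_prune(flight_groups, hotel_groups, market)
print(sorted(survivors))  # [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1')]
```

Here the group {A2, B2} has bound (1, 200, 2, 170), which P2 dominates, so both of its packages are discarded without being checked individually.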

Best Representative
Existing Vacation Packages (skyline)
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150

Group | Best representative
C1 | B1
C2 | B2
… | …
Ci | Bi
… | …
Cj | Bj
… | …
Ck | Bk

Example group
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q (F2, H4) | 1 | 150 | 4 | 250
Q' (F2, H5) | 1 | 170 | 4 | 240
Min (best representative) | 1 | 150 | 4 | 240

Quality of the best representative: the tightness of each group (clustering, e.g., k-means)
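The Min row can be reproduced as an attribute-wise minimum over the group's members. A minimal sketch (hypothetical helper names, smaller assumed better): if an existing package dominates the representative, it dominates every member, so the group is fully pruned.

```python
from typing import List, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def best_representative(members: List[Vec]) -> Vec:
    # Attribute-wise minimum over the group: the "Min" row on the slide.
    return tuple(min(column) for column in zip(*members))


group = [(1, 150, 4, 250), (1, 170, 4, 240)]  # Q (F2, H4) and Q' (F2, H5)
rep = best_representative(group)
print(rep)  # (1, 150, 4, 240)

p2 = (1, 140, 2, 170)
# P2 dominates the representative, hence every member: the group can be pruned.
print(dominates(p2, rep), all(dominates(p2, q) for q in group))  # True True
```

This also shows why tight groups matter: the more spread out the members, the lower the Min row, and the less likely any package dominates it.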

Partial Pruning
– Full pruning prunes all members of a group.
– Partial pruning prunes some members of a group.
– Partial pruning is used when full pruning cannot be applied.
Idea
– Direct attributes do not change.
– Estimate the best possible value for the indirect attributes.
– Eliminate a combination if
  – it is dominated on all direct attributes, and
  – it is dominated on all indirect attributes according to their best estimates.
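One possible reading of partial pruning, sketched per hotel: a hotel's own direct attributes are exact, the flight side uses the group's best values, and Price uses its best estimate (this hotel's cost plus the cheapest flight). If an existing package dominates that best case, every combination of the hotel with the flight group is eliminated. The groups and the name `partial_prune` are hypothetical; smaller is assumed better and Price = Flight-cost + Hotel-cost.

```python
from typing import Dict, List, Set, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def partial_prune(flight_group: Dict[str, Vec], hotel_group: Dict[str, Vec], market: List[Vec]) -> Set[str]:
    """Return the hotels whose every combination with the flight group is dominated."""
    min_stops = min(stops for stops, _ in flight_group.values())
    min_flight_cost = min(cost for _, cost in flight_group.values())
    pruned = set()
    for name, (distance, hotel_class, hotel_cost) in hotel_group.items():
        # Exact hotel attributes; best-case flight attribute and best-case Price estimate.
        best_case = (min_stops, distance, hotel_class, hotel_cost + min_flight_cost)
        if any(dominates(e, best_case) for e in market):
            pruned.add(name)
    return pruned


flight_group = {"F2": (1, 100), "F3": (2, 80)}
hotel_group = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}
market = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150)]  # skyline P1, P2, P3
print(sorted(partial_prune(flight_group, hotel_group, market)))  # ['H2', 'H3']
```

For this group the full-pruning bound (1, 100, 1, 160) is not dominated, so full pruning fails; partial pruning still removes the four combinations that use H2 or H3, leaving only those with H1.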

Algorithm Overview
Framework
Intra-dominance checking – To find the skyline in source tables
Inter-dominance checking
– Skyline in existing market packages
– R*-tree index on existing market packages
– Full pruning
– Partial pruning
Post-processing

Post-processing
More than one indirect attribute
– Candidate calculation: the previous algorithm
– Intra-dominance checking: any existing skyline algorithm, since combinations of source skyline tuples may then dominate one another
– Post-processing cost depends on the number of competitive products
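A tiny hypothetical instance shows why the extra intra-dominance pass is needed once there are two indirect attributes. The two source tables below, with two additive attributes such as (cost, time), are invented for illustration; the helper names are hypothetical and smaller is assumed better.

```python
from itertools import product
from typing import Dict, Tuple

Vec = Tuple[int, ...]


def dominates(p: Vec, q: Vec) -> bool:
    # Smaller is better on every attribute (assumption).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Vec]) -> Dict[str, Vec]:
    return {n: p for n, p in points.items() if not any(dominates(q, p) for q in points.values())}


# Hypothetical source tables with two additive indirect attributes, e.g. (cost, time).
table_a = {"a1": (1, 4), "a2": (3, 1)}
table_b = {"b1": (1, 4), "b2": (4, 1)}

# Combine the source skylines (every tuple above is already in its table's skyline).
candidates = {
    f"{x}:{y}": tuple(u + v for u, v in zip(p, q))
    for (x, p), (y, q) in product(skyline(table_a).items(), skyline(table_b).items())
}
# a1:b1 = (2, 8), a1:b2 = (5, 5), a2:b1 = (4, 5), a2:b2 = (7, 2)
# a2:b1 dominates a1:b2, so combining the source skylines alone is not enough.
result = skyline(candidates)  # post-processing: any skyline algorithm over the candidates
print(sorted(result))  # ['a1:b1', 'a2:b1', 'a2:b2']
```

Because the final pass runs only over the candidates, its cost grows with the number of candidate competitive products rather than with all combinations.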

Experiments
Setup: Pentium IV 2.4 GHz PC with 4 GB memory, Linux, C++
Synthetic anti-correlated datasets
Real datasets: Travel Agency A and Travel Agency B
– A: 296 packages, 1014 hotels, and 4394 flights
– B: 149 packages, 995 hotels, and 866 flights
Implementation
– Algorithm for Creating Competitive Products (ACCP)
– Baseline algorithm
– Naïve algorithm

Algorithm | Preprocessing | R* Tree | Pruning
ACCP | Yes | Yes | Yes
Baseline | Yes | Yes | No
Naïve | No | No | No

Synthetic Datasets
Parameter | Default value
No. of attributes in each source table | 4
No. of indirect attributes in a product table | 1
No. of source tables | 2
No. of clusters in each source table | 2
Size of existing packages | 5M
Size of each source table | 100k

– Schema is the same as in the example
– Anti-correlated data
– 6 factors
Measurements
– Execution time
– Pruning power
– Ratio of competitive products out of all combinations
– Memory usage
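The slides do not say how the anti-correlated data were generated. The sketch below shows one common way to produce such data, not necessarily the one used in these experiments: each point is placed near the hyperplane where the coordinates sum to d/2 and then spread along it, so a good value on one attribute tends to come with a bad value on another. `anti_correlated` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple


def anti_correlated(n: int, d: int, seed: int = 0) -> List[Tuple[float, ...]]:
    """Draw n anti-correlated points in [0, 1]^d (illustrative generator, an assumption)."""
    rng = random.Random(seed)
    points: List[Tuple[float, ...]] = []
    while len(points) < n:
        total = rng.gauss(d / 2, 0.05 * d)  # coordinate sum, tightly concentrated around d / 2
        weights = [rng.random() for _ in range(d)]
        scale = total / sum(weights)
        p = tuple(w * scale for w in weights)  # spread the sum across the attributes
        if all(0.0 <= x <= 1.0 for x in p):  # rejection sampling keeps points in the unit cube
            points.append(p)
    return points


pts = anti_correlated(1000, 2)
```

Anti-correlated data produce large skylines, which makes dominance checking expensive and is why such data are a standard stress test for skyline algorithms.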

Experiments
Index of result plots (numbers as shown on the slide)
Parameter | Execution time | Pruning power | Ratio of competitive products | Memory usage
No. of attributes in each source table | 1 | 2 | 3 | 4
No. of indirect attributes in a product table | 5 | 6 | 7 | 8
No. of source tables | 9 | 10 | 11 | 12
No. of clusters in each source table | 13 | 14 | 15 | 16
Size of existing packages | 17 | 18 | 19 | 20
Size of each source table | 21 | 22 | 23 | 24

Experiments
Size of each source table varied from 100k to 500k
– Full pruning & partial pruning
– $T_Q$, $T_Q'$, and $T_R$
– Pruning power slightly increases

Parameter | Default value
No. of attributes in each source table | 4
No. of indirect attributes in a product table | 1
No. of source tables | 2
No. of clusters in each source table | 6
Size of existing packages | 5M
Size of each source table | 100k

Outline
Background – Skyline
Motivation – Examples & Problem Definition
Algorithm – Framework, Partition, Pruning
Experiments – On both synthetic and real data, over 6 factors
Conclusions

Conclusions
Creating Competitive Products
– Example
– Problem definition
Algorithms
– Framework
– Intra-dominance checking
– Inter-dominance checking
– Post-processing
Experiments
– Synthetic anti-correlated datasets
– Real datasets

THANK YOU! Q&A

APPENDIX

Partial Pruning
Existing Vacation Packages (skyline)
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150

Skyline Tuples of Source Tables
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80
H4 | 150 | 2 | (not legible)
H5 | 170 | 2 | 140

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100
F3 | 2 | 80

Newly Created Vacation Packages
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180
… | … | … | … | …
Q15 (F3, H5) | 2 | 170 | 3 | 200

Competitive Products
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
Q1 (F1, H1) | 0 | 100 | 3 | 220
Q2 (F1, H2) | 0 | 200 | 2 | 210
Q3 (F1, H3) | 0 | 400 | 1 | 200
… | … | … | … | …
Q7 (F2, H1) | 1 | 100 | 3 | 200
… | … | … | … | …
Q13 (F3, H1) | 2 | 100 | 3 | 180

Groups: A1, B1; combination group C1 = {A1, B1}
Full Pruning
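
Meta Transformation
Existing Vacation Packages (skyline)
Package | No-of-stops | Distance-to-beach | Hotel-class | Price
P1 | 0 | 130 | 2 | 250
P2 | 1 | 140 | 2 | 170
P3 | 1 | 300 | 1 | 150

P2 split into a flight part and a hotel part:
Package | No-of-stops | Price
P2 | 1 | 170
Package | Distance-to-beach | Hotel-class | Price
P2 | 140 | 2 | 170

Source tuples (groups A1 and B1)
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 100
H2 | 200 | 2 | 90
H3 | 400 | 1 | 80

Flight | No-of-stops | Flight-cost
F1 | 0 | 120
F2 | 1 | 100

Meta-Hotel (Hotel-cost plus the minimum Flight-cost, 100)
Hotel | Distance-to-beach | Hotel-class | Hotel-cost
H1 | 100 | 3 | 200
H2 | 200 | 2 | 190
H3 | 400 | 1 | 180

Meta-Flight (Flight-cost plus the minimum Hotel-cost, 80)
Flight | No-of-stops | Flight-cost
F1 | 0 | 200
F2 | 1 | 180

Min rows shown on the slide: 1 | 100 and 400 | 1 | 80

No inter-dominance checking for {F2} × {H2}: P2's flight part (1, 170) dominates Meta-F2 (1, 180) and its hotel part (140, 2, 170) dominates Meta-H2 (200, 2, 190), so P2 dominates the combination (F2, H2) without it being generated.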
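A sketch of the meta transformation as read from the numbers on the slide: Meta-Hotel adds the cheapest Flight-cost to each Hotel-cost, and Meta-Flight adds the cheapest Hotel-cost to each Flight-cost, so each meta cost is a lower bound on the Price of any package using that tuple. An existing package is split into its flight part and hotel part and compared with the meta tuples. `skippable_pairs` is a hypothetical name; smaller is assumed better and Price = Flight-cost + Hotel-cost.

```python
from typing import Dict, Set, Tuple

Vec = Tuple[int, ...]

flights: Dict[str, Vec] = {"F1": (0, 120), "F2": (1, 100)}  # (No-of-stops, Flight-cost)
hotels: Dict[str, Vec] = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}

min_flight_cost = min(cost for _, cost in flights.values())  # 100
min_hotel_cost = min(h[2] for h in hotels.values())  # 80

# Meta tuples: each cost becomes a lower bound on the Price of packages using that tuple.
meta_hotel = {n: (d, c, cost + min_flight_cost) for n, (d, c, cost) in hotels.items()}
meta_flight = {n: (s, cost + min_hotel_cost) for n, (s, cost) in flights.items()}


def weakly_dominates(p: Vec, q: Vec) -> bool:
    return all(a <= b for a, b in zip(p, q))


def skippable_pairs(package: Vec) -> Set[Tuple[str, str]]:
    """(flight, hotel) pairs that `package` = (No-of-stops, Distance, Class, Price)
    is guaranteed to dominate, found without building the combined package."""
    stops, distance, hotel_class, price = package
    flight_part, hotel_part = (stops, price), (distance, hotel_class, price)
    pairs = set()
    for fn, mf in meta_flight.items():
        for hn, mh in meta_hotel.items():
            # Both parts must be <= the meta tuples, and strictly better somewhere.
            if weakly_dominates(flight_part, mf) and weakly_dominates(hotel_part, mh) and (
                flight_part != mf or hotel_part != mh
            ):
                pairs.add((fn, hn))
    return pairs


print(skippable_pairs((1, 140, 2, 170)))  # {('F2', 'H2')}, using P2
```

Since the meta prices never exceed the real combined price, a strict improvement on either part carries over to the combination, which keeps the test safe.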

Experiments
Size of existing packages varied from 2.5M to 10M
Plot annotations: "More competitive", "Slightly decreases"

Parameter | Default value
No. of attributes in each source table | 4
No. of indirect attributes in a product table | 1
No. of source tables | 2
No. of clusters in each source table | 6
Size of existing packages | 5M
Size of each source table | 100k

Experiments
Real data: Travel Agency A, package generation set
1. A: 296 packages, 1014 hotels, and 4394 flights; B: 149 packages, 995 hotels, and 866 flights
2. Source tables from B, and packages from A
3. Discount varied from 0 to 0.50
4. Efficiency: ACCP (44.74 s) vs. Baseline (84.47 s)
5. $|SKY|/|T_Q|$
6. $|DOM|/|T_E|$
[Plot series: DOM, SKY]