Creating Competitive Products Qian Wan [1], Raymond Chi-Wing Wong [1], Ihab F. Ilyas [2], M. Tamer Ozsu [2], Yu Peng [1] [1] Hong Kong University of Science.

Presentation transcript:

Creating Competitive Products
Qian Wan [1], Raymond Chi-Wing Wong [1], Ihab F. Ilyas [2], M. Tamer Ozsu [2], Yu Peng [1]
[1] Hong Kong University of Science and Technology
[2] University of Waterloo
Presented by Qian Wan
Prepared by Qian Wan

Outline
- Background: Skyline, Related Work
- Motivation: Examples, Problem Definition
- Algorithm: Framework, Grouping, Pruning
- Experiments: Synthetic, Real data (6 factors)
- Conclusions

Skyline
- Definition: the skyline contains the points that are not dominated by any other point
- Hotel searching problem: distance to beach vs. price
  - Dominance
  - Skyline
[Figure: two scatter plots of hotels on Dist and Price axes; the first labels hotels H1 to H9, the second labels H1 and H2]
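
The definition above can be sketched in a few lines. This is a minimal illustration, assuming that smaller values are preferred on every attribute; the hotel coordinates are hypothetical and are not the values plotted on the slide.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (distance to beach, price); not the slide's values.
hotels = [(1.0, 300.0), (2.0, 200.0), (3.0, 250.0), (4.0, 100.0), (5.0, 150.0)]
result = skyline(hotels)
print(result)
```

Here (3.0, 250.0) drops out because (2.0, 200.0) is both closer and cheaper, and (5.0, 150.0) drops out because of (4.0, 100.0).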

Related Work
- Skyline queries in DBMS [S. Borzsonyi, 2001]
- Single-table skyline queries
  - Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]
- Multi-table skyline queries
  - Natural join [W. Jin, 2007] [D. Sun, 2008]
  - Our work: join different source tables via a "Cartesian product"-like procedure

A Travel Agency's Database

Source tables

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Hotel (columns: Hotel, Distance-to-beach, Hotel-class, Hotel-cost): three rows shown; surviving values: H1 (distance 100, class 3)

Existing vacation packages (columns: Package, No-of-stops, Distance-to-beach, Hotel-class, Price): four rows; values not recoverable

Newly created vacation packages (same columns): Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q24(f4,h6)

Annotations
1. Direct attributes
2. Indirect attributes
3. One indirect attribute is characteristic, e.g. Travel Agency (Price), PC Manufacturer (Price) and Logistics Transportation Service (Price)
- Skyline tuples

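Finding Competitive Products
- Given
  - a set of source tables
  - the existing market packages T_E
  - the new packages T_Q, created by combining tuples from the source tables
- Then a tuple q in T_Q is said to be a competitive product if q is in the skyline with respect to T_E ∪ T_Q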

Naïve Solution

Source tables

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Hotel (columns: Hotel, Distance-to-beach, Hotel-class, Hotel-cost): six rows; surviving values: H1 (distance 100, class 3), H4 (distance 150, class 2)

Newly created vacation packages (columns: Package, No-of-stops, Distance-to-beach, Hotel-class, Price): Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q7(f2,h1), ..., Q13(f3,h1), ..., Q24(f4,h6)

Existing vacation packages (same columns): four rows; values not recoverable

Steps
1. Intra-dominance checking
2. Inter-dominance checking

Competitive products: Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q7(f2,h1), ..., Q13(f3,h1)
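
A minimal sketch of the naive solution, under stated assumptions: smaller is better on every attribute (the hotel class is negated to fit that), the price is the sum of flight cost and hotel cost, intra-dominance means comparing new packages with each other, and inter-dominance means comparing them with existing packages. The flight rows follow the slide; the hotels and existing packages are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Flights as (no_of_stops, flight_cost); these four rows follow the slide.
flights = {"f1": (0, 120), "f2": (1, 100), "f3": (2, 80), "f4": (2, 90)}
# Hypothetical hotels as (distance_to_beach, -hotel_class, hotel_cost).
hotels = {"h1": (100, -3, 200), "h2": (50, -4, 400), "h3": (150, -2, 250)}
# Hypothetical existing packages as (stops, distance, -class, price).
existing = [(0, 100, -3, 300), (1, 60, -4, 450)]


def make_package(flight, hotel):
    """Direct attributes are copied; price (indirect) is assumed to be a sum."""
    stops, flight_cost = flight
    distance, neg_class, hotel_cost = hotel
    return (stops, distance, neg_class, flight_cost + hotel_cost)


def naive_competitive(flights, hotels, existing):
    # Build every flight/hotel combination (Cartesian product).
    new = {(f, h): make_package(fv, hv)
           for (f, fv), (h, hv) in product(flights.items(), hotels.items())}
    result = []
    for key, q in new.items():
        intra = any(dominates(other, q) for other in new.values())
        inter = any(dominates(e, q) for e in existing)
        if not intra and not inter:
            result.append(key)
    return sorted(result)


competitive = naive_competitive(flights, hotels, existing)
print(competitive)
```

Every one of the 12 combinations is materialised and compared against everything else, which is the cost the later slides try to avoid.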

Algorithm Overview
- Intra-dominance checking (framework)
  - Find the skyline in the source tables
- Inter-dominance checking
  - Skyline in existing market packages
  - R*-tree index on existing market packages
  - Full pruning
  - Partial pruning
- Post-processing

Intra-dominance Checking

Source tables: four flights (F1 to F4) and six hotels, as in the naïve solution

Skyline tuples of source tables

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Hotel (columns: Hotel, Distance-to-beach, Hotel-class, Hotel-cost): five rows; surviving values: H1 (distance 100, class 3), H4 (distance 150, class 2)

Newly created vacation packages (columns: Package, No-of-stops, Distance-to-beach, Hotel-class, Price): Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q7(f2,h1), ..., Q13(f3,h1), ..., Q15(f3,h5)

1. No intra-dominance checking (one indirect attribute)
2. No competitive products are missing

Competitive products: Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q7(f2,h1), ..., Q13(f3,h1)
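
The two claims on this slide can be checked on a small example. The sketch below assumes a single indirect attribute (price as a sum of costs) and smaller-is-better attributes; the hotels are hypothetical. It compares the skyline of the full Cartesian product with the combinations built only from source-table skyline tuples.

```python
from itertools import product


def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_keys(table):
    """Keys of the rows that no other row of the same table dominates."""
    return {k for k, v in table.items()
            if not any(dominates(o, v) for o in table.values())}


def combine(flights, hotels):
    """All flight/hotel combinations; price is assumed to be the sum of costs."""
    return {(f, h): (fv[0], hv[0], hv[1], fv[1] + hv[2])
            for (f, fv), (h, hv) in product(flights.items(), hotels.items())}


# Flights follow the slide; hotels are hypothetical (class negated).
flights = {"f1": (0, 120), "f2": (1, 100), "f3": (2, 80), "f4": (2, 90)}
hotels = {"h1": (100, -3, 200), "h2": (50, -4, 400), "h3": (150, -2, 250)}

# Route 1: build all 12 packages, then check intra-dominance.
all_packages = combine(flights, hotels)
intra_skyline = skyline_keys(all_packages)

# Route 2: take the skyline of each source table first, then combine.
sky_flights = {k: flights[k] for k in skyline_keys(flights)}
sky_hotels = {k: hotels[k] for k in skyline_keys(hotels)}
reduced = combine(sky_flights, sky_hotels)

print(sorted(reduced) == sorted(intra_skyline))
```

On this data both routes give the same six packages, so the second route skips the intra-dominance check without losing any candidate.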

Inter-dominance Checking
[Tables: existing vacation packages and the skyline in existing vacation packages; columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; row values not recoverable]
- No competitive products are missing
- An R*-tree speeds up the inter-dominance checking
- Inter-dominance checking -> range query
[Figure: R*-tree with nodes R0 to R5]
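
A sketch of how inter-dominance checking reduces to a range query, assuming smaller-is-better attributes. A linear scan stands in for the R*-tree here, and the existing packages are hypothetical; a real index would answer the same box query without visiting every point.

```python
def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def range_query(points, upper):
    """Points inside the box that extends from minus infinity up to `upper`.

    Linear scan used as a stand-in for an R*-tree range query.
    """
    return [p for p in points if all(x <= u for x, u in zip(p, upper))]


def is_dominated(q, sky_existing):
    """q is dominated iff its box contains a point that differs from q."""
    return any(p != q for p in range_query(sky_existing, q))


# Hypothetical existing packages as (stops, distance, -class, price).
existing = [(0, 100, -3, 300), (1, 60, -4, 450), (1, 100, -3, 350)]
# Only skyline packages are needed: a dominated package cannot dominate
# anything that its own dominator does not.
sky_existing = skyline(existing)

print(is_dominated((0, 100, -3, 320), sky_existing))
print(is_dominated((2, 100, -3, 280), sky_existing))
```

The first query package is dominated by (0, 100, -3, 300); the second is cheaper than every existing package and survives.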

Grouping
[Tables: existing vacation packages (three rows), skyline tuples of source tables (five hotels, flights F1 to F3), newly created vacation packages Q1(f1,h1) to Q15(f3,h5), and competitive products; row values as on the intra-dominance slide]
- Groups: A1, A2, B1, B2
- C1 = {A1, B1}, C4 = {A2, B2}
- Full pruning

Best Representative
- Each group Ci (C1, C2, ..., Ck) has a best representative Bi (B1, B2, ..., Bk)
[Tables: existing packages (three rows), packages Q(f2,h4) and Q'(f2,h5), and a Min row; columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; values not recoverable]
- Quality of the best representative: tightness of each group (clustering, e.g. k-means)
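
One way to read the grouping and best-representative slides is sketched below. It assumes the representative of a group is the component-wise minimum of its members (suggested by the "Min" row), that price is a sum of costs, and that smaller is better everywhere. The groups are fixed by hand rather than produced by clustering; the names C2 and C3 and all hotel and package values are hypothetical.

```python
def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def representative(group):
    """Component-wise minimum: no member beats it on any attribute."""
    return tuple(min(column) for column in zip(*group))


def combine(flight, hotel):
    """Package as (stops, distance, -class, price); price assumed to be a sum."""
    return (flight[0], hotel[0], hotel[1], flight[1] + hotel[2])


def fully_pruned(flight_group, hotel_group, existing):
    """If an existing package dominates the best representative, it also
    dominates every real combination of the two groups."""
    best = combine(representative(flight_group), representative(hotel_group))
    return any(dominates(e, best) for e in existing)


# Hand-made groups: flights (stops, cost) and hotels (distance, -class, cost).
A1 = [(0, 120), (1, 100)]
A2 = [(2, 80), (2, 90)]
B1 = [(100, -3, 200), (150, -2, 250)]
B2 = [(50, -4, 400)]
existing = [(0, 100, -3, 290), (1, 60, -4, 450)]

group_pairs = {"C1": (A1, B1), "C2": (A1, B2), "C3": (A2, B1), "C4": (A2, B2)}
pruned = {name: fully_pruned(fg, hg, existing)
          for name, (fg, hg) in group_pairs.items()}
print(pruned)
```

Tighter groups give a representative closer to the real members, which is why the slide ties the quality of the representative to the clustering.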

Partial Pruning
- Partial pruning
  - Full pruning prunes all members in the group
  - Partial pruning prunes some members in the group
  - Partial pruning is used when full pruning cannot be applied
- Idea
  - Direct attributes do not change
  - Estimate the best possible value for the indirect attributes
  - Eliminate a combination if
    - it is dominated on all direct attributes, and
    - it is dominated on all indirect attributes according to their best estimation
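
The idea above can be sketched as follows. This is one possible reading, not necessarily the authors' exact procedure: each combination keeps its exact direct attributes, while the price is replaced by a lower bound (cheapest flight in the group plus cheapest hotel in the group, assuming price is a sum). All hotel and package values are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# One group pair: flights (stops, cost) and hotels (distance, -class, cost).
flight_group = {"f1": (0, 120), "f2": (1, 100)}
hotel_group = {"h1": (100, -3, 200), "h3": (150, -2, 250)}
existing = [(0, 100, -3, 300), (1, 60, -4, 450)]


def best_price(flight_group, hotel_group):
    """Lower bound on the price of any combination in the group pair."""
    return (min(f[1] for f in flight_group.values())
            + min(h[2] for h in hotel_group.values()))


def partial_prune(flight_group, hotel_group, existing):
    """Drop combinations that are dominated even with the optimistic price."""
    estimate = best_price(flight_group, hotel_group)
    survivors = []
    for (f, fv), (h, hv) in product(flight_group.items(), hotel_group.items()):
        # Exact direct attributes, optimistic indirect attribute.
        optimistic = (fv[0], hv[0], hv[1], estimate)
        if not any(dominates(e, optimistic) for e in existing):
            survivors.append((f, h))
    return survivors


survivors = partial_prune(flight_group, hotel_group, existing)
print(survivors)
```

In this example the group's best representative (0, 100, -3, 300) ties with an existing package, so full pruning does not apply, yet three of the four combinations are still eliminated. The survivor still needs an exact inter-dominance check, since the estimate is only a lower bound.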

Post-processing
- More than one indirect attribute
  - Calculation: previous algorithm -> intra-dominance checking
  - Any existing skyline algorithm
  - Post-processing cost depends on the size of the set of competitive products
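
A small example of why the extra step is needed with more than one indirect attribute. The sketch assumes two hypothetical indirect attributes, total price and total hours, each a sum over the source tuples; every source tuple below is in the skyline of its own table, yet some combinations dominate others.

```python
from itertools import product


def dominates(a, b):
    """a dominates b (smaller is better on every attribute)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical source skylines: flights (cost, flight_hours) and
# hotels (cost, transfer_hours). No row dominates another in its table.
flights = {"f1": (100, 12), "f2": (140, 6), "f3": (210, 2)}
hotels = {"h1": (60, 10), "h2": (90, 5), "h3": (150, 1)}

# Candidates as (total price, total hours): two indirect attributes.
candidates = {(f, h): (fv[0] + hv[0], fv[1] + hv[1])
              for (f, fv), (h, hv) in product(flights.items(), hotels.items())}


def post_process(candidates):
    """Intra-dominance check over the candidates with a plain skyline scan."""
    return {k: v for k, v in candidates.items()
            if not any(dominates(o, v) for o in candidates.values())}


final = post_process(candidates)
removed = sorted(set(candidates) - set(final))
print(removed)
```

For instance, (f2, h2) at (230, 11) dominates (f1, h3) at (250, 13). The scan runs over the candidate set only, which matches the slide's remark that the cost depends on the number of competitive products.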

Experiments
- Pentium IV 2.4 GHz PC with 4 GB memory, Linux platform, C++
- Synthetic anti-correlated datasets
- Real datasets: Travel Agency A and Travel Agency B
  - A: 296 packages, 1014 hotels and 4394 flights
  - B: 149 packages, 995 hotels and 866 flights
- Implementation
  - Algorithm for Creating Competitive Products (ACCP)
  - Baseline algorithm
  - Naïve algorithm
[Table: ACCP, Baseline and Naïve compared on Preprocessing, R*-tree and Pruning; surviving cells: ACCP "Yes"; Baseline "Yes", "No"; Naïve "No"]

Synthetic Datasets

Parameter                                        Default value
No. of attributes in each source table           4
No. of indirect attributes in a product table    1
No. of source tables                             2
No. of clusters in each source table             2
Size of existing packages                        5M
Size of each source table                        100k

- Schema is the same as in the example
- Anti-correlated
- 6 factors
- Measurements
  - Execution time
  - Pruning power
  - Ratio of competitive products out of all combinations
  - Memory usage

Experiments
[Figure grid: one row per parameter (no. of attributes in each source table, no. of indirect attributes in a product table, no. of source tables, no. of clusters in each source table, size of existing packages, size of each source table) and one column per measurement (execution time, pruning power, ratio of competitive products, memory usage); the panels of the first two rows are numbered 1 to 4 and 5 to 8]

Experiments
- From 100k to 500k
- Full pruning & partial pruning
- T_Q, T_Q', and T_R
- Pruning power slightly increases

Parameter                                        Default value
No. of attributes in each source table           4
No. of indirect attributes in a product table    1
No. of source tables                             2
No. of clusters in each source table             6
Size of existing packages                        5M
Size of each source table                        100k

Outline
- Background: Skyline
- Motivation: Examples & Problem Definition
- Algorithm: Framework, Partition, Pruning
- Experiments: on both synthetic and real data, over 6 factors
- Conclusions

Conclusions
- Creating Competitive Products
  - Example
  - Problem definition
- Algorithms
  - Framework
  - Intra-dominance checking
  - Inter-dominance checking
  - Post-processing
- Experiments
  - Synthetic anti-correlated datasets
  - Real datasets

Thank you! Q&A

Appendix

Partial Pruning
[Tables: existing vacation packages (three rows), skyline tuples of source tables (five hotels, flights F1 to F3), newly created vacation packages Q1(f1,h1) to Q15(f3,h5), and competitive products Q1(f1,h1), Q2(f1,h2), Q3(f1,h3), ..., Q7(f2,h1), ..., Q13(f3,h1); columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; most row values not recoverable]
- Groups: A1, B1
- C1 = {A1, B1}
- Full pruning

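Meta Transformation
[Tables: existing packages (three rows, then a single row) with columns Package, No-of-stops, Distance-to-beach, Hotel-class, Price; values not recoverable]

Package projected on flight attributes
Package  No-of-stops  Price
P2       1            170

Package projected on hotel attributes (columns: Package, Distance-to-beach, Hotel-class, Price): one row; values not recoverable

Flight  No-of-stops  Flight-cost
F1      0            200
F2      1            180

Hotel (columns: Hotel, Distance-to-beach, Hotel-class, Hotel-cost): three rows; values not recoverable

- No inter-dominance checking for {F2} × {H2}
- Meta-Hotel and Meta-Flight: each has a Min row; the only surviving cell text is "1100"

Source tables (groups A1 and B1)

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Hotel (same columns as above): three rows; surviving values: H1 (distance 100, class 3)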

Experiments
- From 2.5M to 10M
- More competitive
- Slightly decreases

Parameter                                        Default value
No. of attributes in each source table           4
No. of indirect attributes in a product table    1
No. of source tables                             2
No. of clusters in each source table             6
Size of existing packages                        5M
Size of each source table                        100k

Experiments
Travel Agency A Package Generation Set
1. A: 296 packages, 1014 hotels and 4394 flights. B: 149 packages, 995 hotels and 866 flights
2. Source tables from B, and packages from A
3. Vary discount from 0 to [upper bound missing]
4. Efficiency: ACCP (44.74s) and Baseline (84.47s)
5. |SKY|/|T_Q|
6. |DOM|/|T_E|
[Figure: plots labelled DOM and SKY]