Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Presentation transcript:

Efficient Skyline Querying with Variable User Preferences on Nominal Attributes

Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jian Pei (3), Yip Sing Ho (2), Tai Wong (2) and Yubao Liu (4)

(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-Sen University

Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

Outline
1. Introduction
   a. Skyline
   b. Contributions
2. Problem Definition
3. Adaptive SFS
4. IPO-Tree
5. Conclusion

1. Introduction

Suppose we want to look for a vacation package.

  Package ID  Price  Hotel-class
  a           1600   4
  b           2400   1
  c

We want to have a cheaper package. We want to have a higher hotel-class.

Package a “dominates” package b, because we know that:
1. Package a has a cheaper price.
2. Package a has a higher hotel-class.

We want to find the set of packages which are NOT dominated by any other packages, i.e., all of the “best” possible choices. This set is the skyline: {a, c}.
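The dominance test and the skyline on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the values for package c (price 3000, hotel-class 5) are an assumption taken from the six-package table on the later slides.

```python
# Illustrative sketch; names are hypothetical, not from the slides.
# Each package maps to (price, hotel_class). The values for "c" are
# assumed from the six-package table shown on later slides.
packages = {"a": (1600, 4), "b": (2400, 1), "c": (3000, 5)}


def dominates(p, q):
    """True if p is no worse than q on both attributes and better on one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]  # cheaper-or-equal, class at least as high
    better = p[0] < q[0] or p[1] > q[1]
    return no_worse and better


def skyline(points):
    """Return the IDs of the points that no other point dominates."""
    return {k for k, p in points.items()
            if not any(dominates(q, p) for q in points.values())}


result = skyline(packages)
print(sorted(result))
```

Under these assumed values, a dominates b, and neither a nor c dominates the other, which is consistent with the skyline {a, c} stated on the slide.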

1. Introduction

Suppose we want to look for a vacation package. We want to have a cheaper package. We want to have a higher hotel-class. How about this one (Hotel-group)?

  Package ID  Price  Hotel-class  Hotel-group
  a           1600   4            T (Tulips)
  b           2400   1            T (Tulips)
  c           3000   5            H (Horizon)
  d           3600   4            H (Horizon)
  e           2400   2            M (Mozilla)
  f           3000   3            M (Mozilla)
  (6 packages)

Different customers may have different preferences on Hotel-group.
- Suppose a customer has the preference H < T < M. The skyline points are packages a and c.
- Suppose another customer has the preference H < M < T. The skyline points are packages a, c and e.

In other words, different preferences give different skyline points.
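The effect of the two customer preferences can be checked with a small brute-force sketch. The representation (a list giving the Hotel-group values from most to least preferred) and the function names are assumptions made for this illustration; the data are the six packages from the slide.

```python
# Brute-force sketch; names and data layout are hypothetical.
# Each package maps to (price, hotel_class, hotel_group).
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def dominates(p, q, rank):
    """p dominates q under a total order on hotel groups (lower rank = preferred)."""
    no_worse = p[0] <= q[0] and p[1] >= q[1] and rank[p[2]] <= rank[q[2]]
    better = p[0] < q[0] or p[1] > q[1] or rank[p[2]] < rank[q[2]]
    return no_worse and better


def skyline(points, order):
    """order lists the hotel groups from most to least preferred."""
    rank = {group: i for i, group in enumerate(order)}
    return {k for k, p in points.items()
            if not any(dominates(q, p, rank) for q in points.values())}


sky_htm = skyline(PACKAGES, ["H", "T", "M"])  # H < T < M
sky_hmt = skyline(PACKAGES, ["H", "M", "T"])  # H < M < T
print(sorted(sky_htm), sorted(sky_hmt))
```

With H < T < M, package a dominates e, so e drops out; with H < M < T, a no longer beats e on Hotel-group, so e joins the skyline.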

1. Introduction

Problem: Given a preference on Hotel-group, we want to find the skyline with respect to this preference efficiently.

1. Introduction

Straightforward solution:
- Adopt an existing skyline technique such as SFS (Sort-First Skyline) to compute the skyline on the fly whenever a skyline query is issued.
- It works. However, this solution is not scalable, and the results cannot be returned efficiently.

Full Materialization solution:
- Pre-computation: for each possible preference, (1) pre-compute the skyline and (2) store it.
- Skyline Query: return the stored skyline directly.
- It works when there is a limited number of preferences. However, this solution is not scalable when there are a lot of possible preferences. E.g., with three nominal attributes (like Hotel-group), each of which contains 40 possible values, there are 4.1 x 10^9 possible preferences (in our problem setting).

Semi-Materialization solution:
- Pre-computation: for SOME possible preferences, (1) pre-compute the skyline and (2) store it.
- Skyline Query: return the stored skyline directly OR after simple operations.
- Good tradeoff between storage consumption and efficiency.

Methods: Adaptive SFS and IPO-Tree (Implicit Preference Order Tree)

Questions:
1. What preferences should be stored?
2. With these preferences, how can we perform a skyline query efficiently?

1. Contributions

Most existing work:
- Assumes that each attribute has a fixed ordering (either total or partial) on its attribute values.

Our work:
- Different users can have different preferences (i.e., the ordering on attribute values differs from user to user).
- We propose a semi-materialization method, the IPO-tree, to answer skyline queries efficiently.

2. Problem Definition

  Package ID  Price  Hotel-class  Hotel-group
  a           1600   4            T (Tulips)
  b           2400   1            T (Tulips)
  c           3000   5            H (Horizon)
  d           3600   4            H (Horizon)
  e           2400   2            M (Mozilla)
  f           3000   3            M (Mozilla)

Usually, a user need NOT specify an ordering on all possible values of attribute Hotel-group. The user only lists a few of the most favorite choices, e.g. M < H < *. We call this an implicit preference.

- M < H: the user prefers M to H.
- H < *: the user prefers H to *, where * stands for all possible values of Hotel-group other than “M” and “H” (in this case, “T”). This is the reason why we call it an implicit preference.

Binary orders = {M < H, M < T, H < T}

Since the user gives only TWO choices, we define the order of this preference to be TWO. We also call this preference a second-order implicit preference.

Problem: Given an implicit preference on Hotel-group, we want to find the skyline with respect to this preference efficiently.

Idea of our proposed semi-materialization IPO-tree:
1. Store the skylines with respect to the first-order implicit preferences ONLY.
2. Find the skyline with respect to an implicit preference of any order from the skylines with respect to the first-order implicit preferences.

Questions:
1. What preferences should be stored?
2. With these preferences, how can we perform a skyline query efficiently?
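The expansion of an implicit preference into its binary orders can be sketched as follows. The function name `binary_orders` and the list-based encoding of a preference (listed values only, the trailing * left implicit) are assumptions made for this illustration.

```python
# Sketch: expand an implicit preference such as M < H < * into binary orders.
# The function name and the list encoding are hypothetical.
def binary_orders(listed, domain):
    """listed: the user's favorite values, most preferred first.
    domain: all possible values of the nominal attribute.
    Returns a set of pairs (v, w) meaning "v is preferred to w"."""
    others = [v for v in domain if v not in listed]  # the values covered by "*"
    orders = set()
    for i, v in enumerate(listed):
        # v beats every value listed after it and every unlisted value.
        for w in listed[i + 1:] + others:
            orders.add((v, w))
    return orders


hotel_groups = ["T", "H", "M"]
second_order = binary_orders(["M", "H"], hotel_groups)  # M < H < *
first_order = binary_orders(["M"], hotel_groups)        # M < *
print(sorted(second_order), sorted(first_order))
```

Note that unlisted values are not ordered among themselves, so under M < * the values H and T stay incomparable.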

3. Adaptive SFS

Original SFS, idea:
- Suppose we have a score function f.
- Each tuple is assigned a score obtained by f.
- Sort the tuples in ascending order of the scores.
- Process the tuples in this order.

Adaptive SFS:
- Similar idea. However, the original score function is based on numeric attributes, NOT nominal attributes. What we change is the score function.
- Idea:
  1. Pre-computation: first pre-sort the tuples according to this new score function.
  2. Skyline Query: re-sort the tuples for a skyline query.
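The sort-then-scan idea of the original SFS can be sketched as below for numeric attributes only. This is a simplified illustration, not the published algorithm in full: the score function (a plain sum), the function names, and the use of (price, reverse hotel-class) pairs with "smaller is better" on both are assumptions for the example.

```python
# Sketch of the sort-then-scan idea; smaller is better on every attribute.
# Hypothetical data: (price, reverse hotel-class) per package.
points = {"a": (1600, 1), "b": (2400, 4), "c": (3000, 0),
          "d": (3600, 1), "e": (2400, 3), "f": (3000, 2)}


def dominates(p, q):
    """p is no worse than q everywhere and differs from q somewhere."""
    return all(x <= y for x, y in zip(p, q)) and p != q


def sfs(points, score=sum):
    """Scan tuples in ascending score order, keeping the undominated ones."""
    order = sorted(points, key=lambda k: score(points[k]))
    window = []  # skyline points found so far
    for k in order:
        # With a monotone score, a tuple can only be dominated by tuples
        # that come earlier in the order, so checking the window is enough.
        if not any(dominates(points[s], points[k]) for s in window):
            window.append(k)
    return window


print(sfs(points))
```

Because the sum is monotone, any tuple that dominates another has a strictly smaller score, so a tuple added to the window is never removed later.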

4. IPO-Tree

Idea of our proposed semi-materialization IPO-tree:
1. Store the skylines with respect to the first-order implicit preferences ONLY.
2. Find the skyline with respect to an implicit preference of any order from the skylines with respect to the first-order implicit preferences.

Questions:
1. What preferences should be stored?
2. With these preferences, how can we perform a skyline query efficiently?

4. IPO-Tree

  Package ID  Price  Hotel-class  Hotel-group
  a           1600   4            T (Tulips)
  b           2400   1            T (Tulips)
  c           3000   5            H (Horizon)
  d           3600   4            H (Horizon)
  e           2400   2            M (Mozilla)
  f           3000   3            M (Mozilla)

M < *
- * stands for the values other than “M” (i.e., “H” and “T”)
- Binary orders: {M < T, M < H}
- SKY1 = {a, c, e, f}

H < *
- * stands for the values other than “H” (i.e., “T” and “M”)
- Binary orders: {H < T, H < M}
- SKY2 = {a, c, e}

f is NOT a skyline point under H < *. Why? With the binary order H < M, c dominates f. We say that “H < M” disqualifies f as a skyline point.
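The two first-order skylines and the disqualification of f can be reproduced with a brute-force sketch in which the nominal attribute is compared through a set of binary orders. The helper names are hypothetical; unlisted values are treated as incomparable, which is how the binary orders on the slide read.

```python
# Brute-force sketch with a partial order on Hotel-group; names are hypothetical.
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def dominates(p, q, orders):
    """orders is a set of pairs (v, w) meaning v is preferred to w."""
    group_no_worse = p[2] == q[2] or (p[2], q[2]) in orders
    no_worse = p[0] <= q[0] and p[1] >= q[1] and group_no_worse
    better = p[0] < q[0] or p[1] > q[1] or (p[2], q[2]) in orders
    return no_worse and better


def skyline(points, orders):
    return {k for k, p in points.items()
            if not any(dominates(q, p, orders) for q in points.values())}


m_first = {("M", "T"), ("M", "H")}  # M < *
h_first = {("H", "T"), ("H", "M")}  # H < *
sky1 = skyline(PACKAGES, m_first)
sky2 = skyline(PACKAGES, h_first)
print(sorted(sky1), sorted(sky2))
```

Packages c and f have the same price and c has the higher hotel-class, so the only thing protecting f under M < * is its Hotel-group; once H < M holds, c dominates f.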

4. IPO-Tree

  Preference   Binary orders             Skyline
  M < *        {M < T, M < H}            SKY1 = {a, c, e, f}
  H < *        {H < T, H < M}            SKY2 = {a, c, e}
  M < H < *    {M < H, M < T, H < T}     SKY3 = {a, c, e, f}

In M < H < *, the * stands for the values other than “M” and “H” (i.e., “T”).

PSKY1 = the set of data points in SKY1 with value “M” = {e, f}

SKY3 = (SKY1 ∩ SKY2) ∪ PSKY1
     = {a, c, e} ∪ {e, f}
     = {a, c, e, f}

Additional binary order! H < M belongs to H < * but not to M < H < *. This binary order may disqualify from SKY2 some data points of SKY3, like “f”.
Observation: these points must be in PSKY1, which is why PSKY1 is added back after the intersection.

SKY1 and SKY2 are skylines wrt first-order preferences; SKY3 is the skyline wrt the second-order preference.

4. IPO-Tree

Merging Property: the skyline wrt the second-order preference v1 < v2 < * is obtained from the skylines wrt the two first-order preferences v1 < * and v2 < *.

  v1 < *        (first-order)    e.g. M < *        SKY1 = {a, c, e, f}
  v2 < *        (first-order)    e.g. H < *        SKY2 = {a, c, e}
  v1 < v2 < *   (second-order)   e.g. M < H < *    SKY3 = {a, c, e, f}
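The Merging Property for the running example, SKY3 = (SKY1 ∩ SKY2) ∪ PSKY1, can be checked against a direct brute-force computation. The sketch below is an illustration under assumed helper names; it is not the IPO-tree implementation, only a verification of the set identity on the six packages.

```python
# Sketch: verify the merging step for M < H < * on the slide's data.
# Helper names are hypothetical.
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
DOMAIN = ["T", "H", "M"]


def binary_orders(listed):
    others = [v for v in DOMAIN if v not in listed]
    return {(v, w) for i, v in enumerate(listed) for w in listed[i + 1:] + others}


def dominates(p, q, orders):
    group_no_worse = p[2] == q[2] or (p[2], q[2]) in orders
    no_worse = p[0] <= q[0] and p[1] >= q[1] and group_no_worse
    better = p[0] < q[0] or p[1] > q[1] or (p[2], q[2]) in orders
    return no_worse and better


def skyline(listed):
    orders = binary_orders(listed)
    return {k for k, p in PACKAGES.items()
            if not any(dominates(q, p, orders) for q in PACKAGES.values())}


sky1 = skyline(["M"])                                # stored: M < *
sky2 = skyline(["H"])                                # stored: H < *
psky1 = {k for k in sky1 if PACKAGES[k][2] == "M"}   # points of SKY1 with value M
sky3_merged = (sky1 & sky2) | psky1                  # set operations only
sky3_direct = skyline(["M", "H"])                    # recomputed from scratch
print(sorted(sky3_merged), sorted(sky3_direct))
```

The merged result uses only the two stored first-order skylines and two set operations, while the direct result scans all packages; both give the same set here.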

4. IPO-Tree

The Merging Property is applied order by order:

  Preference     Skyline wanted for         Merged from
  Second-order   v1 < v2 < *                v1 < * (first-order) and v2 < * (first-order)
  Third-order    v1 < v2 < v3 < *           v1 < v2 < * (second-order) and v3 < * (first-order)
  Fourth-order   v1 < v2 < v3 < v4 < *      v1 < v2 < v3 < * (third-order) and v4 < * (first-order)
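Applying the merging step repeatedly gives a skyline of any order from the stored first-order skylines. The sketch below is a hedged illustration on randomly generated data: the dataset, the five-value domain and all names are hypothetical, and the generalisation of PSKY to "points of the current skyline whose value is among the values already listed" is an assumption extrapolated from the second-order case on the slides.

```python
# Sketch: x-th order skyline from first-order skylines by repeated merging.
# Data, domain and names are hypothetical; smaller is better on numeric attributes.
import random

DOMAIN = ["A", "B", "C", "D", "E"]


def binary_orders(listed):
    others = [v for v in DOMAIN if v not in listed]
    return {(v, w) for i, v in enumerate(listed) for w in listed[i + 1:] + others}


def dominates(p, q, orders):
    nominal_ok = p[2] == q[2] or (p[2], q[2]) in orders
    no_worse = p[0] <= q[0] and p[1] <= q[1] and nominal_ok
    better = p[0] < q[0] or p[1] < q[1] or (p[2], q[2]) in orders
    return no_worse and better


def brute_force(points, listed):
    orders = binary_orders(listed)
    return {i for i, p in enumerate(points)
            if not any(dominates(q, p, orders) for q in points)}


def merged(points, listed, first_order):
    """Skyline for v1 < ... < vx < * using only stored first-order skylines."""
    sky = first_order[listed[0]]
    for x in range(1, len(listed)):
        # Assumed generalisation of PSKY: skyline points whose value is
        # one of the values already placed in the order.
        psky = {i for i in sky if points[i][2] in listed[:x]}
        sky = (sky & first_order[listed[x]]) | psky
    return sky


random.seed(0)
points = [(random.randint(1, 20), random.randint(1, 20), random.choice(DOMAIN))
          for _ in range(60)]
first_order = {v: brute_force(points, [v]) for v in DOMAIN}  # pre-computation

pref = ["C", "A", "E"]  # third-order preference C < A < E < *
print(merged(points, pref, first_order) == brute_force(points, pref))
```

Each additional listed value costs one intersection and one union, and no dominance test is needed at query time in this single-attribute setting.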

5. Empirical Study

Datasets:
- Synthetic dataset: anti-correlated dataset
- Real dataset (from UCI): Nursery dataset

Default values (synthetic):
- No. of tuples = 500K
- No. of numeric dimensions = 3
- No. of nominal dimensions = 2
- No. of values in a nominal dimension = 20
- Order of implicit preference = 3

5. Empirical Study

Variation:
- No. of data points
- No. of numeric dimensions
- No. of nominal dimensions
- Cardinality of nominal dimensions
- Order of implicit preference

Comparison:
- SFS-D: Original SFS
- SFS-A: Adaptive SFS
- IPO Tree
- IPO Tree-10: IPO Tree which stores the 10 most frequent values for each nominal attribute (for comparison)

5. Empirical Study Synthetic Data Set

5. Empirical Study Real Data Set

6. Conclusion
- Different customers have different preferences → different skylines
- Skyline query on nominal attributes
  - Adaptive SFS algorithm
  - IPO-Tree algorithm
- Experiments

Q&A

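3. Adaptive SFS

Original table:

  Package ID  Price  Hotel-class  Hotel-group
  a           1600   4            T (Tulips)
  b           2400   1            T (Tulips)
  c           3000   5            H (Horizon)
  d           3600   4            H (Horizon)
  e           2400   2            M (Mozilla)
  f           3000   3            M (Mozilla)

Table with reversed hotel-class (in this example, Reverse Hotel-class = 5 - Hotel-class, so that a smaller value is better on every numeric attribute):

  Package ID  Price  Reverse Hotel-class  Hotel-group
  a           1600   1                    T (Tulips)
  b           2400   4                    T (Tulips)
  c           3000   0                    H (Horizon)
  d           3600   1                    H (Horizon)
  e           2400   3                    M (Mozilla)
  f           3000   2                    M (Mozilla)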

3. Adaptive SFS

Using some existing algorithms, we can first remove the data points which cannot be in the skyline with respect to any implicit preference (in this example, b and d).

3. Adaptive SFS

Step 1 (Pre-computation): pre-sort the tuples according to the new score function.

Each value in attribute Hotel-group is assigned a SPECIAL value. This special value is set to the total number of possible values in Hotel-group (i.e., 3).

  Score of point a = 1600 + 1 + 3 = 1604
  Score of point c = 3000 + 0 + 3 = 3003

Tuples sorted by score:

  Package ID  Score
  a           1604
  e           2406
  c           3003
  f           3005
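The pre-computation scores in the table above are consistent with adding price, reverse hotel-class and the special value 3. The following sketch reproduces them under that reading; the function and variable names are hypothetical, and the additive form of the score is inferred from the numbers on the slide rather than stated there.

```python
# Sketch of Step 1; the additive score is inferred from the slide's numbers.
N_VALUES = 3  # total number of possible Hotel-group values

# (price, reverse hotel-class, hotel group) for the points kept after pruning.
DATA = {"a": (1600, 1, "T"), "c": (3000, 0, "H"),
        "e": (2400, 3, "M"), "f": (3000, 2, "M")}


def precompute_score(point):
    """Every nominal value gets the same special value before any query."""
    price, reverse_class, _group = point
    return price + reverse_class + N_VALUES


scores = {k: precompute_score(p) for k, p in DATA.items()}
presorted = sorted(DATA, key=scores.get)  # ascending score
print(presorted, scores)
```

Using the largest possible rank for every nominal value means a later query can only lower the score of a tuple, never raise it.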

3. Adaptive SFS

Step 2 (Skyline Query): re-sort the tuples for a skyline query (e.g., H < T < *).

- Value “H” is assigned value 1.
- Value “T” is assigned value 2.
- All values other than “H” and “T” (i.e., “M”) keep value 3.

  Score of point a = 1600 + 1 + 2 = 1603
  Score of point c = 3000 + 0 + 1 = 3001

Since the scores of a and c are updated, we need to re-sort a and c. Note that the ordering of all OTHER points, which contain neither “H” nor “T”, remains unchanged.

  Pre-computation:           Skyline Query:
  Package ID  Score          Package ID  Score
  a           1604           a           1603
  e           2406           e           2406
  c           3003           c           3001
  f           3005           f           3005

We then just use the original SFS. With this sorted list, we find the skyline = {a, c}.
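Step 2 can be sketched as re-scoring only the tuples whose Hotel-group value is listed in the query, merging them back into the pre-sorted list, and then running the usual scan. All names below are hypothetical, the additive score is inferred from the slide's numbers, and the dominance test treats unlisted values as incomparable with each other.

```python
# Sketch of Step 2 for the query H < T < *; names are hypothetical.
import heapq

N_VALUES = 3  # special value for unlisted Hotel-group values
DATA = {"a": (1600, 1, "T"), "e": (2400, 3, "M"),
        "c": (3000, 0, "H"), "f": (3000, 2, "M")}
PRESORTED = ["a", "e", "c", "f"]  # order produced by the pre-computation


def rank(group, assigned):
    return assigned.get(group, N_VALUES)


def score(k, assigned):
    price, reverse_class, group = DATA[k]
    return price + reverse_class + rank(group, assigned)


def dominates(p, q, assigned):
    # p's group is no worse if equal, or if p's group is listed and ranked ahead.
    group_ok = p[2] == q[2] or (
        p[2] in assigned and rank(p[2], assigned) < rank(q[2], assigned))
    no_worse = p[0] <= q[0] and p[1] <= q[1] and group_ok
    better = p[0] < q[0] or p[1] < q[1] or p[2] != q[2]
    return no_worse and better


def query(assigned):
    key = lambda k: score(k, assigned)
    # Tuples with unlisted values keep their pre-computed relative order.
    unchanged = [k for k in PRESORTED if DATA[k][2] not in assigned]
    changed = sorted((k for k in PRESORTED if DATA[k][2] in assigned), key=key)
    order = list(heapq.merge(unchanged, changed, key=key))
    window = []
    for k in order:  # original SFS scan over the re-sorted list
        if not any(dominates(DATA[s], DATA[k], assigned) for s in window):
            window.append(k)
    return order, window


order, sky = query({"H": 1, "T": 2})
print(order, sky)
```

Only a and c are re-sorted here; e and f are taken from the pre-sorted list as they are, which is the saving the slide points out.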

4. IPO-Tree

Idea:
- Pre-computation: store the skylines wrt the first-order preferences.
- Skyline Query: find the skyline wrt a preference of any order from the stored skylines wrt the first-order preferences.

Examples of stored preferences (∅ = no preference on that attribute):

  e.g. 1   Hotel-Group: M < *   Airline: G < *
  e.g. 2   Hotel-Group: M < *   Airline: ∅
  e.g. 3   Hotel-Group: ∅       Airline: G < *

How can we do it efficiently? We propose an indexing structure called the IPO-tree.

4. IPO-Tree

  Package ID  Price  Reverse Hotel-class  Hotel-group   Airline
  a           1600   1                    T (Tulips)    G (Gonna)
  b           2400   4                    T (Tulips)    G (Gonna)
  c           3000   0                    H (Horizon)   G (Gonna)
  d           3600   1                    H (Horizon)   R (Redish)
  e           2400   3                    M (Mozilla)   R (Redish)
  f           3000   2                    M (Mozilla)   W (Wings)

Tree structure (∅ = no preference on that attribute):

  root
    Hotel-group level:  T<*   H<*   M<*   ∅
      Airline level (under each Hotel-group node):  G<*   R<*   W<*   ∅

Each root-to-leaf path corresponds to one combination, e.g.:
- Hotel-group: T<*, Airline: G<*
- Hotel-group: T<*, Airline: ∅
- Hotel-group: ∅, Airline: G<*

Storage, e.g. for three nominal attributes (like Hotel-group), each of which contains 40 possible values:
- Full Materialization: there are 4.1 x 10^9 possible preferences (in our problem setting).
- Semi-Materialization (IPO-tree): there are 70,644 nodes (which is significantly smaller than 4.1 x 10^9).
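The figure of 70,644 nodes matches a simple count in which every node on one attribute's level has one child per value of the next attribute plus one "no preference" child, as in the tree drawn on this slide. The sketch below computes that count; the function name is hypothetical and the counting rule is inferred from the drawing and the number, not stated on the slide.

```python
# Sketch: count IPO-tree nodes, assuming one level per nominal attribute and
# (cardinality + 1) children per node (each value plus "no preference").
def ipo_tree_nodes(cardinalities):
    total = 1  # the root
    level = 1  # number of nodes on the current level
    for c in cardinalities:
        level *= c + 1
        total += level
    return total


print(ipo_tree_nodes([40, 40, 40]))  # three attributes, 40 values each
print(ipo_tree_nodes([3, 3]))        # the Hotel-group / Airline example
```

For three attributes with 40 values each this is 1 + 41 + 41^2 + 41^3, which agrees with the node count quoted above.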

4. IPO-Tree

One nominal attribute:
- Apply the Merging Property.

Multiple nominal attributes:
- Consider ONE nominal attribute at a time with the Merging Property.
- Fix the ordering of the OTHER nominal attributes.
- Then, consider each of the other nominal attributes with the Merging Property.

4. IPO-Tree

  Package ID  Price  Hotel-class  Hotel-group   Airline
  a           1600   4            T (Tulips)    G (Gonna)
  b           2400   1            T (Tulips)    G (Gonna)
  c           3000   5            H (Horizon)   G (Gonna)
  d           3600   4            H (Horizon)   R (Redish)
  e           2400   2            M (Mozilla)   R (Redish)
  f           3000   3            M (Mozilla)   W (Wings)

Decomposition of a query on two nominal attributes:

  Hotel-Group: M<H<*, Airline: G<R<*
    Hotel-Group: M<*, Airline: G<R<*
      Hotel-Group: M<*, Airline: G<*
      Hotel-Group: M<*, Airline: R<*
    Hotel-Group: H<*, Airline: G<R<*
      Hotel-Group: H<*, Airline: G<*
      Hotel-Group: H<*, Airline: R<*

4. IPO-Tree

  M < *        SKY1 = {a, c, e, f}     PSKY1 = {e, f}
  H < *        SKY2 = {a, c, e}
  M < H < *    SKY3 = (SKY1 ∩ SKY2) ∪ PSKY1
                    = {a, c, e} ∪ {e, f}
                    = {a, c, e, f}

4. IPO-Tree

Theorem: Given a user query with an x-th order implicit preference on m'' nominal attributes, the number of set operations required is O(x^m'').

e.g. Hotel-Group: M<H<*, Airline: G<R<*
  m'' = 2
  x = 2
  No. of set operations = O(2^2)