Towards Multidimensional Skyline Analysis
Jian Pei, Simon Fraser University, Canada
Joint work with Y. Tao, M. Ester and W. Jin

Searching Flights to Sydney
Price, travel time and number of stops all matter!
A (long) list of all feasible flights? Boring to review.
Presenting only some selected flights: how?
– Vancouver → Honolulu → Sydney ($2100, 19 hours, 1 stop): good!
– Vancouver → Honolulu → Auckland → Sydney ($1980, 24 hours, 2 stops): also good; cheaper, though with a longer travel time and more stops
– Vancouver → Los Angeles → Honolulu → Sydney ($2060, 28 hours, 3 stops): not good; more expensive, longer and with more stops than the Auckland route
Skyline routes: the routes representing all possible trade-offs among price, travel time and number of stops, i.e., those not beaten by any other route

Domination and Skyline
A set of objects $S$ in an $n$-dimensional space $D = (D_1, \ldots, D_n)$
– Numeric dimensions are used for illustration in this talk
For $u, v \in S$, $u$ dominates $v$ if
– $u$ is better than $v$ in at least one dimension, and
– $u$ is not worse than $v$ in any other dimension
– For illustration in this talk, the smaller the better
$u \in S$ is a skyline object if $u$ is not dominated by any other object in $S$
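The definitions above translate directly into code. Below is a minimal Python sketch, assuming numeric dimensions where smaller is better; the names `dominates` and `skyline` and the airport-code route labels are illustrative, not from the talk. It checks the three routes from the flight example.

```python
from typing import Dict, Sequence, Set


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no worse in every dimension, strictly better in at least one.

    Smaller values are better, as in the talk.
    """
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(objects: Dict[str, Sequence[float]]) -> Set[str]:
    """Naive O(n^2) skyline: keep every object that no other object dominates."""
    return {
        name
        for name, u in objects.items()
        if not any(dominates(v, u) for other, v in objects.items() if other != name)
    }


# (price in $, travel time in hours, number of stops); labels are hypothetical.
flights = {
    "YVR-HNL-SYD": (2100, 19, 1),
    "YVR-HNL-AKL-SYD": (1980, 24, 2),
    "YVR-LAX-HNL-SYD": (2060, 28, 3),
}

print(sorted(skyline(flights)))  # ['YVR-HNL-AKL-SYD', 'YVR-HNL-SYD']
```

The Los Angeles route drops out because the Auckland route is cheaper, shorter and has fewer stops.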

Finding the Skyline in Full Space
Many existing methods
– Divide-and-conquer and block nested loops (BNL), by Börzsönyi et al.
– Sort-filter-skyline (SFS), by Chomicki et al.
– Using bitmaps and the relationship between the skyline and the minimum coordinates of individual points, by Tan et al.
– Using nearest-neighbor search, by Kossmann et al.
– The progressive branch-and-bound method, by Papadias et al.
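As an illustration of the simplest of these methods, here is a sketch of a block-nested-loops style scan. It is a simplification rather than the published algorithm: it assumes the whole candidate window fits in memory, whereas BNL as described by Börzsönyi et al. bounds the window and writes overflow to a temporary file across multiple passes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(u: Point, v: Point) -> bool:
    """Smaller is better: u is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def bnl_skyline(points: Sequence[Point]) -> List[Point]:
    """Simplified block-nested-loops skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a current candidate: discard it
        # p survives: evict the candidates it dominates, then add it
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


print(bnl_skyline([(1, 9), (3, 3), (2, 8), (4, 4), (9, 1)]))
# [(1, 9), (3, 3), (2, 8), (9, 1)]
```

Discarding a point against the window alone is safe because dominance is transitive: anything that dominated a discarded point is itself dominated by some point still in the window.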

Full Space Skyline Is Not Enough!
Skylines in subspaces
– Mr. Richer does not care about the price. How can we derive the best trade-offs between travel time and number of stops from the full space skyline?
Sky cube: computing the skylines in all non-empty subspaces (Yuan et al., VLDB05)
– Any subspace skyline query can be answered (efficiently)
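A sky cube can be computed naively by running one skyline computation for each of the $2^n - 1$ non-empty subspaces. The sketch below does exactly that; it is not the method of Yuan et al., and the helper names and route labels are hypothetical. With the flight data it answers Mr. Richer's (travel time, stops) query.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple


def dominates_in(u: Sequence[float], v: Sequence[float], dims: Tuple[int, ...]) -> bool:
    """Dominance restricted to the dimension indexes in dims (smaller is better)."""
    return all(u[i] <= v[i] for i in dims) and any(u[i] < v[i] for i in dims)


def sky_cube(objects: Dict[str, Sequence[float]], n_dims: int) -> Dict[Tuple[int, ...], FrozenSet[str]]:
    """Naive sky cube: one skyline computation per non-empty subspace."""
    cube = {}
    for k in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), k):
            cube[dims] = frozenset(
                name
                for name, u in objects.items()
                if not any(dominates_in(v, u, dims) for v in objects.values())
            )
    return cube


# Dimensions: 0 = price, 1 = travel time, 2 = number of stops (hypothetical labels).
flights = {
    "YVR-HNL-SYD": (2100, 19, 1),
    "YVR-HNL-AKL-SYD": (1980, 24, 2),
    "YVR-LAX-HNL-SYD": (2060, 28, 3),
}
cube = sky_cube(flights, 3)
print(sorted(cube[(1, 2)]))  # Mr. Richer ignores price: ['YVR-HNL-SYD']
```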

Sky Cube
[Figure: sky cube example]

Understanding Skylines
Understanding skyline objects
– Both Wilt Chamberlain and Michael Jordan are in the full space skyline of the Great NBA Players. Which merits, respectively, make them outstanding?
– How are they different?
Finding the decisive subspaces: the minimal combinations of factors that determine the (subspace) skyline membership of an object
– Total rebounds for Chamberlain; (total points, total rebounds, total assists) and (games played, total points, total assists) for Jordan

Redundancy in Sky Cube
Is it just a coincidence that the skylines in multiple subspaces are identical?

Observations
$a$, $b$ and $c$ are in the skyline of $(X, Y)$
– Both $a$ and $c$ are in some subspace skylines
– $b$ is not in any subspace skyline
$d$ and $e$ are not in the skyline of $(X, Y)$
– $d$ is in the skyline of subspace $X$
– $e$ is not in any subspace skyline
Why, and in which subspaces, is an object in the skyline?

Subspace Skylines Monotonic?
Is subspace skyline membership monotonic?
– $x$ is in the skylines of spaces $ABCD$ and $A$, but not in the skyline of $ABD$: it is dominated by $y$ in $ABD$
$x$ and $y$ collapse in $AD$ (they share the same values on $A$ and $D$); $x$ and $y$ are in the skylines of the same subspaces of $AD$
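The original values of $x$ and $y$ are not given on the slide, so the following sketch uses hypothetical values chosen to reproduce the behaviour described: $x$ and $y$ agree on $A$ and $D$, $x$ is better on $C$, and $y$ is better on $B$.

```python
from typing import Dict, Sequence

DIMS = "ABCD"


def dominates_in(u: Sequence[int], v: Sequence[int], subspace: str) -> bool:
    """Dominance (smaller is better) restricted to the dimensions named in subspace, e.g. "ABD"."""
    idx = [DIMS.index(d) for d in subspace]
    return all(u[i] <= v[i] for i in idx) and any(u[i] < v[i] for i in idx)


def in_skyline(name: str, objects: Dict[str, Sequence[int]], subspace: str) -> bool:
    """True if no object dominates `name` in the given subspace."""
    u = objects[name]
    return not any(dominates_in(v, u, subspace) for v in objects.values())


# Hypothetical values: x and y agree on A and D; x is better on C, y on B.
objects = {"x": (1, 2, 1, 1), "y": (1, 1, 2, 1)}

for s in ("ABCD", "ABD", "A"):
    print(s, in_skyline("x", objects, s))
# ABCD True, ABD False, A True: membership is not monotonic
```

Removing $C$ from $ABCD$ takes away the only dimension where $x$ beats $y$; removing $B$ as well takes away the only dimension where $y$ beats $x$.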

Coincident Groups
How can we capture groups of objects that share values in subspaces?
$(G, B)$ is a coincident group (c-group) if all objects in $G$ share the same values on all dimensions in $B$
– $G_B$ is the projection of $G$ on $B$
A c-group $(G, B)$ is maximal if no further objects or dimensions can be added to the group
– Example: $(\{x, y\}, AD)$
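To make maximality concrete, the brute-force sketch below enumerates the maximal c-groups of a tiny data set: it tries every object subset $G$, takes $B$ as all dimensions on which $G$ agrees, and keeps $(G, B)$ only if no outside object shares $G_B$. It is exponential and meant only for illustration; the function name and the values of $x$ and $y$ (which agree only on $A$ and $D$) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def maximal_c_groups(objects: Dict[str, Tuple[int, ...]], dim_names: str) -> Set[Tuple[Tuple[str, ...], str]]:
    """Brute-force enumeration of maximal c-groups (exponential; toy data only)."""
    names = sorted(objects)
    groups = set()
    for size in range(1, len(names) + 1):
        for G in combinations(names, size):
            # B = every dimension on which all members of G share one value
            B = [i for i in range(len(dim_names)) if len({objects[g][i] for g in G}) == 1]
            if not B:
                continue
            shared = [objects[G[0]][i] for i in B]
            # Maximal in objects: G must already contain every object with the same values on B
            closure = tuple(n for n in names if [objects[n][i] for i in B] == shared)
            if closure == G:
                groups.add((G, "".join(dim_names[i] for i in B)))
    return groups


objects = {"x": (1, 2, 1, 1), "y": (1, 1, 2, 1)}  # hypothetical values
print(sorted(maximal_c_groups(objects, "ABCD")))
# [(('x',), 'ABCD'), (('x', 'y'), 'AD'), (('y',), 'ABCD')]
```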

C-Group Lattices
[Figure: c-group lattice and maximal c-group lattice, related by a quotient]
Where are the skylines? Do they also form a good structure?

Skyline Groups
A maximal c-group $(G, B)$ is a skyline group if $G_B$ is in the subspace skyline of $B$
How can we characterize the subspaces where $G_B$ is in the skyline?
– $(\{x\}, ABCD)$ is a skyline group
– If the set of subspaces is convex, we can use its bounds

Decisive Subspaces
A subspace $C \subseteq B$ is decisive for a skyline group $(G, B)$ if
– $G_C$ is in the subspace skyline of $C$
– No object outside $G$ shares the same values with the objects in $G$ on $C$
– $C$ is minimal: no $C' \subset C$ has the above two properties
$(\{x\}, ABCD)$ is a skyline group; $AC$ and $CD$ are decisive
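The three conditions can be checked by brute force on small examples. The sketch below is not the algorithm from the paper, and its helper names and data are hypothetical (so its answers differ from the slide's $AC$, $CD$, whose underlying data is not shown). It enumerates the subspaces of $B$ by increasing size, so a candidate is minimal exactly when no previously accepted subspace is a proper subset of it.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def dominates_in(u: Sequence[int], v: Sequence[int], dims: Tuple[int, ...]) -> bool:
    """Dominance (smaller is better) restricted to the dimension indexes in dims."""
    return all(u[i] <= v[i] for i in dims) and any(u[i] < v[i] for i in dims)


def decisive_subspaces(objects: Dict[str, Sequence[int]], G: Set[str], B: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Decisive subspaces of a skyline group (G, B), found by brute force."""
    rep = objects[next(iter(G))]  # all members of G share their values on B
    decisive: List[Tuple[int, ...]] = []
    for k in range(1, len(B) + 1):
        for C in combinations(B, k):
            # Minimality: skip C if a smaller decisive subspace is contained in it
            if any(set(D) < set(C) for D in decisive):
                continue
            # Condition 1: G_C is in the skyline of C
            if any(dominates_in(v, rep, C) for v in objects.values()):
                continue
            # Condition 2: no object outside G has the same values as G on C
            if any(n not in G and all(v[i] == rep[i] for i in C) for n, v in objects.items()):
                continue
            decisive.append(C)
    return decisive


# Hypothetical data over dimensions A, B, C, D (indexes 0..3)
objects = {"x": (1, 2, 1, 1), "y": (1, 1, 2, 1)}
print(decisive_subspaces(objects, {"x"}, (0, 1, 2, 3)))  # [(2,)], i.e. C
```

Checking supersets of an accepted subspace is unnecessary: if any proper subset had both properties, some minimal one would already have been accepted.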

Semantics
In which subspaces is an object, or a group of objects, in the skyline?
The skyline membership of skyline groups is established by their decisive subspaces
– For skyline group $(G, B)$, if $C$ is decisive, then $G$ is in the skyline of any subspace $C'$ where $C \subseteq C' \subseteq B$
Signature of skyline group $(G, B)$: $Sig(G, B) = (G_B, C_1, \ldots, C_k)$, where $C_1, \ldots, C_k$ are all the decisive subspaces

Example
The skyline membership of an object is determined by the skyline groups in which it participates
An object $u$ is in the skyline of subspace $C'$ if and only if there exist a skyline group $(G, B)$ and a decisive subspace $C$ of it such that $u \in G$ and $C \subseteq C' \subseteq B$
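Once signatures are known, membership queries reduce to subset tests. In this sketch the signatures are written out by hand for a hypothetical two-object data set $x = (1, 2, 1, 1)$, $y = (1, 1, 2, 1)$ over $ABCD$; subspaces are represented as strings of dimension letters, and the function name is illustrative.

```python
from typing import List, Set, Tuple

# (G, B, decisive subspaces) for the hypothetical data x = (1, 2, 1, 1), y = (1, 1, 2, 1)
Signature = Tuple[Set[str], str, List[str]]
signatures: List[Signature] = [
    ({"x"}, "ABCD", ["C"]),
    ({"y"}, "ABCD", ["B"]),
    ({"x", "y"}, "AD", ["A", "D"]),
]


def in_subspace_skyline(u: str, subspace: str, sigs: List[Signature]) -> bool:
    """u is in the skyline of subspace C' iff some skyline group (G, B) with u in G
    has a decisive subspace C such that C <= C' <= B."""
    target = set(subspace)
    return any(
        u in G and set(C) <= target <= set(B)
        for G, B, decisive in sigs
        for C in decisive
    )


print(in_subspace_skyline("x", "BC", signatures))   # True
print(in_subspace_skyline("x", "ABD", signatures))  # False
```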

Subspace Skyline Analysis
All skyline projections form a lattice (the skyline projection lattice)
– A sub-lattice of the c-group lattice
All skyline groups form a lattice (the skyline group lattice)
– A quotient lattice of the skyline projection lattice
– A sub-lattice of the maximal c-group lattice

Relationship Among Lattices
[Figure: c-group lattice, maximal c-group lattice, skyline projection lattice and skyline group lattice, connected by quotient and sub-lattice relationships]

OLAP Analysis on Skylines
Subspace skylines
Relationships between skylines in subspaces
Closure information

Full Space vs. Subspace Skylines
For any skyline group $(G, B)$, there exists at least one object $u \in G$ such that $u$ is in the full space skyline
– $u$ can serve as the representative of the group
An object not in the full space skyline can be in some subspace skyline only if it collapses to some full space skyline object
– All objects that are not in the full space skyline and do not collapse to any full space skyline object can be removed from skyline analysis
– If only the projections are of concern, the full space skyline objects alone are sufficient for skyline analysis
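The pruning rule can be sketched as a filter. Here "collapses to" is taken to mean sharing the value on at least one dimension, i.e., agreeing on some non-empty subspace; the function name and the 2-d object values are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple


def dominates(u: Sequence[int], v: Sequence[int]) -> bool:
    """Smaller is better: u is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def prune_for_subspace_analysis(objects: Dict[str, Sequence[int]]) -> Tuple[Set[str], Set[str]]:
    """Return (full-space skyline, objects worth keeping for subspace skyline analysis)."""
    full_sky = {n for n, u in objects.items()
                if not any(dominates(v, u) for v in objects.values())}
    keep = set(full_sky)
    for n, u in objects.items():
        # A non-skyline object survives only if it collapses to some full-space
        # skyline object, i.e. shares its value on at least one dimension.
        if n not in full_sky and any(
            any(a == b for a, b in zip(u, objects[s])) for s in full_sky
        ):
            keep.add(n)
    return full_sky, keep


# Hypothetical 2-d data over (X, Y)
objects = {"a": (1, 3), "b": (3, 1), "c": (2, 4), "d": (1, 5)}
full_sky, keep = prune_for_subspace_analysis(objects)
# full skyline {a, b}; d shares X = 1 with a and is kept; c is pruned
```

The object `d` plays the role of $d$ on the Observations slide: outside the full space skyline, yet in the skyline of subspace $X$ because it ties with `a` there.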

Computing Skylines in All Subspaces
NP-hard
– Intuition: the curse of dimensionality; there is an exponential number of subspaces
Reduction from frequent itemset mining
T1: {a, b, c}
T2: {a, c, d, e}
T3: {b, c, d, e}
If min_sup = 2, then a, b, c, d, e, ac, bc, cd, ce, de and cde are the frequent itemsets
[Table: the corresponding objects (Oid) over dimensions a, b, c, d, e]
$Sup(cde) = (\text{number of skyline objects in subspace } cde) - 1$
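The frequent itemsets listed above can be double-checked by brute-force support counting on the three example transactions. This sketch only verifies the itemset side of the reduction; it does not build the object table.

```python
from itertools import combinations
from typing import Dict, Set

transactions: Dict[str, Set[str]] = {
    "T1": {"a", "b", "c"},
    "T2": {"a", "c", "d", "e"},
    "T3": {"b", "c", "d", "e"},
}


def frequent_itemsets(db: Dict[str, Set[str]], min_sup: int) -> Dict[str, int]:
    """Brute-force support counting over all itemsets (fine for five items)."""
    items = sorted(set().union(*db.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            support = sum(1 for t in db.values() if set(itemset) <= t)
            if support >= min_sup:
                result["".join(itemset)] = support
    return result


print(sorted(frequent_itemsets(transactions, 2)))
# ['a', 'ac', 'b', 'bc', 'c', 'cd', 'cde', 'ce', 'd', 'de', 'e']
```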

Subspace Skyline Computation
Compute the set of skyline groups and their signatures
– NP-hard: reduction from frequent closed itemset mining
Top-down enumeration of subspaces
– Similar ideas are used in skyline cube computation
For each subspace, find the skyline groups and decisive subspaces
– Find (subspace) skylines by sorting
– Share sorting and use merge sort as much as possible

Enumerating Subspaces
Use a top-down enumeration tree
– Each child explores a proper subspace with one dimension fewer
– All objects that are neither in the skyline of the parent subspace nor collapse to a skyline object of the parent subspace can be removed
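One way to realize such a tree is sketched below, under two assumptions of ours: a child removes a dimension whose index is larger than the last removed one, so every non-empty subspace is visited exactly once, and "collapsing" is taken to mean sharing a value on at least one dimension of the child subspace. Skylines are computed naively at each node, and the sort sharing of the real algorithm is not included; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Objects = Dict[str, Sequence[int]]


def dominates_in(u: Sequence[int], v: Sequence[int], dims: Tuple[int, ...]) -> bool:
    """Dominance (smaller is better) restricted to the dimension indexes in dims."""
    return all(u[i] <= v[i] for i in dims) and any(u[i] < v[i] for i in dims)


def skyline_in(objs: Objects, dims: Tuple[int, ...]) -> Set[str]:
    """Naive skyline of objs in subspace dims."""
    return {n for n, u in objs.items()
            if not any(dominates_in(v, u, dims) for v in objs.values())}


def enumerate_subspace_skylines(objects: Objects, n_dims: int) -> Dict[Tuple[int, ...], Set[str]]:
    """Top-down enumeration tree over subspaces with parent-based pruning."""
    results: Dict[Tuple[int, ...], Set[str]] = {}

    def visit(dims: Tuple[int, ...], objs: Objects, last_removed: int) -> None:
        sky = skyline_in(objs, dims)
        results[dims] = sky
        if len(dims) == 1:
            return
        for d in dims:
            if d <= last_removed:
                continue  # removing d here would revisit a subspace from another branch
            child = tuple(i for i in dims if i != d)
            # Keep parent skyline objects and objects collapsing to one on the child subspace
            kept = {n: u for n, u in objs.items()
                    if n in sky or any(u[i] == objs[s][i] for s in sky for i in child)}
            visit(child, kept, d)

    visit(tuple(range(n_dims)), dict(objects), -1)
    return results
```

The pruning is safe: a removed object differs on every child dimension from the parent skyline object that dominates it, so that object strictly beats it in every subspace of the child.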

Computing Skylines by Sorting
Sort all objects in lexicographic ascending order
– a-d-b-e-c
Check the objects in the sorted list: an object is in the skyline if it is not dominated by any skyline object before it in the list
– {a, b, c} are skyline objects
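A minimal sketch of the sorting-based scan follows. The coordinates of a to e are hypothetical, chosen so that the lexicographic order is a-d-b-e-c and the skyline is {a, b, c}, as on the slide. The scan is correct because an object can only be dominated by objects that precede it in lexicographic order.

```python
from typing import Dict, List, Sequence


def dominates(u: Sequence[int], v: Sequence[int]) -> bool:
    """Smaller is better: u is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_by_sorting(objects: Dict[str, Sequence[int]]) -> List[str]:
    """Scan objects in lexicographic order; compare each only with skyline objects found so far."""
    sky: List[str] = []
    for name, u in sorted(objects.items(), key=lambda kv: tuple(kv[1])):
        if not any(dominates(objects[s], u) for s in sky):
            sky.append(name)
    return sky


# Hypothetical coordinates (X, Y)
objects = {"a": (1, 5), "b": (3, 2), "c": (5, 1), "d": (2, 6), "e": (4, 3)}
print(skyline_by_sorting(objects))  # ['a', 'b', 'c']
```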

Efficient Local Sorting
It is not necessary to sort for each subspace
– A sorted list in subspace (A, B, C, D) can be used in subspaces (A), (A, B) and (A, B, C)
– To generate a sorted list in subspace (B, C, D), we can use merge sort to merge the sublists for the different values on A
If a non-skyline object collapses to a skyline object, the skyline object absorbs the non-skyline object by taking the non-skyline object's id
– A non-skyline object may be absorbed by multiple skyline objects
– This recursively reduces the number of objects and shortens the sorted lists
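The merge step can be sketched with Python's `heapq.merge`: within each run of equal A values, a list sorted on (A, B, C, D) is already sorted on (B, C, D), so only the runs need merging. The function name and rows are hypothetical, and the absorption of collapsed objects is not shown.

```python
import heapq
from itertools import groupby
from typing import List, Tuple

Row = Tuple[int, int, int, int]  # (A, B, C, D)


def resort_on_bcd(sorted_on_abcd: List[Row]) -> List[Row]:
    """Turn a list sorted lexicographically on (A, B, C, D) into one sorted on (B, C, D)."""
    # Each run of equal A values is already sorted on (B, C, D); only the runs need merging.
    runs = [list(run) for _, run in groupby(sorted_on_abcd, key=lambda r: r[0])]
    return list(heapq.merge(*runs, key=lambda r: r[1:]))


rows = sorted([(2, 1, 3, 0), (1, 2, 0, 1), (1, 0, 5, 2), (2, 0, 1, 1), (3, 1, 1, 1)])
print(resort_on_bcd(rows))
# [(2, 0, 1, 1), (1, 0, 5, 2), (3, 1, 1, 1), (2, 1, 3, 0), (1, 2, 0, 1)]
```

Merging k sorted runs costs O(N log k) rather than a full O(N log N) re-sort, which is what makes sharing the sorted lists worthwhile.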

Results on Great NBA Players
17,266 records
4 attributes are selected
67 skyline records in the full space, 146 decisive subspaces

Number of Skyline Groups vs. Dimensionality
Dimensionality: the complexity of subspaces
– A 1-d subspace has only one skyline group
– A high-dimensional subspace may have many skyline groups
– The number of skyline groups tends to increase as dimensionality increases
Number of subspaces
– An n-d data set has $n$ 1-d subspaces, one n-d (sub)space, and $\binom{n}{n/2} = \frac{n!}{(n/2)!\,(n/2)!}$ subspaces of dimensionality $n/2$ (if $n$ is even)
The number of skyline groups in subspaces of dimensionality $k$ depends on the joint effect of the two factors
– When $k < n/2$, the two factors are consistent: both grow with $k$
– When $k > n/2$, the two factors are contrasting: the number of subspaces shrinks as $k$ grows
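The second factor is easy to tabulate with binomial coefficients: there are $\binom{n}{k}$ subspaces of dimensionality $k$, which peaks at $k = n/2$. The function name below is illustrative.

```python
from math import comb
from typing import Dict


def subspaces_by_dimensionality(n: int) -> Dict[int, int]:
    """Number of k-dimensional subspaces of an n-dimensional space, for k = 1..n."""
    return {k: comb(n, k) for k in range(1, n + 1)}


counts = subspaces_by_dimensionality(10)
print(counts)
# {1: 10, 2: 45, 3: 120, 4: 210, 5: 252, 6: 210, 7: 120, 8: 45, 9: 10, 10: 1}
```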

About the Synthetic Data Sets
Independent: attribute values are uniformly distributed
Correlated: if a record is good in one dimension, it is likely also good in the others
Anti-correlated: if a record is good in one dimension, it is unlikely to be good in the others
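Generators of this kind are common in skyline evaluations. The sketch below is a simplified stand-in, not the generator used for these experiments: correlated records scatter around one shared base value with small Gaussian noise (the `spread` parameter is our assumption), and anti-correlated records split a fixed budget across dimensions, so a small (good) value in one dimension forces larger values elsewhere.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def independent(n: int, d: int, rng: random.Random) -> List[Point]:
    """Each attribute drawn uniformly from [0, 1)."""
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def correlated(n: int, d: int, rng: random.Random, spread: float = 0.05) -> List[Point]:
    """All attributes scatter around one shared base value, clipped to [0, 1]."""
    points = []
    for _ in range(n):
        base = rng.random()
        points.append(tuple(min(1.0, max(0.0, base + rng.gauss(0.0, spread))) for _ in range(d)))
    return points


def anti_correlated(n: int, d: int, rng: random.Random) -> List[Point]:
    """Attributes split a budget of d/2 (then clipped at 1): small in one means larger elsewhere."""
    points = []
    for _ in range(n):
        weights = [rng.expovariate(1.0) for _ in range(d)]
        total = sum(weights)
        points.append(tuple(min(1.0, w / total * d / 2) for w in weights))
    return points


rng = random.Random(42)
sample = anti_correlated(5, 3, rng)
```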

Scalability w.r.t. Database Size
[Figure panels: independent, correlated and anti-correlated data sets]

Scalability w.r.t. Dimensionality

Conclusions
Skyline analysis is important in many applications
– The skyline objects in the full space alone may not be enough
The skyline cube is powerful for answering subspace skyline queries
– But it is also interesting to ask why an object is in the subspace skylines, and more
Skyline groups and decisive subspaces capture the semantics of subspace skylines
OLAP subspace skyline analysis
An efficient algorithm to compute skyline groups
Latest progress: an efficient algorithm to query subspace skylines (Tao et al., ICDE06)

References
J. Pei, W. Jin, M. Ester, and Y. Tao. "Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces". In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05), Trondheim, Norway, August 30-September 2, 2005.
Y. Tao, X. Xiao, and J. Pei. "SUBSKY: Efficient Computation of Skylines in Subspaces". In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), Atlanta, GA, USA, April 3-7, 2006.

Thank You!
[Photos: Vancouver, BC, Canada; Trondheim, Norway (by Gerold Jung); Hong Kong skyline on a cloudy night around Central]

Subspace Skyline Queries
Given a set of objects in a multidimensional space $D$ and a subspace $D' \subseteq D$, find the skyline objects in space $D'$
Materializing the skylines of all subspaces can be very costly if the dimensionality is high

Pruning Using Skyline Points
Suppose every dimension is normalized to the range [0, 1]
– $A_C = (1, 1, \ldots, 1)$ is called the maximal corner
$L_\infty$ distance to the maximal corner
– $f(p) = \max_{i=1}^{n} (1 - p[i])$
If $p_{sky}$ is a skyline object, then $p$ cannot be a skyline object if
– $f(p) < \min_{i=1}^{n} (1 - p_{sky}[i])$, since then $p[i] > p_{sky}[i]$ on every dimension, so $p_{sky}$ dominates $p$

Searching a Subspace
If $p_{sky}$ is a subspace skyline object in $D'$, then any object $p$ satisfying the following condition cannot be in the subspace skyline
– $f(p) < \min_{D_i \in D'} (1 - p_{sky}[i])$
A search algorithm
– Compute $f(p)$ for every object $p$ and sort the objects in descending order of $f(p)$
– Maintain the current set $S_{sky}$ of skyline objects in $D'$ and $U = \max_{p_{sky} \in S_{sky}} \min_{D_i \in D'} (1 - p_{sky}[i])$
– Scan the points in descending order of $f(p)$ until $f(p) < U$
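A single-anchor version of this search can be sketched as follows. The names are hypothetical, and the SUBSKY paper uses multiple anchors and a B+-tree, which this in-memory sketch omits. Because $f(p)$ is computed over all dimensions, a point scanned later may still dominate an earlier skyline point in $D'$, which is why dominated points are evicted, as happens to p3 in the example.

```python
from typing import Dict, List, Sequence, Set, Tuple


def dominates_in(u: Sequence[float], v: Sequence[float], dims: Sequence[int]) -> bool:
    """Dominance (smaller is better) restricted to the dimension indexes in dims."""
    return all(u[i] <= v[i] for i in dims) and any(u[i] < v[i] for i in dims)


def subspace_skyline_one_anchor(points: Dict[str, Sequence[float]], dims: Sequence[int]) -> Tuple[Set[str], int]:
    """Skyline of subspace `dims` with early termination via f(p) (values in [0, 1], smaller is better).

    Returns the skyline and the number of points actually examined.
    """
    # f(p): L-infinity distance from p to the maximal corner (1, ..., 1), over ALL dimensions
    f = {name: max(1 - x for x in p) for name, p in points.items()}
    sky: List[str] = []
    U = float("-inf")
    examined = 0
    for name in sorted(points, key=f.get, reverse=True):
        if f[name] < U:
            break  # this and every later point is dominated in dims by the point that set U
        examined += 1
        p = points[name]
        if any(dominates_in(points[s], p, dims) for s in sky):
            continue
        sky = [s for s in sky if not dominates_in(p, points[s], dims)]
        sky.append(name)
        U = max(U, min(1 - p[i] for i in dims))
    return set(sky), examined
```

Keeping $U$ as a running maximum is safe: an evicted point is dominated by its evictor in $D'$, whose bound is at least as large.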

Example
Sorted list: p3, p4, p5, p1, p6, p2, p8, p7
First skyline point p3, U = 0.5
Second skyline point p4, U = 0.5
Third skyline point p5, U = 0.5
Fourth skyline point p1: remove p3, U = 0.8
Done!

How Effective Is This Simple Idea?
A 15-d uniform data set of 100,000 points; retrieve a 2-d subspace skyline
The probability that no point exists in region $[0, \lambda] \times [0, \lambda]$ is $(1-\lambda)^{100{,}000} < 10\%$ for $\lambda = 0.001$
All points $p$ with $f(p) < 0.999$ can be pruned!
– $(1 - 0.001)^{15} \times 100\% = 98.5\%$ of the points in expectation
We can build a B+-tree/B-tree to sort all points according to $f(p)$
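The 98.5% figure follows from a short calculation, assuming independent uniform coordinates: $f(p) < 1 - \lambda$ holds exactly when every coordinate of $p$ exceeds $\lambda$, which happens with probability $(1 - \lambda)^{15}$.

```python
lam = 0.001     # side of the corner region [0, lam] x [0, lam]
n_dims = 15     # full dimensionality of the uniform data set

# f(p) < 1 - lam  <=>  every one of the 15 coordinates of p exceeds lam
pruned_fraction = (1 - lam) ** n_dims
print(f"pruning threshold f(p) < {1 - lam:.3f}, expected fraction pruned = {pruned_fraction:.1%}")
# pruning threshold f(p) < 0.999, expected fraction pruned = 98.5%
```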

Using Multiple Anchors
Critical idea: cluster the objects and find good anchors for the clusters!
Details are in our ICDE06 paper.

A Few Anchors Work Well!

I/O Efficiency for Queries

Scalability w.r.t. Dimensionality