Parallel Computation of Skyline Queries COSC6490A Fall 2007 Slawomir Kmiec.

1 Parallel Computation of Skyline Queries COSC6490A Fall 2007 Slawomir Kmiec

2 Presentation Outline
→ Skyline Concepts
→ Skyline Applications
→ Sequential Algorithms
→ The Parallel Algorithm
→ Goals and Objectives
→ Questions

3 Skyline Concepts
In a set S of points (or records), identify the points that are not dominated on a given set of attributes: no other point is at least as good on every attribute and strictly better on at least one.

Name       Rating  Avg. Price
Parthenon  5       $45.00
Olympus    4       $40.00
Coliseum   4       $30.00
Pyramid    3       $25.00
Bombay     5       $35.00
Paris      5       $40.00
Roma       4       $35.00
Palermo    3       $30.00

Point p_a is said to dominate point p_b if x_i(p_a) ≤ x_i(p_b) for all i such that 1 ≤ i ≤ d, and at least one of those inequalities is strict (smaller values are taken to be better). A point p is a skyline point if it is not dominated by any other point in S. The skyline of S is denoted sky(S).
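The dominance test and the restaurant table above can be sketched in Python as follows. This is only an illustrative sketch: the names `dominates` and `skyline` are hypothetical, and it assumes that a higher rating and a lower price are preferred, so the rating is negated to match the "smaller is better" convention of the definition.

```python
from typing import Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (rating, average price) taken from the table on the slide
restaurants = {
    "Parthenon": (5, 45.0),
    "Olympus": (4, 40.0),
    "Coliseum": (4, 30.0),
    "Pyramid": (3, 25.0),
    "Bombay": (5, 35.0),
    "Paris": (5, 40.0),
    "Roma": (4, 35.0),
    "Palermo": (3, 30.0),
}

# Assumption: higher rating is better, so negate it to make smaller = better.
vectors = {name: (-rating, price) for name, (rating, price) in restaurants.items()}
sky = skyline(list(vectors.values()))
best = sorted(name for name, v in vectors.items() if v in sky)
print(best)  # ['Bombay', 'Coliseum', 'Pyramid']
```

Under these assumptions Bombay dominates Parthenon, Paris, Olympus and Roma, while Pyramid dominates Palermo, which leaves three skyline restaurants.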

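4 Skyline Concepts (cont.)
In d-dimensional data space (i.e. with d attributes) the upper bound on the size of the skyline result is given as d!; for d = 9 that is 9! = 362,880. (In the worst case, e.g. anti-correlated data, every one of the n points can be a skyline point, so the skyline can be as large as the dataset itself.)
Real-world example of skyline analysis.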

5 Applications of Skyline Analysis
→ databases: as an SQL operator, skyline computation must be on par with the other relational operators, which are known to be computed efficiently, including in parallel or distributed environments
→ databases: table statistics, i.e. skylines can play a role in query optimization
→ cooperative query answering: it can be used in query relaxation techniques (automatic answers for queries that have no literal results) or intensional answers (a succinct overview characterization can be returned where there are too many results for a human user to handle)
→ e-commerce: ranking for large result sets when multiple attributes are involved; the best ones should be recommended
→ statistics
→ economics
→ computational geometry
→ …more papers on skyline just started appearing

6 Sequential Algorithms
→ nested-loops: O(n^2), e.g. 100k points → 10^10 comparisons
→ divide-and-conquer DC
→ R-tree based nearest-neighbor (R-tree + DC)
→ branch-and-bound BBS (R-tree)
→ extensions in different directions
→ LESS by P. Godfrey and J. Gryz
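To make the cost of the nested-loops approach concrete, the sketch below compares it with a simple sort-then-filter variant. This is not the LESS algorithm from the slide, only a minimal illustration of the sort-based idea: if q dominates p then the coordinate sum of q is strictly smaller, so after sorting by sum each point only needs to be checked against the skyline points found so far. The function names are hypothetical, and integer coordinates are assumed so that the sums are exact.

```python
import random

def dominates(a, b):
    """True if a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_nested_loops(points):
    """O(n^2) dominance tests: every point is compared with every other."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def skyline_sort_filter(points):
    """Presort by coordinate sum, then test only against the skyline so far."""
    window = []
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in window):
            window.append(p)
    return window

# Nested loops on 100k points would need on the order of n^2 dominance tests.
n = 100_000
print(n * n)  # 10000000000

random.seed(1)
pts = [tuple(random.randint(0, 50) for _ in range(3)) for _ in range(300)]
same = sorted(skyline_nested_loops(pts)) == sorted(skyline_sort_filter(pts))
```

Both functions return the same set of points; the second one usually performs far fewer dominance tests because the window holds only skyline points.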

7 Sequential Algorithms (cont.)

8 The Parallel Algorithm
Assumptions
→ dataset of size n
Environment
→ p interconnected and independent processors, each with O(n/p) memory (can be physically separate nodes)
Principles:
→ data is divided equally and distributed
→ a local skyline is computed at each peer
→ the size of each local skyline is shared with peers
→ if the combined results fit on every processor then
   → local skylines are exchanged with peers
   → processor p_i picks the i-th chunk of the combined skyline and eliminates the points in it that the combined skyline dominates
   → local results are sent to the central process
→ end // of processing
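The steps above, for the case where the combined local skylines fit in memory, can be simulated sequentially as in the sketch below. It is a single-process illustration under assumptions, not a real distributed implementation: the p processors are plain list slots, message passing is replaced by shared lists, and the names `parallel_skyline_small` and `local_skyline` are hypothetical.

```python
import random

def dominates(a, b):
    """True if a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def local_skyline(points):
    """Any sequential skyline algorithm could be plugged in here."""
    return [pt for pt in points if not any(dominates(q, pt) for q in points)]

def parallel_skyline_small(data, p):
    # Data is divided (round-robin) among p simulated processors.
    parts = [data[i::p] for i in range(p)]
    # Each peer computes its local skyline.
    local = [local_skyline(part) for part in parts]
    # Assumption: sizes were shared and the combined result fits everywhere,
    # so every peer receives all local skylines.
    combined = [pt for sky in local for pt in sky]
    chunk = -(-len(combined) // p)  # ceiling division
    results = []
    for i in range(p):
        # Processor i takes the i-th chunk and drops points that the
        # combined skyline dominates.
        mine = combined[i * chunk:(i + 1) * chunk]
        results.append([pt for pt in mine
                        if not any(dominates(q, pt) for q in combined)])
    # The central process concatenates the per-processor results.
    return [pt for part in results for pt in part]

random.seed(7)
data = [tuple(random.randint(0, 100) for _ in range(3)) for _ in range(400)]
result = parallel_skyline_small(data, p=4)
```

The result matches the sequential skyline because any point dominated by some point in a peer's partition is also dominated by a point of that peer's local skyline.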

9 The Parallel Algorithm (cont.) ?

10 Principles (continued)
→ else // combined results do not fit on some p_i
   → loop until the required number of results is available or all p_i have finished
      → each processor p_i picks a random set of points (in proportion to the size of its local skyline)
      → this set is submitted to all peers, which mark the points that they dominate; the marked points are returned to the sender
      → each processor p_i collects back the points it submitted to peers, removes the marked ones from the original set and sends the remaining ones to the central processor
   → end loop
→ end // of processing
NOTE: the choice of the local skyline algorithm is orthogonal to the operation of the parallel algorithm, and different algorithms can even be selected for any of the steps
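The round-based branch above can likewise be simulated in a single process. The sketch below is one possible reading of the slide, with several assumptions: the batch size is a fixed fraction of each local skyline (the `fraction` parameter is hypothetical), peers mark a point when some point of their own local skyline dominates it, and the stop condition is checked once per round, so slightly more than `wanted` results may be returned.

```python
import math
import random

def dominates(a, b):
    """True if a <= b in every dimension and a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def local_skyline(points):
    return [pt for pt in points if not any(dominates(q, pt) for q in points)]

def parallel_skyline_rounds(data, p, fraction=0.25, wanted=None, seed=0):
    rng = random.Random(seed)
    parts = [data[i::p] for i in range(p)]
    skylines = [local_skyline(part) for part in parts]
    pending = [list(s) for s in skylines]  # points not yet submitted to peers
    sizes = [max(1, math.ceil(fraction * len(s))) for s in skylines]
    central = []
    # Loop until enough results are available or every processor has finished.
    while any(pending) and (wanted is None or len(central) < wanted):
        for i in range(p):
            # Processor i picks a random batch proportional to its skyline.
            rng.shuffle(pending[i])
            batch, pending[i] = pending[i][:sizes[i]], pending[i][sizes[i]:]
            # Every peer marks the batch points that it dominates.
            marked = set()
            for j in range(p):
                if j != i:
                    marked.update(k for k, pt in enumerate(batch)
                                  if any(dominates(q, pt) for q in skylines[j]))
            # Unmarked points are global skyline points; send them on.
            central.extend(pt for k, pt in enumerate(batch) if k not in marked)
    return central

random.seed(3)
data = [tuple(random.randint(0, 100) for _ in range(3)) for _ in range(400)]
full = parallel_skyline_rounds(data, p=4)
partial = parallel_skyline_rounds(data, p=4, wanted=3)
```

Run to completion, the loop delivers the whole skyline; with `wanted` set it stops early, and every point delivered so far is already a confirmed skyline point, which is what allows results to be returned progressively.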

11 The Parallel Algorithm (cont.) ?

12 Goals and Objectives
Can generic, reusable higher-level operations be developed that could be used in other parallel computations?
→ all-to-all messaging
→ all-peer result consolidation
→ 3-threaded processors
→ transmission of large datasets
→ process state maintenance and synchronization
Can a template design pattern be generalized for similar divide-distribute-and-conquer parallel computations?
Can the count of dominated points be incorporated in the result?
Can idle time on processors be utilized to assist peers or to do work-ahead or speculative preprocessing?

13 Questions???

