Parallel Computation of Skyline Queries COSC6490A Fall 2007 Slawomir Kmiec.

Presentation Outline
→ Skyline Concepts
→ Skyline Applications
→ Sequential Algorithms
→ The Parallel Algorithm
→ Goals and Objectives
→ Questions

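Skyline Concepts
In a set of points (or records), identify the points that are not worse than any other point with respect to a given set of attributes, i.e. points for which no other point is at least as good in every attribute and strictly better in at least one. In the example below, a higher rating and a lower average price are both taken to be better:

Name        Rating   Avg. Price (USD)
Parthenon   5        45.00
Olympus     4        40.00
Coliseum    4        30.00
Pyramid     3        25.00
Bombay      5        35.00
Paris       5        40.00
Roma        4        35.00
Palermo     3        30.00

Let $x_i(p)$ denote the value of the $i$-th attribute of point $p$, with smaller values preferred (a "larger is better" attribute such as Rating can be negated). Point $p_a$ is said to dominate point $p_b$ if $x_i(p_a) \le x_i(p_b)$ for all $i$ such that $1 \le i \le d$, and at least one of those inequalities is strict. A point $p$ is a skyline point if it is not dominated by any other point in $S$. The skyline of $S$ is denoted $\mathrm{sky}(S)$.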
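The dominance test and the skyline definition can be checked against the example table with a short brute-force sketch. It assumes that smaller values are preferred in every dimension (so Rating is negated before comparison) and that a higher rating and a lower price are what make an entry "better"; the names `dominates` and `skyline` are illustrative and do not come from the slides.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b, assuming smaller is better in every dimension:
    a is <= b everywhere and strictly < b in at least one dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Brute-force skyline: names of points that no other point dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


# Example table from the slide as (rating, average price in USD).
hotels = {
    "Parthenon": (5, 45.00), "Olympus": (4, 40.00), "Coliseum": (4, 30.00),
    "Pyramid": (3, 25.00), "Bombay": (5, 35.00), "Paris": (5, 40.00),
    "Roma": (4, 35.00), "Palermo": (3, 30.00),
}
# Assumption: higher rating is better, so negate it to make "smaller is better" hold.
minimized = {name: (-rating, price) for name, (rating, price) in hotels.items()}
print(skyline(minimized))  # ['Coliseum', 'Pyramid', 'Bombay']
```

Under these assumptions the skyline is Bombay, Coliseum and Pyramid; every other entry is beaten on one attribute and matched or beaten on the other (e.g. Bombay dominates Paris and Roma, and Coliseum dominates Palermo).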

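Skyline Concepts (cont.)
In d-dimensional data space (i.e. d attributes) the skyline can be large: in the worst case (e.g. strongly anti-correlated attributes) no point dominates any other, so all n points belong to the skyline, and for independently distributed attributes the expected skyline size tends to grow as d increases.
(Figure: real-world example of skyline analysis.)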

Applications of Skyline Analysis
→ databases: as an SQL operator, skyline calculation must be on par with the other relational operators, which are known to be computed efficiently, also in parallel or distributed environments
→ databases: table statistics, i.e. skylines can play a role in query optimization
→ cooperative query answering: skylines can be used in query relaxation techniques (automatic answers for queries that have no literal results) or for intensional answers (a succinct overview characterization returned when there are too many results for a human user to handle)
→ e-commerce: ranking for large result sets when multiple attributes are involved, so that the best items can be recommended
→ statistics
→ economics
→ computational geometry
→ …more papers on skyline queries have only just started appearing

Sequential Algorithms
→ nested-loops: $O(n^2)$, 100k
→ divide-and-conquer (DC)
→ R-tree based nearest-neighbor (R-tree + DC)
→ branch-and-bound, BBS (R-tree)
→ extensions in different directions
→ LESS by P. Godfrey and J. Gryz
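To make the divide-and-conquer (DC) entry concrete, here is a simplified sketch under stated assumptions: it splits the input by position rather than by the median of one dimension as the classic DC algorithm does, uses the $O(n^2)$ nested-loops check as its base case, and merges two partial skylines by discarding points that the other half's skyline dominates. The function names and the `cutoff` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Minimization dominance: a <= b in every dimension, a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nested_loops_skyline(points: List[Point]) -> List[Point]:
    """O(n^2) baseline: keep every point that no point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dc_skyline(points: List[Point], cutoff: int = 8) -> List[Point]:
    """Simplified divide-and-conquer skyline (positional split, not median split)."""
    if len(points) <= cutoff:
        return nested_loops_skyline(points)
    mid = len(points) // 2
    left = dc_skyline(points[:mid], cutoff)   # skyline of the first half
    right = dc_skyline(points[mid:], cutoff)  # skyline of the second half
    # Merge: a point survives only if the other half's skyline does not dominate it.
    merged = [p for p in left if not any(dominates(q, p) for q in right)]
    merged += [q for q in right if not any(dominates(p, q) for p in left)]
    return merged


rng = random.Random(42)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
print(sorted(dc_skyline(data)) == sorted(nested_loops_skyline(data)))  # True
```

The merge only needs to compare against the other half's skyline because dominance is transitive: a point dominated by any point of the other half is also dominated by some skyline point of that half.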

Sequential Algorithms (cont.)

The Parallel Algorithm
Assumptions
→ dataset of size n
Environment
→ p interconnected and independent processors, each with $O(n/p)$ memory (they can be physically separate nodes)
Principles
→ the data is divided equally and distributed among the processors
→ the local skyline is computed at each peer
→ the size of the local skyline is shared with the peers
→ if the combined results fit on every processor then
→ local skylines are exchanged with the peers
→ processor $p_i$ picks the $i$-th chunk of the combined skyline and eliminates the points in it that the combined skyline dominates
→ local results are sent to the central process
→ end // of processing
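The branch in which the combined skyline fits on every processor can be simulated sequentially to check its logic. In this hypothetical sketch each list entry stands in for one processor $p_i$, the data is dealt out round-robin (an assumption; the slides only say it is divided equally), the size-sharing step that chooses the branch is omitted, and no real messaging takes place; `parallel_skyline_fit_case` and `local_skyline` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Minimization dominance: a <= b in every dimension, a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: List[Point]) -> List[Point]:
    """Any sequential algorithm works here; nested loops keeps the sketch short."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_skyline_fit_case(data: List[Point], p: int) -> List[Point]:
    """Sequential simulation of the branch where the combined skyline fits."""
    # Divide the data equally (round-robin here) among p simulated processors.
    partitions = [data[i::p] for i in range(p)]
    # Each processor computes its local skyline.
    local = [local_skyline(part) for part in partitions]
    # After the exchange, every processor holds the combined skyline.
    combined = [pt for sky in local for pt in sky]
    chunk = -(-len(combined) // p)  # ceiling division so the chunks cover everything
    results: List[Point] = []
    for i in range(p):
        # Processor i keeps the points of its chunk that the combined skyline
        # does not dominate, then sends them to the central process.
        mine = combined[i * chunk:(i + 1) * chunk]
        results.extend(pt for pt in mine if not any(dominates(q, pt) for q in combined))
    return results


rng = random.Random(7)
data = [(rng.random(), rng.random()) for _ in range(200)]
print(sorted(parallel_skyline_fit_case(data, p=4)) == sorted(local_skyline(data)))  # True
```

Filtering each chunk against the combined skyline is sufficient because every global skyline point lies in some local skyline, and any other point of the combined skyline is dominated by at least one point that is also in it.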

The Parallel Algorithm (cont.)

Principles (continued)
→ else // the combined results do not fit on some $p_i$
→ loop until the required number of results is available or all $p_i$ have finished:
→ each processor $p_i$ picks a random set of its local skyline points, sized in proportion to its local skyline
→ this set is submitted to all peers, which mark the points they dominate and return the marked points to the sender
→ each processor $p_i$ collects back the points it submitted to its peers, removes the marked ones from its original set, and sends the remaining ones to the central processor
→ end loop
→ end // of processing
NOTE: the choice of the local skyline algorithm is orthogonal to the operation of the parallel algorithm, and different algorithms can even be selected for different steps.
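The sampling loop of the else-branch can be simulated the same way. This is an assumption-laden illustration, not the author's implementation: each processor samples a batch of size `fraction` times its local skyline size from its not-yet-decided points, peers mark batch points dominated by their full local skylines, and unmarked points go to a list standing in for the central processor. Rounds run sequentially rather than concurrently, and `fraction`, `required`, `seed` and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Minimization dominance: a <= b in every dimension, a < b in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: List[Point]) -> List[Point]:
    """Nested-loops local skyline; any sequential algorithm could be used."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_skyline_sampling(data: List[Point], p: int, fraction: float = 0.25,
                              required: Optional[int] = None, seed: int = 0) -> List[Point]:
    """Sequential simulation of the sampling loop used when the combined skyline
    does not fit on some processor."""
    rng = random.Random(seed)
    partitions = [data[i::p] for i in range(p)]
    local = [local_skyline(part) for part in partitions]  # full local skylines, used for marking
    pending = [list(sky) for sky in local]                # local skyline points not yet decided
    central: List[Point] = []                             # stands in for the central processor
    # Stop condition is checked once per round: enough results, or all processors finished.
    while any(pending) and (required is None or len(central) < required):
        for i in range(p):
            if not pending[i]:
                continue  # processor i has finished
            # Batch size is proportional to the size of processor i's local skyline.
            k = min(len(pending[i]), max(1, int(fraction * len(local[i]))))
            batch = rng.sample(pending[i], k)
            # Every peer marks the batch points that its local skyline dominates.
            marked = {idx for j in range(p) if j != i
                      for idx, pt in enumerate(batch)
                      if any(dominates(q, pt) for q in local[j])}
            # Marked points are dropped; the rest are forwarded to the central process.
            central.extend(pt for idx, pt in enumerate(batch) if idx not in marked)
            for pt in batch:
                pending[i].remove(pt)
    return central


rng = random.Random(3)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
print(sorted(parallel_skyline_sampling(data, p=4)) == sorted(local_skyline(data)))  # True
```

Peers mark against their complete local skylines rather than their shrinking undecided sets, so a point is never forwarded merely because the peer point that dominates it was already processed. Because the `required` check runs once per round, the central list may slightly overshoot it.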

The Parallel Algorithm (cont.)

Goals and Objectives
→ Can generic, reusable higher-level operations be developed that could be used in other parallel computations (e.g. all-to-all messaging, all-peer result consolidation, 3-threaded processors, transmission of large datasets, process state maintenance and synchronization)?
→ Can a template design pattern be generalized for similar divide-distribute-and-conquer parallel computations?
→ Can the count of dominated points be incorporated in the result?
→ Can idle time on processors be utilized to assist peers or to do work-ahead or speculative preprocessing?

Questions?