VLDB 2005
Revisiting Pipelined Parallelism in Multi-Join Query Processing
Bin Liu and Elke A. Rundensteiner
Worcester Polytechnic Institute


Multi-Join Queries
Data integration over distributed data sources
- i.e., Extract Transform Load (ETL) services
[Figure: data sources feeding data warehouses, with persistent storage]
Persistent storage:
(1) High I/O costs given large intermediate results
(2) Disk access undesirable since this is a one-time process

Applying Parallelism
Processed in main memory of a PC cluster
- Make use of aggregated resources (main memory, CPU)
[Figure: cluster of machines connected by a network]

Three Types of Parallelism
Pipelined: operators are composed into a producer-consumer relationship
Independent: independent operators run simultaneously on distinct machines
Partitioned: a single operator is replicated and run on multiple machines

Basics of Hash Join
Two-phase hash join [SD89, LTS90]
- Demonstrated high performance
- Potential for a high degree of parallelism
(1) Build hash tables of Orders based on ID
(2) Probe hash tables and output results
[Figure: the Orders table (ID, Date, …) is built into a key/value hash table; the LineItems table (OID, Item, …) probes it]
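The two phases above can be sketched in a few lines of Python. This is only an illustrative, single-machine sketch, not the authors' implementation: the function name `hash_join`, the tuple layouts, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def hash_join(orders, line_items):
    """Two-phase hash join of Orders and LineItems on the order ID.

    orders:     iterable of (order_id, date) tuples      (hypothetical layout)
    line_items: iterable of (order_id, item) tuples      (hypothetical layout)
    """
    # Phase 1 (build): hash every Orders tuple on its ID.
    table = defaultdict(list)
    for order_id, date in orders:
        table[order_id].append(date)

    # Phase 2 (probe): look up each LineItems tuple and emit the matches.
    results = []
    for order_id, item in line_items:
        for date in table.get(order_id, []):
            results.append((order_id, date, item))
    return results


# Made-up sample rows, for illustration only.
orders = [("0011", "5/1"), ("0012", "5/2")]
line_items = [("0011", "IPC"), ("0012", "HPS"), ("0013", "XYZ")]
joined = hash_join(orders, line_items)
```

The line item with ID "0013" finds no entry in the hash table and is dropped, which is the usual inner-join behavior.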

Partitioned Hash Join
- Partition (input) hash tables across processors
- Have each processing node run in parallel
(1) Build hash tables of Orders based on ID
(2) Probe hash tables and output results
[Figure: Orders is split into several key/value hash tables; LineItems probes them]
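A minimal sketch of the partitioned variant follows. It simulates the processing nodes with a sequential loop, so it shows only how tuples are routed, not real parallel execution. The helper names (`partition`, `local_join`, `partitioned_hash_join`), the modulo-hash split, and the integer keys are assumptions for illustration.

```python
from collections import defaultdict


def partition(tuples, num_nodes):
    """Split tuples across nodes by hashing the join key (the first field)."""
    parts = [[] for _ in range(num_nodes)]
    for tup in tuples:
        parts[hash(tup[0]) % num_nodes].append(tup)
    return parts


def local_join(build_part, probe_part):
    """Build and probe on one node, using only that node's partitions."""
    table = defaultdict(list)
    for key, value in build_part:
        table[key].append(value)
    return [(key, b, p) for key, p in probe_part for b in table.get(key, [])]


def partitioned_hash_join(orders, line_items, num_nodes=3):
    build_parts = partition(orders, num_nodes)
    probe_parts = partition(line_items, num_nodes)
    results = []
    # Each iteration stands in for one node; a real system would run them
    # concurrently on separate machines.
    for node in range(num_nodes):
        results.extend(local_join(build_parts[node], probe_parts[node]))
    return results


# Made-up sample rows with integer keys, for illustration only.
orders = [(11, "5/1"), (12, "5/2")]
line_items = [(11, "IPC"), (12, "HPS"), (13, "XYZ")]
joined = sorted(partitioned_hash_join(orders, line_items))
```

Because both inputs are split with the same hash function, matching tuples always land on the same node, so no node needs to see another node's hash table.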

Left-Deep Tree [SD90]
[Figure: example join graph over R_1 … R_9, and the left-deep query tree with build/probe pairs B_1/P_1, B_2/P_2, …, B_7/P_7, B_8/P_8]
Steps:
(1) Scan R_1 – Build B_1
(2) Scan R_2 – Probe P_1 – Build B_2
(3) Scan R_3 – Probe P_2 – Build B_3
…
(8) Scan R_8 – Probe P_7 – Build B_8
(9) Scan R_9 – Probe P_8 – Output

Right-Deep Tree [SD90]
[Figure: example join graph over R_1 … R_9, and the right-deep query tree with build/probe pairs B_1/P_1, B_2/P_2, …, B_7/P_7, B_8/P_8]
Steps:
(1) Scan R_2 – Build B_1, Scan R_3 – Build B_2, …, Scan R_9 – Build B_8
(2) Scan R_1 – Probe P_1 – Probe P_2 – … – Probe P_8
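The difference between the two execution orders can be made concrete with a small Python sketch. It makes a strong simplifying assumption that is not in the slides: every relation joins on the same key, stored as the first field of each tuple. The function names and sample relations are hypothetical.

```python
from collections import defaultdict


def build(tuples):
    """Build a hash table keyed on the join key (first field of each tuple)."""
    table = defaultdict(list)
    for tup in tuples:
        table[tup[0]].append(tup)
    return table


def left_deep(relations):
    """Join step by step; each intermediate result is materialized and rebuilt."""
    intermediate = list(relations[0])
    for rel in relations[1:]:
        table = build(intermediate)  # only one hash table is live at a time
        intermediate = [m + t[1:] for t in rel for m in table.get(t[0], [])]
    return intermediate


def right_deep(relations):
    """Build all hash tables first, then stream the probing relation through."""
    tables = [build(rel) for rel in relations[1:]]  # all tables resident at once
    output = []
    for tup in relations[0]:
        stream = [tup]  # intermediate results exist only as a stream
        for table in tables:
            stream = [s + m[1:] for s in stream for m in table.get(s[0], [])]
        output.extend(stream)
    return output


# Made-up relations of (key, payload) tuples, for illustration only.
r1 = [(1, "a"), (2, "b")]
r2 = [(1, "x"), (2, "y"), (3, "z")]
r3 = [(1, "p"), (1, "q")]
left_result = sorted(left_deep([r1, r2, r3]))
right_result = sorted(right_deep([r1, r2, r3]))
```

Both plans return the same tuples. They differ in what sits in memory: `left_deep` holds one hash table at a time but materializes every intermediate result, while `right_deep` keeps every building relation's table resident and never materializes an intermediate result.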

Tradeoffs Between Left and Right Trees
Right-deep:
- Good potential for pipelined parallelism
- Intermediate results exist only as a stream
- Size of building relations can be predicted accurately
- Large memory consumption
Left-deep:
- Less memory consumption
- Less pipelined parallelism

State-of-the-Art Solutions
Implicit assumption: prefer maximal pipelined parallelism!
[Figure: right-deep tree over R_1 … R_9 with build/probe pairs B_1/P_1 … B_8/P_8]

State-of-the-Art Solutions
What if: memory-constrained environments?
Strategy: break the tree into several pieces, and process one piece at a time (as a pipeline)
I.e., Static Right-Deep [SD90], ZigZag [ZZBS93], Segmented Right-Deep [CLYY92]
[Figure: the right-deep tree over R_1 … R_9 shown twice, the second time broken into pieces that are each processed as a pipeline]

Pipelined Execution
Optimal degree of parallelism? I.e., it may not be necessary to partition R_2 over a large number of machines if it only has 1000 tuples.
Redirection cost: the intermediate results generated may need to be partitioned to a different machine.
[Figure: R_1 … R_4 partitioned across the computation machines for building and probing; a tuple t is redirected between partitions]

Pipelined Cost Model
Compute an n-way join over k machines
Probing relation R_0; building relations R_1, R_2, …, R_n
I_i represents the intermediate results after joining with R_i
Total work (W_b + W_p) and total processing time (T_b + T_p)

Break Pipelined Parallelism
To break a long pipeline and introduce independent parallelism
- Large number of small pipelines
- High interdependence between pipelined segments, i.e., P_1 > P_2, P_3 > P_4, P_2 > P_4
[Figure: a long pipeline over R_0 … R_9 broken into pipelined segments P_1, P_2, P_3, P_4]

Segmented Bushy Tree
To balance pipelined and independent parallelism
Basic idea:
- Compose large pipelined segments
- Run pipelined segments independently
- Compose a bushy tree with minimal interdependency
[Figure: join graph over R_0 … R_9 and the resulting segmented bushy tree with pipelines P_1, P_2, P_3 and intermediate results I_1, I_2]

Cost-Based Heuristics: Composing the Segmented Tree
Input: a connected join graph G with n nodes; a number m that specifies the maximum number of nodes in each subgraph.
Output: a segmented bushy tree with at least n/m subtrees.

    completed = false;
    WHILE (!completed) {
        Choose the not-yet-grouped node V with the largest cardinality as the probing relation;
        Enumerate all subgraphs starting from V with at most m nodes;
        Choose the best subgraph, and mark the nodes in this group as selected in the original join graph;
        IF (no connected subgraph K of G exists that consists of unselected nodes and has K.size() >= 2) {
            completed = true;
        }
    }
    Compose the segmented bushy tree from all groups;
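The grouping loop above can be sketched in Python as follows. The transcript does not say how the "best" subgraph is scored, so the ranking used here (prefer the largest group, then the smallest total cardinality of its building relations) is purely an assumption standing in for the cost model. The graph encoding, function names, and sample data are likewise hypothetical, and composing the final bushy tree from the groups is left out.

```python
def connected_subgraphs(graph, start, allowed, max_nodes):
    """Enumerate connected node sets that contain start, stay inside
    allowed, and have at most max_nodes nodes."""
    found = set()
    stack = [frozenset([start])]
    while stack:
        group = stack.pop()
        if group in found:
            continue
        found.add(group)
        if len(group) == max_nodes:
            continue
        for node in group:
            for nb in graph[node]:
                if nb in allowed and nb not in group:
                    stack.append(group | {nb})
    return found


def has_unselected_edge(graph, unselected):
    """True if some connected subgraph of unselected nodes has size >= 2."""
    return any(nb in unselected for node in unselected for nb in graph[node])


def segment(graph, cardinality, m):
    """Group the join graph into pipelined segments of at most m nodes.

    Returns (groups, leftover): groups is a list of (probing relation, members).
    """
    unselected = set(graph)
    groups = []
    while has_unselected_edge(graph, unselected):
        # Largest unselected relation becomes the probing relation.
        v = max(unselected, key=lambda n: (cardinality[n], n))
        candidates = connected_subgraphs(graph, v, unselected, m)

        def score(group):
            # ASSUMED ranking, not the paper's cost model.
            build_work = sum(cardinality[n] for n in group if n != v)
            return (-len(group), build_work, sorted(group))

        best = min(candidates, key=score)
        groups.append((v, sorted(best)))
        unselected -= best
    return groups, sorted(unselected)


# Made-up chain join graph a - b - c - d - e, for illustration only.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
cardinality = {"a": 100, "b": 10, "c": 50, "d": 5, "e": 1}
groups, leftover = segment(graph, cardinality, m=3)
```

With m = 3 the chain is cut into one three-node segment probed by `a` and one two-node segment probed by `d`. Nodes that end up isolated are returned in `leftover` rather than forced into a group.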

Example
[Figure: join graph over R_0 … R_9, shown first ungrouped, then with group G_1, then with groups G_1 and G_2]
Subgraphs enumerated from R_7: (1) R_7, R_8, R_9, R_6 (2) R_7, R_9, R_6, R_8 (3) R_7, R_4, R_8, R_5 ...
Subgraphs enumerated from R_1: (1) R_1, R_0, R_2, R_3 (2) R_1, R_2, R_0, R_3 (3) R_1, R_2, R_3, R_4 ...

Example: Segmented Bushy Tree
[Figure: join graph over R_0 … R_9 grouped into G_1, G_2, G_3, and the resulting segmented bushy tree with intermediate results I_1 and I_2]

Machine Allocation
Based on the building relation sizes of each segment
- N_b: total amount of building work
- k_i: number of machines allocated to pipeline i
[Figure: segmented bushy tree over R_0 … R_9 with k_1, k_2, k_3 machines assigned to its pipelines; the formula for N_b is not preserved in the transcript]
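Since the allocation formula did not survive in the transcript, the following Python sketch shows only one plausible reading of "based on building relation sizes": give every pipeline one machine, then hand out the rest greedily so that the machine counts end up roughly proportional to each pipeline's building work. The proportional rule, the function name `allocate_machines`, and the sample sizes are assumptions, not the authors' formula.

```python
def allocate_machines(build_sizes, k):
    """Split k machines across pipelines, roughly in proportion to the
    building work of each pipeline (ASSUMED rule, see lead-in)."""
    p = len(build_sizes)
    if k < p:
        raise ValueError("need at least one machine per pipeline")
    alloc = [1] * p  # every pipeline gets at least one machine
    for _ in range(k - p):
        # Give the next machine to the pipeline with the most building
        # work per machine it already holds.
        i = max(range(p), key=lambda j: build_sizes[j] / alloc[j])
        alloc[i] += 1
    return alloc


# Made-up building work per pipeline; their sum plays the role of N_b here.
build_sizes = [600, 300, 100]
k_alloc = allocate_machines(build_sizes, k=10)
```

For these sample sizes the ten machines are split 6, 3 and 1, matching the 60/30/10 share of building work.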

Insufficient Main Memory
Break the query based on main memory availability
Compose a segmented bushy tree for each part
[Figure: join graph over R_0 … R_19]

Experimental Setup
10-machine cluster
- Each machine has two 2.4 GHz Xeon CPUs and 2 GB of memory
- Connected by a gigabit Ethernet switch
[Figure: Oracle 8i, controller, application, and the machine cluster; the labeled machine specs are two PIII 800 MHz PCs with 256 MB of memory each, and one machine with two PIII 1 GHz CPUs and 1 GB of memory]

Experimental Setup (cont.)
Generated data set with integer join values
- Around 40 bytes per tuple
Randomly generated join queries
- Acyclic join graph with 8, 12, 16 nodes
- Each node represents one join relation
- Each edge represents one join condition
- Average join ratio is 1
- Cardinality of each relation ranges from 1K to 100K
- Up to 600 MB per query

Pipelined vs. Segmented (I)

Pipelined vs. Segmented (II)

Insufficient Main Memory

Related Work
[SD90] Tradeoffs in processing complex join queries via hashing in multiprocessor database machines. VLDB, 1990.
[CLYY92] Using segmented right deep trees for execution of pipelined hash joins. VLDB, 1992.
[MLD94] Parallel hash based join algorithms for a shared everything environment. TKDE, 1994.
[MD97] Data placement in shared nothing parallel database systems. VLDB, 1997.
[WFA95] Parallel evaluation of multi-join queries. SIGMOD, 1995.
[HCY94] On parallel execution of multiple pipelined hash joins. SIGMOD, 1994.
[DNSS92] Practical skew handling in parallel joins. VLDB, 1992.
[SHCF03] Flux: an adaptive partitioning operator for continuous query systems. ICDE, 2003.

Conclusions
Observation: maximal pipelined hash join processing
- Redirection costs? Optimal degree of parallelism?
Hypothesis: worthwhile to incorporate independent parallelism into processing
- Both, so several shorter pipelines in parallel
Solution: segmented bushy tree processing
- Heuristics and cost-driven algorithm developed
Validation: extensive experimental studies
- Achieve around 50% improvement over pure pipelined processing