Published by Erika Cain. Modified over 9 years ago.
1
From Theory to Practice: Efficient Join Query Processing in a Parallel Database System
Shumo Chu, Magdalena Balazinska and Dan Suciu
Database Group, CSE, University of Washington
2
Motivation
In industry and science, users need to analyze large datasets.
Myria: parallel DBMS developed at UW.
New class of queries:
  Knowledge base exploration
  Social network analysis: find all triangles
Two key differences:
  Multiple tables need to be joined.
  Query structure may be cyclic.
3
Traditional Parallel Join Evaluation
Query: A ⋈ B ⋈ C, evaluated on three workers.
Solution 1: shuffle on the join attributes.
  Shuffle A and B on y, then compute A ⋈ B on each worker.
  Shuffle A ⋈ B and C on (x, z), then compute A ⋈ B ⋈ C on each worker.
  Problems: large intermediate result; skew on shuffle.
Solution 2: keep the largest table in place and broadcast the others.
4
Background: HyperCube (Shares) Shuffle
Query: T(x, y, z) :- A(x, y), B(y, z), C(z, x)
The P workers form a P^(1/3) x P^(1/3) x P^(1/3) cube with one dimension per variable (x, y, z). Each tuple is hashed on the variables it contains and replicated along the dimension it lacks (written * below).
  A(x1, y1) goes to (h1(x1), h2(y1), *): P^(1/3) replication
  B(y1, z1) goes to (*, h2(y1), h3(z1)): P^(1/3) replication
  C(z1, x1) goes to (h1(x1), *, h3(z1)): P^(1/3) replication
References: Afrati and Ullman, EDBT10; Beame et al., PODS13
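The routing rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Myria's implementation: the function names `h` and `cells_for` are hypothetical, and a plain modulo stands in for the hash functions h1, h2, h3 so that the example is deterministic.

```python
def h(value, buckets):
    # Stand-in hash function (assumption): modulo keeps the example deterministic.
    return value % buckets


def cells_for(relation, tup, shares):
    """Return the hypercube cells (one per worker) that must receive `tup`.

    `shares` is (p1, p2, p3): the number of buckets along x, y and z.
    A tuple is replicated along the dimension of the variable it lacks.
    """
    p1, p2, p3 = shares
    if relation == "A":  # A(x, y): z is free
        x, y = tup
        return [(h(x, p1), h(y, p2), k) for k in range(p3)]
    if relation == "B":  # B(y, z): x is free
        y, z = tup
        return [(i, h(y, p2), h(z, p3)) for i in range(p1)]
    if relation == "C":  # C(z, x): y is free
        z, x = tup
        return [(h(x, p1), j, h(z, p3)) for j in range(p2)]
    raise ValueError(f"unknown relation {relation!r}")


shares = (4, 4, 4)  # 64 workers arranged as a 4 x 4 x 4 cube
a_cells = cells_for("A", (1, 2), shares)  # A(x=1, y=2)
b_cells = cells_for("B", (2, 3), shares)  # B(y=2, z=3)
c_cells = cells_for("C", (3, 1), shares)  # C(z=3, x=1)
# The three tuples of one triangle meet in exactly one cell.
meeting_cells = set(a_cells) & set(b_cells) & set(c_cells)
```

Each tuple is sent to 4 = 64^(1/3) cells, and the three tuples that form a triangle share a single cell, so every worker can compute its part of the join locally after one shuffle.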
5
Single Node Multiway Join
Join algorithms with optimality guarantees:
  Leapfrog TrieJoin by Veldhuizen, 2014
  Minesweeper by Ngo et al., 2014
Pipeline of joins; single multiway join
Tributary Join: Leapfrog TrieJoin in Myria
  A multiway sort-merge join on steroids
  Avoids constructing tries, unlike Leapfrog TrieJoin
Example: T(x, y, z) :- A(x, y), B(y, z), C(z, x)
  A (x, y): (2,0) (2,1) (2,3) (3,4) (4,2) (5,6)
  B (y, z): (0,1) (2,0) (2,3) (3,4) (4,2) (5,6)
  C (x, z): (0,2) (1,0) (2,4) (3,2) (4,3) (6,5)
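A rough sketch of the idea behind a multiway sort-merge join on the triangle query follows. It is not the Tributary join code from Myria: it assumes the variable order x, y, z, keeps each relation as a sorted list of pairs, and uses binary search in place of tries. The helper names are hypothetical, and the sample data reads the slide's two-digit entries as pairs, with C stored in (x, z) column order as on the slide.

```python
from bisect import bisect_left


def intersect_sorted(a, b):
    """Intersect two sorted lists of distinct values, leaping ahead with binary search."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i + 1)  # seek a forward to the first value >= b[j]
        else:
            j = bisect_left(b, a[i], j + 1)  # seek b forward to the first value >= a[i]
    return out


def first_values(rel):
    """Distinct first-column values of a relation, in sorted order."""
    return sorted({t[0] for t in rel})


def second_values(rel, key):
    """Second-column values of the tuples in sorted `rel` whose first column is `key`."""
    pos = bisect_left(rel, (key,))  # (key,) sorts before every (key, anything)
    out = []
    while pos < len(rel) and rel[pos][0] == key:
        out.append(rel[pos][1])
        pos += 1
    return out


def triangles(A, B, C):
    """T(x, y, z) for A(x, y), B(y, z) and C stored as (x, z) pairs."""
    A, B, C = sorted(set(A)), sorted(set(B)), sorted(set(C))
    out = []
    for x in intersect_sorted(first_values(A), first_values(C)):
        for y in intersect_sorted(second_values(A, x), first_values(B)):
            for z in intersect_sorted(second_values(B, y), second_values(C, x)):
                out.append((x, y, z))
    return out


A = [(2, 0), (2, 1), (2, 3), (3, 4), (4, 2), (5, 6)]
B = [(0, 1), (2, 0), (2, 3), (3, 4), (4, 2), (5, 6)]
C = [(0, 2), (1, 0), (2, 4), (3, 2), (4, 3), (6, 5)]
result = triangles(A, B, C)  # [(2, 3, 4), (3, 4, 2), (4, 2, 3)]
```

No intermediate A ⋈ B result is materialized: each variable is bound by intersecting sorted runs, which is what distinguishes a single multiway join from a pipeline of binary joins.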
6
Questions
Empirical study of HyperCube shuffle and Tributary join
HyperCube configuration optimization
Tributary join cost model and attribute order optimization
7
Empirical Study
Myria deployment with 64 workers.
Shuffle paradigms:
  Regular shuffles
  HyperCube shuffle
  Broadcast
Local join algorithms:
  Symmetric hash join
  Tributary join
Parallel semi-join
Evaluate 8 queries on the Twitter social graph and Freebase.
8
Triangle Query on Twitter
Query: T(x, y, z) :- A(x, y), B(y, z), C(z, x)
Dataset: sampled Twitter social network graph with 1 million edges, schema (follower:int, followee:int)
9
Triangle Query: Data Shuffling
T(x, y, z) :- A(x, y), B(y, z), C(z, x)
Regular shuffle (54M tuples total):
  A: 1M tuples, skew 1.35
  B: 1M tuples, skew 1.72
  A ⋈ B: 51M tuples, skew 20.8
  C: 1M tuples, skew 1.01
HyperCube shuffle (12M tuples total):
  A: 4M tuples, skew 1.06
  B: 4M tuples, skew 1.06
  C: 4M tuples, skew 1.06
Broadcast: 142M tuples, no skew
10
Triangle Query: Runtime
T(x, y, z) :- A(x, y), B(y, z), C(z, x)
Figure: query runtime (sec) for HyperCube, Broadcast and Regular shuffles.
Shuffle paradigm: HyperCube < Broadcast < Regular
Sequential join: Tributary join < Hash join
11
Query 2: Knowledge Base Exploration Query
Query: show the full cast of all films starring both Joe Pesci and Robert de Niro.
Dataset: Freebase RDF; the data is partitioned into separate tables by predicate.
CastMember(cast) :-
  ActorName(a1, “Joe Pesci”), ActorPerform(a1, p1), PerformFilm(p1, film),
  ActorName(a2, “Robert de Niro”), ActorPerform(a2, p2), PerformFilm(p2, film),
  PerformFilm(p, film), ActorPerform(p, cast)
12
Freebase Query: Data Shuffling
Regular shuffles: 7M tuples
HyperCube shuffle: 105M tuples (16x replication)
Broadcast: 351M tuples (50x replication)
Figure: regular-shuffle plan for the 8-way join on Freebase, drawn as a tree of binary joins over the input relations. Tuple counts shown in the figure, in extraction order: 26, 1.09M, 1.10M, 2, 1.09M, 1.10M, 660, 25.2K, 140, 10.3K.
13
Knowledge Exploration in Freebase
Figure: query runtime (sec) for the 8-way join on Freebase.
Comparing shuffle paradigms: Regular < HyperCube < Broadcast
Comparing sequential join algorithms: Hash join < Tributary join
14
Empirical Study Summary
The best query plan depends on the query, the data and the cluster:
  Size of the intermediate result
  Replication factor of HyperCube
Large intermediate results favor HyperCube and Tributary join.
Small communication; small input; reducing sorting time
15
Optimizing HyperCube Shuffle
Optimization goal: minimize the maximum load of a single worker.
Example: for Q1 with 64 workers, 4x4x4 is better than 2x4x8.
What if we have 63 workers or a 7-way join?
State of the art: linear programming (BeameKS, PODS13)
  If |A| = |B| = |C| = N with 63 servers, the optimum is 3.98 x 3.98 x 3.98.
The penalty of rounding down is non-negligible:
  3x3x3 uses only 27 of the 63 servers.
16
A Simple Yet Effective Algorithm for HyperCube Configuration
Algorithm:
  1. Enumerate all HyperCube configurations that use at most P servers.
  2. Pick the configuration with the minimal shuffle cost.
Tie-breaking heuristic: 1x16 vs 4x4
Best configuration for the previous example: 3x4x5
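The two-step enumeration above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the cost of a configuration is taken to be the expected number of tuples one worker receives under skew-free hashing, the names `configs`, `load` and `best_config` are hypothetical, and the slide's tie-breaking heuristic is not reproduced (ties go to the first configuration enumerated).

```python
from fractions import Fraction


def configs(num_dims, P):
    """Yield every tuple of positive integer shares whose product is at most P."""
    if num_dims == 0:
        yield ()
        return
    for first in range(1, P + 1):
        for rest in configs(num_dims - 1, P // first):
            yield (first,) + rest


def load(shares, relations):
    """Expected tuples per worker, assuming hashing spreads every relation evenly.

    `relations` is a list of (cardinality, dims), where dims are the indexes of
    the hypercube dimensions the relation is hashed on.
    """
    total = Fraction(0)
    for size, dims in relations:
        cells = 1
        for d in dims:
            cells *= shares[d]
        total += Fraction(size, cells)
    return total


def best_config(relations, num_dims, P):
    """Step 1: enumerate all configurations. Step 2: keep the cheapest one."""
    return min(configs(num_dims, P), key=lambda s: load(s, relations))


N = 1_000_000
# Triangle query with dimensions (x, y, z) = (0, 1, 2):
# A(x, y) uses dims 0 and 1, B(y, z) dims 1 and 2, C(z, x) dims 2 and 0.
triangle = [(N, (0, 1)), (N, (1, 2)), (N, (2, 0))]

best_63 = best_config(triangle, 3, 63)  # (3, 4, 5)
best_64 = best_config(triangle, 3, 64)  # (4, 4, 4)
```

With 63 servers the search settles on 3x4x5, which uses 60 servers, instead of the 27 servers used by rounding 3.98 down to 3x3x3. Exhaustive enumeration stays cheap because the number of integer configurations with product at most P is small for realistic cluster sizes.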
17
Evaluation of HyperCube Optimization
Compare different configuration algorithms:
  Our algorithm
  Rounding down
  Random (many virtual servers assigned randomly to real servers)
Optimality ratio: max load / optimal load (from the LP solution)
Our algorithm outperforms rounding down and random, with an optimality ratio of at most 1.06.
18
More in the paper
Tributary join cost model and attribute order optimization
Evaluation of more queries
Comparison with parallel semi-join plans
Open source implementation in Myria: https://github.com/uwescience/myria
19
Conclusions
Efficient parallel join query evaluation: bridging the gap between theory and practice.
Select the best parallel query plan:
  Shuffle paradigm
  Sequential join algorithm
Optimal HyperCube configuration
Optimizing Tributary join attribute order
20
Thanks!
Myria Team
22
Query execution profiling
PerfOpticon: the visual query profiling tool used in Myria
23
Cost Model Explained
The formulas on this slide were images and are not preserved in the transcript; only their labels remain:
  Query
  Number of binary searches in the first attribute
  Number of binary searches in a joined attribute
  The total cost
24
Why is random HyperCube cell allocation bad?
Query: A(x, y, z, p) :- S(x, y), R(y, z), T(z, p)
64 cells arranged as an 8 x 8 hypercube, with cells randomly allocated to 4 servers.
Server 1 will receive:
  7/8 of S (1/2 if optimal)
  1/4 of R
  7/8 of T (1/2 if optimal)
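The 7/8 figure can be checked with a short calculation. The sketch below assumes that S is hashed on y only, so a server receives every S tuple whose y-bucket matches any row in which it owns a cell, and that each server owns 16 of the 64 cells chosen uniformly at random. Both function names are hypothetical.

```python
from math import comb


def fraction_with_replacement(rows, owned):
    """Expected share of rows touched if the owned cells were drawn independently."""
    return 1 - (1 - 1 / rows) ** owned


def fraction_without_replacement(total_cells, cells_per_row, owned):
    """Expected share of rows touched when `owned` distinct cells are drawn at random."""
    # Probability that none of the owned cells falls in one particular row.
    missed = comb(total_cells - cells_per_row, owned) / comb(total_cells, owned)
    return 1 - missed


approx = fraction_with_replacement(rows=8, owned=16)
exact = fraction_without_replacement(total_cells=64, cells_per_row=8, owned=16)
# A 2 x 2 arrangement of the 4 servers gives each server 1 of 2 row groups.
optimal = 1 / 2
```

Under these assumptions `approx` is about 0.88, close to the 7/8 quoted on the slide, and `exact` is slightly higher at about 0.91. Either way the server receives nearly all of S and T, compared with half of each under the 2 x 2 arrangement. R is hashed on both y and z, so the server's share of R is simply 16/64 = 1/4 in both cases.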
25
Myria: new generation parallel DBMS
Figure: MyriaX architecture. A coordinator with a REST server and a catalog sends JSON query plans and other instructions to the workers of a shared-nothing cluster. Each worker has its own catalog and RDBMS. Other labels in the figure: HDFS, "Primary data store:", "Can also ingest data from:".