Efficient Join Query Evaluation in a Parallel Database System
Distributed Multi-Join Query Evaluation
- n relations, k join attributes
- p servers connected by a network
- data is partitioned uniformly across the servers
- the problem can be divided into two sub-problems:
  - how to efficiently join the local data items on each server?
  - how to deliver the data items between servers, so that each server receives all the tuples it needs?
Distributed Shuffle Algorithms
- Broadcast Shuffle
- Hash-Based Shuffle
- HyperCube Shuffle
Broadcast Shuffle
[Figure: Broadcast Shuffle of the fragments of relations S1, S2 and S3 across the servers]
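A minimal Python sketch of the broadcast shuffle, under the assumption (not spelled out on the slide) that one relation stays where it is while every other relation is copied in full to every server. The function name `broadcast_shuffle` and the fragment layout are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


def broadcast_shuffle(
    fragments: Dict[str, List[List[Row]]], keep: str
) -> List[Dict[str, List[Row]]]:
    """Hypothetical sketch: fragments[name][j] is the part of relation `name`
    initially stored on server j. Relation `keep` stays in place; every other
    relation is replicated in full to every server."""
    p = len(fragments[keep])
    servers: List[Dict[str, List[Row]]] = [
        {name: [] for name in fragments} for _ in range(p)
    ]
    for name, parts in fragments.items():
        for j, frag in enumerate(parts):
            if name == keep:
                servers[j][name].extend(frag)  # stays local, no network traffic
            else:
                for dest in range(p):  # copy this fragment to all p servers
                    servers[dest][name].extend(frag)
    return servers
```

Every server ends up with a full copy of each broadcast relation, so the data sent over the network grows roughly with p times the size of those relations.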
Hash-Based Shuffle
[Figure: Hash-Based Shuffle over servers P1 to P4. Relations S1 and S2 are repartitioned and joined into S12; S12 and S3 are then repartitioned and joined into S123, continuing for another round per remaining join]
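One round of the hash-based (regular) shuffle can be sketched as follows, assuming each server receives the tuples whose join-attribute value hashes to its index. Python's built-in `hash` stands in for the partitioning hash function, and `hash_partition` and `shuffle_for_join` are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def hash_partition(rows: List[Row], key: int, p: int) -> List[List[Row]]:
    """Send each row to server hash(row[key]) mod p (hypothetical helper)."""
    servers: List[List[Row]] = [[] for _ in range(p)]
    for row in rows:
        servers[hash(row[key]) % p].append(row)
    return servers


def shuffle_for_join(
    r: List[Row], s: List[Row], r_key: int, s_key: int, p: int
) -> List[Tuple[List[Row], List[Row]]]:
    """One shuffle round: co-partition R and S on their shared join attribute,
    so every pair of matching tuples lands on the same server."""
    return list(zip(hash_partition(r, r_key, p), hash_partition(s, s_key, p)))
```

Joining the result S12 with S3 repeats the same step on the intermediate relation, which is why this approach ships intermediate results over the network.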
HyperCube Shuffle
[Figure: HyperCube Shuffle example for the tuple S1(x1 = 2, x2 = 4)]
HyperCube Shuffle – contd.
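A sketch of how the HyperCube shuffle could route a single tuple, assuming the p servers form a grid with one dimension per join attribute whose sizes (shares) multiply to p. A tuple fixes its coordinate on the dimensions of the attributes it contains and is replicated along all other dimensions. The function name, the `shares` dictionary, and the use of Python's built-in `hash` in place of independent per-dimension hash functions are assumptions.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def hypercube_destinations(
    row: Sequence[int],
    rel_attrs: Sequence[str],
    all_attrs: Sequence[str],
    shares: Dict[str, int],
) -> List[Tuple[int, ...]]:
    """Return the grid coordinates of every server that must receive `row`."""
    values = dict(zip(rel_attrs, row))
    axes = []
    for attr in all_attrs:
        if attr in values:
            # attribute present in the tuple: exactly one coordinate on this dimension
            axes.append([hash(values[attr]) % shares[attr]])
        else:
            # attribute absent: replicate along the whole dimension
            axes.append(range(shares[attr]))
    return list(product(*axes))
```

With shares 4 × 4 × 4 for x1, x2, x3, the tuple S1(x1 = 2, x2 = 4) is replicated to the 4 servers that differ only in their x3 coordinate. No intermediate results are ever shipped, only copies of base tuples.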
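HyperCube – Optimal Shares Factorization
- Recall that the dimensions of the hypercube are determined by the shares p_1, ..., p_k (one per join attribute), whose product must not exceed p
- How do we find the optimal factorization?
- If the optimal real-valued shares are integers, that solution is optimal
- What about the other cases? Rounding works poorly (e.g., )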
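To illustrate why rounding is problematic, assume a query whose optimal fractional shares are all equal to p^(1/k), as for the triangle query over three equal-size relations (an assumption made for this example only). Rounding each share down keeps the product at most p but can leave many servers idle. The helper names are hypothetical.

```python
def floor_root(p: int, k: int) -> int:
    """Largest integer s with s**k <= p (integer arithmetic avoids float error,
    e.g. 64 ** (1 / 3) evaluates to 3.9999999999999996 in floating point)."""
    s = 1
    while (s + 1) ** k <= p:
        s += 1
    return s


def servers_used_by_rounding_down(p: int, k: int) -> int:
    """Number of servers that receive data when every share p**(1/k)
    is rounded down to an integer (assumes equal fractional shares)."""
    return floor_root(p, k) ** k
```

For p = 63 the shares round down to 3 × 3 × 3, so only 27 of the 63 servers receive any data, while rounding up to 4 × 4 × 4 would need 64 servers.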
Optimal Shares Factorization – The Intuition
- check all possible combinations of integer shares whose product is at most p
- select the combination with the smallest maximum workload, defined as the maximum amount of data assigned to a single worker
- the workload can be easily computed for each configuration
- break ties by choosing the hypercube with more evenly sized edges
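Optimal Shares Factorization – The Algorithm
best_config = none; curr_min = ∞
for each configuration (p_1, ..., p_k) such that p_1 · p_2 · ... · p_k ≤ p:
    w = workload(configuration)
    if w < curr_min:
        curr_min = w
        best_config = configuration
    else if w == curr_min:
        best_config = whichever of best_config and configuration has more even dimensions
return best_config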
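The algorithm above can be sketched in Python by brute-force enumeration of all integer share vectors whose product is at most p. The workload model is an assumption: each relation is taken to spread evenly over the servers indexed by its own attributes, so its per-server load is its size divided by the product of those attributes' shares, and skew is ignored. Ties are broken by the smallest difference between the largest and smallest share; all names are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

# (relation size, indices of the join attributes the relation contains)
Relation = Tuple[int, Sequence[int]]


def expected_workload(shares: Sequence[int], relations: List[Relation]) -> float:
    """Assumed load model (no skew): data per worker summed over all relations."""
    return sum(size / math.prod(shares[a] for a in attrs) for size, attrs in relations)


def optimal_shares(
    p: int, k: int, relations: List[Relation]
) -> Tuple[Optional[Tuple[int, ...]], float]:
    best_config: Optional[Tuple[int, ...]] = None
    curr_min = math.inf
    for config in itertools.product(range(1, p + 1), repeat=k):
        if math.prod(config) > p:  # the hypercube must fit on p servers
            continue
        load = expected_workload(config, relations)
        spread = max(config) - min(config)  # smaller spread = more even edges
        if load < curr_min and not math.isclose(load, curr_min):
            best_config, curr_min = config, load
        elif math.isclose(load, curr_min) and spread < max(best_config) - min(best_config):
            best_config = config  # tie: prefer the more even hypercube
    return best_config, curr_min
```

For the triangle query with three relations of 1,000 tuples each and p = 64, this returns the shares (4, 4, 4) with an expected workload of 187.5 tuples per worker. Exhaustive enumeration costs p^k workload evaluations, which is acceptable for small k.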
Sequential Join Algorithms
- Centralized algorithms, used within a single node
- We examine two algorithms of this type:
  - Binary Symmetric Hash Join
  - Tributary Multi-Way Join
Binary Symmetric Hash Join
[Figure: step-by-step example of a binary symmetric hash join over relations with attributes (x1, x2), (x2, x3) and (x1, x3)]
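A minimal sketch of a symmetric hash join between two relations, assuming tuples arrive alternately from both inputs: each arriving tuple is inserted into its own side's hash table and immediately probes the other side's table, so results are produced without waiting for either input to be fully read. A multi-way query is then evaluated as a tree of such binary joins. The function name and row layout are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import List, Tuple

Row = Tuple[int, ...]


def symmetric_hash_join(
    left: List[Row], right: List[Row], lkey: int, rkey: int
) -> List[Tuple[Row, Row]]:
    tables = (defaultdict(list), defaultdict(list))  # 0: left side, 1: right side
    keys = (lkey, rkey)
    out: List[Tuple[Row, Row]] = []
    # Simulate interleaved arrival: one tuple from each input per step.
    for pair in zip_longest(left, right):
        for side, row in enumerate(pair):
            if row is None:  # this input is exhausted
                continue
            k = row[keys[side]]
            tables[side][k].append(row)  # build into own table
            for match in tables[1 - side].get(k, []):  # probe the other table
                out.append((row, match) if side == 0 else (match, row))
    return out
```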
Tributary Multi-Way Join
[Figure: step-by-step example of the Tributary join over relations S1, S2 and S3 producing S123, using binary searches on attribute values such as x2 and x3]
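The following sketch joins sorted relations one attribute at a time in the spirit of the Tributary join: for each attribute in the global order, candidate values come from the relation with the fewest matching rows and are checked against the other relations by binary search over sorted ranges. It is a simplified illustration under these assumptions, not the paper's implementation; the names and the relation representation are hypothetical, and attribute values are assumed to be integers.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

INF = float("inf")  # sorts after every int; marks the end of a prefix range
Relation = Tuple[Sequence[str], List[Tuple[int, ...]]]


def tributary_join(relations: List[Relation], order: Sequence[str]) -> List[Tuple[int, ...]]:
    # Reorder each relation's columns to follow the global attribute order and sort it,
    # so rows sharing a bound prefix form one contiguous, binary-searchable range.
    rels = []
    for attrs, rows in relations:
        cols = sorted(range(len(attrs)), key=lambda c: order.index(attrs[c]))
        names = tuple(attrs[c] for c in cols)
        rels.append((names, sorted(tuple(r[c] for c in cols) for r in rows)))
    out: List[Tuple[int, ...]] = []

    def prefix_of(names: Tuple[str, ...], binding: Dict[str, int], attr: str) -> tuple:
        # values already bound for this relation's attributes that precede `attr`
        return tuple(binding[a] for a in names[: names.index(attr)])

    def search(level: int, binding: Dict[str, int], ranges: List[Tuple[int, int]]) -> None:
        if level == len(order):
            out.append(tuple(binding[a] for a in order))
            return
        attr = order[level]
        involved = [i for i, (names, _) in enumerate(rels) if attr in names]
        # take candidate values from the relation with the fewest matching rows
        pivot = min(involved, key=lambda i: ranges[i][1] - ranges[i][0])
        p_names, p_rows = rels[pivot]
        p_prefix = prefix_of(p_names, binding, attr)
        d = len(p_prefix)
        lo, hi = ranges[pivot]
        while lo < hi:
            v = p_rows[lo][d]
            new_ranges = list(ranges)
            for i in involved:
                names, rows = rels[i]
                key = prefix_of(names, binding, attr) + (v,)
                start = bisect_left(rows, key, *ranges[i])  # binary search
                end = bisect_left(rows, key + (INF,), *ranges[i])  # binary search
                if start == end:
                    break  # v does not occur in relation i
                new_ranges[i] = (start, end)
            else:  # v occurs in every relation containing attr
                binding[attr] = v
                search(level + 1, binding, new_ranges)
                del binding[attr]
            lo = bisect_left(p_rows, p_prefix + (v, INF), lo, hi)  # next distinct value

    search(0, {}, [(0, len(rows)) for _, rows in rels])
    return out
```

Joining R(x1, x2), S(x2, x3) and T(x1, x3) with the order x1, x2, x3 returns every triangle as an (x1, x2, x3) tuple, in lexicographic order, without materializing any binary intermediate result. The initial sort is also where this approach pays a price on large inputs.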
Attribute Order Optimization
- Tributary Join requires the optimizer to choose a global order of all attributes that participate in the join
- Bad orderings may lead to extremely poor performance (similar to the classic join ordering problem)
- A cost model is required so that the optimizer can compare different orders
Tributary Join Cost Model
- Intuitively, the most expensive operation is the binary search
- We therefore estimate the number of binary searches performed during the join
- A cost function is given for each step of the attribute order selection, based on the number of unique values of each attribute in the current relation
Tributary Join Cost Model – Definitions
Let:
- an order on the join attributes
- the cost of the i-th step of
- the projection of attributes on , i.e., a subset of which appears in
- the number of unique values of an attribute in
- the number of unique combinations of in
Tributary Join Cost Function
Then the cost of an order is calculated as follows:
With the following definition of a query cost:
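Because the exact cost formulas are not reproduced here, the sketch below uses a hypothetical stand-in for the per-step cost: at step i, each relation containing the i-th attribute contributes the number of unique combinations of its already-bound attributes together with that attribute, approximating one binary search per combination. The optimizer then picks the cheapest attribute order by exhaustive enumeration, which is only practical for a handful of attributes. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Relation = Tuple[Tuple[str, ...], List[Tuple[int, ...]]]


def distinct(rows: List[Tuple[int, ...]], attrs: Tuple[str, ...], cols: List[str]) -> int:
    """Number of unique combinations of the attributes `cols` in one relation."""
    idx = [attrs.index(c) for c in cols]
    return len({tuple(r[i] for i in idx) for r in rows})


def order_cost(relations: List[Relation], order: Sequence[str]) -> int:
    cost = 0
    for i, attr in enumerate(order):
        for attrs, rows in relations:
            if attr in attrs:
                # hypothetical estimate: one binary search per unique combination
                # of (this relation's already-bound attributes, current attribute)
                cols = [a for a in order[:i] if a in attrs] + [attr]
                cost += distinct(rows, attrs, cols)
    return cost


def best_order(relations: List[Relation], attributes: Sequence[str]) -> Tuple[str, ...]:
    """Exhaustively pick the attribute order with the lowest estimated cost."""
    return min(permutations(attributes), key=lambda o: order_cost(relations, o))
```

With R(a, b) holding four distinct values of a but a single value of b, and S(b, c) holding a single tuple, orders that start with a are estimated at 10 binary searches, while orders that start with b are estimated at 7.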
Empirical Evaluation
- We compare the different combinations of the above algorithms: 3 shuffle algorithms × 2 sequential algorithms = 6 combinations
- Different queries with various query graph topologies:
  - cycle – each relation is joined with two other relations, so the query graph forms a cycle
  - clique – each relation is joined with all other n-1 relations
  - acyclic, etc.
Empirical Evaluation - Queries
- The paper evaluates 8 queries
- We discuss only the three most interesting ones:
  - cycle query with large intermediate results
  - clique query with large intermediate results
  - acyclic query with small intermediate results
Cycle Query with Large Intermediate Results
- RS_HJ – Regular (Hash-Based) Shuffle with Hash Join
- BR_TJ – Broadcast Shuffle with Tributary Join
- RS_TJ – Regular (Hash-Based) Shuffle with Tributary Join
- HC_HJ – HyperCube Shuffle with Hash Join
- BR_HJ – Broadcast Shuffle with Hash Join
- HC_TJ – HyperCube Shuffle with Tributary Join
Cycle Query - Analysis
- HyperCube shuffles less data than Regular Shuffle, since it does not send intermediate join results
- Additionally, HyperCube achieves good load balance, with much less skew in the data distribution
- Tributary Join outperforms a tree of binary joins because it avoids generating a huge number of intermediate results
- However, it needs HyperCube to fully exploit its potential, since:
  1) it requires all the input relations to be available locally
  2) its sorting step does not scale well
Clique Query with Large Intermediate Results
- Compared plans: RS_HJ, BR_TJ, RS_TJ, HC_HJ, BR_HJ, HC_TJ (the same shuffle and join combinations as for the cycle query)
Clique Query - Analysis
- As with the previous query, HC_TJ is the best combination in terms of query runtime, total CPU time and total data shuffled
- Broadcast Shuffle performs poorly, since every join involves at least one full (broadcast) relation
- Regular Shuffle does not suffer from this problem, due to the large number of join attributes in each relation
Acyclic Query with Small Intermediate Results
- Compared plans: RS_HJ, BR_TJ, RS_TJ, HC_HJ, BR_HJ, HC_TJ (the same shuffle and join combinations as for the cycle query)
Acyclic Query - Analysis
- Since the intermediate results are small, Regular Shuffle sends significantly less data than HyperCube and Broadcast, which send only base data but replicate it across workers
- For the same reason, with HyperCube and Broadcast each worker processes much more data locally
- Because of its data sorting step, Tributary Join performs poorly on large inputs compared to Hash Join
Experimental Results Summary
- There is no overall best query plan
- The HC_TJ combination outperforms the others in the presence of large intermediate results or significant data skew
- When the intermediate results are small and there is no significant skew, traditional join techniques lead to the best performance