Published by Angelica Patrick. Modified over 6 years ago.
1
Assignment Problems of Different-Sized Inputs in MapReduce
Foto N. Afrati (1), Shlomi Dolev (2), Ephraim Korach (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) National Technical University of Athens, Greece
(2) Ben-Gurion University of the Negev, Israel
(3) Stanford University, USA
2
Outline
Introduction
Problem Statement and Our Contribution
All-to-All (A2A) Mapping Schema Problem
Heuristics for A2A Mapping Schema Problem
X-to-Y (X2Y) Mapping Schema Problem
Heuristics for X2Y Mapping Schema Problem
Conclusion
3
Outline
Introduction
  Inputs and outputs
  Reducer capacity
  Mapping schema
  State-of-the-art
Problem Statement and Our Contribution
All-to-All (A2A) Mapping Schema Problem
Heuristics for A2A Mapping Schema Problem
X-to-Y (X2Y) Mapping Schema Problem
Heuristics for X2Y Mapping Schema Problem
Conclusion
4
Inputs and outputs in our context
Introduction
[Figure: word count in MapReduce]
Inputs: "I like apple. Apple is fruit." (Mapper 1) and "I like banana." (Mapper 2)
Mapper 1 emits: (I, 1), (like, 1), (apple, 2), (is, 1), (fruit, 1)
Mapper 2 emits: (I, 1), (like, 1), (banana, 1)
One reducer per key: reducer for I, reducer for like, reducer for apple, reducer for is, reducer for fruit, reducer for banana
Outputs: (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1)
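The word-count flow in the figure can be simulated in a few lines of plain Python. This is only a sketch, not Hadoop code: the function names are made up for illustration, each mapper is assumed to pre-aggregate its own counts (as the figure's "2 apple" suggests), and all words are lowercased, so the key "I" appears as "i".

```python
import re
from collections import Counter, defaultdict


def mapper(text):
    """Emit (word, count) pairs for one input, pre-aggregated per mapper."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(Counter(words).items())


def shuffle(mapper_outputs):
    """Group the emitted values by key, one group per reducer."""
    groups = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the partial counts received for one key."""
    return key, sum(values)


def word_count(inputs):
    groups = shuffle(mapper(text) for text in inputs)
    return dict(reducer(key, values) for key, values in groups.items())


counts = word_count(["I like apple. Apple is fruit.", "I like banana."])
```

Here `counts` maps each word to its total, matching the outputs listed on the slide up to letter case.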
5
Reducer Capacity (q)
Values provided by each mapper have sizes (input sizes)
Machines have bounded memory
Reducer capacity: an upper bound on the sum of the sizes of the values that are assigned to the reducer
Example: the reducer capacity can be the size of the main memory of the processors on which the reducers run
We consider two special matching problems
6
Mapping Schema
A mapping schema is an assignment of the set of inputs to some given reducers, such that two constraints hold:
Respect the reducer capacity
  A reducer is assigned only inputs whose sizes sum to at most the reducer capacity
Assignment of inputs
  For every output, the two corresponding inputs must be assigned to at least one reducer in common
[Figure: inputs M1 (1GB), M2 (2GB), M3 (2GB) assigned to reducers of capacity 4GB each]
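The two constraints can be checked mechanically. The sketch below assumes the all-to-all setting, where every pair of inputs corresponds to an output; the function name and the two candidate assignments are illustrative choices using the slide's input sizes, not necessarily the assignments drawn in the figure.

```python
from itertools import combinations


def is_valid_a2a_schema(sizes, reducers, q):
    """Check both mapping-schema constraints for an all-to-all problem.

    sizes: dict mapping input name -> size
    reducers: list of sets of input names, one set per reducer
    q: reducer capacity
    """
    # Constraint 1: no reducer holds more than q in total.
    for assigned in reducers:
        if sum(sizes[name] for name in assigned) > q:
            return False
    # Constraint 2: every pair of inputs meets in at least one reducer.
    covered = set()
    for assigned in reducers:
        covered.update(frozenset(p) for p in combinations(sorted(assigned), 2))
    required = {frozenset(p) for p in combinations(sorted(sizes), 2)}
    return required <= covered


sizes_gb = {"M1": 1, "M2": 2, "M3": 2}
single = [{"M1", "M2", "M3"}]                              # 5GB in one reducer
pairwise = [{"M1", "M2"}, {"M1", "M3"}, {"M2", "M3"}]      # 3GB, 3GB, 4GB
```

With q = 4, the single-reducer assignment violates the capacity constraint, while the pairwise assignment satisfies both constraints.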
7
State-of-the-Art
F. Afrati, A.D. Sarma, S. Salihoglu, and J.D. Ullman, “Upper and Lower Bounds on the Cost of a Map-Reduce Computation,” PVLDB, 2013.
Unit input size
Reducer size
  Maximum number of inputs that a given reducer can have
Mapping schema
  Respect the reducer capacity
  Assignment of inputs
9
Problem Statement
Notation: ki denotes a key
The communication cost between the map and the reduce phases is a significant factor
How can we reduce the communication cost?
  Fewer reducers, and hence a smaller communication cost
How can we minimize the total number of reducers while respecting their limited capacity?
  Not an easy task
  All-to-All mapping schema problem
  X-to-Y mapping schema problem
[Figure, first assignment: the mapper for the 1st input emits input1 under keys k1 and k2, the mapper for the 2nd input emits input2 under keys k1 and k3, and the mapper for the 3rd input emits input3 under keys k2 and k3; the reducer for k1 receives inputs (1, 2), the reducer for k2 receives (1, 3), and the reducer for k3 receives (2, 3)]
[Figure, second assignment: all three mappers emit their input under key k1; the single reducer for k1 receives inputs (1, 2, 3)]
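To make the comparison concrete, the sketch below counts the data sent to reducers in the two assignments from the figure. It assumes that communication cost means the total size of all input copies delivered to reducers, and that each input has unit size; both are assumptions made for illustration, since the slide does not state them.

```python
def communication_cost(sizes, reducers):
    """Total size of all input copies sent from the map phase to reducers."""
    return sum(sizes[i] for assigned in reducers for i in assigned)


# Unit-size inputs 1, 2, 3 (an assumption for this example).
sizes = {1: 1, 2: 1, 3: 1}

three_reducers = [{1, 2}, {1, 3}, {2, 3}]   # reducers for k1, k2, k3
one_reducer = [{1, 2, 3}]                   # single reducer for k1

cost_three = communication_cost(sizes, three_reducers)
cost_one = communication_cost(sizes, one_reducer)
```

Under these assumptions the three-reducer assignment ships every input twice, while the single-reducer assignment ships each input once; the latter is only possible when the reducer capacity admits all three inputs together.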
10
Our Contribution
Reducer capacity
  An important parameter to be considered in MapReduce algorithms
  Inputs do not necessarily have identical sizes
Try to decrease communication cost
  Two types of mapping schema problems:
    All-to-All (A2A) mapping schema problem
    X-to-Y (X2Y) mapping schema problem
  Lower and upper bounds on the communication cost
12
A2A Mapping Schema Problem
A set of inputs is given
Each pair of inputs corresponds to one output
Example: computing common friends
  Lists of friends of m persons are given
  Find the common friends of every pair of the given m persons
  Every two friend lists must be assigned to at least one reducer in common
13
A2A Mapping Schema Problem
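Inputs: w1 = w2 = w3 = 0.20q, w4 = w5 = 0.19q, w6 = w7 = 0.18q
One way: group inputs such that the size of a group is no more than q/2, giving the groups {w1, w2}, {w3, w4}, {w5, w6}, {w7}, and assign every pair of groups to a reducer
  Reducer 1: w1, w2, w3, w4
  Reducer 2: w1, w2, w5, w6
  Reducer 3: w1, w2, w7
  Reducer 4: w3, w4, w5, w6
  Reducer 5: w3, w4, w7
  Reducer 6: w5, w6, w7
  6 reducers and non-optimum communication cost (figure annotation: .22q is misused)
Another way:
  Reducer 1: w1, w2, w3, w4, w7
  Reducer 2: w1, w2, w5, w6, w7
  Reducer 3: w3, w4, w5, w6, w7
  3 reducers and optimum communication cost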
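The two assignments in this example can be checked numerically. The sketch below expresses sizes as integer percentages of q (so q = 100) to avoid floating-point noise, and it assumes that communication cost is the total size of the input copies sent to reducers; the helper names are illustrative.

```python
from itertools import combinations

Q = 100  # reducer capacity; sizes below are percentages of q
SIZES = {1: 20, 2: 20, 3: 20, 4: 19, 5: 19, 6: 18, 7: 18}


def pair_up(groups):
    """One reducer per pair of groups."""
    return [a | b for a, b in combinations(groups, 2)]


def summarize(reducers):
    """Reducer count, validity, and total size of copies sent to reducers."""
    loads = [sum(SIZES[i] for i in r) for r in reducers]
    covered = {frozenset(p) for r in reducers
               for p in combinations(sorted(r), 2)}
    required = {frozenset(p) for p in combinations(sorted(SIZES), 2)}
    return {
        "reducers": len(reducers),
        "valid": max(loads) <= Q and required <= covered,
        "cost": sum(loads),
    }


# Groups of size at most q/2, every pair of groups on one reducer.
grouped = pair_up([{1, 2}, {3, 4}, {5, 6}, {7}])
# The three-reducer assignment from the slide.
direct = [{1, 2, 3, 4, 7}, {1, 2, 5, 6, 7}, {3, 4, 5, 6, 7}]

grouped_summary = summarize(grouped)
direct_summary = summarize(direct)
```

Both assignments are valid; under the stated cost assumption the grouped assignment sends 4.02q of data to 6 reducers, while the three-reducer assignment sends 2.86q.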
14
A2A Mapping Schema Problem
What to do?
Assign the given m inputs to the given number of reducers, without exceeding q, such that every input is coupled with every other input in at least one reducer in common
Polynomial-time solution for one and two reducers
NP-hard for z > 2 reducers
  Reduction from the z-partition problem
16
Heuristics for A2A Mapping Schema Problem
Two cases:
All the input sizes are upper bounded by q/2
Exactly one input has size wi > q/2 (two such inputs could not share a reducer)
17
Heuristics for A2A Mapping Schema Problem
Case 1: the input sizes are different, and each is at most q/2
Use a bin-packing algorithm to create x bins (S1, S2, …, Sx) of size at most q/2
Use x(x-1)/2 reducers to assign each bin with every other bin
[Figure: inputs w1, w2, w3, …, wm packed into bins S1, S2, …, Sx; each reducer holds one pair of bins]
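A minimal sketch of this heuristic follows. The slide says only "a bin-packing algorithm"; first-fit decreasing is used here as one common choice, which is an assumption, and the function names are illustrative.

```python
from itertools import combinations


def first_fit_decreasing(sizes, capacity):
    """Pack input indices into bins of total size at most `capacity`."""
    bins, loads = [], []
    for i in sorted(range(len(sizes)), key=lambda k: -sizes[k]):
        if sizes[i] > capacity:
            raise ValueError(f"input {i} does not fit into any bin")
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([i])
            loads.append(sizes[i])
    return bins


def a2a_by_bin_packing(sizes, q):
    """Bins of size at most q/2, then one reducer per pair of bins."""
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [bins[0]]  # everything fits into a single reducer
    return [a + b for a, b in combinations(bins, 2)]


sizes = [20, 20, 20, 19, 19, 18, 18]  # percentages of q, with q = 100
bins = first_fit_decreasing(sizes, 50)
reducers = a2a_by_bin_packing(sizes, 100)
```

Because every bin has size at most q/2, any two bins fit together in one reducer, and since every pair of bins shares a reducer, so does every pair of inputs. The heuristic does not guarantee the minimum number of reducers.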
18
Heuristics for A2A Mapping Schema Problem
Case 2: one input, i, is of size wi, where q/2 < wi < q
Based on the bin-packing algorithm
Hence, all the remaining inputs must be of size at most q - wi
Make bins S1, S2, …, Sx of size at most q - wi to hold all the inputs except input i, and assign each bin together with input i to one reducer
Make a solution for all the inputs except input i: create bins S'1, S'2, …, S'y of size at most q/2 and assign each pair of bins to one reducer
[Figure: reducers (wi, S1), (wi, S2), …, (wi, Sx), where each bin has size at most q - wi; reducers (S'1, S'2), (S'1, S'3), …, (S'y-1, S'y), where each bin has size at most q/2]
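The two steps of Case 2 can be sketched as below. First-fit decreasing stands in for the unspecified bin-packing algorithm, and the function names and the sample sizes are assumptions chosen for illustration.

```python
from itertools import combinations


def first_fit_decreasing(indices, sizes, capacity):
    """Pack the given input indices into bins of size at most `capacity`."""
    bins, loads = [], []
    for i in sorted(indices, key=lambda k: -sizes[k]):
        if sizes[i] > capacity:
            raise ValueError(f"input {i} does not fit into any bin")
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:
            bins.append([i])
            loads.append(sizes[i])
    return bins


def a2a_with_one_big_input(sizes, q):
    """A2A schema when exactly one input is larger than q/2."""
    big = [i for i, s in enumerate(sizes) if s > q / 2]
    if len(big) != 1:
        raise ValueError("expected exactly one input larger than q/2")
    i = big[0]
    others = [j for j in range(len(sizes)) if j != i]
    if not others:
        return [[i]]
    # Step 1: bins of size at most q - wi, each sharing a reducer with i.
    rest_bins = first_fit_decreasing(others, sizes, q - sizes[i])
    with_big = [[i] + b for b in rest_bins]
    # Step 2: solve A2A among the other inputs with bins of size q/2.
    half_bins = first_fit_decreasing(others, sizes, q / 2)
    if len(half_bins) == 1:
        among_others = [half_bins[0]]
    else:
        among_others = [a + b for a, b in combinations(half_bins, 2)]
    return with_big + among_others


sizes = [60, 20, 20, 15, 15, 10]  # hypothetical sizes, with q = 100
reducers = a2a_with_one_big_input(sizes, 100)
```

For these sample sizes the sketch uses two reducers that contain the large input and one reducer for the remaining inputs.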
20
X2Y Mapping Schema Problem
Two disjoint sets X and Y are given
Each pair of elements (xi, yj), where xi ∈ X and yj ∈ Y for all i, j, corresponds to one output
Example: skew join
  Two relations X(A, B) and Y(B, C) are given, where many tuples have a common “b” value
  Every pair of tuples with an identical “b” value, one from X and one from Y, must be assigned to at least one reducer in common
21
X2Y Mapping Schema Problem
Inputs of set X: w1 = w2 = 0.25q, w3 = w4 = 0.24q, w5 = w6 = 0.23q, w7 = w8 = 0.22q, w9 = w10 = 0.21q, w11 = w12 = 0.20q
Inputs of set Y: w'1 = w'2 = 0.25q, w'3 = w'4 = 0.24q
One way: group inputs such that the size of a group is no more than q/2 (12 reducers)
  Groups of X: {w1, w2}, {w3, w4}, {w5, w6}, {w7, w8}, {w9, w10}, {w11, w12}
  Groups of Y: {w'1, w'2}, {w'3, w'4}
  Each group of X shares one reducer with each group of Y
Another way: make groups by taking three inputs from X (16 reducers)
  Groups of X: {w1, w2, w3}, {w4, w5, w6}, {w7, w8, w9}, {w10, w11, w12}
  Each group of X shares one reducer with each single input of Y: w'1, w'2, w'3, w'4
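Both groupings can be verified with a short script. Sizes are written as integer percentages of q (q = 100), communication cost is assumed to be the total size of the input copies sent to reducers, and the helper names are illustrative.

```python
Q = 100  # reducer capacity; sizes are percentages of q
X = {1: 25, 2: 25, 3: 24, 4: 24, 5: 23, 6: 23,
     7: 22, 8: 22, 9: 21, 10: 21, 11: 20, 12: 20}
Y = {1: 25, 2: 25, 3: 24, 4: 24}


def cross(x_groups, y_groups):
    """One reducer for each (group of X, group of Y) combination."""
    return [(xg, yg) for xg in x_groups for yg in y_groups]


def summarize(reducers):
    """Reducer count, validity, and total size of copies sent to reducers."""
    loads = [sum(X[i] for i in xg) + sum(Y[j] for j in yg)
             for xg, yg in reducers]
    covered = {(i, j) for xg, yg in reducers for i in xg for j in yg}
    required = {(i, j) for i in X for j in Y}
    return {
        "reducers": len(reducers),
        "valid": max(loads) <= Q and required <= covered,
        "cost": sum(loads),
    }


pairs_way = cross(
    [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)],
    [(1, 2), (3, 4)],
)
triples_way = cross(
    [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)],
    [(1,), (2,), (3,), (4,)],
)

pairs_summary = summarize(pairs_way)
triples_summary = summarize(triples_way)
```

Both groupings are valid. Under the stated cost assumption, the q/2 grouping sends 11.28q of data to 12 reducers and the three-input grouping sends 14.72q to 16 reducers.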
22
X2Y Mapping Schema Problem
What to do?
Assign each input of the set X with each input of the set Y to at least one reducer in common, without exceeding q
Polynomial-time solution for one reducer
  Can we assign all the inputs of the sets X and Y to a single reducer?
NP-hard for z > 1 reducers
  Reduction from the z-partition problem
24
Heuristics for X2Y Mapping Schema Problem
Based on the bin-packing algorithm
Inputs of either set are of size at most w, where q/2 < w < q
  Say the inputs of the set X are of size at most w
  Hence, the inputs of the set Y are of size at most q - w
Use a bin-packing algorithm to create
  u bins of size at most w from the inputs of X
  v bins of size at most q - w from the inputs of Y
Assign each bin of X with each bin of Y to one reducer, using uv reducers
[Figure: bins u1, u2, …, uu of X and bins v1, v2, …, vv of Y; uv reducers]
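The heuristic above can be sketched as follows. First-fit decreasing is used as the bin-packing algorithm, which is an assumption; the function names are illustrative, and the bin size w is passed in by the caller rather than derived.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack input indices into bins of total size at most `capacity`."""
    bins, loads = [], []
    for i in sorted(range(len(sizes)), key=lambda k: -sizes[k]):
        if sizes[i] > capacity:
            raise ValueError(f"input {i} does not fit into any bin")
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:
            bins.append([i])
            loads.append(sizes[i])
    return bins


def x2y_by_bin_packing(x_sizes, y_sizes, q, w):
    """Bins of size w for X and q - w for Y; one reducer per bin pair."""
    x_bins = first_fit_decreasing(x_sizes, w)       # u bins
    y_bins = first_fit_decreasing(y_sizes, q - w)   # v bins
    return [(xb, yb) for xb in x_bins for yb in y_bins]  # u * v reducers


# Sizes as percentages of q = 100, with w = 75 chosen for illustration.
x_sizes = [25, 25, 24, 24, 23, 23, 22, 22, 21, 21, 20, 20]
y_sizes = [25, 25, 24, 24]
reducers = x2y_by_bin_packing(x_sizes, y_sizes, 100, 75)
```

A bin of X and a bin of Y together never exceed w + (q - w) = q, so every reducer respects the capacity, and every input of X meets every input of Y because all bin combinations are used.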
25
Outline
Introduction
Problem Statement and Our Contribution
All-to-All Mapping Schema Problem
X-to-Y Mapping Schema Problem
Heuristics for Mapping Schema Problems
Conclusion
26
Conclusion
Reducer capacity
  An important parameter to be considered in MapReduce algorithms
  Inputs do not necessarily have identical sizes
  Reducer capacity is an upper bound on the sum of the sizes of the inputs assigned to a reducer
Two assignment schemas of MapReduce are given
  All-to-All (A2A) mapping schema problem
  X-to-Y (X2Y) mapping schema problem
Lower and upper bounds on the communication cost
27
Presentation is available at
Foto Afrati (1), Shlomi Dolev (2), Ephraim Korach (3), Shantanu Sharma (2), and Jeffrey D. Ullman (4)
(1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece
(2) Department of Computer Science, Ben-Gurion University of the Negev, Israel
(3) Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Israel
(4) Department of Computer Science, Stanford University, USA