
Assignment Problems of Different-Sized Inputs in MapReduce. Foto N. Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, and Jeffrey D. Ullman


1 Assignment Problems of Different-Sized Inputs in MapReduce. Foto N. Afrati (1), Shlomi Dolev (2), Ephraim Korach (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3). (1) National Technical University of Athens, Greece; (2) Ben-Gurion University of the Negev, Israel; (3) Stanford University, USA

2 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

3 Introduction. Cluster computing – Terabytes or petabytes of data cannot easily be processed on a single computer – Hence, a cluster of computers is used – A key challenge is how to mask failures, e.g., hardware failures. MapReduce is a programming model for parallel processing of large-scale data on such a cluster.

4 Introduction. MapReduce job – Map phase: applies a user-defined Map function – Reduce phase: applies a user-defined Reduce function. Mapper – An application of the Map function to a single input – Produces outputs in the form of ⟨key, value⟩ pairs. Reducer – An application of the Reduce function to a single key and its associated list of values.

5 MapReduce job: Map Phase and Reduce Phase [Figure: a master process forks workers and assigns them map tasks and reduce tasks; map workers read the input data chunks (Chunk 0, Chunk 1, Chunk 2) and write intermediate results locally; reduce workers read these results remotely, sort them, and write Output File 0 and Output File 1.] Map Phase: applies a user-defined Map function. Reduce Phase: applies a user-defined Reduce function.

6 Introduction: MapReduce working [Figure: one mapper per input (Mappers 1 to 4) and one reducer per key (k_1, k_2, k_3); input 1 emits keys k_1 and k_2, input 2 emits k_1 and k_2, input 3 emits k_3, and input 4 emits k_2 and k_3. Notation: k_i is a key.]

7 Introduction: MapReduce working example – Word Count [Figure: the text "I like apple. Apple is fruit. I like banana." is processed by Mapper 1 and Mapper 2, which emit ⟨word, count⟩ pairs; there is one reducer per word (I, like, apple, is, fruit, banana), and the reducers output (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1).]
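The word-count flow above can be sketched in a few lines of Python. This is a minimal single-process sketch, not a MapReduce framework API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, the split of the text into two mapper chunks is an assumption, and words are lowercased (so the key for "I" appears as "i") so that "Apple" and "apple" share a key, matching (apple, 2) in the figure.

```python
from collections import defaultdict

def map_fn(chunk):
    # Map function: emit a (word, 1) pair for every word of one input chunk.
    # Assumption: words are case-folded so "Apple" and "apple" share a key.
    for token in chunk.split():
        yield token.strip(".,!?").lower(), 1

def reduce_fn(key, values):
    # Reduce function: applied to a single key and its list of values.
    return key, sum(values)

def run_job(chunks):
    # Shuffle: group every emitted value by key (one reducer per key).
    groups = defaultdict(list)
    for chunk in chunks:  # one mapper per chunk
        for key, value in map_fn(chunk):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = run_job(["I like apple. Apple is fruit.", "I like banana."])
print(counts)  # {'i': 2, 'like': 2, 'apple': 2, 'is': 1, 'fruit': 1, 'banana': 1}
```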


9 Introduction: Inputs and outputs in our context [Figure: the same word-count example, with the values that the mappers send to the reducers labeled Inputs and the reducer results (I, 2), (like, 2), (apple, 2), (is, 1), (fruit, 1), (banana, 1) labeled Outputs.]

10 Reducer Capacity: Values provided by each mapper have sizes (input sizes). Reducer capacity: an upper bound on the sum of the sizes of the values that are assigned to a reducer. Example: the reducer capacity may be the size of the main memory of the processor on which the reducer runs. We consider two special matching problems.

11 Mapping Schema: A mapping schema is an assignment of the set of inputs to some given reducers such that – It respects the reducer capacity: a reducer is assigned only inputs whose sizes sum to at most the reducer capacity – It assigns the inputs correctly: for every output, every two corresponding inputs are assigned to at least one reducer in common [Figure: reducers of capacity 4GB and inputs M1 (1GB), M2 (2GB), M3 (2GB) in example assignments.]
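The two mapping-schema conditions can be checked mechanically. The sketch below uses a hypothetical helper `is_mapping_schema`; the candidate reducer sets are our own illustration with the slide's sizes (M1 = 1GB, M2 = 2GB, M3 = 2GB, capacity 4GB), not a transcription of the figure.

```python
from itertools import combinations

def is_mapping_schema(sizes, reducers, q, pairs):
    """Check both mapping-schema conditions (hypothetical helper).

    sizes:    dict mapping each input to its size
    reducers: list of sets of inputs, one set per reducer
    q:        reducer capacity
    pairs:    pairs of inputs that some output needs together
    """
    # Condition 1: every reducer respects the capacity q.
    if any(sum(sizes[i] for i in r) > q for r in reducers):
        return False
    # Condition 2: every required pair meets at some reducer.
    return all(any(a in r and b in r for r in reducers) for a, b in pairs)

sizes = {"M1": 1, "M2": 2, "M3": 2}       # sizes in GB
all_pairs = list(combinations(sizes, 2))  # A2A: every pair is needed

print(is_mapping_schema(sizes, [{"M1", "M2"}, {"M1", "M3"}], 4, all_pairs))                # False: M2, M3 never meet
print(is_mapping_schema(sizes, [{"M1", "M2"}, {"M1", "M3"}, {"M2", "M3"}], 4, all_pairs))  # True
print(is_mapping_schema(sizes, [{"M1", "M2", "M3"}], 4, all_pairs))                        # False: 5GB > 4GB
```

Since the three inputs total 5GB, no single 4GB reducer can hold them, and each 4GB reducer can hold at most two of them, so this A2A instance needs three reducers.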

12 State-of-the-Art: F. Afrati, A.D. Sarma, S. Salihoglu, and J.D. Ullman, “Upper and Lower Bounds on the Cost of a Map-Reduce Computation,” PVLDB, 2013. – Unit input size – Reducer size: the maximum number of inputs that a given reducer can have – Mapping schema: respects the reducer capacity and assigns the inputs

13 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

14 Problem Statement: The communication cost between the map and the reduce phases is a significant factor. How can we reduce the communication cost? – Fewer reducers mean less replication of the inputs and hence a smaller communication cost – How can we minimize the total number of reducers while respecting their limited capacity? This is not an easy task – All-to-All mapping schema problem – X-to-Y mapping schema problem [Figure: three inputs with one mapper per input. Left: three reducers, k_1 for inputs (1, 2), k_2 for (1, 3), and k_3 for (2, 3), so every input is sent to two reducers. Right: a single reducer, k_1 for inputs (1, 2, 3), so every input is sent once. Notation: k_i is a key.]

15 Our Contribution: We aim to decrease the communication cost. – Two kinds of mapping schema problems: the All-to-All (A2A) mapping schema problem and the X-to-Y (X2Y) mapping schema problem – Heuristics for both mapping schema problems

16 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

17 A2A Mapping Schema Problem: A set of inputs is given, and each pair of inputs corresponds to one output. Example – Computing common friends: the friend lists of m persons are given, and the task is to find the common friends of every pair of the m persons. Therefore, every two friend lists must be assigned to at least one common reducer.

18 A2A Mapping Schema Problem [Figure: one mapper per friend list fl_1, …, fl_4, each emitting key k_1; a single reducer for k_1 receives (1, 2, 3, 4) and covers all pairs (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4).] Reducer capacity is enough to hold all the friend lists together. Notation: k_i is a key; fl_i is the i-th friend list.

19 A2A Mapping Schema Problem [Figure: fl_1 and fl_2 are sent to keys k_1 and k_2, fl_3 to k_1 and k_3, and fl_4 to k_2 and k_3; the reducer for k_1 receives (1, 2, 3), the reducer for k_2 receives (1, 2, 4), and the reducer for k_3 receives (3, 4), which together cover all pairs (1, 2), (1, 3), (2, 3), (2, 4), (1, 4), (3, 4).] Reducer capacity is enough to hold only some of the friend lists together. Notation: k_i is a key; fl_i is the i-th friend list.
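A minimal sketch of what the reducers in the schema above compute for the common-friends example. The friend lists are made-up data, and each reducer is modeled as a plain function that intersects every pair of lists it receives; pair (1, 2) reaches both k1 and k2, so its result is simply computed twice.

```python
from itertools import combinations

# Hypothetical friend lists fl_1..fl_4 (person id -> set of friend ids).
friend_lists = {
    1: {"a", "b", "c"},
    2: {"b", "c", "d"},
    3: {"a", "c", "e"},
    4: {"c", "d", "e"},
}
# Mapping schema of slide 19: the friend lists each reducer receives.
reducers = {"k1": [1, 2, 3], "k2": [1, 2, 4], "k3": [3, 4]}

def reduce_common_friends(members, lists):
    # One reducer: intersect every pair of friend lists it received.
    return {(i, j): lists[i] & lists[j] for i, j in combinations(members, 2)}

result = {}
for members in reducers.values():
    result.update(reduce_common_friends(members, friend_lists))

# Every one of the 6 pairs of persons got an answer from some reducer.
assert set(result) == set(combinations(friend_lists, 2))
print(result[(1, 2)], result[(3, 4)])
```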

20 A2A Mapping Schema Problem – Inputs to the problem: – A set of m inputs – A size for each input (w_1, w_2, …, w_m) – A set of reducers (r_1, r_2, …, r_z) – A mapping from outputs to sets of inputs – Identical reducer capacity q

21 A2A Mapping Schema Problem – What to do? – Assign the given m inputs to the given number of reducers, without exceeding q, so that every two inputs share at least one reducer. There is a polynomial-time solution for one and two reducers; the problem is NP-hard for z > 2 reducers. [Figure: inputs M1 (1GB), M2 (2GB), M3 (2GB) and reducers of capacity 4GB; assignments with too few reducers fail ("Cannot assign M3", "Cannot assign M2, M3").]

22 A2A Mapping Schema Problem: NP-hardness via the partition problem (M.R. Garey and D.S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," 1979). [Figure: inputs of sizes w_1, w_2, …, w_m split into subsets of the input set I, with z+1 reducers in the general case and 3 reducers with q = s in the illustrated case, where two subsets each have total size s/2.] Partition problem example: A = {3,1,1,2,2,1} splits into A_1 = {1,1,1,2} (1+1+1+2 = 5) and A_2 = {2,3} (2+3 = 5).
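The hardness argument rests on the partition problem, which is NP-complete but solvable in pseudo-polynomial time. As an illustration (a standard subset-sum dynamic program, not an algorithm from the presentation; `partition` is a hypothetical name), the sketch below splits the example set A:

```python
def partition(values):
    """Split values into two lists of equal sum, or return None.

    Pseudo-polynomial subset-sum DP: time grows with sum(values).
    """
    total = sum(values)
    if total % 2:
        return None
    half = total // 2
    # parent[t] = (previous sum, index of the value added to reach t)
    parent = {0: None}
    for idx, v in enumerate(values):
        for s in list(parent):  # snapshot: each value is used at most once
            t = s + v
            if t <= half and t not in parent:
                parent[t] = (s, idx)
    if half not in parent:
        return None
    chosen, s = set(), half
    while parent[s] is not None:  # walk back to sum 0
        s, idx = parent[s]
        chosen.add(idx)
    first = [values[i] for i in sorted(chosen)]
    second = [values[i] for i in range(len(values)) if i not in chosen]
    return first, second

print(partition([3, 1, 1, 2, 2, 1]))  # ([3, 1, 1], [2, 2, 1])
```

The split found, {3,1,1} and {2,2,1}, differs from the slide's A_1 and A_2 but is equally valid. Because the running time depends on the magnitude of the sum rather than its bit length, this does not contradict NP-hardness.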

23 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

24 Heuristics for A2A Mapping Schema Problem: Based on – First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithms – A pseudo-polynomial bin-packing algorithm* – 2-step algorithms – The selection of a prime number p. A fixed reducer capacity is given. * D. R. Karger and J. Scott, "Efficient algorithms for fixed-precision instances of bin packing and Euclidean TSP," in APPROX-RANDOM, pages 104–117, 2008.

25 Heuristics for A2A Mapping Schema Problem

26 Heuristics for A2A Mapping Schema Problem – Parameters for analysis: – Per-input replication – Replication rate, r – Total number of reducers, r(m, q) – Total communication cost, c
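The four parameters can be computed for any concrete schema. In this sketch, `schema_metrics` is a hypothetical helper; taking the replication rate as the unweighted average number of reducers per input, and the communication cost as the total size of all copies sent to reducers, are our assumptions. The usage compares the two configurations of the three-input figure in the problem statement, assuming unit input sizes.

```python
def schema_metrics(sizes, reducers):
    """Metrics of a mapping schema (hypothetical helper).

    Assumptions: replication rate = unweighted average number of reducers
    per input; communication cost = total size of all inputs sent.
    """
    replication = {i: sum(i in r for r in reducers) for i in sizes}
    return {
        "replication": replication,  # per-input replication
        "replication_rate": sum(replication.values()) / len(sizes),
        "reducers": len(reducers),
        "communication_cost": sum(sizes[i] for r in reducers for i in r),
    }

unit = {1: 1, 2: 1, 3: 1}
three = schema_metrics(unit, [{1, 2}, {1, 3}, {2, 3}])
one = schema_metrics(unit, [{1, 2, 3}])
print(three["communication_cost"], one["communication_cost"])  # 6 3
```

With three reducers every input is shipped twice, so the cost doubles compared with a single reducer that can hold all three inputs.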

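27 Heuristics for A2A Mapping Schema Problem [Figure: input w_1 shares reducers with groups of other inputs, e.g., (w_1, w_2, w_4), (w_1, w_3, w_m, w_5), …, (w_1, w_{m−1}, w_6); the other inputs have total size s − w_1, and each reducer holding w_1 has room q − w_1 for them.] s is the sum of all the input sizes; q is the reducer capacity. Hence input 1 must be sent to at least ⌈(s − w_1)/(q − w_1)⌉ reducers.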

28 Heuristics for A2A Mapping Schema Problem – Case 1: All the input sizes are different – Use First-Fit Decreasing (FFD)* to create x bins (S_1, S_2, …, S_x) of size at most q/2 – Use x(x−1)/2 reducers, one for each pair of bins [Figure: inputs w_1, w_2, w_3, …, w_m packed into bins S_1, S_2, …, S_x, and reducers holding the bin pairs (S_1, S_2), (S_1, S_3), …, (S_1, S_x), (S_2, S_3), (S_2, S_4), …, (S_2, S_x), …, (S_{x−1}, S_x).] * D.S. Johnson, Near-optimal bin-packing algorithms, Doctoral thesis, MIT, Cambridge, 1973.

29 Heuristics for A2A Mapping Schema Problem [Figure: bins S_1, S_2, …, S_x, each of size q/2.] With FFD, all bins except at most one are more than half full, so each such bin holds inputs of total size at least q/4. Every two bins fit together at one reducer. s is the sum of all the input sizes; q is the reducer capacity.
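A sketch of Case 1 under stated assumptions: every input has size at most q/2 (which bins of size q/2 require), FFD is implemented directly rather than taken from a library, and the names `first_fit_decreasing` and `a2a_small_inputs` and the example sizes are hypothetical.

```python
from itertools import combinations

def first_fit_decreasing(sizes, capacity):
    """FFD bin packing: returns a list of bins, each a list of input ids."""
    bins, loads = [], []
    for i in sorted(sizes, key=sizes.get, reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:  # first bin with room
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:  # no existing bin has room: open a new one
            bins.append([i])
            loads.append(sizes[i])
    return bins

def a2a_small_inputs(sizes, q):
    """A2A schema when every input has size at most q/2 (Case 1 sketch)."""
    assert all(w <= q / 2 for w in sizes.values())
    bins = first_fit_decreasing(sizes, q / 2)
    if len(bins) == 1:
        return [bins[0]]  # everything fits at one reducer
    # One reducer per pair of bins: x(x-1)/2 reducers, each of load <= q.
    return [b1 + b2 for b1, b2 in combinations(bins, 2)]

sizes = {"a": 4, "b": 3, "c": 3, "d": 2, "e": 2, "f": 1}
print(a2a_small_inputs(sizes, 10))
# [['a', 'f', 'b', 'd'], ['a', 'f', 'c', 'e'], ['b', 'd', 'c', 'e']]
```

Here FFD builds three bins of load 5 each ({a, f}, {b, d}, {c, e}), so 3 · 2 / 2 = 3 reducers suffice.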

30 Heuristics for A2A Mapping Schema Problem – Case 2: One input i has size w_i, with q/2 < w_i < q – Based on the bin-packing-based algorithm – Make bins of size q − w_i that hold all the inputs except input i, and assign each of these bins, together with input i, to a reducer – Then build a solution for all the inputs except input i, using bins of size q/2 [Figure: reducers (w_i, S_1), (w_i, S_2), …, (w_i, S_x), where each S_j has size q − w_i, followed by reducers pairing bins S′_1, S′_2, …, S′_y of size q/2: (S′_1, S′_2), (S′_1, S′_3), …, (S′_{y−1}, S′_y).]
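Case 2 can be sketched by running the same FFD routine twice: once with bins of size q − w_i that travel with the large input, and once with bins of size q/2 for the pairs among the remaining inputs. The sketch assumes every other input has size at most q − w_i; the function names and example sizes are hypothetical.

```python
from itertools import combinations

def first_fit_decreasing(sizes, capacity):
    """FFD bin packing: returns a list of bins, each a list of input ids."""
    bins, loads = [], []
    for i in sorted(sizes, key=sizes.get, reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:  # no existing bin has room: open a new one
            bins.append([i])
            loads.append(sizes[i])
    return bins

def a2a_one_big_input(sizes, q, big):
    """Case 2 sketch: input `big` has size w with q/2 < w < q."""
    w = sizes[big]
    others = {i: s for i, s in sizes.items() if i != big}
    assert q / 2 < w < q and all(s <= q - w for s in others.values())
    # Step 1: bins of size q - w; each bin shares a reducer with `big`.
    reducers = [[big] + b for b in first_fit_decreasing(others, q - w)]
    # Step 2: cover pairs among the other inputs with bins of size q/2.
    small_bins = first_fit_decreasing(others, q / 2)
    if len(small_bins) == 1:
        reducers.append(small_bins[0])
    else:
        reducers += [b1 + b2 for b1, b2 in combinations(small_bins, 2)]
    return reducers

sizes = {"big": 7, "a": 3, "b": 2, "c": 1, "d": 1}
print(a2a_one_big_input(sizes, 10, "big"))
# [['big', 'a'], ['big', 'b', 'c'], ['big', 'd'], ['a', 'b', 'c', 'd']]
```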

31 Heuristics for A2A Mapping Schema Problem [Figure: input w_1 is placed at several reducers, each also holding k − 1 of the other inputs: (w_1, w_2, w_3, w_4), (w_1, w_5, w_6, w_7), …, (w_1, w_{m−2}, w_{m−1}, w_m). Each reducer can hold at most k inputs, and w_1 must meet the other m − 1 inputs.]
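The counting argument of this figure can be written out for identical input sizes q/k. The symbols ρ_i (number of reducers that receive input i) and z (total number of reducers) are our notation, and this is the standard counting lower bound rather than a result quoted from the slides:

```latex
\begin{align*}
\rho_i &\ge \left\lceil \frac{m-1}{k-1} \right\rceil
  && \text{input $i$ meets at most $k-1$ other inputs per reducer}\\
z &\ge \left\lceil \frac{1}{k}\sum_{i=1}^{m} \rho_i \right\rceil
   \ge \left\lceil \frac{m}{k}\left\lceil \frac{m-1}{k-1} \right\rceil \right\rceil
  && \text{each reducer holds at most $k$ inputs}
\end{align*}
```

For example, m = 15 and k = 3 give ρ_i ≥ 7 and z ≥ ⌈15 · 7/3⌉ = 35.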

32 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1) – 4 × OPTIMUM recursive algorithm, when k is odd, k > 2 – Divide the m inputs into two sets, A (of y inputs) and B (of x inputs) – Make y − 1 groups, each holding y/2 pairs of inputs of A – Assign each input of B to one of the groups – Perform the same operation on set B [Figure: for y = 4, the 3 groups of 2 pairs are {(1, 2), (3, 4)}, {(1, 3), (2, 4)}, and {(1, 4), (2, 3)}.]

33 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1), reducer capacity q = 3 – 4 × OPTIMUM recursive algorithm, when k > 2 is odd. Example: m = 15 inputs, each of size q/3 (k = 3); Set A = {1, 2, …, 8}, Set B = {9, 10, …, 15}. Divide the inputs of set A into two halves with an equal number of inputs, {1, …, 4} and {5, …, 8}, and form 7 groups of 4 pairs, one group per input of B:
– with 9: (1, 5), (2, 6), (3, 7), (4, 8)
– with 10: (1, 6), (2, 7), (3, 8), (4, 5)
– with 11: (1, 7), (2, 8), (3, 5), (4, 6)
– with 12: (1, 8), (2, 5), (3, 6), (4, 7)
– with 13: (1, 3), (2, 4), (5, 7), (6, 8)
– with 14: (1, 4), (2, 3), (5, 8), (6, 7)
– with 15: (1, 2), (3, 4), (5, 6), (7, 8)
Assign each row of every group (a pair plus its input of B) to a reducer, and perform the same method on set B.
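A sketch of the A/B recursion for k = 3 only, under our own assumptions: A receives the smallest even number y of inputs with y − 1 ≥ x (the slide's split of 8 and 7 satisfies this), the pairs of A are grouped with the circle method for round-robin tournaments (a different grouping from the slide's, but also one in which every pair of A appears in exactly one group), groups without an input of B keep their two-input rows, and the recursion stops at three or fewer inputs. The function names are hypothetical.

```python
from itertools import combinations

def round_robin_groups(items):
    """Split an even-size list into len(items) - 1 groups of disjoint pairs.

    Circle method: every pair of items appears in exactly one group.
    """
    y = len(items)
    assert y >= 2 and y % 2 == 0
    fixed, rest, n = items[-1], items[:-1], y - 1
    groups = []
    for r in range(n):
        pairs = [(rest[r], fixed)]
        for i in range(1, y // 2):
            pairs.append((rest[(r + i) % n], rest[(r - i) % n]))
        groups.append(pairs)
    return groups

def a2a_identical_k3(inputs):
    """Recursive A/B sketch for unit-size inputs and k = 3 (q = 3 units)."""
    m = len(inputs)
    if m <= 1:
        return []              # no pair left to cover
    if m <= 3:
        return [list(inputs)]  # base case: one reducer holds everything
    y = (m + 2) // 2           # smallest y with y - 1 >= m - y ...
    y += y % 2                 # ... rounded up to an even number
    a, b = list(inputs[:y]), list(inputs[y:])
    reducers = []
    for g, pairs in enumerate(round_robin_groups(a)):
        extra = [b[g]] if g < len(b) else []  # at most one B input per group
        reducers += [list(pair) + extra for pair in pairs]
    return reducers + a2a_identical_k3(b)     # same method on set B

reducers = a2a_identical_k3(list(range(1, 16)))
print(len(reducers))  # 35
```

For m = 15 the sketch splits the inputs exactly as in the example (A = {1, …, 8}, B = {9, …, 15}) and uses 28 + 6 + 1 = 35 reducers, which matches the counting lower bound ⌈15 · ⌈14/2⌉ / 3⌉ = 35 for this instance.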

34 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1), reducer capacity q = 5 – 4 × OPTIMUM recursive algorithm, when k > 2 is odd. Example: m = 23 inputs, each of size q/5 (k = 5); Set A = {1, 2, …, 16}, Set B = {17, 18, …, 23}. The inputs of A are taken two at a time, and each input of B joins every row of one group:
– with 17: {1, 2, 9, 10}, {3, 4, 11, 12}, {5, 6, 13, 14}, {7, 8, 15, 16}
– with 18: {1, 2, 11, 12}, {3, 4, 13, 14}, {5, 6, 15, 16}, {7, 8, 9, 10}
– with 19: {1, 2, 13, 14}, {3, 4, 15, 16}, {5, 6, 9, 10}, {7, 8, 11, 12}
– with 20: {1, 2, 15, 16}, {3, 4, 9, 10}, {5, 6, 11, 12}, {7, 8, 13, 14}
– with 21: {1, 2, 5, 6}, {3, 4, 7, 8}, {9, 10, 13, 14}, {11, 12, 15, 16}
– with 22: {1, 2, 7, 8}, {3, 4, 5, 6}, {9, 10, 15, 16}, {11, 12, 13, 14}
– with 23: {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}
Assign each row of every group to a reducer, and perform the same method on set B.

35 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1), reducer capacity q = 4 – 2 × OPTIMUM recursive algorithm, when k is even – Make 2m/k subgroups, and then make 2m/k − 1 groups. Example: 16 inputs, each of size q/4; divide the inputs into 8 subgroups of 2 inputs each: {1,2}, {3,4}, {5,6}, {7,8}, {9,10}, {11,12}, {13,14}, {15,16}. The 7 groups place two subgroups at each reducer:
– {1, 2, 9, 10}, {3, 4, 11, 12}, {5, 6, 13, 14}, {7, 8, 15, 16}
– {1, 2, 11, 12}, {3, 4, 13, 14}, {5, 6, 15, 16}, {7, 8, 9, 10}
– {1, 2, 13, 14}, {3, 4, 15, 16}, {5, 6, 9, 10}, {7, 8, 11, 12}
– {1, 2, 15, 16}, {3, 4, 9, 10}, {5, 6, 11, 12}, {7, 8, 13, 14}
– {1, 2, 5, 6}, {3, 4, 7, 8}, {9, 10, 13, 14}, {11, 12, 15, 16}
– {1, 2, 7, 8}, {3, 4, 5, 6}, {9, 10, 15, 16}, {11, 12, 13, 14}
– {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}
This works similarly to the q/3 case.

36 Heuristics for A2A Mapping Schema Problem

37 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1), when k is a prime number – Extends the approach of AU’13 – AU’13 provides a solution when m = k², where k is a prime number: create k + 1 teams with k players (reducers) in each team. Foto N. Afrati, Jeffrey D. Ullman: Matching bounds for the all-pairs MapReduce problem. IDEAS 2013: 3-4.

38 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1). Example: m = 3² = 9 inputs, k = 3, q = 3; each team has 3 reducers:
– Team 0: {a1, a2, a3}, {a4, a5, a6}, {a7, a8, a9}
– Team 1: {a1, a5, a9}, {a4, a8, a3}, {a7, a2, a6}
– Team 2: {a1, a8, a6}, {a4, a2, a9}, {a7, a5, a3}
– Team 3: {a1, a4, a7}, {a2, a5, a8}, {a3, a6, a9}
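One way to generate k + 1 teams of k reducers for m = k² inputs when k = p is prime is the affine plane over the integers modulo p: place the inputs on a p × p grid and let each team be a family of parallel lines. This construction is our assumption about how such teams can be built, not code from AU’13, and the function names are hypothetical; for p = 3 it reproduces the four teams of the example above, in a different team order.

```python
from collections import Counter
from itertools import combinations

def affine_plane_teams(p):
    """p + 1 teams of p reducers for m = p * p unit-size inputs (p prime).

    Input a_n sits at row r, column c of a p x p grid, where n = r * p + c + 1.
    """
    def label(r, c):
        return r * p + c + 1

    teams = [[[label(r, c) for c in range(p)] for r in range(p)]]  # the rows
    for s in range(p):  # one team per slope s: cells with c = s*r + t (mod p)
        teams.append([[label(r, (s * r + t) % p) for r in range(p)]
                      for t in range(p)])
    return teams

def pair_counts(teams):
    """Number of reducers at which each pair of inputs meets."""
    return Counter(frozenset(pair) for team in teams for reducer in team
                   for pair in combinations(reducer, 2))

teams = affine_plane_teams(3)
print(teams[2])                          # [[1, 5, 9], [2, 6, 7], [3, 4, 8]]
print(set(pair_counts(teams).values()))  # {1}: every pair meets exactly once
```

Because p is prime, two distinct grid cells lie on exactly one line, so every pair of inputs meets at exactly one of the p(p + 1) reducers (12 reducers for p = 3).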

39 Heuristics for A2A Mapping Schema Problem – Case 3: All the input sizes are identical (q/k, k > 1), when k is a prime number, q = 3 [Figure: 27 inputs arranged in three 3 × 3 matrices (1–9, 10–18, 19–27) whose columns C1, …, C9 hold three inputs each (e.g., C1 = {1, 4, 7}); the columns are combined following the same teams as in the m = 9 example, so that, e.g., {C1, C6, C8} yields the reducers {1, 12, 20}, {4, 15, 23}, {7, 18, 26}. Labels: "Matrix of columns", "12 Reducers".]

40 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

41 X2Y Mapping Schema Problem: Two disjoint sets X and Y are given. Each pair of elements ⟨x_i, y_j⟩ (where x_i ∈ X, y_j ∈ Y, ∀ i, j) corresponds to one output. Example – Skew join: two relations X(A, B) and Y(B, C) are given, where many tuples share a common B-value b. Every tuple of X with a given b value must be assigned to at least one reducer in common with every tuple of Y that has the same b value.

42 X2Y Mapping Schema Problem [Figure: relation X(A, B) contains the tuples (1, 2), (5, 2), …, (9, 2), and relation Y(B, C) contains (2, 5), (2, 4), …, (2, 7); one mapper per tuple emits key 2, and a single reducer for key = 2 receives X(1, 2), X(5, 2), X(9, 2), Y(2, 5), Y(2, 4), Y(2, 7).] Reducer capacity is enough to hold all the tuples whose b = 2 together.

43 X2Y Mapping Schema Problem [Figure: the same six tuples X(1, 2), X(5, 2), X(9, 2), Y(2, 5), Y(2, 4), Y(2, 7), with their mappers emitting keys k_1, k_2, and k_3 to three reducers.] Reducer capacity is enough to hold only some of the tuples of both relations together.

44 Inputs to the problem: – Two sets X and Y of m and n inputs, respectively – A size for each input – A set of reducers (r_1, r_2, …, r_z) – A mapping from outputs to sets of inputs – Identical reducer capacity q

45 X2Y Mapping Schema Problem – What to do? – Assign each input of the set X together with each input of the set Y to at least one reducer in common, without exceeding q. Polynomial for one reducer: check whether all the inputs of the sets X and Y fit together in a single reducer. NP-hard for z > 1 reducers.

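46 X2Y Mapping Schema Problem [Figure: NP-hardness sketch with z = 2 reducers; s is the sum of the input sizes of the set X, and q = w′_1 + s/2; each reducer holds the set Y together with one of two subsets of X of size s/2 each, so a schema exists only if X can be split into two halves of total size s/2, i.e., the partition problem.]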

47 Outline: Introduction; Problem Statement and Our Contribution; All-to-All (A2A) Mapping Schema Problem; Heuristics for A2A Mapping Schema Problem; X-to-Y (X2Y) Mapping Schema Problem; Heuristics for X2Y Mapping Schema Problem; Conclusion

48 Heuristics for X2Y Mapping Schema Problem: Based on – First-Fit Decreasing (FFD) or Best-Fit Decreasing (BFD) bin-packing algorithms. A fixed reducer capacity is given.

49 Heuristics for X2Y Mapping Schema Problem – Case 1: All the input sizes in the sets X and Y are upper bounded by q/2 (neither set holds an input of size greater than q/2) – Use FFD to create u bins of size at most q/2 from the inputs of X and v bins of size at most q/2 from the inputs of Y – Use uv reducers, one for each pair of an X bin and a Y bin [Figure: reducers holding (u_1, v_1), (u_1, v_2), …, (u_1, v_v), (u_2, v_1), …, (u_u, v_v).]

50 Heuristics for X2Y Mapping Schema Problem – Case 2: The inputs of one set are of size at most w, with q/2 < w < q – The inputs of the set X are of size at most w – Hence, the inputs of the set Y must be of size at most q − w, since an input of X of size w has to share a reducer with every input of Y – Use FFD to create u bins of size at most w from the inputs of X and v bins of size at most q − w from the inputs of Y – Use uv reducers, one for each pair of an X bin and a Y bin [Figure: reducers holding (u_1, v_1), (u_1, v_2), …, (u_1, v_v), (u_2, v_1), …, (u_u, v_v).]
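Both X2Y cases follow the same recipe with different bin sizes: X bins of size b and Y bins of size q − b, with b = q/2 in Case 1 and b = w in Case 2, and one reducer per pair of bins. A sketch with hypothetical names and made-up sizes (the tuple labels only echo the skew-join example):

```python
def first_fit_decreasing(sizes, capacity):
    """FFD bin packing: returns a list of bins, each a list of input ids."""
    bins, loads = [], []
    for i in sorted(sizes, key=sizes.get, reverse=True):
        for b, load in enumerate(loads):
            if load + sizes[i] <= capacity:
                bins[b].append(i)
                loads[b] += sizes[i]
                break
        else:  # no existing bin has room: open a new one
            bins.append([i])
            loads.append(sizes[i])
    return bins

def x2y_schema(x_sizes, y_sizes, q, bx):
    """X bins of size bx, Y bins of size q - bx, one reducer per bin pair.

    Case 1: bx = q / 2.  Case 2: bx = w with q/2 < w < q.
    """
    assert all(s <= bx for s in x_sizes.values())
    assert all(s <= q - bx for s in y_sizes.values())
    x_bins = first_fit_decreasing(x_sizes, bx)      # u bins
    y_bins = first_fit_decreasing(y_sizes, q - bx)  # v bins
    return [xb + yb for xb in x_bins for yb in y_bins]  # u * v reducers

# Case 1: all sizes at most q/2 = 3.
x1 = {"X(1,2)": 2, "X(5,2)": 2, "X(9,2)": 1}
y1 = {"Y(2,5)": 3, "Y(2,4)": 1, "Y(2,7)": 1}
case1 = x2y_schema(x1, y1, 6, 6 / 2)

# Case 2: X inputs at most w = 7, Y inputs at most q - w = 3.
x2 = {"x1": 7, "x2": 4, "x3": 3}
y2 = {"y1": 3, "y2": 2, "y3": 1}
case2 = x2y_schema(x2, y2, 10, 7)
print(len(case1), len(case2))  # 4 4
```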

51 Outline: Introduction; Problem Statement and Our Contribution; All-to-All Mapping Schema Problem; X-to-Y Mapping Schema Problem; Heuristics for Mapping Schema Problems; Conclusion

52 Conclusion – Reducer capacity: an important parameter to be considered in all MapReduce algorithms; the capacity is expressed in terms of memory size, where the (not necessarily identical) input sizes may include auxiliary data augmented and added to the index of the data item(s) – Two assignment schemas of MapReduce are given: the All-to-All (A2A) mapping schema problem and the X-to-Y (X2Y) mapping schema problem – Several heuristics for the A2A and X2Y mapping schema problems are provided

53 Foto Afrati (1), Shlomi Dolev (2), Ephraim Korach (3), Shantanu Sharma (2), and Jeffrey D. Ullman (4). (1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece, afrati@softlab.ece.ntua.gr; (2) Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il; (3) Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Israel, korach@bgu.ac.il; (4) Department of Computer Science, Stanford University, USA, ullman@cs.stanford.edu. Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html

