1
Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations
Foto N. Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) National Technical University of Athens, Greece
(2) Ben-Gurion University of the Negev, Israel
(3) Stanford University, USA
17th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS 2015), Canada (18-21 September 2015)
2
Communication Cost: Join of two relations
[Figure: Organization A and Organization B; Map Phase, Reduce Phase, Outputs, Final outputs]
3
Problem Statement
Do we need to send the whole database to the cloud before performing join operations?
4
Join of two relations
[Figure: Mappers 1 to 6 hold the tuples (b1, a1), (b1, a2), (b2, a3), (b1, c1), (b1, c2), (b3, c3) from Organization A and Organization B and feed the reducers for b1, b2, and b3.]
All B values are very small compared with the A and C values.
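The join on this slide can be sketched as a conventional MapReduce job in which every full tuple travels to a reducer. This is a minimal illustration, not the authors' code: the relation names X and Y, the function names, and the assignment of relations to organizations are assumptions made for the sketch.

```python
from collections import defaultdict

# Tuples from the slide, keyed on the join attribute B.
# The names X and Y are hypothetical labels for the two relations.
X = [("b1", "a1"), ("b1", "a2"), ("b2", "a3")]
Y = [("b1", "c1"), ("b1", "c2"), ("b3", "c3")]

def map_phase(x_tuples, y_tuples):
    """Emit (join key, tagged payload) pairs, as the mappers would."""
    for b, a in x_tuples:
        yield b, ("X", a)
    for b, c in y_tuples:
        yield b, ("Y", c)

def shuffle(pairs):
    """Group the emitted pairs by key: one group per reducer (b1, b2, b3)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every A value with every C value."""
    output = []
    for b in sorted(groups):
        a_vals = [v for tag, v in groups[b] if tag == "X"]
        c_vals = [v for tag, v in groups[b] if tag == "Y"]
        output.extend((a, b, c) for a in a_vals for c in c_vals)
    return output

joined = reduce_phase(shuffle(map_phase(X, Y)))
```

Only the key b1 appears in both relations, so the tuples (b2, a3) and (b3, c3) are shipped to the reducers without contributing to the result.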
5
Join of two relations
[Figure: Mappers 1 to 6 hold items keyed by b1, b2, b1, b3 from Organization A and Organization B and feed the reducers for b1, b2, and b3.]
Make pairs of similar items, i.e., pairs of fruits, dairy products, meats.
6
Communication Cost
The amount of data that must be moved
– from the location of the user to the location of the mappers
– from the map phase to the reduce phase in each iteration of the job
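The definition above can be made concrete with a small accounting sketch. The sizes below are invented for illustration (join-key values of 1 unit, payload values of 100 units), and the sketch assumes that the mappers forward every tuple unchanged to the reducers; neither assumption comes from the slides.

```python
# Hypothetical sizes in arbitrary units: join-key values (b*) are small,
# payload values (a*, c*) are large. These numbers are assumptions.
SIZE = {"a": 100, "b": 1, "c": 100}

def tuple_size(tup):
    """Size of a tuple: the sum of the sizes of its values."""
    return sum(SIZE[value[0]] for value in tup)

def communication_cost(tuples, iterations=1):
    """Data moved from user to mappers, plus from map to reduce per iteration."""
    user_to_mappers = sum(tuple_size(t) for t in tuples)
    # Assumes each mapper forwards its tuples unchanged to a reducer.
    map_to_reduce = iterations * user_to_mappers
    return user_to_mappers + map_to_reduce

tuples = [
    ("b1", "a1"), ("b1", "a2"), ("b2", "a3"),
    ("b1", "c1"), ("b1", "c2"), ("b3", "c3"),
]
cost = communication_cost(tuples)
```

With these assumed sizes each tuple costs 101 units, so six tuples cost 606 units per hop and 1212 units for a single iteration.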
7
Problem Statement
Do we need to send the whole database to the cloud before performing join operations? No.
But then how do we get the answers? Work on metadata.
8
Meta-MapReduce
– A new algorithmic approach for MapReduce algorithms that significantly decreases the communication cost
– Works on metadata, which varies from problem to problem and is very small compared with the original database
9
Meta-MapReduce
[Figure: original input data (Chunk 1, ...) with its meta-data; input meta-data splits 1 to m, each with its own mapper; reducers for keys k1 to kr producing Output 1, Output 2, ...; a master process.]
Step 1: MapReduce job assignment
Step 2: Meta-data transmission
Step 3: Read and Map tasks’ execution
Step 4: Read and Reduce tasks’ execution
Step 4: Call Function: Data request and data transmission
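The steps in the figure can be sketched for the two-relation join as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the metadata is taken to be the join key plus a reference to the tuple, the sites are plain in-memory dictionaries, and all names (SITE_X, SITE_Y, call_function, and so on) are hypothetical.

```python
from collections import defaultdict

# The original data stays with its owners (hypothetical in-memory "sites").
SITE_X = {0: ("b1", "a1"), 1: ("b1", "a2"), 2: ("b2", "a3")}
SITE_Y = {0: ("b1", "c1"), 1: ("b1", "c2"), 2: ("b3", "c3")}

def metadata(site, tag):
    """Step 2: send only (join key, reference to the tuple), not the payload."""
    return [(b, (tag, idx)) for idx, (b, _payload) in site.items()]

def map_and_shuffle(meta_records):
    """Step 3: map the metadata records and group them by join key."""
    groups = defaultdict(list)
    for key, ref in meta_records:
        groups[key].append(ref)
    return groups

def call_function(ref):
    """Step 4: request the payload of one tuple from the site that owns it."""
    tag, idx = ref
    site = SITE_X if tag == "X" else SITE_Y
    return site[idx][1]

def reduce_with_requests(groups):
    """Step 4: fetch payloads only for keys present in both relations."""
    output, requested = [], []
    for b in sorted(groups):
        x_refs = [r for r in groups[b] if r[0] == "X"]
        y_refs = [r for r in groups[b] if r[0] == "Y"]
        if not x_refs or not y_refs:
            continue  # this key cannot produce output; its data is never moved
        requested.extend(x_refs + y_refs)
        a_vals = [call_function(r) for r in x_refs]
        c_vals = [call_function(r) for r in y_refs]
        output.extend((a, b, c) for a in a_vals for c in c_vals)
    return output, requested

groups = map_and_shuffle(metadata(SITE_X, "X") + metadata(SITE_Y, "Y"))
joined, requested = reduce_with_requests(groups)
```

In this sketch the join result is the same as in a conventional job, but only four of the six payloads are ever requested: a3 and c3 never leave their sites because their keys b2 and b3 have no partner in the other relation.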
10
Meta-MapReduce
– Users send their metadata
– Avoids moving data that does not participate in the final output
– The final results are computed with the help of the metadata, which avoids uploading the whole database
11
Applications
– Amazon EMR
– Geographically distributed MapReduce computations
– k-nearest-neighbors problem
– Shortest path problem in a social graph
– Multiway join
– Skyline queries
12
Foto Afrati (1), Shlomi Dolev (2), Shantanu Sharma (2), and Jeffrey D. Ullman (3)
(1) School of Electrical and Computing Engineering, National Technical University of Athens, Greece, afrati@softlab.ece.ntua.gr
(2) Department of Computer Science, Ben-Gurion University of the Negev, Israel, {dolev,sharmas}@cs.bgu.ac.il
(3) Department of Computer Science, Stanford University, USA, ullman@cs.stanford.edu
Presentation is available at http://www.cs.bgu.ac.il/~sharmas/publication.html