Upper and Lower Bounds on the Cost of a Map-Reduce Computation

Upper and Lower Bounds on the Cost of a Map-Reduce Computation. Based on an article by Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, and Jeffrey D. Ullman. Images taken from slides by the same authors.

Agenda
- MapReduce – a brief overview
- Communication / Parallelism Tradeoff Model
  - Motivational Example
  - Problem Model & Assumptions
  - Recipe for Lower Bounds
- Known problems
  - Word Count
  - Hamming-Distance-1 Problem
  - Triangle Finding
  - Finding Instances of Other Graphs
  - Matrix Multiplication
- Summary

MapReduce - Overview
A programming paradigm for processing large amounts of data with a parallel, distributed algorithm. It consists of:
- A map() function: an operation applied to all elements of a sequence, ideally in parallel. Runs on the Map processors of a MapReduce implementation.
- A reduce() function: an operation that performs a summary (e.g., counting or averaging) over all the elements. Runs on the Reduce processors of a MapReduce implementation.
Implementations such as Apache Hadoop are available. Let's look at the canonical example…

Input is partitioned into lines/files and sent to the Mappers

Mappers perform the map() function and output key-value pairs of the form <word, 1>.

In the shuffle phase, data is sorted and sent to the Reducers based on the key.

The Reducers perform the reduce() function (here, a sum) over all the values for each key and output a new key-value pair.

New key-value pairs are aggregated into the final result.
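To make the pipeline concrete, here is a minimal, self-contained Python simulation of the word-count job. This is a sketch of the paradigm, not the API of Hadoop or any real framework; the example data and function names are our own.

```python
from collections import defaultdict

# A minimal simulation of the word-count job: map, shuffle, reduce.
def map_fn(line):
    """Map: emit a <word, 1> pair for every word occurrence in the line."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: sum the values associated with one key."""
    return (word, sum(counts))

lines = ["deer bear river", "car car river", "deer car bear"]
shuffled = defaultdict(list)            # shuffle phase: group values by key
for line in lines:
    for word, one in map_fn(line):
        shuffled[word].append(one)
result = [reduce_fn(word, counts) for word, counts in shuffled.items()]
print(sorted(result))  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```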

The Drug Interaction Problem
We have 3000 sets of drug data (patients taking the drug, dates, diagnoses), about 1 MB of data per drug. We want to know whether there are two drugs that, when taken together, increase the risk of heart attack. Naturally, we cross-reference every pair of drugs across the whole set of drugs.

The Drug Interaction Problem (figure: each drug's data is sent to the reducer for every pair it participates in)

The Drug Interaction Problem
We have a problem: with 3000 drugs, each set of drug data is replicated 2999 times. If each set of data is 1 MB, we have roughly 9000 GB of communication. The communication cost is too high! We propose a different approach: grouping drugs…

The Drug Interaction Problem (figure: the grouping approach – each reducer receives a pair of drug groups)

The Drug Interaction Problem
We group the 3000 drugs into 30 groups: G_1 holds drugs 1 to 100, G_2 holds 101 to 200, …, G_30 holds 2901 to 3000. Each set of drug data is now replicated only 29 times (once for each other group), which means 87 GB of communication instead of 9000 GB. But we get lower parallelism and a higher processing cost per reducer!
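A quick sanity check of these numbers (a sketch, assuming the 1 MB-per-drug figure above):

```python
# Sanity check of the communication numbers, assuming 1 MB of data per drug.
drugs, mb_per_drug = 3000, 1

def communication_gb(groups):
    """Each drug's data is replicated once for every one of the other groups."""
    return drugs * (groups - 1) * mb_per_drug / 1000

print(communication_gb(3000))  # one drug per group: 8997.0 GB (~9000 GB)
print(communication_gb(30))    # 30 groups of 100:   87.0 GB
```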

Problem Model & Assumptions
Assume that the Map phase is perfectly parallel; we will discuss the tradeoff in the Reduce phase, and we will focus on single-round MapReduce applications. We want to maximize parallelism, which means more Reduce processors, each receiving a smaller amount of data. We also want to minimize communication cost, which means less traffic between the Map and Reduce processors. But the more parallelism we have, the bigger the communication overhead, because we need to transfer data to more nodes.

Problem Model & Assumptions
Let's agree that the input/output terminology is relative to the Reduce phase. For the purposes of the model, a problem consists of:
- Sets of inputs and outputs.
- A mapping from outputs to sets of inputs; each output depends only on the set of inputs it is mapped to.
We need to limit ourselves to finite sets of inputs and outputs.

Problem Model & Assumptions
We define a reducer as a reduce key together with its list of associated values. The reducer size is an upper bound on how long the list of values can be – the maximum number of inputs that can be sent to any one reducer. We denote the reducer size by q.

Problem Model & Assumptions
We define the replication rate as the average number of key-value pairs (reducers) to which each input is mapped by the mappers, and denote it r. The replication rate is a function of the reducer size: r = f(q). Multiplied by the number of inputs actually present, the replication rate gives the expected communication.

Problem Model & Assumptions
A mapping schema for a given problem is an assignment of a set of inputs to each reducer, subject to two constraints:
- No reducer is assigned more than q inputs.
- For every output, there is at least one reducer that is assigned all of the inputs for that output. Such a reducer covers the output; it need not be unique.
Inputs and outputs are hypothetical, in the sense that they are all the possible inputs and outputs that might be present in an instance of the problem. The mapping assigns inputs to reducers without reference to which inputs are actually present.
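The two constraints translate directly into code. Here is a small hypothetical checker; the function name and data layout are our own illustration, not from the paper:

```python
# Hypothetical checker for the two mapping-schema constraints above.
# reducers: list of input sets; outputs: maps each output to the set of
# inputs it depends on; q: the reducer size.
def is_mapping_schema(reducers, outputs, q):
    size_ok = all(len(inputs) <= q for inputs in reducers)
    covered = all(any(deps <= inputs for inputs in reducers)  # subset test
                  for deps in outputs.values())
    return size_ok and covered

# All-pairs problem on inputs {1, 2, 3, 4} with q = 3:
outputs = {(i, j): {i, j} for i in range(1, 5) for j in range(i + 1, 5)}
print(is_mapping_schema([{1, 2, 3}, {1, 2, 4}, {3, 4}], outputs, 3))  # True
print(is_mapping_schema([{1, 2, 3}, {1, 2, 4}], outputs, 3))  # False: (3, 4) uncovered
```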

Problem Model & Assumptions (figure: a mapping schema assigning inputs to reducers so that every output is covered)

The Recipe for Lower Bounds
What we want now is a lower bound on the replication rate, so we use a generic technique; upper bounds will be derived independently for each problem.
- Fix q, the maximum number of inputs per reducer.
- Derive g(q), an upper bound on the number of outputs a single reducer can cover, given q inputs.
- Count the total number of inputs |I| and outputs |O|.

The Recipe for Lower Bounds
Now assume there are p reducers, each receiving q_i ≤ q inputs, and that together they cover all outputs. Then:

$$\sum_{i=1}^{p} g(q_i) \;\ge\; |O| \qquad\Longrightarrow\qquad \sum_{i=1}^{p} q_i\,\frac{g(q_i)}{q_i} \;\ge\; |O|.$$

Since every $q_i \le q$ and $g(q)/q$ is monotonically increasing,

$$\sum_{i=1}^{p} q_i\,\frac{g(q)}{q} \;\ge\; \sum_{i=1}^{p} q_i\,\frac{g(q_i)}{q_i} \;\ge\; |O|,$$

and therefore

$$r \;=\; \frac{1}{|I|}\sum_{i=1}^{p} q_i \;\ge\; \frac{q\,|O|}{g(q)\,|I|}.$$

Thus we get a lower bound on r.

The Drug Interaction Problem (figure: a small example instance with six drugs and reducers of size q = 4)

The Drug Interaction Problem
Using the inequality on this example: $|I| = 6$, $q = 4$, $g(q) = \binom{4}{2} = 6$, and $|O| = \binom{6}{2} = 15$. The schema above achieves r = 2, consistent with the bound:

$$r \;\ge\; \frac{q\,|O|}{g(q)\,|I|} \;=\; \frac{4 \cdot 15}{6 \cdot 6} \;=\; \frac{5}{3}.$$
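The recipe is easy to mechanize. Below is a sketch (our own helper, with the all-pairs bound g(q) = C(q, 2) assumed) that reproduces the 5/3 figure:

```python
from math import comb

# The recipe as a function: r >= q|O| / (g(q)|I|), valid whenever g(q)/q is
# monotonically increasing.
def replication_lower_bound(q, n_inputs, n_outputs, g):
    return q * n_outputs / (g(q) * n_inputs)

# All-pairs on 6 inputs: a reducer with q inputs covers at most C(q, 2) outputs.
bound = replication_lower_bound(4, 6, comb(6, 2), lambda q: comb(q, 2))
print(bound)  # 1.666... = 5/3
```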

Word Count
Think of the inputs as the word occurrences themselves in the files. Then each word occurrence results in exactly one key-value pair, so the replication rate is 1, independent of the limit on reducer size. There is no tradeoff at all between q and r: the Word Count problem is embarrassingly parallel!

Hamming Distance 1 Problem
A reminder: the Hamming distance between two binary strings is the number of positions in which they differ. Our problem is to find all pairs of bit strings of length b that are at Hamming distance 1.

Hamming Distance 1 Problem
Lemma: $g(q) \le \frac{q \log_2 q}{2}$.
Proof by induction on b.
Basis: b = 1. Then q is either 1 or 2. If q = 1, the reducer can cover no outputs, and $\frac{1 \cdot \log_2 1}{2} = 0$. If q = 2, the reducer can cover at most one output, and $\frac{2 \cdot \log_2 2}{2} = 1$.
Now assume the claim for b and consider string length b + 1. Let X be a set of q bit strings of length b + 1. Let Y be the subset of X of strings of the form 0w, with |Y| = y, and let Z be the subset of strings of the form 1w, with |Z| = z. Then q = y + z.

Hamming Distance 1 Problem
Proof by induction on b (cont.): For any string 0w in Y, there is at most one string in Z at Hamming distance 1, namely 1w. Thus the number of outputs with one string in Y and the other in Z is at most min(y, z). By induction, at most $\frac{y \log_2 y}{2}$ outputs have both of their inputs in Y, and at most $\frac{z \log_2 z}{2}$ have both in Z. So

$$g(q) \;\le\; \frac{z \log_2 z}{2} + \frac{y \log_2 y}{2} + \min(y, z) \;\le\; \frac{(y + z)\log_2(y + z)}{2} \;=\; \frac{q \log_2 q}{2}.$$

Hamming Distance 1 Problem
Note that $|I| = 2^b$ and $|O| = \frac{b \cdot 2^b}{2}$ (each of the $2^b$ strings has b neighbors at distance 1, and each pair is counted twice). Also, $\frac{g(q)}{q} \le \frac{\log_2 q}{2}$ is monotonically increasing.
Theorem: $r \ge \frac{b}{\log_2 q}$.
Proof: Using the recipe from the previous section, we get

$$r \;\ge\; \frac{q\,|O|}{g(q)\,|I|} \;\ge\; \frac{q \cdot \frac{b \cdot 2^b}{2}}{\frac{q \log_2 q}{2} \cdot 2^b} \;=\; \frac{b}{\log_2 q}.$$

Hamming Distance 1 Problem
Now we want to find an upper bound for the problem. Let's treat the extreme cases first…
If q = 2, every reducer gets exactly 2 inputs. In our case that means every input string is sent to exactly b reducers (one per neighbor), so r = b.
If $q = 2^b$, we need only one reducer, which gets the entire input, so r = 1.

Hamming Distance 1 Problem
Let b ≥ c ≥ 2 be such that c divides b. Using a splitting algorithm, we get reducer size $2^{b/c}$ and replication rate c. We split each bit string into c segments, each of length b/c, and send it to c reducers. There are c groups of reducers, one group per ignored segment, with one reducer for each of the $2^{b - b/c}$ strings of length b − b/c. Any two strings at Hamming distance 1 disagree in only one of the c segments of length b/c, so the reducer ignoring that segment covers the output pair (a sketch of the mapper follows).
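A minimal sketch of the splitting algorithm's mapper, assuming c divides b; the function name is our own:

```python
# Sketch of the splitting algorithm: split each b-bit string into c segments
# of length b/c; a reducer key is (index of the ignored segment, the remaining
# b - b/c bits), so each reducer receives 2^(b/c) strings and r = c.
def reducer_keys(s, c):
    seg = len(s) // c
    for i in range(c):
        yield (i, s[:i * seg] + s[(i + 1) * seg:])  # drop the i-th segment

# Strings at Hamming distance 1 differ in exactly one segment, so the reducer
# ignoring that segment receives both:
assert set(reducer_keys("0110", 2)) & set(reducer_keys("0010", 2))
```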

Hamming Distance 1 Problem
Consider now Hamming distance 2… While $g(q) = O(q \log q)$ for Hamming distance 1, in this case the bound is $\Omega(q^2)$, which prevents us from getting a good lower bound on the replication rate. There is an algorithm that creates one reducer for each string s of length b and assigns to that reducer all strings at distance 1 from s. Notice that all distinct strings at distance 1 from s are at distance 2 from each other. Thus each reducer covers $\binom{b}{2} = \binom{q}{2} = \Omega(q^2)$ outputs.
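A sketch of that mapper; the name reducers_for is our own:

```python
# Sketch of the distance-2 algorithm above: input string x is sent to the
# reducer of every string s at Hamming distance 1 from x, so the reducer for s
# receives exactly the b strings at distance 1 from s (here q = b).
def reducers_for(x):
    for i in range(len(x)):
        flipped = '1' if x[i] == '0' else '0'
        yield x[:i] + flipped + x[i + 1:]

# Any two distinct strings at distance 1 from s are at distance 2 from each
# other, so the reducer for s covers C(b, 2) = Omega(q^2) outputs.
print(list(reducers_for("000")))  # ['100', '010', '001']
```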

Triangle Finding
Given a graph G, the problem is to find the triples of edges that form a triangle. In keeping with our model, assume that every possible edge of the graph may be present… Therefore, the inputs to the reducers are the possible edges of a graph, and the outputs are the triples of edges that form a triangle.

Triangle Finding
Lemma: $g(q) \le \frac{\sqrt{2}\,q^{3/2}}{3}$.
Proof: Suppose we assign to a reducer all the edges among a set of k nodes. Then approximately $\frac{k^2}{2}$ edges are assigned. Let this quantity be q; then $k = \sqrt{2q}$. The number of triangles among k nodes is approximately $\frac{k^3}{6}$, so in terms of q the upper bound on the number of outputs is $\frac{(\sqrt{2q})^3}{6} = \frac{\sqrt{2}\,q^{3/2}}{3}$.

Triangle Finding
Let n be the number of vertices in G. Then $|I| = \binom{n}{2} \approx \frac{n^2}{2}$ and $|O| = \binom{n}{3} \approx \frac{n^3}{6}$, and $\frac{g(q)}{q} \le \frac{\sqrt{2q}}{3}$ is monotonically increasing. Using the recipe we derive

$$r \;\ge\; \frac{q\,|O|}{g(q)\,|I|} \;=\; \frac{q \cdot \frac{n^3}{6}}{\frac{\sqrt{2}\,q^{3/2}}{3} \cdot \frac{n^2}{2}} \;=\; \frac{n}{\sqrt{2q}}.$$

There are known algorithms that match this lower bound within a constant factor (one is sketched below), so $r = \Theta(n/\sqrt{q})$.
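One such matching algorithm can be sketched as follows (our own rendering; the slides only assert that matching algorithms exist): hash the n nodes into k groups and create one reducer per 3-set of groups.

```python
from itertools import combinations

# Sketch of a partition-based triangle algorithm: an edge goes to every
# reducer whose group-triple contains the groups of both endpoints. A reducer
# receives about C(3n/k, 2) ~ q edges, so k = Theta(n / sqrt(q)), and each
# edge is replicated about k - 2 times, matching r = Theta(n / sqrt(q)).
def edge_to_reducers(u, v, k, h):
    gu, gv = h(u), h(v)
    if gu == gv:  # endpoints hash to one group: all triples containing it
        others = [g for g in range(k) if g != gu]
        for pair in combinations(others, 2):
            yield frozenset({gu, *pair})
    else:         # otherwise: every triple containing both groups
        for g in range(k):
            if g not in (gu, gv):
                yield frozenset({gu, gv, g})
```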

Triangle Finding
In practice, applications of triangle finding, such as the analysis of communities in social networks, generally involve large but sparse graphs. Suppose the data graph has m of the $\binom{n}{2}$ possible edges, chosen randomly. We can assign a "target" of $q_t = q\binom{n}{2}/m$ possible edges to one reducer; the expected number of edges that actually arrive is then q. Hence $r = \Omega(n/\sqrt{q_t}) = \Omega(\sqrt{m/q})$.

The Alon Class of Sample Graphs
Let's generalize triangle finding further… Problem: find instances of sample graphs in the Alon class, named after Noga Alon. These graphs have the property that we can partition their nodes into disjoint sets such that the subgraph induced by each set either: is a single edge between two nodes, or contains an odd-length Hamiltonian cycle. Cycles, complete graphs, and paths of odd length are in the Alon class.

The Alon Class of Sample Graphs
Let S be a sample graph in the Alon class, with s nodes. Noga Alon proved that the number of instances of S in a graph of m edges is $O(m^{s/2})$. So if a reducer has q inputs, the number of instances of S it can find is $O(q^{s/2})$. If all edges are present, the number of instances is $\Omega(n^s)$. Repeating the analysis we get

$$r \;=\; \Omega\!\left(\frac{q \cdot n^s}{n^2 \cdot q^{s/2}}\right) \;=\; \Omega\!\left(\left(\frac{n}{\sqrt{q}}\right)^{s-2}\right).$$

Paths of Length Two
Let's look at the simplest non-Alon graph, the path of length 2 (2-path), and perform the usual analysis. Any two distinct edges can be combined to form at most one 2-path, so the number of 2-paths covered by a reducer is at most $g(q) = \binom{q}{2} \approx \frac{q^2}{2}$. Assume the input graph has n vertices; then $|I| = \binom{n}{2} \approx \frac{n^2}{2}$ and $|O| = 3\binom{n}{3} \approx \frac{n^3}{2}$, so

$$r \;\ge\; \frac{q\,|O|}{g(q)\,|I|} \;=\; \frac{q \cdot \frac{n^3}{2}}{\frac{q^2}{2} \cdot \frac{n^2}{2}} \;=\; \frac{2n}{q}.$$

Paths of Length Two
We now want to show upper bounds on r. If q = n, we have one reducer for each node, and we send an edge (a, b) to two reducers (those of a and of b); the replication rate is thus 2. The reducer for node a receives all edges incident to a and can produce all 2-paths that have a as the middle node. So if q ≥ n, r is either 1 or 2.

Paths of Length Two
If q < n, let $k = \frac{2n}{q}$ (suppose for convenience that this is an integer). Let h be a hash function that divides the n nodes into k equal-size buckets. The reducers correspond to pairs [u, {i, j}], where u is the middle node of the 2-path and 1 ≤ i, j ≤ k with i ≠ j. There are thus $n\binom{k}{2}$ reducers. We send edge (a, b) to the 2(k − 1) reducers [b, {h(a), ∗}] and [a, {∗, h(b)}].

Paths of Length Two
Consider the reducer [u, {i, j}]. It covers all 2-paths v−u−w such that h(v) and h(w) are each either i or j. If h(v) = h(w), then many reducers cover the 2-path, and we want only one of them to produce it. We can fix this by letting the reducer produce v−u−w only if: h(v) = i and h(w) = j (or vice versa), or h(v) = h(w) = i and j = i + 1 (mod k). In this case r = 2(k − 1), so to within a constant factor the upper and lower bounds match (a sketch of the mapper follows).
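A sketch of the corresponding mapper, under the naming above:

```python
# Sketch of the 2-path mapper for q < n: nodes are hashed into k = 2n/q
# buckets; reducer [u, {i, j}] covers 2-paths v-u-w with h(v), h(w) in {i, j}.
def edge_to_reducers(a, b, k, h):
    """Send edge (a, b) to the 2(k-1) reducers in which a or b is the middle node."""
    for g in range(k):
        if g != h(a):
            yield (b, frozenset({h(a), g}))  # b in the middle, a an endpoint
        if g != h(b):
            yield (a, frozenset({h(b), g}))  # a in the middle, b an endpoint
```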

Matrix Multiplication
Suppose we have n×n matrices R and S and we wish to form their product T. Unlike the previous examples, each output here depends on 2n inputs, rather than just two or three. We will also explore methods that use two interrelated rounds of MapReduce, and discover that they can be considerably better.

Matrix Multiplication
We now apply the familiar recipe… Suppose a reducer covers the outputs $t_{14}$ and $t_{23}$. Then rows 1 and 2 of R and columns 3 and 4 of S are inputs to that reducer, so it also covers $t_{13}$ and $t_{24}$: the set of outputs covered forms a "rectangle". Moreover, if an input to a reducer is not part of a whole row or column, it cannot be used in any output. Thus the number of inputs to such a reducer is n(w + h), where w and h are the "width" and "height" of the rectangle, respectively.

Matrix Multiplication
The total number of outputs covered is g(q) = hw. For a given q, the number of outputs is maximized when the rectangle is a square: $w = h = \frac{q}{2n}$, in which case $g(q) = \frac{q^2}{4n^2}$. Obviously, $|I| = 2n^2$ and $|O| = n^2$, so

$$r \;\ge\; \frac{q\,|O|}{g(q)\,|I|} \;=\; \frac{4n^2 \cdot q \cdot n^2}{q^2 \cdot 2n^2} \;=\; \frac{2n^2}{q}.$$

Matrix Multiplication
If $q \ge 2n^2$, the entire job can be done by one reducer. If q < 2n, no reducer can get enough input to compute even one output. Between these ranges, we can match the lower bound by giving each reducer a set of rows from R and an equal number of columns from S: partition the rows and the columns into n/s groups of s rows/columns each. Then q = 2sn, $r = \frac{n}{s} = \frac{2n^2}{q}$, and the number of reducers is $\left(\frac{n}{s}\right)^2$ (a sketch of the mapping follows).
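A minimal sketch of that mapping, assuming s divides n; the helper names are our own:

```python
# Sketch of the one-phase mapping: rows and columns are partitioned into n/s
# groups of s; reducer (I, K) receives row group I of R and column group K of
# S and computes that s x s block of T. Each input is replicated n/s times.
def reducers_for_r(i, n, s):
    """Entry r_ij lies in row group i // s and is needed by every column group."""
    return [(i // s, K) for K in range(n // s)]

def reducers_for_s(k, n, s):
    """Entry s_jk lies in column group k // s and is needed by every row group."""
    return [(I, k // s) for I in range(n // s)]
```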

Matrix Multiplication
Let's now discuss the two-phase MapReduce algorithm for matrix multiplication. In the first phase we compute $x_{ijk} = r_{ij} s_{jk}$ for each i, j, k between 1 and n, and we sum the $x_{ijk}$'s at a given reducer if they share common values of i and k, producing a partial sum for (i, k). In the second phase, the partial sums for each pair are sent to a reducer whose responsibility is to add up all the partial sums and compute $t_{ik}$.

Matrix Multiplication (figure: the two-phase algorithm – phase one combines the $x_{ijk}$'s into partial sums, phase two adds them into $t_{ik}$)

Matrix Multiplication
Note that the mappers of the second phase can reside at the same nodes as the $x_{ijk}$'s to which they apply; thus, no communication is needed between first-phase reducers and second-phase mappers. The set of values covered by a first-phase reducer again forms a "rectangle": if a reducer covers $x_{ijk}$ and $x_{yjz}$, then it also covers $x_{ijz}$ and $x_{yjk}$. We shall assume that each first-phase reducer is given a set of s rows of R, s columns of S, and t values of j, where 1 ≤ t ≤ n.
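A minimal sketch of a first-phase reducer under this layout; the data structures are our own illustration:

```python
# Sketch of a first-phase reducer: it holds s rows of R and s columns of S,
# restricted to its t assigned values of j, and emits one partial sum per
# covered pair (i, k), i.e., s^2 partial sums in total.
def phase1_reduce(r_rows, s_cols, js):
    """r_rows[i][j] = r_ij and s_cols[k][j] = s_jk for the assigned i's, k's, j's."""
    return {(i, k): sum(r_rows[i][j] * s_cols[k][j] for j in js)
            for i in r_rows for k in s_cols}
# The second phase sums, for each (i, k), the n/t partial sums to obtain t_ik.
```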

Matrix Multiplication
There must be a reducer covering each $x_{ijk}$, which means the number of reducers is $\left(\frac{n}{s}\right)^2 \frac{n}{t}$. Each element of matrices R and S is sent to n/s reducers, so the "total communication" in the first phase is $\frac{2n^3}{s}$. Each reducer produces a partial sum for $s^2$ pairs, so the "total communication" in the second phase is

$$s^2 \left(\frac{n}{s}\right)^2 \frac{n}{t} \;=\; \frac{n^3}{t}.$$

Matrix Multiplication
The total communication is $\frac{2n^3}{s} + \frac{n^3}{t}$. We minimize this function subject to the constraint q = 2st, obtaining $s = \sqrt{q}$ and $t = \frac{\sqrt{q}}{2}$. The total communication is then

$$\frac{2n^3}{\sqrt{q}} + \frac{2n^3}{\sqrt{q}} \;=\; \frac{4n^3}{\sqrt{q}}.$$

Total communication for the one-phase method is the replication rate times the number of inputs: $\frac{2n^2}{q} \cdot 2n^2 = \frac{4n^4}{q}$. The two-phase method uses less communication when $\frac{4n^4}{q} > \frac{4n^3}{\sqrt{q}}$, i.e., when $q < n^2$. That is, for any number of reducers except one.
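For completeness, here is that minimization spelled out (a sketch; the slide states only the result):

$$\min_{s,t>0}\; \frac{2n^3}{s} + \frac{n^3}{t} \quad\text{subject to}\quad q = 2st.$$

Substituting $t = \frac{q}{2s}$ gives $f(s) = \frac{2n^3}{s} + \frac{2n^3 s}{q}$, and

$$f'(s) = -\frac{2n^3}{s^2} + \frac{2n^3}{q} = 0 \;\Longrightarrow\; s = \sqrt{q}, \quad t = \frac{\sqrt{q}}{2}, \quad f = \frac{4n^3}{\sqrt{q}}.$$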

Summary & Open Problems
We introduced a simple model for MapReduce algorithms that enables us to study their performance. We identified the replication rate and the reducer size as two parameters representing the communication cost and the capabilities of the compute nodes, and we demonstrated that these two parameters are related by a precise tradeoff formula. An open problem, for example: Hamming distance greater than 1…


Questions?