Jeffrey D. Ullman, Stanford University. Communication cost for a MapReduce job = the total number of key-value pairs generated by all the mappers.

Similar presentations
CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman.

J OIN ALGORITHMS USING MAPREDUCE Haiping Wang
7.1Variable Notation.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
SECTION 21.5 Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
Assignment of Different-Sized Inputs in MapReduce Shantanu Sharma 2 joint work with Foto N. Afrati 1, Shlomi Dolev 2, Ephraim Korach 2, and Jeffrey D.
Outline. Theorem For the two processor network, Bit C(Leader) = Bit C(MaxF) = 2[log 2 ((M + 2)/3.5)] and Bit C t (Leader) = Bit C t (MaxF) = 2[log 2 ((M.
Discrete Structure Li Tak Sing( 李德成 ) Lectures
Chapter 4 Systems of Linear Equations; Matrices Section 6 Matrix Equations and Systems of Linear Equations.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Cluster Computing, Recursion and Datalog Foto N. Afrati National Technical University of Athens, Greece.
SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
CPSC 668Set 10: Consensus with Byzantine Failures1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture.
Review of Matrix Algebra
Mercury: Supporting Scalable Multi-Attribute Range Queries A. Bharambe, M. Agrawal, S. Seshan In Proceedings of the SIGCOMM’04, USA Παρουσίαση: Τζιοβάρα.
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Standard architecture emerging: – Cluster of commodity.
Hashing General idea: Get a large array
Finding Similar Items.
Jeffrey D. Ullman Stanford University. 2 Formal Definition Implementation Fault-Tolerance Example: Join.
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture.
1 Generalizing Map-Reduce The Computational Model Map-Reduce-Like Algorithms Computing Joins.
Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.
College Algebra Fifth Edition James Stewart Lothar Redlin Saleem Watson.
5  Systems of Linear Equations: ✦ An Introduction ✦ Unique Solutions ✦ Underdetermined and Overdetermined Systems  Matrices  Multiplication of Matrices.
Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Two-Level Minimization I.
Jeffrey D. Ullman Stanford University.  Mining of Massive Datasets, J. Leskovec, A. Rajaraman, J. D. Ullman.  Available for free download at i.stanford.edu/~ullman/mmds.html.
Database Principles Relational Database Design I.
The Game of Algebra or The Other Side of Arithmetic The Game of Algebra or The Other Side of Arithmetic © 2007 Herbert I. Gross by Herbert I. Gross & Richard.
FINITE FIELDS 7/30 陳柏誠.
Jeffrey D. Ullman Stanford University. 2 Chunking Replication Distribution on Racks.
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman.
Copyright © Cengage Learning. All rights reserved. 6 Inverse Functions.
EM and expected complete log-likelihood Mixture of Experts
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman.
Large-scale file systems and Map-Reduce Single-node architecture Memory Disk CPU Google example: 20+ billion web pages x 20KB = 400+ Terabyte 1 computer.
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
4.6 Matrix Equations and Systems of Linear Equations In this section, you will study matrix equations and how to use them to solve systems of linear equations.
Expressions, Equations, and Functions Chapter 1 Introductory terms and symbols: Algebraic expression – One or more numbers or variables along with one.
1 Cluster Computing and Datalog Recursion Via Map-Reduce Seminaïve Evaluation Re-engineering Map-Reduce for Recursion.
Multivariate Statistics Matrix Algebra I W. M. van der Veld University of Amsterdam.
1 Map-Reduce and Datalog Implementation Distributed File Systems Map-Reduce Join Implementations.
Foto Afrati — National Technical University of Athens Anish Das Sarma — Google Research Semih Salihoglu — Stanford University Jeff Ullman — Stanford University.
CS 4432query processing1 CS4432: Database Systems II Lecture #11 Professor Elke A. Rundensteiner.
1 Lecture 6 BOOLEAN ALGEBRA and GATES Building a 32 bit processor PH 3: B.1-B.5.
Database Management Systems, R. Ramakrishnan1 Relational Algebra Module 3, Lecture 1.
Chapter 5 Ranking with Indexes 1. 2 More Indexing Techniques n Indexing techniques:  Inverted files - best choice for most applications  Suffix trees.
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original.
IBM Research ® © 2007 IBM Corporation Introduction to Map-Reduce and Join Processing.
Flow in Network. Graph, oriented graph, network A graph G =(V, E) is specified by a non empty set of nodes V and a set of edges E such that each edge.
Optimizing Joins in a Map-Reduce Environment EDBT 2010 Presented by Foto Afrati, Jeffrey D. Ullman Summarized by Jaeseok Myung Intelligent Database.
Multivalued Dependencies and 4th NF CIS 4301 Lecture Notes Lecture /21/2006.
MapReduce and the New Software Stack. Outline  Algorithm Using MapReduce  Matrix-Vector Multiplication  Matrix-Vector Multiplication by MapReduce 
ECE DIGITAL LOGIC LECTURE 15: COMBINATIONAL CIRCUITS Assistant Prof. Fareena Saqib Florida Institute of Technology Fall 2015, 10/20/2015.
Jeffrey D. Ullman Stanford University.  Different algorithms for the same problem can be parallelized to different degrees.  The same activity can (sometimes)
Jeffrey D. Ullman Stanford University.  A real story from CS341 data-mining project class.  Students involved did a wonderful job, got an “A.”  But.
Chapter 4 Systems of Linear Equations; Matrices
DeMorgan’s Theorem DeMorgan’s 2nd Theorem
Large-scale file systems and Map-Reduce
Relational Algebra Chapter 4, Part A
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Review of Bulk-Synchronous Communication Costs Problem of Semijoin
湖南大学-信息科学与工程学院-计算机与科学系
Cse 344 May 4th – Map/Reduce.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Review of Bulk-Synchronous Communication Costs Problem of Semijoin
Presentation transcript:

Jeffrey D. Ullman Stanford University

 Communication cost for a MapReduce job = the total number of key-value pairs generated by all the mappers.  For many applications, communication cost is the dominant cost, taking more time than the computation performed by the mappers and reducers.  Focus: minimizing communication cost for a join of three or more relations.

 Given: a collection of relations, each with attributes labeling their columns.  Find: those tuples over all the attributes such that, when restricted to the attributes of any relation R, the tuple is in R.

Example: three relations with schemas (A,B), (A,C), and (B,C); their join has schema (A,B,C). If the relations contain the tuples (1,2), (1,3), and (2,3), respectively, then the join contains (1,2,3).

 Suppose we want to compute R(A,B) JOIN S(B,C), using k reducers.  Remember: a "reducer" corresponds to a key of the intermediate file, so we are really asking how many keys we want to use.  R and S are each stored in a chunked file of a distributed file system.  Subtle point: when joining two relations, each tuple is sent to only one reducer, so k can be as large as we like; in a multiway join, a large k implies more communication.

 Use a hash function h from B-values to k buckets.  Earlier, we used h = the identity function.  Map tasks take chunks from R and S, and send:  tuple R(a,b) to reducer h(b);  tuple S(b,c) to reducer h(b).  If R(a,b) joins with S(b,c), then both tuples are sent to reducer h(b).  Thus, their join (a,b,c) will be produced there and shipped to the output file.
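As a sketch, this two-way hash join can be simulated in a few lines of Python (the function name and sample data are illustrative, not from the slides):

```python
from collections import defaultdict

def two_way_join(R, S, k):
    """Simulate the MapReduce hash join of R(A,B) and S(B,C) with k reducers.

    Map phase: send R(a,b) and S(b,c) to reducer hash(b) % k.
    Reduce phase: each reducer joins the R- and S-tuples it received.
    """
    reducers = defaultdict(lambda: ([], []))  # key -> (R-tuples, S-tuples)
    for a, b in R:
        reducers[hash(b) % k][0].append((a, b))
    for b, c in S:
        reducers[hash(b) % k][1].append((b, c))

    result = []
    for r_tuples, s_tuples in reducers.values():
        for a, b in r_tuples:
            for b2, c in s_tuples:
                if b == b2:
                    result.append((a, b, c))
    return result

R = [(1, 10), (2, 20)]
S = [(10, 100), (20, 200), (30, 300)]
print(sorted(two_way_join(R, S, k=4)))  # [(1, 10, 100), (2, 20, 200)]
```

Since each tuple goes to exactly one reducer, the communication cost here is just r + s, independent of k.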

 Consider a chain of three relations: R(A,B) JOIN S(B,C) JOIN T(C,D).  Example: R, S, and T are "friends" relations.  We could join any two by the 2-way MapReduce algorithm, then join the third with the resulting relation.  But the intermediate joins are large.

 An alternative is to divide the work among k = m² reducers.  Hash both B and C to m values each.  A reducer (key) corresponds to one hashed B-value and one hashed C-value.

 Each S-tuple S(b,c) is sent to one reducer: (h(b), h(c)).  But each tuple R(a,b) must be sent to m reducers: those whose keys are of the form (h(b), x).  And each tuple T(c,d) must be sent to the m reducers of the form (y, h(c)).

Diagram: a grid of reducers indexed by (h(b), h(c)), with h(b) = 0, 1, 2, 3 down one side and h(c) across the other. S(b,c) lands in a single cell; R(a,b) is replicated across the row h(b); T(c,d) is replicated down the column h(c).

 Thus, any joining tuples R(a,b), S(b,c), and T(c,d) will meet at the reducer (h(b), h(c)).  Communication cost: s + mr + mt.  Convention: a lower-case letter is the size of the relation whose name is the corresponding upper-case letter.  Example: r is the size of R.
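A minimal Python simulation of this m-by-m reducer grid, using hypothetical data and counting the key-value pairs sent to confirm the cost s + mr + mt:

```python
from collections import defaultdict

def three_way_chain_join(R, S, T, m):
    """Simulate R(A,B) JOIN S(B,C) JOIN T(C,D) on an m x m grid of
    reducers keyed by (h(B), h(C)), counting key-value pairs sent."""
    h = lambda v: hash(v) % m
    reducers = defaultdict(lambda: ([], [], []))
    sent = 0
    for b, c in S:                       # each S-tuple goes to one reducer
        reducers[(h(b), h(c))][1].append((b, c)); sent += 1
    for a, b in R:                       # each R-tuple replicated across a row
        for x in range(m):
            reducers[(h(b), x)][0].append((a, b)); sent += 1
    for c, d in T:                       # each T-tuple replicated down a column
        for y in range(m):
            reducers[(y, h(c))][2].append((c, d)); sent += 1

    result = []
    for r_t, s_t, t_t in reducers.values():
        for a, b in r_t:
            for b2, c in s_t:
                if b != b2:
                    continue
                for c2, d in t_t:
                    if c == c2:
                        result.append((a, b, c, d))
    return result, sent

joined, cost = three_way_chain_join([(1, 2)], [(2, 3)], [(3, 4)], m=3)
print(joined, cost)  # [(1, 2, 3, 4)] and cost s + m*r + m*t = 1 + 3 + 3 = 7
```

Note the joined tuple is produced only at the one reducer (h(2), h(3)) where copies of all three input tuples meet.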

 Suppose for simplicity that:  Relations R, S, and T have the same size r.  The probability of two tuples joining is p.  The 3-way join has communication cost r(2m+1).  The two 2-way joins have communication cost:  3r to read the relations, plus  pr² to read the join of the first two.  Total = r(3 + pr).

 3-way beats 2-way if 2m + 1 < 3 + pr.  pr is the multiplicity of the join (the expected number of tuples each tuple joins with).  Thus, the 3-way chain-join is useful when the multiplicity is high.  Example: relations are "friends"; pr is about 300, so k = m² can be about 20,000.  Example: relations are Web links; pr is about 15, so k = m² can be 64.
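The crossover condition 2m + 1 < 3 + pr can be checked numerically; `best_m` below is a hypothetical helper, not part of the slides:

```python
def best_m(pr):
    """Largest m for which the 3-way chain join beats two 2-way joins:
    2m + 1 < 3 + pr, i.e., m < (pr + 2) / 2."""
    m = (pr + 2) // 2
    if 2 * m + 1 >= 3 + pr:     # back off if the bound is not strict
        m -= 1
    return m

for pr in (300, 15):
    m = best_m(pr)
    print(pr, m, m * m)  # pr, largest m, corresponding k = m^2
```

For pr = 300 this gives m = 150 (k up to 22,500, consistent with "about 20,000" above); for pr = 15 it gives m = 8, i.e., k = 64.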

 Share Variables and Their Optimization.  Special Case: Star Joins.  Special Case: Chain Joins.  Application: Skew Joins.

 When we discussed the 3-way chain-join R(A,B) JOIN S(B,C) JOIN T(C,D), we used attributes B and C for the map-key (the attributes that determine the reducer).  Why not include A and/or D?  Why use the same number of buckets for B and C?

 For the general problem, we use a share variable for each attribute: the number of buckets into which values of that attribute are hashed.  Convention: the share variable for an attribute is the corresponding lower-case letter.  Example: the share variable for attribute A is always a.

 The product of all the share variables is k, the number of reducers.  The communication cost of a multiway join is the sum, over all relations, of the size of the relation times the product of the share variables for the attributes that do not appear in that relation's schema.
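This cost rule can be written as a small Python function (an illustrative sketch; the relation sizes and share values below are made up):

```python
from math import prod

def communication_cost(relations, shares):
    """Communication cost of a multiway join.

    relations: list of (size, set_of_attributes)
    shares:    dict mapping attribute -> share variable
    Each relation's tuples are replicated once for every combination of
    buckets of the attributes *not* in its schema."""
    return sum(size * prod(shares[x] for x in shares if x not in attrs)
               for size, attrs in relations)

# Chain join R(A,B) JOIN S(B,C) JOIN T(C,D) with shares b = c = m
# (A and D get no share):
m = 4
cost = communication_cost(
    [(1000, {"A", "B"}), (2000, {"B", "C"}), (1000, {"C", "D"})],
    {"B": m, "C": m})
print(cost)  # s + m*r + m*t = 2000 + 4000 + 4000 = 10000
```

This reproduces the s + mr + mt formula from the 3-way chain join as a special case.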

 Consider the cyclic join R(A,B) JOIN S(B,C) JOIN T(A,C).  The cost function is rc + sa + tb.  Construct the Lagrangean, remembering abc = k: rc + sa + tb − λ(abc − k).  Take the derivative with respect to each share variable, then multiply by that variable.  The result is 0 at the minimum.

 d/da of rc + sa + tb − λ(abc − k) is s − λbc.  Multiply by a and set to 0: sa − λabc = 0.  Since abc = k: sa = λk.  Similarly, d/db and d/dc give: sa = tb = rc = λk.  Solution: a = (krt/s²)^(1/3); b = (krs/t²)^(1/3); c = (kst/r²)^(1/3).  Cost = rc + sa + tb = 3(krst)^(1/3).
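A quick numeric check of this solution, with made-up sizes (r = 1000, s = 2000, t = 4000, k = 64 happen to give the round shares a = 4, b = 2, c = 8):

```python
# Verify the Lagrangean solution for the cyclic join
# R(A,B) JOIN S(B,C) JOIN T(A,C), cost rc + sa + tb, constraint abc = k.
r, s, t, k = 1000.0, 2000.0, 4000.0, 64.0

a = (k * r * t / s**2) ** (1 / 3)
b = (k * r * s / t**2) ** (1 / 3)
c = (k * s * t / r**2) ** (1 / 3)

cost = r * c + s * a + t * b
assert abs(a * b * c - k) < 1e-6                      # shares multiply to k
assert abs(cost - 3 * (k * r * s * t) ** (1 / 3)) < 1e-6

# Spot-check optimality: trading share between a and c (keeping abc = k)
# never lowers the cost.
for eps in (0.9, 1.1):
    assert r * (c / eps) + s * (a * eps) + t * b >= cost - 1e-9
print(a, b, c, cost)
```

With these numbers each of the three cost terms equals (krst)^(1/3) = 8000, so the total is 24,000.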

 Certain attributes can't be in the map-key.  A dominates B if every relation of the join that has B also has A.  Example: R(A,B,C) JOIN S(A,B,D) JOIN T(A,E) JOIN U(C,E).  Here, every relation with B (namely R and S) also has A, so A dominates B.

 Cost expression: rde + sce + tbcd + uabd, for R(A,B,C) JOIN S(A,B,D) JOIN T(A,E) JOIN U(C,E).  Since b appears in the cost wherever a does, if there were a minimum-cost solution with b > 1, we could replace b by 1 and a by ab: the product of the shares stays k, and the cost can only go down.
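The replacement argument can be spot-checked numerically (a sketch with relation sizes set to 1 for simplicity and arbitrary share values):

```python
def cost(a, b, c, d, e, r=1.0, s=1.0, t=1.0, u=1.0):
    """Cost rde + sce + tbcd + uabd for
    R(A,B,C) JOIN S(A,B,D) JOIN T(A,E) JOIN U(C,E)."""
    return r * d * e + s * c * e + t * b * c * d + u * a * b * d

# Any solution with b > 1 is matched or beaten by b = 1, a := a*b
# (the product of shares, hence k, is unchanged):
a, b, c, d, e = 2.0, 3.0, 2.0, 2.0, 2.0
assert a * b * c * d * e == (a * b) * 1 * c * d * e   # same k
assert cost(a * b, 1, c, d, e) <= cost(a, b, c, d, e)
print(cost(a, b, c, d, e), cost(a * b, 1, c, d, e))   # 32.0 24.0
```

The term uabd, the only one containing a, is unchanged by the swap, while tbcd strictly shrinks, so the dominated attribute B should get share 1.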

 This rule explains why, in the discussion of the chain join R(A,B) JOIN S(B,C) JOIN T(C,D), we did not give the dominated attributes A and D a share.

 Unfortunately, there are cases more complex than dominated attributes, where the equations derived from the Lagrangean imply that a positive sum of several terms equals 0.  We can fix this, generalizing the treatment of dominated attributes, but we have to branch on which attribute to eliminate from the map-key.

 Solutions may not be in integers:  Drop an attribute with a share < 1 from the map-key and re-solve.  Round the other nonintegers, and treat k as a suggestion, since the product of the rounded integers may not be exactly k.

 A star join combines a large fact table F(A1,A2,…,An) with tiny dimension tables D1(A1,B1), D2(A2,B2),…, Dn(An,Bn).  There may be other attributes not shown, each belonging to only one relation.  Example: facts = sales; dimensions tell about the buyer, product, day, time, etc.

Diagram: the fact table on attributes A1 through A4 at the center, with a dimension table on each Ai carrying its own attribute Bi (B1 through B4).

 Map-key = the A's; the B's are dominated.  Solution: di/ai = λ for all i.  That is, the shares are proportional to the dimension-table sizes.
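A small sketch computing such proportional shares (`star_join_shares` is an illustrative name; the dimension-table sizes are made up):

```python
from math import prod

def star_join_shares(dim_sizes, k):
    """Shares for a star join: a_i proportional to dimension-table
    size d_i, scaled so that the shares multiply to k."""
    n = len(dim_sizes)
    scale = (k / prod(dim_sizes)) ** (1 / n)
    return [d * scale for d in dim_sizes]

shares = star_join_shares([10.0, 20.0, 40.0], k=64.0)
print(shares)  # [2.0, 4.0, 8.0] (up to floating-point rounding)
assert abs(prod(shares) - 64.0) < 1e-9
# d_i / a_i is the same constant lambda for every i:
ratios = [d / a for d, a in zip([10.0, 20.0, 40.0], shares)]
assert max(ratios) - min(ratios) < 1e-9
```

Since every A is in the fact table's schema, each fact tuple goes to exactly one reducer; only the dimension tables are replicated.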

 Fact/dimension tables are often used for analytics.  Aster Data approach: partition the fact table among nodes permanently (shard the fact table, into Fact Shard 1 through Fact Shard k); replicate the needed pieces of the dimension tables to each shard.

 Shard the fact table by country.  This wins for distributing the Customer dimension table:  each Customer tuple is needed only at the shard for the customer's country.  But it loses big for other dimension tables, e.g., Item:  pretty much every item has been bought by someone in each country, so its dimension tuples are replicated to every shard.

 Our solution lets you partition the fact table into k shards, one for each reducer.  Analogy: think of the sharding process as taking the join of the fact table and all the dimension tables.  Shards = reducers.  Only the dimension keys (the A's) get shares, so each fact tuple goes to exactly one shard/reducer.  The total number of copies of dimension tuples equals the communication cost, which is minimized.

 A chain join has the form R(A0,A1) JOIN R(A1,A2) JOIN … JOIN R(An−1,An).  Other attributes may appear, but only in one relation.  A0 and An are dominated; the other attributes are in the map-key.

 The chain join illustrates strange behavior: even and odd n have very different distributions of the share variables.  For even n: a2 = a4 = … = an−2 = 1; a1 = a3 = … = an−1 = k^(2/n).
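A quick check that these even-n shares indeed multiply to k (an illustrative helper assuming the formula above):

```python
from math import prod

def even_chain_shares(n, k):
    """Shares a_1 .. a_{n-1} for a chain join of even length n:
    even-subscripted shares are 1, odd-subscripted shares are k^(2/n)."""
    assert n % 2 == 0
    return [k ** (2 / n) if i % 2 == 1 else 1.0 for i in range(1, n)]

shares = even_chain_shares(n=6, k=64.0)
print(shares)  # odd positions get k^(2/6) = 64^(1/3), about 4
assert abs(prod(shares) - 64.0) < 1e-9
```

There are n/2 odd subscripts among 1, …, n−1, so (k^(2/n))^(n/2) = k, as required.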


 For odd n, the even-subscripted a's grow exponentially: a4 = a2²; a6 = a2³; a8 = a2⁴; …  The odd-subscripted a's form the inverse sequence: a1 = an−1; a3 = an−3; a5 = an−5; …


 Suppose we want to join R(A,B) with S(B,C) using MapReduce, as in the introduction.  But half the tuples of R and S have the value B = 10.  No matter how many reducers we use, one reducer gets half the tuples and dominates the wall-clock time.  To deal with this problem, systems like Pig handle heavy-hitter values like B = 10 specially.

 Pick one of the relations, say R.  Divide the tuples of R having B = 10 into k groups, with a reducer for each.  Note: we're pretending B = 10 is the only heavy hitter; the technique works for any value and for more than one heavy hitter.  Send each tuple of S with B = 10 to all k reducers.  Other tuples of R and S are treated normally.

 Let R have r tuples with B = 10 and S have s tuples with B = 10.  Then the communication cost = r + ks.  This can be a minimum if s << r, but it also might not be best.

 Instead, let's partition the tuples of R with B = 10 into x groups and the tuples of S with B = 10 into y groups, where xy = k.  Use one of the k reducers for each pair (i, j) of a group for R and a group for S.  Send each tuple of R with B = 10 in group i to all y reducers of the form (i, ?), and each tuple of S with B = 10 in group j to all x reducers of the form (?, j).  Communication = ry + sx.

 We need to minimize ry + sx, under the constraint xy = k.  Solution: x = √(kr/s); y = √(ks/r); communication = 2√(krs).  Note: 2√(krs) is never more than r + ks (the cost of the implemented skew join), and is often much less.  Example: if r = s, then the new skew join takes 2r√k, while the old one takes r(k+1).
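A numeric sketch of this optimization (the function name and the values of r, s, and k are made up):

```python
from math import isclose, sqrt

def skew_join_grid(r, s, k):
    """Optimal x-by-y grid (xy = k) for a skew join where R has r and S
    has s tuples with the heavy-hitter value; cost is ry + sx."""
    x = sqrt(k * r / s)
    y = sqrt(k * s / r)
    return x, y, r * y + s * x

r, s, k = 10_000.0, 10_000.0, 100.0
x, y, cost = skew_join_grid(r, s, k)
assert isclose(x * y, k)
assert isclose(cost, 2 * sqrt(k * r * s))
# Compare with sending all heavy S-tuples to every one of k reducers:
print(cost, r + k * s)  # 200000.0 vs 1010000.0
```

With r = s, the grid is square (x = y = √k) and the cost 2r√k is far below r(k+1), matching the example above.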

1. Multiway joins can be computed by replicating tuples and distributing them to many compute nodes.
2. Minimizing communication requires us to solve a nonlinear optimization.
3. Multiway beats 2-way joins for star queries and for queries on high-fanout graphs.
4. Exact solutions exist for chain and star queries.
5. There is a simple application to optimal skew joins.

1. We really don't know the complexity of optimizing the multiway join in general.  The algorithm we offered at EDBT 2010 is exponential in the worst case.  But is the problem NP-hard?
2. The skew-join technique borrows from the multiway join, but is itself a 2-way join.  Can you generalize the idea to all multiway joins with skew?