CS347: MapReduce
Motivation for Map-Reduce
Distribution makes simple computations complex:
– Communication
– Load balancing
– Fault tolerance
– …
What if we could write “simple” programs that were automatically parallelized?
Motivation for Map-Reduce
Recall one of our sort strategies: the fragments R1, R2, R3 are range-partitioned on the sort key (k0, k1), each site locally sorts its range to produce R’1, R’2, R’3, and the sorted ranges together form the result.
The pattern: process data & partition, then additional processing.
Another example: asymmetric fragment + replicate join.
R (fragments Ra, Rb) is partitioned across the join sites by a partition function f into R1, R2, R3; S (fragments Sa, Sb) is replicated to every site; each site performs a local join, and the union of the local results is the answer.
Again the pattern: process data & partition, then additional processing.
From our point of view…
What if we didn’t have to think too hard about the number of sites or fragments?
MapReduce goal: a library that hides the distribution from the developer, who can then focus on the fundamental “nuggets” of their computation.
Building Text Index - Part I (the original Map-Reduce application)
Loading: read the page stream (pages 1, 2, 3 containing words such as rat, dog, cat).
Tokenizing: emit (word, page) pairs, e.g. (rat, 1), (dog, 1), (dog, 2), (cat, 2), (rat, 3), (dog, 3).
Sorting and flushing: sort the pairs and flush them to disk as intermediate runs, e.g. (cat, 2), (dog, 1), (dog, 2), (dog, 3), (rat, 1), (rat, 3).
Building Text Index - Part II
Merge the intermediate runs, e.g. (cat, 2), (dog, 1), (dog, 2), (dog, 3), (rat, 1), (rat, 3) and (ant, 5), (cat, 4), (dog, 4), (dog, 5), (eel, 6), into one sorted run: (ant, 5), (cat, 2), (cat, 4), (dog, 1), (dog, 2), (dog, 3), (dog, 4), (dog, 5), (eel, 6), (rat, 1), (rat, 3).
Final index: (ant: 5), (cat: 2, 4), (dog: 1, 2, 3, 4, 5), (eel: 6), (rat: 1, 3).
Generalizing: Map-Reduce
The first phase of index building (loading, tokenizing, sorting, and flushing sorted (word, page) runs to disk) is the Map step.
Generalizing: Map-Reduce
The second phase (merging the intermediate runs into the final index) is the Reduce step.
Map Reduce
Input: R = {r1, r2, …, rn}, functions M, R
– M(ri) → { [k1, v1], [k2, v2], … }
– R(ki, valSet) → [ki, valSet’]
Let S = { [k, v] | [k, v] ∈ M(r) for some r ∈ R }
Let K = { k | [k, v] ∈ S, for some v }
Let G(k) = { v | [k, v] ∈ S }
Output = { [k, T] | k ∈ K, T = R(k, G(k)) }
S is a bag; G(k) is a bag.
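To make the definition concrete, here is a minimal sequential Python sketch of these semantics. It is only an illustration of the model: the function name map_reduce and its argument names are made up for this sketch, not part of any real MapReduce API.

    from collections import defaultdict

    def map_reduce(records, M, R):
        # M(record)      -> iterable of (key, value) pairs
        # R(key, values) -> reduced result for that key
        groups = defaultdict(list)          # G(k): the bag of values seen for key k
        for r in records:                   # S: all pairs emitted by M over the input
            for k, v in M(r):
                groups[k].append(v)
        # Output = { [k, R(k, G(k))] : k in K }
        return {k: R(k, vs) for k, vs in groups.items()}

The sketches on later slides reuse this helper.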
References
MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI 2004. Available at http://labs.google.com/papers/mapreduce-osdi04.pdf
Example: Counting Word Occurrences
map(String doc, String value):
  // doc is the document name
  // value is the document content
  for each word w in value:
    EmitIntermediate(w, “1”);
Example: map(doc, “cat dog cat bat dog”) emits [cat 1], [dog 1], [cat 1], [bat 1], [dog 1]
Example: Counting Word Occurrences
reduce(String key, Iterator values):
  // key is a word
  // values is a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
Example: reduce(“dog”, “1 1 1 1”) emits “4”, which becomes (“dog”, 4)
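The same word count as a runnable Python sketch, using the map_reduce helper defined earlier (an illustration of the idea, not the paper’s actual C++ interface):

    def wc_map(doc):
        name, value = doc                   # (document name, document content)
        for w in value.split():
            yield (w, 1)

    def wc_reduce(key, values):
        return sum(values)                  # total count for one word

    docs = [("doc1", "cat dog cat bat dog")]
    print(map_reduce(docs, wc_map, wc_reduce))
    # {'cat': 2, 'dog': 2, 'bat': 1}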
Mappers
The source data is divided into splits; each split is assigned to a map worker, and the workers process their splits in parallel.
Shuffle
The map output is repartitioned by key and routed to the reduce workers.
Reduce
Each reduce worker applies the reduce function to the values for its assigned keys.
Another way to think about it:
– Mapper: (some query)
– Shuffle: GROUP BY
– Reducer: SELECT Aggregate()
Doesn’t have to be relational
– Mapper: parse data into K,V pairs
– Shuffle: repartition by K
– Reducer: transform the V’s for a K into a final V
Process model
– The number of mappers can be varied to tune performance.
– The number of reduce tasks is bounded by the number of reduce shards (output partitions).
Implementation Issues
– Combine function
– File system
– Partition of input, keys
– Failures
– Backup tasks
– Ordering of results
Combine Function
Combine is like a local reduce applied at each map worker before distribution: e.g., a worker’s output [cat 1], [cat 1], [cat 1], …, [dog 1], [dog 1], … becomes [cat 3], …, [dog 2], …
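For word count, where the reduce function is a simple sum (associative and commutative), a combiner can be run over each map worker’s output before the shuffle. A small Python sketch (wc_combine is an illustrative name, not a framework hook):

    from collections import Counter

    def wc_combine(pairs):
        # Local reduce on one mapper's output: collapse repeated keys
        # before anything is shipped across the network.
        counts = Counter()
        for k, v in pairs:
            counts[k] += v
        return list(counts.items())

    print(wc_combine([("cat", 1), ("cat", 1), ("cat", 1), ("dog", 1), ("dog", 1)]))
    # [('cat', 3), ('dog', 2)]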
Data flow
– A map worker must be able to access any part of the input file, so the input lives on a distributed file system.
– A reduce worker must be able to read the local disks of the map workers.
– Any worker must be able to write its part of the answer; the answer is left as a distributed file.
A high-throughput network is essential.
Partition of input, keys
How many splits of the input file, and how many workers?
It is best to have many splits per worker: this improves load balance, and if a worker fails its tasks are easier to spread over the remaining workers.
Should workers be assigned to splits “near” them?
Similar questions arise for the reduce workers.
What takes time?
– Reading input data
– Mapping data
– Shuffling data
– Reducing data
Map and reduce are separate phases:
– Latency is determined by the slowest task.
– Skew across reduce shards can increase latency.
Map and shuffle can be overlapped:
– But if there is a lot of intermediate data, the shuffle may be slow.
Failures
The distributed implementation should produce the same output as a non-faulty sequential execution of the program.
General strategy: the master detects worker failures and has the work re-done by another worker (the master asks a worker “ok?”; if the worker handling split j has failed, split j is redone elsewhere).
Backup Tasks
A straggler is a machine that takes unusually long to finish its work (e.g., because of a bad disk). A straggler can delay final completion.
When the job is close to finishing, the master schedules backup executions of the remaining tasks.
The system must be able to eliminate redundant results.
Ordering of Results
The final result (at each node) is in key order, e.g. [k1, T1], [k2, T2], [k3, T3], [k4, T4].
The intermediate pairs at each map worker, e.g. [k1, v1], [k3, v3], are also in key order.
Example: Sorting Records
Map: extract the sort key k and output [k, record].
Reduce: do nothing!
Question: do one or two records end up in the output for k = 6?
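A Python sketch of this sort, again against the toy map_reduce helper (the field name "k" and the sample records are made up). It also answers the question above: both records with k = 6 land in the same group.

    def sort_map(record):
        yield (record["k"], record)         # sort key, whole record as the value

    def sort_reduce(key, records):
        return records                      # do nothing: pass the group through

    records = [{"k": 6, "x": "a"}, {"k": 2, "x": "b"}, {"k": 6, "x": "c"}]
    out = map_reduce(records, sort_map, sort_reduce)
    for k in sorted(out):                   # real MapReduce delivers keys already sorted
        print(k, out[k])
    # 2 [{'k': 2, 'x': 'b'}]
    # 6 [{'k': 6, 'x': 'a'}, {'k': 6, 'x': 'c'}]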
Other Issues
– Skipping bad records
– Debugging
MR Claimed Advantages
– The model is easy to use and hides the details of parallelization and fault recovery.
– Many problems are expressible in the MR framework.
– Scales to thousands of machines.
MR Possible Disadvantages
– The one-input, two-stage data flow is rigid and hard to adapt to other scenarios.
– Custom code must be written even for the most common operations, e.g., projection and filtering.
– The opaque nature of the map and reduce functions impedes optimization.
Hadoop
Open-source Map-Reduce system.
Also a toolkit:
– HDFS – file system for Hadoop
– HBase – database for Hadoop
Also an ecosystem:
– Tools
– Recipes
– Developer community
MapReduce pipelines
The output of one MapReduce becomes the input of another. Example:
– Stage 1: translate data to canonical form
– Stage 2: term count
– Stage 3: sort by frequency
– Stage 4: extract top-k
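A toy two-stage pipeline in the same Python sketch, chaining a term-count stage into a sort-by-frequency stage (the data and function names are illustrative; a real deployment would write each stage’s output to the distributed file system):

    # Stage: term count, reusing docs, wc_map and wc_reduce from the word-count sketch
    counts = map_reduce(docs, wc_map, wc_reduce)

    # Next stage: re-key the output by count, so results are grouped by frequency
    def freq_map(item):
        word, count = item
        yield (count, word)

    def freq_reduce(count, words):
        return words

    by_freq = map_reduce(counts.items(), freq_map, freq_reduce)
    for c in sorted(by_freq, reverse=True): # top-k: take the first k frequencies
        print(c, by_freq[c])
    # 2 ['cat', 'dog']
    # 1 ['bat']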
Make it database-y?
Simple idea: each operator is a MapReduce.
How to do:
– Select
– Project
– Group by, aggregate
– Join
Reduce-side join
The shuffle puts all values for the same key at the same reducer.
Mapper:
– Input: tuples from R and tuples from S
– Output: (join value, (R|S, tuple))
Reducer:
– Local join of all R tuples with all S tuples for that key
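A minimal Python sketch of a reduce-side join with the toy map_reduce helper. The tables R and S, the join attribute "a", and the tag strings are made-up illustration data:

    R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
    S = [{"a": 1, "y": "s1"}, {"a": 1, "y": "s2"}]

    def join_map(record):
        tag, t = record                     # tag records which table the tuple came from
        yield (t["a"], (tag, t))            # join attribute "a" is the shuffle key

    def join_reduce(key, tagged):
        r_side = [t for tag, t in tagged if tag == "R"]
        s_side = [t for tag, t in tagged if tag == "S"]
        return [(r, s) for r in r_side for s in s_side]   # local join of the two sides

    tagged_input = [("R", r) for r in R] + [("S", s) for s in S]
    print(map_reduce(tagged_input, join_map, join_reduce))
    # key 1 joins r1 with s1 and s2; key 2 has no matching S tuples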
Map-side join
Like a hash join, but every mapper has a copy of the hash table.
Mapper:
– Read in a hash table of R
– Input: tuples of S
– Output: tuples of S joined with tuples of R
Reducer:
– Pass through
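A matching Python sketch of the map-side join, reusing the toy tables R and S from the previous example (in a real job, every map worker would load its own copy of the small table before processing its split):

    # Build the hash table of the small table R; each mapper would hold a copy.
    r_by_key = {}
    for r in R:
        r_by_key.setdefault(r["a"], []).append(r)

    def mapside_join_map(s_tuple):
        # Probe the local copy of R and emit joined tuples directly.
        for r in r_by_key.get(s_tuple["a"], []):
            yield (s_tuple["a"], (r, s_tuple))

    def pass_through_reduce(key, values):
        return values                       # the reducer does nothing

    print(map_reduce(S, mapside_join_map, pass_through_reduce))
    # {1: [({'a': 1, 'x': 'r1'}, {'a': 1, 'y': 's1'}),
    #      ({'a': 1, 'x': 'r1'}, {'a': 1, 'y': 's2'})]}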
Comparison
– A reduce-side join shuffles all of the data.
– A map-side join requires one table to be small enough to give every mapper a copy.
Semi-join?
One idea:
– MapReduce 1: extract the join keys of R; reduce-side join them with S.
– MapReduce 2: map-side join the result of MapReduce 1 with R.
Platforms for SQL-like queries
– Pig Latin
– Hive
– MapR
– MemSQL
– …
Why not just use a DBMS?
Many DBMSs exist and are highly optimized.
A comparison of approaches to large-scale data analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
One reason: loading data into a DBMS is hard.
A comparison of approaches to large-scale data analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
Other possible reasons:
– MapReduce is more scalable
– MapReduce is more easily deployed
– MapReduce is more easily extended
– MapReduce is more easily optimized
– MapReduce is free (that is, Hadoop)
– I already know Java
– MapReduce is exciting and new
Data store
Instead of HDFS, data could be stored in a database.
Example: HadoopDB/Hadapt
– The data store is PostgreSQL
– Allows for indexing and fast local query processing
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
Batch processing?
MapReduce materializes intermediate results:
– Map output is written to disk before reduce starts.
What if we pipelined the data from map to reduce?
– Reduce could quickly produce approximate answers.
– MapReduce could implement continuous queries.
MapReduce Online. Tyson Condie, Neil Conway, Peter Alvaro, Joe Hellerstein, Khaled Elmeleegy, Russell Sears. NSDI 2010.
Not just database queries!
Any large, partitionable computation:
– Build an inverted index
– Image thumbnailing
– Machine translation
– …
– Anything expressible as map and reduce functions written in a general-purpose language