CS347: MapReduce
Motivation for Map-Reduce
Distribution makes simple computations complex:
– Communication
– Load balancing
– Fault tolerance
– …
What if we could write “simple” programs that were automatically parallelized?
Motivation for Map-Reduce
Recall one of our sort strategies: the fragments R1, R2, R3 are range-partitioned on the sort key (k0, k1), each site locally sorts its range to produce R’1, R’2, R’3, and the sorted ranges together form the result.
The pattern: process data & partition, then additional processing.
Another example: asymmetric fragment + replicate join.
R (fragments Ra, Rb) is partitioned across the join sites by a partition function f into R1, R2, R3; S (fragments Sa, Sb) is replicated to every site; each site performs a local join, and the union of the local results is the answer.
Again the pattern: process data & partition, then additional processing.
From our point of view…
What if we didn’t have to think too hard about the number of sites or fragments?
MapReduce goal: a library that hides the distribution from the developer, who can then focus on the fundamental “nuggets” of their computation.
Building Text Index - Part I (the original Map-Reduce application)
Loading: read the page stream (pages 1, 2, 3 containing words such as rat, dog, cat).
Tokenizing: emit (word, page) pairs, e.g. (rat, 1), (dog, 1), (dog, 2), (cat, 2), (rat, 3), (dog, 3).
Sorting and flushing: sort the pairs and flush them to disk as intermediate runs, e.g. (cat, 2), (dog, 1), (dog, 2), (dog, 3), (rat, 1), (rat, 3).
Building Text Index - Part II
Merge the intermediate runs, e.g. (cat, 2), (dog, 1), (dog, 2), (dog, 3), (rat, 1), (rat, 3) and (ant, 5), (cat, 4), (dog, 4), (dog, 5), (eel, 6), into one sorted run: (ant, 5), (cat, 2), (cat, 4), (dog, 1), (dog, 2), (dog, 3), (dog, 4), (dog, 5), (eel, 6), (rat, 1), (rat, 3).
Final index: (ant: 5), (cat: 2, 4), (dog: 1, 2, 3, 4, 5), (eel: 6), (rat: 1, 3).
Generalizing: Map-Reduce
The first phase of index building (loading, tokenizing, sorting, and flushing sorted (word, page) runs to disk) is the Map step.
Generalizing: Map-Reduce
The second phase (merging the intermediate runs into the final index) is the Reduce step.
Map Reduce
Input: R = {r1, r2, …, rn}, functions M, R
– M(ri) → { [k1, v1], [k2, v2], … }
– R(ki, valSet) → [ki, valSet’]
Let S = { [k, v] | [k, v] ∈ M(r) for some r ∈ R }
Let K = { k | [k, v] ∈ S, for some v }
Let G(k) = { v | [k, v] ∈ S }
Output = { [k, T] | k ∈ K, T = R(k, G(k)) }
S is a bag; G(k) is a bag.
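To make the definition concrete, here is a minimal sequential Python sketch of these semantics. It is only an illustration of the model: the function name map_reduce and its argument names are made up for this sketch, not part of any real MapReduce API.

    from collections import defaultdict

    def map_reduce(records, M, R):
        # M(record)      -> iterable of (key, value) pairs
        # R(key, values) -> reduced result for that key
        groups = defaultdict(list)          # G(k): the bag of values seen for key k
        for r in records:                   # S: all pairs emitted by M over the input
            for k, v in M(r):
                groups[k].append(v)
        # Output = { [k, R(k, G(k))] : k in K }
        return {k: R(k, vs) for k, vs in groups.items()}

The sketches on later slides reuse this helper.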
References
MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI 2004. Available at http://labs.google.com/papers/mapreduce-osdi04.pdf
Example: Counting Word Occurrences
map(String doc, String value):
  // doc is the document name
  // value is the document content
  for each word w in value:
    EmitIntermediate(w, “1”);
Example: map(doc, “cat dog cat bat dog”) emits [cat 1], [dog 1], [cat 1], [bat 1], [dog 1]
Example: Counting Word Occurrences
reduce(String key, Iterator values):
  // key is a word
  // values is a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
Example: reduce(“dog”, “1 1 1 1”) emits “4”, which becomes (“dog”, 4)
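The same word count as a runnable Python sketch, using the map_reduce helper defined earlier (an illustration of the idea, not the paper’s actual C++ interface):

    def wc_map(doc):
        name, value = doc                   # (document name, document content)
        for w in value.split():
            yield (w, 1)

    def wc_reduce(key, values):
        return sum(values)                  # total count for one word

    docs = [("doc1", "cat dog cat bat dog")]
    print(map_reduce(docs, wc_map, wc_reduce))
    # {'cat': 2, 'dog': 2, 'bat': 1}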
Mappers
The source data is divided into splits; each split is assigned to a map worker, and the workers process their splits in parallel.
Shuffle
The map output is repartitioned by key and routed to the reduce workers.
Reduce
Each reduce worker applies the reduce function to the values for its assigned keys.
Another way to think about it:
– Mapper: (some query)
– Shuffle: GROUP BY
– Reducer: SELECT Aggregate()
Doesn’t have to be relational
– Mapper: parse data into K,V pairs
– Shuffle: repartition by K
– Reducer: transform the V’s for a K into a final V
Process model
– The number of mappers can be varied to tune performance.
– The number of reduce tasks is bounded by the number of reduce shards (output partitions).
Implementation Issues
– Combine function
– File system
– Partition of input, keys
– Failures
– Backup tasks
– Ordering of results
Combine Function
Combine is like a local reduce applied at each map worker before distribution: e.g., a worker’s output [cat 1], [cat 1], [cat 1], …, [dog 1], [dog 1], … becomes [cat 3], …, [dog 2], …
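For word count, where the reduce function is a simple sum (associative and commutative), a combiner can be run over each map worker’s output before the shuffle. A small Python sketch (wc_combine is an illustrative name, not a framework hook):

    from collections import Counter

    def wc_combine(pairs):
        # Local reduce on one mapper's output: collapse repeated keys
        # before anything is shipped across the network.
        counts = Counter()
        for k, v in pairs:
            counts[k] += v
        return list(counts.items())

    print(wc_combine([("cat", 1), ("cat", 1), ("cat", 1), ("dog", 1), ("dog", 1)]))
    # [('cat', 3), ('dog', 2)]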
Data flow
– A map worker must be able to access any part of the input file, so the input lives on a distributed file system.
– A reduce worker must be able to read the local disks of the map workers.
– Any worker must be able to write its part of the answer; the answer is left as a distributed file.
A high-throughput network is essential.
Partition of input, keys
How many splits of the input file, and how many workers?
It is best to have many splits per worker: this improves load balance, and if a worker fails its tasks are easier to spread over the remaining workers.
Should workers be assigned to splits “near” them?
Similar questions arise for the reduce workers.
What takes time?
– Reading input data
– Mapping data
– Shuffling data
– Reducing data
Map and reduce are separate phases:
– Latency is determined by the slowest task.
– Skew across reduce shards can increase latency.
Map and shuffle can be overlapped:
– But if there is a lot of intermediate data, the shuffle may be slow.
Failures
The distributed implementation should produce the same output as a non-faulty sequential execution of the program.
General strategy: the master detects worker failures and has the work re-done by another worker (the master asks a worker “ok?”; if the worker handling split j has failed, split j is redone elsewhere).
Backup Tasks
A straggler is a machine that takes unusually long to finish its work (e.g., because of a bad disk). A straggler can delay final completion.
When the job is close to finishing, the master schedules backup executions of the remaining tasks.
The system must be able to eliminate redundant results.
Ordering of Results
The final result (at each node) is in key order, e.g. [k1, T1], [k2, T2], [k3, T3], [k4, T4].
The intermediate pairs at each map worker, e.g. [k1, v1], [k3, v3], are also in key order.
Example: Sorting Records
Map: extract the sort key k and output [k, record].
Reduce: do nothing!
Question: do one or two records end up in the output for k = 6?
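A Python sketch of this sort, again against the toy map_reduce helper (the field name "k" and the sample records are made up). It also answers the question above: both records with k = 6 land in the same group.

    def sort_map(record):
        yield (record["k"], record)         # sort key, whole record as the value

    def sort_reduce(key, records):
        return records                      # do nothing: pass the group through

    records = [{"k": 6, "x": "a"}, {"k": 2, "x": "b"}, {"k": 6, "x": "c"}]
    out = map_reduce(records, sort_map, sort_reduce)
    for k in sorted(out):                   # real MapReduce delivers keys already sorted
        print(k, out[k])
    # 2 [{'k': 2, 'x': 'b'}]
    # 6 [{'k': 6, 'x': 'a'}, {'k': 6, 'x': 'c'}]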
Other Issues
– Skipping bad records
– Debugging
MR Claimed Advantages
– The model is easy to use and hides the details of parallelization and fault recovery.
– Many problems are expressible in the MR framework.
– Scales to thousands of machines.
MR Possible Disadvantages
– The one-input, two-stage data flow is rigid and hard to adapt to other scenarios.
– Custom code must be written even for the most common operations, e.g., projection and filtering.
– The opaque nature of the map and reduce functions impedes optimization.
Hadoop
Open-source Map-Reduce system.
Also a toolkit:
– HDFS – file system for Hadoop
– HBase – database for Hadoop
Also an ecosystem:
– Tools
– Recipes
– Developer community
MapReduce pipelines
The output of one MapReduce becomes the input of another. Example:
– Stage 1: translate data to canonical form
– Stage 2: term count
– Stage 3: sort by frequency
– Stage 4: extract top-k
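A toy two-stage pipeline in the same Python sketch, chaining a term-count stage into a sort-by-frequency stage (the data and function names are illustrative; a real deployment would write each stage’s output to the distributed file system):

    # Stage: term count, reusing docs, wc_map and wc_reduce from the word-count sketch
    counts = map_reduce(docs, wc_map, wc_reduce)

    # Next stage: re-key the output by count, so results are grouped by frequency
    def freq_map(item):
        word, count = item
        yield (count, word)

    def freq_reduce(count, words):
        return words

    by_freq = map_reduce(counts.items(), freq_map, freq_reduce)
    for c in sorted(by_freq, reverse=True): # top-k: take the first k frequencies
        print(c, by_freq[c])
    # 2 ['cat', 'dog']
    # 1 ['bat']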
Make it database-y?
Simple idea: each operator is a MapReduce.
How to do:
– Select
– Project
– Group by, aggregate
– Join
Reduce-side join
The shuffle puts all values for the same key at the same reducer.
Mapper:
– Input: tuples from R and tuples from S
– Output: (join value, (R|S, tuple))
Reducer:
– Local join of all R tuples with all S tuples for that key
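A minimal Python sketch of a reduce-side join with the toy map_reduce helper. The tables R and S, the join attribute "a", and the tag strings are made-up illustration data:

    R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
    S = [{"a": 1, "y": "s1"}, {"a": 1, "y": "s2"}]

    def join_map(record):
        tag, t = record                     # tag records which table the tuple came from
        yield (t["a"], (tag, t))            # join attribute "a" is the shuffle key

    def join_reduce(key, tagged):
        r_side = [t for tag, t in tagged if tag == "R"]
        s_side = [t for tag, t in tagged if tag == "S"]
        return [(r, s) for r in r_side for s in s_side]   # local join of the two sides

    tagged_input = [("R", r) for r in R] + [("S", s) for s in S]
    print(map_reduce(tagged_input, join_map, join_reduce))
    # key 1 joins r1 with s1 and s2; key 2 has no matching S tuples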
Map-side join
Like a hash join, but every mapper has a copy of the hash table.
Mapper:
– Read in a hash table of R
– Input: tuples of S
– Output: tuples of S joined with tuples of R
Reducer:
– Pass through
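A matching Python sketch of the map-side join, reusing the toy tables R and S from the previous example (in a real job, every map worker would load its own copy of the small table before processing its split):

    # Build the hash table of the small table R; each mapper would hold a copy.
    r_by_key = {}
    for r in R:
        r_by_key.setdefault(r["a"], []).append(r)

    def mapside_join_map(s_tuple):
        # Probe the local copy of R and emit joined tuples directly.
        for r in r_by_key.get(s_tuple["a"], []):
            yield (s_tuple["a"], (r, s_tuple))

    def pass_through_reduce(key, values):
        return values                       # the reducer does nothing

    print(map_reduce(S, mapside_join_map, pass_through_reduce))
    # {1: [({'a': 1, 'x': 'r1'}, {'a': 1, 'y': 's1'}),
    #      ({'a': 1, 'x': 'r1'}, {'a': 1, 'y': 's2'})]}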
Comparison
– A reduce-side join shuffles all of the data.
– A map-side join requires one table to be small enough to give every mapper a copy.
Semi-join?
One idea:
– MapReduce 1: extract the join keys of R; reduce-side join them with S.
– MapReduce 2: map-side join the result of MapReduce 1 with R.
Platforms for SQL-like queries
– Pig Latin
– Hive
– MapR
– MemSQL
– …
Why not just use a DBMS?
Many DBMSs exist and are highly optimized.
A comparison of approaches to large-scale data analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
One reason: loading data into a DBMS is hard.
A comparison of approaches to large-scale data analysis. Pavlo et al., SIGMOD 2009.
Why not just use a DBMS?
Other possible reasons:
– MapReduce is more scalable
– MapReduce is more easily deployed
– MapReduce is more easily extended
– MapReduce is more easily optimized
– MapReduce is free (that is, Hadoop)
– I already know Java
– MapReduce is exciting and new
Data store
Instead of HDFS, data could be stored in a database.
Example: HadoopDB/Hadapt
– The data store is PostgreSQL
– Allows for indexing and fast local query processing
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Alex Rasin. VLDB 2009.
Batch processing?
MapReduce materializes intermediate results:
– Map output is written to disk before reduce starts.
What if we pipelined the data from map to reduce?
– Reduce could quickly produce approximate answers.
– MapReduce could implement continuous queries.
MapReduce Online. Tyson Condie, Neil Conway, Peter Alvaro, Joe Hellerstein, Khaled Elmeleegy, Russell Sears. NSDI 2010.
Not just database queries!
Any large, partitionable computation:
– Build an inverted index
– Image thumbnailing
– Machine translation
– …
– Anything expressible as map and reduce functions written in a general-purpose language