1
MapReduce Programming Model
2
HP Cluster Computing Challenges
Programmability: need to parallelize algorithms manually
Must look at problems from a parallel standpoint
Tightly coupled problems require frequent communication (more of the slow part!)
We want to decouple the problem
▫ Increase data locality, balance the workload, etc.
Parallel efficiency: communication is the fundamental difficulty
Distributing data, updating shared resources, communicating results
Machines have separate memories, so the usual inter-process communication is unavailable – the network must be used instead
This introduces inefficiencies: overhead, waiting, etc.
3
Programming Models: What is MPI?
MPI: Message Passing Interface (www.mpi.org)
World's most popular high-performance distributed API
MPI is the "de facto standard" in scientific computing
C and FORTRAN bindings; version 2 released in 1997
What is MPI good for?
Abstracts away common network communications
Allows lots of control without bookkeeping
Freedom and flexibility come with complexity
▫ About 300 subroutines, but serious programs can be written with fewer than 10
Basics:
One executable is run on every node (SPMD: single program, multiple data)
Each node's process is assigned a rank ID number
Call API functions to send messages
Send/receive a block of data (in an array)
▫ MPI_Send(start, count, datatype, dest, tag, comm_context)
▫ MPI_Recv(start, count, datatype, source, tag, comm_context, status)
4
Challenges with MPI
Poor programmability: like socket programming, with support for data structures
▫ Programmers need to take care of everything: data distribution, inter-process communication, orchestration
Blocking communication can cause deadlock:
– Proc1: MPI_Recv(Proc2, A); MPI_Send(Proc2, B);
– Proc2: MPI_Recv(Proc1, B); MPI_Send(Proc1, A);
Potential for high parallel efficiency, but:
Large overhead from communication mismanagement
▫ Time spent blocking is wasted cycles
▫ Can overlap computation with non-blocking communication
Load imbalance is possible! What about dead machines?
5
Google's MapReduce
Large-scale data processing
Want to use hundreds or thousands of CPUs... but this needs to be easy!
MapReduce provides:
Automatic parallelization & distribution
Fault tolerance
I/O scheduling
Monitoring & status updates
6
Programming Concept
Map
Perform a function on individual values in a data set to create a new list of values
Example: square x = x * x
map square [1,2,3,4,5] returns [1,4,9,16,25]
Reduce
Combine values in a data set to create a new value
Example: sum = (for each elem in arr, total += elem)
reduce sum [1,2,3,4,5] returns 15 (the sum of the elements)
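As a small illustration of these two ideas, here is a Java sketch (added for illustration, not part of the original slides) that squares a list with map and sums it with reduce using the standard java.util.stream API:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceConcept {
    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);

        // map: apply square to each element, producing a new list
        List<Integer> squares = input.stream()
                .map(x -> x * x)
                .collect(Collectors.toList());
        System.out.println(squares);   // prints [1, 4, 9, 16, 25]

        // reduce: combine all elements into a single value (their sum)
        int sum = input.stream().reduce(0, Integer::sum);
        System.out.println(sum);       // prints 15
    }
}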
7
Map/Reduce
map(key, val) is run on each item in the set
emits new-key / new-val pairs
▫ Processes an input key/value pair
▫ Produces a set of intermediate pairs
reduce(key, vals) is run for each unique key emitted by map()
emits the final output
▫ Combines all intermediate values for a particular key
▫ Produces a set of merged output values (usually just one)
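To make those signatures concrete, here is a minimal sketch of the two contracts as generic Java interfaces; the interfaces are invented for illustration and are not Hadoop's actual API (the real API appears on slides 26–29):

import java.util.List;
import java.util.Map;

// map: one input key/value pair -> a set of intermediate pairs
interface MapFunction<K1, V1, K2, V2> {
    List<Map.Entry<K2, V2>> map(K1 key, V1 value);
}

// reduce: one intermediate key plus all of its values -> merged output values (usually just one)
interface ReduceFunction<K2, V2, V3> {
    List<V3> reduce(K2 key, List<V2> values);
}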
8
Count words in docs: An Example
Input consists of (url, contents) pairs
map(key=url, val=contents):
▫ For each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
▫ Sum all "1"s in the values list
▫ Emit the result (word, sum)
9
Count, Illustrated
map(key=url, val=contents): for each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts): sum all "1"s in the values list, emit the result (word, sum)
Input documents: "see bob throw" and "see spot run"
Map output: (see, 1), (bob, 1), (throw, 1) and (see, 1), (spot, 1), (run, 1)
Reduce output: (bob, 1), (run, 1), (see, 2), (spot, 1), (throw, 1)
10
MapReduce WordCount Example
The data lives on a distributed file system (DFS).
"Mapper" nodes are responsible for the map function:
map(String input_key, String input_value):
  // input_key : document name (or line of text)
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");
"Reducer" nodes are responsible for the reduce function:
reduce(String output_key, Iterator intermediate_values):
  // output_key : a word
  // output_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
11
MapReduce WordCount Java code (Garcia, UCB); the full Hadoop listing appears on slides 26–29.
12
Execution Overview
How is this distributed?
1. Partition the input key/value pairs into chunks and run map() tasks in parallel
2. After all map()s are complete, consolidate all emitted values for each unique emitted key
3. Now partition the space of output map keys, and run reduce() in parallel
If a map() or reduce() task fails, re-execute it!
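The three steps can be sketched in plain Java as a single-process simulation of word count; this local stand-in only illustrates the data flow, not the distributed system:

import java.util.*;

public class LocalMapReduce {
    public static void main(String[] args) {
        // Step 1 input: two "chunks" of the input key/value space
        List<String> chunks = Arrays.asList("see bob throw", "see spot run");

        // Step 1: run map() over each chunk, emitting (word, 1) pairs
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String chunk : chunks) {
            for (String w : chunk.split("\\s+")) {
                intermediate.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }

        // Step 2: consolidate all emitted values for each unique key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : intermediate) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Step 3: run reduce() over each key's value list
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) sum += v;
            System.out.println(entry.getKey() + "\t" + sum);
        }
    }
}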
13
Execution Overview (Cont’d)
14
MapReduce WordCount Diagram (Garcia, UCB)
Input files (file 1 – file 7) contain the words: ah ah er ah if or or uh or ah if
Map output: each word is emitted as a (word, 1) pair, e.g. ah:1, er:1, if:1, or:1, uh:1
Shuffle groups the intermediate values by key: ah:1,1,1,1  er:1  if:1,1  or:1,1,1  uh:1
Reduce output: (ah, 4), (er, 1), (if, 2), (or, 3), (uh, 1)
(The map and reduce pseudocode shown on this slide is the same as on slide 10.)
15
Map
Reads the contents of the assigned portion of the input file
Parses and prepares the data for input to the map function (e.g. read from HTML)
Passes the data into the map function and saves the result in memory
Periodically writes completed work to local disk
Notifies the Master of this partially completed work (intermediate data)
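A rough sketch of the "save in memory, periodically spill to local disk" behaviour described above; the class name, spill threshold, and file naming are invented for illustration, and the real implementation is considerably more involved:

import java.io.*;
import java.util.*;

class MapOutputBuffer {
    private final List<String> buffer = new ArrayList<>();
    private final int spillThreshold;   // number of buffered pairs before spilling
    private int spillCount = 0;

    MapOutputBuffer(int spillThreshold) {
        this.spillThreshold = spillThreshold;
    }

    // Called by the map function for every emitted (key, value) pair
    void emit(String key, String value) throws IOException {
        buffer.add(key + "\t" + value);
        if (buffer.size() >= spillThreshold) {
            spillToLocalDisk();
        }
    }

    // Writes the buffered pairs to a local file; in the real system the worker
    // would then notify the master of this partially completed work
    private void spillToLocalDisk() throws IOException {
        File spill = new File("map-spill-" + (spillCount++) + ".txt");
        try (PrintWriter out = new PrintWriter(new FileWriter(spill))) {
            for (String pair : buffer) {
                out.println(pair);
            }
        }
        buffer.clear();
    }
}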
16
Reduce
Receives notification from the Master of partially completed work
Retrieves the intermediate data from the map machine via remote read
Sorts the intermediate data by key (e.g. by target page)
Iterates over the intermediate data
For each unique key, sends the corresponding set of values through the reduce function
Appends the result of the reduce function to the final output file (on GFS)
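The "sort by key, then iterate" step can be sketched as follows; String keys and values and the stand-in reduce body are assumptions made for this example:

import java.util.*;

class ReduceWorkerSketch {
    // fetched: all intermediate (key, value) pairs retrieved from the map machines
    static void run(List<Map.Entry<String, String>> fetched) {
        // Sort the intermediate data by key
        fetched.sort(Map.Entry.comparingByKey());

        // Iterate: each run of equal keys is handed to the reduce function once
        int i = 0;
        while (i < fetched.size()) {
            String key = fetched.get(i).getKey();
            List<String> values = new ArrayList<>();
            while (i < fetched.size() && fetched.get(i).getKey().equals(key)) {
                values.add(fetched.get(i).getValue());
                i++;
            }
            reduce(key, values);
        }
    }

    // Stand-in reduce function: here it just counts the values for each key;
    // the real worker would append its result to the final output file on GFS
    static void reduce(String key, List<String> values) {
        System.out.println(key + "\t" + values.size());
    }
}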
17
Parallel Execution
18
Task Granularity & Pipelining
Fine-granularity tasks: map tasks >> machines
Minimizes time for fault recovery
Can pipeline shuffling with map execution
Better dynamic load balancing
Often uses 200,000 map tasks & 5,000 reduce tasks, running on 2,000 machines
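With the figures above, each of the 2,000 machines handles on the order of 200,000 / 2,000 = 100 map tasks and 5,000 / 2,000 ≈ 2–3 reduce tasks on average, so a dead machine's completed work can be redistributed across the cluster as many small pieces rather than one large chunk.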
19
Fault Tolerance / Workers
Handled via re-execution
Detect failure: periodically ping workers
Any machine that does not respond is considered "dead"
Re-execute completed + in-progress map tasks
▫ Data stored on the local machine becomes unreachable
Re-execute in-progress reduce tasks
Task completion is committed through the master
Robust: once lost 1600 of 1800 machines and still finished OK
20
Master Failure
Could be handled by having the master write periodic checkpoints of its data structures.
Not done yet (master failure is unlikely).
21
Refinement: Redundant Execution
Slow workers significantly delay completion time
Other jobs consuming resources on the machine
Bad disks with soft errors transfer data slowly
Weird things: processor caches disabled (!!)
Solution: near the end of a phase, schedule redundant execution of in-progress tasks
Whichever copy finishes first "wins"
Dramatically shortens job completion time
22
Refinement: Locality Optimization
Master scheduling policy:
Asks GFS for the locations of replicas of the input file blocks
Map tasks are typically split into 64 MB chunks (the GFS block size)
Map tasks are scheduled so that a GFS input block replica is on the same machine or the same rack
Effect:
Thousands of machines read input at local disk speed
▫ Without this, rack switches limit the read rate
23
Refinement: Skipping Bad Records
Map/Reduce functions sometimes fail for particular inputs
The best solution is to debug & fix
▫ Not always possible, e.g. with third-party source libraries
On a segmentation fault:
▫ Send a UDP packet to the master from the signal handler
▫ Include the sequence number of the record being processed
If the master sees two failures for the same record:
▫ The next worker is told to skip the record (it is acceptable to ignore a few records when doing statistical analysis on a large data set)
24
Other Refinements
Sorting guarantees within each reduce partition
Compression of intermediate data
Combiner: useful for saving network bandwidth
Local execution for debugging/testing
User-defined counters (see the sketch below)
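As a sketch of the user-defined counters mentioned above, here is a reducer that calls Reporter.incrCounter from the same old org.apache.hadoop.mapred API used by the WordCount listing below; the counter group and name are invented for illustration:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class CountingReduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Bump a user-defined counter for words that appear exactly once
        if (sum == 1) {
            reporter.incrCounter("WordCountStats", "SINGLETON_WORDS", 1);
        }
        output.collect(key, new IntWritable(sum));
    }
}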
25
MapReduce: In Summary
Now it's easy to program for many CPUs
Communication management is effectively gone
▫ I/O scheduling is done for us
Fault tolerance and monitoring
▫ Machine failures, suddenly-slow machines, etc. are handled
Can be much easier to design and program!
Can cascade several (many?) MapReduce tasks
But... it further restricts the solvable problems
Might be hard to express a problem in MapReduce
Data parallelism is key
▫ Need to be able to break up a problem by data chunks
MapReduce is closed-source (to Google), written in C++
▫ Hadoop is an open-source Java-based rewrite
26
MapReduce Example 1/4

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
27
MapReduce Example 2/4

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }
28
MapReduce Example 3/4

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }
29
MapReduce Example 4/4

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    conf.setInputPath(new Path(args[0]));
    conf.setOutputPath(new Path(args[1]));

    JobClient.runJob(conf);
  }
}
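To try the example, the class is typically compiled against the Hadoop jars, packaged into a jar file, and submitted with the hadoop launcher, e.g. hadoop jar wordcount.jar org.myorg.WordCount <input dir> <output dir>, where both paths are directories in the distributed file system. (The jar name here is only a placeholder; the exact command depends on the Hadoop installation and version.)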