MapReduce: Simplified Data Processing on Large Clusters
Hongwei Wang, Sihuizi Jin, Yajing Zhang
2014.10.6
Outline
- Introduction
- Programming model
- Implementation
- Refinements
- Performance
- Conclusion
1. Introduction
What is MapReduce?
- Originated at Google [OSDI'04]
- A simple programming model with a functional flavor
- Designed for large-scale data processing
- Exploits large sets of commodity computers
- Executes the computation in a distributed manner
- Offers high availability
Motivation
- Huge demand for very large-scale data processing: the computations are conceptually straightforward, but the input data is large and distributed across thousands of machines
- The issues of how to parallelize the computation, distribute the data, and handle failures obscure the original simple computation with complex code to deal with them
Distributed Grep
[Diagram: very big data → split data → grep on each split → matches → cat → all matches]
Distributed Word Count
[Diagram: very big data → split data → count on each split → merge → merged counts]
Goal
Design a new abstraction that allows us to:
- express the simple computation we are trying to perform
- hide the messy details of parallelization, fault-tolerance, data distribution, and load balancing in a library
2. Programming Model
Map + Reduce
- Map: accepts an input key/value pair, emits intermediate key/value pairs
- Reduce: accepts an intermediate key and the list of values for that key, emits output key/value pairs
[Diagram: very big data → Map → partitioning function → Reduce → result]
A Simple Example
Counting words in a large set of documents:

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
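The same computation written out as a minimal, runnable sketch in Python. This is only a single-process simulation of the model: the names wc_map, wc_reduce, and run_mapreduce are illustrative, not part of the paper's C++ library, and the in-memory grouping step stands in for the distributed shuffle.

    # Single-process sketch of the word-count example (illustrative names).
    from collections import defaultdict

    def wc_map(doc_name, contents):
        # key: document name, value: document contents
        for word in contents.split():
            yield word, "1"

    def wc_reduce(word, counts):
        # key: a word, values: a list of counts
        yield str(sum(int(c) for c in counts))

    def run_mapreduce(inputs, map_fn, reduce_fn):
        # Group all intermediate values by intermediate key (the "shuffle"),
        # then hand each key and its value list to the reduce function.
        groups = defaultdict(list)
        for key, value in inputs:
            for ikey, ivalue in map_fn(key, value):
                groups[ikey].append(ivalue)
        return {k: list(reduce_fn(k, vs)) for k, vs in sorted(groups.items())}

    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog the end")]
    print(run_mapreduce(docs, wc_map, wc_reduce))
    # {'brown': ['1'], 'dog': ['1'], 'end': ['1'], 'fox': ['1'],
    #  'lazy': ['1'], 'quick': ['1'], 'the': ['3']}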
More Examples
- Distributed grep: Map emits a line if it matches the supplied pattern; Reduce is the identity function
- Count of URL access frequency: Map processes logs of page requests and emits <URL, 1>; Reduce adds together all values for the same URL and emits a <URL, total count> pair
- Distributed sort: Map extracts the key from each record and emits a <key, record> pair; Reduce is the identity function
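The latter two examples can be sketched in the same style as the word-count code above. The functions below mirror the paper's descriptions; the log-line layout and the CSV key position are made-up assumptions for the illustration.

    # Sketch of the URL-access-frequency and distributed-sort examples.
    # The log-line layout ("GET /path status") is an assumption for illustration.
    from collections import defaultdict

    def url_map(log_line):
        url = log_line.split()[1]       # emit <URL, 1> for each request
        yield url, 1

    def url_reduce(url, counts):
        yield url, sum(counts)          # emit <URL, total count>

    def sort_map(record):
        # Emit <key, record>; here the key is assumed to be the first CSV field.
        yield record.split(",")[0], record

    def sort_reduce(key, records):
        yield from records              # identity: the ordering guarantee sorts

    def group(pairs):
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)
        return sorted(groups.items())

    logs = ["GET /a 200", "GET /b 200", "GET /a 404"]
    for url, counts in group(p for line in logs for p in url_map(line)):
        print(list(url_reduce(url, counts)))   # [('/a', 2)] then [('/b', 1)]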
3. Implementation
Environment
The implementation depends on the environment:
- Dual-processor x86 machines with 2-4 GB of memory
- Commodity networking hardware, 100 Mb/s or 1 Gb/s at the machine level
- A cluster of hundreds or thousands of machines
- Storage provided by inexpensive IDE disks attached directly to the machines
Execution Overview
1. Input partitioning (M splits of typically 16-64 MB each); copies of the program are started on the cluster
2. Task assignment: the master assigns map or reduce tasks to workers
3. Map task: parse key/value pairs from the input split; produce intermediate key/value pairs with the user's Map function
Execution Overview (continued)
4. Partitioning of intermediate pairs (by a hash function, typically hash(key) mod R); their locations are forwarded to the reduce workers by the master
5. Reduce task: read the intermediate data from the map workers; sort it by intermediate key; group values by key
6. Reduce function: process each group passed to it by the reduce task
7. When all tasks have completed, the MapReduce call returns
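A sketch of steps 4 and 5 in a single process: partition intermediate pairs by a hash of the key modulo R, then sort each partition by key and group the values. R and the CRC32-based hash are arbitrary choices for the example; the real system writes R partition files per map task to local disk, and reduce workers fetch them remotely.

    # Sketch of the shuffle: partition by hash(key) mod R, then sort and group.
    import zlib
    from itertools import groupby
    from operator import itemgetter

    R = 3  # number of reduce tasks (arbitrary for the example)

    def partition(key):
        return zlib.crc32(key.encode()) % R

    intermediate = [("apple", "1"), ("banana", "1"), ("apple", "1"), ("cherry", "1")]

    # Map side: bucket each map task's output into R partitions (step 4).
    buckets = [[] for _ in range(R)]
    for key, value in intermediate:
        buckets[partition(key)].append((key, value))

    # Reduce side: sort each partition by key, group the values per key, and
    # call the user's reduce function once per distinct key (step 5).
    for r, bucket in enumerate(buckets):
        bucket.sort(key=itemgetter(0))
        for key, pairs in groupby(bucket, key=itemgetter(0)):
            values = [v for _, v in pairs]
            print(f"reduce task {r}: key={key!r} values={values}")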
Details of Map/Reduce Task
Master Data Structures
The master keeps several data structures:
- The state (idle, in-progress, or completed) of each map and reduce task
- The identity of the worker machine for each non-idle task
The master is also the conduit through which the locations of intermediate files are propagated from map tasks to reduce tasks
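A sketch of that bookkeeping; the paper describes what this state contains but not its layout, so the field names below are assumptions.

    # Sketch of the master's per-task state (field names are assumptions).
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional, Tuple

    @dataclass
    class Task:
        kind: str                      # "map" or "reduce"
        state: str = "idle"            # "idle", "in-progress", or "completed"
        worker: Optional[str] = None   # identity of the assigned worker machine

    @dataclass
    class Master:
        map_tasks: List[Task]
        reduce_tasks: List[Task]
        # For each completed map task: the locations and sizes of its R
        # intermediate files, pushed incrementally to in-progress reduce tasks.
        intermediate: Dict[int, List[Tuple[str, int]]] = field(default_factory=dict)

        def complete_map(self, i, locations):
            self.map_tasks[i].state = "completed"
            self.intermediate[i] = locations

    m = Master(map_tasks=[Task("map"), Task("map")], reduce_tasks=[Task("reduce")])
    m.complete_map(0, [("worker7:/local/mr-0-0", 1024)])
    print(m.map_tasks[0].state, m.intermediate)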
Fault Tolerance
Worker failure:
- The master pings every worker periodically
- Any worker that does not respond is considered "dead"
- Any task in progress on a dead worker is reset and re-executed, for both map and reduce tasks
- Completed map tasks are also reset, because their results are stored on the failed machine's local disk
Master failure:
- Abort the entire computation
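A sketch of that re-execution rule, using a plain-dict task representation assumed for the example:

    # Sketch of the worker-failure rule: reset the failed worker's tasks.
    def handle_dead_worker(worker_id, tasks):
        for t in tasks:
            if t["worker"] != worker_id:
                continue
            if t["kind"] == "map" and t["state"] in ("in-progress", "completed"):
                # Map output lives on the failed machine's local disk, so even
                # completed map tasks become eligible for re-execution.
                t["state"], t["worker"] = "idle", None
            elif t["kind"] == "reduce" and t["state"] == "in-progress":
                # Reduce output is written to the global file system, so only
                # in-progress reduce tasks are reset.
                t["state"], t["worker"] = "idle", None

    tasks = [
        {"kind": "map", "state": "completed", "worker": "w1"},
        {"kind": "reduce", "state": "in-progress", "worker": "w1"},
        {"kind": "reduce", "state": "completed", "worker": "w2"},
    ]
    handle_dead_worker("w1", tasks)
    print([t["state"] for t in tasks])   # ['idle', 'idle', 'completed']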
Locality
Master scheduling policy:
- Asks GFS for the locations of the replicas of the input file blocks
- Input is typically split into 64 MB pieces (the GFS block size)
- Map tasks are scheduled so that a replica of their input block is on the same or a nearby machine
Effect: most input data is read locally and consumes no network bandwidth
Task Granularity
Choice of M and R:
- Ideally, M and R should be much larger than the number of worker machines
- There are practical bounds: the master makes O(M + R) scheduling decisions and keeps O(M * R) state in memory
- In practice: M = 200,000 and R = 5,000, using 2,000 worker machines
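For those numbers, that works out to roughly O(M + R) = 205,000 scheduling decisions and O(M * R) = 200,000 * 5,000 = 10^9 map/reduce task pairs of state; the paper notes this state is only about one byte per pair, so it fits in roughly 1 GB of the master's memory.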
Backup Tasks
- Some "straggler" machines perform poorly and hold up the whole job
- Near the end of the computation, the master schedules redundant (backup) executions of the remaining in-progress tasks
- Whichever copy completes first "wins" (see the sketch below)
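A sketch of that policy; the completion threshold and the task layout are made up for the example.

    # Sketch of backup-task scheduling: near the end of the job, schedule a
    # second copy of each remaining in-progress task on an idle worker.
    def schedule_backups(tasks, idle_workers, completed_fraction, threshold=0.95):
        if completed_fraction < threshold:
            return []                        # only worth doing near the end
        backups = []
        for t in (t for t in tasks if t["state"] == "in-progress"):
            if not idle_workers:
                break
            backups.append((t["id"], idle_workers.pop()))
        return backups                       # whichever copy finishes first "wins"

    tasks = [{"id": 7, "state": "in-progress"}, {"id": 8, "state": "completed"}]
    print(schedule_backups(tasks, ["w9"], completed_fraction=0.97))   # [(7, 'w9')]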
4. Refinements
Refinements
An input reader:
- Supports reading input data in different formats
- Supports reading records from a database or from memory
An output writer:
- Supports producing output data in different formats
Refinements
A partitioning function:
- Intermediate data is partitioned across the R reduce tasks using a function of the intermediate key
- Default: hash(key) mod R
A combiner function:
- Does partial merging of map output before it is sent over the network
- Typically the same code is used for the combiner and the reduce function
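A sketch of both refinements. The hostname-based partitioner follows the example discussed in the paper (keeping all URLs from one host in the same output file); the CRC32 hash and the function names are assumptions.

    # Sketch of a user-defined partitioning function and a combiner.
    import zlib
    from collections import defaultdict
    from urllib.parse import urlparse

    R = 4

    def default_partition(key):
        return zlib.crc32(key.encode()) % R          # default: hash(key) mod R

    def host_partition(url_key):
        # Partition by hostname so all URLs from one host land in one output file.
        return zlib.crc32(urlparse(url_key).hostname.encode()) % R

    def combine(map_output):
        # Word-count style combiner: same logic as the reducer, applied locally
        # to one map task's output before it is sent over the network.
        partial = defaultdict(int)
        for word, count in map_output:
            partial[word] += count
        return list(partial.items())

    print(host_partition("http://a.example.com/x") == host_partition("http://a.example.com/y"))  # True
    print(combine([("the", 1), ("the", 1), ("fox", 1)]))   # [('the', 2), ('fox', 1)]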
Refinements
Ordering guarantees:
- Within a partition, intermediate key/value pairs are processed in increasing key order
- This yields a sorted output file per partition
Side-effects:
- Users may produce auxiliary files as additional outputs
- Write to a temporary file and atomically rename it once complete
Refinements
Skipping bad records:
- The map/reduce functions sometimes crash deterministically on particular records
- Fixing the bug is not always possible (e.g., it may be in a third-party library)
- On an error, the worker sends a report to the master
- If the master sees more than one failure on the same record, it tells workers to skip that record (see the sketch below)
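A sketch of that protocol in one process. The record sequence numbers, retry count, and all names here are assumptions; in the real system the failure report is sent from a signal handler and the whole task is re-executed rather than resumed.

    # Sketch of skipping bad records: workers report the offending record's
    # sequence number to the master; after more than one failure on the same
    # record, the master marks it to be skipped on re-execution.
    from collections import Counter

    failure_counts = Counter()     # master-side: (task_id, record_no) -> failures
    skip_set = set()               # records the master has decided to skip

    def report_failure(task_id, record_no):
        failure_counts[(task_id, record_no)] += 1
        if failure_counts[(task_id, record_no)] > 1:
            skip_set.add((task_id, record_no))

    def run_map_task(task_id, records, user_map):
        for record_no, record in enumerate(records):
            if (task_id, record_no) in skip_set:
                continue                      # skip records known to be bad
            try:
                yield from user_map(record)
            except Exception:
                report_failure(task_id, record_no)

    def bad_map(record):
        if record == "corrupt":
            raise ValueError("cannot parse")
        yield record, 1

    records = ["ok", "corrupt", "ok"]
    print(list(run_map_task(1, records, bad_map)))   # 1st attempt: failure reported
    print(list(run_map_task(1, records, bad_map)))   # 2nd attempt: record marked bad
    print(list(run_map_task(1, records, bad_map)))   # 3rd attempt: record skipped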
Refinements
Local execution:
- Debugging problems in a distributed system can be tricky
- An alternative implementation executes all the work on the local machine
- The computation can be limited to particular map tasks
Refinements
Status information:
- The master exports a set of status pages for human consumption
- Useful for diagnosing bugs
Counters:
- Count occurrences of various events
- Counter values are periodically propagated to the master and displayed on the status page
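A sketch of the counter facility; the class and function names are assumptions, and the uppercase-word count follows the example given in the paper.

    # Sketch of counters: user code increments named counters, the worker sends
    # a snapshot with each ping, and the master aggregates them for the status
    # page (the real master also de-duplicates counts from re-executed tasks).
    from collections import Counter

    class Counters:
        def __init__(self):
            self.values = Counter()
        def increment(self, name, amount=1):
            self.values[name] += amount
        def snapshot(self):
            return dict(self.values)          # piggybacked on the ping response

    master_totals = Counter()

    def master_receive_ping(worker_counters):
        master_totals.update(worker_counters)   # shown on the status page

    counters = Counters()

    def word_map(doc):
        for word in doc.split():
            if word.isupper():
                counters.increment("uppercase")
            yield word, 1

    list(word_map("MapReduce IS simple AND USEFUL"))
    master_receive_ping(counters.snapshot())
    print(dict(master_totals))   # {'uppercase': 3}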
Status monitor
5. Performance
Performance Boasts
Distributed grep:
- 10^10 100-byte records (~1 TB of data)
- A rare 3-character pattern, found in ~100,000 records
- ~1,800 workers
- 150 seconds from start to finish, including ~60 seconds of startup overhead
Performance Boasts
Distributed sort:
- Same records and workers as above
- About 50 lines of MapReduce user code
- 891 seconds, including startup overhead
- Compares well with the best reported result of 1057 seconds for the TeraSort benchmark
Performance Boasts
6. Conclusion
Conclusion
- MapReduce is easy to use
- A large variety of problems are easily expressible as MapReduce computations
- Google has developed an implementation of MapReduce that scales to large clusters of machines
Thank you!