MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat, Google, Inc.
Presented by Haoran Wang, EECS 582 – F16

The problem
- The need to process very large data sets.
- The desire to hide the messy details of the infrastructure from programmers.
- Context, circa 2004: Gmail, Google Earth.

The solution
Massive parallelism on a large set of commodity computers.
- Simplicity: a programming model that is easy to understand, and a simple interface that is easy to use.
- Parallelism: performance, efficiency, load balancing, fault tolerance, and scalability.

Programming model – word count
- Input: Doc 1 = "apple banana", Doc 2 = "banana orange", Doc 3 = "orange apple".
- Intuitively: count per document ((Doc1: apple:1, banana:1), (Doc2: banana:1, orange:1), (Doc3: orange:1, apple:1)), then combine into apple: 2, banana: 2, orange: 2.
- Map: each document is turned into (word, 1) pairs, e.g. Doc 1 → apple:1, banana:1.
- Shuffle: the pairs are grouped by word: apple 1, apple 1 / banana 1, banana 1 / orange 1, orange 1.
- Reduce: the counts for each word are summed: apple 2, banana 2, orange 2.
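A minimal, single-process sketch of this word-count example (the function and variable names are mine, and the "shuffle" is just an in-memory dictionary rather than a distributed exchange):

```python
from collections import defaultdict

def map_func(doc_id, content):
    """Map: (docID, content) -> list of (word, 1) pairs."""
    return [(word, 1) for word in content.split()]

def reduce_func(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    return (word, sum(counts))

docs = {"Doc1": "apple banana", "Doc2": "banana orange", "Doc3": "orange apple"}

# Map phase: run map_func over every document.
intermediate = defaultdict(list)
for doc_id, content in docs.items():
    for word, one in map_func(doc_id, content):
        intermediate[word].append(one)   # "shuffle": group values by key

# Reduce phase: sum the grouped counts per word.
results = sorted(reduce_func(w, c) for w, c in intermediate.items())
print(results)   # [('apple', 2), ('banana', 2), ('orange', 2)]
```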

Generalization
- Partition → Shuffle → Collect
- Remapping: (docID, content) → (word, frequency)
- Reduction: many (word, frequency) pairs → a small number of (word, frequency) pairs; duplicate keys → distinct keys
- Speaker note: bring up Spark here – narrow dependency
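In the paper's general form the user supplies map: (k1, v1) → list(k2, v2) and reduce: (k2, list(v2)) → list(v2). A small sketch of those shapes as Python type aliases (the alias names are illustrative, not from the paper):

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")  # input key type, e.g. docID
V1 = TypeVar("V1")  # input value type, e.g. document content
K2 = TypeVar("K2")  # intermediate key type, e.g. word
V2 = TypeVar("V2")  # intermediate value type, e.g. count

# map:    (k1, v1)       -> list of (k2, v2)
MapFunc = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce: (k2, list(v2)) -> list of v2
ReduceFunc = Callable[[K2, List[V2]], Iterable[V2]]
```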

Things to worry about in a real system
- Granularity: how many partitions for map and for reduce?
- Scheduling: how to assign workers, and how to coordinate between map and reduce workers?
- Fault tolerance: what if something fails?

The real MapReduce with word count
- The input (Doc 1, Doc 2, ... with contents like "apple banana ...") is divided into M splits; each map worker processes one split (e.g. Split 1 = Doc 1–3, Split 2 = Doc 4–6) and emits (word, 1) pairs for it.
- Each map task's output is divided into R partitions, so there are M * R partitions in total.
- Each reduce worker gathers the matching partition from every map task; e.g. the worker responsible for "banana" receives banana:1, banana:1 from each split and emits banana: 4.
- With R reduce workers, the job produces R output files.
(A sketch of how pairs are routed into partitions follows.)
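One detail behind "M * R partitions": each map worker splits its output into R buckets so that all values for a given key land at the same reduce worker; the paper's default partitioning is hash(key) mod R. A hedged sketch (the bucket layout and the crc32 choice are mine, not Google's):

```python
import zlib

R = 4  # number of reduce tasks (and final output files)

def partition(key: str, r: int = R) -> int:
    # A real system needs a hash that is stable across machines and runs,
    # which is why this sketch uses crc32 instead of Python's built-in,
    # per-process randomized hash().
    return zlib.crc32(key.encode("utf-8")) % r

def route(pairs):
    """Split one map task's output into R local buckets (partition files)."""
    buckets = [[] for _ in range(R)]
    for key, value in pairs:
        buckets[partition(key)].append((key, value))
    return buckets  # bucket i is later fetched by reduce worker i

print(route([("apple", 1), ("banana", 1), ("apple", 1)]))
```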

System overview
- Input splits (Split 1, Split 2, Split 3, ...) live in GFS; the final outputs (Output 1, Output 2, ...) are written back to GFS.
- Each map worker runs the user's Map function over its split and writes intermediate data to its local disk; each reduce worker pulls that data, runs the Reduce function, and writes one output file to GFS.
- A single master coordinates the workers, keeping O(M + R) task-status entries and O(M * R) pieces of state recording the locations of intermediate data.
- Locality: the master utilizes GFS by scheduling map tasks on or near machines that hold a replica of the corresponding input split.
(A sketch of the master's bookkeeping follows.)
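A rough sketch of the bookkeeping implied by "O(M + R) task status" and "O(M * R) states (locations)"; the class and field names are my own guesses at what the master tracks, not the paper's data structures:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple

class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"

@dataclass
class MapTask:                      # one of M entries -> O(M) status
    split_id: int
    state: State = State.IDLE
    worker: Optional[str] = None
    # Filled in when the task completes: partition index -> (location, size).
    # Across all M map tasks this is the O(M * R) intermediate-location state.
    outputs: Dict[int, Tuple[str, int]] = field(default_factory=dict)

@dataclass
class ReduceTask:                   # one of R entries -> O(R) status
    partition_id: int
    state: State = State.IDLE
    worker: Optional[str] = None
```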

The reasoning of the implementation
- Parallelism: input splits are stored in GFS, and the master assigns new tasks to idle workers as they free up.
- Limited bisection bandwidth: master coordination ensures intermediate values go over the network just once.
- Load balancing: many more splits than workers, so there are no huge tasks; but not too many splits either, otherwise the bookkeeping and startup overhead becomes too large.
- Speaker note: the system is designed and built on a lot of reasoning that is natural, even intuitive.

The reasoning of fault tolerance
- Map worker failure (before or after task completion): map is a stateless/pure function and input splits are stored in GFS, so the master simply restarts the map task, reading the split from another replica if necessary.
- Reduce worker failure after completion: the origin of the work is difficult to track, so the output is made reliable by storing it in GFS; nothing needs to be restarted.
- Reduce worker failure in the middle of writing its output: GFS's atomic rename guarantees the partial output never becomes visible.
- A reduce worker may already have read some intermediate values from a failed map worker; correctness relies on the map function being deterministic, so a re-execution produces equivalent output.
- If both a reduce worker and the map worker whose output it needs crash, the map task must be re-run.
- Master failure: the master checkpoints its state, but it is still difficult to restore updates made after the most recent checkpoint; since failure of a single machine is unlikely, the whole job is simply restarted.
- Materializing map outputs to local disk means a map need not be re-run if its worker merely restarts.
(A sketch of the master's failure handling follows.)
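Continuing the master sketch above, a hedged rendering of the failure rules on this slide (my own pseudologic, not the paper's code): map tasks on a dead worker go back to idle even if completed, because their output lived on that worker's local disk, while completed reduce tasks stay completed because their output is already in GFS.

```python
def handle_worker_failure(dead_worker, map_tasks, reduce_tasks):
    for t in map_tasks:
        if t.worker == dead_worker and t.state != State.IDLE:
            # Even COMPLETED map tasks are rescheduled: their intermediate
            # files are now unreachable. Determinism of the map function
            # makes the re-execution equivalent for any reduce worker that
            # already read part of the old output.
            t.state = State.IDLE
            t.worker = None
            t.outputs.clear()
    for t in reduce_tasks:
        if t.worker == dead_worker and t.state == State.IN_PROGRESS:
            # Only in-progress reduce tasks restart; completed ones already
            # wrote their output to GFS.
            t.state = State.IDLE
            t.worker = None
```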

Refinements
- Performance: stragglers are handled with backup tasks; bad records are skipped once a specific record causes more than one failure.
- Utilities: sorting within each partition gives faster lookups for reduce workers and output consumers; a combiner function performs a "partial reduce" on the map worker (sketch below); status information and counters are produced as additional output; a local, sequential execution mode helps debugging.
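For word count the combiner is just the reducer run locally over a map worker's output before anything is sent over the network; a minimal sketch:

```python
from collections import defaultdict

def combine(local_pairs):
    """Collapse one map worker's (word, 1) pairs into (word, partial_count)."""
    partial = defaultdict(int)
    for word, count in local_pairs:
        partial[word] += count
    return list(partial.items())

# e.g. [("the", 1), ("the", 1), ("cat", 1)] -> [("the", 2), ("cat", 1)]
print(combine([("the", 1), ("the", 1), ("cat", 1)]))
```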

Evaluation
- Applications:
  - Grep – map: (record, content) → (pattern, location/frequency); reduce: identity function.
  - Sort – map: (record, content) → (sort_key, value).
- Cluster configuration: 1,800 machines, each with a 2.0 GHz CPU, 4 GB of memory, and 320 GB of disk.
(Illustrative map/reduce functions for both follow.)
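Illustrative user functions for the two benchmarks, following the signatures on this slide (the search pattern, key width, and names are my own placeholders):

```python
import re

PATTERN = re.compile(r"some-rare-pattern")  # hypothetical search pattern

def grep_map(record_id, line):
    # Emit only matching records; the value here is the record's location.
    if PATTERN.search(line):
        yield (line, record_id)

def grep_reduce(key, values):
    # Identity reduce: matching records pass straight through to the output.
    yield from values

def sort_map(record_id, record):
    # Extract the sort key; ordered intermediate keys plus range
    # partitioning let the framework produce globally sorted output.
    yield (record[:10], record)
```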

Distributed Grep
- M = 15,000, R = 1, about 1,800 machines.
- The input rate peaks once most of the machines have been assigned work.
- Implication: it takes a while for the master to assign tasks and for the computation to actually get started.

Sort
- M = 15,000, R = 4,000, about 1,800 machines.
- Top left: the input rate is lower than grep's because sort has a lot of intermediate data to write to disk.
- Middle left: shuffling starts as soon as the first map tasks finish; reduce tasks are scheduled in two batches, hence the two peaks.
- Right column: execution with failures, simulated by killing 200 random workers.
- Bottom left: output appears only after a delay, since sorting the intermediate data takes some time.
- Input rate (top left) exceeds output rate (bottom left) thanks to the locality optimization for input; shuffle rate (middle left) exceeds output rate because the output must write extra replica copies into GFS.
- Speaker notes: why does the master batch-schedule the reduce tasks? The middle column shows execution with backup tasks disabled.

Other applications
- Large-scale machine learning
- Data mining
- Large-scale indexing
- Large-scale graph computation

Conclusion
- Applicable to massive production use: it actually worked, massive parallelism works, and a single master works.
- Good trade-offs: simplicity, though not the most efficient or flexible design; it scales well, though not when M and R grow too large (the master keeps O(M * R) state).
- Speaker note: a trade-off is not about gaining a lot by compromising a little; it is about knowing exactly what you will lose and what you will gain.

Influence
- MapReduce marked the beginning of the big data / distributed computation era.
- Timeline of descendants, roughly 2005–2015: Hadoop, Dryad, Hive, Dremel, Pregel, DryadLINQ, Spark, Storm, GraphLab, GraphX, Spark Streaming, BlinkDB.
- (Timeline forked from the professor's course intro slides.)

Drawbacks
- Not flexible: the programming model is too strict; map and reduce must be stateless; multiple iterations require launching a series of MapReduce jobs.
- Not very efficient: too much disk I/O; distributed memory is not exploited; GFS acts as a synchronization barrier.
- Risk of the single master.
- No streaming.
- Speaker note: bring up Spark here – keeping RDDs in memory is much faster, with more advanced fault tolerance, and Spark Streaming adds streaming.

References
- MIT 6.824 Distributed Systems (https://pdos.csail.mit.edu/6.824/notes/l01.txt)
- MapReduce presentation at Google Research (http://research.google.com/archive/mapreduce-osdi04-slides/index.html)
- MapReduce presentation from EECS 582, Winter 2016 (http://web.eecs.umich.edu/~mosharaf/Slides/EECS582/W16/030916-YangMapReduce.pptx)
- Apache Hadoop (http://hadoop.apache.org/)
- Pregel (https://kowshik.github.io/JPregel/pregel_paper.pdf)
- Dremel (http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf)

Discussions
- The paper does not say much about the master's scheduling algorithm; do you have any guesses?
- What modifications would be needed for MapReduce to support streaming?
- MapReduce ran on a normal Linux kernel; are there alternatives (e.g. from our previously read papers) that could speed up the computation, assuming no other running workloads?
- Are there any other large-scale, single-master distributed systems used in production today? What are the trade-offs between single-master and multi-master / fully distributed designs?
- Speaker notes: for discussion 1 – why batch the reduce tasks? For discussion 4 – SVN, traditional NoSQL databases.