MapReduce and Beyond
Steve Ko

Trivia Quiz: What’s Common? Data-intensive computing with MapReduce!

What is MapReduce?
– A system for processing large amounts of data
– Introduced by Google in 2004
– Inspired by map & reduce in Lisp
– Open-source implementation: Hadoop by Yahoo!
– Used by many, many companies: A9.com, AOL, Facebook, The New York Times, Last.fm, Baidu.com, Joost, Veoh, etc.

Background: Map & Reduce in Lisp
– Sum of squares of a list (in Lisp)
– (map square ‘(1 2 3 4))
– Output: (1 4 9 16) [processes each record individually]

Background: Map & Reduce in Lisp
– Sum of squares of a list (in Lisp)
– (reduce + ‘(1 4 9 16))
– (+ 16 (+ 9 (+ 4 1)))
– Output: 30 [processes the set of all records in a batch]

Background: Map & Reduce in Lisp
– Map: processes each record individually
– Reduce: processes (combines) the set of all records in a batch
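For readers who don't know Lisp, here is the same sum-of-squares computation in Python (a minimal sketch, not part of the original slides):

```python
# map: square each record individually; reduce: combine the whole batch into one value.
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # -> [1, 4, 9, 16]
total = reduce(lambda acc, x: acc + x, squares, 0)   # -> 30
print(squares, total)
```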

What Google People Have Noticed
– Keyword search
  – Find a keyword in each web page individually, and if it is found, return the URL of the web page (Map)
  – Combine all results (URLs) and return them (Reduce)
– Count of the # of occurrences of each word
  – Count the # of occurrences in each web page individually, and return the list of per-word counts (Map)
  – For each word, sum up (combine) the counts (Reduce)
– Notice the similarities?

What Google People Have Noticed
– Lots of storage + compute cycles nearby
– Opportunity
  – Files are distributed already! (GFS)
  – A machine can process its own web pages (map)

Google MapReduce
– Took the concept from Lisp and applied it to large-scale data processing
– Takes two functions from a programmer (map and reduce), and performs three steps:
– Map
  – Runs map for each file individually, in parallel
– Shuffle
  – Collects the output from all map executions
  – Transforms the map output into the reduce input
  – Divides the map output into chunks
– Reduce
  – Runs reduce (using a map output chunk as the input) in parallel
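A minimal single-process sketch of those three steps, with names of my own choosing (this is not Google's or Hadoop's API, just the control flow described above):

```python
# Map, Shuffle, Reduce on one machine.
from collections import defaultdict

def run_mapreduce(map_fn, reduce_fn, inputs):
    # Map: run map_fn on each input record individually.
    intermediate = []
    for in_key, in_value in inputs:
        intermediate.extend(map_fn(in_key, in_value))

    # Shuffle: collect all map output and group the values by key;
    # each group becomes one chunk of reduce input.
    groups = defaultdict(list)
    for out_key, inter_value in intermediate:
        groups[out_key].append(inter_value)

    # Reduce: run reduce_fn once per key group.
    return [reduce_fn(out_key, values) for out_key, values in groups.items()]
```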

Programmer’s Point of View
– Programmer writes two functions: map() and reduce()
– The programming interface is fixed
  – map (in_key, in_value) -> list of (out_key, intermediate_value)
  – reduce (out_key, list of intermediate_value) -> (out_key, out_value)
– Caution: not exactly the same as Lisp
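For example, word count written against that interface (a Python sketch; the real Hadoop API is Java and differs in detail):

```python
def map_fn(in_key, in_value):
    # in_key: document name; in_value: document text. Emit (word, 1) per occurrence.
    return [(word, 1) for word in in_value.split()]

def reduce_fn(out_key, intermediate_values):
    # out_key: a word; intermediate_values: all counts emitted for that word.
    return (out_key, sum(intermediate_values))

print(map_fn("doc1", "every happy family"))   # [('every', 1), ('happy', 1), ('family', 1)]
print(reduce_fn("family", [1, 1]))            # ('family', 2)
```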

Inverted Indexing Example
– Word -> list of web pages containing the word
– Input: web pages; Output: word -> URLs (e.g., every -> …, its -> …)

Map
– Interface
  – Input: a <key, value> pair => <URL, page content>
  – Output: list of intermediate <key, value> pairs => list of <word, URL>
– Map Input (example):
  – <key = URL of page 1, value = “every happy family is alike.”>
  – <key = URL of page 2, value = “every unhappy family is unhappy in its own way.”>
– Map Output: list of <word, URL>

Shuffle
– MapReduce system
  – Collects outputs from all map executions
  – Groups all intermediate values by the same key
– Map Output: list of <word, URL>
– Reduce Input: every -> (URLs), happy -> (URLs), unhappy -> (URLs), family -> (URLs)

Reduce
– Interface
  – Input: <word, list of URLs>
  – Output: <word, string of URLs>
– Reduce Input: every -> (URLs), happy -> (URLs), unhappy -> (URLs), family -> (URLs)
– Reduce Output: <every, “…”>, <happy, “…”>, <unhappy, “…”>, <family, “…”>
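Putting the map and reduce of this inverted-indexing example together (a Python sketch; the two page texts come from the slides, the URLs are made-up placeholders):

```python
def map_fn(url, page_text):
    # Emit one (word, url) pair per distinct word on the page.
    return [(word, url) for word in set(page_text.lower().split())]

def reduce_fn(word, urls):
    # After the shuffle, `urls` holds every URL whose page contained `word`.
    return (word, sorted(urls))

pages = {
    "url1": "every happy family is alike.",
    "url2": "every unhappy family is unhappy in its own way.",
}

intermediate = [pair for url, text in pages.items() for pair in map_fn(url, text)]

index = {}                                    # shuffle: group URLs by word
for word, url in intermediate:
    index.setdefault(word, []).append(url)

print(sorted(reduce_fn(word, urls) for word, urls in index.items()))
```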

Execution Overview
– Diagram: Map phase -> Shuffle phase -> Reduce phase

Implementing MapReduce
– Externally, for the user
  – Write a map function and a reduce function
  – Submit a job; wait for the result
  – No need to know anything about the environment (Google: ~4000 servers and their disks, many failures)
– Internally, for the MapReduce system designer (see the sketch below)
  – Run map in parallel
  – Shuffle: combine map results to produce the reduce input
  – Run reduce in parallel
  – Deal with failures
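One toy way to picture the internal side on a single machine: run the map tasks in parallel and re-execute any task that fails (a sketch only; the real system schedules tasks across thousands of machines and handles whole-worker failures, stragglers, and so on):

```python
from concurrent.futures import ThreadPoolExecutor

def run_map_phase(map_fn, splits, workers=4, max_attempts=3):
    # splits: list of (in_key, in_value) input splits.
    def run_task(split):
        in_key, in_value = split
        for attempt in range(max_attempts):
            try:
                return map_fn(in_key, in_value)   # re-execute the task if it throws
            except Exception:
                continue
        raise RuntimeError(f"map task for {in_key!r} failed {max_attempts} times")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        task_outputs = list(pool.map(run_task, splits))   # run map tasks in parallel

    # Flatten each task's list of (out_key, intermediate_value) pairs.
    return [pair for output in task_outputs for pair in output]
```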

Execution Overview
– Diagram: master, map workers (M), reduce workers (R); input files sent to map tasks; intermediate keys partitioned into reduce tasks; output.

Task Assignment
– Diagram: master, input splits, map workers (M), reduce workers (R), output.

Fault-tolerance: re-execution
– Diagram: master, map workers (M), reduce workers (R); tasks are re-executed on failure.

Machines share roles
– So far, logical view of cluster
– In reality
  – Each cluster machine stores data
  – And runs MapReduce workers
– Lots of storage + compute cycles nearby

MapReduce Summary
– Programming paradigm for data-intensive computing
– Simple to program (for programmers)
– Distributed & parallel execution model
– The framework automates many tedious tasks (machine selection, failure handling, etc.)

Hadoop Demo

Beyond MapReduce
– As a programming model
  – Limited: only Map and Reduce
  – Improvements: Pig, Dryad, Hive, Sawzall, Map-Reduce-Merge, etc.
– As a runtime system
  – Better scheduling (e.g., LATE scheduler)
  – Better fault handling (e.g., ISS)
  – Pipelining (e.g., HOP)
  – Etc.

Making Cloud Intermediate Data Fault-Tolerant
Steve Ko* (Princeton University), Imranul Hoque (UIUC), Brian Cho (UIUC), and Indranil Gupta (UIUC)
* Work done at UIUC

Our Position
– Intermediate data as a first-class citizen for dataflow programming frameworks in clouds
  – Dataflow programming frameworks
  – The importance of intermediate data
  – ISS (Intermediate Storage System)
    – Not to be confused with: International Space Station, IBM Internet Security Systems

Dataflow Programming Frameworks
– Runtime systems that execute dataflow programs
  – MapReduce (Hadoop), Pig, Hive, etc.
  – Gaining popularity for massive-scale data processing
  – Distributed and parallel execution on clusters
– A dataflow program consists of
  – Multi-stage computation
  – Communication patterns between stages

Example 1: MapReduce
– Two-stage computation with all-to-all communication
  – Google introduced it, Yahoo! open-sourced it (Hadoop)
  – Two functions, Map and Reduce, supplied by a programmer
  – Massively parallel execution of Map and Reduce
– Diagram: Stage 1: Map -> Shuffle (all-to-all) -> Stage 2: Reduce

Example 2: Pig and Hive
– Multi-stage, with either all-to-all or 1-to-1 communication between stages
– Diagram: Stage 1: Map -> Shuffle (all-to-all) -> Stage 2: Reduce -> 1-to-1 comm. -> Stage 3: Map -> Shuffle (all-to-all) -> Stage 4: Reduce
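As a rough illustration of such a multi-stage dataflow, one can chain the toy stage logic from the earlier sketch, with each stage's reduce output feeding the next stage's map (names are mine, not Pig's or Hive's API):

```python
from collections import defaultdict

def run_stage(map_fn, reduce_fn, records):
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, inter_value in map_fn(in_key, in_value):
            groups[out_key].append(inter_value)               # shuffle (all-to-all)
    return [reduce_fn(k, vs) for k, vs in groups.items()]     # intermediate data between stages

def run_pipeline(stages, records):
    # stages: list of (map_fn, reduce_fn) pairs; stage i+1 consumes stage i's output (1-to-1).
    for map_fn, reduce_fn in stages:
        records = run_stage(map_fn, reduce_fn, records)
    return records
```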

Usage
– Google (MapReduce)
  – Indexing: a chain of 24 MapReduce jobs
  – ~200K jobs processing 50 PB/month (in 2006)
– Yahoo! (Hadoop + Pig)
  – WebMap: a chain of 100 MapReduce jobs
– Facebook (Hadoop + Hive)
  – ~300 TB total, adding 2 TB/day (in 2008)
  – 3K jobs processing 55 TB/day
– Amazon
  – Elastic MapReduce service (pay-as-you-go)
– Academic clouds
  – Google-IBM Cluster at UW (Hadoop service)
  – CCT at UIUC (Hadoop & Pig service)

One Common Characteristic
– Intermediate data
  – Intermediate data? Data between stages
– Similarities to traditional intermediate data [Bak91, Vog99]
  – E.g., .o files
  – Critical to produce the final output
  – Short-lived, written-once and read-once, and used immediately
  – Computational barrier

One Common Characteristic
– Computational barrier
– Diagram: Stage 1: Map -> (computational barrier) -> Stage 2: Reduce

Why Important?
– Large-scale: possibly a very large amount of intermediate data
– Barrier: loss of intermediate data => the task can’t proceed

Failure Stats
– 5 average worker deaths per MapReduce job (Google in 2006)
– One disk failure in every run of a 6-hour MapReduce job with 4000 machines (Google in 2008)
– 50 machine failures out of a 20K-machine cluster (Yahoo! in 2009)

Hadoop Failure Injection Experiment
– Emulab setting
  – 20 machines sorting 36 GB
  – 4 LANs and a core switch (all 100 Mbps)
– 1 failure after Map
  – Re-execution of Map-Shuffle-Reduce
  – ~33% increase in completion time
– Diagram: Map -> Shuffle -> Reduce, then Map -> Shuffle -> Reduce again after the failure

Re-Generation for Multi-Stage
– Cascaded re-execution: expensive
– Diagram: Stage 1: Map -> Stage 2: Reduce -> Stage 3: Map -> Stage 4: Reduce

Importance of Intermediate Data
– Why?
  – (Potentially) a lot of data
  – When lost, very costly
– Current systems handle it themselves
  – Re-generate when lost: can lead to expensive cascaded re-execution
– We believe that the storage layer can provide a better solution than the dataflow programming frameworks

Our Position
– Intermediate data as a first-class citizen for dataflow programming frameworks in clouds
  – Dataflow programming frameworks
  – The importance of intermediate data
  – ISS (Intermediate Storage System)
    – Why storage?
    – Challenges
    – Solution hypotheses
    – Hypotheses validation

Why Storage?
– Replication stops cascaded re-execution
– Diagram: Stage 1: Map -> Stage 2: Reduce -> Stage 3: Map -> Stage 4: Reduce

So, Are We Done?
– No!
– Challenge: minimal interference
  – The network is heavily utilized during Shuffle.
  – Replication requires network transmission too, and needs to replicate a large amount of data.
  – Minimizing interference is critical for the overall job completion time.
– HDFS (Hadoop Distributed File System): much interference

Default HDFS Interference
– Replication of Map and Reduce outputs (2 copies in total)

Background Transport Protocols
– TCP-Nice [Ven02] & TCP-LP [Kuz06]
  – Support background & foreground flows
– Pros
  – Background flows do not interfere with foreground flows (functionality)
– Cons
  – Designed for the wide-area Internet
  – Application-agnostic
  – Not a comprehensive solution: not designed for data center replication
– We can do better!

Our Position
– Intermediate data as a first-class citizen for dataflow programming frameworks in clouds
  – Dataflow programming frameworks
  – The importance of intermediate data
  – ISS (Intermediate Storage System)
    – Why storage?
    – Challenges
    – Solution hypotheses
    – Hypotheses validation

Three Hypotheses
1. Asynchronous replication can help.
   – HDFS replication works synchronously.
2. The replication process can exploit the inherent bandwidth heterogeneity of data centers (next).
3. Data selection can help (later).

Bandwidth Heterogeneity
– Data center topology: hierarchical
  – Top-of-the-rack switches (under-utilized)
  – Shared core switch (fully utilized)

Data Selection
– Only replicate locally-consumed data
– Diagram: Stage 1: Map -> Stage 2: Reduce -> Stage 3: Map -> Stage 4: Reduce

Three Hypotheses
1. Asynchronous replication can help.
2. The replication process can exploit the inherent bandwidth heterogeneity of data centers.
3. Data selection can help.
The question is not if, but how much. If effective, these become techniques.

Experimental Setting
– Emulab with 80 machines
  – 4 x 1 LAN with 20 machines each
  – 4 x 100 Mbps top-of-the-rack switches
  – 1 x 1 Gbps core switch
  – Various configurations give similar results.
– Input data: 2 GB/machine, randomly generated
– Workload: sort
– 5 runs
  – Std. dev. ~100 sec.: small compared to the overall completion time
– 2 replicas of Map outputs in total

Asynchronous Replication
– Modification for asynchronous replication
  – With an increasing level of interference
– Four levels of interference
  – Hadoop: original, no replication, no interference
  – Read: disk read, no network transfer, no actual replication
  – Read-Send: disk read & network send, no actual replication
  – Rep.: full replication

Asynchronous Replication
– Network utilization makes the difference
– Both Map & Shuffle get affected
  – Some Maps need to read remotely

Three Hypotheses (Validation)
1. Asynchronous replication can help, but still can’t eliminate the interference.
2. The replication process can exploit the inherent bandwidth heterogeneity of data centers.
3. Data selection can help.

Rack-Level Replication
– Rack-level replication is effective.
  – Only 20~30 rack failures per year, mostly planned (Google 2008)

Three Hypotheses (Validation)
1. Asynchronous replication can help, but still can’t eliminate the interference.
2. Rack-level replication can reduce the interference significantly.
3. Data selection can help.

Locally-Consumed Data Replication
– It significantly reduces the amount of replication.

Three Hypotheses (Validation)
1. Asynchronous replication can help, but still can’t eliminate the interference.
2. Rack-level replication can reduce the interference significantly.
3. Data selection can reduce the interference significantly.

ISS Design Overview
– Implements asynchronous, rack-level, selective replication (all three hypotheses)
– Replaces the Shuffle phase
  – MapReduce does not implement Shuffle.
  – Map tasks write intermediate data to ISS, and Reduce tasks read intermediate data from ISS.
– Extends HDFS (next)

ISS Design Overview
– Extends HDFS
  – iss_create()
  – iss_open()
  – iss_write()
  – iss_read()
  – iss_close()
– Map tasks
  – iss_create() => iss_write() => iss_close()
– Reduce tasks
  – iss_open() => iss_read() => iss_close()
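A toy, in-memory stand-in for those calls, just to spell out the Map-side and Reduce-side sequences named on the slide (the iss_* names come from the slide; the signatures and behavior here are my assumptions, not the actual HDFS extension):

```python
_store = {}   # a dict plays the role of ISS storage in this sketch

def iss_create(block_id):
    _store[block_id] = []
    return block_id

def iss_open(block_id):
    return block_id

def iss_write(handle, data):
    _store[handle].append(data)

def iss_read(handle):
    return b"".join(_store[handle])

def iss_close(handle):
    pass   # a real implementation would flush and kick off asynchronous replication here

def map_side(block_id, intermediate_bytes):
    handle = iss_create(block_id)          # Map task: iss_create() => iss_write() => iss_close()
    iss_write(handle, intermediate_bytes)
    iss_close(handle)

def reduce_side(block_id):
    handle = iss_open(block_id)            # Reduce task: iss_open() => iss_read() => iss_close()
    data = iss_read(handle)
    iss_close(handle)
    return data
```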

Performance under Failure
– 5 scenarios
  – Hadoop (no rep) with one permanent machine failure
  – Hadoop (reduce rep=2) with one permanent machine failure
  – ISS (map & reduce rep=2) with one permanent machine failure
  – Hadoop (no rep) with one transient failure
  – ISS (map & reduce rep=2) with one transient failure

Summary of Results
– Comparison to no-failure Hadoop
  – One failure, ISS: 18% increase in completion time
  – One failure, Hadoop: 59% increase
– One-failure Hadoop vs. one-failure ISS
  – 45% speedup

Performance under Failure: Hadoop (rep=1) with one machine failure

Performance under Failure: Hadoop (rep=2) with one machine failure

Performance under Failure: ISS with one machine failure

Performance under Failure: Hadoop (rep=1) with one transient failure

Performance under Failure: ISS-Hadoop with one transient failure

Replication Completion Time
– Replication completes before Reduce
  – ‘+’ indicates replication time for each block

Summary
– Our position
  – Intermediate data as a first-class citizen for dataflow programming frameworks in clouds
– Problem: cascaded re-execution
– Requirements
  – Intermediate data availability (scale and dynamism)
  – Interference minimization (efficiency)
– Asynchronous replication can help, but still can’t eliminate the interference.
– Rack-level replication can reduce the interference significantly.
– Data selection can reduce the interference significantly.
– Hadoop and Hadoop + ISS show comparable completion times.

References
[Vog99] W. Vogels. File System Usage in Windows NT 4.0. In SOSP, 1999.
[Bak91] M. G. Baker, J. H. Hartman, M. D. Kupfer, K. W. Shirriff, and J. K. Ousterhout. Measurements of a Distributed File System. SIGOPS OSR, 25(5), 1991.
[Ven02] A. Venkataramani, R. Kokku, and M. Dahlin. TCP Nice: A Mechanism for Background Transfers. In OSDI, 2002.
[Kuz06] A. Kuzmanovic and E. W. Knightly. TCP-LP: Low-Priority Service via End-Point Congestion Control. IEEE/ACM TON, 14(4):739–752, 2006.