UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University of California at Berkeley

Motivation
1. MapReduce becoming popular
– Open-source implementation, Hadoop, used by Yahoo!, Facebook, Last.fm, …
– Scale: 20 PB/day at Google, O(10,000) nodes at Yahoo!, 3,000 jobs/day at Facebook

Motivation
2. Utility computing services like Amazon Elastic Compute Cloud (EC2) provide cheap on-demand computing
– Price: 10 cents / VM / hour
– Scale: thousands of VMs
– Caveat: less control over performance

Results
– Main challenge for Hadoop on EC2 was performance heterogeneity, which breaks the task scheduler's assumptions
– Designed a new LATE scheduler that can cut response time in half

Outline
1. MapReduce background
2. The problem of heterogeneity
3. LATE: a heterogeneity-aware scheduler

What is MapReduce?
– Programming model to split computations into independent parallel tasks
– Hides the complexity of fault tolerance
  – At 10,000s of nodes, some will fail every day

J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
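
As a concrete illustration of the programming model (a minimal pure-Python sketch, not Hadoop's actual API; the function names are made up), word count can be expressed as a map function plus a reduce function, with the framework handling grouping and parallel execution:

    # Framework-free sketch of the MapReduce programming model (illustrative only).
    from collections import defaultdict

    def map_fn(document):                      # map: input record -> list of (key, value)
        return [(word, 1) for word in document.split()]

    def reduce_fn(word, counts):               # reduce: key + all its values -> result
        return word, sum(counts)

    def run_mapreduce(documents):
        groups = defaultdict(list)
        for doc in documents:                  # "map" phase: independent, parallelizable tasks
            for key, value in map_fn(doc):
                groups[key].append(value)      # shuffle: group intermediate values by key
        return dict(reduce_fn(k, v) for k, v in groups.items())  # "reduce" phase

    print(run_mapreduce(["the quick brown fox", "the lazy dog"]))
    # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}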

Fault Tolerance in MapReduce
1. Nodes fail → re-run tasks
2. Nodes slow (stragglers) → run backup tasks
How to do this in a heterogeneous environment?
[Figure: animation of tasks being re-run and backed up on Node 1 and Node 2]

Heterogeneity in Virtualized Environments
– VM technology isolates CPU and memory, but disk and network are shared
  – Full bandwidth when there is no contention
  – Equal shares when there is contention
[Chart: measured VM performance – 2.5x performance difference]

Backup Tasks in Hadoop's Default Scheduler
– Start primary tasks, then look for backups to launch as nodes become free
– Tasks report a "progress score" from 0 to 1
– Launch a backup if progress < avgProgress – 0.2
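
A minimal sketch of the threshold rule above (illustrative Python, not Hadoop's actual scheduler code; the function and constant names are made up):

    # Speculate on any task whose progress score trails the average by more than 0.2.
    SPECULATIVE_GAP = 0.2

    def pick_backup_candidates(progress_scores):
        """progress_scores: dict of task_id -> progress score in [0, 1]."""
        avg_progress = sum(progress_scores.values()) / len(progress_scores)
        return [task for task, score in progress_scores.items()
                if score < avg_progress - SPECULATIVE_GAP]

    # Example: with tasks at 0.9, 0.85, and 0.3, only the 0.3 task gets a backup.
    print(pick_backup_candidates({"t1": 0.9, "t2": 0.85, "t3": 0.3}))  # ['t3']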

Problems in a Heterogeneous Environment
1. Too many backups, thrashing shared resources like network bandwidth
2. Wrong tasks backed up
3. Backups may be placed on slow nodes
4. Breaks when tasks start at different times
Example: ~80% of reduces backed up, most losing to originals; network thrashed

Idea: Progress Rates
– Instead of using progress values, compute progress rates, and back up tasks that are "far enough" below the mean
– Problem: can still select the wrong tasks
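
A rough sketch of this idea (illustrative only; the "far enough below the mean" cutoff here is a made-up slack parameter, not a value from the talk):

    def progress_rate(progress_score, elapsed_minutes):
        return progress_score / elapsed_minutes

    def slow_by_rate(tasks, slack=0.8):
        """tasks: dict of task_id -> (progress_score, elapsed_minutes).
        Flags tasks whose progress rate is 'far enough' below the mean rate;
        slack is an assumed knob, not a value from the talk."""
        rates = {t: progress_rate(p, e) for t, (p, e) in tasks.items()}
        mean_rate = sum(rates.values()) / len(rates)
        return [t for t, r in rates.items() if r < slack * mean_rate]

    # At minute 2 of the example on the next two slides, the rate heuristic flags
    # Node 2's task (rate 1/3) rather than Node 3's (rate 1/1.9 ~= 0.53), even
    # though Node 3's task will actually finish later -- the "wrong task" problem.
    print(slow_by_rate({"node2-task": (0.66, 2.0), "node3-task": (0.053, 0.1)}))
    # ['node2-task']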

Progress Rate Example
[Figure: task timeline in minutes across three nodes – Node 1 runs at 1 task/min, Node 2 is 3x slower, Node 3 is 1.9x slower; marks at 1 min and 2 min]

Progress Rate Example
What if the job had 5 tasks? At the 2-minute mark, Node 2's task has 1 min left, while Node 3's task has 1.8 min left.
Node 2 is slowest by progress rate, but the scheduler should back up Node 3's task, because that task will finish last!

Our Scheduler: LATE
Insight: back up the task with the largest estimated finish time
– "Longest Approximate Time to End"
– Look forward instead of looking backward
Sanity thresholds:
– Cap the number of backup tasks
– Launch backups on fast nodes
– Only back up tasks that are sufficiently slow

LATE Details
Estimating finish times:
– progress rate = progress score / execution time
– estimated time left = (1 - progress score) / progress rate
Threshold values:
– 10% cap on backups; 25th percentiles for slow node/task
– Validated by sensitivity analysis
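
Putting the two formulas together, a minimal sketch of the LATE selection rule might look like this (illustrative Python, not the actual Hadoop patch; the constant and function names are made up, and the cap/percentile checks from the slide are noted but not enforced):

    BACKUP_CAP_FRACTION = 0.10   # slide: 10% cap on backups
    SLOW_PERCENTILE = 25         # slide: 25th-percentile thresholds for slow node / slow task

    def estimated_time_left(progress_score, execution_time):
        if progress_score <= 0:
            return float("inf")                       # no progress yet: assume the worst
        progress_rate = progress_score / execution_time
        return (1.0 - progress_score) / progress_rate

    def pick_late_backup(tasks):
        """tasks: dict of task_id -> (progress_score, execution_time in minutes).
        Returns the task with the Longest Approximate Time to End.
        (Enforcement of the backup cap and the slow-node/slow-task percentile
        checks is omitted from this sketch for brevity.)"""
        return max(tasks, key=lambda t: estimated_time_left(*tasks[t]))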

LATE Example
At the 2-minute mark:
– Node 2's task: progress = 66%, estimated time left = (1 - 0.66) / (1/3) = 1 min
– Node 3's task: progress = 5.3%, estimated time left = (1 - 0.05) / (1/1.9) = 1.8 min
LATE correctly picks Node 3's task to back up.
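
Feeding this example's numbers into the pick_late_backup sketch from the previous slide's notes reproduces the arithmetic (the 0.1 min elapsed time for Node 3's task is inferred from its 5.3% progress at a 1/1.9 rate):

    # Uses pick_late_backup() from the sketch above.
    tasks = {
        "node2-task": (0.66, 2.0),   # 66% done after 2 min    -> ~1.0 min left
        "node3-task": (0.053, 0.1),  # 5.3% done after 0.1 min -> ~1.8 min left
    }
    print(pick_late_backup(tasks))   # 'node3-task'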

Evaluation
Environments:
– EC2 (3 job types, nodes)
– Small local testbed
How heterogeneity and stragglers were created:
– Self-contention through VM placement
– Stragglers through background processes

EC2 Sort with Stragglers
– Average 58% speedup over native, 220% over no backups
– 93% maximum speedup over native

EC2 Sort without Stragglers
– Average 27% speedup over native, 31% over no backups

Conclusion
– Heterogeneity is a challenge for parallel apps, and is growing more important
– Lessons:
  – Back up the tasks that hurt response time most
  – Mind shared resources
– 2x improvement using a simple algorithm

Questions?

EC2 Grep and WordCount
Grep:
– 36% gain over native
– 57% gain over no backups
WordCount:
– 8.5% gain over native
– 179% gain over no backups