Aggressive Cloning of Jobs for Effective Straggler Mitigation
Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, Ion Stoica

Small jobs are increasingly important
- Most jobs are small: 82% of jobs contain fewer than 10 tasks (Facebook's Hadoop cluster)
- Most small jobs are interactive and latency-constrained, e.g., a data analyst testing a query on a small sample
- Small jobs are particularly sensitive to stragglers

Straggler Mitigation
- Blacklisting: clusters periodically diagnose and eliminate machines with faulty hardware
- Speculation: targets non-deterministic stragglers
  - Complete systemic modeling is intrinsically complex [e.g., Dean '12 at Google]
  - LATE [OSDI '08], Mantri [OSDI '10], ...

Despite these mitigation techniques...
- LATE: the slowest task runs 8x slower than the median task
- Mantri: the slowest task runs 6x slower than the median task
(...but they work well for large jobs)

State-of-the-Art Straggler Mitigation
Speculative execution (in LATE, Mantri, MapReduce):
1. Wait: observe relative progress rates of tasks
2. Speculate: launch copies of tasks that are predicted to be stragglers
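The wait-and-speculate loop can be sketched as follows. This is an illustrative simplification, not LATE's or Mantri's exact heuristic; the progress-rate estimator and the 1.5x slowness threshold are assumptions:

```python
def pick_speculation_candidates(progress, elapsed, slow_factor=1.5):
    """Wait: observe each task's progress (a fraction in 0..1) after
    `elapsed` seconds. Speculate: return tasks whose estimated remaining
    time is far worse than the median, i.e., predicted stragglers."""
    rates = {t: p / elapsed for t, p in progress.items()}
    remaining = {t: (1 - progress[t]) / r if r > 0 else float("inf")
                 for t, r in rates.items()}
    median = sorted(remaining.values())[len(remaining) // 2]
    return [t for t, rem in remaining.items() if rem > slow_factor * median]
```

A scheduler would call this periodically and launch a duplicate copy of each returned task.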

Why doesn't this work for small jobs?
1. Small jobs consist of just a few tasks
   - Statistically hard to predict stragglers
   - Need to wait longer to accurately predict stragglers
2. Small jobs run all their tasks simultaneously
   - Waiting can constitute a considerable fraction of a small job's duration
Wait-and-speculate is ill-suited to address stragglers in small jobs.

Cloning Jobs
- Proactively launch clones of a job as soon as it is submitted
- Pick the result from the earliest clone
- Probabilistically mitigates stragglers
- Eschews waiting, speculation, causal analysis...
Is this really feasible?
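The "pick the earliest clone" idea can be sketched in a few lines; here threads stand in for cluster slots, and `task_fn` is a hypothetical placeholder for a clone's work:

```python
import concurrent.futures as cf

def run_with_clones(task_fn, num_clones=3):
    """Launch `num_clones` identical copies of a task and return the
    result of whichever finishes first. Slower clones are cancelled on
    a best-effort basis; because clones are identical, any result is
    acceptable."""
    with cf.ThreadPoolExecutor(max_workers=num_clones) as pool:
        futures = [pool.submit(task_fn) for _ in range(num_clones)]
        done, not_done = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # best effort: already-running clones still finish
        return next(iter(done)).result()
```

Usage: `run_with_clones(lambda: run_query(sample))`, where `run_query` is whatever work the job performs.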

Low Cluster Utilization
- Clusters have median utilization of under 20%: provisioned for (short bursts of) peak utilization
- Cluster energy-efficiency proposals are not adopted in today's clusters
  - Peak utilization decides half the energy bill
  - Hardware and software reliability issues...

Power law in resource usage (exponent = 1.9)
- 90% of jobs use 6% of resources (FB, Bing, Yahoo!)
- Small jobs can therefore be cloned with few extra resources
Tragedy of the commons? If every job exploits the lowly utilized cluster...
- Instability and negative performance effects

Strawman: job-level cloning (pick the earliest job clone to finish)
- Easy to implement
- Directly extends to any framework
[Diagram: whole-job clones, each with map tasks M1, M2 and reduce task R1; the earliest clone's output is used]

Contention for input data by map task clones
- Storage crunch: input replication cannot be increased, so going beyond ~3 clones causes contention
[Figure: contention for input data grows with the number of map clones]

Task-Level Cloning
[Diagram: each task (M1, M2, R1) of a job is cloned individually, and the earliest clone of each task is picked]

[Figure: probability of straggler-free completion vs. number of clones, for the strawman (job-level) and task-level cloning; with task-level cloning, ≤3 clones suffices]
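The gap between the strawman and task-level cloning follows from simple probability. If each task straggles independently with probability p, a job-level clone helps only when an entire copy of the job avoids stragglers, whereas task-level cloning only needs one straggler-free copy per task. A back-of-the-envelope model (an illustration, not the paper's exact analysis):

```python
def job_level_success(p, n_tasks, c):
    """P(at least one of c whole-job clones has no straggler at all)."""
    clone_ok = (1 - p) ** n_tasks          # one entire clone straggler-free
    return 1 - (1 - clone_ok) ** c

def task_level_success(p, n_tasks, c):
    """P(every task has at least one straggler-free copy among its c clones)."""
    task_ok = 1 - p ** c                   # one task covered by some clone
    return task_ok ** n_tasks
```

With p = 5%, a 10-task job, and 3 clones, task-level cloning is far more likely to finish straggler-free than job-level cloning, which is why a small number of clones suffices only in the task-level scheme.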

Dolly: Cloning Jobs
- Task-level cloning of jobs
- Works within a budget: a cap on the extra cluster resources used for cloning
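The budget cap can be enforced with a simple admission check; this sketch is an assumption about how such a check might look (slot counting and the fallback order are illustrative, not Dolly's actual policy):

```python
def clones_for_job(num_tasks, cluster_slots, clone_slots_in_use,
                   budget_frac=0.05, max_clones=3):
    """Admit clones only while the total extra slots stay within the
    cloning budget (5% of the cluster by default); otherwise fall back
    to fewer clones, down to running a single uncloned copy."""
    budget = budget_frac * cluster_slots
    for c in range(max_clones, 1, -1):
        extra = (c - 1) * num_tasks  # slots beyond the original job
        if clone_slots_in_use + extra <= budget:
            return c
    return 1  # no spare budget: run a single copy
```

Because job sizes follow a power law, most jobs are small enough that this check admits 3 clones for them while large jobs run uncloned.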

Evaluation
- Workload derived from Facebook traces: 3500-node Hadoop cluster, 375K jobs, 1 month
- Trace-driven simulator
- Baselines: LATE and Mantri, plus blacklisting
- Cloning budget of 5%

Baseline: LATE
Small jobs benefit significantly: average completion time improves by 44%.

Baseline: Mantri
Small jobs benefit significantly: average completion time improves by 42%.

Intermediate Data Contention
- We would like every reduce clone to get its own copy of intermediate data (map output)
- Intermediate data is not replicated, to avoid overheads
- What if a map clone straggles?

[Diagram: map clones M1, M2 feed reduce clone R1; one map clone straggles]
Should a reduce clone wait for an exclusive copy of the map output, or contend for the copy that is already available?
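The wait-or-contend choice can be framed as a cost comparison. The sketch below is an illustrative model only: the cost estimates are assumed inputs, and this is not Dolly's actual intermediate-data heuristic:

```python
def reduce_clone_action(exclusive_ready, expected_wait, contention_penalty):
    """A reduce clone prefers its own map clone's output. If that copy
    is not ready, it waits only when the expected wait is cheaper than
    the slowdown from sharing the one finished copy with other reduce
    clones; otherwise it contends for the available copy."""
    if exclusive_ready:
        return "read exclusive copy"
    return "wait" if expected_wait < contention_penalty else "contend"
```

The interesting case is a straggling map clone: a short expected wait favors waiting for the exclusive copy, while a long one favors contending despite the shared-read slowdown.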

Conclusion
- Stragglers in small jobs are not well handled by traditional mitigation strategies: guessing which task to speculate is very hard, and waiting wastes a significant fraction of a small job's runtime
- Dolly: proactive cloning of jobs
  - The power-law workload means a small cloning budget (5%) suffices
  - Jobs improve by at least 42% w.r.t. state-of-the-art straggler mitigation strategies
- Low utilization + power law + cloning?