Effective Straggler Mitigation: Attack of the Clones
Ganesh Ananthanarayanan, Ali Ghodsi, Srikanth Kandula, Scott Shenker, Ion Stoica

Small jobs increasingly important
Most jobs are small
– 82% of jobs contain fewer than 10 tasks (Facebook’s Hadoop cluster)
Small jobs are often interactive and latency-constrained
– Data analysts testing queries on small samples
– New frameworks targeted at interactive analyses

Stragglers in Small Jobs
Small jobs are particularly sensitive to stragglers
– Inordinately slow tasks that delay job completion
Straggler mitigation today:
– Blacklisting: clusters periodically diagnose and eliminate machines with faulty hardware
– Speculation: LATE [OSDI’08], Mantri [OSDI’10]… address the non-deterministic stragglers that remain; completely modeling their systemic causes is intrinsically complex

Despite the mitigation techniques…
– LATE: the slowest task runs 8 times slower* than the median task
– Mantri: the slowest task runs 6 times slower* than the median task
(…but they work well for large jobs)
* progress rate of a task = input size / duration

State-of-the-art Straggler Mitigation
Speculative execution:
1. Wait: observe the relative progress rates of tasks
2. Speculate: launch copies of tasks that are predicted to be stragglers
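To make the pattern concrete, here is a minimal sketch of a wait-and-speculate loop. It is illustrative only: the progress-rate estimator and the 2x-of-median threshold are assumptions for exposition, not the exact policy of LATE or Mantri.

```python
# Minimal wait-and-speculate sketch (illustrative; not LATE's or Mantri's
# exact policy). Each running task reports bytes processed and elapsed time;
# tasks whose estimated time-to-end far exceeds the median are speculated.
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    bytes_total: int      # input size of the task
    bytes_done: int       # bytes processed so far
    elapsed_secs: float   # time since the task started

def time_to_end(t: Task) -> float:
    """Estimate remaining time by assuming the observed rate persists."""
    rate = max(t.bytes_done, 1) / max(t.elapsed_secs, 1e-9)  # bytes/sec
    return (t.bytes_total - t.bytes_done) / rate

def speculation_candidates(tasks: list[Task], threshold: float = 2.0) -> list[Task]:
    """Pick tasks predicted to finish much later than the median task."""
    estimates = sorted(time_to_end(t) for t in tasks)
    median = estimates[len(estimates) // 2]
    return [t for t in tasks if time_to_end(t) > threshold * median]
```

The key point for what follows: both steps need observations, so the scheduler must wait before it can act.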

Why doesn’t this work for small jobs?
1. Small jobs consist of just a few tasks
– Statistically hard to predict stragglers
– Need to wait longer to accurately predict stragglers
2. Small jobs run all their tasks simultaneously
– Waiting can constitute a considerable fraction of a small job’s duration
Wait & speculate is ill-suited to addressing stragglers in small jobs

Cloning Jobs
Proactively launch clones of a job as soon as it is submitted
Pick the result from the earliest clone
– Probabilistically mitigates stragglers
– Eschews waiting, speculation, causal analysis…
Is this really feasible??

Heavy-tailed Distribution
90% of jobs use only 6% of cluster resources
→ Can clone small jobs with few extra resources
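A quick back-of-the-envelope check of the claim. The 90%/6% split is from the trace above; the three-way clone count is an illustrative assumption:

```python
# If the smallest 90% of jobs use 6% of resources, cloning each of them
# three ways (two extra copies) adds at most 12% load; in practice Dolly's
# evaluation caps the cloning budget at 5% (see the setup slide below).
small_jobs_share = 0.06         # resource share of the smallest 90% of jobs
extra_copies = 2                # 3 clones total = 2 extra copies (assumed)
print(f"Worst-case extra load: {small_jobs_share * extra_copies:.0%}")  # 12%
```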

Challenge: Avoid I/O Contention
Every clone should get its own copy of data
Input data of jobs:
– Replicated three times (typically)
– Storage crunch: cannot increase replication
Intermediate data of jobs:
– Not replicated at all, to avoid overheads

Strawman: Job-level Cloning
– Easy to implement
– Directly extends to any framework
[Diagram: the whole job is cloned; each clone runs maps M1, M2 and reduce R1, and the earliest finishing clone’s result is used]

Job-level cloning needs far too many clones
[Plot: straggler probability vs. number of clones for a map-only job — driving stragglers out at the job level requires >> 3 clones]
– Map task clones then contend for input data (replicated only three times)
– Storage crunch → cannot increase replication

Task-level Cloning
[Diagram: each task of the job (M1, M2, R1) is cloned individually; the earliest clone of each task feeds downstream tasks]

[Plot: straggler probability vs. number of clones, strawman (job-level) vs. task-level cloning — with task-level cloning, ≤3 clones suffice]
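The gap between the two curves can be checked with a small probability model. The per-task straggler probability p and job size n below are illustrative assumptions, not numbers from the talk:

```python
# Probability that a job still suffers a straggler under each scheme,
# assuming tasks straggle independently with probability p.

def job_level(p: float, n: int, c: int) -> float:
    """c whole-job clones: the job stalls unless SOME clone has no straggler."""
    clone_clean = (1 - p) ** n        # one clone finishes straggler-free
    return (1 - clone_clean) ** c     # every clone hit at least one straggler

def task_level(p: float, n: int, c: int) -> float:
    """c clones per task: a task stalls only if ALL of its clones straggle."""
    return 1 - (1 - p ** c) ** n      # some task had all clones straggle

p, n = 0.05, 10    # assumed straggler probability and tasks per job
for c in (1, 2, 3, 7):
    print(f"c={c}: job-level {job_level(p, n, c):.4f}, "
          f"task-level {task_level(p, n, c):.4f}")
# Task-level cloning reaches ~0.1% with c=3; job-level cloning needs
# roughly 7 clones of the whole job to get anywhere near that.
```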

Intermediate Data Contention
We would like every reduce clone to get its own copy of intermediate data (map output)
– When a map clone does not straggle, use its output
– But what happens when map clones do straggle?

Contention-Avoidance Cloning (CAC)
– Every reduce clone reads an exclusive copy of the map output, even if it must wait for its own map clone
[Diagram: map clones M1, M2 feed reduce clones R1; each reduce reads its exclusive copy]
– Drawback: jobs are more vulnerable to stragglers

Contention Cloning (CC)
– Every reduce clone reads the earliest available copy of the map output, contending with the other clones
[Diagram: map clones M1, M2 feed reduce clones R1; all reduces read the earliest copy]
– Drawback: intermediate data transfer takes longer

CAC vs. CC
CAC avoids contention but makes jobs more vulnerable to stragglers
– Straggler probability in a job increases by >10%
CC mitigates stragglers but causes contention
– Shuffle takes ~50% longer
Neither distinguishes intrinsic variation in task durations from genuine stragglers

Delay Assignment
Wait for a small delay before contending for the available copy of the intermediate data
– (Similar to delay scheduling [EuroSys’10])
The delay is set by probabilistic modeling of:
– Expected task durations
– Read bandwidths with and without contention
Modeling happens automatically and periodically
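A simplified sketch of the decision a reduce clone faces. The two-term cost model and the function signature are illustrative assumptions, not Dolly’s exact formulation:

```python
# Should a reduce clone wait for its own (exclusive) copy of the map output,
# or contend for the copy that already exists? Illustrative cost model only.

def should_wait(expected_map_remaining_secs: float,
                output_bytes: float,
                bw_exclusive: float,           # read bandwidth, no contention (B/s)
                bw_contended: float) -> bool:  # bandwidth when sharing the copy
    cost_wait = expected_map_remaining_secs + output_bytes / bw_exclusive
    cost_contend = output_bytes / bw_contended
    return cost_wait < cost_contend

# The inputs (expected remaining durations, bandwidths with and without
# contention) would be learned from observed tasks and refreshed periodically.
```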

Dolly: Cloning Jobs
– Task-level cloning of jobs
– Delay assignment to manage intermediate data
– Works within a budget: a cap on the extra cluster resources for cloning
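One plausible way to enforce the budget cap is a simple admission check at job submission. This is a sketch under assumptions — the slide does not specify Dolly’s exact admission policy or clone counts:

```python
# Budget-capped cloning sketch (admission policy assumed for illustration).
def clones_for_job(num_tasks: int,
                   used_extra_slots: int,     # slots already spent on clones
                   total_cluster_slots: int,
                   budget_fraction: float = 0.05,
                   max_clones: int = 3) -> int:
    """How many copies of each task to run (1 = no cloning)."""
    budget_slots = budget_fraction * total_cluster_slots
    for c in range(max_clones, 1, -1):         # prefer more clones if affordable
        extra = (c - 1) * num_tasks            # extra slots this job would add
        if used_extra_slots + extra <= budget_slots:
            return c
    return 1   # budget exhausted: run a single copy, as for large jobs
```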

Evaluation Setup
– Workload derived from Facebook traces (FB: 3500-node Hadoop cluster, 375K jobs, 1 month)
– Prototype on top of Hadoop
– Experiments on a 150-node cluster
– Baselines: LATE and Mantri, each augmented with blacklisting
– Cloning budget of 5%

Average job completion time
– Jobs are 44% and 42% faster w.r.t. LATE and Mantri, respectively
– The slowest task in a job now runs only 1.06x slower than the median (down from 8x)

Delay Assignment is crucial…
– 1.5x–2x better than CAC (exclusive copy) and CC (earliest copy)

…and gets better with the number of phases in a job
– Dryad jobs have multiple phases in a single job
– Steady gains; outperforms CAC and CC

Summary
– Stragglers in small jobs are not well handled by traditional mitigation strategies
– Dolly: proactive cloning of jobs
– Heavy tail → a small cloning budget (5%) suffices
– Jobs improve by at least 42% w.r.t. state-of-the-art straggler mitigation strategies