Combating Outliers in map-reduce
Srikanth Kandula, Ganesh Ananthanarayanan, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, Ed Harris


[Figure: log(size of dataset), from GB through TB and PB to EB, plotted against log(size of cluster); regions labeled HPC, parallel (||) databases and map-reduce.]
Map-reduce decouples operations on data (user code) from the mechanisms needed to scale, and it is widely used: Cosmos (based on SVC's Dryad) + Bing, Google, Hadoop inside Yahoo! and on Amazon's cloud (AWS). Example data: the Internet, click logs, bio/genomic data.

An example: how it works
Goal: find frequent search queries to Bing.
What the user says: SELECT Query, COUNT(*) AS Freq FROM QueryTable GROUP BY Query HAVING Freq > X
[Figure: a job manager assigns work to tasks and gets their progress; map tasks read file blocks 0–3 and write their output locally; reduce tasks produce output blocks 0 and 1.]
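The sketch below shows the division of labor in this example. It is a minimal in-process sketch in Python and assumes a toy sequential runner rather than Cosmos or SCOPE. `map_fn` and `make_reduce_fn` stand in for the user's query. The hypothetical `run_job` plays the framework's role: it partitions map output by key and enforces the barrier before the reduce phase. The CRC32 partitioner is an assumption.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

KV = Tuple[str, int]

# --- User code: the only part the query author writes ---
def map_fn(block: List[str]) -> Iterator[KV]:
    """Emit (query, 1) for every search query in one file block."""
    for query in block:
        yield query, 1

def make_reduce_fn(threshold: int) -> Callable[[str, List[int]], Iterator[KV]]:
    """Build a reduce function implementing COUNT(*) ... HAVING Freq > threshold."""
    def reduce_fn(query: str, counts: List[int]) -> Iterator[KV]:
        freq = sum(counts)
        if freq > threshold:
            yield query, freq
    return reduce_fn

# --- Framework code (hypothetical toy runner): partitioning and the barrier ---
def run_job(blocks: List[List[str]],
            map_fn: Callable[[List[str]], Iterator[KV]],
            reduce_fn: Callable[[str, List[int]], Iterator[KV]],
            num_reducers: int) -> List[List[KV]]:
    # Map phase: each file block is handled by one (simulated) map task whose
    # output is split by a stable hash of the key, one partition per reduce task.
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for block in blocks:
        for key, value in map_fn(block):
            partitions[zlib.crc32(key.encode()) % num_reducers][key].append(value)
    # Barrier: reduce tasks run only after every map task has finished.
    # Reduce phase: each reduce task produces one output block.
    return [[kv for key in sorted(part) for kv in reduce_fn(key, part[key])]
            for part in partitions]

blocks = [["weather", "news", "weather"], ["maps", "weather", "news"], ["news"], ["weather"]]
output_blocks = run_job(blocks, map_fn, make_reduce_fn(2), num_reducers=2)
frequent = dict(kv for out in output_blocks for kv in out)
# frequent == {"weather": 4, "news": 3}; "maps" (count 1) is filtered out
```

Only the two user functions change from one query to the next. The runner, which a real system spreads across thousands of machines, stays the same.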

We find that outliers slow down map-reduce jobs.
[Figure: phases of a job with their task counts (Map.Read 22K, Map.Move 15K, Map 13K, Reduce 51K), separated by a barrier and writing to the file system.]
Goals: speeding up jobs improves productivity, and predictability supports SLAs… while using resources efficiently.

This talk…
Identify fundamental causes of outliers
– concurrency leads to contention for resources
– heterogeneity (e.g., disk loss rate)
– map-reduce artifacts
Current schemes duplicate long-running tasks
Mantri: a cause- and resource-aware mitigation scheme
– takes distinct actions based on cause
– considers resource cost of actions
Results from a production deployment

Why bother? Frequency of outliers
stragglers = tasks that take ≥ 1.5 times the median task in that phase
recomputes = tasks that are re-run because their output was lost
The median phase has 10% stragglers and no recomputes; 10% of the stragglers take >10X longer.
[Figure: frequency of outliers, split into stragglers and recomputes.]
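The two outlier definitions translate directly into a classification pass over one phase. `classify_phase` is a hypothetical helper, not taken from Mantri. It assumes that per-task runtimes and the set of re-run tasks are already known from logs, and it applies the slide's 1.5× threshold as "at least".

```python
from statistics import median
from typing import Dict, List, Set, Tuple

def classify_phase(durations: Dict[str, float], rerun: Set[str],
                   factor: float = 1.5) -> Tuple[float, List[str], List[str]]:
    """Label a phase's tasks using the straggler/recompute definitions.

    durations: task id -> runtime of that task in this phase
    rerun:     ids of tasks that were re-run because their output was lost
    """
    med = median(durations.values())
    # Straggler: takes at least `factor` times the phase's median task.
    stragglers = [t for t, d in durations.items() if d >= factor * med]
    # Recompute: re-run because its earlier output was lost.
    recomputes = [t for t in durations if t in rerun]
    return med, stragglers, recomputes

med, stragglers, recomputes = classify_phase(
    {"t1": 10, "t2": 11, "t3": 12, "t4": 30, "t5": 9}, rerun={"t2"})
# med == 11, stragglers == ["t4"], recomputes == ["t2"]
```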

Why bother? Cost of outliers (what-if analysis that replays logs in a trace-driven simulator): at the median, jobs are slowed down by 35% due to outliers.
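The sketch below is a rough illustration of a what-if estimate, far cruder than the trace-driven simulator the slide refers to. It rests on three assumptions: every task in a phase runs concurrently, barriers serialize the phases, and without outliers each straggler would have finished at the phase median. Both function names are hypothetical.

```python
from statistics import median
from typing import List

def phase_time(durations: List[float]) -> float:
    # Assumption: all tasks of a phase run at once, so the phase ends with its slowest task.
    return max(durations)

def whatif_slowdown(phases: List[List[float]], factor: float = 1.5) -> float:
    """Fractional slowdown of a job relative to a what-if run without stragglers."""
    actual = sum(phase_time(p) for p in phases)  # barriers serialize the phases
    ideal = 0.0
    for p in phases:
        med = median(p)
        # What-if: each straggler (>= factor x median) finishes at the median instead.
        ideal += phase_time([med if d >= factor * med else d for d in p])
    return actual / ideal - 1.0

slowdown = whatif_slowdown([[10, 11, 12, 30], [20, 22, 21]])
# actual = 30 + 22, ideal = 12 + 22, so slowdown = 52 / 34 - 1 ≈ 0.53
```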

Why outliers?
Problem: due to unavailable input, tasks have to be recomputed.
Delay due to a recompute readily cascades.
[Figure: map, sort and reduce phases; the delay due to a recompute in one phase propagates to the phases after it.]

Why outliers?
Problem: due to unavailable input, tasks have to be recomputed.
(Simple) idea: replicate intermediate data, and use the copy if the original is unavailable.
Challenges: What data to replicate? Where? What if we still miss data?
Insight: 50% of the recomputes are on 5% of machines.

Why outliers? Cost to recompute vs. cost to replicate
t = predicted runtime of a task
r = predicted probability of recompute at a machine
$t_{rep}$ = cost to copy data over within the rack
[Figure: a task on machine M2 that reads the output of a task on machine M1.]
$t^{redo} = r_2\,(t_2 + t_1^{redo})$
Mantri preferentially acts on the more costly recomputes.
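The recurrence can be unrolled along a chain of dependent tasks. This is a minimal sketch that rests on three assumptions: each task reads only the output of the task before it, the first task's input lives in the replicated file system (so its upstream redo cost is zero), and replication is worthwhile when $t_{rep} < t^{redo}$. Both function names are hypothetical.

```python
from typing import List, Tuple

def expected_redo_costs(chain: List[Tuple[float, float]]) -> List[float]:
    """chain[k] = (t_k, r_k): predicted runtime of task k and predicted probability
    that its output, on its machine, must be recomputed. Task k reads task k-1's output.

    Returns t_redo for every task via t_redo_k = r_k * (t_k + t_redo_{k-1}).
    """
    costs: List[float] = []
    upstream = 0.0  # first task reads replicated input, so nothing upstream to redo
    for t_k, r_k in chain:
        upstream = r_k * (t_k + upstream)
        costs.append(upstream)
    return costs

def should_replicate(t_redo: float, t_rep: float) -> bool:
    """Replicate a task's output when copying it within the rack is cheaper than
    the expected cost of recomputing it (an assumed decision rule)."""
    return t_rep < t_redo

costs = expected_redo_costs([(100.0, 0.5), (40.0, 0.25)])    # [50.0, 22.5]
decisions = [should_replicate(c, t_rep=30.0) for c in costs]  # [True, False]
```

Only the costlier recompute clears the replication bar here, which matches the idea of acting preferentially on expensive recomputes.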

Why outliers?
Problem: tasks reading input over the network experience variable congestion.
[Figure: map output and reduce tasks unevenly spread across racks.]
Uneven placement is typical in production; reduce tasks are placed at the first available slot.

Why outliers?
Problem: tasks reading input over the network experience variable congestion.
Idea: avoid hot-spots; keep the traffic on a link proportional to its bandwidth. If rack $i$ has $d_i$ of the map output and bandwidths $u_i$ and $v_i$ available on its uplink and downlink, place a fraction $a_i$ of the reduce tasks in rack $i$ so that the traffic on each link stays proportional to its bandwidth.
Challenges: Global coordination across jobs? Where is the congestion?
Insights: local control is a good approximation (each job balances its own traffic); link utilizations average out in the long term and are steady in the short term.
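One way to make "traffic proportional to bandwidth" concrete is to choose the fractions $a_i$ that minimize the slowest link's transfer time. This formulation is an assumption. It supposes that each reduce task reads an equal share of every rack's map output. Under that assumption, rack $i$ sends $d_i(1 - a_i)$ up its uplink and receives $(D - d_i)\,a_i$ on its downlink, where $D = \sum_i d_i$. The sketch bisects on the completion time $T$ and then picks feasible fractions. `place_reduces` is a hypothetical name, and all bandwidths must be positive.

```python
from typing import List, Tuple

def _bounds(T: float, d: List[float], u: List[float], v: List[float]):
    """Per-rack interval [lo_i, hi_i] for a_i so that both links finish within T."""
    D = sum(d)
    lo, hi = [], []
    for di, ui, vi in zip(d, u, v):
        l = 1.0 - T * ui / di if di > 0 else 0.0        # uplink:   d_i (1 - a_i) / u_i <= T
        h = T * vi / (D - di) if D - di > 0 else 1.0    # downlink: (D - d_i) a_i / v_i <= T
        lo.append(min(max(l, 0.0), 1.0))
        hi.append(min(max(h, 0.0), 1.0))
    return lo, hi

def _feasible(lo: List[float], hi: List[float]) -> bool:
    # Some choice of a_i within the intervals must sum to exactly 1.
    return all(l <= h for l, h in zip(lo, hi)) and sum(lo) <= 1.0 <= sum(hi)

def place_reduces(d: List[float], u: List[float], v: List[float],
                  iters: int = 100) -> Tuple[List[float], float]:
    D = sum(d)
    # At this T every interval is [0, 1], so it is always feasible.
    t_hi = max(max(di / ui, (D - di) / vi) for di, ui, vi in zip(d, u, v))
    t_lo = 0.0
    for _ in range(iters):  # bisection: feasibility is monotone in T
        mid = (t_lo + t_hi) / 2
        if _feasible(*_bounds(mid, d, u, v)):
            t_hi = mid
        else:
            t_lo = mid
    lo, hi = _bounds(t_hi, d, u, v)
    a, rest = list(lo), 1.0 - sum(lo)
    for i in range(len(a)):  # top up from the lower bounds until the fractions sum to 1
        add = min(hi[i] - lo[i], rest)
        a[i] += add
        rest -= add
    return a, t_hi

a, T = place_reduces(d=[3.0, 1.0], u=[1.0, 1.0], v=[1.0, 1.0])
# a ≈ [0.75, 0.25], T ≈ 0.75
```

In the example, the rack that holds three quarters of the map output receives three quarters of the reduce tasks. That balances its busy uplink (3 × 0.25) against its downlink (1 × 0.75).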

Why outliers?
Persistently slow machines rarely cause outliers: the cluster software (Autopilot) quarantines persistently faulty machines.

Why outliers?
Problem: about 25% of outliers occur because they have more dataToProcess.
In an ideal world, we could divide work evenly…
Solution: ignoring these is better than the state of the art (duplicating)! We schedule tasks in descending order of dataToProcess.
Theorem [due to Graham, 1969]: doing so is no more than 33% worse than the optimal.
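Ordering by dataToProcess is Graham's longest-processing-time (LPT) rule: hand the largest remaining task to whichever slot frees up first. For m identical slots, LPT's makespan is at most (4/3 - 1/(3m)) times the optimum, which is where "no more than 33% worse" comes from. This is a minimal sketch that assumes a task's runtime is proportional to its dataToProcess and that slots are identical. `lpt_schedule` is a hypothetical name.

```python
import heapq
from typing import List, Tuple

def lpt_schedule(data_to_process: List[float], num_slots: int) -> Tuple[float, List[List[int]]]:
    """Greedy list scheduling in descending order of dataToProcess (Graham's LPT rule).
    Assumes runtime is proportional to dataToProcess."""
    order = sorted(range(len(data_to_process)),
                   key=lambda i: data_to_process[i], reverse=True)
    heap = [(0.0, s) for s in range(num_slots)]  # (time the slot becomes free, slot id)
    assignment: List[List[int]] = [[] for _ in range(num_slots)]
    for i in order:
        free_at, s = heapq.heappop(heap)  # earliest-free slot takes the next largest task
        assignment[s].append(i)
        heapq.heappush(heap, (free_at + data_to_process[i], s))
    makespan = max(t for t, _ in heap)
    return makespan, assignment

makespan, assignment = lpt_schedule([2, 2, 2, 3, 3], num_slots=2)
# makespan == 7.0 with assignment [[3, 0, 2], [4, 1]]
```

The optimum for this input is 6 ({3, 3} on one slot, {2, 2, 2} on the other). LPT's 7 is therefore exactly 4/3 - 1/6 = 7/6 of optimal, the worst case for two slots.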

Why outliers?
Problem: 25% of outliers remain, likely due to …
Idea: restart tasks elsewhere in the cluster.
Challenge: the earlier the better, but should we restart an outlier or start a pending task?
[Figure, panels (a)–(c): a running task with remaining time t_rem and a potential restart, starting now, with runtime t_new.]
– If the predicted time is much better, kill the original and restart it elsewhere.
– Else, if other tasks are pending, duplicate iff doing so saves both time and resources.
– Else (no pending work), duplicate iff the expected savings are high.
– Continuously observe and kill wasteful copies.
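The restart policy can be sketched as a decision function. Everything beyond the slide's three rules is an assumption:

- `t_rem` and `t_new` are point predictions.
- `copies` counts the copies of the task already running.
- `restart_ratio` and `min_savings` are hypothetical thresholds for "much better" and "high".

For the pending-work case, the sketch assumes the other copies are killed once the new copy finishes first. Under that assumption, duplicating saves slot-time iff $(c+1)\,t_{new} < c\,t_{rem}$. That condition also implies $t_{new} < t_{rem}$, so it saves time as well.

```python
from enum import Enum

class Action(Enum):
    NONE = "leave running"
    RESTART = "kill original, restart elsewhere"
    DUPLICATE = "start a duplicate copy"

def decide(t_rem: float, t_new: float, copies: int, pending_tasks: int,
           restart_ratio: float = 0.5, min_savings: float = 30.0) -> Action:
    """Choose an action for a running task; thresholds are hypothetical."""
    # Rule 1: a fresh copy is predicted to be much faster -> kill and restart.
    if t_new < restart_ratio * t_rem:
        return Action.RESTART
    # Rule 2: other tasks are waiting for slots -> duplicate only if it saves
    # both time and slot-time: (c + 1) * t_new < c * t_rem.
    if pending_tasks > 0:
        if (copies + 1) * t_new < copies * t_rem:
            return Action.DUPLICATE
        return Action.NONE
    # Rule 3: no pending work -> duplicate only if the expected time saving is high.
    if t_rem - t_new > min_savings:
        return Action.DUPLICATE
    return Action.NONE

# decide(100, 40, copies=1, pending_tasks=3) -> Action.RESTART
# decide(100, 60, copies=2, pending_tasks=3) -> Action.DUPLICATE  (3*60 < 2*100)
# decide(100, 60, copies=1, pending_tasks=3) -> Action.NONE       (2*60 >= 100)
# decide(100, 60, copies=1, pending_tasks=0) -> Action.DUPLICATE  (saves 40 > 30)
```

The last rule, killing wasteful copies, would run as a separate periodic check and is not modeled here.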

Summary
a) preferentially replicate costly-to-recompute tasks
b) each job locally avoids network hot-spots
c) quarantine persistently faulty machines
d) schedule in descending order of data size
e) restart or duplicate tasks, cognizant of resource cost; prune wasteful copies
Theme: cause- and resource-aware action. Explicit attempt to decouple solutions; partial success.

Results
Deployed in production Cosmos clusters: prototype Jan '10 → baking on pre-production clusters → release May '10.
Trace-driven simulations of thousands of jobs: mimic workflow, task runtime, data skew and failure probability; compare with existing schemes and idealized oracles.

In production, restarts improve on native Cosmos by 25% while using fewer resources.

Comparing jobs in the wild: 340 jobs that each repeated at least five times during May (release) vs. Apr 1–30 (pre-release).
[Figure: CDF of % cluster resources.]

In trace-replay simulations, restarts are much better handled in a cause- and resource-aware manner.
[Figure: CDF of % cluster resources.]

Protecting against recomputes
[Figure: CDF of % cluster resources.]

Outliers in map-reduce clusters are a significant problem and happen due to many causes, namely the interplay between storage, network and map-reduce. Cause- and resource-aware mitigation improves on prior art.

Back-up

Network-aware Placement