Lecture 14: Combating Outliers in MapReduce Clusters. Xiaowei Yang.



References:
– Reining in the Outliers in Map-Reduce Clusters using Mantri, by Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris
– Slides: us/UM/people/srikanth/data/Combating%20Outliers%20in%20Map-Reduce.web.pptx

MapReduce decouples customized data operations from the mechanisms to scale, and is widely used: Cosmos (based on SVC's Dryad) + Bing at Microsoft, Google's MapReduce, and Hadoop inside Yahoo! and on Amazon's cloud (AWS), over datasets such as the Internet, click logs, and bio/genomic data.
[Figure: log(size of dataset) (GB, TB, PB, EB) vs. log(size of cluster); HPC and parallel databases occupy the small-cluster region, MapReduce the large-cluster region.]

An Example: How It Works. Goal: find frequent search queries to Bing. What the user says: SELECT Query, COUNT(*) AS Freq FROM QueryTable HAVING Freq > X.
[Figure: a job manager assigns work and gets progress reports; map tasks read file blocks 0-3, write intermediate data locally, and reduce tasks produce output blocks 0-1.]
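The map/reduce decomposition of this query can be sketched in a few lines of Python. This is a toy single-process sketch, not the Cosmos/Dryad implementation; the threshold X and the in-memory "shuffle" are illustrative assumptions.

    from collections import defaultdict

    X = 2  # hypothetical frequency threshold

    def map_fn(line):
        # Each input line is assumed to hold one query string.
        yield (line.strip(), 1)

    def reduce_fn(query, counts):
        freq = sum(counts)
        if freq > X:
            yield (query, freq)

    def run_job(lines):
        groups = defaultdict(list)
        for line in lines:
            for k, v in map_fn(line):
                groups[k].append(v)      # "shuffle": group values by key
        for k, vs in groups.items():
            yield from reduce_fn(k, vs)

    print(list(run_job(["bing", "maps", "bing", "bing", "maps"])))
    # [('bing', 3)]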

Outliers slow down map-reduce jobs. Goals: speeding up jobs improves productivity, and predictability supports SLAs – all while using resources efficiently. We find that outliers are both common and costly, as quantified below.
[Figure: a job's phases with task counts from a production workload: Map.Read 22K, Map.Move 15K, Map 13K, a barrier, Reduce 51K, writing to the file system.]

What is an outlier? A phase (map or reduce) has n tasks and s slots (available compute resources). Task i takes t_i seconds to run, where t_i = f(datasize, code, machine, network). If every task took the same time T, even a naïve scheduler would finish the phase in the ideal run time ceil(n/s) * T; the goal is to come close to this ideal.
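To see how a single slow task inflates a phase well past ceil(n/s) * T, here is a minimal simulation with hypothetical numbers, assuming a greedy scheduler that assigns each task to the earliest free slot.

    import heapq, math

    def phase_completion(task_times, s):
        """Completion time of a phase running its tasks greedily on s slots."""
        slots = [0.0] * s                     # next free time of each slot
        heapq.heapify(slots)
        for t in task_times:
            start = heapq.heappop(slots)      # earliest available slot
            heapq.heappush(slots, start + t)
        return max(slots)

    n, s, T = 16, 4, 10.0
    uniform = [T] * n
    with_outlier = [T] * (n - 1) + [8 * T]    # one task runs 8x slower
    print(math.ceil(n / s) * T)               # ideal: 40.0
    print(phase_completion(uniform, s))       # 40.0
    print(phase_completion(with_outlier, s))  # 110.0 -- one straggler dominates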

From a phase to a job: a job may have many phases, so an outlier in an early phase has a cumulative effect. Data loss may cause multi-phase recomputes → outliers.

Why outliers? Problem: due to unavailable input, tasks have to be recomputed, and the delay due to a recompute readily cascades through dependent phases (map → sort → reduce).

Previous work: the original MapReduce paper observed the problem but didn't deal with it in depth. Its solution was to duplicate the slow tasks. Drawbacks:
– Some duplicates may be unnecessary
– Duplicates use extra resources
– Placement, not the task itself, may be the problem

Quantifying the Outlier Problem. Approach: understand the problem first before proposing solutions; understanding often leads to solutions. Three questions:
1. Prevalence of outliers
2. Causes of outliers
3. Impact of outliers

Why bother? Frequency of outliers. Stragglers = tasks that take ≥ 1.5 times the median task in that phase. Recomputes = tasks that are re-run because their output was lost. Findings: 50% of phases have 10% stragglers and no recomputes, and 10% of the stragglers take >10X longer.
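The straggler definition above translates directly into code; a tiny sketch with made-up durations:

    from statistics import median

    def stragglers(durations, threshold=1.5):
        """Indices of tasks taking at least threshold-times the phase median."""
        m = median(durations)
        return [i for i, d in enumerate(durations) if d >= threshold * m]

    print(stragglers([10, 11, 9, 12, 10, 31]))  # [5]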

Causes of outliers: data skew. In 40% of the phases, all the tasks with high runtimes (>1.5x the median task) correspond to a large amount of data over the network. Duplicating them will not help!

Non-outliers can be improved as well: 20% of them run 55% longer than the median task.

Problem: tasks reading input over the network experience variable congestion. Reduce tasks are placed at the first available slot, so uneven placement of reduce tasks relative to map output is typical in production.
[Figure: reduce tasks reading map output from multiple racks.]

Causes of outliers: cross-rack traffic. 70% of cross-rack traffic is reduce traffic: every reduce reads from every map, yet a reduce is put into any spare slot. Tasks in a spot with a slow network run slower, and tasks compete for network bandwidth among themselves. As a result, 50% of phases take 62% longer to finish than under an ideal placement.

Causes of outliers: bad and busy machines. 50% of recomputes happen on 5% of the machines, and recomputes increase resource usage.

Outliers cluster by time – resource contention might be the cause. Recomputes cluster by machine – data loss may cause multiple recomputes.

Why bother? Cost of outliers (a what-if analysis that replays logs in a trace-driven simulator): at the median, jobs are slowed down by 35% due to outliers.

Mantri Design

High-level idea: be cause-aware and resource-aware. Runtime = f(input, network, machine, dataToProcess, …), so fix each problem with a different strategy.

Resource-aware restarts: duplicate or kill long-running outliers.

When to restart: every Δ seconds, tasks report progress; from these reports Mantri estimates t_rem (remaining time of the running copy) and t_new (predicted run time of a restarted copy).

Schedule a duplicate if the total running time is expected to be smaller, i.e., with c copies of the task running, P(c · t_rem > (c+1) · t_new) > δ. When there are available slots, restart a task if the expected time reduction exceeds the restart cost: E(t_rem − t_new) > ρ · Δ. A task is restarted at most γ = 3 times.
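A sketch of this restart logic follows. The Monte Carlo sampling and the δ default are assumptions for illustration; Mantri derives the distributions of t_rem and t_new from task progress reports.

    import random

    def should_duplicate(trem_samples, tnew_samples, c, delta=0.25, trials=1000):
        """Duplicate a task with c running copies only if total running time
        likely shrinks, i.e. P(c * t_rem > (c+1) * t_new) > delta.
        The sample lists stand in for the estimated distributions."""
        hits = sum(c * random.choice(trem_samples) > (c + 1) * random.choice(tnew_samples)
                   for _ in range(trials))
        return hits / trials > delta

    def should_restart(e_trem, e_tnew, rho, report_interval):
        """With spare slots, restart if the expected saving beats the
        restart cost: E(t_rem - t_new) > rho * Delta."""
        return (e_trem - e_tnew) > rho * report_interval

    # Hypothetical numbers: ~100s remaining vs. a ~30s fresh copy.
    print(should_duplicate([90, 100, 110], [25, 30, 35], c=1))  # True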

Network-Aware Placement: compute the rack location for each task and find the placement that minimizes the maximum data transfer time. If rack i has d_i map output and u_i, v_i bandwidths available on its uplink and downlink, place an a_i fraction of the reduces on rack i so as to minimize, over all racks, the maximum of the outgoing transfer time d_i(1 − a_i)/u_i and the incoming transfer time (D − d_i) · a_i/v_i, where D = Σ_j d_j and Σ_i a_i = 1.
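One way to solve this min-max placement is a feasibility check plus binary search on the achievable transfer time T. This is a sketch of the formulation above, not Mantri's actual solver; variable names are mine.

    def place_reduces(d, u, v, iters=60):
        """Binary-search the smallest max transfer time T for which
        fractions a_i exist with sum(a_i) = 1, d_i*(1-a_i)/u_i <= T
        (rack i's uplink) and (D-d_i)*a_i/v_i <= T (rack i's downlink)."""
        D = float(sum(d))

        def feasible(T):
            lo_sum = hi_sum = 0.0
            for di, ui, vi in zip(d, u, v):
                lo = max(0.0, 1.0 - T * ui / di) if di > 0 else 0.0
                hi = min(1.0, T * vi / (D - di)) if D > di else 1.0
                if lo > hi:
                    return False
                lo_sum += lo
                hi_sum += hi
            return lo_sum <= 1.0 <= hi_sum

        lo_T = 0.0
        hi_T = max(max(di / ui, (D - di) / vi) for di, ui, vi in zip(d, u, v))
        for _ in range(iters):
            mid = (lo_T + hi_T) / 2.0
            if feasible(mid):
                hi_T = mid
            else:
                lo_T = mid
        return hi_T

    # Hypothetical example: three racks, skewed map output, unit bandwidths.
    print(place_reduces(d=[60.0, 30.0, 10.0], u=[1.0] * 3, v=[1.0] * 3))  # ~20.0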

Avoid recomputation by replicating output: restart a task early if its input data are lost, and replicate the output that is most costly to recompute.
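The replicate-or-recompute tradeoff can be phrased as a one-line cost-benefit test. This is a sketch; in Mantri the probability and time estimates would come from per-machine failure statistics and task histories.

    def should_replicate(t_recompute, p_loss, t_replicate):
        """Replicate a task's output if the expected cost of redoing the
        work (recompute time weighted by the chance the output is lost)
        exceeds the cost of copying the output now."""
        return p_loss * t_recompute > t_replicate

    print(should_replicate(t_recompute=600.0, p_loss=0.05, t_replicate=10.0))  # True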

Data-aware task ordering: outliers can be due to large inputs, so schedule tasks in descending order of dataToProcess; this is at most 33% worse than optimal scheduling (see the sketch below).
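A sketch of the ordering rule. Largest-first is the classic longest-processing-time (LPT) heuristic, whose makespan is within 4/3 of optimal, i.e., at most ~33% worse.

    def data_aware_order(tasks):
        """Schedule tasks in descending order of dataToProcess so the
        biggest (likely slowest) tasks start first and cannot become
        late stragglers."""
        return sorted(tasks, key=lambda t: t["dataToProcess"], reverse=True)

    tasks = [{"id": 1, "dataToProcess": 64}, {"id": 2, "dataToProcess": 512},
             {"id": 3, "dataToProcess": 128}]
    print([t["id"] for t in data_aware_order(tasks)])  # [2, 3, 1]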

Estimation of t_rem: let d be the total input data size and d_read the amount read so far. Scaling the elapsed time linearly with the unread input gives t_rem ≈ t_elapsed · (d − d_read)/d_read, plus time for any wrap-up work after the input is consumed.

Estimation of t_new: t_new ≈ processRate · locationFactor · d, where processRate is estimated from all tasks in the phase, locationFactor accounts for the machine's performance, and d is the input size.
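Putting the two estimates into code, as a sketch: the exact smoothing and wrap-up handling in Mantri differ, and sched_lag is an assumed name for scheduling overhead.

    def estimate_trem(t_elapsed, d, d_read, t_wrapup=0.0):
        """Remaining time of a running task: scale elapsed time by the
        fraction of input not yet read, plus post-read wrap-up work."""
        if d_read <= 0:
            return float("inf")       # no progress report yet
        return t_elapsed * (d - d_read) / d_read + t_wrapup

    def estimate_tnew(process_rate, location_factor, d, sched_lag=0.0):
        """Predicted run time of a fresh copy: per-byte process rate from
        the phase's tasks, scaled by the target machine's locationFactor."""
        return process_rate * location_factor * d + sched_lag

    # Hypothetical numbers: 200s elapsed, 40% of 10 GB read -> 300s remain.
    print(estimate_trem(t_elapsed=200.0, d=10e9, d_read=4e9))              # 300.0
    print(estimate_tnew(process_rate=3e-8, location_factor=1.2, d=10e9))   # 360.0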

Results: Mantri is deployed in production Cosmos clusters. Prototype in Jan '10 → baking on pre-production clusters → release in May '10. Trace-driven simulations replay thousands of jobs, mimicking workflow, task runtime, data skew, and failure probability, and compare Mantri with existing schemes and idealized oracles.

Evaluation Methodology: Mantri runs on production clusters, with results from Dryad as the baseline; trace-driven simulations compare it with other systems.

Comparing jobs in the wild with and without Mantri: one month of jobs in a Bing production cluster, restricted to jobs that each repeated at least five times during May (post-release) vs. April 1-30 (pre-release).

In production, restarts improve on native Cosmos by 25% while using fewer resources.

In trace-replay simulations (each job repeated thrice), restarts are dealt with much better in a cause- and resource-aware manner.
[Figure: CDF of jobs vs. % cluster resources used.]

Network-aware placement is compared against three bandwidth models: Equal (all links have the same bandwidth), Start (bandwidths as measured at the start), and Ideal (actual available bandwidth at run time).

Protecting against recomputes.
[Figure: CDF of jobs vs. % cluster resources used.]

Summary:
a) Recomputation: preferentially replicate the output of costly-to-recompute tasks
b) Poor network: each job locally avoids network hot-spots
c) Bad machines: quarantine persistently faulty machines
d) DataToProcess: schedule tasks in descending order of data size
e) Others: restart or duplicate tasks, cognizant of resource cost

Conclusion: outliers in map-reduce clusters are a significant problem; they happen due to many causes – an interplay between storage, network, and map-reduce – and cause- and resource-aware mitigation improves on prior art.