GRASS: Trimming Stragglers in Approximation Analytics

Slides:

Advertisements

Similar presentations

TAU Agent Team: Yishay Mansour Mariano Schain Tel Aviv University TAC-AA 2010.

Advertisements

Energy-efficient Task Scheduling in Heterogeneous Environment 2013/10/25.

Combating Outliers in map-reduce Srikanth Kandula Ganesh Ananthanarayanan , Albert Greenberg, Ion Stoica , Yi Lu, Bikas Saha , Ed Harris   1.

Aggressive Cloning of Jobs for Effective Straggler Mitigation Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, Ion Stoica.

Effective Straggler Mitigation: Attack of the Clones Ganesh Ananthanarayanan, Ali Ghodsi, Srikanth Kandula, Scott Shenker, Ion Stoica.

Varys Efficient Coflow Scheduling Mosharaf Chowdhury,

GRASS: Trimming Stragglers in Approximation Analytics Ganesh Ananthanarayanan, Michael Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu.

Energy Efficiency through Burstiness Athanasios E. Papathanasiou and Michael L. Scott University of Rochester, Computer Science Department Rochester, NY.

Effective Straggler Mitigation: Attack of the Clones [1]

Fast Algorithms For Hierarchical Range Histogram Constructions

Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.

1 Schedule Risk Assessment (SRA) Overview July 2013 NAVY CEVM.

UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.

Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Delay.

Three heuristics for transmission scheduling in sensor networks with multiple mobile sinks Damla Turgut and Lotzi Bölöni University of Central Florida.

Near-optimal Nonmyopic Value of Information in Graphical Models Andreas Krause, Carlos Guestrin Computer Science Department Carnegie Mellon University.

UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.

UC Berkeley Improving MapReduce Performance in Heterogeneous Environments Matei Zaharia, Andy Konwinski, Anthony Joseph, Randy Katz, Ion Stoica University.

Scatter search for project scheduling with resource availability cost Jia-Xian Zhu.

The Power of Choice in Data-Aware Cluster Scheduling

VOLTAGE SCHEDULING HEURISTIC for REAL-TIME TASK GRAPHS D. Roychowdhury, I. Koren, C. M. Krishna University of Massachusetts, Amherst Y.-H. Lee Arizona.

Bold Stroke January 13, 2003 Advanced Algorithms CS 539/441 OR In Search Of Efficient General Solutions Joe Hoffert

Window-Based Greedy Contention Management for Transactional Memory Gokarna Sharma (LSU) Brett Estrade (Univ. of Houston) Costas Busch (LSU) 1DISC 2010.

임규찬. 1. Abstract 2. Introduction 3. Design Goals 4. Sample-Based Scheduling for Parallel Jobs 5. Implements.

Scheduling policies for real- time embedded systems.

Department of Computer Science, UIUC Presented by: Muntasir Raihan Rahman and Anupam Das CS 525 Spring 2011 Advanced Distributed Systems Cloud Scheduling.

Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica.

Reining in the Outliers in Map-Reduce Clusters using Mantri Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha,

MC 2 : Map Concurrency Characterization for MapReduce on the Cloud Mohammad Hammoud and Majd Sakr 1.

Matei Zaharia, Dhruba Borthakur *, Joydeep Sen Sarma *, Khaled Elmeleegy +, Scott Shenker, Ion Stoica UC Berkeley, * Facebook Inc, + Yahoo! Research Fair.

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads School of Computer Engineering Nanyang Technological University 30 th Aug 2013.

DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.

CSCI1600: Embedded and Real Time Software Lecture 24: Real Time Scheduling II Steven Reiss, Fall 2015.

Multi-Resource Packing for Cluster Schedulers Robert Grandl Aditya Akella Srikanth Kandula Ganesh Ananthanarayanan Sriram Rao.

1 Low Latency Multimedia Broadcast in Multi-Rate Wireless Meshes Chun Tung Chou, Archan Misra Proc. 1st IEEE Workshop on Wireless Mesh Networks (WIMESH),

Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.

Multi-Resource Packing for Cluster Schedulers Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella.

PACMan: Coordinated Memory Caching for Parallel Jobs Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker,

Scheduling Jobs Across Geo-distributed Datacenters Chien-Chun Hung, Leana Golubchik, Minlan Yu Department of Computer Science University of Southern California.

Embedded System Scheduling

Big Data Analytics with Parallel Jobs

R-Storm: Resource Aware Scheduling in Storm

Packing Tasks with Dependencies

Data Driven Resource Allocation for Distributed Learning

Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics

Chris Cai, Shayan Saeed, Indranil Gupta, Roy Campbell, Franck Le

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

Edinburgh Napier University

Monte-Carlo Planning:

Altruistic Scheduling in Multi-Resource Clusters

PA an Coordinated Memory Caching for Parallel Jobs

Measuring Service in Multi-Class Networks

Altruistic Scheduling in Multi-Resource Clusters

Feedback directed optimization in Compaq’s compilation tools for Alpha

Optimizing Expected Time Utility in Cyber-Physical Systems Schedulers

 Real-Time Scheduling via Reinforcement Learning

Is Dynamic Multi-Rate Worth the Effort?

Reining in the Outliers in MapReduce Jobs using Mantri

Richard Anderson Lecture 6 Greedy Algorithms

Cost-effective Outbreak Detection in Networks

 Real-Time Scheduling via Reinforcement Learning

Lecture 18 CSE 331 Oct 9, 2017.

Richard Anderson Lecture 7 Greedy Algorithms

Cloud Computing Large-scale Resource Management

Cloud Computing MapReduce in Heterogeneous Environments

Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

CherryPick: Adaptively Unearthing the Best

What is The Optimal Number of Features

Greg Knowles ECE Fall 2004 Professor Yu Hu Hen

Tiresias A GPU Cluster Manager for Distributed Deep Learning

Presentation transcript:

GRASS: Trimming Stragglers in Approximation Analytics Ganesh Ananthanarayanan, Michael Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu

Next Generation of Analytics Timely results, even if approximate Data deluge makes this necessary

Approximation Dimensions Deadline: Maximize accuracy within deadline “Pick the best ad to display within 2s” Error: Minimize time to get desired accuracy “#cars sold to the nearest thousand” Optimal Scheduler Improve accuracy by 48% Speedup by 40% *w.r.t. state-of-the-art schedulers (production workloads from Facebook and Bing)

Scheduling Challenge Prioritize tasks Straggler tasks Subset of tasks to complete #tasks » #slots (multi-waved jobs) (NP-Hard but many known heuristics…) Straggler tasks Slowest task can be 8x slower than median task Speculation: Spawn a duplicate, earliest wins Google[OSDI’04], FB[OSDI’08], Microsoft[OSDI’10]

Challenge: dynamically prioritize between speculative & unscheduled tasks to meet deadline/error bound

Is speculation worth the payoff? Opportunity Cost Speculative copies consume extra resources Slot 1 T1 Is speculation worth the payoff? Slot 2 T2 ST: deadline matters for prioritization Slot 3 T1 T3 time 5 9 10

Roadmap Two natural scheduling designs GRASS: Combining the two designs Evaluation of GRASS

Greedy Scheduling (GS) Greedily improve accuracy, i.e., earliest finishing task Accuracy = 7/9 Straggler Slot 1 T1 T1 Deadline = 6 Slot 2 T2 T3 T4 T5 T6 T7 time 1 6 (at time =1 ) Task ID T1 T2 T3 T4 T5 T6 T7 T8 T9 Time remaining 5 --- New copy 2 1 3

Resource Aware Scheduling (RAS) Speculate only if it saves time and resources Straggler Accuracy = 8/9 Slot 1 T1 T1 T4 T6 T7 Deadline = 6 Slot 2 T2 T1 T3 T5 T8 time 1 3 6 One copy for 5s (vs.) Two copies for 2s (at time =1 ) Task ID T1 T2 T3 T4 T5 T6 T7 T8 T9 Time remaining 5 --- New copy 2 1 3

Neither GS nor RAS is uniformly better GS vs. RAS Accuracy = 3/9 Accuracy = 7/9 Slot 1 T1 T1 GS Deadline = 6 Deadline = 3 Slot 2 T2 T3 T4 T5 T6 T7 time 1 3 Neither GS nor RAS is uniformly better 6 Accuracy = 8/9 Accuracy = 2/9 Slot 1 T1 T4 T6 T7 RAS Deadline = 6 Slot 2 T2 Deadline = 3 T1 T3 T5 T8 time 1 3 6

Intuition: Use RAS early in the job (be “conservative”), switch to GS towards the end (be “aggressive”) “time remaining”

Theoretical Scheduling Model Multi-waved scheduling of tasks Constant wave-width Agnostic to fairness policies Heavy-tailed (Pareto) distribution of task durations Speculation: GS, RAS, Switching, Optimal Theorem: Using RAS when >2 waves of tasks remain, and GS when ≤2 waves of tasks remain is “near-optimal”

How to estimate two remaining waves? Wave boundaries are not strict Non-uniform task durations Wave-width is not constant Start with RAS and switch to GS close to the deadline/error-bound

Learning the switching point RAS[4s]+GS[2s] RAS RAS RAS GS GS RAS[5s]+GS[1s] RAS[6s] 4 5 6 Deadline GS-only and RAS-only job samples “Exploration vs. Exploitation” Multi-armed bandit solution, ɛ = 0.1

GRASS (= GS + RAS) Scheduler Opportunity Cost in speculation for stragglers GS  Greedy Scheduling RAS  Resource Aware Scheduling Switch RASGS close to deadline/error-bound Learn switching point empirically from job samples Provably near-optimal in theoretical model

Implementation Hadoop 0.20.2 and Spark 0.7.3 Task Estimators Modified Fair Scheduler Job bins with GS-only and RAS-only samples Task Estimators Remaining time is extrapolated from data-to-process progress reports at 5% intervals New copy’s time is sampled from completed tasks

How well does GRASS perform? Workload from Facebook and Bing traces Hadoop and Dryad production jobs Added deadlines and error bounds Baselines: LATE & Mantri 200 node EC2 deployment (m2.2xlarge instances)

Accuracy of deadline-bound jobs improve by 47% Gains hold across deadlines (lenient and stringent )

GRASS is 22% better than statically picking GS or RAS … and is near-optimal

Error-bound Jobs Overall speedup of 38% (optimal is 40%) Gains hold across all error bounds Exact jobs (0% error-bound) speed up by 34% Unified Straggler Mitigation

Conclusion Next gen. of analytics: Approximate but timely results Challenge: Dynamic and unpredictable stragglers GRASS – Conservative speculation early in the job; aggressive towards its end Evaluation with Hadoop & Spark Accuracy of deadline-bound jobs improve by 47% Error-bound jobs speed up by 38%