

The Case for Tiny Tasks in Compute Clusters Kay Ousterhout *, Aurojit Panda *, Joshua Rosen *, Shivaram Venkataraman *, Reynold Xin *, Sylvia Ratnasamy *, Scott Shenker *+, Ion Stoica * * UC Berkeley, + ICSI

Setting: a MapReduce/Spark/Dryad job is composed of many tasks.

Today’s tasks vs. tiny tasks: use smaller tasks!

Why? How? Where?

Why? How? Where?

Problem: Skew and Stragglers. A contended machine? Data skew?

Benefit: Handling of Skew and Stragglers. Today’s tasks vs. tiny tasks: as much as a 5.2x reduction in job completion time!

Problem: Batch and Interactive Sharing. A high-priority interactive job arrives while low-priority batch tasks occupy the cluster; clusters are forced to trade off utilization and responsiveness!

Benefit: Improved Sharing. Today’s tasks vs. tiny tasks: high-priority tasks are not subject to long wait times!

Benefits: Recap. (1) Straggler mitigation, the target of Mantri (OSDI ’10), Scarlett (EuroSys ’11), SkewTune (SIGMOD ’12), Dolly (NSDI ’13), … (2) Improved sharing, the target of Quincy (SOSP ’09), Amoeba (SoCC ’12), …

Why? How? Where?

Schedule task. Scheduling requirements: high throughput (millions of tasks per second) and low latency (milliseconds). Solution: distributed scheduling (e.g., the Sparrow scheduler).

Launch task: use an existing thread pool to launch tasks.

Launch task: use an existing thread pool to launch tasks + cache task binaries, so task launch = RPC time (<1 ms).
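A minimal sketch of this idea in Python, with hypothetical names (`Worker`, `fetch_binary`); a real framework would deliver the launch request over RPC rather than call `launch_task` directly:

```python
# Sketch: a tiny-tasks worker keeps a pre-warmed thread pool and a per-job
# binary cache, so launching a task costs little more than the delivering RPC.
from concurrent.futures import ThreadPoolExecutor, Future

class Worker:
    def __init__(self, slots: int = 16):
        self.pool = ThreadPoolExecutor(max_workers=slots)  # reused across tasks
        self.binary_cache = {}  # job_id -> deserialized task function

    def launch_task(self, job_id, fetch_binary, task_args) -> Future:
        # Only the first task of a job pays the binary fetch; later launches
        # hit the cache and submit straight to the warm pool (no process fork).
        if job_id not in self.binary_cache:
            self.binary_cache[job_id] = fetch_binary(job_id)
        return self.pool.submit(self.binary_cache[job_id], *task_args)
```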

Read input data: the smallest efficient file block size is 8 MB; distribute metadata (à la Flat Datacenter Storage, OSDI ’12).
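A minimal sketch, loosely in the spirit of Flat Datacenter Storage and assuming hypothetical server names and a hash-based placement rule: block locations are computed rather than looked up, so no central metadata server sits on the critical path of every 8 MB read.

```python
# Sketch: deterministic block placement, so any client can compute where a
# block lives instead of asking a central metadata server.
import hashlib

BLOCK_SIZE = 8 * 1024 * 1024                        # 8 MB blocks
SERVERS = [f"storage-{i:03d}" for i in range(100)]  # assumed storage nodes

def block_location(file_id: str, block_index: int) -> str:
    """Hash (file, block) onto a storage server; every client agrees."""
    key = f"{file_id}:{block_index}".encode()
    digest = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    return SERVERS[digest % len(SERVERS)]

# e.g. block_location("job42/input.csv", 7) names the same server everywhere.
```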

Execute task + read data for next task: tiny tasks mean tons of tiny transfers! Framework-controlled I/O enables optimizations, e.g., pipelining.
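A minimal sketch of the pipelining idea, with hypothetical `read_input` and `execute` callables: while task i executes, the framework is already fetching task i+1’s input.

```python
# Sketch: framework-controlled I/O that overlaps each task's execution with
# the read of the next task's input.
from concurrent.futures import ThreadPoolExecutor

def run_pipelined(tasks, read_input, execute):
    if not tasks:
        return
    io_pool = ThreadPoolExecutor(max_workers=1)   # dedicated prefetch thread
    pending = io_pool.submit(read_input, tasks[0])
    for i, task in enumerate(tasks):
        data = pending.result()                   # input for the current task
        if i + 1 < len(tasks):
            pending = io_pool.submit(read_input, tasks[i + 1])  # prefetch
        execute(task, data)                       # CPU work overlaps the read
```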

How low can you go? With an 8 MB disk block, the full pipeline (schedule task → launch task → read input data → execute task + read data for next task) yields tasks of hundreds of milliseconds.

Why? How? Where?

[Diagram: an original job with N map tasks and reduce tasks, next to the corresponding tiny tasks job; reduce tasks aggregate values per key (K1, K2, …, Kn).]

Original reduce phase → tiny tasks = ? How does a single reduce task (e.g., Reduce Task 1, which aggregates all values for key K1) split into tiny tasks?

Splitting Large Tasks
–Aggregation trees: work for functions that are associative and commutative (see the sketch below)
–Framework-managed temporary state store
–Ultimately, need to allow a small number of large tasks
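As a sketch of the first bullet, a tree aggregation in Python (the helper names are hypothetical): it computes the same answer as one big reduce only when `combine` is associative and commutative, which is exactly the restriction noted above.

```python
# Sketch: split one large reduce into levels of tiny reduce tasks, each
# combining a small fan-in of partial results.
from functools import reduce

def tree_aggregate(values, combine, fan_in: int = 4):
    level = list(values)
    while len(level) > 1:
        # Each slice below would be one tiny reduce task in the framework.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

# e.g. tree_aggregate(range(1000), lambda a, b: a + b) == sum(range(1000))
```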

Tiny tasks mitigate stragglers + improve sharing, via distributed scheduling, launching tasks in an existing thread pool, distributed file metadata, and pipelined task execution. Questions? Find me or Shivaram:

Backup Slides

Benefit of Eliminating Stragglers (based on a Facebook trace): 5.2x at the 95th percentile!

Why Not Preemption? Preemption only handles sharing (not stragglers); task migration is time-consuming; and tiny tasks also improve fault tolerance.

Dremel/Drill/Impala: similar goals and challenges (supporting short tasks). Dremel statically assigns tablets to machines and rebalances if the query dispatcher notices that a machine is processing a tablet slowly → standard straggler mitigation. Most jobs are expected to be interactive (no sharing).

Scheduling Throughput: 10,000 machines × 16 cores/machine = 160,000 task slots; with 100-millisecond tasks, that is 1.6 million (over 1 million) task scheduling decisions per second.
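The back-of-envelope arithmetic behind this claim, as a one-off calculation:

```python
# 160,000 task slots, each freed every 100 ms, demand ~1.6M decisions/s.
machines, cores_per_machine, task_seconds = 10_000, 16, 0.100
print(machines * cores_per_machine / task_seconds)  # 1600000.0
```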

Sparrow: Technique. Place m tasks on the least loaded of d·m slaves; e.g., a job of m = 2 tasks sends 4 probes (d = 2). More at tinyurl.com/sparrow-scheduler
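A minimal sketch of the batch-sampling placement described on this slide, with hypothetical `slaves` and `queue_length` arguments (real Sparrow adds refinements such as late binding):

```python
# Sketch: to place m tasks, probe d*m random slaves and pick the m least loaded.
import random

def place_tasks(m, slaves, queue_length, d: int = 2):
    probes = random.sample(slaves, d * m)   # e.g. 4 probes for m=2, d=2
    probes.sort(key=queue_length)           # each probed slave reports its queue
    return probes[:m]                       # run the tasks on the best m
```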

Sparrow: Performance on TPC-H Workload. Within 12% of offline optimal; median queuing delay of 8 ms. More at tinyurl.com/sparrow-scheduler