The Case for Tiny Tasks in Compute Clusters Kay Ousterhout *, Aurojit Panda *, Joshua Rosen *, Shivaram Venkataraman *, Reynold Xin *, Sylvia Ratnasamy.

1 The Case for Tiny Tasks in Compute Clusters Kay Ousterhout *, Aurojit Panda *, Joshua Rosen *, Shivaram Venkataraman *, Reynold Xin *, Sylvia Ratnasamy *, Scott Shenker *+, Ion Stoica * * UC Berkeley, + ICSI

2 Setting … … … Task MapReduce/Spark/Dryad Job

3 Today’s tasks Tiny Tasks Use smaller tasks!

4 Why? How? Where?

5 Why? How? Where?

6 Problem: Skew and Stragglers Contended machine? Data skew?

7 Benefit: Handling of Skew and Stragglers Today’s tasks Tiny Tasks As much as 5.2x reduction in job completion time!
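A toy simulation can illustrate why smaller tasks absorb a slow machine. This is a minimal sketch and is not taken from the slides: the function name, the pull-based scheduling model, and the machine speeds are all assumptions, and it does not attempt to reproduce the 5.2x figure.

```python
import heapq

def job_completion_time(total_work, num_tasks, machine_speeds):
    """Simulate pull-based scheduling: a machine takes the next task when it is free.

    total_work is split evenly into num_tasks tasks; a machine with speed s
    needs (task work / s) time units per task. Returns the job finish time.
    """
    task_work = total_work / num_tasks
    # Min-heap of (time the machine becomes free, machine index).
    free_at = [(0.0, i) for i in range(len(machine_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_tasks):
        now, machine = heapq.heappop(free_at)
        done = now + task_work / machine_speeds[machine]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, machine))
    return finish

# Hypothetical cluster: three healthy machines and one contended machine
# running at a quarter of the normal speed.
speeds = [1.0, 1.0, 1.0, 0.25]
large = job_completion_time(400.0, 4, speeds)    # one large task per machine
tiny = job_completion_time(400.0, 400, speeds)   # many tiny tasks
print(f"large tasks: {large:.0f}, tiny tasks: {tiny:.0f}")
```

With one large task per machine, the job waits for the slow machine to finish its full share. With tiny tasks, the healthy machines simply pull more tasks, so the slow machine delays the job by at most one short task.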

8 Problem: Batch and Interactive Sharing High priority interactive job arrives Low priority batch task Clusters forced to trade off utilization and responsiveness!

9 Benefit: Improved Sharing Today’s tasks Tiny Tasks High-priority tasks not subject to long wait times!
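The sharing argument can be sketched with simple arithmetic: if batch tasks are not preempted, a newly arrived high-priority task waits until some slot finishes its current task, so its wait is bounded by the batch task duration. The model below is an assumption made for illustration (every slot runs back-to-back batch tasks of equal length, and the function and parameter names are hypothetical).

```python
def wait_for_free_slot(arrival_ms, task_duration_ms, slot_start_offsets_ms):
    """Return how long a high-priority arrival waits for the first free slot.

    Each slot runs batch tasks back to back, each lasting task_duration_ms,
    with the first task in the slot starting at the given offset.
    All times are integers in milliseconds.
    """
    waits = []
    for offset in slot_start_offsets_ms:
        elapsed = (arrival_ms - offset) % task_duration_ms
        waits.append(task_duration_ms - elapsed)
    return min(waits)

offsets = [0, 30]  # hypothetical start offsets of two busy slots
long_wait = wait_for_free_slot(10_050, 600_000, offsets)  # 10-minute batch tasks
tiny_wait = wait_for_free_slot(10_050, 100, offsets)      # 100 ms tiny tasks
print(long_wait, tiny_wait)
```

In this model the wait never exceeds one task duration, which is why shrinking tasks shrinks the worst-case delay for interactive work without leaving slots idle.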

10 Benefits: Recap (1) Straggler mitigation: Mantri (OSDI ’10), Scarlett (EuroSys ’11), SkewTune (SIGMOD ’12), Dolly (NSDI ’13), … (2) Improved sharing: Quincy (SOSP ’09), Amoeba (SOCC ’12), …

11 Why? How? Where?

12 Scheduling requirements: High Throughput (millions per second) Low Latency (milliseconds) Distributed Scheduling (e.g., Sparrow Scheduler) Schedule task

13 Use existing thread pool to launch tasks Launch task Schedule task

14 Use existing thread pool to launch tasks + Cache task binaries Task launch = RPC time (<1ms) Launch task Schedule task

15 Read input data Smallest efficient file block size: 8MB Distribute Metadata (à la Flat Datacenter Storage, OSDI ’12) Launch task Schedule task

16 Execute task + read data for next task Schedule task …… Tons of tiny transfers! Framework-Controlled I/O (enables optimizations, e.g., pipelining) Read input data Launch task
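The pipelining idea (execute one task while reading the input for the next) can be sketched as follows. This is an illustrative sketch only, not the framework's implementation: `run_pipelined`, `read_block`, and `execute` are hypothetical names, and a single background thread stands in for framework-controlled I/O.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipelined(block_ids, read_block, execute):
    """Run execute() on each block while prefetching the next block's data.

    read_block(block_id) returns the input data for a task;
    execute(data) runs the task and returns its result.
    """
    results = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        # Start reading the first block before any computation begins.
        pending = io_pool.submit(read_block, block_ids[0])
        for next_id in block_ids[1:]:
            data = pending.result()                       # wait for the current input
            pending = io_pool.submit(read_block, next_id)  # prefetch the next input
            results.append(execute(data))                 # compute overlaps the read
        results.append(execute(pending.result()))
    return results

# Usage with stand-in functions for the read and the computation.
outputs = run_pipelined([1, 2, 3], lambda block: block * 10, lambda data: data + 1)
print(outputs)
```

Because the framework issues the reads itself, it can overlap them with computation; if each task issued its own reads, this overlap would be harder to arrange.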

17 How low can you go? Execute task + read data for next task Schedule task 100’s of milliseconds Read input data Launch task 8MB disk block

18 Why? How? Where?

19 Original Job Map Task 1 … Map Task 2 … 1 2 3 4 N … Map Tasks Tiny Tasks Job Reduce Task 1 … Reduce Tasks K1: K2: K3: K5: … K1: K2: Kn:

20 Original Reduce Phase Tiny Tasks = ? Reduce Task 1 K1:

21 Splitting Large Tasks Aggregation trees: work for functions that are associative and commutative Framework-managed temporary state store Ultimately, need to allow a small number of large tasks
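An aggregation tree replaces one large reduce task with several levels of small ones, each combining only a few partial results. The sketch below is a minimal single-process illustration under assumed names (`tree_aggregate`, `fan_in`); in a real framework each group at each level would run as a separate tiny task.

```python
from functools import reduce
import operator

def tree_aggregate(values, combine, fan_in=2):
    """Combine values level by level, fan_in inputs per small task.

    Assumes combine is associative and commutative, so the grouping and
    ordering of partial results do not change the final answer.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    while len(level) > 1:
        # Each slice stands for one tiny task that combines fan_in partial results.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

total = tree_aggregate(range(1, 101), operator.add, fan_in=4)
largest = tree_aggregate([3, 9, 2, 7, 5], max, fan_in=2)
print(total, largest)
```

Functions such as sum, max, and count fit this pattern; a function that needs all values for a key at once (a median, for example) does not, which is why the slide also lists a temporary state store and a small number of large tasks.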

22 Tiny tasks mitigate stragglers + Improve sharing Distributed file metadata Launch task in existing thread pool Distributed scheduling Pipelined task execution Questions? Find me or Shivaram:

23 Backup Slides

24 5.2x at the 95th percentile! Benefit of Eliminating Stragglers Based on Facebook Trace

25 Why Not Preemption? Preemption only handles sharing (not stragglers) Task migration is time consuming Tiny tasks improve fault tolerance

26 Dremel/Drill/Impala Similar goals and challenges (supporting short tasks) Dremel statically assigns tablets to machines; rebalances if query dispatcher notices that a machine is processing a tablet slowly → standard straggler mitigation Most jobs expected to be interactive (no sharing)

27 10,000 Machines 16 cores/machine 100 millisecond tasks Scheduling Throughput Over 1 million task scheduling decisions per second
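The throughput figure follows from the numbers on the slide, assuming every core is kept busy with back-to-back tasks (the variable names are illustrative):

```python
machines = 10_000
cores_per_machine = 16
task_duration_ms = 100

slots = machines * cores_per_machine          # concurrent task slots
tasks_per_slot_per_sec = 1000 // task_duration_ms
decisions_per_sec = slots * tasks_per_slot_per_sec
print(decisions_per_sec)
```

That is 160,000 slots each finishing 10 tasks per second, or 1.6 million scheduling decisions per second, consistent with "over 1 million" on the slide.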

28 Sparrow: Technique Place m tasks on the least loaded of d·m slaves Slave Scheduler Job m = 2 tasks 4 probes (d = 2) More at tinyurl.com/sparrow-scheduler
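The placement rule on this slide can be sketched as below. This is a simplified illustration, not Sparrow's actual code: the function name, the use of queue length as the load measure, and the in-memory list of slaves are all assumptions.

```python
import random

def batch_sample(queue_lengths, m, d=2, rng=random):
    """Place m tasks on the least loaded of d*m randomly probed slaves.

    queue_lengths[i] is the number of tasks queued on slave i (assumed to be
    the load signal). Returns the chosen slave indices and updates their
    queue lengths in place. If fewer than d*m slaves exist, all are probed.
    """
    num_probes = min(d * m, len(queue_lengths))
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    probed.sort(key=lambda slave: queue_lengths[slave])  # least loaded first
    chosen = probed[:m]
    for slave in chosen:
        queue_lengths[slave] += 1
    return chosen

# Matches the slide's example sizes: m = 2 tasks, d = 2, so 4 probes.
queues = [5, 0, 3, 1]
placed = batch_sample(queues, m=2, d=2)
print(placed, queues)
```

Probing only d·m slaves per job, rather than tracking global state, is what lets many schedulers run independently and in parallel.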

29 Sparrow: Performance on TPC-H Workload Within 12% of offline optimal; median queuing delay of 8ms More at tinyurl.com/sparrow-scheduler

