Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline The Spark scheduling bottleneck Sparrow’s fully distributed, fault-tolerant technique Sparrow’s near-optimal performance
Spark Today Worke r Spark Context User 1 User 2 User 3 Query Compilation Storage Scheduling
Spark Today Worke r Spark Context User 1 User 2 User 3 Query Compilation Storage Scheduling
Job Latencies Rapidly Decreasing 10 min. 10 sec. 100 ms 1 ms 2004: MapReduce batch job 2009: Hive query 2010: Dremel Query 2012: Impala query 2010: In- memory Spark query 2013: Spark streamin g
Job latencies rapidly decreasing
+ Spark deployments growing in size Scheduling bottleneck!
Spark scheduler throughput: 1500 tasks / second 1 second 100 ms 10 second Task Duration Cluster size (# 16-core machines)
Optimizing the Spark Scheduler 0.8: Monitoring code moved off critical path 0.8.1: Result deserialization moved off critical path Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster?
Worke r Cluster Scheduler Task launch Task completion
Worke r Cluster Scheduler Task launch Task completion
Worke r Cluster Scheduler Task launch Task completion Schedu ler delay
Spark Today Worke r Spark Context User 1 User 2 User 3 Query Compilation Storage Scheduling
Future Spark Worke r User 1 User 2 User 3 Scheduler Query compilation Scheduler Query compilation Scheduler Query compilation Benefits: High throughput Fault tolerance
Future Spark Worke r User 1 User 2 User 3 Scheduler Query compilation Scheduler Query compilation Scheduler Query compilation Storage: Tachy on
Scheduling with Sparrow Worke r Schedu ler Stage Worke r
Stage Batch Sampling Worke r Schedu ler Worke r Place m tasks on the least loaded of 2m workers 4 probes (d = 2)
Queue length poor predictor of wait time Work er 80 ms 155 ms 530 ms Poor performance on heterogeneous workloads
Stage Late Binding Worker Scheduler Worker Place m tasks on the least loaded of d m workers 4 probes (d = 2)
Late Binding Scheduler Place m tasks on the least loaded of d m workers 4 probes (d = 2) Worker Stage
Late Binding Scheduler Place m tasks on the least loaded of d m workers Worke r reques ts task Worker Stage
What about constraints?
Stage Per-Task Constraints Scheduler Worker Probe separately for each task
Technique Recap Scheduler Batch sampling + Late binding + Constraints Work er
How well does Sparrow perform?
How does Sparrow compare to Spark’s native scheduler? core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background TPC-H: Common benchmark for analytics workloads Sparrow Spark Shark : SQL execution engine
TPC-H Queries core EC2 nodes, 10 schedulers, 80% load Percent iles 5 Within 12% of ideal Median queuing delay of 9ms
Policy Enforcement Worker High Priority Low Priority Worker User A (75%) User B (25%) Fair Shares Serve queues using weighted fair queuing Priorities Serve queues based on strict priorities
Weighted Fair Sharing
Fault Tolerance Schedul er 1 Schedul er 2 Spark Client 1 ✗ Spark Client 2 Timeout: 100ms Failover: 5ms Re-launch queries: 15ms
Making Sparrow feature- complete Interfacing with UI Delay scheduling Speculation
(2) Distributed, fault-tolerant scheduling with Sparrow Scheduler Work er (1) Diagnosing a Spark scheduling bottleneck