Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

Pamela Delgado, Diego Didona, Florin Dinu, Willy Zwaenepoel

I. Data-center scheduling
Figure: jobs 1 to N, each made up of tasks, are submitted to a scheduler that places the tasks on a cluster.
The context of this presentation is data-center scheduling.
Outline: Introduction, Eagle: Divide, Eagle: Stick to Your Probes, Evaluation, Conclusion

I. Data-center scheduling challenges
* Heterogeneous workloads: short vs long tasks
* Problem: head-of-line blocking (short behind long)
In data-center scheduling we face several challenges. The first is a workload that combines tasks with long execution times and tasks with short execution times. For the purpose of this talk, a job with short tasks is called a short job.

I. Data-center scheduling challenges
* Scheduler-induced stragglers
* Problem: non-job-aware scheduling
* Large scale: tens of thousands of tasks per second, tens of thousands of …
Figure: timeline of task 1 … task n of a job on the cluster; task x finishes later than the others and determines the job completion time.
When one task finishes later than the others, the job completion time suffers. Schedulers schedule at the task level, which leads to non-job-aware scheduling. Scale refers both to cluster size and to load.

II. Eagle Contributions
* Divide: novel technique to avoid head-of-line blocking
* Stick to Your Probes: decentralized job-awareness
* Hybrid scheduler
Eagle is built on top of a hybrid scheduler to get the necessary scalability. Hybrid means a mix of centralized and distributed scheduling.

I. Hybrid scheduling: long centralized
Figure: the centralized scheduler places long tasks (L) on cluster nodes.

I. Hybrid scheduling: short distributed
Figure: distributed schedulers send probes for short tasks (s) to cluster nodes, some of which hold long tasks (L).
Speaker note: not use late binding

II.1. Problem: Head-of-line blocking
* Short behind long: a short task is enqueued behind a long task that is either running or ahead of it in the queue
* High likelihood (long = many resources)
Figure: a node queue with a long task at the head and short tasks waiting behind it.

II.1. Rationale for Divide
Expected completion time of a task grows with the variance of task execution times*
DIVIDE by execution time: long tasks on one side, short tasks on the other
*Pollaczek-Khinchine formula: Queueing Systems, Volume 1: Theory. L. Kleinrock, 1975
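The rationale can be illustrated with the Pollaczek-Khinchine mean-value formula for an M/G/1 queue, W_q = λ·E[S²] / (2·(1 − ρ)) with ρ = λ·E[S]. The sketch below is only an illustration: treating a node as an M/G/1 queue, the function name `pk_mean_wait`, and the task durations and utilisation are all assumptions, not values from the talk.

```python
def pk_mean_wait(arrival_rate, service_times, probabilities):
    """Mean queueing delay of an M/G/1 queue (Pollaczek-Khinchine formula).

    service_times and probabilities describe a discrete service-time
    distribution; both are illustrative inputs.
    """
    mean_service = sum(p * s for s, p in zip(service_times, probabilities))
    second_moment = sum(p * s * s for s, p in zip(service_times, probabilities))
    utilisation = arrival_rate * mean_service
    if utilisation >= 1.0:
        raise ValueError("queue is unstable: utilisation must be below 1")
    return arrival_rate * second_moment / (2.0 * (1.0 - utilisation))


# Hypothetical workload: 90% short tasks (1 s) and 10% long tasks (100 s).
UTILISATION = 0.5
mixed_mean = 0.9 * 1.0 + 0.1 * 100.0

# One queue serving the mixed workload at 50% utilisation.
mixed_wait = pk_mean_wait(UTILISATION / mixed_mean, [1.0, 100.0], [0.9, 0.1])

# A queue that only ever sees short tasks, at the same utilisation.
short_only_wait = pk_mean_wait(UTILISATION / 1.0, [1.0], [1.0])

print(f"mixed queue wait: {mixed_wait:.2f} s")
print(f"short-only queue wait: {short_only_wait:.2f} s")
```

At equal utilisation, the queue that mixes short and long tasks has a far larger second moment of service time, so short tasks wait much longer there than in a queue reserved for short tasks.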

II.1. Dynamic division
Figure: cluster nodes divided into a partition for long tasks and a partition for short tasks.

II.1. Eagle – Divide
IDEA: Dynamic partitioning
Succinct State Sharing
* Centralized: send bitmap of nodes with long tasks
* Distributed: based on the bitmap, avoid nodes with long tasks
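A minimal sketch of the bitmap idea, assuming nodes are numbered from 0 and the bitmap is a plain Python integer with one bit per node. The function names `long_task_bitmap` and `pick_probe_targets` are hypothetical, and the talk does not describe how or when the bitmap is actually exchanged.

```python
import random


def long_task_bitmap(nodes_with_long_tasks, num_nodes):
    """Centralized side: set bit n when node n currently holds a long task."""
    bitmap = 0
    for node in nodes_with_long_tasks:
        if not 0 <= node < num_nodes:
            raise ValueError(f"node {node} is outside the cluster")
        bitmap |= 1 << node
    return bitmap


def pick_probe_targets(bitmap, num_nodes, num_probes, rng=random):
    """Distributed side: sample probe targets among nodes whose bit is clear."""
    candidates = [n for n in range(num_nodes) if not (bitmap >> n) & 1]
    return rng.sample(candidates, min(num_probes, len(candidates)))


# Hypothetical 8-node cluster in which nodes 0, 2 and 5 hold long tasks.
bitmap = long_task_bitmap({0, 2, 5}, num_nodes=8)
targets = pick_probe_targets(bitmap, num_nodes=8, num_probes=3,
                             rng=random.Random(0))
print(bin(bitmap), targets)
```

The state shared is one bit per node, which is what keeps it succinct; the distributed side needs nothing beyond the bitmap to steer short-task probes away from nodes with long tasks.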

II.1. Eagle – Divide
Figure: distributed schedulers send probes to cluster nodes. Nodes holding long tasks (L), which are placed by the centralized scheduler, reject the probes, and the rejected probes are rescheduled.

II.1. Eagle – Divide
* No head-of-line blocking
* Dynamic: mitigates resource wastage
* Scalable: no burden on the centralized scheduler
* Succinct: bitmap

II.2. Problem: stragglers
Figure: a distributed scheduler sends probes for task 1 and task 2 to nodes among n1 to n4. One task is still waiting to execute while another node is free.
Completely distributed schedulers, like those in Hawk, Sparrow, and Tarcil, send random probes to nodes.

II.2. Rationale
Expected completion time of a job is proportional to the number of jobs present in the system*
Better to finish one job entirely than to execute many jobs partially
Figure: jobs 1 to N, each made up of tasks.
*Little's formula: A Proof for the Queuing Formula: L = λW. J. D. C. Little, 1961
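Little's formula L = λW gives W = L/λ: at a fixed arrival rate, the mean time a job spends in the system shrinks when fewer jobs are in the system at once. A toy sketch of why finishing one job entirely helps, assuming a single node, two hypothetical jobs A and B with two unit-length tasks each, and made-up helper names:

```python
def job_completion_times(schedule, task_time=1.0):
    """Run tasks serially on one node.

    schedule lists a job id per task, in execution order. A job completes
    when its last task finishes.
    """
    clock = 0.0
    finish = {}
    for job in schedule:
        clock += task_time
        finish[job] = clock  # later tasks of the same job overwrite this
    return finish


def mean_completion(finish):
    return sum(finish.values()) / len(finish)


# Tasks of the two jobs interleaved: both jobs stay in the system longer.
interleaved = job_completion_times(["A", "B", "A", "B"])

# One job finished entirely before the other starts.
one_by_one = job_completion_times(["A", "A", "B", "B"])

print(interleaved, mean_completion(interleaved))
print(one_by_one, mean_completion(one_by_one))
```

The total work is the same in both orders, but the second order gets job A out of the system earlier without delaying job B, so the mean job completion time drops from 3.5 to 3.0 task times.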

II.2. Eagle - Stick to Your Probes
IDEA: Get a job out of the system ASAP
Sticky Batch Probing
* Probe STICKS to a node.
* Probe can execute more tasks.
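The effect of a sticky probe can be sketched with a small simulation. Everything here is a simplification under stated assumptions: the function `job_finish_time` is hypothetical, one probe per job is placed on each listed node, and the non-sticky baseline assumes each probe runs exactly one task and is then discarded.

```python
import heapq
from collections import deque


def job_finish_time(node_free_at, task_durations, sticky):
    """Finish time of one job whose probes wait on the given nodes.

    node_free_at[i] is the time at which the probe on node i reaches the
    head of that node's queue. With sticky=True the probe stays on its node
    after a task finishes and asks the job for another task.
    """
    pending = deque(task_durations)
    events = [(free_at, node) for node, free_at in enumerate(node_free_at)]
    heapq.heapify(events)
    finish = 0.0
    while pending and events:
        now, node = heapq.heappop(events)
        done = now + pending.popleft()
        finish = max(finish, done)
        if sticky:
            heapq.heappush(events, (done, node))  # probe sticks to the node
    if pending:
        raise ValueError("not enough probes to run every task")
    return finish


# Hypothetical scenario: node 0 is free now, node 1 is busy for 10 s.
# The job has two tasks of 1 s each.
without_sticking = job_finish_time([0.0, 10.0], [1.0, 1.0], sticky=False)
with_sticking = job_finish_time([0.0, 10.0], [1.0, 1.0], sticky=True)
print(without_sticking, with_sticking)
```

In this scenario the non-sticky job waits for the busy node and finishes at 11 s, while the sticky probe on the free node runs both tasks back to back and the job finishes at 2 s.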

II.2. Eagle - Stick to Your Probes
Figure: a distributed scheduler with task 1 and task 2 and nodes n1 to n4. The probe STICKS to its node.

II.2. Eagle – Stick to Your Probes
* Job-awareness
* Straggler mitigation
* Decentralized

II. Eagle – Recap
* Divide: dynamically divide nodes for short/long tasks
* Stick to your probes: a probe sticks to its node and is able to execute more tasks
* Hybrid scheduler
* Queue reorder: Shortest Remaining Processing Time (SRPT)
Related work has shown the advantages of queue reordering.
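SRPT reordering of a node queue can be sketched with a binary heap. This is a generic illustration, not Eagle's implementation: the class `SrptQueue` is hypothetical, the remaining-time estimate is simply passed in by the caller, and nothing here guards against starvation of long entries.

```python
import heapq
import itertools


class SrptQueue:
    """Node queue that always serves the entry with the least remaining time."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal estimates are served in FIFO order.
        self._counter = itertools.count()

    def push(self, task_id, remaining_time):
        heapq.heappush(self._heap, (remaining_time, next(self._counter), task_id))

    def pop(self):
        _, _, task_id = heapq.heappop(self._heap)
        return task_id

    def __len__(self):
        return len(self._heap)


# Hypothetical queue contents: (task id, estimated remaining time in seconds).
queue = SrptQueue()
for task_id, remaining in [("a", 30.0), ("b", 5.0), ("c", 5.0), ("d", 12.0)]:
    queue.push(task_id, remaining)

served = [queue.pop() for _ in range(len(queue))]
print(served)
```

Entries leave the queue in order of estimated remaining time rather than arrival order, which is the reordering the recap refers to.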

III. Evaluation - simulation
* Event-driven simulator
* Google trace: half a million jobs
* 15,000 nodes
* Measure: job running time
* Report short jobs: 50th, 90th and 99th percentiles

III.A. Hawk
* Hybrid scheduler
* Work stealing: free nodes steal tasks from other nodes to try to avoid head-of-line blocking
As we will see, this does not really avoid head-of-line blocking.

III.A. Eagle vs Hawk
Figure: short job running times (lower is better).
Better across the board
We show only short jobs because long jobs are scheduled in the same Least Work Left (LWL) fashion in both systems.

III.A. Eagle vs Hawk: why are we better?
Criteria compared between Eagle and Hawk:
* Head-of-line blocking: Eagle none, Hawk some
* Job-aware scheduler
* Queue reordering
Hawk's partitioning + stealing does not get rid of all short-behind-long cases, and stealing is randomized.

III.B. State-of-the-art (SOTA)
[Apollo+] Schedule all jobs in Least Work Left (LWL)
[Apollo+] Distributed: waiting times updated at heartbeat interval (Google: 3 s)
[Yaq-d*] Queue reordering: SRPT
+Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. E. Boutin et al., OSDI '14
*Efficient Queue Management for Cluster Scheduling. J. Rasley et al., EuroSys '16

III.B. Eagle vs SOTA
Figure: short job running times (lower is better).
* Better across the board
* Better at higher loads
* The same at lower loads

III.B. Eagle vs SOTA: why are we better?
* Eagle: more flexible task assignment
* SOTA: task assigned to one node
* SOTA heartbeats: stale information
* SOTA: concurrent scheduling

III. Evaluation - Implementation
* Spark plug-in
* 100-node cluster
* Subset of Google trace
* Measure: job running time
* Report short jobs: 50th, 90th and 99th percentiles
* Compare to Hawk
The other system was not available to us for comparison.

III. Evaluation - Implementation
Figure: results on a subset of the Google trace (lower is better).
* Eagle works well in a real cluster
* Better at higher loads
* The same at lower loads

IV. Conclusion
Eagle: two new techniques to improve the scheduling of data-parallel jobs in data centers
* Succinct State Sharing (Divide): no head-of-line blocking. SSS dynamically divides nodes into long and short partitions.
* Sticky Batch Probing (Stick to Your Probes): job-aware. With SBP, a probe sticks until the job is done.
