1 Coscheduling in Clusters: Is it a Viable Alternative?
Gyu Sang Choi, Jin-Ha Kim, Deniz Ersoz, Andy B. Yoo, Chita R. Das
Presented by: Richard Huang
2 Outline
- Evaluation of scheduling alternatives
- Proposed HYBRID coscheduling
- Evaluation
- Conclusions
- Discussion
3 Evaluation of Scheduling Alternatives
- Local scheduling
  - The processes of a parallel job are scheduled independently on each node
- Batch scheduling
  - Most popular approach (Maui, PBS, etc.)
  - Avoids memory swapping, but suffers from low utilization and high completion time
- Gang scheduling
  - All processes of a job (the gang) are scheduled together for simultaneous execution
  - Faster completion time, but incurs global synchronization costs
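Gang scheduling is often pictured as an Ousterhout matrix: rows are time slots, columns are nodes, and every process of a job sits in the same row so the whole gang runs at once. The Python sketch below fills such a matrix with a first-fit policy; the function name `build_ousterhout_matrix` and the first-fit placement are illustrative assumptions, not the gang scheduler evaluated in the paper.

```python
from typing import Dict, List, Optional

Slot = Optional[str]  # job name occupying a (time slot, node) cell, or None if idle


def build_ousterhout_matrix(jobs: Dict[str, int], num_nodes: int) -> List[List[Slot]]:
    """Place every process of each job in one time slot (row) so the whole gang
    runs simultaneously. First-fit: reuse the first row with enough idle nodes,
    otherwise open a new row. `jobs` maps job name to number of processes."""
    matrix: List[List[Slot]] = []
    for job, size in jobs.items():
        if size > num_nodes:
            raise ValueError(f"job {job} needs {size} nodes; cluster has {num_nodes}")
        for row in matrix:
            idle = [node for node, slot in enumerate(row) if slot is None]
            if len(idle) >= size:
                for node in idle[:size]:
                    row[node] = job
                break
        else:  # no existing time slot could hold the whole gang
            new_row: List[Slot] = [None] * num_nodes
            for node in range(size):
                new_row[node] = job
            matrix.append(new_row)
    return matrix


if __name__ == "__main__":
    for t, row in enumerate(build_ousterhout_matrix({"A": 3, "B": 2, "C": 4}, num_nodes=4)):
        print(f"slot {t}: {row}")
```

A gang scheduler would then cycle through the rows, one per time quantum; switching every node from one row to the next at the same moment is where the global synchronization cost comes from.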
4 Communication-Driven Coscheduling
- Dynamic Coscheduling (DCS)
  - An incoming message triggers scheduling of the process it is destined for
- Spin Block (SB)
  - A process waiting for a message spins for a fixed amount of time before blocking itself
- Periodic Boost (PB)
  - Periodically boosts the priority of processes with unconsumed messages
- Coordinated Coscheduling (CC)
  - Optimizes spinning time to improve performance at both the sender and the receiver
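The spin-versus-block trade-off behind SB can be made concrete with a simplified cost model. In the sketch below, `arrival` is the time until the awaited message arrives and `switch_cost` is an assumed, hypothetical cost of blocking and later being woken up; neither the model nor any numbers come from the paper. A `spin_limit` of 0 gives immediate blocking, and an infinite limit gives pure spinning.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitOutcome:
    cpu_wasted: float  # CPU time the waiting process holds without doing useful work
    latency: float     # time from entering the receive until the process resumes


def spin_block_wait(arrival: float, spin_limit: float, switch_cost: float) -> WaitOutcome:
    """Simplified Spin Block (SB) model: spin for up to `spin_limit`, then block.

    arrival: time until the awaited message arrives (0 means it is already there)
    spin_limit: fixed spin time before blocking (0 = immediate blocking,
                math.inf = pure spinning)
    switch_cost: hypothetical cost of blocking plus being woken up again
    """
    if arrival <= spin_limit:
        # The message arrived while spinning, so no context switch is needed.
        return WaitOutcome(cpu_wasted=arrival, latency=arrival)
    # Spun for spin_limit, then blocked; other ready processes can use the CPU
    # until the message arrives and this process is woken up.
    return WaitOutcome(cpu_wasted=spin_limit + switch_cost, latency=arrival + switch_cost)


if __name__ == "__main__":
    for limit in (0.0, 20.0, math.inf):
        print(limit, spin_block_wait(arrival=50.0, spin_limit=limit, switch_cost=10.0))
```

With `arrival=50` and `switch_cost=10` (arbitrary units), pure spinning resumes earliest but holds the CPU for 50 units, while immediate blocking costs only 10 and leaves the CPU free for other ready processes; which choice wins depends on how long messages actually take to arrive.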
5 HYBRID Coscheduling
- Idea: combine the merits of gang scheduling and communication-driven coscheduling
  - Coschedule ALL processes of a job, as a gang scheduler does
  - Boost process priority during communication phases
- Issues
  - How to differentiate between computation and communication phases?
  - How to ensure fairness during boosting?
6 HYBRID Coscheduling
- Boost a parallel process's priority whenever it enters a collective communication phase
- Use immediate blocking at both the sender and the receiver
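A minimal Python sketch of the HYBRID hooks, assuming a message-passing layer that can adjust priorities in the local scheduler. `HybridMessageLayer`, `Process`, `pick_next`, and the priority values are hypothetical names for illustration. Restoring the priority when the collective ends is one assumed way to address the fairness question, not necessarily what the authors implemented.

```python
from dataclasses import dataclass, field
from typing import List

BASE_PRIORITY = 0
BOOSTED_PRIORITY = 10  # hypothetical value; a real system would adjust the OS priority


@dataclass
class Process:
    name: str
    priority: int = BASE_PRIORITY
    blocked: bool = False
    log: List[str] = field(default_factory=list)


def pick_next(ready: List[Process]) -> Process:
    """Local scheduler: run the highest-priority process that is not blocked.
    Raises ValueError if every process is blocked."""
    runnable = [p for p in ready if not p.blocked]
    return max(runnable, key=lambda p: p.priority)


class HybridMessageLayer:
    """Hypothetical hooks inside a message-passing library's collective calls."""

    def enter_collective(self, proc: Process, op: str) -> None:
        proc.priority = BOOSTED_PRIORITY  # boost once, at the start of the collective
        proc.log.append(f"boost:{op}")

    def wait(self, proc: Process, message_ready: bool) -> None:
        if not message_ready:
            proc.blocked = True  # immediate blocking: no spinning at all
            proc.log.append("block")
        # If the message is already there, the process simply continues.

    def deliver(self, proc: Process) -> None:
        proc.blocked = False  # woken up; still boosted, so it is picked next
        proc.log.append("wake")

    def exit_collective(self, proc: Process) -> None:
        proc.priority = BASE_PRIORITY  # assumed fairness step: drop the boost
        proc.log.append("unboost")


if __name__ == "__main__":
    layer = HybridMessageLayer()
    mpi, serial = Process("mpi_rank"), Process("serial_job")
    layer.enter_collective(mpi, "MPI_Allreduce")
    layer.wait(mpi, message_ready=False)
    print(pick_next([mpi, serial]).name)  # the other process uses the CPU meanwhile
    layer.deliver(mpi)
    print(pick_next([mpi, serial]).name)  # the boosted process runs as soon as it wakes
    layer.exit_collective(mpi)
```

Because the boost happens once per collective rather than once per incoming message, this sketch needs no NIC-driven priority changes, which mirrors the interrupt-overhead argument on the Explanations slide.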
7 Traditional and Generic Coscheduling Framework
8 Evaluation
- 16-node Linux cluster connected through a 16-port Myrinet switch
- Workload: a mix of 100 applications from the NAS Parallel Benchmarks
- Two job allocation schemes
  - PACKING: contiguous nodes are assigned to a job to reduce system fragmentation and increase system utilization
  - NO PACKING: the parallel processes of a job are randomly allocated to available nodes in the system
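The two allocation policies can be sketched as follows, assuming nodes are identified by integers (0 to 15 on this cluster) and that adjacent IDs count as contiguous nodes. The function names are hypothetical, and the PACKING variant is a simple first-fit search rather than the allocator used in the evaluation. Returning `None` means the job has to wait.

```python
import random
from typing import List, Optional


def allocate_packing(free_nodes: List[int], size: int) -> Optional[List[int]]:
    """PACKING: return the first run of `size` contiguous free node IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    free = sorted(free_nodes)  # node IDs are assumed to be distinct
    for i in range(len(free) - size + 1):
        window = free[i:i + size]
        if window[-1] - window[0] == size - 1:  # distinct sorted IDs with no gaps
            return window
    return None


def allocate_no_packing(free_nodes: List[int], size: int,
                        rng: random.Random) -> Optional[List[int]]:
    """NO PACKING: pick `size` free nodes uniformly at random."""
    if len(free_nodes) < size:
        return None
    return sorted(rng.sample(free_nodes, size))


if __name__ == "__main__":
    free = [0, 1, 3, 4, 5, 8]
    print(allocate_packing(free, 3))
    print(allocate_no_packing(free, 3, random.Random(42)))
```

With free nodes `[0, 1, 3, 4, 5, 8]`, PACKING places a 3-process job on `[3, 4, 5]` but cannot place a 4-process job at all, even though six nodes are free; this fragmentation is what packing tries to reduce over time.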
9 Performance Comparison
10 Observations
- PACKING yields an average performance gain of about 20% over NO PACKING
- Under high load, the large differences are due to waiting times
- Under light load, the difference in execution time is more pronounced
- The batch scheduler has the lowest execution time, followed by HYBRID
- HYBRID has the lowest completion time of all the scheduling schemes
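The gap between the execution-time and completion-time rankings follows from how completion time is usually defined for a job $j$: the time it waits before starting plus the time it executes. Averaged over the $N = 100$ jobs of this workload:

```latex
T^{(j)}_{\text{completion}} = T^{(j)}_{\text{wait}} + T^{(j)}_{\text{exec}},
\qquad
\bar{T}_{\text{completion}} = \frac{1}{N}\sum_{j=1}^{N}\left(T^{(j)}_{\text{wait}} + T^{(j)}_{\text{exec}}\right)
```

Under this decomposition, the batch scheduler can have the lowest execution time and still a higher completion time than HYBRID whenever its jobs spend longer waiting, which is consistent with waiting times dominating under high load.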
11 Explanations
- HYBRID avoids unnecessary spinning
  - A process is blocked immediately if the communication operation is not complete
- HYBRID reduces communication delay
  - A process wakes up immediately upon receipt of a message, since its priority is boosted
- HYBRID avoids interrupt overheads
  - CC, DCS, and PB rely on frequent interrupts from the NIC to the CPU to boost a process's priority
  - HYBRID boosts priority only at the beginning of an MPI collective communication
- HYBRID avoids the global synchronization overhead of gang scheduling
  - HYBRID follows implicit coscheduling
12 Other Results
- Communication-driven coscheduling should deploy a memory-aware allocator to avoid expensive disk activity
- Completing jobs faster can lead to energy savings through dynamic voltage scaling or by shutting down machines
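A memory-aware allocator can be sketched as an admission check: place a process on a node only if that node still has enough free physical memory, and otherwise let the job wait instead of oversubscribing memory and triggering swapping. The per-node free-memory map, the one-process-per-node assumption, and the function name `memory_aware_nodes` are hypothetical; the slides do not describe the allocator's design.

```python
from typing import Dict, List, Optional


def memory_aware_nodes(free_mem_mb: Dict[int, int], procs: int,
                       mem_per_proc_mb: int) -> Optional[List[int]]:
    """Choose `procs` nodes (one process each) that can hold the process without
    exceeding physical memory, preferring nodes with the most free memory.
    Returns None, meaning the job keeps waiting, rather than forcing swapping."""
    fits = [node for node, free in free_mem_mb.items() if free >= mem_per_proc_mb]
    if len(fits) < procs:
        return None
    fits.sort(key=lambda node: free_mem_mb[node], reverse=True)
    return sorted(fits[:procs])


if __name__ == "__main__":
    free = {0: 900, 1: 200, 2: 600, 3: 512}
    print(memory_aware_nodes(free, procs=2, mem_per_proc_mb=512))
    print(memory_aware_nodes(free, procs=4, mem_per_proc_mb=512))
```

Rejecting node 1 here trades a longer wait for the job against the much larger cost of paging to disk once every process on that node starts competing for memory.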
13 Conclusions
- Coscheduling mechanisms such as HYBRID, SB, or CC can yield significant performance improvements
- Blocking-based scheduling techniques performed better because other processes in the ready state can proceed
- HYBRID is the best performer and can be easily implemented on any platform, since it requires modifications only in the message-passing layer
- New techniques deployed on clusters should avoid expensive memory swapping
- Improved scheduling efficiency can translate into a better performance-energy ratio
14 Discussion
- Can it be true that blocking is always better than spinning?
- How likely is it that clusters and supercomputers will move away from batch scheduling?
- Do people try to save energy by improving scheduling algorithms?