Published by Linda Underwood. Modified over 9 years ago.
1
Scheduling in HPC Resource Management Systems: Queuing vs. Planning
Matthias Hovestadt, Odej Kao, Axel Keller, and Achim Streit
2003 Job Scheduling Strategies for Parallel Processing (JSSPP) Workshop
Jerry Chou, 8/29/2005
2
Outline
- Background
- Queuing and Planning Systems
- Advanced Planning Functions
- Example: Computing Center Software
- Conclusion
- Discussion
3
Background
- HPC systems are operated by resource management systems (RMS) based on the queuing approach
  - PBS, SGE, LoadLeveler, etc.
- Grid middleware emerges between resource management systems and applications
  - Globus, vgES, etc.
- High-level functions (co-allocation) need features from the RMS
  - Advanced reservation, quality of service
- These features are hard to realize with a queuing-based RMS because it only considers present resource usage
- => This paper proposes planning systems to close the gap
4
Big Picture (diagram; layers from top to bottom)
- Application
- Grid middleware: Globus, vgES
- RMS: PBS, LoadLeveler, SGE, Condor
- Resources
- Functions labeled in the diagram: co-allocation, QoS, advanced reservation
5
Queuing and Planning Systems
- Queuing Systems
- Planning Systems
- Queuing vs. Planning Systems
6
Queuing Systems
- Queues have different limits on the resource requests
  - Number of resources requested
  - Execution time
  - Interactive/batch jobs
- Jobs in a queue are sorted by the scheduling policy
  - The highest-priority request is the queue head
- If more than one queue head can be started, further criteria are needed, such as queue priority
- If no queue head can be started, the idle resources may be utilized with backfilling
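A minimal sketch of a single queue with backfilling, assuming one machine with a fixed node count and jobs that carry a runtime estimate. The names `Job` and `dispatch` are illustrative and not taken from the paper or any RMS. The backfill rule used here is the common EASY-style variant: a lower-priority job may start only if it does not delay the earliest possible start of the queue head. The sketch also assumes no job requests more nodes than the machine has.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int       # estimated runtime, needed for backfilling
    priority: int = 0  # higher value = closer to the queue head


def dispatch(queue, running, total_nodes, now=0):
    """Return the names of the jobs to start at time `now`.

    queue:   waiting jobs
    running: list of (estimated_end_time, nodes) for running jobs
    """
    free = total_nodes - sum(n for _, n in running)
    waiting = sorted(queue, key=lambda j: -j.priority)  # stable sort
    started = []

    # Start queue heads for as long as they fit.
    while waiting and waiting[0].nodes <= free:
        job = waiting.pop(0)
        free -= job.nodes
        running = running + [(now + job.runtime, job.nodes)]
        started.append(job.name)
    if not waiting:
        return started

    # The head is blocked: find the earliest time it could start.
    head = waiting[0]
    avail, head_start = free, now
    for end, nodes in sorted(running):
        avail += nodes
        head_start = end
        if avail >= head.nodes:
            break
    spare = avail - head.nodes  # nodes the head will not need at head_start

    # Backfill: use idle nodes without delaying the head.
    for job in waiting[1:]:
        if job.nodes > free:
            continue
        done_in_time = now + job.runtime <= head_start
        if done_in_time or job.nodes <= spare:
            free -= job.nodes
            if not done_in_time:
                spare -= job.nodes
            started.append(job.name)
    return started
```

With 8 nodes, 6 of them busy until time 10, and a queue of A (4 nodes), B (2 nodes, runtime 5) and C (2 nodes, runtime 20), the head A cannot start, and B is backfilled into the two idle nodes.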
7
Planning Systems - Replanning
- Requested start time
- Estimated run time
- When
  - A new request is submitted
  - A running request ends before its estimated end time
- How
  - Delete all non-reservations from the schedule
  - Sort non-reservations according to the scheduling policy
  - Arrange reservations in the schedule
  - Insert non-reservations into the schedule at the earliest possible start time
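The replanning steps can be sketched as follows, assuming a single machine with `total` identical nodes. `Request`, `fits`, and `replan` are hypothetical names, the default policy is submission order, and the sketch assumes reservations do not conflict with each other and no request exceeds the machine size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    nodes: int
    runtime: int                 # estimated runtime
    start: Optional[int] = None  # fixed start time => reservation


def fits(plan, t, req, total):
    """True if `req` can run in [t, t + runtime) next to the planned entries."""
    end = t + req.runtime
    # Usage can only rise at t or where an already planned entry starts.
    points = [t] + [s for s, _, _, _ in plan if t < s < end]
    return all(
        sum(n for s, e, n, _ in plan if s <= p < e) + req.nodes <= total
        for p in points
    )


def replan(requests, total, now=0, policy=lambda r: 0):
    """Rebuild the schedule from scratch; returns {name: start_time}."""
    # 1. Drop all non-reservations (the plan starts empty).
    # 2. Sort the non-reservations by the scheduling policy (stable sort).
    others = sorted((r for r in requests if r.start is None), key=policy)
    # 3. Place the reservations at their fixed start times.
    plan = [(r.start, r.start + r.runtime, r.nodes, r.name)
            for r in requests if r.start is not None]
    # 4. Insert each non-reservation at its earliest possible start time.
    for r in others:
        candidates = sorted({now} | {e for _, e, _, _ in plan if e > now})
        t = next(c for c in candidates if fits(plan, c, r, total))
        plan.append((t, t + r.runtime, r.nodes, r.name))
    return {name: s for s, _, _, name in plan}
```

Because every request is placed at the earliest slot that fits, a short request submitted late can still land in a gap before a reservation, which is why backfilling comes for free in a planning system.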
8
Queuing vs. Planning Systems

                                     Queuing           Planning
Planning time frame                  Present           Present and future
Submission of resource requests      Insert in queue   Replanning
Assignment of proposed start time    No                All requests
Runtime estimates                    Not necessary     Yes
Reservation                          Not possible      Yes
Backfilling                          Optional          Yes
9
Advanced Planning Functions
- Requesting Resources
- Dynamic Aspects
- Service Level Agreements
10
Requesting Resources
- Diffuse requests
  - Give a range: "need 32~128 CPUs"
  - Let the RMS optimize: "need as many nodes as possible"
- Negotiation
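One way to read a diffuse request is as a range that the RMS resolves to a concrete node count at planning time. The sketch below is only an illustration of that idea; `resolve_diffuse` is a hypothetical helper, and "as many as possible" is modeled by leaving the upper bound open.

```python
from typing import Optional


def resolve_diffuse(min_nodes: int, max_nodes: Optional[int],
                    free_nodes: int) -> Optional[int]:
    """Pick a concrete node count for a request given as a range.

    max_nodes=None means "as many nodes as possible".
    Returns None if even the minimum cannot be satisfied.
    """
    if free_nodes < min_nodes:
        return None
    if max_nodes is None:
        return free_nodes
    return min(max_nodes, free_nodes)
```

For the range on the slide, `resolve_diffuse(32, 128, 100)` yields 100 nodes, and the same call with 200 free nodes is capped at 128.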
11
Dynamic Aspects
- Variable reservations
  - Make a reservation ASAP
  - Different from reserved jobs: no fixed start time
  - Different from non-reserved jobs: never planned later than its first planned start time
- Resource reclaiming
  - Replace requested resources at run time
- Automatic duration extension
  - Extend the runtime of jobs while they are running
  - How long can it be extended?
  - How many times can it be extended?
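The two open questions on automatic duration extension (how long, how many times) can be made explicit as limits that the planner checks before granting an extension. This is a sketch under assumptions: `RunningJob`, `extend`, and the limit parameters are hypothetical, and the only resource check is that the job must finish before the next planned request on the same nodes starts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningJob:
    end: int                  # currently planned end time
    extensions_used: int = 0  # how many extensions were granted so far
    extended_by: int = 0      # total time granted so far


def extend(job: RunningJob, extra: int, next_planned_start: Optional[int],
           max_extensions: int, max_total: int) -> bool:
    """Grant `extra` time if both limits hold and the nodes stay free."""
    if job.extensions_used >= max_extensions:
        return False  # extended too many times
    if job.extended_by + extra > max_total:
        return False  # extended for too long in total
    if next_planned_start is not None and job.end + extra > next_planned_start:
        return False  # would collide with the next planned request
    job.end += extra
    job.extensions_used += 1
    job.extended_by += extra
    return True
```

A rejected extension leaves the job untouched, so the caller can retry with a smaller `extra`.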
12
Dynamic Aspects (Cont.)
- Automatic restart
  - Can utilize short time slots in the schedule
- Space sharing "cycle stealing"
  - Run as a background job to steal resources in a space-sharing system (like Condor)
- Deployment servers
  - The RMS plans both the requested resources and the time to reconfigure the hardware
13
Service Level Agreements (SLA)
- SLAs have to be considered not only in the scheduling process but also during runtime
- At runtime the scheduler is not responsible for measuring the fulfillment of the SLA, but for providing all granted resources
14
Computing Center Software (CCS) Architecture
- User Interface (UI): provides a single access point to one or more systems
- Access Manager (AM): manages the user interfaces and is responsible for authentication, authorization, and accounting
- Planning Manager (PM): plans the user requests onto the machine
- Machine Manager (MM): provides machine-specific features
- Island Manager (IM): provides CCS-internal services and watchdog facilities to keep the island in a stable condition
15
Process Flow
- User: specifies the expected duration of each request
- MM: maps the schedule onto the machines
- PM: re-plans the schedule
- Fix-time request: reserves resources for a given time
- Var-time request: can move to an earlier time slot when replanning
- Flowchart (labels recovered from the diagram)
  - Requests go to the PM, which passes the schedule to the MM
  - The MM verifies whether the schedule can be realized with the available hardware
    - Yes: done
    - No: the MM finds an alternative time and sends a conflict list to the PM
  - Can the PM accept the conflict list? (Yes / No)
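The PM/MM exchange can be sketched as a verify-and-negotiate step. Everything here is an illustrative assumption rather than the CCS interface: the schedule is a dict of `name -> (start, runtime)`, hardware problems are modeled as `blocked` time windows in which the machine is unavailable, and the PM is assumed to accept a conflict list only if no fix-time request would have to move. The MM's alternative times are checked only against the hardware, not against each other, so after an acceptance the PM would still have to plan the moved requests again.

```python
def mm_verify(schedule, blocked):
    """MM side: return the conflict list {name: alternative_start}.

    schedule: {name: (start, runtime)}
    blocked:  list of (start, end) windows with unavailable hardware
    """
    conflicts = {}
    for name, (start, runtime) in schedule.items():
        alt = start
        moved = True
        while moved:  # push the start past every overlapping window
            moved = False
            for b_start, b_end in blocked:
                if alt < b_end and b_start < alt + runtime:
                    alt = b_end
                    moved = True
        if alt != start:
            conflicts[name] = alt
    return conflicts


def pm_accepts(conflicts, fix_time):
    """PM side: fix-time requests must keep their reserved start time."""
    return not any(name in fix_time for name in conflicts)


schedule = {"A": (0, 10), "B": (10, 5), "R": (20, 5)}
conflicts = mm_verify(schedule, blocked=[(8, 12)])
accepted = pm_accepts(conflicts, fix_time={"R"})
```

In this example the blocked window overlaps A and B but not the fix-time request R, so the conflict list only proposes new start times for var-time requests and the PM can accept it.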
16
Conclusion
- Classify and compare queuing systems with planning systems
- Present possible advanced planning functionality
- The aim of the paper is to show the benefit of planning systems for managing HPC machines
17
Discussion
- Do planning systems solve all the problems?
  - What if most jobs want to run ASAP?
  - What if the runtime is not estimated precisely?
- How do queuing systems and planning systems compare in performance and utilization?
- If you were a resource provider, would you use it?
- What features could be provided by vgES?
  - Diffuse requests
  - Resource reclaiming
  - Variable reservation
  - Negotiation