Resource Management and Accounting Working Group
  Working Group Scope and Components
  Progress made
  Current issues being worked
  Next steps
  Discussions involving larger group
Working Group Scope
  The Resource Management Working Group encompasses the areas of resource management, scheduling, and accounting. This working group will focus on the following software components:
  Job Manager (/Queue Manager)
  Scheduler
  Allocation Manager (and accounting)
  Meta Scheduler
Proposed Component Architecture
  [Diagram] Components: Job/Queue Manager, Allocation Manager, Collector, Meta Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service
  Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Config and Infrastructure
Proposed Component Architecture
  [Diagram] Components: Scheduler, PBS server, PBS Mom, Queue Manager, Process Manager, Collector, Node Monitor, Job Manager
  Groupings: Job Management, Node Management
Component Interaction Diagram: Job submitted to Queue Manager
  [Diagram] Components: User Interface, Node Manager, Meta Scheduler, Job Manager, Allocation Manager, Scheduler, Process Manager
Component Interaction Trace: Job submitted to Queue Manager
  1. A user submits a job to the Queue Manager
  2. The Queue Manager does a sanity balance check with the Bank
  3. The Queue Manager notifies the Scheduler that a new job has arrived
  4. The Scheduler queries node and job status until job can run
  5. A bank reservation is made with the Allocation Manager
  6. The Scheduler requests the Queue Manager to run the job
  7. The Queue Manager passes job control to the Process Manager
  8. The Process Manager notifies Queue Manager of job completion
  9. The Queue Manager notifies Scheduler of job completion
  10. A bank withdrawal is made with the Allocation Manager
  11. The user is notified of job completion
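The trace above can be walked through in code. The following is only a minimal sketch, not the working group's design: the class and function names (`AllocationManager`, `run_job`), the event labels, and the idea of modeling step 4 as a list of poll results are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AllocationManager:
    """Hypothetical stand-in for the bank: balances and reservations per account."""
    balances: dict
    reserved: dict = field(default_factory=dict)

    def has_funds(self, account, cost):
        # Funds already reserved for other jobs are not available.
        available = self.balances.get(account, 0) - self.reserved.get(account, 0)
        return available >= cost

    def reserve(self, account, cost):
        if not self.has_funds(account, cost):
            raise ValueError("insufficient funds")
        self.reserved[account] = self.reserved.get(account, 0) + cost

    def withdraw(self, account, cost):
        # Release the reservation and debit the account.
        self.reserved[account] -= cost
        self.balances[account] -= cost


def run_job(bank, account, cost, node_polls):
    """Return the ordered event labels for one job (steps 1-11 of the trace).

    node_polls is an assumed iterable of booleans: each entry is the result
    of one scheduler status query (True means the job can run now).
    """
    trace = ["submit"]                      # step 1
    if not bank.has_funds(account, cost):   # step 2
        trace.append("rejected")
        return trace
    trace.append("balance_check")
    trace.append("notify_scheduler")        # step 3
    for can_run in node_polls:              # step 4
        trace.append("poll")
        if can_run:
            break
    else:
        trace.append("still_queued")        # never became runnable
        return trace
    bank.reserve(account, cost)             # step 5
    trace.append("reserve")
    trace.append("run_request")             # step 6
    trace.append("process_manager_start")   # step 7
    trace.append("process_manager_done")    # step 8
    trace.append("scheduler_notified")      # step 9
    bank.withdraw(account, cost)            # step 10
    trace.append("withdraw")
    trace.append("user_notified")           # step 11
    return trace
```

For example, `run_job(AllocationManager({"proj": 100}), "proj", 30, [False, True])` polls twice, then reserves and withdraws 30 units from the hypothetical "proj" account.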
General Progress
  Creation of XML marshaller/unmarshaller
  Establishment of CVS repository
  Prototype demonstration: Scheduler makes a deposit to allocation manager using XML interface
Scheduler Progress
  Creation of SSS Resource Manager interface (RMType SSS; half-open sockets)
  Creation of SSS Allocation Manager interface
  Creation of allocation manager and resource manager objects for management of arbitrary attributes
  Integration of XML marshaller/unmarshaller
  Maui enhancements to link with C++ libs (Xerces)
  Additional regression tests
Meta Scheduler Progress
  Added support for data-staging interface
  Added support for network proximity optimization
  Initial support for checkpoint/restart
    – Checkpoint-aware statistics
    – Checkpoint-aware preemption optimizations
  Sqsub client created, allowing PBS-style jobs to be submitted and metascheduled
  Initial work on translation library (PBS->silver & silver->RS2)
  Stability enhancements
Job Manager Progress
  Initial job manager specification defined
  Interacted with process manager working group and drafted specification proposals for task manager and node manager, including how they will interact with RMWG components
  Initial study of PBS to determine whether it can be dissected into components and what functionality enhancements are possible
Allocation Manager Progress
  Draft requirements document underway
  XML schema version 0.3 reworked to have explicit request & response elements
  From-scratch allocation manager being used as a prototype to test the XML interface
  Implemented create, query, modify and delete for user, account and membership objects (interacting with database over JDBC)
Allocation Manager Progress (contd)
  Stubbed in dummy withdrawal and successfully demonstrated XML interface with scheduler (validating against schema)
  Logging, config files, error handling
  General-purpose dcecp-like client allows output formatting by utilizing metadata from queries
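To make the request/response idea concrete, here is a minimal sketch of create, query, modify and delete over an XML interface. The element and attribute names (`Request`, `Response`, `Field`, `action`, `object`, `status`) are hypothetical and are not taken from schema version 0.3; an in-memory dictionary stands in for the database, and no schema validation is performed.

```python
import xml.etree.ElementTree as ET


def build_request(action, obj, fields):
    """Serialize a request; the element names are illustrative assumptions."""
    req = ET.Element("Request", {"action": action, "object": obj})
    for name, value in fields.items():
        ET.SubElement(req, "Field", {"name": name}).text = str(value)
    return ET.tostring(req, encoding="unicode")


def handle_request(xml_text, store):
    """Parse a request, apply it to the in-memory store, return a response."""
    req = ET.fromstring(xml_text)
    action, obj = req.get("action"), req.get("object")
    fields = {f.get("name"): f.text for f in req.findall("Field")}
    table = store.setdefault(obj, {})   # one table per object type
    key = fields.get("name")            # assumed primary key
    resp = ET.Element("Response", {"action": action, "object": obj})
    status = "Success"
    if action == "Create" and key is not None and key not in table:
        table[key] = fields
    elif action == "Query" and key in table:
        for name, value in table[key].items():
            ET.SubElement(resp, "Field", {"name": name}).text = value
    elif action == "Modify" and key in table:
        table[key].update(fields)
    elif action == "Delete" and key in table:
        del table[key]
    else:
        status = "Failure"              # unknown action or missing/duplicate key
    resp.set("status", status)
    return ET.tostring(resp, encoding="unicode")
```

A caller would send `build_request("Create", "User", {"name": "alice"})` and inspect the `status` attribute of the returned response element.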
Current Issues
  Job Manager/Queue Manager as separate or unified components
  How to split up PBS (if at all) and at what levels (if any) to refit with XML interface
  Working with Software Engineering Working Group to decide on test framework
Next Work
  All components under CVS
  Establish initial resource management interface specifications for release
  Scheduler demos by next face-to-face:
    – Scheduler to process manager (over XML)
    – Scheduler to node manager (over XML)
    – Scheduler to job manager (over XML)
    – Drive an end-to-end checkpoint request
    – Scheduler talks to registry and discovery service
Next Work
  Job manager/queue manager milestones
    – Submission client submits job to queue manager and queue manager reports status to user client
    – Scheduler implements query to obtain job info from queue manager
    – Scheduler starts a job (requires implementation of task manager interface); also cancels a job
    – No prolog, epilog initially. Batch only. Simple single-step jobs. Supports polling mode only. No data-staging.
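The milestones above amount to a small job state machine driven by submit, query, start and cancel. The sketch below is one possible reading under stated assumptions: `QueueManager`, `JobState`, and the method names are hypothetical, jobs are single-step batch jobs, and status is obtained only by polling `query`.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    CANCELLED = "cancelled"


class QueueManager:
    """Hypothetical queue manager supporting polling-mode status queries."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, script):
        # Submission client hands over a single-step batch job.
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"script": script, "state": JobState.QUEUED}
        return job_id

    def query(self, job_id):
        # Polling mode: the scheduler or user client asks; nothing is pushed.
        return self._jobs[job_id]["state"]

    def start(self, job_id):
        # Issued by the scheduler; only a queued job may start.
        job = self._jobs[job_id]
        if job["state"] is not JobState.QUEUED:
            raise ValueError("job is not queued")
        job["state"] = JobState.RUNNING

    def cancel(self, job_id):
        # A queued or running job may be cancelled.
        job = self._jobs[job_id]
        if job["state"] is JobState.CANCELLED:
            raise ValueError("job already cancelled")
        job["state"] = JobState.CANCELLED
```

Prolog, epilog, and data-staging hooks are deliberately absent, matching the initial restrictions listed in the last milestone.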
Next Work
  Allocation manager
    – Completion of XML schema for remaining objects/services
    – Review of requirements (SDSC, NCSA …)
    – Complete (1st draft of) initial requirements
    – Implement machine class, allocations, reservations, withdrawals, transaction register, simple charging algorithm
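As an illustration of how reservations, withdrawals, a transaction register, and a simple charging algorithm might fit together, here is a minimal sketch. Everything in it is an assumption rather than the planned design: the charge formula (rate times nodes times wallclock hours), the per-machine-class rates, and the names `RATES`, `charge`, and `Account`.

```python
from dataclasses import dataclass, field

# Hypothetical rates in credits per node-hour, keyed by machine class.
RATES = {"standard": 1.0, "bigmem": 2.5}


def charge(machine_class, nodes, wallclock_seconds):
    """Assumed simple charging algorithm: rate * nodes * hours."""
    return RATES[machine_class] * nodes * wallclock_seconds / 3600.0


@dataclass
class Account:
    allocation: float                 # remaining credits
    reserved: float = 0.0             # credits held for jobs not yet charged
    register: list = field(default_factory=list)  # transaction register

    def reserve(self, job_id, amount):
        # Hold credits before the job runs, based on its requested wallclock.
        if self.allocation - self.reserved < amount:
            raise ValueError("insufficient allocation")
        self.reserved += amount
        self.register.append(("reserve", job_id, amount))

    def withdraw(self, job_id, reserved_amount, actual_amount):
        # After completion: release the hold and debit the actual usage.
        self.reserved -= reserved_amount
        self.allocation -= actual_amount
        self.register.append(("withdraw", job_id, actual_amount))
```

In this sketch a job is reserved at its requested wallclock time and charged at its actual wallclock time, so a job that finishes early returns the unused part of the reservation to the account.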
Issues requiring inter-group coordination
  Need to solidify SSS-wide standards for packaging, revision control, documentation, problem tracking, online project schedule… and establish mechanisms and places to host them.