Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting June 13-14, 2002
Resource Management and Accounting Working Group
- Working Group Scope and Components
- Progress over last quarter
- Current issues being worked
- Next steps
- Discussions involving larger group
Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components:
- Queue Manager
- Scheduler
- Allocation Manager (and accounting)
- Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
- Process Manager
- Node Monitor
Proposed Component Architecture
[Diagram: proposed component architecture]
Components shown: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service
Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services
Resource Management Prototype Demonstration
[Diagram: prototype demonstration]
Components shown: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service
Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure
Numbered interactions:
0. Service-Lookup
1. Submit-Job
2. Query-Job
3. Query-Node
4. Create-Reservation
5. Run-Job
6. Exec-Process
7. Query-Job
8. Delete-Job
9. Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
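The numbered interactions in the demo can be sketched as a small simulation. This is only an illustration under stated assumptions: the step names come from the slide, but the `Job` class, the `run_demo` function, the time units, and the idea that the second Query-Job is where the overrun is noticed are all hypothetical and not taken from the SSS components.

```python
from dataclasses import dataclass

# Step names in the numbered order shown on the slide (0 through 9).
DEMO_STEPS = [
    "Service-Lookup",
    "Submit-Job",
    "Query-Job",
    "Query-Node",
    "Create-Reservation",
    "Run-Job",
    "Exec-Process",
    "Query-Job",
    "Delete-Job",
    "Withdraw-Allocation",
]


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions."""
    job_id: str
    wallclock_limit: int  # seconds (assumed unit)
    elapsed: int = 0
    state: str = "queued"


def run_demo(job, tick=10):
    """Walk the demo steps in order and return the trace of steps taken."""
    trace = []
    for step in DEMO_STEPS:
        if step == "Run-Job":
            job.state = "running"
        elif step == "Query-Job" and job.state == "running":
            # Assumption: the job keeps running until it exceeds its limit,
            # and this query is where the overrun is observed.
            while job.elapsed <= job.wallclock_limit:
                job.elapsed += tick
        elif step == "Delete-Job":
            # The job is only deleted because it overran its limit.
            if job.elapsed <= job.wallclock_limit:
                raise RuntimeError("job has not exceeded its wallclock limit")
            job.state = "deleted"
        trace.append(step)
    return trace


if __name__ == "__main__":
    job = Job("job-1", wallclock_limit=30)
    for number, step in enumerate(run_demo(job)):
        print(number, step)
    print(job.state, job.elapsed)
```

The first Query-Job leaves the queued job untouched; only the second one, issued after Run-Job, advances the simulated clock past the limit.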
General Progress
- Prototype components (Queue Manager and Allocation Manager) advanced to the stage of responding to basic requests over the XML protocol
- Existing components (Maui, PBS) partially modified to communicate with SSS components over XML
- We can now run a job entirely over the SSS protocol!
- Initial requirements documents for Allocation Manager and Queue Manager drafted
- Began initial draft of Scalable Systems Software Resource Management and Accounting Protocol (SSSRMAP)
Scheduler Progress
- Developed own XML parser/builder
- Converted to internal use of XML (job checkpointing etc.)
- Logically separated Node Monitor and Queue Manager interfaces
- Implemented and tested XML interface to Allocation Manager to create reservations and make allocation withdrawals
- Implemented and tested XML interface to Queue Manager to query, start and cancel jobs
- Implemented and tested XML interface to Node Monitor to query nodes
- Modified scheduler clients to allow SSS-0.1 socket protocol interface
- Modified checkjob output to display machine-readable AVP data
- Progress on log-based job (resourceXduration and node-mapping) GUI
Meta Scheduler Progress Call Dave in AM or get from Brett
Queue Manager Progress
- Initial Queue Manager server and clients supporting job submission, job query, job deletion and job startup
- Queue Manager and clients use XML over basic protocol
- Queue Manager supports challenge protocol for communications with the Process Manager
- Submission client submits job to Queue Manager, and Queue Manager reports status to user client
- Tested interaction with scheduler to return job information, start a job and cancel a job
- Job startup is supported via create-process commands with the Process Manager
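To illustrate what a job submission expressed in XML might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Request`, `action`, `Job`, `User`, `Command`, `WallclockLimit`) are invented for illustration; the slides do not show the actual SSS message format, so none of this should be read as the real protocol.

```python
import xml.etree.ElementTree as ET


def build_submit_request(user, command, wallclock_limit):
    """Build a hypothetical Submit-Job request and return it as an XML string."""
    request = ET.Element("Request", {"action": "Submit-Job"})
    job = ET.SubElement(request, "Job")
    ET.SubElement(job, "User").text = user
    ET.SubElement(job, "Command").text = command
    ET.SubElement(job, "WallclockLimit").text = str(wallclock_limit)
    return ET.tostring(request, encoding="unicode")


def parse_request(xml_text):
    """Parse the hypothetical request back into a plain dictionary."""
    request = ET.fromstring(xml_text)
    job = request.find("Job")
    return {
        "action": request.get("action"),
        "user": job.findtext("User"),
        "command": job.findtext("Command"),
        "wallclock_limit": int(job.findtext("WallclockLimit")),
    }


if __name__ == "__main__":
    message = build_submit_request("alice", "/bin/hostname", 60)
    print(message)
    print(parse_request(message))
```

A server-side handler would dispatch on the `action` value to cover the other operations listed above (query, deletion, startup).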
Allocation Manager Progress
- Completed first draft of initial requirements
- Reviewed requirements/design of other existing project management software
- Implemented audit log
- Preservation of historical state (distinct from audit log; allows statement creation and time travel)
- Support for operators and conjunctions in queries
- Reworked class structure and schema to support dynamic extensibility of objects and attributes
- Implemented cached metadata dictionary (for dynamic web GUIs and generic proxy handling of objects)
- Lots of work on the protocol
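As a rough illustration of "operators and conjunctions in queries", the sketch below filters records with a list of conditions, each carrying a comparison operator and an and/or conjunction. The condition tuple layout, the operator names, and the strict left-to-right evaluation (no precedence or grouping) are assumptions made for this example, not the Allocation Manager's actual query syntax.

```python
import operator

# Hypothetical operator names mapped to Python comparison functions.
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def matches(record, conditions):
    """Return True if the record satisfies the conditions.

    Each condition is (conjunction, field, op, value). The conjunction of the
    first condition is ignored; the rest are combined left to right.
    """
    result = None
    for conjunction, field, op, value in conditions:
        outcome = OPERATORS[op](record[field], value)
        if result is None:
            result = outcome
        elif conjunction == "and":
            result = result and outcome
        elif conjunction == "or":
            result = result or outcome
        else:
            raise ValueError(f"unknown conjunction: {conjunction}")
    # An empty condition list matches everything.
    return True if result is None else result


def select(records, conditions):
    """Return the records that satisfy the conditions."""
    return [record for record in records if matches(record, conditions)]


if __name__ == "__main__":
    allocations = [
        {"project": "chem", "amount": 100},
        {"project": "bio", "amount": 5},
        {"project": "chem", "amount": 2},
    ]
    query = [("and", "project", "eq", "chem"), ("and", "amount", "gt", 10)]
    print(select(allocations, query))
```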
Current Issues
- How best to provide XML interface for PBS
- Working with Software Engineering Working Group to decide on test framework
- Seeking to clarify interaction with node manager
- Determining which component is best suited to handle arbitrary batch-specific node features
Next Work
- Release initial resource management interface specification
- Incorporate security in RMA components
- All components under CVS
- Testing framework installed and first tests created for each component
Next Work: Local Scheduler
- Test interaction with checkpoint/restart mechanisms when interfaces are ready
- Lots of testing and write-up of new capabilities
- Certification of milestones (20% of bullet items ready to be checked off)
- Security integration
- Progress on graphical interfaces
Next Work: Queue Manager
- Documentation and packaging for easy site configuration (nearly done)
- Implementation of a backside database connection to provide job queue persistence across restarts of the Queue Manager
- Full challenge protocol support in clients and server
- Queue Manager support for more advanced jobs, job prologue/epilogue, and stdout/stderr handling
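The idea of job queue persistence across Queue Manager restarts can be sketched with a small database-backed store. This example uses Python's standard `sqlite3` module purely for illustration; the slides do not say which database is planned, and the `JobStore` class, table layout, and state names are hypothetical.

```python
import sqlite3


class JobStore:
    """Hypothetical persistent job queue backed by an SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        # Create the table on first start; later starts reuse existing rows.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "job_id TEXT PRIMARY KEY, "
            "owner TEXT NOT NULL, "
            "state TEXT NOT NULL)"
        )
        self.conn.commit()

    def submit(self, job_id, owner):
        # 'with self.conn' commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT INTO jobs (job_id, owner, state) VALUES (?, ?, 'queued')",
                (job_id, owner),
            )

    def set_state(self, job_id, state):
        with self.conn:
            self.conn.execute(
                "UPDATE jobs SET state = ? WHERE job_id = ?", (state, job_id)
            )

    def active_jobs(self):
        """Return (job_id, owner, state) for jobs not yet deleted, in submit order."""
        cursor = self.conn.execute(
            "SELECT job_id, owner, state FROM jobs "
            "WHERE state != 'deleted' ORDER BY rowid"
        )
        return cursor.fetchall()

    def close(self):
        self.conn.close()


if __name__ == "__main__":
    store = JobStore("jobs.db")  # file name is an arbitrary example
    print(store.active_jobs())
    store.close()
```

Because every change is committed before the call returns, a restarted server can rebuild its in-memory queue by reading `active_jobs()` at startup.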
Next Work: Allocation Manager
- Focus on getting QBank ready for bundling with SSS (security, use key, improved installation procedure)
- Focus effort on open-sourcing the new Allocation Manager (gold)
- Implement simple pricing engine
- Develop XML schema for external pricing
- Implementation of functional allocation and reservation mechanisms
- Security integration (gold)
Issues requiring inter-group discussion
- Framing mechanism
- Security protocol
- Need to solidify SSS-wide standards for packaging, testing, revision control, documentation, problem tracking, etc.