Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting September 11-12, 2003 Washington D.C.


1 Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting September 11-12, 2003 Washington D.C.

2 Resource Management and Accounting Working Group
–Working group scope
–Progress over last quarter
–Next steps
–Topics for group consideration

3 Working Group Scope
The Resource Management Working Group covers resource management, scheduling and accounting. It focuses on the following software components:
–Queue Manager
–Scheduler
–Accounting and Allocation Manager
–Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
–Process Manager
–Cluster Monitor

4 Resource Management Component Architecture
Diagram components: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service, Event Manager. Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services.

5 Resource Management Prototype Demonstration
Diagram components: Job Submission Client, Queue Manager, Local Scheduler, Allocation Manager, Node Monitor, Process Manager, Discovery Service. Color key (working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure.
Message sequence: 0 Service-Lookup, 1 Submit-Job, 2 Query-Job, 3 Query-Node, 4 Create-Reservation, 5 Run-Job, 6 Exec-Process, 7 Query-Job, 8 Delete-Job, 9 Withdraw-Allocation.
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
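The demo's numbered messages can be replayed as a small ordered table. The slide only gives the message order, not which component sends each message or what happens when a job finishes in time, so the sketch below models ordering alone and assumes, as a labeled guess, that Delete-Job is needed only when the job overruns its wallclock limit. The function name `replay` and the state names are hypothetical.

```python
from typing import List, Tuple

# Message order from the demo slide. Which component sends each message
# is not stated there, so this sketch only models the ordering.
DEMO_STEPS: List[Tuple[int, str]] = [
    (0, "Service-Lookup"),       # locate the other components
    (1, "Submit-Job"),
    (2, "Query-Job"),
    (3, "Query-Node"),
    (4, "Create-Reservation"),
    (5, "Run-Job"),
    (6, "Exec-Process"),
    (7, "Query-Job"),            # job observed again while running
    (8, "Delete-Job"),           # assumed: only needed if the job overran
    (9, "Withdraw-Allocation"),  # charge the allocation for the job
]


def replay(wallclock_limit: int, runtime: int) -> Tuple[List[str], str]:
    """Replay the demo sequence for one job; return (messages sent, final state)."""
    sent: List[str] = []
    state = "Idle"
    for _step, message in DEMO_STEPS:
        if message == "Delete-Job" and runtime <= wallclock_limit:
            # Assumed behavior: a job that finishes within its limit
            # completes normally and is never deleted.
            state = "Completed"
            continue
        sent.append(message)
        if message == "Run-Job":
            state = "Running"
        elif message == "Delete-Job":
            state = "Removed"  # job ran past its wallclock limit
    return sent, state
```

Calling `replay(60, 120)` walks all ten messages and ends with the job removed, which is the path the demo exercises.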

6 General Progress
SSSRMAP v2.0 interface specification has been implemented and tested by most RMWG components [all except Silver Meta-Scheduler]
–Includes both HTTP Wire Protocol and XML Message Format
–Includes implementation of SSS Job Object definition v2.0
Implemented default SSSRMAP v2.0 security authentication for most components [all except Silver Meta-Scheduler]
–HMAC-SHA1 digital signature with shared secret key
–Canonicalizes the XML
System testing nearly complete for SSSRMAP v2 (on xtorc)
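The authentication scheme above (canonicalize the XML, then compute an HMAC-SHA1 digital signature with a shared secret key) can be sketched with the Python standard library. This is an illustration under assumptions, not the SSSRMAP wire format: it signs the whole message, uses Python's built-in C14N 2.0 canonicalizer (the canonicalization variant SSSRMAP specifies may differ), and base64-encodes the digest; the function names and the example key are hypothetical.

```python
import base64
import hashlib
import hmac
from xml.etree.ElementTree import canonicalize


def sign_message(xml_text: str, shared_key: bytes) -> str:
    """Base64-encoded HMAC-SHA1 over the canonical (C14N 2.0) form of xml_text."""
    # Canonicalization makes equivalent documents (e.g. reordered attributes,
    # extra whitespace inside tags) serialize to the same bytes before signing.
    canonical = canonicalize(xml_data=xml_text)
    digest = hmac.new(shared_key, canonical.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_message(xml_text: str, shared_key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_message(xml_text, shared_key)
    return hmac.compare_digest(expected, signature)
```

Canonicalizing before signing is what lets the receiver verify a message even if an intermediate parser reorders attributes or normalizes tag whitespace; any change to element content or attribute values still breaks the signature.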

7 General Progress
Created Node Object Specification version 1.0
–Differentiates between configured, available and utilized node properties
Drafted a proposed set of response/status codes
–Success, warning and error response codes
–Allows for component-specific and application-specific error codes
–Supports multiple levels of specificity
Portability testing is underway (for initial release components)
–Cobbled together a collection of machines to test on
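The distinction between configured, available and utilized node properties can be illustrated with a small data model. The field names, the property keys ("Processors", "Memory") and the rule that free capacity is available minus utilized are assumptions for this sketch; the Node Object Specification itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeProperty:
    """One countable node property tracked at three levels."""
    configured: int    # what the node is set up to provide
    available: int     # what is usable right now (e.g. after a hardware fault)
    utilized: int = 0  # what running work currently consumes

    def free(self) -> int:
        return max(self.available - self.utilized, 0)


@dataclass
class Node:
    name: str
    properties: Dict[str, NodeProperty] = field(default_factory=dict)

    def can_fit(self, request: Dict[str, int]) -> bool:
        """True if every requested amount fits in the currently free capacity."""
        return all(
            key in self.properties and self.properties[key].free() >= amount
            for key, amount in request.items()
        )
```

For example, a node configured with 2048 MB of memory but with only 1536 MB available, 512 MB of which is in use, can accept a 1024 MB request but not a 1536 MB one. A scheduler that looked only at the configured value would overcommit it.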

8 Portability Testing Progress

9 General Progress
Re-release of v1.0 Initial SSS Resource Management Suite
–OpenPBS-SSS 2.3.15-3 + sss_pbs_svr 1.0.3
–Maui Scheduler 3.2.6
–QBank 2.10.5 + sss_qbank_svr 1.0.3
Webpage for RMWG recreated
–Relevant documentation
–Software downloads (tarballs and RPMs)
–Mailing lists
–Links
–Bug tracking

10 Scheduler Progress
Implemented SSSRMAP v2 Wire Protocol (all clients, Resource Manager and Allocation Manager interfaces)
Implemented SSSRMAP v2 Message Format (75% of clients, job object for Resource Manager and Allocation Manager)
Implemented SSSRMAP authentication via shared secret keys
Added 64-bit support for HMAC algorithm

11 Scheduler Progress
Added limited support for SSSRMAP error codes
Improved multi-taskgroup support
Added man pages and command-line usage documentation
Added resource manager, allocation manager and grid scheduler diagnostics to assist in configuration and troubleshooting
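The response codes drafted on slide 7 (success, warning and error classes, room for component-specific and application-specific codes, and multiple levels of specificity) suggest a layered code a client can decode only as far as it needs. The three-digit layout below is entirely hypothetical and is not the drafted SSSRMAP table; it only shows how one code can serve both a coarse "did it work?" check and a detailed diagnosis.

```python
import re
from typing import NamedTuple

# Hypothetical three-digit layout, NOT the drafted SSSRMAP table:
#   first digit  -> class (success, warning, error)
#   second digit -> scope (generic, component-specific, application-specific)
#   third digit  -> detail within that scope
CLASSES = {"0": "Success", "1": "Warning", "2": "Error"}
SCOPES = {"0": "generic", "1": "component-specific", "2": "application-specific"}


class Status(NamedTuple):
    code: str
    status_class: str
    scope: str
    detail: int


def parse_status(code: str) -> Status:
    """Decode a status code; callers may stop at whatever level they understand."""
    if not re.fullmatch(r"[0-9]{3}", code):
        raise ValueError(f"expected a three-digit code, got {code!r}")
    if code[0] not in CLASSES:
        raise ValueError(f"unknown status class in {code!r}")
    scope = SCOPES.get(code[1], "reserved")
    return Status(code, CLASSES[code[0]], scope, int(code[2]))


def succeeded(code: str) -> bool:
    """Coarsest level of specificity: warnings still count as success."""
    return parse_status(code).status_class != "Error"
```

A client with only "limited support" for error codes can call `succeeded` and ignore the rest, while a component that defines its own codes reads the scope and detail digits.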

12 Queue Manager Progress
Settled on a name: "Bamboo"
Implemented SSSRMAP v2 wire protocol (including authentication)
Added SSSRMAP v2 XML message support for single-step jobs
Data storage implemented via flat files or an ODBC-compliant database
Many improvements in build system, run-time configuration, and logging
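Supporting both flat files and an ODBC-compliant database usually means putting storage behind one interface. The sketch below is not Bamboo's code: the class names are hypothetical, jobs are stored as JSON, and Python's built-in `sqlite3` stands in for an ODBC data source because an ODBC driver is not part of the standard library.

```python
import json
import os
import sqlite3
from abc import ABC, abstractmethod


class JobStore(ABC):
    """Storage interface the rest of the queue manager would program against."""

    @abstractmethod
    def save(self, job_id: str, job: dict) -> None: ...

    @abstractmethod
    def load(self, job_id: str) -> dict: ...


class FlatFileJobStore(JobStore):
    """One JSON file per job in a spool directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, job_id: str) -> str:
        return os.path.join(self.directory, f"{job_id}.json")

    def save(self, job_id: str, job: dict) -> None:
        tmp = self._path(job_id) + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(job, f)
        os.replace(tmp, self._path(job_id))  # atomic rename on POSIX file systems

    def load(self, job_id: str) -> dict:
        try:
            with open(self._path(job_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            raise KeyError(job_id) from None


class SqlJobStore(JobStore):
    """Relational backend; sqlite3 stands in here for an ODBC data source."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
            )

    def save(self, job_id: str, job: dict) -> None:
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO jobs (id, body) VALUES (?, ?)",
                (job_id, json.dumps(job)),
            )

    def load(self, job_id: str) -> dict:
        row = self.conn.execute("SELECT body FROM jobs WHERE id = ?", (job_id,)).fetchone()
        if row is None:
            raise KeyError(job_id)
        return json.loads(row[0])
```

Both backends raise `KeyError` for an unknown job, so callers need not know which one a site configured.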

13 Accounting and Allocation Manager Progress
QBank
–Portability testing has begun: Linux, HP-UX, AIX and IRIX completed
Gold
–Implemented SSSRMAP v2.0 encryption: 3DES with a session key generated from the shared secret key; uses a compression algorithm
–Tested and verified SSSRMAP v2.0 authentication
–Added support for Role-Based Access Control (fine-grained command authorization)
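Role-Based Access Control for fine-grained command authorization can be sketched as users mapped to roles and roles mapped to (object, action) grants, with a wildcard for administrators and a default deny. The role names, users and grants below are hypothetical examples, not Gold's actual configuration.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (object, action); "*" matches anything

# Hypothetical roles and grants for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[Permission]] = {
    "SystemAdmin": {("*", "*")},
    "ProjectManager": {("Allocation", "Deposit"), ("Allocation", "Transfer"),
                       ("Allocation", "Balance")},
    "User": {("Allocation", "Balance"), ("Job", "Query")},
}

USER_ROLES: Dict[str, Set[str]] = {
    "root": {"SystemAdmin"},
    "amy": {"ProjectManager", "User"},
    "bob": {"User"},
}


def is_authorized(user: str, obj: str, action: str,
                  user_roles: Dict[str, Set[str]] = USER_ROLES,
                  role_permissions: Dict[str, Set[Permission]] = ROLE_PERMISSIONS) -> bool:
    """Allow the command if any of the user's roles grants (obj, action)."""
    for role in user_roles.get(user, set()):
        for granted_obj, granted_action in role_permissions.get(role, set()):
            if granted_obj in ("*", obj) and granted_action in ("*", action):
                return True
    return False  # default deny
```

Checking per (object, action) pair rather than per command-line tool is what makes the authorization fine-grained: a user may query jobs without being able to deposit into an allocation.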

14 Accounting and Allocation Manager Progress
Gold
–Object interface in web-based GUI has been implemented (gives you powerful low-level access to allocation manager objects)
–Reimplemented the Gold client in Perl to overcome latency issues inherent in Java startup overhead
–Created a suite of full-featured Perl command-line clients managing Users, Projects, Machines, Allocations (deposits, withdrawals, refunds, transfers, balance,…), Reservations, Quotations, Jobs, Usage, Transactions
–Installed Gold on the PNNL 11.8TF Linux cluster in transparent mode to test coherency and stability
–Slow progress on the open-source front: blanket approval letter came out June 24th (CC03-0754); decision not to commercialize; public domain vs. open source (copyright issues)
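The allocation operations the Gold clients manage (deposits, withdrawals, refunds, transfers and balance queries) can be illustrated with a toy ledger that records every transaction. This is a sketch under assumptions, not Gold's data model: accounts are plain strings, amounts are integer credits (for example processor-seconds), and overdrafts are rejected.

```python
from typing import Dict, List, Tuple


class AllocationLedger:
    """Toy allocation bank: balances in abstract integer credits."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.transactions: List[Tuple[str, str, int]] = []  # (kind, account, signed amount)

    @staticmethod
    def _require_positive(amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")

    def balance(self, account: str) -> int:
        return self.balances.get(account, 0)

    def _credit(self, kind: str, account: str, amount: int) -> None:
        self._require_positive(amount)
        self.balances[account] = self.balance(account) + amount
        self.transactions.append((kind, account, amount))

    def _debit(self, kind: str, account: str, amount: int) -> None:
        self._require_positive(amount)
        if self.balance(account) < amount:
            raise ValueError(f"insufficient allocation in {account!r}")
        self.balances[account] = self.balance(account) - amount
        self.transactions.append((kind, account, -amount))

    def deposit(self, account: str, amount: int) -> None:
        self._credit("Deposit", account, amount)

    def withdraw(self, account: str, amount: int) -> None:
        self._debit("Withdraw", account, amount)

    def refund(self, account: str, amount: int) -> None:
        self._credit("Refund", account, amount)

    def transfer(self, source: str, target: str, amount: int) -> None:
        self._debit("Transfer", source, amount)  # fails before crediting if short
        self._credit("Transfer", target, amount)
```

Depositing 1000 into one account, withdrawing 300, refunding 50 and transferring 250 to a second account leaves balances of 500 and 250, with five entries in the transaction log.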

15 Meta-Scheduler Progress
Documentation for installation, configuration, and troubleshooting

16 Future Work
User-Oriented Problem Response System
Complete portability testing for initial release components
–(at least Linux, AIX and other UNIX variants)
Release alpha versions of new components
–(Bamboo, Silver, Gold)
Begin portability testing for new components
Create per-component interface specification documents (binding to SSSRMAP)
Complete design specification documents for new components

17 Future Work
Local Scheduler
–Complete integration of SSSRMAP v2 for queue and node objects
–Support full suite of allocation manager interface calls
–Full support for error codes
–Enhance dynamic job support with queue/task manager

18 Future Work
Local Scheduler
–Support multi-source resource management interface
–Continued progress in resource limit enforcement and tracking
–Full resource limit enforcement and tracking configuration
–Integrate with checkpoint/restart capability (when available)

19 Future Work
Queue Manager
–Finish prologue/epilogue support once exit codes are available from the process manager
–Interface with Node Monitor (probably after initial release)
–I/O staging (may need an API from the process manager)
–Full multi-step job support
–Package code for distribution
–Add support for an optional site job submission verification script
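Why prologue/epilogue support waits on exit codes can be seen in a small driver: the epilogue typically needs to know how the job ended. In this sketch, Python's `subprocess` stands in for the process manager, and two policies are assumptions rather than Bamboo behavior: the job's exit code is passed as the epilogue's last argument, and a failing prologue prevents the job from starting. The function name `run_with_hooks` is hypothetical.

```python
import subprocess
from typing import Dict, List, Optional


def run_with_hooks(job_cmd: List[str], prologue_cmd: List[str],
                   epilogue_cmd: List[str]) -> Dict[str, Optional[int]]:
    """Run prologue, job and epilogue in turn, returning each exit code.

    The job's exit code is passed as the epilogue's last argument; that is
    the information the queue manager needs from the process manager.
    """
    prologue = subprocess.run(prologue_cmd)
    if prologue.returncode != 0:
        # Assumed policy: a failing prologue prevents the job from starting.
        return {"prologue": prologue.returncode, "job": None, "epilogue": None}
    job = subprocess.run(job_cmd)
    epilogue = subprocess.run(epilogue_cmd + [str(job.returncode)])
    return {"prologue": 0, "job": job.returncode, "epilogue": epilogue.returncode}
```

With a real process manager the `subprocess.run` calls would become requests to that component, but the dependency is the same: without the job's exit code, the epilogue cannot report or clean up correctly.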

20 Future Work
Accounting and Allocation Manager
–User and admin interface for Gold: a web-based GUI will be developed
–Integration with Directory Service
–Open-source Gold (BSD license)
–SSL for the web GUI and password authentication
–Production testing of Gold on the 11.8TF Linux cluster (side-by-side with QBank)

21 Future Work
Meta Scheduler
–Implement SSSRMAP v2 Wire Protocol and Message Format
–Add allocation manager interface support
–Add threaded support for cluster (local) scheduler interface

22 Issues requiring inter-group discussion
Response codes
SC03
User-Oriented Problem Response System
Need process exit codes from the process manager
Cluster Monitor
Open source
