Resource Management Working Group SSS Quarterly Meeting November 28, 2001 Dallas, Tx.

Presentation transcript:

Resource Management Working Group SSS Quarterly Meeting November 28, 2001 Dallas, Tx

Resource Management and Accounting Working Group
  Working group scope and components
  Progress made
  Current and future issues
  Next steps

Working Group Scope
The Resource Management Working Group encompasses the areas of resource management, scheduling, and accounting. This working group will focus on the following software components:
  Queue Manager
  Scheduler
  Allocation Manager
  Meta Scheduler
Our charter will also encompass the following capabilities:
  Accounting
  Usage Reports

Phase 1 Milestones
  6 months: Contribute to checkpoint/restart report with regard to scheduling-related aspects
  12 months: Establish and release initial resource management interface specifications
  12 months: Establishment of the CVS repository and module structure, agreement on document conventions
  12 months: Finalized API for system-initiated checkpoint/restart of parallel MPI jobs on Linux systems
  18 months: Release v1.0 of the Center’s resource management system based on existing open source code and the results of the scalability testing

High Level Progress
  Establishing high level design covering initial component functionality and required interfaces
  Determining inter-group requirements (GUI, security, IS, process management, etc.)
  Preparing existing tools (Maui, Silver, QBank) for use within SSS
  Creating infrastructure within which to develop and test RM deliverables
  Creating infrastructure within which to develop and test intra- and inter-group interfaces

Proposed Component Architecture
[Diagram: component boxes labeled Queue Manager, Allocation Manager, Collector, Meta Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service. Color key by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Config and Infrastructure.]

Component Interaction Diagram: Job submitted to Queue Manager
[Diagram: User Interface, Collector, Meta Scheduler, Queue Manager, Allocation Manager, Scheduler, Process Manager]

Component Interaction Trace: Job submitted to Queue Manager
  1. A user submits a job to the Queue Manager
  2. The Queue Manager does a sanity balance check with the Bank
  3. The Queue Manager notifies the Scheduler that a new job has arrived
  4. The Scheduler queries node and job status until the job can run
  5. A bank reservation is made with the Allocation Manager
  6. The Scheduler requests the Queue Manager to run the job
  7. The Queue Manager passes job control to the Process Manager
  8. The Process Manager notifies the Queue Manager of job completion
  9. The Queue Manager notifies the Scheduler of job completion
  10. A bank withdrawal is made with the Allocation Manager
  11. The user is notified of job completion
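The trace above can be walked through in a few lines of Python. This is only a sketch under stated assumptions: the `Bank` class, the `run_job` function, the account name, and the flat per-job cost are all hypothetical, and the slides do not say which component places the reservation or the withdrawal, so the log records only the step text.

```python
class Bank:
    """Hypothetical stand-in for the Allocation Manager's bank."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.holds = {}  # job_id -> (account, amount)

    def available(self, account):
        held = sum(amt for acct, amt in self.holds.values() if acct == account)
        return self.balances.get(account, 0) - held

    def reserve(self, job_id, account, amount):
        if self.available(account) < amount:
            raise ValueError("insufficient funds")
        self.holds[job_id] = (account, amount)

    def withdraw(self, job_id):
        # Convert the earlier hold into an actual charge.
        account, amount = self.holds.pop(job_id)
        self.balances[account] -= amount


def run_job(bank, account, job_id, cost, log):
    """Replay the eleven steps for one job; returns False if step 2 fails."""
    log.append("1 user submits job to queue manager")
    if bank.available(account) < cost:
        log.append("2 balance check failed")
        return False
    log.append("2 balance check passed")
    log.append("3 queue manager notifies scheduler of new job")
    log.append("4 scheduler polls node and job status until job can run")
    bank.reserve(job_id, account, cost)
    log.append("5 bank reservation made")
    log.append("6 scheduler asks queue manager to run job")
    log.append("7 queue manager hands job to process manager")
    log.append("8 process manager reports completion to queue manager")
    log.append("9 queue manager reports completion to scheduler")
    bank.withdraw(job_id)
    log.append("10 bank withdrawal made")
    log.append("11 user notified of completion")
    return True
```

Separating the reservation (step 5) from the withdrawal (step 10) means funds are held while the job runs but only charged once it completes.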

Component Interaction Diagram: Job submitted to Meta Scheduler
[Diagram: User Interface, Collector, Meta Scheduler, Queue Manager, Allocation Manager, Scheduler, Process Manager]

Component Interaction Trace: Job submitted to Meta Scheduler
  1. A user submits a job to the Meta Scheduler
  2. The Meta Scheduler contacts Schedulers to determine which systems could run the job the soonest
  3. The Schedulers request quotes from Allocation Banks to determine which systems would run the job for the lowest cost
  4. A Scheduler reservation is created for the job on the resource providing the best service; this reservation can be moved or improved upon until the job is staged
  5. The job is staged and queued at the system where it is to run
  6. The Queue Manager notifies the Scheduler that a new job has arrived
  7. The Scheduler queries node and job status until the job can run
  8. A bank reservation is made with the Allocation Manager
  9. The Scheduler requests the Queue Manager to run the job
  10. The Queue Manager passes job control to the Process Manager
  11. The Process Manager notifies the Queue Manager of job completion
  12. The Queue Manager notifies the Scheduler of job completion
  13. A bank withdrawal is made with the Allocation Manager
  14. The Scheduler notifies the Meta Scheduler of job completion
  15. The user is notified of job completion
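Steps 2 through 4 of this trace amount to a selection problem, sketched below in Python. The slides do not define "best service", so the ordering used here (earliest start first, lowest cost as tie-breaker) is an assumption, as are the names `Offer`, `pick_system`, and `improve`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    system: str
    earliest_start: float  # seconds from now, reported by the local Scheduler
    cost: float            # quote from that system's allocation bank


def rank(offer):
    # Assumed policy: soonest start wins, cost breaks ties.
    return (offer.earliest_start, offer.cost)


def pick_system(offers):
    """Choose the offer on which to create the initial reservation (step 4)."""
    if not offers:
        raise ValueError("no system can run the job")
    return min(offers, key=rank)


def improve(current, candidate, staged):
    """Move the reservation to a better offer, but only before staging."""
    if staged:
        return current
    return candidate if rank(candidate) < rank(current) else current
```

A site that cares more about cost than turnaround could swap the tuple order in `rank` without touching the rest of the sketch.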

Design/Interface Progress
  Initial high level RMS architecture defined
  Resource management dictionary created defining objects within resource management ‘world’
  Object ‘tokens’ declared for major objects
  Component functional interfaces identified
  Initial XML request/response syntax proposed
  Prototypes being constructed to test communication protocols
  Initial detailed extra-group component requirements document created
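The slides do not show the proposed XML request/response syntax, so the following Python sketch only illustrates the general shape such an exchange might take. Every element and attribute name here (`Request`, `Where`, `Response`, `Status`, `Job`) is invented for illustration and should not be read as the working group's actual syntax.

```python
import xml.etree.ElementTree as ET


def build_request(action, obj, **conditions):
    """Serialize a hypothetical request such as 'Query Job where JobId=42'."""
    req = ET.Element("Request", {"action": action, "object": obj})
    for name, value in sorted(conditions.items()):
        where = ET.SubElement(req, "Where", {"name": name})
        where.text = str(value)
    return ET.tostring(req, encoding="unicode")


def parse_response(xml_text):
    """Return (status, rows) from a hypothetical response document."""
    root = ET.fromstring(xml_text)
    status = root.findtext("Status")
    rows = [dict(job.attrib) for job in root.iter("Job")]
    return status, rows
```

For example, `build_request("Query", "Job", JobId=42)` produces a single `Request` element carrying one `Where` condition, which a component could send to the queue manager.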

Local Scheduler Rationale
  The local scheduler interfaces with the majority of inter- and intra-RM components
  Establish test platform from which interfaces can be tested
  Leverage existing capabilities to accelerate SSS development
  Establish infrastructure within which scheduling and metascheduling services and capabilities can be developed
  Establish ‘driver’ to evaluate other resource management components

Local Scheduler Progress
  Baseline scheduler established (Maui 3.2) for SSS scheduling services integrating production and development capabilities
  Prototype interface enabling XML communication with queue manager, metascheduler, and node manager
  Extended QoS infrastructure integrated
  Extended job prioritization infrastructure integrated
  Prototype created for object-oriented data access
  Advanced metascheduling interface integrated

Meta Scheduler Progress
  Initial distribution packaging created to allow collaborative development
  Documentation enhanced and extended
  Prototype XML scheduler-to-metascheduler query interface developed
  Initial fault tolerance framework designed

Queue Manager Design
  Established need for unified queue manager design common to Scheduler and Metascheduler
  Queue manager will interface directly with Process manager
  In process of refining the queue manager tasks
  Queue manager will provide an interface to obtain information about any job regardless of job state, including completed jobs (i.e., it will maintain a job information archive)
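The last design point, a single lookup that works for any job state, can be pictured with a small Python sketch. The class, method names, and state strings are hypothetical; the slides only state that completed jobs remain queryable through a job information archive.

```python
class QueueManager:
    """Toy queue manager that keeps completed jobs in an archive."""

    def __init__(self):
        self._active = {}   # jobs that are queued or running
        self._archive = {}  # jobs that have completed

    def submit(self, job_id, owner):
        self._active[job_id] = {"owner": owner, "state": "Queued"}

    def set_state(self, job_id, state):
        job = self._active[job_id]
        job["state"] = state
        if state == "Completed":
            # Move the record rather than delete it, so it stays queryable.
            self._archive[job_id] = self._active.pop(job_id)

    def job_info(self, job_id):
        """Return job information regardless of job state."""
        if job_id in self._active:
            return dict(self._active[job_id])
        if job_id in self._archive:
            return dict(self._archive[job_id])
        raise KeyError(job_id)
```

Callers use the same `job_info` call before and after completion, which is the property the design bullet asks for.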

Allocation Manager Progress
  QBank placed under revision control
  Java prototype created which sends requests in XML
  Experimenting with protocol frameworks (simple octet-counting, octet-stuffing, SOAP, BEEP)
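The two simplest framing options named above differ in how a receiver finds the end of each XML document on a byte stream. The Python sketch below shows one possible form of each; the space-separated length prefix, the delimiter and escape byte values, and all function names are assumptions chosen for illustration, not the formats the group tested.

```python
END, ESC = b"\x00", b"\x1b"          # assumed delimiter and escape bytes
ESC_END, ESC_ESC = b"\x01", b"\x02"  # assumed codes that follow an escape


def count_frame(payload: bytes) -> bytes:
    """Octet-counting: prefix the payload with its length in bytes."""
    return str(len(payload)).encode("ascii") + b" " + payload


def count_unframe_all(stream: bytes) -> list:
    frames, pos = [], 0
    while pos < len(stream):
        sep = stream.index(b" ", pos)
        length = int(stream[pos:sep])
        start = sep + 1
        frames.append(stream[start:start + length])
        pos = start + length
    return frames


def stuff(payload: bytes) -> bytes:
    """Octet-stuffing: escape reserved bytes, then append the delimiter."""
    body = payload.replace(ESC, ESC + ESC_ESC).replace(END, ESC + ESC_END)
    return body + END


def unstuff_all(stream: bytes) -> list:
    frames, current, i = [], bytearray(), 0
    while i < len(stream):
        byte = stream[i:i + 1]
        if byte == END:
            frames.append(bytes(current))
            current = bytearray()
        elif byte == ESC:
            i += 1
            code = stream[i:i + 1]
            if code == ESC_END:
                current += END
            elif code == ESC_ESC:
                current += ESC
            else:
                raise ValueError("invalid escape sequence")
        else:
            current += byte
        i += 1
    return frames
```

Octet-counting lets the receiver read exactly the announced number of bytes, while octet-stuffing lets a sender start transmitting before it knows the total length. Neither handles authentication or multiplexing, which is where SOAP and BEEP come into the comparison.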

Next Steps (In Progress)
  Software Lifecycle Infrastructure
    – Online intra-RM schedule and dependencies document
    – Detailed extra-RM working group requirements
    – Coordinate creation of component level regression test suite
    – Bug tracking systems activated (used to track internal defects and development plans)
  Interface
    – Produce validating intra-RM XML schema
    – Produce prototype RM components communicating in initial protocol
  Feature Enhancements
    – Contribution to checkpoint/restart report
    – Creation of queue manager prototype

Next Steps (6 Months)
  Usability
    – GUI-server interface, GUI format, security determined and prototypes created
    – Documentation of initial meta job constraints/features and specification language
  Inter-group Collaboration
    – Creation of early scheduler XML implementation for use as RM driver
    – Development of initial dynamic job scheduler-queue manager interface
    – Extension of RM specifications/requirement document
    – Extension of internal component test infrastructure
    – Determination of ‘best practices’ in documentation maintenance
    – Evaluation and adoption of web project management and collaboration tools
    – Creation of prototype queue manager with scheduler/task manager interfaces

Next Steps (6 Months)
  Fault Tolerance
    – Enhance metascheduler to ‘survive’ local daemon failure
    – Enhancement of threaded scheduling interface
    – Development of threaded metascheduling interface
  Resource Optimization
    – Development of local optimization features of meta workload
  Feature Enhancements
    – Creation of resource manager extension features
    – Development of direct metascheduler to queue manager staging roadmap
  Interfaces
    – Specification of ‘best guess’ security infrastructure and evaluation of impact on system internals and communication protocols

Next Steps (1 year)
  Software Lifecycle Infrastructure
    – Create multi-component regression tests
    – Generate ‘alpha’ package of scheduling, metascheduling, and allocation management packages
  Interfaces
    – Development of functional XML interfaces for all components
    – Early adoption of security infrastructure
    – Creation of optional information service interfaces
    – Admin and end-user GUIs proposed to enable use of new functionality
  Inter-group Collaboration
    – Enhanced suspend/resume and checkpoint/restart features with detailed roadmap specified for all remaining suspend/resume and checkpoint/restart deliverables

Current Issues
  Should there be an enveloping protocol framework which handles framing (where the XML document begins and ends), authentication, multiplexing, streaming data, etc.? (Should we look at something like BEEP, or start from scratch and invent something of our own?)
  The queue manager/collector to node/process manager functionality and data interface requires further refinement.
  Queue manager/collector and node/process manager development schedules must be determined and coordinated.

Issues
  Continued effort is required to complete an ‘intra-RM’ XML schema to handle initial RMS interaction needs. Boundaries between the internal ‘intra-RM’ and global XML schemas need to be defined.
  Understanding of open source requirements (i.e., can software that requires registration and usage agreements be included in the SSS distribution?)

Inter-Group Issues
  Need for coordination of the resource management system across working groups, so that the pieces all function together properly and no part is overlooked. Need to coordinate schedules for delivery of RMWG-dependent non-RMWG components.
  Early vendor/industry collaborations (we had better do this while it can still influence our design; need to talk to decision makers and develop business plans)

Inter-group Issues
  Information service: should we instead be looking for something existing (e.g., MDS2)?
  Need to solidify SSS-wide standards for packaging, revision control, documentation content, format, and packaging, problem tracking, … and establish mechanisms and places to home them.
  Creation of regression and integration test suite (with the Validation and Testing WG; we need this from an early stage)

Conclusions
  Questions…