Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting October 10-11, 2002.



Resource Management and Accounting Working Group
- Working Group Scope and Components
- Progress made
- Current issues being worked
- Next steps
- Discussions involving larger group

Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components:
- Queue Manager (/Job Manager)
- Scheduler
- Accounting and Allocation Manager
- Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
- Process Manager
- Node Monitor

Proposed Component Architecture
[Diagram] Components: Queue Manager, Allocation Manager, Node Monitor, Meta Scheduler, Local Scheduler, Node Manager, Process Manager, Security System, Information Service, Discovery Service, Event Manager.
Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure; Infrastructure Services.

Resource Management Prototype Demonstration
[Diagram] Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service.
Color key (by working group): Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure.
Message sequence shown in the diagram:
0 Service-Lookup
1 Submit-Job
2 Query-Job
3 Query-Node
4 Create-Reservation
5 Run-Job
6 Exec-Process
7 Query-Job
8 Delete-Job
9 Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
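The numbered messages on the demo slide can be read as an ordered trace. The following is a minimal Python sketch that uses only the step numbers and message names from the slide. The `DEMO_STEPS` table and the `trace` helper are hypothetical names introduced here, and because the transcript does not say which component sends or receives each message, no endpoints are modeled.

```python
# Step numbers and message names as they appear on the demo slide,
# listed in the (scrambled) order of the transcript.
# Sender/receiver pairs are not given in the transcript, so they are omitted.
DEMO_STEPS = {
    1: "Submit-Job",
    3: "Query-Node",
    6: "Exec-Process",
    4: "Create-Reservation",
    2: "Query-Job",
    5: "Run-Job",
    8: "Delete-Job",
    0: "Service-Lookup",
    7: "Query-Job",
    9: "Withdraw-Allocation",
}


def trace(steps):
    """Return the messages in execution order, sorted by step number."""
    return [f"{number} {steps[number]}" for number in sorted(steps)]


if __name__ == "__main__":
    for line in trace(DEMO_STEPS):
        print(line)
```

Sorting by step number recovers the flow from service lookup through job submission, reservation, execution, deletion, and withdrawal of the allocation.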

General Progress
- Initial draft of Scalable Systems Software Resource Management and Accounting Protocol (SSSRMAP) completed
- Requirements documents nearly complete for all components
- All components under revision control

Scheduler Progress
- Extended internal XML usage
- Implemented SSSRMAP XML interface for queue manager, node monitor and allocation/accounting manager
- Enhanced internal scalability to support up to 50,000 nodes
- Added support for HTTP framing protocol
- Added internal suspend/resume and checkpoint/requeue management code (interfaced to PBS, LSF, and LL)
- Created subset of XML-based job control and state control clients for use with GUI tools
- Significant testing and documentation of existing features (priority and QOS enhancements)
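To illustrate what carrying an XML message inside an HTTP frame can look like, here is a minimal Python sketch. It is not the SSSRMAP wire format: the element and attribute names (`Request`, `Object`, `Get`), the host name, and the choice of a `Content-Length` delimited POST are all assumptions made for illustration, since the slides do not describe the framing rules.

```python
import xml.etree.ElementTree as ET


def build_request(action, obj, fields):
    """Build a small XML request body.

    Element and attribute names are illustrative assumptions,
    not the SSSRMAP schema.
    """
    request = ET.Element("Request", {"action": action})
    ET.SubElement(request, "Object").text = obj
    for name in fields:
        ET.SubElement(request, "Get", {"name": name})
    return ET.tostring(request, encoding="utf-8")


def frame_http(body, host, path="/"):
    """Wrap an XML body in an HTTP/1.1 POST so the receiver can find
    the message boundary from the Content-Length header."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def unframe_http(message):
    """Split a framed message back into (headers dict, body bytes)."""
    head, _, rest = message.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines[1:])
    length = int(headers["Content-Length"])
    return headers, rest[:length]


if __name__ == "__main__":
    body = build_request("Query", "Node", ["Name", "State"])
    framed = frame_http(body, "scheduler.example.org")  # hypothetical host
    print(framed.decode("utf-8"))
```

Framing over HTTP lets each component reuse ordinary request parsing to delimit messages, which is one plausible reason for adopting it across the scheduler, queue manager, and allocation manager.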

Queue Manager Progress
- Conformance to the SSSRMAP XML specification
- Synchronization of the job attribute types with PBS SSS front-end
- Full wire protocol compatibility with basic, challenge, and ANL versions of basic and challenge
- Multiple server ports employed to allow multiple client protocols simultaneously
- New interface with Event Manager
- Added job signaling support with the Process Manager

Allocation Manager Progress
- Requirements and survey sent out to 15 sites and vendors
- Allocation management component placed under BitKeeper
- Implemented HTTP framing protocol and tested performance
- Support for expression grouping in queries
- Journaling implemented (undo and redo working)
- Got SHA1-HMAC security working with QBank/Maui
- Reframed bank objects (accounts, users, allocations, etc.) as dynamically introduced objects
- Object actions defined in metadata cache
- Creation of dynamic web GUI using PHP and JavaScript (forms for object creation, querying, modification, deletion and undeletion)
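The SHA1-HMAC item refers to authenticating messages with a keyed hash. Below is a minimal Python sketch of HMAC-SHA1 signing and verification with the standard library. The shared key and the message text are made-up placeholders, and how QBank and Maui actually exchange keys or attach digests is not described in the slides.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Return the HMAC-SHA1 digest of message under key, as hex."""
    return hmac.new(key, message, hashlib.sha1).hexdigest()


def verify(message: bytes, key: bytes, digest: str) -> bool:
    """Check a received digest using a constant-time comparison."""
    return hmac.compare_digest(sign(message, key), digest)


if __name__ == "__main__":
    key = b"shared-secret"                    # placeholder key
    message = b"withdraw allocation for job"  # placeholder message
    digest = sign(message, key)
    print(digest)
    print(verify(message, key, digest))         # True
    print(verify(message + b"!", key, digest))  # False
```

The receiver recomputes the digest with the same shared key, so a message altered in transit, or signed with a different key, fails verification.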

Meta Scheduler Progress
- Development of submission client
- Support for PBS 'command file' keywords and semantics
- Ability to run jobs end-to-end
- Fault tolerance improvements (cluster scheduler reconnection and global JobId tracking)
- Added interfaces to interoperate with grid systems (Globus)
- Improved user interface
- Partial XML local scheduler-meta scheduler language defined and implemented

Current Issues
- Job state management for Queue Manager
- Data staging
- Job signaling
- Support for job steps
- Integration with Node Monitor

Next Work
- Prepare for SC demos
- Scalability testing
- Release v1.0 of Resource Management System for existing components
- Basic documentation
- Security authentication
- Need to solidify RMS-wide standards for packaging, build procedure, revision control, and distribution home

Scheduler Future
- Integrate SSS security protocols
- Extend GUI support
- Full support for XML allocation manager language
- Extend SSS language to support suspend/resume and checkpoint/requeue
- Test TM interface fault tolerance features (corrupt data, bad connections, etc.)

Queue Manager Future
- Add Epilogue/Prologue support
- Add job submission verification script
- Interface with Node Monitor
- Full PBS qsub compatibility
- Add interface with Node Manager to support job-dependent node OS image installation

Allocation Manager Future
- Focus on getting QBank ready for bundling and release with SSS RMS system (security, use key, improved installation procedure)
- Focus effort on open source of new Allocation Manager (Gold)
- Implementation of enhanced allocation and reservation mechanisms which utilize a simple pricing engine and log job and usage data
- Security authentication (Gold)
- Support for operations on returned fields (sort, sum, max, unique, group by, etc.)
- Integrate SSSLIB connection protocol & discovery service
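The planned operations on returned fields (sort, sum, max, unique, group by) can be pictured on a small query result. The following Python sketch is illustrative only: the `ROWS` sample data, the field names, and the helper functions are invented here and do not reflect the Allocation Manager's actual query language or schema.

```python
from collections import defaultdict

# Made-up sample of rows such as a usage query might return.
ROWS = [
    {"user": "amy", "project": "chem", "charge": 40},
    {"user": "bob", "project": "chem", "charge": 25},
    {"user": "amy", "project": "bio", "charge": 10},
]


def unique(rows, field):
    """Distinct values of one field, in sorted order."""
    return sorted({row[field] for row in rows})


def group_sum(rows, key, field):
    """Sum a numeric field per distinct value of the grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[field]
    return dict(totals)


def sort_by(rows, field, descending=False):
    """Rows ordered by one field."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)


if __name__ == "__main__":
    print(unique(ROWS, "user"))
    print(group_sum(ROWS, "project", "charge"))
    print(max(row["charge"] for row in ROWS))
    print(sort_by(ROWS, "charge", descending=True))
```

Performing such operations in the server rather than in each client would keep reporting tools thin, which fits the slide's placement of this item alongside the query interface work.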

Meta Scheduler Future
- Fault tolerance improvements
- Initial data management (data stage-in/stage-back)
- Full XML local scheduler-meta scheduler language defined and implemented

Issues requiring inter-group coordination
- Resource controller for handling switch allocation, licenses, resource limit enforcement (logical partitioning)
- How are checkpointing and suspend/resume routed through?
- Who manages node access control?
- Dynamic jobs