Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting May 10-11, 2005 Argonne, IL.

Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting May 10-11, 2005 Argonne, IL

Resource Management and Accounting Working Group
Working group scope
Progress since last face-to-face
Future work
Other issues

Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting.
This working group will focus on the following software components:
  – Queue Manager
  – Scheduler
  – Accounting and Allocation Manager
  – Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
  – Process Manager
  – Cluster Monitor

Resource Management Component Architecture
[Figure: component architecture diagram. Components shown: Queue Manager, Allocation Manager, Node Monitor, Grid Scheduler, Cluster Scheduler, Node Manager, Process Manager, Security System, Discovery Service, Event Manager. Color key by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure. Additional label: Infrastructure Services.]

Resource Management Prototype Demonstration
[Figure: prototype demonstration diagram. Components shown: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Cluster Scheduler, Process Manager, Discovery Service. Color key by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure.]
Numbered messages in the diagram:
  0 Service-Lookup
  1 Submit-Job
  2 Query-Job
  3 Query-Node
  4 Create-Reservation
  5 Run-Job
  6 Exec-Process
  7 Query-Job
  8 Delete-Job
  9 Withdraw-Allocation
This demo runs a simple end-to-end test in which a submitted job runs past its wallclock limit.
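The numbered messages on this slide appear in diagram order, not in step order. As a minimal sketch, the following Python snippet replays them sorted by step number. The slide text does not say which component sends or receives each message, so only the step numbers and message names are used; the names `DEMO_STEPS` and `format_trace` are illustrative assumptions, not part of any SSS component.

```python
# Step numbers and message names exactly as they appear on the slide.
# Sender/receiver pairings are not given in the slide text, so they are omitted.
DEMO_STEPS = {
    1: "Submit-Job",
    3: "Query-Node",
    6: "Exec-Process",
    4: "Create-Reservation",
    2: "Query-Job",
    5: "Run-Job",
    8: "Delete-Job",
    0: "Service-Lookup",
    7: "Query-Job",
    9: "Withdraw-Allocation",
}


def format_trace(steps):
    """Return one 'number: message' line per step, sorted by step number."""
    return [f"{number}: {message}" for number, message in sorted(steps.items())]


if __name__ == "__main__":
    for line in format_trace(DEMO_STEPS):
        print(line)
```

Read in order, the trace starts with a service lookup and ends with the job being deleted and its allocation withdrawn, which matches the slide's description of a job that exceeds its wallclock limit.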

General Progress
New release of RMWG components made available from the SSS web site
  – Bamboo Queue Manager v1.1
  – Maui Scheduler v3.2.6p13
  – Gold Accounting and Allocation Manager v2.b2.10.2

General Progress
Continued adoption of SSS components and interfaces
  – SSS suite running on additional systems in Ames
  – Gold being used in production on University of Utah’s Icebox cluster

General Progress
Working on integration of SSSRMAP into ssslib
  – Bill Pitre: implementing the SSSRMAP Message Format SDK (Python classes)
  – Craig Steffen: integrating the SSSRMAP Wire Level protocol into ssslib

General Progress
Paper accepted for presentation and publication at a conference
  – Title: Allocation Management Solutions for High Performance Computing
  – Conference: Parallel and Distributed Processing Techniques and Applications (PDPTA'05)
  – Workshop on “Scheduling and Resource Management for Parallel and Distributed Systems”

General Progress
New documents in the SSS RMWG Notebook
  – Considerations for using SOAP as the basis for SSSRMAP v4
  – Fault Tolerance with Gold
  – Last Quarter’s Weekly RMWG Meeting Notes

Queue Manager Progress
V1.1 release of Bamboo made available
SSS suite running on several systems in Ames
Support for Task Groups and Node Properties added to server
Added a new mailing feature
New fountain component created to pull node information from multiple sources
  – Simple node information now supported
  – Working on adding support for SuperMon, Ganglia and NWPerf

Accounting and Allocation Manager Progress
New release of Gold available
  – 2nd Gold Beta v2.b
  – v2.b2.7.0 incorporated into OSCAR release
Gold being used in production on University of Utah’s Icebox cluster
Implemented and tested design for distributed accounting and multi-organizational negotiation in job launching
Implemented fault tolerance to 50% cluster loss by adding support for a backup Gold server
  – Clients can fail over to a backup Gold server if one is defined
  – The database can be made fault tolerant by using a synchronous multi-master replication system such as pgcluster
  – Documented in the RMWG notebook
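The client-side failover described on this slide can be illustrated with a short sketch. This is not Gold's actual client code: the function `send_with_failover`, the host names, and the stand-in transport `fake_send` are all hypothetical, and the only behavior taken from the slide is that a client falls back to a backup server when one is defined.

```python
from typing import Callable, Optional, Sequence


class AllServersFailed(Exception):
    """Raised when neither the primary nor any backup server answers."""


def send_with_failover(
    request: str,
    servers: Sequence[str],
    send: Callable[[str, str], str],
) -> str:
    """Try each server in order and return the first successful reply."""
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return send(server, request)
        except ConnectionError as exc:
            # Remember the failure and move on to the next configured server.
            last_error = exc
    raise AllServersFailed(f"no server answered: {last_error}")


def fake_send(server: str, request: str) -> str:
    """Hypothetical transport: the primary is down, the backup answers."""
    if server == "gold-primary.example":
        raise ConnectionError("primary unreachable")
    return f"{server} handled {request}"


if __name__ == "__main__":
    servers = ["gold-primary.example", "gold-backup.example"]
    print(send_with_failover("Query-Job", servers, fake_send))
```

Keeping the server list ordered means a site without a backup simply configures one entry, and the same code path applies.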

Accounting and Allocation Manager Progress
Simplified allocation management for basic configurations by adding the ability to hide the account abstraction layer
  – Enabled account auto-generation, project-level deposits, etc.
Ported Gold to Tier3 and Tier4 OSes
  – OS X, IRIX, HP-UX, Solaris (unable to get access to Unicos)
Enabled support for MySQL database

Cluster Scheduler Progress
Migrated latest MCOM library into Maui
  – Includes support for encryption, scalability enhancements, SSS return codes, job description extensions, etc.
Enabled support for partitions, node features
Enhanced recovery modes for failures and unexpected conditions
Additional QOS modes for Allocation Manager
  – Fallback QOS, QOS requested vs. delivered
Fixed additional packaging bugs, buffer overflows
Started work on multi-taskgroup jobs

Grid Scheduler Progress
Added support for multi-site authentication (per peer-service symmetric keys)
Rolling X.509 credential management into MCOM library
Enabled support for Globus 3.x (had to work around a lot of Globus bugs)
Enhanced grid job queue and launch
Reliability - completed Globus failure diagnostics, logging and auto-recovery
Data Staging - completed Globus/non-Globus data staging failure auto-recovery
Fairness - implemented Priority, Fairshare, and Usage Limit based policy enforcement
Statistics - added credential, job, and cluster based usage statistics
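The slide mentions per peer-service symmetric keys for multi-site authentication but does not describe the mechanism. One common way to use such keys is a keyed message digest (HMAC), sketched below with Python's standard library. The choice of HMAC-SHA256, the peer names, and the key values are assumptions for illustration only and do not describe how Silver or the MCOM library actually authenticates peers.

```python
import hashlib
import hmac

# Hypothetical key table: one shared secret per peer service.
PEER_KEYS = {
    "site-a": b"example-key-for-site-a",
    "site-b": b"example-key-for-site-b",
}


def sign(peer: str, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for a message using the peer's key."""
    return hmac.new(PEER_KEYS[peer], message, hashlib.sha256).hexdigest()


def verify(peer: str, message: bytes, signature: str) -> bool:
    """Check a tag against the key registered for the claimed peer."""
    key = PEER_KEYS.get(peer)
    if key is None:
        return False  # unknown peers are rejected outright
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    tag = sign("site-a", b"Submit-Job")
    print(verify("site-a", b"Submit-Job", tag))
    print(verify("site-b", b"Submit-Job", tag))
```

Because each peer has its own key, a tag produced for one site does not verify under another site's key, so a compromised key affects only that one peering relationship.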

Future Work
General release of all components
  – Including new Silver Meta-scheduler
Increase deployment base
Integrate SSSRMAP into ssslib
Portability testing for all components
Fault tolerance supporting 25% cluster loss

Future Work: Queue Manager
Add job group support (mainly for submission)
Add job submission filter
Finish the final missing portions of PBS-style job language support

Future Work: Accounting and Allocation Manager
General release to be made available by mid-year
Production deployment of Gold on additional sites
Port Gold GUI from JSP to Perl CGI
Add support for multi-site authentication (each site having its own symmetric key)
Documentation to include object customization

Future Work: Cluster Scheduler
Add support for multi-taskgroup SSS jobs
Support SSS job extensions and job-level policies
Peer Diagnostics - add auto-recovery to failed service interfaces
Resource Utilization - complete development of all resource utilization objectives
Resource Limits - complete development of all resource limits objectives
Checkpoint Restart - test with LBNL and optimize resource management for suspended jobs
Get X.509 credential management working

Future Work: Grid Scheduler
Release Silver meta-scheduler
  – Targeting end of June for alpha release
  – Need to test Maui/Silver interoperability with new MCOM lib
Need to test
  – Priority, Fairshare, and Usage Limit based policy enforcement
  – Credential, job, and cluster based usage statistics
Optimization - add network co-allocation reservation
General - mature client commands to provide status reporting in a more intuitive manner