Oak Ridge National Laboratory -- U.S. Department of Energy 1 SSS Deployment using OSCAR John Mugler, Thomas Naughton & Stephen Scott May 2005, Argonne,

Presentation transcript:

Oak Ridge National Laboratory -- U.S. Department of Energy 1 SSS Deployment using OSCAR John Mugler, Thomas Naughton & Stephen Scott May 2005, Argonne, IL SSS Face-to-face meeting

Slide 2. OSCAR: Cluster Toolkit
Framework for cluster management
–simplifies installation, configuration and operation
–reduces time/learning curve for cluster build
  requires: pre-installed headnode w. supported Linux distribution
  thereafter: wizard guides user thru setup/install of entire cluster
Package-based framework
–Content: Software + Configuration, Tests, Docs
–Types:
  Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
  Non-core: selected & third-party
–Access: repositories accessible via OPD/OPDer

Slide 3. OSCAR Wizard
* OSCAR-3.0 release

Slide 4. Using OSCAR for SSS
Problem: Helping users obtain and install SSS software.
Solution: Leverage the OSCAR framework to package and distribute the SSS suite, sss-oscar.
sss-oscar: A release of OSCAR containing all SSS software in a single downloadable bundle.

Slide 5. OSCAR-ized SSS Components
Bamboo – Queue/Job Manager
BLCR – Berkeley Checkpoint/Restart
Gold – Accounting & Allocation Management System
LAM/MPI (w/ BLCR) – Checkpoint/Restart enabled MPI
MAUI-SSS – Job Scheduler
SSSLib – SSS Communication library
–Includes: SD, EM, PM, BCM, NSM, NWI
Warehouse – Distributed System Monitor
MPD2 – MPI Process Manager
* As of May 2005

Slide 6. Current Status
Released v1.0 at SC’04
–Based on oscar-3.0 (using Red Hat 9/x86)
–All SSS components represented
Testing for v1.1 release
–Small update release
–Still oscar-3.0 based
Synchronize with OSCAR release schedule
–oscar-4.1 released
–Shift to oscar-4.1 in sss-oscar-1.2 release (2Q2005)

Slide 7. OSCAR v4.1 Highlights
SSS’s APItest tool integrated into v4.1 release
Improved use of DepMan/PackMan abs. layer
Distributions supported in v4.1
–x86: RH 9, FC2, MDK 10.0
–x86 & ia64: RH EL 3
Initial work started for Debian
–Not in v4.1 release but working with 4.x devel tree

Slide 8. TODO: SSS
Short term
–Complete testing for v1.1beta & release
–Update SSS documentation
Medium term
–Migrate to new FRE testbed and repository (pending approval)
–New/more Linux distribution/architecture/kernel support
Longer term
–Extend SSS component tests
  1) Installation, 2) Validation, 3) Durability/Stress, 4) Performance
–Track oscar-4.x releases for v5.0 compatibility
–Distribute as OSCAR “Package Set”
  Pending feature support in OSCAR
–OPKG ordering within a phase
  Pending feature support in OSCAR
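
The four test categories listed under "Extend SSS component tests" suggest a staged run, where a later stage is only meaningful once the earlier ones pass. A minimal sketch in Python, assuming each stage can be expressed as a callable that returns True or False. The `run_stages` helper and the placeholder checks are hypothetical and are not part of sss-oscar or OSCAR:

```python
# Stage names follow the slide's list; everything else here is illustrative.
STAGES = ["installation", "validation", "durability", "performance"]


def run_stages(checks):
    """Run the stages in order and stop testing after the first failure.

    `checks` maps a stage name to a zero-argument callable returning
    True (pass) or False (fail). Stages without a check, or stages that
    follow a failure, are reported as "skipped".
    """
    results = []
    failed = False
    for stage in STAGES:
        check = checks.get(stage)
        if failed or check is None:
            results.append((stage, "skipped"))
        elif check():
            results.append((stage, "pass"))
        else:
            results.append((stage, "fail"))
            failed = True
    return results


if __name__ == "__main__":
    # Placeholder checks; a real check would exercise an installed component.
    demo = {
        "installation": lambda: True,
        "validation": lambda: True,
    }
    for stage, outcome in run_stages(demo):
        print(f"{stage}: {outcome}")
```

Stopping at the first failure keeps a broken install from producing misleading stress or performance numbers.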

Slide 9. SSS-OSCAR Release Schedule

SSS Version   Freeze Date   Release Timeframe   Based on OSCAR
v1.1          Feb 15        May                 oscar-3.0
v1.2          Jun 15        July                oscar-4.1
v1.3          Aug 15        Sept                oscar-4.x
v1.4/2.0      Oct 15        Nov - SC’05         oscar-5.0

Add features to /

Slide 10. Roadmap
1.2 (frz: jun, rel: jul)
–Fedora Core 2 / Pkg rebuild
  BLCR upgrade to linux-2.6
–Improved install/validation tests
–oscar-4.1 opkg modifications (updates)
  Updates to HOWTO as needed
  Simplify XML meta file
–Close (most) open tracker issues
2.0 (frz: aug, rel: sep)
–LRS change over
–Fedora Core 4 / Pkg rebuild
–Improved install/validation tests
–Add performance/stress tests?
–oscar-4.x opkg modifications (updates)
  Updates to HOWTO as needed
–Meta-scheduler (Silver)?
(frz: oct, rel: nov) [SC’05]
–Any bugfixes/minor updates
2.02
–SSS oscar-pkg set

Slide 11. Goals for sss-oscar-2.0
Release v2.0 at SC’05
Compatible with oscar-5.0
Support current Linux distribution(s)
Improve interoperability with standard OSCAR
–Users obtain via “SSS OSCAR Pkg Repository”
–Likely leverage “Package Sets” for logical grouping
–Clarify SSS package dependencies
  What about outside of SSS-OSCAR?
Improved testing
–Supply thorough installation/validation/performance tests
Documentation
–Specifications for component interfaces (schemas), etc.
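
One way to make package dependencies explicit is to record them as a map and derive an install order from it. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The dependency map is an assumption made for illustration: only the LAM/MPI on BLCR link is suggested by the component list in this deck, and the other edges and the lowercase package names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: package -> packages that must come first.
# Only "lam-mpi" needing "blcr" is suggested by the slides; the remaining
# edges are assumptions for the sake of the example.
DEPENDS_ON = {
    "ssslib": set(),
    "blcr": set(),
    "lam-mpi": {"blcr"},
    "bamboo": {"ssslib"},
    "maui-sss": {"ssslib", "bamboo"},
}


def install_order(depends_on):
    """Return package names so that each package follows its dependencies.

    Raises graphlib.CycleError if the map contains a dependency cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(install_order(DEPENDS_ON))
```

Writing the map down in one place also answers part of the "outside of SSS-OSCAR" question, since the same data can be read by any installer.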

Slide 12. Comments/Discussion
Provide a lower cost of entry
–Doc to help knit system together
Clarify dependencies/interactions
–Intra-component and inter-component
Feedback to help Ron O. for testing/validation
–Tests to verify against component specs.
–Ex. The PM specs state X capability & it works in this build
–Effectively conformance tests to “optional” SSS specs.
What do we need to help coming releases?
–Louder drum for Thomas?
–Dedicated integration periods (face-to-face and/or virtual)?
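
The idea of testing a build against a component spec ("the PM specs state X capability & it works in this build") can be sketched as a set comparison. This is only an illustration: the `conformance_report` function and the capability names are hypothetical, and the SSS specifications are not represented this way in the source.

```python
def conformance_report(spec_capabilities, build_capabilities):
    """Compare the capabilities a spec names with those a build demonstrates.

    Both arguments are iterables of capability names (plain strings).
    """
    spec = set(spec_capabilities)
    build = set(build_capabilities)
    return {
        "conformant": spec <= build,      # every spec capability is present
        "missing": sorted(spec - build),  # in the spec, not shown by the build
        "extra": sorted(build - spec),    # shown by the build, not in the spec
    }


if __name__ == "__main__":
    # Hypothetical capability names for a process manager (PM) spec.
    spec = ["start-job", "kill-job", "query-status"]
    build = ["start-job", "query-status"]
    print(conformance_report(spec, build))
```

Because the SSS specs are described as "optional", a report that lists missing items may be more useful to integrators than a single pass/fail flag.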

Slide 13. Resources
ORNL test clusters
–Systems: sss-xtorc, test1, test2
–Access via ORNL SSH Login Server
–Must do reservations/coordinate use (Note, no remote power mgmt)
Investigating ORNL “FRE” (enclaves)
–Add “testX” system to alleviate ORNL SSH Login Server
SSS-OSCAR Project page
–Hosted at OSCAR Homepage
–Includes “HOWTO: Create an OSCAR Package” document

Slide 14. [Diagram: Scalable Systems Software Suite components, color-coded by working group (BCWG, RMWG, PMWG, VIWG)]
Components shown: Accounting, File System, Event Manager, Service Directory, Meta Scheduler, Meta Monitor, Meta Manager, Scheduler, User DB, Allocation Management, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, Application Environment, Meta Services, System & Job Monitor, Checkpoint / Restart, Grid Interfaces, Job Queue Manager, Node Configuration & Build Manager, Validation & Testing
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite
Standard XML interfaces
Working Components and Interfaces (bold) Spr’03

Slide 15. OSCAR Features Slated for 4.x/5.0*
v4.1 (freeze Feb 15)
–Integrate APItest
–Smarter RPM uninstall (PackageInUn)
–“oscar-release” RPM
–Intel Compilers OPkg
v4.2 (freeze May 15)
–New DB schema
–New DB API/Library (v1)
–NEST for OPkg mgmt
–Debian PackMan/DepMan
v4.3 (freeze Aug 15)
–Smarter RPM uninstall (NEST)
–Support for “Package Sets”
–Debian/x86 support?
–VServer testing harness
v5.0 (freeze Oct 10)
–“LibOPkg” available
–SGE
–Fedora Core 3/4, x86, x86-64
* This list is speculative and only highlights items that would likely help/affect SSS releases.

Slide 16. Tentative OSCAR Release Schedule

OSCAR Version   Release Timeframe   Freeze Date
v4.1            May                 Feb 15
v4.2            June                May 15
v4.3            August              Aug 15
v4.4/5.0        SC’05               Oct 10