Oak Ridge National Laboratory -- U.S. Department of Energy
SSS Deployment using OSCAR
John Mugler, Thomas Naughton & Stephen Scott
May 2005, Argonne, IL
SSS Face-to-face meeting

OSCAR: Cluster Toolkit
Framework for cluster management
  – simplifies installation, configuration and operation
  – reduces time/learning curve for cluster build
      requires: pre-installed headnode w. supported Linux distribution
      thereafter: wizard guides user thru setup/install of entire cluster
Package-based framework
  – Content: Software + Configuration, Tests, Docs
  – Types:
      Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
      Non-core: selected & third-party
  – Access: repositories accessible via OPD/OPDer

OSCAR Wizard
* OSCAR-3.0 release

Using OSCAR for SSS
Problem: Helping users obtain and install SSS software.
Solution: Leverage the OSCAR framework to package and distribute the SSS suite, sss-oscar.
sss-oscar
  A release of OSCAR containing all SSS software in a single downloadable bundle.

OSCAR-ized SSS Components
Bamboo – Queue/Job Manager
BLCR – Berkeley Lab Checkpoint/Restart
Gold – Accounting & Allocation Management System
LAM/MPI (w/ BLCR) – Checkpoint/Restart enabled MPI
MAUI-SSS – Job Scheduler
SSSLib – SSS Communication library
  – Includes: SD, EM, PM, BCM, NSM, NWI
Warehouse – Distributed System Monitor
MPD2 – MPI Process Manager
* As of May 2005

Current Status
Released v1.0 at SC’04
  – Based on oscar-3.0 (using Red Hat 9/x86)
  – All SSS components represented
Testing for v1.1 release
  – Small update release
  – Still oscar-3.0 based
Synchronize with OSCAR release schedule
  – oscar-4.1 released
  – Shift to oscar-4.1 in sss-oscar-1.2 release (2Q2005)

OSCAR v4.1 Highlights
SSS’s APItest tool integrated into v4.1 release
Improved use of DepMan/PackMan abstraction layer
Distributions supported in v4.1
  – x86: RH 9, FC2, MDK 10.0
  – x86 & ia64: RH EL 3
Initial work started for Debian
  – Not in v4.1 release but working with 4.x devel tree

TODO: SSS
Short term
  – Complete testing for v1.1beta & release
  – Update SSS documentation
Medium term
  – Migrate to new FRE testbed and repository (pending approval)
  – New/more Linux distribution/architecture/kernel support
Longer term
  – Extend SSS component tests
      1) Installation, 2) Validation, 3) Durability/Stress, 4) Performance
  – Track oscar-4.x releases for v5.0 compatibility
  – Distribute as OSCAR “Package Set”
      Pending feature support in OSCAR
  – OPKG ordering within a phase
      Pending feature support in OSCAR

SSS-OSCAR Release Schedule
SSS Version   Freeze Date   Release Timeframe   Based on OSCAR
v1.1          Feb 15        May                 oscar-3.0
v1.2          Jun 15        July                oscar-4.1
v1.3          Aug 15        Sept                oscar-4.x
v1.4/2.0      Oct 15        Nov - SC’05         oscar-5.0
Add features to /

Roadmap
1.2 (frz: jun, rel: jul)
  – Fedora Core 2 / Pkg rebuild
      BLCR upgrade to linux-2.6
  – Improved install/validation tests
  – oscar-4.1 opkg modifications (updates)
      Updates to HOWTO as needed
      Simplify XML meta file
  – Close (most) open tracker issues
2.0 (frz: aug, rel: sep)
  – LRS change over
  – Fedora Core 4 / Pkg rebuild
  – Improved install/validation tests
  – Add performance/stress tests?
  – oscar-4.x opkg modifications (updates)
      Updates to HOWTO as needed
  – Meta-scheduler (Silver)?
(frz: oct, rel: nov) [SC’05]
  – Any bugfixes/minor updates
2.02
  – SSS oscar-pkg set

Goals for sss-oscar-2.0
Release v2.0 at SC’05
Compatible with oscar-5.0
Support current Linux distribution(s)
Improve interoperability with standard OSCAR
  – Users obtain via “SSS OSCAR Pkg Repository”
  – Likely leverage “Package Sets” for logical grouping
  – Clarify SSS package dependencies
      What about outside of SSS-OSCAR?
Improved testing
  – Supply thorough installation/validation/performance tests
Documentation
  – Specifications for component interfaces (schemas), etc.

Comments/Discussion
Provide a lower cost of entry
  – Doc to help knit system together
Clarify dependencies/interactions
  – Intra-component and inter-component
Feedback to help Ron O. for testing/validation
  – Tests to verify against component specs.
  – Ex. The PM specs state X capability & it works in this build
  – Effectively conformance tests to “optional” SSS specs.
What do we need to help coming releases?
  – Louder drum for Thomas?
  – Dedicated integration periods (face-to-face and/or virtual)?
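
The conformance-test idea above (a component spec states capability X, and a test confirms it works in this build) could be sketched roughly as follows. This is only an illustration under stated assumptions: the capability names, the required/optional flag, and the probe functions are hypothetical and are not taken from the SSS specifications or from APItest.

```python
from typing import Callable, Dict


def run_conformance(
    spec: Dict[str, bool],
    probes: Dict[str, Callable[[], bool]],
) -> Dict[str, str]:
    """Compare capabilities named in a spec against probe results.

    spec maps a capability name to True (required) or False (optional).
    probes maps a capability name to a callable that returns True when
    the capability works in the build under test.
    Both structures are hypothetical, chosen only for this sketch.
    """
    results: Dict[str, str] = {}
    for name, required in spec.items():
        probe = probes.get(name)
        if probe is None:
            # The spec lists the capability but no test exists for it yet.
            results[name] = "untested"
            continue
        try:
            ok = bool(probe())
        except Exception:
            # A probe that raises is treated as a failed capability.
            ok = False
        if ok:
            results[name] = "pass"
        else:
            # A missing optional capability is reported, not failed.
            results[name] = "fail" if required else "unsupported"
    return results


if __name__ == "__main__":
    # Hypothetical capabilities for a process-manager-like component.
    example_spec = {"start_job": True, "signal_job": False, "checkpoint": False}
    example_probes = {
        "start_job": lambda: True,
        "signal_job": lambda: False,
    }
    for capability, status in run_conformance(example_spec, example_probes).items():
        print(f"{capability}: {status}")
```

Reporting optional capabilities as "unsupported" rather than "fail" is one way to reflect the point that these would be conformance tests against "optional" specs; other reporting schemes would work equally well.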

Resources
ORNL test clusters
  – Systems: sss-xtorc, test1, test2
  – Access via ORNL SSH Login Server
  – Must do reservations/coordinate use (Note, no remote power mgmt)
Investigating ORNL “FRE” (enclaves)
  – Add “testX” system to alleviate ORNL SSH Login Server
SSS-OSCAR Project page
  – Hosted at OSCAR Homepage
  – Includes “HOWTO: Create an OSCAR Package” document

Working Components and Interfaces (bold) Spr’03
[Figure: component diagram of the Scalable Systems Software Suite. Color code: BCWG, RMWG, PMWG, VIWG, N/A.]
Diagram labels: Accounting; File System; Event Manager; Service Directory; Meta Scheduler; Meta Monitor; Meta Manager; Scheduler; User DB; Allocation Management; Process Manager; Usage Reports; User Utilities; High Performance Communication & I/O; Application Environment; Meta Services; System & Job Monitor; Checkpoint / Restart; Grid Interfaces; Job Queue Manager; Node Configuration & Build Manager; Validation & Testing; "These Interface To all"
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite
Standard XML interfaces

OSCAR Features Slated for 4.x/5.0*
v4.1 (freeze Feb 15)
  – Integrate APItest
  – Smarter RPM uninstall (PackageInUn)
  – “oscar-release” RPM
  – Intel Compilers OPkg
v4.2 (freeze May 15)
  – New DB schema
  – New DB API/Library (v1)
  – NEST for OPkg mgmt
  – Debian PackMan/DepMan
v4.3 (freeze Aug 15)
  – Smarter RPM uninstall (NEST)
  – Support for “Package Sets”
  – Debian/x86 support?
  – VServer testing harness
v5.0 (freeze Oct 10)
  – “LibOPkg” available
  – SGE
  – Fedora Core 3/4, x86, x86-64
* This list is speculative and only highlights items that would likely help/affect SSS releases.

Tentative OSCAR Release Schedule
OSCAR Version   Release Timeframe   Freeze Date
v4.1            May                 Feb 15
v4.2            June                May 15
v4.3            August              Aug 15
v4.4/5.0        SC’05               Oct 10