OS and System Software for Ultrascale Architectures – Panel
Jeffrey Vetter
Oak Ridge National Laboratory
Presented to SOS8, 13 April 2004
Panel Charge
Each panelist has a short opportunity to stand on a soap box and start a riot if he wishes.
The overall purpose of the panel is to raise the issues and suggest paths forward for OS and system software for Ultrascale architectures...
[Standard disclaimers apply…]
Many Objectives for Ultrascale System Software – more than Performance
Performance efficiency is critical
However, other system software qualities can be equally important for effective Ultrascale computing
–Functionality (compatibility)
–Reliability, Availability, Serviceability
–Usability, Administration
How can we make the proper tradeoffs that balance performance with these other qualities?
Some questions
Imagine you gain X% performance on your application by changing the system software:
–Is it acceptable to make Y% of applications, libraries, tools incompatible?
–Is it acceptable to make the system software Z% less reliable than before?
Others
–How many FTEs should it take to keep an Ultrascale system up and running 24/7?
–FTE scaling rate? Source code scaling rate?
–Performance stability? How many possible configurations?
Myopic focus on performance (or any single factor) can have long-term detrimental effects on overall Ultrascale system effectiveness
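One way to make the X / Y / Z questions concrete is a back-of-the-envelope calculation. The sketch below is an assumption for illustration only, not a model from the panel: it treats useful work as (speedup) x (fraction of the workload that still runs) x (availability). The function name and all numbers are hypothetical.

```python
def effective_throughput(speedup_pct, compatible_fraction, availability):
    """Relative useful work under a crude, hypothetical multiplicative model.

    speedup_pct         -- X: performance gain from the new system software, in percent
    compatible_fraction -- 1 - Y: share of applications, libraries, tools that still work
    availability        -- fraction of time the system is up (lower if Z% less reliable)
    """
    return (1.0 + speedup_pct / 100.0) * compatible_fraction * availability


# Baseline: no speedup, everything compatible, 99% availability (made-up numbers).
baseline = effective_throughput(0.0, 1.0, 0.99)

# Tuned system: 10% faster, but 10% of software breaks and availability drops to 95%.
tuned = effective_throughput(10.0, 0.90, 0.95)

print(f"baseline={baseline:.4f} tuned={tuned:.4f}")
```

With these made-up inputs the tuned system delivers less useful work than the baseline despite the 10% speedup, which is the kind of trade-off the questions above point at.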
ORNL is Involved in Several Projects that Span these Objectives
OSCAR
–Cluster building and installation toolkit
Scalable Systems Software
–Scalable, standardized management tools and interfaces for system management
HARNESS
–Customizable, runtime infrastructure for scientific computing
OSCAR: Cluster Toolkit
Framework for cluster management
–Wizard based cluster software installation
–Operating system
–Cluster environment
–Automatically configures cluster components
–Increases consistency among cluster builds
–Reduces time to build / install a cluster
–Reduces need for expertise
–requires: pre-installed headnode w. supported Linux distribution
–thereafter: wizard guides user thru setup/install of entire cluster
Package-based framework
–Content: Software + Configuration, Tests, Docs
–Types:
  –Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
  –Non-core: selected & third-party
–Access: repositories accessible via OPD/OPDer
Many partners…
Over 120,000 downloads on Sourceforge!
What’s next for OSCAR: High Availability release
Goals:
–COTS-based HPC solution towards non-stop services
–Production-quality Linux clustering
–Ease of build, operation, maintenance
HA-OSCAR 1.0 Beta release (March 2004)
–The first known field-grade HA Beowulf cluster release
–Self-configuring multi-head Beowulf system
–HA and HPC clustering techniques to enable critical HPC infrastructure
–Self-healing with 3-5 sec automatic failover time
–1-1.5 hours to self-build failover headnodes w/o preloaded OS
–Optional Image Server for disaster recovery
–Supports existing HPC applications (e.g. MPI) without any modification
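The multi-head failover idea can be sketched with a heartbeat timeout. This is a minimal illustration under stated assumptions, not HA-OSCAR's actual mechanism: the `HeadNode` class, the `choose_active` function, and the 3-second timeout (picked only to sit in the slide's 3-5 sec range) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeadNode:
    name: str
    last_heartbeat: float  # time (in seconds) at which the last heartbeat was seen


def choose_active(primary, standby, now, timeout=3.0):
    """Return the head node that should serve requests at time `now`.

    The primary stays active while its last heartbeat is at most `timeout`
    seconds old; otherwise the standby takes over.
    """
    if now - primary.last_heartbeat <= timeout:
        return primary
    return standby


# Hypothetical two-head cluster: the primary stops sending heartbeats at t = 10 s.
primary = HeadNode("head1", last_heartbeat=10.0)
standby = HeadNode("head2", last_heartbeat=10.0)

before = choose_active(primary, standby, now=12.0)  # heartbeat is 2.0 s old
after = choose_active(primary, standby, now=14.5)   # heartbeat is 4.5 s old
print(before.name, after.name)
```

Passing timestamps in explicitly keeps the sketch deterministic; a real monitor would read a clock and receive heartbeats over the network.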
Scalable Systems Software
Participating Organizations
–ORNL, ANL, LBNL, PNNL, NCSA, PSC, SDSC, SNL, LANL, Ames, Clemson
–IBM, Cray, Intel, SGI
Problem
–Computer centers use incompatible, ad hoc set of systems tools
–Present tools are not designed to scale to multi-Teraflop systems
Goals
–Collectively (with industry) define standard interfaces between systems components for interoperability
–Create scalable, standardized management tools for efficiently running our large computing centers
Impact
–Reduced facility mgmt costs
–More effective use of machines by scientific applications
[Diagram labels: Resource Management; Accounting & user mgmt; System Build & Configure; Job management; System Monitoring]
[Diagram labels: Schedulers, Job Managers; System Monitors; Accounting & User management; Checkpoint/Restart; Build & Configuration systems]
SSS Status
Currently testing the 2nd pre-release *
–Bundled for distribution via OSCAR
–Builds full working cluster with current SSS pkgs
–sss-oscar-0.2a4-v3.0
* Release information as of 3/29/04
OSCAR-SSS Release of Scalable Systems Software Integrated Suite
Working Components and Interfaces (bold)
[Diagram components: Grid Interfaces; Accounting; Event Manager; Service Directory; Meta Scheduler; Meta Monitor; Meta Manager; Meta Services; Scheduler; Node State Manager; Allocation Management; Process Manager; Usage Reports; System & Job Monitor; Job Queue Manager; Node Configuration & Build Manager; Checkpoint / Restart; Validation & Testing; Hardware Infrastructure Manager; authentication; communication]
Components written in any mixture of C, C++, Perl, Java, and Python can be integrated into the Scalable Systems Software Suite through defined XML interfaces
OSCAR (Open Source Cluster Application Resources) used to package, build, and install the suite
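Integration through XML interfaces means a component in any language only has to produce and consume agreed-upon messages. The sketch below shows that idea with Python's standard `xml.etree.ElementTree`. The message layout (`request`, `param`, the attribute names) and the component name `queue-manager` are invented for illustration; they are not the SSS suite's actual schema.

```python
import xml.etree.ElementTree as ET


def build_request(component, action, **params):
    """Serialize a request for another component as an XML string (hypothetical schema)."""
    root = ET.Element("request", {"component": component, "action": action})
    for key, value in params.items():
        ET.SubElement(root, "param", {"name": key}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text):
    """Parse a request string back into (component, action, params)."""
    root = ET.fromstring(xml_text)
    params = {p.get("name"): p.text for p in root.findall("param")}
    return root.get("component"), root.get("action"), params


# Round trip: what one component sends is what another one reads.
message = build_request("queue-manager", "submit", job="sim01", nodes=16)
component, action, params = parse_request(message)
print(message)
print(component, action, params)
```

Note that parameter values come back as strings; the receiving component is responsible for converting them to the types it expects.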
Harness
Key ideas for Harness
–Parallel plug-in interface that allows users or applications to dynamically customize, adapt, and extend the environment’s features
–Distributed peer-to-peer control that prevents single point of failure
–Multiple distributed virtual machines that can collaborate, merge, or split
Collaborative effort between ORNL, University of Tennessee, and Emory University
Design
–Uses pluggable framework in C and Java
–Allows users to dynamically customize the computing environment to suit the application’s needs
–Manages a set of plug-ins as directed by the scientific application w/ lightweight kernel
Harness Implementation
The kernel provides only basic functions, such as dynamic loading and unloading of plug-ins
Plug-ins offer a wide variety of services, where parallel plug-ins enable distributed services.
–These services include: numerical libraries, parallel programming models, networking, resource discovery and distributed control
–FT-MPI and PVM plug-ins available
The pluggable remote method invocation framework RMIX provides IPC with standard protocols (e.g. RPC, JRMP and SOAP)
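The "kernel only loads and unloads plug-ins" design can be illustrated with a toy example. Harness itself is implemented in C and Java; the Python sketch below is not its API. The `Kernel` class is hypothetical, and the standard-library `json` module stands in for a real plug-in.

```python
import importlib


class Kernel:
    """Toy kernel whose only job is to load and unload plug-ins by module name."""

    def __init__(self):
        self.plugins = {}  # module name -> loaded module object

    def load(self, module_name):
        """Import the named module at runtime and register it as a plug-in."""
        module = importlib.import_module(module_name)
        self.plugins[module_name] = module
        return module

    def unload(self, module_name):
        """Drop the kernel's reference to the plug-in.

        Python keeps imported modules cached in sys.modules, so this only
        removes the plug-in from the kernel's registry.
        """
        self.plugins.pop(module_name, None)


kernel = Kernel()
json_plugin = kernel.load("json")            # stand-in plug-in from the standard library
encoded = json_plugin.dumps({"nodes": 4})    # the service comes from the plug-in, not the kernel
kernel.unload("json")
print(encoded, sorted(kernel.plugins))
```

The point of the design is visible even at this scale: the kernel knows nothing about JSON, and any service is added or removed without changing the kernel.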
Summary
Performance is one important criterion
Other important criteria include
–Functionality (compatibility)
–Reliability, Availability, Serviceability
–Usability, Administration
We need measurements, historical data
–Costs, reliability, FTE levels
ORNL is involved in several projects to address these issues
–OSCAR, SSS, and HARNESS
Bonus Slides