1 A Framework for Adaptable Operating and Runtime Systems
Ron Brightwell
Scalable Computing Systems, 9223
Sandia National Laboratories
Albuquerque, New Mexico, USA
SciDAC Scalable System Software Meeting
August 26, 2004
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2 Project Details
Sandia National Laboratories
– Neil Pundit, Project Director
– Ron Brightwell, Coordinating PI
– Rolf Riesen
University of New Mexico
– Barney Maccabe, PI
– Patrick Bridges
California Institute of Technology
– Thomas Sterling, PI
FY05 budget
– $1.05M

3 What’s wrong with current operating systems?

4 Cluster Network Hardware
DMA between NIC and host memory
– Physical addresses on NIC
– Can be user- or kernel-space
– Memory descriptors on NIC
Benefits associated with offloading
– Reduced overhead
– Increased bandwidth
– Reduced latency
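
The offloading idea on this slide can be illustrated with a toy model: the host posts memory descriptors to the NIC once, and the NIC then matches each incoming message against them and copies the payload straight into host memory, with no per-message involvement of the host OS. This is a minimal sketch under stated assumptions. The names `MemoryDescriptor`, `NicModel`, `post`, and `deliver`, and the tag-based matching rule, are hypothetical and do not come from any real NIC interface; a Python object reference stands in for a physical address, and a byte copy stands in for DMA.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MemoryDescriptor:
    """Hypothetical descriptor the host posts to the NIC: where a message may land."""
    buffer: bytearray   # stands in for a pinned region of host memory
    offset: int         # start of the usable window inside the buffer
    length: int         # size of the usable window in bytes


@dataclass
class NicModel:
    """Toy NIC that holds posted descriptors and 'DMAs' payloads into host buffers."""
    descriptors: Dict[int, MemoryDescriptor] = field(default_factory=dict)

    def post(self, tag: int, md: MemoryDescriptor) -> None:
        # Done once by the host (user or kernel space); no per-message OS work afterwards.
        self.descriptors[tag] = md

    def deliver(self, tag: int, payload: bytes) -> Optional[int]:
        # Runs "on the NIC": match the message tag to a posted descriptor.
        md = self.descriptors.get(tag)
        if md is None or len(payload) > md.length:
            return None  # no matching descriptor, or message too large: drop it
        # The slice assignment stands in for a DMA transfer into host memory.
        md.buffer[md.offset:md.offset + len(payload)] = payload
        return len(payload)


host_memory = bytearray(16)
nic = NicModel()
nic.post(tag=7, md=MemoryDescriptor(buffer=host_memory, offset=4, length=8))
print(nic.deliver(7, b"hello"))   # 5
print(bytes(host_memory[4:9]))    # b'hello'
```

Keeping the descriptors on the NIC is what lets the data path skip the host kernel, which is where the reduced overhead and latency listed above come from in this model.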

5 OS Bypass Versus Splintering
Cluster Architecture
Distributing little bits of the OS

6 Other Issues
General-purpose operating systems
– Generality comes at the cost of performance for all applications
– Assume a generic architectural model
– Difficult to expose novel features
Lightweight operating systems
– Limited functionality
– Difficult to add new features
– Designed to be used in the context of a specific usage model
Operating system is an impediment to new architectures and programming models

7 Factors Impacting OS Design

8 LWK Influences
Lightweight OS
– Small collection of apps
– Single programming model
– Single architecture
– Single usage model
– Small set of shared services
– No history
Puma/Cougar
– MPI
– Distributed memory
– Space-shared
– Parallel file system
– Batch scheduler

9 Programming Models

10 Usage Models

11 Current and Future System Demands
Architecture
– Modern ultrascale machines have widely varying system-level and node-level architectures
– Future systems will have further hardware advances (e.g., multi-core chips, PIMs)
Programming model
– MPI, threads, OpenMP, PGAS, …
External services
– Parallel file systems, dynamic libraries, checkpoint/restart, …
Usage model
– Single, large, long-running simulation
– Parameter studies with thousands of single-processor, short-running jobs

12 Project Goals
Realize a new generation of scalable, efficient, reliable, easy-to-use operating systems for a broad range of future ultrascale high-end computing systems, based on both conventional and advanced hardware architectures and in support of diverse current and emerging parallel programming models.
Devise and implement a prototype system that provides a framework for automatically configuring and building lightweight operating and runtime systems based on the requirements presented by an application, the system usage model, the system architecture, and the combined needs for shared services.
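
The second goal, building a system from the combined requirements of the application, the usage model, the architecture, and shared services, can be pictured as taking the union of what each source asks for. The sketch below assumes a hypothetical catalog that maps each requirement to the micro-services that satisfy it; the requirement and service names (`CATALOG`, `configure`, `message_passing`, and so on) are illustrative only and are not the project's actual micro-services.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical catalog: requirement -> micro-services that implement it.
CATALOG: Dict[str, FrozenSet[str]] = {
    "mpi": frozenset({"message_passing", "contiguous_memory"}),
    "threads": frozenset({"thread_scheduler", "shared_memory"}),
    "space_shared": frozenset({"single_job_loader"}),
    "distributed_memory": frozenset({"contiguous_memory"}),
    "parallel_fs": frozenset({"io_forwarding"}),
}


def configure(*requirement_sets: Iterable[str]) -> Set[str]:
    """Union the micro-services needed by every requirement source."""
    selected: Set[str] = set()
    for requirements in requirement_sets:
        for req in requirements:
            if req not in CATALOG:
                raise KeyError(f"no micro-service provides requirement {req!r}")
            selected |= CATALOG[req]
    return selected


app = ["mpi"]                    # application / programming model
usage = ["space_shared"]         # system usage model
arch = ["distributed_memory"]    # system architecture
shared = ["parallel_fs"]         # shared services
print(sorted(configure(app, usage, arch, shared)))
# ['contiguous_memory', 'io_forwarding', 'message_passing', 'single_job_loader']
```

Because the result is a set, a micro-service requested by two sources (here `contiguous_memory`) is included only once, and a requirement with no provider fails loudly instead of producing an incomplete system.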

13 Approach
Define and build a collection of micro-services
– Small components with well-defined interfaces
– Implement an indivisible portion of service semantics
– Fundamental elements of composition and re-use
Combine micro-services specifically for an application and a target platform
Develop tools to facilitate the synthesis of required micro-services
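
One way to read this approach: give every micro-service the same small interface, let each one do a single indivisible step, and build a service for a given platform by chaining the steps it needs. The sketch below assumes a hypothetical "send" service; `SendRequest`, `check_bounds`, `copy_to_kernel`, `post_to_nic`, and `compose` are illustrative names, not the project's API. On a platform whose NIC can read user memory directly (the OS-bypass case from slide 4), the copy step is simply left out of the composition.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class SendRequest:
    length: int          # bytes the application wants to send
    region_size: int     # size of the caller's registered memory region
    trace: tuple = ()    # names of the micro-services that handled the request


# Every hypothetical micro-service shares one small interface: request in, request out.
MicroService = Callable[[SendRequest], SendRequest]


def check_bounds(req: SendRequest) -> SendRequest:
    # One indivisible piece of semantics: protect the caller's memory region.
    if req.length > req.region_size:
        raise ValueError("buffer exceeds the caller's memory region")
    return replace(req, trace=req.trace + ("check_bounds",))


def copy_to_kernel(req: SendRequest) -> SendRequest:
    # Needed only when the NIC cannot read user memory directly.
    return replace(req, trace=req.trace + ("copy_to_kernel",))


def post_to_nic(req: SendRequest) -> SendRequest:
    # Hand the transfer to the network interface.
    return replace(req, trace=req.trace + ("post_to_nic",))


def compose(services: List[MicroService]) -> MicroService:
    """Chain micro-services, in order, into one service for a given target."""
    def composed(req: SendRequest) -> SendRequest:
        for service in services:
            req = service(req)
        return req
    return composed


# Same application request, two target platforms.
kernel_path = compose([check_bounds, copy_to_kernel, post_to_nic])
bypass_path = compose([check_bounds, post_to_nic])  # NIC does DMA from user space

req = SendRequest(length=64, region_size=4096)
print(kernel_path(req).trace)  # ('check_bounds', 'copy_to_kernel', 'post_to_nic')
print(bypass_path(req).trace)  # ('check_bounds', 'post_to_nic')
```

Because each step is a plain function with a shared signature, re-use comes for free: `check_bounds` and `post_to_nic` appear in both compositions unchanged.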

14 Tools for Combining Micro-Services
Need to ensure that required micro-services are available
Need to ensure that applications are isolated from one another within the context of a given usage model
Verifying that a set of constraints is met
Further work will allow for reasoning about additional system properties, such as performance, based on feedback from previous runs
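
The first two checks on this slide can be sketched as constraint checks over a dependency graph: compute everything the chosen micro-services transitively require, report anything missing from the available pool, and apply an isolation rule that depends on the usage model. This is an illustrative sketch only. The dependency table `REQUIRES`, the functions `closure` and `check`, and the rule that a time-shared node must include `address_protection` are assumptions made for the example, not constraints stated by the project.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dependency table: micro-service -> micro-services it requires.
REQUIRES: Dict[str, Set[str]] = {
    "message_passing": {"contiguous_memory", "address_protection"},
    "single_job_loader": {"contiguous_memory"},
    "contiguous_memory": set(),
    "address_protection": set(),
}


def closure(roots: Iterable[str]) -> Set[str]:
    """Follow 'requires' edges to collect every micro-service the roots pull in."""
    needed: Set[str] = set()
    stack: List[str] = list(roots)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(REQUIRES.get(name, set()))
    return needed


def check(roots: Iterable[str], available: Set[str], usage_model: str) -> Tuple[Set[str], List[str]]:
    """Return the required micro-services and a list of violated constraints."""
    needed = closure(roots)
    errors = [f"missing micro-service: {m}" for m in sorted(needed - available)]
    # Hypothetical isolation rule: if several jobs share a node, memory must be protected.
    if usage_model == "time_shared" and "address_protection" not in needed:
        errors.append("isolation: time_shared usage requires address_protection")
    return needed, errors


pool = {"single_job_loader", "contiguous_memory", "message_passing"}
print(check({"single_job_loader"}, pool, "space_shared")[1])  # []
print(check({"single_job_loader"}, pool, "time_shared")[1])
# ['isolation: time_shared usage requires address_protection']
print(check({"message_passing"}, pool, "space_shared")[1])
# ['missing micro-service: address_protection']
```

The same configuration can pass under one usage model and fail under another, which mirrors the slide's point that isolation is judged "within the context of a given usage model."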

15 Building Custom Operating/Runtime Systems

16 Signal Delivery Example

17 Timetable
12 months
– Define basic framework and micro-service APIs
– Define initial micro-services for supporting a lightweight kernel equivalent
– Identify applications and related metrics for evaluating resulting systems
24 months
– Demonstrate configuration and linking tools with multiple lightweight kernel configurations
– Define application-specific micro-services for optimizing application performance
– Define shared-service micro-services for common application services (e.g., TCP/IP)

18 Timetable (cont’d)
36 months
– Demonstrate instance of framework for PIM-based system on base-level PIM architecture simulator
– Demonstrate application/kernel configurability using application-specific and shared-service micro-services
– Release complete software package as open source
– Provide detailed report summarizing completed and future work

19 Related Work
Microkernels
– K42, L4, Pebble, Mach, …
– Exo-kernel
Extensible operating systems
– SPIN, VINO, sandboxing, …
– Modules
Configurable OS/Runtime
– Scout, Think, Flux OSKit, eCos, TinyOS
– STREAMS, x-kernel, CORDS

20 More Info
“Highly Configurable Operating Systems for Ultrascale Systems,” Maccabe et al. In Proceedings of the First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-1), June 2004. (http://coset.irisa.fr/)
Out-of-date web pages coming soon

