Presentation transcript:

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL
A Framework for Adaptable Operating and Runtime Systems
Ron Brightwell
Scalable Computing Systems, 9223
Sandia National Laboratories
Albuquerque, New Mexico, USA
SciDAC Scalable System Software Meeting
August 26, 2004

Project Details
Sandia National Laboratories
  – Neil Pundit, Project Director
  – Ron Brightwell, Coordinating PI
  – Rolf Riesen
University of New Mexico
  – Barney Maccabe, PI
  – Patrick Bridges
California Institute of Technology
  – Thomas Sterling, PI
FY05 budget
  – $1.05M

What’s wrong with current operating systems?

Cluster Network Hardware
DMA between NIC and host memory
  – Physical addresses on NIC
  – Can be user- or kernel-space
  – Memory descriptors on NIC
Benefits associated with offloading
  – Reduced overhead
  – Increased bandwidth
  – Reduced latency

OS Bypass Versus Splintering Cluster Architecture Distributing little bits of the OS

Other Issues
General-purpose operating systems
  – Generality comes at the cost of performance for all applications
  – Assume a generic architectural model
      Difficult to expose novel features
Lightweight operating systems
  – Limited functionality
  – Difficult to add new features
  – Designed to be used in the context of a specific usage model
Operating system is an impediment to new architectures and programming models

Factors Impacting OS Design

LWK Influences
Lightweight OS
  – Small collection of apps
      Single programming model
  – Single architecture
  – Single usage model
  – Small set of shared services
  – No history
Puma/Cougar
  – MPI
  – Distributed memory
  – Space-shared
  – Parallel file system
  – Batch scheduler

Programming Models

Usage Models

Current and Future System Demands
Architecture
  – Modern ultrascale machines have widely varying system-level and node-level architectures
  – Future systems will have further hardware advances (e.g., multi-core chips, PIMs)
Programming model
  – MPI, Thread, OpenMP, PGAS, …
External services
  – Parallel file systems, dynamic libraries, checkpoint/restart, …
Usage model
  – Single, large, long-running simulation
  – Parameter studies with thousands of single-processor, short-running jobs

Project Goals
Realize a new generation of scalable, efficient, reliable, easy-to-use operating systems for a broad range of future ultrascale high-end computing systems, based on both conventional and advanced hardware architectures and in support of diverse current and emerging parallel programming models.
Devise and implement a prototype system that provides a framework for automatically configuring and building lightweight operating and runtime systems based on the requirements presented by an application, system usage model, system architecture, and the combined needs for shared services.

Approach
Define and build a collection of micro-services
  – Small components with well-defined interfaces
  – Implement an indivisible portion of service semantics
  – Fundamental elements of composition and re-use
Combine micro-services specifically for an application and a target platform
Develop tools to facilitate the synthesis of required micro-services
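The composition step on this slide can be pictured with a small sketch. This is only an illustration under stated assumptions, not the project's framework: the `MicroService` record, the `compose` function, and every service and interface name in `CATALOG` are hypothetical. Each micro-service declares the interfaces it provides and requires, and composition selects just the services an application needs, following requirements transitively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroService:
    """Hypothetical record: a small component with declared interfaces."""
    name: str
    provides: frozenset
    requires: frozenset = frozenset()


def compose(catalog, needed):
    """Pick services covering `needed` interfaces and their transitive requirements."""
    by_interface = {}
    for svc in catalog:
        for iface in svc.provides:
            by_interface.setdefault(iface, svc)  # first provider in the catalog wins

    selected, seen, pending = [], set(), list(needed)
    while pending:
        iface = pending.pop()
        if iface in seen:
            continue
        seen.add(iface)
        svc = by_interface.get(iface)
        if svc is None:
            raise LookupError(f"no micro-service provides {iface!r}")
        if svc not in selected:
            selected.append(svc)
            pending.extend(svc.requires)
    return selected


# Hypothetical catalog; names are illustrative only.
CATALOG = [
    MicroService("phys_mem", frozenset({"memory"})),
    MicroService("net_portals", frozenset({"network"}), frozenset({"memory"})),
    MicroService("mpi_runtime", frozenset({"mpi"}), frozenset({"network"})),
    MicroService("tcp_ip", frozenset({"sockets"}), frozenset({"network"})),
]

if __name__ == "__main__":
    for svc in compose(CATALOG, ["mpi"]):
        print(svc.name)
```

In this sketch an MPI-only application pulls in the MPI, network, and memory services, while the unused `tcp_ip` service is left out of the resulting system.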

Tools for Combining Micro-Services
Need to ensure that required micro-services are available
Need to ensure that applications are isolated from one another within the context of a given usage model
Verify that a set of constraints is met
Further work will allow for reasoning about additional system properties, such as performance based on feedback from previous runs
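The two checks named on this slide, availability of required micro-services and isolation between applications, could be sketched as follows. This is a minimal illustration, not the project's tooling. It assumes a configuration is a mapping from a service name to its provided and required interfaces, and that isolation under a space-shared usage model means no two applications share a node. The function names and all data are hypothetical.

```python
from itertools import combinations


def missing_interfaces(config):
    """Return required interfaces that no configured service provides.

    `config` maps a service name to a (provides, requires) pair of sets.
    """
    provided = set().union(*(p for p, _ in config.values()))
    required = set().union(*(r for _, r in config.values()))
    return sorted(required - provided)


def isolation_violations(placement):
    """Return pairs of applications that share at least one node.

    `placement` maps an application name to the set of nodes it runs on.
    Assumption: in a space-shared model, sharing a node breaks isolation.
    """
    apps = sorted(placement.items(), key=lambda kv: kv[0])
    return [(a, b) for (a, na), (b, nb) in combinations(apps, 2) if na & nb]


def verify(config, placement):
    """Collect human-readable constraint failures; an empty list means all checks pass."""
    problems = [f"missing interface: {i}" for i in missing_interfaces(config)]
    problems += [f"apps share nodes: {a}, {b}"
                 for a, b in isolation_violations(placement)]
    return problems


if __name__ == "__main__":
    config = {"mpi_runtime": ({"mpi"}, {"network"})}       # network is not provided
    placement = {"appA": {0, 1}, "appB": {1, 2}}           # node 1 is shared
    for problem in verify(config, placement):
        print(problem)
```

Returning a list of failures rather than raising on the first one lets a configuration tool report every unmet constraint in a single pass.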

Building Custom Operating/Runtime Systems

Signal Delivery Example

Timetable
12 months
  – Define basic framework and micro-service APIs
  – Define initial micro-services for supporting a lightweight kernel equivalent
  – Identify applications and related metrics for evaluating resulting systems
24 months
  – Demonstrate configuration and linking tools with multiple lightweight kernel configurations
  – Define application-specific micro-services for optimizing application performance
  – Define shared-service micro-services for common application services (e.g., TCP/IP)

Timetable (cont’d)
36 months
  – Demonstrate instance of framework for PIM-based system on base-level PIM architecture simulator
  – Demonstrate application/kernel configurability using application-specific and shared-service micro-services
  – Release complete software package as open source
  – Provide detailed report summarizing completed and future work

Related Work
Microkernels
  – K42, L4, Pebble, Mach, …
  – Exokernel
Extensible operating systems
  – SPIN, VINO, sandboxing, …
  – Modules
Configurable OS/Runtime
  – Scout, Think, Flux OSKit, eCos, TinyOS
  – STREAMS, x-kernel, CORDS

More Info
“Highly Configurable Operating Systems for Ultrascale Systems,” Maccabe et al. In Proceedings of the First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-1), June (
Out-of-date web pages coming soon