Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

A Lightweight Kernel Operating System for PetaFLOPS-Era Supercomputers
(AKA The Lightweight Kernel Project)
Overview and Current Status
Ron Brightwell
Sandia National Laboratories

Outline
History
Project overview
Current status
Future directions

Original LWK Project Goals
Three-year project to design and implement a next-generation lightweight kernel for compute nodes of a distributed memory massively parallel system
Assess the performance and reliability of a lightweight kernel versus a traditional monolithic kernel
Investigate efficient methods of supporting dynamic operating system services
Leverage open-source OS projects as much as possible

Original Approach
Port the Cougar LWK to the Cplant™ cluster and perform a direct comparison with Linux
– Performance
– Scalability
– Determinism
– Reliability

Limitations of Original Approach
Cougar
– Not open-source
– Export controlled
– Not portable
– Old
Cplant™
– Alpha is gone
– Old

Current Approach
Short-term
– Compare Cougar and Linux on ASCI Red hardware
Beyond that
– Figure out how best to leverage Linux or other open-source operating systems to achieve important characteristics of previous LWKs
– Provide a basis for future OS research activities

Motivation for Linux/Cougar Comparison
No direct comparison of a LWK versus a full-service OS since SUNMOS versus OSF1/AD nearly ten years ago
Much has changed (improved?) since then
A direct comparison between a LWK and Linux is important for providing insight into what is important
Platform balance is important
Need real numbers to show people like (Beckman|Minnich|Riesen|Camp)

ASCI Red Hardware
4640 compute nodes
– Dual 333 MHz Pentium II Xeons
– 256 MB RAM
400 MB/sec bi-directional network links
38x32x2 mesh topology
Red/Black switchable
First machine to demonstrate 1+ TFLOPS
2.38/3.21 TFLOPS
Deployed in 1997

ASCI Red Development Systems
Polaris
– 8 nodes
– 200 MHz Pentium Pro
– Everything else is the same (same memory subsystem)
Nighten
– 144 nodes
– Identical hardware to the production ASCI Red machine

ASCI Red Compute Node Software
Puma lightweight kernel
– Follow-on to the Sandia/UNM Operating System (SUNMOS)
– Developed for the 1024-node nCUBE-2 in 1993 by Sandia/UNM
– Ported to the 1800-node Intel Paragon in 1995 by Sandia/UNM
– Ported to ASCI Red in 1996 by Intel and Sandia
– Productized as “Cougar” by Intel

ASCI Red Software (cont’d)
Cougar
– Space-shared model
– Exposes all resources to applications
– Consumes less than 1% of compute node memory
– Four different execution modes for managing dual processors
– Portals 2.0
  High-performance message passing
  Avoids buffering and memory copies
  Supports multiple user-level libraries (MPI, Intel N/X, Vertex, etc.)

Cougar Goals
Targets high performance scientific and engineering applications on tightly coupled distributed memory architectures
Scalable to tens of thousands of processors
Fast message passing and execution
Small memory footprint
Persistent (fault tolerant)

Cougar Approach
Separate policy decision from policy enforcement
Move resource management as close to the application as possible
Protect applications from each other
Let user processes manage resources
Get out of the way

Cougar General Structure
(Diagram: the Q-Kernel, which provides message passing and memory protection, sits beneath the PCT and the application processes; each application links against libc.a plus a message-passing library such as libmpi.a, libnx.a, or libvertex.a.)

Cougar Quintessential Kernel (QK)
Policy enforcer
Initializes hardware
Handles interrupts and exceptions
Maintains hardware virtual addressing
No virtual memory support
Static size
Small size
Non-blocking
Few, well-defined entry points (illustrated below)
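To make “few, well-defined entry points” concrete, kernels in this style typically funnel every user-to-kernel transition through a small, fixed dispatch table. The sketch below is a generic C illustration, not the actual QK source; the entry names, numbering, and stub handlers are invented for this example.

/*
 * Generic illustration of a small, fixed set of kernel entry points -- not the
 * actual QK code.  Every trap lands in qk_dispatch(), which only knows about
 * a handful of services; anything else is rejected immediately.
 */
enum qk_entry {
    QK_QUIT = 0,        /* terminate the calling process            */
    QK_SEND,            /* hand a message descriptor to the network */
    QK_RECV_POST,       /* post a receive buffer                    */
    QK_YIELD,           /* give the CPU back to the PCT             */
    QK_NUM_ENTRIES
};

typedef long (*qk_handler_t)(long arg0, long arg1);

/* Stub handlers standing in for the real services. */
static long qk_quit(long a0, long a1)      { (void)a0; (void)a1; return 0; }
static long qk_send(long a0, long a1)      { (void)a0; (void)a1; return 0; }
static long qk_recv_post(long a0, long a1) { (void)a0; (void)a1; return 0; }
static long qk_yield(long a0, long a1)     { (void)a0; (void)a1; return 0; }

static const qk_handler_t qk_table[QK_NUM_ENTRIES] = {
    [QK_QUIT]      = qk_quit,
    [QK_SEND]      = qk_send,
    [QK_RECV_POST] = qk_recv_post,
    [QK_YIELD]     = qk_yield,
};

/* Called from the low-level trap handler with the requested entry number. */
long qk_dispatch(long entry, long arg0, long arg1)
{
    if (entry < 0 || entry >= QK_NUM_ENTRIES)
        return -1;                       /* unknown entry: fail fast */
    return qk_table[entry](arg0, arg1);
}

Keeping the table static and small is what keeps the kernel itself static in size and easy to reason about.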

Cougar Process Control Thread (PCT)
Runs in user space
More privileged than user applications
Policy maker
– Process loading
– Process scheduling
– Virtual address space management
– Name server
– Fault handling

Cougar PCT (cont’d)
Customizable
– Single-tasking or multi-tasking
– Round robin or priority scheduling (see the sketch below)
– High performance, debugging, or profiling version
Changes the behavior of the OS without changing the kernel
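Since the transcript only describes this in prose, here is a minimal user-space sketch of the idea. It is purely illustrative and not the actual PCT code; the task structure, function names, and dispatch hook are assumptions. The point is that the scheduling policy is just an ordinary user-level function the PCT calls, so swapping round robin for priority scheduling changes the PCT, not the QK.

/*
 * Illustrative sketch only -- not the actual Cougar PCT source.
 * All names (struct task, sched_policy_t, pick_next) are invented.
 */
#include <stdio.h>

struct task {
    int          id;
    int          priority;   /* larger value = more important */
    struct task *next;       /* circular run list maintained by the PCT */
};

typedef struct task *(*sched_policy_t)(struct task *current);

/* Round robin: take the next runnable task in the ring. */
static struct task *sched_round_robin(struct task *current)
{
    return current->next;
}

/* Priority: scan the ring once and pick the highest-priority task. */
static struct task *sched_priority(struct task *current)
{
    struct task *best = current;
    for (struct task *t = current->next; t != current; t = t->next)
        if (t->priority > best->priority)
            best = t;
    return best;
}

int main(void)
{
    struct task a = { 1, 10, NULL }, b = { 2, 5, NULL }, c = { 3, 40, NULL };
    a.next = &b; b.next = &c; c.next = &a;

    sched_policy_t pick_next = sched_round_robin;   /* policy chosen for this PCT build */
    printf("round robin picks task %d\n", pick_next(&a)->id);

    pick_next = sched_priority;                     /* swap policy without kernel changes */
    printf("priority picks task %d\n", pick_next(&a)->id);
    return 0;
}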

Cougar Processor Modes
Chosen at job launch time
Heater mode (proc 0)
– QK/PCT and application process on system CPU
Message co-processor mode (proc 1)
– QK/PCT on system CPU
– Application process on second CPU
Compute co-processor mode (proc 2)
– QK/PCT and application process on system CPU
– Application co-routines on second CPU
Virtual node mode (proc 3)
– QK/PCT and application process on system CPU
– Second application process on second CPU

Linux on ASCI Red
RedHat Linux
Adapted the Linux bootloader and startup code to work with the bootmesh protocol
Service node receives the Linux kernel via bootmesh and the root filesystem from an attached SCSI disk
Compute nodes mount the root filesystem from the service node
Sparse compute node services
– sshd for remote access
– Enough libraries for MPI jobs to run

Linux IP Implementation for ASCI Red
Implemented a Linux network driver for the CNIC
– Interrupt-driven ring buffer (sketched below)
– Based on isa-skeleton.c
Varying the IP MTU from 4 KB (1 page) to 16 KB (4 pages) showed no noticeable difference in bandwidth
Bandwidth is CPU limited
– 45 MB/s for 333 MHz processors
– 32 MB/s for 200 MHz processors
Custom raw device achieved 310 MB/s
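For context, the receive path of such a driver roughly follows the isa-skeleton.c pattern. The sketch below is a hedged illustration in Linux 2.4-era style; the CNIC-specific details are not given in the slides, so struct cnic_priv and cnic_ring_next() are invented stand-ins, not the actual Sandia driver.

/*
 * Illustrative 2.4-era receive path in the style of isa-skeleton.c -- not the
 * real CNIC driver.  The interrupt handler drains a receive ring and hands
 * each frame to the kernel stack with netif_rx(), which is what makes the
 * path interrupt-driven.
 */
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/interrupt.h>
#include <linux/string.h>

struct cnic_priv {
    struct net_device_stats stats;
    /* a real driver would keep its receive-ring state here */
};

/* Hypothetical helper: returns nonzero and fills data/len while frames remain
 * in the ring.  A real driver would walk the NIC's descriptors instead. */
static int cnic_ring_next(struct cnic_priv *priv, void **data, int *len)
{
    return 0;
}

static void cnic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct net_device *dev  = dev_id;
    struct cnic_priv  *priv = dev->priv;
    void *data;
    int len;

    while (cnic_ring_next(priv, &data, &len)) {
        struct sk_buff *skb = dev_alloc_skb(len + 2);
        if (!skb) {
            priv->stats.rx_dropped++;
            continue;
        }
        skb_reserve(skb, 2);                      /* align the IP header */
        memcpy(skb_put(skb, len), data, len);     /* copy frame out of the ring */
        skb->dev = dev;
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);                            /* hand off to protocol processing */
        priv->stats.rx_packets++;
        priv->stats.rx_bytes += len;
    }
}

The per-packet interrupt and copy in a path like this is presumably why the slide reports IP bandwidth as CPU limited, far below what the raw device achieves.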

Linux Processor Modes
Modified the CNIC driver to support Cougar processor modes
– Little difference in performance due to interrupts
Virtual node mode is the default

MPI Ping-Pong Latency
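The chart itself is not reproduced in the transcript. Latency numbers like these typically come from a ping-pong microbenchmark along the following lines; this is a generic sketch, not necessarily the exact benchmark run on ASCI Red.

/* Generic ping-pong latency microbenchmark: rank 0 and rank 1 bounce a small
 * message back and forth, and one-way latency is reported as half the average
 * round-trip time. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 1000;
    char buf[8] = { 0 };
    int rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}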

NPB CG

NPB IS

NPB MG

CTH Family of Codes
Models complex multi-dimensional, multi-material problems characterized by large deformations and/or strong shocks
Uses a two-step, second-order accurate finite-difference Eulerian solution
Material models for equations of state, strength, fracture, porosity, and high explosives
Impact, penetration, perforation, shock compression, high explosive initiation and detonation problems

CTH Steps
CTHGEN
– Problem setup
  Create computational mesh, insert materials, calculate volume fraction of each material in cells
– Assign material properties and run-time controls
  Broadcasting data is the main type of message passing
– Generate initial restart file, one file per node
CTH
– Read initial restart file, one file per node
– Simulate shock wave physics
  Many nearest-neighbor communications, a few global reductions per time step
– Write results to restart, history, and viz files
– Performance measured in grind time
  Time to compute all calculations on a single cell for a single time step (see the sketch below)
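Given the definition above, grind time is simply elapsed wall-clock time normalized by the total number of cell updates. A minimal sketch; the variable names are illustrative, not from the CTH source:

/* Grind time as defined on the slide: seconds of wall-clock time per cell
 * per time step. */
double grind_time(double elapsed_seconds, long cells, long timesteps)
{
    return elapsed_seconds / ((double)cells * (double)timesteps);
}

Lower grind times are better, and because the metric is normalized per cell-update, it allows comparison across problem sizes and node counts.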

CTH Performance

Issues
Compilers and runtime
– Cougar numbers are from (old) PGI compilers
– Linux numbers are from (new) Intel compilers
Determinism
– No variability in Cougar execution times, even on a loaded machine
– Significant (>5%) variability in Linux execution times (see the measurement sketch below)
Level of effort
– Maintaining a LWK may be equivalent to maintaining a Linux driver
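The slide does not say exactly how the variability figure is computed, so the sketch below assumes a simple (max − min) / min spread over repeated runs of the same kernel; the timing loop is a stand-in for the real application.

/* Sketch of one way to quantify run-to-run variability; the metric and the
 * stand-in compute kernel are assumptions, not the method used on the slide. */
#include <stdio.h>
#include <time.h>

static double run_once(void)
{
    volatile double x = 0.0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 50 * 1000 * 1000; i++)   /* stand-in workload */
        x += (double)i * 1e-9;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    double min = 1e30, max = 0.0;
    for (int i = 0; i < 10; i++) {
        double t = run_once();
        if (t < min) min = t;
        if (t > max) max = t;
    }
    printf("run-to-run variability: %.1f%%\n", 100.0 * (max - min) / min);
    return 0;
}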

Ongoing Activities
Completed implementation of a Portals 3.2 CNIC driver in Linux
– 55 µs latency, 296 MB/s
Currently gathering data for NPB and CTH
– Need to debug the MPI implementation and runtime system
Linux 2.5
– Large page support
Cougar
– Provide a modern set of compilers/libraries

Conclusions
Don’t have a real apples-to-apples comparison yet
Will have a Granny Smith-to-Red Delicious comparison soon

Acknowledgments
Project team
– Marcus Epperson, Mike Levenhagen, Rolf Riesen, Keith Underwood, Zhaofang Wen (Sandia)
– Kurt Ferreira (HP)
– Trammell Hudson (OS Research, Inc.)
– Patrick Bridges, Barney Maccabe (UNM)