Why Linux is a Bad Idea as a Compute Node OS (for Balanced Systems) Ron Brightwell Sandia National Labs Scalable Computing Systems Department

Slides:



Advertisements
Similar presentations
User-Mode Linux Ken C.K. Lee
Advertisements

VMWare to Hyper-V FOR SERVER What we looked at before migration  Performance – Hyper-V performs at near native speeds.  OS Compatibility – Hyper-V.
Beowulf Supercomputer System Lee, Jung won CS843.
1 Cplant I/O Pang Chen Lee Ward Sandia National Laboratories Scalable Computing Systems Fifth NASA/DOE Joint PC Cluster Computing Conference October 6-8,
Introduction to Operating Systems CS-2301 B-term Introduction to Operating Systems CS-2301, System Programming for Non-majors (Slides include materials.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Bugnion et al. Presented by: Ahmed Wafa.
1 BGL Photo (system) BlueGene/L IBM Journal of Research and Development, Vol. 49, No. 2-3.
G Robert Grimm New York University Disco.
Nooks: an architecture for safe device drivers Mike Swift, The Wild and Crazy Guy, Hank Levy and Susan Eggers.
Operating System Structure. Announcements Make sure you are registered for CS 415 First CS 415 project is up –Initial design documents due next Friday,
Threads 1 CS502 Spring 2006 Threads CS-502 Spring 2006.
Operating Systems - Introduction S H Srinivasan
CS 300 – Lecture 23 Intro to Computer Architecture / Assembly Language Virtual Memory Pipelining.
1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, and Mendel Rosenblum, Stanford University, 1997.
CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL:
OPERATING SYSTEMS Introduction
Introduction Operating Systems’ Concepts and Structure Lecture 1 ~ Spring, 2008 ~ Spring, 2008TUCN. Operating Systems. Lecture 1.
Threads CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Operating System Organization
Slide 3-1 Copyright © 2004 Pearson Education, Inc. Operating Systems: A Modern Perspective, Chapter 3 Operating System Organization.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Dreams in a Nutshell Steven Sommer Microsoft Research Institute Department of Computing Macquarie University.
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Chapter 3 Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Parallel Computing The Bad News –Hardware is not getting faster fast enough –Too many architectures –Existing architectures are too specific –Programs.
RSC Williams MAPLD 2005/BOF-S1 A Linux-based Software Environment for the Reconfigurable Scalable Computing Project John A. Williams 1
Dual Stack Virtualization: Consolidating HPC and commodity workloads in the cloud Brian Kocoloski, Jiannan Ouyang, Jack Lange University of Pittsburgh.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
UNIX System Administration OS Kernal Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept Kernel or MicroKernel Concept: An OS architecture-design.
Chapter 6 Operating System Support. This chapter describes how middleware is supported by the operating system facilities at the nodes of a distributed.
Computer Architecture and Operating Systems CS 3230: Operating System Section Lecture OS-7 Memory Management (1) Department of Computer Science and Software.
Principles of Scalable HPC System Design March 6, 2012 Sue Kelly Sandia National Laboratories Abstract: Sandia National.
ICOM Noack Operating Systems - Administrivia Prontuario - Please time-share and ask questions Info is in my homepage amadeus/~noack/ Make bookmark.
An architecture for space sharing HPC and commodity workloads in the cloud Jack Lange Assistant Professor University of Pittsburgh.
Tessellation: Space-Time Partitioning in a Manycore Client OS Rose Liu 1,2, Kevin Klues 1, Sarah Bird 1, Steven Hofmeyr 3, Krste Asanovic 1, John Kubiatowicz.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Overview Part 2: History (continued)
Comparison of Distributed Operating Systems. Systems Discussed ◦Plan 9 ◦AgentOS ◦Clouds ◦E1 ◦MOSIX.
Slide 3-1 Copyright © 2004 Pearson Education, Inc. Operating Systems: A Modern Perspective, Chapter 3.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Scalable Systems Lab / The University of New Mexico© Summer 2000 by Adrian Riedo- Slide 1 - by Adrian Riedo - Summer 2000 High Performance Computing using.
Processes Introduction to Operating Systems: Module 3.
1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.
Types of Operating Systems 1 Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
LRPC Firefly RPC, Lightweight RPC, Winsock Direct and VIA.
Lecture Topics: 11/24 Sharing Pages Demand Paging (and alternative) Page Replacement –optimal algorithm –implementable algorithms.
Full and Para Virtualization
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Presented by: Pierre LaBorde, Jordan Deveroux, Imran Ali, Yazen Ghannam, Tzu-Wei.
CSE 451: Operating Systems Winter 2015 Module 25 Virtual Machine Monitors Mark Zbikowski Allen Center 476 © 2013 Gribble, Lazowska,
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
Background Computer System Architectures Computer System Software.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
Introduction to Operating Systems Concepts
Computer System Structures
Virtual Machine Monitors
Kernel Design & Implementation
Operating System Structure
Is System X for Me? Cal Ribbens Computer Science Department
Chapter 8: Main Memory.
Page Replacement.
Lecture Topics: 11/1 General Operating System Concepts Processes
LINUX System : Lecture 7 Lecture notes acknowledgement : The design of UNIX Operating System.
Operating Systems: A Modern Perspective, Chapter 3
CS703 - Advanced Operating Systems
Operating System Concepts
Operating System Concepts
CSE 451: Operating Systems Autumn Module 24 Virtual Machine Monitors
Presentation transcript:

Why Linux is a Bad Idea as a Compute Node OS (for Balanced Systems) Ron Brightwell Sandia National Labs Scalable Computing Systems Department

Target Architecture Distributed memory, message passing Partition model of resources –Compute nodes Small number of CPUs Diskless High performance network –Peak processor speed in MHz is near peak network bandwidth in MB/s –Service nodes –Disk I/O nodes –Network I/O nodes

Target Applications Resource constrained –Can consume at least one resource (memory, memory bandwidth, processing, network, etc.) –All resources are precious A single run may consume the entire system for days Primary concern is execution time

Why Linux for Cplant™? Free (speech & beer) Large developer community Kernel modules –No need to reboot during development –Supports partition model Supported on several platforms Familiarity with Linux –Ported Linux 2.0 to ASCI/Red

Results Cplant™ is now open source Large developer community is a wash –Most developers not focused on HPC and scaling issues –Extreme Linux helped –Extreme Linux isn’t very extreme Modules –Big help in developing the networking stack Portals over any network device –Myrinet –RTS/CTS to skbufs –Portals over IP –Portals over IP in kernel Cplant™ runs on Alpha, x86, IA-64 Linux changes too often to really be familiar

Other Observations Reliability –Linux likely hasn’t been the cause of any machine interrupts But we can’t really be sure –Main selling point of Linux for the server market Application development environment more extensive –Compilers, debuggers, tools Lots of stuff we don’t have to worry about –Device drivers: Ethernet, Serial –BIOS’s –Hardware bugs Linux works OK for Cplant™ and commodity-based clusters

Technical Issues Predictability – avoid work unrelated to the computation –Linux on Alpha takes 1000 interrupts per second - to keep time –Daemons: init, inetd, ipciod –Kernel threads: kswapd, kflushd, kupdate, kpiod –Seen as much as a 10x variability in execution time –Inappropriate resource management strategies VM system –Adverse impact on message passing –No physically contiguous memory –Must pin memory pages –Must maintain page tables for NIC –Fighting the page cache How much memory is there?

Technical Issues (cont’d) Requires a filesystem –fork/exec model –Not appropriate for diskless compute nodes Complexity –We haven’t done anything with Linux because it’s not easy –Virtual node mode added to LWK by two relatively inexperienced kernel developers in 6 months

Social Issues Kernel development moves fast –Significant resources needed to keep up Distributions and development environments also change frequently –Tool vendors have trouble keeping up Linus changed out the VM system in the middle of the 2.4 kernels! –2.4.9 – van Riel VM system – – Arcangeli VM system 150+ patches to the van Riel VM system – Linux fork? Server vs. multimedia desktop –Not HPC

Social Issues (cont’d) Forced to take the good with the bad –Want NFS v3, don’t want OOM killer Fairly fixed set of requirements –Linux doesn’t allow us to concentrate on those Staying focused –Linux community not addressing HPC issues Linux is going to die soon anyway –Ron Minnich says so

LWK NOP Trap Performance Proc 0 mode – µs Proc 1 mode – µs Proc 3 mode – µs – µs Proc 0 mode with –share – µs – µs

Lightweight Kernel? Puma –section size –.text –.data –.bss –Total PCT –section size –.text –.data –.bss –Total

Memory Use Breakdown Executable –164833(.text) (.data) (.bss) = Proc 0 / Proc 1 mode –Text = –Data = –Stack = –Heap = –NX Heap = Proc 3 mode –Text = –Data = –Stack = –Heap = –NX Heap =

Source Lines Of Code (SLOC)* QK –9078 CPU independent –10061 x86-specific –6088 i860-specific PCT yod Libraries –I/O and C – –MPI Device independent Portals – 5234 Linux (kernel, init, mm, include/linux) *Generated using 'SLOCCount' by David A. Wheeler

It’s not the size – it’s what you do with it

Rolf’s Points Linux is well-known –So is Windows Let other people do the work –The other people don’t care about HPC Daemons don’t really do anything –Then why have them? Use a real-time variant of Linux –Real-time scares me - PROSE Somebody needs to tweak Linux –Tweaking is not easy

Summary Linux works OK for Cplant™ and commodity clusters –CPU performance is acceptable for cluster bytes-to- FLOPS ratio Likely performance issues for large-scale platforms with a reasonable bytes-to-FLOPS ratio Community is a mixed blessing