Designing Parallel Operating Systems using Modern Interconnects (Eitan Frachtenberg, MIT, 20-Sep-2004, CCS-3/PAL)


Designing Parallel Operating Systems using Modern Interconnects
Eitan Frachtenberg, with Fabrizio Petrini, Juan Fernandez, Dror Feitelson, Jose-Carlos Sancho, Kei Davis
Computer and Computational Sciences Division, Los Alamos National Laboratory
MIT, 20-Sep-2004. "Ideas that change the world"

Cluster Supercomputers
- Growing in prevalence and performance: 7 of the top 10 supercomputers are clusters
- Run parallel applications
- Use advanced, high-end interconnects

Distributed vs. Parallel
Distributed and parallel applications (including operating systems) may be distinguished by their use of global and collective operations:
- Distributed: local information, relatively small number of point-to-point messages
- Parallel: global synchronization with barriers, reductions, and exchanges

System Software Components
Job Scheduling, Fault Tolerance, Parallel I/O, Communication Library, Resource Management

Problems with System Software
Independent single-node OSes (e.g. Linux) connected by distributed daemons suffer from:
- Redundant components
- Performance hits
- Scalability issues
- Load-balancing issues

OS's Collective Operations
Many OS tasks are inherently global or collective operations:
- Job launching, data dissemination
- Context switching
- Job termination (normal and forced)
- Load balancing

[Diagram: Node 1 and Node 2 each run a local operating system with its own resource management, parallel I/O, fault tolerance, job scheduling, and user-level communication; these per-node stacks are unified into a single global parallel operating system providing job scheduling, fault tolerance, communication, parallel I/O, and resource management.]

The Vision
- Modern interconnects are very powerful:
  - collective operations
  - programmable NICs
  - on-board RAM
- Use a small set of network mechanisms as parallel OS infrastructure
- Build upon this infrastructure to create unified system software
- The system software inherits scalability and performance from the network features

Example: ASCI Q Barrier [HotI'03]

Parallel OS Primitives
System software is built atop three primitives:
- Xfer-And-Signal
  - Transfer a block of data to a set of nodes
  - Optionally signal a local/remote event upon completion
- Compare-And-Write
  - Compare a global variable on a set of nodes
  - Optionally write the global variable on the same set of nodes
- Test-Event
  - Poll a local event

Core Primitives on QsNet
- Xfer-And-Signal (QsNet):
  - Node S transfers a block of data to nodes D1, D2, D3, and D4
  - Events triggered at source and destinations
[Diagram: S fans data out to D1-D4, with a source event at S and a destination event at each Di.]

Core Primitives (cont.)
- Compare-And-Write (QsNet):
  - Node S compares variable V on nodes D1, D2, D3, and D4
[Diagram: S asks each Di "is V {<, =, >} Value?"]

Core Primitives (cont.)
- Compare-And-Write (QsNet):
  - Node S compares variable V on nodes D1, D2, D3, and D4
  - Partial results are combined in the switches

System Software Components (focus: Resource Management)

STORM: A Scalable Tool for Resource Management
- Inherits scalability from the network primitives for data dissemination and coordination
- Interactive job-launching speeds
- Context switching at the millisecond level
- Described in [SC'02]

State of the Art in Resource Management
Resource managers (e.g. PBS, LSF, RMS, LoadLeveler, Maui) are typically implemented using:
- TCP/IP, which favors portability over performance
- Poorly-scaling algorithms for the distribution and collection of data and control messages
- Designs that favor development time over performance
Scalable performance is not important for small clusters but is crucial for large ones. There is a need for fast and scalable resource management.

Experimental Setup
- 64-node/256-processor ES40 AlphaServer cluster
- 2 independent network rails of Quadrics Elan3
- Files are placed in a ramdisk to avoid I/O bottlenecks and expose the performance of the resource-management algorithms

Launch Times (Unloaded System)
The launch time remains constant as the number of processors increases: STORM is highly scalable.

Launch Times (Loaded System, 12 MB)
Worst case: 1.5 seconds to launch a 12 MB file on 256 processors.

Measured and Estimated Launch Times
The model shows that on an ES40-based AlphaServer, a 12 MB binary can be launched in 135 ms on 16,384 nodes.

Comparative Evaluation (Measured & Modeled)

System Software Components (focus: Job Scheduling)

Job Scheduling
- Controls the allocation of space and time resources to jobs
- HPC applications have special requirements:
  - Multiple processing and network resources
  - Synchronization (< 1 ms granularity)
  - Potentially memory hogs with little locality
- Has a significant effect on throughput, responsiveness, and utilization

First-Come-First-Serve (FCFS)

Gang Scheduling (GS)
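The slide's figure is not reproduced in the transcript. As a rough illustration of the idea (my own sketch, not the talk's code): gang scheduling gives each job one global timeslot, and all of the job's processes run simultaneously in that slot on their nodes.

```python
def gang_schedule(jobs, num_slots):
    """Assign each job a single global timeslot shared by all its processes.

    jobs: dict of job name -> list of node ids the job occupies.
    Returns a list of slots, each mapping node -> job, so that processes of
    the same job are always co-scheduled across nodes.
    """
    slots = [dict() for _ in range(num_slots)]
    for name, nodes in jobs.items():
        for slot in slots:  # first slot where all of the job's nodes are free
            if all(n not in slot for n in nodes):
                for n in nodes:
                    slot[n] = name
                break
        else:
            raise RuntimeError("multiprogramming level exceeded")
    return slots
```

With a multiprogramming level of 2, a 4-node job and two 2-node jobs pack into two alternating global timeslots.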

Implicit CoScheduling

Hybrid Methods
- Combine global synchronization and local information
- Rely on scalable primitives for global coordination and information exchange
- First implementation of two novel algorithms:
  - Flexible CoScheduling (FCS)
  - Buffered CoScheduling (BCS)

Flexible CoScheduling (FCS)
- Measure communication characteristics, such as granularity and wait times
- Classify processes based on synchronization requirements
- Schedule processes based on class
- Described in [IPDPS'03]

FCS Classification
Class | Granularity | Block times | Scheduling
CS    | Fine        | Short       | Always gang-scheduled
F     | Fine        | Long        | Preferably gang-scheduled
DC    | Coarse      | (any)       | Locally scheduled
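The classification step can be sketched as a small decision function over the two measured quantities. The threshold values below are illustrative placeholders, not the ones used in the IPDPS'03 implementation.

```python
def classify(granularity_ms, block_ms, fine_thresh=5.0, short_thresh=1.0):
    """Classify a process for FCS from its measured communication behavior.

    granularity_ms: average compute time between communication events.
    block_ms: average time the process blocks waiting for communication.
    Thresholds are hypothetical, chosen only for illustration.
    """
    if granularity_ms >= fine_thresh:
        return 'DC'   # coarse-grained: scheduled locally
    if block_ms <= short_thresh:
        return 'CS'   # fine-grained, short waits: always gang-scheduled
    return 'F'        # fine-grained but blocks long: preferably gang-scheduled
```

The scheduler then gang-schedules CS processes, prefers gang slots for F processes, and leaves DC processes to the local OS scheduler.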

Methodology
- Synthetic, controllable MPI programs
- Workloads:
  - Static: all jobs start together
  - Dynamic: different sizes, arrival times, and run times
- Various schedulers implemented: FCFS, GS, FCS, SB (ICS), BCS
- Emulation vs. simulation: an actual implementation accounts for all the overheads and factors of a real system

Hardware Environment
The environment was ported to three architectures and clusters:
- Crescendo: 32x2 Pentium III, 1 GB
- Accelerando: 32x2 Itanium II, 2 GB
- Wolverine: 64x4 Alpha ES40, 8 GB

Synthetic Application
- Bulk-synchronous, 3 ms basic granularity
- Can control granularity, variability, and communication pattern

Synthetic Scenarios
Balanced, Complementing, Imbalanced, Mixed

Turnaround Time

Dynamic Workloads [JSSPP'03]
- Static workloads are simple and offer insights, but are not realistic
- Most real-life workloads are more complex
- Users submit jobs dynamically, with varying time and space requirements

Dynamic Workload Methodology
- Emulation using a workload model [Lublin03]
- 1000 jobs, approx. 12 days, shrunk to 2 hours
- Load varied by scaling arrival times
- Same synthetic application, with randomized:
  - Arrival time, run time, and size, based on the model
  - Granularity (fine, medium, coarse)
  - Communication pattern (ring, barrier, none)
- Recent study with scientific applications (as yet unpublished)

Load – Response Time

Load – Bounded Slowdown

Timeslice – Response Time

System Software Components (focus: Communication Library)

Buffered CoScheduling (BCS)
- Buffer all communications
- Exchange information about pending communication every time slice
- Schedule and execute communication
- Implemented mostly on the NIC
- Requires fine-grained heartbeats
- Described in [SC'03]

Design and Implementation
- Global synchronization:
  - Strobe sent at regular intervals (time slices): Compare-And-Write + Xfer-And-Signal (master), Test-Event (slaves)
  - All system activities are tightly coupled
- Global scheduling:
  - Exchange of communication requirements: Xfer-And-Signal + Test-Event
  - Communication scheduling
  - Real transmission: Xfer-And-Signal + Test-Event
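The strobe/exchange/transmit cycle above can be condensed into a single conceptual loop. This is a sketch of the time-slice structure only; the `BCSNode` class and its outbox/inbox fields are invented for illustration and have nothing to do with the actual NIC-thread implementation.

```python
class BCSNode:
    """Toy node: buffered sends accumulate in an outbox during a time slice."""
    def __init__(self, name):
        self.name = name
        self.outbox = []   # buffered sends: (destination name, payload)
        self.inbox = []    # messages delivered in bulk at slice boundaries

def bcs_timeslice(nodes):
    """One Buffered CoScheduling time slice (conceptual sketch):
    1. a global strobe starts the slice on every node (implicit here),
    2. nodes exchange their pending communication descriptors,
    3. the exchanged traffic is scheduled and transmitted in bulk."""
    descriptors = []                    # step 2: gather pending requirements
    for n in nodes:
        for dst, payload in n.outbox:
            descriptors.append((n.name, dst, payload))
        n.outbox.clear()
    by_name = {n.name: n for n in nodes}
    for src, dst, payload in sorted(descriptors):  # step 3: deterministic plan
        by_name[dst].inbox.append((src, payload))
```

Because every node sees the same global schedule at the same slice boundary, the slice boundaries double as system-wide recovery lines, which is what the fault-tolerance section later builds on.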

Design and Implementation
- Implementation in the NIC:
  - Application processes interact with NIC threads: an MPI primitive posts a descriptor to the NIC, and communications are buffered
  - Cooperative threads running in the NIC synchronize, partially exchange control information, schedule communications, and perform the real transmissions and reduction computations
  - Computation and communication are completely overlapped

Design and Implementation
- Non-blocking primitives: MPI_Isend/MPI_Irecv

Design and Implementation
- Blocking primitives: MPI_Send/MPI_Recv

Performance Evaluation
BCS-MPI vs. Quadrics MPI. Experimental setup:
- Benchmarks and applications:
  - NPB (IS, EP, MG, CG, LU), Class C
  - SWEEP3D, 50x50x50
  - SAGE, timing.input
- Scheduling parameters:
  - 500 μs communication-scheduling time slice (1 rail)
  - 250 μs communication-scheduling time slice (2 rails)

Performance Evaluation
Benchmarks and applications (Class C):
Application      | Slowdown
IS (32 PEs)      | 10.40%
EP (49 PEs)      |  5.35%
MG (32 PEs)      |  4.37%
CG (32 PEs)      | 10.83%
LU (32 PEs)      | 15.04%
SWEEP3D (49 PEs) | -2.23%
SAGE (62 PEs)    | -0.42%

Performance Evaluation
SAGE, timing.input (IA32): 0.5% speedup

Blocking Communication
Blocking vs. non-blocking, SWEEP3D (IA32): MPI_Send/Recv replaced with MPI_Isend/Irecv + MPI_Waitall

System Software Components (focus: Fault Tolerance)

Fault Tolerance Today
Fault tolerance is commonly achieved, if at all, by:
- Checkpointing
- Segmentation of the machine
- Removal of fault-prone components
Massive hardware redundancy is not considered economically feasible.

Our Approach to Fault Tolerance
- Recent work shows that scalable, low-overhead, system-level fault tolerance is within reach with current technology, achieved through a global operating system
- Two results provide the basis for this claim:
  1. Buffered CoScheduling, which enforces frequent global recovery lines and global control
  2. The feasibility of incremental checkpointing

Checkpointing and Recovery
- Simplicity: easy implementation
- Cost-effective: no additional hardware support
- Critical aspect: the bandwidth requirements of saving process state

Reducing Bandwidth
- Incremental checkpointing: only the memory modified since the previous checkpoint is saved to stable storage (versus saving the full process state)
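A minimal sketch of the idea, comparing page maps between checkpoints. Real implementations track dirty pages through the MMU rather than by diffing memory contents; the page-diff approach below is chosen only because it makes the savings easy to see.

```python
PAGE = 4096  # assumed page size for the sketch

def pages(mem):
    """Split a flat memory image into a page map (offset -> page contents)."""
    return {i: mem[i:i + PAGE] for i in range(0, len(mem), PAGE)}

def incremental_checkpoint(mem, prev_pages):
    """Return (delta, snapshot): delta holds only the pages modified since
    the previous checkpoint, which is all that goes to stable storage."""
    cur = pages(mem)
    delta = {i: p for i, p in cur.items() if prev_pages.get(i) != p}
    return delta, cur
```

After the first (necessarily full) checkpoint, each subsequent delta is proportional to the working set touched during the interval, not to the total footprint, which is what drives the bandwidth numbers on the following slides.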

Enabling Automatic Checkpointing
[Diagram: checkpointing can be driven at the hardware, operating-system, run-time-library, or application level; the lower, more automatic levels require little user intervention, while application-level checkpointing requires much more.]

The Bandwidth Challenge
Does current technology provide enough bandwidth for frequent, automatic checkpointing?

Methodology
Quantifying the bandwidth requirements:
- Checkpoint intervals: 1 s to 20 s
- Compared with the bandwidth available today:
  - 900 MB/s sustained network bandwidth (Quadrics QsNet II)
  - 75 MB/s sustained single-disk bandwidth (Ultra SCSI controller)

Memory Footprint (64 Itanium II processors)
Sage-1000MB | 954.6 MB
Sage-500MB  | 497.3 MB
Sage-100MB  | 103.7 MB
Sage-50MB   |  55 MB
Sweep3D     | 105.5 MB
SP Class C  |  40.1 MB
LU Class C  |  16.6 MB
BT Class C  |  76.5 MB
FT Class C  | 118 MB

Bandwidth Requirements
Sage-1000MB: the required bandwidth decreases with longer timeslices, from 78.8 MB/s at a 1 s interval down to 12.1 MB/s.
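The feasibility argument is simple arithmetic: the sustained bandwidth needed is the dirty-set size divided by the checkpoint interval. The dirty-set sizes below are back-filled from the slide's 78.8 MB/s and 12.1 MB/s figures (the 242 MB value is my reconstruction, chosen to match 12.1 MB/s at 20 s, illustrating that re-dirtied pages make the dirty set grow sublinearly).

```python
def required_bandwidth(dirty_mb, interval_s):
    """Sustained bandwidth (MB/s) to stream one incremental checkpoint per interval."""
    return dirty_mb / interval_s

DISK_MB_S = 75   # single Ultra SCSI disk, from the slides
NET_MB_S = 900   # Quadrics QsNet II sustained, from the slides

one_sec = required_bandwidth(78.8, 1)      # 78.8 MB dirtied in a 1 s interval
twenty_sec = required_bandwidth(242.0, 20) # reconstructed 20 s dirty set
```

So a single commodity disk already nearly covers 1 s checkpointing of the most demanding application, and easily covers 20 s intervals; the network is never the bottleneck.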

Bandwidth Requirements for a 1-Second Interval
The required bandwidth increases with memory footprint; even the most demanding application is close to single-SCSI-disk performance.

Increasing Memory Footprint Size
Average bandwidth (MB/s) vs. timeslice (s): the required bandwidth increases sublinearly with footprint.

Increasing Processor Count
Average bandwidth (MB/s) vs. timeslice (s): under weak scaling, the required bandwidth decreases slightly with processor count.

Technological Trends
The checkpointing demand of applications is bounded by memory improvements, while available bandwidth increases at a faster pace (performance improvement per year).

Conclusions
As clusters grow, interconnection technology advances:
- Better bandwidth and latency
- On-board programmable processors and RAM
- Hardware support for collective operations
This allows the development of a common system infrastructure that is a parallel program in itself.

Conclusions (cont.)
On top of this infrastructure we built:
- Scalable resource management (STORM)
- Novel job-scheduling algorithms
- A simplified system design and communication library
- A possible basis for transparent fault tolerance

Conclusions (cont.)
Experimental performance evaluation demonstrates:
- Scalable interactive job launching and context switching
- Multiprogramming parallel jobs is feasible
- Adaptive scheduling algorithms adjust to different job requirements, improving response times and slowdown across various workloads
- Transparent, frequent checkpointing is within current reach

References
- Eitan's web page
- Fabrizio's web page
- PAL team web page

Resource Overlapping

Turnaround Time

Response Time

Timeslice – Bounded Slowdown

FCFS vs. GS and MPL

FCFS vs. GS and MPL (2)

Backfilling
- Backfilling is a technique to move jobs forward in the queue
- It can be combined with time-sharing schedulers such as GS when all timeslots are full
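As a hedged illustration of the technique (an EASY-style variant, not necessarily the one used in the talk): a queued job may jump ahead of the line if it fits in the currently free nodes and finishes before the head job's reservation, so the head job is never delayed.

```python
def backfill(queue, free_nodes, head_start_time, now, runtimes):
    """Pick jobs that can run now without delaying the head job's reservation.

    queue: list of (job, nodes_needed) behind the head job.
    head_start_time: when the head job is guaranteed to start.
    runtimes: dict of job -> (user-estimated) runtime.
    """
    picked = []
    for job, need in queue:
        fits = need <= free_nodes
        harmless = now + runtimes[job] <= head_start_time
        if fits and harmless:
            picked.append(job)
            free_nodes -= need  # nodes are now committed to this job
    return picked
```

A long job that would still be running at the head job's start time is skipped even when nodes are free, which is the property that distinguishes backfilling from plain out-of-order dispatch.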


Effect of Backfilling

Characterization
[Sage-1000MB trace: data initialization followed by regular processing bursts.]

Communication
[Sage-1000MB trace: regular, interleaved communication bursts.]