The “State” and “Future” of Middleware for HPEC Anthony Skjellum RunTime Computing Solutions, LLC & University of Alabama at Birmingham HPEC 2009.

Slides:



Advertisements
Similar presentations
1 Introducing Collaboration to Single User Applications A Survey and Analysis of Recent Work by Brian Cornell For Collaborative Systems Fall 2006.
Advertisements

Software Group © 2006 IBM Corporation Compiler Technology Task, thread and processor — OpenMP 3.0 and beyond Guansong Zhang, IBM Toronto Lab.
Software Evolution Managing the processes of software system change
Chapter 13 Embedded Systems
1 CMSC 132: Object-Oriented Programming II Software Development III Department of Computer Science University of Maryland, College Park.
Building software from reusable components.
An Overview of Virtual Machine Architectures by J.E. Smith and Ravi Nair presented by Sebastian Burckhardt University of Pennsylvania CIS 700 – Virtualization.
Chapter 9 – Software Evolution and Maintenance
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 18 Slide 1 Software Reuse 2.
New Direction Proposal: An OpenFabrics Framework for high-performance I/O apps OFA TAC, Key drivers: Sean Hefty, Paul Grun.
© 2002 Mercury Computer Systems, Inc. © 2002 MPI Software Technology, Inc. For public release. Data Reorganization Interface (DRI) Overview Kenneth Cain.
Lecture # 22 Software Evolution
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 18 Slide 1 Software Reuse.
Software Engineering Muhammad Fahad Khan
Software Reuse Prof. Ian Sommerville
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 18 Slide 1 Software Reuse.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 14Slide 1 Design with Reuse l Building software from reusable components.
OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston.
4.x Performance Technology drivers – Exascale systems will consist of complex configurations with a huge number of potentially heterogeneous components.
UNIX System Administration OS Kernal Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept Kernel or MicroKernel Concept: An OS architecture-design.
Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.
Software evolution. Objectives l To explain why change is inevitable if software systems are to remain useful l To discuss software maintenance and maintenance.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
SOFTWARE ENGINEERING1 Introduction. Software Software (IEEE): collection of programs, procedures, rules, and associated documentation and data SOFTWARE.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
Future Airborne Capability Environment (FACE)
Reinventing Explicit Parallel Programming for Improved Engineering of High Performance Computing Software Anthony Skjellum, Purushotham Bangalore, Jeff.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
© 2012 xtUML.org Bill Chown – Mentor Graphics Model Driven Engineering.
DOE PI Meeting at BNL 1 Lightweight High-performance I/O for Data-intensive Computing Jun Wang Computer Architecture and Storage System Laboratory (CASS)
Cousins HPEC 2002 Session 4: Emerging High Performance Software David Cousins Division Scientist High Performance Computing Dept. Newport,
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Department of Computer and Information Sciences September 22, 2011 Toward the Embedded Petascale Architecture: The Dual Roles of.
Experts in numerical algorithms and HPC services Compiler Requirements and Directions Rob Meyer September 10, 2009.
Experiences from Representing Software Architecture in a Large Industrial Project Using Model Driven Development Andres Mattsson 1 Björn Lundell 2 Brian.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 18 Slide 1 Software Reuse.
LESSON 3. Properties of Well-Engineered Software The attributes or properties of a software product are characteristics displayed by the product once.
1 Qualifying ExamWei Chen Unified Parallel C (UPC) and the Berkeley UPC Compiler Wei Chen the Berkeley UPC Group 3/11/07.
Programmability Hiroshi Nakashima Thomas Sterling.
Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 21 Slide 1 Software evolution.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
1 VSIPL++: Parallel Performance HPEC 2004 CodeSourcery, LLC September 30, 2004.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 4: Threads.
Benchmarking and Applications. Purpose of Our Benchmarking Effort Reveal compiler (and run-time systems) weak points and lack of adequate automatic optimizations.
3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.
Open Source and Business Issues © 2004 Northrop Grumman Corp. All rights reserved. 1 Grid and Beowulf : A Look into the Near Future NorthNorth F-18C Weapons.
Software Reuse. Objectives l To explain the benefits of software reuse and some reuse problems l To discuss several different ways to implement software.
Chapter 4 – Thread Concepts
Chapter 4: Multithreaded Programming
Introduction Edited by Enas Naffar using the following textbooks: - A concise introduction to Software Engineering - Software Engineering for students-
CIM Modeling for E&U - (Short Version)
Welcome: Intel Multicore Research Conference
MPI Software Technology, Inc. VSIPL for Diverse Architectures
Chapter 18 Maintaining Information Systems
Enabling machine learning in embedded systems
Multicore, Multithreaded, Multi-GPU Kernel VSIPL Standardization, Implementation, & Programming Impacts Anthony Skjellum, Ph.D.
Chapter 4 – Thread Concepts
OCP: High Performance Computing Project
Introduction Edited by Enas Naffar using the following textbooks: - A concise introduction to Software Engineering - Software Engineering for students-
Chapter 4: Threads.
Chapter 4: Threads.
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
Alternative Processor Panel Results 2008
An Overview of Virtual Machine Architectures
Middleware, Services, etc.
Presents: Rally To Java Conversion Suite
Chapter 8 Software Evolution.
VSIPL++: Parallel Performance HPEC 2004
Presentation transcript:

The “State” and “Future” of Middleware for HPEC Anthony Skjellum RunTime Computing Solutions, LLC & University of Alabama at Birmingham HPEC 2009

Overview Context Tradeoffs Standards Futures Summary

COTS Computing -> COTS APIs and Middleware With COTS computing, the combined goals for –Performance –Portability –Productivity And lack of vendor “lock” has motivated use of High Performance Middleware

High Performance Computing Meets Software Engineering Kinds of software maintenance –Perfective –Adaptive –Corrective Standards allow for adaptive maintenance, including porting to new platforms Standards “allow” vendors to deliver performance and productivity in narrow areas without “undoing” COTS; compete within confines of interfaces Standards remove need for each development project to make a layer of adaptive middleware Porting to new platforms important in COTS world

Tradeoffs Layers can’t add performance, unless they raise the level of specification and granularity of operations and allow compile-time and/or runtime-optimization More lines of standard code vs. concise unportable fast vendor specs? Small reductions of achievable performance can yield huge in portability and productivity There is no guarantee that users use standards or vendor-specific codes optimally

Tradeoffs Middleware standards are specified as if they are libraries and API calls Middleware standards are implemented as libraries with API calls Middleware standards could be implemented as domain-specific languages and/or compiler-based notations, with lower cost of portability (but implementations more expensive)

Parallel Compilers for HPEC There is no effective automatic parallel compilation… no change in sight OpenMP directives helpful but not a general solution for thread parallelization (e.g., in GCC 4.x) High Performance Middleware still has to “pull its weight”

Standards that emerged in HPEC Drawing on HPC –MPI –MPI/RT (with DARPA support) Unique to HPEC –VSIPL –DRI –VSIPL++ CORBA

Two classes of adoption Significant –VSIPL –MPI –VSIPL++ (which also leverages DRI) Not significant –MPI/RT –DRI (although ideas used elsewhere) CORBA successfully used, but not for HPC purposes usually, rather distributed computing; so important, but “adjacent”

Why MPI is successful Successful in HPC space –revolutionized software development in DOD/DOE/academia –Even limited commercial application space Widely implemented –free versions –Commercial productions Widely available training in colleges Easy to write a program in MPI Matches the CSP model of embedded multicomputers Developers generally use MPI 1.x subset only in HPEC space (standardized in 1994)

Why VSIPL Successful Close mapping to “FPS library” notation from the past Removes vendor lock from math functions (e.g., non-portable functions) Eliminates “accidental complexity” in apps Math functions are hard to implement on modern processors ADT abstractions (object based) useful Very good performance in practice Not hard to use

Why VSIPL++ is Successful For those that use C++ –More performance possible –Less lines of code possible DRI concepts included Parallel abstraction included Acceptance of the parallel model, C++, and template meta programming is a big leap for some developers/programs, but is worthwhile

DRI - Still Important DARPA Data Reorganization Initiative Works to replace 1 function family in MPI - Alltoall Key HPEC abstraction (Corner Turn) All the best ideas for Corner Turn in the field put in the standard Created at end of “DARPA Push” in the area… no customer demand; picked up in VSIPL++ Best ideas already in vendor libraries like PAS Next steps: Try to get into MPI-3 standard

MPI/RT Not Successful Easy to implement without QoS QoS needed OS support (lacking) Hard to write programs in MPI/RT Addressed performance-portability issues in MPI-1.1 area, but not compelling enough Implementations done on CSPI, Mercury, Sky?, Radstone, Linux Cluster… but did not reach critical mass of customer demand/acceptance MPI/RT Concepts remain important for future middleware standardization Huge effort invested, no disruptive change in development

I. Performance-Productivity- Portability Space Goal is to tradeoff You can’t have it all –Choose at Most 1 –Choose at most 2 Those tradeoffs are different as you move to different COTS architectures –Tuning always needed Performance-Portability in Middleware helps Productivity Matters too, but often sacrificed to get the other two factors

Price of Portability Due to Richard Games (Ken Cain involved too) Classical example: –MPI_Alltoall* function –Much slower than hand coded –Just proves that MPI specified this function about as suboptimally as it could have –Side effect: motivated DRI Better examples: –MPI Send/Receive vs. low level vendor DMA –VSIPL vs. chained optimized pulse compression in vendor-specific math API

Price of Productivity Productivity doesn’t have to lower Performance (e.g., meta programming) But it often does Productivity enhancing interfaces in HPEC… the key one is VSIPL++

Price of Resilience (A 4th Dimension) All standards discussed so far lack a fault tolerant model Space-based and even terrestrial embedded multicomputers have fault scenarios No significant effort to refactor successful APIs done yet HPEC approaches to fault awareness needed Aspect-oriented type approaches possible

II. Investment in Further Standardization There is very limited current investment in these areas MPI-3 - not working on performance issues still bogging down MPI-1.x functions used in HPEC [NSF funding MPI-3 meeting attendance] VSIPL - dormant; latest standard efforts were not key to addressing more performance or more productivity No significant investments from DARPA now… DARPA and/or NSF need to drive Actually, there was always limited investment :-) it is just more limited now

III. Broader Adoption Must be driven by ultimate customers (e.g., USN)… if not required by customer, why will primes or other developers use standards? Legacy codes that mix standards and vendor APIs are still vendor locked Economic models that show value must be enhanced Standards must be enhanced for newer, more complex heterogeneous architectures Standards must be implemented well on target platforms [vendors fully embrace]

IV. Newer Architectures Heterogeneous Multithreaded / multicore GPU + CPU CPU + FPGA Balance changes in communication, computation, and I/O Need for better overlap All mean that the programming models of existing standards have to be augmented or revamped Commercial standards like OpenCL and CUDA… we have to interoperate with these

Summary Overall, middleware, and principally MPI and VSIPL, have driven up the capability of defense and other HPEC applications to be performance- oriented and portable at the same time. Significant legacy applications have been developed, These standards have proven useful. Others as well (e.g., POSIX)

Summary How do we get more benefit from middleware? Where do we invest, what private and government stakeholders make those investments, when, and how? Investment long overdue and can be extremely beneficial, and even enhance the competitiveness of HPEC systems vendors. Leverage from HPC? Does a lean budget drive more interest or less?