One step ahead. The Challenges of Architectures that Grow to Petascale and can be Sustained Economically Steve Reinhardt Principal Engineer, SGI spr at.

Presentation transcript:

One step ahead

The Challenges of Architectures that Grow to Petascale and can be Sustained Economically Steve Reinhardt Principal Engineer, SGI spr at sgi.com

SGI’s systems are evolving to enable ultrascale versions of today’s applications and enable a new type of computational science, while remaining economically sustainable.

Agenda
Besides Architecture…
Enabling Ultra-scale Applications
Enabling New Computational Science
Sustaining Economically

Besides Hardware Architecture...
Efficient execution environment
RAS
OS architecture
  – Linux scaled aggressively, with multiples in ultrascale configurations
Robust scheduling
RAS
Packaging density / heat dissipation
RAS

Local Performance: Needed Flexibility of Memory Access
[Figure: price performance versus absolute performance. Annotations: driven by focus of engineering team; driven by cost of large engineering team; driven by parts replication cost.]
Note: Original (Jan 2003) models used for both X1 and Altix

Ideal Machine (Technical/Economic Balance)
[Figure: price performance versus absolute performance.]
High, cost-effective cache bandwidth of mass market parts
Highest cost-effective memory bandwidth
Design focus on gather/scatter
Note: For O(100KP) petascale machines, value of O(5X) processor performance advantage is less than today

Local Performance: Multi-Paradigm
[Figure: data locality (low to high) versus compute intensity (low to high). Regions: vector-like, PIM-like, scalar, application-specific.]

Ultraviolet: Concept Architecture
[Diagram labels: MPU, GPU, APU, I/O; UV Petascale GAM (globally addressable, low latency, high bandwidth, O(100K) ports).]

Global Performance
Communications
  – grids becoming more dynamic -> low latency essential
  – processor counts growing -> low latency essential
  – low latency -> global address space
  – in clock periods, remote memory getting further away
  – bandwidth-conserving operations needed
  – high absolute link performance
Synchronization
  – current mechanisms insufficient for ultrascale
  – optimizations will help, but maybe not enough
  – new mechanisms needed
Dynamic load balancing
  – mechanisms need to mature, and interfaces become standard
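Two of the points above can be illustrated with simple arithmetic. The following is a minimal sketch, not taken from the slides: the function names, the 500 ns latency, and the 1 microsecond hop time are hypothetical values chosen only for illustration. It shows why a fixed wall-clock latency grows when counted in clock periods, and how the depth of an idealized tree barrier grows with processor count.

```python
def latency_in_clocks(latency_ns: float, clock_ghz: float) -> float:
    """Express a memory latency (nanoseconds) in processor clock periods."""
    # 1 GHz means one clock period per nanosecond.
    return latency_ns * clock_ghz


def tree_levels(processors: int, fan_in: int = 2) -> int:
    """Number of levels in a tree that combines `processors` participants."""
    levels = 0
    reach = 1
    while reach < processors:
        reach *= fan_in
        levels += 1
    return levels


def tree_barrier_time_us(processors: int, hop_latency_us: float,
                         fan_in: int = 2) -> float:
    """Idealized barrier time: one hop per level going up, one coming down."""
    return 2 * tree_levels(processors, fan_in) * hop_latency_us


if __name__ == "__main__":
    # Hypothetical numbers: the same 500 ns remote access at two clock rates.
    for ghz in (1.0, 3.0):
        print(f"{ghz} GHz: {latency_in_clocks(500, ghz):.0f} clocks")
    # Barrier depth at the scales named on the slides (hop time is assumed).
    for p in (10_000, 30_000, 100_000):
        print(f"{p} processors: {tree_barrier_time_us(p, 1.0):.0f} us")
```

The model ignores contention, skew between processors, and hardware barrier support, so it only indicates the trend: even with logarithmic depth, every level adds a full network hop to each synchronization.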

Challenges
Clear virtual machine and performance models for these new mechanisms
Compilers/tools that exploit these mechanisms mostly automatically and accept user hints
Appropriate performance balance for typical uses
Need to gain successful experience at very large scale (10-30KP) before going to ultrascale (100KP)

Scientific Process
Observe existing data for patterns
Hypothesize models that match the data
Test those models to understand accuracy (i.e., add new data)
“First Principles” computing; most of current HPC
“Dynamic Network Inference” computing**
Query: When we know what we want and how to ask for it
Inference: When we know only somewhat what we want
Exploration: When we know little, but anticipate more
“planned serendipity”
**Believed first coined by Scott Studham et al., PNNL

Example: Post-Genomic Biology
<10% of the human genome is known to code for proteins
Selective pressure generally removes unused genetic material
What is the other 90% of the genome doing?
  – Have the raw data (genome)
  – Need to add other types of data (e.g., protein association info)
  – Multi-petabytes of data all told
  – Probably not a purely computational problem

Differences from First Principles
Data access patterns ~impossible to predict a priori -> low latency / global address space
New tools for data exploration needed
  – need to automatically search for new, perhaps-vaguely-defined, patterns (that foster new theory)
  – highly interactive/coupled with the scientist’s thought process
  – but beware difficulty of launching new languages
Contents of memory much more valuable
  – RAS
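The link between unpredictable access patterns and low latency can be made concrete with Little's law (data in flight = bandwidth × latency). The following is a minimal sketch under assumed numbers: the function names and the 1000 ns, 10 GB/s, and 8-byte figures are hypothetical and do not come from the slides.

```python
def required_outstanding_bytes(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Bytes that must be in flight to sustain a bandwidth (Little's law)."""
    # GB/s times ns: the factors of 1e9 and 1e-9 cancel, leaving bytes.
    return bandwidth_gb_s * latency_ns


def sustained_bandwidth_gb_s(outstanding_bytes: float, latency_ns: float) -> float:
    """Bandwidth actually achieved with a given amount of data in flight."""
    return outstanding_bytes / latency_ns


def dependent_access_rate(latency_ns: float) -> float:
    """Accesses per second when each address depends on the previous load."""
    # Only one access can be in flight, so each one pays the full latency.
    return 1e9 / latency_ns


if __name__ == "__main__":
    # Predictable pattern: prefetching can keep many bytes in flight.
    print(required_outstanding_bytes(10, 1000), "bytes in flight for 10 GB/s")
    # Unpredictable, data-dependent pattern: one 8-byte word at a time.
    print(sustained_bandwidth_gb_s(8, 1000), "GB/s with one word in flight")
    print(dependent_access_rate(1000), "accesses/s at 1000 ns latency")
    print(dependent_access_rate(500), "accesses/s at 500 ns latency")
```

When the next address cannot be predicted, the hardware cannot hide latency with more requests in flight, so in this model the access rate improves only by reducing the latency itself.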

“and now for something completely different”: Star-P
Developed by Alan Edelman and colleagues at MIT, etc.
Simple extensions to the MATLAB® language
  – data parallel, MIMD, and mixed
Builds on the existing base of MATLAB programs
  – broadening the market for HPC systems
New back-end server implemented for parallel execution
Preserves key MATLAB strengths:
  – very high level language
  – interactivity / exploration
  – easy visualization
“Put the fun back in supercomputing”

Key Points
SGI retains system focus
…but uses commodity components wherever practical
  – Exploit best mass-market processors (Itanium™); augment to make suitable for wider range of HPC apps
  – Use Linux; fully reap the cost benefits of reduced support of proprietary Unix™ variant
  – IFB cables, EFI firmware
Innovations for ultrascale must be relevant for wider markets
  – e.g., multi-paradigm computing must accelerate ISV apps
Use new technologies to broaden the market
  – e.g., Star-P

SGI’s systems are evolving to enable ultrascale versions of today’s applications and enable a new type of computational science, while remaining economically sustainable.

One step ahead

“There are no technology-independent lessons in computer science.” Butler Lampson, Xerox PARC