March 6, 20031WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Sun’s HPC Approach John L. Gustafson, Ph.D. Sun Labs HPC Workload Characterization and.

Slides:



Advertisements
Similar presentations
U Computer Systems Research: Past and Future u Butler Lampson u People have been inventing new ideas in computer systems for nearly four decades, usually.
Advertisements

Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories Muthu Baskaran 1 Uday Bondhugula.
Introduction to Grid Application On-Boarding Nick Werstiuk
May International and European M.Eng. Introduction of 5th year technical options Master of Engineering for 2003 Introduction for students to the.
25 July Navigating Doors. 25 July Starting the Program A Doors shortcut icon is on the computer desktop Double-click the icon to start Doors.
IT253: Computer Organization
Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.
Brook for GPUs Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman Pat Hanrahan GCafe December 10th, 2003.
OpenMP Optimization National Supercomputing Service Swiss National Supercomputing Center.
DSPs Vs General Purpose Microprocessors
Computer Abstractions and Technology
Vector Processing. Vector Processors Combine vector operands (inputs) element by element to produce an output vector. Typical array-oriented operations.
THQ/Gas Powered Games Supreme Commander and Supreme Commander: Forged Alliance Thread for Performance.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
Henry Hexmoor1 Chapter 7 Henry Hexmoor Registers and RTL.
Introduction CS 524 – High-Performance Computing.
Introduction What is Parallel Algorithms? Why Parallel Algorithms? Evolution and Convergence of Parallel Algorithms Fundamental Design Issues.
Chapter 4 Assessing and Understanding Performance
IT Systems Memory EN230-1 Justin Champion C208 –
Lecture 1: Introduction to High Performance Computing.
1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.
CPU Describe the purpose of the CPU
Counters and Registers
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Programming for High Performance Computers John M. Levesque Director Cray’s Supercomputing Center Of Excellence.
Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.
Yavor Todorov. Introduction How it works OS level checkpointing Application level checkpointing CPR for parallel programing CPR functionality References.
1 Computer Performance: Metrics, Measurement, & Evaluation.
Practical PC, 7th Edition Chapter 17: Looking Under the Hood
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.
IT253: Computer Organization
HPC User Forum Back End Compiler Panel SiCortex Perspective Kevin Harris Compiler Manager April 2009.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.
Summary Background –Why do we need parallel processing? Moore’s law. Applications. Introduction in algorithms and applications –Methodology to develop.
Orange Coast College Business Division Computer Science Department CS 116- Computer Architecture Multiprocessors.
Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.
© 2009 IBM Corporation Motivation for HPC Innovation in the Coming Decade Dave Turek VP Deep Computing, IBM.
M U N - February 17, Phil Bording1 Computer Engineering of Wave Machines for Seismic Modeling and Seismic Migration R. Phillip Bording February.
Authors – Jeahyuk huh, Doug Burger, and Stephen W.Keckler Presenter – Sushma Myneni Exploring the Design Space of Future CMPs.
Distributed Programming CA107 Topics in Computing Series Martin Crane Karl Podesta.
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
Outline Why this subject? What is High Performance Computing?
B5: Exascale Hardware. Capability Requirements Several different requirements –Exaflops/Exascale single application –Ensembles of Petaflop apps requiring.
Succeeding with Technology Chapter 2 Hardware Designed to Meet the Need The Digital Revolution Integrated Circuits and Processing Storage Input, Output,
HPC HPC-5 Systems Integration High Performance Computing 1 Application Resilience: Making Progress in Spite of Failure Nathan A. DeBardeleben and John.
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-2.
Tackling I/O Issues 1 David Race 16 March 2010.
Background Computer System Architectures Computer System Software.
Introduction CSE 410, Spring 2005 Computer Systems
High Performance Computing (HPC)
Conclusions on CS3014 David Gregg Department of Computer Science
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Parallel Hardware Dr. Xiao Qin Auburn.
Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming
Green cloud computing 2 Cs 595 Lecture 15.
CSE 410, Spring 2006 Computer Systems
Is System X for Me? Cal Ribbens Computer Science Department
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection Joel Seely Technical Marketing Manager Military &
Summary Background Introduction in algorithms and applications
Caching Principles and Paging Performance
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
Adaptive Single-Chip Multiprocessing
Chapter 1 Introduction.
Caching Principles and Paging Performance
The University of Adelaide, School of Computer Science
Utsunomiya University
Husky Energy Chair in Oil and Gas Research
Presentation transcript:

March 6, 20031WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Sun’s HPC Approach John L. Gustafson, Ph.D. Sun Labs HPC Workload Characterization and System Analysis Team

March 6, 20032WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Caveats  The need for non-disclosure protection is at an all-time high at Sun, severely restricting what we can reveal here.  I will present Sun’s priorities, philosophy, and approach for HPC system design.  Sun Labs creates enabling technologies; it doesn’t design products. This talk is a Sun Labs perspective.

March 6, 20033WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Background/Perspective  Sun’s boom in e-business and database apps has masked the HPC side of the company.  During the boom, Sun absorbed pieces of Cray, Thinking Machines, Genias, Maximum Strategy, and other HPC strongholds.  The DARPA HPCS program has had an incredibly big effect steering Sun ($3M steering a $12B company).  We are starting with a clean slate and designing from HPC customer workloads.

March 6, 20034WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? The “Speed of Light” Isn’t What It Used To Be  The clock can’t make it across a 20 mm die at current GHz rates  Either we go to multiple cores or use Non-Uniform Cache Access (NUCA)  This is a physical limit, not technological [Source: Chuck Moore, UT-Austin] 130 nm100 nm 70 nm 35 nm 20 mm

March 6, 20035WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Bandwidth Burns Energy; Maybe Our Measure of Work is… Joules? 1300 to 1900 pJMove 32 bits off chip 100 pJMove 32 bits across 10 mm chip 50 pJRead 32 bits from 8K RAM 10 pJ32-bit register read 5 pJ32-bit ALU operation Energy (130 nm, 1.2 V)Operation [source: Bill Dalley, Stanford]

March 6, 20036WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? HPC Hardware Philosophy  First-order constraint: heat and power.  Heat removal=> Power density => lower latency and affordable high bandwidth  Make every watt count. (Note that this might go counter to speculative computing).  Once again, supercomputing means aggressive cooling design.

March 6, 20037WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Asynchronous Computing You won’t be able to ask “what’s the clock rate?” much longer. There won’t be one. Saves power, enables incredible bandwidth gains, relieves RF noise, and lets every part of a system go as fast as it can. Only drawback: hell to design and test.

March 6, 20038WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? HPC Software Philosophy  Consider domain specialist, programmer, and user to be different people in general.  Look for ways to free those people from having to do any job that’s not theirs to do.  Look for new things that the computer can do to unburden humans in the development task, especially in the run-time environment.

March 6, 20039WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Mom & Apple Pie Comment Page  Big systems will always have some part in failure mode, and must have resilience built in.  Must be cheaper or more capable than commodity Linux clusters, or why bother?  Support legacy languages. Duh.  Will we support MPI, OpenMP, or CAF-type program model? Yes.  Libraries are for cleverness-intensive tasks.

March 6, WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Design for Hardest Case  Design for sparse matrices, and dense matrices are easy.  Design for irregular grids, and regular grids are easy.  Design for dynamic loads and static loads are easy.  Design for high-quality arithmetic, and low-precision tasks are easy. And if you do it right, the nonassociativity problem goes away…  Design for high I/O optionally, and those who don’t need it don’t have to pay for it. …but Don’t try to do HPC and commercial applications with a single architecture!

March 6, WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Inflammatory Comments Page  Current HPC benchmarks are grossly inadequate for guiding vendors to make the right choices.  Complaining about “efficiency” as a fraction of peak FLOPS is a throwback to times when FLOPS dominated cost. Get real. Over 95% of cost is the memory system!  It’s going to get hard to count “processors” as things get weird. Thread/node/processor/module, whatever.  3 bytes per flop is still way too low, and puts too much pressure on programmers to be clever.

March 6, WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? An HPC Taxonomy Electronic Design Nuclear Applications Mechanical Design Mechanical Engineering Radar Cross-Section Crash Simulation Fluid Dynamics Weather, Climate modeling Signal/Image Processing Encryption Life Sciences Financial Modeling Petroleum For each of these, we seek to match Problem size Data locality Predominate data type Dynamic behavior in time Spatial irregularity Demands on I/O To avoid the “toy benchmark” problem.

March 6, WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? Example of Workload Dimension: Data Parallelism (estimated) 100% Data- Parallel content 0% Gene-matching, SETI, factoring large integers Vibrational analysis, ray-tracing Dense matrix multiply, factoring; convolution and filtering Eulerian fluid dynamics (decomposed spatially) N-Body problems Stress-strain analysis with finite elements; crash testing Lagrangian fluid dynamics (decomposed by fluid element) “Easy” databases (conflict-free, few updates) FFTs (frequency-space filtering) Particle simulations with imposed fields, game-tree exploration Circuit simulation Economic simulations Typical database applications; transaction processing

March 6, WHAT WILL SUN DO FOR SUPERCOMPUTING IN THIS DECADE? SUMMARY  Sun will contribute several disruptive technologies to HPC in the coming decade.  Asynchronous computing is public now, but several other breakaway technologies will have as much or more impact.  We will take pressure off programmers by providing a simpler, more capable model.