SOS7: What will Cray do for Supercomputing in this Decade?
Asaph Zemach, Cray Inc.



SOS7 – Durango, CO – March
Page 2: An Apology
I am not Burton. Sorry.
Where is Burton? Maybe? More likely…

Page 3: What Have You Done For Me Lately?
Cray MTA-2
- Accepted by NRL Sept '02
- UMA Shared Memory
- Latency Tolerant: 128 contexts in processor
Red Storm for Sandia
- Contract signed Oct '02
- 10,000 AMD X86-64
- High Speed Network
Cray X1
- FCS Dec 31, 2002
- Scalable vector MSPs
- NUMA Shared Memory

Page 4: Cray Products: The Near Future
- 2003: X GF, 35GB/s/p mem BW, 76GB/s/p cache BW
- End of '04: X1e Technology Upgrade. Faster clock, Denser Package, Mix&Match with X
- X2 (Blackwidow): Bigger, Faster, Cheaper
- 2003: Red Storm (Development)
- End of '04: Red Storm (Install). Catamount LWK, Linux service, AMD 2GHz X
- Red Storm Product (?): Linux Service, Compute OS?
- Synergy? I/O?

Page 5: Cray Products: Not So Near Future
- Shared Memory Locales: UMA, NUMA
- Heavy Weight Processors: Multi threading, Vectors, Streams
- PIM (LWP)
Timeline: 2006(?), 2008(?), 2010
- Cascade
- X2e: BIGGER FASTER CHEAPER
- X2f: BIGGER!! FASTER!! CHEAPER!!

Page 6: Cascade Locale
[Diagram. Labels: SW Controlled Data Cache; Heavy Weight Proc (Vector, MT, Streams); Multithreaded PIM DRAM; Locale Interconnect; Router, to other Locales.]

Page 7: Cascade: Lazy Localization
[Diagram. Labels: HWP Memory; Generic Data; Somewhat Localized Data; Highly Localized Data.]
- Initially all data is considered generic: equally far from everywhere.
- To improve performance, stage generic data near the HWP that manipulates it.
- To improve performance even more, partition data between PIMs.
- All data is always universally accessible, but performance varies.
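The lazy-localization idea on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cray's design: the `Placement` names, the relative costs in `ACCESS_COST`, and the access-count thresholds in `localize` are all hypothetical, and the slide does not say whether localization is triggered automatically or by the programmer. What the sketch does take from the slide is that data starts generic, can be staged near the HWP, can then be partitioned between PIMs, and stays accessible throughout while its access cost changes.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    GENERIC = "generic"        # equally far from everywhere
    NEAR_HWP = "near_hwp"      # staged near the heavy-weight processor
    PIM_PARTITIONED = "pim"    # partitioned between PIMs


# Hypothetical relative access costs; only their ordering reflects the slide.
ACCESS_COST = {
    Placement.GENERIC: 100,
    Placement.NEAR_HWP: 10,
    Placement.PIM_PARTITIONED: 1,
}


@dataclass
class DataBlock:
    name: str
    placement: Placement = Placement.GENERIC
    accesses: int = 0

    def access(self) -> int:
        """Every placement is accessible; only the cost differs."""
        self.accesses += 1
        return ACCESS_COST[self.placement]


def localize(block: DataBlock, stage_after: int = 3, partition_after: int = 6) -> Placement:
    """Promote a block lazily, once it has been accessed often enough.

    The thresholds are assumptions made for this illustration.
    """
    if block.accesses >= partition_after:
        block.placement = Placement.PIM_PARTITIONED
    elif block.accesses >= stage_after:
        block.placement = Placement.NEAR_HWP
    return block.placement


def run(n_accesses: int) -> list:
    """Access one block repeatedly and record the cost of each access."""
    block = DataBlock("grid")
    costs = []
    for _ in range(n_accesses):
        costs.append(block.access())
        localize(block)
    return costs


if __name__ == "__main__":
    print(run(8))
```

Calling `run(8)` shows the cost per access dropping in two steps as the block moves from generic to staged to partitioned, without any access ever failing.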

Page 8: Cascade: Software Investigations
- Compiler controlled cache
- Compartmentalized OS-es
  - Introspection using PIM
- Relative Debugging
- Abstract locales: virtualize locality management
  - What needs to be near what
  - What can/should be distributed (& how)

Page 9: Cascade People
- Cray: Burton Smith, David Callahan, Steve Scott
- JPL, CalTech: Thomas Sterling, Larry Bergman, Hans Zima
- Notre Dame: Jay Brockman, Peter Kogge
- Stanford: Bill Dally