Introduction and History of Cray Supercomputers


Introduction and History of Cray Supercomputers
CS 350: Computer Organization, Section 001, Term Project
Ryan Smith, Jon Soper, Jamie Vigliotta

What are Supercomputers? A supercomputer is defined simply as "the most powerful class of computers at any point in time" (Cray Inc.).

Uses of Supercomputers
- Computer Engineering: AT&T Bell Laboratories used supercomputers to design chip circuits and to study the chemistry and physics of chips. The supercomputer is especially useful in modeling activities such as electromagnetic scattering, heat transfer, and the distribution of energy on the chip.
- Chemistry: DuPont used its Cray supercomputer to simulate breakage thresholds and patterns in the composite materials it created. Complex calculations and simulations in molecular dynamics and molecular orbital theory generated fractals that were used to examine how cracks distribute themselves in composite materials; other work included quantum chemistry and zeolites.
- Fluid Dynamics: an alternative to wind tunnels, offering a less costly and less time-consuming process.
- Bioinformatics: study of the human genome; used by the National Cancer Society.

Uses of Supercomputers (cont.)
- Creating cleaner power: George Richards, leader of the National Energy Technology Laboratory's combustion dynamics team, takes on the challenge of converting fuel to energy without creating pollutants by running simulations on PSC's Cray T3E.
- Developing drugs to treat HIV and slow its replication: PSC's Marcela Madrid simulates an HIV enzyme on the Cray T3E to help develop drugs that shut down HIV replication.

Cray Supercomputers

Two General Categories: Vector Processing and Parallel Processing.

Vector Processing
Key attributes: vector registers, pipelining, segmentation.
The vector computer was designed to efficiently handle arithmetic operations on the elements of arrays, or vectors. Such machines are useful in high-performance scientific computing, where matrices and vectors are common.
- Vector registers: registers containing operands and the results of operations on them.
- Pipelining: explicit segmentation of an arithmetic unit into different parts, each of which performs a sub-function on a pair of operands (compare to an assembly line).
- Segmentation: each station in the pipeline is a segment.
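The element-by-element style of operation described above can be sketched in software. This is a minimal illustration, not Cray code: NumPy's whole-array addition mimics a single vector instruction operating on entire vector registers at once.

```python
# Illustrative sketch only: element-wise vector addition, the kind of
# operation a vector processor performs in hardware on vector registers.
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # input vector A
b = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # input vector B

# One "instruction" combines the operands element by element,
# producing an output vector, rather than looping one pair at a time.
s = a + b
print(s)  # [11. 22. 33. 44. 55.]
```

A scalar machine would need a loop with one add per iteration; the vector form expresses the whole operation as a single step over the registers.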

Vector Processing Example
[Slide diagrams: input vectors A (elements A0 through A4) and B (elements B0 through B4) feed an arithmetic pipeline X, producing result vector Y. Successive slides show the element pairs advancing through the pipeline: first A0 with B0 enters, then A1 with B1 follows behind it, one pair per step.]
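The payoff of the pipeline shown above can be made concrete with a back-of-the-envelope timing model (the segment count and vector length here are assumed for illustration, not Cray specifications): a k-segment pipeline finishes n element pairs in about k + n - 1 cycles, versus n * k cycles if each pair had to pass through all segments before the next could start.

```python
# Rough pipeline timing model (illustrative numbers, not Cray specs).

def pipelined_cycles(n, k):
    # First result appears after k cycles; each later result follows
    # one cycle behind, since a new element pair enters every cycle.
    return k + n - 1

def unpipelined_cycles(n, k):
    # Without overlap, every pair occupies all k segments in turn.
    return n * k

n, k = 64, 4  # 64-element vectors, 4 pipeline segments (assumed)
print(pipelined_cycles(n, k))    # 67
print(unpipelined_cycles(n, k))  # 256
```

For long vectors the pipelined time approaches one result per cycle, which is why pipelining was central to the Cray designs.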

Parallel Processing
If one processor takes time t, then p processors can ideally finish in t/p.
- Single Instruction, Multiple Data (SIMD)
- Multiple Instruction, Multiple Data (MIMD)
This situation is ideal for modeling, which performs the same calculations on differing data sets: multiple data sets can be calculated simultaneously. In SIMD, a single control unit synchronizes the processors; in MIMD, synchronization must be handled by a hardware unit or by software.

Historical Models
Cray-1
- 160 megaflops; 64-bit bus width; restricted memory bandwidth
- Supported both scalar and vector registers
- Introduced the pipelining feature of vector processing
- Memory bandwidth could not keep up with register-to-register arithmetic
Cray X-MP
- Based on the Cray-1 architecture; used four Cray-1 processors
- First multiprocessor Cray
- Higher memory bandwidth than the Cray-1; shorter clock cycle

Historical Models (cont.)
Cray-2
- Same architecture as the Cray X-MP, with a faster clock cycle
- 64-bit word as the basic addressable memory unit
- 4.1-nanosecond clock period; two's complement arithmetic; scalar and vector processing modes; nine fully segmented functional units per CPU
Cray Y-MP
- Multiple processors running at 333 megaflops each, combined to reach 2.3 gigaflops

Present Models
Cray MTA
- Designed for the parallel processing environment
- Utilizing 256 processors, capable of up to 192 gigaflops
- Network interconnectivity can send data to and from memory at the full processor rate
- The high-end model supports 64-bit data, addresses, and instructions
- All synchronization is done at negligible cost to the processor, resulting in efficient use of parallel processing
Cray SV1
- The leading Cray vector computer currently on the market, introduced in 2001
- Higher-end models are capable of 256 gigaflops and support up to 128 gigabytes of main memory and 284 gigabytes of SSD
- Scalable: up to 32 units can be interconnected, providing over one teraflop of peak CPU capacity
- Together with fast memory and I/O systems, vector computations run through high-bandwidth vector cache memory