SGI’2000 Parallel Programming Tutorial: Supercomputers 2. With acknowledgement to Igor Zacharov and Wolfgang Mertz, SGI European Headquarters.


Presentation transcript:


Classification of Computers

MIMD
  Multiprocessors: single address space, shared memory
    UMA: central memory
      PVP (Cray T90)
      SMP (Intel SHV, SUN E10000, DEC 8400, SGI Power Challenge, IBM R60, etc.)
    NUMA: distributed memory
      COMA (KSR-1, DDM)
      CC-NUMA (SGI Origin2000, SN1 (SGI3000), Cray T3E, HP Exemplar, Sequent NUMA-Q, Data General)
      NCC-NUMA (Cray T3D, IBM SP3)
  Multicomputers: multiple address spaces
    NORMA: no-remote memory access
      Cluster (IBM SP2, DEC TruCluster, Microsoft Wolfpack, “Beowulf”, etc.): loosely coupled, multiple OS
      “MPP” (Intel TFLOPS, TM-5): tightly coupled, single OS

Abbreviations:
  MIMD      Multiple Instructions, Multiple Data
  PVP       Parallel Vector Processor
  UMA       Uniform Memory Access
  SMP       Symmetric Multi-Processor
  NUMA      Non-Uniform Memory Access
  COMA      Cache Only Memory Architecture
  NORMA     No-Remote Memory Access
  CC-NUMA   Cache-Coherent NUMA
  MPP       Massively Parallel Processor
  NCC-NUMA  Non-Cache Coherent NUMA
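The classification above can be read as a small tree. As a minimal sketch (the nested-dictionary layout and the `find_path` helper are illustrative assumptions, not part of the tutorial), the following Python returns the path from MIMD down to a given architecture class:

```python
# Taxonomy from the "Classification of Computers" slide as a nested dict.
# The data layout and the helper name are illustrative assumptions.
TAXONOMY = {
    "MIMD": {
        "Multiprocessors": {
            "UMA": {"PVP": {}, "SMP": {}},
            "NUMA": {"COMA": {}, "CC-NUMA": {}, "NCC-NUMA": {}},
        },
        "Multicomputers": {
            "NORMA": {"Cluster": {}, "MPP": {}},
        },
    }
}


def find_path(name, tree=TAXONOMY, prefix=()):
    """Return the tuple of class names from the root to `name`, or None."""
    for key, subtree in tree.items():
        path = prefix + (key,)
        if key == name:
            return path
        found = find_path(name, subtree, path)
        if found is not None:
            return found
    return None


print(" > ".join(find_path("CC-NUMA")))
# prints: MIMD > Multiprocessors > NUMA > CC-NUMA
```

Looking up "Cluster" the same way shows that it sits under Multicomputers and NORMA, that is, on the multiple-address-space side of the tree.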

Design Space of Competing Computer Architectures (figure)

Structure of an SMP System (1): Central Bus

Diagram: processors, each with its own cache, share one central bus with the I/O and the main memory banks.

o Does NOT scale due to bus saturation
o The bus is a very complex component
o High memory latency due to the complexity
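A rough way to see the bus-saturation point: the bus has a fixed total bandwidth that all processors share. The sketch below is a deliberately simple model with hypothetical numbers (1.2 GB/s bus, 0.2 GB/s demand per processor); neither the model nor the figures come from the tutorial.

```python
def bus_utilization(n_procs, demand_per_proc, bus_bandwidth):
    """Requested fraction of the bus bandwidth (values above 1 mean saturation)."""
    return n_procs * demand_per_proc / bus_bandwidth


def delivered_per_proc(n_procs, demand_per_proc, bus_bandwidth):
    """Bandwidth each processor gets when the bus is shared equally."""
    return min(demand_per_proc, bus_bandwidth / n_procs)


# Hypothetical figures, chosen only for illustration.
BUS_GBS = 1.2
DEMAND_GBS = 0.2

for n in (2, 4, 8, 16, 32):
    util = bus_utilization(n, DEMAND_GBS, BUS_GBS)
    got = delivered_per_proc(n, DEMAND_GBS, BUS_GBS)
    print(f"{n:3d} processors: requested {util:5.2f} of bus, "
          f"each gets {got:.3f} GB/s")
```

In this model every processor is fully served up to six processors; beyond that the per-processor share falls as 1/N, which is the saturation the slide refers to.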

Structure of an SMP System (2): Central Crossbar

Diagram: processors, each with its own cache, connect through a central crossbar to the I/O and the main memory banks.

o Scales very well
o The crossbar is a very complex component
o High memory latency due to the complexity
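One common way to quantify the crossbar's complexity is its crosspoint count: a full crossbar linking P processor ports to M memory ports needs P x M switch points. The sketch below only illustrates that growth; the function name and the port counts are assumptions, not figures from the tutorial.

```python
def crossbar_crosspoints(n_procs, n_memories):
    """Number of switch points in a full n_procs x n_memories crossbar."""
    return n_procs * n_memories


# Square crossbars of increasing size (illustrative port counts).
for n in (4, 8, 16, 32, 64):
    print(f"{n:2d} x {n:2d} crossbar: {crossbar_crosspoints(n, n):4d} crosspoints")
```

Doubling the number of ports on both sides quadruples the number of crosspoints, whereas a single bus adds only one attachment per port.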

Structure of an SMP System (3): SGI Origin NUMA Architecture

Diagram: SGI NUMA hypercube with a global switch interconnect. Labels: N = node (nodeboard with I/O), R = router.
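The slide names a hypercube topology without further detail. As a generic sketch of hypercube addressing (not a description of the actual Origin router protocol, which the tutorial does not give), routers can be numbered so that neighbors differ in exactly one address bit, and the minimum hop count is the number of differing bits:

```python
def neighbors(node, dim):
    """Routers adjacent to `node` in a dim-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(src, dst):
    """Minimum hop count: the number of address bits that differ."""
    return bin(src ^ dst).count("1")


def route(src, dst, dim):
    """One shortest path, fixing differing bits from lowest to highest."""
    path = [src]
    cur = src
    for bit in range(dim):
        if (cur ^ dst) & (1 << bit):
            cur ^= 1 << bit
            path.append(cur)
    return path


DIM = 3  # 2**3 = 8 routers, chosen only for illustration
print(neighbors(0b000, DIM))     # [1, 2, 4]
print(hops(0b000, 0b111))        # 3
print(route(0b001, 0b110, DIM))  # [1, 0, 2, 6]
```

In a hypercube with 2^d routers the worst-case distance is d hops, so the longest path grows only logarithmically with the number of routers.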

Systems are built from Modules

o Deskside (1 module)
o Rack (2 modules): 16 CPUs
o Multi-rack (4 modules): 32 CPUs
o Etc., up to 128 CPUs

New High-End Products: Origin 3000 Servers and Onyx 3 Systems (IRIX 6.5)

o SGI Origin 3200, SGI Onyx 3200
o SGI Origin 3400, SGI Onyx 3400
o SGI Origin 3800, SGI Onyx 3800

SGI 3800 System (16-512p)

Figures: minimum (16p) system; 128p system; 128p system topology (racks 1 to 4, each with C-bricks attached to R-bricks).

Brick types shown: C-Brick, R-Brick (8-port router), I-Brick, P-, I-, or X-Brick, Power Bay.

ASCI Blue Mountain, Los Alamos National Laboratories

o Origin 2000 with 3+ Tflops peak
o 1+ Tflop application performance
o 48 systems with 128 CPUs each = 6144 CPUs
o 1536 Gbyte memory
o 76 Tbyte disk space
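The figures on this slide can be cross-checked with a little arithmetic. The sketch below recomputes the CPU count and derives per-CPU values; it assumes 1 Gbyte = 1024 Mbyte and uses 3.0 Tflops as a lower bound for the "3+ Tflops" peak, so the per-CPU results are estimates rather than numbers from the tutorial.

```python
systems = 48
cpus_per_system = 128
memory_gbyte = 1536
peak_tflops = 3.0  # slide says "3+ Tflops"; 3.0 is used as a lower bound

total_cpus = systems * cpus_per_system
memory_per_cpu_mbyte = memory_gbyte * 1024 / total_cpus  # assumes 1 Gbyte = 1024 Mbyte
peak_per_cpu_mflops = peak_tflops * 1e6 / total_cpus     # 1 Tflops = 1e6 Mflops

print(total_cpus)                  # 6144
print(memory_per_cpu_mbyte)        # 256.0
print(round(peak_per_cpu_mflops))  # 488
```

So the system holds 256 Mbyte of memory per CPU, and the quoted peak corresponds to at least roughly 490 Mflops per CPU.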

Memory hierarchy

Device              Capacity (size)          Speed of access
registers           64                       1/clock
L1 cache            32 KB                    ~2-3 cy
L2 cache            8 MB                     ~10 cy
memory              GB (value not recovered) cy (NUMA; value not recovered)
disk                                         ~4000 cy

Figure: remote latency (ns) against system size (4p, 8p, 16p, 32p, 64p, 128p, 256p, 512p), comparing SN-MIPS latency with Origin2000 latency.
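To see how the levels of the hierarchy combine, the standard average-memory-access-time formula can be applied. In the sketch below the cache costs (~3 cy for L1, ~10 cy for L2) follow the slide, while the main-memory latency and both miss rates are hypothetical values chosen for illustration, since the slide's memory latency did not survive extraction.

```python
def average_access_cycles(t_l1, t_l2, t_mem, l1_miss_rate, l2_miss_rate):
    """Average access time in cycles for a two-level cache in front of memory.

    AMAT = t_l1 + l1_miss_rate * (t_l2 + l2_miss_rate * t_mem)
    """
    return t_l1 + l1_miss_rate * (t_l2 + l2_miss_rate * t_mem)


# Cache costs taken from the slide; the rest are assumptions.
T_L1 = 3         # cycles, upper end of "~2-3 cy"
T_L2 = 10        # cycles, "~10 cy"
T_MEM = 100      # cycles, hypothetical local-memory latency
L1_MISS = 0.05   # hypothetical
L2_MISS = 0.20   # hypothetical

local = average_access_cycles(T_L1, T_L2, T_MEM, L1_MISS, L2_MISS)
# On a NUMA machine remote memory is slower; doubling is an arbitrary example.
remote = average_access_cycles(T_L1, T_L2, 2 * T_MEM, L1_MISS, L2_MISS)

print(f"local memory:  {local:.2f} cycles on average")
print(f"remote memory: {remote:.2f} cycles on average")
```

With these assumed miss rates, doubling the memory latency raises the average access time far less than twofold, which is why cache behaviour matters so much on NUMA systems.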

Scale in Any and All Dimensions: NUMAflex™ Flexible Configuration

Diagram: configurations placed along three axes (I/O, CPU, storage). Workloads shown: web serving, weather simulation, repository / archive, signal processing, media streaming, traditional big supercomputer.