CS 213 Commercial Multiprocessors. Origin2000 System – Shared Memory Directory state in same or separate DRAMs, accessed in parallel Upto 512 nodes (1024.

Slides:

Advertisements

Similar presentations

Multiple Processor Systems

Advertisements

Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.

Chapter 8-1 : Multiple Processor Systems Multiple Processor Systems Multiple Processor Systems Multiprocessor Hardware Multiprocessor Hardware UMA Multiprocessors.

♦ Commodity processor with commodity inter- processor connection Clusters Pentium, Itanium, Opteron, Alpha GigE, Infiniband, Myrinet, Quadrics, SCI NEC.

SGI’2000Parallel Programming Tutorial Supercomputers 2 With the acknowledgement of Igor Zacharov and Wolfgang Mertz SGI European Headquarters.

.1 Network Connected Multi’s [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005]

Multiple Processor Systems

Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.

Nov COMP60621 Concurrent Programming for Numerical Applications Lecture 6 Chronos – a Dell Multicore Computer Len Freeman, Graham Riley Centre for.

OGO 2.1 SGI Origin 2000 Robert van Liere CWI, Amsterdam TU/e, Eindhoven 11 September 2001.

IBM RS6000/SP Overview Advanced IBM Unix computers series Multiple different configurations Available from entry level to high-end machines. POWER (1,2,3,4)

1 BGL Photo (system) BlueGene/L IBM Journal of Research and Development, Vol. 49, No. 2-3.

Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.

An Overview of Myrinet By: Ralph Zajac. What is Myrinet? n LAN designed for clusters n Based on USCD’s ATOMIC LAN n Has many characteristics of MPP message-passing.

Lappeenranta University of Technology June 2015Igor Monastyrnyi HPC2N - High Performance Computing Center North System hardware.

Hitachi SR8000 Supercomputer LAPPEENRANTA UNIVERSITY OF TECHNOLOGY Department of Information Technology Introduction to Parallel Computing Group.

1 Fast Communication for Multi – Core SOPC Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab.

Parallel Processing Architectures Laxmi Narayan Bhuyan

Arquitectura de Sistemas Paralelos e Distribuídos Paulo Marques Dep. Eng. Informática – Universidade de Coimbra Ago/ Machine.

IBM RS/6000 SP POWER3 SMP Jari Jokinen Pekka Laurila.

Embedded Transport Acceleration Intel Xeon Processor as a Packet Processing Engine Abhishek Mitra Professor: Dr. Bhuyan.

General Purpose Node-to-Network Interface in Scalable Multiprocessors CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley.

CS252/Patterson Lec /28/01 CS162 Computer Architecture Lecture 16: Multiprocessor 2: Directory Protocol, Interconnection Networks.

Real Parallel Computers. Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel.

Storage area network and System area network (SAN)

Prepared by Careene McCallum-Rodney Hardware specification of a computer system.

9/20/6Lecture 3 - Instruction Set - Al1 Interfacing Devices to the

Real Parallel Computers. Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra,

Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.

INFO1119 (Fall 2012) INFO1119: Operating System and Hardware Module 2: Computer Components Hardware – Part 2 Hardware – Part 2.

 Chasis / System cabinet  A plastic enclosure that contains most of the components of a computer (usually excluding the display, keyboard and mouse)

1 Lecture 7: Part 2: Message Passing Multicomputers (Distributed Memory Machines)

HyperTransport™ Technology I/O Link Presentation by Mike Jonas.

Figure 1-2 Inside the computer case

Multiprocessor systems Objective n the multiprocessors’ organization and implementation n the shared-memory in multiprocessor n static and dynamic connection.

Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.

Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.

Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.

1 Computer System Organization I/O systemProcessor Compiler Operating System (Windows 98) Application (Netscape) Digital Design Circuit Design Instruction.

Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.

Multiple Processor Systems Chapter Multiprocessors 8.2 Multicomputers 8.3 Distributed systems.

Reconfigurable Computing: A First Look at the Cray-XD1 Mitch Sukalski, David Thompson, Rob Armstrong, Curtis Janssen, and Matt Leininger Orgs: 8961 & 8963.

Buffer-On-Board Memory System 1 Name: Aurangozeb ISCA 2012.

Parallel Programming on the SGI Origin2000 With thanks to Igor Zacharov / Benoit Marchand, SGI Taub Computer Center Technion Moshe Goldberg,

Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Principles of Parallel Programming First Edition by Calvin Lin Lawrence Snyder.

MODERN OPERATING SYSTEMS Third Edition ANDREW S. TANENBAUM Chapter 8 Multiple Processor Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall,

Embedded Network Interface (ENI). What is ENI? Embedded Network Interface Originally called DPO (Digital Product Option) card Printer without network.

Computing Environment The computing environment rapidly evolving ‑ you need to know not only the methods, but also How and when to apply them, Which computers.

Operating System Issues in Multi-Processor Systems John Sung Hardware Engineer Compaq Computer Corporation.

9/20/6Lecture 12 - Interfacing Devices1 Interfacing Devices to the

Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Principles of Parallel Programming First Edition by Calvin Lin Lawrence Snyder.

3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.

Understanding Parallel Computers Parallel Processing EE 613.

3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-2.

Presented by NCCS Hardware Jim Rogers Director of Operations National Center for Computational Sciences.

Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture Lecture 22 Distributed computer Interconnection.

Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.

CDA-5155 Computer Architecture Principles Fall 2000 Multiprocessor Architectures.

Parallel Computers Today Oak Ridge / Cray Jaguar > 1.75 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating.

Network Connected Multiprocessors

Berkeley Cluster Projects

HyperTransport™ Technology I/O Link

Interconnection Network Routing, Topology Design Trade-offs

Lecture: Memory, Multiprocessors

Interconnection Networks Contd.

Multiple Processor and Distributed Systems

Chapter 4 Multiprocessors

CINECA HIGH PERFORMANCE COMPUTING SYSTEM

Cluster Computers.

Interconnection Network and Prefetching

Presentation transcript:

CS 213 Commercial Multiprocessors

Origin2000 System – Shared Memory Directory state in same or separate DRAMs, accessed in parallel Upto 512 nodes (1024 processors) With 195MHz R10K processor, peak 390MFLOPS or 780 MIPS per proc Peak SysAD bus bw is 780MB/s, so also Hub-Mem Hub to router chip and to Xbow is 1.56 GB/s (both are off-board)

Origin Network Each router has six pairs of 1.56MB/s unidirectional links –Two to nodes, four to other routers –latency: 41ns pin to pin across a router Flexible cables up to 3 ft long Four “virtual channels”: request, reply, other two for priority or I/O

Cray T3D – Shared Memory Build up info in ‘shell’ Remote memory operations encoded in address

IBM Power 4 – Shared Memory

Power-4 Multi-chip Module

32-way SMP

NOW – Message Passing General purpose processor embedded in NIC to implement VIA – to be discussed later

Myrinet – Message Passing

Interface Processor

InfiniBand – Message Passing

Latency Comparison

Cray XD1 – Message Passing Four chassis, each holding six blades, each containing a dual (quad) 2.4 GHz AMD Opteron motherboard with 4GB of RAM and one 74 GB hard disk. The interconnection topology, shown in Fig. 1 has three levels of latencies: 1.communication time between the CPUs inside one blade is through shared memory 2.very fast message passing communication among blades within a chassis, 3.slower message passing communication between two different chassis

Latency Comparison

Dual (Quad) SMP and Hardware Accelerator

IBM SP2 – Message Passing