1 BlueGene/L. Source: IBM Journal of Research and Development, Vol. 49, No. 2/3. [Photo of the BG/L system.]

2 Main Design Principles
– Some science and engineering applications scale up to and beyond 10,000 parallel processes.
– Improve computing capability while holding total system cost constant.
– Make cost/performance trade-offs with the end use in mind: Applications <> Architecture <> Packaging.
– Reduce complexity and size:
  – ~25 kW/rack is the maximum for air cooling in a standard machine room.
  – Need to improve the performance/power ratio.
  – The 700 MHz PowerPC 440 embedded core gives the ASIC an excellent FLOPS/watt ratio.
– Maximize integration:
  – On chip: an ASIC integrating everything except main memory.
  – Off chip: maximize the number of nodes in a rack.
– Large systems require excellent reliability, availability, and serviceability (RAS).

3 Physical Layout of BG/L
[Figure: packaging hierarchy from compute chip to compute card, node card, midplane, rack, and full system.]

4 The Compute Chip
System-on-a-chip (SoC): one ASIC containing
– 2 PowerPC processors
– L1 and L2 caches
– 4 MB embedded DRAM
– DDR DRAM interface and DMA controller
– Network connectivity hardware
– Control and monitoring equipment (JTAG)

5 Compute and Node Cards
[Photos: compute card (two nodes) and node card (16 compute cards).]

6 Node Architecture
– Built from IBM PowerPC embedded CMOS processors, embedded DRAM, and system-on-a-chip techniques.
– The roughly 11 mm square die allows for a very high density of processing.
– The ASIC uses IBM's Cu-11 0.13 µm CMOS technology.
– The 700 MHz processor speed is close to the memory speed.
– Two processors per node; the second processor is intended primarily for handling message-passing operations.
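Since the second processor is dedicated to messaging, the natural programming pattern is to overlap communication with computation. A minimal sketch of that pattern using only the standard MPI C API (nothing BG/L-specific; the ring exchange, buffer names, and sizes are invented for illustration):

```c
/* Sketch: overlap a halo exchange with computation, the pattern a
 * dedicated message-passing processor is meant to accelerate.
 * Portable MPI only; the ring pattern and buffer sizes are invented. */
#include <mpi.h>
#include <stdio.h>

#define N 4096

int main(int argc, char **argv) {
    static double halo_out[N], halo_in[N], interior[N];
    MPI_Request reqs[2];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;  /* neighbors on a ring */
    int right = (rank + 1) % size;

    /* Post the exchange; the communication processor can service it... */
    MPI_Irecv(halo_in,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(halo_out, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ...while the compute processor works on interior points. */
    for (int i = 0; i < N; i++)
        interior[i] *= 0.5;

    /* Boundary work must wait until the halo has arrived. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0) printf("exchange overlapped with compute\n");
    MPI_Finalize();
    return 0;
}
```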

7 Midplane and Rack
– One rack holds 1024 nodes.
– Nodes are optimized for low power.
– The ASIC is based on SoC technology:
  – Outperforms commodity clusters while saving on power.
  – Aggressive packaging of processor, memory, and interconnect.
  – Power efficient and space efficient.
  – Allows latencies and bandwidths significantly better than those of nodes typically used in ASC-scale supercomputers.

8 The Torus Network
– The full system is a 64 x 32 x 32 three-dimensional torus.
– Each compute node is connected to its six neighbors: x+, x-, y+, y-, z+, z-.
– A compute card is 1 x 2 x 1 (two nodes).
– A node card is 4 x 4 x 2: 16 compute cards in a 4 x 2 x 2 arrangement.
– A midplane is 8 x 8 x 8: 16 node cards in a 2 x 2 x 4 arrangement.
– Each unidirectional link runs at 1.4 Gb/s (175 MB/s).
– Each node can send and receive at 1.05 GB/s (6 links x 175 MB/s).
– Supports cut-through routing, with both deterministic and adaptive routing.
– Variable-sized packets of 32, 64, 96, ..., 256 bytes.
– Guarantees reliable delivery.
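The wraparound links are what distinguish a torus from a mesh: every node has exactly six neighbors, even at the edges of the machine. A small self-contained sketch (the coordinate-to-rank mapping is hypothetical, not the actual BG/L routing tables) that computes a node's six neighbors in the full 64 x 32 x 32 torus and checks the per-node bandwidth arithmetic from the slide:

```c
/* Sketch: six-neighbor lookup in a 3D torus with wraparound.
 * Dimensions match the full 64 x 32 x 32 BG/L system; the rank
 * numbering is a made-up linearization for illustration. */
#include <stdio.h>

#define DIM_X 64
#define DIM_Y 32
#define DIM_Z 32

/* Wrap a coordinate into [0, dim); this is the torus link. */
static int wrap(int c, int dim) { return (c % dim + dim) % dim; }

/* Hypothetical linearization of (x,y,z), x varying fastest. */
static int node_rank(int x, int y, int z) {
    return x + DIM_X * (y + DIM_Y * z);
}

int main(void) {
    int x = 0, y = 31, z = 5;            /* example node on a torus edge */
    int dx[6] = { +1, -1,  0,  0,  0,  0 };
    int dy[6] = {  0,  0, +1, -1,  0,  0 };
    int dz[6] = {  0,  0,  0,  0, +1, -1 };

    for (int i = 0; i < 6; i++) {
        int nx = wrap(x + dx[i], DIM_X);
        int ny = wrap(y + dy[i], DIM_Y);
        int nz = wrap(z + dz[i], DIM_Z);
        printf("neighbor %d: (%2d,%2d,%2d) rank %d\n",
               i, nx, ny, nz, node_rank(nx, ny, nz));
    }

    /* Bandwidth check from the slide's figures:
     * 6 links x 175 MB/s = 1050 MB/s = 1.05 GB/s per direction. */
    printf("aggregate per-node bandwidth: %d MB/s\n", 6 * 175);
    return 0;
}
```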

9 BG/L System Software
– System software supports efficient execution of parallel applications.
– Compiler support for MPI-based C, C++, and Fortran.
– Front-end nodes are commodity PCs running Linux.
– I/O nodes run a customized Linux kernel.
– Compute nodes run an extremely lightweight custom kernel:
  – Space sharing, one single-threaded process per processor (dual-threaded per node).
  – Flat address space, no paging.
  – Physical resources are memory-mapped.
– The service node is a single multiprocessor machine running a custom OS.
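With one single-threaded process per processor and no paging, the application model the compute-node kernel supports is plain MPI. A minimal example of the kind of program this stack is built for (standard MPI C; on BG/L it would be cross-compiled on a front-end node for the compute nodes):

```c
/* Minimal MPI program of the kind the lightweight compute-node
 * kernel is designed to run: one process per processor, all
 * interaction via message passing. Standard MPI C API only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```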

10 Space Sharing
– The BG/L system can be partitioned into electronically isolated sets of nodes (power-of-2 sizes).
– Each partition is single-user and reservation-based.
– Faulty hardware is electrically isolated so that other nodes can continue to run in the presence of component failures.
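Because partitions come only in power-of-2 sizes, a job requesting an arbitrary node count is allocated the next power of two. A hypothetical helper illustrating that rounding (not actual BG/L control-system code):

```c
/* Sketch: round a requested node count up to the power-of-two
 * partition size a space-shared BG/L-style machine would allocate.
 * Purely illustrative; invented for this transcript. */
#include <stdio.h>

static unsigned round_up_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;   /* double until the request is covered */
    return p;
}

int main(void) {
    unsigned requests[] = { 1, 3, 500, 513, 1024 };
    for (int i = 0; i < 5; i++)
        printf("request %4u nodes -> partition of %4u nodes\n",
               requests[i], round_up_pow2(requests[i]));
    return 0;
}
```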