© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Blue Gene/L Torus Interconnection Network N. R. Adiga et al., IBM Journal.

Presentation transcript:

© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Blue Gene/L Torus Interconnection Network N. R. Adiga et al., IBM Journal of Research & Development From

ECE 8813a (2) Overview
An initiative for a petaflops machine in support of computational biology
Influenced by the success of “lattice” architectures targeted to specific problems
  - Customization and SoC technology
  - Better price/performance and energy/performance
System: 32x32x64 nodes
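The system size on this slide can be checked with a short sketch. Only the 32x32x64 dimensions come from the slide; the function names are hypothetical, and the wraparound neighbor rule is the usual definition of a 3D torus rather than anything stated here.

```python
# Minimal sketch: node count and nearest neighbors in a 3D torus.
# DIMS comes from the slide; everything else is illustrative.
DIMS = (32, 32, 64)

def node_count(dims=DIMS):
    """Total number of nodes in a torus with the given dimensions."""
    total = 1
    for size in dims:
        total *= size
    return total

def torus_neighbors(coord, dims=DIMS):
    """Return the six nearest neighbors of coord, wrapping around in each dimension."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(nxt))
    return neighbors

print(node_count())                # 65536
print(torus_neighbors((0, 0, 0)))  # includes wraparound neighbors such as (31, 0, 0)
```

Every node has exactly six neighbors regardless of position, which is what makes the nearest-neighbor interconnect uniform across the machine.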

ECE 8813a (3) Packaging and Scale-Up
[Figure: packaging and scale-up. From buildup-800x571.jpg]

ECE 8813a (4) Some Physical Notes
Design point emphasizes “cellular” style problems
  - Nearest neighbor interconnect
  - Sensitivity to cabling
Emphasis on performance/unit volume
  - Speed/energy
Integration
  - Minimize parts count – separate NI cards/chips

ECE 8813a (5) Blue Gene/L Node
Image from
Dual PPC MHz cores
  - Dual issue OOO core
  - Dual FPUs
Five interconnects
  - Torus inter-processor communication network
  - Global/collectives network
  - Global barrier/interrupts
  - Ethernet
  - Control network

ECE 8813a (6) Blue Gene/L Node
[Block diagram. Labels: PPC, K/32K L1 (two cores); L2; shared SRAM buffer; shared L3 directory (EDRAM); L3 or memory (EDRAM); torus; coll; barrier; GigE; 128 bits; 256 bits; snoop; 4 global barriers/interrupts; 3 I/Os; 6 I/Os; 2KB]
Not coherent across L1

ECE 8813a (7) The Router Microarchitecture
[Router diagram. Labels: 19x6, byte wide; six input/output pairs; injection; ejection; 7; 2 each; bypass; adaptive VC; escape VC; high priority VC; input pipeline]
8 stage
Pressure on routing logic and receiver arbiters
Bit serial links – 175 MB/sec
  - Pin constraints
One Kbyte VCs
Switch speedup
  - Concurrent transfers to 2 senders

ECE 8813a (8) Router Vital Statistics
8 injection FIFOs
  - 2 high priority and 6 normal
14 ejection FIFOs
  - Two groups of 7
    o One high priority and six normal for each direction
Watermarks on the injection and ejection FIFOs tied to interrupts
Deterministic bubble routing on the escape channel
Worst case hardware latency through the node is 69 ns
Area equivalent to one core

ECE 8813a (9) Arbitration
Three stage arbitration
  - Join the shortest queue (JSQ) (RC + VCA)
    o Use token availability
    o Use deterministic VC
    o Do not compete
  - Serve the longest queue (SLQ) (for SA)
    o 2-bit granularity
    o % of cycles devoted to randomized selection
  - Modified SLQ for SA allocation
[Router diagram repeated. Labels: 19x6, byte wide; six input/output pairs; injection; ejection; 7; 2 each]
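The two queue-based policies named on this slide can be illustrated with a minimal sketch. This is not the router's actual logic: the function names, the dictionary inputs, the 1024-byte capacity (borrowed from the "One Kbyte VCs" note), and first-come tie-breaking are all assumptions, and the randomized-selection cycles are not modeled.

```python
# Hypothetical sketch of JSQ and SLQ selection; not the hardware implementation.

def join_shortest_queue(free_tokens):
    """JSQ: choose the downstream VC with the most free tokens,
    i.e. the one whose queue is currently shortest."""
    return max(free_tokens, key=free_tokens.get)

def fill_level(occupied_bytes, capacity_bytes=1024):
    """Quantize queue occupancy to 2 bits (levels 0..3), per the slide's
    '2-bit granularity'. The capacity default is an assumption."""
    return min(3, occupied_bytes * 4 // capacity_bytes)

def serve_longest_queue(occupancy, capacity_bytes=1024):
    """SLQ: among requesting inputs, serve the one with the highest
    quantized fill level. Ties go to the first input listed."""
    return max(occupancy, key=lambda k: fill_level(occupancy[k], capacity_bytes))

print(join_shortest_queue({"adaptive0": 3, "adaptive1": 7}))
print(serve_longest_queue({"x+": 300, "y-": 700, "z+": 100}))
```

Quantizing to four levels means two queues with similar occupancy look identical to the arbiter, which is one reason a tie-breaking rule (here, list order) is needed.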

ECE 8813a (10) The Switching Layer
Packet size varies from 32 bytes to 256 bytes (32 byte increments)
Virtual cut through
Token flow control: one token = 32 bytes
  - Not sufficient for deadlock freedom with variable sized packets?
  - Flow control + acknowledgements signaling protocol
  - Buffer allocation and freeing space in the retransmission buffer
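The token arithmetic above can be sketched as follows. The 32-byte token and the 32 to 256 byte packet range come from the slide; the TokenCounter class, its method names, and the rule that a sender needs tokens for the whole packet before sending (a common reading of virtual cut-through) are assumptions for illustration.

```python
# Minimal sketch of token-based flow control with 32-byte tokens.
TOKEN_BYTES = 32
MIN_PACKET, MAX_PACKET = 32, 256

def tokens_needed(packet_bytes):
    """Number of 32-byte tokens a packet consumes downstream."""
    if packet_bytes % TOKEN_BYTES or not MIN_PACKET <= packet_bytes <= MAX_PACKET:
        raise ValueError("packet must be 32..256 bytes in 32-byte increments")
    return packet_bytes // TOKEN_BYTES

class TokenCounter:
    """Hypothetical sender-side view of free space in one downstream buffer."""

    def __init__(self, buffer_bytes):
        self.tokens = buffer_bytes // TOKEN_BYTES

    def try_send(self, packet_bytes):
        """Send only if the downstream buffer can hold the whole packet."""
        need = tokens_needed(packet_bytes)
        if need > self.tokens:
            return False
        self.tokens -= need
        return True

    def release(self, packet_bytes):
        """Tokens return when the receiver frees the packet's buffer space."""
        self.tokens += tokens_needed(packet_bytes)

vc = TokenCounter(1024)   # a 1 KB buffer holds 32 tokens
print(vc.try_send(256), vc.tokens)
```

A 256-byte packet takes 8 tokens, so a 1 KB buffer holds at most four maximum-size packets before the sender must wait for tokens to be returned.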

ECE 8813a (11) The Switching Layer (cont.)
Time-outs and link level retransmission for corrupted packets
[Packet format figure]
32 byte chunk; packet = 1 – 8 chunks
8 bytes
  - Link-level info
  - Routing info
  - VC & size
  - 8-bit CRC (protect header)
24 bit CRC, valid

ECE 8813a (12) The Routing Layer
Adaptive and deterministic minimal path routing
  - Deadlock freedom: bubble router or deterministic
Router registers store state
  - Neighbor coordinates
  - Hints early in the header to pipeline arbitration
  - Routing function implementation: hints+VCs
Hardware broadcast down a single dimension
[Figure labels: 24 bit CRC, valid; hint bits; router state registers; route]
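Minimal path routing on a torus reduces to picking, per dimension, the shorter way around the ring. The sketch below shows that calculation. The signed +1/-1/0 encoding is a simplification chosen for illustration and is not the hint-bit format of the actual packet header; function names and the tie-breaking rule (prefer the positive direction) are assumptions.

```python
# Hypothetical sketch: per-dimension minimal direction and hop count on a torus.

def minimal_directions(src, dst, dims):
    """For each dimension return +1, -1, or 0: the direction of a minimal path.
    Ties (exactly half way around) go to the positive direction."""
    hints = []
    for s, d, size in zip(src, dst, dims):
        forward = (d - s) % size      # hops if travelling in the + direction
        if forward == 0:
            hints.append(0)
        elif forward <= size - forward:
            hints.append(1)
        else:
            hints.append(-1)
    return tuple(hints)

def hop_count(src, dst, dims):
    """Total hops on a minimal path, summing the shorter way in each dimension."""
    return sum(min((d - s) % size, (s - d) % size)
               for s, d, size in zip(src, dst, dims))

dims = (32, 32, 64)
print(minimal_directions((0, 0, 0), (1, 31, 32), dims))
print(hop_count((0, 0, 0), (1, 31, 32), dims))
```

An adaptive router may take the remaining hops in any dimension order, while a deterministic one fixes the order; both stay on minimal paths as long as they only move in the directions computed here.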

ECE 8813a (13) Tree Structured Collective Network
One to all communication
Embedded associative operations
  - min/max, add/sub, and/or
Leaf to root latency of 2.5 microseconds
Routing table driven
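An associative operation embedded in a tree network can be pictured as each node combining its own value with the partial results from its children and passing one result upward. The sketch below is a software analogy under that assumption; the tree shape, node ids, and function name are made up for illustration.

```python
# Hypothetical sketch of a leaf-to-root reduction over a tree.
import operator

def tree_reduce(node, children, values, op):
    """Combine this node's value with the reduced results of its subtrees."""
    result = values[node]
    for child in children.get(node, []):
        result = op(result, tree_reduce(child, children, values, op))
    return result

# Example tree: node 0 is the root; nodes 2, 3, 4 are leaves.
children = {0: [1, 2], 1: [3, 4]}
values = {0: 5, 1: 3, 2: 8, 3: 1, 4: 7}

print(tree_reduce(0, children, values, operator.add))  # 24
print(tree_reduce(0, children, values, max))           # 8
print(tree_reduce(0, children, values, min))           # 1
```

Because the operations are associative, the result does not depend on the order in which subtrees are combined, which is what allows the combining to happen inside the network.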

ECE 8813a (14) Fast Barrier/Interrupt Network
Four tree structured networks for OR/AND operation
1.3 microseconds max delay
User space accessible

ECE 8813a (15) Impact of Deadlock Avoidance Mechanism
Asymmetry in traditional deadlock avoidance schemes
From N. R. Adiga et al., “Blue Gene/L Torus Interconnection Network,” IBM J. Research and Development, March/May 2005

ECE 8813a (16) Impact of Adaptive Routing
Diminishing returns for increasing number of virtual channels
System non-uniformities affect maximum achievable link utilization
  - MPI all-to-all pattern

ECE 8813a (17) Summary
Domain-specific system architecture
  - Note the impact of system design goals on the choices
Heterogeneous interconnection network architecture