Published by Jennifer Ray. Modified over 9 years ago.
1
The Alpha 21364 Network Architecture
By Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb
Compaq Computer Corporation
Presented by Luis Alfredo Campos
2
Alpha 21364 Goals
Support communication-intensive server applications
–High-performance technical computing
–Database servers
–Web servers
–Telecommunication applications
Achieve:
–Extremely low latency
–Enormous bandwidth
–Support for directory-based cache coherence
Improve:
–Reliability
–Availability
3
Overview
Alpha 21264 core with enhancements
Tightly coupled multiprocessor network
–Connects up to 128 processors
–Two-dimensional torus network
Integrated L2 cache
Integrated memory controller
Router
–Directory-based cache coherence (CC)
–Separate virtual channels
–Packet classes
4
Network Packet Classes
Seven packet classes
–Request (3 flits)
–Forward (3 flits)
–Block Response (18 or 19 flits)
–Non-Block Response (2 or 3 flits)
–Write I/O (19 flits)
–Read I/O (3 flits)
–Special (1 or 3 flits)
Flits are 32 bits of data plus 7 bits of ECC
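The flit counts above translate directly into packet sizes. The following is a minimal Python sketch of that arithmetic, using only the numbers on this slide; the names `PACKET_CLASSES` and `packet_bits` are illustrative and do not come from the presentation.

```python
# Flit counts per packet class, copied from the slide.
PACKET_CLASSES = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}

DATA_BITS_PER_FLIT = 32
ECC_BITS_PER_FLIT = 7
FLIT_BITS = DATA_BITS_PER_FLIT + ECC_BITS_PER_FLIT  # bits per flit including ECC


def packet_bits(flits):
    """Return (data_bits, total_bits_including_ecc) for a packet of `flits` flits."""
    return flits * DATA_BITS_PER_FLIT, flits * FLIT_BITS


if __name__ == "__main__":
    for name, sizes in PACKET_CLASSES.items():
        for flits in sizes:
            data, total = packet_bits(flits)
            print(f"{name:20s} {flits:2d} flits: {data:4d} data bits, {total:4d} bits with ECC")
```

Under these numbers, each flit carries 39 bits in total, and the largest packets (19 flits) carry 608 data bits.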
5
Network Architecture
Two-dimensional torus
–Limited support for imperfect tori, which allows fault remapping
Virtual cut-through routing
–Buffer space for 316 packets
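To make the torus topology concrete, here is a small Python sketch of how a node's four neighbours wrap around the edges of the grid. The coordinate scheme, the direction names, and the 16 x 8 layout in the usage note are assumptions chosen for illustration; the slides do not specify how nodes are numbered.

```python
def torus_neighbors(x, y, width, height):
    """Return the four neighbours of node (x, y) in a width x height 2D torus.

    Coordinates wrap around in both dimensions, which is what
    distinguishes a torus from a plain mesh.
    """
    return {
        "east": ((x + 1) % width, y),
        "west": ((x - 1) % width, y),
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
    }


if __name__ == "__main__":
    # Hypothetical 16 x 8 arrangement of 128 processors.
    print(torus_neighbors(0, 0, 16, 8))
```

In this hypothetical 16 x 8 arrangement, the corner node (0, 0) has (15, 0) as its west neighbour and (0, 7) as its north neighbour, so no node sits on an edge.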
6
Adaptive Routing
Four rectangles with the current and destination nodes at diagonally opposite corners
Packets route within the minimum rectangle
Maximizes the bandwidth between source and destination
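Routing inside the minimum rectangle can be sketched as follows: at each hop, only directions that reduce the remaining distance to the destination are candidates, and the router picks one of them. This Python sketch assumes the same hypothetical (x, y) coordinates as a plain 2D torus and uses free buffer space as the selection criterion; that criterion is an assumption for illustration, not something the slides state.

```python
def signed_offset(src, dst, size):
    """Shortest signed distance from src to dst around a ring of `size` nodes."""
    forward = (dst - src) % size
    backward = forward - size  # same destination reached by going the other way
    return forward if forward <= -backward else backward


def productive_directions(cur, dst, width, height):
    """Directions that keep a packet inside the minimum rectangle."""
    dx = signed_offset(cur[0], dst[0], width)
    dy = signed_offset(cur[1], dst[1], height)
    dirs = []
    if dx > 0:
        dirs.append("east")
    elif dx < 0:
        dirs.append("west")
    if dy > 0:
        dirs.append("south")
    elif dy < 0:
        dirs.append("north")
    return dirs


def pick_adaptive(cur, dst, width, height, free_buffers):
    """Pick the productive direction with the most free buffers (assumed policy).

    Returns None when the packet has already arrived.
    """
    dirs = productive_directions(cur, dst, width, height)
    if not dirs:
        return None
    return max(dirs, key=lambda d: free_buffers.get(d, 0))
```

For example, on an 8 x 8 torus a packet at (0, 0) heading to (7, 7) would go west or north through the wraparound links rather than seven hops east and south, because that rectangle is the smallest of the four.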
7
Avoiding Deadlocks in Adaptive Routing
“Adaptive routing will not deadlock a network as long as packets can drain via a deadlock-free path”
19 virtual channels
–A set of 3 virtual channels per packet class, except for the Special class (only one channel)
Adaptive, VC0, and VC1
–Adaptive is the first choice
–The VC0 and VC1 combination creates a deadlock-free network
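The count of 19 follows from the seven packet classes: six classes with three channels each, plus one channel for the Special class. The Python sketch below enumerates them and shows the fallback order described on this slide. The function names are illustrative, and the rule for choosing between VC0 and VC1 is deliberately not modelled because the slides do not describe it.

```python
# The six packet classes that each get three virtual channels.
NON_SPECIAL_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
]
CHANNELS_PER_CLASS = ["Adaptive", "VC0", "VC1"]


def virtual_channels():
    """Enumerate (packet_class, channel) pairs for every virtual channel."""
    channels = [(cls, ch) for cls in NON_SPECIAL_CLASSES for ch in CHANNELS_PER_CLASS]
    channels.append(("Special", "single"))  # the Special class has only one channel
    return channels


def choose_channel_kind(adaptive_has_space):
    """Adaptive is tried first; otherwise fall back to the deadlock-free pair."""
    return "Adaptive" if adaptive_has_space else "deadlock-free (VC0/VC1)"


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 * 3 + 1
```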
8
Router Architecture
9 pipeline types
–Input and output: local, interprocessor, and I/O
Pin-to-pin latency of 13 cycles
–Running at 1.2 GHz
Network links run 33% slower
–Running at 0.8 GHz
–Synchronous with outgoing links
–Asynchronous with incoming links
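The clock figures on this slide can be checked with a few lines of arithmetic. This Python sketch only restates the slide's numbers; the variable names are illustrative.

```python
CORE_CLOCK_HZ = 1.2e9   # router clock from the slide
LINK_CLOCK_HZ = 0.8e9   # network link clock from the slide
PIN_TO_PIN_CYCLES = 13

# Pin-to-pin latency in nanoseconds at the router clock.
latency_ns = PIN_TO_PIN_CYCLES / CORE_CLOCK_HZ * 1e9

# Fraction by which the links are slower than the router.
slowdown = 1 - LINK_CLOCK_HZ / CORE_CLOCK_HZ

# Router cycles that elapse during one link cycle.
core_cycles_per_link_cycle = CORE_CLOCK_HZ / LINK_CLOCK_HZ

if __name__ == "__main__":
    print(f"pin-to-pin latency: {latency_ns:.1f} ns")
    print(f"links slower by: {slowdown:.0%}")
    print(f"router cycles per link cycle: {core_cycles_per_link_cycle:.1f}")
```

Thirteen cycles at 1.2 GHz is roughly 10.8 ns, and 0.8 GHz is one third slower than 1.2 GHz, which matches the 33% stated above.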
9
Arbitration
Needs to avoid a central bottleneck
–16 local arbiters
–7 global arbiters
Least Recently Selected (LRS) scheme
–Local arbiters: classes, virtual channels
–Global arbiters: input ports
Rotary Rule mode
–Priority to oldest packets
Coherence Dependence Priority (CDP) Rule mode
–Priority depending on class ordering
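A least-recently-selected policy can be sketched in a few lines of Python: among the requesters currently asking, grant the one that was selected longest ago, then mark it as most recently selected. This is a generic software illustration of the LRS idea under that assumption, not the 21364's hardware implementation; the class name and interface are hypothetical.

```python
class LeastRecentlySelectedArbiter:
    """Grants the requester that has gone longest without being selected."""

    def __init__(self, requesters):
        # Front of the list = least recently selected.
        self._order = list(requesters)

    def grant(self, requesting):
        """Return the winning requester from `requesting`, or None if empty."""
        for candidate in self._order:
            if candidate in requesting:
                # Move the winner to the back: it is now the most recently selected.
                self._order.remove(candidate)
                self._order.append(candidate)
                return candidate
        return None


if __name__ == "__main__":
    arbiter = LeastRecentlySelectedArbiter(["A", "B", "C"])
    print(arbiter.grant({"A", "B"}))
    print(arbiter.grant({"A", "B"}))
    print(arbiter.grant({"A", "C"}))
```

With requesters A, B, and C, two consecutive rounds in which both A and B request would grant A and then B, so a persistent requester cannot starve the others.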
10
Questions
How is the 1.2 GHz internal / 800 MHz external clock OK?
Why a 2-D torus?
–What are the limitations imposed?