The Alpha 21364 Network Architecture Mukherjee, Bannon, Lang, Spink, and Webb Summary Slides by Fred Bower ECE 259, Spring 2004.


It’s A Small Paper… …Packed With Detail
- Overview At High Level
  - 21364 Chip Features and Built-In MP Constructs
  - Network, Routing, and Router Basics
- More Depth
  - Routing Policies
  - Deadlock Avoidance Via Routing Policies
  - What’s In A Router?
- Discussion

21364 Overview
- 21264 Core With MP Additions
  - MC = Memory Controller
  - Router
  - Directory-Based Cache Coherence (CC)
  - Runs at Core Clock
  - Buffering Capability
- 1.75 MB L2 Cache
Figure 1: The Alpha 21364 Floorplan

The 21364 Network
- Topology: 2-D Torus
  - Limited Support for Imperfect Tori
  - Allows Fault Remapping
- Virtual Cut-Through
- 316* Packet Router Buffer
- Simple, Adaptive Routing
  - Constrained Within Minimum Rectangle
Figure 2: A 12-Processor 21364 Network Configuration
*316 Total Packets of Buffer Capacity, Divided Unevenly Amongst Classes and Ports
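
To make the wraparound of a 2-D torus concrete, here is a minimal Python sketch that lists a node's four neighbors and computes the minimal hop count between two nodes. It assumes a hypothetical 4 × 3 arrangement for the 12-processor configuration of Figure 2 (the figure's actual dimensions are not stated in the text), and the helper names `torus_neighbors`, `ring_distance`, and `torus_hops` are illustrative, not taken from the paper.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in the torus


def torus_neighbors(node: Node, width: int, height: int) -> List[Node]:
    """Return the +x, -x, +y, -y neighbors; every edge wraps around."""
    x, y = node
    return [
        ((x + 1) % width, y),   # +x
        ((x - 1) % width, y),   # -x (wraps from column 0 to width - 1)
        (x, (y + 1) % height),  # +y
        (x, (y - 1) % height),  # -y (wraps from row 0 to height - 1)
    ]


def ring_distance(a: int, b: int, size: int) -> int:
    """Shortest distance between positions a and b on a ring of `size` nodes."""
    d = abs(a - b) % size
    return min(d, size - d)


def torus_hops(src: Node, dst: Node, width: int, height: int) -> int:
    """Minimal hop count: the two per-dimension ring distances add up."""
    return (ring_distance(src[0], dst[0], width)
            + ring_distance(src[1], dst[1], height))


# Hypothetical 4 x 3 torus standing in for the 12-processor configuration.
W, H = 4, 3
print(torus_neighbors((0, 0), W, H))     # [(1, 0), (3, 0), (0, 1), (0, 2)]
print(torus_hops((0, 0), (3, 2), W, H))  # 2: one wrap hop in x, one in y
```

In a mesh without wrap links, the same two corner nodes would be 3 + 2 = 5 hops apart; the wraparound edges are what shorten the path to 2.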

Packet Classes
- Seven Packet Classes
  - Request (3 Flits)
  - Forward (3 Flits)
  - Block Response (18 or 19 Flits)
  - Non-Block Response (2 or 3 Flits)
  - Write I/O (19 Flits)
  - Read I/O (3 Flits)
  - Special (1 or 3 Flits)
- Flits Are 32 Bits Data Plus 7 Bits ECC
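
Since every flit is 39 bits on the wire (32 data bits plus 7 ECC bits), packet sizes follow directly from the flit counts above. The sketch below simply tabulates them; the dictionary layout and helper names are illustrative, and for classes with two possible sizes it reports both. "Payload bytes" here means all non-ECC bits, header information included.

```python
DATA_BITS = 32
ECC_BITS = 7
FLIT_BITS = DATA_BITS + ECC_BITS  # 39 bits per flit on the wire

# Flit counts per packet class, as listed on the slide.
# Two entries mean the class has two possible packet sizes.
PACKET_FLITS = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}


def packet_bits(flits: int) -> int:
    """Total bits on the wire for a packet, ECC included."""
    return flits * FLIT_BITS


def packet_payload_bytes(flits: int) -> int:
    """Non-ECC bytes in a packet (header plus data): 4 bytes per flit."""
    return flits * DATA_BITS // 8


for cls, sizes in PACKET_FLITS.items():
    parts = [f"{n} flits = {packet_bits(n)} bits, "
             f"{packet_payload_bytes(n)} payload bytes" for n in sizes]
    print(f"{cls}: " + "; ".join(parts))
```

For example, a 19-flit Write I/O packet occupies 741 bits on the wire and carries 76 non-ECC bytes. If Block Response packets carry 64-byte cache blocks (an assumption, not stated on the slide), 16 of their 18 or 19 flits would be block data.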

Routing Policies: Minimum Rectangle
- Four Rectangles With Current and Destination At Diagonals
  - Recall 2-D Torus: All Edges Wrap
- Constrain Adaptive Routing To the Minimum Rectangle (Center of Figure 3)
Figure 3: Routing Rectangles
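
One way to read the minimum-rectangle rule: in each dimension, take whichever way around the ring is shorter, and let the adaptive router choose only among output ports that make progress along those chosen ways. A minimal sketch of that reading follows. The names `shortest_step` and `productive_directions`, the port labels, and the tie-breaking rule (prefer the + direction when both ways are equally long) are assumptions for illustration, not the 21364's documented behavior.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in the torus


def shortest_step(src: int, dst: int, size: int) -> Tuple[int, int]:
    """Return (direction, distance) along one ring dimension.

    direction is +1 or -1, or 0 if src and dst are already aligned.
    Ties between the two ways around the ring go to +1 (an assumption).
    """
    forward = (dst - src) % size   # hops if we travel in the + direction
    backward = (src - dst) % size  # hops if we travel in the - direction
    if forward == 0:
        return 0, 0
    if forward <= backward:
        return +1, forward
    return -1, backward


def productive_directions(cur: Node, dst: Node,
                          width: int, height: int) -> List[str]:
    """Output ports that keep the packet inside the minimum rectangle."""
    dx, _ = shortest_step(cur[0], dst[0], width)
    dy, _ = shortest_step(cur[1], dst[1], height)
    ports = []
    if dx != 0:
        ports.append("x+" if dx > 0 else "x-")
    if dy != 0:
        ports.append("y+" if dy > 0 else "y-")
    return ports


# On a hypothetical 8 x 8 torus, (1, 1) -> (7, 3) is shortest by wrapping
# in the -x direction (2 hops) and moving in the +y direction (2 hops).
print(productive_directions((1, 1), (7, 3), 8, 8))  # ['x-', 'y+']
```

Any interleaving of the returned ports gives a minimal path, so when two ports are returned the router is free to pick whichever has buffer space; a single port means the packet is already aligned in the other dimension.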

Routing Basics
- Decode Of Packet Determines Routing
- Use Of Lookup Tables For Destination Resolution, Virtual Channel Assignments, and Broadcast Invalidation Clusters
- First Flit Has Routing And Packet Information
- ECC Checked/Corrected At Each Router
  - Routers May Rewrite ECC
- Routers Send Feedback About Buffer Availability
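
The slide says only that routers send feedback about buffer availability. Credit-based flow control is one common way to implement such feedback, and the sketch below assumes it: the upstream router keeps a count of free packet buffers at the downstream router, spends one credit per packet sent, and regains it when the downstream router frees that buffer. The `CreditLink` class and its methods are hypothetical, and the 21364's actual mechanism may differ in detail.

```python
from collections import deque


class CreditLink:
    """Upstream view of one link for one virtual channel, using credits.

    Each credit stands for one free packet buffer at the downstream router
    (assumed packet-granularity buffering, as with virtual cut-through).
    """

    def __init__(self, downstream_buffers: int) -> None:
        self.credits = downstream_buffers
        self.downstream = deque()  # packets currently held downstream

    def can_send(self) -> bool:
        return self.credits > 0

    def send(self, packet: str) -> None:
        if not self.can_send():
            raise RuntimeError("no free downstream buffer; packet must wait")
        self.credits -= 1  # one downstream buffer is now occupied
        self.downstream.append(packet)

    def downstream_drains_one(self) -> str:
        """Downstream forwards its oldest packet and returns a credit."""
        packet = self.downstream.popleft()
        self.credits += 1
        return packet


link = CreditLink(downstream_buffers=2)
link.send("req-A")
link.send("req-B")
print(link.can_send())        # False: both downstream buffers are full
link.downstream_drains_one()  # a credit comes back upstream
print(link.can_send())        # True
```

Because the sender never transmits without a credit, a packet can always be accepted in full at the next router, which is what virtual cut-through requires.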

Avoiding Coherence Deadlocks
- Virtual Channels Break Cyclic Dependence
  - Separate Channel For Each Packet Class Guarantees Independence of Class Traffic
  - Additional Ordering Constraint Amongst Classes of Packets
- Additional Measures To Preserve I/O Consistency
  - Force Same-Class Requests To Arrive In-Order Using Deadlock-Free Virtual Channels
  - Allow I/O Writes To Pass I/O Reads Using Separate Virtual Channels For Reads and Writes
  - Prevent I/O Reads From Passing I/O Writes To Preserve Ordering Rules
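
The three I/O ordering rules listed above (same-class packets stay in order, reads may not pass writes, writes may pass reads) can be stated as a check on delivery order. The sketch below is a hypothetical checker, not anything from the paper: it takes the issue order and delivery order of I/O packets from a single source and reports whether the delivery respects those rules. The `(kind, id)` packet encoding is an assumption for illustration.

```python
from typing import List, Tuple

Packet = Tuple[str, int]  # (kind, id): kind is "R" (I/O read) or "W" (I/O write)


def io_order_ok(issued: List[Packet], delivered: List[Packet]) -> bool:
    """Check a delivery order against the slide's I/O ordering rules.

    - Same-class packets must arrive in issue order.
    - An I/O read may not pass an earlier I/O write.
    - An I/O write MAY pass an earlier I/O read.
    """
    if sorted(issued) != sorted(delivered):
        raise ValueError("delivered must be a permutation of issued")
    pos = {pkt: i for i, pkt in enumerate(delivered)}
    for i, earlier in enumerate(issued):
        for later in issued[i + 1:]:
            overtaken = pos[later] < pos[earlier]
            write_passing_read = earlier[0] == "R" and later[0] == "W"
            # The only permitted overtake is a write passing an earlier read.
            if overtaken and not write_passing_read:
                return False
    return True


issued = [("R", 1), ("W", 2), ("R", 3)]
print(io_order_ok(issued, [("W", 2), ("R", 1), ("R", 3)]))  # True: W2 passed R1
print(io_order_ok(issued, [("R", 1), ("R", 3), ("W", 2)]))  # False: R3 passed W2
```

Sending reads and writes on separate virtual channels is what makes the permitted overtake possible in hardware; the checker only states which outcomes are legal.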

Avoiding Routing Deadlocks
- 19 Virtual Channels
  - 3 Virtual Networks For Each of 6 Packet Classes, Plus 1 Channel For the Special Class
  - The 3 Networks: Adaptive, VC0, and VC1
- Adaptive Is First Choice
- VC0 and VC1 Provide Guaranteed Drain If Adaptive Is Blocked
- Careful Selection of Rules To Break Deadlocks Within Dimensions and Across Dimensions
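
The channel count and the adaptive-first policy can be sketched as follows. The escape-channel rule shown here (use VC0 until the packet has crossed the wraparound link of the current dimension, then VC1, often called a "dateline" scheme) is the textbook way to break cycles within one torus dimension and is an assumption; the slide says only that the 21364 chooses its rules carefully. The `choose_vc` helper is hypothetical.

```python
from typing import Dict

# 6 packet classes that each get 3 virtual channels, plus 1 for Special.
PACKET_CLASSES = ["Request", "Forward", "Block Response",
                  "Non-Block Response", "Write I/O", "Read I/O"]
VCS_PER_CLASS = ["Adaptive", "VC0", "VC1"]
SPECIAL_VCS = 1

total_vcs = len(PACKET_CLASSES) * len(VCS_PER_CLASS) + SPECIAL_VCS
print(total_vcs)  # 19


def choose_vc(free_buffers: Dict[str, int], crossed_dateline: bool) -> str:
    """Pick a virtual channel for the next hop (hypothetical policy).

    free_buffers maps "Adaptive", "VC0", "VC1" to free downstream buffers.
    Adaptive is tried first; otherwise the packet falls back to the escape
    channel picked by the assumed dateline rule, or waits if it is full.
    """
    if free_buffers.get("Adaptive", 0) > 0:
        return "Adaptive"
    escape = "VC1" if crossed_dateline else "VC0"
    return escape if free_buffers.get(escape, 0) > 0 else "wait"


print(choose_vc({"Adaptive": 0, "VC0": 1, "VC1": 0}, crossed_dateline=False))  # VC0
```

Waiting on an escape channel is safe only because the escape network, by construction, cannot itself deadlock, so the packet is guaranteed to drain eventually.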

Internals Of The Router
- Pipelined Design
  - 9 Pipeline Types Based Upon Input × Output Mapping
  - Input/Output Either Local, Interprocessor, or I/O
- 13-Cycle In-To-Out Latency
  - Key To Performance (Smaller Is Better)
  - Recall Chip-Side At 1.2 GHz
- Network-Side Speed At 800 MHz
  - Clock Sent With Outgoing Packets
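
Two numbers on this slide can be checked with a little arithmetic: three kinds of input and three kinds of output give 3 × 3 = 9 pipeline types, and, assuming the 13 cycles are chip-side cycles, 13 cycles at 1.2 GHz is about 10.8 ns per router traversal. The sketch below computes both, plus the 800 MHz network-side cycle time for comparison. It counts router pipeline delay only, not wire or serialization time, and `router_delay_ns` is a hypothetical helper.

```python
from itertools import product

PORT_KINDS = ["Local", "Interprocessor", "I/O"]
pipeline_types = list(product(PORT_KINDS, repeat=2))  # (input, output) pairs
print(len(pipeline_types))  # 9

CORE_HZ = 1.2e9     # chip-side clock
NETWORK_HZ = 800e6  # network-side link clock
ROUTER_CYCLES = 13  # in-to-out latency, assumed to be in chip-side cycles

router_ns = ROUTER_CYCLES / CORE_HZ * 1e9
print(f"router traversal: {router_ns:.2f} ns")             # 10.83 ns
print(f"network cycle:    {1 / NETWORK_HZ * 1e9:.2f} ns")  # 1.25 ns


def router_delay_ns(routers: int) -> float:
    """Router-pipeline delay only, for a path through `routers` routers."""
    return routers * router_ns


print(f"{router_delay_ns(4):.2f} ns")  # 43.33 ns through 4 routers
```

Even this partial count makes the slide's point: a few hops of router pipeline alone add tens of nanoseconds before any wire delay is included.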

Brief Conclusions
- Even With Moderate Constraints, Jelly-Bean MP Is Challenging
  - Correctness, Deadlock Avoidance, Buffering, Arbitration, and Performance Require Careful Consideration In Design
- This Paper Illustrates Where Network Latency Comes From
  - Even A Fast Network Seems Slow Compared To Local Access

Discussion
- Was 2-D Torus the Right Shape For This Design?
  - What Are the Limitations Imposed?
- How Is the 1.2 GHz Internal/800 MHz External Clock Discrepancy OK?
- Is MP Capability Better Than More Aggressive Core Optimizations For the Transistor Cost?
  - What About SMT, CMP?
