Architecture and Design of the AlphaServer GS320 Gharachorloo, et al. (Compaq) Presented by Curt Harting


1 Architecture and Design of the AlphaServer GS320
Gharachorloo, et al. (Compaq)
Presented by Curt Harting
http://h18002.www1.hp.com/alphaserver/gs320/

2 Motivation
- Make money: server revenue at the time was in 4 to 64 processor systems
- Snooping protocols work really well on small systems (<8 processors) but don’t scale well
- Directory structures at the time were made for large (>64 processors) systems, but are too slow for mid-range multiprocessors

3 The problems
- Snooping
  - Limited by bandwidth
  - Too much for each controller to do per cycle
- Directories
  - Long latency
  - Too much glue (Amdahl’s Law)

4 Overview
- 32 or 64 processor directory machine
- 8 Quad-Processor Building Blocks (QBBs) connected in a crossbar
- Each QBB has:
  - 4 processors (with external L2)
  - 4 memory modules
  - 1 I/O interface
  - 1 Global Port
- DTAG
- DIR (14 bits per line)
- TTT
- 4 request types: read, readX, X, X without data
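The numbers on this slide can be sketched in a few lines of Python. This is only an illustration: the 6-bit owner field plus 8-bit per-QBB sharer vector is one plausible way to arrive at 14 directory bits per line and is an assumption here, not something the slide states. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

NUM_QBBS = 8            # from the slide: 8 Quad-Processor Building Blocks
PROCS_PER_QBB = 4       # from the slide: 4 processors per QBB
OWNER_BITS = 6          # ASSUMPTION: owner identifier width
SHARER_BITS = NUM_QBBS  # ASSUMPTION: one presence bit per QBB
DIR_BITS = OWNER_BITS + SHARER_BITS  # 14, matching "14 bits per line"


def qbb_of(proc_id: int) -> int:
    """Map a processor number (0..31) to the QBB that contains it."""
    if not 0 <= proc_id < NUM_QBBS * PROCS_PER_QBB:
        raise ValueError("processor id out of range")
    return proc_id // PROCS_PER_QBB


def pack_dir_entry(owner: int, sharer_qbbs: Iterable[int]) -> int:
    """Pack an owner id and a set of sharing QBBs into one directory entry."""
    if not 0 <= owner < (1 << OWNER_BITS):
        raise ValueError("owner id does not fit in the owner field")
    vector = 0
    for q in sharer_qbbs:
        if not 0 <= q < NUM_QBBS:
            raise ValueError("QBB index out of range")
        vector |= 1 << q
    return (owner << SHARER_BITS) | vector


def unpack_dir_entry(entry: int) -> Tuple[int, Set[int]]:
    """Inverse of pack_dir_entry."""
    owner = entry >> SHARER_BITS
    vector = entry & ((1 << SHARER_BITS) - 1)
    return owner, {q for q in range(NUM_QBBS) if (vector >> q) & 1}
```

Tracking sharers per QBB rather than per processor is what keeps the entry small; invalidations inside a QBB would then be resolved locally.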

5 Reducing Latency
- No waiting for invalidated copies to ACK on a GETX
- No Nack’ing
- Directory updates state as soon as the request arrives
- Dirty-Sharing
- NUMA

6 The Three Lane Information Super-Highway
Information is passed on three virtual lanes (and an IO lane).
- Q0: Carries a message from processor to the block’s home
  - Point to point ordering must occur
- Q1: Carries messages from the home
  - Point of serialization! Must have total order
- Q2: Replies/data
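The difference between the Q0 and Q1 ordering requirements can be illustrated with a small software model. This is a sketch only: the real machine enforces these orders in switch hardware, and the class and method names below are hypothetical.

```python
from collections import defaultdict, deque
from itertools import count


class PointToPointLane:
    """Q0-style lane: FIFO order is kept per (source, destination) pair only."""

    def __init__(self):
        self._fifos = defaultdict(deque)

    def send(self, src, dst, msg):
        self._fifos[(src, dst)].append(msg)

    def receive(self, src, dst):
        # Messages between one pair come out in the order they went in;
        # nothing is promised about messages from different sources.
        return self._fifos[(src, dst)].popleft()


class TotalOrderLane:
    """Q1-style lane: every message gets one global sequence number."""

    def __init__(self):
        self._seq = count()
        self._log = []  # (sequence number, destinations, message)

    def send(self, dsts, msg):
        seq = next(self._seq)
        self._log.append((seq, frozenset(dsts), msg))
        return seq

    def view(self, dst):
        # Each destination sees its messages in the single global order,
        # so any two destinations agree on the order of shared messages.
        return [(seq, msg) for seq, d, msg in self._log if dst in d]
```

With a total order on Q1, two nodes that both receive an invalidate and a forwarded read will see them in the same relative order, which is what makes the home a usable point of serialization.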

7 An example Reproduction of Figure 2d

8 Caveats
- Early request race: request gets to the owner before the data does
  - Solution: Stall the Q1 until the data arrives
- Late request race: request for data arrives after a writeback operation
  - Solution: Buffer victim until a writeback ACK is received
- Intra-Node transactions: Check TTT, possible loop through global
- Markers: Used to preserve global order
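The two race fixes above can be sketched as a toy node model. It is a simplification under stated assumptions: ownership transfer, the directory, and the network are left out, and the `Node` class and its method names are hypothetical rather than taken from the paper.

```python
from collections import deque


class Node:
    """Toy model of one owner node handling early and late request races."""

    def __init__(self):
        self.cache = {}       # addr -> data currently held
        self.victims = {}     # addr -> data evicted, writeback not yet ACKed
        self.pending = set()  # addrs requested whose data has not arrived
        self.q1 = deque()     # forwarded requests, processed strictly in order
        self.replies = []     # (requester, addr, data) sent out on Q2

    def request(self, addr):
        self.pending.add(addr)

    def on_data(self, addr, data):
        self.pending.discard(addr)
        self.cache[addr] = data
        self._drain()

    def on_q1(self, addr, requester):
        self.q1.append((addr, requester))
        self._drain()

    def evict(self, addr):
        # Late-race fix: keep the victim until the writeback is acknowledged.
        self.victims[addr] = self.cache.pop(addr)

    def on_writeback_ack(self, addr):
        self.victims.pop(addr, None)

    def _drain(self):
        while self.q1:
            addr, requester = self.q1[0]
            if addr in self.pending:
                return  # early-race fix: stall Q1 until the data arrives
            self.q1.popleft()
            if addr in self.cache:
                data = self.cache[addr]
            else:
                data = self.victims[addr]  # served from the victim buffer
            self.replies.append((requester, addr, data))
```

Because the queue stalls at its head, later Q1 messages wait behind the early request, which preserves the order in which the home issued them.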

9 Memory Consistency
A quick very high-level overview:
- Separation of data and requests
- The previously atomic response has been split into two parts: the commit and the data
- Lots of regulations of what can go when (still)

10 Questions
- The total ordering of the Q1 lane “comes naturally in a crossbar switch”?
- The GS320 is said to be expandable to 64 processors, but the system detailed in the paper is tailored to 32 processors. How easily can it be expanded?
- Addressing has been a major issue in other papers, but it is not discussed in this one. Why?

