DP/MP System Architectures
Dezső Sima
November (Ver. 1.0)
Sima Dezső, 2008
Contents
1. The evolution of Intel’s basic microarchitectures
2. Intel’s DP servers
3. Intel’s MP servers
4. AMD’s servers
1. The evolution of Intel’s basic microarchitectures
1. The evolution of Intel’s basic microarchitectures (1) Figure: Intel’s Tick-Tock development model [22]
1. The evolution of Intel’s basic microarchitectures (2) Figure: The speed of changes in Intel’s Tick-Tock development model [24]
1. The evolution of Intel’s basic microarchitectures (3)
Figure: Key enhancements introduced into the Core2 microarchitecture (vs the Pentium 4) [22]
Wide dynamic execution
- 4-wide decode/rename/retire
Advanced digital media processing
- 128-bit wide SSE execution unit
Improved graphics/MM
- New SSE 4.1 instructions
Smart memory access
- Memory disambiguation (spec. loads)
- Hardware prefetching
Advanced smart cache
- Low latency, high BW shared L2 cache
1. The evolution of Intel’s basic microarchitectures (4) Figure: Key enhancements introduced into the Penryn microarchitecture (vs the Core) [23]
1. The evolution of Intel’s basic microarchitectures (5) Figure: Improvements introduced into the Nehalem microarchitecture (vs Penryn) [22]
1. The evolution of Intel’s basic microarchitectures (6) Figure: Hyperthreading in the Nehalem microarchitecture [22]
1. The evolution of Intel’s basic microarchitectures (7) Figure: 3-level cache hierarchy of Nehalem [22] (panels: 2-level cache hierarchy, 3-level cache hierarchy)
1. The evolution of Intel’s basic microarchitectures (8) Figure: Nehalem’s innovations in the system architecture [22]
1. The evolution of Intel’s basic microarchitectures (9) Figure: Nehalem’s innovations in the system architecture [22]
1. The evolution of Intel’s basic microarchitectures (10)
QuickPath Interconnect (formerly: Common System Interface, CSI)
- 3.2 GHz DDR, 20 bits wide in each direction (16-bit data, 4-bit CRC): 12.8 GB/s in each direction
Fastest FSB
- 400 MHz QDR, 8 Byte: 12.8 GB/s bidirectional
HyperTransport bus (typical speed and width figures in AMD’s systems)
- HT 1.0: 0.8 GHz DDR, 2 Byte: 3.2 GB/s in each direction
- HT 2.0: 1.0 GHz DDR, 2 Byte: 4.0 GB/s in each direction
- HT 3.0: 2.6 GHz DDR, 2 Byte: 10.4 GB/s in each direction
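The bandwidth figures on this slide follow from clock rate x transfers per clock x data width. A minimal sketch of that arithmetic is given below; the function name and the dictionary are illustrative assumptions, not part of the original slides, and only the data bits (not the CRC bits of QPI) are counted.

```python
def peak_bandwidth_gb_s(clock_ghz, transfers_per_clock, width_bytes):
    """Peak bandwidth in GB/s = clock [GHz] * transfers per clock * bytes per transfer."""
    return clock_ghz * transfers_per_clock * width_bytes


# (clock in GHz, transfers per clock: DDR = 2, QDR = 4, data width in bytes)
# Values are taken from the slide; the labels are illustrative.
LINKS = {
    "QPI, per direction": (3.2, 2, 2),
    "Fastest FSB, shared": (0.4, 4, 8),
    "HT 1.0, per direction": (0.8, 2, 2),
    "HT 2.0, per direction": (1.0, 2, 2),
    "HT 3.0, per direction": (2.6, 2, 2),
}

if __name__ == "__main__":
    for name, params in LINKS.items():
        print(f"{name}: {peak_bandwidth_gb_s(*params):.1f} GB/s")
```

Note that the product of a transfer rate and a width is a byte rate (GB/s), not a transfer rate (GT/s).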
Figure: Die shot of Nehalem [45] 1. The evolution of Intel’s basic microarchitectures (11)
2. Intel’s DP Servers
2. Intel’s DP servers (1)
Figure: Typical configuration of an early DP-server motherboard based on Intel’s E7500/E7501 (Plumas) chipset
Labels in the figure:
- Processors: 2 x P4-based Xeon (Prestonia), FSB 400/533 MHz
- MCH: E7500/E7501 (with RASUM), two SDRAM interfaces, DDR 200/266 registered, ECC opt., 8/12/16 GB
- MCH links: HI 2.0 to the PCI-X bridges (PCI-X v.2.2), HI 1.5 to the ICH3-S
- ICH3-S: Ultra ATA/100, PCI v.2.2, USB v.1.1, GPIO, LPC (FWH, SIO: FD, KB, MS, SP, PP)
- On-board controllers: SATA, SCSI, GbE, MbE, video (SVGA), LAN (5 ports)
2. Intel’s DP servers (2)
Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset
Labels in the figure:
- Processors: 2 x P4-based Xeon (Nocona or Paxville DP), FSB 800 MHz
- MCH: E7520 (with RASUM), two SDRAM interfaces, DDR 266/333 or DDR2 400 registered, ECC opt., 16/24/32 GB
- MCH links: PCI E. x8 links (one usable as 2 x x4), PCI-X bridge (PCI-X v.1.0b), HI 1.5 to the ICH5R
- ICH5R: Ultra ATA/100, SATA, PCI v.2.3, USB v.2.0, AC'97 v.2.3, GPIO, LPC (FWH, SIO: FD, KB, MS, SP, PP)
- On-board controllers: SCSI, GbE, MbE, video (SVGA), LAN (4 ports)
2. Intel’s DP servers (3) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]. Labels: Paxville DP 2.8 (2 x Irwindale cores/90 nm)
2. Intel’s DP servers (4)
Figure: Genealogy of the Xeon Paxville core
- Nocona (DP enhanced Prescott): Intel’s first 64-bit Xeon, 112 mm², mPGA 604
- Irwindale (DP enhanced Prescott 2M): Nocona with L2 enlarged to 2 MB, 135 mm²
- Paxville (2 x Irwindale cores): 2 x 135 mm², 2 x 169 mtrs, mPGA 604; sold as Xeon DP 2.8 and Xeon MP
In contrast: the corresponding desktop processors have the LGA 775 socket.
2. Intel’s DP servers (5) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]. Labels: Paxville DP 2.8 (2 x Irwindale cores/90 nm); Xeon 5000 (Dempsey) (2 x Cedar Mill/65 nm, the 65 nm shrink of the Irwindale)
2. Intel’s DP servers (6) Figure: Intel’s Core2 based DC/QC DP server processors [33], [35], [36]. Labels: Xeon 5100 (Woodcrest), Core2-based/65 nm; Xeon 5300 (Clovertown), Core2-based/65 nm, 2 x Xeon 5100
2. Intel’s DP servers (7) Figure: Intel’s Penryn based QC DP server processor/45 nm, Xeon 5400 (Harpertown) (Source: Intel)
2. Intel’s DP servers (8) Figure: Contrasting the die shots of the Xeon 5400 and 5300 processors [24]
2. Intel’s DP servers (9)
Table: Intel’s DC, QC DP servers
Series: --- (Paxville DP); 5000 (Dempsey); 5100 (Woodcrest); 5200 (Wolfdale); 5300 (Clovertown); 5400 (Harpertown)
Dual/Quad-Core: DC; QC
Models: Xeon DP E5205/E5260/ X5275 E /X5355 E5405-E5472, X5450-X5482
Microarchitecture: Pentium 4; Core2; Penryn; Core2; Penryn
Core: 2*Irwindale dies; 2*Cedar dies; Single die; 2*Woodcrest dies; 2*Penryn
Intro.: 10/2005; 5/2006; 6/2006; 11/2007; 11/2006; 11/2007
Technology: 90 nm; 65 nm; 45 nm; 65 nm; 45 nm
Die size: 2*135 mm²; 2*81 mm²; mm²; 2*143 mm²; 2*107 mm²
Nr. of transistors: 2*169 mtrs; 2*188 mtrs; 291 mtrs; 2*291 mtrs; 2*410 mtrs
Fc [GHz]:
L2: 2*2 MB; 4 MB; 6 MB; 2*4 MB; 2*6 MB
FSB [MT/s]: 800; 667/ / / / /1600
TDP [W]: 135; 95/130; 65/80; 80/120; 80/120/150
Socket: PGA 604; LGA 771
Features: EM64T; HT: ---; ED; VT; EIST: (5140 or above); La Grande: ---; AMT2: ---; Flex Migration: ---
2. Intel’s DP servers (10)
Figure: Intel’s future DP server processors [21]
- Gainestown (Q1/2009): Nehalem-based/45 nm (Socket 1366)
- ??? (Q1/2010?): Westmere-based/32 nm
(Both 2-way multithreaded)
2. Intel’s DP servers (11)
Figure: Overview of the implementation of Intel’s Tick-Tock model for DP servers [24]
- 2x1 C, 2 MB L2/C: 5000 (Dempsey)
- 1x2 C, 4 MB L2/C: 5100 (Woodcrest)
- 2x2 C, 4 MB L2/C: 5300 (Clovertown)
- 2x2 C, 6 MB L2/2C: 5400 (Harpertown)
- 1x4 C, ¼ MB L2/C, 8 MB L3: 5xxx (Gainestown)
- 1x6 C, ¼ MB L2/C, 12 MB L3: 5xxx (???)
2. Intel’s DP servers (16)
Figure: Evolution of Intel’s DP servers
- E7520 (Lindenhurst): 2 x Nocona/Paxville (SC/DC); FSB 800 MT/s (6.4 GB/s); dual DDR2 400 MT/s (6.4 GB/s); 24 lanes PCIe (7.5 GB/s)
- 5000 (Blackford): 2 x Dempsey/Woodcrest/Clovertown (DC); FSBs at 1066 MT/s (17.1 GB/s); quad FB-DIMM 533 MT/s (17.1 GB/s); 24 lanes PCIe (7.5 GB/s)
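The GB/s values in this figure can be reproduced from the transfer rates. The sketch below assumes 8-byte wide front-side buses and memory channels, and that the 17.1 GB/s FSB figure of the 5000 (Blackford) is the sum over its two FSBs; the function name is illustrative and not taken from the slides.

```python
def bus_bandwidth_gb_s(rate_mt_s, width_bytes=8, channels=1):
    """Aggregate peak bandwidth in GB/s of `channels` identical buses.

    rate_mt_s   -- transfer rate of one bus in MT/s
    width_bytes -- data width of one bus in bytes (assumed 8 here)
    channels    -- number of buses or memory channels working in parallel
    """
    return rate_mt_s * width_bytes * channels / 1000.0


if __name__ == "__main__":
    # E7520 (Lindenhurst)
    print(round(bus_bandwidth_gb_s(800), 1))               # one 800 MT/s FSB
    print(round(bus_bandwidth_gb_s(400, channels=2), 1))   # dual DDR2 400
    # 5000 (Blackford)
    print(round(bus_bandwidth_gb_s(1066, channels=2), 1))  # two 1066 MT/s FSBs
    print(round(bus_bandwidth_gb_s(533, channels=4), 1))   # quad FB-DIMM 533
```

Under these assumptions the Lindenhurst values come out at 6.4 GB/s each and the Blackford values at about 17.1 GB/s each, so both platforms are balanced between FSB and memory bandwidth.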
2. Intel’s DP servers (12)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
Pentium4-based (90/65 nm)
- DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
- DP chipsets: E7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
2. Intel’s DP servers (13)
Figure: Evolution of Intel’s DP servers
- E7520 (Lindenhurst): 2 x Nocona/Paxville (SC/DC); FSB 800 MT/s (6.4 GB/s); dual DDR2 400 MT/s (6.4 GB/s); 24 lanes PCIe (7.5 GB/s)
2. Intel’s DP servers (15)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
Pentium4-based (90/65 nm)
- DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
- DP chipsets: E7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
Pentium4/Core2-based (65 nm), Bensley platform
- Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA 771
- Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA 771
- Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA 771
- DP chipsets, 6/2006: 5000P (Blackford): 2xFSB 1066 MT/s, 4 x FBDIMM (DDR2), 64 GB; 5000V/Z (Blackford V/Z): 2 x FBDIMM (DDR2), 16 GB
2. Intel’s DP servers (17) Figure: Intel’s Bensley platform [30] (Actually the block diagram of Tyan’s S5370 DP server)
2. Intel’s DP servers (18) Figure: Bensley DP motherboard with the 5000 (Blackford) chipset (Supermicro X7DB8+) for the Xeon 5000 DC/QC DP processor families [7]. Labels: Xeon DC/QC (5000 DC, 5100 DC, 5300 QC); 5000P; SBE2; FB-DIMM DDR2, 64 GB
Table: Latency and bandwidth scaling of the Intel 5000 platform (2006) vs the earlier generation (2004) [1] 2. Intel’s DP servers (19)
2. Intel’s DP servers (20)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
Pentium4-based (90/65 nm)
- DP cores: Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
- DP chipsets: E7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
Pentium4/Core2-based (65 nm), Bensley platform
- Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA 771
- Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA 771
- Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA 771
- DP chipsets, 2006: 5000P (Blackford): 2xFSB 1066 MT/s, 4 x FBDIMM (DDR2), 64 GB; 5000V/Z (Blackford V/Z): 2 x FBDIMM (DDR2), 16 GB
Penryn-based (45 nm), Cranberry Lake platform
- Xeon 5200 (Wolfdale), DC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA 771
- Xeon 5400 (Harpertown), QC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA 771
- DP chipsets: 5100 (San Clemente), 2007: 2xFSB 1333/1066 MT/s, 2 x DDR2, 32/48 GB
2. Intel’s DP servers (21) Figure: The Cranberry Lake platform [19] Xeon 5400 (QC) Xeon 5200 (DC) 5100 chipset
1066MT/s 17.1 GB/s Tylersburg Nehalem QC Nehalem QC DMI PCI Express Gen 2 2. Intel’s DP servers (22)
2. Intel’s DP servers (23) Figure: Intel’s forthcoming Nehalem-based DP server system architecture [31]. Labels: QuickPath Interconnect; integrated memory controller
3. Intel’s MP servers
3. Intel’s MP servers (1) Figure: Intel’s Pentium4 based Xeon MP processors [17], [18]. Labels: Potomac and Paxville MP (7000), 90 nm; Tulsa (7100), 65 nm. CDM: Cedar Mill core (65 nm shrink of the Irwindale core)
3. Intel’s MP servers (2) Figure: Intel’s Core2/Penryn based Xeon MP processors [19], [20]. Labels: Tigerton DC (7300) and Tigerton QC (7300), Core2 based, 65 nm; Dunnington (7400), Penryn based, 45 nm
3. Intel’s MP servers (3)
Table: Dual- and Quad-Core Xeon MP-lines
Series: 7000 (Paxville MP); 7100 (Tulsa); 7200 (Tigerton DC); 7300 (Tigerton QC); 7400 (Dunnington QC); 7400 (Dunnington 6C)
Dual/Quad-Core: DC; 2xSC; 2xDC; QC; 6C
Models: M-7140M / 7110N-7150N; E7210/E7220; E7310/E7320/E7330/E7340/X7350; E7420-E7440; E7450/X7460
Microarchitecture: Netburst; Core 2; Penryn
Core: 2xIrwindale dies; Cedar Mill-based single die; 2xSC Woodcrest dies; 2xWoodcrest dies
Intro.: 11/2005; 8/2006; 9/2007; 9/2008
Technology: 90 nm; 65 nm; 45 nm
Die size: 2*135 mm²; mm²; 2*143 mm²; mm²
Nr. of transistors: 2*169 mtrs; 1328 mtrs; 2*291 mtrs; 1900 mtrs
Fc [GHz]: / /2.13/2.4/2.4/ /2.66
L2: 2*1/2 MB (1); 2*1 MB; 2*4 MB; 2*2/2*2/2*3/2*4/2*4 MB; 3*2 MB; 3*3 MB
L3: ---; 4/8/16 MB; ---; 8/12/16 MB; 12/16 MB
FSB [MT/s]: 667/
TDP [W]: 95/ /80/80/80/ /130
Socket: mPGA604
Features: EM64T; HT: ---; ED; VT; EIST; La Grande: ---, n.a.; AMT2: ---, (Except E7310), n.a.
(1) Concerning the L2 cache size, there is a contradiction in Intel’s documentation: according to the data sheets, models of the 7000 series include 1 or 2 MB L2 caches, whereas the comparison charts show 1 MB L2 caches for all models.
3. Intel’s MP servers (4) Figure: Intel’s Nehalem based MP server processor [21]
3. Intel’s MP servers (5)
Figure: Overview of the implementation of Intel’s Tick-Tock model for MP servers [24]
- TICK, Pentium 4 (Prescott), 90 nm: 1x1 C, 8 MB/C (Potomac)
- TOCK, Pentium 4 (Irwindale), 90 nm: 2x1 C, ½ MB/C, 7000 (Paxville MP)
- 1x1 C, 1 MB/C (Cranford)
- 2x1 C, 1 MB L2/C, 16 MB L3: 7100 (Tulsa)
- 1x2 C, 4 MB L2/C: 7200 (Tigerton DC)
- 2x2 C, 4 MB L2/C: 7300 (Tigerton QC)
- 1x6 C, 3 MB L2/2C, 16 MB L3: 7400 (Dunnington)
- 1x8 C, ¼ MB L2/C, 24 MB L3: 7xxx (Beckton)
3. Intel’s MP servers (6)
Table: Overview of Intel’s DP and MP server processors
Core/technology | DP server processors | MP server processors
Pentium4, 65 nm | 2x1 C, 2 MB L2/C: 5000 (Dempsey) | 2x1 C, 1 MB L2/C, 16 MB L3: 7100 (Tulsa)
Core2, 65 nm | 1x2 C, 4 MB L2/C: 5100 (Woodcrest); 2x2 C, 4 MB L2/C: 5300 (Clovertown) | 1x2 C, 4 MB L2/C: 7200 (Tigerton DC); 2x2 C, 4 MB L2/C: 7300 (Tigerton QC)
Penryn, 45 nm | 2x2 C, 6 MB L2/2C: 5400 (Harpertown) | 1x6 C, 3 MB L2/2C, 16 MB L3: 7400 (Dunnington)
Nehalem, 45 nm | 1x4 C, ¼ MB L2/C, 8 MB L3: 5xxx (Gainestown) | 1x8 C, ¼ MB L2/C, 24 MB L3: 7xxx (Beckton)
Westmere, 32 nm | 1x6 C, ¼ MB L2/C, 12 MB L3: 5xxx (???) |
3. Intel’s MP servers (7)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
(1) Xeon MP before Potomac
3. Intel’s MP servers (8) Figure: Former Pentium II/III MP system architecture [32]
3. Intel’s MP servers (9)
Figure: Intel’s Xeon-based MP server platforms
Truland platform (3/2005), P4-based/90 nm
- Xeon MP (Potomac SC), 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- Xeon 7000 (Paxville MP DC), 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, /667 MT/s, mPGA 604
- MP chipset: 8500 (Twin Castle), 3/2005: 2xFSB 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
Truland platform, P4-based/65 nm
- Xeon 7100 (Tulsa DC), 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
- MP chipset: 8501 (?), 4/2006: 2xFSB 800 MT/s, 4 x XMB (2 x DDR2), 32 GB
3. Intel’s MP servers (10)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
- Truland, 8500/8501 (Twin Castle): Potomac (2) / Paxville MP (3), DC/SC, also Cranford (SC) and Tulsa (DC); memory attached via XMBs; 28 PCIe lanes (7 GT/s) + HI 1.5 (266 MT/s)
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 supports also
3. Intel’s MP servers (11)
Figure: Intel’s 8501 chipset for MP servers (4/2006) [4]
- Processors: Xeon DC MP 7000 (4/2005) or later DC/QC MP 7000 processors
- 8501 (North Bridge)
- Independent Memory Interface (IMI): serial link, 5.33 GB/s inbound BW and 2.67 GB/s outbound BW simultaneously
- eXternal Memory Bridge (XMB): intelligent MC, dual memory channels, DDR 266/333/400, 4 DIMMs/channel
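To see why the front-side buses rather than the memory links limit this platform, the peak figures can be added up. The sketch below is a rough estimate under stated assumptions: four IMI links (one per XMB, as in the "4 x XMB" platform configuration), two FSBs at 800 MT/s, and an 8-byte FSB width. The function name and dictionary keys are illustrative.

```python
IMI_INBOUND_GB_S = 5.33   # per IMI link, from the slide
IMI_OUTBOUND_GB_S = 2.67  # per IMI link, from the slide


def truland_peak_bandwidths(num_imi=4, num_fsb=2, fsb_mt_s=800, fsb_width_bytes=8):
    """Return aggregate peak bandwidths in GB/s (assumed platform parameters)."""
    return {
        "fsb_total": num_fsb * fsb_mt_s * fsb_width_bytes / 1000.0,
        "imi_inbound_total": num_imi * IMI_INBOUND_GB_S,
        "imi_outbound_total": num_imi * IMI_OUTBOUND_GB_S,
    }


if __name__ == "__main__":
    for name, value in truland_peak_bandwidths().items():
        print(f"{name}: {value:.2f} GB/s")
```

With these assumptions the four IMI links can deliver more inbound data than the two FSBs can carry to the processors.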
7000/7100 FB-DIMM DDR2 64 GB Figure: Quad socket Intel E8501 chipset based motherboard (Supermicro X6QT8) for the Xeon 7000/7100 DC MP processor families [7] Xeon DC E8501 NB ICH5R SB 3. Intel’s MP servers (12)
3. Intel’s MP servers (13) Figure: Bandwidth bottlenecks in Intel’s 8501 MP server platform [2]
3. Intel’s MP servers (14)
Figure: Intel’s Xeon-based MP server platforms
Truland platform (3/2005), P4-based/90 nm
- Xeon MP (Potomac SC), 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- Xeon 7000 (Paxville MP DC), 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, /667 MT/s, mPGA 604
- MP chipset: 8500 (Twin Castle), 3/2005: 2xFSB 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
Truland platform, P4-based/65 nm
- Xeon 7100 (Tulsa DC), 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
- MP chipset: 8501 (?), 4/2006: 2xFSB 800 MT/s, 4 x XMB (2 x DDR2), 32 GB
Caneland platform (9/2007), Core2-based/65 nm and Core2-based/45 nm
- Xeon 7200 (Tigerton DC), 9/2007: 65 nm/2x291 mtrs, 2x4 MB L2, mPGA 604
- Xeon 7300 (Tigerton QC), 9/2007: 65 nm/2x291 mtrs, 2x(4/3/2) MB L2, mPGA 604
- Xeon 7400 (Dunnington 6C), 9/2008: 45 nm/1900 mtrs, 9/6 MB L2, 16/12/8 MB L3, mPGA 604
- MP chipset: 7300 (Clarksboro), 9/2007: 4xFSB 1066 MT/s, 4 x FBDIMM (DDR2), 512 GB
3. Intel’s MP servers (15)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
- Truland, 8500/8501 (Twin Castle): Potomac (2) / Paxville MP (3), DC/SC, also Cranford (SC) and Tulsa (DC); memory attached via XMBs; 28 PCIe lanes (7 GT/s) + HI 1.5 (266 MT/s)
- Caneland, 7300 (Clarksboro): Tigerton (QC/DC), Dunnington (6C/QC/DC); FB-DIMM (DDR2); 8 PCI-E lanes (2 GT/s) + ESI (1 GT/s)
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 supports also
Figure: Intel’s four socket 7300 (Caneland) platform, based on the 7300 (Clarksboro) chipset for the Xeon 7200/7300 DC/QC MP families (9/2007) [6] FB-DIMM up to 512 GB 7200 (Tigerton DC, Core2), DC Xeon 7300 (Tigerton QC, Core2), QC 3. Intel’s MP servers (16)
FB-DIMM DDR2 192 GB ATI ES1000 Graphics with 32MB video memory 7200 DC 7300 QC (Tigerton) Xeon Figure: Caneland MP motherboard, with the 7300 (Clarksboro) chipset (Supermicro X7QC3) for the Xeon 7200/7300 DC/QC MP processor families [7] SBE2 SB 7300 NB 3. Intel’s MP servers (17)
Figure: Performance comparison of the Caneland platform with a quad core Xeon (7300 family) vs the Bensley platform with a dual core Xeon 7140M [13] 3. Intel’s MP servers (18)
3. Intel’s MP servers (19) Figure: Intel’s Nehalem based MP server. Labels: Beckton 8C; QPI (QuickPath Interconnect); 4xFB-DIMM
3. Intel’s MP servers (20) Figure: Intel’s Nehalem based MP server system architecture [22]. Labels: QPI; FB-DIMM (DDR2)
4. AMD’s servers
4. AMD’s servers (1)
Table: Overview of AMD’s Opteron DP processors
SCST
- Opteron (Sledgehammer), 4/03: K8, 193 mm², 82-89 W, L2: 1 MB, HT 1.0 at 0.8 GHz
- Opteron (Troy), 2/05: K8, 90 nm, 114 mm², 95 W, L2: 1 MB, HT 1.0 at 1.0 GHz
DCST
- Opteron (Italy), 5/05: K8, 90 nm, 199 mm², 95 W, L2: 2*1 MB, HT 1.0
- Opteron 22xx HE (Santa Rosa), 8/06: K8, 90 nm, 230 mm², 95/110 W, L2: 2*1 MB, HT 1.0
QCST
- Opteron (Barcelona), 9/07-4/08: K10, 65 nm, 285 mm², 95 W, L2: 512 KB/C, L3: 2 MB, HT 3.0 at 1.0 GHz
- Opteron (Shanghai), 11/08: K10, 45 nm, 243 mm², 75 W, L2: 512 KB/C, L3: 3 MB, HT 3.0
4. AMD’s servers (2)
Table: Overview of AMD’s Opteron MP processors
SCST
- Opteron (Sledgehammer), 6/03-11/03: K8, 193 mm², 82-89 W, L2: 1 MB, 3*HT 1.0 at 0.8 GHz
- Opteron (Athens), 12/04-8/05: K8, 90 nm, 115 mm², 85-93 W, L2: 1 MB, 3*HT 1.0 at 1.0 GHz
DCST
- Opteron (Egypt), 4/05: K8, 90 nm, 199 mm², 95 W, L2: 2 MB, 3*HT 1.0
- Opteron 82xx (Santa Rosa), 8/06-8/07: K8, 90 nm, 220 mm², 95/119 W, L2: 2 MB/C, 3*HT 1.0
QCST
- Opteron (Barcelona), 9/07: K10, 65 nm, 285 mm², 84/95 W, L2: 512 KB/C, L3: 2 MB, 4*HT 3.0 at 1 GHz
- Opteron (Shanghai), 11/08: K10, 45 nm, 243 mm², 75 W, L2: 512 KB/C, L3: 3 MB, 4*HT 3.0
4. AMD’s servers (3)
AMD Direct Connect Architecture
- Integrated memory controller
- Serial HyperTransport links
Figure: AMD’s Direct Connect Architecture [41]
Introduced in 2003 along with the x86-64 ISA extension (Intel: 2008, with Nehalem).
Remark: 3 HT 1.0 links at introduction, 4 HT 3.0 links with K10 (Barcelona).
4. AMD’s servers (4)
Use of available HyperTransport links [44]
- UPs: Each link supports connections to I/O devices.
- DPs: Two links support connections to I/O devices; any one of the three links may connect to another DP or MP processor.
- MPs: Each link supports connections to I/O devices or to other DP or MP processors.
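The link-usage rules above can be expressed as a small validity check. This is only a sketch under one reading of the slide: a processor has three links, a UP part may use them for I/O only, a DP part may use at most two links for I/O and at most one for another processor, and an MP part may use any link for either purpose. The function name and the "io"/"cpu"/"unused" labels are illustrative assumptions.

```python
def is_valid_link_use(processor_class, uses):
    """Check a per-link assignment against the UP/DP/MP rules.

    processor_class -- "UP", "DP" or "MP"
    uses            -- list of up to three entries: "io", "cpu" or "unused"
    """
    if len(uses) > 3 or any(u not in ("io", "cpu", "unused") for u in uses):
        return False
    n_cpu = uses.count("cpu")
    n_io = uses.count("io")
    if processor_class == "UP":
        return n_cpu == 0                 # I/O connections only
    if processor_class == "DP":
        return n_cpu <= 1 and n_io <= 2   # one processor link, two I/O links
    if processor_class == "MP":
        return True                       # every link may go to I/O or a processor
    raise ValueError(f"unknown processor class: {processor_class}")


if __name__ == "__main__":
    print(is_valid_link_use("DP", ["cpu", "io", "io"]))    # typical 2P node
    print(is_valid_link_use("DP", ["cpu", "cpu", "io"]))   # needs an MP part
    print(is_valid_link_use("MP", ["cpu", "cpu", "cpu"]))  # fully connected 4P node
```

Under this reading, a DP part cannot sit in a fully connected 4P system because it would need three processor links.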
4. AMD’s servers (5)
Figure: 2P and 4P server architectures based on AMD’s Direct Connect Architecture [42], [43]
Labels in the figure: AMD Opteron processors with RDDR2 memory, connected by HT links to each other and to the I/O (PCI, PCI-X, PCI Express)
Figure: Advantages of AMD’s Direct Connect server architecture [2] 4. AMD’s servers (6)
Figure: Block diagram of a DP QC motherboard (Asus KFSN4-DRE/SAS) for the AMD Opteron 2300 QC family [10] 4. AMD’s servers (7)
Figure: DP motherboard for the AMD Opteron 2300 QC family (Asus KFSN4-DRE/SAS) [10] DDR2 64 GB 2300 Opteron QC DP nForce 2200 chipset 4. AMD’s servers (8) (Barcelona)
4. AMD’s servers (9) Figure: Block diagram of a QP QC motherboard for AMD’s Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10]
4. AMD’s servers (10) Figure: 4-socket motherboard for the AMD Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10]. Labels: Opteron 8300 QC MP (Barcelona); nForce 3600 chipset; DDR2, 64 GB
UP: Opteron 100/1000, DP: Opteron 200/2000 MP: Opteron 800/8000 Figure: Basic structure of the DC Opteron families [8] 4. AMD’s servers (11)
Figure: Block diagram of Barcelona (K10) vs K8 [46] (K10) 4. AMD’s servers (12)
4. AMD’s servers (13) 4 HT 3.0 links: four links allow building fully connected 4P systems with each processor attached to a separate I/O hub.
4. AMD’s servers (14) Figure: Possible use of Barcelona’s four HT 3.0 links [47]
4. AMD’s servers (15) 4 HT 3.0 links [46]: four links allow building fully connected 4P systems with each processor attached to a separate I/O hub. The HT 3.0 protocol allows splitting each 16-bit link into two 8-bit wide links. This feature can be used to build fully connected 8P systems with 8-bit wide links.
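The link counts behind these statements can be checked with a short calculation. In a fully connected system each processor needs one link to every other processor; the sketch below additionally assumes one link per processor for its I/O hub. The function names are illustrative.

```python
def links_per_processor(sockets, io_links=1):
    """Links one processor needs: one per peer plus its I/O links."""
    return (sockets - 1) + io_links


def total_peer_links(sockets):
    """Number of processor-to-processor links in a fully connected system."""
    return sockets * (sockets - 1) // 2


def fully_connected_possible(sockets, links_available, io_links=1):
    """True if each processor has enough links for all peers and its I/O."""
    return links_per_processor(sockets, io_links) <= links_available


if __name__ == "__main__":
    # Four 16-bit HT 3.0 links per processor
    print(fully_connected_possible(4, links_available=4))
    print(fully_connected_possible(8, links_available=4))
    # Each 16-bit link split into two 8-bit links gives eight links
    print(fully_connected_possible(8, links_available=8))
```

With four links a 4P system is fully connectable (three peer links plus one I/O link per processor), while an 8P system needs the eight 8-bit links obtained by splitting.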
4. AMD’s servers (16) Figure: Possible use of Barcelona’s four HT 3.0 links [47]
4. AMD’s servers (17) Novel features of HT 3.0 links, such as higher speed or splitting a 16-bit HT link into two 8-bit links, can be utilized only with a new platform. Current platforms (Socket F with available chipsets) support only 3 HT 1.1 links at 2 GT/s [46].
4. AMD’s servers (18) Figure: Cache architecture of the QC Barcelona [25]
4. AMD’s servers (19) Figure: Die shot and floor plan of Barcelona [27]
4. AMD’s servers (20) The Barcelona (and also the Phenom) processors had a bug in their TLB (Translation Lookaside Buffer) design [40]. AMD reworked both chips and provided a new stepping.
4. AMD’s servers (21) Figure: Cache architectures of AMD’s QC Barcelona (65 nm) and Shanghai (45 nm) processors [25], [26]
4. AMD’s servers (22) Figure: Shanghai’s new features vs Barcelona [37]
4. AMD’s servers (23) Figure: Die shot and floor plan of Shanghai [37]
4. AMD’s servers (24) Figure: Die shot of Shanghai [29] Pin to pin compatible with Barcelona 6 MB shared L3
4. AMD’s servers (25)
Table: First introduced Shanghai based Opteron DP and MP models [38]
AMD Shanghai Overview
Model | CPU Clock | MC Clock | Part Number | Price
Opteron | GHz | 2.2 GHz | OS2384WAL4DGI | $989
Opteron | GHz | 2.2 GHz | OS2382WAL4DGI | $873
Opteron | GHz | 2.0 GHz | OS2380WAL4DGI | $698
Opteron | GHz | 2.0 GHz | OS2378WAL4DGI | $523
Opteron | GHz | 2.0 GHz | OS2376WAL4DGI | $377
Opteron | GHz | 2.2 GHz | OS8384WAL4DGI | $2149
Opteron | GHz | 2.2 GHz | OS8382WAL4DGI | $1865
Opteron | GHz | 2.0 GHz | OS8380WAL4DGI | $1514
Opteron | GHz | 2.0 GHz | OS8378WAL4DGI | $1165
4. AMD’s servers (26) Figure: AMD’s roadmap for server processors and platforms [37]
References (1)
[1]: Radhakrisnan S., Sundaram C. and Cheng K., “The Blackford Northbridge Chipset for the Intel 5000,” IEEE Micro, March/April 2007, pp
[2]: Next-Generation AMD Opteron Processor with Direct Connect Architecture – 4P Server Comparison, _PID_41461.pdf
[3]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept
[4]: Intel® E8501 Chipset North Bridge (NB) Datasheet, May 2006
[5]: Conway P. & Hughes B., “The AMD Opteron Northbridge Architecture,” IEEE Micro, March/April 2007, pp
[6]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007
[7]: Supermicro Motherboards
[8]: Sander B., “AMD Microprocessor Technologies,” 2006
[9]: AMD Quad FX Platform with Dual Socket Direct Connect (DSDC) Architecture
[10]: Asustek motherboards
References (2)
[11]: Kanter D., “A Preview of Intel's Bensley Platform (Part I),” Real World Technologies, Aug. 2005
[12]: Kanter D., “A Preview of Intel's Bensley Platform (Part II),” Real World Technologies, Nov. 2005
[13]: Quad-Core Intel® Xeon® Processor 7300 Series Product Brief, Intel, Nov
[14]: “AMD Shows Off More Quad-Core Server Processors Benchmark,” X-bit labs, Nov
[15]: AMD, Nov
[16]: Rusu S., “A Dual-Core Multi-Threaded Xeon Processor with 16 MB L3 Cache,” Intel, 2006
[17]: Goto H., Intel Processors, PC Watch, March
[18]: Gilbert J. D., Hunt S., Gunadi D., Srinivas G., “The Tulsa Processor,” Hot Chips 18, 2006
[19]: Goto H., IDF 2007 Spring, PC Watch, April
References (3)
[20]: Hruska J., “Details slip on upcoming Intel Dunnington six-core processor,” Ars Technica, February 26, 2008, upcoming-intel-dunnington-six-core-processor.html
[21]: Goto H., 32 nm Westmere arrives in , PC Watch, March
[22]: Singhal R., “Next Generation Intel Microarchitecture (Nehalem) Family: Architecture Insight and Power Management,” IDF Taipei, Oct. 2008, -Taipei_TPTS001_100.pdf
[23]: Smith S. L., “45 nm Product Press Briefing,” IDF Fall 2007, ftp://download.intel.com/pressroom/kits/events/idffall_2007/BriefingSmith45nm.pdf
[24]: Bryant D., “Intel Hitting on All Cylinders,” UBS Conf., Nov. 2007, aa5a-0c46e8a1a76d/UBSConfNov2007Bryant.pdf
[25]: Barcelona's Innovative Architecture Is Driven by a New Shared Cache
[26]: Larger L3 cache in Shanghai, Nov , AMD
[27]: Shimpi A. L., “Barcelona Architecture: AMD on the Counterattack,” March , Anandtech
References (4)
[28]: Rivas M., “Roadmap update,” 2007 Financial Analyst Day, Dec. 2007, AMD
[29]: Scansen D., “Under the Hood: AMD’s Shanghai marks move to 45 nm node,” EE Times, Nov
[30]: 2-way Intel Dempsey/Woodcrest CPU Bensley Server Platform, Tyan
[31]: Gelsinger P. P., “Intel Architecture Press Briefing,” 17 March 2008
[32]: Mueller S., Soper M. E., Sosinsky B., Server Chipsets, Jun 12, 2006
[33]: Goto H., IDF, Aug
[34]: TechChannel, fk=432919&id=il
[35]: Intel quadcore Xeon 5300 review, Nov , Hardware.Info, _review
References (5)
[36]: Wasson S., “Intel's Woodcrest processor previewed: The Bensley server platform debuts,” The Tech Report, May 23, 2006
[37]: Enderle R., “AMD Shanghai: We are back!” TGDaily, November 13, 2008
[38]: Clark J. & Whitehead R., “AMD Shanghai Launch - Database Testing,” Anandtech, Nov
[39]: Chiappetta M., “AMD Barcelona Architecture Launch: Native Quad-Core,” Hothardware, Sept. 10, 2007, QuadCore/
[40]: Hachman M., “AMD Phenom, Barcelona Chips Hit By Lock-up Bug,” ExtremeTech, Dec
[41]: AMD Opteron™ Processor for Servers and Workstations, html
[42]: AMD Opteron Processor with Direct Connect Architecture, 2P Server Power Savings Comparison, AMD
[43]: AMD Opteron Processor with Direct Connect Architecture, 4P Server Power Savings Comparison, AMD
References (6)
[44]: AMD Opteron Product Data Sheet, AMD
[45]: Images, Xtreview
[46]: Kanter D., “Inside Barcelona: AMD's Next Generation,” Real World Tech., May
[47]: Kanter D., “AMD's K8L and 4x4 Preview,” Real World Tech., June