Published by Sherman Dwight Thomas. Modified over 9 years ago.
1
Dezső Sima, November 2009 (Ver. 1.0). Sima Dezső, 2008. DP/MP System Architectures
2
Contents
1. The evolution of Intel’s basic microarchitectures
2. Intel’s DP servers
3. Intel’s MP servers
4. AMD’s servers
3
1. The evolution of Intel’s basic microarchitectures
4
1. The evolution of Intel’s basic microarchitectures (1) Figure: Intel’s Tick-Tock development model [22]
5
1. The evolution of Intel’s basic microarchitectures (2) Figure: The speed of changes in Intel’s Tick-Tock development model [24]
6
1. The evolution of Intel’s basic microarchitectures (3)
Figure: Key enhancements introduced into the Core2 microarchitecture (vs the Pentium4) [22]
Wide dynamic execution
- 4-wide decode/rename/retire
Advanced digital media processing
- 128-bit wide SSE execution unit
Improved graphics/MM
- New SSE 4.1 instructions
Smart memory access
- Memory disambiguation (spec. loads)
- Hardware prefetching
Advanced smart cache
- Low latency, high BW shared L2 cache
7
1. The evolution of Intel’s basic microarchitectures (4) Figure: Key enhancements introduced into the Penryn microarchitecture (vs the Core) [23]
8
1. The evolution of Intel’s basic microarchitectures (5) Figure: Improvements introduced into the Nehalem microarchitecture (vs Penryn) [22]
9
1. The evolution of Intel’s basic microarchitectures (6) Figure: Hyperthreading in the Nehalem microarchitecture [22]
10
1. The evolution of Intel’s basic microarchitectures (7) 2-level cache hierarchy 3-level cache hierarchy Figure: 3-level cache hierarchy of Nehalem [22]
11
1. The evolution of Intel’s basic microarchitectures (8) Figure: Nehalem’s innovations in the system architecture [22]
12
1. The evolution of Intel’s basic microarchitectures (9) Figure: Nehalem’s innovations in the system architecture [22]
13
1. The evolution of Intel’s basic microarchitectures (10)
QuickPath Interconnect (formerly: Common System Interconnect, CSI)
- 3.2 GHz DDR, 20-bit wide (16-bit data, 4-bit CRC): 12.8 GB/s in each direction
Fastest FSB
- 400 MHz QDR, 8 Byte: 12.8 GB/s, bidirectional
HyperTransport Bus (typical speed and width figures in AMD’s systems)
- HT 1.0: 0.8 GHz DDR, 2-Byte: 3.2 GB/s in each direction
- HT 2.0: 1.0 GHz DDR, 2-Byte: 4.0 GB/s in each direction
- HT 3.0: 2.6 GHz DDR, 2-Byte: 10.4 GB/s in each direction
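The figures on this slide can be reproduced with a small calculation: peak bandwidth is the clock rate times the transfers per clock (2 for DDR, 4 for QDR) times the payload width in bytes. The sketch below is only an illustration; the function and table names are made up for this example, and the 16 data bits of the QuickPath link are treated as a 2-byte payload. Because the product is a byte rate, the results come out in GB/s.

```python
def peak_bandwidth_gb_s(clock_ghz: float, transfers_per_clock: int, payload_bytes: int) -> float:
    """Peak bandwidth in GB/s: clock (GHz) x transfers per clock x payload width (bytes)."""
    return clock_ghz * transfers_per_clock * payload_bytes


# (name, clock in GHz, transfers per clock, payload bytes) as listed on the slide.
# DDR = 2 transfers per clock, QDR = 4 transfers per clock.
INTERCONNECTS = [
    ("QuickPath Interconnect", 3.2, 2, 2),  # assumption: only the 16 data bits count as payload
    ("Fastest FSB", 0.4, 4, 8),
    ("HT 1.0", 0.8, 2, 2),
    ("HT 2.0", 1.0, 2, 2),
    ("HT 3.0", 2.6, 2, 2),
]

if __name__ == "__main__":
    for name, clock_ghz, per_clock, width in INTERCONNECTS:
        print(f"{name}: {peak_bandwidth_gb_s(clock_ghz, per_clock, width):.1f} GB/s")
```

The QuickPath and HyperTransport results apply to each direction of a link, whereas the FSB result is shared by traffic in both directions.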
14
Figure: Die shot of Nehalem [45] 1. The evolution of Intel’s basic microarchitectures (11)
15
2. Intel’s DP Servers
16
2. Intel’s DP servers (1)
Figure: Typical configuration of an early DP-server motherboard based on Intel’s E7500/E7501 (Plumas) chipset
Block diagram labels: two P4-based Prestonia processors on a 400/533 MHz FSB; E7500/E7501 MCH with two SDRAM interfaces (DDR 200/266, registered, ECC opt., 8/12/16 GB); HI 2.0 links to PCI-X bridges; HI 1.5 link to the ICH3-S (Ultra ATA/100, PCI v.2.2, USB v.1.1, GPIO, LPC to FWH and SIO).
17
2. Intel’s DP servers (2)
Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset
Block diagram labels: two P4-based Nocona/Paxville DP processors on an 800 MHz FSB; E7520 MCH with two SDRAM interfaces (DDR 266/333 or DDR2 400, registered, ECC opt., 16/24/32 GB); PCI E. x8 ports (one usable as 2x x4), partly via PCI-X bridges; HI 1.5 link to the ICH5R (Ultra ATA/100, PCI v.2.3, USB v.2.0, SATA, AC'97 v.2.3, GPIO, LPC to FWH and SIO).
18
2. Intel’s DP servers (3) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34] Paxville DP 2.8: 2xIrwindale cores/90 nm
19
2. Intel’s DP servers (4)
Figure: Genealogy of the Xeon Paxville core
- Nocona (DP enhanced Prescott; Intel’s first 64-bit Xeon): 6/2004, 90 nm, 112 mm², 125 mtrs, mPGA 604
- Irwindale (DP enhanced Prescott 2M; Nocona with L2 enlarged to 2 MB): 2/2005, 90 nm, 135 mm², 169 mtrs, mPGA 604
- Paxville (2 x Irwindale cores): 10/2005, 90 nm, 2 x 135 mm², 2 x 169 mtrs, Xeon DP 2.8 and Xeon MP 7020-7041, mPGA 604
In contrast: corresponding desktop processors have the LGA 775 socket.
Sources: http://www.xbitlabs.com/articles/cpu/display/opteron-xeon-workstation_5.html http://www.theinquirer.net/default.aspx?article=16879 http://www.gamepc.com/labs/view_content.asp?id=x36o252&page=2
20
2. Intel’s DP servers (5) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34] Paxville DP 2.8: 2xIrwindale cores/90 nm. Xeon 5000 (Dempsey): 2xCedar Mill/65 nm (65 nm shrink of the Irwindale)
21
2. Intel’s DP servers (6) Figure: Intel’s Core2 based DC/QC DP server processors [33], [35], [36] Xeon 5100 (Woodcrest): Core2-based/65 nm. Xeon 5300 (Clovertown): Core2-based/65 nm, 2xXeon 5100
22
2. Intel’s DP servers (7) Figure: Intel’s Penryn based QC DP server processor/45 nm (Source: Intel) Xeon 5400 (Harpertown)
23
2. Intel’s DP servers (8) Figure: Contrasting the die shots of the Xeon 5400 and 5300 processors [24]
24
2. Intel’s DP servers (9)
Table: Intel’s DC, QC DP servers (cells merged across adjacent series appear once)
Series: --- (Paxville DP) | 5000 (Dempsey) | 5100 (Woodcrest) | 5200 (Wolfdale) | 5300 (Clovertown) | 5400 (Harpertown)
Dual/Quad-Core: DC | QC
Models: Xeon DP 2.8 | 5030-5080 | 5110-5160 | E5205/E5260/X5275 | E5310-5345/X5355 | E5405-E5472, X5450-X5482
Microarchitecture: Pentium 4 | Core2 | Penryn | Core2 | Penryn
Core: 2*Irwindale dies | 2*Cedar dies | Single die | 2*Woodcrest dies | 2*Penryn
Intro.: 10/2005 | 5/2006 | 6/2006 | 11/2007 | 11/2006 | 11/2007
Technology: 90 nm | 65 nm | 45 nm | 65 nm | 45 nm
Die size: 2*135 mm² | 2*81 mm² | 143 mm² | 2*143 mm² | 2*107 mm²
Nr. of transistors: 2*169 mtrs | 2*188 mtrs | 291 mtrs | 2*291 mtrs | 2*410 mtrs
Fc [GHz]: 2.8 | 2.6-3.73 | 1.6-3.0 | 1.86-3.40 | 1.6-2.66 | 2.00-3.20
L2: 2*2 MB | 4 MB | 6 MB | 2*4 MB | 2*6 MB
FSB [MT/s]: 800 | 667/1066 | 1066/1333 | 1333/1600 | 1066/1333 | 1333/1600
TDP [W]: 135 | 95/130 | 65/80 | 80/120 | 80/120/150
Socket: PGA 604 | LGA 771
Features: EM64T; HT ---; ED; VT; EIST (5140 or above); La Grande ---; AMT2 ---; Flex Migration ---
25
2. Intel’s DP servers (10) Figure: Intel’s future DP server processors [21] Gainestown (Q1/2009), Nehalem-based/45 nm; ??? (Q1/2010?), Westmere-based/32 nm; (Socket 1366); (both 2-way multithreaded)
26
2. Intel’s DP servers (11)
Figure: Overview of the implementation of Intel’s Tick-Tock model for DP servers [24]
- 2x1 C, 2 MB L2/C: 5000 (Dempsey)
- 1x2 C, 4 MB L2/C: 5100 (Woodcrest)
- 2x2 C, 4 MB L2/C: 5300 (Clovertown)
- 2x2 C, 6 MB L2/2C: 5400 (Harpertown)
- 1x4 C, ¼ MB L2/C, 8 MB L3: 5xxx (Gainestown)
- 1x6 C, ¼ MB L2/C, 12 MB L3: 5xxx (???)
27
2. Intel’s DP servers (16)
Figure: Evolution of Intel’s DP servers
- 7520 (Lindenhurst): two Nocona/Paxville (SC/DC) processors on an 800 MT/s FSB (6.4 GB/s); dual DDR2 400 MT/s memory (6.4 GB/s); 24 PCIe lanes (7.5 GB/s)
- 5000 (Blackford): two Dempsey/Woodcrest/Clovertown processors on 1066 MT/s FSBs (17.1 GB/s); quad FB-DIMM 533 MT/s memory (17.1 GB/s); 24 PCIe lanes (7.5 GB/s)
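The bandwidth labels in this diagram follow from the transfer rate and the width of the data path. The sketch below recomputes them; the function name is hypothetical, and the 8-byte (64-bit) width of the FSB and of a memory channel, the two FSBs of the 5000 chipset (taken from the platform overview slides), and the use of the nominal rates 533 and 1066 MT/s are assumptions of this example.

```python
def aggregate_bandwidth_gb_s(links: int, rate_mt_s: float, width_bytes: int = 8) -> float:
    """Aggregate peak bandwidth in GB/s of `links` parallel buses or channels."""
    return links * rate_mt_s * width_bytes / 1000.0


if __name__ == "__main__":
    # 7520 (Lindenhurst): one shared 800 MT/s FSB, dual-channel DDR2 at 400 MT/s
    print(f"Lindenhurst FSB:    {aggregate_bandwidth_gb_s(1, 800):.1f} GB/s")
    print(f"Lindenhurst memory: {aggregate_bandwidth_gb_s(2, 400):.1f} GB/s")
    # 5000 (Blackford): two 1066 MT/s FSBs, four FB-DIMM channels at 533 MT/s
    print(f"Blackford FSBs:     {aggregate_bandwidth_gb_s(2, 1066):.1f} GB/s")
    print(f"Blackford memory:   {aggregate_bandwidth_gb_s(4, 533):.1f} GB/s")
```

In both generations the memory bandwidth is sized to match the aggregate FSB bandwidth, which is why the two numbers on each side of the chipset agree.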
28
2. Intel’s DP servers (12)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP platforms
- Pentium4-based (90/65 nm)
DP cores
- Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604
DP chipsets
- 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
29
2. Intel’s DP servers (13)
Figure: Evolution of Intel’s DP servers
- 7520 (Lindenhurst): two Nocona/Paxville (SC/DC) processors on an 800 MT/s FSB (6.4 GB/s); dual DDR2 400 MT/s memory (6.4 GB/s); 24 PCIe lanes (7.5 GB/s)
30
2. Intel’s DP servers (14)
Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset
Block diagram labels: two P4-based Nocona/Paxville DP processors on an 800 MHz FSB; E7520 MCH with two SDRAM interfaces (DDR 266/333 or DDR2 400, registered, ECC opt., 16/24/32 GB); PCI E. x8 ports (one usable as 2x x4), partly via PCI-X bridges; HI 1.5 link to the ICH5R (Ultra ATA/100, PCI v.2.3, USB v.2.0, SATA, AC'97 v.2.3, GPIO, LPC to FWH and SIO).
31
2. Intel’s DP servers (15)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP platforms
- Pentium4-based (90/65 nm)
- Bensley: Pentium4/Core2-based (65 nm)
DP cores
- Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604
- Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA771
- Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA771
- Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA771
DP chipsets
- 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
- 5000 family, 6/2006, 2xFSB 1066 MT/s: 5000P (Blackford) with 4 x FBDIMM (DDR2), 64 GB; 5000V/Z (Blackford V/Z) with 2 x FBDIMM (DDR2), 16 GB
32
2. Intel’s DP servers (17) Figure: Intel’s Bensley platform [30] (actually the block diagram of Tyan’s S5370 DP server) http://www.tyan.com/tempest/training/s5370.pdf
33
2. Intel’s DP servers (18) Figure: Bensley DP motherboard, with the 5000 (Blackford) chipset (Supermicro X7DB8+) for the Xeon 5000 DC/QC DP processor families [7] Labels: Xeon DC/QC (5000 DC, 5100 DC, 5300 QC); 5000P; SBE2; FB-DIMM DDR2, 64 GB
34
Table: Latency and bandwidth scaling of the Intel 5000 platform (2006) vs the earlier generation (2004) [1] 2. Intel’s DP servers (19)
35
2. Intel’s DP servers (20)
Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP platforms
- Pentium4-based (90/65 nm)
- Bensley: Pentium4/Core2-based (65 nm)
- Cranberry Lake: Penryn-based (45 nm)
DP cores
- Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA604
- Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA771
- Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA771
- Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA771
- Xeon 5400 (Harpertown), QC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA771
- Xeon 5200 (Wolfdale), DC: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA771
DP chipsets
- 7520 (Lindenhurst), 6/2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
- 5000 family, 6/2006, 2xFSB 1066 MT/s: 5000P (Blackford) with 4 x FBDIMM (DDR2), 64 GB; 5000V/Z (Blackford V/Z) with 2 x FBDIMM (DDR2), 16 GB
- 5100 (San Clemente), 10/2007: 2xFSB 1333/1066 MT/s, 2 x DDR2, 32/48 GB
36
2. Intel’s DP servers (21) Figure: The Cranberry Lake platform [19] Xeon 5400 (QC) Xeon 5200 (DC) 5100 chipset
37
2. Intel’s DP servers (22) Block diagram labels: Nehalem QC, Nehalem QC, Tylersburg, PCI Express Gen 2, DMI, 1066 MT/s, 17.1 GB/s
38
2. Intel’s DP servers (23) Figure: Intel’s forthcoming Nehalem-based DP server system architecture [31] Labels: QuickPath Interconnect, integrated memory controller
39
3. Intel’s MP servers
40
3. Intel’s MP servers (1) Figure: Intel’s Pentium4 based Xeon MP processors [17], [18] 90 nm: Potomac, Paxville MP (7000). 65 nm: Tulsa (7100). CDM: Cedar Mill core (65 nm shrink of the Irwindale core)
41
3. Intel’s MP servers (2) Figure: Intel’s Core2/Penryn based Xeon MP processors [19], [20] Core2 based, 65 nm: Tigerton DC (7200), Tigerton QC (7300). Penryn based, 45 nm: Dunnington (7400)
42
3. Intel’s MP servers (3)
Table: Dual- and Quad-Core Xeon MP-lines (cells merged across adjacent series appear once)
Series: 7000 (Paxville MP) | 7100 (Tulsa) | 7200 (Tigerton DC) | 7300 (Tigerton QC) | 7400 (Dunnington QC) | 7400 (Dunnington 6C)
Dual/Quad-Core: DC | 2xSC | 2xDC | QC | 6C
Models: 7020-7041 | 7110M-7140M / 7110N-7150N | E7210/E7220 | E7310/E7320/E7330/E7340/X7350 | E7420-E7440 | E7450/X7460
Microarchitecture: Netburst | Core 2 | Penryn
Core: 2xIrwindale dies | Cedar Mill-based single die | 2xSC Woodcrest dies | 2xWoodcrest dies
Intro.: 11/2005 | 8/2006 | 9/2007 | 9/2008
Technology: 90 nm | 65 nm | 45 nm
Die size: 2*135 mm² | 435 mm² | 2*143 mm² | 503 mm²
Nr. of transistors: 2*169 mtrs | 1328 mtrs | 2*291 mtrs | 1900 mtrs
Fc [GHz]: 2.66-3.0 | 2.5-3.5 | 2.4/2.93 | 1.6/2.13/2.4/2.4/2.93 | 2.13-2.40 | 2.40/2.66
L2: 2*1/2 MB (1) | 2*1 MB | 2*4 MB | 2*2/2*2/2*3/2*4/2*4 MB | 3*2 MB | 3*3 MB
L3: --- | 4/8/16 MB | --- | 8/12/16 MB | 12/16 MB
FSB [MT/s]: 667/800 | 1066
TDP [W]: 95/150 | 80 | 80/80/80/80/130 | 90 | 90/130
Socket: mPGA604
Features: EM64T; HT ---; ED; VT; EIST; La Grande --- n.a.; AMT2 --- (Except E7310) n.a.
(1) Concerning the L2 cache size, there is a contradiction in Intel’s documentation: according to the data sheets, models of the 7000 series include 1 or 2 MB L2 caches, whereas the comparison charts show 1 MB L2 caches for all models.
43
3. Intel’s MP servers (4) Figure: Intel’s Nehalem based MP server processor [21]
44
3. Intel’s MP servers (5)
Figure: Overview of the implementation of Intel’s Tick-Tock model for MP servers [24]
- TICK: Pentium 4 (Prescott), 90 nm
- 1x1 C, 8 MB/C (Potomac)
- TOCK: Pentium 4 (Irwindale), 90 nm
- 2x1 C, ½ MB/C: 7000 (Paxville MP)
- 1x1 C, 1 MB/C (Cranford)
- 2x1 C, 1 MB L2/C, 16 MB L3: 7100 (Tulsa)
- 1x2 C, 4 MB L2/C: 7200 (Tigerton DC)
- 2x2 C, 4 MB L2/C: 7300 (Tigerton QC)
- 1x6 C, 3 MB L2/2C, 16 MB L3: 7400 (Dunnington)
- 1x8 C, ¼ MB L2/C, 24 MB L3: 7xxx (Beckton)
45
3. Intel’s MP servers (6)
Table: Overview of Intel’s DP and MP server processors
Core/technology | DP server processors | MP server processors
Pentium4, 65 nm | 2x1 C, 2 MB L2/C: 5000 (Dempsey) | 2x1 C, 1 MB L2/C, 16 MB L3: 7100 (Tulsa)
Core2, 65 nm | 1x2 C, 4 MB L2/C: 5100 (Woodcrest); 2x2 C, 4 MB L2/C: 5300 (Clovertown) | 1x2 C, 4 MB L2/C: 7200 (Tigerton DC); 2x2 C, 4 MB L2/C: 7300 (Tigerton QC)
Penryn, 45 nm | 2x2 C, 6 MB L2/2C: 5400 (Harpertown) | 1x6 C, 3 MB L2/2C, 16 MB L3: 7400 (Dunnington)
Nehalem, 45 nm | 1x4 C, ¼ MB L2/C, 8 MB L3: 5xxx (Gainestown) | 1x8 C, ¼ MB L2/C, 24 MB L3: 7xxx (Beckton)
Westmere, 32 nm | 1x6 C, ¼ MB L2/C, 12 MB L3: 5xxx (???) |
46
3. Intel’s MP servers (7)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
(1) Xeon MP before Potomac
47
3. Intel’s MP servers (5)
Figure: Overview of the implementation of Intel’s Tick-Tock model for MP servers [24]
- TICK: Pentium 4 (Prescott), 90 nm
- 1x1 C, 8 MB/C (Potomac)
- TOCK: Pentium 4 (Irwindale), 90 nm
- 2x1 C, ½ MB/C: 7000 (Paxville MP)
- 1x1 C, 1 MB/C (Cranford)
- 2x1 C, 1 MB L2/C, 16 MB L3: 7100 (Tulsa)
- 1x2 C, 4 MB L2/C: 7200 (Tigerton DC)
- 2x2 C, 4 MB L2/C: 7300 (Tigerton QC)
- 1x6 C, 3 MB L2/2C, 16 MB L3: 7400 (Dunnington)
- 1x8 C, ¼ MB L2/C, 24 MB L3: 7xxx (Beckton)
48
3. Intel’s MP servers (8) Figure: Former Pentium II/III MP system architecture [32]
49
3. Intel’s MP servers (9)
Figure: Intel’s Xeon-based MP server platforms
MP platforms
- Truland, 3/2005: P4-based/90 nm and P4-based/65 nm
MP cores
- Xeon MP (Potomac SC), 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- Xeon 7000 (Paxville MP DC), 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, no L3, 800/667 MT/s, mPGA 604
- Xeon 7100 (Tulsa DC), 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
MP chipsets
- 8500 (Twin Castle), 3/2005: 2xFSB 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 8501 (?), 4/2006: 2xFSB 800 MT/s, 4 x XMB (2 x DDR2), 32 GB
50
3. Intel’s MP servers (10)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
- Truland: 8500/8501 (Twin Castle); four Potomac (2) or Paxville MP (3) processors, DC/SC; XMB; 28 PCIe lanes + HI 1.5
- Speed labels: (266 MT/s), (7 GT/s)
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 also supports Cranford (SC) and Tulsa (DC)
51
3. Intel’s MP servers (11)
Figure: Intel’s 8501 chipset for MP servers (4/2006) [4]
Labels:
- Xeon DC MP 7000 (4/2005) or later DC/QC MP 7000 processors
- 8501 (North Bridge)
- IMI (Independent Memory Interface), serial link: 5.33 GB/s inbound BW and 2.67 GB/s outbound BW simultaneously
- XMB (eXternal Memory Bridge): intelligent MC, dual mem. channels, DDR 266/333/400, 4 DIMM/channel
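To see where such a platform saturates, the per-link figures can be summed and compared. The sketch below assumes four XMBs (one IMI each) and two 800 MT/s FSBs with an 8-byte data path, as listed for the 8501 on the MP platform overview slides. Reading the IMI figures as GB/s, treating "inbound" as read traffic toward the north bridge, and all names in the code are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NorthBridge:
    fsb_count: int
    fsb_rate_mt_s: int
    fsb_width_bytes: int
    imi_count: int
    imi_inbound_gb_s: float   # assumed: memory -> north bridge (reads)
    imi_outbound_gb_s: float  # assumed: north bridge -> memory (writes)

    def fsb_bandwidth(self) -> float:
        """Aggregate peak FSB bandwidth in GB/s."""
        return self.fsb_count * self.fsb_rate_mt_s * self.fsb_width_bytes / 1000.0

    def read_bandwidth(self) -> float:
        """Aggregate peak inbound IMI bandwidth in GB/s."""
        return self.imi_count * self.imi_inbound_gb_s

    def write_bandwidth(self) -> float:
        """Aggregate peak outbound IMI bandwidth in GB/s."""
        return self.imi_count * self.imi_outbound_gb_s

    def read_bottleneck(self) -> str:
        """Name the side that limits read traffic."""
        return "FSB" if self.fsb_bandwidth() < self.read_bandwidth() else "IMI"


# Hypothetical model of the 8501 using the figures quoted on the slides.
E8501 = NorthBridge(fsb_count=2, fsb_rate_mt_s=800, fsb_width_bytes=8,
                    imi_count=4, imi_inbound_gb_s=5.33, imi_outbound_gb_s=2.67)

if __name__ == "__main__":
    print(f"FSB total:   {E8501.fsb_bandwidth():.1f} GB/s")
    print(f"IMI inbound: {E8501.read_bandwidth():.1f} GB/s")
    print(f"Read traffic is limited by the {E8501.read_bottleneck()}")
```

Under these assumptions the memory side can deliver more read bandwidth than the two front-side buses can carry to the processors.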
52
3. Intel’s MP servers (12) Figure: Quad socket Intel E8501 chipset based motherboard (Supermicro X6QT8) for the Xeon 7000/7100 DC MP processor families [7] Labels: Xeon DC 7000/7100; E8501 NB; ICH5R SB; FB-DIMM DDR2, 64 GB
53
3. Intel’s MP servers (13) Figure: Bandwidth bottlenecks in Intel’s 8501 MP server platform [2]
54
3. Intel’s MP servers (14)
Figure: Intel’s Xeon-based MP server platforms
MP platforms
- Truland, 3/2005: P4-based/90 nm and P4-based/65 nm
- Caneland, 9/2007: Core2-based/65 nm and Core2-based/45 nm
MP cores
- Xeon MP (Potomac SC), 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- Xeon 7000 (Paxville MP DC), 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, no L3, 800/667 MT/s, mPGA 604
- Xeon 7100 (Tulsa DC), 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
- Xeon 7200 (Tigerton DC), 9/2007: 65 nm/2x291 mtrs, 2x4 MB L2, no L3, 1066 MT/s, mPGA 604
- Xeon 7300 (Tigerton QC), 9/2007: 65 nm/2x291 mtrs, 2x(4/3/2) MB L2, no L3, 1066 MT/s, mPGA 604
- Xeon 7400 (Dunnington 6C), 9/2008: 45 nm/1900 mtrs, 9/6 MB L2, 16/12/8 MB L3, 1066 MT/s, mPGA 604
MP chipsets
- 8500 (Twin Castle), 3/2005: 2xFSB 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 8501 (?), 4/2006: 2xFSB 800 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 7300 (Clarksboro), 9/2007: 4xFSB 1066 MT/s, 4 x FBDIMM (DDR2), 512 GB
55
3. Intel’s MP servers (15)
Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1), SC; typically HI 1.5 (266 MB/s)
- Truland: 8500/8501 (Twin Castle); four Potomac (2) or Paxville MP (3) processors, DC/SC; XMB; 28 PCIe lanes + HI 1.5
- Caneland: 7300 (Clarksboro); Tigerton (QC/DC) or Dunnington (6C) processors; FB-DIMM (DDR2); 8 PCI-E lanes + ESI
- Speed labels: (266 MT/s), (7 GT/s), (2 GT/s), (1 GT/s)
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 also supports Cranford (SC) and Tulsa (DC)
56
3. Intel’s MP servers (16) Figure: Intel’s four socket 7300 (Caneland) platform, based on the 7300 (Clarksboro) chipset for the Xeon 7200/7300 DC/QC MP families (9/2007) [6] Labels: Xeon 7200 (Tigerton DC, Core2), DC; Xeon 7300 (Tigerton QC, Core2), QC; FB-DIMM, up to 512 GB
57
3. Intel’s MP servers (17) Figure: Caneland MP motherboard, with the 7300 (Clarksboro) chipset (Supermicro X7QC3) for the Xeon 7200/7300 DC/QC MP processor families [7] Labels: Xeon 7200 DC/7300 QC (Tigerton); 7300 NB; SBE2 SB; FB-DIMM DDR2, 192 GB; ATI ES1000 Graphics with 32 MB video memory
58
Figure: Performance comparison of the Caneland platform with a quad core Xeon (7300 family) vs the Bensley platform with a dual core Xeon 7140M [13] 3. Intel’s MP servers (18)
59
3. Intel’s MP servers (19) Figure: Intel’s Nehalem based MP server. Labels: Beckton 8C; QPI (QuickPath Interconnect); 4xFB-DIMM
60
3. Intel’s MP servers (20) Figure: Intel’s Nehalem based MP server system architecture [22] Labels: QPI; FB-DIMM (DDR2)
61
4. AMD’s servers
62
4. AMD’s servers (1)
Table: Overview of AMD’s Opteron DP processors (2003 to 2008; SCST, DCST, QCST)
- Opteron 240-250 (Sledgehammer), 4/03: 130 nm/193 mm², 106 mtrs/82-89 W, L2: 1 MB, HT 1.0, 0.8 GHz, K8
- Opteron 242-256 (Troy), 2/05: 90 nm/114 mm², 114 mtrs/95 W, L2: 1 MB, HT 1.0, 1.0 GHz, K8
- Opteron 265-290 (Italy), 5/05: 90 nm/199 mm², 233 mtrs/95 W, L2: 2*1 MB, HT 1.0, GHz, K8
- Opteron 22xx HE (Santa Rosa), 8/06: 90 nm/230 mm², 227 mtrs/95/110 W, L2: 2*1 MB, HT 1.0, GHz, K8
- Opteron 2347-2360 (Barcelona), 9/07-4/08: 65 nm/285 mm², 463 mtrs/95 W, L2: 512 KB/C, L3: 2 MB, HT 3.0, 1.0 GHz, K10
- Opteron 2378-2384 (Shanghai), 11/08: 45 nm/243 mm², 705 mtrs/75 W, L2: 512 KB/C, L3: 3 MB, HT 3.0, GHz, K10
63
4. AMD’s servers (2)
Table: Overview of AMD’s Opteron MP processors (2003 to 2008; SCST, DCST, QCST)
- Opteron 840-850 (Sledgehammer), 6/03-11/03: 130 nm/193 mm², 106 mtrs/82-89 W, L2: 1 MB, 3*HT 1.0, 0.8 GHz, K8
- Opteron 842-856 (Athens), 12/04-8/05: 90 nm/115 mm², 114 mtrs/85-93 W, L2: 1 MB, 3*HT 1.0, 1.0 GHz, K8
- Opteron 865-890 (Egypt), 4/05: 90 nm/199 mm², 233 mtrs/95 W, L2: 2 MB, 3*HT 1.0, GHz, K8
- Opteron 82xx (Santa Rosa), 8/06-8/07: 90 nm/220 mm², 243 mtrs/95/119 W, L2: 2 MB/C, 3*HT 1.0, GHz, K8
- Opteron 8347-8356 (Barcelona), 9/07: 65 nm/285 mm², 463 mtrs/84/95 W, L2: 512 KB/C, L3: 2 MB, 4*HT 3.0, 1 GHz, K10
- Opteron 8378-8384 (Shanghai), 11/08: 45 nm/243 mm², 705 mtrs/75 W, L2: 512 KB/C, L3: 3 MB, 4*HT 3.0, GHz, K10
64
4. AMD’s servers (3)
AMD Direct Connect Architecture
- Integrated Memory Controller
- Serial HyperTransport links
Figure: AMD’s Direct Connect Architecture [41]
Remark: 3 HT 1.0 links at introduction, 4 HT 3.0 links with K10 (Barcelona). Introduced in 2003 along with the x86-64 ISA extension (Intel: 2008 with Nehalem).
65
4. AMD’s servers (4)
Use of available HyperTransport links [44]
- UPs: Each link supports connections to I/O devices.
- DPs: Two links support connections to I/O devices; any one of the three links may connect to another DP or MP processor.
- MPs: Each link supports connections to I/O devices or to other DP or MP processors.
66
4. AMD’s servers (5) Figure: 2P and 4P server architectures based on AMD’s Direct Connect Architecture [42], [43] Labels: AMD Opteron, HT, RDD2, PCI-X, PCI Express, PCI, I/O
67
Figure: Advantages of AMD’s Direct Connect server architecture [2] 4. AMD’s servers (6)
68
Figure: Block diagram of a DP QC motherboard (Asus KFSN4-DRE/SAS) for the AMD Opteron 2300 QC family [10] 4. AMD’s servers (7)
69
4. AMD’s servers (8) Figure: DP motherboard for the AMD Opteron 2300 QC family (Asus KFSN4-DRE/SAS) [10] Labels: Opteron 2300 QC DP (Barcelona); nForce 2200 chipset; DDR2, 64 GB
70
4. AMD’s servers (9) Figure: Block diagram of a QP QC motherboard for AMD’s Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10]
71
4. AMD’s servers (10) Figure: 4-socket motherboard for the AMD Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10] Labels: Opteron 8300 QC MP (Barcelona); nForce 3600 chipset; DDR2, 64 GB
72
4. AMD’s servers (11) Figure: Basic structure of the DC Opteron families [8] UP: Opteron 100/1000, DP: Opteron 200/2000, MP: Opteron 800/8000
73
Figure: Block diagram of Barcelona (K10) vs K8 [46] (K10) 4. AMD’s servers (12)
74
4. AMD’s servers (13) 4 HT 3.0 links allow building fully connected 4P systems, with each processor using a separate I/O hub.
75
Figure: Possible use of Barcelona’s four HT 3.0 links [47] 4. AMD’s servers (14)
76
4. AMD’s servers (15)
4 HT 3.0 links [46]
- Four links allow building fully connected 4P systems, with each processor using a separate I/O hub.
- The HT 3.0 protocol allows splitting each 16-bit link into two 8-bit wide links. This feature can be used to build fully connected 8P systems with 8-bit wide links.
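These link counts follow from simple arithmetic: in a fully connected system every processor needs one link to each of the other sockets, plus whatever it spends on I/O. The sketch below assumes one I/O link per processor (the "separate I/O hub" case) for both the 4P and the 8P configuration; the function names are hypothetical.

```python
def links_needed(sockets: int, io_links: int = 1) -> int:
    """Links per processor in a fully connected system: one per peer plus the I/O links."""
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return (sockets - 1) + io_links


def max_fully_connected_sockets(links_16bit: int, split: bool, io_links: int = 1) -> int:
    """Largest fully connected system that fits the available links.

    With split=True every 16-bit link is used as two 8-bit links.
    """
    available = links_16bit * 2 if split else links_16bit
    return max(available - io_links, 0) + 1


if __name__ == "__main__":
    print(max_fully_connected_sockets(4, split=False))  # four 16-bit links
    print(max_fully_connected_sockets(4, split=True))   # eight 8-bit links
```

With three links and one of them reserved for I/O, the same arithmetic allows only three fully connected sockets, which illustrates why the fourth link matters for 4P systems.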
77
Figure: Possible use of Barcelona’s four HT 3.0 links [47] 4. AMD’s servers (16)
78
4. AMD’s servers (17) Current platforms (Socket F with available chipsets) support only 3 HT 1.1 links with 2 GT/s speed [46]. Novel features of HT 3.0 links, such as higher speed or splitting a 16-bit HT link into two 8-bit links, can be utilized only with a new platform.
79
4. AMD’s servers (18) Figure: Cache architecture of the QC Barcelona [25]
80
4. AMD’s servers (19) Figure: Die shot and floor plan of Barcelona [27]
81
4. AMD’s servers (20) The Barcelona (and also the Phenom) processors had a bug in their TLB (Translation Lookaside Buffer) design [40]. AMD reworked both chips and provided a new stepping.
82
4. AMD’s servers (21) Figure: Cache architectures of AMD’s QC Barcelona and Shanghai processors [25], [26] Barcelona (65 nm) Shanghai (45 nm)
83
Figure: Shanghai’s new features vs Barcelona [37] 4. AMD’s servers (22)
84
Figure: Die shot and floor plan of Shanghai [37] 4. AMD’s servers (23)
85
4. AMD’s servers (24) Figure: Die shot of Shanghai [29] Pin to pin compatible with Barcelona 6 MB shared L3
86
4. AMD’s servers (25)
Table: First introduced Shanghai based Opteron DP and MP models [38] (AMD Shanghai Overview)
Model | CPU Clock | MC Clock | Part Number | Price
Opteron 2384 | 2.7 GHz | 2.2 GHz | OS2384WAL4DGI | $989
Opteron 2382 | 2.6 GHz | 2.2 GHz | OS2382WAL4DGI | $873
Opteron 2380 | 2.5 GHz | 2.0 GHz | OS2380WAL4DGI | $698
Opteron 2378 | 2.4 GHz | 2.0 GHz | OS2378WAL4DGI | $523
Opteron 2376 | 2.3 GHz | 2.0 GHz | OS2376WAL4DGI | $377
Opteron 8384 | 2.7 GHz | 2.2 GHz | OS8384WAL4DGI | $2149
Opteron 8382 | 2.6 GHz | 2.2 GHz | OS8382WAL4DGI | $1865
Opteron 8380 | 2.5 GHz | 2.0 GHz | OS8380WAL4DGI | $1514
Opteron 8378 | 2.4 GHz | 2.0 GHz | OS8378WAL4DGI | $1165
87
Figure: AMD’s roadmap for server processors and platforms [37] 4. AMD’s servers (26)
88
References (1)
[1]: Radhakrishnan S., Sundaram C. and Cheng K., “The Blackford Northbridge Chipset for the Intel 5000,” IEEE Micro, March/April 2007, pp. 22-33
[2]: Next-Generation AMD Opteron Processor with Direct Connect Architecture – 4P Server Comparison, http://www.amd.com/us-en/assets/content_type/DownloadableAssets/4P_Server_Comparison_PID_41461.pdf
[3]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2006, http://www.intel.com/design/chipsets/datashts/313071.htm
[4]: Intel® E8501 Chipset North Bridge (NB) Datasheet, May 2006, http://www.intel.com/design/chipsets/e8501/datashts/309620.htm
[5]: Conway P. & Hughes B., “The AMD Opteron Northbridge Architecture,” IEEE Micro, March/April 2007, pp. 10-21
[6]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007, http://www.intel.com/design/chipsets/datashts/313082.htm
[7]: Supermicro Motherboards, http://www.supermicro.com/products/motherboard/
[8]: Sander B., “AMD Microprocessor Technologies,” 2006, http://www.ewh.ieee.org/r4/chicago/foxvalley/IEEE_AMD_Meeting.ppt
[9]: AMD Quad FX Platform with Dual Socket Direct Connect (DSDC) Architecture, http://www.asisupport.com/ts_amd_quad_fx.htm
[10]: Asustek motherboards, http://www.asus.com.tw/products.aspx?l1=9&l2=39 http://support.asus.com/download/model_list.aspx?product=5&SLanguage=en-us
89
References (2)
[11]: Kanter D., “A Preview of Intel's Bensley Platform (Part I),” Real World Technologies, Aug. 2005, http://www.realworldtech.com/page.cfm?ArticleID=RWT110805135916&p=2
[12]: Kanter D., “A Preview of Intel's Bensley Platform (Part II),” Real World Technologies, Nov. 2005, http://www.realworldtech.com/page.cfm?ArticleID=RWT112905011743&p=7
[13]: Quad-Core Intel® Xeon® Processor 7300 Series Product Brief, Intel, Nov. 2007, http://download.intel.com/products/processor/xeon/7300_prodbrief.pdf
[14]: “AMD Shows Off More Quad-Core Server Processors Benchmark,” X-bit labs, Nov. 2007, http://www.xbitlabs.com/news/cpu/display/20070702235635.html
[15]: AMD, Nov. 2006, http://www.asisupport.com/ts_amd_quad_fx.htm
[16]: Rusu S., “A Dual-Core Multi-Threaded Xeon Processor with 16 MB L3 Cache,” Intel, 2006, http://ewh.ieee.org/r5/denver/sscs/Presentations/2006_04_Rusu.pdf
[17]: Goto H., Intel Processors, PC Watch, March 04 2005, http://pc.watch.impress.co.jp/docs/2005/0304/kaigai162.htm
[18]: Gilbert J. D., Hunt S., Gunadi D., Srinivas G., “The Tulsa Processor,” Hot Chips 18, 2006, http://www.hotchips.org/archives/hc18/3_Tues/HC18.S9/HC18.S9T1.pdf
[19]: Goto H., IDF 2007 Spring, PC Watch, April 26 2007, http://pc.watch.impress.co.jp/docs/2007/0426/hot481.htm
90
References (3)
[20]: Hruska J., “Details slip on upcoming Intel Dunnington six-core processor,” Ars Technica, February 26, 2008, http://arstechnica.com/news.ars/post/20080226-details-slip-on-upcoming-intel-dunnington-six-core-processor.html
[21]: Goto H., 32 nm Westmere arrives in 2009-2010, PC Watch, March 26 2008, http://pc.watch.impress.co.jp/docs/2008/0326/kaigai428.htm
[22]: Singhal R., “Next Generation Intel Microarchitecture (Nehalem) Family: Architecture Insight and Power Management,” IDF Taipei, Oct. 2008, http://intel.wingateweb.com/taiwan08/published/sessions/TPTS001/FA08%20IDF-Taipei_TPTS001_100.pdf
[23]: Smith S. L., “45 nm Product Press Briefing,” IDF Fall 2007, ftp://download.intel.com/pressroom/kits/events/idffall_2007/BriefingSmith45nm.pdf
[24]: Bryant D., “Intel Hitting on All Cylinders,” UBS Conf., Nov. 2007, http://files.shareholder.com/downloads/INTC/0x0x191011/e2b3bcc5-0a37-4d06-aa5a-0c46e8a1a76d/UBSConfNov2007Bryant.pdf
[25]: Barcelona's Innovative Architecture Is Driven by a New Shared Cache, http://developer.amd.com/documentation/articles/pages/8142007173.aspx
[26]: Larger L3 cache in Shanghai, Nov. 13 2008, AMD, http://forums.amd.com/devblog/blogpost.cfm?threadid=103010&catid=271
[27]: Shimpi A. L., “Barcelona Architecture: AMD on the Counterattack,” March 1 2007, Anandtech, http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939&p=1
91
References (4)
[28]: Rivas M., “Roadmap update,” 2007 Financial Analyst Day, Dec. 2007, AMD, http://download.amd.com/Corporate/MarioRivasDec2007AMDAnalystDay.pdf
[29]: Scansen D., “Under the Hood: AMD’s Shanghai marks move to 45 nm node,” EE Times, Nov. 11 2008, http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=212002243
[30]: 2-way Intel Dempsey/Woodcrest CPU Bensley Server Platform, Tyan, http://www.tyan.com/tempest/training/s5370.pdf
[31]: Gelsinger P. P., “Intel Architecture Press Briefing,” 17. March 2008, http://download.intel.com/pressroom/archive/reference/Gelsinger_briefing_0308.pdf
[32]: Mueller S., Soper M. E., Sosinsky B., Server Chipsets, Jun 12, 2006, http://www.informit.com/articles/article.aspx?p=481869
[33]: Goto H., IDF, Aug. 26 2005, http://pc.watch.impress.co.jp/docs/2005/0826/kaigai207.htm
[34]: TechChannel, http://www.tecchannel.de/_misc/img/detail1000.cfm?pk=342850&fk=432919&id=il-74145482909021379
[35]: Intel quadcore Xeon 5300 review, Nov. 13 2006, Hardware.Info, http://www.hardware.info/en-US/articles/amdnY2ppZGWa/Intel_quadcore_Xeon_5300_review
92
References (5)
[36]: Wasson S., Intel's Woodcrest processor previewed, The Bensley server platform debuts, May 23, 2006, The Tech Report, http://techreport.com/articles.x/10021/1
[37]: Enderle R., AMD Shanghai “We are back!”, TGDaily, November 13, 2008, http://www.tgdaily.com/content/view/40176/128/
[38]: Clark J. & Whitehead R., “AMD Shanghai Launch,” Anandtech, Nov. 13 2008, http://www.anandtech.com/showdoc.aspx?i=3456
[39]: Chiappetta M., AMD Barcelona Architecture Launch: Native Quad-Core, Hothardware, Sept. 10, 2007, http://hothardware.com/Articles/AMD_Barcelona_Architecture_Launch_Native_QuadCore/
[40]: Hachman M., “AMD Phenom, Barcelona Chips Hit By Lock-up Bug,” ExtremeTech, Dec. 5 2007, http://www.extremetech.com/article2/0,2845,2228878,00.asp
[41]: AMD Opteron™ Processor for Servers and Workstations, http://amd.com.cn/CHCN/Processors/ProductInformation/0,,30_118_8826_8832,00-1.html
[42]: AMD Opteron Processor with Direct Connect Architecture, 2P Server Power Savings Comparison, AMD, http://enterprise.amd.com/downloads/2P_Power_PID_41497.pdf
[43]: AMD Opteron Processor with Direct Connect Architecture, 4P Server Power Savings Comparison, AMD, http://enterprise.amd.com/downloads/4P_Power_PID_41498.pdf
93
References (6)
[44]: AMD Opteron Product Data Sheet, AMD, http://pdfs.icecat.biz/pdf/1868812-2278.pdf
[45]: Images, Xtreview, http://xtreview.com/images/K10%20processor%2045nm%20architec%203.jpg
[46]: Kanter D., “Inside Barcelona: AMD's Next Generation,” Real World Tech., May 16 2007, http://www.realworldtech.com/page.cfm?ArticleID=RWT051607033728
[47]: Kanter D., “AMD's K8L and 4x4 Preview,” Real World Tech., June 02 2006, http://www.realworldtech.com/page.cfm?ArticleID=RWT060206035626&p=1