Dezső Sima, November 2009 (Ver. 1.0). © Sima Dezső, 2008. DP/MP System Architectures.



Contents
1. The evolution of Intel’s basic microarchitectures
2. Intel’s DP servers
3. Intel’s MP servers
4. AMD’s servers

1. The evolution of Intel’s basic microarchitectures

1. The evolution of Intel’s basic microarchitectures (1) Figure: Intel’s Tick-Tock development model [22]

1. The evolution of Intel’s basic microarchitectures (2) Figure: The speed of changes in Intel’s Tick-Tock development model [24]

1. The evolution of Intel’s basic microarchitectures (3) Figure: Key enhancements introduced into the Core2 microarchitecture (vs the Pentium 4) [22]
- Wide dynamic execution: 4-wide decode/rename/retire
- Advanced digital media processing: 128-bit wide SSE execution unit
- Improved graphics/MM: new SSE 4.1 instructions
- Smart memory access: memory disambiguation (speculative loads), hardware prefetching
- Advanced smart cache: low-latency, high-bandwidth shared L2 cache

1. The evolution of Intel’s basic microarchitectures (4) Figure: Key enhancements introduced into the Penryn microarchitecture (vs the Core) [23]

1. The evolution of Intel’s basic microarchitectures (5) Figure: Improvements introduced into the Nehalem microarchitecture (vs Penryn) [22]

1. The evolution of Intel’s basic microarchitectures (6) Figure: Hyperthreading in the Nehalem microarchitecture [22]

1. The evolution of Intel’s basic microarchitectures (7) Figure: 3-level cache hierarchy of Nehalem [22] (panels: 2-level cache hierarchy; 3-level cache hierarchy)

1. The evolution of Intel’s basic microarchitectures (8) Figure: Nehalem’s innovations in the system architecture [22]

1. The evolution of Intel’s basic microarchitectures (9) Figure: Nehalem’s innovations in the system architecture [22]

1. The evolution of Intel’s basic microarchitectures (10)
QuickPath Interconnect (formerly: Common System Interface, CSI)
- 3.2 GHz, DDR, 20 bits wide (16 data bits + 4 CRC bits) in each direction: 12.8 GB/s in each direction
Fastest FSB
- 400 MHz, QDR, 8 bytes wide: 12.8 GB/s, bidirectional (shared by both directions)
HyperTransport bus (typical speed and width figures in AMD’s systems)
- HT 1.0: 0.8 GHz, DDR, 2 bytes wide: 3.2 GB/s in each direction
- HT 2.0: 1.0 GHz, DDR, 2 bytes wide: 4.0 GB/s in each direction
- HT 3.0: 2.6 GHz, DDR, 2 bytes wide: 10.4 GB/s in each direction
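The arithmetic behind the interconnect figures on this slide can be sketched as follows: the transfer rate is the clock multiplied by the transfers per clock (2 for DDR, 4 for QDR), and the peak bandwidth is that rate multiplied by the data width in bytes. The class and field names below are illustrative assumptions, and the CRC bits of QuickPath are assumed not to count as payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One interconnect as listed on the slide (names are illustrative)."""
    name: str
    clock_ghz: float          # base clock in GHz
    transfers_per_clock: int  # 2 for DDR, 4 for QDR
    width_bytes: int          # payload width in bytes (CRC bits excluded)

    def transfer_rate_gt(self) -> float:
        """Transfers per second, in GT/s."""
        return self.clock_ghz * self.transfers_per_clock

    def bandwidth_gb(self) -> float:
        """Peak bandwidth in GB/s (per direction, or shared for the FSB)."""
        return self.transfer_rate_gt() * self.width_bytes


LINKS = [
    Link("QuickPath (per direction)", 3.2, 2, 2),
    Link("Fastest FSB (shared)", 0.4, 4, 8),
    Link("HT 1.0 (per direction)", 0.8, 2, 2),
    Link("HT 2.0 (per direction)", 1.0, 2, 2),
    Link("HT 3.0 (per direction)", 2.6, 2, 2),
]

for link in LINKS:
    print(f"{link.name}: {link.transfer_rate_gt():.1f} GT/s, "
          f"{link.bandwidth_gb():.1f} GB/s")
```

Under these assumptions one QuickPath direction matches the peak bandwidth of the fastest FSB, which the FSB has to share between both directions.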

Figure: Die shot of Nehalem [45] 1. The evolution of Intel’s basic microarchitectures (11)

2. Intel’s DP Servers

2. Intel’s DP servers (1) Figure: Typical configuration of an early DP-server motherboard based on Intel’s E7500/E7501 (Plumas) chipset. Components shown: Prestonia (P4-based) processors on a 400/533 MHz FSB; the E7500/E7501 MCH (with RASUM) with its SDRAM interfaces (DDR 200/266, registered, ECC optional, 8/12/16 GB); HI 2.0 links to PCI-X bridges; an HI 1.5 link to the ICH3-S (Ultra ATA/100, PCI v.2.2, USB v.1.1, GPIO, LPC with FWH and SIO); SCSI, SATA, GbE, MbE and video controllers.

2. Intel’s DP servers (2) Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset. Components shown: Nocona/Paxville DP (P4-based) processors on an 800 MHz FSB; the E7520 MCH (with RASUM) with its SDRAM interfaces (DDR 266/333 or DDR2 400, registered, ECC optional, 16/24/32 GB); three PCI E. x8 ports (one usable as 2 x x4), one of them to a PCI-X bridge (PCI-X v.1.0b); an HI 1.5 link to the ICH5R (Ultra ATA/100, PCI v.2.3, USB v.2.0, SATA, AC'97 v.2.3, GPIO, LPC with FWH and SIO); SCSI, GbE, MbE and video controllers.

2. Intel’s DP servers (3) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]. Shown: Paxville DP 2.8 (2 x Irwindale cores, 90 nm).

2. Intel’s DP servers (4) Figure: Genealogy of the Xeon Paxville core
- Nocona (DP enhanced Prescott; Intel’s first 64-bit Xeon): 112 mm², mPGA 604
- Irwindale (DP enhanced Prescott 2M; Nocona with the L2 enlarged to 2 MB): 135 mm², mPGA 604
- Paxville (2 x Irwindale cores): 2 x 135 mm², 2 x 169 mtrs, mPGA 604; used in the Xeon DP 2.8 and the Xeon MP
In contrast: corresponding desktop processors have the LGA 775 socket.

2. Intel’s DP servers (5) Figure: Intel’s Pentium 4 based DC DP server processors [33], [34]. Shown: Paxville DP 2.8 (2 x Irwindale cores, 90 nm); Xeon 5000 (Dempsey) (2 x Cedar Mill cores, 65 nm; Cedar Mill is the 65 nm shrink of the Irwindale).

2. Intel’s DP servers (6) Figure: Intel’s Core2 based DC/QC DP server processors [33], [35], [36]. Shown: Xeon 5100 (Woodcrest), Core2-based, 65 nm; Xeon 5300 (Clovertown), Core2-based, 65 nm, 2 x Xeon 5100.

2. Intel’s DP servers (7) Figure: Intel’s Penryn based QC DP server processor, the Xeon 5400 (Harpertown), 45 nm (Source: Intel)

2. Intel’s DP servers (8) Figure: Contrasting the die shots of the Xeon 5400 and 5300 processors [24]

2. Intel’s DP servers (9) Table: Intel’s DC, QC DP servers
Series | --- (Paxville DP) | 5000 (Dempsey) | 5100 (Woodcrest) | 5200 (Wolfdale) | 5300 (Clovertown) | 5400 (Harpertown)
Dual/Quad-Core | DC | DC | DC | DC | QC | QC
Models | Xeon DP |  |  | E5205/E5260/X5275 | E /X5355 | E5405-E5472, X5450-X5482
Microarchitecture | Pentium 4 | Pentium 4 | Core2 | Penryn | Core2 | Penryn
Cores | 2*Irwindale dies | 2*Cedar Mill dies | Single die | Single die | 2*Woodcrest dies | 2*Penryn dies
Intro. | 10/2005 | 5/2006 | 6/2006 | 11/2007 | 11/2006 | 11/2007
Technology | 90 nm | 65 nm | 65 nm | 45 nm | 65 nm | 45 nm
Die size | 2*135 mm² | 2*81 mm² | 143 mm² |  | 2*143 mm² | 2*107 mm²
Nr. of transistors | 2*169 mtrs | 2*188 mtrs | 291 mtrs |  | 2*291 mtrs | 2*410 mtrs
L2 | 2*2 MB | 2*2 MB | 4 MB | 6 MB | 2*4 MB | 2*6 MB
FSB [MT/s] | 800 | 667/ | / | / | / | /1600
TDP [W] | 135 | 95/130 | 65/80 |  | 80/120 | 80/120/150
Socket | PGA 604 | LGA 771 | LGA 771 | LGA 771 | LGA 771 | LGA 771
The table also has rows for Fc [GHz], EM64T, HT, ED, VT, EIST (5140 or above), La Grande, AMT2 and Flex Migration; their cell values are missing here.

2. Intel’s DP servers (10) Figure: Intel’s future DP server processors [21]. Shown: Gainestown (Q1/2009), Nehalem-based, 45 nm; ??? (Q1/2010?), Westmere-based, 32 nm. Socket 1366. Both are 2-way multithreaded.

2. Intel’s DP servers (11) Figure: Overview of the implementation of Intel’s Tick-Tock model for DP servers [24]
- 5000 (Dempsey): 2x1 C, 2 MB L2/C
- 5100 (Woodcrest): 1x2 C, 4 MB L2/C
- 5300 (Clovertown): 2x2 C, 4 MB L2/C
- 5400 (Harpertown): 2x2 C, 6 MB L2/2C
- 5xxx (Gainestown): 1x4 C, ¼ MB L2/C, 8 MB L3
- 5xxx (???): 1x6 C, ¼ MB L2/C, 12 MB L3

2. Intel’s DP servers (16) Figure: Evolution of Intel’s DP servers
- 7520 (Lindenhurst): two Nocona/Paxville (SC/DC) processors on an 800 MT/s FSB (6.4 GB/s); dual-channel DDR2 at 400 MT/s (6.4 GB/s); 24 PCIe lanes (7.5 GB/s)
- 5000 (Blackford): two Dempsey/Woodcrest/Clovertown processors on two 1066 MT/s FSBs (17.1 GB/s); quad-channel FB-DIMM at 533 MT/s (17.1 GB/s); 24 PCIe lanes (7.5 GB/s)
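The bandwidth labels in this figure can be reproduced with a short calculation. This is a minimal sketch that assumes every FSB and every memory channel is 8 bytes (64 bits) wide, an assumption that is not stated on the slide; the function name is illustrative.

```python
def peak_gb_per_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s for `channels` parallel buses or memory channels.

    Assumes each bus moves `bytes_per_transfer` bytes per transfer
    (8 bytes = 64 bits, an assumption for both the FSB and a memory channel).
    """
    return channels * mt_per_s * bytes_per_transfer / 1000.0


# 7520 (Lindenhurst): one shared 800 MT/s FSB, dual-channel DDR2-400
lindenhurst_fsb = peak_gb_per_s(channels=1, mt_per_s=800)
lindenhurst_mem = peak_gb_per_s(channels=2, mt_per_s=400)

# 5000 (Blackford): two 1066 MT/s FSBs, quad-channel FB-DIMM at 533 MT/s
blackford_fsb = peak_gb_per_s(channels=2, mt_per_s=1066)
blackford_mem = peak_gb_per_s(channels=4, mt_per_s=533)

for name, fsb, mem in [
    ("Lindenhurst", lindenhurst_fsb, lindenhurst_mem),
    ("Blackford", blackford_fsb, blackford_mem),
]:
    print(f"{name}: FSB {fsb:.1f} GB/s, memory {mem:.1f} GB/s")
```

In both generations the peak memory bandwidth equals the aggregate FSB bandwidth, which suggests the platforms were balanced so that neither side is the obvious bottleneck.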

2. Intel’s DP servers (12) Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP cores
- Pentium4-based (90/65 nm): Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
DP chipsets
- 7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB

2. Intel’s DP servers (13) Figure: Evolution of Intel’s DP servers
- 7520 (Lindenhurst): two Nocona/Paxville (SC/DC) processors on an 800 MT/s FSB (6.4 GB/s); dual-channel DDR2 at 400 MT/s (6.4 GB/s); 24 PCIe lanes (7.5 GB/s)

2. Intel’s DP servers (14) Figure: Typical configuration of an advanced early DP-server motherboard based on Intel’s E7520 (Lindenhurst) chipset (the figure of slide (2) repeated)

2. Intel’s DP servers (15) Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP cores
- Pentium4-based (90/65 nm): Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
- Pentium4/Core2-based (65 nm): Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA 771
- Pentium4/Core2-based (65 nm): Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA 771
- Pentium4/Core2-based (65 nm): Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA 771
DP chipsets
- 7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
- 5000P (Blackford), 6/2006 (Bensley platform): 2 x FSB at 1066 MT/s, 4 x FB-DIMM (DDR2), 64 GB
- 5000V/Z (Blackford V/Z), 6/2006 (Bensley platform): 2 x FSB at 1066 MT/s, 2 x FB-DIMM (DDR2), 16 GB

2. Intel’s DP servers (17) Figure: Intel’s Bensley platform [30] (actually the block diagram of Tyan’s S5370 DP server)

2. Intel’s DP servers (18) Figure: Bensley DP motherboard with the 5000 (Blackford) chipset (Supermicro X7DB8+) for the Xeon 5000 DC/QC DP processor families [7]. Labels: Xeon DC/QC (5000 DC, 5100 DC, 5300 QC); 5000P; ESB2; FB-DIMM DDR2, 64 GB.

Table: Latency and bandwidth scaling of the Intel 5000 platform (2006) vs the earlier generation (2004) [1] 2. Intel’s DP servers (19)

2. Intel’s DP servers (20) Figure: Intel’s late Pentium4 based and subsequent DP server platforms
DP cores
- Pentium4-based (90/65 nm): Xeon DP 2.8 (Paxville DP), DC, 10/2005: 90 nm/2*169 mtrs, 2*2 MB L2, 800 MT/s, PGA 604
- Pentium4/Core2-based (65 nm): Xeon 5000 (Dempsey), DC, 5/2006: 65 nm/2*188 mtrs, 2*2 MB L2, 667/1066 MT/s, LGA 771
- Pentium4/Core2-based (65 nm): Xeon 5100 (Woodcrest), DC, 6/2006: 65 nm/291 mtrs, 4 MB L2, 667/1066 MT/s, LGA 771
- Pentium4/Core2-based (65 nm): Xeon 5300 (Clovertown), QC, 11/2006: 65 nm/2*291 mtrs, 2*4 MB L2, 667/1066 MT/s, LGA 771
- Penryn-based (45 nm): Xeon 5200 (Wolfdale), DC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA 771
- Penryn-based (45 nm): Xeon 5400 (Harpertown), QC, 11/2007: 45 nm/850 mtrs, 2*6 MB L2, 1066/1333 MT/s, LGA 771
DP chipsets
- 7520 (Lindenhurst), 2004: 800 MT/s, 2 x DDR/DDR2, 16 GB
- 5000P (Blackford), 6/2006 (Bensley platform): 2 x FSB at 1066 MT/s, 4 x FB-DIMM (DDR2), 64 GB
- 5000V/Z (Blackford V/Z), 6/2006 (Bensley platform): 2 x FSB at 1066 MT/s, 2 x FB-DIMM (DDR2), 16 GB
- 5100 (San Clemente), 2007 (Cranberry Lake platform): 2 x FSB at 1333/1066 MT/s, 2 x DDR2, 32/48 GB

2. Intel’s DP servers (21) Figure: The Cranberry Lake platform [19] Xeon 5400 (QC) Xeon 5200 (DC) 5100 chipset

2. Intel’s DP servers (22) Figure (labels only): two Nehalem QC processors; Tylersburg; PCI Express Gen 2; DMI; 1066 MT/s, 17.1 GB/s

2. Intel’s DP servers (23) Figure: Intel’s forthcoming Nehalem-based DP server system architecture [31]. Labels: QuickPath Interconnect; integrated memory controller.

3. Intel’s MP servers

3. Intel’s MP servers (1) Figure: Intel’s Pentium4 based Xeon MP processors [17], [18]. Shown: Potomac and Paxville MP (7000), 90 nm; Tulsa (7100), 65 nm. CDM: Cedar Mill core (65 nm shrink of the Irwindale core).

3. Intel’s MP servers (2) Figure: Intel’s Core2/Penryn based Xeon MP processors [19], [20]. Shown: Tigerton DC (7200) and Tigerton QC (7300), Core2 based, 65 nm; Dunnington (7400), Penryn based, 45 nm.

3. Intel’s MP servers (3) Table: Dual- and Quad-Core Xeon MP lines
Series | 7000 (Paxville MP) | 7100 (Tulsa) | 7200 (Tigerton DC) | 7300 (Tigerton QC) | 7400 (Dunnington QC) | 7400 (Dunnington 6C)
Dual/Quad-Core | DC | DC | 2xSC | 2xDC | QC | 6C
Models |  | M-7140M / 7110N-7150N | E7210/E7220 | E7310/E7320/E7330/E7340/X7350 | E7420-E7440 | E7450/X7460
Microarchitecture | Netburst | Netburst | Core 2 | Core 2 | Penryn | Penryn
Cores | 2xIrwindale dies | Cedar Mill-based single die | 2xSC Woodcrest dies | 2xWoodcrest dies |  |
Intro. | 11/2005 | 8/2006 | 9/2007 | 9/2007 | 9/2008 | 9/2008
Technology | 90 nm | 65 nm | 65 nm | 65 nm | 45 nm | 45 nm
Die size | 2*135 mm² |  | 2*143 mm² | 2*143 mm² |  |
Nr. of transistors | 2*169 mtrs | 1328 mtrs | 2*291 mtrs | 2*291 mtrs | 1900 mtrs | 1900 mtrs
L2 | 2*1/2 MB (1) | 2*1 MB | 2*4 MB | 2*2/2*2/2*3/2*4/2*4 MB | 3*2 MB | 3*3 MB
L3 | --- | 4/8/16 MB | --- | --- | 8/12/16 MB | 12/16 MB
Socket | mPGA604 | mPGA604 | mPGA604 | mPGA604 | mPGA604 | mPGA604
The table also has rows for Fc [GHz], FSB [MT/s], TDP [W], EM64T, HT, ED, VT, EIST, La Grande and AMT2; their cell values are missing or truncated here.
(1) Concerning the L2 cache size, there is a contradiction in Intel’s documentation: according to the data sheets, models of the 7000 series include 1 or 2 MB L2 caches, whereas the comparison charts show 1 MB L2 caches for all models.

3. Intel’s MP servers (4) Figure: Intel’s Nehalem based MP server processor [21]

3. Intel’s MP servers (5) Figure: Overview of the implementation of Intel’s Tick-Tock model for MP servers [24]
TICK: Pentium 4 (Prescott), 90 nm
- (Potomac): 1x1 C, 8 MB/C
TOCK: Pentium 4 (Irwindale), 90 nm
- 7000 (Paxville MP): 2x1 C, ½ MB/C
- (Cranford): 1x1 C, 1 MB/C
Subsequent processors
- 7100 (Tulsa): 2x1 C, 1 MB L2/C, 16 MB L3
- 7200 (Tigerton DC): 1x2 C, 4 MB L2/C
- 7300 (Tigerton QC): 2x2 C, 4 MB L2/C
- 7400 (Dunnington): 1x6 C, 3 MB L2/2C, 16 MB L3
- 7xxx (Beckton): 1x8 C, ¼ MB L2/C, 24 MB L3

3. Intel’s MP servers (6) Table: Overview of Intel’s DP and MP server processors
Core/technology | DP server processors | MP server processors
Pentium 4, 65 nm | 5000 (Dempsey): 2x1 C, 2 MB L2/C | 7100 (Tulsa): 2x1 C, 1 MB L2/C, 16 MB L3
Core2, 65 nm | 5100 (Woodcrest): 1x2 C, 4 MB L2/C; 5300 (Clovertown): 2x2 C, 4 MB L2/C | 7200 (Tigerton DC): 1x2 C, 4 MB L2/C; 7300 (Tigerton QC): 2x2 C, 4 MB L2/C
Penryn, 45 nm | 5400 (Harpertown): 2x2 C, 6 MB L2/2C | 7400 (Dunnington): 1x6 C, 3 MB L2/2C, 16 MB L3
Nehalem, 45 nm | 5xxx (Gainestown): 1x4 C, ¼ MB L2/C, 8 MB L3 | 7xxx (Beckton): 1x8 C, ¼ MB L2/C, 24 MB L3
Westmere, 32 nm | 5xxx (???): 1x6 C, ¼ MB L2/C, 12 MB L3 |

3. Intel’s MP servers (7) Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem). Shown: preceding NBs with Xeon MP (1) SC processors; typically HI 1.5 (266 MB/s).
(1) Xeon MP before Potomac

3. Intel’s MP servers (5) Figure: Overview of the implementation of Intel’s Tick-Tock model for MP servers [24] (repeated)
TICK: Pentium 4 (Prescott), 90 nm
- (Potomac): 1x1 C, 8 MB/C
TOCK: Pentium 4 (Irwindale), 90 nm
- 7000 (Paxville MP): 2x1 C, ½ MB/C
- (Cranford): 1x1 C, 1 MB/C
Subsequent processors
- 7100 (Tulsa): 2x1 C, 1 MB L2/C, 16 MB L3
- 7200 (Tigerton DC): 1x2 C, 4 MB L2/C
- 7300 (Tigerton QC): 2x2 C, 4 MB L2/C
- 7400 (Dunnington): 1x6 C, 3 MB L2/2C, 16 MB L3
- 7xxx (Beckton): 1x8 C, ¼ MB L2/C, 24 MB L3

3. Intel’s MP servers (8) Figure: Former Pentium II/III MP system architecture [32]

3. Intel’s MP servers (9) Figure: Intel’s Xeon-based MP server platforms
MP cores
- P4-based/90 nm: Xeon MP (Potomac), SC, 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- P4-based/90 nm: Xeon 7000 (Paxville MP), DC, 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, 800/667 MT/s, mPGA 604
- P4-based/65 nm: Xeon 7100 (Tulsa), DC, 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
MP chipsets (Truland platform)
- 8500 (Twin Castle), 3/2005: 2 x FSB at 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 8501 (?), 4/2006: 2 x FSB at 800 MT/s, 4 x XMB (2 x DDR2), 32 GB

3. Intel’s MP servers (10) Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1) SC processors; typically HI 1.5 (266 MB/s)
- 8500/8501 (Twin Castle), Truland platform: four Potomac (2)/Paxville MP (3) SC/DC processors; XMBs; PCIe lanes + HI 1.5 (266 MT/s)
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 also supports Cranford (SC) and Tulsa (DC)

3. Intel’s MP servers (11) Figure: Intel’s 8501 chipset for MP servers (4/2006) [4]
- Processors: Xeon DC MP 7000 (4/2005) or later DC/QC MP 7000 processors
- 8501 North Bridge
- IMI (Independent Memory Interface): serial link, 5.33 GB/s inbound and 2.67 GB/s outbound bandwidth simultaneously
- XMB (eXternal Memory Bridge): intelligent memory controller with dual memory channels, DDR 266/333/400, 4 DIMMs per channel

3. Intel’s MP servers (12) Figure: Quad socket Intel E8501 chipset based motherboard (Supermicro X6QT8) for the Xeon 7000/7100 DC MP processor families [7]. Labels: Xeon DC 7000/7100; E8501 NB; ICH5R SB; FB-DIMM DDR2, 64 GB.

3. Intel’s MP servers (13) Figure: Bandwidth bottlenecks in Intel’s 8501 MP server platform [2]

3. Intel’s MP servers (14) Figure: Intel’s Xeon-based MP server platforms
MP cores
- P4-based/90 nm: Xeon MP (Potomac), SC, 3/2005: 90 nm/675 mtrs, 1 MB L2, 8/4 MB L3, 667 MT/s, mPGA 604
- P4-based/90 nm: Xeon 7000 (Paxville MP), DC, 11/2005: 90 nm/2x169 mtrs, 2x1 (2) MB L2, 800/667 MT/s, mPGA 604
- P4-based/65 nm: Xeon 7100 (Tulsa), DC, 8/2006: 65 nm/1328 mtrs, 2x1 MB L2, 16/8/4 MB L3, 800/667 MT/s, mPGA 604
- Core2-based/65 nm: Xeon 7200 (Tigerton DC), 9/2007: 65 nm/2x291 mtrs, 2x4 MB L2, mPGA 604
- Core2-based/65 nm: Xeon 7300 (Tigerton QC), 9/2007: 65 nm/2x291 mtrs, 2x(4/3/2) MB L2, mPGA 604
- Core2-based/45 nm: Xeon 7400 (Dunnington 6C), 9/2008: 45 nm/1900 mtrs, 9/6 MB L2, 16/12/8 MB L3, mPGA 604
MP chipsets
- 8500 (Twin Castle), 3/2005 (Truland platform): 2 x FSB at 667 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 8501 (?), 4/2006 (Truland platform): 2 x FSB at 800 MT/s, 4 x XMB (2 x DDR2), 32 GB
- 7300 (Clarksboro), 9/2007 (Caneland platform): 4 x FSB at 1066 MT/s, 4 x FB-DIMM (DDR2), 512 GB

3. Intel’s MP servers (15) Figure: Evolution of Intel’s Xeon MP-based system architecture (until the appearance of Nehalem)
- Preceding NBs: Xeon MP (1) SC processors; typically HI 1.5 (266 MB/s)
- 8500/8501 (Twin Castle), Truland platform: four Potomac (2)/Paxville MP (3) SC/DC processors; XMBs; 28 PCIe lanes + HI 1.5
- 7300 (Clarksboro), Caneland platform: four Tigerton (QC/DC) or Dunnington (6C/QC/DC) processors; FB-DIMM (DDR2); 8 PCI-E lanes + ESI
Additional rate labels in the figure: 266 MT/s, 7 GT/s, 2 GT/s, 1 GT/s.
(1) Xeon MP before Potomac
(2) First x86-64 MP processor
(3) The 8500 also supports Cranford (SC) and Tulsa (DC)

Figure: Intel’s four socket 7300 (Caneland) platform, based on the 7300 (Clarksboro) chipset for the Xeon 7200/7300 DC/QC MP families (9/2007) [6] FB-DIMM up to 512 GB 7200 (Tigerton DC, Core2), DC Xeon 7300 (Tigerton QC, Core2), QC 3. Intel’s MP servers (16)

3. Intel’s MP servers (17) Figure: Caneland MP motherboard with the 7300 (Clarksboro) chipset (Supermicro X7QC3) for the Xeon 7200/7300 DC/QC MP processor families [7]. Labels: Xeon 7200 DC/7300 QC (Tigerton); 7300 NB; ESB2 SB; FB-DIMM DDR2, 192 GB; ATI ES1000 graphics with 32 MB video memory.

Figure: Performance comparison of the Caneland platform with a quad core Xeon (7300 family) vs the Bensley platform with a dual core Xeon 7140M [13] 3. Intel’s MP servers (18)

3. Intel’s MP servers (19) Figure: Intel’s Nehalem based MP server. Labels: Beckton 8C processors; QPI (QuickPath Interconnect) links; 4 x FB-DIMM.

3. Intel’s MP servers (20) Figure: Intel’s Nehalem based MP server system architecture [22]. Labels: QPI; FB-DIMM (DDR2).

4. AMD’s servers

4. AMD’s servers (1) Table: Overview of AMD’s Opteron DP processors
- Opteron (Sledgehammer), 4/03, K8, SC/ST: 193 mm², 82-89 W; L2: 1 MB; HT 1.0, 0.8 GHz
- Opteron (Troy), 2/05, K8, SC/ST: 90 nm, 114 mm², 95 W; L2: 1 MB; HT 1.0, 1.0 GHz
- Opteron (Italy), 5/05, K8, DC/ST: 90 nm, 199 mm², 95 W; L2: 2*1 MB; HT 1.0
- Opteron 22xx HE (Santa Rosa), 8/06, K8, DC/ST: 90 nm, 230 mm², 95/110 W; L2: 2*1 MB; HT 1.0
- Opteron (Barcelona), 9/07-4/08, K10, QC/ST: 65 nm, 285 mm², 95 W; L2: 512 KB/C; L3: 2 MB; HT 3.0, 1.0 GHz
- Opteron (Shanghai), 11/08, K10, QC/ST: 45 nm, 243 mm², 75 W; L2: 512 KB/C; L3: 3 MB; HT 3.0

4. AMD’s servers (2) Table: Overview of AMD’s Opteron MP processors
- Opteron (Sledgehammer), 6/03-11/03, K8, SC/ST: 193 mm², 82-89 W; L2: 1 MB; 3*HT 1.0, 0.8 GHz
- Opteron (Athens), 12/04-8/05, K8, SC/ST: 90 nm, 115 mm², 85-93 W; L2: 1 MB; 3*HT 1.0, 1.0 GHz
- Opteron (Egypt), 4/05, K8, DC/ST: 90 nm, 199 mm², 95 W; L2: 2 MB; 3*HT 1.0
- Opteron 82xx (Santa Rosa), 8/06-8/07, K8, DC/ST: 90 nm, 220 mm², 95/119 W; L2: 2 MB/C; 3*HT 1.0
- Opteron (Barcelona), 9/07, K10, QC/ST: 65 nm, 285 mm², 84/95 W; L2: 512 KB/C; L3: 2 MB; 4*HT 3.0, 1 GHz
- Opteron (Shanghai), 11/08, K10, QC/ST: 45 nm, 243 mm², 75 W; L2: 512 KB/C; L3: 3 MB; 4*HT 3.0

4. AMD’s servers (3) Figure: AMD’s Direct Connect Architecture [41]
AMD Direct Connect Architecture:
- Integrated memory controller
- Serial HyperTransport links
Introduced in 2003 along with the x86-64 ISA extension (Intel: 2008 with Nehalem).
Remark: 3 HT 1.0 links at introduction, 4 HT 3.0 links with K10 (Barcelona).

4. AMD’s servers (4) Use of available HyperTransport links [44]
- UPs: each link supports connections to I/O devices.
- DPs: two links support connections to I/O devices; any one of the three links may connect to another DP or MP processor.
- MPs: each link supports connections to I/O devices or other DP or MP processors.

4. AMD’s servers (5) Figure: 2P and 4P server architectures based on AMD’s Direct Connect Architecture [42], [43]. Labels: AMD Opteron processors; DDR2; HT; I/O (PCI, PCI-X, PCI Express).

Figure: Advantages of AMD’s Direct Connect server architecture [2] 4. AMD’s servers (6)

Figure: Block diagram of a DP QC motherboard (Asus KFSN4-DRE/SAS) for the AMD Opteron 2300 QC family [10] 4. AMD’s servers (7)

4. AMD’s servers (8) Figure: DP motherboard for the AMD Opteron 2300 QC family (Asus KFSN4-DRE/SAS) [10]. Labels: Opteron 2300 QC DP (Barcelona); nForce 2200 chipset; DDR2, 64 GB.

4. AMD’s servers (9) Figure: Block diagram of a 4P QC motherboard for AMD’s Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10]

4. AMD’s servers (10) Figure: 4-socket motherboard for the AMD Opteron 8000 DC/QC families (ASUS KFN5-Q/SAS) [10]. Labels: Opteron 8300 QC MP (Barcelona); nForce 3600 chipset; DDR2, 64 GB.

UP: Opteron 100/1000, DP: Opteron 200/2000 MP: Opteron 800/8000 Figure: Basic structure of the DC Opteron families [8] 4. AMD’s servers (11)

Figure: Block diagram of Barcelona (K10) vs K8 [46] (K10) 4. AMD’s servers (12)

4. AMD’s servers (13) 4 HT 3.0 links: four links make it possible to build fully connected 4P systems in which each processor uses a separate I/O hub.

4. AMD’s servers (14) Figure: Possible use of Barcelona’s four HT 3.0 links [47]

4. AMD’s servers (15) 4 HT 3.0 links [46]
- Four links make it possible to build fully connected 4P systems in which each processor uses a separate I/O hub.
- The HT 3.0 protocol allows each 16-bit link to be split into two 8-bit wide links. This feature can be used to build fully connected 8P systems with 8-bit wide links.
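The link counting behind these two statements can be sketched as follows. In a fully connected system every processor needs one link to each of the other processors; the sketch additionally assumes one link per processor is kept for I/O, which the slide states for the 4P case only. The function names are illustrative.

```python
def links_needed(sockets: int, io_links: int = 1) -> int:
    """Links each processor needs in a fully connected system.

    One link to every other socket, plus `io_links` links reserved for I/O
    (one I/O link per processor is an assumption for the 8P case).
    """
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return (sockets - 1) + io_links


def can_fully_connect(sockets: int, links_per_socket: int, io_links: int = 1) -> bool:
    """True if `links_per_socket` links suffice for full connectivity plus I/O."""
    return links_needed(sockets, io_links) <= links_per_socket


def split_links(links_16bit: int) -> int:
    """Number of 8-bit links obtained by splitting every 16-bit link in two."""
    return 2 * links_16bit


barcelona_links = 4  # four 16-bit HT 3.0 links per processor
print("4P, 16-bit links:", can_fully_connect(4, barcelona_links))
print("8P, 16-bit links:", can_fully_connect(8, barcelona_links))
print("8P, split 8-bit links:", can_fully_connect(8, split_links(barcelona_links)))
```

With four unsplit links a 4P system uses three links for its peers and one for I/O; an 8P system needs seven peer links per processor, which only fits once the four links are split into eight 8-bit links.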

4. AMD’s servers (16) Figure: Possible use of Barcelona’s four HT 3.0 links [47]

4. AMD’s servers (17) Novel features of HT 3.0 links, such as higher speed or splitting a 16-bit HT link into two 8-bit links, can be utilized only with a new platform. Current platforms (Socket F with available chipsets) support only 3 HT 1.1 links at 2 GT/s [46].

4. AMD’s servers (18) Figure: Cache architecture of the QC Barcelona [25]

4. AMD’s servers (19) Figure: Die shot and floor plan of Barcelona [27]

4. AMD’s servers (20) The Barcelona (and also the Phenom) processors had a bug in their TLB (Translation Lookaside Buffer) design [40]. AMD reworked both chips and provided a new stepping.

4. AMD’s servers (21) Figure: Cache architectures of AMD’s QC Barcelona (65 nm) and Shanghai (45 nm) processors [25], [26]

4. AMD’s servers (22) Figure: Shanghai’s new features vs Barcelona [37]

4. AMD’s servers (23) Figure: Die shot and floor plan of Shanghai [37]

4. AMD’s servers (24) Figure: Die shot of Shanghai [29]. Pin-to-pin compatible with Barcelona; 6 MB shared L3.

4. AMD’s servers (25) Table: First introduced Shanghai based Opteron DP and MP models [38]
AMD Shanghai Overview
Model | CPU Clock | MC Clock | Part Number | Price
Opteron 2384 |  | 2.2 GHz | OS2384WAL4DGI | $989
Opteron 2382 |  | 2.2 GHz | OS2382WAL4DGI | $873
Opteron 2380 |  | 2.0 GHz | OS2380WAL4DGI | $698
Opteron 2378 |  | 2.0 GHz | OS2378WAL4DGI | $523
Opteron 2376 |  | 2.0 GHz | OS2376WAL4DGI | $377
Opteron 8384 |  | 2.2 GHz | OS8384WAL4DGI | $2149
Opteron 8382 |  | 2.2 GHz | OS8382WAL4DGI | $1865
Opteron 8380 |  | 2.0 GHz | OS8380WAL4DGI | $1514
Opteron 8378 |  | 2.0 GHz | OS8378WAL4DGI | $1165

4. AMD’s servers (26) Figure: AMD’s roadmap for server processors and platforms [37]

References
[1]: Radhakrisnan S., Sundaram C. and Cheng K., “The Blackford Northbridge Chipset for the Intel 5000,” IEEE Micro, March/April 2007
[2]: Next-Generation AMD Opteron Processor with Direct Connect Architecture – 4P Server Comparison, _PID_41461.pdf
[3]: Intel® 5000P/5000V/5000Z Chipset Memory Controller Hub (MCH) – Datasheet, Sept.
[4]: Intel® E8501 Chipset North Bridge (NB) Datasheet, May 2006
[5]: Conway P. & Hughes B., “The AMD Opteron Northbridge Architecture,” IEEE Micro, March/April 2007
[6]: Intel® 7300 Chipset Memory Controller Hub (MCH) – Datasheet, Sept. 2007
[7]: Supermicro Motherboards
[8]: Sander B., “AMD Microprocessor Technologies,” 2006
[9]: AMD Quad FX Platform with Dual Socket Direct Connect (DSDC) Architecture
[10]: Asustek motherboards
[11]: Kanter D., “A Preview of Intel's Bensley Platform (Part I),” Real World Technologies, Aug. 2005
[12]: Kanter D., “A Preview of Intel's Bensley Platform (Part II),” Real World Technologies, Nov. 2005
[13]: Quad-Core Intel® Xeon® Processor 7300 Series Product Brief, Intel, Nov.
[14]: “AMD Shows Off More Quad-Core Server Processors Benchmark,” X-bit labs, Nov.
[15]: AMD, Nov.
[16]: Rusu S., “A Dual-Core Multi-Threaded Xeon Processor with 16 MB L3 Cache,” Intel, 2006
[17]: Goto H., Intel Processors, PC Watch, March
[18]: Gilbert J. D., Hunt S., Gunadi D., Srinivas G., “The Tulsa Processor,” Hot Chips 18, 2006
[19]: Goto H., IDF 2007 Spring, PC Watch, April
[20]: Hruska J., “Details slip on upcoming Intel Dunnington six-core processor,” Ars Technica, February 26, 2008, upcoming-intel-dunnington-six-core-processor.html
[21]: Goto H., 32 nm Westmere arrives in , PC Watch, March
[22]: Singhal R., “Next Generation Intel Microarchitecture (Nehalem) Family: Architecture Insight and Power Management,” IDF Taipei, Oct. 2008, -Taipei_TPTS001_100.pdf
[23]: Smith S. L., “45 nm Product Press Briefing,” IDF Fall 2007, ftp://download.intel.com/pressroom/kits/events/idffall_2007/BriefingSmith45nm.pdf
[24]: Bryant D., “Intel Hitting on All Cylinders,” UBS Conf., Nov. 2007, aa5a-0c46e8a1a76d/UBSConfNov2007Bryant.pdf
[25]: Barcelona's Innovative Architecture Is Driven by a New Shared Cache
[26]: Larger L3 cache in Shanghai, Nov., AMD
[27]: Shimpi A. L., “Barcelona Architecture: AMD on the Counterattack,” March, Anandtech
[28]: Rivas M., “Roadmap update,” 2007 Financial Analyst Day, Dec. 2007, AMD
[29]: Scansen D., “Under the Hood: AMD’s Shanghai marks move to 45 nm node,” EE Times, Nov.
[30]: 2-way Intel Dempsey/Woodcrest CPU Bensley Server Platform, Tyan
[31]: Gelsinger P. P., “Intel Architecture Press Briefing,” 17 March 2008
[32]: Mueller S., Soper M. E., Sosinsky B., Server Chipsets, Jun 12, 2006
[33]: Goto H., IDF, Aug.
[34]: TechChannel, fk=432919&id=il
[35]: Intel quadcore Xeon 5300 review, Nov., Hardware.Info, _review
[36]: Wasson S., Intel's Woodcrest processor previewed, The Bensley server platform debuts, May 23, 2006, The Tech Report
[37]: Enderle R., AMD Shanghai “We are back!”, TGDaily, November 13, 2008
[38]: Clark J. & Whitehead R., “AMD Shanghai Launch - Database Testing,” Anandtech, Nov.
[39]: Chiappetta M., AMD Barcelona Architecture Launch: Native Quad-Core, Hothardware, Sept. 10, 2007, QuadCore/
[40]: Hachman M., “AMD Phenom, Barcelona Chips Hit By Lock-up Bug,” ExtremeTech, Dec.
[41]: AMD Opteron™ Processor for Servers and Workstations, html
[42]: AMD Opteron Processor with Direct Connect Architecture, 2P Server Power Savings Comparison, AMD
[43]: AMD Opteron Processor with Direct Connect Architecture, 4P Server Power Savings Comparison, AMD
[44]: AMD Opteron Product Data Sheet, AMD
[45]: Images, Xtreview
[46]: Kanter D., “Inside Barcelona: AMD's Next Generation,” Real World Tech., May
[47]: Kanter D., “AMD's K8L and 4x4 Preview,” Real World Tech., June