COMPUTER ARCHITECTURE Assoc.Prof. Stasys Maciulevičius Computer Dept. stasys.maciulevicius@ktu.lt
Development of processor architecture
The main processor design and manufacturing companies, which create new processors for various market segments, seek to:
- enhance performance; to reach this goal they:
  - increase clock frequency
  - use a variety of microarchitecture enhancements
  - move to multi-core microarchitectures
- reduce energy consumption
2014 ©S.Maciulevičius
Word length: from 32 to 64 bits
- A 32-bit processor can operate on integers with up to 2^32 (about 4.3 billion) distinct values
- A 64-bit processor raises this to 2^64, or about 18.4 quintillion (18,400,000,000,000,000,000)
- 32-bit processors and operating systems can support up to 4 gigabytes of memory, of which typically only 2 gigabytes are available to applications
- For CAD/CAM and scientific calculations this is no longer enough
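The figures on this slide follow directly from powers of two. The sketch below recomputes them; the helper name `distinct_values` and the assumption of byte addressing are illustrative and not part of the original slides.

```python
# Number of distinct values representable in a word of the given width.
def distinct_values(bits: int) -> int:
    return 2 ** bits

GIB = 2 ** 30  # bytes in one gibibyte

values_32 = distinct_values(32)  # about 4.3 billion
values_64 = distinct_values(64)  # about 18.4 quintillion

# Assumption: memory is byte-addressed, so a 32-bit address
# can reach 2**32 bytes, which is 4 GiB.
addressable_gib_32 = distinct_values(32) // GIB

print(f"32-bit values: {values_32:,}")
print(f"64-bit values: {values_64:,}")
print(f"32-bit address space: {addressable_gib_32} GiB")
```

The same arithmetic shows why a 4 GB ceiling is inherent to 32-bit addresses rather than a limit of any particular operating system.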
Data in processors
Data type              | Register set | Functional unit | x86 word (bits) | x86-64 word (bits)
Integers               | GPR          | ALU             | 32              | 64
Addresses              | GPR          | ALU or AGU      | 32              | 64
Floating point numbers | FPR          | FPU             |                 |
Vectors                | VR           | VPU             | 128             | 128
As can be seen, only the lengths of integers and addresses differ.
x86-64 specification
- The x86-64 specification was designed by Advanced Micro Devices (AMD) as an extension of the x86 instruction set
- It allows far larger virtual and physical address spaces than x86, doubles the width of the integer registers from 32 to 64 bits, increases the number of integer registers, and provides other enhancements
Intel® EM64T
- Intel released its own “64-bit technology” to compete with AMD’s 64-bit technology
- Intel EM64T enhances system performance by enabling access to more than 4 GB of memory
- Intel EM64T supports:
  - 64-bit virtual address space
  - 64-bit pointers
  - 64-bit general purpose registers
  - 64-bit integers
[Figure: EM64T (and x86-64) registers]
Multi-core processors
- Raising the clock frequency to increase performance is becoming more and more difficult
- Instead, companies have focused their efforts on increasing parallelism: they developed dual-core processors and later moved to multi-core processors
- Intel, AMD, Motorola, Sun and other companies follow this path
[Figure: Intel Core microarchitecture summary]
Intel Nehalem microarchitecture
- Nehalem is the codename for an Intel processor microarchitecture, the successor to the Core microarchitecture
- The first processor released with the Nehalem architecture was the desktop Core i7, released in November 2008
- Nehalem differs radically from Netburst
- Nehalem-based microprocessors run at higher clock speeds and are more energy-efficient than their Penryn predecessors
- Hyper-threading is reintroduced, along with a smaller L2 cache and an enlarged L3 cache that is shared by all cores
Intel Nehalem microarchitecture
- 64 KB L1 cache per core (32 KB L1 data + 32 KB L1 instruction) and 256 KB L2 cache per core
- 4–12 MB L3 cache
- Native (all processor cores on a single die) quad- and octa-core processors
- Intel QuickPath Interconnect in high-end models, replacing the legacy front side bus
- Integration of PCI Express and DMI into the processor, replacing the northbridge
- Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM2 channels
- Second-generation Intel Virtualization Technology
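To see how the per-core and shared caches add up, the sketch below totals the on-die cache using the per-core sizes listed above. The specific configuration (four cores, 8 MB of L3) is an assumed example within the 4–12 MB range, and the function name `total_cache_bytes` is hypothetical.

```python
KB = 1024
MB = 1024 * KB

def total_cache_bytes(cores: int, l3_bytes: int) -> int:
    """Total on-die cache for a Nehalem-style hierarchy."""
    l1_per_core = 32 * KB + 32 * KB   # L1 data + L1 instruction
    l2_per_core = 256 * KB            # private L2
    # L1 and L2 are replicated per core; L3 is shared by all cores.
    return cores * (l1_per_core + l2_per_core) + l3_bytes

# Assumed example configuration: quad-core with 8 MB shared L3.
total = total_cache_bytes(cores=4, l3_bytes=8 * MB)
print(f"Total cache: {total / MB:.2f} MB")
```

In this configuration the shared L3 dominates: the private L1 and L2 caches of all four cores together contribute only 1.25 MB.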
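Some Intel Nehalem processors (with Core 2 Quad for comparison)
                    | Core i7 (LGA 1366) | Core i7 (LGA 1156) | Core i5           | Core 2 Quad
Processor Interface | LGA 1366           | LGA 1156           | LGA 1156          | LGA 775
Number of Cores     | 4                  | 4                  | 4                 | 4
Turbo Boost         | Yes                | Yes                | Yes               | No
Hyper-Threading     |                    |                    |                   |
L1 Cache            | 32KB/32KB per core | 32KB/32KB per core | 32KB/32KB per core | 32KB/32KB per core
L2 Cache            | 256KB per core     | 256KB per core     | 256KB per core    | Up to 12MB shared
L3 Cache            | 8MB shared         | 8MB shared         | 8MB shared        | -
Memory Channels     | 3                  | 2                  | 2                 | 2
Max. Memory Rate    | DDR3-1066          | DDR3-1333          | DDR3-1333         | DDR3-1600
Chipset             | X58                | P55                | P55               | X48
Price               | $284-$999          | $285-$555          | $199              | $163-$316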
Intel’s strategy
Intel introduces new microprocessor architectures every 2 years as part of its “Tick-Tock” strategy (figure).
Intel’s Sandy Bridge
- Sandy Bridge is the codename for a microarchitecture that Intel began developing in 2005 to replace the Nehalem microarchitecture
- It was designed for the full range of applications, from mobile devices, laptops and desktop computers to large enterprise servers
- Intel demonstrated a Sandy Bridge processor in 2009 and released the first products based on the architecture in January 2011
Intel’s Sandy Bridge
Sandy Bridge main features:
- 32 nm fabrication process
- CPU clock rate 1.4–3.4 GHz, graphics clock rate 350–850 MHz (depending on model)
- Turbo Boost 2.0 technology raises the clock rates up to 3.8 GHz and 1350 MHz respectively
- 32 kB data + 32 kB instruction L1 cache (3 clocks) and 256 kB L2 cache (8 clocks) per core
- Shared L3 cache of 3–8 MB (25 clocks)
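The clock counts quoted for each cache level can be turned into time by dividing by the clock frequency. The sketch below does this under two assumptions that the slide does not state: that the clock counts are access latencies, and that all three levels are clocked at the top base frequency of 3.4 GHz. The function name `cycles_to_ns` is illustrative.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f nanoseconds.
    """
    return cycles / clock_ghz

# Clock counts as quoted on the slide (assumed to be access latencies).
LATENCY_CYCLES = {"L1": 3, "L2": 8, "L3": 25}

CLOCK_GHZ = 3.4  # assumption: every cache level runs at the core clock

for level, cycles in LATENCY_CYCLES.items():
    print(f"{level}: {cycles} clocks = {cycles_to_ns(cycles, CLOCK_GHZ):.2f} ns")
```

At a lower clock the same cycle counts translate into proportionally longer times, which is one reason latencies are usually quoted in clocks rather than nanoseconds.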
Intel’s Sandy Bridge
- Sandy Bridge has an integrated graphics controller and a specialized accelerator that significantly speeds up multimedia content processing
- Sandy Bridge supports DirectX 10.1 and OpenCL 1.1; its performance far exceeds that of the first-generation Core
- Advanced Vector Extensions (AVX): a 256-bit instruction set with wider vectors, new extensible syntax and rich functionality
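What "wider vectors" means in practice is that one instruction processes more elements at once. The sketch below counts the elements (lanes) that fit into a 256-bit AVX register and, for comparison, into a 128-bit register as used by the earlier SSE extensions. The helper `lanes` is a hypothetical name used only for this illustration.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits

for name, width in (("128-bit (SSE)", 128), ("256-bit (AVX)", 256)):
    print(f"{name}: {lanes(width, 32)} single-precision floats, "
          f"{lanes(width, 64)} double-precision floats")
```

Doubling the register width doubles the number of lanes, so a single AVX instruction can handle 8 single-precision or 4 double-precision values.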
Intel’s Sandy Bridge
- Decoded micro-operation cache and an enlarged, optimized branch predictor
- 256-bit/cycle ring bus interconnect between cores, graphics, cache and System Agent Domain
- Intel Quick Sync Video: hardware support for video encoding and decoding
- Up to 8 physical cores, or 16 logical cores through Hyper-threading
- TDP of desktop CPUs is 35–95 W; for mobile CPUs it is 17–55 W
[Figure: Intel’s Sandy Bridge caches]
[Figure: Sandy Bridge microarchitecture]
[Figure: Sandy Bridge: L0 cache]
Sandy Bridge: ring bus
- Each core, each slice of L3 (LLC) cache, the on-die GPU, the media engine and the system agent all have a stop on the ring bus
- The bus is made up of four independent rings: a data ring, a request ring, an acknowledge ring and a snoop ring
- Each stop for each ring can accept 32 bytes of data per clock
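The figure of 32 bytes per clock gives a peak bandwidth per ring stop once a clock frequency is chosen. The sketch below works this out; the 3.4 GHz ring clock is an assumption made for illustration (the slide does not state the ring frequency), and `ring_stop_bandwidth_gb_s` is a hypothetical helper name.

```python
BYTES_PER_CLOCK = 32  # per stop, as stated for the ring bus

def ring_stop_bandwidth_gb_s(clock_ghz: float) -> float:
    """Peak bandwidth of one ring stop in GB/s (decimal gigabytes).

    bytes/clock * 10**9 clocks/s per GHz / 10**9 bytes per GB
    simplifies to bytes/clock * clock in GHz.
    """
    return BYTES_PER_CLOCK * clock_ghz

# Assumption: the ring is clocked at 3.4 GHz.
bandwidth = ring_stop_bandwidth_gb_s(3.4)
print(f"Peak per-stop bandwidth: {bandwidth:.1f} GB/s")

# 32 bytes per clock is the same as 256 bits per clock.
print(f"Ring width: {BYTES_PER_CLOCK * 8} bits per clock")
```

This is a theoretical peak for a single stop; sustained bandwidth depends on traffic from the other stops sharing the ring.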
Intel’s Ivy Bridge
- Ivy Bridge is the first chip to use Intel's 22nm tri-gate transistors, which help scale frequency and reduce power consumption
- At a high level Ivy Bridge looks a lot like Sandy Bridge
- Ivy Bridge is considered a tick from the CPU perspective but a tock from the GPU perspective
[Figures: Intel’s Ivy Bridge]
Intel’s Ivy Bridge: configurable TDP
Ivy Bridge introduces configurable TDP, which allows the platform to increase the CPU's TDP if given additional cooling, or to decrease the TDP to fit into a smaller form factor
Ivy Bridge Configurable TDP
               | cTDP Up | Nominal | cTDP Down
Ivy Bridge XE  | 65W     | 55W     | 45W
Ivy Bridge ULV | 33W     | 17W     | 13W
Intel’s Ivy Bridge
- Sandy Bridge brought a completely redesigned GPU core onto the processor die itself
- With Ivy Bridge the GPU remains on die, but in this generation it grows more than the CPU does
- The Ivy Bridge GPU adds support for OpenCL 1.1, DirectX 11 and OpenGL 3.1
[Figure: From Nehalem to Haswell]
Intel’s Haswell
- Haswell is the codename for a processor microarchitecture that succeeds the Ivy Bridge architecture
- Using the 22 nm process, Intel is expected to release CPUs based on this microarchitecture around June 2, 2013
- With Haswell, Intel will introduce a new low-power processor designed for convertible or 'hybrid' Ultrabooks
Intel’s Haswell
The Haswell architecture is specifically designed to optimize power savings and performance benefits
Haswell is expected to launch in three major forms:
- Desktop version (LGA1150 socket): Haswell-DT
- Mobile/Laptop version (PGA socket): Haswell-MB
- BGA version:
  - 47W and 57W TDP classes: Haswell-H (for "All-in-one" systems, Mini-ITX form factor motherboards, and other small footprint formats)
  - 13.5W and 15W TDP classes (SoC): Haswell-ULT (for Intel's UltraBook platform)
  - 10W TDP class (SoC): Haswell-ULX (for tablets and certain UltraBook-class implementations)
Intel’s Haswell
Performance compared to Ivy Bridge:
- Twice the vector processing performance
- At least 10% higher sequential CPU performance (8 execution ports per core versus 6)
- Up to double the performance of the integrated GPU
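One way to read the "twice the vector processing performance" claim is as peak floating-point operations per cycle per core. The sketch below is a rough model under assumptions that are not stated on the slides: Ivy Bridge is taken to have one 256-bit add unit and one 256-bit multiply unit, and Haswell two 256-bit fused multiply-add (FMA) units, each FMA counting as two operations per lane. The function and parameter names are hypothetical.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         units: int, flops_per_lane: int) -> int:
    """Peak floating-point operations per cycle for one core."""
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_lane

# Assumption: Ivy Bridge issues one 256-bit add and one 256-bit multiply
# per cycle, i.e. two units doing one operation per lane each.
ivy_bridge = peak_flops_per_cycle(256, 64, units=2, flops_per_lane=1)

# Assumption: Haswell has two 256-bit FMA units; one FMA performs a
# multiply and an add, i.e. two operations per lane.
haswell = peak_flops_per_cycle(256, 64, units=2, flops_per_lane=2)

print(f"Ivy Bridge: {ivy_bridge} double-precision FLOPs/cycle")
print(f"Haswell:    {haswell} double-precision FLOPs/cycle")
print(f"Ratio:      {haswell / ivy_bridge:.0f}x")
```

Under these assumptions the doubling comes entirely from FMA, so code that cannot pair its multiplications with additions would see less than the peak gain.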
[Figure: Intel’s Haswell]
[Figure: CPU Idle Power]
[Figure: Intel’s Haswell]
[Figure: Intel Haswell]
[Figure: AVX2 – FMA]
Some models
CPU           | Freq.   | Turbo Boost | Cache | Cores / Threads | TDP
Core i7-4770K | 3.5 GHz | 3.9 GHz     | 8 MB  | 4 / 8           | 84 W
Core i7-4770  | 3.4 GHz | 3.9 GHz     | 8 MB  | 4 / 8           | 84 W
Core i7-4770S | 3.1 GHz | 3.9 GHz     | 8 MB  | 4 / 8           | 65 W
Core i7-4770T | 2.5 GHz | 3.7 GHz     | 8 MB  | 4 / 8           | 45 W
Core i7-4765T | 2.0 GHz | 3.0 GHz     | 8 MB  | 4 / 8           | 35 W