Presented by : Nasser Hadjloo

Design Considerations
• Instruction-level parallelism.
• Use of cache hierarchies and their management.
• Higher clock speeds.
• The Front Side Bus (FSB).
• Multi-threading.
• Power consumption and heating issues.
• Etc.

Intel Architectures: NetBurst

NetBurst Architecture

Features of NetBurst Architecture
• Hyper-Threading: a single physical processor appears as two logical processors.
• Each logical processor has its own set of registers and its own APIC (Advanced Programmable Interrupt Controller).
• Increases resource utilization and improves performance.
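To make the logical-versus-physical distinction concrete, here is a minimal Python sketch. The helper name `logical_processors` and its default of two hardware threads per core are illustrative assumptions, not part of the original slides; `os.cpu_count()` is the standard-library call that reports how many logical processors the operating system sees.

```python
import os


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors exposed to the OS (hypothetical helper).

    Assumes every core offers the same number of hardware threads;
    Hyper-Threading exposes two per core.
    """
    if physical_cores < 1 or threads_per_core < 1:
        raise ValueError("core and thread counts must be positive")
    return physical_cores * threads_per_core


if __name__ == "__main__":
    # A single Hyper-Threaded processor appears as two logical processors.
    print(logical_processors(1))
    # Logical processors reported on this machine (None if undetermined).
    print(os.cpu_count())
```

The operating system schedules threads onto logical processors, which is why a Hyper-Threaded part shows up with twice as many CPUs as it has physical cores.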

• Rapid Execution Engine: the Arithmetic Logic Units (ALUs) run at twice the processor frequency.
• Basic integer operations execute in half a processor clock tick.
• Provides higher throughput and reduced execution latency.
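The half-tick figure follows directly from the doubled ALU clock. A small worked sketch, assuming one basic integer operation completes per ALU clock; the function name and the 3.0 GHz example value are illustrative only:

```python
def integer_op_latency_ns(core_clock_ghz: float, alu_clock_ratio: float = 2.0) -> float:
    """Time for one basic integer ALU operation, in nanoseconds.

    Assumes one operation completes per ALU clock and that the ALU clock
    is `alu_clock_ratio` times the core clock (2x for the Rapid Execution
    Engine as described above).
    """
    if core_clock_ghz <= 0 or alu_clock_ratio <= 0:
        raise ValueError("clock values must be positive")
    core_period_ns = 1.0 / core_clock_ghz  # duration of one core clock tick
    return core_period_ns / alu_clock_ratio


if __name__ == "__main__":
    # Example: a 3.0 GHz core has a tick of about 0.333 ns,
    # so a basic integer operation takes about half of that.
    print(f"{integer_op_latency_ns(3.0):.4f} ns")
```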

NetBurst Microarchitecture

Design Considerations
• Deeper pipeline (20 stages): a higher branch misprediction penalty, but greater clock speeds and performance.
• Techniques to hide penalties, such as parallel execution, buffering, and speculation.
• Executes instructions dynamically and out of order.
• The performance of a particular code sequence may vary depending on the state the machine was in when that sequence was entered.
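The trade-off between pipeline depth and clock speed can be sketched with a simple cycles-per-instruction model. This is a textbook-style approximation, not Intel's own analysis: it assumes the misprediction penalty roughly equals the pipeline depth, and the branch fraction, misprediction rate, clock speeds and the 10-stage comparison pipeline are all hypothetical values chosen for illustration.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction including branch misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def time_per_instruction_ns(cpi: float, clock_ghz: float) -> float:
    """Average time per instruction: cycles divided by cycles per nanosecond."""
    return cpi / clock_ghz


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    deep = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=20)
    shallow = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=10)
    # Hypothetical clocks: the deeper pipeline is assumed to clock higher.
    print(f"20-stage: CPI {deep:.2f}, {time_per_instruction_ns(deep, 3.8):.3f} ns")
    print(f"10-stage: CPI {shallow:.2f}, {time_per_instruction_ns(shallow, 2.0):.3f} ns")
```

Under these assumptions the deeper pipeline pays more cycles per instruction but can still finish each instruction sooner if the clock gain is large enough.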

Modifications in NetBurst
• Northwood combined an increased cache size, a smaller 130 nm fabrication process, and Hyper-Threading Technology.
• Prescott had a heavily improved branch predictor, introduced the SSE3 SIMD instructions, and implemented Intel 64, Intel's branding for its compatible implementation of x86-64, the 64-bit version of the x86 architecture.
• Smithfield placed two Prescott cores on a single die; the later Presler consists of two Cedar Mill cores on two separate dies.
• But this had problems...

Heading to Core

Core Microarchitecture

Core Microarchitecture

Design Considerations of Core
• L2 control unit (super-queue) = L2 controller (snoop requests) + bus control unit (data and I/O requests to and from the external bus).
• The prefetching unit is extended to handle hardware prefetching separately for each core.
• The shared L2 cache in the Core 2 Duo eliminates on-chip L2-level cache coherence; coherence only has to be maintained between the L1 caches of the two cores.
• Although the Core 2 Duo benefits from on-chip access to the other core's L1 cache, its performance is limited.

Features of Core Architecture
• Multiple cores and hardware virtualization.
• 14-stage pipeline (shorter than NetBurst's).
• Dual-core design with linked L1 caches and a shared L2 cache.
• Macrofusion: two program instructions can be executed as one micro-operation.
• Intel Intelligent Power Capability manages the run-time power consumption of the processor's execution cores.
• Includes advanced power gating: ultra fine-grained control that turns on individual processor logic subsystems only when they are needed.
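Macrofusion can be illustrated with a toy model that works on instruction mnemonics only. It assumes the commonly described case of a compare (`cmp` or `test`) immediately followed by a conditional jump; the function names are hypothetical, and real hardware applies further restrictions that this sketch ignores.

```python
COMPARE_OPS = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """True for conditional jumps such as jne or jl, but not for jmp."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macro_fuse(instructions: list[str]) -> list[str]:
    """Merge each compare + conditional-jump pair into a single entry."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (current in COMPARE_OPS and following is not None
                and is_conditional_jump(following)):
            fused.append(f"{current}+{following}")  # one fused micro-operation
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


if __name__ == "__main__":
    loop_body = ["mov", "add", "cmp", "jne"]
    print(macro_fuse(loop_body))  # four instructions become three entries
```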

Modifications in Core
• Allendale, with 2 MB of L2 cache, offers a smaller die size and therefore greater yields.
• Merom, the first mobile version of the Core 2, puts more emphasis on low power consumption to extend notebook battery life.
• Kentsfield was the first Intel desktop quad-core CPU. It comprises two separate silicon dies (each equivalent to a single Core 2 Duo) on one multi-chip module.
• The Penryn design adds new instructions, including SSE4.1.
• Problem...
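Why a smaller die gives greater yields can be sketched with the simple Poisson yield model, Y = exp(-A * D), where A is the die area and D the defect density. This model is a common approximation brought in here for illustration; the slides do not name one. The die areas, defect density and wafer size below are hypothetical values, and edge loss on the wafer is ignored.

```python
import math


def poisson_yield(die_area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under the Poisson model."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)


def good_dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float,
                        defect_density_per_cm2: float) -> int:
    """Rough count of working dies per wafer (ignores edge loss)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    gross_dies = math.floor(wafer_area / die_area_cm2)
    return math.floor(gross_dies * poisson_yield(die_area_cm2, defect_density_per_cm2))


if __name__ == "__main__":
    # Hypothetical numbers: 30 cm wafer, 0.5 defects per square centimetre.
    for area in (1.4, 1.1):
        print(area, round(poisson_yield(area, 0.5), 3), good_dies_per_wafer(30, area, 0.5))
```

The smaller die wins twice: more dies fit on the wafer, and a larger share of them are free of defects.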

Problem with quad core

Heading to Nehalem

Introduction
• Core i7: Intel's new CPU brand name for the business and high-end consumer markets.
• Core i5: processors intended for the mainstream consumer market.
• Core i3: processors intended for the entry-level consumer market.

Features of Nehalem
• Integrated memory controller
• QuickPath Interconnect
• Advanced Configuration and Power States
• Improvements to the pipeline (L2 branch predictor, renamed return stack buffer, L2 TLB, etc.)
• Hyper-Threading
• SSE4.2 instructions
• Three-level cache hierarchy

Core i7 History
• The line started with the Bloomfield architecture in 2008.
• In 2009 the Lynnfield and Clarksfield models arrived.
• Prior to 2010 all models were quad-core.
• In 2010 the Arrandale (dual-core) models arrived.
• Also in 2010 came the Gulftown (Extreme) models, which have six Hyper-Threaded cores.

Bloomfield
• All models are branded Core i7-9xx and use Socket 1366.
• Includes single-processor servers sold as Xeon 35xx.
• Replaced the Yorkfield processors.
• Uses a different socket from the other Core i CPUs, and from all other 45 nm CPUs.
• On-die memory controller (runs at the uncore clock).
• Uses a single QPI link instead of an FSB.
• Supports the SSE4.1 and SSE4.2 instruction sets.

Bloomfield
• 32 KB L1 instruction cache and 32 KB L1 data cache per core.
• 256 KB L2 cache (combined instruction and data) per core.
• 8 MB inclusive L3 cache (combined instruction and data), shared by all cores.
• Turbo Boost technology allows all active cores to intelligently clock themselves up in steps of 133 MHz over the design clock rate, as long as the CPU's predetermined thermal and electrical requirements are still met.
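The Turbo Boost stepping described above is simple arithmetic. The sketch below uses the 133 MHz step from the slide; the function name, the 3200 MHz base clock and the limit of two steps are illustrative assumptions, and the thermal and electrical checks that gate each step in real hardware are not modelled.

```python
def turbo_clock_mhz(base_mhz: int, steps: int,
                    step_mhz: int = 133, max_steps: int = 2) -> int:
    """Clock after raising the design clock by `steps` Turbo Boost increments.

    `max_steps` is a hypothetical per-model limit; real parts decide how
    many steps are allowed from thermal and electrical headroom.
    """
    if not 0 <= steps <= max_steps:
        raise ValueError(f"steps must be between 0 and {max_steps}")
    return base_mhz + steps * step_mhz


if __name__ == "__main__":
    # Example: a 3200 MHz design clock raised by one and two steps.
    for n in range(3):
        print(n, turbo_clock_mhz(3200, n), "MHz")
```

With a 3200 MHz design clock, two steps give 3466 MHz, in line with an "up to 3.46 GHz" Turbo Boost rating.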

Lynnfield
• Used in the Core i5.
• Has no QPI; instead it connects directly to a southbridge over a 2.5 GT/s Direct Media Interface and to other devices over PCI Express links, using Socket 1156.
• Core i7 processors based on Lynnfield have Hyper-Threading, which is disabled in Lynnfield-based Core i5 processors.

Lynnfield
• Sold as Core i5-7xx, Core i7-8xx or Xeon X34xx.
• Replaced the Penryn-based Yorkfield processors.
• 45 nm.
• Socket 1156, as opposed to Socket 1366.
• Integrates the Direct Media Interface and PCI Express links, which previously required a dedicated northbridge chip (the memory controller hub or I/O hub).

Clarksfield
• The mobile version of Lynnfield, available under the Core i7 Mobile brand.
• Quad-core, 45 nm.
• Integrated PCI Express and DMI links.
• Core i7 7xxQM (6 MB), Core i7 8xxQM (8 MB), Core i7 9xxXM Extreme Edition (8 MB).
• Replaced Penryn-QC.

Arrandale
• The second family of mobile CPUs, covering all Core i7 6xx [UE, LE, E] (4 MB),
• Core i5 5xx [UM, M, E] (3 MB) and Core i5 4xxM (3 MB),
• Core i3 3xxM, Celeron U3xxx (unreleased) and P4xxx (2 MB).
• Integrated graphics processing unit, but only two processor cores.
• 32 nm, dual-core.
• E-series processors are embedded versions with support for PCIe bifurcation and ECC memory.

Clarkdale
• Desktop version of Arrandale, 32 nm.
• Available only as Core i3 and Core i5, and only as dual-core.
• All support Intel's Hyper-Threading (HT).
• Integrated graphics as well as PCI Express and DMI links.
• The Clarkdale processor package contains two dies: the 32 nm processor with the I/O connections, and the 45 nm graphics controller with the memory interface.
• Successor to Wolfdale (45 nm).

Clarkdale
• Used in Intel Core, Pentium and Celeron processors.
• The Core i5 versions generally have all features enabled.
• Only the Core i5-661 lacks Intel VT-d and TXT, like the Core i3, which also does not support Turbo Boost or the AES new instructions.
• Pentium and Celeron versions do not have SMT and use a reduced amount of third-level cache.

Gulftown or Westmere-EP
• The Extreme Edition version of the Core i7, with 6 cores on a 32 nm process (Core i9).
• Gulftown is the first six-core dual-socket processor from Intel.
• Hyper-Threading (for a total of 12 logical threads), 12 MB of cache, Turbo Boost and the Intel QuickPath connection bus.
• Uses the Westmere microarchitecture, a 32 nm shrink of Nehalem.

Gulftown
• 50% higher performance than the Bloomfield Core i7 975.
• Includes Core i7 9xx and Core i7 9xxx [12 MB], Xeon 36xx, Xeon 56xx.
• Socket 1366.

Specification

Nehalem Architecture

Design Considerations
• Hyper-Threading is reintroduced to cater to the increasing number of thread-based applications.
• Cores are placed on a single die to reduce latencies.
• QuickPath Interconnect also helps to reduce latency.
• Each core has its own L1 and L2 caches, backed by a large shared L3 cache, to improve performance.
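One way to see why a large shared L3 helps is the standard average memory access time (AMAT) recurrence, applied level by level. This is a generic textbook model rather than anything stated in the slides, and every latency and miss rate below is a hypothetical value chosen only to show the shape of the calculation.

```python
def average_access_time(levels: list[tuple[float, float]],
                        memory_latency: float) -> float:
    """Average memory access time in cycles.

    `levels` lists (hit_latency_cycles, miss_rate) from L1 outwards.
    Working from the last level inwards: AMAT = hit + miss_rate * next_level.
    """
    amat = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        amat = hit_latency + miss_rate * amat
    return amat


if __name__ == "__main__":
    # Hypothetical figures: (hit latency in cycles, local miss rate).
    two_level = [(4, 0.10), (10, 0.40)]
    three_level = [(4, 0.10), (10, 0.40), (40, 0.50)]
    print("two-level:  ", average_access_time(two_level, memory_latency=200))
    print("three-level:", average_access_time(three_level, memory_latency=200))
```

With these made-up numbers the third level lowers the average access time because half of the L2 misses are caught before they reach main memory.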

Looking forward to Sandy Bridge

What can we expect...
• The Sandy Bridge microchip will have an architecture optimized for 32-nanometer transistors.
• The Sandy Bridge microarchitecture is also said to focus on the connections of the processor core, such as vertical interconnects and multilevel dies.
• Increase in FLOPs by using AVX (Advanced Vector Extensions).
• Haswell, the successor to Sandy Bridge, will be built on 22 nm.
• The tick-tock model works just fine!
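The FLOPs gain from AVX comes from wider vector registers: 256 bits instead of the 128 bits used by SSE. A rough peak-throughput sketch follows, assuming each vector lane can complete one add and one multiply per cycle; that assumption, the function name, and the core count and clock speed are illustrative and not taken from the slides.

```python
def peak_gflops(cores: int, clock_ghz: float, vector_bits: int,
                element_bits: int = 32, ops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak GFLOPS for a given SIMD register width.

    Assumes every lane retires `ops_per_lane_per_cycle` floating-point
    operations each cycle (for example, one add plus one multiply).
    """
    lanes = vector_bits // element_bits
    return cores * clock_ghz * lanes * ops_per_lane_per_cycle


if __name__ == "__main__":
    # Hypothetical 4-core, 3.0 GHz part, single-precision floats.
    print("128-bit SSE:", peak_gflops(4, 3.0, 128))
    print("256-bit AVX:", peak_gflops(4, 3.0, 256))
```

Doubling the register width doubles the lanes per instruction, so the theoretical peak doubles at the same clock and core count.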

Intel Processor Trends

Feature                    NetBurst          Core                          Nehalem
Cache hierarchy            Two-level         Two-level                     Three-level
Second-level cache size    256 KB to 2 MB    1 MB to 12 MB                 >1 MB
Third-level cache size     -                 -                             8 MB
Front side bus (MHz)       400, 533, 800     667, 800, 1066, 1333, 1600    (QPI = 6.4 GT/s)
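The front side bus and QPI figures can be put on a common footing by converting them to peak bandwidth. The sketch below assumes the listed FSB values are millions of transfers per second over a 64-bit (8-byte) bus, and that a QPI link carries 2 bytes of payload per transfer in each of two directions; the function names are hypothetical.

```python
def fsb_bandwidth_gbs(transfers_per_second_millions: float,
                      bus_width_bytes: int = 8) -> float:
    """Peak front side bus bandwidth in GB/s (shared by both directions)."""
    return transfers_per_second_millions * bus_width_bytes / 1000.0


def qpi_bandwidth_gbs(transfer_rate_gts: float, payload_bytes: int = 2,
                      directions: int = 2) -> float:
    """Peak QPI link bandwidth in GB/s, summed over both directions."""
    return transfer_rate_gts * payload_bytes * directions


if __name__ == "__main__":
    print("FSB 1600:", fsb_bandwidth_gbs(1600), "GB/s")
    print("QPI 6.4 GT/s:", qpi_bandwidth_gbs(6.4), "GB/s")
```

Under these assumptions the fastest listed FSB peaks at 12.8 GB/s in total, while a single 6.4 GT/s QPI link offers 12.8 GB/s in each direction.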

Intel Processor Trends

SPEC 2000 benchmark
• (3.0 GHz, Pentium 4 processor with Hyper-Threading Technology). Primary cache: 12k micro-ops I + 8 KB D on chip. Secondary cache: 512 KB (I+D) on chip. Memory: 512 MB.
• (3.80 GHz, Intel Pentium 4 processor 570J). Primary cache: 12k micro-ops I + 16 KB D on chip. Secondary cache: 1 MB (I+D) on chip. Memory: 1 GB.
• ( GHz, Intel(R) Pentium(R) 4 processor). Primary cache: 12k micro-ops I + 16 KB D on chip. Secondary cache: 2 MB (I+D) on chip. Memory: 1 GB.
• Intel(R) Core(TM) 2 Extreme processor X6800 (2.93 GHz, 1066 MHz bus). Primary cache: 32 KB I + 32 KB D per core, on chip. Secondary cache: 4 MB (I+D) per chip, on chip (shared). Memory: 2 GB.

SPEC 2006 benchmark
• 2006: Intel Core 2 Duo E GHz, 1066 MHz bus. Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 4 MB I+D on chip per chip. Memory: 2 GB.
• 2007: Intel Core 2 Extreme QX GHz, 1333 MHz FSB. Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 12 MB I+D on chip per chip, 6 MB shared per 2 cores. Memory: 4 GB.
• 2008: Intel Xeon X GHz. Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 6 MB I+D on chip per chip. Memory: 16 GB.
• 2009: Intel Core i7-965 Extreme Edition, Intel Turbo Boost Technology up to 3.46 GHz. Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 256 KB I+D on chip per core. L3 cache: 8 MB I+D on chip per chip. Memory: 12 GB.

Our Views
• The focus needs to be on a more scalable and robust architecture.
• Implementing 3-D integration.
• How about a 128-bit processor?
• The speed of light problem.
• The end of Moore's Law?
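The speed of light problem can be made concrete by asking how far a signal could travel during one clock cycle. The sketch below uses the speed of light in vacuum as an upper bound; real on-chip signals are slower, and the function name and the clock speeds shown are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact value, in vacuum


def light_distance_per_cycle_cm(clock_ghz: float) -> float:
    """Upper bound on the distance a signal can cover in one clock cycle, in cm."""
    if clock_ghz <= 0:
        raise ValueError("clock speed must be positive")
    cycle_time_s = 1.0 / (clock_ghz * 1e9)
    return SPEED_OF_LIGHT_M_PER_S * cycle_time_s * 100.0


if __name__ == "__main__":
    for ghz in (1.0, 3.0, 10.0):
        print(f"{ghz:>4} GHz: {light_distance_per_cycle_cm(ghz):.2f} cm per cycle")
```

At 3 GHz the bound is roughly 10 cm per cycle, so raising clock speeds shrinks the physical distance a signal can cross in a single cycle.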