

Original Authors: Stefan Rusu, Simon Tam, Harry Muljono, Jason Stinson, David Ayers, Jonathan Chang, Raj Varada, Matt Ratta, Sailesh Kottapalli
Some slides are included from the original paper for educational purposes only.

Outline
Introduction
– Xeon Family
– Xeon in Supercomputing
Overview of Nehalem Architecture
– Pipeline
– Quick Path Interconnect
Nehalem-based Xeon
– Platform Configurations
– Clock Domains
– Clock Skews

Introduction
According to Wikipedia, Xeon is a brand of multiprocessing-capable x86 microprocessors from Intel, mainly targeted at the server, workstation, and embedded system markets.

Xeon Family [2]
Current Xeon generations:
– Xeon 3000: entry and small business, single-processor servers
– Xeon 5000: versatile data center, 1 to 2 processor servers
– Xeon 6000 processor servers
– Xeon 7000: powerful enterprise, 2 to 256 processor servers

Xeon in Supercomputing [3]
Top500.org is an organization that ranks supercomputers around the world by GFLOPS
Xeon owns 64% (391/500) of supercomputers
Chart of Xeon-based systems by generation: Nehalem 45nm, Nehalem 32nm, Core 45nm, Core 65nm (55%, 15%, 26%, 4%)

Overview of Nehalem Architecture [4]
Introduced with the Intel Core i7
Nehalem overall features:
– 2 to 8 cores
– Optional Hyper-Threading
– L1 and L2 cache per core, shared L3
– Integrated Memory Controller
– Quick Path Interconnect
– Optional Turbo Boost
Figure: Nehalem die shot [5]

Overview of Nehalem Architecture [5]
Figure: Nehalem pipeline
– Second level of virtual address translation
– Out-of-order execution, up to 6 instructions per clock

Overview of Nehalem Architecture [4]
QPI and IMC:
– Motivation? High bandwidth demand in multiprocessor systems: processor-to-I/O, processor-to-processor, and processor-to-memory
Figure: Front Side Bus versus Quick Path Interconnect [5]

Overview of Nehalem Architecture [4]
Quick Path Interconnect features:
– Connects a microprocessor to I/O or to another microprocessor
– Point-to-point link: eliminates shared-bus problems
– Up to 25 GB/s (vs. 10 GB/s for the FSB)
– High RAS (reliability, availability, and serviceability)
  – CRC check with no cycle penalty
  – Self-healing link
  – Clock fail-over
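The bandwidth figures above can be reproduced with simple arithmetic. The sketch below is illustrative only: the 6.4 GT/s QPI rate, the 2 bytes of data per transfer in each direction, and the 1333 MT/s, 64-bit FSB are assumed operating points, not values given on the slide.

```python
def link_bandwidth_gb_s(transfers_per_s: float,
                        bytes_per_transfer: int,
                        directions: int = 1) -> float:
    """Peak bandwidth in GB/s = transfer rate x payload width x directions."""
    return transfers_per_s * bytes_per_transfer * directions / 1e9


# Assumed QPI operating point: 6.4 GT/s, 2 data bytes per transfer,
# counted in both directions because the link is full duplex.
qpi_gb_s = link_bandwidth_gb_s(6.4e9, 2, directions=2)

# Assumed FSB operating point: 1333 MT/s on a 64-bit (8-byte) shared bus.
fsb_gb_s = link_bandwidth_gb_s(1333e6, 8)

print(f"QPI: {qpi_gb_s:.1f} GB/s")
print(f"FSB: {fsb_gb_s:.1f} GB/s")
```

Under these assumptions the QPI figure comes to about 25.6 GB/s per link and the FSB figure to roughly 10.7 GB/s shared by every agent on the bus, which is consistent with the rounded numbers on the slide.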

Platform Configuration in Multiprocessor Systems
Figures: 2-processor [1], 4-processor [1], and 8-processor [1] configurations
4 QPI links per CPU
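As a rough illustration of why the link count matters, the sketch below checks whether every processor could reach every other processor over a direct link, given 4 QPI links per CPU. It is pure graph arithmetic under that single assumption; it does not reproduce the actual wiring shown in the figures from [1], and the function names are hypothetical.

```python
from itertools import combinations

QPI_LINKS_PER_CPU = 4  # from the slide: 4 QPI links per CPU


def direct_links(num_cpus: int) -> list[tuple[int, int]]:
    """All CPU pairs that would need a link in a fully connected system."""
    return list(combinations(range(num_cpus), 2))


def fully_connected_fits(num_cpus: int) -> bool:
    """Each CPU needs num_cpus - 1 links to reach every peer directly."""
    return num_cpus - 1 <= QPI_LINKS_PER_CPU


for n in (2, 4, 8):
    needed = n - 1
    if fully_connected_fits(n):
        spare = QPI_LINKS_PER_CPU - needed
        print(f"{n} CPUs: {len(direct_links(n))} links in total, "
              f"{needed} per CPU, {spare} left for I/O or other uses")
    else:
        print(f"{n} CPUs: {needed} links per CPU would be needed, "
              f"so some CPU pairs must communicate over more than one hop")
```

With 4 links per CPU, the 2- and 4-processor cases can be fully connected with links to spare, while an 8-processor system cannot, so some traffic has to cross an intermediate processor.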

Nehalem in Xeon Processor [6] 8-Core Xeon Die-shot

Nehalem in Xeon Processor [1] 8-Core Xeon Floorplan

Clock Domains [1]
3 primary clock domains: core, un-core, and I/O
The system clock (BCLK) runs at 133MHz; a clock buffer interfaces to BCLK and delivers a low-noise reference clock to all 16 PLLs
– This enables an independent clock frequency for the core, which is a multiple of BCLK and tightly synchronized with it
PLLs are controlled by the on-chip PCU (Power Control Unit)
– Control decisions are based on data gathered from sensors
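Because the core clock is a multiple of BCLK, the base and turbo frequencies listed for these processors map onto integer multipliers. The sketch below assumes BCLK is exactly 400/3 MHz (about 133.33 MHz, which the slide rounds to 133MHz); the helper names are hypothetical and the frequencies are sample values from the Xeon tables.

```python
BCLK_MHZ = 400 / 3  # assumed exact value of the "133MHz" reference clock


def core_frequency_ghz(multiplier: int) -> float:
    """Core clock produced by a PLL running at multiplier x BCLK."""
    return multiplier * BCLK_MHZ / 1000


def nearest_multiplier(freq_ghz: float) -> int:
    """Integer BCLK multiplier closest to a quoted core frequency."""
    return round(freq_ghz * 1000 / BCLK_MHZ)


# Sample frequencies taken from the Xeon 7000 table.
for quoted in (1.866, 2.0, 2.266, 2.666):
    m = nearest_multiplier(quoted)
    print(f"{quoted} GHz is about {m} x BCLK = {core_frequency_ghz(m):.3f} GHz")
```

Each step in frequency then corresponds to one BCLK increment, which is why the quoted frequencies fall on values such as 1.866, 2.0, 2.133, and 2.266 GHz.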

Clock Domains [1]
– QPI PLLs adapt the processor-to-processor or processor-to-I/O frequency
– MI PLLs adapt the processor-to-memory frequency

Simulated Un-Core Clock Skew Profile [1]
Simulation based on a 100% layout-extracted model

Future Work

References
[1] Stefan Rusu et al.; 45nm 8-Core Enterprise Xeon® Processor; ISSCC 2009; page
[2]
[3]
[4] Intel Next Generation Microarchitecture (Nehalem) White Paper
[5]
[6] Die-Shot-1.jpg

The End
Any questions?

Overview of Nehalem Architecture [4]
Nehalem core benefits:
– Larger out-of-order window
– Faster handling of branch mispredictions
– More accurate branch prediction: second-level BTB
– Better Hyper-Threading: larger cache and bandwidth
Figure labels: L3 Cache, QPI [6]

Intel Codenames
Intel has historically named integrated circuit (IC) development projects after towns, rivers, or mountains near the Intel facility responsible for the IC.
A codename usually maps to many marketing names.
The latest Intel microprocessor architecture is named Nehalem (after the Nehalem River in Oregon, or possibly the town of Nehalem in Tillamook County, Oregon).

Xeon Family [2]
Xeon 3000 – 45nm technology
Processor Number | Intel® QPI Speed or Front Side Bus | L3 Cache | Base Frequency | Max Turbo Frequency | Power | Cores | Threads
X3480 | | 8MB | 3.06 GHz | 3.73 GHz | 95 W | 4 | 8
X3470 | | 8MB | 2.93 GHz | 3.6 GHz | 95 W | 4 | 8
X3460 | | 8MB | 2.8 GHz | 3.46 GHz | 95 W | 4 | 8
X3450 | | 8MB | 2.66 GHz | 3.2 GHz | 95 W | 4 | 8
X3440 | | 8MB | 2.53 GHz | 2.93 GHz | 95 W | 4 | 8
X3430 | | 8MB | 2.4 GHz | 2.8 GHz | 95 W | 4 | 4
W | GT/s | 8MB | 3.33 GHz | 3.6 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 3.2 GHz | 3.46 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 3.2 GHz | 3.46 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 3.06 GHz | 3.33 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 2.93 GHz | 3.2 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 2.8 GHz | 3.06 GHz | 130 W | 4 | 8
W | GT/s | 8MB | 2.66 GHz | 2.93 GHz | 130 W | 4 | 8
W | GT/s | 4MB | 2.53 GHz | | 130 W | 2 | 2
LC3528 | | 4MB | 1.73 GHz | 2.133 GHz | 35 W | 2 | 4
LC3518 | | 2MB | 1.73 GHz | | 23 W | 1 | 1
L3426 | | 8MB | 1.86 GHz | 3.2 GHz | 45 W | 4 | 8

Xeon Family [2]
Xeon 5000 – 45nm technology
Processor Number | Intel® QPI Speed or Front Side Bus | L3 Cache | Base Frequency | Max Turbo Frequency | Power | Cores | Threads
X | GT/s | 8MB | 2.93 GHz | 3.33 GHz | 95 W | 4 | 8
X | GT/s | 8MB | 2.8 GHz | 3.20 GHz | 95 W | 4 | 8
X | GT/s | 8MB | 2.66 GHz | 3.06 GHz | 95 W | 4 | 8
L | GT/s | 8MB | 2.4 GHz | 2.4 GHz | 60 W | 4 | 8
L | GT/s | 8MB | 2.26 GHz | 2.53 GHz | 60 W | 4 | 8
L | GT/s | 8MB | 2.13 GHz | 2.40 GHz | 60 W | 4 | 8
L | GT/s | 8MB | 2 GHz | 2.40 GHz | 38 W | 2 | 4
L | GT/s | 4MB | 2.13 GHz | N/A | 60 W | 4 | 4
E | GT/s | 8MB | 2.53 GHz | 2.80 GHz | 80 W | 4 | 8
E | GT/s | 8MB | 2.4 GHz | 2.66 GHz | 80 W | 4 | 8
E | GT/s | 8MB | 2.26 GHz | 2.53 GHz | 80 W | 4 | 8
E | GT/s | 4MB | 2.26 GHz | N/A | 80 W | 4 | 4
E | GT/s | 4MB | 2.13 GHz | N/A | 80 W | 4 | 4
E | GT/s | 4MB | 2 GHz | N/A | 80 W | 4 | 4
E | GT/s | 4MB | 2 GHz | N/A | 80 W | 2 | 2
E | GT/s | 4MB | 1.86 GHz | N/A | 80 W | 2 | 2

Xeon Family [2]
Xeon 6000 – 45nm technology
Processor Number | Intel® QPI Speed or Front Side Bus | L3 Cache | Base Frequency | Max Turbo Frequency | Power | Cores | Threads
X | GT/s | 18MB | 2 GHz | 2.4 GHz | 130 W | 8 | 16
E | GT/s | 18MB | 2 GHz | 2.266 GHz | 105 W | 6 | 12
E | GT/s | 12MB | 1.73 GHz | 1.733 GHz | 105 W | 4 | 8

Xeon Family [2]
Xeon 7000 – 45nm technology
Processor Number | Intel® QPI Speed or Front Side Bus | L3 Cache | Base Frequency | Max Turbo Frequency | Power | Cores | Threads
X | GT/s | 24MB | 2.266 GHz | 2.666 GHz | 130 W | 8 | 16
X | GT/s | 18MB | 2 GHz | 2.4 GHz | 130 W | 8 | 16
X | GT/s | 18MB | 2.666 GHz | 2.8 GHz | 130 W | 6 | 6
X | MHz | 16MB | 2.66 GHz | N/A | 130 W | 6 | 6
L | GT/s | 24MB | 1.866 GHz | 2.533 GHz | 95 W | 8 | 16
L | GT/s | 18MB | 1.866 GHz | 2.533 GHz | 95 W | 6 | 12
L | MHz | 12MB | 2.13 GHz | N/A | 65 W | 6 | 6
L | MHz | 12MB | 2.13 GHz | N/A | 50 W | 4 | 4
E | GT/s | 18MB | 2 GHz | 2.266 GHz | 105 W | 6 | 12
E | GT/s | 12MB | 1.866 GHz | 2.133 GHz | 105 W | 6 | 12
E | GT/s | 18MB | 1.866 GHz | | 95 W | 4 | 8
E | MHz | 12MB | 2.4 GHz | N/A | 90 W | 6 | 6
E | MHz | 16MB | 2.4 GHz | N/A | 90 W | 4 | 4
E | MHz | 12MB | 2.13 GHz | N/A | 90 W | 4 | 4
E | MHz | 8MB | 2.13 GHz | N/A | 90 W | 4 | 4