Intel Core I7 Pipeline Wei-Tse Sun.

Slides:



Advertisements
Similar presentations
Chapter 3 Embedded Computing in the Emerging Smart Grid Arindam Mukherjee, ValentinaCecchi, Rohith Tenneti, and Aravind Kailas Electrical and Computer.
Advertisements

Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.
Lecture 19: Cache Basics Today’s topics: Out-of-order execution
VLIW Very Large Instruction Word. Introduction Very Long Instruction Word is a concept for processing technology that dates back to the early 1980s. The.
Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.
Data Marshaling for Multi-Core Architectures M. Aater Suleman Onur Mutlu Jose A. Joao Khubaib Yale N. Patt.
THE AMD-K7 TM PROCESSOR Microprocessor Forum 1998 Dirk Meyer.
Superscalar Organization Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by John P. Shen Updated by Mikko Lipasti.
A Scalable Front-End Architecture for Fast Instruction Delivery Paper by: Glenn Reinman, Todd Austin and Brad Calder Presenter: Alexander Choong.
April 27, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 23: Putting it all together: Intel Nehalem Krste Asanovic Electrical.
Power Savings in Embedded Processors through Decode Filter Cache Weiyu Tang, Rajesh Gupta, Alex Nicolau.
SYNAR Systems Networking and Architecture Group CMPT 886: Architecture of Niagara I Processor Dr. Alexandra Fedorova School of Computing Science SFU.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
1 Lecture 10: ILP Innovations Today: handling memory dependences with the LSQ and innovations for each pipeline stage (Section 3.5)
1 Lecture 10: ILP Innovations Today: ILP innovations and SMT (Section 3.5)
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
Time-predictability of a computer system Master project in progress By Wouter van der Put.
CS 152 Computer Architecture and Engineering Lecture 23: Putting it all together: Intel Nehalem Krste Asanovic Electrical Engineering and Computer Sciences.
Computer performance.
Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
University of Michigan Electrical Engineering and Computer Science 1 Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-Thread Applications.
By Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas This paper appears in: Micro, IEEE March/April 2011 (vol. 31 no. 2) pp 마이크로 프로세서.
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.
Nicolas Tjioe CSE 520 Wednesday 11/12/2008 Hyper-Threading in NetBurst Microarchitecture David Koufaty Deborah T. Marr Intel Published by the IEEE Computer.
JPCM - JDC121 JPCM. Agenda JPCM - JDC122 3 Software performance is Better Performance tuning requires accurate Measurements. JPCM - JDC124 Software.
Comparing Intel’s Core with AMD's K8 Microarchitecture IS 3313 December 14 th.
COMP25212 CPU Multi Threading Learning Outcomes: to be able to: –Describe the motivation for multithread support in CPU hardware –To distinguish the benefits.
Dynamic Pipelines. Interstage Buffers Superscalar Pipeline Stages In Program Order In Program Order Out of Order.
The original MIPS I CPU ISA has been extended forward three times The practical result is that a processor implementing MIPS IV is also able to run MIPS.
SIMULTANEOUS MULTITHREADING Ting Liu Liu Ren Hua Zhong.
Spring 2003CSE P5481 Advanced Caching Techniques Approaches to improving memory system performance eliminate memory operations decrease the number of misses.
CMP/CMT Scaling of SPECjbb2005 on UltraSPARC T1 (Niagara) Dimitris Kaseridis and Lizy K. John The University of Texas at Austin Laboratory for Computer.
Fetch Directed Prefetching - a Study
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.
Advanced Computer Architecture pg 1 Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8) Henk Corporaal
Computer Structure 2015 – Intel ® Core TM μArch 1 Computer Structure Multi-Threading Lihu Rappoport and Adi Yoaz.
On-chip Parallelism Alvin R. Lebeck CPS 220/ECE 252.
1 Lecture 5a: CPU architecture 101 boris.
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
Visit for more Learning Resources
Microbenchmarking the GT200 GPU
Prof. Onur Mutlu Carnegie Mellon University
Simultaneous Multithreading
Multi-core processors
Computer Structure Multi-Threading
The University of Adelaide, School of Computer Science
Architecture & Organization 1
Architecture Background
Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)
Flow Path Model of Superscalars
Advantages of Dynamic Scheduling
Drinking from the Firehose Decode in the Mill™ CPU Architecture
Accelerating Dependent Cache Misses with an Enhanced Memory Controller
Architecture & Organization 1
Levels of Parallelism within a Single Processor
CMPT 886: Computer Architecture Primer
Out-of-Order Commit Processor
Coe818 Advanced Computer Architecture
Lecture 20: OOO, Memory Hierarchy
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011
Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)
15-740/ Computer Architecture Lecture 14: Prefetching
Lecture: Cache Hierarchies
Levels of Parallelism within a Single Processor
Chip&Core Architecture
Lecture 10: ILP Innovations
Lecture 9: ILP Innovations
What Are Performance Counters?
Presentation transcript:

Intel Core I7 Pipeline Wei-Tse Sun

Core I7 Haswell Launched last year quad-core processor

Pipeline cycle, latency how many instructions we can execute at the same time? branch prediction cache-handle cache misses

Single Thread Pipeline Section working-sharing Step 2 and 3 are independent and can be performed in parallel.

For-loop parallel execution Loop work-sharing The loop can be repeat if iterations have no dependencies between each other.

Out-of-Order Execution core i7 Branch Prediction Icache Tag TLB Cache Tag Icache Data Decode Cache Data Allocation Out-of-Order Execution 1 2 3 4 5 6 7 Improves performance and saves wasted work Handle cache misses in parallel to hide latency Deeper buffers -more instruction parallelism -more resource when running a single thread Power down execution units when not in use More load/store bandwidth -better cache line split latency

Interconnect Improved load balancing to System Agent More efficiently shares resources, low latency Better scheduling Lower power, better efficiency System Agent Core LLC

Cache hierarchy Pipelines handle data & non-data accesses independently. load balancing to system agent. deeper pending queues, better scheduling. large eDRAM Cache -high throughput and low latency

Reference HASWELL: THE FOURTH-GENERATION INTEL CORE PROCESSOR- published by the IEEE Computer Society 2014 4th Generation Intel Core Processor, codenamed Haswell- Per Hammarlund , August 2013 Performance of Parallel Algorithms Using OpenMP- Roman Mego and Tomas Fryza, IEEE 2013 Improving Energy Efficiency through Parallelisation and Vectorisation on Intel core i5 and i7 Processor- Juan M. Cebrian, Jan Christian Meyer and Lasse Natvig, IEEE 2013

Thank you