Multi-core Processing: The Past and The Future
Amir Moghimi, ASIC Course, UT ECE



The Past
Instruction Level Parallelism (ILP) Enhanced Processors
Wide Dynamic Execution [1] with techniques such as:
  - speculative execution (using branch prediction)
  - out-of-order execution (using register renaming and reservation stations)
  - superscalar issue (using multiple-issue instruction cache and reorder buffer)
e.g. Intel P6 micro-architecture, used in the Pentium® Pro, Pentium® II and Pentium® III processors

ILP Limitations
Window size is limited [2] because of:
  - 2450 comparisons for register dependency detection among 50 instructions in one clock cycle!
  - A branch instruction every 5 instructions on average
  - Imperfect branch prediction
  - The serial nature of applications with true data dependencies
So how can we use the huge amount of silicon that becomes available every year and a half?
Use multiple cores on a single die.
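The figure of 2450 comparisons can be reproduced with a short calculation. The sketch below assumes that each instruction has two register source operands and that each source is compared against the destination register of every earlier instruction in the window. That model and the function name are assumptions made for illustration; the slide does not state them.

```python
def dependency_comparisons(window_size: int, sources_per_instr: int = 2) -> int:
    """Count register comparisons needed to detect dependences in a window.

    Assumption (not stated on the slide): instruction i compares each of its
    source registers against the destination of every earlier instruction.
    """
    # Instruction number `earlier` (0-based) has `earlier` predecessors.
    return sum(sources_per_instr * earlier for earlier in range(window_size))


if __name__ == "__main__":
    for n in (4, 50, 100):
        print(n, dependency_comparisons(n))
```

Under this model a window of n instructions needs n × (n − 1) comparisons, so 50 instructions give 50 × 49 = 2450. The cost grows quadratically with window size, which is why simply widening the window does not scale.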

Multi-core Basics
A multi-core chip combines two or more independent processing cores on a single die (also known as a Chip Multi-Processor).
Four main questions arise [3]:
  - How is the application developed?
  - How do the cores share data?
  - How do they physically communicate?
  - How scalable is the architecture?

Given Answers
For parallel application development, use the thread concept formerly proposed for discrete multi-processor systems.

Category               Choice                   # of Proc
Communication model    Message passing          8 to 2048
                       Shared address, NUMA     8 to 256
                       Shared address, UMA      2 to 64
Physical connection    Network                  8 to 256
                       Bus                      2 to 36
[3]
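As a minimal illustration of the thread concept in the shared-address model, the sketch below lets several threads add their partial results to one shared variable, protected by a lock. It uses Python's standard threading module; the function name, the thread count and the way the input is split are illustrative choices, not part of the original slides. It demonstrates the programming model only and says nothing about speedup.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum `values` using threads that all update one shared total."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # private work, no sharing needed
        with lock:                # shared data is updated under mutual exclusion
            total += partial

    # Give each thread an interleaved slice of the input.
    threads = [
        threading.Thread(target=worker, args=(values[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait until every thread has finished
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

In a message-passing model the workers would instead send their partial sums to one collector, with no variable visible to more than one worker.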

Chip Level Multi-threading
Implemented in superscalar processors before multi-core chips were introduced.
Multi-threading methods:
  - Fine-grained
  - Coarse-grained
  - Simultaneous MT (SMT), e.g. Intel Hyper-Threading Technology

4-way Threading Processor [3]
[Figure: issue slots versus time for Thread A, Thread B, Thread C and Thread D; panels: Coarse MT, Fine MT, SMT]
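The difference between the three schemes can be made concrete with a toy issue-slot model. The sketch below is an illustration under stated assumptions, not a description of any real processor: each thread is a made-up trace in which a positive number is a group of instructions that can issue together and a 0 is one stall cycle. Coarse MT switches threads only when the running thread stalls or finishes and loses one cycle on each switch. Fine MT picks one ready thread per cycle in round-robin order. SMT lets several ready threads share the slots of a single cycle. The function name, the traces and the 4-slot width are all hypothetical.

```python
from collections import deque


def simulate(policy, traces, width=4):
    """Toy issue-slot model; returns (cycles, slots_used).

    traces: one list per thread. A positive entry is the number of
    instructions the thread can issue together; a 0 is one stall cycle.
    policy: "coarse", "fine" or "smt".
    """
    threads = [deque(t) for t in traces]
    n = len(threads)
    current = 0   # pipeline owner (coarse) or start of the rotation (fine, smt)
    cycles = used = 0

    while any(threads):
        cycles += 1
        stalled = [t for t in threads if t and t[0] == 0]
        # Indices of ready threads in rotation order, starting at `current`.
        order = [(current + k) % n for k in range(n)]
        ready = [i for i in order if threads[i] and threads[i][0] > 0]

        if policy == "coarse":
            if ready and ready[0] != current:
                current = ready[0]     # the switch costs this cycle: no issue
                chosen = []
            else:
                chosen = ready[:1]     # keep running the current thread
        elif policy == "fine":
            chosen = ready[:1]         # one thread per cycle, round-robin
            if chosen:
                current = (chosen[0] + 1) % n
        else:                          # "smt": threads share a single cycle
            chosen = ready
            current = (current + 1) % n

        free = width
        for i in chosen:
            k = min(threads[i][0], free)
            threads[i][0] -= k
            free -= k
            used += k
            if threads[i][0] == 0:
                threads[i].popleft()   # the whole group has issued
            if free == 0:
                break

        for t in stalled:
            t.popleft()                # one stall cycle elapses for each stalled thread

    return cycles, used


if __name__ == "__main__":
    traces = [[2, 0, 2], [1, 1], [2, 0, 1], [1, 2]]   # threads A, B, C, D
    for policy in ("coarse", "fine", "smt"):
        cycles, used = simulate(policy, traces)
        print(f"{policy:6s} cycles={cycles:2d} utilization={used / (cycles * 4):.2f}")
```

With these traces every policy issues the same 12 instructions, and only the number of cycles differs. Coarse and fine MT hide stall cycles but still leave slots empty within a cycle, while SMT also fills those slots and therefore finishes in the fewest cycles.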

Now: Multi-core Processing
A simple look at a multi-core processor: the IBM Xenon, used in the Microsoft Xbox 360 [4]. Simple but effective.
[Figure: Core 0, Core 1 and Core 2, each with its own L1D and L1I caches, sharing a 1MB UL2]

A More Powerful Design
STI Cell (used in the PS3) [8]

A Comparison
Sun UltraSPARC T1 [5]
[Figure: UltraSPARC T1 block diagram [3]: 4-way MT SPARC pipes, crossbar, 4-way banked L2, memory controllers, I/O and shared functions]

UltraSPARC T1 vs. Pentium EE [5]

UltraSPARC T1 vs. Pentium EE
Performance comparison running SPEC JBB 2000, TPC-C, TPC-W and XML Test as server benchmarks, and SPEC CPU2000 as the serial benchmark [5].
[Figure: Pentium Extreme Edition die photo [5]]

Now the Trend
  - Intel will deliver a quad-core processor (4 full execution cores) in the first quarter of 2007 [1].
  - “We forecast that more than 85 percent of our server processors and more than 70 percent of our mobile and desktop Pentium® family processor shipments will be multi-core–based by the end of 2006” [7]
  - Intel plans to have 32 cores on a die by 2015 [7].
But do not forget the high power density and memory bandwidth issues!

Thanks Any questions?

References
John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach, 2nd Edition. Morgan Kaufmann.
PSU CSE 431, Mary Jane Irwin, Computer Architecture, Fall 2005, Lecture
http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw09XBoxDesign
5. kalender/23b3b3ee6c4e487e6f4205fa03e783bc.0.0/Niagara_CMT.pdf
6. James Laudon. Performance/Watt: the new server focus. SIGARCH Computer Architecture News 33(4): 5-13 (2005).