© Wen-mei Hwu and S. J. Patel, 2005 ECE 511, University of Illinois Lecture 4: Microarchitecture: Overview and General Trends.



Outline
Microarchitecture
 – State of the art
 – Future trends, Ronen et al.

Microarchitecture: Overview
[Diagram: Instruction Supply, Execution Mechanism, Data Supply]
"Highest performance means generating the highest instruction and data bandwidth you can, and effectively consuming that bandwidth in execution." (paraphrased from M. Alsup, AMD Fellow)

Microarchitecture, 1990
Short pipelines
On-chip I and D caches, blocking
Simple prediction

Microarchitecture, 2000
Mechanisms to find parallel instructions
 – dynamic scheduling
 – static scheduling
On-chip cache hierarchies, with non-blocking, higher-bandwidth caches
Sophisticated branch prediction

Future Microarchitecture: One Perspective
[Diagram: Instruction Supply, Execution Mechanism, Data Supply]
"Highest performance means generating the highest instruction and data bandwidth you can, and effectively consuming that bandwidth in execution." (paraphrased from M. Alsup, AMD Fellow)

Where are we headed? [with influences from Ronen et al.]
More ILP: even wider, deeper
 – enabling technology: speculation, predication, compiler transformations, binary re-optimization, complexity-effective design
Multithreading
 – enabling technology: speculation, subordinate threads, discovery of thread-level parallelism
Chip Multiprocessors
 – enabling technology: speculation, discovery of thread-level, coarse-grained parallelism

More ILP
Instruction Supply
 – Branches, cache misses, partial fetches
Data Supply
 – Higher bandwidth, lower latency, memory ordering, non-blocking caches
Execution
 – Reduction of redundant work, design complexity and partitioning
Tolerating Latency
 – Can some things just take a long time?

Multithreading [Burton Smith, 1978]
[Figure: pipeline stages Fetch, Execute, WriteBack]
This is a snapshot of the pipeline during a single cycle. Each color represents instructions from a different thread. B. Smith’s original concept was for a single-wide pipeline, but it extends naturally to a multiple-issue pipeline.
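The pipeline snapshot described above can be illustrated with a minimal sketch. It assumes a single-wide, three-stage pipeline and strict round-robin thread selection at fetch; the function name `barrel_pipeline` and the round-robin policy are illustrative assumptions, not details taken from the lecture.

```python
from collections import deque

STAGES = ("Fetch", "Execute", "WriteBack")


def barrel_pipeline(num_threads, num_cycles):
    """Return one snapshot per cycle of a single-wide, three-stage pipeline.

    Each snapshot maps a stage name to the id of the thread whose
    instruction occupies that stage, or None while the pipeline fills.
    Assumption: a different thread is selected at fetch every cycle.
    """
    # Leftmost entry is Fetch, rightmost is WriteBack.
    pipeline = deque([None] * len(STAGES), maxlen=len(STAGES))
    snapshots = []
    for cycle in range(num_cycles):
        # appendleft on a full deque drops the rightmost entry, which
        # models the oldest instruction leaving WriteBack.
        pipeline.appendleft(cycle % num_threads)
        snapshots.append(dict(zip(STAGES, pipeline)))
    return snapshots


if __name__ == "__main__":
    for cycle, snap in enumerate(barrel_pipeline(num_threads=3, num_cycles=5)):
        print(cycle, snap)
```

With at least as many threads as stages, every stage in a full pipeline holds an instruction from a different thread, so no instruction depends on another that is still in flight.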

Simultaneous Multithreading [W. Yamamoto, 1994 / D. Tullsen, 1995]
[Figure: pipeline stages Fetch, Execute, WriteBack]

Simultaneous Multithreading, possible implementation
[Figure: Front End, Back End]
Intel Hyperthreading in Pentium 4 [HotChips’14] is the first realization, with two threads
Small ISA register file minimizes effect of replication
Replicated retirement logic
Minimal hardware overhead but major increase in verification cost
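A toy model may help show why issuing from several threads in the same cycle raises utilization. This is a sketch under strong assumptions: each thread's ready-instruction count per cycle is given as input, unissued instructions do not carry over to later cycles, and threads are served in a fixed priority order. The function names are hypothetical and the numbers are made up for illustration.

```python
def issue_single_thread(ready, width):
    """Issue slots used per cycle when only one thread may issue."""
    return [min(width, r) for r in ready]


def issue_smt(ready_per_thread, width):
    """Issue slots granted per cycle when all threads share the issue width.

    ready_per_thread[t][c] is the number of ready instructions thread t
    has in cycle c. Returns, for each cycle, a list of slots per thread.
    Assumption: fixed priority order; real designs use smarter policies.
    """
    num_cycles = len(ready_per_thread[0])
    grants = []
    for c in range(num_cycles):
        free = width
        cycle_grant = []
        for ready in ready_per_thread:
            take = min(free, ready[c])  # a thread fills whatever slots remain
            cycle_grant.append(take)
            free -= take
        grants.append(cycle_grant)
    return grants


def utilization(slots_used, width):
    """Fraction of all issue slots that were filled."""
    return sum(slots_used) / (width * len(slots_used))


if __name__ == "__main__":
    width = 4
    thread_a = [1, 0, 2, 4]  # illustrative ready counts, not measured data
    thread_b = [2, 3, 0, 1]
    alone = issue_single_thread(thread_a, width)
    shared = issue_smt([thread_a, thread_b], width)
    print(utilization(alone, width))
    print(utilization([sum(g) for g in shared], width))
```

In this example the second thread fills slots that the first leaves empty, which is the horizontal waste that SMT targets; a fine-grained multithreaded pipeline would still be limited to one thread per cycle.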

Chip Multiprocessor [K. Olukotun, 1996]
[Figure: ProcA, ProcB, ProcC, ProcD; Shared L2 Cache; pipeline stages Fetch, Execute, WriteBack]
A single processor die contains multiple CPUs, all of which share some amount of resources, such as an L2 cache and chip pins.
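The sharing described above can be sketched as a small model. It assumes each CPU has a private first-level cache in front of the shared L2, with unbounded capacity and no coherence protocol; the class `ChipMultiprocessor` and these simplifications are illustrative assumptions, not part of the lecture.

```python
class ChipMultiprocessor:
    """Toy model: several CPUs on one die sharing a single L2 cache."""

    def __init__(self, num_cores):
        # Assumption: one private L1 per core, modeled as a set of addresses.
        self.l1 = [set() for _ in range(num_cores)]
        # The shared resource: one L2 visible to every core.
        self.shared_l2 = set()

    def load(self, core, addr):
        """Return where the load from `core` to `addr` was satisfied."""
        if addr in self.l1[core]:
            return "L1 hit"
        self.l1[core].add(addr)  # fill the private cache on a miss
        if addr in self.shared_l2:
            return "L2 hit"
        self.shared_l2.add(addr)  # fill the shared cache from off-chip
        return "off-chip"


if __name__ == "__main__":
    cmp_chip = ChipMultiprocessor(num_cores=4)
    print(cmp_chip.load(0, 0x100))  # first touch goes through the chip pins
    print(cmp_chip.load(1, 0x100))  # another core finds the line in the shared L2
    print(cmp_chip.load(1, 0x100))  # now resident in that core's private cache
```

Data brought on chip by one CPU becomes visible to the others through the shared L2, whereas in an SMT core the threads would already share the same first-level structures.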

Hardware Accelerators

Existing Solutions …
Intel IXP1200 Network Processor
Philips Nexperia (Viper)
IBM Cell
[Figure labels: ARM, MICROENGINES, ACCESS CTL., MIPS, MPEG, VLIW, VIDEO, MSP]
… what’s next? …

Strategic Memory Data Delivery
Data transfer network managed by memory transfer module (MTM)
 – A smart, global manager
 – Strategic allocation of network bandwidth
 – Has some idea of data priority in the application
 – Scalability challenges exist
Work hand-in-hand with compartmentalization

Adding Point-to-Point Communication
Neighbor-to-neighbor interconnects added
 – Explicitly scheduled communication
 – Tight coupling between processing elements

Discussion/Thought Exercise
What are the essential differences between the SMT model of execution and the CMP model?
 – What resources are shared, and in what manner?
 – What type of data movement exists in one but not the other?
 – What types of applications or situations are the best case for each model?