ECE 721, Spring 2019 Prof. Eric Rotenberg.

Canonical Pipeline Stages
  Frontend stages: instructions are physically in-order
    Fetch, Decode, Rename, Dispatch
  Backend stages: instructions are physically out-of-order
    Schedule, Register Read, Execute, Writeback
  Retire: instructions are processed in-order from the Active List

Sub-pipelining
  Each canonical pipeline stage may be sub-pipelined deeper

Challenges for Wide Superscalar
  Fetch width limiters:
    Taken branches
    Multiple branch prediction
  Dependencies within rename bundle
  Large, highly-ported structures:
    Ports scale linearly with superscalar width
    Sizes scale superlinearly with width, to expose sufficient ILP
    IQ, LQ, and SQ are also associative structures with other specialized logic (e.g., select logic)
  Bypass complexity
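
The "dependencies within rename bundle" item can be illustrated with a small sequential model. This is only a sketch: real hardware renames the whole bundle in one cycle using comparators between each source and the destinations of earlier instructions in the same bundle, whereas the loop below gets the same result by processing instructions one at a time. The names `rename_bundle`, `rmt`, and `free_list` are hypothetical and not taken from the slides.

```python
def rename_bundle(bundle, rmt, free_list):
    """Rename a bundle of (dst, src1, src2) logical-register tuples.

    rmt: dict mapping logical register -> physical register (updated in place).
    free_list: list of free physical registers (taken from the front).
    Returns a list of (phys_dst, phys_src1, phys_src2) tuples.
    """
    renamed = []
    for dst, src1, src2 in bundle:
        # Read source mappings before updating the destination, so a source
        # sees producers earlier in the bundle but not this instruction itself.
        p1, p2 = rmt[src1], rmt[src2]
        pd = free_list.pop(0)
        rmt[dst] = pd
        renamed.append((pd, p1, p2))
    return renamed


# The second instruction reads r1, which the first instruction writes,
# so its source must map to the newly allocated p10, not the old p1.
rmt = {"r1": "p1", "r2": "p2", "r3": "p3"}
bundle = [("r1", "r2", "r3"), ("r3", "r1", "r2")]
result = rename_bundle(bundle, rmt, ["p10", "p11"])
print(result)
```

In a parallel implementation every source is compared against every earlier destination in the bundle, so the comparator count grows roughly quadratically with bundle width, which is one reason this item limits wide superscalar designs.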

Challenges for Latency-Tolerant Superscalar
  Very large PRF and Active List (ROB) needed for memory latency tolerance.
  Example: 4-wide superscalar, 200-cycle miss penalty
    PRF / Active List must have ~800 entries (4 instructions/cycle × 200 cycles) to not block on a miss in the last-level cache.
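
The ~800-entry figure comes from multiplying the machine width by the miss penalty. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it assumes the frontend keeps delivering a full-width group of instructions on every cycle of the miss, which makes the result an upper-bound estimate rather than a measured requirement.

```python
def entries_to_cover_miss(width: int, miss_penalty_cycles: int) -> int:
    """Estimate the in-flight instruction capacity needed so that dispatch
    does not stall while a single long-latency miss is outstanding.

    Assumes `width` instructions enter the window on every cycle of the miss.
    """
    return width * miss_penalty_cycles


# The slide's example: 4-wide superscalar, 200-cycle miss penalty.
print(entries_to_cover_miss(4, 200))  # 800
```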