Microarchitecture of Superscalars (5) Dynamic Instruction Issue. Dezső Sima, Fall 2007 (Ver. 2.0). © Dezső Sima, 2007.



Overview
1 The principle of dynamic instruction issue
2 Design space
  2.1 Overview
  2.2 Types of issue buffers
  2.3 Operand fetch policies
3 Principle of operation of dynamic instruction issue
  3.1 Dispatch bound operand fetching
  3.2 Issue bound operand fetching
4 Implementation of dynamic instruction issue in superscalars
  4.1 The introduction of dynamic instruction issue
  4.2 Basic implementation schemes
5 Case examples

1. Principle of dynamic instruction issue (1) Aim: To eliminate the issue bottleneck of early (first generation) superscalars

1. Principle of dynamic instruction issue (2)
The issue bottleneck
Figure 1.1: The principle of dynamic instruction issue. (a) Simplified structure of the microarchitecture assuming unbuffered issue: I-cache, I-buffer with an instruction window (3), decode/check/issue, EUs. (b) The issue process: dependent instructions block instruction issue.

1. Principle of dynamic instruction issue (3)
Eliminating the issue bottleneck: dynamic instruction issue (shelving, buffered issue)
Figure 1.2: Principle of dynamic instruction issue. (a) Simplified structure of the microarchitecture assuming buffered issue (shelving). (b) The issue process.
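The contrast between unbuffered and buffered issue can be illustrated with a toy cycle-count model. This is a minimal sketch and not taken from the slides: the names `Instr` and `cycles_to_finish`, the latencies, and the simplifications (every register is written at most once, sources are produced by earlier instructions, the number of EUs is unlimited, the window is 3 wide) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # destination register (written once, by assumption)
    srcs: tuple          # source registers
    latency: int = 1     # execution latency in cycles (>= 1)


def cycles_to_finish(program, width=3, buffered=False):
    """Return the cycle in which the last result becomes available."""
    # Registers written by the program are unavailable until their producer
    # has executed; every other register is available from cycle 0.
    ready_at = {ins.dest: math.inf for ins in program}
    next_idx = 0     # next instruction to leave the instruction window
    buffers = []     # issue buffers, used only when buffered=True
    cycle = 0
    finish = 0

    def operands_ready(ins):
        return all(ready_at.get(s, 0) <= cycle for s in ins.srcs)

    def execute(ins):
        nonlocal finish
        ready_at[ins.dest] = cycle + ins.latency
        finish = max(finish, cycle + ins.latency)

    while next_idx < len(program) or buffers:
        if buffered:
            # Move up to `width` instructions into the issue buffers
            # without checking for dependencies.
            take = program[next_idx:next_idx + width]
            buffers.extend(take)
            next_idx += len(take)
            # Any buffered instruction with available operands may execute.
            waiting = []
            for ins in buffers:
                if operands_ready(ins):
                    execute(ins)
                else:
                    waiting.append(ins)
            buffers = waiting
        else:
            # Unbuffered, in-order issue: the first instruction with a
            # missing operand blocks everything behind it.
            issued = 0
            while (issued < width and next_idx < len(program)
                   and operands_ready(program[next_idx])):
                execute(program[next_idx])
                next_idx += 1
                issued += 1
        cycle += 1
    return finish


program = [
    Instr("r1", ("a",), latency=3),   # long-latency producer
    Instr("r2", ("r1",)),             # depends on r1
    Instr("r3", ("b", "c")),          # independent
    Instr("r4", ("r3", "d")),         # depends on r3 only
]
print(cycles_to_finish(program, buffered=False))  # 5
print(cycles_to_finish(program, buffered=True))   # 4
```

In the unbuffered run the instruction waiting for `r1` holds back the two instructions behind it, whereas with buffering they execute while it waits.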

2. Design space of dynamic instruction issue
2.1 Overview
Dynamic instruction issue: scope of dynamic instruction issue; layout of the issue buffers; types of issue buffers; operand fetch policy; instruction issue scheme.

2.2 Types of issue buffers
Types of issue buffers: reservation stations (RS), which may be individual RSs, group RSs or a central RS, and issue buffers in the ROB.
Individual RSs: Power1 (1990), PowerPC 603 (1993), PowerPC 604 (1995), Power4 (2001), Power5 (2004), K5 (1995), K7 (1999), K8 (2003)
Group RSs: ES/9000 (1992), Power2 (1993), R10000 (1996), PM1 (Sparc64) (1995), Alpha (1997)
Central RS: Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium IV (2000), Pentium M (2003), Core (2006)
Issue buffers in the ROB: Lightning (1991)p, K6 (1997)
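The three reservation-station layouts differ in how many EUs share one buffer. The following is a minimal sketch of that mapping. The function name `build_issue_buffers`, the buffer naming, and the reading of the layouts (individual: one RS per EU; group: one RS per EU type such as FX or FP; central: one RS for all EUs) are assumptions for illustration rather than something the slide states in full.

```python
def build_issue_buffers(eus, layout):
    """Map each reservation station to the EUs it feeds.

    eus    -- list of (eu_name, eu_kind) pairs, e.g. ("FX0", "FX")
    layout -- "individual", "group" or "central"
    """
    if layout == "individual":
        # Assumed: every EU has its own reservation station.
        return {f"RS_{name}": [name] for name, _ in eus}
    if layout == "group":
        # Assumed: EUs of the same kind share one reservation station.
        groups = {}
        for name, kind in eus:
            groups.setdefault(f"RS_{kind}", []).append(name)
        return groups
    if layout == "central":
        # Assumed: a single reservation station feeds all EUs.
        return {"RS": [name for name, _ in eus]}
    raise ValueError(f"unknown layout: {layout}")


eus = [("FX0", "FX"), ("FX1", "FX"), ("FP0", "FP")]
for layout in ("individual", "group", "central"):
    print(layout, build_issue_buffers(eus, layout))
```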

Dynamic instruction issue: scope of buffered issue; layout of the issue buffers; types of issue buffers; operand fetch policy; instruction issue scheme.

2.3 Operand fetch policies
Operand fetch policies: dispatch bound operand fetch policy and issue bound operand fetch policy.
Figure 2.1: Operand fetch policies.
Dispatch bound: the I-buffer feeds decode/issue; the source reg. identifiers address the reg. file; opcodes, destination reg. identifiers and the source 1 and source 2 operands are written into the IB (fields OC, Rd, Op1/Rs1, Op2/Rs2); the EU returns Rd and the result.
Issue bound: the IB holds the opcodes, destination reg. identifiers and source reg. identifiers (fields OC, Rd, Rs1, Rs2); the source reg. identifiers address the reg. file when the instruction leaves the IB, and the source 1 and source 2 operands go to the EU.
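The two policies lead to different issue buffer entry layouts, which can be sketched as data structures. This is an illustrative sketch only: the class and function names, the `Operand` wrapper with its `valid` flag, and the register file modelled as a dictionary of `(valid, value)` pairs are assumptions, chosen to mirror the field names OC, Rd, Op1/Rs1, Op2/Rs2 and Rs1, Rs2 in Figure 2.1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Operand:
    valid: bool                 # True: payload is the operand value
    payload: Union[int, str]    # False: payload is the source register id


@dataclass
class DispatchBoundEntry:       # IB fields: OC, Rd, Op1/Rs1, Op2/Rs2
    oc: str
    rd: str
    src1: Operand
    src2: Operand


@dataclass
class IssueBoundEntry:          # IB fields: OC, Rd, Rs1, Rs2
    oc: str
    rd: str
    rs1: str
    rs2: str


def fetch(reg, regfile):
    """Read one source at dispatch time: the value if it is available,
    otherwise the register identifier to wait for."""
    valid, value = regfile[reg]
    return Operand(True, value) if valid else Operand(False, reg)


def dispatch_dispatch_bound(oc, rd, rs1, rs2, regfile):
    # Operands are fetched while the instruction enters the issue buffer.
    return DispatchBoundEntry(oc, rd, fetch(rs1, regfile), fetch(rs2, regfile))


def dispatch_issue_bound(oc, rd, rs1, rs2):
    # Only register identifiers are stored; the register file is read later.
    return IssueBoundEntry(oc, rd, rs1, rs2)


regfile = {"r1": (True, 7), "r2": (False, None)}   # r2 is still being computed
print(dispatch_dispatch_bound("add", "r3", "r1", "r2", regfile))
print(dispatch_issue_bound("add", "r3", "r1", "r2"))
```

The dispatch bound entry is wider because it must be able to hold operand values, while the issue bound entry stores only register identifiers.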

3 Principle of operation of dynamic instruction issue
3.1 Dispatch bound operand fetching (1)
Checking the availability of operands
Figure: the dispatch bound datapath of Figure 2.1 (I-buffer, decode/issue, reg. file, IB with fields OC, Rd, Op1/Rs1, Op2/Rs2, EU returning Rd and the result), extended with valid (V) bits that mark operand availability.

3.1 Dispatch bound operand fetching (2)
Updating the issue buffers
Figure: the dispatch bound datapath of Figure 2.1 with valid (V) bits; the EU returns Rd and the result, which are used to update the IB entries (fields OC, Rd, Op1/Rs1, Op2/Rs2).
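With dispatch bound operand fetching, an entry becomes executable once both of its operand fields are valid, and a returned (Rd, result) pair has to be matched against the entries that still wait for Rd. The sketch below shows one way this check and update could work. It is not taken from the slides: the names `Entry`, `is_executable` and `broadcast`, and the representation of each operand as a `[valid, payload]` pair, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    oc: str
    rd: str
    # Two [valid, payload] pairs: payload is the operand value when valid,
    # otherwise the identifier of the source register still awaited.
    src: list


def is_executable(entry):
    """Availability check: both V bits must be set."""
    return all(valid for valid, _ in entry.src)


def broadcast(buffers, rd, result):
    """Update step: every operand field waiting for `rd` captures `result`."""
    for entry in buffers:
        for operand in entry.src:
            valid, payload = operand
            if not valid and payload == rd:
                operand[0], operand[1] = True, result


buffers = [
    Entry("add", "r4", [[True, 5], [False, "r1"]]),
    Entry("mul", "r5", [[False, "r1"], [False, "r2"]]),
]
broadcast(buffers, "r1", 10)          # an EU returns Rd = r1, result = 10
print(is_executable(buffers[0]))      # True
print(is_executable(buffers[1]))      # False
```

After the update the first entry is executable, while the second still waits for `r2`.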

3.2 Issue bound operand fetching
Checking the availability of operands
Figure: the issue bound datapath of Figure 2.1 (I-buffer, decode/issue, IB with fields OC, Rd, Rs1, Rs2, reg. file addressed by the source reg. identifiers, EU), extended with a valid (V) bit that marks operand availability.
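With issue bound operand fetching the issue buffers hold only register identifiers, so availability has to be checked against the register file. One common way to do this is a scoreboard-style valid bit per register, sketched below. The class name `RegisterFile`, its methods, and the rule that the destination's V bit is cleared when the instruction enters the buffers are assumptions made for illustration, not details confirmed by the slide.

```python
class RegisterFile:
    """Register file with one valid (V) bit per register."""

    def __init__(self, values):
        self.value = dict(values)
        self.valid = {reg: True for reg in values}

    def dispatch(self, rd):
        # Assumed: the destination is marked unavailable until its
        # producer writes the result back.
        self.valid[rd] = False

    def can_issue(self, rs1, rs2):
        # Availability check: both source registers must have V set.
        return self.valid[rs1] and self.valid[rs2]

    def read(self, rs1, rs2):
        # Operands are fetched only when the instruction goes to the EU.
        return self.value[rs1], self.value[rs2]

    def write_back(self, rd, result):
        self.value[rd] = result
        self.valid[rd] = True


rf = RegisterFile({"r1": 4, "r2": 6, "r3": 0})
rf.dispatch("r3")                  # producer of r3 enters the issue buffers
print(rf.can_issue("r1", "r2"))    # True
print(rf.can_issue("r3", "r1"))    # False, r3 is not yet written
rf.write_back("r3", 10)
print(rf.read("r3", "r1"))         # (10, 4)
```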

4. Implementation of dynamic instruction issue in superscalars 4.1 The introduction of dynamic instruction issue Figure 4.1: The introduction of dynamic instruction issue

4.2 Basic implementation schemes
Basic issue buffer schemes, classified by type of issue buffer (reservation stations (RS): individual RSs, group RSs, central RS; or issue buffers in the ROB) and by operand fetch policy (dispatch bound or issue bound):
Individual RSs, dispatch bound: PowerPC 603 (1993), PowerPC 604 (1995), K5 (1995)
Individual RSs, issue bound: Power1 (1990), Power4 (2001), Power5 (2004), Nx586 (1994), K7 (1999), K8 (2003)
Group RSs, dispatch bound: PM1 (Sparc64) (1995)
Group RSs, issue bound: ES/9000 (1992), Power2 (1993), R10000 (1996), Alpha (1997)
Central RS: Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium IV (2000), Pentium M (2003), Core (2006)
Issue buffers in the ROB: Lightning (1991)p, K6 (1997)

5. Case example (1) Individual issue buffers Figure 5.1: The microarchitecture of the Athlon

5. Case example (1) Individual issue buffers (2)
Figure 5.2: Integer issue buffers of the K8L (figure labels: decoders, issue buffers, EUs)
Source: Malich, Y., "AMD's Next Generation Microarchitecture Preview: from K8 to K8L", Aug

5. Case example (2) Group issue buffers
Figure 5.3: The microarchitecture of the Alpha
Source: Kessler, R.E. et al., "The Alpha Microprocessor Architecture", h18002.www1.hp.com/alphaserver

5. Case example (3) Central reservation station (1)
Figure 5.4: The microarchitecture of the Core processor
Source: Kanter, D., "Intel's Next Generation Microarchitecture Unveiled", Real World Tech., March 9, 2006.