CSL718 : Superscalar Processors


CSL718 : Superscalar Processors Handling Data Dependencies 24th Jan, 2006 Anshul Kumar, CSE IITD

Illustration 1
CDC6600 : score-boarding scheme
Dispatch bound fetch
FUs : INT, MUL1, MUL2, ADD/SUB, DIV
1 RS per FU
1 RF
In order issue, dispatch order trivial, out of order execution

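Checking in dispatch bound fetch
[Figure: block diagram. A decoded instruction (Rs1, Rs2, Rd) enters the reservation station, which holds OC, Rs1, Rs2, Rd; the V bit of Rd is reset in the register file. The V bits of the sources are checked, and OC (opcode) with Os1, Os2 (operand values) go from the register file to the EU. The EU returns (result, Rd), which updates Rd and sets its V bit.]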
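The V-bit checks in this figure can be sketched in a few lines of Python. This is a minimal illustration, not the CDC 6600 logic itself: the names `Station`, `RegisterFile`, `issue`, `dispatch` and `write_back` are hypothetical, the station is freed at dispatch for brevity, and the sketch assumes Rd differs from both source registers.

```python
class Station:
    """Single-entry reservation station (the slides assume 1 RS per FU)."""
    def __init__(self):
        self.entry = None  # (oc, rs1, rs2, rd), or None when the station is free


class RegisterFile:
    """Register values plus one valid (V) bit per register."""
    def __init__(self, values):
        self.value = dict(values)
        self.valid = {reg: True for reg in values}


def issue(rf, station, oc, rs1, rs2, rd):
    """Store register identifiers in the station and reset the V bit of Rd.

    Returns False (stall) if the station is occupied or a write to Rd is
    still pending. Assumption: rd is not one of the sources.
    """
    if station.entry is not None or not rf.valid[rd]:
        return False
    station.entry = (oc, rs1, rs2, rd)
    rf.valid[rd] = False
    return True


def dispatch(rf, station):
    """Check the V bits of the sources; read operand values only if both are set."""
    if station.entry is None:
        return None
    oc, rs1, rs2, rd = station.entry
    if not (rf.valid[rs1] and rf.valid[rs2]):
        return None
    station.entry = None  # simplification: free the station at dispatch
    return (oc, rf.value[rs1], rf.value[rs2], rd)


def write_back(rf, rd, result):
    """Update Rd with the result from the EU and set its V bit."""
    rf.value[rd] = result
    rf.valid[rd] = True
```

Operands are fetched from the register file only at dispatch, which is why the station stores register identifiers rather than values.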

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   -      -        -         -
LF F2, 45(R3)   -      -        -         -
MUL F0,F2,F4    -      -        -         -
SUB F8,F6,F2    -      -        -         -
DIV F10,F0,F6   -      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   -     -    -    -   -   -   -   -   -
2   MUL1  -     -    -    -   -   -   -   -   -
3   MUL2  -     -    -    -   -   -   -   -   -
4   ADD   -     -    -    -   -   -   -   -   -
5   DIV   -     -    -    -   -   -   -   -   -

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   -   -   -    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      -        -         -
MUL F0,F2,F4    ✓      -        -         -
SUB F8,F6,F2    ✓      -        -         -
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   Y     LF   -    -   -   -   -   -   -
2   MUL1  Y     MUL  -    -   -   -   -   -   -
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     SUB  -    -   -   -   -   -   -
5   DIV   Y     DIV  -    -   -   -   -   -   -

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   -   -   -    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      -        -         -
MUL F0,F2,F4    ✓      -        -         -
SUB F8,F6,F2    ✓      -        -         -
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   R3  -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   -   -
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     SUB  F8   F6  F2  -   -   -   -
5   DIV   Y     DIV  F10  F0  F6  -   -   -   -

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   -   -   -    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      -        -         -
MUL F0,F2,F4    ✓      -        -         -
SUB F8,F6,F2    ✓      -        -         -
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   R3  -   -   -   Y   Y
2   MUL1  Y     MUL  F0   F2  F4  1   -   N   Y
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     SUB  F8   F6  F2  -   1   Y   N
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   1   -   -   4   5    -    -
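The Qj, Qk, Rj, Rk and "FU No" entries in this snapshot follow mechanically from issue order. A minimal Python sketch under stated assumptions: `scoreboard_issue`, `fu_status` and `producer` are hypothetical names, `None` stands for an empty cell, and the load's base register R3 is passed as its only source.

```python
def scoreboard_issue(fu_status, producer, fu_no, op, fi, fj=None, fk=None):
    """Fill one functional-unit row at issue time.

    producer maps a register to the number of the FU that will write it
    (the "FU No" row of the RF table); a missing key means no pending write.
    """
    qj = producer.get(fj)  # FU producing source j, or None if it is ready
    qk = producer.get(fk)  # FU producing source k, or None if it is ready
    fu_status[fu_no] = {
        "BUSY": True, "OP": op, "Fi": fi, "Fj": fj, "Fk": fk,
        "Qj": qj, "Qk": qk,
        "Rj": qj is None,  # source j ready and not yet read
        "Rk": qk is None,  # source k ready and not yet read
    }
    producer[fi] = fu_no  # this FU now owns the pending write to Fi


# Issue the four in-flight instructions of the example in program order.
fu_status, producer = {}, {}
scoreboard_issue(fu_status, producer, 1, "LF", "F2", fj="R3")
scoreboard_issue(fu_status, producer, 2, "MUL", "F0", "F2", "F4")
scoreboard_issue(fu_status, producer, 4, "SUB", "F8", "F6", "F2")
scoreboard_issue(fu_status, producer, 5, "DIV", "F10", "F0", "F6")
```

After these four calls, `producer` holds the pending writer of each destination register and each row records which FU, if any, it is waiting on.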

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        -         -
MUL F0,F2,F4    ✓      -        -         -
SUB F8,F6,F2    ✓      -        -         -
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   Y     LF   F2   R3  -   -   -   N   N
2   MUL1  Y     MUL  F0   F2  F4  1   -   N   Y
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     SUB  F8   F6  F2  -   1   Y   N
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   1   -   -   4   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      -        -         -
SUB F8,F6,F2    ✓      -        -         -
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   Y   Y
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     SUB  F8   F6  F2  -   -   Y   Y
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   -   -   -   4   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        -         -
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    -      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   N   N
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   N     -    -    -   -   -   -   -   -
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   -   -   -   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        -         -
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    ✓      -        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   N   N
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     ADD  F6   F8  F2  -   -   Y   Y
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   -   -   4   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        -         -
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    ✓      ✓        -         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   N   N
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     ADD  F6   F8  F2  -   -   N   N
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   -   -   4   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        -         -
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    ✓      ✓        ✓         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  Y     MUL  F0   F2  F4  -   -   N   N
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     ADD  F6   F8  F2  -   -   N   N
5   DIV   Y     DIV  F10  F0  F6  2   -   N   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  2   -   -   4   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        ✓         ✓
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      -        -         -
ADD F6,F8,F2    ✓      ✓        ✓         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  N     -    -    -   -   -   -   -   -
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     ADD  F6   F8  F2  -   -   N   N
5   DIV   Y     DIV  F10  F0  F6  -   -   Y   Y

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   4   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        ✓         ✓
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      ✓        -         -
ADD F6,F8,F2    ✓      ✓        ✓         -

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  N     -    -    -   -   -   -   -   -
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   Y     ADD  F6   F8  F2  -   -   N   N
5   DIV   Y     DIV  F10  F0  F6  -   -   N   N

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   4   -   5    -    -

Instruction status
INSTRUCTION     ISSUE  READ OP  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓        ✓         ✓
LF F2, 45(R3)   ✓      ✓        ✓         ✓
MUL F0,F2,F4    ✓      ✓        ✓         ✓
SUB F8,F6,F2    ✓      ✓        ✓         ✓
DIV F10,F0,F6   ✓      ✓        -         -
ADD F6,F8,F2    ✓      ✓        ✓         ✓

Functional Units
No  NAME  BUSY  OP   Fi   Fj  Fk  Qj  Qk  Rj  Rk
1   INT   N     -    -    -   -   -   -   -   -
2   MUL1  N     -    -    -   -   -   -   -   -
3   MUL2  N     -    -    -   -   -   -   -   -
4   ADD   N     -    -    -   -   -   -   -   -
5   DIV   Y     DIV  F10  F0  F6  -   -   N   N

RF     F0  F2  F4  F6  F8  F10  F12  F14
FU No  -   -   -   -   -   5    -    -

Illustration 2
IBM 360/91 - Tomasulo’s scheme
Issue bound fetch
FUs : LOAD, STORE, 3 x ADD/SUB, 2 x MUL/DIV
Group RS’s with 1 slot per FU
1 RF
In order issue, out of order execution

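Checking in issue bound fetch
[Figure: block diagram. A decoded instruction (Rs1, Rs2, Rd) reads the register file, and the V bit of Rd is reset. OC, Os1, Os2 (operand values) and Rd go to the reservation station, whose entry holds OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd. Vs1 and Vs2 are checked before the entry goes to the EU. The EU returns (result, Rd), which updates Rd and sets its V bit in the register file, and associatively updates Is1, Is2 matching Rd, setting the Vs bits.]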
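A minimal Python sketch of issue bound fetch with associative update. All names (`Operand`, `IssueBoundMachine` and its methods) are hypothetical. As an assumption, pending results are tagged with the producing station's name, as in the tables of this illustration, rather than with Rd as in the figure.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    valid: bool      # Vs bit
    payload: object  # operand value (Os) if valid, else producer tag (Is)


class IssueBoundMachine:
    def __init__(self, values):
        self.value = dict(values)
        self.tag = {reg: None for reg in values}  # None means the V bit is set
        self.stations = {}                        # station name -> entry

    def _fetch(self, reg):
        """Read a source at issue: its value if valid, otherwise its tag."""
        tag = self.tag[reg]
        if tag is None:
            return Operand(True, self.value[reg])
        return Operand(False, tag)

    def issue(self, station, oc, rs1, rs2, rd):
        """Fetch operands or tags, then mark Rd as pending from this station."""
        self.stations[station] = {
            "oc": oc, "s1": self._fetch(rs1), "s2": self._fetch(rs2), "rd": rd,
        }
        self.tag[rd] = station  # resets the V bit of Rd

    def ready(self, station):
        """Check Vs1 and Vs2."""
        entry = self.stations[station]
        return entry["s1"].valid and entry["s2"].valid

    def broadcast(self, station, result):
        """Deliver a result to every waiting operand and to the register file."""
        done = self.stations.pop(station)
        for entry in self.stations.values():
            for key in ("s1", "s2"):
                operand = entry[key]
                if not operand.valid and operand.payload == station:
                    entry[key] = Operand(True, result)  # associative update
        if self.tag[done["rd"]] == station:  # skip if a later writer took over Rd
            self.value[done["rd"]] = result
            self.tag[done["rd"]] = None
```

Because operands are captured at issue, a later instruction may overwrite a source register without disturbing the waiting entry.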

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   -      -         -
LF F2, 45(R3)   -      -         -
MUL F0,F2,F4    -      -         -
SUB F8,F6,F2    -      -         -
DIV F10,F0,F6   -      -         -
ADD F6,F8,F2    -      -         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  -     -    -       -      -     -
ADD2  -     -    -       -      -     -
ADD3  -     -    -       -      -     -
MUL1  -     -    -       -      -     -
MUL2  -     -    -       -      -     -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  -     -     -     -     -     -     -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      -         -
MUL F0,F2,F4    ✓      -         -
SUB F8,F6,F2    ✓      -         -
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      -         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  Y     SUB  -       -      -     -
ADD2  Y     ADD  -       -      -     -
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  -       -      -     -
MUL2  Y     DIV  -       -      -     -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  -     -     -     -     -     -     -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      -         -
MUL F0,F2,F4    ✓      -         -
SUB F8,F6,F2    ✓      -         -
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      -         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  Y     SUB  (LD1)   -      -     LD2
ADD2  Y     ADD  -       -      ADD1  LD2
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  -       (F4)   LD2   -
MUL2  Y     DIV  -       (LD1)  MUL1  -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  MUL1  LD2   -     ADD2  ADD1  MUL2  -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      ✓         ✓
MUL F0,F2,F4    ✓      -         -
SUB F8,F6,F2    ✓      -         -
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      -         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  Y     SUB  (LD1)   (LD2)  -     -
ADD2  Y     ADD  -       (LD2)  ADD1  -
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  (LD2)   (F4)   -     -
MUL2  Y     DIV  -       (LD1)  MUL1  -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  MUL1  -     -     ADD2  ADD1  MUL2  -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      ✓         ✓
MUL F0,F2,F4    ✓      -         -
SUB F8,F6,F2    ✓      ✓         ✓
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      -         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  N     -    -       -      -     -
ADD2  Y     ADD  (ADD1)  (LD2)  -     -
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  (LD2)   (F4)   -     -
MUL2  Y     DIV  -       (LD1)  MUL1  -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  MUL1  -     -     ADD2  -     MUL2  -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      ✓         ✓
MUL F0,F2,F4    ✓      ✓         -
SUB F8,F6,F2    ✓      ✓         ✓
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      ✓         -

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  N     -    -       -      -     -
ADD2  Y     ADD  (ADD1)  (LD2)  -     -
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  (LD2)   (F4)   -     -
MUL2  Y     DIV  -       (LD1)  MUL1  -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  MUL1  -     -     ADD2  -     MUL2  -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      ✓         ✓
MUL F0,F2,F4    ✓      ✓         -
SUB F8,F6,F2    ✓      ✓         ✓
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      ✓         ✓

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  N     -    -       -      -     -
ADD2  N     -    -       -      -     -
ADD3  N     -    -       -      -     -
MUL1  Y     MUL  (LD2)   (F4)   -     -
MUL2  Y     DIV  -       (LD1)  MUL1  -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  MUL1  -     -     -     -     MUL2  -     -

Instruction status
INSTRUCTION     ISSUE  EX COMPL  WRITE RES
LF F6, 34(R2)   ✓      ✓         ✓
LF F2, 45(R3)   ✓      ✓         ✓
MUL F0,F2,F4    ✓      ✓         ✓
SUB F8,F6,F2    ✓      ✓         ✓
DIV F10,F0,F6   ✓      -         -
ADD F6,F8,F2    ✓      ✓         ✓

Functional Units
NAME  BUSY  OP   Vj      Vk     Qj    Qk
ADD1  N     -    -       -      -     -
ADD2  N     -    -       -      -     -
ADD3  N     -    -       -      -     -
MUL1  N     -    -       -      -     -
MUL2  Y     DIV  (MUL1)  (LD1)  -     -

RF  F0    F2    F4    F6    F8    F10   F12   F14
Qi  -     -     -     -     -     MUL2  -     -

End of Illustration
Ref: Hennessy & Patterson’s Book [Ch. 4]

RAW, WAR and WAW (in Static Pipeline)
[Figure: three pairs of pipeline timing diagrams with stages IF, D, RF, EX, WB, one pair each labelled RAW, WAR and WAW; in the WAW pair the first instruction spends three cycles in EX.]

RAW, WAR and WAW (in Superscalar)
[Figure: three instructions, a write, a read and a second write, each passing through IF, IS, DP, EX, WB; the dependences between them are labelled RAW, WAW and WAR.]
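The three dependence types can be told apart by comparing destination and source registers of two instructions in program order. A minimal Python sketch; `hazards` and the `(dest, sources)` tuple encoding are hypothetical, and the register names are taken from the illustrations in this deck.

```python
def hazards(first, second):
    """Classify dependences between two instructions in program order.

    Each instruction is (dest, sources). Returns a set drawn from
    {"RAW", "WAR", "WAW"}.
    """
    dest1, sources1 = first
    dest2, sources2 = second
    found = set()
    if dest1 in sources2:
        found.add("RAW")  # second reads what first writes
    if dest2 in sources1:
        found.add("WAR")  # second writes what first reads
    if dest1 == dest2:
        found.add("WAW")  # both write the same register
    return found


# MUL F0,F2,F4 followed by DIV F10,F0,F6: DIV reads F0
mul_div = hazards(("F0", ["F2", "F4"]), ("F10", ["F0", "F6"]))
# DIV F10,F0,F6 followed by ADD F6,F8,F2: ADD overwrites F6
div_add = hazards(("F10", ["F0", "F6"]), ("F6", ["F8", "F2"]))
# LF F6,34(R2) followed by ADD F6,F8,F2: both write F6
lf_add = hazards(("F6", ["R2"]), ("F6", ["F8", "F2"]))
```

Only RAW is a true data dependence; WAR and WAW arise from reuse of a register name.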

Implementation using scoreboard bit
[Figure: the same write, read, write sequence through IF, IS, DP, EX, WB with RAW, WAW and WAR marked, annotated with a scoreboard-bit update b ← 0.]

CDC 6600 like Implementation
[Figure: the same write, read, write sequence through IF, IS, DP, EX, WB with RAW, WAW and WAR marked, annotated with scoreboard-bit updates b ← 0, b ← 1 and b ← 0.]

IBM 360 like Implementation
[Figure: the same write, read, write sequence through IF, IS, DP, EX, WB with RAW, WAW and WAR marked, annotated with a scoreboard-bit update b ← 0.]

Use of Renaming
[Figure: the write, read, write sequence through IF, IS, DP, EX, WB with RAW, WAW and WAR marked, shown with register renaming applied.]
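Renaming gives every write a fresh destination, so only RAW dependences survive. A minimal Python sketch under loud assumptions: `rename` is a hypothetical helper, physical names `P0`, `P1`, ... come from an unbounded pool, and registers never written inside the program keep their architectural names.

```python
from itertools import count


def rename(program):
    """Rename destinations of (op, dest, sources) tuples in program order."""
    mapping = {}     # architectural register -> latest physical name
    fresh = count()  # assumption: unlimited supply of physical registers
    renamed = []
    for op, dest, sources in program:
        # Sources use the most recent mapping, which preserves RAW dependences.
        new_sources = [mapping.get(src, src) for src in sources]
        # Each write gets a new name, which removes WAR and WAW dependences.
        new_dest = f"P{next(fresh)}"
        mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


program = [
    ("MUL", "F0", ["F2", "F4"]),
    ("DIV", "F10", ["F0", "F6"]),
    ("ADD", "F6", ["F8", "F2"]),  # WAR with DIV on F6 before renaming
]
renamed = rename(program)
```

After renaming, ADD writes `P2` while DIV still reads the old `F6`, so ADD no longer has to wait for DIV to read its operand.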