CIS 662 – Computer Architecture – Fall 2004 – Class 11 – 10/12/04


Scoreboarding
The following four steps replace the ID, EX and WB stages:
• ID: Issue – if a functional unit for the instruction is free and no other active instruction has the same destination register (no WAW hazard), the instruction issues; otherwise it stalls, and because issue is in order, every instruction behind it stalls too
• ID: Read operands – a source operand is available if no earlier issued, still-active instruction is going to write it; the instruction reads its operands once they are all available
• EX: Execute – the functional unit starts executing once it has its operands and notifies the scoreboard when execution is complete
• WB: Write results – the scoreboard checks for WAR hazards and may stall the write-back
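The three scoreboard checks above can be sketched as predicates over the functional unit status table. This is a minimal Python illustration, not the CDC 6600 logic: the class `FUStatus` and the functions `can_issue`, `can_read_operands` and `can_write_result` are hypothetical names, and a missing source operand (such as Fj of a load) is treated as ready. The sample state is the one at Time = 17 of the example below.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (hypothetical model)."""
    busy: bool = False
    op: str = ""
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # units that will produce Fj / Fk
    qk: Optional[str] = None
    rj: bool = False          # Fj ready and not yet read
    rk: bool = False          # Fk ready and not yet read


def can_issue(fus: Dict[str, FUStatus], result: Dict[str, str], unit: str, dest: str) -> bool:
    """Issue: the unit must be free (structural hazard) and no active
    instruction may already be waiting to write dest (WAW hazard).
    `result` is the register result status: register -> producing unit."""
    return not fus[unit].busy and dest not in result


def can_read_operands(fu: FUStatus) -> bool:
    """Read operands: every source that exists is ready (RAW resolved)."""
    return (fu.fj is None or fu.rj) and (fu.fk is None or fu.rk)


def can_write_result(fus: Dict[str, FUStatus], unit: str) -> bool:
    """Write result: no unit may still need to read the old value of Fi (WAR)."""
    dest = fus[unit].fi
    return all((f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
               for f in fus.values())


# State at Time = 17 of the example (Integer and Mult2 idle).
fus = {
    "Integer": FUStatus(),
    "Mult1": FUStatus(True, "Mult", "F0", "F2", "F4"),
    "Mult2": FUStatus(),
    "Add": FUStatus(True, "Add", "F6", "F8", "F2"),
    "Divide": FUStatus(True, "Div", "F10", "F0", "F6", qj="Mult1", rk=True),
}
result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

print(can_write_result(fus, "Add"))      # False: DIV.D has not read F6 yet
print(can_read_operands(fus["Divide"]))  # False: still waiting for F0
fus["Divide"].rj, fus["Divide"].rk = False, False  # Time = 21: DIV.D reads F0 and F6
print(can_write_result(fus, "Add"))      # True: the WAR hazard is gone
```

The WAR test mirrors the scoreboard rule: a unit may write Fi only if no unit lists Fi as a source that is ready but not yet read.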

Scoreboarding
• Operands are always read from the register file; no advantage is taken of forwarding
• This is not a large penalty, because a result is written as soon as execution completes (unless a WAR hazard stalls it), not after a MEM stage
• Read operands and write result cannot overlap in the same cycle, so a dependent instruction reads its operand one cycle after the producer writes it: one extra cycle of latency
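The one-cycle latency between write result and read operands is simple arithmetic. The helper below is a hypothetical sketch; the cycle numbers come from the example that follows (the second L.D writes F2 in cycle 8, so MUL.D and SUB.D read their operands in cycle 9).

```python
from typing import List


def earliest_read_cycle(issue_cycle: int, producer_write_cycles: List[int]) -> int:
    """Earliest cycle in which a scoreboarded instruction can read its operands.

    Reading happens at least one cycle after issue. Because operands come only
    from the register file (no forwarding) and read operands cannot overlap
    write result, it also happens at least one cycle after each producer writes.
    """
    return max([issue_cycle + 1] + [w + 1 for w in producer_write_cycles])


# MUL.D F0, F2, F4: issued in cycle 6; L.D F2 writes F2 in cycle 8.
print(earliest_read_cycle(6, [8]))     # 9
# SUB.D F8, F6, F2: issued in cycle 7; F6 written in cycle 4, F2 in cycle 8.
print(earliest_read_cycle(7, [4, 8]))  # 9
```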

Scoreboard example
Code sequence:
    L.D   F6, 34(R2)
    L.D   F2, 45(R3)
    MUL.D F0, F2, F4
    SUB.D F8, F6, F2
    DIV.D F10, F0, F6
    ADD.D F6, F8, F2
Each step shows the instruction status (Issue, Read operands, Execution complete, Write result), the functional unit status (Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk for the Integer ALU, FP Mult1, FP Mult2, FP Add and FP Div units) and the register result status (F0, F2, …, F12: the unit that will write each register).

Scoreboard example: Time = 1
• First load is issued
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes
Register result status: F6 = Integer

Scoreboard example: Time = 2
• First load reads its operands
• Second load cannot be issued: structural hazard on the integer unit
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = No
Register result status: F6 = Integer

Scoreboard example: Time = 3
• First load completes execution
• Second load cannot be issued: structural hazard on the integer unit
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = No
Register result status: F6 = Integer

Scoreboard example: Time = 4
• First load writes its result (F6) and frees the integer unit; its entry (Load F6, R2) is cleared
• Second load still cannot be issued in this cycle: structural hazard on the integer unit
Register result status: no pending writes

Scoreboard example: Time = 5
• Second load is issued
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes
Register result status: F2 = Integer

Scoreboard example: Time = 6
• Second load reads its operands
• MUL.D is issued
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = No
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F2 = Integer

Scoreboard example: Time = 7
• SUB.D is issued
• Second load completes execution
• MUL.D is stalled waiting for F2
Functional unit status:
  Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = No
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes
  Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No
Register result status: F0 = Mult1, F2 = Integer, F8 = Add

Scoreboard example: Time = 8
• DIV.D is issued
• Second load writes its result (F2) and frees the integer unit
• MUL.D and SUB.D are still stalled waiting for F2 in this cycle; they can read it in the next one
Functional unit status:
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
  Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Yes, Rk = Yes
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard example: Time = 9
• MUL.D reads its operands
• SUB.D reads its operands
• DIV.D is stalled waiting for F0
• ADD.D cannot be issued: structural hazard (the FP adder is busy with SUB.D)
Functional unit status:
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = No, Rk = No
  Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = No, Rk = No
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard example: Time = 10
• MUL.D in execution (cycle 1 of 10)
• SUB.D in execution (cycle 1 of 2)
• DIV.D is stalled waiting for F0
• ADD.D cannot be issued: structural hazard
Functional unit and register result status: as at Time = 9

Scoreboard example: Time = 11
• MUL.D in execution (cycle 2 of 10)
• SUB.D completes execution
• DIV.D is stalled waiting for F0
• ADD.D cannot be issued: structural hazard
Functional unit and register result status: as at Time = 9

Scoreboard example: Time = 12
• MUL.D in execution (cycle 3 of 10)
• SUB.D writes its result (F8) and frees the FP adder
• DIV.D is stalled waiting for F0
• ADD.D cannot be issued in this cycle: structural hazard
Functional unit status:
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = No, Rk = No
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F10 = Divide

Scoreboard example: Time = 13
• ADD.D is issued
• MUL.D in execution (cycle 4 of 10)
• DIV.D is stalled waiting for F0
Functional unit status:
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = No, Rk = No
  Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard example: Time = 14
• ADD.D reads its operands
• MUL.D in execution (cycle 5 of 10)
• DIV.D is stalled waiting for F0
Functional unit status:
  Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = No, Rk = No
  Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = No, Rk = No
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard example: Time = 15
• ADD.D in execution (cycle 1 of 2)
• MUL.D in execution (cycle 6 of 10)
• DIV.D is stalled waiting for F0
Functional unit and register result status: as at Time = 14

Scoreboard example: Time = 16
• ADD.D completes execution
• MUL.D in execution (cycle 7 of 10)
• DIV.D is stalled waiting for F0
Functional unit and register result status: as at Time = 14

Scoreboard example: Time = 17
• ADD.D is stalled by a WAR hazard: DIV.D has not yet read F6 (Rk = Yes in the Divide entry), so ADD.D may not overwrite it
• MUL.D in execution (cycle 8 of 10)
• DIV.D is stalled waiting for F0
Functional unit and register result status: as at Time = 14

Scoreboard example: Time = 19
• ADD.D is stalled by a WAR hazard
• MUL.D completes execution
• DIV.D is stalled waiting for F0
Functional unit and register result status: as at Time = 14

Scoreboard example: Time = 20
• MUL.D writes its result (F0) and frees Mult1
• DIV.D is still stalled waiting for F0 in this cycle; it can read it in the next one
• ADD.D is stalled by a WAR hazard
Functional unit status:
  Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = No, Rk = No
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes
Register result status: F6 = Add, F10 = Divide

Scoreboard example: Time = 21
• DIV.D reads its operands
• ADD.D is still stalled by the WAR hazard in this cycle, since it may write F6 only after DIV.D has read it
Functional unit status:
  Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = No, Rk = No
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = No, Rk = No
Register result status: F6 = Add, F10 = Divide

Scoreboard example: Time = 22
• ADD.D writes its result (F6) and frees the FP adder
• DIV.D in execution (cycle 1 of 40)
Functional unit status:
  Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = No, Rk = No
Register result status: F10 = Divide

Scoreboard example: Time = 61
• DIV.D completes execution
Functional unit and register result status: as at Time = 22

Scoreboard example: Time = 62
• DIV.D writes its result (F10) and frees the FP divider
Functional unit status: all units free
Register result status: no pending writes
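The stage cycles of the whole walkthrough can be reproduced with a short calculator. This is a simplified sketch, not a full cycle-by-cycle scoreboard: it assumes the unit counts and latencies of the example (1 cycle for a load, 2 for add/sub, 10 for multiply, 40 for divide), unlimited result buses, and it relies on each check depending only on earlier instructions, so instructions can be processed in program order. `Instr`, `schedule`, `LATENCY` and `UNITS` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Assumed from the example: one integer unit (loads), one FP adder,
# two FP multipliers, one FP divider, with these execution latencies.
LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Div": 40}
UNITS = {"Integer": 1, "Add": 1, "Mult": 2, "Div": 1}


@dataclass
class Instr:
    text: str
    unit: str        # functional-unit type
    dest: str
    srcs: List[str]
    issue: int = 0
    read: int = 0
    complete: int = 0
    write: int = 0


def schedule(program: List[Instr]) -> None:
    """Fill in the four stage cycles of each instruction, in program order."""
    # First cycle in which each physical unit can accept a new instruction.
    free_at = {u: [1] * n for u, n in UNITS.items()}
    last_issue = 0
    for i, ins in enumerate(program):
        earlier = program[:i]
        # Issue: in order, a free unit of the right type (structural hazard)
        # and no earlier instruction still pending a write to dest (WAW).
        slot = min(range(UNITS[ins.unit]), key=lambda k: free_at[ins.unit][k])
        waw = max((e.write + 1 for e in earlier if e.dest == ins.dest), default=1)
        ins.issue = max(last_issue + 1, free_at[ins.unit][slot], waw)
        last_issue = ins.issue
        # Read operands: one cycle after issue and after the latest earlier
        # producer of each source has written it (RAW, no forwarding).
        ins.read = ins.issue + 1
        for src in ins.srcs:
            producers = [e for e in earlier if e.dest == src]
            if producers:
                ins.read = max(ins.read, producers[-1].write + 1)
        ins.complete = ins.read + LATENCY[ins.unit]
        # Write result: wait until every earlier instruction that reads dest
        # has read its operands (WAR).
        war = max((e.read + 1 for e in earlier if ins.dest in e.srcs), default=0)
        ins.write = max(ins.complete + 1, war)
        free_at[ins.unit][slot] = ins.write + 1


program = [
    Instr("L.D F6, 34(R2)", "Integer", "F6", ["R2"]),
    Instr("L.D F2, 45(R3)", "Integer", "F2", ["R3"]),
    Instr("MUL.D F0, F2, F4", "Mult", "F0", ["F2", "F4"]),
    Instr("SUB.D F8, F6, F2", "Add", "F8", ["F6", "F2"]),
    Instr("DIV.D F10, F0, F6", "Div", "F10", ["F0", "F6"]),
    Instr("ADD.D F6, F8, F2", "Add", "F6", ["F8", "F2"]),
]
schedule(program)
print(f"{'Instruction':<18}{'Issue':>6}{'Read':>6}{'Exec':>6}{'Write':>6}")
for ins in program:
    print(f"{ins.text:<18}{ins.issue:>6}{ins.read:>6}{ins.complete:>6}{ins.write:>6}")
```

The computed cycles match the walkthrough: for instance DIV.D reads its operands in cycle 21, after MUL.D writes F0 in cycle 20, and ADD.D, which completes execution in cycle 16, cannot write F6 until cycle 22 because of the WAR hazard with DIV.D. The WAR check looks only at earlier instructions because a later reader of the same register records this unit in Qj/Qk and waits for its result anyway.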

Tomasulo’s Algorithm
• Reservation stations hold the operands of instructions that have issued and are waiting to execute
• A reservation station fetches and buffers an operand as soon as it becomes available, so pending instructions read their operands from the reservation stations rather than from the register file
• When writes to the same register overlap in execution, only the last one (in program order) actually updates the register
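The "only the last write updates the register" rule can be illustrated with the register status tags Tomasulo’s algorithm keeps: each register records the reservation station that will produce its next value, and a later issue simply overwrites that tag. The sketch below assumes two hypothetical stations, Mult1 and Add1, that both write F6, with made-up result values; whichever order they finish in, only the later instruction’s result reaches the register. (In the full algorithm the earlier result still reaches any reservation station waiting for it over the common data bus.)

```python
from typing import Dict, List


def final_f6(completion_order: List[str]) -> float:
    """Return F6 after two overlapping writes complete in the given order.

    Program order (hypothetical): station Mult1 issues a write to F6,
    then station Add1 issues another write to F6.
    """
    value = {"Mult1": 2.0, "Add1": 3.0}  # results each station will broadcast
    regs: Dict[str, float] = {"F6": 0.0}
    qi: Dict[str, str] = {}              # register status: station that will write each register
    for station in ("Mult1", "Add1"):    # issue in program order
        qi["F6"] = station               # a later issue overwrites the tag
    for station in completion_order:     # results appear on the common data bus
        for reg, tag in list(qi.items()):
            if tag == station:           # only the newest writer's tag matches
                regs[reg] = value[station]
                del qi[reg]
    return regs["F6"]


print(final_f6(["Mult1", "Add1"]))  # 3.0
print(final_f6(["Add1", "Mult1"]))  # 3.0: the stale Mult1 result is ignored
```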

Tomasulo’s Algorithm
[Figure: organization of a Tomasulo-based FP unit, showing the instruction queue (fed from the instruction unit), FP registers, address unit, load buffers, store buffers, memory unit, reservation stations, FP adders, FP multipliers and the common data bus, with separate load-store and FP operation paths.]

Tomasulo’s Algorithm
• Each reservation station holds the opcode of a pending instruction and, for each source, either the operand value or the name of the reservation station that will produce it
• Load and store buffers hold data and addresses for memory accesses
• All results are broadcast over the common data bus, which every unit waiting for a value monitors
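A reservation station entry and the common data bus broadcast can be sketched as follows. Field names follow the usual Vj/Vk (operand values) and Qj/Qk (producing stations) convention; the class `ReservationStation`, the function `cdb_broadcast`, the load buffer name `Load2` and all operand values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReservationStation:
    name: str
    op: Optional[str] = None     # opcode of the pending instruction
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # stations that will produce the operands
    qk: Optional[str] = None

    def ready(self) -> bool:
        """Can start executing: both operands are values, not tags."""
        return self.op is not None and self.qj is None and self.qk is None


def cdb_broadcast(stations: List[ReservationStation], producer: str, value: float) -> None:
    """Every station waiting on `producer` captures the value from the bus."""
    for rs in stations:
        if rs.qj == producer:
            rs.vj, rs.qj = value, None
        if rs.qk == producer:
            rs.vk, rs.qk = value, None


# Hypothetical values: F4 = 1.5 and F6 = 4.0 are already in registers,
# F2 is still being loaded by the (hypothetical) load buffer Load2.
mult1 = ReservationStation("Mult1", op="MUL.D", qj="Load2", vk=1.5)  # MUL.D F0, F2, F4
add1 = ReservationStation("Add1", op="SUB.D", vj=4.0, qk="Load2")    # SUB.D F8, F6, F2
stations = [mult1, add1]
print([rs.ready() for rs in stations])  # [False, False]
cdb_broadcast(stations, "Load2", 2.5)   # the load puts F2 = 2.5 on the bus
print([rs.ready() for rs in stations])  # [True, True]
```

Because both stations snoop the same bus, one broadcast wakes every instruction waiting for F2; no instruction has to wait for the value to pass through the register file.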

Homework
● Due Tuesday, October 19, by the end of class
● Submit either in class (paper) or by (PS or PDF only), or bring the paper copy to my office
● Show the scheduling of the following code using a scoreboard (assume one integer ALU, two FP multipliers, one FP adder and one FP divider):
    LD  F2, 0(R2)
    LD  F4, 100(R3)
    ADD F8, F2, F2
    MUL F6, F4, F8
    SUB F6, F2, F4