MS108 Computer System I, Lecture 6: Scoreboarding. Prof. Xiaoyao Liang, 2015/4/3



Complex Pipeline

Floating Point ISA

FPU

Pipeline FPU

FP Multiplier Pipeline

Memory Timing

Pipeline Timing

Structural Hazard

Data Hazards

Detecting the Hazards

Memory Hazards

Instruction Scheduling

Dynamic Pipeline

Dynamic Scheduling

Two Solutions

CDC 6600, Seymour Cray,

Scoreboard

Hazards for Dynamic Pipeline

Scoreboard

MIPS Scoreboard

Pipeline with Scoreboard

Scoreboard Components

Example

Dependency Graph
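
The edges of such a dependency graph can be enumerated mechanically from the six instructions used in the scoreboard example. The following is a minimal Python sketch, assuming each instruction is reduced to a destination register and a tuple of source registers; `PROGRAM` and `dependences` are illustrative names, not taken from the slides. The pairwise check ignores intervening redefinitions of a register, which is sufficient for this short program but not in general.

```python
# Example program from the scoreboard walkthrough: (opcode, destination, sources).
PROGRAM = [
    ("L.D",   "F6",  ("R2",)),
    ("L.D",   "F2",  ("R3",)),
    ("MUL.D", "F0",  ("F2", "F4")),
    ("SUB.D", "F8",  ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6",  ("F8", "F2")),
]


def dependences(program):
    """Return RAW, WAR and WAW edges as (earlier, later, register) triples."""
    raw, war, waw = [], [], []
    for later in range(len(program)):
        _, l_dest, l_srcs = program[later]
        for earlier in range(later):
            _, e_dest, e_srcs = program[earlier]
            if e_dest in l_srcs:    # later instruction reads what earlier wrote
                raw.append((earlier, later, e_dest))
            if l_dest in e_srcs:    # later instruction overwrites an earlier source
                war.append((earlier, later, l_dest))
            if e_dest == l_dest:    # both instructions write the same register
                waw.append((earlier, later, l_dest))
    return raw, war, waw


raw, war, waw = dependences(PROGRAM)
for kind, edges in (("RAW", raw), ("WAR", war), ("WAW", waw)):
    for earlier, later, reg in edges:
        print(f"{kind}: {PROGRAM[earlier][0]} (#{earlier}) -> {PROGRAM[later][0]} (#{later}) on {reg}")
```

For this program the only WAR edges are on F6 (SUB.D and DIV.D read it, ADD.D overwrites it), and the only WAW edge is between the first L.D and ADD.D.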

Scoreboard Example: Cycle 1
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1
L.D    F2    45+  R3
MUL.D  F0    F2   F4
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
1      FU                         Integer
FP latency: Add = 2 cycles, Multiply = 10, Divide = 40
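
The three tables on this slide map naturally onto three small data structures. Below is a hedged Python sketch of one possible representation, filled in with the cycle 1 state; the class and variable names (`FunctionalUnit`, `units`, `register_result`, `instruction_status`) are assumptions made for illustration and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FunctionalUnit:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None   # operation in progress
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing Fj, if still pending
    qk: Optional[str] = None   # unit producing Fk, if still pending
    rj: Optional[bool] = None  # Fj ready?
    rk: Optional[bool] = None  # Fk ready?


# Functional unit status: one entry per unit named on the slide.
units: Dict[str, FunctionalUnit] = {
    name: FunctionalUnit() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")
}

# Register result status: which unit will write each even-numbered FP register.
register_result: Dict[str, Optional[str]] = {f"F{n}": None for n in range(0, 32, 2)}

# Instruction status: the cycle in which each instruction passed each stage.
STAGES = ("issue", "read_operands", "execution_complete", "write_result")
instruction_status: List[Dict[str, Optional[int]]] = [
    {stage: None for stage in STAGES} for _ in range(6)
]

# State after cycle 1: the first L.D (F6 <- 34+R2) has issued to the Integer unit.
instruction_status[0]["issue"] = 1
units["Integer"] = FunctionalUnit(busy=True, op="Load", fi="F6", fk="R2", rk=True)
register_result["F6"] = "Integer"
```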

Scoreboard Example: Cycle 2
Issue second L.D? No, stall on structural hazard.
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2
L.D    F2    45+  R3
MUL.D  F0    F2   F4
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
2      FU                         Integer
FP latency: Add = 2 cycles, Multiply = 10, Divide = 40
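
The stall on this slide follows from the issue rule: an instruction issues only if its functional unit is free (no structural hazard) and no active instruction is going to write the same destination register (no WAW hazard). A minimal Python sketch of that check, using the cycle 2 state; `can_issue` and its arguments are hypothetical names, and in-order issue is a separate constraint that this function does not model.

```python
def can_issue(unit, dest, busy, register_result):
    """Issue check: the unit must be free and no pending write may target `dest`."""
    no_structural_hazard = not busy[unit]
    no_waw_hazard = register_result.get(dest) is None
    return no_structural_hazard and no_waw_hazard


# Cycle 2: the Integer unit is still executing the first L.D, which will write F6.
busy = {"Integer": True, "Mult1": False, "Mult2": False, "Add": False, "Divide": False}
register_result = {"F6": "Integer"}

# The second L.D (destination F2) also needs the Integer unit, so it must wait.
second_load_ok = can_issue("Integer", "F2", busy, register_result)
print("Issue second L.D?", "Yes" if second_load_ok else "No, structural hazard")
```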

Scoreboard Example: Cycle 3
Issue MUL.D? No: issue is in order, and the second L.D is still stalled.
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3
L.D    F2    45+  R3
MUL.D  F0    F2   F4
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
3      FU                         Integer

Scoreboard Example: Cycle 4
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3
MUL.D  F0    F2   F4
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
4      FU                         Integer

Scoreboard Example: Cycle 5
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5
MUL.D  F0    F2   F4
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
5      FU           Integer

Scoreboard Example: Cycle 6
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6
MUL.D  F0    F2   F4   6
SUB.D  F8    F6   F2
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
6      FU  Mult1    Integer

Scoreboard Example: Cycle 7
Read multiply operands?
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7
MUL.D  F0    F2   F4   6
SUB.D  F8    F6   F2   7
DIV.D  F10   F0   F6
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
7      FU  Mult1    Integer                Add
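
The question on this slide comes down to the read-operands rule: a unit may read its operands only when both ready flags, Rj and Rk, are Yes. The sketch below encodes the Mult1 and Add rows of cycle 7 as plain dictionaries and adds a small helper that marks waiting operands ready once their producer writes its result. The function names `can_read_operands` and `broadcast_result` are illustrative assumptions, not terms from the slides.

```python
def can_read_operands(unit):
    """Read-operands check: both source operands must be ready (no RAW hazard)."""
    return bool(unit["Rj"] and unit["Rk"])


def broadcast_result(units, producer):
    """When `producer` writes its result, mark operands that waited on it as ready."""
    for unit in units.values():
        if unit["Qj"] == producer:
            unit["Qj"], unit["Rj"] = None, True
        if unit["Qk"] == producer:
            unit["Qk"], unit["Rk"] = None, True


# Cycle 7: both MUL.D and SUB.D wait for F2, which the Integer unit is still loading.
cycle7 = {
    "Mult1": {"Op": "Mult", "Fj": "F2", "Fk": "F4", "Qj": "Integer", "Qk": None, "Rj": False, "Rk": True},
    "Add":   {"Op": "Sub",  "Fj": "F6", "Fk": "F2", "Qj": None, "Qk": "Integer", "Rj": True, "Rk": False},
}

ready_before = {name: can_read_operands(u) for name, u in cycle7.items()}
broadcast_result(cycle7, "Integer")   # the second L.D writes F2
ready_after = {name: can_read_operands(u) for name, u in cycle7.items()}
print(ready_before, ready_after)
```

Before the load writes F2 neither unit can read its operands; afterwards both can.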

Scoreboard Example: Cycle 8a (first half of cycle 8)
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7
MUL.D  F0    F2   F4   6
SUB.D  F8    F6   F2   7
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
8      FU  Mult1    Integer                Add      Divide

Scoreboard Example: Cycle 8b (second half of cycle 8)
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6
SUB.D  F8    F6   F2   7
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
8      FU  Mult1                           Add      Divide

Scoreboard Example: Cycle 9
Read operands for MUL.D & SUB.D? Issue ADD.D?
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9
SUB.D  F8    F6   F2   7      9
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
9      FU  Mult1                           Add      Divide
FP latency: Add = 2 cycles, Multiply = 10, Divide = 40

Scoreboard Example: Cycle 11
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9
SUB.D  F8    F6   F2   7      9              11
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
0     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
11     FU  Mult1                           Add      Divide

Scoreboard Example: Cycle 12
Read operands for DIV.D?
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
7     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
12     FU  Mult1                                    Divide

Scoreboard Example: Cycle 13
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2   13
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
6     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
13     FU  Mult1                  Add               Divide

Scoreboard Example: Cycle 17
Write result of ADD.D? No, WAR hazard.
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2   13     14             16
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
17     FU  Mult1                  Add               Divide
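
The WAR stall on this slide can be written as a check over the other functional units: a result may be written only if no other unit still needs to read the old value of the destination register. The sketch below follows the usual textbook formulation and assumes that a unit's Rj/Rk flags are reset to No once it has read its operands; that reset convention, and the name `can_write_result`, are assumptions rather than something stated on the slides. Only the Divide row is modelled, since it is the only unit that names F6 as a source at this point.

```python
def can_write_result(dest, other_units):
    """WAR check: no other unit may still be waiting to read the old value of `dest`."""
    for unit in other_units.values():
        if unit["Fj"] == dest and unit["Rj"]:
            return False
        if unit["Fk"] == dest and unit["Rk"]:
            return False
    return True


# Cycle 17: DIV.D (F10 <- F0 / F6) has F6 ready but has not read it yet,
# because it is still waiting for F0 from Mult1.
divide = {"Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True}
blocked = not can_write_result("F6", {"Divide": divide})

# Assumed convention: once DIV.D has read its operands, both flags become No.
divide["Rj"] = divide["Rk"] = False
allowed_after_read = can_write_result("F6", {"Divide": divide})
print(blocked, allowed_after_read)
```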

Scoreboard Example: Cycle 20
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9              19                  20
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8
ADD.D  F6    F8   F2   13     14             16
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
20     FU                         Add               Divide

Scoreboard Example: Cycle 21
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9              19                  20
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8      21
ADD.D  F6    F8   F2   13     14             16
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
21     FU                         Add               Divide

Scoreboard Example: Cycle 22
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9              19                  20
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8      21
ADD.D  F6    F8   F2   13     14             16                  22
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
40    Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
22     FU                                           Divide

Scoreboard Example: Cycle 61
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9              19                  20
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8      21             61
ADD.D  F6    F8   F2   13     14             16                  22
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
61     FU                                           Divide

Scoreboard Example: Cycle 62
Instruction block done. We have: in-order issue, out-of-order execute and commit.
Instruction status:
Instruction  j    k    Issue  Read operands  Execution complete  Write result
L.D    F6    34+  R2   1      2              3                   4
L.D    F2    45+  R3   5      6              7                   8
MUL.D  F0    F2   F4   6      9              19                  20
SUB.D  F8    F6   F2   7      9              11                  12
DIV.D  F10   F0   F6   8      21             61                  62
ADD.D  F6    F8   F2   13     14             16                  22
Functional unit status:
                           dest S1   S2   FU for j FU for k Fj?  Fk?
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No
Register result status:
Clock      F0       F2       F4   F6       F8       F10      F12  ...  F30
62     FU
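
The whole walkthrough can be replayed with a very small timestamp-based simulation. This is a sketch under stated assumptions, not the CDC 6600 control logic: every decision in a cycle looks only at events from strictly earlier cycles, there is one functional unit per class (the second multiplier is never needed here), and the load is assumed to take 1 cycle to execute, which is inferred from the L.D rows rather than stated in the latency note. All names (`Instr`, `simulate`, `before`) are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# FP latencies are from the slides; the 1-cycle load is an assumption.
LATENCY = {"L.D": 1, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}
# Simplification: one functional unit per class.
UNIT = {"L.D": "Integer", "ADD.D": "Add", "SUB.D": "Add", "MUL.D": "Mult", "DIV.D": "Divide"}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None   # execution complete
    write: Optional[int] = None


def before(stamp, cycle):
    """True if the event happened in a strictly earlier cycle."""
    return stamp is not None and stamp < cycle


def simulate(program):
    cycle = 0
    while any(ins.write is None for ins in program):
        cycle += 1
        for n, ins in enumerate(program):
            earlier = program[:n]
            if ins.issue is None:
                # In-order issue, free unit (structural), no pending writer (WAW).
                in_order = all(before(e.issue, cycle) for e in earlier)
                unit_free = all(before(e.write, cycle) for e in earlier
                                if UNIT[e.op] == UNIT[ins.op])
                no_waw = all(before(e.write, cycle) for e in earlier
                             if e.dest == ins.dest)
                if in_order and unit_free and no_waw:
                    ins.issue = cycle
            elif ins.read is None:
                # RAW: every earlier producer of a source must have written.
                if all(before(e.write, cycle) for e in earlier if e.dest in ins.srcs):
                    ins.read = cycle
                    ins.done = cycle + LATENCY[ins.op]
            elif ins.write is None and before(ins.done, cycle):
                # WAR: every earlier reader of the destination must have read.
                if all(before(e.read, cycle) for e in earlier if ins.dest in e.srcs):
                    ins.write = cycle
    return [(i.issue, i.read, i.done, i.write) for i in program]


program = [
    Instr("L.D", "F6", ("R2",)),
    Instr("L.D", "F2", ("R3",)),
    Instr("MUL.D", "F0", ("F2", "F4")),
    Instr("SUB.D", "F8", ("F6", "F2")),
    Instr("DIV.D", "F10", ("F0", "F6")),
    Instr("ADD.D", "F6", ("F8", "F2")),
]
timeline = simulate(program)
for ins, stamps in zip(program, timeline):
    print(f"{ins.op:6} {ins.dest:4}", *stamps)
```

Under these assumptions the run ends with DIV.D writing its result in cycle 62, and ADD.D is held until cycle 22 by the WAR hazard on F6, in line with the walkthrough.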