CET 520/494 -- Gannod -- Section A.8: Dynamic Scheduling using a Scoreboard



Slide 2: Motivation
In the pipeline that we've looked at, if there is an unavoidable hazard, we must stall until the hazard is resolved.
– No new instructions can be fetched or issued, even if they are independent.
– e.g., suppose FP ADD/SUB is 2 cycles, MUL is 10 cycles and DIV is 40 cycles.
    L.D   F6, 34(R2)
    L.D   F2, 45(R3)
    MUL.D F0, F2, F4
    SUB.D F8, F6, F2
    DIV.D F10, F0, F6
    ADD.D F6, F8, F2

Slide 3: Static Scheduling
The compiler can attempt to reorder instructions (schedule instructions) to avoid hazards. The compiler (software) approach is called static scheduling.
e.g., how should the compiler schedule the following?
    1: ADD.D F1, F12, F5
    2: ADD.D F2, F3, F4
    3: ADD.D F2, F2, F1
    4: ADD.D F6, F2, F11
    5: SUB.D F1, F8, F7
    6: ADD.D F5, F5, F5

Slide 4: Example
How many cycles are required?

Slide 5: Example (cont.)
– List all potential data hazards.
– Find all legal orderings.
– What is the ideal ordering?
– How many cycles does it take?
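The first step of this exercise, listing potential data hazards, can be sketched in a few lines of Python. This is an illustrative sketch only: the `Instr` tuple and `find_hazards` function are hypothetical names, and the check is deliberately conservative. It compares every pair of instructions by register name, so it also reports pairs such as 2 and 4 on F2, even though instruction 3 overwrites F2 in between.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple

# The six instructions from the static scheduling slide, in program order.
PROGRAM = [
    Instr("ADD.D", "F1", ("F12", "F5")),
    Instr("ADD.D", "F2", ("F3", "F4")),
    Instr("ADD.D", "F2", ("F2", "F1")),
    Instr("ADD.D", "F6", ("F2", "F11")),
    Instr("SUB.D", "F1", ("F8", "F7")),
    Instr("ADD.D", "F5", ("F5", "F5")),
]

def find_hazards(program):
    """Return (kind, earlier, later, register) for every pairwise register overlap.

    Instruction numbers are 1-based to match the slide.
    """
    hazards = []
    for i, early in enumerate(program):
        for j in range(i + 1, len(program)):
            late = program[j]
            if early.dest in late.srcs:    # later instruction reads an earlier result
                hazards.append(("RAW", i + 1, j + 1, early.dest))
            if late.dest in early.srcs:    # later instruction overwrites an earlier source
                hazards.append(("WAR", i + 1, j + 1, late.dest))
            if late.dest == early.dest:    # both instructions write the same register
                hazards.append(("WAW", i + 1, j + 1, late.dest))
    return hazards

if __name__ == "__main__":
    for kind, first, second, reg in find_hazards(PROGRAM):
        print(f"{kind}: {first} -> {second} on {reg}")
```

Any legal reordering must keep the earlier instruction of each reported pair ahead of the later one, which narrows the search for the ideal ordering.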

Slide 6: Dynamic Scheduling
The previous example is a SOFTWARE solution.
– Instructions MUST execute in the order in which they are fetched.
– The compiler's job is to make sure the instructions are fetched in the best order.
– This is not so easy.
In dynamic scheduling, the HARDWARE rearranges the instruction execution.
– Separate ID into two stages: issue and read operands.
– Instructions are issued in order.
– Instructions can be executed out of order.
– Instructions can be completed out of order.

Slide 7: Dynamic Pipeline w/ Scoreboard
The pipeline has several functional units. Let's assume 2 FP multipliers, 1 FP adder, 1 FP divider, and 1 integer unit.
Scoreboard
– records data dependences
– determines when an instruction can read its operands and begin execution
– controls when an instruction can write into its destination register
– takes care of all hazard detection and resolution

Slide 8: Pipeline Stages
IF
Issue (IS)
– If there is a free functional unit and no WAW hazard, issue the instruction.
Read Operands (OP)
– The scoreboard monitors the source operands and tells the instruction when they can be read, so that it can proceed to execution.
Execution (EX)
– Complete execution and notify the scoreboard.
Write Result (WB)
– Check for a WAR hazard and stall if necessary.
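The three checks that gate these stages can be written as small predicates. The sketch below is illustrative rather than a full scoreboard: the function names and argument shapes are assumptions made for this example. The write-result test uses the usual scoreboard rule that a result must wait while any other active unit still has that register as a source that is ready but not yet read.

```python
def can_issue(unit_busy, dest, pending_dests):
    """Issue: the needed functional unit is free and no active
    instruction already targets the same destination (no WAW)."""
    return not unit_busy and dest not in pending_dests

def can_read_operands(rj, rk):
    """Read operands: both source operands are available."""
    return rj and rk

def can_write_result(dest, other_sources):
    """Write result: no other active unit still needs the old value of dest.

    other_sources is an iterable of (register, ready_and_not_yet_read)
    pairs, one per source operand of every other active functional unit.
    """
    return all(reg != dest or not unread for reg, unread in other_sources)

if __name__ == "__main__":
    # ADD.D F6, F8, F2 has finished executing, but DIV.D F10, F0, F6 is
    # still waiting for F0 and has not yet read F6: WAR hazard, so stall.
    print(can_write_result("F6", [("F0", False), ("F6", True)]))
    # Once DIV.D has read its operands, the flag for F6 is cleared.
    print(can_write_result("F6", [("F0", False), ("F6", False)]))
```

Passing plain values instead of a scoreboard object keeps each condition visible on its own line; a real implementation would read these values from the functional unit status table.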

Slide 9: Scoreboard
There are 3 parts to the scoreboard:
– Instruction Status (Issue? Read Operands? Execution Complete? Write Result?)
– Functional Unit Status
    Busy – Is it busy?
    Op – operation being performed
    Fi – destination register
    Fj, Fk – source registers
    Qj, Qk – functional units producing Fj, Fk
    Rj, Rk – flags indicating when Fj, Fk are ready and not yet read
– Register Result Status: indicates which functional unit will write each register
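One way to picture the functional unit status and register result status tables is as plain data structures. The sketch below is a minimal illustration, not a complete scoreboard: the names `FunctionalUnit`, `new_scoreboard`, and `issue` are hypothetical, the unit names are taken from the deck's own status table, and `issue` only does the bookkeeping. It assumes the caller has already checked that the unit is free and that there is no WAW hazard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FunctionalUnit:
    name: str
    busy: bool = False
    op: Optional[str] = None   # operation being performed
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # unit producing Fj (None if no producer is pending)
    qk: Optional[str] = None   # unit producing Fk (None if no producer is pending)
    rj: bool = False           # Fj is ready and not yet read
    rk: bool = False           # Fk is ready and not yet read

def new_scoreboard():
    """Return empty functional unit status and register result status tables."""
    units = {n: FunctionalUnit(n)
             for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
    result = {f"F{i}": None for i in range(0, 32, 2)}  # F0, F2, ..., F30
    return units, result

def issue(units, result, unit_name, op, dest, src1, src2):
    """Record an issued instruction (assumes the issue checks already passed)."""
    fu = units[unit_name]
    fu.busy, fu.op = True, op
    fu.fi, fu.fj, fu.fk = dest, src1, src2
    fu.qj = result.get(src1)   # registers outside the table have no pending producer
    fu.qk = result.get(src2)
    fu.rj = fu.qj is None      # a source with no pending producer counts as ready
    fu.rk = fu.qk is None
    result[dest] = unit_name   # this unit will write dest

if __name__ == "__main__":
    units, result = new_scoreboard()
    issue(units, result, "Integer", "L.D", "F2", None, "R3")
    issue(units, result, "Mult1", "MUL.D", "F0", "F2", "F4")
    print(units["Mult1"])
    print(result["F0"], result["F2"])
```

After these two calls, the Mult1 entry records that F2 is being produced by the Integer unit and is not yet ready, while F4 is ready to be read.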

Slide 10: Example
    L.D   F6, 34(R2)
    L.D   F2, 45(R3)
    MUL.D F0, F2, F4
    SUB.D F8, F6, F2
    DIV.D F10, F0, F6
    ADD.D F6, F8, F2

Instruction Status
Instruction          Issue   Read Operands   Execution Complete   Write Result
L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Slide 11: Example (scoreboard tables)

Functional Unit Status
Name      Busy   Op   Fi   Fj   Fk   Qj   Qk   Rj   Rk
Integer
Mult1
Mult2
Add
Divide

Register Result Status
      F0   F2   F4   F6   F8   F10   ...   F30
FU