Checking for issue/dispatch

Checking for issue/dispatch
Split the ID stage into:
- Issue: decode instructions; check for structural hazards (stall if there are any). Instructions pass through this stage in order.
- Dispatch: several options, depending on whether functional units have buffers or not.
An instruction can be issued even if a previous one has not been dispatched.
After the ID stage, the instruction enters an EX stage as before.
12/4/2018 CSE 471 Dynamic scheduling

Implementations of dynamic scheduling
In order to compute correct results, need to keep track of:
- execution pipes
- register usage for read and write
- completion
- etc.
Two major techniques:
- Scoreboard (invented by Seymour Cray for the CDC 6600 in 1964)
- Tomasulo’s algorithm (used in the IBM 360/91 in 1967)

Scoreboarding: the example machine
[Figure: block diagram with the labels Registers, Data buses, Functional units (pipes), Scoreboard, Control lines/status]

Scoreboard basic idea
- Every instruction goes through the scoreboard before being issued.
- The scoreboard keeps a record of all data dependencies.
- The scoreboard keeps a record of all pipe occupancies.
- The scoreboard decides if an instruction can be issued, either the first time it sees it (no hazard) or later (when hazards are resolved).
- The scoreboard decides if an instruction can store its result.

An instruction goes through 4 steps
1. Issue
- The execution unit must be free (no structural hazard).
- There should be no WAW hazard.
If either of these conditions is false, the instruction stalls and no further instruction is issued until the hazard clears.
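The issue test above can be sketched in a few lines of Python. This is a minimal illustration, not the CDC 6600 logic itself: the `FunctionalUnit` class, the `can_issue` helper, and the unit and register names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FunctionalUnit:
    """Hypothetical per-unit state: just enough for the issue check."""
    name: str
    busy: bool = False
    dest: Optional[str] = None  # register this unit will write, if busy


def can_issue(unit: FunctionalUnit, dest: str, units: List[FunctionalUnit]) -> bool:
    """Issue only if the target unit is free and no busy unit writes `dest`."""
    structural_hazard = unit.busy
    waw_hazard = any(u.busy and u.dest == dest for u in units)
    return not structural_hazard and not waw_hazard


# Example state: the divider is busy and will eventually write F0.
adder = FunctionalUnit("add")
divider = FunctionalUnit("div", busy=True, dest="F0")
units = [adder, divider]

print(can_issue(adder, "F2", units))    # free unit, no pending write to F2
print(can_issue(adder, "F0", units))    # WAW: the divider will write F0
print(can_issue(divider, "F4", units))  # structural: the divider is busy
```

Because issue is in order, a `False` result here blocks every later instruction as well, not just the one being tested.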

2. Read operands
- When the instruction is issued, the execution unit is reserved (becomes busy).
- Operands are read in the execution unit when they are ready (i.e., are not results of still executing instructions). This prevents RAW hazards.
3. Execution
- One or more cycles, depending on functional unit latency.
- When execution completes, the unit notifies the scoreboard it’s ready to write the result.

4. Write result
- Before writing, check for WAR hazards. If one exists, the unit is stalled until all WAR hazards are cleared (note that an instruction in progress, i.e., whose operands have been read, won’t cause a WAR).
Miscellaneous historical notes
- In the original machine, forwarding was not implemented (hence one unit of delay between writing and reading the same register).
- Similarly, it took one unit of time between the release of a unit and its possible next occupancy.

What is needed in the scoreboard
Status of each functional unit:
- Free or busy
- Operation to be performed
- The names of the result register Fi and the source registers Fj, Fk
- Flags Rj, Rk indicating whether the source registers are ready
- Names Qj, Qk of the units (if any) producing values for Fj, Fk
Status of result registers:
- For each Fi, the name of the unit (if any), say Pi, that will produce its contents
The instruction status:
- Been issued, in execution, ready to write, finished?
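As a rough sketch, the per-unit fields listed above map onto a small record, and the read-operands and write-result checks become short predicates over it. The names `UnitStatus`, `can_read_operands`, and `can_write_result` are hypothetical, and the code assumes the common textbook convention that Rj/Rk are true only while an operand is ready but not yet read (they are cleared once the operands have been read).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UnitStatus:
    """Hypothetical record mirroring the scoreboard fields for one unit."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # result register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit producing Fj, if any
    qk: Optional[str] = None  # unit producing Fk, if any
    rj: bool = False          # assumed: True = Fj ready and not yet read
    rk: bool = False          # assumed: True = Fk ready and not yet read


def can_read_operands(u: UnitStatus) -> bool:
    """Both operands must be ready; waiting here prevents RAW hazards."""
    return u.rj and u.rk


def can_write_result(name: str, units: Dict[str, UnitStatus]) -> bool:
    """Stall the write while another unit still has to read the old value (WAR)."""
    fi = units[name].fi
    for other_name, other in units.items():
        if other_name == name or not other.busy:
            continue
        if (other.fj == fi and other.rj) or (other.fk == fi and other.rk):
            return False
    return True


# Example: mult is executing; div waits for F0 from mult and has not read F6 yet;
# add has finished computing a new F6.
units = {
    "mult": UnitStatus(busy=True, op="MUL", fi="F0", fj="F2", fk="F4"),
    "div": UnitStatus(busy=True, op="DIV", fi="F10", fj="F0", fk="F6",
                      qj="mult", rj=False, rk=True),
    "add": UnitStatus(busy=True, op="ADD", fi="F6", fj="F8", fk="F2"),
}
# Result register status: which unit will produce each register.
result_status = {"F0": "mult", "F10": "div", "F6": "add"}

print(can_read_operands(units["div"]))  # div still waits for F0
print(can_write_result("add", units))   # WAR on F6: div has not read it yet
print(can_write_result("mult", units))  # nobody is waiting to read an old F0
```

In this state the adder is stalled by the divider, which is itself waiting on the multiplier, which illustrates why the lack of renaming costs the scoreboard some parallelism.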

Tomasulo’s algorithm
“Weaknesses” in scoreboard:
- Centralized control
- No forwarding (more RAW than needed)
Tomasulo’s algorithm, as implemented first in the IBM 360/91:
- Control decentralized at each functional unit
- Forwarding
- Concept and implementation of renaming registers that eliminates WAR and WAW hazards

Reservation stations
- With each functional unit, have a set of buffers or reservation stations.
- They keep the operands and the function to perform.
- Operands can be values or names of reservation stations that will produce the value (one form of register renaming), with appropriate flags.
- The two operands do not have to be “ready” at the same time.
- When both operands have values, the functional unit can execute on that pair of operands.
- When a functional unit computes a result, it broadcasts the name of the corresponding reservation station and the value.
- The result might not be stored in a real register.
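A reservation station holding "a value or a name" per operand can be sketched as below. The `ReservationStation` class, its `capture` method, and the station names are hypothetical; here `None` in a tag field stands for "the value is present".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """Hypothetical station: each operand is either a value or a producer tag."""
    name: str
    op: Optional[str] = None
    vj: Optional[float] = None  # first operand value, if present
    vk: Optional[float] = None  # second operand value, if present
    qj: Optional[str] = None    # station that will produce operand j
    qk: Optional[str] = None    # station that will produce operand k
    busy: bool = False

    def ready(self) -> bool:
        """The unit may execute once both operands hold values."""
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: float) -> None:
        """Snoop a broadcast (tag, value) and fill any operand waiting on it."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# Add1 already has its first operand and waits for Mult1 to produce the second.
rs = ReservationStation("Add1", op="add", vj=1.5, qk="Mult1", busy=True)
print(rs.ready())        # still waiting on Mult1
rs.capture("Mult1", 3.0)  # Mult1 broadcasts its name and result
print(rs.ready(), rs.vk)
```

The operand arrives by matching on the producer's name rather than on a register number, which is what lets a result be consumed without ever being written to a real register.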

Example machine
[Figure: block diagram with the labels From memory, From I-unit, Load buffers, F-p registers, Common broadcast data bus, Store buffers, Reservation stations, To memory, F-p units]

Tomasulo’s solution: hazards
- Structural hazards: no free reservation station (stall at issue time).
- RAW hazards (detected in each functional unit, i.e., decentralized): stall at execution time.
- No WAR and WAW hazards, because of register renaming through reservation stations.
- Forwarding: done at end of execution by use of a common (broadcast) data bus.

1. Issue
- Check for structural hazard (no free reservation station, or no free load-store buffer for a memory operation).
- Rename registers if needed (see next slide).
2. Execute
- If one or more operands is not ready, monitor the bus for broadcast of a result.
- When both operands have values, execute.
3. Write result
- Broadcast name of the reservation station and value computed.

Implementation
All registers (except load buffers) contain either a value or a tag (Qi) indicating which functional unit will compute their contents.
The tag (or name) can be:
- Zero (or a special pattern), meaning that we have a value
- The name of a load buffer
- The name of a reservation station
A reservation station consists of:
- The operation to be performed
- 2 pairs (value, tag): (Vj, Qj) and (Vk, Qk)
- A flag indicating whether the accompanying functional unit is busy or not
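The three steps and the register tags can be tied together in a toy model. This is only a sketch under simplifying assumptions: there is no timing and no limit on the number of stations, `None` plays the role of the "zero" tag, and the helpers `issue` and `write_result`, the two-operation instruction set, and the register contents are all hypothetical.

```python
from typing import Dict, Optional

# Register file: a value plus a tag Qi (None = value is valid).
regs: Dict[str, float] = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0}
reg_tag: Dict[str, Optional[str]] = {r: None for r in regs}
stations: Dict[str, dict] = {}


def issue(rs: str, op: str, dest: str, src1: str, src2: str) -> None:
    """Copy ready operands, record tags for pending ones, rename `dest` to `rs`."""
    entry = {"op": op, "vj": None, "vk": None, "qj": None, "qk": None}
    for v, q, src in (("vj", "qj", src1), ("vk", "qk", src2)):
        if reg_tag[src] is None:
            entry[v] = regs[src]     # value available now
        else:
            entry[q] = reg_tag[src]  # wait for this station's broadcast
    stations[rs] = entry
    reg_tag[dest] = rs


def write_result(rs: str) -> float:
    """Execute a ready station and broadcast (name, value) on the common bus."""
    e = stations.pop(rs)
    assert e["qj"] is None and e["qk"] is None, "operands not ready"
    value = e["vj"] + e["vk"] if e["op"] == "add" else e["vj"] * e["vk"]
    for other in stations.values():  # waiting stations capture the value
        if other["qj"] == rs:
            other["vj"], other["qj"] = value, None
        if other["qk"] == rs:
            other["vk"], other["qk"] = value, None
    for r in list(reg_tag):          # only registers still tagged `rs` are written
        if reg_tag[r] == rs:
            regs[r], reg_tag[r] = value, None
    return value


issue("Mult1", "mul", "F0", "F2", "F4")  # F0 = F2 * F4
issue("Add1", "add", "F6", "F0", "F2")   # F6 = F0 + F2, waits on Mult1
issue("Add2", "add", "F2", "F4", "F4")   # F2 = F4 + F4, WAR with Add1 on F2

write_result("Add2")   # completes first; Add1 already holds the old F2
write_result("Mult1")  # broadcast wakes up Add1
write_result("Add1")
print(regs)
```

Add2 overwrites F2 before Add1 executes, yet Add1 still uses the old value because it copied F2 at issue time. That copy is the renaming that removes the WAR hazard.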

Yes, but… how about branches, exceptions?
Recall:
- These machines were built in the 1960s!
- No concept of branch prediction
- No virtual memory
- Tomasulo’s algorithm only for the f-p unit (imprecise exception)
Need for a more sophisticated retire (commit) unit:
- To store results in order
- To nullify results of completed but mis-speculated instructions
Need for a more sophisticated register renaming scheme (with real registers)