CS 704 Advanced Computer Architecture
Lecture 13: Instruction Level Parallelism (Dynamic Scheduling - Scoreboard Approach)
Prof. Dr. M. Ashraf Chughtai

Welcome to the 13th lecture in the series of lectures on Advanced Computer Architecture. Today we will focus on the ILP
MAC/VU-Advanced Computer Architecture Lecture 13 – Instruction Level Parallelism -Dynamic (2)

Today's Topics
- Recap: Lectures 11-12
- Out-of-Order Execution
- Dynamic Scheduling
- Scoreboard Technique
- Summary

Recap: Lecture 12
- FP and Integer Multiplier
- FP and Integer Divider
Here, we observed that:
- Only one instruction is issued on every clock cycle.
- Integer ADD instructions pass through the FP pipeline just as they do through the standard pipeline, because integer ALU operations have zero latency.
- FP add and FP/integer multiply and divide instructions loop in the EX stage because of their longer latencies, which increases the number of stalls before a following instruction can be issued to the EX stage.

Recap: Lecture 12
RAW and WAW hazards may occur because the instructions have varying latencies and may reach WB out of order.
There are different ways to handle a RAW hazard:

Recap: Lecture 12
WAW hazard: for an instruction i followed in program order by an instruction j, the jth instruction writes before the ith instruction, and the ith instruction then overwrites the result of the jth instruction.

Recap: Lecture 12
Two ways to resolve a WAW hazard:
- Delay the issue of the jth instruction until the ith instruction enters the MEM stage.
- Stamp out the result of the ith instruction: detect the hazard and change the control (WB) so that the ith instruction does not write. The jth instruction can then be issued right away.

Today's Topics
- Out-of-Order Execution
- Problems of Out-of-order execution
- Dynamic Scheduling
- Scoreboard Technique
- Summary

In-Order Execution
A simple pipelined datapath supports only in-order instruction execution: instructions are fetched, decoded and issued in program order, and no later instruction can proceed while an earlier instruction is stalled by a hazard (structural or data dependence).

In-order Execution … Cont’d
For example, in the code
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F12, F8, F14
ADD.D needs the F0 result of DIV.D and must stall. SUB.D depends on neither instruction, yet with in-order execution it is held up behind the stalled ADD.D.

In-order Execution … Cont’d
Conclusion

In-order Execution: MIPS 5-stage Pipeline
In the MIPS 5-stage pipeline, both structural and data hazards are checked during the Instruction Decode (ID) stage, and an instruction is issued from the ID stage only if it can execute properly.
Here, the issue process at the ID stage is separated into two parts:
- Checking for structural hazards
- Waiting for the absence of a data hazard

Out-of-order Execution: MIPS 5-stage pipeline
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F12, F8, F14
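
To make the benefit of out-of-order execution concrete for this code, the following minimal Python sketch works out which instructions really have to wait for the long-latency DIV.D. The tuple layout and the helper name `depends_on_stalled` are our own assumptions for illustration; they do not come from the MIPS pipeline itself.

```python
# Each instruction is (opcode, destination, source1, source2).
# This encoding is an assumption made for the sketch.
PROGRAM = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]


def depends_on_stalled(program, stalled_index):
    """Return indices of later instructions that need, directly or through
    another waiting instruction, the result of program[stalled_index]."""
    pending = {program[stalled_index][1]}  # registers whose value is not ready
    waiting = []
    for idx in range(stalled_index + 1, len(program)):
        _, dest, src1, src2 = program[idx]
        if src1 in pending or src2 in pending:
            waiting.append(idx)
            pending.add(dest)      # its own result is delayed as well
        else:
            pending.discard(dest)  # an independent write makes dest ready again
    return waiting


waiting = depends_on_stalled(PROGRAM, 0)
free = [i for i in range(1, len(PROGRAM)) if i not in waiting]
print("must wait for DIV.D:", [PROGRAM[i][0] for i in waiting])  # ['ADD.D']
print("could proceed:      ", [PROGRAM[i][0] for i in free])     # ['SUB.D']
```

Only ADD.D truly depends on DIV.D. An in-order pipeline still holds SUB.D behind it, whereas out-of-order execution would let SUB.D proceed.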

Basic Problems of Out-of-order Execution
Consider the example FP code
    DIV.D F0, F2, F4
    ADD.D F6, F0, F8
    SUB.D F8, F10, F14
    MUL.D F6, F10, F8

Example Explained: RAW Hazard

Example Explained: WAW hazard

Example Explained: WAW hazard
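
The hazards in the four-instruction example can also be found mechanically. The sketch below is a simplified pairwise check, assuming the same hypothetical tuple encoding (opcode, destination, sources). It compares every earlier instruction with every later one and ignores intervening writes to the same register, which is sufficient for this short example.

```python
# (opcode, destination, source1, source2); encoding assumed for illustration.
PROGRAM = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]


def find_hazards(program):
    """List (kind, earlier_op, later_op, register) for every pair i < j."""
    hazards = []
    for i, (op_i, dest_i, *srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            op_j, dest_j, *srcs_j = program[j]
            if dest_i in srcs_j:       # j reads what i writes
                hazards.append(("RAW", op_i, op_j, dest_i))
            if dest_j in srcs_i:       # j writes what i still has to read
                hazards.append(("WAR", op_i, op_j, dest_j))
            if dest_i == dest_j:       # both write the same register
                hazards.append(("WAW", op_i, op_j, dest_i))
    return hazards


for kind, first, second, reg in find_hazards(PROGRAM):
    print(f"{kind}: {first} -> {second} on {reg}")
```

For this code the check reports a RAW hazard between DIV.D and ADD.D on F0, a WAR hazard between ADD.D and SUB.D on F8, a WAW hazard between ADD.D and MUL.D on F6, and a second RAW hazard between SUB.D and MUL.D on F8.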

Scheduling for out-of-order execution
- Static Scheduling: rearrangement of the instruction execution by the compiler
- Dynamic Scheduling: rearrangement of the instruction execution by the hardware

Dynamic Scheduling
- Issue:
- Read Operand:

Dynamic Scheduling

Dynamic Scheduling: Scoreboarding Technique
The CDC 6600 contains:
- 4 FP units
- 5 memory reference units
- 7 integer operation units

MIPS Processor with Scoreboard
[Figure labels: Registers, Integer Unit, FP Adder, FP Divide, FP Mul, Data Buses, Control/status]

Features of Scoreboard
The Scoreboard:

Components of Scoreboard
- Instruction Status
- Functional Unit Status
- Register Result Status

Instruction Status
The four stages are:
- Issue: If a functional unit for the instruction is free and no other active instruction has the same destination register, the scoreboard issues the instruction to the functional unit and updates its internal data structure. This guarantees that WAW hazards cannot be present.
- Read Operand: The scoreboard monitors the availability of the source operands; an operand is available when no earlier-issued active instruction is going to write it. This resolves RAW hazards.
- Execute: The FU begins execution and notifies the scoreboard when it has completed the execution. The scoreboard then updates its data structure.
- Write Result: Once the scoreboard is aware that the FU has completed execution, it checks for WAR hazards, stalls the instruction if necessary, and otherwise writes the result.
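
The stall conditions of the Issue, Read Operand and Write Result stages can be sketched as three small checks. This is a minimal Python illustration, not the CDC 6600 logic itself. The `FUState` record keeps only the fields the checks need, and the unit names (including a second adder, `Add2`) are assumptions chosen so that the earlier example code DIV.D / ADD.D / SUB.D / MUL.D can be in flight at once.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUState:
    busy: bool = False
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    rj: bool = False           # Fj is ready and not yet read
    rk: bool = False           # Fk is ready and not yet read


def can_issue(fu: str, dest: str, units: Dict[str, FUState]) -> bool:
    """Issue: the FU is free (no structural hazard) and no active
    instruction has the same destination register (no WAW hazard)."""
    same_dest = any(u.busy and u.fi == dest for u in units.values())
    return not units[fu].busy and not same_dest


def can_read_operands(fu: str, units: Dict[str, FUState]) -> bool:
    """Read Operand: both sources must be available (resolves RAW)."""
    return units[fu].rj and units[fu].rk


def can_write_result(fu: str, units: Dict[str, FUState]) -> bool:
    """Write Result: stall while another active instruction still has to
    read the old value of our destination register (WAR hazard)."""
    dest = units[fu].fi
    for name, other in units.items():
        if name == fu or not other.busy:
            continue
        if (other.fj == dest and other.rj) or (other.fk == dest and other.rk):
            return False
    return True


# Hypothetical snapshot: DIV.D is executing, ADD.D waits for F0,
# and SUB.D has finished executing.
units = {
    "Divide": FUState(True, "F0", "F2", "F4", rj=False, rk=False),
    "Add1": FUState(True, "F6", "F0", "F8", rj=False, rk=True),
    "Add2": FUState(True, "F8", "F10", "F14", rj=False, rk=False),
    "Mult": FUState(),
}

print(can_issue("Mult", "F6", units))     # False: ADD.D also writes F6 (WAW)
print(can_read_operands("Add1", units))   # False: F0 not yet produced (RAW)
print(can_write_result("Add2", units))    # False: ADD.D has not read F8 (WAR)
```

In this snapshot MUL.D cannot issue, ADD.D cannot read its operands, and SUB.D cannot write F8. These are the WAW, RAW and WAR situations the three stages are meant to catch.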

Instruction Status Data structure

Instruction   Issue   Read Operands   Execution complete   Write result
MUL.D         √       √
ADD.D         √       √               √                    √

Functional Unit Status
- Busy: a single-bit field that indicates whether the FU is busy
- Op: a 2- or 3-bit field specifying the operation being performed by the FU (e.g. ADD or SUB)
- Registers:
  Fi: destination register number
  Fj, Fk: source register numbers
- Qj, Qk: the FUs (ADD, MUL, ....) producing source registers Fj and Fk
- Rj, Rk: flags indicating that source operands Fj and Fk are ready and not yet read; set to No once the operands are read

Typical Functional Unit Status table

FU name   Busy   Op    Fi    Fj   Fk   Qj     Qk   Rj   Rk
MUL1      Y      Mul   F0    F2   F4   --     --   No   No
….
DIVIDE    Y      Div   F10   F0   F6   MUL1   --   No   Yes

Register Result Status
Format of the table:

      F0     F2   F4   F6   F8    F10      F12   ……   F30
FU    Mul1                  Add   Divide
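
To show how the functional unit status and the register result status work together, here is a minimal Python sketch of the bookkeeping done at issue and at write result. The names `FUStatus`, `issue` and `write_result` are our own. The sketch assumes the stall checks have already passed, and it clears Qj/Qk at write result for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FU producing Fj (None if no FU is)
    qk: Optional[str] = None   # FU producing Fk (None if no FU is)
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False           # Fk ready and not yet read


def issue(fu_name, op, dest, src1, src2, units, result):
    """Fill in the FU's row and record it as the producer of dest.
    Assumes the issue checks (FU free, no WAW) have already passed."""
    fu = units[fu_name]
    fu.busy, fu.op = True, op
    fu.fi, fu.fj, fu.fk = dest, src1, src2
    fu.qj = result.get(src1)       # who will produce each source, if anyone
    fu.qk = result.get(src2)
    fu.rj = fu.qj is None          # ready only if no FU still has to write it
    fu.rk = fu.qk is None
    result[dest] = fu_name         # register result status entry


def write_result(fu_name, units, result):
    """Wake up FUs waiting on this result, then free the FU.
    Assumes the WAR check has already passed."""
    for other in units.values():
        if other.qj == fu_name:
            other.qj, other.rj = None, True
        if other.qk == fu_name:
            other.qk, other.rk = None, True
    result.pop(units[fu_name].fi, None)
    units[fu_name] = FUStatus()


units: Dict[str, FUStatus] = {"MUL1": FUStatus(), "DIVIDE": FUStatus()}
result: Dict[str, str] = {}        # register name -> FU that will write it

issue("MUL1", "Mul", "F0", "F2", "F4", units, result)
issue("DIVIDE", "Div", "F10", "F0", "F6", units, result)
print(units["DIVIDE"])   # Qj is MUL1, Rj is False, Rk is True
print(result)            # {'F0': 'MUL1', 'F10': 'DIVIDE'}

write_result("MUL1", units, result)
print(units["DIVIDE"].rj, result)   # True {'F10': 'DIVIDE'}
```

After the two issues, the DIVIDE row matches the typical table shown earlier (Qj = MUL1, Rj = No, Rk = Yes). Once MUL1 writes F0, the F0 entry disappears from the register result status and DIVIDE sees both operands ready.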

Summary
- Problems of Out-of-order execution
- Dynamic Scheduling
- Scoreboard Technique

Assalam-o-Alaikum and Allah Hafiz