CS 704 Advanced Computer Architecture

Presentation transcript:

CS 704 Advanced Computer Architecture Lecture 13 Instruction Level Parallelism (Dynamic Scheduling - Scoreboard Approach) Prof. Dr. M. Ashraf Chughtai Welcome to the 13th lecture in the series of lectures on Advanced Computer Architecture. Today we will focus on the ILP

Today's Topics
- Recap: Lectures 11-12
- Out-of-Order Execution
- Dynamic Scheduling
- Scoreboard Technique
- Summary
MAC/VU-Advanced Computer Architecture Lecture 13 – Instruction Level Parallelism -Dynamic (2)

Recap: Lecture 12
- FP and integer multiplier
- FP and integer divider
Here, we observed that:
- Only one instruction is issued on every clock cycle.
- Integer ADD instructions go through the FP pipeline just as they go through the standard pipeline, because integer ALU operations have zero latency.
- FP add and FP/integer multiply and divide instructions loop in the EX stage because of the longer latencies of these operations, which increases the number of stalls before an instruction is issued to the EX stage.

Recap: Lecture 12
RAW and WAW hazards may occur because instructions take varying numbers of cycles to execute and may reach WB out of order.
There are different ways to handle a RAW hazard:

Recap: Lecture 12
WAW hazard: the jth instruction writes before the ith instruction (where i precedes j in program order), so the ith instruction then overwrites the result of the jth instruction.

Recap: Lecture 12
Two ways to resolve a WAW hazard:
- Delay the issue of the jth instruction until the ith instruction enters the MEM stage.
- Stamp out the result of the ith instruction: detect the hazard and change the control (WB) so that the ith instruction does not write. Hence, the jth instruction can be issued right away.

Today's Topics
- Out-of-Order Execution
- Problems of Out-of-order execution
- Dynamic Scheduling
- Scoreboard Technique
- Summary

In-Order Execution
A simple pipelined datapath supports only in-order instruction execution: instructions are fetched, decoded and issued in program order, and no later instruction can proceed while an earlier instruction is stalled by a hazard (structural or data dependence).

In-order Execution … Cont’d
For example, in the code:
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14

In-order Execution … Cont’d
Conclusion

In-order Execution: MIPS 5-stage Pipeline
In the MIPS 5-stage pipeline, both structural and data hazards are checked during the Instruction Decode (ID) stage, and an instruction is issued from the ID stage only if it can execute properly.
Here, the issue process at the ID stage is separated into two parts:
- Checking for structural hazards
- Waiting for the absence of data hazards

Out-of-order Execution: MIPS 5-stage pipeline
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
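As a rough illustration of why out-of-order execution helps with this sequence, the sketch below computes the cycle in which each instruction could begin execution, with and without the in-order restriction. The latencies (10 cycles for DIV.D, 2 for ADD.D and SUB.D), the name start_cycles, and the one-issue-per-cycle numbering are assumptions made for the example, not values from the lecture. Only true (RAW) dependences are modelled.

```python
# Toy timing model (assumed latencies, RAW dependences only).
PROGRAM = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]
LATENCY = {"DIV.D": 10, "ADD.D": 2, "SUB.D": 2}  # hypothetical cycle counts


def start_cycles(program, latency, in_order):
    """Return the cycle in which each instruction begins execution."""
    ready = {}       # register -> cycle in which its new value is available
    starts = []
    prev_start = 0
    for idx, (op, dest, src1, src2) in enumerate(program):
        # A source with no pending producer is treated as available at cycle 0.
        operands_ready = max(ready.get(src1, 0), ready.get(src2, 0))
        earliest = idx + 1  # one instruction issued per cycle, starting at 1
        if in_order:
            # In-order: cannot start before the previous instruction has started.
            earliest = max(earliest, prev_start + 1)
        start = max(earliest, operands_ready)
        starts.append(start)
        ready[dest] = start + latency[op]
        prev_start = start
    return starts


print(start_cycles(PROGRAM, LATENCY, in_order=True))   # [1, 11, 12]
print(start_cycles(PROGRAM, LATENCY, in_order=False))  # [1, 11, 3]
```

Under these assumptions SUB.D, which does not depend on DIV.D or ADD.D, starts in cycle 3 instead of waiting behind the stalled ADD.D until cycle 12.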

Basic Problems of Out-of-order Execution
Consider the example FP code:
DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8

Example Explained: RAW Hazard

Example Explained: WAW hazard

Example Explained: WAW hazard
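The dependences in the four-instruction example (DIV.D, ADD.D, SUB.D, MUL.D) can be listed mechanically. The following is a minimal sketch. The names Instr and find_hazards are hypothetical, and the check is a simple pairwise register comparison that ignores any intervening writes, which is sufficient for this short sequence.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str


def find_hazards(program: List[Instr]) -> List[Tuple[str, int, int, str]]:
    """Return (kind, earlier index, later index, register) for each pair."""
    found = []
    for i, early in enumerate(program):
        for j in range(i + 1, len(program)):
            late = program[j]
            if early.dest in (late.src1, late.src2):
                found.append(("RAW", i, j, early.dest))  # later reads earlier result
            if late.dest in (early.src1, early.src2):
                found.append(("WAR", i, j, late.dest))   # later overwrites earlier source
            if late.dest == early.dest:
                found.append(("WAW", i, j, late.dest))   # both write the same register
    return found


program = [
    Instr("DIV.D", "F0", "F2", "F4"),
    Instr("ADD.D", "F6", "F0", "F8"),
    Instr("SUB.D", "F8", "F10", "F14"),
    Instr("MUL.D", "F6", "F10", "F8"),
]

for kind, i, j, reg in find_hazards(program):
    print(f"{kind}: {program[i].op} -> {program[j].op} on {reg}")
# RAW: DIV.D -> ADD.D on F0
# WAR: ADD.D -> SUB.D on F8
# WAW: ADD.D -> MUL.D on F6
# RAW: SUB.D -> MUL.D on F8
```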

Scheduling for out-of-order execution
- Static Scheduling: rearrangement of instruction execution by the compiler
- Dynamic Scheduling: rearrangement of instruction execution by the hardware

Dynamic Scheduling
- Issue:
- Read Operand:

Dynamic Scheduling

Dynamic Scheduling: Scoreboarding Technique
The CDC 6600 contains:
- 4 FP units
- 5 memory reference units
- 7 integer operation units

MIPS Processor with Scoreboard
[Figure: block diagram with Registers, Integer Unit, FP Adder, FP Divide and FP Mul units, Data Buses, and Control/status lines]

Features of Scoreboard
The Scoreboard:

Components of Scoreboard
- Instruction Status
- Functional Unit Status
- Register Result Status

Instruction Status
These four stages are:
- Issue: If a functional unit for the instruction is free and no other active instruction has the same destination register, the scoreboard issues the instruction to the functional unit and updates its internal data structure. This guarantees that WAW hazards cannot be present.

Instruction Status
- Read Operand: The scoreboard monitors the availability of the source operands. A source operand is available if no earlier issued active instruction is going to write it. This resolves RAW hazards.
- Execute: The FU begins execution and notifies the scoreboard when it has completed. The scoreboard then updates its data structure.

Instruction Status
- Write Result: Once the scoreboard is aware that the FU has completed execution, it checks for WAR hazards, stalls the completing instruction if necessary, and then writes the result.
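The stage checks described above (Issue, Read Operand, Write Result) can be sketched as a small model. This is a simplified illustration, not the CDC 6600 logic: the class Scoreboard, its method names and the FU names Add1, Add2 and Mult are hypothetical, the Execute stage is omitted, and callers are assumed to invoke the stages in order.

```python
class Scoreboard:
    """Toy scoreboard that tracks only what the stage checks need."""

    def __init__(self, fu_names):
        self.entries = {name: None for name in fu_names}  # FU -> active instruction
        self.result = {}                                  # register -> FU that will write it

    def can_issue(self, fu, dest):
        # Issue: FU is free (no structural hazard) and dest has no active writer (no WAW).
        return self.entries[fu] is None and dest not in self.result

    def issue(self, fu, dest, srcs):
        assert self.can_issue(fu, dest)
        # For each source, remember the FU still producing it (None means ready).
        producers = {s: self.result.get(s) for s in srcs}
        self.entries[fu] = {"dest": dest, "producers": producers, "read": False}
        self.result[dest] = fu

    def can_read_operands(self, fu):
        # Read Operand: every source is ready, so the RAW hazard is resolved.
        return all(p is None for p in self.entries[fu]["producers"].values())

    def read_operands(self, fu):
        assert self.can_read_operands(fu)
        self.entries[fu]["read"] = True

    def can_write_result(self, fu):
        # Write Result: stall while another active instruction has not yet
        # read the current (old) value of our destination register (WAR).
        dest = self.entries[fu]["dest"]
        for other, e in self.entries.items():
            if other == fu or e is None or e["read"]:
                continue
            if dest in e["producers"] and e["producers"][dest] is None:
                return False
        return True

    def write_result(self, fu):
        assert self.can_write_result(fu)
        dest = self.entries[fu]["dest"]
        for e in self.entries.values():
            if e is not None:
                for s, p in e["producers"].items():
                    if p == fu:
                        e["producers"][s] = None  # operand is now available
        del self.result[dest]
        self.entries[fu] = None


sb = Scoreboard(["Divide", "Add1", "Add2", "Mult"])
sb.issue("Divide", "F0", ["F2", "F4"])   # DIV.D F0, F2, F4
sb.issue("Add1", "F6", ["F0", "F8"])     # ADD.D F6, F0, F8
sb.issue("Add2", "F8", ["F10", "F14"])   # SUB.D F8, F10, F14
sb.read_operands("Add2")
print(sb.can_read_operands("Add1"))  # False: RAW on F0
print(sb.can_issue("Mult", "F6"))    # False: WAW on F6
print(sb.can_write_result("Add2"))   # False: WAR on F8
```

Two adders are assumed here so that SUB.D can issue while ADD.D is still waiting; with a single FP adder the SUB.D would stall at issue on a structural hazard instead.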

Instruction Status Data structure
Instruction   Issue   Read operands   Execution complete   Write result
MUL.D         √       √
ADD.D         √       √               √                    √

Functional Unit Status
- Busy: a single-bit field indicating whether the FU is busy
- OP: a 2- or 3-bit field specifying the operation being performed by the FU (e.g. ADD or SUB)
- Registers: Fi is the destination register number; Fj, Fk are the source register numbers
- Qj, Qk: the FUs (ADD, MUL, ….) producing source registers Fj and Fk
- Rj, Rk: flags indicating that source registers Fj, Fk are ready and not yet read; set to No when the operands are read

Typical Functional Unit Status table
FU name   Busy   OP    Fi    Fj   Fk   Qj     Qk   Rj   Rk
MUL1      Y      Mul   F0    F2   F4   --     --   No   No
….
DIVIDE    Y      Div   F10   F0   F6   MUL1   --   No   Yes

Register Result Status
Format of the table: one column per register (F0 F2 F4 F6 F8 F10 F12 …… F30) and a single FU row naming the unit, if any, that will write each register (entries shown: Mul1, Add, Divide).
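The functional unit status fields and the register result status can be tied together by the bookkeeping done at issue. The sketch below is one possible encoding, with hypothetical names (FUStatus, issue, read_operands). It reproduces the MUL1 and DIVIDE rows of the typical functional unit status table, using None where the table shows "--" and True/False for Yes/No.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # FU producing Fj, None if no pending producer
    qk: Optional[str] = None  # FU producing Fk, None if no pending producer
    rj: bool = False          # Fj ready and not yet read
    rk: bool = False          # Fk ready and not yet read


def issue(fu_status: Dict[str, FUStatus], result_status: Dict[str, str],
          fu: str, op: str, fi: str, fj: str, fk: str) -> None:
    """Fill in the FU entry and record the FU as the writer of Fi."""
    qj = result_status.get(fj)
    qk = result_status.get(fk)
    fu_status[fu] = FUStatus(True, op, fi, fj, fk, qj, qk, qj is None, qk is None)
    result_status[fi] = fu


def read_operands(fu_status: Dict[str, FUStatus], fu: str) -> None:
    """Rj and Rk are set to No once the operands have been read."""
    fu_status[fu].rj = False
    fu_status[fu].rk = False


fu_status = {"MUL1": FUStatus(), "DIVIDE": FUStatus()}
result_status: Dict[str, str] = {}

issue(fu_status, result_status, "MUL1", "Mul", "F0", "F2", "F4")
read_operands(fu_status, "MUL1")
issue(fu_status, result_status, "DIVIDE", "Div", "F10", "F0", "F6")

print(fu_status["DIVIDE"].qj, fu_status["DIVIDE"].rj, fu_status["DIVIDE"].rk)
# MUL1 False True
print(result_status)
# {'F0': 'MUL1', 'F10': 'DIVIDE'}
```

The DIVIDE entry has Qj = MUL1 and Rj = No because its first source, F0, is still being produced by MUL1, while F6 has no pending writer and so Rk = Yes.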

Summary
- Problems of Out-of-order execution
- Dynamic Scheduling
- Scoreboard Technique

Assalam-o-Alaikum (peace be upon you) and Allah Hafiz (goodbye)