Dynamic Scheduling and Speculation


Outline
- Dynamic Scheduling
- Tomasulo's Algorithm
- Speculation

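Dynamic Scheduling
- Out-of-order execution
- Check for structural and data hazards
- Begin executing an instruction as soon as its operands are available
- Implies out-of-order completion, which introduces:
  - WAR and WAW hazards
  - Imprecise exceptions

Example:
    DIV.D  F0, F2, F4
    ADD.D  F10, F0, F8
    SUB.D  F12, F8, F14
ADD.D needs F0 from the long-latency DIV.D and must wait. SUB.D depends on neither instruction, so a dynamically scheduled pipeline can execute it before ADD.D.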
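A toy timing sketch of the example, assuming made-up latencies (24 cycles for DIV.D, 2 for ADD.D and SUB.D), one instruction becoming available per clock, and unlimited functional units. `LATENCY`, `PROGRAM` and `start_cycles` are hypothetical names for illustration only. It compares when each instruction starts executing with and without the in-order restriction.

```python
# Hypothetical latencies in clock cycles, chosen only for illustration.
LATENCY = {"DIV.D": 24, "ADD.D": 2, "SUB.D": 2}

# (opcode, destination, sources) for the three instructions on the slide.
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F10", ("F0", "F8")),
    ("SUB.D", "F12", ("F8", "F14")),
]


def start_cycles(program, in_order):
    """Cycle in which each instruction begins execution.

    Instruction i becomes available in cycle i (one issue per clock) and
    starts once all of its source registers are ready. An in-order pipeline
    additionally cannot start an instruction until the cycle after the
    previous one started. Structural hazards are ignored.
    """
    ready = {}   # register -> cycle in which its new value becomes available
    starts = []
    for i, (op, dst, srcs) in enumerate(program):
        start = max([i] + [ready.get(r, 0) for r in srcs])
        if in_order and starts:
            start = max(start, starts[-1] + 1)
        starts.append(start)
        ready[dst] = start + LATENCY[op]
    return starts


print("in-order:    ", start_cycles(PROGRAM, in_order=True))   # [0, 24, 25]
print("out-of-order:", start_cycles(PROGRAM, in_order=False))  # [0, 24, 2]
```

In the out-of-order case SUB.D starts in cycle 2, long before ADD.D, which is exactly what can make completion (and hence exceptions) occur out of program order.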

Tomasulo's Algorithm
- Invented by Robert Tomasulo for the IBM 360/91
- Goal: high performance without special compilers
- Influenced the designs of the Alpha 21264, HP PA-8000, MIPS R10000, Pentium II, PowerPC 604, …
- R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM Journal of Research and Development 11(1), Jan. 1967, pp. 25-33.

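Tomasulo's Algorithm
[Figure: Organization of a Tomasulo-based unit. Labels: From Instruction Unit, Instruction Queue, FP Registers, Load/Store operations, Address Unit, Load buffers, Store buffers, Memory Unit (data and address paths), Reservation Stations (three for the FP adders, two for the FP multipliers), FP Adders, FP Multipliers, Common Data Bus.]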

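Steps in Tomasulo's Algorithm
1. Issue (also called dispatch)
   - Check for structural hazards: stall if no reservation station (RS) of the required type is free
   - Queue the instruction in the RS
   - If an operand is not yet available in the register file, record which functional unit will produce it
   - This renaming eliminates WAR and WAW hazards
2. Execute
   - Monitor the CDB for operands that are not yet available; execution starts only when all operands are present, which avoids RAW hazards
3. Write result
   - Write the result on the CDB, from which it reaches the register file and any waiting RSs
   - Mark the RS as available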
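The three steps can be sketched in a few lines of Python. This is a functional sketch only, assuming no timing, no load/store buffers and a single CDB broadcast per call; the class and method names (`Station`, `Tomasulo`, `issue`, `execute`, `write_result`) are hypothetical. The field names Busy, Op, Vj, Vk, Qj, Qk and the register status Qi follow the slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Functional-unit operations needed by the sketch.
OPS = {
    "ADD.D": lambda a, b: a + b,
    "SUB.D": lambda a, b: a - b,
    "MUL.D": lambda a, b: a * b,
    "DIV.D": lambda a, b: a / b,
}


@dataclass
class Station:
    """A reservation station with the Busy/Op/Vj/Vk/Qj/Qk fields."""
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None  # tag of the station that will produce Vj
    qk: Optional[str] = None  # tag of the station that will produce Vk


class Tomasulo:
    """Issue, execute and write-result steps, without modelling timing."""

    def __init__(self, regs: Dict[str, float], station_names):
        self.regs = dict(regs)              # register file values
        self.qi: Dict[str, str] = {}        # register status: reg -> producing station
        self.rs = {name: Station() for name in station_names}

    def _operand(self, reg):
        # (value, None) if the register file holds the value, else (None, tag).
        if reg in self.qi:
            return None, self.qi[reg]
        return self.regs[reg], None

    def issue(self, name, op, dest, src1, src2):
        st = self.rs[name]
        if st.busy:
            raise RuntimeError(f"structural hazard: {name} busy, issue stalls")
        st.busy, st.op = True, op
        st.vj, st.qj = self._operand(src1)
        st.vk, st.qk = self._operand(src2)
        self.qi[dest] = name                # rename dest to this station's tag

    def ready(self, name):
        st = self.rs[name]
        return st.busy and st.qj is None and st.qk is None

    def execute(self, name):
        if not self.ready(name):
            raise RuntimeError(f"{name} still waiting for an operand (RAW)")
        st = self.rs[name]
        return OPS[st.op](st.vj, st.vk)

    def write_result(self, name, value):
        """Broadcast (tag, value) on the CDB."""
        for st in self.rs.values():
            if st.qj == name:
                st.vj, st.qj = value, None
            if st.qk == name:
                st.vk, st.qk = value, None
        for reg, tag in list(self.qi.items()):
            if tag == name:                 # skipped if a later writer renamed reg
                self.regs[reg] = value
                del self.qi[reg]
        self.rs[name] = Station()           # mark the station available


t = Tomasulo({"F2": 8.0, "F4": 2.0, "F8": 1.0}, ["Add1", "Mult1"])
t.issue("Mult1", "DIV.D", "F0", "F2", "F4")
t.issue("Add1", "ADD.D", "F10", "F0", "F8")   # F0 pending: Qj = "Mult1"
print(t.rs["Add1"])
t.write_result("Mult1", t.execute("Mult1"))    # CDB delivers F0 to Add1
t.write_result("Add1", t.execute("Add1"))
print(t.regs["F0"], t.regs["F10"])             # 4.0 5.0
```

The check `tag == name` in `write_result` is what removes WAW hazards: if a later instruction has already renamed the register, the older result updates only the waiting stations, not the register file.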

Example

Instruction status
Instruction          Issue  Execute  Write result
L.D    F6,34(R2)     √      √        √
L.D    F2,44(R3)     √      √        √
MUL.D  F0,F2,F4      √      √        √
SUB.D  F8,F2,F6      √      √        √
DIV.D  F10,F0,F6     √
ADD.D  F6,F8,F2      √      √        √

Reservation stations (Busy changes from yes to no as each station writes its result)
Name   Busy      Op    Vj                Vk                Qj     Qk     A
Load1  yes → no  Load                                                    34 → 34+Regs[R2]
Load2  yes → no  Load                                                    44 → 44+Regs[R3]
Add1   yes → no  SUB   Mem[44+Regs[R3]]  Mem[34+Regs[R2]]  Load2  Load1
Add2   yes → no  ADD   Add1 result (F8)  Mem[44+Regs[R3]]  Add1   Load2
Add3   no
Mult1  yes → no  MUL   Mem[44+Regs[R3]]  Regs[F4]          Load2
Mult2  yes       DIV                     Mem[34+Regs[R2]]  Mult1  Load1

Qj and Qk show the tags recorded at issue; when the producing station broadcasts on the CDB, its value is copied into Vj or Vk and the tag is cleared.

Register status (after all six instructions have issued)
Field  F0     F2     F4  F6    F8    F10    F12 ... F30
Qi     Mult1  Load2      Add2  Add1  Mult2
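The tags in the example can be reproduced by replaying only the issue step, assuming no instruction writes its result before all six have issued. `PROGRAM` and `issue_all` are hypothetical names; station assignments are the ones on the slide.

```python
# (reservation station, instruction, destination, source registers).
# Loads list only their base register; the offset is not a register.
PROGRAM = [
    ("Load1", "L.D F6,34(R2)",   "F6",  ("R2",)),
    ("Load2", "L.D F2,44(R3)",   "F2",  ("R3",)),
    ("Mult1", "MUL.D F0,F2,F4",  "F0",  ("F2", "F4")),
    ("Add1",  "SUB.D F8,F2,F6",  "F8",  ("F2", "F6")),
    ("Mult2", "DIV.D F10,F0,F6", "F10", ("F0", "F6")),
    ("Add2",  "ADD.D F6,F8,F2",  "F6",  ("F8", "F2")),
]


def issue_all(program):
    """Issue every instruction, assuming none has written its result yet.

    Returns the register status (Qi) and, for each station, the tag each
    source operand waits on (None means the value came from the register file).
    """
    qi = {}
    waits_on = {}
    for station, _text, dest, srcs in program:
        waits_on[station] = tuple(qi.get(r) for r in srcs)  # read tags first...
        qi[dest] = station                                   # ...then rename dest
    return qi, waits_on


qi, waits_on = issue_all(PROGRAM)
print(qi)
for station, tags in waits_on.items():
    print(station, tags)
```

DIV.D (Mult2) records Load1 as the source of F6 before ADD.D renames F6 to Add2, so the later write to F6 cannot corrupt DIV.D's operand: the WAR hazard on F6 disappears.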

Hardware-Based Speculation
- Execute instructions along predicted execution paths, but commit their results only if the prediction was correct
- Instruction commit: allowing an instruction to update the register file (or memory) once it is no longer speculative
- Needs an additional piece of hardware that prevents any irrevocable action until an instruction commits: the Reorder Buffer (ROB)
  - Commits instructions in order
  - Stores instruction results between completion and commit
  - Is cleared on a misprediction
  - Handles exceptions precisely: an exception is taken only when the faulting instruction reaches commit
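A minimal sketch of the reorder buffer behaviour listed above, assuming a single commit loop with no per-cycle commit limit and no store handling. `RobEntry`, `ReorderBuffer`, `allocate` and `commit` are hypothetical names, and `DADDIU R4,R4,#1` is a made-up instruction on the predicted path.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    text: str                     # instruction, for display only
    dest: Optional[str] = None    # register written at commit (None for branches)
    value: Optional[int] = None
    done: bool = False            # result available, waiting to commit
    mispredicted: bool = False    # branch resolved against the prediction
    exception: bool = False       # execution raised an exception


class ReorderBuffer:
    """Holds results until commit; only commit updates the register file."""

    def __init__(self):
        self.entries = deque()
        self.regs = {}            # architectural (committed) register state

    def allocate(self, text, dest=None):
        entry = RobEntry(text, dest)
        self.entries.append(entry)     # allocated in program order at issue
        return entry

    def commit(self):
        """Retire finished entries from the head, strictly in program order."""
        retired = []
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.exception:
                # Precise exception: older instructions have committed,
                # younger ones are discarded before the handler runs.
                self.entries.clear()
                retired.append("exception: " + head.text)
                break
            if head.dest is not None:
                self.regs[head.dest] = head.value
            retired.append(head.text)
            if head.mispredicted:
                self.entries.clear()   # clear the ROB: squash the wrong path
                break
        return retired


rob = ReorderBuffer()
ld = rob.allocate("LD R2,0(R1)", "R2")
br = rob.allocate("BNE R2,R3,L")
spec = rob.allocate("DADDIU R4,R4,#1", "R4")   # hypothetical predicted-path instruction
spec.value, spec.done = 7, True                # finishes first, out of order
print(rob.commit())                            # []: head not done yet
ld.value, ld.done = 5, True
br.done, br.mispredicted = True, True
print(rob.commit())                            # ['LD R2,0(R1)', 'BNE R2,R3,L']
print(rob.regs)                                # {'R2': 5}: R4 is never written
```

The speculative DADDIU finishes first but never reaches the register file, because it sits behind the unfinished load and is squashed when the branch commits as mispredicted.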

Tomasulo's Algorithm with Speculation

Dynamic Scheduling + Multiple Issue + Speculation
- Limit the number of instructions of a given class that can be issued in a "bundle", e.g., one integer, one FP, and one load/store
- Examine all the dependences among the instructions in the bundle
- Also need multiple completions/commits per clock
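A sketch of how a bundle might be formed and renamed, assuming the class limits from the slide (one integer, one FP, one load/store, with the branch counted as integer) and ignoring result write-back, so tags never retire. `Instr`, `LIMITS`, `next_bundle` and `rename_bundle` are hypothetical names; the instructions are the loop used in the tables that follow. The point it illustrates is that a source produced by an earlier instruction in the same bundle must take that instruction's tag, not the stale register status.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    cls: str                  # "int", "fp" or "mem"
    dest: Optional[str]
    srcs: Tuple[str, ...]


# Per-bundle class limits, following the slide's example (an assumption here).
LIMITS = {"int": 1, "fp": 1, "mem": 1}


def next_bundle(queue, limits=LIMITS):
    """Take instructions in order until one would exceed its class limit."""
    used = dict.fromkeys(limits, 0)
    bundle = []
    for ins in queue:
        if used[ins.cls] == limits[ins.cls]:
            break
        used[ins.cls] += 1
        bundle.append(ins)
    return bundle


def rename_bundle(bundle, qi, first_tag):
    """Assign tags to a whole bundle at once.

    Each source is compared with the destinations of all earlier instructions
    in the same bundle (hardware does these comparisons in parallel); the
    nearest earlier writer wins, otherwise the register status qi is used.
    """
    tags = [first_tag + k for k in range(len(bundle))]
    sources = []
    for k, ins in enumerate(bundle):
        srcs = []
        for r in ins.srcs:
            inside = [tags[j] for j in range(k) if bundle[j].dest == r]
            srcs.append(inside[-1] if inside else qi.get(r))
        sources.append(srcs)
    new_qi = dict(qi)
    for tag, ins in zip(tags, bundle):
        if ins.dest is not None:
            new_qi[ins.dest] = tag      # the last writer in the bundle wins
    return tags, sources, new_qi


QUEUE = [
    Instr("LD R2,0(R1)", "mem", "R2", ("R1",)),
    Instr("DADDIU R2,R2,#1", "int", "R2", ("R2",)),
    Instr("SD R2,0(R1)", "mem", None, ("R2", "R1")),
    Instr("DADDIU R1,R1,#8", "int", "R1", ("R1",)),
    Instr("BNE R2,R3,L", "int", None, ("R2", "R3")),
]

qi, tag, pos = {}, 1, 0
while pos < len(QUEUE):
    bundle = next_bundle(QUEUE[pos:])
    tags, sources, qi = rename_bundle(bundle, qi, tag)
    print([(ins.text, t, s) for ins, t, s in zip(bundle, tags, sources)])
    tag, pos = tag + len(bundle), pos + len(bundle)
```

In the first bundle, DADDIU's source R2 is tagged with the LD in the same bundle (tag 1); a rename stage that looked only at the register status would have read R2 directly from the register file.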

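Dynamic Scheduling + Multiple Issue: 2-way superscalar, no speculation (clock cycle in which each event occurs)

Iter.  Instruction        Issue  Execute  Mem access  Write CDB
1      LD R2,0(R1)        1      2        3           4
1      DADDIU R2,R2,#1    1      5                    6
1      SD R2,0(R1)        2      3        7
1      DADDIU R1,R1,#8    2      3                    4
1      BNE R2,R3,L        3      7
2      LD R2,0(R1)        4      8        9           10
2      DADDIU R2,R2,#1    4      11                   12
2      SD R2,0(R1)        5      9        13
2      DADDIU R1,R1,#8    5      8                    9
2      BNE R2,R3,L        6      13
3      LD R2,0(R1)        7      14       15          16
3      DADDIU R2,R2,#1    7      17                   18
3      SD R2,0(R1)        8      15       19
3      DADDIU R1,R1,#8    8      14                   15
3      BNE R2,R3,L        9      19

(Next tutorial)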

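Dynamic Scheduling + Multiple Issue + Speculation: 2-way superscalar (clock cycle in which each event occurs)

Iter.  Instruction        Issue  Execute  Mem access  Write CDB  Commit
1      LD R2,0(R1)        1      2        3           4          5
1      DADDIU R2,R2,#1    1      5                    6          7
1      SD R2,0(R1)        2      3                               7
1      DADDIU R1,R1,#8    2      3                    4          8
1      BNE R2,R3,L        3      7                               8
2      LD R2,0(R1)        4      5        6           7          9
2      DADDIU R2,R2,#1    4      8                    9          10
2      SD R2,0(R1)        5      6                               10
2      DADDIU R1,R1,#8    5      6                    7          11
2      BNE R2,R3,L        6      10                              11
3      LD R2,0(R1)        7      8        9           10         12
3      DADDIU R2,R2,#1    7      11                   12         13
3      SD R2,0(R1)        8      9                               13
3      DADDIU R1,R1,#8    8      9                    10         14
3      BNE R2,R3,L        9      13                              14

With speculation, the LD of the next iteration no longer waits for the branch to execute (iteration 2's LD executes in cycle 5 instead of 8), and each store writes memory when it commits.

(Next tutorial)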
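Both tables can be reproduced with a small timing model. This is a sketch under assumptions chosen to match the tables, not taken from the slides: two instructions issue per clock and a branch ends its issue group; there are separate single-cycle units for address calculation, ALU operations and branches, each accepting one operation per clock; a load accesses memory the cycle after its address calculation and writes the CDB the cycle after that; a value is usable the cycle after it is written on the CDB; CDB contention is not modelled. Without speculation, nothing after a branch executes before the cycle after the branch executes; with speculation, stores write memory at commit and up to two instructions commit per clock, in order. `Instr`, `LOOP`, `simulate` and `show` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    kind: str               # "load", "store", "alu" or "branch"
    dest: Optional[str]
    srcs: Tuple[str, ...]   # for a store: (base register, data register)


LOOP = [
    Instr("LD R2,0(R1)", "load", "R2", ("R1",)),
    Instr("DADDIU R2,R2,#1", "alu", "R2", ("R2",)),
    Instr("SD R2,0(R1)", "store", None, ("R1", "R2")),
    Instr("DADDIU R1,R1,#8", "alu", "R1", ("R1",)),
    Instr("BNE R2,R3,L", "branch", None, ("R2", "R3")),
]
# Separate units for address calculation, ALU operations and branches.
UNIT = {"load": "addr", "store": "addr", "alu": "alu", "branch": "branch"}


def simulate(program, iterations, speculative):
    """Rows of [instr, issue, execute, mem access, write CDB(, commit)]."""
    ready = {}            # register -> first cycle its newest value is usable
    busy = set()          # (unit, cycle) pairs already claimed
    gate = 0              # no speculation: wait for the previous branch
    cycle, slots = 1, 0   # current issue cycle and instructions issued in it
    commits, last_commit = {}, 0
    rows = []
    for ins in program * iterations:
        if slots == 2:
            cycle, slots = cycle + 1, 0
        issue = cycle
        slots = 2 if ins.kind == "branch" else slots + 1  # branch ends the group
        addr_srcs = ins.srcs[:1] if ins.kind == "store" else ins.srcs
        ex = max([issue + 1, gate] + [ready.get(r, 0) for r in addr_srcs])
        while (UNIT[ins.kind], ex) in busy:   # structural hazard on the unit
            ex += 1
        busy.add((UNIT[ins.kind], ex))
        mem = wr = None
        if ins.kind == "load":
            mem, wr = ex + 1, ex + 2
        elif ins.kind == "alu":
            wr = ex + 1
        elif ins.kind == "store":
            data = ready.get(ins.srcs[1], 0)
            if not speculative:
                mem = max(ex + 1, data)       # write memory once the data arrives
        elif ins.kind == "branch" and not speculative:
            gate = ex + 1                     # branch resolved: later work may run
        if wr is not None:
            ready[ins.dest] = wr + 1
        row = [ins.text, issue, ex, mem, wr]
        if speculative:
            if wr is not None:
                c = wr + 1
            elif ins.kind == "store":
                c = max(ex + 1, data)         # store commits once its data is ready
            else:
                c = ex + 1
            c = max(c, last_commit)           # in-order commit
            while commits.get(c, 0) == 2:     # at most two commits per clock
                c += 1
            commits[c] = commits.get(c, 0) + 1
            last_commit = c
            row.append(c)
        rows.append(row)
    return rows


def show(rows):
    for row in rows:
        print(f"{row[0]:17}", *("-" if v is None else v for v in row[1:]))


show(simulate(LOOP, 3, speculative=False))
print()
show(simulate(LOOP, 3, speculative=True))
```

The only difference between the two runs is the `gate` on branches and the commit stage, which is enough to pull the next iteration's LD forward from cycle 8 to cycle 5.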

Multithreading Execution Slots

Paper Reading
- J. E. Smith and G. S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 1995.

Literature on Processors
- K. C. Yeager, "The MIPS R10000 Superscalar Microprocessor," IEEE Micro, 1996.
- G. Hinton et al., "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal, Q1 2001.
- R. E. Kessler, "The Alpha 21264 Microprocessor," IEEE Micro 19(2), 1999.
- J. A. Kahle et al., "Introduction to the Cell multiprocessor," IBM Journal of Research and Development, 2005.
- P. Hammarlund et al., "Haswell: The Fourth-Generation Intel Core Processor," IEEE Micro, 2014.

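References
- J. P. Shen and M. H. Lipasti, Modern Processor Design.
- J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed.
- A. González, F. Latorre and G. Magklis, "Processor Microarchitecture: An Implementation Perspective," Synthesis Lectures on Computer Architecture #12.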