Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1

Slides:



Advertisements
Similar presentations
CMSC 611: Advanced Computer Architecture Tomasulo Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advertisements

Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.
Lec18.1 Step by step for Dynamic Scheduling by reorder buffer Copyright by John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Oct 19, 2005 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
A scheme to overcome data hazards
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
COMP25212 Advanced Pipelining Out of Order Processors.
Pipelining 5. Two Approaches for Multiple Issue Superscalar –Issue a variable number of instructions per clock –Instructions are scheduled either statically.
Computer Architecture Lec 8 – Instruction Level Parallelism.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 14, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
EEL 5708 Speculation. Branch prediction. Superscalar processors. Lotzi Bölöni.
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
DAP Spr.‘98 ©UCB 1 Lecture 6: ILP Techniques Contd. Laxmi N. Bhuyan CS 162 Spring 2003.
CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.
1 Zvika Guz Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez Out Of Order Execution.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 9, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.
1 Overcoming Control Hazards with Dynamic Scheduling & Speculation.
1 Chapter 2: ILP and Its Exploitation Review simple static pipeline ILP Overview Dynamic branch prediction Dynamic scheduling, out-of-order execution Hardware-based.
Professor Nigel Topham Director, Institute for Computing Systems Architecture School of Informatics Edinburgh University Informatics 3 Computer Architecture.
2/24; 3/1,3/11 (quiz was 2/22, QuizAns 3/8) CSE502-S11, Lec ILP 1 Tomasulo Organization FP adders Add1 Add2 Add3 FP multipliers Mult1 Mult2 From.
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
CS 5513 Computer Architecture Lecture 6 – Instruction Level Parallelism continued.
CS203 – Advanced Computer Architecture ILP and Speculation.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
Sections 3.2 and 3.3 Dynamic Scheduling – Tomasulo’s Algorithm 吳俊興 高雄大學資訊工程學系 October 2004 EEF011 Computer Architecture 計算機結構.
Instruction-Level Parallelism and Its Dynamic Exploitation
IBM System 360. Common architecture for a set of machines
/ Computer Architecture and Design
/ Computer Architecture and Design
COMP 740: Computer Architecture and Implementation
Step by step for Tomasulo Scheme
CS203 – Advanced Computer Architecture
CS5100 Advanced Computer Architecture Hardware-Based Speculation
CPE 631 Lecture 15: Exploiting ILP with SW Approaches
Lecture 12 Reorder Buffers
CPSC 614 Computer Architecture Lec 5 – Instruction Level Parallelism
March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
Chapter 3: ILP and Its Exploitation
Advantages of Dynamic Scheduling
John Kubiatowicz Electrical Engineering and Computer Sciences
Tomasulo With Reorder buffer:
11/14/2018 CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, Electrical and Computer.
CMSC 611: Advanced Computer Architecture
A Dynamic Algorithm: Tomasulo’s
COMP s1 Seminar 3: Dynamic Scheduling
Out of Order Processors
Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2
CS252 Graduate Computer Architecture Lecture 7 Dynamic Scheduling 2: Precise Interrupts February 9th, 2010 John Kubiatowicz Electrical Engineering.
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
12/2/2018 CPE 631 Lecture 11: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, Electrical and Computer.
Adapted from the slides of Prof
Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)
March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002
John Kubiatowicz Electrical Engineering and Computer Sciences
Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1
September 20, 2000 Prof. John Kubiatowicz
1/2/2019 CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, Electrical and Computer.
Larry Wittie Computer Science, StonyBrook University and ~lw
Tomasulo Organization
CPSC 614 Computer Architecture Lec 5 – Instruction Level Parallelism
Adapted from the slides of Prof
CS252 Graduate Computer Architecture Lecture 6 Tomasulo, Implicit Register Renaming, Loop-Level Parallelism Extraction Explicit Register Renaming February.
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Chapter 3: ILP and Its Exploitation
September 20, 2000 Prof. John Kubiatowicz
Overcoming Control Hazards with Dynamic Scheduling & Speculation
Presentation transcript:

Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 BNEZ R1 Loop Assume Multiply takes 4 clocks Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit) To be clear, will show clocks for SUBI, BNEZ Reality: integer instructions ahead

Loop Example

Loop Example Cycle 1

Loop Example Cycle 2

Loop Example Cycle 3 Implicit renaming sets up “DataFlow” graph

Loop Example Cycle 4 Dispatching SUBI Instruction

Loop Example Cycle 5 And, BNEZ instruction

Loop Example Cycle 6 Notice that F0 never sees Load from location 80

Loop Example Cycle 7 Register file completely detached from computation First and Second iteration completely overlapped

Loop Example Cycle 8

Loop Example Cycle 9 Load1 completing: who is waiting? Note: Dispatching SUBI

Loop Example Cycle 10 Load2 completing: who is waiting? Note: Dispatching BNEZ

Loop Example Cycle 11 Next load in sequence

Loop Example Cycle 12 Why not issue third multiply?

Loop Example Cycle 13

Loop Example Cycle 14 Mult1 completing. Who is waiting?

Loop Example Cycle 15 Mult2 completing. Who is waiting?

Loop Example Cycle 16

Loop Example Cycle 17

Loop Example Cycle 18

Loop Example Cycle 19

Loop Example Cycle 20

What about Precise Interrupts? Both Scoreboard and Tomasulo have: In-order issue, out-of-order execution, and out-of-order completion Need to “fix” the out-of-order completion aspect so that we can find precise breakpoint in instruction stream.

Relationship between precise interrupts and speculation: Speculation is a form of guessing. Important for branch prediction: Need to “take our best shot” at predicting branch direction. If we issue multiple instructions per cycle, lose lots of potential instructions otherwise: Consider 4 instructions per cycle If take single cycle to decide on branch, waste from 4 - 7 instruction slots! If we speculate and are wrong, need to back up and restart execution to point at which we predicted incorrectly: This is exactly same as precise exceptions! Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

HW support for precise interrupts Need HW buffer for results of uncommitted instructions: reorder buffer 3 fields: instr, destination, value Reorder buffer can be operand source => more registers like RS Use reorder buffer number instead of reservation station when execution completes Supplies operands between execution complete & commit Once operand commits, result is put into register Instructions commit As a result, easy to undo speculated instructions on mispredicted branches or on exceptions Reorder Buffer FP Op Queue FP Adder Res Stations FP Regs

Four Steps of Speculative Tomasulo Algorithm 1. Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called “dispatch”) 2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called “issue”) 3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available. 4. Commit—update register with reorder result When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”)