Tomasulo Speculative Example


Tomasulo Speculative Example

Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)
      DADDIU R1,R1,#8
      BNE    R2,R3,LOOP

Assumptions:
Add/Branch: 1 cycle
Load/Store: 1 cycle address generation, 1 cycle memory access
2-issue superscalar; 2 instructions can commit per clock (2 CDBs)
Units: Memory, FP Adder, Branch

Assume: (R1) = 100, (R3) = 10, Mem[100] = 5, Mem[108] = 6

Timing table columns: Iter, Instruction, Issue, Exec, Mem access, Wrt. CDB, Commit, Comment

Iter 1
LD R2, 0(R1)         (R2) = 5
DADDIU R2, R2, #1    (R2) = 6
SD R2, 0(R1)         Mem[100] = 6
DADDIU R1, R1, #8    (R1) = 108
BNE R2, R3, LOOP     6 ≠ 10

Iter 2
LD R2, 0(R1)         (R2) = 6
DADDIU R2, R2, #1    (R2) = 7
SD R2, 0(R1)         Mem[108] = 7
DADDIU R1, R1, #8    (R1) = 116
BNE R2, R3, LOOP     7 ≠ 10

Reorder buffer (Type, Dest., Value, Ready): Rob1 to Rob9, all empty.

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr): Add1, Add2, Add3, Br1, Br2, Load1, Load2, Load3, all empty.

The Busy field is not shown due to space constraints. If there is no entry, busy = 0; otherwise busy = 1.
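The result values listed in the table can be cross-checked by running the loop body sequentially for two iterations. The following is a minimal Python sketch, not part of the original slides; the function name `run_loop` and the dictionary model of registers and memory are assumptions made for illustration, while the initial values come from the "Assume" line.

```python
def run_loop(regs, mem, iterations):
    """Execute the loop body in program order and record each result."""
    trace = []
    for _ in range(iterations):
        regs["R2"] = mem[regs["R1"]]        # LD R2, 0(R1)
        trace.append(("LD", regs["R2"]))
        regs["R2"] += 1                     # DADDIU R2, R2, #1
        trace.append(("DADDIU R2", regs["R2"]))
        mem[regs["R1"]] = regs["R2"]        # SD R2, 0(R1)
        trace.append(("SD", mem[regs["R1"]]))
        regs["R1"] += 8                     # DADDIU R1, R1, #8
        trace.append(("DADDIU R1", regs["R1"]))
        taken = regs["R2"] != regs["R3"]    # BNE R2, R3, LOOP
        trace.append(("BNE taken", taken))
        if not taken:
            break
    return trace


# Initial state from the slide: (R1) = 100, (R3) = 10, Mem[100] = 5, Mem[108] = 6
regs = {"R1": 100, "R2": 0, "R3": 10}
mem = {100: 5, 108: 6}
trace = run_loop(regs, mem, iterations=2)
for step in trace:
    print(step)
```

This models only the architectural (committed) state; the cycle-by-cycle slides show how the speculative hardware reaches the same state.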

Cycle 1
LD1: Issue
ADD1a: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob2: ADD R2
Rob1: LD

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 1 Rob1
Load1: LD 100

Cycle 2
LD1: Calc. Addr.
ADD1a: Wait for R2 (LD1)
SD1: Issue
ADD1b: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob4: ADD R1
Rob3: SD 0+(R1)
Rob2: R2
Rob1: LD

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 1 Rob1
Add2: Rob4 100 8
Load1: LD
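The two adder entries in cycle 2 differ because of how operands are read at issue: DADDIU R2 must wait on the tag Rob1, while DADDIU R1 captures the value 100 immediately. The sketch below illustrates that lookup in Python. It is an illustration under stated assumptions, not the slides' implementation: `RSEntry`, `read_operand`, `issue_addi`, and the `reg_status` table are hypothetical names, and the case where a pending ROB entry already holds a finished value is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSEntry:
    tag: str                    # ROB entry that will receive the result
    op: str
    vj: Optional[int] = None    # operand values, once known
    vk: Optional[int] = None
    qj: Optional[str] = None    # ROB tags still being waited on
    qk: Optional[str] = None


def read_operand(reg, regfile, reg_status):
    """Return (value, tag); exactly one of the two is not None."""
    producer = reg_status.get(reg)
    if producer is not None:
        return None, producer   # value is still pending in the ROB
    return regfile[reg], None


def issue_addi(tag, dest, src, imm, regfile, reg_status):
    """Issue an add-immediate: rename the source, then claim the destination."""
    vj, qj = read_operand(src, regfile, reg_status)
    reg_status[dest] = tag      # later readers of dest wait on this tag
    return RSEntry(tag=tag, op="ADD", vj=vj, vk=imm, qj=qj)


regfile = {"R1": 100, "R2": 0, "R3": 10}
reg_status = {"R2": "Rob1"}     # LD R2, 0(R1) was issued as Rob1 in cycle 1

add1 = issue_addi("Rob2", "R2", "R2", 1, regfile, reg_status)  # DADDIU R2, R2, #1
add2 = issue_addi("Rob4", "R1", "R1", 8, regfile, reg_status)  # DADDIU R1, R1, #8
print(add1)
print(add2)
```

Because `add2` has both operands as values, it can execute in the next cycle, which matches the "Exec. Directly" comment on DADDIU R1 in the timing table.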

Cycle 3
LD1: Load
ADD1a: Wait for R2 (LD1)
SD1: Calc. Addr.
ADD1b: Execute
BNE1: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob5: BNE
Rob4: ADD R1
Rob3: SD 100
Rob2: R2
Rob1: LD

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 1 Rob1
Add2: Rob4 100 8
Br1: Rob5 BNE 10
Load1: LD

Cycle 4
LD1: CDB
ADD1a: Wait for R2 (LD1)
SD1: Wait for R2 (ADD1a)
ADD1b: CDB
BNE1: Wait for R2 (ADD1a)
LD2: Issue
ADD2a: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob7: ADD R2
Rob6: LD
Rob5: BNE
Rob4: R1 108 1
Rob3: SD 100
Rob1: 5

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 5 1
Add2: Rob4 100 8
Add3: Rob7 Rob6
Br1: Rob5 BNE 10
Load1: Rob1 LD
Load2: 108
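In cycle 4 both common data buses are used: LD1 returns 5 and ADD1b returns 108. A CDB write updates the producing ROB entry and wakes up any reservation station waiting on that tag, which is why Add1 now holds the value 5. The Python sketch below shows one way to model this; `broadcast` and `operands_ready` are hypothetical helper names, and the Br1 entry waiting on Rob2 is an assumption inferred from the "BNE1: Wait for R2 (ADD1a)" caption rather than a field visible on the slide.

```python
def broadcast(tag, value, rob, stations):
    """Write a result on a CDB: mark the ROB entry ready and wake up waiters."""
    rob[tag]["value"] = value
    rob[tag]["ready"] = True
    for rs in stations.values():
        if rs.get("qj") == tag:
            rs["vj"], rs["qj"] = value, None
        if rs.get("qk") == tag:
            rs["vk"], rs["qk"] = value, None


def operands_ready(rs):
    """An entry may start executing once it waits on no tags."""
    return rs.get("qj") is None and rs.get("qk") is None


rob = {
    "Rob1": {"type": "LD", "dest": "R2", "value": None, "ready": False},
    "Rob4": {"type": "ADD", "dest": "R1", "value": None, "ready": False},
}
stations = {
    "Add1": {"tag": "Rob2", "op": "ADD", "vj": None, "vk": 1, "qj": "Rob1", "qk": None},
    # Assumed: BNE1 waits on ADD1a (Rob2) for its R2 operand.
    "Br1": {"tag": "Rob5", "op": "BNE", "vj": None, "vk": 10, "qj": "Rob2", "qk": None},
}

broadcast("Rob1", 5, rob, stations)     # LD1 result
broadcast("Rob4", 108, rob, stations)   # ADD1b result

print(stations["Add1"])
print(operands_ready(stations["Add1"]), operands_ready(stations["Br1"]))
```

After the two broadcasts Add1 can execute in cycle 5, while Br1 keeps waiting until ADD1a writes the CDB in cycle 6.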

Cycle 5
LD1: Commit
ADD1a: Execute
SD1: Wait for R2 (ADD1a)
ADD1b: Wait to Commit
BNE1: Wait for R2 (ADD1a)
LD2: Calc. Addr.
ADD2a: Wait for R2 (LD2)
SD2: Issue
ADD2b: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1
Rob8: SD 0+(R1)
Rob7: R2
Rob6: LD
Rob5: BNE
Rob4: 108 1
Rob3: 100
Rob1: 5

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 5 1
Add2: Rob9 108 8
Add3: Rob7 Rob6
Br1: Rob5 BNE 10
Load2: LD
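Cycle 5 shows why ADD1b has to "wait to commit" even though its result was ready in cycle 4: the reorder buffer retires strictly from its head, in program order. The following Python sketch models that rule with a commit width of 2, as assumed in the example. The `commit` function and the simplified entries (only a name and a ready flag) are illustrative assumptions.

```python
from collections import deque


def commit(rob, width=2):
    """Retire up to `width` ready entries from the head of the ROB, in order."""
    retired = []
    while rob and len(retired) < width and rob[0]["ready"]:
        retired.append(rob.popleft()["name"])
    return retired


# State at the start of cycle 5: LD1 and ADD1b wrote the CDB in cycle 4.
rob = deque([
    {"name": "LD1", "ready": True},
    {"name": "ADD1a", "ready": False},
    {"name": "SD1", "ready": False},
    {"name": "ADD1b", "ready": True},
    {"name": "BNE1", "ready": False},
])

first = commit(rob)
print(first)  # ['LD1']
```

Only LD1 retires here: ADD1a sits at the head without a result, so the ready ADD1b behind it cannot commit yet.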

Cycle 6
ADD1a: CDB
SD1: Wait for R2 (ADD1a)
ADD1b: Wait to Commit
BNE1: Wait for R2 (ADD1a)
LD2: Load Spec.
ADD2a: Wait for R2 (LD2)
SD2: Calc. Addr.
ADD2b: Execute
BNE2: Issue

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1
Rob8: SD 108
Rob7: R2
Rob6: LD
Rob5: BNE
Rob4: 1
Rob3: 100 6

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add1: Rob2 ADD 5 1
Add2: Rob9 108 8
Add3: Rob7 Rob6
Br1: Rob5 BNE 6 10
Br2: Rob1
Load2: LD

Cycle 7
ADD1a: Commit
SD1: Commit
ADD1b: Wait to Commit
BNE1: Exec.
LD2: CDB
ADD2a: Wait for R2 (LD2)
SD2: Wait for R2 (ADD2a)
ADD2b: CDB
BNE2: Wait for R2 (ADD2a)

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1 116 1
Rob8: SD 108
Rob7: R2
Rob6: LD 6
Rob5: BNE
Rob3: 100

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add2: Rob9 ADD 108 8
Add3: Rob7 6 1
Br1: Rob5 BNE 10
Br2: Rob1
Load2: Rob6 LD

Cycle 8
ADD1b: Commit
BNE1: Commit
LD2: Wait to Commit
ADD2a: Exec.
SD2: Wait for R2 (ADD2a)
ADD2b: Wait to Commit
BNE2: Wait for R2 (ADD2a)

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1 116 1
Rob8: SD 108
Rob7: R2
Rob6: LD 6

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add3: Rob7 ADD 6 1
Br2: Rob1 BNE 10

Cycle 9
LD2: Commit
ADD2a: CDB
SD2: Wait for R2 (ADD2a)
ADD2b: Wait to Commit
BNE2: Wait for R2 (ADD2a)

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1 116 1
Rob8: SD 108 7
Rob7: R2
Rob6: LD 6
Rob1: BNE

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Add3: Rob7 ADD 6 1
Br2: Rob1 BNE 7 10

Cycle 10
ADD2a: Commit
SD2: Commit
ADD2b: Wait to Commit
BNE2: Exec.

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1 116 1
Rob8: SD 108 7
Rob7: R2
Rob1: BNE

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr):
Br2: Rob1 BNE 7 10

Cycle 11
ADD2b: Commit
BNE2: Commit

Reorder buffer (Type, Dest., Value, Ready):
Rob9: ADD R1 116 1
Rob1: BNE

Reservation stations (Tag, Op, Vj, Vk, Qj, Qk, Addr): all empty.

Completed timing table (cycle numbers taken from the per-cycle captions; "-" marks a stage the instruction does not use):

Iter  Instruction         Issue  Exec  Mem access  Wrt. CDB  Commit  Comment
1     LD R2, 0(R1)        1      2     3           4         5       (R2) = 5
1     DADDIU R2, R2, #1   1      5     -           6         7       Wait for R2 (LD); (R2) = 6
1     SD R2, 0(R1)        2      3     -           -         7       Wait for R2 (ADD); Mem[100] = 6
1     DADDIU R1, R1, #8   2      3     -           4         8       Exec. directly; (R1) = 108
1     BNE R2, R3, LOOP    3      7     -           -         8       6 ≠ 10
2     LD R2, 0(R1)        4      5     6           7         9       No exec. delay; (R2) = 6
2     DADDIU R2, R2, #1   4      8     -           9         10      (R2) = 7
2     SD R2, 0(R1)        5      6     -           -         10      Mem[108] = 7
2     DADDIU R1, R1, #8   5      6     -           7         11      (R1) = 116
2     BNE R2, R3, LOOP    6      10    -           -         11      7 ≠ 10
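The cycle numbers given in the per-cycle captions can be checked against the stated machine limits (2 issues per clock, 2 CDBs, 2 commits per clock, in-order commit). The Python sketch below encodes the schedule as read from those captions and tests each limit. The `TIMING` list layout and the `check` function are assumptions made for illustration; `None` marks a stage an instruction does not use.

```python
from collections import Counter

# (name, issue, exec, mem access, CDB write, commit), from the cycle captions
TIMING = [
    ("LD1",   1, 2,  3,    4,    5),
    ("ADD1a", 1, 5,  None, 6,    7),
    ("SD1",   2, 3,  None, None, 7),
    ("ADD1b", 2, 3,  None, 4,    8),
    ("BNE1",  3, 7,  None, None, 8),
    ("LD2",   4, 5,  6,    7,    9),
    ("ADD2a", 4, 8,  None, 9,    10),
    ("SD2",   5, 6,  None, None, 10),
    ("ADD2b", 5, 6,  None, 7,    11),
    ("BNE2",  6, 10, None, None, 11),
]


def check(timing, width=2):
    """Check issue, CDB, and commit bandwidth plus in-order commit."""
    issues = Counter(t[1] for t in timing)
    cdbs = Counter(t[4] for t in timing if t[4] is not None)
    commits = Counter(t[5] for t in timing)
    commit_order = [t[5] for t in timing]   # timing is listed in program order
    exec_order = [t[2] for t in timing]
    return {
        "issue_width_ok": max(issues.values()) <= width,
        "cdb_width_ok": max(cdbs.values()) <= width,
        "commit_width_ok": max(commits.values()) <= width,
        "in_order_commit": commit_order == sorted(commit_order),
        "out_of_order_exec": exec_order != sorted(exec_order),
    }


result = check(TIMING)
for name, ok in result.items():
    print(name, ok)
```

The last check highlights the point of the example: execution happens out of program order (ADD1b executes in cycle 3, before ADD1a in cycle 5), while commit stays in order.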