1
Lecture 10 Tomasulo’s Algorithm
CSCE 513 Computer Architecture
Lecture 10: Tomasulo’s Algorithm
Topics: Dynamic Scheduling Review; Tomasulo’s structure; Examples; Algorithm details; Speculation
Readings: Chapter 3
October 4, 2017
2
Overview
Last Time: Control Hazards; Data Hazards
New: Review; Tomasulo overview, examples revisited (Figures 2.10 right one, 2.11); Tomasulo’s Algorithm details (fig 2.12); Tomasulo + ReOrder Buffer (ROB) (fig 2.14, 2.15, 2.16)
References: Chapter 2, section 2.6
Test 1: Tuesday, September 30 (one week). Review ???
3
Links and things http://csg.csail.mit.edu/6.823/syllabusreadings.html
Reading list showing overlap of H&P IV and H&P III
Patterson Lecture: Parallel is Back
Simulators and other tools
4
Tournament Predictors
Note: 2-bit and similar predictors use only the history of the branch being predicted (local info).
Correlating predictors also incorporate some info about other branches (global info).
Tournament predictors use multiple levels: a local predictor, a global predictor, and a selector that chooses between them (it picks the one that has been most successful in the recent past).
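The selector idea can be sketched in a few lines of Python. This is a minimal illustration, not the design of any particular processor: the table size, the 2-bit saturating counters in all three tables, the PC-indexed selector, and the names `TwoBitCounter` and `TournamentPredictor` are all assumptions made for the example.

```python
class TwoBitCounter:
    """Saturating 2-bit counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


class TournamentPredictor:
    """Hypothetical tournament predictor: local + global + selector tables."""

    def __init__(self, index_bits=4):
        size = 1 << index_bits
        self.mask = size - 1
        self.local = [TwoBitCounter() for _ in range(size)]     # indexed by branch PC
        self.global_ = [TwoBitCounter() for _ in range(size)]   # indexed by global history
        self.selector = [TwoBitCounter() for _ in range(size)]  # >= 2 means "trust global"
        self.history = 0  # outcomes of the most recent branches, one bit each

    def predict(self, pc):
        li, gi = pc & self.mask, self.history & self.mask
        if self.selector[li].predict():
            return self.global_[gi].predict()
        return self.local[li].predict()

    def update(self, pc, taken):
        li, gi = pc & self.mask, self.history & self.mask
        local_ok = self.local[li].predict() == taken
        global_ok = self.global_[gi].predict() == taken
        # Train the selector only when the two predictors disagree on correctness.
        if local_ok != global_ok:
            self.selector[li].update(global_ok)
        self.local[li].update(taken)
        self.global_[gi].update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

On a branch that strictly alternates taken/not taken, the local 2-bit counter keeps mispredicting, the global-history table learns the pattern, and the selector drifts toward the global predictor.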
5
Pipelined Functional Units
Addition in Scientific Notation Floating point addition
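A rough sketch of the steps a floating-point adder works through, using decimal scientific notation so the arithmetic is easy to follow by hand. The four-stage split (compare, align, add, normalize), the 4-digit mantissa, truncation instead of rounding, and the restriction to non-negative operands are all simplifying assumptions for this example; `fp_add` is a hypothetical helper.

```python
DIGITS = 4  # assumed mantissa width, in decimal digits


def fp_add(a, b):
    """Add two non-negative numbers given as (mantissa, exponent) pairs.

    A pair (m, e) stands for m * 10**e, with m an integer of DIGITS digits.
    """
    (ma, ea), (mb, eb) = a, b
    # Stage 1: compare exponents so that a holds the larger one.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Stage 2: align by shifting the smaller operand's mantissa right
    # (digits shifted out are simply truncated here).
    mb //= 10 ** (ea - eb)
    # Stage 3: add the aligned mantissas.
    m, e = ma + mb, ea
    # Stage 4: normalize back to DIGITS digits.
    while m >= 10 ** DIGITS:
        m //= 10
        e += 1
    while m != 0 and m < 10 ** (DIGITS - 1):
        m *= 10
        e -= 1
    return m, e


# 99.99 + 0.1610, i.e. 9999e-2 + 1610e-4
print(fp_add((9999, -2), (1610, -4)))  # (1001, -1), i.e. 100.1 after truncation
```

Because each stage only needs the output of the one before it, the stages can be separated by pipeline registers so a new addition can start every cycle.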
6
Floating Adder Stages
7
Out of Order Execution
9
Figure 3.6 Tomasulo CDB Register Renaming Out of order execution
10
Tomasulo’s: Multiple Reservation Stations for each Unit
Reservation station fields: Op, Qj, Qk, Vj, Vk, A, Busy
Register File: Qi
Notes: the reservation stations serve as extra temporary registers!
To support OoOE, ID is factored into “Issue” and “Read Operands”
11
Register Renaming Again
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
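To see why renaming removes the WAR hazard on F8 and the WAW hazard on F6 in this sequence, here is a small Python sketch. It uses a simple assumed scheme (every destination gets a fresh name P0, P1, ...), which is not exactly how Tomasulo's reservation stations do it but gives the same effect; `rename` and the tuple encoding of instructions are hypothetical.

```python
def rename(program):
    """Rename each destination to a fresh physical register.

    program: list of (op, dest, sources); dest is None for stores.
    """
    mapping = {}   # architectural register -> latest physical name
    counter = 0
    renamed = []
    for op, dest, srcs in program:
        # Sources read whichever name currently holds the architectural value.
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"P{counter}"
            counter += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]

for op, dest, srcs in rename(program):
    print(op, dest, srcs)
```

After renaming, ADD.D still reads the old F8 while SUB.D writes P2, and ADD.D and MUL.D write P1 and P3 instead of both writing F6. Only the true (RAW) dependences remain: ADD.D on DIV.D, S.D on ADD.D, MUL.D on SUB.D.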
12
Tomasulo’s Example
L.D F6, 32(R2)
L.D F2, 44(R3)
MUL.D F0, F2, F4
SUB.D F8, F2, F6
DIV.D F10, F0, F6
ADD.D F6, F8, F2
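The Issue step for this example can be sketched as follows. This is only the bookkeeping, under the assumption that all six instructions issue before any result is written back; the station names (Load1, Add1, Mult1, ...), the station counts, and the helper `issue_all` are assumptions for the illustration.

```python
STATIONS = {
    "load": ["Load1", "Load2"],
    "add": ["Add1", "Add2", "Add3"],
    "mult": ["Mult1", "Mult2"],
}
UNIT = {"L.D": "load", "ADD.D": "add", "SUB.D": "add",
        "MUL.D": "mult", "DIV.D": "mult"}


def issue_all(program):
    """Issue every instruction in order; nothing completes in this sketch."""
    reg_status = {}   # Qi: register -> station that will produce its value
    busy = set()
    stations = []
    for op, dest, srcs in program:
        # Pick a free station of the right kind (raises StopIteration if
        # none is free, which real hardware would treat as a stall).
        name = next(s for s in STATIONS[UNIT[op]] if s not in busy)
        busy.add(name)
        entry = {"name": name, "op": op}
        if op == "L.D":
            entry["A"] = srcs[0]                       # address field
        else:
            for field, src in zip("jk", srcs):
                tag = reg_status.get(src)
                if tag:
                    entry["Q" + field] = tag           # wait for that station
                else:
                    entry["V" + field] = f"Regs[{src}]"  # value already available
        reg_status[dest] = name                        # rename the destination
        stations.append(entry)
    return stations, reg_status


PROGRAM = [
    ("L.D", "F6", ["32(R2)"]),
    ("L.D", "F2", ["44(R3)"]),
    ("MUL.D", "F0", ["F2", "F4"]),
    ("SUB.D", "F8", ["F2", "F6"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6", ["F8", "F2"]),
]

stations, reg_status = issue_all(PROGRAM)
for entry in stations:
    print(entry)
print(reg_status)
```

Note that DIV.D captures the tag Load1 for F6 at issue, so the later ADD.D that overwrites F6 (tag Add2) cannot disturb it.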
13
Figure 3.7 – Example which Cycle?
14
Figure 3.8 3
15
Figure 3.9.a Tomasulo Issue
16
Figure 3.9.b Tomasulo Execute
17
Figure 3.9.c Tomasulo Write Result
18
Tomasulo Loop Example
Loop: L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
DADDIU R1, R1, -8
BNE R1, R2, Loop
Can’t be done on the simulator! It can’t input DADDIU or BNE. Tomasulo is just for the Floating Point and Memory.
19
Tomasulo Loop Unrolled As Hardware would do
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, -8(R1)
MUL.D F4, F0, F2
S.D F4, -8(R1)
20
Figure 3.10 - Two active Iterations of loop
21
Observations on Tomasulo’s Alg
Tomasulo’s algorithm was designed for the IBM 360/91.
It does not require the compiler to do all of the work.
Changes to hardware (e.g., adding another multiplier) do not require changes to the compiler.
It was designed before caches, but OoOE really helps with cache misses.
Dynamic scheduling is required for “speculation”.
22
Hardware-Based Speculation
Branch Prediction
Execute instructions along predicted execution paths, but only commit the results if the prediction was correct.
Instruction commit: allowing an instruction to update the register file when the instruction is no longer speculative.
Need an additional piece of hardware to prevent any irrevocable action until an instruction commits, i.e., updating state or taking an exception.
Copyright © 2012, Elsevier Inc. All rights reserved.
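The "additional piece of hardware" is the reorder buffer (ROB) covered in the following slides. A minimal Python sketch of in-order commit is shown here; the entry layout (dicts with `name`, `kind`, `dest`, `value`, `ready`, `mispredicted`) and the function `commit` are assumptions for the illustration, and recovery is reduced to clearing the buffer.

```python
from collections import deque


def commit(rob, regs):
    """Commit ready entries from the head of the ROB, oldest first.

    Returns the names of the instructions that committed.
    """
    committed = []
    while rob and rob[0]["ready"]:
        entry = rob.popleft()
        committed.append(entry["name"])
        if entry["kind"] == "branch":
            if entry.get("mispredicted", False):
                # Everything younger was fetched down the wrong path:
                # discard it before it can touch the register file.
                rob.clear()
                break
        else:
            # Only now does the result become architecturally visible.
            regs[entry["dest"]] = entry["value"]
    return committed


regs = {"F0": 0.0, "F4": 0.0}
rob = deque([
    {"name": "MUL.D", "kind": "fp", "dest": "F0", "value": 6.0, "ready": True},
    {"name": "BNE", "kind": "branch", "mispredicted": True, "ready": True},
    {"name": "ADD.D", "kind": "fp", "dest": "F4", "value": 9.0, "ready": True},
])
print(commit(rob, regs), regs)
```

The ADD.D after the mispredicted branch had already computed a value, but because results are written to the register file only at commit, squashing it leaves F4 untouched. An entry that is not yet ready at the head also blocks all younger entries, which is what keeps commit in program order.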
23
Figure 3.11
24
Speculation steps: Issue, Execute, Write result, Commit
25
Koren’s Tools Again
26
Figure 3.12 Tomasulo + ROB example
27
Figure 3.13 Tomasulo + ROB example
28
Fig 2.17a Tomasulo+ROB Details
29
Fig 2.17b Tomasulo+ROB Execute
30
Fig 2.17c Tomasulo+ROB Write-result
31
Fig 2.17d Tomasulo+ROB Commit
32
Figure 3.15 Multiple Issue Approaches
33
Unrolling for VLIW
For i = 1, 10000: x[i] = x[i] + c
Loop: L.D F0, 0(R1)
ADD.D F4, F0, F2
S.D F4, 0(R1)
DADDIU R1, R1, -8
BNE R1, R2, Loop
Registers for (Load, Sum): (F0, F4), (F6, F8), (F10, F12), (F14, F16), (F18, F20), (F22, F24), (F26, F28)
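Seven (Load, Sum) register pairs suggest unrolling the loop body seven times. The Python sketch below generates such an unrolled body as text. It assumes the trip count is a multiple of the unroll factor and it does not schedule the instructions into VLIW slots; `unroll` is a hypothetical helper.

```python
def unroll(pairs, stride=8):
    """Emit one unrolled loop body using a distinct register pair per copy."""
    lines = []
    for i, (load_reg, sum_reg) in enumerate(pairs):
        offset = -stride * i   # each copy works on the next element down
        lines.append(f"L.D {load_reg}, {offset}(R1)")
        lines.append(f"ADD.D {sum_reg}, {load_reg}, F2")
        lines.append(f"S.D {sum_reg}, {offset}(R1)")
    # One pointer update and one branch cover all copies.
    lines.append(f"DADDIU R1, R1, {-stride * len(pairs)}")
    lines.append("BNE R1, R2, Loop")
    return lines


PAIRS = [("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16"),
         ("F18", "F20"), ("F22", "F24"), ("F26", "F28")]

for line in unroll(PAIRS):
    print(line)
```

Giving each copy its own registers removes the name dependences between copies, so a compiler is free to group the loads, the adds, and the stores and pack them into wide instructions.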
34
Figure 3.16 VLIW
35
Itanium