1
Dynamic Scheduling and Speculation
2
Outline
  Dynamic Scheduling
  Tomasulo’s Algorithm
  Speculation
3
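Dynamic Scheduling
Out-of-order execution
  Check for structural and data hazards
  Begin executing as soon as operands are available
  Implies out-of-order completion
    WAR and WAW hazards
    Imprecise exceptions
Example:
  DIV.D F0, F2, F4
  ADD.D F10, F0, F8
  SUB.D F12, F8, F14
ADD.D must wait for F0 from DIV.D, but SUB.D depends on neither instruction, so it can execute without waiting for them.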
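The hazard classes named on this slide can be checked mechanically for the three-instruction example. The following is a minimal sketch, not part of the original slides; the `Instr` tuple and the `hazards` and `all_hazards` helpers are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple

def hazards(earlier: Instr, later: Instr) -> set:
    """Data hazards that `later` has with respect to `earlier` in program order."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites what earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found

def all_hazards(prog):
    """Map (earlier index, later index) to the hazards between that pair."""
    result = {}
    for i, earlier in enumerate(prog):
        for j in range(i + 1, len(prog)):
            h = hazards(earlier, prog[j])
            if h:
                result[(i, j)] = h
    return result

# The example from the slide.
program = [
    Instr("DIV.D", "F0", ("F2", "F4")),
    Instr("ADD.D", "F10", ("F0", "F8")),
    Instr("SUB.D", "F12", ("F8", "F14")),
]

print(all_hazards(program))
```

Only the DIV.D/ADD.D pair is reported (a RAW hazard on F0); SUB.D has no entry, which is why it is free to run ahead of ADD.D.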
4
Tomasulo's Algorithm
Invented by Robert Tomasulo for the IBM 360/91
Goal: high performance without special compilers
Influenced designs of Alpha 21264, HP 8000, MIPS, Pentium II, PowerPC 604, …
Tomasulo [1967]. “An efficient algorithm for exploiting multiple arithmetic units,” IBM J. Research and Development 11:1 (Jan).
5
Tomasulo's Algorithm
[Figure: block diagram of the floating-point unit. Labels: instruction queue (fed from the instruction unit), FP registers, load/store operations, address unit, load buffers, store buffers, reservation stations, FP adder, FP multipliers, memory unit (data, address), common data bus.]
6
Steps in Tomasulo's Algorithm
Issue (also called dispatch)
  Check for structural hazards
  Queue in the reservation station (RS)
  Keep track of the functional unit (FU) generating an operand if it is not available in the register file (RF)
  Eliminates WAR and WAW hazards
Execute
  Monitor the CDB for operands (resolves RAW hazards: execution starts only once both operands have arrived)
Write result
  Write the result on the CDB
  RS is marked available
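The three steps can be sketched in a few lines of code. This is an illustrative model only, not the IBM 360/91 design: the `Station` class, the `issue`, `ready`, and `write_result` functions, and the register values are all assumptions made for the example, and timing is ignored entirely.

```python
class Station:
    """One reservation station: V fields hold values, Q fields hold producer tags."""
    def __init__(self, name):
        self.name = name
        self.busy = False
        self.op = None
        self.vj = self.vk = None
        self.qj = self.qk = None

def issue(rs, op, dest, src1, src2, regs, reg_status):
    """Issue: stall on a busy station, capture values or tags, rename dest."""
    if rs.busy:
        return False                      # structural hazard
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src1)          # tag of the producer, if any
    rs.vj = None if rs.qj else regs[src1]
    rs.qk = reg_status.get(src2)
    rs.vk = None if rs.qk else regs[src2]
    reg_status[dest] = rs.name            # later readers wait on this station
    return True

def ready(rs):
    """Execute may start only when no operand tag is outstanding."""
    return rs.busy and rs.qj is None and rs.qk is None

def write_result(tag, value, stations, regs, reg_status):
    """Write result: broadcast (tag, value) on the CDB and free the station."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]

# Assumed register contents, chosen only for the demonstration.
regs = {"F2": 6.0, "F4": 2.0, "F8": 1.0}
reg_status = {}
mult1, add1 = Station("Mult1"), Station("Add1")
stations = [mult1, add1]

issue(mult1, "DIV.D", "F0", "F2", "F4", regs, reg_status)
issue(add1, "ADD.D", "F10", "F0", "F8", regs, reg_status)
waiting_before = (add1.qj, ready(add1))   # ADD.D holds the tag Mult1, not a value

write_result("Mult1", mult1.vj / mult1.vk, stations, regs, reg_status)
print(waiting_before, ready(add1), add1.vj, regs["F0"])
```

In this sketch ADD.D waits on the tag `Mult1` rather than on register F0 itself, which is the renaming effect the Issue step relies on.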
7
Example

Instruction status
Instruction          Issue  Read operands  Write result
L.D   F6, 34(R2)     √      √              √
L.D   F2, 44(R3)     √      √              √
MUL.D F0, F2, F4     √      √              √
SUB.D F8, F2, F6     √      √              √
DIV.D F10, F0, F6    √
ADD.D F6, F8, F2     √      √              √

Reservation stations
Name   Busy    Op    Vj                Vk                Qj     Qk     A
Load1  yes no  Load                                                    34, 34+Regs[R2]
Load2  yes no  Load                                                    44, 44+Regs[R3]
Add1   yes no  SUB   Mem[44+Regs[R3]]  Mem[34+Regs[R2]]  Load2  Load1
Add2   yes no  ADD   Add1[F8]          Mem[44+Regs[R3]]  Add1   Load2
Add3   no
Mult1  yes no  MUL   Mem[44+Regs[R3]]  Regs[F4]          Load2
Mult2  yes     DIV                     Mem[34+Regs[R2]]  Mult1  Load1

Register status
Field  F0     F2     F4  F6    F8    F10    F12  ...  F30
Qi     Mult1  Load2      Add2  Add1  Mult2

Note: the tables appear to overlay successive states of the slide. "yes no" marks a station that was busy and was later freed, a Q tag next to a V value is the pending tag and the value that replaced it, and the A column shows the offset before and after address calculation.
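The Qj/Qk tags and the register status row in this example follow directly from issuing the six instructions in order. The sketch below reproduces them under two assumptions that are not stated on the slide: stations are assigned as in the table (Load1, Load2, Mult1, Add1, Mult2, Add2), and no instruction has written its result by the time the last one issues. The `rename` helper is a hypothetical name; the integer base registers R2 and R3 are treated as always ready and left out.

```python
# (station, op, destination, FP source registers) in program order.
issued = [
    ("Load1", "L.D",   "F6",  ()),
    ("Load2", "L.D",   "F2",  ()),
    ("Mult1", "MUL.D", "F0",  ("F2", "F4")),
    ("Add1",  "SUB.D", "F8",  ("F2", "F6")),
    ("Mult2", "DIV.D", "F10", ("F0", "F6")),
    ("Add2",  "ADD.D", "F6",  ("F8", "F2")),
]

def rename(issued):
    """Return (register status Qi, per-station source tags) after issuing everything."""
    qi = {}      # register -> station that will produce its value
    tags = {}    # station  -> (Qj, Qk); None means the value is read from the RF
    for station, op, dest, srcs in issued:
        tags[station] = tuple(qi.get(src) for src in srcs)
        qi[dest] = station   # the latest writer wins, which removes WAW hazards
    return qi, tags

qi, tags = rename(issued)
for reg in sorted(qi, key=lambda r: int(r[1:])):
    print(reg, qi[reg])
```

F6 ends up tagged with Add2 rather than Load1 because ADD.D is the last writer of F6, while SUB.D and DIV.D, issued earlier, still hold the tag Load1 for their F6 operand. That is how renaming removes the WAR hazard between DIV.D and ADD.D.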
8
Hardware based Speculation
Execute instructions along predicted execution paths, but only commit the results if the prediction was correct
Instruction commit: allowing an instruction to update the register file when the instruction is no longer speculative
Need an additional piece of hardware to prevent any irrevocable action until an instruction commits
Reorder Buffer (ROB)
  In-order commit
  Stores instruction results before the instruction commits
  Clear ROB on misprediction
  Exceptions
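A reorder buffer with in-order commit and a flush on misprediction can be sketched as a queue. This is a simplified illustration under stated assumptions, not a description of any particular processor: the `Entry` and `ReorderBuffer` classes and the `finish` helper are hypothetical, stores and exceptions are omitted, and a mispredicted branch simply clears every younger entry.

```python
from collections import deque

class Entry:
    def __init__(self, dest=None, is_branch=False):
        self.dest = dest
        self.is_branch = is_branch
        self.value = None
        self.done = False
        self.mispredicted = False

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()          # oldest instruction at the left

    def allocate(self, dest=None, is_branch=False):
        entry = Entry(dest, is_branch)
        self.entries.append(entry)
        return entry

    def commit(self, regs):
        """Retire finished entries from the head only; return committed registers."""
        committed = []
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.is_branch:
                if head.mispredicted:
                    self.entries.clear()    # squash all younger, speculative work
            else:
                regs[head.dest] = head.value
                committed.append(head.dest)
        return committed

def finish(entry, value=None, mispredicted=False):
    """Record an execution result in the ROB; the register file is untouched."""
    entry.value, entry.done, entry.mispredicted = value, True, mispredicted

regs = {"R1": 0, "R2": 0}
rob = ReorderBuffer()
a = rob.allocate("R1")
br = rob.allocate(is_branch=True)
spec = rob.allocate("R2")           # issued speculatively after the branch

finish(spec, 99)                    # completes first, out of order
first = rob.commit(regs)            # nothing commits: the head is not done
finish(a, 5)
second = rob.commit(regs)           # R1 commits, then the unresolved branch blocks
finish(br, mispredicted=True)
third = rob.commit(regs)            # branch retires and squashes the write to R2
print(first, second, third, regs)
```

The speculative result 99 sits in the buffer the whole time and never reaches R2, which is the "no irrevocable action before commit" property from the slide.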
9
Tomasulo's Algorithm with Speculation
11
Dynamic Scheduling+Multiple Issue+Speculation
Limit the number of instructions of a given class that can be issued in a “bundle”
  E.g. one integer, one FP, one load/store
Examine all the dependencies among the instructions in the bundle
Also need multiple completion/commit
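The per-class limit and the dependence check inside a bundle might look like the sketch below. The limits follow the slide's example (one integer, one FP, one load/store); the `LIMITS` and `CLASS_OF` tables, the classification of BNE as an integer operation, and the `form_bundle` function are assumptions introduced for illustration.

```python
from collections import Counter

LIMITS = {"int": 1, "fp": 1, "mem": 1}
# Assumed opcode classes; a real decoder would derive these from the encoding.
CLASS_OF = {"DADDIU": "int", "BNE": "int", "ADD.D": "fp",
            "MUL.D": "fp", "LD": "mem", "SD": "mem"}

def form_bundle(queue):
    """Take instructions in program order until a class limit would be exceeded.

    Returns the bundle and the RAW dependences inside it as
    (producer index, consumer index, register) triples.
    """
    used = Counter()
    bundle = []
    for instr in queue:
        cls = CLASS_OF[instr[0]]
        if used[cls] >= LIMITS[cls]:
            break                       # in-order issue: stop at the first misfit
        used[cls] += 1
        bundle.append(instr)
    deps = [(i, j, bundle[i][1])
            for i in range(len(bundle))
            for j in range(i + 1, len(bundle))
            if bundle[i][1] is not None and bundle[i][1] in bundle[j][2]]
    return bundle, deps

# (opcode, destination or None, source registers)
queue = [
    ("LD", "R2", ("R1",)),
    ("DADDIU", "R2", ("R2",)),
    ("SD", None, ("R2", "R1")),
    ("DADDIU", "R1", ("R1",)),
    ("BNE", None, ("R2", "R3")),
]
bundle, deps = form_bundle(queue)
print([i[0] for i in bundle], deps)
```

Here the bundle stops before SD because the load/store slot is already taken by LD, and the dependence list records that DADDIU must receive R2 from the LD issued in the same bundle.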
12
Dynamic Scheduling + Multiple Issue
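2-way Superscalar (all columns are clock cycle numbers)
Iteration  Instruction          Issues  Executes  Mem access  Write CDB
1          LD     R2, 0(R1)     1       2         3           4
1          DADDIU R2, R2, #1    1       5                     6
1          SD     R2, 0(R1)     2       3         7
1          DADDIU R1, R1, #8    2       3                     4
1          BNE    R2, R3, L     3       7
2          LD     R2, 0(R1)     4       8         9           10
2          DADDIU R2, R2, #1    4       11                    12
2          SD     R2, 0(R1)     5       9         13
2          DADDIU R1, R1, #8    5       8                     9
2          BNE    R2, R3, L     6       13
3          LD     R2, 0(R1)     7       14        15          16
3          DADDIU R2, R2, #1    7       17                    18
3          SD     R2, 0(R1)     8       15        19
3          DADDIU R1, R1, #8    8       14                    15
3          BNE    R2, R3, L     9       19
Next Tutorial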
13
Dynamic Scheduling + Multiple Issue + Speculation
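2-way Superscalar (all columns are clock cycle numbers)
Iteration  Instruction          Issues  Executes  Mem access  Write CDB  Commits
1          LD     R2, 0(R1)     1       2         3           4          5
1          DADDIU R2, R2, #1    1       5                     6          7
1          SD     R2, 0(R1)     2       3                                7
1          DADDIU R1, R1, #8    2       3                     4          8
1          BNE    R2, R3, L     3       7                                8
2          LD     R2, 0(R1)     4       5         6           7          9
2          DADDIU R2, R2, #1    4       8                     9          10
2          SD     R2, 0(R1)     5       6                                10
2          DADDIU R1, R1, #8    5       6                     7          11
2          BNE    R2, R3, L     6       10                               11
3          LD     R2, 0(R1)     7       8         9           10         12
3          DADDIU R2, R2, #1    7       11                    12         13
3          SD     R2, 0(R1)     8       9                                13
3          DADDIU R1, R1, #8    8       9                     10         14
3          BNE    R2, R3, L     9       13                               14
Next Tutorial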
14
Multithreading Execution Slots
15
Paper Reading
Smith and Sohi. The Microarchitecture of Superscalar Processors. Proc. of IEEE.
16
Literature on Processors
Yeager, The MIPS R10000 Processor, MICRO,
Hinton et al., The Microarchitecture of the Pentium 4 Processor. Intel Technology Journal Q1, 2001.
R. E. Kessler, The Alpha Microprocessor. IEEE Micro, 19(2), 1999.
Kahle et al., Introduction to the Cell multiprocessor. IBM J. Res. & Dev.
Hammarlund et al., Haswell: The fourth generation Intel Processor, MICRO 2014.
17
References
Shen and Lipasti. Modern Processor Design.
Hennessy and Patterson. CA. 5ed.
González, Latorre and Magklis. “Processor Microarchitecture - An Implementation Perspective”, SLoCA#12.