Dynamic Scheduling and Speculation
Outline
  Dynamic Scheduling
  Tomasulo's Algorithm
  Speculation
Dynamic Scheduling
  Out-of-order execution
    Check for structural and data hazards
    Begin executing as soon as operands are available
  Implies out-of-order completion
    WAR and WAW hazards
    Imprecise exceptions
  Example: ADD.D must wait for the result of DIV.D (F0), but SUB.D depends on neither and need not wait behind the stalled ADD.D
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F12, F8, F14
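The dependence in the three-instruction example can be checked mechanically. The following is a minimal sketch, not part of the original slides: `find_hazards` and the tuple layout are hypothetical names chosen for illustration, and the pairwise check deliberately ignores intervening writes to the same register.

```python
from typing import List, Tuple

# Assumed encoding: (opcode, destination, source 1, source 2)
Instr = Tuple[str, str, str, str]

def find_hazards(program: List[Instr]) -> List[Tuple[str, int, int, str]]:
    """Return (kind, earlier index, later index, register) for each hazard.

    Simplification: every pair is compared directly, so a register that is
    redefined between the two instructions is still reported.
    """
    found = []
    for i, (_, dest_i, *srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, dest_j, *srcs_j = program[j]
            if dest_i in srcs_j:      # later instruction reads what earlier writes
                found.append(("RAW", i, j, dest_i))
            if dest_j in srcs_i:      # later instruction writes what earlier reads
                found.append(("WAR", i, j, dest_j))
            if dest_j == dest_i:      # both write the same register
                found.append(("WAW", i, j, dest_j))
    return found

program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]
hazards = find_hazards(program)
print(hazards)
```

Only ADD.D is tied to DIV.D (a RAW hazard on F0). SUB.D appears in no hazard, which is why an out-of-order machine may start it while ADD.D is still waiting.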
Tomasulo's Algorithm
  Invented by Robert Tomasulo for the IBM 360/91
  Goal: high performance without special compilers
  Influenced designs of Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …
  Tomasulo [1967]. "An efficient algorithm for exploiting multiple arithmetic units," IBM J. Research and Development 11:1 (Jan), 25-33.
Tomasulo's Algorithm
  [Figure: block diagram. Components: instruction queue (fed from the instruction unit), FP registers, address unit for load/store operations, load buffers, store buffers, reservation stations, FP adder, FP multipliers, memory unit (data and address), common data bus.]
Steps in Tomasulo's Algorithm
  Issue (also called dispatch)
    Check for structural hazards
    Queue the instruction in a reservation station (RS)
    If an operand is not yet available in the register file (RF), keep track of the functional unit (FU) that will produce it
    Eliminates WAR and WAW hazards
  Execute
    Monitor the common data bus (CDB) for missing operands; execute once all operands are available (avoids RAW hazards)
  Write result
    Write the result on the CDB
    Mark the RS as available
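The three steps can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the IBM 360/91 design: the class `Tomasulo`, its methods, and the register values are hypothetical, loads and stores are left out, and latencies are ignored (the caller decides when a station writes its result).

```python
import operator
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

OPS = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.truediv}

@dataclass
class Station:
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of the stations producing missing operands
    qk: Optional[str] = None

class Tomasulo:
    def __init__(self, stations: Iterable[str], regs: Dict[str, float]):
        self.rs = {name: Station() for name in stations}
        self.regs = dict(regs)
        self.qi: Dict[str, str] = {}   # register -> station that will write it

    def issue(self, op, dest, src1, src2, candidates) -> Optional[str]:
        """Issue: take a free station, or return None on a structural hazard."""
        name = next((n for n in candidates if not self.rs[n].busy), None)
        if name is None:
            return None
        st = Station(busy=True, op=op,
                     qj=self.qi.get(src1), qk=self.qi.get(src2))
        if st.qj is None:
            st.vj = self.regs[src1]    # value copied now: later writes cannot clobber it (WAR)
        if st.qk is None:
            st.vk = self.regs[src2]
        self.rs[name] = st
        self.qi[dest] = name           # renaming: only the latest writer is tracked (WAW)
        return name

    def ready(self, name: str) -> bool:
        """Execute may begin only when no operand tag is outstanding (RAW)."""
        st = self.rs[name]
        return st.busy and st.qj is None and st.qk is None

    def write_result(self, name: str) -> float:
        """Write result: broadcast on the CDB and free the station."""
        if not self.ready(name):
            raise ValueError(f"{name} is still waiting for operands")
        st = self.rs[name]
        value = OPS[st.op](st.vj, st.vk)
        for other in self.rs.values():          # waiting stations capture the value
            if other.qj == name:
                other.vj, other.qj = value, None
            if other.qk == name:
                other.vk, other.qk = value, None
        for reg in [r for r, tag in self.qi.items() if tag == name]:
            self.regs[reg] = value
            del self.qi[reg]
        self.rs[name] = Station()               # RS is marked available
        return value

# Illustrative register values (assumed, not from the slides).
cpu = Tomasulo(["Add1", "Add2", "Mult1", "Mult2"],
               {"F0": 0.0, "F2": 6.0, "F4": 3.0, "F6": 1.0, "F8": 0.0, "F10": 0.0})
cpu.issue("MUL", "F0", "F2", "F4", ["Mult1", "Mult2"])
cpu.issue("SUB", "F8", "F2", "F6", ["Add1", "Add2"])
cpu.issue("DIV", "F10", "F0", "F6", ["Mult1", "Mult2"])
cpu.issue("ADD", "F6", "F8", "F2", ["Add1", "Add2"])
div_waits = not cpu.ready("Mult2")   # DIV needs F0 from Mult1
for station in ["Add1", "Add2", "Mult1", "Mult2"]:
    cpu.write_result(station)
print(cpu.regs)
```

In this run ADD writes F6 before DIV executes, yet DIV still divides by the old F6 value, because that value was copied into the reservation station at issue. That copy is what removes the WAR hazard.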
Example

Instruction status
Instruction           Issue   Read operands   Write result
L.D   F6, 34(R2)      √       √               √
L.D   F2, 44(R3)      √       √               √
MUL.D F0, F2, F4      √       √               √
SUB.D F8, F2, F6      √       √               √
DIV.D F10, F0, F6     √
ADD.D F6, F8, F2      √       √               √

Reservation stations
Name    Busy            Op     Vj                  Vk                  Qj      Qk      A
Load1   yes, then no    Load                                                           34, then 34+Regs[R2]
Load2   yes, then no    Load                                                           44, then 44+Regs[R3]
Add1    yes, then no    SUB    Mem[44+Regs[R3]]    Mem[34+Regs[R2]]    Load2   Load1
Add2    yes, then no    ADD    Add1[F8]            Mem[44+Regs[R3]]    Add1    Load2
Add3    no
Mult1   yes, then no    MUL    Mem[44+Regs[R3]]    Regs[F4]            Load2
Mult2   yes             DIV                        Mem[34+Regs[R2]]    Mult1   Load1

A Q field holds the tag recorded at issue; when that station broadcasts on the CDB, the tag is cleared and the value appears in the matching V field. Busy changes from yes to no when the instruction writes its result.

Register status
Field   F0      F2      F4      F6      F8      F10     F12   ...   F30
Qi      Mult1   Load2           Add2    Add1    Mult2
Hardware-Based Speculation
  Execute instructions along predicted execution paths, but commit the results only if the prediction was correct
  Instruction commit: allowing an instruction to update the register file once it is no longer speculative
  Needs an additional piece of hardware to prevent any irrevocable action until an instruction commits
  Reorder buffer (ROB)
    In-order commit
    Stores instruction results before the instruction commits
    Clear the ROB on a misprediction
    Exceptions are handled at commit
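A reorder buffer with in-order commit and a flush on misprediction can be sketched as follows. This is a simplified illustration, not a description of any particular processor: `ReorderBuffer`, `RobEntry`, and their fields are hypothetical names, and the number of commits per call is unbounded.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

@dataclass
class RobEntry:
    dest: Optional[str]            # destination register, None for a branch
    value: Optional[float] = None  # result held here until commit
    done: bool = False             # result has been produced
    mispredicted: bool = False     # set for a branch that was predicted wrongly

class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: Deque[RobEntry] = deque()
        self.regs: Dict[str, float] = {}   # architectural register file

    def allocate(self, dest: Optional[str]) -> RobEntry:
        """Reserve an entry at issue, in program order."""
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def commit(self) -> int:
        """Retire finished entries from the head only; return how many retired."""
        committed = 0
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.dest is not None:
                self.regs[head.dest] = head.value   # the only place registers change
            committed += 1
            if head.mispredicted:
                self.entries.clear()                # discard all younger, speculative work
                break
        return committed

rob = ReorderBuffer()
first = rob.allocate("F0")
branch = rob.allocate(None)
speculative = rob.allocate("F2")       # issued past the unresolved branch

speculative.value, speculative.done = 9.0, True
early = rob.commit()                   # nothing commits: the head is not done yet

first.value, first.done = 1.0, True
after_first = rob.commit()             # the head commits, the branch blocks the rest

branch.done, branch.mispredicted = True, True
after_branch = rob.commit()            # the branch commits and the ROB is flushed
print(early, after_first, after_branch, rob.regs)
```

The speculative result for F2 was computed early but never reached the register file, since it was still behind the mispredicted branch when the buffer was cleared.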
Tomasulo's Algorithm with Speculation
Dynamic Scheduling + Multiple Issue + Speculation
  Limit the number of instructions of a given class that can be issued in a "bundle"
    e.g., one integer, one FP, one load/store
  Examine all the dependencies among the instructions in the bundle
  Also need multiple completion/commit
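The two issue-time checks (a per-class limit and a dependence check inside the bundle) might look like the sketch below. The function names, the class labels, and the rule that bundling stops at the first instruction that does not fit are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed encoding: (class, opcode, destination or "", source registers)
Instr = Tuple[str, str, str, Tuple[str, ...]]

def form_bundle(queue: List[Instr], limits: Dict[str, int]) -> List[Instr]:
    """Take instructions in program order until a class limit would be exceeded."""
    used: Dict[str, int] = {}
    bundle: List[Instr] = []
    for instr in queue:
        cls = instr[0]
        if used.get(cls, 0) >= limits.get(cls, 0):
            break                      # assumed in-order issue: stop at the first misfit
        used[cls] = used.get(cls, 0) + 1
        bundle.append(instr)
    return bundle

def intra_bundle_deps(bundle: List[Instr]) -> List[Tuple[int, int, str]]:
    """Return (producer index, consumer index, register) pairs within one bundle."""
    deps = []
    for i, (_, _, dest, _) in enumerate(bundle):
        for j in range(i + 1, len(bundle)):
            if dest and dest in bundle[j][3]:
                deps.append((i, j, dest))
    return deps

queue: List[Instr] = [
    ("mem", "LD", "R2", ("R1",)),
    ("int", "DADDIU", "R2", ("R2",)),
    ("mem", "SD", "", ("R2", "R1")),
    ("int", "DADDIU", "R1", ("R1",)),
]
limits = {"int": 1, "fp": 1, "mem": 1}
bundle = form_bundle(queue, limits)
deps = intra_bundle_deps(bundle)
print([instr[1] for instr in bundle], deps)
```

Here the bundle ends before SD because the single load/store slot is already taken by LD, and the dependence check reports that DADDIU needs the R2 value produced by LD in the same bundle.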
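Dynamic Scheduling + Multiple Issue
2-way Superscalar

Iteration   Instruction          Issues at clock   Executes at clock   Mem access at clock   Write CDB at clock
1           LD     R2, 0(R1)     1                 2                   3                     4
1           DADDIU R2, R2, #1    1                 5                                         6
1           SD     R2, 0(R1)     2                 3                   7
1           DADDIU R1, R1, #8    2                 3                                         4
1           BNE    R2, R3, L     3                 7
2           LD     R2, 0(R1)     4                 8                   9                     10
2           DADDIU R2, R2, #1    4                 11                                        12
2           SD     R2, 0(R1)     5                 9                   13
2           DADDIU R1, R1, #8    5                 8                                         9
2           BNE    R2, R3, L     6                 13
3           LD     R2, 0(R1)     7                 14                  15                    16
3           DADDIU R2, R2, #1    7                 17                                        18
3           SD     R2, 0(R1)     8                 15                  19
3           DADDIU R1, R1, #8    8                 14                                        15
3           BNE    R2, R3, L     9                 19

Next Tutorial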
Dynamic Scheduling + Multiple Issue + Speculation
2-way Superscalar

Iteration   Instruction          Issues at clock   Executes at clock   Mem access at clock   Write CDB at clock   Commits at clock
1           LD     R2, 0(R1)     1                 2                   3                     4                    5
1           DADDIU R2, R2, #1    1                 5                                         6                    7
1           SD     R2, 0(R1)     2                 3                                                              7
1           DADDIU R1, R1, #8    2                 3                                         4                    8
1           BNE    R2, R3, L     3                 7                                                              8
2           LD     R2, 0(R1)     4                 5                   6                     7                    9
2           DADDIU R2, R2, #1    4                 8                                         9                    10
2           SD     R2, 0(R1)     5                 6                                                              10
2           DADDIU R1, R1, #8    5                 6                                         7                    11
2           BNE    R2, R3, L     6                 10                                                             11
3           LD     R2, 0(R1)     7                 8                   9                     10                   12
3           DADDIU R2, R2, #1    7                 11                                        12                   13
3           SD     R2, 0(R1)     8                 9                                                              13
3           DADDIU R1, R1, #8    8                 9                                         10                   14
3           BNE    R2, R3, L     9                 13                                                             14

Next Tutorial
Multithreading
  [Figure: execution slots]
Paper Reading
  Smith and Sohi. The Microarchitecture of Superscalar Processors. Proc. of the IEEE, 1995.
Literature on Processors
  Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro, 1996.
  Hinton et al. The Microarchitecture of the Pentium 4 Processor. Intel Technology Journal, Q1 2001.
  Kessler. The Alpha 21264 Microprocessor. IEEE Micro, 19(2), 1999.
  Kahle et al. Introduction to the Cell Multiprocessor. IBM J. Res. & Dev., 2005.
  Hammarlund et al. Haswell: The Fourth-Generation Intel Core Processor. IEEE Micro, 2014.
References
  Shen and Lipasti. Modern Processor Design.
  Hennessy and Patterson. Computer Architecture: A Quantitative Approach, 5th ed.
  González, Latorre and Magklis. Processor Microarchitecture: An Implementation Perspective. SLoCA #12.