CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.

CPSC614 Lec 5.2 Correlating Predictors
Two-level predictors
    if (d == 0)
        d = 1;
    if (d == 1)

CPSC614 Lec 5.3
[Table: initial value of d, b1, value of d before b2, b2; rows for initial d = 0, 1, 2]

CPSC614 Lec 5.4 1-bit Predictor (Initialized to NT)
[Table: d, b1 prediction, b1 action, new b1 prediction, b2 prediction, b2 action, new b2 prediction; rows for d = 2, 0, 2, 0]
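The rows of the 1-bit predictor table can be regenerated with a short simulation. This is a minimal sketch, not the lecture's own code. It assumes (the slide does not say so) that b1 is taken when d != 0 and b2 is taken when d != 1, as if each `if` were compiled into a branch that skips its body. The name `trace_one_bit` is hypothetical.

```python
def trace_one_bit(d_values):
    """Trace two 1-bit predictors (one per branch), both initialized to NT.

    Assumption: b1 is taken when d != 0, b2 is taken when d != 1.
    Each row is (d, b1 prediction, b1 action, new b1 prediction,
                    b2 prediction, b2 action, new b2 prediction).
    """
    label = {True: "T", False: "NT"}
    pred_b1 = False  # False means "predict not taken"
    pred_b2 = False
    rows = []
    for d in d_values:
        initial_d = d
        b1_taken = d != 0
        if not b1_taken:
            d = 1                      # body of `if (d == 0) d = 1;`
        b2_taken = d != 1
        rows.append((initial_d,
                     label[pred_b1], label[b1_taken], label[b1_taken],
                     label[pred_b2], label[b2_taken], label[b2_taken]))
        # A 1-bit predictor simply remembers the last outcome.
        pred_b1, pred_b2 = b1_taken, b2_taken
    return rows


for row in trace_one_bit([2, 0, 2, 0]):
    print(row)
```

Under these assumptions every prediction in the trace disagrees with the action, so a 1-bit predictor mispredicts all eight branches when d alternates between 2 and 0.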

CPSC614 Lec 5.5 (1,1) Predictor
Every branch has two separate prediction bits.
–First bit: the prediction if the last branch in the program is not taken.
–Second bit: the prediction if the last branch in the program is taken.
Write the pair of prediction bits together.

CPSC614 Lec 5.6 Combinations & Meaning
[Table: prediction bits, prediction if last branch not taken, prediction if last branch taken]

CPSC614 Lec 5.7 (m,n) Predictor
Uses the outcomes of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor.
Yields higher prediction rates than the 2-bit scheme.
Requires a trivial amount of additional hardware.
The global history of the most recent m branches is recorded in an m-bit shift register.
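The (m,n) scheme can be sketched as an m-bit global history register that selects one of 2^m tables of n-bit saturating counters. This is an illustrative sketch only: the class and function names are hypothetical, indexing by `pc % entries` is an assumption, and the driver assumes b1 is taken when d != 0 and b2 is taken when d != 1 for the two `if` statements of the correlating-predictor example.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history, n-bit saturating counters."""

    def __init__(self, m, n, entries):
        self.m = m
        self.entries = entries
        self.max_count = (1 << n) - 1
        self.history = 0  # m-bit shift register, newest outcome in bit 0
        # One table of counters per history pattern; all start at 0 (NT).
        self.counters = [[0] * entries for _ in range(1 << m)]

    def predict(self, pc):
        counter = self.counters[self.history][pc % self.entries]
        return counter > self.max_count // 2  # upper half means "taken"

    def update(self, pc, taken):
        table = self.counters[self.history]
        i = pc % self.entries
        if taken:
            table[i] = min(table[i] + 1, self.max_count)
        else:
            table[i] = max(table[i] - 1, 0)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def mispredictions(predictor, d_values):
    """Count mispredictions of b1 (pc 0) and b2 (pc 1) over a sequence of d."""
    misses = 0
    for d in d_values:
        b1_taken = d != 0          # assumption: b1 taken when d != 0
        if not b1_taken:
            d = 1
        b2_taken = d != 1          # assumption: b2 taken when d != 1
        for pc, taken in ((0, b1_taken), (1, b2_taken)):
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
    return misses


print(mispredictions(CorrelatingPredictor(0, 1, 2), [2, 0, 2, 0]))  # 8
print(mispredictions(CorrelatingPredictor(1, 1, 2), [2, 0, 2, 0]))  # 2
```

With m = 0 the sketch degenerates to a plain 1-bit predictor; with one bit of global history only the first iteration is mispredicted.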

CPSC614 Lec 5.8

CPSC614 Lec 5.9 (m,n) Predictor
Total number of bits = 2^m × n × (number of prediction entries selected by the branch address)
Examples
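The bit-count formula is easy to check numerically. The helper name `predictor_bits` and the two configurations below are illustrative choices, not values taken from the slide.

```python
def predictor_bits(m, n, entries):
    """Total state bits of an (m, n) predictor: 2^m * n * entries."""
    return (2 ** m) * n * entries


# A (0,2) predictor with 4096 entries and a (2,2) predictor with 1024
# entries use the same amount of storage.
print(predictor_bits(0, 2, 4096))  # 8192
print(predictor_bits(2, 2, 1024))  # 8192
```

Each extra bit of global history doubles the storage, so at a fixed budget the number of entries selected by the branch address must halve.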

CPSC614 Lec 5.10

CPSC614 Lec 5.11 Tournament Predictors Most popular multilevel branch predictors

CPSC614 Lec 5.12 Tournament Predictors
By combining multiple predictors (one based on global information, one based on local information) with a selector, a tournament predictor can select the right predictor for the right branch.
Alpha 21264
–Uses the most sophisticated branch predictor as of 2001.
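The selector idea can be sketched with 2-bit saturating counters throughout. This is a simplified illustration, not the Alpha 21264 design: the table sizes, the `pc % entries` indexing, the initial counter values, and all names are assumptions made for the example.

```python
class TournamentPredictor:
    """Global and local 2-bit predictors plus a per-branch 2-bit chooser."""

    def __init__(self, entries=1024, history_bits=4):
        self.entries = entries
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0
        self.local = [1] * entries               # indexed by branch address
        self.glob = [1] * (1 << history_bits)    # indexed by global history
        self.chooser = [1] * entries             # >= 2 means "trust global"

    @staticmethod
    def _bump(counter, up):
        return min(counter + 1, 3) if up else max(counter - 1, 0)

    def _component_predictions(self, pc):
        i = pc % self.entries
        return self.local[i] >= 2, self.glob[self.history] >= 2

    def predict(self, pc):
        local_pred, global_pred = self._component_predictions(pc)
        use_global = self.chooser[pc % self.entries] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, taken):
        i = pc % self.entries
        local_pred, global_pred = self._component_predictions(pc)
        # Train the chooser only when the two components disagree.
        if local_pred != global_pred:
            self.chooser[i] = self._bump(self.chooser[i], global_pred == taken)
        self.local[i] = self._bump(self.local[i], taken)
        self.glob[self.history] = self._bump(self.glob[self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


# A branch that alternates T, NT, T, NT defeats the local 2-bit counter,
# but the global history captures the pattern and the chooser learns to use it.
predictor = TournamentPredictor()
correct = 0
for step in range(200):
    taken = step % 2 == 0
    if step >= 100 and predictor.predict(0) == taken:
        correct += 1
    predictor.update(0, taken)
print(correct)  # 100 correct predictions over the last 100 branches
```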

CPSC614 Lec 5.13

CPSC614 Lec 5.14

CPSC614 Lec 5.15 Need Address at Same Time as Prediction
Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken).
[Diagram: the PC of the instruction in FETCH is compared (=?) with the Branch PC field; each entry also holds a Predicted PC and extra prediction state bits]
–Yes: the instruction is a branch; use the predicted PC as the next PC
–No: branch not predicted; proceed normally (Next PC = PC+4)
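The lookup described on this slide can be sketched with a dictionary keyed by branch PC. This is a minimal model under stated assumptions: instructions are 4 bytes (the slide's PC+4), the buffer has unlimited capacity, only taken branches are kept, and the extra prediction state bits are omitted. All names are hypothetical.

```python
class BranchTargetBuffer:
    """Maps the PC of a branch to its predicted target PC."""

    def __init__(self):
        self.entries = {}  # branch PC -> predicted PC

    def next_fetch_pc(self, pc):
        # Hit: treat the instruction as a taken branch and fetch its target.
        # Miss: no prediction, so fall through to the next instruction.
        predicted = self.entries.get(pc)
        return predicted if predicted is not None else pc + 4

    def update(self, pc, taken, target):
        # Called once the branch resolves: keep taken branches, drop others.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_fetch_pc(0x100)))   # 0x104, nothing known about 0x100 yet
btb.update(0x100, taken=True, target=0x40)
print(hex(btb.next_fetch_pc(0x100)))   # 0x40, predicted taken
```

Because the lookup uses only the fetch PC, the predicted target is available during fetch, before the instruction has been decoded.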

CPSC614 Lec 5.16 Multiple-Issue Processors
Allow multiple instructions to issue in a clock cycle.
Ideal CPI < 1
2 flavors
–Superscalar
–VLIW (Very Long Instruction Word)

CPSC614 Lec 5.17 Superscalar Processors
Issue varying numbers of instructions per clock
–statically scheduled
  »using compiler techniques
  »in-order execution
–dynamically scheduled
  »Tomasulo’s algorithm
  »out-of-order execution

CPSC614 Lec 5.18 Superscalar MIPS: 2 instructions, 1 FP & 1 anything
– Fetch 64 bits/clock cycle; Int on left, FP on right
– Can only issue 2nd instruction if 1st instruction issues
– More ports for FP registers to do FP load & FP op in a pair

Type              Pipe stages
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB

Figure 3.24 P.219

CPSC614 Lec 5.19 VLIW Processors
Issue a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (EPIC: Explicitly Parallel Instruction Computers).
Statically scheduled by the compiler.

CPSC614 Lec 5.20
Name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristic | Examples
Superscalar (static) | dynamic | h/w | static | in-order execution | Sun UltraSPARC II/III
Superscalar (dynamic) | dynamic | h/w | dynamic | some out-of-order execution | IBM Power2
Superscalar (speculative) | dynamic | h/w | dynamic w/ speculation | out-of-order execution w/ speculation | Pentium III/4, MIPS R10K, Alpha 21264
VLIW/LIW | static | s/w | static | no hazards between issue packets | Trimedia, i860
EPIC | mostly static | mostly s/w | mostly static | explicit dependences marked by compiler | Itanium

CPSC614 Lec 5.21 Hardware-Based Speculation
As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing burden.
=> Speculate on the outcome of branches and execute the program as if the guesses were correct: hardware speculation.

CPSC614 Lec 5.22 3 Key Ideas of Hardware Speculation
Dynamic Branch Prediction
–Choose which instruction to execute.
Speculation
–Allow the execution of instructions before the control dependences are resolved (with the ability to undo the effect of an incorrectly speculated sequence).
Dynamic Scheduling
–Deal with the scheduling of different combinations of basic blocks.

CPSC614 Lec 5.23 Examples
–PowerPC 603/604/G3/G4
–MIPS R10000/12000
–Intel Pentium II/III/4
–Alpha 21264
–AMD K5/K6/Athlon

CPSC614 Lec 5.24 What about Precise Interrupts?
Tomasulo had: in-order issue, out-of-order execution, and out-of-order completion.
Need to “fix” the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.

CPSC614 Lec 5.25 Relationship between precise interrupts and speculation
Speculation is a form of guessing.
Important for branch prediction:
–Need to “take our best shot” at predicting branch direction.
If we speculate and are wrong, we need to back up and restart execution at the point at which we predicted incorrectly:
–This is exactly the same as precise exceptions!
Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

CPSC614 Lec 5.26 HW support for precise interrupts
Need HW buffer for results of uncommitted instructions: reorder buffer
–3 fields: instr, destination, value
–Use reorder buffer number instead of reservation station when execution completes
–Supplies operands between execution complete & commit
–(Reorder buffer can be operand source => more registers like RS)
–Instructions commit
–Once instruction commits, result is put into register
–As a result, easy to undo speculated instructions on mispredicted branches or exceptions
[Diagram: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adder]

CPSC614 Lec 5.27 Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”).
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (this stage is sometimes called “issue”).
3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
4. Commit: update register with reorder result
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called “graduation”).
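The interaction between out-of-order write result and in-order commit can be sketched with a small reorder buffer model. This is a simplified illustration, not the full algorithm: reservation stations, the CDB, and stores are left out, at most one instruction commits per call, and all class, method, and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str
    dest: Optional[str]          # destination register, None for branches
    value: object = None
    done: bool = False           # set by the write-result step
    mispredicted: bool = False   # only meaningful for branches


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # head of the queue is the oldest instruction
        self.registers = {}      # architectural state, updated only at commit

    def issue(self, instr, dest=None):
        """Step 1: allocate a slot in program order, or stall if full."""
        if len(self.entries) >= self.size:
            return None
        entry = RobEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        """Step 3: results may arrive in any order."""
        entry.value = value
        entry.done = True
        entry.mispredicted = mispredicted

    def commit(self):
        """Step 4: retire the head only when its result is present."""
        if not self.entries or not self.entries[0].done:
            return None
        head = self.entries.popleft()
        if head.mispredicted:
            self.entries.clear()             # flush all younger instructions
        elif head.dest is not None:
            self.registers[head.dest] = head.value
        return head.instr


rob = ReorderBuffer(size=4)
add = rob.issue("ADDD", "F0")
branch = rob.issue("BNE")
mul = rob.issue("MULTD", "F2")       # speculative: issued past the branch

rob.write_result(mul, 6.0)           # finishes first, out of order
print(rob.commit())                  # None: head (ADDD) has no result yet
rob.write_result(add, 1.5)
print(rob.commit())                  # ADDD
rob.write_result(branch, mispredicted=True)
print(rob.commit())                  # BNE, which flushes MULTD
print(rob.registers)                 # {'F0': 1.5}
```

Because registers are written only at commit, the speculative MULTD result never reaches F2, which is what makes undoing a mispredicted branch straightforward.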

CPSC614 Lec 5.28 What are the hardware complexities with reorder buffer (ROB)?
How do you find the latest version of a register?
–(As specified by Smith paper) need associative comparison network
–Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value
Need as many ports on ROB as register file
[Diagram: FP Op Queue, Reorder Buffer with comparison network, FP Regs, Res Stations, FP Adder; Reorder Table fields: Dest Reg, Result, Exceptions?, Valid, Program Counter]
