1
EECS 470
Pipeline Control Hazards
Lecture 5
Coverage: Chapter 3 & Appendix A
2
Pipeline function for BEQ
Fetch: read instruction from memory
Decode: read source operands from reg
Execute: calculate target address and test for equality
Memory: send target to PC if test is equal
Writeback: nothing left to do
3
Control Hazards
[Figure: pipeline timing diagram over time. beq 1 1 10: fetch, decode, execute, memory, writeback. sub 3 4 5: fetch, decode, execute.]
4
Approaches to handling control hazards
Avoidance
–Make sure there are no hazards in the code
Detect and Stall
–Delay fetch until the branch is resolved
Speculate and Squash if wrong
–Go ahead and fetch more instructions in case the guess is correct, but stop them if they shouldn’t have been executed
5
Handling branch hazards: avoid all hazards
Don’t have branch instructions!
–Maybe a little impractical
Predication can eliminate some branches
–If-conversion
–Hyperblocks
6
if-conversion
Source:
  if (a == b) { x++; y = n / d; }
With a branch:
  sub t1 a, b
  jnz t1, PC+2
  add x x, #1
  div y n, d
With predicated instructions:
  sub t1 a, b
  add(t1) x x, #1
  div(t1) y n, d
With conditional moves:
  sub t1 a, b
  add t2 x, #1
  div t3 n, d
  cmov(t1) x t2
  cmov(t1) y t3
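The conditional-move form of the slide's example can be checked against the branching form with a short sketch. This is only an illustration in Python, not the slide's ISA: the function names are hypothetical, integer division stands in for `div`, and "t1 is zero" is assumed to be the condition under which the moves take effect.

```python
def with_branch(a, b, x, y, n, d):
    """Branching form: the body runs only when a == b."""
    if a == b:
        x += 1
        y = n // d
    return x, y


def with_cmov(a, b, x, y, n, d):
    """If-converted form: compute everything, then select with conditional moves."""
    t1 = a - b            # sub t1 a, b  (zero when a == b)
    t2 = x + 1            # add t2 x, #1 (always executed)
    t3 = n // d           # div t3 n, d  (always executed)
    x = t2 if t1 == 0 else x   # cmov(t1) x t2
    y = t3 if t1 == 0 else y   # cmov(t1) y t3
    return x, y


print(with_branch(1, 1, 5, 0, 8, 2), with_cmov(1, 1, 5, 0, 8, 2))
print(with_branch(1, 2, 5, 0, 8, 2), with_cmov(1, 2, 5, 0, 8, 2))
```

One thing the sketch makes visible: the if-converted form executes the divide even when a != b, so a zero divisor would fault here where the branching form would not.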
7
Removing hazards by redefining a branch instruction
Redefine branch instructions:
–ptbeq regA regB offset (prepare to branch if equal)
–If (R[regA] == R[regB]), execute the instructions at PC+1, PC+2, PC+3, then continue at PC+1+offset
8
ptbnz example
Original code:
  t = 5
  n = 7
  g = c + 2
  bnz g, PC + 1
  m = 5
  a = 3
Rewritten code:
  g = c + 2
  bnz g, PC + 4
  t = 5
  n = 7
  noop
  m = 5
  a = 3
9
Problems with this solution
Old programs (legacy code) may not run correctly on new implementations
–Longer pipelines tend to need more noops
Programs get larger as noops are included
–Especially a problem for machines that try to execute more than one instruction every cycle
–Harder to find useful instructions
Program execution is slower
–CPI is one, but some instructions are noops
10
Handling control hazards: detect and stall
Detection:
–Must wait until decode
–Compare opcode to beq or jalr
–Alternatively, this is just another control signal
Stall:
–Keep current instructions in fetch
–Pass noop to decode stage (not execute!)
11
[Figure: pipeline datapath with PC, instruction memory, register file, sign extend, ALU, data memory, adders, and MUXes; pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB; Control. Labeled instruction: bnz r1.]
12
[Figure: the same pipeline datapath with an added MUX; Control passes a noop.]
13
Control Hazards
[Figure: pipeline timing diagram with stalls. beq 1 1 10: fetch, decode, execute, memory, writeback. sub 3 4 5: fetch repeated three times; the following fetch is either sub or the branch target.]
14
Problems with detect and stall
CPI increases every time a branch is detected!
Is that necessary? Not always!
–The branch is taken only about ½ of the time
Let’s assume that it is NOT taken…
–In this case, we can ignore the beq (treat it like a noop)
–Keep fetching PC + 1
What if we are wrong?
–OK, as long as we do not COMPLETE any instructions we mistakenly executed (i.e. don’t perform writeback)
15
Handling control hazards: speculate and squash
Speculate: assume not equal
–Keep fetching from PC+1 until we know that the branch is really taken
Squash: stop bad instructions if taken
–Send a noop to Decode, Execute, and Memory
–Send target address to PC
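The squash step can be sketched as a one-cycle pipeline update. This is a toy model, not the lecture's datapath: the stage list, the `advance` function, and the dictionary-based program are all hypothetical, and the branch is assumed to resolve in the Memory stage as in the BEQ pipeline function slide.

```python
NOOP = "noop"


def advance(stages, next_pc, program, branch_taken=False, target=None):
    """Advance one cycle.

    stages  = [fetch, decode, execute, memory, writeback] instruction names
    next_pc = address the fetch stage would read next (PC+1 speculation)
    Returns (new_stages, new_next_pc).
    """
    fetch, decode, execute, memory, _ = stages
    if memory.startswith("beq") and branch_taken:
        # Squash: the three younger instructions become noops as they
        # enter Decode, Execute, and Memory; fetch restarts at the target.
        return [program.get(target, NOOP), NOOP, NOOP, NOOP, memory], target + 1
    # Normal case: every instruction moves one stage down the pipeline.
    return [program.get(next_pc, NOOP), fetch, decode, execute, memory], next_pc + 1


program = {0: "beq", 1: "sub", 2: "add", 3: "nand", 10: "lw"}
stages = ["nand", "add", "sub", "beq", NOOP]   # beq has reached Memory
print(advance(stages, 4, program, branch_taken=True, target=10))
print(advance(stages, 4, program, branch_taken=False))
```

In the taken case the mistakenly fetched sub, add, and nand never reach writeback, which is the condition the slides give for speculation being safe.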
16
[Figure: pipeline datapath for speculate and squash, with an "equal" signal and an added MUX. Instruction labels: beq, sub, add, nand, noop.]
17
Problems with fetching PC+1
CPI increases every time a branch is taken!
–About ½ of the time
Is that necessary? No! But how can you fetch from the target before you even know the previous instruction is a branch, much less whether it is taken?
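A rough CPI comparison of detect-and-stall against predict-not-taken is sketched below. The numbers are assumptions rather than figures from the lecture: a 3-cycle penalty (matching the three stages that receive noops on a squash), a hypothetical 20% branch frequency, and the ½ taken rate quoted on the slides. The function names are illustrative.

```python
def cpi_detect_and_stall(branch_fraction, penalty_cycles):
    """Every branch pays the full penalty, taken or not."""
    return 1.0 + branch_fraction * penalty_cycles


def cpi_predict_not_taken(branch_fraction, taken_fraction, penalty_cycles):
    """Only taken branches (the mispredicted ones) pay the penalty."""
    return 1.0 + branch_fraction * taken_fraction * penalty_cycles


BRANCH_FRACTION = 0.20   # hypothetical instruction mix
TAKEN_FRACTION = 0.50    # "about 1/2 of the time" from the slides
PENALTY_CYCLES = 3       # assumed: branch resolves in the Memory stage

stall = cpi_detect_and_stall(BRANCH_FRACTION, PENALTY_CYCLES)
speculate = cpi_predict_not_taken(BRANCH_FRACTION, TAKEN_FRACTION, PENALTY_CYCLES)
print(f"detect and stall:  CPI = {stall:.2f}")
print(f"predict not taken: CPI = {speculate:.2f}")
```

Under these assumptions speculation halves the branch overhead, and the remaining overhead comes entirely from taken branches, which is the cost a branch target buffer is meant to attack.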
18
[Figure: pipeline datapath with a branch target lookup at fetch and an added MUX selecting the next PC. Labels: beq, bpc, target, eq?]
19
Branch Target Buffer
Each entry pairs a fetch PC with a predicted target PC
Send PC to BTB: found?
–Yes: use target
–No: use PC+1
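The lookup flow above can be sketched as a small table keyed by fetch PC. This is a minimal illustration, assuming an unbounded dictionary instead of a finite, tagged hardware structure; the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a fetch PC to a predicted target PC."""

    def __init__(self):
        self._targets = {}

    def record_taken(self, pc, target):
        """Remember that the branch at `pc` went to `target`."""
        self._targets[pc] = target

    def predict_next_pc(self, pc):
        """Found in the BTB: use the target. Not found: use PC+1."""
        return self._targets.get(pc, pc + 1)


btb = BranchTargetBuffer()
btb.record_taken(pc=8, target=4)
print(btb.predict_next_pc(8))   # hit: predicted target
print(btb.predict_next_pc(9))   # miss: falls through to PC+1
```

Because the lookup uses only the PC, it can run in the fetch stage before the instruction is known to be a branch.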
20
Branch prediction
Predict not taken: ~50% accurate
–No BTB needed; always use PC+1
Predict backward taken: ~65% accurate
–BTB holds targets for backward branches (loops)
Predict same as last time: ~80% accurate
–Update BTB for any taken branch
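The three policies can be compared on a made-up branch trace. This is a sketch under stated assumptions: the trace is hypothetical (a forward branch that is never taken plus a loop branch taken three times and then exited), only the branch direction is scored, and "same as last time" is assumed to predict not taken for a branch it has never seen.

```python
def predict_not_taken(pc, target, last_outcome):
    return False


def predict_backward_taken(pc, target, last_outcome):
    return target < pc          # backward branches are usually loops


def predict_same_as_last(pc, target, last_outcome):
    return last_outcome.get(pc, False)


def accuracy(policy, trace):
    """trace: list of (pc, target, taken). Returns fraction predicted correctly."""
    last_outcome = {}
    correct = 0
    for pc, target, taken in trace:
        if policy(pc, target, last_outcome) == taken:
            correct += 1
        last_outcome[pc] = taken    # learn the outcome after resolving
    return correct / len(trace)


one_pass = [(2, 6, False)] + [(8, 4, True)] * 3 + [(8, 4, False)]
trace = one_pass * 2
for policy in (predict_not_taken, predict_backward_taken, predict_same_as_last):
    print(policy.__name__, accuracy(policy, trace))
```

A ten-branch toy trace is far too small to reproduce the percentages on the slide; it only shows how each policy decides and where each one goes wrong (for example, "same as last time" misses both the first and the last iteration of every short loop).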
21
What about indirect branches?
Could use the same approach
–PC+1 is an unlikely indirect target
–Indirect jumps often have multiple targets (for the same instruction)
  Switch statements
  Virtual function calls
  Shared library (DLL) calls
22
Indirect jump: Special Case
Return address stack
–Function returns have deterministic behavior (usually)
  Return to different locations (BTB doesn’t work well)
  Return location known ahead of time
–In some register at the time of the call
–Build a specialized structure for return addresses
  Call instructions write return address to R31 AND RAS
  Return instructions pop predicted target off stack
–Issues: finite size (save or forget on overflow?)
–Issues: long jumps (clear when wrong?)
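A return address stack with a finite size can be sketched as below. The class and method names are hypothetical, and the overflow policy shown (forget the oldest entry) is just one of the two options the slide raises; it is not presented as the lecture's choice.

```python
from collections import deque


class ReturnAddressStack:
    """Toy RAS: calls push a return address, returns pop a prediction."""

    def __init__(self, size=4):
        # A bounded deque silently drops the oldest entry on overflow.
        self._stack = deque(maxlen=size)

    def on_call(self, return_address):
        self._stack.append(return_address)

    def on_return(self):
        """Predicted return target, or None when the stack is empty."""
        return self._stack.pop() if self._stack else None


ras = ReturnAddressStack(size=2)
for return_address in (101, 201, 301):   # three nested calls, room for two
    ras.on_call(return_address)
print(ras.on_return(), ras.on_return(), ras.on_return())
```

With three nested calls and room for two entries, the two innermost returns are predicted and the outermost one has no prediction, which is the cost of forgetting on overflow.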
23
Branch prediction
Pentium: ~85% accurate
Pentium Pro: ~92% accurate
Best paper designs: ~96% accurate
24
Costs of branch prediction/speculation
Performance costs?
–Minimal: no difference between waiting and squashing, and it is a huge gain when the prediction is correct!
Power?
–Large: in very long/wide pipelines many instructions can be squashed
  Squashed = # mispredictions × pipeline length/width before target resolved
Area?
–Can be large: predictors can get very big, as we will see next time
Complexity?
–Designs are more complex
–Testing becomes more difficult
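The squashed-instruction estimate can be worked through with a few numbers. This sketch reads the slide's formula as mispredictions times the number of stages before the target resolves times the fetch width; that reading, the function name, and every input value below are assumptions made for illustration.

```python
def squashed_instructions(mispredictions, stages_before_resolve, width):
    """Upper-bound estimate of wasted instructions: each misprediction
    discards every instruction fetched behind the branch."""
    return mispredictions * stages_before_resolve * width


# Hypothetical short, single-issue pipeline (three stages squashed per miss).
print(squashed_instructions(1000, stages_before_resolve=3, width=1))
# Hypothetical long, wide pipeline.
print(squashed_instructions(1000, stages_before_resolve=20, width=4))
```

The same misprediction count wastes far more fetched work in the long, wide configuration, which is why the slide rates the power cost as large.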
25
What else can be speculated?
Dependencies
–“I think this data is coming from that store instruction”
Values
–“I think I will load a 0 value”
Accuracy?
–Branch prediction (direction) is Boolean (T, NT)
–Branch targets are stable or predictable (RAS)
–Dependencies are limited
–Values cover a huge space (0 – 4B)