CIS429.S00: Lec10- 1 Control Hazards
Created by branch statements:
        BEQZ LOC
        ADD  R1,R2,R3
        ...
LOC:    SUB  R1,R2,R3
The PC of the next instruction must be computed, but this happens too late in the pipeline (in the MEM stage).
CIS429.S00: Lec10- 2 DLX Datapath
CIS429.S00: Lec10- 3 PC computation is too late --> DLX must stall for 3 cycles
See Figure 3.21.
Assume 30% of instructions are conditional branches, each stalling for 3 cycles:
new CPI = (.70)(1) + (.30)(1 + 3) = 1.9
This seriously reduces speedup (by almost a factor of two in this case).
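The CPI arithmetic on this slide can be checked with a short script. This is only a sketch: the function name and the base_cpi parameter are made up for illustration, and it assumes every non-branch instruction takes exactly one cycle.

```python
def cpi_with_branch_stalls(branch_freq, stall_cycles, base_cpi=1.0):
    """Average CPI when every branch stalls the pipeline for stall_cycles.

    Hypothetical helper: assumes all non-branch instructions cost base_cpi.
    """
    non_branch = (1.0 - branch_freq) * base_cpi
    branch = branch_freq * (base_cpi + stall_cycles)
    return non_branch + branch


# Slide numbers: 30% conditional branches, 3 stall cycles each.
cpi = cpi_with_branch_stalls(branch_freq=0.30, stall_cycles=3)
print(f"CPI = {cpi:.2f}")  # CPI = 1.90
print(f"slowdown vs. ideal CPI of 1: {cpi / 1.0:.2f}x")
```

Calling the same function with stall_cycles=1 gives the CPI for a pipeline that stalls only one cycle per branch.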
CIS429.S00: Lec10- 4 Add new DLX hardware --> stall for only 1 cycle
See Figure 3.4 (before) and Figure 3.22 (after).
CIS429.S00: Lec10- 5 Four Branch Hazard Alternatives
#1 STALL: wait until the branch direction is known.
#2 Predict Branch Not Taken: guess that the branch will not be taken and execute the successor instruction. Cancel the instructions in the pipeline if the guess was wrong.
#3 Predict Branch Taken: the opposite of #2. (Doesn't help for DLX.)
#4 Delayed Branch: insert useful instructions into the pipeline until the branch direction is known.
CIS429.S00: Lec10- 6 Assumptions for our DLX problems
Hardware is added to compute the PC in the ID stage.
Thus only a 1-cycle stall is incurred for branch hazards.
Solution #1 (STALL) incurs a penalty of 1 cycle.
CIS429.S00: Lec10- 7 #2 Predict Not Taken
See Figure 3.26 in text.
Penalty under DLX:
- 0 if not taken
- 1 if taken
Assume 30% conditional branches, of which 60% are not taken:
CPI = (.70)(1) + (.30)((.60)(1) + (.40)(1 + 1)) = 1.12
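The predict-not-taken CPI can be worked through the same way. A minimal sketch, with a hypothetical function name and parameters, assuming a base CPI of 1 and a 1-cycle penalty only when the branch is taken:

```python
def cpi_predict_not_taken(branch_freq, not_taken_frac,
                          taken_penalty=1, base_cpi=1.0):
    """Average CPI under predict-not-taken (hypothetical helper).

    Not-taken branches cost base_cpi; taken branches cost
    base_cpi + taken_penalty because the fetched successor is cancelled.
    """
    taken_frac = 1.0 - not_taken_frac
    branch_cpi = (not_taken_frac * base_cpi
                  + taken_frac * (base_cpi + taken_penalty))
    return (1.0 - branch_freq) * base_cpi + branch_freq * branch_cpi


# Slide numbers: 30% conditional branches, 60% of them not taken.
cpi = cpi_predict_not_taken(branch_freq=0.30, not_taken_frac=0.60)
print(f"CPI = {cpi:.2f}")  # CPI = 1.12
```

Under the same 30% branch frequency, always stalling one cycle costs 0.30 extra cycles per instruction, while predict-not-taken costs only 0.12.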
CIS429.S00: Lec10- 8 #3 Predict Taken
Similar situation and analysis as Predict Not Taken.
Since DLX does not have this option, we will ignore it.
CIS429.S00: Lec10- 9 #4 Delayed Branch
Delay slot = a slot in the pipeline after the branch that would otherwise be a stall (we fill these slots with instructions to try to avoid stalls).
(a) Non-cancelling Delayed Branch: useful instructions are inserted into the delay slots.
(b) Cancelling Delayed Branch: instructions are inserted into the delay slots and cancelled if the guess was wrong (as in Predict Not Taken and Predict Taken).
CIS429.S00: Lec #4a: Delayed Branch - non-cancelling
See Figure 3.27.
Fill the slot with a useful instruction: DLX penalty = 0
Sometimes the compiler cannot find a useful instruction: DLX penalty = 1
CIS429.S00: Lec #4b: Delayed Branch - cancelling with predict taken
See Figure 3.30.
Try to fill the slot with a useful instruction: since we predict taken, it can be chosen from the instructions at the taken location (the branch target).
If the prediction was right: DLX penalty = 0
If the prediction was wrong, the instruction must be cancelled: DLX penalty = 1
CIS429.S00: Lec Where to get instructions to fill the branch delay slot(s)?
From before the branch instruction
From the target address (only OK if branch taken)
From the fall-through (only OK if branch not taken)
Compiler effectiveness for a single branch delay slot:
- about 60% of slots are filled
- about 80% of instructions in delay slots are useful (not cancelled)
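The compiler-effectiveness figures can be turned into a rough CPI estimate. This is a sketch under assumptions that are not on the slide: a delay slot that is unfilled, or filled but cancelled, is counted as one lost cycle, and the 30% branch frequency is borrowed from the earlier CPI examples. The function name is hypothetical.

```python
def cpi_delayed_branch(branch_freq, filled_frac, useful_frac, base_cpi=1.0):
    """Rough CPI for a single branch delay slot (hypothetical model).

    Assumption: a slot costs 0 cycles when it holds a useful instruction
    and 1 cycle when it is empty or its instruction is cancelled.
    """
    usefully_filled = filled_frac * useful_frac
    penalty_per_branch = 1.0 - usefully_filled
    return base_cpi + branch_freq * penalty_per_branch


# 60% of slots filled, 80% of those useful, 30% branches (assumed).
cpi = cpi_delayed_branch(branch_freq=0.30, filled_frac=0.60, useful_frac=0.80)
print(f"usefully filled slots: {0.60 * 0.80:.0%}")  # usefully filled slots: 48%
print(f"CPI = {cpi:.3f}")  # CPI = 1.156
```

With these numbers only about half of the delay slots do useful work, so the delayed branch recovers roughly half of the 1-cycle stall penalty.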
CIS429.S00: Lec Where to get instructions for the delay slots
CIS429.S00: Lec Figure 3.24: Branch Frequencies
        COND    UNCOND
INT     16%     4%
FP      9%      1%
CIS429.S00: Lec Figure 3.25: Taken and Untaken Frequencies
        Taken   Untaken
INT     62%     38%
FP      70%     30%
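The two frequency tables can be combined into a predict-not-taken CPI estimate for each program class. This sketch rests on several assumptions not stated on the slides: the Figure 3.24 percentages are fractions of all instructions, the Figure 3.25 percentages apply to conditional branches only, unconditional branches always pay the 1-cycle taken penalty, and the base CPI is 1. The names BRANCH_STATS and cpi_from_stats are made up for illustration.

```python
# Frequencies as read from Figures 3.24 and 3.25.
# "cond"/"uncond": assumed fraction of all instructions.
# "taken": assumed fraction of conditional branches that are taken.
BRANCH_STATS = {
    "INT": {"cond": 0.16, "uncond": 0.04, "taken": 0.62},
    "FP": {"cond": 0.09, "uncond": 0.01, "taken": 0.70},
}


def cpi_from_stats(stats, taken_penalty=1, base_cpi=1.0):
    """Predict-not-taken CPI estimate from measured branch frequencies."""
    # Conditional branches stall only when taken.
    cond_stalls = stats["cond"] * stats["taken"] * taken_penalty
    # Assumption: unconditional branches are always taken, so always stall.
    uncond_stalls = stats["uncond"] * taken_penalty
    return base_cpi + cond_stalls + uncond_stalls


for name, stats in BRANCH_STATS.items():
    print(f"{name}: CPI = {cpi_from_stats(stats):.4f}")
# INT: CPI = 1.1392
# FP: CPI = 1.0730
```

Because most conditional branches in these tables are taken, predict-not-taken guesses wrong more often than right here, which is part of the motivation for filling delay slots instead.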