Reducing pipeline hazards – three techniques
Department of Computer Science, Southern Illinois University Edwardsville
Fall 2018
Dr. Hiroshi Fujinoki
E-mail: hfujino@siue.edu

Three techniques for different types of pipeline hazards
1. Forwarding: for reducing stalls caused by RAW (read-after-write) data dependencies
2. Instruction scheduling: for reducing RAW, WAR (write-after-read) and WAW (write-after-write) hazards
3. Delayed branch: for reducing control hazards

Technique 1: Forwarding
= Internal pipeline circuit to feed back the outputs of a stage
[Figure: five-stage pipeline (IF, ID, EX, ME, WB) with a latch between stages and feedback wires]
Outputs from a pipeline stage can be fed to the same stage or to a different stage of another instruction.
Forwarding needs hardware support.

Example
ADD R1, R2, R3     // R1 ← R2 + R3
LW  R4, 0(R1)      // R4 ← MEM[R1 + 0]
SW  12(R1), R4     // MEM[R1 + 12] ← R4

Pipeline time chart for an ordinary pipeline processor (cycles 1 to 13)
[Time chart: ADD R1, R2, R3 runs IF ID EX ME WB without stalling; LW R4, 0(R1) and SW 12(R1), R4 are each stalled after IF until the register they read has been written back; the sequence finishes in cycle 13.]

[Figure: the pipeline (IF, ID, EX, ME, WB) with latch and feedback wire; the result of ADD R1, R2, R3 is fed back to LW R4, 0(R1), so both instructions run IF ID EX ME WB.]

Pipeline time chart with forwarding
Cycle:            1   2   3   4   5   6   7   8
ADD R1, R2, R3:   IF  ID  EX  ME  WB
LW  R4, 0(R1):        IF  ID  EX  ME  WB
SW  12(R1), R4:           IF  ID  EX  ME  WB

Speed-up = 13/7 ≈ 1.86
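
The two cycle counts in the speed-up (13 without forwarding, 7 with forwarding) can be reproduced with a small timing model. The following is a minimal sketch, not taken from the slides: the function name `total_cycles`, the tuple encoding of the instructions, and the timing rules are assumptions chosen to match the charts. Without forwarding, an instruction is assumed to enter ID only in the cycle after its producer's WB. With forwarding, an ALU result is assumed usable once EX ends, a loaded value once ME ends, and store data is assumed to be needed only at the start of ME.

```python
# Hypothetical encoding for this sketch:
# (text, kind, destination register, registers needed in EX, register stored in ME)
PROGRAM = [
    ("ADD R1, R2, R3", "alu",   "R1", ["R2", "R3"], None),
    ("LW R4, 0(R1)",   "load",  "R4", ["R1"],       None),
    ("SW 12(R1), R4",  "store", None, ["R1"],       "R4"),
]


def total_cycles(program, forwarding):
    """Return the cycle in which the last instruction completes WB."""
    ready = {}     # register -> cycle at whose end its value becomes usable
    prev_id = 1    # makes the first instruction use IF in cycle 1, ID in cycle 2
    last_wb = 0
    for _text, kind, dest, ex_sources, store_source in program:
        id_cycle = prev_id + 1                 # in order, one ID per cycle
        for reg in ex_sources:
            if reg in ready:
                # Forwarding: value needed at the start of EX (= ID + 1).
                # No forwarding: value read in ID, after the producer's WB.
                bound = ready[reg] if forwarding else ready[reg] + 1
                id_cycle = max(id_cycle, bound)
        if store_source in ready:
            # Forwarding: store data needed at the start of ME (= ID + 2).
            bound = ready[store_source] - 1 if forwarding else ready[store_source] + 1
            id_cycle = max(id_cycle, bound)
        ex, me, wb = id_cycle + 1, id_cycle + 2, id_cycle + 3
        if dest is not None:
            if not forwarding:
                ready[dest] = wb               # usable only after write-back
            elif kind == "load":
                ready[dest] = me               # loaded value leaves ME
            else:
                ready[dest] = ex               # ALU result leaves EX
        prev_id = id_cycle
        last_wb = wb
    return last_wb


print(total_cycles(PROGRAM, forwarding=False))  # 13
print(total_cycles(PROGRAM, forwarding=True))   # 7
```

Under these assumptions the ratio of the two results is the 13/7 shown above. A real forwarding unit may impose extra constraints (for example a load-use stall) that this sketch does not model.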

Technique 2: Instruction scheduling by a compiler

a = b + c          (in a high-level language, such as C++)

LOAD  R1, b        // R1 ← MEM[address of b]
LOAD  R2, c        // R2 ← MEM[address of c]
ADD   R3, R1, R2   // R3 ← R1 + R2
STORE a, R3        // MEM[address of a] ← R3

Pipeline time chart
LOAD R1, b:       IF ID EX ME WB
LOAD R2, c:       IF ID EX ME WB
ADD R3, R1, R2:   STALL (3), then IF ID EX ME WB
STORE a, R3:      STALL (6), then IF ID EX ME WB

Time   Instruction          (X = stall)
1      LOAD R1, b
2      LOAD R2, c
3      X
4      X
5      X
6      ADD R3, R1, R2
7      X
8      X
9      X
10     STORE a, R3

Now, we are going to execute two statements:
a = b + c
d = e + f

Time   a = b + c            Time   d = e + f
1      LOAD R1, b           11     LOAD R4, e
2      LOAD R2, c           12     LOAD R5, f
3      X                    13     X
4      X                    14     X
5      X                    15     X
6      ADD R3, R1, R2       16     ADD R6, R4, R5
7      X                    17     X
8      X                    18     X
9      X                    19     X
10     STORE a, R3          20     STORE d, R6

Delay the 2nd statement, then MERGE

Time   a = b + c            d = e + f
1      LOAD R1, b
2      LOAD R2, c
3      X                    LOAD R4, e
4      X                    LOAD R5, f
5      X                    X
6      ADD R3, R1, R2       X
7      X                    X
8      X                    ADD R6, R4, R5
9      X                    X
10     STORE a, R3          X
11                          X
12                          STORE d, R6

After merging (a = b + c and d = e + f)

Time   Instruction
1      LOAD R1, b
2      LOAD R2, c
3      LOAD R4, e
4      LOAD R5, f
5      X
6      ADD R3, R1, R2
7      X
8      ADD R6, R4, R5
9      X
10     STORE a, R3
11     X
12     STORE d, R6

Speed-Up = 21/12 = 1.75
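
The slot positions in the schedules above can be checked with a short script. This is a minimal sketch under stated assumptions, not the compiler's scheduling algorithm: the function name `issue_slots` and the `(destination, sources)` encoding are hypothetical, there is no forwarding, an instruction issued in slot s is assumed to write back in cycle s + 4, and a dependent instruction is assumed to be issued no earlier than the write-back cycle of the register it reads. Memory operands (a, b, ...) are assumed not to conflict.

```python
def issue_slots(program):
    """Return the time slot in which each instruction is issued.

    program: list of (destination register or None, list of source registers).
    """
    written_back = {}   # register -> WB cycle of its latest producer
    slots = []
    prev_slot = 0
    for dest, sources in program:
        slot = prev_slot + 1                    # in order, one issue per slot
        for reg in sources:
            if reg in written_back:
                # ID (slot + 1) must come after the producer's WB cycle.
                slot = max(slot, written_back[reg])
        if dest is not None:
            written_back[dest] = slot + 4       # IF, ID, EX, ME, then WB
        slots.append(slot)
        prev_slot = slot
    return slots


LOAD_B, LOAD_C = ("R1", []), ("R2", [])         # LOAD R1, b / LOAD R2, c
LOAD_E, LOAD_F = ("R4", []), ("R5", [])         # LOAD R4, e / LOAD R5, f
ADD_A, ADD_D = ("R3", ["R1", "R2"]), ("R6", ["R4", "R5"])
STORE_A, STORE_D = (None, ["R3"]), (None, ["R6"])

ORIGINAL = [LOAD_B, LOAD_C, ADD_A, STORE_A, LOAD_E, LOAD_F, ADD_D, STORE_D]
SCHEDULED = [LOAD_B, LOAD_C, LOAD_E, LOAD_F, ADD_A, ADD_D, STORE_A, STORE_D]

print(issue_slots(ORIGINAL))    # [1, 2, 6, 10, 11, 12, 16, 20]
print(issue_slots(SCHEDULED))   # [1, 2, 3, 4, 6, 8, 10, 12]
```

The second list shows how the loads of the second statement occupy slots that were stalls in the original order, so the last STORE is issued in slot 12.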

Technique 3: Delayed branch
= Fill up the clock cycles that will be flushed by a branch instruction

If branch NOT taken (time axis: cycles 1 to 8)
Branch Instruction(i):   IF ID EX WB
Instruction(i+1):        IF ID EX ME WB
Instruction(i+2):        IF ID EX ME WB

If branch taken (time axis: cycles 1 to 11)
Branch Instruction(i):   IF ID EX WB        (new destination address is set in PC)
Instruction(i+1):        IF, then IF ID EX ME WB
Instruction(i+2):        IF ID EX ME WB

Before delayed branch applied
[Time chart: Instruction(i-3), Instruction(i-2) and Instruction(i-1) each run IF ID EX ME WB ahead of Branch Instruction(i), which runs IF ID EX WB; Instruction(i+1) and Instruction(i+2) follow the branch.]
We are going to lose 3 cycles.

After delayed branch applied
Delayed-branch slot = 3
[Time chart: Branch Instruction(i) runs IF ID EX WB; Instruction(i-1), Instruction(i-2) and Instruction(i-3) are placed in the three delayed-branch slots after it; Instruction(i+1) and Instruction(i+2) come after the slots.]

Problem in delayed branch: data dependency to the branch instruction

Example:
SUB  R1, R2, R3
JPEZ R1, 0(R5)     // conditional branch (jump if R1 = 0)
LW   R8, 0(R4)

We can't do this! (SUB is moved behind the branch, but JPEZ reads R1)
JPEZ R1, 0(R5):    IF ID EX WB
SUB  R1, R2, R3:   IF ID EX ME WB
LW   R8, 0(R4):    IF ID EX ME WB
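
The dependency check that decides which instructions may move into the delayed-branch slots can be sketched as follows. This is a conservative illustration, not a real compiler pass: the function name `fill_delay_slots`, the `(text, destination)` encoding, and the ADD and OR instructions in the second call are hypothetical. It scans backwards from the instruction just before the branch and stops at the first instruction whose destination register the branch reads.

```python
def fill_delay_slots(before_branch, branch_reads, n_slots):
    """Split the instructions before a branch into (kept, moved).

    before_branch: list of (text, destination register or None), program order.
    branch_reads:  set of registers read by the branch condition.
    n_slots:       number of delayed-branch slots available.
    'moved' holds the instructions that may be placed after the branch.
    """
    kept = list(before_branch)
    moved = []
    while kept and len(moved) < n_slots:
        _text, dest = kept[-1]
        if dest in branch_reads:
            break                      # the branch needs this result first
        moved.insert(0, kept.pop())    # keep the original relative order
    return kept, moved


# Slide example: SUB produces R1, which JPEZ reads, so nothing can move.
print(fill_delay_slots([("SUB R1, R2, R3", "R1")], {"R1"}, 3))

# Hypothetical block: the two instructions after SUB do not feed the branch.
block = [("SUB R1, R2, R3", "R1"),
         ("ADD R6, R7, R9", "R6"),
         ("OR R10, R11, R12", "R10")]
kept, moved = fill_delay_slots(block, {"R1"}, 3)
print(kept)    # [('SUB R1, R2, R3', 'R1')]
print(moved)   # [('ADD R6, R7, R9', 'R6'), ('OR R10, R11, R12', 'R10')]
```

Stopping at the first dependent instruction means a moved instruction is never reordered relative to one that stays in place, at the cost of leaving some slots unfilled.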

Advantages
Wasted machine cycles in the branch slot can be utilized whether or not the branch is taken.

Disadvantages
It does not work if there is a data dependency.
Improvement only for branch instructions with no data dependency.
Probability of improvement is only 50% (assuming a 50:50 chance that a branch is taken or not).

Summary for Delayed Branch:
To reduce the machine cycles wasted by pipeline flushes
For pipeline flushes due to control dependencies
Don't throw away the results of instructions in the branch slot
Improvement is rather limited

Static & Dynamic Code Optimizations

Static optimizations
No overhead for program executions: complex (time-consuming) code optimization algorithms can be applied without slowing down programs.
No additional cost for processors: manufacturing cheaper processors, more reliable processors.
Less complex processor internal design: less heat generation (higher clock rate).

Dynamic optimizations
Codes that were not optimized can be optimized.
Performance will be optimized for each processor.
"Backward compatibility"