Reducing pipeline hazards – three techniques

Department of Computer Science, Southern Illinois University Edwardsville
Fall 2018, Dr. Hiroshi Fujinoki
Forwarding/000

Three techniques for different types of pipeline hazards
1. Forwarding – for reducing RAW data dependencies
2. Instruction Scheduling – for reducing RAW, WAR and WAW
3. Delayed Branch – for reducing control hazards
Forwarding/001

Technique 1: Forwarding
= Internal pipeline circuit that feeds the outputs of a stage back into the pipeline
Figure: pipeline stages IF, ID, EX, ME, WB, with a latch between stages and feedback wires.
Outputs from a pipeline stage can be fed to the same stage or to a different stage of another instruction.
Needs hardware support.
Forwarding/002

Example
ADD R1, R2, R3     // R1 <- R2 + R3
LW  R4, 0(R1)      // R4 <- MEM[R1 + 0]
SW  12(R1), R4     // MEM[R1 + 12] <- R4
Pipeline time chart for an ordinary pipeline processor (stages IF ID EX ME WB, no forwarding): ADD R1, R2, R3 runs without stalling, while LW R4, 0(R1) and SW 12(R1), R4 each stall on their RAW dependencies. The three instructions take 13 cycles (time axis 1 to 13).
Forwarding/003

Figure: pipeline stages IF, ID, EX, ME, WB with latches and feedback wires. With forwarding, the result of ADD R1, R2, R3 is fed back to LW R4, 0(R1), and both instructions pass through IF ID EX ME WB without a stall.
Forwarding/004

Pipeline time chart with forwarding
Cycle:            1   2   3   4   5   6   7   8
ADD R1, R2, R3:   IF  ID  EX  ME  WB
LW  R4, 0(R1):        IF  ID  EX  ME  WB
SW  12(R1), R4:           IF  ID  EX  ME  WB
Speed-up = 13/7 ≈ 1.86
Forwarding/005
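The 13-cycle and 7-cycle counts for the ADD/LW/SW example can be checked with a small timing model. This is a minimal sketch under stated assumptions, not the slides' own method: the pipeline is in-order, a register written in WB can only be read by an ID stage in a later cycle, and with forwarding an ALU result is usable the cycle after EX and a loaded value the cycle after ME. The function names and the tuple layout of `PROGRAM` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "ME", "WB"]

def schedule(program, forwarding):
    """Return [(text, {stage: cycle})] for an in-order 5-stage pipeline."""
    avail = {}      # register -> first cycle in which a consumer may use it
    prev = None     # stage -> cycle map of the previous instruction
    timeline = []
    for text, dest, ready_stage, sources in program:
        cycles = {}
        for k, stage in enumerate(STAGES):
            # A stage starts one cycle after the instruction's previous stage.
            t = cycles[STAGES[k - 1]] + 1 if k else 1
            # Only one instruction can occupy a stage per cycle (in order).
            if prev is not None:
                t = max(t, prev[stage] + 1)
            # Assumption: without forwarding operands are read in ID;
            # with forwarding they are needed in the stage that uses them.
            for reg, need_stage in sources:
                gate = need_stage if forwarding else "ID"
                if gate == stage and reg in avail:
                    t = max(t, avail[reg])
            cycles[stage] = t
        if dest is not None:
            produced = ready_stage if forwarding else "WB"
            avail[dest] = cycles[produced] + 1
        timeline.append((text, cycles))
        prev = cycles
    return timeline

def total_cycles(program, forwarding):
    """Cycle in which the last instruction finishes WB."""
    return schedule(program, forwarding)[-1][1]["WB"]

# (text, destination, stage that produces it, [(source, stage that needs it)])
PROGRAM = [
    ("ADD R1, R2, R3", "R1", "EX", [("R2", "EX"), ("R3", "EX")]),
    ("LW R4, 0(R1)",   "R4", "ME", [("R1", "EX")]),
    ("SW 12(R1), R4",  None, None, [("R1", "EX"), ("R4", "ME")]),
]

without = total_cycles(PROGRAM, forwarding=False)
with_fw = total_cycles(PROGRAM, forwarding=True)
print(without, with_fw, round(without / with_fw, 2))  # 13 7 1.86
```

Under these assumptions the model gives the same totals as the two time charts: 13 cycles without forwarding and 7 with it.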

Technique 2: Instruction scheduling by a compiler
a = b + c    (in a high-level language, such as C++)
LOAD  R1, b        // R1 <- MEM[address of b]
LOAD  R2, c        // R2 <- MEM[address of c]
ADD   R3, R1, R2   // R3 <- R1 + R2
STORE a, R3        // MEM[address of a] <- R3
Scheduling/001

LOAD  R1, b        // R1 <- MEM[address of b]
LOAD  R2, c        // R2 <- MEM[address of c]
ADD   R3, R1, R2   // R3 <- R1 + R2
STORE a, R3        // MEM[address of a] <- R3
Pipeline time chart (stages IF ID EX ME WB): LOAD R1, b and LOAD R2, c run without stalls; ADD R3, R1, R2 is marked STALL (3) and STORE a, R3 is marked STALL (6).
Forwarding/002

Time  Instruction
1     LOAD R1, b
2     LOAD R2, c
3     X
4     X
5     X
6     ADD R3, R1, R2
7     X
8     X
9     X
10    STORE a, R3
(X = stall cycle)
Scheduling/002

Now, we are going to execute two statements:
a = b + c
d = e + f
Scheduling/003

a = b + c                 d = e + f
Time  Instruction         Time  Instruction
1     LOAD R1, b          11    LOAD R4, e
2     LOAD R2, c          12    LOAD R5, f
3     X                   13    X
4     X                   14    X
5     X                   15    X
6     ADD R3, R1, R2      16    ADD R6, R4, R5
7     X                   17    X
8     X                   18    X
9     X                   19    X
10    STORE a, R3         20    STORE d, R6
(X = stall cycle)
Scheduling/004

Delay the 2nd statement by two cycles, then MERGE the two columns.
Time  a = b + c           d = e + f
1     LOAD R1, b
2     LOAD R2, c
3     X                   LOAD R4, e
4     X                   LOAD R5, f
5     X                   X
6     ADD R3, R1, R2      X
7     X                   X
8     X                   ADD R6, R4, R5
9     X                   X
10    STORE a, R3         X
11                        X
12                        STORE d, R6
(X = stall cycle)
Scheduling/005

a = b + c
d = e + f
Time  Instruction
1     LOAD R1, b
2     LOAD R2, c
3     LOAD R4, e
4     LOAD R5, f
5     X
6     ADD R3, R1, R2
7     X
8     ADD R6, R4, R5
9     X
10    STORE a, R3
11    X
12    STORE d, R6
(X = stall cycle)
Speed-up = 21/12 = 1.75
Scheduling/006
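The issue times in the two schedules can be reproduced with a short sketch. It assumes, as the time charts for a = b + c suggest, that an instruction reading a register must issue at least 4 slots after the instruction that writes it (3 stall slots in between) and that one instruction issues per slot. The names `issue_slots`, `STALL_GAP`, `UNSCHEDULED` and `SCHEDULED` are hypothetical. The sketch reports issue slots only and does not try to reproduce the slide's 21/12 cycle count.

```python
STALL_GAP = 4  # assumption: consumer issues >= 4 slots after its producer

def issue_slots(program):
    """program: list of (text, dest_register, source_registers).
    Returns the issue slot of each instruction, in order."""
    written_at = {}   # register -> issue slot of the instruction writing it
    slots = []
    for text, dest, sources in program:
        # In-order issue: at least one slot after the previous instruction.
        slot = slots[-1] + 1 if slots else 1
        # RAW dependency: wait until the producer's result is written back.
        for reg in sources:
            if reg in written_at:
                slot = max(slot, written_at[reg] + STALL_GAP)
        if dest is not None:
            written_at[dest] = slot
        slots.append(slot)
    return slots

LOAD_B  = ("LOAD R1, b", "R1", [])
LOAD_C  = ("LOAD R2, c", "R2", [])
ADD_A   = ("ADD R3, R1, R2", "R3", ["R1", "R2"])
STORE_A = ("STORE a, R3", None, ["R3"])
LOAD_E  = ("LOAD R4, e", "R4", [])
LOAD_F  = ("LOAD R5, f", "R5", [])
ADD_D   = ("ADD R6, R4, R5", "R6", ["R4", "R5"])
STORE_D = ("STORE d, R6", None, ["R6"])

# Statement order as written in the source program.
UNSCHEDULED = [LOAD_B, LOAD_C, ADD_A, STORE_A, LOAD_E, LOAD_F, ADD_D, STORE_D]
# Order after the compiler interleaves the two statements.
SCHEDULED = [LOAD_B, LOAD_C, LOAD_E, LOAD_F, ADD_A, ADD_D, STORE_A, STORE_D]

print(issue_slots(UNSCHEDULED))  # [1, 2, 6, 10, 11, 12, 16, 20]
print(issue_slots(SCHEDULED))    # [1, 2, 3, 4, 6, 8, 10, 12]
```

Reordering changes only when each instruction issues, not what it computes: every ADD still follows both of its LOADs, and every STORE still follows its ADD.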

Technique 3: Delayed Branch
= Fill the clock cycles that would otherwise be flushed by a branch instruction
If the branch is NOT taken:
Branch Instruction(i):  IF ID EX WB
Instruction(i+1):       IF ID EX ME WB
Instruction(i+2):       IF ID EX ME WB
(time axis: cycles 1 to 8)
DelayBranch/001

If the branch is taken:
The new destination address is set in the PC.
Branch Instruction(i):  IF ID EX WB
Instruction(i+1):       IF IF ID EX ME WB
Instruction(i+2):       IF ID EX ME WB
(time axis: cycles 1 to 11)
DelayBranch/002

Before delayed branch is applied
Figure: pipeline time chart in program order Instruction(i-3), Instruction(i-2), Instruction(i-1), Branch Instruction(i), Instruction(i+1), Instruction(i+2).
We are going to lose 3 cycles.
DelayBranch/003

After delayed branch is applied
Delayed-branch slot = 3
Figure: pipeline time chart in which Instruction(i-3), Instruction(i-2) and Instruction(i-1) are moved into the three slots after Branch Instruction(i), ahead of Instruction(i+1) and Instruction(i+2).
DelayBranch/004

Problem in delayed branch: data dependency on the branch instruction
Example:
SUB  R1, R2, R3
JPEZ R1            // conditional branch (jump if R1 = 0)
LW   R8, 0(R4)
We can't do this (JPEZ reads R1, so SUB cannot be moved after the branch):
JPEZ R1, 0(R5):   IF ID EX WB
SUB R1, R2, R3:   IF ID EX ME WB
LW R8, 0(R4):     IF ID EX ME WB
(time axis: cycles 1 to 8)
DelayBranch/005
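The dependency check behind this restriction can be sketched as a small routine that decides which instructions preceding a branch may be moved into its delay slots. This is an illustrative sketch, not the compiler algorithm from the slides. It assumes the branch writes no register, tracks register dependencies only (memory dependencies are ignored), and uses a hypothetical `fill_delay_slots` function. The `ADD R5, R6, R7` instruction is invented for the example.

```python
def fill_delay_slots(before_branch, branch_reads, n_slots):
    """before_branch: list of (text, dest_register, source_registers) in
    program order. Returns (stay, moved): instructions that remain before
    the branch and instructions moved into the delay slots after it."""
    blocked_reads = set(branch_reads)   # registers read by the branch or by stayers
    blocked_writes = set()              # registers written by stayers
    stay, moved = [], []
    # Scan backwards from the branch so each candidate is checked against
    # everything it would have to jump over.
    for text, dest, sources in reversed(before_branch):
        independent = (
            dest not in blocked_reads                 # no RAW with branch/stayers
            and dest not in blocked_writes            # no WAW with stayers
            and not (set(sources) & blocked_writes)   # no WAR with stayers
        )
        if independent and len(moved) < n_slots:
            moved.append(text)
        else:
            stay.append(text)
            blocked_reads.update(sources)
            if dest is not None:
                blocked_writes.add(dest)
    # Both lists were built back to front; restore program order.
    return stay[::-1], moved[::-1]

# SUB feeds the branch condition, so it must stay before JPEZ R1.
only_sub = [("SUB R1, R2, R3", "R1", ["R2", "R3"])]
print(fill_delay_slots(only_sub, ["R1"], 1))
# (['SUB R1, R2, R3'], [])

# A hypothetical independent instruction can fill the slot instead.
with_add = [("ADD R5, R6, R7", "R5", ["R6", "R7"])] + only_sub
print(fill_delay_slots(with_add, ["R1"], 1))
# (['SUB R1, R2, R3'], ['ADD R5, R6, R7'])
```

When no independent instruction is available, the slot stays empty and the cycle is still lost, which is the limitation the SUB/JPEZ example illustrates.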

Advantages
Otherwise wasted machine cycles in the branch slot can be utilized whether or not the branch is taken.
DelayBranch/006

Disadvantages
It does not work if there is a data dependency.
Improvement only if: the instruction is a branch and there is no data dependency.
Probability of improvement is only 50% (assuming a 50:50 chance that the branch is taken).
DelayBranch/007

Summary for Delayed Branch:
Reduces the machine cycles wasted by pipeline flushes.
Applies to pipeline flushes caused by control dependencies.
Does not throw away the results of instructions in the branch slot.
Improvement is rather limited.
DelayBranch/008


Static & Dynamic Code Optimizations
Static optimizations
No overhead during program execution, so complex (time-consuming) code optimization algorithms can be applied without slowing programs down.
No additional cost for processor manufacturing, so processors are cheaper and more reliable.
Less complex internal processor design, so less heat is generated (allowing a higher clock rate).
Code Optimizations/001

Static & Dynamic Code Optimizations
Dynamic optimizations
Code that was not optimized can be optimized.
Performance is optimized for each processor.
"Backward compatibility"
Code Optimizations/002
