EECE476: Computer Architecture
Lecture 19: Pipelining, Reducing Control Hazard Penalty (Chapter 6.6)
The University of British Columbia, EECE 476, © 2005 Guy.

2 Reminders
Midterm 1: Tomorrow!
– 50 minutes, not open book, calculator OK
– Based on your assignments + lecture material
– Covers EVERYTHING up to end of week 5
– No Verilog, Altera tools on midterm
– Try Lecture 14 study problems
– Try extra study problems on web
Partner signup deadline: Friday!
– MUST work in pairs (EMAIL ME IMMEDIATELY IF YOU’RE STILL ALONE)
– MUST email to project TA: 2 names, student #s, emails
– Late penalty: 5% of final grade

3 Directly Reducing Penalty of Control Hazards
Control hazards demand solutions:
– Stalling
– Nullify
– Branch delay slots
– All of these negatively impact performance (in various ways)
Alternative
– Can we directly reduce the negative impact of control hazards? Yes!
– Execute the branch/jump instruction earlier in the pipeline
– Outcome is known sooner
– Fewer instructions enter the pipeline after the branch (before the outcome is known)
– For BEQ, we must detect “equals” earlier

4 BEQ in “X” Stage Instead?
Move logic gates from “M” stage into “X” stage
– “AND” gate, PCSrc mux
– These logic gates depend on results *after* the ALU
Benefit
– Only 2 instructions follow BEQ into the pipeline
– Improves only BEQ-if-taken performance
But…
– ALU may be the slowest part of the pipeline
– Causes a longer delay path after the ALU, so the “X” stage is slower
– Clock rate may be affected
– This may negatively affect ALL instructions, be careful!
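A rough sketch of why resolving BEQ earlier helps, assuming one instruction is fetched per cycle and the five stages I, D, X, M, W used in these slides: the number of instructions that follow the branch into the pipeline before its outcome is known equals the position of the stage that resolves it. The function name is hypothetical and only illustrates the counting.

```python
# Stage order used throughout the slides: Instruction fetch, Decode,
# eXecute, Memory, Writeback.
STAGES = "IDXMW"


def instructions_after_branch(resolve_stage: str) -> int:
    """Count instructions fetched behind a branch before it resolves.

    Assumption: one fetch per cycle, so while the branch advances from I
    to `resolve_stage`, one younger instruction enters per stage crossed.
    """
    return STAGES.index(resolve_stage)


for stage in "MXD":
    print(stage, instructions_after_branch(stage))
```

Under these assumptions, resolving in M lets 3 instructions in, X lets 2 in, and D lets 1 in.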

5 Moving BEQ into “X” Stage
[Figure: red shows extra signal delay after ALU]

6 BEQ in “D” Stage Instead?
Move logic gates from “M” stage into “D” stage
– But we need the ALU to compute (Rs - Rt) and Zero
– How can we do this?
Key benefit
– Only 1 instruction follows “BEQ” into the pipeline
Notice
– “EQ” can be computed efficiently in “D” stage: (Rs XOR Rt) == 32’b0
  Simple bitwise XOR
  “== 32’b0” is simple: wide 32-input NOR gate, single output
– No need for subtraction
  Simple logic, no carry chain
– Move the “+ SgnExt(Imm16)” branch-target logic into “D” stage as well
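The equality test above can be sketched in software as follows. This is only an illustration of the XOR-then-NOR idea, not the course's hardware description; all function names are hypothetical. The target-address helper assumes the standard MIPS formula PC + 4 + (SgnExt(Imm16) << 2), which the slide text does not spell out in full.

```python
MASK32 = 0xFFFFFFFF


def eq_xor_nor(rs: int, rt: int) -> bool:
    """Return True when rs == rt, using XOR followed by a wide NOR."""
    diff = (rs ^ rt) & MASK32  # bitwise XOR: a 1 marks each differing bit
    # 32-input NOR: the output is 1 only if every input bit is 0.
    return not any((diff >> i) & 1 for i in range(32))


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Assumed standard MIPS target: PC + 4 + (SgnExt(Imm16) << 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


print(eq_xor_nor(5, 5), eq_xor_nor(5, 6))
print(hex(branch_target(0x1000, 7)))
```

Neither the XOR nor the NOR involves a carry chain, which is why the check is cheap enough to fit in the “D” stage.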

7 Detecting “EQ” Earlier

8 Reducing Branch Penalty
Reducing branch penalty
– Compute (Rs == Rt) and target address in “D” stage
– Reduces branch delay to 1 cycle
– Works well
But, this introduces a forwarding error! Suppose
  ADD $1, $2, $3   ← previous instruction
  BEQ $1, $2, 7    ← RAW hazard: needs new $1, forwarding?
  NOP              ← delay slot
– Dependency causes data hazard
Solutions?
– Option 1: Stall until writeback of dependent instruction
– Option 2A: Forward as much as possible (stall 1 cycle for LW)
– Option 2B: Forward a bit less (stall 2 cycles for LW, 1 cycle for others)

9 Data Hazard with “BEQ”
Example: Clock cycle 1
[Pipeline diagram; stages I D X M W]
1  ADD $1,$2,$3   I D X M W
   BEQ $1,$2,7    I D X M W
   NOP            I D X M W

10 Data Hazard with “BEQ”
Clock cycle 2
[Pipeline diagram; stages I D X M W]
1  ADD $1,$2,$3   I D X M W
2  BEQ $1,$2,7    I D X M W
   NOP            I D X M W

11 Data Hazard with “BEQ”
Clock cycle 3: normal forwarding into X doesn’t work, arrives too late!
[Pipeline diagram; stages I D X M W]
1  ADD $1,$2,$3   I D X M W
2  BEQ $1,$2,7    I D X M W
3  NOP            I D X M W

12 Forwarding with “BEQ”
Clock cycle 4, Option 2B: insert “bubble”, forward to D
[Pipeline diagram; stage labels shown as I D ? M W]
1  ADD $1,$2,$3   I D X M W
2  ? (bubble)     I D X ? ?
3  BEQ $1,$2,7    I D D X M W
4  NOP            I D X M W

13 Cause of the Error?
Moved “BEQ” execution from X to D
– PROBLEM: pipeline currently only forwards data into X stage
To resolve, we have two options:
– Option 1: Low performance, add Hazard Detection Unit (HDU) condition
  BEQ depends on earlier instruction
  Stall >= 1 cycle until the dependent instruction finishes writeback
  If not told otherwise, assume this approach
– Option 2: Higher performance, add Forward Detection Unit (FDU) and muxes
  Option 2A: Forward data from ALU out, DataMem out, W result into D stage
  – Avoids most stalls (no HDU stall needed, except for the LW case again).
  – Longer delay, will probably slow clock and affect all instructions.
  Option 2B: Forward data from M, W stages into D stage; stall if the dependent instruction is in X
  – HDU must now stall 1 cycle when BEQ depends on immediately prior R-type instruction.
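The three options can be compared with a small stall-count model. This is a sketch under stated assumptions, not the course's HDU logic: BEQ reads its operands in D; a producer that is `distance` instructions ahead of BEQ sits `distance` stages past D; and under Option 1 a value written in W is assumed readable by D in the same cycle. The table and function names are hypothetical.

```python
STAGE_INDEX = {"I": 0, "D": 1, "X": 2, "M": 3, "W": 4}

# Earliest stage the producer must occupy before BEQ (in D) can obtain
# its operand, per option and producer type.
READY_STAGE = {
    "1":  {"rtype": "W", "lw": "W"},  # wait for writeback, no forwarding
    "2A": {"rtype": "X", "lw": "M"},  # forward ALU out, DataMem out, W
    "2B": {"rtype": "M", "lw": "W"},  # forward from M and W stages only
}


def beq_stall_cycles(option: str, producer: str, distance: int) -> int:
    """Stall cycles for a BEQ that depends on an earlier instruction.

    distance = 1 means the producer immediately precedes the BEQ.
    """
    producer_stage = STAGE_INDEX["D"] + distance  # where it is when BEQ hits D
    needed = STAGE_INDEX[READY_STAGE[option][producer]]
    return max(0, needed - producer_stage)


for option in ("1", "2A", "2B"):
    print(option,
          beq_stall_cycles(option, "rtype", 1),
          beq_stall_cycles(option, "lw", 1))
```

For an immediately preceding producer, this model gives the stall counts quoted in the slides for Options 2A and 2B: 0 (R-type) and 1 (LW) for 2A, 1 (R-type) and 2 (LW) for 2B.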

14 Control Hazards Summary
Branches/jumps cause interruptions to control flow
– This affects the stream of instructions entering the pipeline after the branch/jump
These interruptions cause a utilization problem
– We may fetch the wrong instruction(s) after a branch/jump
  Option 1: stall after every branch/jump
  Option 2: nullify-if-branch-taken (small performance improvement)
  Option 3: declare as a “delay slot”, always-execute (bad idea for future ISAs)
– Default: assume MIPS behaviour, i.e. Option 3 with 1 delay slot
To reduce the utilization problem, move branch/jump to D stage
– This introduces new data hazards if the branch depends upon recent instructions
These hazards introduce a new forwarding problem
– Branch/jump may depend on the result of recent instruction(s)
  Option 1: HDU forces stall until writeback (multiple cycles)
  Option 2: minimal HDU stall, forward data when dependence can be resolved
  – 2A: more forwarding needed, stall only for LW, may slow down clock speed
  – 2B: less forwarding needed, stall for LW and R-type until forwarding can be used
– Default: assume FORWARDING OPTION 1 for this course
