EECE476: Computer Architecture. Lecture 18: Pipelining Control Hazards (Chapter 6.6). The University of British Columbia, EECE 476. © 2005 Guy Lemieux.



2 Administrivia
Project
  – Phase 1 now online
  – Phases 2, 3 coming soon…
Partner signup deadline: OCTOBER 21
  – MUST work in pairs
    EMAIL ME IMMEDIATELY IF YOU’RE STILL ALONE
  – MUST email to project TA: 2 names, student #s, emails
  – Late penalty: 5% of final grade
Phase 1: single-cycle CPU
  – Suggest you finish by November 1
Phase 2: convert to 5-stage pipelined CPU with FDU & HDU
  – Do phase 1 first!
  – While doing phase 1, keep in mind that you’ll be adding pipelining later

3 Review: Hazards
3 Types of Hazards
  – Structural Hazard
  – Data Hazard
  – Control Hazard
Structural: functional unit is busy
  – E.g., a multicycle multiplier which is not fully pipelined
Data Hazards
  – Cause: dependency between instructions
  – 3 types of data hazards (RAW, WAR, WAW)
Control Hazards
  – Today!

4 Review: Dependencies
Dependency definition
  – Two instructions
    Executed closely together in time
    Share linkage by using a common register
  – Desired outcome is clear
    Defined by ORIGINAL PROGRAM ORDER
Data hazard problem
  – Hardware sometimes overlaps or reorders instructions
  – Potential violation of dependency
Solution
  – Original dependency must be preserved!
  – Hardware must obey ORIGINAL PROGRAM ORDER semantics

5 Review: Data Hazards
Three kinds of data hazards: RAW, WAR, WAW
  – In each case below, the instruction pair forms a dependency
  – Consider what may happen if the two instructions are re-ordered
  – Read after Write (aka true dependence)
      ADD $s0,$s1,$s2   <- writes $s0
      ADD $s3,$s0,$s0   <- reads $s0
  – Write after Read (aka anti-dependence, a false dependence)
    Doesn’t occur in our simple MIPS pipeline
      ADD $s0,$s1,$s2   <- reads $s1
      ADD $s1,$s2,$s2   <- writes $s1
  – Write after Write (aka output dependence)
    Doesn’t occur in our simple MIPS pipeline
      ADD $s0,$s1,$s2   <- writes $s0
      ADD $s0,$s3,$s4   <- writes $s0
WAR, WAW occur in more complex CPUs with multiple pipelines
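
The three cases above can be checked mechanically. The following is a minimal Python sketch, assuming each instruction is reduced to a hypothetical (dest, sources) pair; the function name `classify_dependencies` is illustrative and not from the lecture.

```python
def classify_dependencies(first, second):
    """Return the set of dependency types from `first` to `second`.

    Each instruction is a (dest, sources) tuple, e.g. ("$s0", ("$s1", "$s2")).
    This representation is an assumption made for illustration.
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    found = set()
    if first_dest in second_srcs:
        found.add("RAW")  # second reads what first writes
    if second_dest in first_srcs:
        found.add("WAR")  # second writes what first reads
    if first_dest == second_dest:
        found.add("WAW")  # both write the same register
    return found


# The three instruction pairs from the slide
raw = classify_dependencies(("$s0", ("$s1", "$s2")), ("$s3", ("$s0", "$s0")))
war = classify_dependencies(("$s0", ("$s1", "$s2")), ("$s1", ("$s2", "$s2")))
waw = classify_dependencies(("$s0", ("$s1", "$s2")), ("$s0", ("$s3", "$s4")))
print(raw, war, waw)
```

A pair can carry more than one dependency at once, which is why the sketch returns a set rather than a single label.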

6 Review: RAW Hazards
Main data hazard in our pipeline
  – Read-After-Write (RAW)
Problem
  – Write to register in first instruction
  – Read register in subsequent instruction
  – Read gets stale value from RegFile due to pipelining
Possible solutions
  – Forwarding, stalls, compiler (NOPs or reordering code)
Forwarding doesn’t always work
  – Load-use delay
  – Result after “M” stage is too late, must stall
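
The load-use case is the one RAW hazard that forwarding cannot hide, so the hardware has to detect it and stall. A rough Python sketch of that check follows; the (opcode, dest, sources) tuple layout and the name `needs_load_use_stall` are assumptions for illustration, not the actual hazard detection unit from the course.

```python
def needs_load_use_stall(producer, consumer):
    """True if `consumer` immediately follows a load whose result it reads.

    Instructions are hypothetical (opcode, dest, sources) tuples.
    Any other back-to-back RAW pair is assumed to be covered by forwarding.
    """
    opcode, dest, _ = producer
    _, _, consumer_srcs = consumer
    return opcode == "lw" and dest in consumer_srcs


load = ("lw", "$t0", ("$t1",))             # lw  $t0, 0($t1)
use = ("add", "$t2", ("$t0", "$t3"))       # add $t2, $t0, $t3
alu = ("add", "$t0", ("$t4", "$t5"))       # add $t0, $t4, $t5

print(needs_load_use_stall(load, use))  # True: loaded value arrives too late
print(needs_load_use_stall(alu, use))   # False: ALU result can be forwarded
```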

7 Control Hazards
Control hazards arise due to changes in control flow
  This is a software thing
  Not the same as the “control unit” in hardware
Control flow
  – Normal software execution is PC+4 (straight-line)
  – Only branches and jumps change the execution path
    We say the “flow of control” has changed
  – Often, path taken depends on outcome of operation (e.g., beq)
    Data-dependent
    Hard to predict in advance

8 Pipelined Branching Logic, “beq”

9 Control Hazards
Consider “beq” instruction
  If (Rs – Rt) == 0 then PC ← (PC+4) + SgnExt(Imm16)×4
  Else PC ← (PC+4)
(Rs – Rt) computed by ALU, available after “X”
  – ALU generates “Zero” output
  – “Zero” decides next PC
    Controls PCSrc mux: PC+4 or PC+4+SgnExt(Imm16)×4
  – “Zero” computed in X stage

10 Branching Example
Control hazard example due to branch
Consider the machine code
  40  BEQ $1,$3,7
  44  AND $12,$2,$5
  48  OR  $13,$6,$2
  52  ADD $14,$2,$2
  …
  72  LW  $4,50($7)
Let’s see the pipeline details
BEQ target is ahead by 7*4=28 bytes, 44+28=72
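
The next-PC rule for "beq" and the 44+28=72 arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the immediate is treated as a word offset scaled by 4, matching the 7*4=28 calculation in the example.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Next PC after a beq at address `pc` (hypothetical helper)."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0  # the ALU "Zero" output
    pc_plus_4 = pc + 4
    if zero:
        return pc_plus_4 + sign_extend_16(imm16) * 4  # taken: branch target
    return pc_plus_4                                  # not taken: fall through


print(beq_next_pc(40, 5, 5, 7))  # 72, registers equal, branch taken
print(beq_next_pc(40, 5, 6, 7))  # 44, registers differ, fall through
```

A negative immediate gives a backward branch: with imm16 = 0xFFFF (that is, -1) the taken target is the branch's own address.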

11 Branching Details, Cycle 1
[Pipeline diagram] I: BEQ $1,$3,7

12 Branching Details, Cycle 2
[Pipeline diagram] I: AND $12,$2,$5 (PC=44) | D: BEQ $1,$3,7 reads [$1], [$3], immediate 7

13 Branching Details, Cycle 3
[Pipeline diagram] I: OR $13,$6,$2 (PC=48) | D: AND $12,$2,$5 reads [$2], [$5] | X: BEQ $1,$3,7 with [$1], [$3], offset 7*4

14 Branching Details, Cycle 4a
[Pipeline diagram] I: ADD $14,$2,$2 (PC=52) | D: OR $13,$6,$2 | X: AND $12,$2,$5 | M: BEQ $1,$3,7

15 Branching Details, Cycle 4b
[Pipeline diagram] Same pipeline contents as Cycle 4a

16 Branching Details, Cycle 5
[Pipeline diagram] I: LW $4,50($7) (PC=72) | D: ADD $14,$2,$2 | X: OR $13,$6,$2 | M: AND $12,$2,$5 | W: BEQ $1,$3,7 (nothing left to do here)

17 BEQ Executes in “M” Stage
PCSrc generated in “M” stage
  – Like “lw”, “beq” outcome not available until “M”
Consequence?
  – 3 instructions follow “BEQ” into the pipeline
  – Instructions from PC+4, PC+8, PC+12
What if we take the branch?
  – 3 instructions in pipeline must NOT be executed!
  – Control hazard arises!
  – How to prevent their execution?
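
To see why exactly three instructions are behind the branch, it can help to compute which instruction occupies each stage in a given cycle. The sketch below assumes an ideal 5-stage pipeline (stages I, D, X, M, W) with no stalls, one instruction fetched per cycle, and cycles numbered from 1; `pipeline_snapshot` is a hypothetical helper.

```python
STAGES = ["I", "D", "X", "M", "W"]


def pipeline_snapshot(program, cycle):
    """Map stage name -> instruction for a 1-based cycle, assuming no stalls."""
    snapshot = {}
    for stage_index, stage in enumerate(STAGES):
        fetch_order = cycle - 1 - stage_index  # instruction now in this stage
        if 0 <= fetch_order < len(program):
            snapshot[stage] = program[fetch_order]
    return snapshot


program = ["BEQ $1,$3,7", "AND $12,$2,$5", "OR $13,$6,$2", "ADD $14,$2,$2"]
cycle4 = pipeline_snapshot(program, 4)
for stage in STAGES:
    print(stage, cycle4.get(stage, "-"))
```

In cycle 4 the BEQ is in M, and AND, OR and ADD (from PC+4, PC+8, PC+12) occupy X, D and I.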

18 Control Hazard Solution 1: Stall
Stall after every branch/jump
  – Stop fetching new instructions
  – Let branch/jump advance in pipeline
    Wait for branch/jump outcome to be decided
  – After branch/jump decided…
  – …fetch from final target PC
  – Resume normal pipelining
Performance impact
  – Always wastes 3 cycles
Problem
  – We can’t recognize the branch instruction until it reaches D stage
  – Already fetched PC+4 in I stage
  – Must now flush instruction in I stage
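
The cost of always stalling can be estimated with a simple CPI calculation. In this sketch the 3-cycle penalty comes from the slide, while the base CPI of 1 and the 20% branch frequency are assumed values chosen only to make the arithmetic concrete.

```python
def cpi_with_stall(branch_fraction, stall_cycles=3, base_cpi=1.0):
    """Average cycles per instruction when every branch/jump stalls.

    branch_fraction: share of instructions that are branches or jumps
    (a workload-dependent number; any value used here is an assumption).
    """
    return base_cpi + branch_fraction * stall_cycles


assumed_branch_fraction = 0.20  # hypothetical workload
cpi = cpi_with_stall(assumed_branch_fraction)
print(f"CPI = {cpi:.2f}")  # CPI = 1.60
```

Under these assumptions the pipeline takes 60% more cycles than the ideal, which motivates the alternatives that follow.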

19 Control Hazard Solution 2: Nullify
Branch result is in “M” stage
  – Next 3 instructions (AND, OR, ADD) after branch are fetched
  – Only partially executed
  – They alter state of machine (e.g., RegFile or DataMem contents)
    State altered only if they reach M stage or W stage
  – If branch taken
    We can squash, cancel or nullify them before they reach M or W
How to nullify?
  – Change control signals to “NOP” instruction (usually all zeros)
  – Extra control logic!
Note
  – 4th instruction fetched (LW) is correct and is always executed
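
Nullifying can be pictured as zeroing the control signals of the three younger instructions when the branch turns out to be taken. The Python sketch below models that idea only; the signal names (RegWrite, MemWrite) and the dictionary layout are assumptions for illustration, not the control logic used in the course project.

```python
def nullify(pipeline_controls, branch_taken):
    """Return control signals per stage after the branch resolves in M.

    pipeline_controls maps a stage name to a dict of control signals for the
    instruction in that stage. Stages I, D and X hold the three instructions
    fetched after the branch; on a taken branch they become NOPs (all zeros).
    """
    if not branch_taken:
        return pipeline_controls  # fall-through path was correct, keep going
    squashed = dict(pipeline_controls)
    for stage in ("I", "D", "X"):
        squashed[stage] = {name: 0 for name in pipeline_controls[stage]}
    return squashed


controls = {
    "I": {"RegWrite": 1, "MemWrite": 0},  # ADD
    "D": {"RegWrite": 1, "MemWrite": 0},  # OR
    "X": {"RegWrite": 1, "MemWrite": 0},  # AND
    "M": {"RegWrite": 0, "MemWrite": 0},  # BEQ
}
after_taken = nullify(controls, branch_taken=True)
print(after_taken["X"])  # {'RegWrite': 0, 'MemWrite': 0}
```

Because the squashed instructions never assert a write signal, they pass through M and W without changing the register file or data memory.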

20 Nullify Performance Impact?
If branch is taken
  – We must nullify 3 instructions
  – Wasted CPU effort (3 CPU cycles!)
If branch is not taken
  – We did useful work
What is frequency of taken vs. not taken?
  – About 80-90% of backward branches are taken (loops)
  – About 50% of forward branches are taken (if/else statements)
Taken is most frequent, so we often nullify!
  – Only a small performance gain
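
The "small performance gain" can be quantified as an expected penalty per branch. In this sketch the taken rates come from the slide (85% is used as a midpoint of 80-90%), but the 50/50 split between backward and forward branches is purely an assumed value.

```python
def taken_fraction(backward_share, backward_taken=0.85, forward_taken=0.5):
    """Overall fraction of branches taken for a given backward/forward mix."""
    return backward_share * backward_taken + (1 - backward_share) * forward_taken


def nullify_penalty_per_branch(p_taken, squashed=3):
    """Expected wasted cycles per branch when only taken branches squash."""
    return p_taken * squashed


assumed_backward_share = 0.5  # hypothetical mix of loop vs. if/else branches
p_taken = taken_fraction(assumed_backward_share)
penalty = nullify_penalty_per_branch(p_taken)
print(f"taken: {p_taken:.3f}, expected penalty: {penalty:.3f} cycles")
```

With this mix roughly two thirds of branches are taken, so the average penalty is about 2 cycles per branch instead of the 3 cycles that stalling always costs.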

21 Control Hazard Solution 3: Branch Delay Slots
Basic idea
  – Be optimistic and always execute
  – No stalls, no nullify
Propose New ISA Rule
  – Always execute 3 instructions after branch
  – Here, we say branch has “3 Branch Delay Slots”
Compiler places useful instructions after a branch
  – Utilizes “wasted” CPU cycles
  – Turns “disadvantage” into a “feature”??

22 Code Scheduling for Branch Delay Slots
Compiler moves instructions to fill delay slots
[Figure: code with unused delay slots vs. code with a filled delay slot; instructions shown include Sub $t4,$t5,$t6 and Add $s1,$s2,$s3]

23 Code Scheduling for Branch Delay Slots
Compiler checks dependencies to verify it is safe to move instructions
[Figure: code with unused delay slots vs. code with a filled delay slot; instructions shown include Sub $t4,$t5,$t6 and Add $s1,$s2,$s3]

24 Compiler Checks
When we move an instruction
  – Check move is valid: involves checking dependencies
  – Reordering code is a tricky subject
E.g., consider moving an instruction forward in code/time from location A to B
  – Destination register must be ok to move
    Check no instructions between A and B use the destination register
  – Source register(s) must be ok to move
    Check no instructions between A and B change the source registers
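
The two checks above translate directly into a small legality test. This Python sketch assumes instructions are hypothetical (dest, sources) tuples, with dest set to None for instructions such as branches that write no register, and it interprets "use the destination register" as either reading or writing it. The example instruction sequence is invented for illustration.

```python
def can_move_later(instructions, a, b):
    """True if instructions[a] may move to just after instructions[b] (a < b)."""
    dest, srcs = instructions[a]
    for other_dest, other_srcs in instructions[a + 1 : b + 1]:
        # Destination check: nothing in between may read or write our dest.
        if dest is not None and (dest in other_srcs or dest == other_dest):
            return False
        # Source check: nothing in between may change one of our sources.
        if other_dest is not None and other_dest in srcs:
            return False
    return True


code = [
    ("$t4", ("$t5", "$t6")),    # sub $t4,$t5,$t6
    ("$s1", ("$s2", "$s3")),    # add $s1,$s2,$s3
    (None, ("$s1", "$zero")),   # beq $s1,$zero,... (hypothetical branch)
]
print(can_move_later(code, 0, 2))  # True: sub is independent of add and beq
print(can_move_later(code, 1, 2))  # False: beq reads $s1, which add writes
```

Here the sub can fill a delay slot after the branch, while the add cannot because the branch depends on its result.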

25 Branch Delay Slot Performance Impact?
Cool idea, but…
Benefit
  – Compiler can put 3 useful instructions after each branch
Compiler difficulty
  – Only ~50% of 1st delay slots filled with “useful” instruction
  – Remaining delay slots are even harder!
Another problem…
  – A pipeline detail is now exposed to software
    ISA rule is now tied to current pipeline organization
  – Difficult to change pipeline organization in future
    Deeper pipeline will need more branch delay slots
    If we add more branch delay slots, old software will be incompatible

26 Final Word on Branch Delay Slots
Delay slots once touted as a “feature”
  – But are now heavily discouraged
MIPS ISA rules (older design)
  – Delay slots defined in ISA
    1 branch delay slot
    1 jump delay slot
  – Compiler must insert “NOP” or independent instruction
NIOS-II ISA rules (more recent design)
  – 0 branch delay slots
  – 0 jump delay slots
  – Hardware choice: either stall or nullify to handle control hazards

27 Directly Reducing Penalty of Control Hazards
Control hazards demand solutions:
  – Stalling
  – Nullify
  – Branch delay slots
  – All of these negatively impact performance (in various ways)
Alternative
  – Can we directly reduce the negative impact of control hazards?
Yes!
  – Execute branch/jump instruction earlier in pipeline
  – Outcome known sooner
  – Fewer instructions enter pipeline after branch (before outcome known)
  – For BEQ, we must detect “equals” earlier
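
The benefit of resolving the branch earlier follows from counting how many younger instructions have been fetched by the time the outcome is known. The sketch below assumes the same ideal 5-stage pipeline (I, D, X, M, W) with one fetch per cycle; `wrong_path_fetches` is a hypothetical name.

```python
STAGES = ["I", "D", "X", "M", "W"]


def wrong_path_fetches(resolve_stage):
    """Instructions fetched behind a branch before it resolves in resolve_stage.

    One instruction enters the pipeline for every stage the branch has
    already passed through, so the count equals the stage's position.
    """
    return STAGES.index(resolve_stage)


for stage in ("M", "X", "D"):
    print(stage, wrong_path_fetches(stage))
```

Resolving in M gives the 3-instruction penalty seen so far; moving the decision to X or D would cut it to 2 or 1 under these assumptions.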