
Pipeline Hazards CS365 Lecture 10

D. Barbara Pipeline Hazards CS465

Review
- Pipelined CPU
  - Overlapped execution of multiple instructions
  - Each instruction is in a different stage, using a different major functional unit in the datapath
    - IF, ID, EX, MEM, WB
    - Same number of stages for all instruction types
- Improved overall throughput
  - Effective CPI = 1 (ideal case)

Recap: Pipelined Datapath

Recap: Pipeline Hazards
- Hazards prevent the next instruction from executing during its designated clock cycle
- Structural hazards: attempt to use the same resource in two different ways at the same time
  - One memory (shared by instruction fetch and data access)
- Data hazards: attempt to use data before it is ready
  - An instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: attempt to make a decision before the condition is evaluated
  - Branch instructions
- The pipeline implementation needs to detect and resolve hazards

Data Hazards
- An example: what if initially $2 = 10, $1 = 10, $3 = 30? (Fig. 6.28)

Resolving Data Hazards
- Register file design: allow a register to be read and written in the same clock cycle (CC)
  - Always write a register in the first half of the CC and read it in the second half of that CC
  - Resolves the hazard between sub and add in the previous example
- Insert NOP instructions, or independent instructions, by the compiler
  - NOP: pipeline bubble
- Detect the hazard, then forward the proper value
  - The good way

Forwarding
- From the example:
    sub $2, $1, $3     IF ID EX MEM WB
    and $12, $2, $5       IF ID EX MEM WB
    or  $13, $6, $2          IF ID EX MEM WB
- The and and or instructions need the value of $2 at the EX stage
- The valid value of $2 is generated by sub at its EX stage
- We can execute and and or without stalls if the result can be forwarded to them directly
  - Forwarding
- Need to detect the hazards and determine when, and to which instruction, data needs to be passed

Data Hazard Detection
- From the example:
    sub $2, $1, $3     IF ID EX MEM WB
    and $12, $2, $5       IF ID EX MEM WB
    or  $13, $6, $2          IF ID EX MEM WB
- The and and or instructions need the value of $2 at the EX stage
- For the first two instructions, need to detect the hazard before and enters the EX stage (while sub is about to enter MEM)
- For the 1st and 3rd instructions, need to detect the hazard before or enters EX (while sub is about to enter WB)
- Hazard detection conditions: EX hazard and MEM hazard
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

Add Forwarding Paths

Refine Hazard Detection Condition
- Condition 1 or 2 holds, but the earlier instruction does not write a register
  - No hazard
  - Check the RegWrite signal in the WB field of the EX/MEM and MEM/WB pipeline registers
- Condition 1 or 2 holds, but RegisterRd is $0
  - Register $0 must always hold zero, so a non-zero result must not be forwarded
  - No hazard

New Hazard Detection Conditions
- EX hazard
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
- One instruction ahead

New Hazard Detection Conditions
- MEM hazard
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
- Two instructions ahead

New Complication
- For the code sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
- The third instruction depends on the second, not the first
- Should forward the ALU result from the second instruction
- For the MEM hazard, need to check additionally:
    EX/MEM.RegisterRd != ID/EX.RegisterRs
    EX/MEM.RegisterRd != ID/EX.RegisterRt

Refined Hazard Detection Conditions
- MEM hazard
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
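The EX-hazard and refined MEM-hazard conditions can be sketched in software to check them against small instruction sequences. This is only an illustrative model, not hardware from the lecture: the `PipeReg` class and the `forwarding_unit` function are hypothetical names, and the mux encodings 10, 01 and 00 (no forwarding) follow the slides.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical view of the pipeline-register fields the forwarding unit reads."""
    reg_write: bool = False  # RegWrite control bit carried in the register
    rd: int = 0              # destination register number (RegisterRd)


def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit mux selects."""
    def select(src):
        # EX hazard: the producer is one instruction ahead.
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
            return 0b10
        # Refined MEM hazard: the producer is two instructions ahead, and the
        # instruction in between does not target the same register.
        if (mem_wb.reg_write and mem_wb.rd != 0
                and ex_mem.rd != src and mem_wb.rd == src):
            return 0b01
        return 0b00  # no forwarding: use the register-file value
    return select(id_ex_rs), select(id_ex_rt)


# Third add of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4 is in EX.
# Both earlier adds write $1, so the most recent one (EX/MEM) must win.
print(forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), id_ex_rs=1, id_ex_rt=4))
```

For the third add, ForwardA selects the EX/MEM value (10) and ForwardB selects nothing (00), which matches the "forward from the second instruction" requirement.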

Datapath with Forwarding Path

Example
- Show how forwarding works with the following instruction sequence:
    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2

Clock 3

Clock 4

Clock 5

Clock 6

Adding ALUSrc Mux to Datapath (Fig. 6.33)
- Sign-extension (lw/sw)

Forwarding Can't Do Anything!
- When a load instruction that writes a register is followed by an instruction reading the same register, forwarding does not help
- Stall the pipeline

Hazard Detection
- To insert the stall (bubble), we need an additional hazard detection unit
  - Detect at the ID stage. Why?
- Detection logic:
    if (ID/EX.MemRead
        and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
          or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline
- Stall the pipeline at the ID stage
  - Set all control signals to 0, inserting a bubble (NOP operation)
  - Keep IF/ID unchanged: repeat the previous cycle
  - Keep PC unchanged: refetch the same instruction
  - Add PCWrite and IF/IDWrite control to the data hazard detection logic
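The load-use detection logic and its three stall actions can be modeled as a small function. This is a sketch under assumptions: `hazard_detection_unit` and the dictionary keys are illustrative names, with `PCWrite` and `IF/IDWrite` standing for the write-enable signals named on the slide and `bubble` for zeroing the control signals.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Model of the load-use check made while the dependent instruction is in ID."""
    # Stall when the instruction in EX is a load whose destination (Rt)
    # is a source register of the instruction currently being decoded.
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,     # hold the PC so the same instruction is refetched
        "IF/IDWrite": not stall,  # hold IF/ID so decode repeats next cycle
        "bubble": stall,          # zero the control signals entering ID/EX
    }


# lw $2, 20($1) followed immediately by and $4, $2, $5
print(hazard_detection_unit(True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

A non-load instruction in EX, or a load whose destination is not read by the next instruction, leaves both write enables asserted and inserts no bubble.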

Pipelined Control
- Fig. 6.36: Control with Hazard Detection and Data Forwarding Units

Example: Clock 2

Clock 3

Clock 4

Clock 5

Clock 6

Clock 7

How about Store Word?
- SW can cause data hazards too
  - Does forwarding help?
  - Does the existing forwarding hardware help?
- Easy case if SW depends on ALU operations
- What if a LW is immediately followed by a SW?

LW and SW
- First sequence:
    lw $5, 0($15)
    …
    sw $4, 100($5)
- Second sequence:
    lw $5, 0($15)
    sw $8, 100($5)
- Third sequence:
    lw $5, 0($15)
    sw $5, 100($15)

SW Is in MEM Stage
- Forwarding condition:
    MEM/WB.RegWrite and EX/MEM.MemWrite
    and MEM/WB.RegisterRt = EX/MEM.RegisterRt
    and MEM/WB.RegisterRt != 0
- Example:
    lw $5, 0($15)
    sw $5, 100($15)
- [Figure: lw and sw in the datapath, with the sign-extend unit, the EX/MEM register and the data memory]
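The MEM-stage condition for forwarding loaded data into a following store can be written as a predicate. This is a minimal sketch, assuming the field names on the slide; the function name `forward_store_data_in_mem` is hypothetical.

```python
def forward_store_data_in_mem(mem_wb_reg_write, mem_wb_rt,
                              ex_mem_mem_write, ex_mem_rt):
    """True when a store in MEM should take its data from the load in WB."""
    return (bool(mem_wb_reg_write)        # the older instruction writes a register
            and bool(ex_mem_mem_write)    # the instruction in MEM is a store
            and mem_wb_rt == ex_mem_rt    # the store's data register is the load's target
            and mem_wb_rt != 0)           # $0 is never forwarded


# lw $5, 0($15) ; sw $5, 100($15): the store data comes from the load
print(forward_store_data_in_mem(True, 5, True, 5))
# lw $5, 0($15) ; sw $8, 100($5): $5 is used for the address, not as store data
print(forward_store_data_in_mem(True, 5, True, 8))
```

The second case is not covered by this predicate because the dependence is on the address register, which is needed earlier, in the EX stage.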

SW Is in EX Stage
- Forwarding condition:
    ID/EX.MemWrite and MEM/WB.RegWrite
    and MEM/WB.RegisterRt = ID/EX.RegisterRt (Rs)
    and MEM/WB.RegisterRt != 0
- [Figure: lw and sw in the datapath, with the sign-extend unit]

Outline
- Data hazards
  - When does a data hazard happen?
    - Data dependencies
  - Using forwarding to overcome data hazards
    - Data is available after the ALU stage
    - Forwarding conditions
  - Stall the pipeline for load-use instructions
    - Data is available after the MEM stage (lw instruction)
    - Hazard detection conditions
- Next: control hazards

Branch Hazards
- Control hazard: a branch has a delay in determining the proper instruction to fetch

Branch Hazards
- [Figure: pipeline diagram marking the flushed instructions and the point where the branch decision is made]

Observations
- Basic implementation
  - Branch decision does not occur until the MEM stage
  - 3 CCs are wasted
- How to decide the branch earlier and reduce the delay
  - In the EX stage: two-CC branch delay
  - In the ID stage: one-CC branch delay
  - How?
    - For beq $x, $y, label: compute $x xor $y, then OR all the bits; much faster than an ALU operation
    - Also, we have a separate ALU to compute the branch address
    - May need additional forwarding and may suffer from data hazards
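The XOR-then-OR equality test used to resolve beq in the ID stage can be illustrated bit by bit. This is a software sketch of the idea only; the function name `regs_equal` and the 32-bit default width are assumptions.

```python
def regs_equal(x, y, width=32):
    """Equality test as the early-branch hardware would do it: XOR, then OR-reduce."""
    diff = (x ^ y) & ((1 << width) - 1)  # bitwise XOR: a 1 marks a differing bit
    any_bit_set = 0
    for i in range(width):               # OR all bits of the XOR result together
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0              # equal only if no bit differs


print(regs_equal(10, 10))  # beq would be taken
print(regs_equal(10, 30))  # beq would fall through
```

No carry chain is involved, which is why this check is much cheaper than a full ALU subtraction.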

Decide Branch Earlier
- [Figure: datapath with the branch decision moved to the ID stage and an IF.Flush control signal]

Pipelined Branch: An Example
- [Figure: datapath snapshot with instructions at addresses 36, 40 and 44, registers $4 and $8, and IF.Flush]

Pipelined Branch: An Example

Observations (continued)
- 3 strategies to further improve
  - Branch delay slot
  - Static branch prediction
  - Dynamic branch prediction

Branch Delay Slot
- The instruction scheduled for the branch delay slot always executes
  - Normally only one instruction in the slot
  - Executed whether or not the branch is taken
- Done by the compiler or assembler
  - Needs to be able to identify an independent instruction and schedule it after the branch
- Losing popularity
  - Why?
    - More pipeline stages
    - More instructions issued per cycle

Scheduling the Branch Delay Slot
- Independent instruction: best choice
- Choice b is good when the probability of the branch being taken is high
- It must be OK to execute the sub instruction when the branch goes in the unexpected direction

Static Branch Prediction
- Predict a branch as taken or not taken
- Predict not taken: continue sequential fetching and execution (simplest)
  - If the prediction is wrong, clear the effect of the sequential instruction execution
  - How to discard instructions in the pipeline?
    - The branch decision is made at the ID stage: only need to flush the IF/ID pipeline register!
- Problem: behavior varies a lot across branches and programs
  - Misprediction ranges from 9% to 59% for SPEC

Dynamic Branch Prediction
- Static branch prediction is crude!
- Take history into consideration
  - If a branch was taken last time, fetch the next instruction from the same place as last time
- Branch history table (BHT) / branch prediction buffer
  - One entry for each branch, containing a bit (or bits) that tells whether the branch was recently taken or not
  - Indexed by the lower bits of the branch instruction's address
  - Table lookup might occur in the IF stage
  - How many bits for each table entry?
  - Is the prediction correct?

Dynamic Branch Prediction
- Simplest approach: 1-bit prediction
  - Use 1 bit for each BHT entry
    - Record whether or not the branch was taken last time
    - Always predict the branch will behave the same as last time
- Problem: even if a branch is almost always taken, we will likely predict incorrectly twice
  - Consider a loop: T, T, …, T, NT, T, T, …
  - A misprediction flips the single prediction bit

Dynamic Branch Prediction
- 2-bit saturating counter
  - A prediction must miss twice before it is changed
  - FSA: 0 = not taken, 1 = taken
  - Improved noise tolerance
- N-bit saturating counter
  - Predict taken if counter value >= 2^(n-1)
  - A 2-bit counter gets most of the benefit

In-Class Exercise
- Consider a loop branch that is taken nine times in a row, then is not taken once. What is the prediction accuracy for this branch?
  - Assume we initialize to predict taken
  - With 1-bit prediction?
  - With 2-bit prediction?
- [Figure: 2-bit predictor state diagram with "predict taken" and "predict not taken" states and taken / not taken transitions]
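One way to check an answer to this exercise is to simulate both predictors on the repeating pattern. The sketch below is not the lecture's solution: the function names are hypothetical, the 2-bit counter is assumed to start at its strongest "taken" state (3), and it predicts taken when the counter is 2 or 3.

```python
def one_bit_accuracy(outcomes, predict_taken=True):
    """1-bit predictor: always predict what the branch did last time."""
    state, correct = predict_taken, 0
    for taken in outcomes:
        correct += (state == taken)
        state = taken                      # remember the latest outcome
    return correct / len(outcomes)


def two_bit_accuracy(outcomes, counter=3):
    """2-bit saturating counter: values 2 and 3 predict taken, 0 and 1 not taken."""
    correct = 0
    for taken in outcomes:
        correct += ((counter >= 2) == taken)
        # Count up on taken, down on not taken, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# Nine taken branches followed by one not taken, repeated 100 times.
pattern = ([True] * 9 + [False]) * 100
print(one_bit_accuracy(pattern))
print(two_bit_accuracy(pattern))
```

In steady state the 1-bit predictor misses twice per ten branches (the loop exit and the next loop entry), while the 2-bit counter misses only on the loop exit.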

Hazards and Performance
- Ideal pipelined performance: CPI_ideal = 1
- Hazards introduce additional stalls
  - CPI_pipelined = CPI_ideal + average stall cycles per instruction
- Example
  - Half of the loads are followed immediately by an instruction that uses the result
  - Branch delay on misprediction is 1 cycle, and 1/4 of the branches are mispredicted
  - Jumps always pay 1 cycle of delay
  - Instruction mix: load 25%, store 10%, branches 11%, jumps 2%, ALU 52%
  - What is the average CPI?

Hazards and Performance
- Example (CPI_ideal = 1)
  - CPI_pipelined = CPI_ideal + average stall cycles per instruction
  - Half of the loads are followed immediately by an instruction that uses the result
  - Branch delay on misprediction is 1 cycle, and 1/4 of the branches are mispredicted
  - Jumps always pay 1 cycle of delay
  - Instruction mix: load 25%, store 10%, branches 11%, jumps 2%, ALU 52%
- Per-class CPI
  - CPI_load = 1.5
  - CPI_branch = 1.25
  - CPI_jump = 2
- Average CPI = 1.5 × 25% + 1 × 10% + 1.25 × 11% + 2 × 2% + 1 × 52% = 1.17
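The same arithmetic can be reproduced step by step: each class's CPI is the ideal CPI plus its expected stall cycles, and the average weights those by the instruction mix. The variable names below are illustrative only.

```python
CPI_IDEAL = 1.0

# Expected stall cycles per instruction of each class, from the example's assumptions.
stalls = {
    "load":   0.5 * 1,   # half of the loads cause a 1-cycle load-use stall
    "store":  0.0,
    "branch": 0.25 * 1,  # 1/4 mispredicted, 1-cycle penalty each
    "jump":   1.0,       # jumps always pay 1 cycle
    "alu":    0.0,
}
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

cpi_per_class = {k: CPI_IDEAL + s for k, s in stalls.items()}
average_cpi = sum(mix[k] * cpi_per_class[k] for k in mix)

print(cpi_per_class)
print(round(average_cpi, 4))  # 1.1725, which the slide rounds to 1.17
```

Only 0.1725 stall cycles per instruction are added on average, because the stalling classes are either infrequent (jumps) or stall only part of the time (loads, branches).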

Exceptions
- Exceptions: events other than branches or jumps that change the normal flow of instruction execution
  - Arithmetic overflow, undefined instruction, etc.
    - Internal to the processor
  - Interrupts from external sources: I/O interrupts
- Use arithmetic overflow as an example
  - When an overflow is detected, we need to transfer control to the exception handling routine immediately, because we do not want the invalid value to contaminate other registers or memory locations
  - Similar idea to a branch hazard
  - Detected in the EX stage
  - De-assert all control signals in the EX and ID stages, flush IF/ID

Exceptions (Fig. 6.42)

Example
    sub $11, $2, $4
    and $12, $2, $5
    or  $13, $2, $6
    add $1, $2, $1     -- overflow occurs
    slt $15, $6, $7
    lw  $16, 50($7)
- Exception handling routine:
    hex  sw $25, 1000($0)
    hex  sw $26, 1004($0)

Example

Example

Summary
- Pipeline hazards: detection and resolution
  - Data hazards
    - Forwarding
    - Detection and stall
  - Control hazards
    - Branch delay slot
    - Static branch prediction
    - Dynamic branch prediction
- Exceptions
  - Detection and handling

Next Lecture
- Topic: memory hierarchy
- Reading: Patterson & Hennessy, Ch. 7