The Processor Lecture 3.5: Data Hazards


Note for instructors: this first part of the new Chapter 4 is review for this class, so it does not go into detail. If your students are learning computer organization for the first time, this set of slides needs to be expanded greatly.

Learning Objectives
- Explain how data hazards happen
- Insert the minimum number of nop operations to resolve a data hazard
- Understand the philosophy of data forwarding
- Explain how to detect a data hazard
- Describe how to stall instructions in the pipeline
- Specify how to fill the gaps between stalled instructions and non-stalled instructions
- Generate nop instructions in the middle of the pipeline

Coverage Chapter 4.7

Data Hazard Chapter 4.7, page 303

Can Pipelining Get Us Into Trouble?
Yes: pipeline hazards
- Structural hazards: two different instructions attempt to use the same resource at the same time
  - Structural hazards are solved by duplicating the necessary components
- Data hazards: an instruction attempts to use data before it is ready
  - An instruction's source operand(s) are produced by a prior instruction still in the pipeline
  - Note that data hazards can come from R-type instructions or lw instructions
- Control hazards: an attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
  - Branch and jump instructions, exceptions
Hazards can usually be resolved by waiting (exceptions cannot be resolved by waiting!)
- Pipeline control must detect the hazard and take action to resolve it

A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) against instruction order, for lw followed by Inst 2 to Inst 5. The lw reads data from memory in the same clock cycle in which a later instruction is being read from memory.]
Fix with separate instruction and data memories (I$ and D$)

How About Register File Access?
[Figure: pipeline diagram, time (clock cycles) against instruction order, for add $1, / Inst 2 / Inst 3 / add $2,$1,]
Fix the simple register file hazard by defining register writes to occur in the first half of the clock cycle and register reads to occur in the second half

Register Usage Can Cause Data Hazards
Example: All the dependent actions are shown in color, and “CC 1” at the top of the figure means clock cycle 1. The first instruction writes into $2, and all the following instructions read $2. This register is written in clock cycle 5, so the proper value is unavailable before clock cycle 5. (A read of a register during a clock cycle returns the value written at the end of the first half of the cycle, when such a write occurs.) The colored lines from the top datapath to the lower ones show the dependences. Those that must go backward in time are pipeline data hazards.

Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
    add $1,$8,$9
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Read After Write data hazard

Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Load-use data hazard: another Read After Write hazard
Note that lw is just another example of register usage (beyond ALU ops)

Formal Definitions of Data Hazards
Consider two instructions i and j, with i occurring before j in program order
Three types of data hazards
- RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value
- WAW (write after write): j tries to write an operand before it is written by i, leaving the value written by i rather than the value written by j in the destination
- WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value
In the basic 5-stage pipeline, WAW and WAR dependences do not cause any hazards, because every instruction reads registers in the 2nd stage and writes registers in the 5th stage, so reads and writes of a register always happen in program order
Use examples here.
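As one possible set of examples, the three definitions can be written as checks on a pair of instructions. This is only a sketch: the `Instr` struct and the function names are hypothetical, each instruction is assumed to have one destination and two sources, and register 0 is used to mean "no register". The checks find dependences; whether a dependence becomes a hazard depends on the pipeline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction: one destination, two sources.
   Register 0 doubles as "no register" (MIPS $zero never changes). */
typedef struct {
    int dest;
    int src1;
    int src2;
} Instr;

/* True if instruction x reads register r (r = 0 never counts). */
static bool reads(Instr x, int r) {
    assert(r >= 0 && r < 32);  /* MIPS has 32 registers */
    return r != 0 && (x.src1 == r || x.src2 == r);
}

/* i comes before j in program order in all three checks. */

/* RAW: j reads the register that i writes. */
bool is_raw(Instr i, Instr j) { return reads(j, i.dest); }

/* WAW: j writes the register that i writes. */
bool is_waw(Instr i, Instr j) { return i.dest != 0 && j.dest == i.dest; }

/* WAR: j writes a register that i reads. */
bool is_war(Instr i, Instr j) { return reads(i, j.dest); }
```

With the sequence used elsewhere in these slides, add $1,$8,$9 followed by sub $4,$1,$5 is a RAW dependence on $1, add $1,$8,$9 followed by or $8,$1,$9 is a WAR dependence on $8, and sub $4,$1,$5 followed by xor $4,$1,$5 is a WAW dependence on $4.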

One Way to “Fix” a Data Hazard
Can fix a data hazard by waiting (stalling), but this impacts CPI
    add $1,
    stall (insert nop)
    stall (insert nop)
    sub $4,$1,$5
    and $6,$1,$7
Strictly speaking, it is the instructions that are stalled, not the pipeline itself.
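The two stalls in this example follow from a short calculation, sketched here under assumptions taken from the earlier slides: a 5-stage pipeline, no forwarding, and a register file that writes in the first half of a cycle and reads in the second half. If the producer is fetched in cycle c, it writes its result in cycle c + 4 (WB). A consumer d instructions later reads its registers in cycle c + d + 1 (ID), so it needs d ≥ 3. The function name `nops_needed` is hypothetical.

```c
#include <assert.h>

/* Minimum number of nops to insert between a producer and a consumer that
   is `distance` instructions later (1 = the very next instruction),
   assuming no forwarding and write-before-read in the register file. */
int nops_needed(int distance) {
    assert(distance >= 1);
    int needed = 3 - distance;  /* the consumer must end up 3 slots behind */
    return needed > 0 ? needed : 0;
}
```

For the sub that directly follows the add, `nops_needed(1)` gives 2, matching the two stall slots on the slide.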

Another Way to “Fix” a Data Hazard
Fix data hazards by forwarding results as soon as they are available to where they are needed
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Forwarding paths are valid only if the destination stage is later in time than the source stage. Forwarding is harder if there are multiple results to forward per instruction or if an instruction needs to write a result early in the pipeline.
Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely, we will see that it is really supplied by the pipeline register EX/MEM or MEM/WB and will depict it as such.

Data Forwarding (aka Bypassing)
Take the result from the earliest point where it exists in any of the pipeline registers and forward it to the functional units (e.g., the ALU) that need it in that cycle
For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX by
- adding multiplexors to both inputs of the ALU
- connecting the register write data in EX/MEM or MEM/WB to both ALU mux inputs in the EX stage
- adding the proper control hardware to control the new muxes
With forwarding, the processor can achieve a CPI close to 1 even in the presence of data dependencies

Forwarding Illustration
    add $1,
    sub $4,$1,$5    (EX forwarding)
    and $6,$7,$1    (MEM forwarding)
Now we see that the forwarded data is supplied by the pipeline register EX/MEM or MEM/WB.

Detecting the Need to Forward
Pass register numbers along the pipeline
- ID/EX.RegisterRs: register number of Rs stored in the ID/EX pipeline register
- ID/EX.RegisterRt: register number of Rt stored in the ID/EX pipeline register
- EX/MEM.RegisterRd and MEM/WB.RegisterRd: register number of the destination register stored in the EX/MEM and MEM/WB pipeline registers
ALU operand register numbers in the EX stage are given by ID/EX.RegisterRs and ID/EX.RegisterRt

Detecting the Need to Forward
Data hazards only if
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
      (forward from the previous instruction, i.e., the EX/MEM pipeline register)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
      (forward from the second previous instruction, i.e., the MEM/WB pipeline register)
But only if the forwarding instruction will write to a register!
- EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
- EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Datapath with Forwarding Hardware
[Figure: pipelined datapath with a Forward Unit. The two control signals ForwardA and ForwardB are highlighted; they select the inputs of the multiplexors in front of the ALU.]

Control Values for the Forwarding Multiplexors
Mux control     Source    Explanation
ForwardA = 00   ID/EX     The first ALU operand comes from the register file.
ForwardA = 10   EX/MEM    The first ALU operand is forwarded from the prior ALU result.
ForwardA = 01   MEM/WB    The first ALU operand is forwarded from the data memory or an earlier ALU result.
ForwardB = 00   ID/EX     The second ALU operand comes from the register file.
ForwardB = 10   EX/MEM    The second ALU operand is forwarded from the prior ALU result.
ForwardB = 01   MEM/WB    The second ALU operand is forwarded from the data memory or an earlier ALU result.

Yet Another Complication!
Another potential data hazard can occur when there is a conflict between the outputs of the EX/MEM pipeline register and the MEM/WB pipeline register: which should be forwarded?
    add $1,$1,$2
    add $1,$1,$3
    add $1,$1,$4
For the third add, what we want is the more recent value of $1, produced by the second add and held in EX/MEM. The forwarding we want to avoid is the older value produced by the first add and held in MEM/WB.

Statement for Forwarding Control Signals (C-style pseudocode)
ForwardA:
    if (EX/MEM.RegWrite && (EX/MEM.RegisterRd != 0) && (EX/MEM.RegisterRd == ID/EX.RegisterRs))
        ForwardA = 10;    // forwards the result from the previous instr. to either input of the ALU
    else if (MEM/WB.RegWrite && (MEM/WB.RegisterRd != 0) && (MEM/WB.RegisterRd == ID/EX.RegisterRs))
        ForwardA = 01;    // forwards the result from the second previous instr. to either input of the ALU
    else
        ForwardA = 00;    // no forwarding
ForwardB: the logic is similar, with ID/EX.RegisterRt in place of ID/EX.RegisterRs
The values 10, 01, and 00 are the two-bit mux selects. Testing EX/MEM first means the more recent result wins when both pipeline registers hold the same destination register.
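The statement above can be turned into compilable C along the following lines. This is a sketch, not the deck's own code: the struct `FwdInputs`, the enum names, and the function `forward_select` are hypothetical, and the pipeline register fields are passed in as plain values. The same function serves ForwardA (pass ID/EX.RegisterRs) and ForwardB (pass ID/EX.RegisterRt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-bit mux selects from the control-values table (binary 00, 01, 10). */
enum { FWD_ID_EX = 0, FWD_MEM_WB = 1, FWD_EX_MEM = 2 };

/* Hypothetical snapshot of the fields the Forward Unit looks at. */
typedef struct {
    bool    ex_mem_reg_write;  /* EX/MEM.RegWrite   */
    uint8_t ex_mem_rd;         /* EX/MEM.RegisterRd */
    bool    mem_wb_reg_write;  /* MEM/WB.RegWrite   */
    uint8_t mem_wb_rd;         /* MEM/WB.RegisterRd */
} FwdInputs;

/* Select for one ALU operand whose source register number is `src`:
   ID/EX.RegisterRs for ForwardA, ID/EX.RegisterRt for ForwardB. */
int forward_select(FwdInputs in, uint8_t src) {
    assert(src < 32);
    /* EX/MEM is tested first, so the most recent result wins. */
    if (in.ex_mem_reg_write && in.ex_mem_rd != 0 && in.ex_mem_rd == src)
        return FWD_EX_MEM;
    if (in.mem_wb_reg_write && in.mem_wb_rd != 0 && in.mem_wb_rd == src)
        return FWD_MEM_WB;
    return FWD_ID_EX;  /* no forwarding: use the register file value */
}
```

For a sequence such as three back-to-back adds into $1, both EX/MEM and MEM/WB hold destination register 1 when the third add reaches EX, and the function returns `FWD_EX_MEM`.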

Datapath with Forwarding Hardware
[Figure: pipelined datapath with the Forward Unit. Its inputs are ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd, and its outputs drive the ALU input multiplexors (00, 01, 10). The control line inputs to the Forward Unit, EX/MEM.RegWrite and MEM/WB.RegWrite, are not shown on the diagram.]
How many bits wide is each pipeline register now?
- PC: 32
- IF/ID: 32*2 = 64
- ID/EX: 9 + 32*4 + 10 + 5 = 152
- EX/MEM: 5 + 1 + 32*3 + 5 = 107
- MEM/WB: 2 + 32*2 + 5 = 71

Data Hazards and Stalls Chapter 4.7, page 313

Forwarding with Load-use Data Hazards (logical view)
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
The one case where forwarding cannot save the day is when an instruction tries to read a register following a load instruction that writes the same register.

Forwarding with Load-use Data Hazards (logical view)
    lw  $1,4($2)
    stall
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
Will still need one stall cycle even with forwarding

Load-use Hazard Detection Unit
Need a Hazard detection Unit in the ID stage that inserts a stall between the load and its use
ID Hazard detection Unit:
    if (ID/EX.MemRead
        and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
          or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline (more accurately, stall the instructions in the fetch and decode stages)
The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the use instruction)
After this one-cycle stall, the forwarding logic can handle the remaining data hazards

Hazard/Stall Hardware
Along with the Hazard Unit, we have to implement the stall
- Prevent the instructions in the IF and ID stages from progressing down the pipeline, which is done by preventing the PC register and the IF/ID pipeline register from changing
  - The Hazard detection Unit controls the writing of the PC (PC.Write) and IF/ID (IF/ID.Write) registers
- Insert a “bubble” between the lw instruction (in the EX stage) and the “use” instruction (in the ID stage), i.e., insert a nop in the execution stream
  - Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (nop). The Hazard Unit controls the mux that chooses between the real control values and the 0’s.
- Let the lw instruction and the instructions ahead of it in the pipeline (earlier in program order) proceed normally down the pipeline
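The load-use test and the stall actions described on the last two slides can be sketched together in C. The names here (`Control`, `HazardOutputs`, `load_use_hazard`, `hazard_unit`) are hypothetical, and `Control` carries only three of the control bits for brevity; the pipeline register fields are passed in as plain values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the control bits carried in ID/EX. */
typedef struct {
    bool reg_write;
    bool mem_read;
    bool mem_write;
} Control;

typedef struct {
    bool    pc_write;       /* PC.Write: false holds the PC            */
    bool    if_id_write;    /* IF/ID.Write: false holds IF/ID          */
    Control id_ex_control;  /* control bits that get latched in ID/EX  */
} HazardOutputs;

/* Load-use test: the instruction in EX is a load whose destination (Rt)
   matches a source register of the instruction in ID. */
bool load_use_hazard(bool id_ex_mem_read, uint8_t id_ex_rt,
                     uint8_t if_id_rs, uint8_t if_id_rt) {
    assert(id_ex_rt < 32 && if_id_rs < 32 && if_id_rt < 32);
    return id_ex_mem_read &&
           (id_ex_rt == if_id_rs || id_ex_rt == if_id_rt);
}

/* On a hazard: freeze PC and IF/ID, and send zeroed control bits
   (a bubble) into ID/EX instead of the decoded ones. */
HazardOutputs hazard_unit(bool hazard, Control decoded) {
    HazardOutputs out;
    out.pc_write = !hazard;
    out.if_id_write = !hazard;
    out.id_ex_control = hazard ? (Control){false, false, false} : decoded;
    return out;
}
```

For lw $1,4($2) followed by sub $4,$1,$5, the test sees ID/EX.RegisterRt = 1 and IF/ID.RegisterRs = 1, so the sub is held in ID for one cycle while a bubble moves down the pipeline.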

Adding the Hazard/Stall Hardware
[Figure: pipelined datapath with the Hazard Unit added. Its inputs include ID/EX.MemRead and ID/EX.RegisterRt; its outputs are PC.Write, IF/ID.Write, and the select of the mux that zeroes the control bits going into ID/EX.]
In reality, only the signals RegWrite and MemWrite need to be 0; the other control signals can be don’t cares. Another consideration is energy, where clock gating is called for.

Stall/Bubble in the Pipeline

A Challenge: Memory-to-Memory Copies
For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input
- Would need to add a Forward Unit and a mux to the MEM stage
    lw $1,4($2)
    sw $1,4($3)
- What hazard is it? A RAW hazard.
- Is it covered by the forwarding logic so far? No.
- Is it possible to solve? Yes.
- What if lw were replaced with add $1, ...: is forwarding still needed? From where, to where?
- If $1 is only the destination register of the instruction following the lw, forwarding is not needed.
- What if $1 were used to compute the effective address of the sw? It would be a load-use data hazard and would require a stall between the lw and the sw.
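One way the extra MEM-stage forwarding check could look is sketched below. It is an assumption, not something the slides specify: the function name `forward_store_data` is hypothetical, and it presumes the EX/MEM register also carries the register number that supplies the store data (Rt of the sw), which the earlier diagrams do not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the data memory's write-data input should take the
   value held in MEM/WB instead of the store data latched in EX/MEM. */
bool forward_store_data(bool ex_mem_mem_write, uint8_t ex_mem_rt,
                        bool mem_wb_reg_write, uint8_t mem_wb_rd) {
    assert(ex_mem_rt < 32 && mem_wb_rd < 32);
    return ex_mem_mem_write           /* instruction in MEM is a store    */
        && mem_wb_reg_write           /* instruction in WB writes a reg   */
        && mem_wb_rd != 0             /* ... other than $zero             */
        && mem_wb_rd == ex_mem_rt;    /* ... that the store writes out    */
}
```

For lw $1,4($2) followed by sw $1,4($3), the sw is in MEM while the lw is in WB, so the check is true and the loaded value goes straight to the data memory input. For this to remove the stall, the load-use detection in ID would also have to skip the case where the loaded register is used only as store data, which is a further assumption beyond the slides.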