Note how everything goes left to right, except …

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.
ECE 445 – Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Part 2 - Data Hazards and Forwarding 3/24/04++
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Chapter Six Enhancing Performance with Pipelining
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Lecture 28: Chapter 4 Today’s topic –Data Hazards –Forwarding 1.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
Supplementary notes for pipelining LW ____,____ SUB ____,____,____ BEQ ____,____,____ ; assume that, condition for branch is not satisfied OR ____,____,____.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Pipelined Datapath and Control
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-2 Read Section 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.
Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
2/15/02CSE Data Hazzards Data Hazards in the Pipelined Implementation.
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Stalling delays the entire pipeline
Single-Cycle Datapath and Control
CDA 3101 Spring 2016 Introduction to Computer Organization
Computer Architecture
Single Clock Datapath With Control
Appendix C Pipeline implementation
Single Cycle Processor
ECS 154B Computer Architecture II Spring 2009
ECS 154B Computer Architecture II Spring 2009
\course\cpeg323-08F\Topic6b-323
ECE232: Hardware Organization and Design
Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Morgan Kaufmann Publishers The Processor
Csci 136 Computer Architecture II – Data Hazard, Forwarding, Stall
Morgan Kaufmann Publishers The Processor
Chapter 4 The Processor Part 2
SOLUTIONS CHAPTER 4.
Single-cycle datapath, slightly rearranged
Computer Organization CS224
A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID
\course\cpeg323-05F\Topic6b-323
Pipelined Control (Simplified)
Pipeline control unit (highly abstracted)
The Processor Lecture 3.6: Control Hazards
The Processor Lecture 3.2: Building a Datapath with Control
The Processor Lecture 3.5: Data Hazards
Instruction Execution Cycle
Pipeline control unit (highly abstracted)
Pipeline Control unit (highly abstracted)
Pipelining (II).
Morgan Kaufmann Publishers The Processor
Introduction to Computer Organization and Architecture
Pipelining - 1.
A relevant question Assuming you’ve got: One washer (takes 30 minutes)
Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
©2003 Craig Zilles (derived from slides by Howard Huang)
Pipelined datapath and control
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

Note how everything goes left to right, except … 1 PCSrc 4 IF/ID ID/EX EX/MEM MEM/WB Add Add P C Shift left 2 RegWrite Read register 1 Read data 1 MemWrite ALU Read address Instruction [31-0] Zero Read register 2 Read data 2 1 Result Address Write register Data memory Instruction memory MemToReg Registers ALUOp Write data ALUSrc Write data Read data 1 Instr [15 - 0] Sign extend RegDst MemRead Instr [20 - 16] 1 Instr [15 - 11]

©2003 Craig Zilles (derived from slides by Howard Huang) Forwarding Previously, we introduced a pipelined MIPS processor which executes several instructions simultaneously. Each instruction requires five stages, and five cycles, to complete. Each stage uses different functional units of the datapath. So we can execute up to five instructions in any clock cycle, with each instruction in a different stage and using different hardware. Today we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding. September 10, 2018 ©2003 Craig Zilles (derived from slides by Howard Huang)

The pipelined datapath Read address Instruction memory [31-0] Address Write data Data MemWrite MemRead 1 MemToReg 4 Shift left 2 Add ALUSrc Result Zero ALU ALUOp Instr [15 - 0] RegDst register 1 register 2 register data 2 data 1 Registers RegWrite Instr [15 - 11] Instr [20 - 16] IF/ID ID/EX EX/MEM MEM/WB Control M WB P C PCSrc Sign extend EX September 10, 2018 Forwarding

Pipeline diagram review Clock cycle 1 2 3 4 5 6 7 8 9 lw $8, 4($29) IF ID EX MEM WB sub $2, $4, $5 and $9, $10, $11 or $16, $17, $18 add $13, $14, $0 This diagram shows the execution of an ideal code fragment. Each instruction needs a total of five cycles for execution. One instruction begins on every clock cycle for the first five cycles. One instruction completes on each cycle from that time on. September 10, 2018 Forwarding

Our examples are too simple Here is the example instruction sequence used to illustrate pipelining on the previous page. lw $8, 4($29) sub $2, $4, $5 and $9, $10, $11 or $16, $17, $18 add $13, $14, $0 The instructions in this example are independent. Each instruction reads and writes completely different registers. Our datapath handles this sequence easily, as we saw last time. But most sequences of instructions are not independent! September 10, 2018 Forwarding

An example with dependencies sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) September 10, 2018 Forwarding

Data hazards in the pipeline diagram Clock cycle 1 2 3 4 5 6 7 8 9 sub $2, $1, $3 IF ID EX MEM WB and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) The SUB instruction does not write to register $2 until clock cycle 5. This causes two data hazards in our current pipelined datapath. The AND reads register $2 in cycle 3. Since SUB hasn’t modified the register yet, this will be the old value of $2, not the new one. Similarly, the OR instruction uses register $2 in cycle 4, again before it’s actually updated by SUB. September 10, 2018 Forwarding

Things that are okay Clock cycle 1 2 3 4 5 6 7 8 9 sub $2, $1, $3 IF ID EX MEM WB and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) The ADD instruction is okay, because of the register file design. Registers are written at the beginning of a clock cycle. The new value will be available by the end of that cycle. The SW is no problem at all, since it reads $2 after the SUB finishes. September 10, 2018 Forwarding

Dependency arrows Clock cycle 1 2 3 4 5 6 7 8 9 sub $2, $1, $3 IF ID EX MEM WB and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Arrows indicate the flow of data between instructions. The tails of the arrows show when register $2 is written. The heads of the arrows show when $2 is read. Any arrow that points backwards in time represents a data hazard in our basic pipelined datapath. Here, hazards exist between instructions 1 & 2 and 1 & 3. September 10, 2018 Forwarding

A fancier pipeline diagram Clock cycle 1 2 3 4 5 6 7 8 9 DM Reg IM sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) DM Reg IM DM Reg IM DM Reg IM DM Reg IM September 10, 2018 Forwarding

A more detailed look at the pipeline We have to eliminate the hazards, so the AND and OR instructions in our example will use the correct value for register $2. When is the data actually produced and consumed? What can we do? Clock cycle 1 2 3 4 5 6 7 sub $2, $1, $3 IF ID EX MEM WB and $12, $2, $5 or $13, $6, $2 September 10, 2018 Forwarding

Pipleline Registers to the rescue! Pipeline stages communicate through pipeline registers: IF/ID ID/EX EX/MEM MEM/WB We “forward” data from pipeline registers to later instructions Instruction memory Data 1 PC ALU Registers Rd Rt IF/ID ID/EX EX/MEM MEM/WB ALU output available here

Forwarding The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5 We forward that value to later instructions, to prevent data hazards: In clock cycle 4, AND gets the value $1 - $3 from EX/MEM In cycle 5, OR gets that same result from MEM/WB Clock cycle 1 2 3 4 5 6 7 DM Reg IM sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 DM Reg IM DM Reg IM

Outline of forwarding hardware A forwarding unit selects the correct ALU inputs for the EX stage: No hazard: ALU’s operands come from the register file, like normal Data hazard: operands come from either the EX/MEM or MEM/WB pipeline registers instead The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB DM Reg IM sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 DM Reg IM DM Reg IM

Simplified datapath with forwarding muxes ForwardA Instruction memory Data 1 PC ALU Registers Rd Rt IF/ID ID/EX EX/MEM MEM/WB 2 ForwardB September 10, 2018 Forwarding

Detecting EX/MEM data hazards When do we need to know that a hazard exists? So how can the hardware determine if a hazard exists? DM Reg IM sub $2, $1, $3 and $12, $2, $5 September 10, 2018 Forwarding

EX/MEM data hazard equations The first ALU source comes from the pipeline register when necessary. if (EX/MEM.RegWrite = 1 and EX/MEM.RegisterRd = ID/EX.RegisterRs) then ForwardA = 2 The second ALU source is similar. and EX/MEM.RegisterRd = ID/EX.RegisterRt) then ForwardB = 2 DM Reg IM sub $2, $1, $3 and $12, $2, $5 September 10, 2018 Forwarding

Detecting MEM/WB data hazards A MEM/WB hazard may occur between an instruction in the EX stage and the instruction from two cycles ago. One new problem is if a register is updated twice in a row. add $1, $2, $3 add $1, $1, $4 sub $5, $5, $1 Register $1 is written by both of the previous instructions; from which instruction should it receive its value? DM Reg IM add $1, $2, $3 add $1, $1, $4 sub $5, $5, $1 September 10, 2018 Forwarding

MEM/WB hazard equations Here is an equation for detecting and handling MEM/WB hazards for the first ALU source. if (MEM/WB.RegWrite = 1 and MEM/WB.RegisterRd = ID/EX.RegisterRs and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs or EX/MEM.RegWrite = 0) then ForwardA = 1 The second ALU operand is handled similarly. and MEM/WB.RegisterRd = ID/EX.RegisterRt and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt or EX/MEM.RegWrite = 0) then ForwardB = 1 September 10, 2018 Forwarding

Simplified datapath with forwarding IF/ID ID/EX EX/MEM MEM/WB PC 1 2 ForwardA Registers Instruction memory ALU 1 2 Data memory 1 ForwardB Rt 1 Rd EX/MEM.RegisterRd Rs ID/EX. RegisterRt Forwarding Unit EX/MEM.RegWrite MEM/WB.RegisterRd ID/EX. RegisterRs MEM/WB.RegWrite September 10, 2018 Forwarding

The forwarding unit The forwarding unit has several control signals as inputs. ID/EX.RegisterRs EX/MEM.RegisterRd MEM/WB.RegisterRd ID/EX.RegisterRt EX/MEM.RegWrite MEM/WB.RegWrite (The two RegWrite signals are not shown in the diagram, but they come from the control unit.) The fowarding unit outputs are selectors for the ForwardA and ForwardB multiplexers attached to the ALU. These outputs are generated from the inputs using the equations on the previous pages. Some new buses route data from pipeline registers to the new muxes. September 10, 2018 Forwarding

Example sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Assume again each register initially contains its number plus 100. After the first instruction, $2 should contain -2 (101 - 103). The other instructions should all use -2 as one of their operands. We’ll try to keep the example short. Assume no forwarding is needed except for register $2. We’ll skip the first two cycles, since they’re the same as before. September 10, 2018 Forwarding

Clock cycle 3 September 10, 2018 Forwarding IF: or $13, $6, $2 ID: and $12, $2, $5 EX: sub $2, $1, $3 IF/ID ID/EX EX/MEM MEM/WB PC 101 2 1 2 102 101 5 Registers Instruction memory ALU 103 X 105 1 2 103 -2 Data memory X 1 5 (Rt) 1 2 12 (Rd) 2 EX/MEM.RegisterRd 2 (Rs) ID/EX. RegisterRt Forwarding Unit 3 ID/EX. RegisterRs 1 MEM/WB.RegisterRd September 10, 2018 Forwarding

Clock cycle 4: forwarding $2 from EX/MEM IF: add $14, $2, $2 ID: or $13, $6, $2 EX: and $12, $2, $5 MEM: sub $2, $1, $3 IF/ID ID/EX EX/MEM MEM/WB PC 102 6 1 2 106 -2 2 2 Registers Instruction memory ALU -2 105 X 102 1 2 105 104 Data memory X 1 2 (Rt) 1 12 13 (Rd) 12 EX/MEM.RegisterRd 6 (Rs) ID/EX. RegisterRt 2 Forwarding Unit 5 2 MEM/WB.RegisterRd ID/EX. RegisterRs -2 September 10, 2018 Forwarding

Clock cycle 5: forwarding $2 from MEM/WB IF: sw $15, 100($2) ID: add $14, $2, $2 EX: or $13, $6, $2 MEM: and $12, $2, $5 WB: sub $2, $1, $3 IF/ID ID/EX EX/MEM MEM/WB PC 106 2 1 2 -2 106 2 Registers Instruction memory ALU 104 102 2 -2 1 2 -2 -2 Data memory -2 -2 X 1 -2 1 2 (Rt) 1 13 14 (Rd) 13 EX/MEM.RegisterRd 2 2 (Rs) ID/EX. RegisterRt 12 Forwarding Unit 2 ID/EX. RegisterRs 6 2 MEM/WB.RegisterRd 104 -2 September 10, 2018 Forwarding

Lots of data hazards The first data hazard occurs during cycle 4. The forwarding unit notices that the ALU’s first source register for the AND is also the destination of the SUB instruction. The correct value is forwarded from the EX/MEM register, overriding the incorrect old value still in the register file. A second hazard occurs during clock cycle 5. The ALU’s second source (for OR) is the SUB destination again. This time, the value has to be forwarded from the MEM/WB pipeline register instead. There are no other hazards involving the SUB instruction. During cycle 5, SUB writes its result back into register $2. The ADD instruction can read this new value from the register file in the same cycle. September 10, 2018 Forwarding

Complete pipelined datapath...so far ID/EX WB EX/MEM Control M WB MEM/WB IF/ID EX M WB PC Read register 1 Read data 1 1 2 Addr Instr Read register 2 ALU Zero ALUSrc Write register Read data 2 Result Address 1 2 Instruction memory 1 Data memory Write data Registers Write data Read data Instr [15 - 0] 1 RegDst Extend Rt 1 Rd EX/MEM.RegisterRd Rs Forwarding Unit MEM/WB.RegisterRd September 10, 2018 Forwarding

What about stores? Two “easy” cases: 1 2 3 4 5 6 add $1, $2, $3 DM Reg IM add $1, $2, $3 sw $4, 0($1) IM Reg DM Reg 1 2 3 4 5 6 DM Reg IM add $1, $2, $3 sw $1, 0($4) IM Reg DM Reg September 10, 2018 Forwarding

Store Bypassing: Version 1 EX: sw $4, 0($1) MEM: add $1, $2, $3 IF/ID ID/EX EX/MEM MEM/WB PC Read register 1 Read data 1 1 2 Addr Instr Read register 2 ALU Zero ALUSrc Write register Read data 2 Result Address 1 2 Instruction memory 1 Data memory Write data Registers Write data Read data Instr [15 - 0] 1 RegDst Extend Rt 1 Rd EX/MEM.RegisterRd Rs Forwarding Unit MEM/WB.RegisterRd September 10, 2018 Forwarding

Store Bypassing: Version 2 EX: sw $1, 0($4) MEM: add $1, $2, $3 IF/ID ID/EX EX/MEM MEM/WB PC Read register 1 Read data 1 1 2 Addr Instr Read register 2 ALU Zero ALUSrc Write register Read data 2 Result Address 1 2 Instruction memory 1 Data memory Write data Registers Write data Read data Instr [15 - 0] 1 RegDst Extend Rt 1 Rd EX/MEM.RegisterRd Rs Forwarding Unit MEM/WB.RegisterRd September 10, 2018 Forwarding

What about stores? A harder case: In what cycle is: The load value available? The store value needed? What do we have to add to the datapath? 1 2 3 4 5 6 DM Reg IM lw $1, 0($2) sw $1, 0($4) IM Reg DM Reg September 10, 2018 Forwarding

Load/Store Bypassing: Extend the Datapath ForwardC IF/ID ID/EX EX/MEM 1 MEM/WB PC Read register 1 Read data 1 1 2 Addr Instr Read register 2 ALU Zero ALUSrc Write register Read data 2 Result Address 1 2 Instruction memory 1 Data memory Write data Registers Write data Read data Instr [15 - 0] 1 RegDst Extend Rt 1 Rd EX/MEM.RegisterRd Rs Sequence : lw $1, 0($2) sw $1, 0($4) Forwarding Unit MEM/WB.RegisterRd September 10, 2018 Forwarding

Miscellaneous comments Each MIPS instruction writes to at most one register. This makes the forwarding hardware easier to design, since there is only one destination register that ever needs to be forwarded. Forwarding is especially important with deep pipelines like the ones in all current PC processors. Section 6.4 of the textbook has some additional material not shown here. Their hazard detection equations also ensure that the source register is not $0, which can never be modified. There is a more complex example of forwarding, with several cases covered. Take a look at it! September 10, 2018 Forwarding

Summary In real code, most instructions are dependent upon other ones. This can lead to data hazards in our original pipelined datapath. Instructions can’t write back to the register file soon enough for the next two instructions to read. Forwarding eliminates data hazards involving arithmetic instructions. The forwarding unit detects hazards by comparing the destination registers of previous instructions to the source registers of the current instruction. Hazards are avoided by grabbing results from the pipeline registers before they are written back to the register file. Next time we’ll finish up pipelining. Forwarding can’t save us in some cases involving lw. We still haven’t talked about branches for the pipelined datapath. September 10, 2018 Forwarding