University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.

Presentation transcript:

University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009

Slide 2: Data Hazards in ALU Instructions
Consider this sequence:
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
We can resolve hazards with forwarding
  How do we detect when to forward?

Slide 3: Dependencies & Forwarding

Slide 4: Detecting the Need to Forward
Pass register numbers along pipeline
  e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
ALU operand register numbers in EX stage are given by
  ID/EX.RegisterRs, ID/EX.RegisterRt
Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs   (fwd from EX/MEM pipeline reg)
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt   (fwd from EX/MEM pipeline reg)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs   (fwd from MEM/WB pipeline reg)
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt   (fwd from MEM/WB pipeline reg)

Slide 5: Detecting the Need to Forward
But only if forwarding instruction will write to a register!
  EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
  EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Slide 6: Forwarding Paths

Slide 7: Forwarding Conditions
EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01

Slide 8: Double Data Hazard
Consider the sequence:
  add $1, $1, $2
  add $1, $1, $3
  add $1, $1, $4
Both hazards occur
  Want to use the most recent
Revise MEM hazard condition
  Only fwd if EX hazard condition isn’t true

Slide 9: Revised Forwarding Condition
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
               and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
               and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
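The EX and revised MEM hazard conditions can be sketched in software as a single priority check per ALU operand. This is a minimal illustration, not the hardware itself: the `StageReg` class and the `forward` function are hypothetical names, and the select value 00 (operand taken from the register file) is an assumed encoding that the slides do not spell out.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    """Hypothetical view of the fields the forwarding unit reads from a pipeline register."""
    reg_write: bool = False  # RegWrite control bit
    rd: int = 0              # destination register number (RegisterRd)


def forward(src: int, ex_mem: StageReg, mem_wb: StageReg) -> int:
    """Return the 2-bit mux select for one ALU operand whose register number is src."""
    # EX hazard: the instruction one ahead writes src, so forward from EX/MEM.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10
    # MEM hazard: only reached when the EX hazard test failed, which plays the
    # role of the "not (EX hazard)" term in the revised condition.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01
    # Assumed encoding: 00 selects the value read from the register file.
    return 0b00


# Third add of the double-hazard sequence: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4.
# Both EX/MEM and MEM/WB hold a pending write to $1; the most recent one must win.
forward_a = forward(src=1, ex_mem=StageReg(True, 1), mem_wb=StageReg(True, 1))
forward_b = forward(src=4, ex_mem=StageReg(True, 1), mem_wb=StageReg(True, 1))
```

Calling `forward` once with ID/EX.RegisterRs and once with ID/EX.RegisterRt gives ForwardA and ForwardB respectively.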

Slide 10: Datapath with Forwarding

Slide 11: Load-Use Data Hazard
Need to stall for one cycle

Slide 12: Load-Use Hazard Detection
Check when using instruction is decoded in ID stage
ALU operand register numbers in ID stage are given by
  IF/ID.RegisterRs, IF/ID.RegisterRt
Load-use hazard when
  ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble
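The load-use test above translates almost directly into a boolean function. The sketch below is illustrative only; the function name and the example instruction pair (a `lw` into $2 followed by an `and` that reads $2) are assumptions chosen for the demonstration, not taken from the slides.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the result of a load that is still in EX."""
    # ID/EX.MemRead identifies a load; its destination is ID/EX.RegisterRt.
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# Hypothetical pair: lw $2, 20($1) is in EX while and $4, $2, $5 is in ID.
stall = load_use_hazard(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
```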

Slide 13: How to Stall the Pipeline
Force control values in ID/EX register to 0
  EX, MEM and WB do nop (no-operation)
Prevent update of PC and IF/ID register
  Using instruction is decoded again
  Following instruction is fetched again
  1-cycle stall allows MEM to read data for lw
    Can subsequently forward to EX stage
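The stall mechanism can be sketched as one clock step of the front end of the pipeline. Everything here is a simplified model under stated assumptions: `decode_step` is a hypothetical helper, instructions are represented as strings, and the control dictionary holds only an illustrative subset of the control signals.

```python
from typing import Dict, Tuple

Control = Dict[str, int]


def decode_step(hazard: bool, pc: int, if_id: str, fetched: str,
                decoded: Control) -> Tuple[int, str, Control]:
    """Return (next PC, next IF/ID contents, control values written into ID/EX).

    pc      -- address of the instruction fetched this cycle
    if_id   -- instruction currently held in IF/ID (being decoded)
    fetched -- instruction just read from instruction memory at pc
    decoded -- control values the decoder produced for if_id
    """
    if hazard:
        # Bubble: all control values forced to 0, so EX, MEM and WB do a nop.
        bubble = {name: 0 for name in decoded}
        # PC and IF/ID are not updated: same fetch and same decode next cycle.
        return pc, if_id, bubble
    return pc + 4, fetched, decoded


controls = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0}
stalled = decode_step(True, 8, "and $4, $2, $5", "or $8, $2, $6", controls)
normal = decode_step(False, 8, "and $4, $2, $5", "or $8, $2, $6", controls)
```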

Slide 14: Stall/Bubble in the Pipeline
[Figure: pipeline diagram; annotation: Stall inserted here]

Slide 15: Stall/Bubble in the Pipeline

Slide 16: Datapath with Hazard Detection

Slide 17: Stalls and Performance
Stalls reduce performance
  But are required to get correct results
Compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure
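To illustrate how code arrangement affects stalls, the sketch below counts load-use stalls in a short instruction sequence before and after reordering. The instruction sequence, the tuple encoding, and the helper names are all assumptions made for this example; the model also assumes a 5-stage pipeline with full forwarding, so that only a load immediately followed by a consumer costs a cycle, and it assumes the reordered loads and stores touch different addresses.

```python
from typing import List, Tuple

# (opcode, destination register, source registers); hypothetical encoding.
Instr = Tuple[str, str, Tuple[str, ...]]


def load_use_stalls(program: List[Instr]) -> int:
    """Count one stall for every load whose result is read by the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


def cycles(program: List[Instr], stages: int = 5) -> int:
    """Cycles to drain the program: one per instruction, pipeline fill, plus stalls."""
    return len(program) + (stages - 1) + load_use_stalls(program)


original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load
    ("sw", "", ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load
    ("sw", "", ("$t5", "$t0")),
]
# Hoist the third load so no add immediately follows the load it depends on.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

original_cycles = cycles(original)
reordered_cycles = cycles(reordered)
```

Under these assumptions the reordered sequence avoids both stalls, which is the kind of scheduling a compiler can only do if it knows the pipeline structure.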

Slide 18: Branch Hazards
If branch outcome determined in MEM
  Flush these instructions (set control values to 0)
[Figure: pipeline diagram showing the instructions to flush and the PC]

Slide 19: Reducing Branch Delay
Move hardware to determine outcome to ID stage
  Target address adder
  Register comparator
Example: branch taken
  36: sub $10, $4, $8
  40: beq $1, $3, 7
  44: and $12, $2, $5
  48: or  $13, $2, $6
  52: add $14, $4, $2
  56: slt $15, $6, $…
  …
  …: lw $4, 50($7)
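The target address adder computes the branch destination from the branch's own address and its 16-bit word offset. The sketch below follows the standard MIPS rule (target = PC + 4 + sign-extended offset × 4); the function names are hypothetical, and applying it to the `beq` at address 40 is an illustration rather than something the slide states.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc: int, offset: int) -> int:
    """Target of a MIPS conditional branch located at branch_pc."""
    # The offset counts instructions (words) relative to the instruction after the branch.
    return branch_pc + 4 + (sign_extend16(offset) << 2)


# beq $1, $3, 7 at address 40: 40 + 4 + 7 * 4
target = branch_target(40, 7)
```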

Slide 20: Example: Branch Taken

Slide 21: Example: Branch Taken

Slide 22: Data Hazards for Branches
If a comparison register is a destination of 2nd or 3rd preceding ALU instruction
  add $1, $2, $3
  add $4, $5, $6
  …
  beq $1, $4, target
  [Figure: pipeline diagram, each instruction passing through IF, ID, EX, MEM, WB]
Can resolve using forwarding

Slide 23: Data Hazards for Branches
If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
  Need 1 stall cycle
  lw  $1, addr
  add $4, $5, $6
  beq stalled
  beq $1, $4, target
  [Figure: pipeline diagram, each instruction passing through IF, ID, EX, MEM, WB]

Slide 24: Data Hazards for Branches
If a comparison register is a destination of immediately preceding load instruction
  Need 2 stall cycles
  lw  $1, addr
  beq stalled
  beq stalled
  beq $1, $0, target
  [Figure: pipeline diagram, each instruction passing through IF, ID, EX, MEM, WB]
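The three branch data-hazard cases can be summarized in one small function. This is a sketch under the slides' assumption that the branch is resolved in ID with forwarding available; the function name and the "distance" encoding (1 = immediately preceding instruction) are hypothetical, and the zero-stall result for a load three instructions back is an inference from the other cases rather than something the slides state.

```python
def branch_stalls(producer: str, distance: int) -> int:
    """Stall cycles for a branch resolved in ID.

    producer -- "alu" or "load": kind of instruction writing the compared register
    distance -- how many instructions before the branch the producer sits
                (1 = immediately preceding)
    """
    # Smallest distance at which forwarding alone delivers the value in time for ID:
    # an ALU result exists after EX, a loaded value only after MEM.
    min_safe_distance = {"alu": 2, "load": 3}[producer]
    return max(0, min_safe_distance - distance)


cases = {
    ("alu", 3): branch_stalls("alu", 3),
    ("alu", 2): branch_stalls("alu", 2),
    ("alu", 1): branch_stalls("alu", 1),
    ("load", 2): branch_stalls("load", 2),
    ("load", 1): branch_stalls("load", 1),
}
```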

Slide 25: Dynamic Branch Prediction
In deeper and superscalar pipelines, branch penalty is more significant
Use dynamic prediction
  Branch prediction buffer (aka branch history table)
  Indexed by recent branch instruction addresses
  Stores outcome (taken/not taken)
  To execute a branch
    Check table, expect the same outcome
    Start fetching from fall-through or target
    If wrong, flush pipeline and flip prediction

Slide 26: 1-Bit Predictor: Shortcoming
Inner loop branches mispredicted twice!
  outer: …
         …
  inner: …
         …
         beq …, …, inner
         …
         beq …, …, outer
Mispredict as taken on last iteration of inner loop
Then mispredict as not taken on first iteration of inner loop next time around

Slide 27: 2-Bit Predictor
Only change prediction on two successive mispredictions

Slide 28: Calculating the Branch Target
Even with predictor, still need to calculate the target address
  1-cycle penalty for a taken branch
Branch target buffer
  Cache of target addresses
  Indexed by PC when instruction fetched
    If hit and instruction is branch predicted taken, can fetch target immediately
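A branch target buffer can be modelled as a small cache keyed by the fetch address. The class below is a deliberately simplified sketch: the name `BranchTargetBuffer`, its methods, and the use of an unbounded dictionary (instead of a fixed-size table indexed by low PC bits with tags) are all assumptions made for illustration.

```python
from typing import Dict, Tuple


class BranchTargetBuffer:
    """Toy BTB: maps a branch's PC to (target address, predicted taken)."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[int, bool]] = {}

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record the resolved target and latest prediction for the branch at pc."""
        self.entries[pc] = (target, taken)

    def next_fetch(self, pc: int) -> int:
        """Address to fetch after pc, decided during the fetch of pc itself."""
        entry = self.entries.get(pc)
        if entry is not None and entry[1]:
            return entry[0]   # hit and predicted taken: fetch the target immediately
        return pc + 4         # miss or predicted not taken: fall through


btb = BranchTargetBuffer()
btb.update(pc=40, target=72, taken=True)
after_branch = btb.next_fetch(40)
after_other = btb.next_fetch(44)
```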

Slide 29: Exceptions and Interrupts
“Unexpected” events requiring change in flow of control
  Different ISAs use the terms differently
Exception
  Arises within the CPU
    e.g., undefined opcode, overflow, syscall, …
Interrupt
  From an external I/O controller
Dealing with them without sacrificing performance is hard

Slide 30: Handling Exceptions
In MIPS, exceptions managed by a System Control Coprocessor (CP0)
Save PC of offending (or interrupted) instruction
  In MIPS: Exception Program Counter (EPC)
Save indication of the problem
  In MIPS: Cause register
  We’ll assume 1-bit
    0 for undefined opcode, 1 for overflow
Jump to handler at

Slide 31: An Alternate Mechanism
Vectored Interrupts
  Handler address determined by the cause
Example:
  Undefined opcode: C
  Overflow: C
  …: C
Instructions either
  Deal with the interrupt, or
  Jump to real handler

Slide 32: Handler Actions
Read cause, and transfer to relevant handler
Determine action required
If restartable
  Take corrective action
  Use EPC to return to program
Otherwise
  Terminate program
  Report error using EPC, cause, …

Slide 33: Exceptions in a Pipeline
Another form of control hazard
Consider overflow on add in EX stage
  add $1, $2, $1
  Prevent $1 from being clobbered
  Complete previous instructions
  Flush add and subsequent instructions
  Set Cause and EPC register values
  Transfer control to handler
Similar to mispredicted branch
  Use much of the same hardware

Slide 34: Pipeline with Exceptions

Slide 35: Exception Properties
Restartable exceptions
  Pipeline can flush the instruction
  Handler executes, then returns to the instruction
    Refetched and executed from scratch
PC saved in EPC register
  Identifies causing instruction
  Actually PC + 4 is saved
    Handler must adjust

Slide 36: Exception Example
Exception on add in
  40  sub $11, $2, $4
  44  and $12, $2, $5
  48  or  $13, $2, $6
  4C  add $1, $2, $1
  50  slt $15, $6, $7
  54  lw  $16, 50($7)
  …
Handler
  sw $25, 1000($0)
  sw $26, 1004($0)
  …
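The example can be traced with a small model of what the pipeline does in the cycle the add overflows in EX. This is a sketch under assumptions: `overflow_in_ex` is a hypothetical helper, the pipeline is represented only by the hexadecimal addresses of the instructions in each stage, and the Cause value 1 for overflow follows the 1-bit encoding assumed on the "Handling Exceptions" slide.

```python
from typing import Dict, List


def overflow_in_ex(pipeline: Dict[str, int]) -> Dict[str, object]:
    """pipeline maps stage name -> address of the instruction in that stage."""
    faulting_pc = pipeline["EX"]
    # The faulting instruction and everything younger than it is flushed.
    flushed: List[int] = [pipeline[s] for s in ("IF", "ID", "EX")]
    # Older instructions (further down the pipeline) are allowed to complete.
    completing: List[int] = [pipeline[s] for s in ("MEM", "WB")]
    return {
        "cause": 1,                # 1 = overflow under the assumed 1-bit encoding
        "epc": faulting_pc + 4,    # PC + 4 is what actually gets saved
        "flushed": flushed,
        "completing": completing,
    }


# add at 4C is in EX; slt (50) is in ID, lw (54) in IF, or (48) in MEM, and (44) in WB.
state = overflow_in_ex({"IF": 0x54, "ID": 0x50, "EX": 0x4C, "MEM": 0x48, "WB": 0x44})

# The handler must adjust the saved value to find the instruction to restart.
restart_pc = state["epc"] - 4
```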

Slide 37: Exception Example

Slide 38: Exception Example

Slide 39: Multiple Exceptions
Pipelining overlaps multiple instructions
  Could have multiple exceptions at once
Simple approach: deal with exception from earliest instruction
  Flush subsequent instructions
  “Precise” exceptions
In complex pipelines
  Multiple instructions issued per cycle
  Out-of-order completion
  Maintaining precise exceptions is difficult!
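For the simple in-order case, "earliest instruction" means the instruction furthest along the pipeline. The sketch below picks that exception and lists the stages to flush; the function name, the stage-keyed dictionary, and the example causes (0 for undefined opcode, 1 for overflow, as assumed earlier in the deck) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

STAGE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]  # youngest instruction first


def earliest_exception(raised: Dict[str, int]) -> Optional[Tuple[str, int, List[str]]]:
    """raised maps stage name -> cause for every stage reporting an exception this cycle.

    Returns (stage handled, its cause, stages to flush), or None if nothing was raised.
    """
    if not raised:
        return None
    # In an in-order pipeline the latest stage holds the earliest instruction.
    stage = max(raised, key=STAGE_ORDER.index)
    # Flush the offending instruction and every younger one behind it.
    flushed = STAGE_ORDER[: STAGE_ORDER.index(stage) + 1]
    return stage, raised[stage], flushed


# Undefined opcode detected in ID in the same cycle as an overflow in EX.
handled = earliest_exception({"ID": 0, "EX": 1})
```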

Slide 40: Imprecise Exceptions
Just stop pipeline and save state
  Including exception cause(s)
Let the handler work out
  Which instruction(s) had exceptions
  Which to complete or flush
    May require “manual” completion
Simplifies hardware, but more complex handler software
Not feasible for complex multiple-issue out-of-order pipelines

Slide 41: Fallacies
Pipelining is easy (!)
  The basic idea is easy
  The devil is in the details
    e.g., detecting data hazards
Pipelining is independent of technology
  So why haven’t we always done pipelining?
  More transistors make more advanced techniques feasible
  Pipeline-related ISA design needs to take account of technology trends
    e.g., predicated instructions

Slide 42: Pitfalls
Poor ISA design can make pipelining harder
  e.g., complex instruction sets (VAX, IA-32)
    Significant overhead to make pipelining work
    IA-32 micro-op approach
  e.g., complex addressing modes
    Register update side effects, memory indirection
  e.g., delayed branches
    Advanced pipelines have long delay slots