1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

Slides:



Advertisements
Similar presentations
Pipeline Hazards CS365 Lecture 10. D. Barbara Pipeline Hazards CS465 2 Review  Pipelined CPU  Overlapped execution of multiple instructions  Each on.
Advertisements

ECE 445 – Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Part 2 - Data Hazards and Forwarding 3/24/04++
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.
Pipelining III Andreas Klappenecker CPSC321 Computer Architecture.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Chapter Six Enhancing Performance with Pipelining
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve performance by increasing instruction throughput.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Pipelined Datapath and Control
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-2 Read Section 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.
CMPE 421 Parallel Computer Architecture
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
Winter 2002CSE Topic Branch Hazards in the Pipelined Processor.
2/15/02CSE Data Hazzards Data Hazards in the Pipelined Implementation.
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
Computing Systems Pipelining: enhancing performance.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.
PROCESSOR PIPELINING YASSER MOHAMMAD. SINGLE DATAPATH DESIGN.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Chapter Six.
Computer Organization
Stalling delays the entire pipeline
Morgan Kaufmann Publishers The Processor
Single Clock Datapath With Control
Morgan Kaufmann Publishers The Processor
ECS 154B Computer Architecture II Spring 2009
Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Csci 136 Computer Architecture II – Data Hazard, Forwarding, Stall
Chapter 6 Enhancing Performance with Pipelining
Morgan Kaufmann Publishers The Processor
Morgan Kaufmann Publishers The Processor
Computer Organization CS224
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
Pipelining in more detail
Chapter Six.
Chapter Six.
Control unit extension for data hazards
The Processor Lecture 3.5: Data Hazards
CSC3050 – Computer Architecture
Pipelining (II).
Control unit extension for data hazards
Morgan Kaufmann Publishers The Processor
Control unit extension for data hazards
Systems Architecture II
ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.
Presentation transcript:

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann, 2007) Pipelining Hazards & Solution

2 COURSE CONTENTS Introduction Instructions Computer Arithmetic Performance Processor: Datapth Processor: Control è Pipelining Techniques Memory Input/Output Devices

3 PIPELINING & ADVANCED TECHNIQUES Data Hazards, Forwarding, Stalls Control Hazards & Exceptions

4 Problem with starting next instruction before first is finished dependencies that “go backward in time” are data hazards Data Hazards IMReg IMReg CC 1CC 2CC 3CC 4CC 5CC 6 Time (in clock cycles) sub $2, $1, $3 Program execution order (in instructions) and $12, $2, $5 IM Reg DMReg IMDMReg IMDMReg CC 7CC 8CC /–20–20–20–20–20 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Value of register $2: DM Reg Reg Reg Reg DM

5 Have compiler guarantee no hazards Method 1: Compiler inserts “nop” sub$2, $1, $3 nop nop and $12, $2, $5 or$13, $6, $2 add$14, $2, $2 sw$15, 100($2) Problem: this really slows us down! Software (Compiler) Solution 1

6 Method 2: Move independent instructions around to eliminate data hazards and fill in bubbles (scheduling) I1and $18, $9, $10 I2 sub $2, $1, $3 I3and $12, $2, $5 I4or $13, $6, $2 I5add $14, $2, $2 I6sw $15, 100($2) I7sub $16, $7, $8 I8add $17, $8, $9 ê Becomes I1and $18, $9, $10 I2 sub $2, $1, $3 I7sub $16, $7, $8 I8add $17, $8, $9 I3and $12, $2, $5 I4or $13, $6, $2 I5add $14, $2, $2 I6sw $15, 100($2) Software (Compiler) Solution 2

7 Use temporary results, don’t wait for them to be written ALU forwarding: ALU output flows in pipe registers and forwarded to ALU input Register file forwarding to handle read/write to same register in same cycle Hardware Solution: Forwarding

8 Forwarding Hardware Forwarding unit: compares appropriate register read & register write addresses of two appropriate stages; if equal, directs multiplexors to forward appropriate data to ALU inputs earlier ForwardA ForwardB

9 Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register thus, we need a hazard detection unit to “stall” the instruction Can't Always Forward

10 Stalling We can stall the pipeline by keeping an instruction in the same stage Same effect as inserting a nop instruction lw $2, 20($1) Program execution order (in instructions) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 slt $1, $6, $7 Reg IM Reg Reg IM DM CC 1CC 2CC 3CC 4CC 5CC 6 Time (in clock cycles) IM Reg DMReg IM IM DM Reg IM DM Reg CC 7CC 8CC 9CC 10 DM Reg RegReg Reg bubble Note: Figure bubble drawn a little differently.

11 Hazard Detection Hardware Hazard detection unit: If load destination register number (at ID/EX) matches one of next instruction’s source register numbers (at IF/ID): stall pipeline by (1) deasserting (set to 0s) all control signals in EX, MEM, & WB stages (= insert a “nop”); (2) disabling IF/IDWrite (= repeat same ID stage one more time) and PCWrite (= repeat same IF stage one more time)

12 PC Instruction memory Registers M u x M u x M u x Control ALU EX M WB M WB WB ID/EX EX/MEM MEM/WB Data memory M u x Hazard detection unit Forwarding unit 0 M u x IF/ID I n s t r u c t i o n ID/EX.MemRead I F / I D W r i t e P C W r i t e ID/EX.RegisterRt IF/ID.RegisterRd IF/ID.RegisterRt IF/ID.RegisterRt IF/ID.RegisterRs Rt Rs Rd Rt EX/MEM.RegisterRd MEM/WB.RegisterRd Hazard Detection Hardware

13 Stalling until branch complete is too slow, hence execute guess and move on! At beq, we predict “branch not taken” (branch prediction) and continue the pipe by following instructions When we decide to branch, those instructions are already in the pipe! Need to add hardware to flush instructions if we are wrong Control (Branch) Hazards Reg Reg CC 1 Time (in clock cycles) 40 beq $1, $3, 7 Program execution order (in instructions) IMReg IMDM IMDM IMDM DM DMReg RegReg Reg Reg Reg IM 44 and $12, $2, $5 48 or $13, $6, $2 52 add $14, $2, $2 72 lw $4, 50($7) CC 2CC 3CC 4CC 5CC 6CC 7CC 8CC 9 Reg

14 Possible solutions: Reducing delay of branches: è Move branch execution earlier, then fewer instructions need to be flushed è Move branch decision/preparation to ID stage: è (1) move branch adder from MEM stage to ID stage (since we already have PC value and immediate field in IF/ID pipeline register); è (2) instead of using the ALU to subtract, we test equality of register values (“beq”) by XOR respective bits and then OR the results (0 = equal). This is much faster than using ALU and can be placed in ID stage è Hence only one instruction to flush if branch is taken The instruction to be flushed is in IF stage, hence we add a control line IF.Flush to zeros the instruction field of IF/ID pipeline register --> transform it into “nop” Control (Branch) Hazards

15 Flushing Instructions

16 Dynamic Branch Prediction Dynamic branch prediction – prediction of branches at runtime Several branch prediction techniques: è assume branch not taken (as we have seen so far) è check branch history (need branch prediction buffer or branch history table) – 1 or more bits indicating whether the branch was recently taken è “branch delay slot”: the next instruction after a branch is always executed; rely on compiler to “fill” the slot with something useful è other more advanced techniques

17 Exception Handling Exception – another form of control hazard Exception handling: è flush instructions that followed the interrupted instruction (in addition to IF.Flush, need ID.Flush, EX.Flush, deasserting control lines, preventing writing of inappropriate results) è save return address in EPC è fetch instruction from service routine

18 CPI in Pipelined CPU Example: Assume: memory access = 200 ps, ALU operation = 100 ps, register file read/write = 50 ps half of load instructions are immediately followed by an instruction that uses the results branch delay on misprediction is 1 clock cycle and 1/4 of branches are mispreidcted “jump” pays 1 full clock cycle of delay (calculate target address in ID stage, then know where to fetch next instruction), so their average time in pipe is 2 clock cycles Ignore other hazards Solution: For pipelined, load takes 1 cycle if no load-use dependency and 2 cycles when there is; hence average load = 1.5 clock cycles Store = 1 clock cycle; R-type = 1 clock cycle; jump = 2 cycles Branch takes 1 cycle when predicted correctly, 2 when not; hence average branch = 1.25 cycles CPI = 0.22 x x x x x 2 = 1.17 Cycle time = 200 ps (longest functional unit), average instruction time = 1.17 x 200 ps = 234 ps Comparing to multi-cycle 4.04 x 200 ps = 808 ps (pipelined is 3.45 times faster) and single-cycle = 600 ps (pipelined is 2.56 times faster)

19 Some Examples All modern processors are very complicated The last Intel x86 processor without pipelining was 386 (1985). Later Intel x86 processors (486, Pentium, Pentium Pro, MMX, Pentium II, Pentium III, Pentium IV, …) employ successively more sophisticated pipelining approaches Each Intel processor since Pentium is capable of executing more than one instruction per clock (all superscalar) Each Intel processor since 486 uses a combination of hardwired control (for simple instructions) and microcode control (for more complex instructions) The first MIPS processor (R2000) was pipelined

20 Chapter Summary Three modes of execution: single-cycle, multi-cycle, pipelined Pipelined control strives for 1 clock cycle per instruction, improve instruction throughput, by overlapping of instruction executions Hazards: structural, control, data --> limit pipeline benefits Pipelined datapath: pipeline registers, use of logical components Pipelined control: passing of control signals in pipeline registers Data hazards: compiler solutions, forwarding (forwarding unit), stalling (hazard detection unit) Control hazards: branch prediction, flushing