Basic Pipelining CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos Prvulovic.

Slides:



Advertisements
Similar presentations
Execution Cycle. Outline (Brief) Review of MIPS Microarchitecture Execution Cycle Pipelining Big vs. Little Endian-ness CPU Execution Time 1 IF ID EX.
Advertisements

ISA Design for the Project CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos.
Chapter 1. Basic Structure of Computers
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 13 - A Verilog.
Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
1 ITCS 3181 Logic and Computer Systems B. Wilkinson Slides9.ppt Modification date: March 30, 2015 Processor Design.
1 Datapath and Control (Multicycle datapath) CDA 3101 Discussion Section 11.
CIS 314 Fall 2005 MIPS Datapath (Single Cycle and Multi-Cycle)
1  1998 Morgan Kaufmann Publishers We will be reusing functional units –ALU used to compute address and to increment PC –Memory used for instruction and.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Single-Cycle Processor Design CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos.
Computer Architecture
Microprocessor Design Multi-cycle Datapath Nia S. Bradley Vijay.
CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
PCPC addr instr INSTR MEM R1 R2 WR W Data R Data 1 R Data 2 ALU DATA MEM ALU CTRL rs rt op +4 shift 2 zero BRANCH CTRL muxmux sign extend immed 1632 ADDADD.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
VHDL Development for ELEC7770 VLSI Project Chris Erickson Graduate Student Department of Electrical and Computer Engineering Auburn University, Auburn,
Fall 2007 MIPS Datapath (Single Cycle and Multi-Cycle)
L18 – Pipeline Issues 1 Comp 411 – Spring /03/08 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
CSCE 212 Quiz 9 – 3/30/11 1.What is the clock cycle time based on for single-cycle and for pipelining? 2.What two actions can be done to resolve data hazards?
Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
10/18/2005Comp 120 Fall October Questions? Instruction Execution.
Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
CS 300 – Lecture 24 Intro to Computer Architecture / Assembly Language The LAST Lecture!
A Tour of A Five-Stage Processor CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof.
CMPE 421 Advanced Computer Architecture Supplementary material for Pipelining PART1.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Computer Organization CS224 Fall 2012 Lesson 28. Pipelining Analogy  Pipelined laundry: overlapping execution l Parallelism improves performance §4.5.
1 COMP541 Multicycle MIPS Montek Singh Apr 4, 2012.
Morgan Kaufmann Publishers
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
Pipeline Hazards. CS5513 Fall Pipeline Hazards Situations that prevent the next instructions in the instruction stream from executing during its.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
Sample Code (Simple) Run the following code on a pipelined datapath: add1 2 3 ; reg 3 = reg 1 + reg 2 nand ; reg 6 = reg 4 & reg 5 lw ; reg.
CDA 3101 Fall 2013 Introduction to Computer Organization Multicycle Datapath 9 October 2013.
1 COMP541 Datapaths II & Control I Montek Singh Mar 22, 2010.
Design, Memory, and Debugging CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos.
COMP541 Multicycle MIPS Montek Singh Mar 25, 2010.
Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
Branch Prediction CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos Prvulovic.
PC Instruction Memory Address Instr. [31-0] 4 Fig 4.6 p 309 Instruction Fetch.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Circuit Optimization CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos Prvulovic.
PROCESSOR PIPELINING YASSER MOHAMMAD. SINGLE DATAPATH DESIGN.
Multicycle datapath.
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.
L17 – Pipeline Issues 1 Comp 411 – Fall /23/09 CPU Pipelining Issues Read Chapter This pipe stuff makes my head hurt! What have you been.
RISC Pipelining CS 147 Spring 2011 Kui Cheung
CS161 – Design and Architecture of Computer Systems
Exceptions Another form of control hazard Could be caused by…
RISC Pipelining RISC Pipelining CS 147 Spring 2011 Kui Cheung.
Processor (I).
The fetch-execute cycle
Pipelining review.
Pipelining in more detail
The Processor Lecture 3.6: Control Hazards
Guest Lecturer TA: Shreyas Chand
Multicycle Approach We will be reusing functional units
Instruction Execution Cycle
PIPELINING Santosh Lakkaraju CS 147 Dr. Lee.
Multicycle Design.
Pipelining (II).
Morgan Kaufmann Publishers The Processor
CS161 – Design and Architecture of Computer Systems
Presentation transcript:

Basic Pipelining CS 3220 Fall 2014 Hadi Esmaeilzadeh Georgia Institute of Technology Some slides adopted from Prof. Milos Prvulovic

Two-Stage Pipeline  Why not go directly to five stages? – This is what we had in CS 2200!  Will have more stages in Project 3, but – We want to start with something easier Lots of things become more complicated with more stages Let’s first deal with simple versions of some of these complications – Will learn how to decide when/how to add stages Start with two, then decide if we want more and where to split 25 Feb 2014Basic Pipeline2

Pipelining Decisions  What gets done in which stage – Memory address for data reads must come from FFs Memory read must be at the start of some stage With only two stages, this has to be stage two! – Must be in first stage Fetch, and all the other stuff needed for memory address: decode, read regs, ALU (or at least the add for memaddr) – Must be in last stage Read memory, write result to register – Where does branch/jump stuff go As early as possible (will see why) => first stage 25 Feb 2014Basic Pipeline3

Control Creating a 2-stage pipeline 25 Feb 2014Basic Pipeline4 Instr Mem Instr Mem PCPC PCPC MXMX RF Data Mem Data Mem MXMX MXMX MXMX MXMX SE 4 A D

Control Pipeline FFs in Verilog 25 Feb 2014Basic Pipeline5 Instr Mem PCPC MXMX RF Data Mem MXMX MXMX SE ADD ALU 4 A D assign dmemaddr=aluout; reg [31:0] aluout_M; clk) aluout_M<=aluout; assign dmemaddr=aluout_M;

Two-Stage Pipeline  So far we have – Stage 1: Fetch, ReadReg, ALU – Stage 2: Read/Write Memory, WriteReg  What is left to decide? – Where is the PC incremented? Input: PC (available at start of stage 1) Work: Increment (doable in one cycle) Do it in stage 1! – Where do we make branch taken/not-taken decisions? Depends… try in cycle 1, but if this is critical path, try to break it up 25 Feb 2014Basic Pipeline6

Keep things simple  Our goal is to get this working! – Handling each type of hazard will complicate things – Avoid doing things that create hazards  Structural hazards? – Noooo! We will put enough hardware to not have any!  Control hazards? 25 Feb 2014Basic Pipeline7

Data hazard example ADD R1,R2,R3 ADD R4,R2,R1  What happens in our two stage pipeline? C1: aluout_M<=R2+R3 C2: R1<=aluout_M; aluout_M<=R2+R1 (problem!) C3: R4<=aluout_M 25 Feb 2014Basic Pipeline8

Data hazard example 2 LW R1,0(R2) ADDR3,R1,R4  What happens in our two stage pipeline? C1: aluout_M<=0+R2 C2: R1<=mem[aluout_M]; aluout_M<=R1+R4 25 Feb 2014Basic Pipeline9

Preventing data hazards  Simplest solution for HW designers – Tell programmers not to create data hazards! ADD R1,R2,R3 ADD R4,R2,R1 25 Feb 2014Basic Pipeline10 What if we have nothing to put here? NOP

What is a NOP?  Does not do anything  How about AND R0,R0,R0 ? – Whatever is in R0, leaves it unchanged  Why is this not a good NOP? ; Initially R0 is some random value XORR0,R0,R0 NOP ; Becomes AND R0,R0,R0 ADDISP,R0,StackTop ADDI A1,R0,1 25 Feb 2014Basic Pipeline11 What is in SP now? What is in A1 now?

Need a real NOP  Actually does nothing – Not just “writes the same value” – wrreg,wrmem, isbranch, isjump, etc. must all be zero!  None of our instructions is a truly perfect NOP  So let’s add one! – Hijack existing instruction, e.g. AND R0,R0,R0 ? It works! This instruction is not supposed to do anything anyway! – Add a separate instruction (and spend an opcode) Also works! But spend a secondary opcode  Let’s use ALUR with op2=1111 (and all other bits 0) – NOP translates to instruction word 32’h000000F 25 Feb 2014Basic Pipeline12

Control hazards  No problem if all insts update PC in first stage – PC+4 is easy, but branches and jumps not so easy – What if PC+4 in cycle 1, but the rest in cycle 2 JALRA,Func(Zero) BNE RV,Zero,BadResult ADDT0,RV,RV … BadResult: … Func: 25 Feb 2014Basic Pipeline13 C1: PC<=PC+4 C2: Fetch C2: PC<=Func C3: Fetch C3: PC=<BadResult C4: Fetch

Preventing control hazards  Simplest solution for HW designers – Tell programmers that branch/jump has delayed effect – Delay slot: inst after branch/jump executed anyway JALRA,Func(Zero) NOP ; Delay slot BNE RV,Zero,BadResult NOP ; Delay slot … 25 Feb 2014Basic Pipeline14

Deeper pipelines  Need more NOPs – More instructions between reg write and reg read Hard to find useful insts to put there => NOPs – More delay slots to survive control hazards Hard to find useful insts to put there => NOPs  Problem 1: Performance – Note that CPI is 1, but program has more instructions!  Problem 2: Portability – Program must change if we change the pipeline What works for 2-stage needs more NOPs to run on 3-stage, etc. 25 Feb 2014Basic Pipeline15

Architecture vs. Microarchitecture  Architecture – What the programmer must know about our machine  Microarchitecture – How we implement our processor – Can write correct code without knowing this  Our hazards solution – Pipelining = microachitecture – Delay slots, etc. = architecture – We changed architecture (in a backward-incompatible way) to make our microarchitecture work correctly! 25 Feb 2014Basic Pipeline16

Proper handling of hazards  Programs (executables) don’t change – Test2.mif, Sorter2.mif from Project2 still run correctly  Must fight hazard problems in hardware – Our big weapon: “flush” an instruction from pipeline – Our better weapon: “stall” some stages in the pipeline – Our precision weapon: forwarding Can’t fix everything, but helps reduce the number of flushed insts 25 Feb 2014Basic Pipeline17

What is a flush  Flush an inst from some stage of the pipeline == convert the instruction into a real NOP  Note: cannot flush any inst from any stage – Can’t flush inst that already modified architected state E.g. if SW already wrote to memory, can’t flush it correctly E.g. if BEQ/BNE/JAL already modified the PC, can’t flush it correctly  To prevent hazards from doing damage – Must detect which instructions should be flushed – And then flush these instructions early enough! 25 Feb 2014Basic Pipeline18

The Rules of Flushing  When we must flush an instruction – When not doing so will produce wrong result  When we can flush an instruction – Almost any time we want (if early enough), but must guarantee forward progress E.g. can’t just flush every single instruction as soon as fetched  Lots of room between the can and the must – For performance, get as close to “must” as possible – For simplicity, may do some “can but not must” flushes 25 Feb 2014Basic Pipeline19

Simple flush-based hazard handling  Find out K, the worst-case number of NOPs – # of NOPs between insts that prevents all hazards – E.g. in out 2-stage pipeline it’s 1 NOP  If stages numbered 1..N, we flush the first K stages whenever a non-NOP inst in stage K+1 – E.g. in our 2-stage pipeline, we would flush stage 1 whenever a non-NOP is in stage 2 – What is the resulting CPI for the 2-stage piepline? 25 Feb 2014Basic Pipeline20

Fewer flushes…  Data hazards - when we don’t have to flush – If without flushing NOPs would not be needed If inst in stage K+1 has wrreg=0, E.g. SW doesn’t need NOPs after it If inst in stage K+1 writes to regno we don’t read E.g. ADD R1,R2,R3 can be safely followed by ADD R2,R3,R4 – If forwarding or stalling fixes the problem We’ll talk about this later  Control hazards – when we don’t have to flush – If we fetched from the correct place, e.g. if we fetched from PC+4 and BEQ not taken 25 Feb 2014Basic Pipeline21

Flushing in Verilog code – For a pipeline FF between some stages A and M: clk or negedge reset) if(reset) wrreg_M<=1’b0; else wrreg_M<=wrreg_A; flush_A?1’b0:wrreg_A; 25 Feb 2014Basic Pipeline22

Stalling  Stops instructions in early stages of the pipeline to let farther-along instructions produce results – Creates a “bubble” (a NOP) between the stopped instructions and the ones that continue to move  For data hazards, stalls can entirely eliminate flushes – The bubble NOP is like a NOP we inserted into the program But without changing the program – Why is a stall better than a flush? When flushing some stage S (because of a dependence), must also flush stages 1..S-1 (can’t execute insts out-of-order) – Adds S new NOPs to the execution When stalling stage S, must also stall stages 1..S-1 – But each stall cycle inserts only one NOP (in stage S+1) – Control hazard => we fetched wrong instructions Delaying them won’t solve anything, so they must be flushed 25 Feb 2014Basic Pipeline23

When to Stall  Like flushes, the “must” and the “can” differ – No real must: we can avoid hazards by flushing – But we want to stall if that can avoid a flush – And we can stall whenever it’s convenient Must still ensure forward progress!  Stalling to handle data dependences – Simplest (and slowest) approach: Stall read-regs stage until nothing remains in later stages With 2-stage, stall stage 1 if a non-NOP is in stage 2 – Faster but more complex approaches Stall until no register-writing instruction remains in later stages Stall until no inst that writer to my src registers remains in later stages Stall until forwarding can get us the values we need 25 Feb 2014Basic Pipeline24

Stalling in Verilog code For a pipeline FF between some stages A and M: clk or negedge reset) if(reset) wrreg_M<=1’b0; else if(flush_A) wrreg_M<=1’b0; else if(!stall_A) wrreg_M<=wrreg_A; – Note 1: if stalling stage X, must also stall stages before it – Note 2: when stalling fetch stage, don’t let PC change! 25 Feb 2014Basic Pipeline25

How to do Project 3  Get it working with NOPs in the code – Change code to add NOPs in the right places – Note: “right” places will change with pipeline depth – This gets you 30 points  Get it working with “heavy” stalls and flushing – Must run with original code (no NOPs added in.a32) – With “flush K” support in the pipeline: +20 points – More points if you use stalls to make it faster  Then try to use smarter stalls and flushing – Very little of this will get you the other 50 points 25 Feb 2014Basic Pipeline26

Smart stalling example  With two stages (F and M): assign stall_F=wrreg_M && ( (wregno_M==rregno1_F) || (wregno_M==rregno2_F) ); 25 Feb 2014Basic Pipeline27

Smart stalling with more stages  Which stage to stall? – The first stage where hazard makes us do something wrong that we won’t fix later – With two stages, this is first stage We read wrong value from regs, and we use that wrong value in ALU  With five stages w/o forwarding, this is reg-read – Wrong value from reg, must stall to read again  With five stages w/ forwarding? – Reading wrong reg value is OK, forwarding fixes that – But if we forward the wrong value stall the stage in which we do forwarding! 25 Feb 2014Basic Pipeline28

Staling >1 stage  If we stall stage X, must also stall stages 1..X-1  Depending on what is done in which stage, different hazards might stall different stages  In general, with stages A,B,C,etc.: assign stallto_A= ; assign stallto_B= ; assign stallto_C= ; … assign stall_A=stallto_A||stall_B; assign stall_B=stallto_B||stall_C; … 25 Feb 2014Basic Pipeline29 Use these in the actual code that stalls pipeline-FF writes This is in your hazard detection logic