Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Presentation transcript:

Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 14, L11-1

Three-Cycle SMIPS: Analysis
[Figure: three-stage pipeline with stages Fetch, Decode/RegRead/Execute/Memory (DRXM), and Writeback; labels pc, ir, mr, wr]
Does Fetch or DRXM depend on Writeback? Yes, for register values.

Stall calculation
[Figure: stall calculation; labels WB Stall Reg, WB Feedback, wbStall]

Data dependence waterfall
(From whiteboard.)

Types of Data Hazards
Consider executing a sequence of instructions like: rk ← (ri) op (rj)

Data-dependence: Read-after-Write (RAW) hazard
    r3 ← (r1) op (r2)
    r5 ← (r3) op (r4)
Anti-dependence: Write-after-Read (WAR) hazard
    r3 ← (r1) op (r2)
    r1 ← (r4) op (r5)
Output-dependence: Write-after-Write (WAW) hazard
    r3 ← (r1) op (r2)
    r3 ← (r6) op (r7)

Detecting Data Hazards
Range and Domain of instruction i:
    R(i) = Registers (or other storage) modified by instruction i
    D(i) = Registers (or other storage) read by instruction i
Suppose instruction j follows instruction i in the program order. Executing instruction j before the effect of instruction i has taken place can cause a
    RAW hazard if R(i) ∩ D(j) ≠ ∅
    WAR hazard if D(i) ∩ R(j) ≠ ∅
    WAW hazard if R(i) ∩ R(j) ≠ ∅

Register vs. Memory Data Dependence
Data hazards due to register operands can be determined at the decode stage, but data hazards due to memory operands can be determined only after computing the effective address.
    store  M[(r1) + disp1] ← (r2)
    load   r3 ← M[(r4) + disp2]
Does (r1 + disp1) = (r4 + disp2)?
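
A small sketch of why the memory case cannot be settled at decode: the register names and displacements differ, yet the effective addresses coincide once the register values are known. The 32-bit address width, the register values, and the function names are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses


def effective_address(regs, base, disp):
    """(base register value + displacement), wrapped to 32 bits."""
    return (regs[base] + disp) & MASK32


def store_load_conflict(regs, store_base, store_disp, load_base, load_disp):
    """True when the store and the load touch the same address."""
    return (effective_address(regs, store_base, store_disp)
            == effective_address(regs, load_base, load_disp))


# Hypothetical register contents, only known at run time.
regs = {"r1": 0x1000, "r4": 0x0FF0}

# store M[(r1) + 8] <- (r2);  load r3 <- M[(r4) + 24]
print(store_load_conflict(regs, "r1", 8, "r4", 24))  # True: both are 0x1008
print(store_load_conflict(regs, "r1", 8, "r4", 8))   # False
```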

Data Hazards: An Example
    I1  DIVD   f6,  f6, f4
    I2  LD     f2,  45(r3)
    I3  MULTD  f0,  f2, f4
    I4  DIVD   f8,  f6, f2
    I5  SUBD   f10, f0, f6
    I6  ADDD   f6,  f8, f2
RAW Hazards:
WAR Hazards:
WAW Hazards:
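
The slide leaves the three hazard lists blank. As a hedged sketch, the pairwise range/domain test can be applied to every ordered pair of the six instructions. The operand reading (first operand is the destination, `LD` reads only `r3`) is an assumption, and the test is conservative in that it ignores intervening writes.

```python
# (label, destination register, source registers) for I1..I6
program = [
    ("I1", "f6",  {"f6", "f4"}),  # DIVD  f6, f6, f4
    ("I2", "f2",  {"r3"}),        # LD    f2, 45(r3)
    ("I3", "f0",  {"f2", "f4"}),  # MULTD f0, f2, f4
    ("I4", "f8",  {"f6", "f2"}),  # DIVD  f8, f6, f2
    ("I5", "f10", {"f0", "f6"}),  # SUBD  f10, f0, f6
    ("I6", "f6",  {"f8", "f2"}),  # ADDD  f6, f8, f2
]


def enumerate_hazards(prog):
    """Check every pair (earlier, later) for RAW, WAR and WAW."""
    out = {"RAW": [], "WAR": [], "WAW": []}
    for a in range(len(prog)):
        for b in range(a + 1, len(prog)):
            (ni, di, si), (nj, dj, sj) = prog[a], prog[b]
            if di in sj:   # later instruction reads what the earlier one writes
                out["RAW"].append((ni, nj))
            if dj in si:   # later instruction writes what the earlier one reads
                out["WAR"].append((ni, nj))
            if di == dj:   # both write the same register
                out["WAW"].append((ni, nj))
    return out


result = enumerate_hazards(program)
for kind, pairs in result.items():
    print(kind, pairs)
```

Under these assumptions the test reports seven RAW pairs, three WAR pairs (all against I6's write of f6), and one WAW pair (I1, I6).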

Scoreboard
Add a scoreboard of registers in use:
    Set the bit for each register to be updated
    Clear the bit when a register is updated
    Stall if the bit for a needed register is set
[Figure: bit vector, one bit per register, R31 ... R0]
    Reg#(Bit#(32)) scoreboard <- mkReg(0);
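
A minimal behavioral sketch of the three scoreboard rules, using one integer as the 32-bit busy mask. This is Python rather than Bluespec, and the class and method names are hypothetical.

```python
class Scoreboard:
    """32-bit busy mask, one bit per register R0..R31."""

    def __init__(self):
        self.bits = 0

    def set_busy(self, reg):
        """An instruction that will update `reg` has been issued."""
        self.bits |= (1 << reg)

    def clear_busy(self, reg):
        """The register has been updated."""
        self.bits &= ~(1 << reg)

    def must_stall(self, srcs):
        """Stall if any needed register still has its bit set."""
        return any((self.bits >> r) & 1 for r in srcs)


sb = Scoreboard()
sb.set_busy(3)                 # an in-flight instruction writes R3
print(sb.must_stall([3, 4]))   # True: R3 is pending
print(sb.must_stall([1, 2]))   # False
sb.clear_busy(3)               # writeback completed
print(sb.must_stall([3, 4]))   # False
```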

SMIPS Pipeline Analysis
Stage            Tclock > (six stage)
Fetch            tM
Decode           tDEC
Register Read    tRF
Execute          tALU
Memory           tM
Writeback        tWB
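
One reading of the table is that the clock period must exceed every stage's delay, so the bound is set by the slowest stage. The sketch below makes that explicit; the delay values are hypothetical placeholders, not figures from the lecture.

```python
# Hypothetical per-stage delays in nanoseconds; NOT taken from the lecture.
stage_delay_ns = {
    "Fetch": 2.0,          # tM
    "Decode": 0.5,         # tDEC
    "Register Read": 0.8,  # tRF
    "Execute": 1.2,        # tALU
    "Memory": 2.0,         # tM
    "Writeback": 0.6,      # tWB
}


def clock_lower_bound(delays):
    """The clock period must exceed the slowest stage's delay."""
    return max(delays.values())


def critical_stages(delays):
    """Stages whose delay equals the bound."""
    bound = clock_lower_bound(delays)
    return [stage for stage, d in delays.items() if d == bound]


print(clock_lower_bound(stage_delay_ns))
print(critical_stages(stage_delay_ns))
```

With these placeholder numbers the memory accesses in Fetch and Memory set the bound.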

Six Stage Pipeline
F (Fetch) -> D (Decode) -> R (Reg Read) -> X (Execute) -> M (Memory) -> W (Writeback)
Where do we need feedback? X to F and W to R.

Six-Stage State
    module mkProc(Proc);
      Reg#(Addr)        fetchPc <- mkRegU;
      RFile             rf      <- mkRFile;
      Memory            mem     <- mkMemory;
      FIFO#(FBundle)    fr      <- mkFIFO;   // Fetch    -> Decode
      FIFO#(DecBundle)  dr      <- mkFIFO;   // Decode   -> Reg Read
      FIFO#(RegBundle)  rr      <- mkFIFO;   // Reg Read -> Execute
      FIFO#(EBundle)    xr      <- mkFIFO;   // Execute  -> Memory
      FIFO#(WBundle)    mr      <- mkFIFO;   // Memory   -> Writeback
      FIFOF#(Addr)      nextPc  <- mkFIFOF;  // feedback: X to F
      FIFOF#(Rindx)     wbRind  <- mkFIFOF;  // feedback: W to R
      // and internal control state…

Six-Stage Fetch
    rule doFetch;
      Addr pc = ?;
      Bool epoch = fetchEpoch;
      if (nextPc.notEmpty) begin
        // Redirect from Execute: take the new pc and flip the epoch
        pc = nextPc.first;
        epoch = !fetchEpoch;
        nextPc.deq;
      end
      else
        pc = fetchPc + 4;
      fetchPc <= pc;
      fetchEpoch <= epoch;
      let instResp <- mem.op(MemReq{op:Ld, addr:pc, data:?});
      fr.enq(FBundle{pc:pc, epoch:epoch, instResp:instResp});
    endrule

Six-Stage Decode
    rule doDecode;
      let fetchInst = fr.first;
      let pcPlus4 = fetchInst.pc + 4;  // FBundle carries the field pc
      let decInst = decode(fetchInst.instResp, pcPlus4);
      decInst.epoch = fetchInst.epoch;
      dr.enq(decInst);
      fr.deq;
    endrule

Register Read Stage
[Figure: D (Decode) feeding R (Reg Read, "Read regs"); the register file RF is shared with W]
RF must be available to ReadSrcs and Writeback!

Register Read Module
    typedef struct { Data src1; Data src2; } Sources;

    module mkRegRead#(RFile rf)(RegRead);
      Reg#(Bool)  scoreboardReg <- mkReg(False);
      // mkDWire yields a Wire whose default value is scoreboardReg
      Wire#(Bool) scoreboard    <- mkDWire(scoreboardReg);

      rule updateScoreboard;
        scoreboardReg <= scoreboard;
      endrule

      method Maybe#(Sources) readRegs(DecodeInst decInst);
        let src1 = rf.rd1(decInst.op1);
        let src2 = …
        if (scoreboardReg)
          return tagged Invalid;
        else
          return tagged Valid Sources{src1:src1,…};
      endmethod

Register Read Module
    typedef struct { Data src1; Data src2; } Sources;

    module mkRegRead#(RFile rf)(RegRead);
      Reg#(Bool) scoreboardReg <- mkReg(False);

      method Maybe#(Sources) readRegs(DecodeInst decInst);
        let src1 = rf.rd1(decInst.op1);
        let src2 = …
        if (scoreboardReg)
          return tagged Invalid;
        else
          return tagged Valid Sources{src1:src1,…};
      endmethod

To use it, instantiate with:
    let readreg <- mkRegRead(rf);

Six-Stage Register Read (cont)
      RWire#(Bool) next_available   <- mkRWire;
      RWire#(Bool) next_unavailable <- mkRWire;

      // Rules come before methods in a module body.
      // A markUnavailable request takes priority over markAvailable.
      rule updateScoreboard;
        if (next_unavailable.wget matches tagged Valid .ua)
          scoreboardReg <= True;
        else if (next_available.wget matches tagged Valid .a)
          scoreboardReg <= False;
      endrule

      method Action markAvailable();
        next_available.wset(False);
      endmethod

      method Action markUnavailable();
        next_unavailable.wset(True);
      endmethod
    endmodule
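
The priority encoded in `updateScoreboard` can be stated as a pure function of the current bit and the two requests seen in a cycle. This is a Python sketch with a hypothetical function name, not the lecture's code.

```python
def next_scoreboard(current, mark_unavailable, mark_available):
    """Next value of the single scoreboard bit.

    A markUnavailable request wins over a markAvailable request made
    in the same cycle; with no request the bit keeps its value.
    """
    if mark_unavailable:
        return True
    if mark_available:
        return False
    return current


# Writeback frees the register while Reg Read issues a new writer:
print(next_scoreboard(True, mark_unavailable=True, mark_available=True))    # True
print(next_scoreboard(True, mark_unavailable=False, mark_available=True))   # False
print(next_scoreboard(True, mark_unavailable=False, mark_available=False))  # True
```

Giving the set request priority keeps the bit set when a completing write and a newly issued write coincide.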

Six-Stage Register Read
    rule doRegRead;
      if (wbRind.notEmpty) begin
        readreg.markAvailable();
        wbRind.deq;
      end
      let decInst = dr.first();
      if (execEpoch != decInst.epoch) begin
        // Wrong-path instruction: drop it
        readreg.markAvailable();
        dr.deq();
      end
      else begin
        let maybeSources = readreg.readRegs(decInst);
        if (maybeSources matches tagged Valid .sources) begin
          if (writesreg(decInst)) readreg.markUnavailable();
          rr.enq(RegBundle{decodebundle: decInst,
                           src1: sources.src1,
                           src2: sources.src2});
          dr.deq();
        end
      end
    endrule

Better scoreboard will need register index!

Six-Stage Execute
    rule doExecute;
      let decInst = rr.first.decodebundle;
      let epochChange = (decInst.epoch != execEpoch);
      let src1 = rr.first.src1;
      let src2 = ...
      if (!epochChange) begin
        let execInst = exec.exec(decInst, src1, src2);
        if (execInst.cond) begin
          nextPc.enq(execInst.addr);
          execEpoch <= !execEpoch;
        end
        xr.enq(execInst);
      end
      rr.deq();
    endrule
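
The epoch handshake between Fetch and Execute can be followed in a toy simulation. This is a simplified Python model, not the lecture's Bluespec: it collapses the pipeline to a single queue between fetch and execute, lets a redirect reach fetch in the same step, and treats any pc listed in `program` as a taken branch. All names are hypothetical.

```python
from collections import deque


def simulate(program, start_pc, steps, depth=2):
    """program maps pc -> taken-branch target; other pcs are not branches."""
    fetch_pc, fetch_epoch, exec_epoch = start_pc - 4, False, False
    in_flight = deque()  # (pc, epoch) entries between fetch and execute
    next_pc = deque()    # redirects from execute to fetch
    executed, squashed = [], []
    for _ in range(steps):
        # Execute: consume the oldest entry once `depth` entries are queued.
        if len(in_flight) >= depth:
            pc, epoch = in_flight.popleft()
            if epoch != exec_epoch:
                squashed.append(pc)            # wrong-path instruction
            else:
                executed.append(pc)
                target = program.get(pc)
                if target is not None:         # taken branch
                    next_pc.append(target)
                    exec_epoch = not exec_epoch
        # Fetch: follow a redirect and flip the epoch, else pc + 4.
        if next_pc:
            fetch_pc = next_pc.popleft()
            fetch_epoch = not fetch_epoch
        else:
            fetch_pc += 4
        in_flight.append((fetch_pc, fetch_epoch))
    return executed, squashed


executed, squashed = simulate({4: 100}, start_pc=0, steps=8)
print(executed)  # [0, 4, 100, 104, 108]
print(squashed)  # [8]
```

The instruction at pc 8 was fetched with the old epoch, so execute drops it without any explicit flush of the queue.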

Six-Stage Memory
    rule doMemory;
      let execInst = xr.first;
      if (execInst.itype==Ld || execInst.itype==St) begin
        // Same memory interface as in doFetch (mem.op)
        execInst.data <- mem.op(MemReq{
          op:   execInst.itype==Ld ? Ld : St,
          addr: execInst.addr,
          data: execInst.data});
      end
      // mr is declared as FIFO#(WBundle)
      mr.enq(WBundle{itype: execInst.itype,
                     rdst:  execInst.rdst,
                     data:  execInst.data});
      xr.deq();
    endrule

Six-Stage Writeback
    rule doWriteBack;
      wbRind.enq(mr.first.rdst);
      rf.wr(mr.first.rdst, mr.first.data);
      mr.deq;
    endrule

Six Stage Waterfall
(From whiteboard.)

Bypass
[Figure: RegRead and Execute connected through rr, with a bypass line]
If the scoreboard says a register is not available, then RegRead needs to stall, unless it sees what it needs on the bypass line…
What does RegRead need to do?

Bypass Network
    typedef struct { Rindx regnum; Data value; } BypassValue;

    module mkBypass(BypassNetwork);
      RWire#(BypassValue) bypass <- mkRWire;

      method Action produceBypass(Rindx regnum, Data value);
        bypass.wset(BypassValue{regnum: regnum, value: value});
      endmethod

      method Maybe#(Data) consumeBypass(Rindx regnum);
        // &&& combines a pattern match with a boolean condition
        if (bypass.wget matches tagged Valid .b &&& b.regnum == regnum)
          return tagged Valid b.value;
        else
          return tagged Invalid;
      endmethod
    endmodule

Real network will have many sources. How are they ordered? From earlier stages to later.

Register Read (with bypass)
    module mkRegRead#(RFile rf, BypassNetwork bypass)(RegRead);
      method Maybe#(Sources) readRegs(DecodeInst decInst);
        let src1 = rf.rd1(decInst.op1);
        let src2 = …
        if (!scoreboardReg)
          return tagged Valid Sources{src1:src1,…};
        else begin
          let b1 = bypass.consumeBypass(decInst.op1);
          let b2 = …
          // consumeBypass returns Maybe#(Data), so v1 is already the value
          if (b1 matches tagged Valid .v1 &&& b2 matches…)
            return tagged Valid Sources{src1:v1, …};
          else
            return tagged Invalid;
        end
      endmethod
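
To make the stall-or-bypass decision concrete, here is a behavioral sketch in Python. It departs from the slide code in one labelled way: it assumes a per-register busy set (the "better scoreboard" the deck mentions) instead of a single busy bit, so each operand is resolved independently. Class, method, and parameter names are hypothetical.

```python
class BypassNetwork:
    """Holds at most one (regnum, value) pair per cycle, like an RWire."""

    def __init__(self):
        self.slot = None

    def produce(self, regnum, value):
        self.slot = (regnum, value)

    def consume(self, regnum):
        """Return the bypassed value for regnum, or None if absent."""
        if self.slot is not None and self.slot[0] == regnum:
            return self.slot[1]
        return None

    def end_cycle(self):
        self.slot = None  # wire contents do not persist across cycles


def read_regs(rf, busy, bypass, op1, op2):
    """Return (src1, src2), or None to signal that RegRead must stall."""
    sources = []
    for reg in (op1, op2):
        if reg not in busy:
            sources.append(rf[reg])      # register file value is current
        else:
            value = bypass.consume(reg)  # pending write: try the bypass
            if value is None:
                return None              # not on the bypass line: stall
            sources.append(value)
    return tuple(sources)


rf = {1: 10, 2: 20, 3: 0}
busy = {3}                                 # a write to R3 is in flight
bypass = BypassNetwork()
print(read_regs(rf, busy, bypass, 3, 1))   # None: stall
bypass.produce(3, 30)                      # Execute publishes R3's new value
print(read_regs(rf, busy, bypass, 3, 1))   # (30, 10)
bypass.end_cycle()
print(read_regs(rf, busy, bypass, 3, 1))   # None again
```

Comparing against `None` rather than testing truthiness matters here, since a bypassed value of 0 is legitimate.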