CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II Andy Carle
CS 61C L19 Pipelining II (2) A Carle, Summer 2005 © UCB Review: Datapath for MIPS Use datapath figure to represent pipeline IFtchDcdExecMemWB ALU I$ Reg D$Reg PC instruction memory +4 rt rs rd registers ALU Data memory imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute4. Memory 5. Write Back
CS 61C L19 Pipelining II (3) A Carle, Summer 2005 © UCB Review: Problems for Computers Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; “bubbles” in the pipeline Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)
CS 61C L19 Pipelining II (4) A Carle, Summer 2005 © UCB Review: C.f. Branch Delay vs. Load Delay Load Delay occurs only if necessary (dependent instructions). Branch Delay always happens (part of the ISA). Why not have Branch Delay interlocked? Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible!
CS 61C L19 Pipelining II (5) A Carle, Summer 2005 © UCB FYI: Historical Trivia First MIPS design did not interlock and stall on load-use data hazard Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages Word Play on acronym for Millions of Instructions Per Second, also called MIPS Load/Use Wrong Answer!
CS 61C L19 Pipelining II (6) A Carle, Summer 2005 © UCB Outline Pipeline Control Forwarding Control Hazard Control
CS 61C L19 Pipelining II (7) A Carle, Summer 2005 © UCB Piped Proc So Far …
CS 61C L19 Pipelining II (8) A Carle, Summer 2005 © UCB New Representation: Regs more explicit Exec Reg. File Data Mem ABS Reg File PC Next PC IR Inst. Mem DMS IF/DE DE/EX EX/ME ME/WB IF/DE.Ir = Instruction DE/EX.A = BusA out of Reg EX/ME.S = AluOut EX/ME.D = Bus B pass-through for sw ME/WB.S = ALuOut pass-through ME/WB.M = Mem Result from lw
CS 61C L19 Pipelining II (9) A Carle, Summer 2005 © UCB New Representation: Regs more explicit Exec Reg. File Data Mem ABS Reg File PC Next PC IR Inst. Mem DMS IF/DE DE/EX EX/ME ME/WB What’s Missing???
CS 61C L19 Pipelining II (10) A Carle, Summer 2005 © UCB Pipelined Processor (almost) for slides Exec Reg. File Mem Access Data Mem ABSM Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D Idea: Parallel Piped Control …
CS 61C L19 Pipelining II (11) A Carle, Summer 2005 © UCB Pipelined Control IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + B; R[rd] <– S; S <– A + SX; M <– Mem[S] R[rd] <– M; S <– A or ZX; R[rt] <– S; S <– A + SX; Mem[S] <- B If Cond PC < PC+SX; Exec Reg. File Mem Access Data Mem ABS Reg File Equal PC Next PC IR Inst. Mem DM
CS 61C L19 Pipelining II (12) A Carle, Summer 2005 © UCB Data Stationary Control The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID Register ID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr
CS 61C L19 Pipelining II (13) A Carle, Summer 2005 © UCB Let’s Try it Out 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 32addr10, r11, r12 100andr13, r14, 15
CS 61C L19 Pipelining II (14) A Carle, Summer 2005 © UCB Start: Fetch 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M rsrt im 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 IF PC Next PC 10 = nnnn
CS 61C L19 Pipelining II (15) A Carle, Summer 2005 © UCB Fetch 14, Decode 10 Exec Reg. File Mem Acces s Data Mem ABS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt im 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1, 36(r2) ID IF PC Next PC 14 = nnn
CS 61C L19 Pipelining II (16) A Carle, Summer 2005 © UCB Fetch 20, Decode 14, Exec 10 Exec Reg. File Mem Acces s Data Mem r2 BS Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M 2rt 36 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 addI r2, r2, 3 ID IF EX PC Next PC 20 = n n
CS 61C L19 Pipelining II (17) A Carle, Summer 2005 © UCB Fetch 24, Decode 20, Exec 14, Mem 10 Exec Reg. File Mem Acces s Data Mem r2 B r2+36 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 sub r3, r4, r5 addI r2, r2, 3 ID IF EX M PC Next PC 24 = n
CS 61C L19 Pipelining II (18) A Carle, Summer 2005 © UCB Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 Exec Reg. File Mem Acces s Data Mem r4 r5 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl M[r2+36] 67 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 lw r1 beq r6, r7 100 addI r2 sub r3 ID IF EX M WB PC Next PC 30 = Note Delayed Branch: always execute ori after beq
CS 61C L19 Pipelining II (19) A Carle, Summer 2005 © UCB Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14 Exec Reg. File Mem Acces s Data Mem r6 r7 r2+3 Reg File IR Inst. Mem D Decode Mem Ctrl WB Ctrl r1=M[r2+35] 9xx 10lw r1, 36(r2) 14addI r2, r2, 3 20subr3, r4, r5 24beqr6, r7, orir8, r9, 17 34addr10, r11, r12 100andr13, r14, 15 beq addI r2 sub r3 r4-r5 100 ori r8, r9 17 ID IF EX M WB PC Next PC 100 =
CS 61C L19 Pipelining II (20) A Carle, Summer 2005 © UCB ? ? Exec Reg. File Mem Acces s Data Mem ABS M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D Remember: means triggered on edge. What is wrong here?
CS 61C L19 Pipelining II (21) A Carle, Summer 2005 © UCB Double-Clocked Signals Exec Reg. File Mem Acces s Data Mem A B S M Reg File Equal PC Next PC IR Inst. Mem Valid IRex Dcd Ctrl IRmem Ex Ctrl IRwb Mem Ctrl WB Ctrl D Some signals are double clocked! In general: Inputs to edge components are their own pipeline regs Watch out for stalls and such!
CS 61C L19 Pipelining II (22) A Carle, Summer 2005 © UCB Administrivia Proj 2 – Due Sunday HW6 – Due Tuesday Midterm 2: Friday, July 29: 11:00 – 2:00 Location TBD If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday
CS 61C L19 Pipelining II (23) A Carle, Summer 2005 © UCB Megahertz Myth?
CS 61C L19 Pipelining II (24) A Carle, Summer 2005 © UCB Outline Pipeline Control Forwarding Control Hazard Control
CS 61C L19 Pipelining II (25) A Carle, Summer 2005 © UCB Fix by Forwarding result as soon as we h ave it to where we need it: Review: Forwarding sub $t4,$t0,$t3 ALU I$ Reg D$Reg and $t5,$t0,$t6 ALU I$ Reg D$Reg or $t7,$t0,$t8 * I$ ALU Reg D$Reg xor $t9,$t0,$t10 ALU I$ Reg D$Reg add $t0,$t1,$t2 IFID/RFEXMEMWB ALU I$ Reg D$ Reg * “ or ” hazard solved by register hardware
CS 61C L19 Pipelining II (26) A Carle, Summer 2005 © UCB Forwarding In general: For each stage i that has reg inputs For each stage j after I that has reg output -If i.reg == j.reg forward j value back to i. -Some exceptions ($0, invalid) ALUinput (ALUResult, MemResult) MemInput (MemResult) In particular:
CS 61C L19 Pipelining II (27) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers npc I mem Regs B alu S D mem m IAU PC Regs A imn n n op rw rs rt
CS 61C L19 Pipelining II (28) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers Current operand registers Pending writes hazard <= ((rs == rw ex) & regW ex ) OR ((rs == rw mem) & regW me ) OR ((rs == rw wb) & regW wb ) OR ((rt == rw ex) & regW ex ) OR ((rt == rw mem) & regW me ) OR ((rt == rw wb ) & regW wb ) npc I mem Regs B alu S D mem m IAU PC Regs A imoprwn oprwn oprwn op rw rs rt
CS 61C L19 Pipelining II (29) A Carle, Summer 2005 © UCB Forwarding Muxes Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe Increase muxes to add paths from pipeline registers Data Forwarding = Data Bypassing npc I mem Regs B alu S D mem m IAU PC Regs A imoprwn oprwn oprwn op rw rs rt Forward mux
CS 61C L19 Pipelining II (30) A Carle, Summer 2005 © UCB What about memory operations? AB op Rd Ra Rb Rd to reg file R Rd Tricky situation: MIPS: lw 0($t0) sw 0($t1) RTL: R1 <- Mem[ R2 + I ]; Mem[R3+34] <- R1 D Mem T
CS 61C L19 Pipelining II (31) A Carle, Summer 2005 © UCB What about memory operations? AB op Rd Ra Rb Rd to reg file R Rd Tricky situation: MIPS: lw 0($t0) sw 0($t1) RTL: R1 <- Mem[ R2 + I ]; Mem[R3+34] <- R1 Solution: Handle with bypass in memory stage! D Mem T
CS 61C L19 Pipelining II (32) A Carle, Summer 2005 © UCB Outline Pipeline Control Forwarding Control Hazard Control
CS 61C L19 Pipelining II (33) A Carle, Summer 2005 © UCB Forwarding works if value is available (but not written back) before it is needed. But consider … Data Hazard: Loads (1/4) sub $t3,$t0,$t2 ALU I$ Reg D$Reg lw $t0,0($t1) IFID/RFEXMEMWB ALU I$ Reg D$ Reg Need result before it is calculated! Must stall use (sub) 1 cycle and then forward. …
CS 61C L19 Pipelining II (34) A Carle, Summer 2005 © UCB Hardware must stall pipeline Called “interlock” Data Hazard: Loads (2/4) sub $t3,$t0,$t2 ALU I$ Reg D$Reg bub ble and $t5,$t0,$t4 ALU I$ Reg D$Reg bub ble or $t7,$t0,$t6 I$ ALU Reg D$ bub ble lw $t0, 0($t1) IFID/RFEXMEMWB ALU I$ Reg D$ Reg
CS 61C L19 Pipelining II (35) A Carle, Summer 2005 © UCB Data Hazard: Loads (3/4) Instruction slot after a load is called “load delay slot” If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle. If the compiler puts an unrelated instruction in that slot, then no stall Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
CS 61C L19 Pipelining II (36) A Carle, Summer 2005 © UCB Data Hazard: Loads (4/4) Stall is equivalent to nop sub $t3,$t0,$t2 and $t5,$t0,$t4 or $t7,$t0,$t6 I$ ALU Reg D$ lw $t0, 0($t1) ALU I$ Reg D$ Reg bub ble ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg nop
CS 61C L19 Pipelining II (37) A Carle, Summer 2005 © UCB Hazards / Stalling In general: For each stage i that has reg inputs If I’s reg is being written later on in the pipe but is not ready yet -Stages 0 to i: Stall (Turn CEs off so no change) -Stage i+1: Make a bubble (do nothing) -Stages i+2 onward: As usual ALUinput (MemResult) In particular:
CS 61C L19 Pipelining II (38) A Carle, Summer 2005 © UCB Hazards / Stalling Alternative Approach: Detect non-forwarding hazards in decode Possible since our hazards are formal. -Not always the case. Stalling then becomes: -Issue nop to EX stage -Turn off nextPC update (refetch same inst) -Turn off InstReg update (re-decode same inst)
CS 61C L19 Pipelining II (39) A Carle, Summer 2005 © UCB Stall Logic 1. Detect non- resolving hazards. 2a. Insert Bubble 2b. Stall nextPC, IF/DE npc I mem Regs B alu S D mem m IAU PC Regs A imoprwn oprwn oprwn op rw rs rt Forward mux
CS 61C L19 Pipelining II (40) A Carle, Summer 2005 © UCB Stall Logic Stall-on-issue is used quite a bit More complex processors: many cases that stall on issue. More complex processors: cases that can’t be detected at decode -E.g. value needed from mem is not in cache – proc must stall multiple cycles
CS 61C L19 Pipelining II (41) A Carle, Summer 2005 © UCB By the way … Notice that our forwarding and stall logic is stateless! Big Idea: Keep it simple! Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem. Option 2: Re-fetch old value by turning off PC update.