Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University
© Kavita Bala, Computer Science, Cornell University Basic Pipelining Five stage “RISC” load-store architecture 1.Instruction fetch (IF) get instruction from memory 2.Instruction Decode (ID) translate opcode into control signals and read regs 3.Execute (EX) perform ALU operation 4.Memory (MEM) Access memory if load/store 5.Writeback (WB) update register file Following slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Sample Code (Simple) Assume eight-register machine Run the following code on a pipelined datapath add ; reg 3 = reg 1 + reg 2 nand ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX Bits 0-2 Bits op dest offset valB valA PC+1 target ALU result op dest valB op dest ALU result mdata instruction 0 R2 R3 R4 R5 R1 R6 R0 R7 regA regB Bits data dest Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Time Graphs Time: add nand lw add sw fetch decode execute memory writeback Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Pipelining Recap Powerful technique for masking latencies –Logically, instructions execute one at a time –Physically, instructions execute in parallel Instruction level parallelism Decouples the processor model from the implementation –Interface vs. implementation BUT dependencies between instructions complicate the implementation
© Kavita Bala, Computer Science, Cornell University What Can Go Wrong? Data hazards –register reads occur in stage 2 –register writes occur in stage 5 –could read the wrong value if is about to be written Control hazards –branch instruction may change the PC in stage 4 –what do we fetch before that?
© Kavita Bala, Computer Science, Cornell University Handling Data Hazards Detect and Stall –If hazards exist, stall the processor until they go away –Safe, but not great for performance Detect and Forward –If hazards exist, fix up the pipeline to get the correct value (if possible) –Most common solution for high performance
© Kavita Bala, Computer Science, Cornell University Handling Data Hazards III: Detect –dest of early instrs == source of current Forward: –New bypass datapaths route computed data to where it is needed –New MUX and control to pick the right data Beware: Stalling may still be required even in the presence of forwarding
© Kavita Bala, Computer Science, Cornell University Different type of Hazards Use register value in subsequent instruction add3 1 2 or nand sub Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 nand time fetch decode execute memory writeback add nand If not careful, you read the wrong value of R3
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 nand time fetch decode execute memory writeback add nand If not careful, you read the wrong value of R3
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 nand time fetch decode execute memory writeback add nand If not careful, you read the wrong value of R3
© Kavita Bala, Computer Science, Cornell University Handling Data Hazards: Forwarding No point forwarding to decode Forward to EX stage From output of ALU and MEM stages
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 nand time fetch decode execute memory writeback add nand
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 and nand time fetch decode execute memory writeback add fetch decode execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University Forwarding Illustration I n s t r. O r d e r add $3 add nand $5 $3 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg
© Kavita Bala, Computer Science, Cornell University Data Hazards: AL ops add3 1 2 and and nand time fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback add Write in first half of cycle, read in second half of cycle
© Kavita Bala, Computer Science, Cornell University Data Hazards: lw lw3 12(zero) and nand time fetch decode execute memory writeback lw add fetch decode execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University Data Hazards: lw lw3 12(zero) and nand time fetch decode execute memory writeback lw add fetch decode execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University Data Hazards: lw lw3 12(zero) and nand time fetch decode execute memory writeback lw fetch decode execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University A tricky case add add add I n s t r. O r d e r add ALU IM Reg DMReg add add ALU IM Reg DMReg ALU IM Reg DMReg
© Kavita Bala, Computer Science, Cornell University Another tricky case add0 4 1 add 3 0 4
© Kavita Bala, Computer Science, Cornell University Handling Data Hazards: Summary Forward: –New bypass datapaths route computed data to where it is needed Beware: Stalling may still be required even in the presence of forwarding For lw –MIPS 2000/3000: a delay slot –MIPS 4000 onwards: stall But really rely on compiler to treat it like delay slot
© Kavita Bala, Computer Science, Cornell University Detection Detection: –Compare regA with previous DestRegs –Compare regB with previous DestRegs –regA and regB going to ALU –Previous destregs Previously in ALU, Memory and Writeback
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX op dest offset valB valA PC+1 target ALU result op dest valB op dest ALU result mdata instruction 0 R2 R3 R4 R5 R1 R6 R0 R7 regA regB data dest Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Sample Code Which data hazards do you see? add nand add lw 6 24(3) sw 6 12(2) Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Sample Code Which data hazards do you see? add nand add lw 6 24(3) sw 6 12(2) Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University Hazard PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX add nand R2 R3 R4 R5 R1 R6 R0 R7 regA regB data 3 fwd 3 First half of cycle 3 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX nand add add R2 R3 R4 R5 R1 R6 R0 R7 regA regB 5 data H1 3 End of cycle 3 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University New Hazard PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX nand add add R2 R3 R4 R5 R1 R6 R0 R7 regA regB 5 data 3 MUXMUX H1 3 First half of cycle Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX add nand add 21 lw 6 24(3) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 753 data MUXMUX H2H1 End of cycle 4 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX add nand add 21 lw 6 24(3) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 753 data MUXMUX H2H1 First half of cycle Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX lw add nand -2 sw 6 12(2) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 75 data MUXMUX H2H1 6 End of cycle 5 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX lw add nand -2 sw 6 12(2) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 675 data MUXMUX H2H1 First half of cycle 6 Hazard 6 en L Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX 5 31 lw add 22 sw 6 12(2) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 67 data MUXMUX H2 End of cycle 6 nop Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX nop 5 31 lw add 22 sw 6 12(2) R2 R3 R4 R5 R1 R6 R0 R7 regA regB 67 data MUXMUX H2 First half of cycle 7 Hazard 6 Slides thanks to Sally McKee
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX sw nop lw R2 R3 R4 R5 R1 R6 R0 R7 regA regB 6 data MUXMUX H3 End of cycle 7 Slides thanks to Sally McKee 6
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX sw nop lw R2 R3 R4 R5 R1 R6 R0 R7 regA regB data MUXMUX H3 First half of cycle Slides thanks to Sally McKee 6 99
© Kavita Bala, Computer Science, Cornell University PC Inst mem Register file MUXMUX ALUALU MUXMUX 1 Data mem + MUXMUX IF/ ID ID/ EX EX/ Mem Mem/ WB MUXMUX sw nop R2 R3 R4 R5 R1 R6 R0 R7 regA regB data MUXMUX H3 End of cycle 8 Slides thanks to Sally McKee 6 99
© Kavita Bala, Computer Science, Cornell University Control Hazards beqz3 goToZero nand add time fetch decode execute memory writeback beqz fetch decode execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University Control Hazards for 4-stage pipeline beqz3 goToZero nand add time F/D execute memory writeback beqz F/D execute memory writeback nand
© Kavita Bala, Computer Science, Cornell University Control Hazards Stall –Inject NOPs into the pipeline when the next instruction is not known –Pros: simple, clean; Cons: slow Delay Slots –Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch –Pros: The compiler may be able to fill the slots with useful instructions; Cons: breaks abstraction boundary Speculative Execution –Insert instructions into the pipeline –Replace instructions with NOPs if the branch comes out opposite of what the processor expected –Pros: Clean model, fast; Cons: complex
© Kavita Bala, Computer Science, Cornell University Control Hazards for 4-stage pipeline beqz3 goToZero nand add time F/D execute memory writeback beqz F/D execute memory writeback nand goToZero: lui 2 1 F/D execute memory writeback lui Delay slot!
© Kavita Bala, Computer Science, Cornell University Hazard summary Data hazards –Forward –Stall for lw/sw Control hazard –Delay slot
© Kavita Bala, Computer Science, Cornell University Pipelining for Other Circuits Pipelining can be applied to any combinational logic circuit What’s the circuit latency at 2ns. per gate? What’s the throughput? What is the stage breakdown? Where would you place latches? What’s the throughput after pipelining?