CSCI206 - Computer Organization & Programming
Pipeline Datapath and Control
zyBook: 11.6
The MIPS Pipeline
Hazard Summary
data - An instruction depends on a data value produced or consumed by another instruction
-- Reorder
-- Forwarding (EX-EX, MEM-EX)
control - The execution of an instruction depends on a control decision made by an earlier instruction (e.g., branch)
-- Delay slot (nop)
-- Compute diff at the ID stage
structural - An instruction in the pipeline needs a resource being used by another instruction in the pipeline at the same moment
-- Reorder if possible
-- Delay
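As a rough illustration of the data hazard definition above, the following Python sketch lists the read-after-write dependencies in a short instruction sequence. The tuple layout (text, destination register, source registers) and the function name `raw_dependencies` are assumptions made for this example only; they are not part of the course material.

```python
def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction that writes it
    for i, (_text, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


# Hypothetical encoding of a lw / add / sw sequence: (text, destination, sources).
program = [
    ("lw r1, 0(sp)", "r1", ["sp"]),
    ("add r1, r1, r2", "r1", ["r1", "r2"]),
    ("sw r1, 0(sp)", None, ["r1", "sp"]),
]

deps = raw_dependencies(program)
for producer, consumer, reg in deps:
    print(f"{program[consumer][0]} reads {reg} written by {program[producer][0]}")
```

Each reported pair is only a candidate hazard; whether it costs a stall depends on which forwarding paths the pipeline has.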
EXAMPLES
Show the pipeline diagram (CYCLE 1)
MIPS branch uses 2 optimizations; assume the branch is NOT taken.
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F
add v1, v1, v2
beq v0, v1, loop
addi v0, v0, 1
li v1, 64
Show the pipeline diagram (CYCLE 2)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D
add v1, v1, v2
beq v0, v1, loop
addi v0, v0, 1
li v1, 64
Show the pipeline diagram (CYCLE 3)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E
add v1, v1, v2
beq v0, v1, loop
addi v0, v0, 1
li v1, 64
Show the pipeline diagram (CYCLE 4)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E  M
add v1, v1, v2
beq v0, v1, loop               -
addi v0, v0, 1
li v1, 64
In cycle 4 the branch wants to resolve in ID, but it needs the new value of v1, which is not available until the end of cycle 4 (when add finishes EX). So we have to stall. This stalls everything before this stage in the pipeline, so we cannot fetch the addi.
Show the pipeline diagram (CYCLE 5)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E  M  W
add v1, v1, v2
beq v0, v1, loop               -
addi v0, v0, 1
li v1, 64
In cycle 5 the new value of v1 is at the output of EX (add is now in MEM). But MIPS only has EX-EX, MEM-EX, and MEM-MEM forwarding, not EX-ID. So we have to stall again (and again there is no fetch).
Show the pipeline diagram (CYCLE 6)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E  M  W
add v1, v1, v2
beq v0, v1, loop               -
addi v0, v0, 1
li v1, 64
Finally, in cycle 6 beq can decode and read the new value of v1 (without forwarding). Since we were able to decode, we can also fetch the next instruction in cycle 6. Since the branch is resolved in Decode, we don't have to show its E, M, and W stages (they are NOPs).
Show the pipeline diagram (CYCLE 7)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E  M  W
add v1, v1, v2
beq v0, v1, loop               -
addi v0, v0, 1
li v1, 64
No hazards for addi.
Show the pipeline diagram (CYCLE 11)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
li v0, 100            F  D  E  M  W
add v1, v1, v2
beq v0, v1, loop               -
addi v0, v0, 1
li v1, 64
Fast forward to cycle 11: execution took 11 cycles.
IPC = 5 / 11 = 0.45
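The two stall cycles in this example follow from where the consumer needs its value. The sketch below counts stalls as a function of how far the consumer sits behind its producer. It assumes a 5-stage pipeline (F D E M W), no other stalls in between, and the forwarding paths named on the slides; the function name `decode_stalls` and its two mode strings are made up for illustration.

```python
def decode_stalls(distance, consumer):
    """Stall cycles for a consumer fetched `distance` instructions after its producer.

    Assumed rules (a simplification of the slide notes):
      "branch_in_id": the branch reads the register file in ID with no forwarding
                      into ID, so it cannot decode before the producer's W cycle.
      "load_use":     an ALU instruction gets a loaded value through MEM-to-EX
                      forwarding, so it cannot decode before the load's M cycle.
    """
    producer_fetch = 1
    ready = {
        "branch_in_id": producer_fetch + 4,  # producer's W cycle
        "load_use": producer_fetch + 3,      # producer's M cycle
    }[consumer]
    nominal_decode = producer_fetch + distance + 1  # decode cycle if nothing stalled
    return max(0, ready - nominal_decode)


for d in (1, 2, 3):
    print(d, decode_stalls(d, "branch_in_id"), decode_stalls(d, "load_use"))
```

With `distance = 1` (beq directly after add) the branch case gives the two stalls shown in cycles 4 and 5 above.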
Show the pipeline diagram for the following code (CYCLE 0)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)
lw r2, 400(r4)
addi r3, r1, r2
sw r3, 0(r4)
subi r4, r4, 4
The first two instructions are hazard free.
addi depends on both r1 and r2.
sw depends on r3 (addi).
Show the pipeline diagram for the following code (CYCLE 3)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E
lw r2, 400(r4)
addi r3, r1, r2
sw r3, 0(r4)
subi r4, r4, 4
No issues until addi goes to decode.
Show the pipeline diagram for the following code (CYCLE 4)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
Decode in cycle 4 would get the old values of both r1 and r2. We could forward r1 from MEM to EX in cycle 5, but r2 is not yet available, so we must stall. Since D stalls, sw cannot be fetched.
Show the pipeline diagram for the following code (CYCLE 5)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
Decode in cycle 5 and read the new value of r1 (WB in the same cycle is OK). We still need to forward MEM->EX for r2 in cycle 6.
Show the pipeline diagram for the following code (CYCLE 6)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
Forward MEM->EX for r2: draw an arrow from the previous cycle's M to the current cycle's E.
Decode sw in cycle 6; it reads the old value of r3. That's OK: sw doesn't need the new value until the start of MEM, so we can use a forwarding path.
Show the pipeline diagram for the following code (CYCLE 7)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
No issues.
Show the pipeline diagram for the following code (CYCLE 8)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
sw read the wrong r3 in decode, but the new value of r3 is at the output of the MEM stage, so we need a MEM-MEM forward.
Show the pipeline diagram for the following code (CYCLE 9)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
Show the pipeline diagram for the following code (CYCLE 10)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
lw r1, 0(r4)          F  D  E  M  W
lw r2, 400(r4)
addi r3, r1, r2                -
sw r3, 0(r4)
subi r4, r4, 4
IPC = 5 / 10 = 0.5
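The walkthrough above can be checked with a small scheduling sketch. It is not the zyBook's model, only a simplification of the rules stated in the slide notes: an instruction that needs a loaded value in EX cannot decode before the load's MEM cycle (MEM-to-EX forwarding), a branch resolved in ID cannot decode before its producer's WB cycle, and every other dependency is assumed to be covered by forwarding. The tuple layout and the names `schedule` and `render` are hypothetical; the last field lists only the registers needed in EX (or in ID for a branch), so the data register of sw is left out because it is not needed until MEM.

```python
def schedule(program):
    """Return one {stage: cycle} dict per instruction for an in-order 5-stage pipeline."""
    timeline = []
    writer = {}  # register -> index of the latest instruction that writes it
    for i, (_text, kind, dest, ex_srcs) in enumerate(program):
        # The next instruction is fetched in the cycle the previous one decodes.
        fetch = 1 if i == 0 else timeline[i - 1]["D"]
        decode = fetch + 1
        for reg in ex_srcs:
            if reg not in writer:
                continue
            j = writer[reg]
            if kind == "branch":              # no forwarding into ID
                decode = max(decode, timeline[j]["W"])
            elif program[j][1] == "load":     # MEM-to-EX forwarding
                decode = max(decode, timeline[j]["M"])
        timeline.append({"F": fetch, "D": decode, "E": decode + 1,
                         "M": decode + 2, "W": decode + 3})
        if dest is not None:
            writer[dest] = i
    return timeline


def render(program, timeline):
    """Format the schedule as text rows, marking stall cycles with '-'."""
    last = max(t["W"] for t in timeline)
    lines = []
    for (text, _kind, _dest, _srcs), t in zip(program, timeline):
        cells = {cycle: stage for stage, cycle in t.items()}
        for cycle in range(t["F"] + 1, t["D"]):
            cells[cycle] = "-"
        row = "".join(f"{cells.get(c, ''):>3}" for c in range(1, last + 1))
        lines.append(f"{text:<18}{row}".rstrip())
    return lines


# (text, kind, destination, registers needed in EX)
program = [
    ("lw r1, 0(r4)", "load", "r1", ["r4"]),
    ("lw r2, 400(r4)", "load", "r2", ["r4"]),
    ("addi r3, r1, r2", "alu", "r3", ["r1", "r2"]),
    ("sw r3, 0(r4)", "store", None, ["r4"]),
    ("subi r4, r4, 4", "alu", "r4", ["r4"]),
]

timeline = schedule(program)
print("\n".join(render(program, timeline)))
ipc = len(program) / timeline[-1]["W"]
print("IPC =", ipc)
```

Under these assumptions the sketch stalls addi once and finishes the last instruction in cycle 10, which agrees with the IPC on the slide.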
Show the pipeline diagram for the following code
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)
add r1, r1, r2
sw r1, 0(sp)
Show the pipeline diagram for the following code (CYCLE 1)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F
add r1, r1, r2
sw r1, 0(sp)
Show the pipeline diagram for the following code (CYCLE 2)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D
add r1, r1, r2
sw r1, 0(sp)
Show the pipeline diagram for the following code (CYCLE 3)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E
add r1, r1, r2              -
sw r1, 0(sp)
If we decode in cycle 3, add will need r1 to execute in cycle 4. The value isn't available until the end of cycle 4 (when lw finishes MEM). So we need to stall in cycle 3.
Show the pipeline diagram for the following code (CYCLE 4)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E  M
add r1, r1, r2              -
sw r1, 0(sp)
We will forward from the output of MEM to the input of EX in the next cycle (r1 for add).
Show the pipeline diagram for the following code (CYCLE 5)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E  M  W
add r1, r1, r2              -
sw r1, 0(sp)
The forward from the output of MEM to the input of EX supplies r1 to add, which executes in cycle 5.
sw needs the new r1 at the beginning of MEM, which will be in cycle 7; we can get it from the output of MEM.
Show the pipeline diagram for the following code (CYCLE 6)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E  M  W
add r1, r1, r2              -
sw r1, 0(sp)
sw computes the memory address 0 + sp in EX, so no forward is needed.
Show the pipeline diagram for the following code (CYCLE 7)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E  M  W
add r1, r1, r2              -
sw r1, 0(sp)
sw writes r1 to mem[0+sp] in cycle 7. The new value of r1 is at the output of MEM, so forward it to the input of MEM.
Show the pipeline diagram for the following code (CYCLE 8)
Cycle                 1  2  3  4  5  6  7  8  9 10 11 12
lw r1, 0(sp)          F  D  E  M  W
add r1, r1, r2              -
sw r1, 0(sp)
Done. IPC = 3 / 8 = 0.375
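The cycle counts in the three examples fit one simple relation: in a 5-stage pipeline the first instruction takes 5 cycles, each later instruction adds 1, and each stall adds 1. The sketch below applies that relation, with stall counts read off the walkthroughs above (2 for the beq example, 1 for each load example). The function name `pipeline_cycles` and the dictionary labels are illustrative only.

```python
def pipeline_cycles(n_instructions, stall_cycles, n_stages=5):
    """Total cycles for an in-order pipeline: fill time plus one per instruction plus stalls."""
    return n_instructions + (n_stages - 1) + stall_cycles


# (instruction count, stall cycles) taken from the three walkthroughs.
examples = {
    "li/add/beq/addi/li": (5, 2),
    "lw/lw/addi/sw/subi": (5, 1),
    "lw/add/sw": (3, 1),
}

results = {}
for name, (n, stalls) in examples.items():
    cycles = pipeline_cycles(n, stalls)
    results[name] = (cycles, n / cycles)
    print(f"{name}: {cycles} cycles, IPC = {n / cycles:.3f}")
```

This only holds when every stall delays the whole pipeline by exactly one cycle, as it does in these examples.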