CSCI206 - Computer Organization & Programming

Pipeline Datapath and Control zyBook: 11.6

The MIPS Pipeline

Hazard Summary

data - An instruction depends on a data value produced or consumed by another instruction
  -- Reorder
  -- Forwarding (EX-EX, MEM-EX)

control - The execution of an instruction depends on a control decision made by an earlier instruction (e.g., branch)
  -- Delay slot (nop)
  -- Compute diff at the ID stage

structural - An instruction in the pipeline needs a resource being used by another instruction in the pipeline at the same moment
  -- Reorder if possible
  -- Delay

EXAMPLES

Show the pipeline diagram
MIPS branch uses 2 optimizations; assume the branch is NOT taken.

    li   v0, 100
    add  v1, v1, v2
    beq  v0, v1, loop
    addi v0, v0, 1
    li   v1, 64

CYCLE 1
    li v0, 100          F

CYCLE 2
    li v0, 100          F D

CYCLE 3
    li v0, 100          F D E

CYCLE 4
    li v0, 100          F D E M
    beq v0, v1, loop    - (stall)
In cycle 4 the branch is ready to be resolved, but it needs the new value of v1, which is not available until the end of cycle 4. So we have to stall. This stalls everything before this stage in the pipeline, so we cannot fetch the addi.

CYCLE 5
    li v0, 100          F D E M W
    beq v0, v1, loop    - (stall)
In cycle 5 the new value of v1 is at the output of EX. But MIPS only has EX-EX, MEM-EX, and MEM-MEM forwarding, not EX-ID. So we have to stall again (no fetch again).

CYCLE 6
Finally, in cycle 6 the branch can decode and read the new value of v1 (without forwarding). Since we were able to decode, we can also fetch the next instruction in cycle 6. Since the branch is resolved in decode, we don't have to show its E, M, and W stages (they are NOPs).

CYCLE 7
No hazards for addi.

CYCLE 11
Fast forward to cycle 11. Execution took 11 cycles.
IPC = 5 / 11 = 0.45
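
The cycle count above can be checked with the usual pipeline arithmetic: the first instruction takes one cycle per stage, each later instruction finishes one cycle after the one before it, and every stall adds one cycle. The sketch below only checks the numbers in this example; the function names are illustrative and do not come from the course material.

```python
def total_cycles(instructions, stall_cycles, stages=5):
    """Cycles to run `instructions` through an in-order pipeline with the given stalls."""
    return stages + (instructions - 1) + stall_cycles


def ipc(instructions, stall_cycles, stages=5):
    """Instructions per cycle for the same run."""
    return instructions / total_cycles(instructions, stall_cycles, stages)


# First example: 5 instructions, beq stalls in cycles 4 and 5.
cycles = total_cycles(5, 2)
print(cycles, round(ipc(5, 2), 2))  # 11 0.45
```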

Show the pipeline diagram for

    lw   r1, 0(r4)
    lw   r2, 400(r4)
    addi r3, r1, r2
    sw   r3, 0(r4)
    subi r4, r4, 4

The first two instructions are hazard free.
addi depends on both r1 and r2.
sw depends on r3 (addi).
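
The dependencies listed for this program can be found mechanically by tracking the most recent writer of each register. The sketch below is a minimal illustration: the tuple encoding (text, destination, sources) and the function name are assumptions made for this example, and it reports only read-after-write dependencies.

```python
def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}                      # register -> index of latest instruction writing it
    for i, (_text, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:              # sw writes memory, not a register
            last_writer[dest] = i
    return deps


program = [
    ("lw r1, 0(r4)",    "r1", ["r4"]),
    ("lw r2, 400(r4)",  "r2", ["r4"]),
    ("addi r3, r1, r2", "r3", ["r1", "r2"]),
    ("sw r3, 0(r4)",    None, ["r3", "r4"]),
    ("subi r4, r4, 4",  "r4", ["r4"]),
]

for producer, consumer, reg in raw_dependencies(program):
    print(f"{program[consumer][0]} needs {reg} from {program[producer][0]}")
```

For this program it reports three dependencies: addi on each lw, and sw on addi.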

CYCLE 3
    lw r1, 0(r4)       F D E
No issues until addi goes to decode.

CYCLE 4
    lw r1, 0(r4)       F D E M
    addi r3, r1, r2    - (stall)
Decode in cycle 4 would get the old values of both r1 and r2. We could forward r1 from MEM to EX in cycle 5, but r2 is not yet available, so we must stall. Since D stalls, sw cannot be fetched.

CYCLE 5
    lw r1, 0(r4)       F D E M W
addi decodes in cycle 5 and reads the new value of r1 (WB in the same cycle is OK). r2 still needs a MEM->EX forward in cycle 6.

CYCLE 6
Forward MEM->EX for r2. Draw an arrow from the previous cycle's M to the current cycle's E.
sw decodes in cycle 6 and gets the old value of r3. That's OK: sw doesn't need the new value until the start of MEM, so we can use a forwarding path.

CYCLE 7
No issues.

CYCLE 8
sw read the wrong r3 in decode, but the new value of r3 is at the output of the MEM stage, so we need a MEM-MEM forward.

CYCLE 9

CYCLE 10
IPC = 5 / 10 = 0.5
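
The finished diagram for this example can be redrawn as text from the fetch and decode cycles given in the notes. The sketch below is a reconstruction, not the original slide: the timings for subi (fetch 6, decode 7) are inferred from the "no issues" note and the 10-cycle total, and the helper name is made up for this example. A "-" marks a cycle in which the instruction waits to decode.

```python
def render_row(fetch, decode, width):
    """One diagram row: F at `fetch`, '-' while stalled, then D E M W from `decode`."""
    cells = [" "] * width
    cells[fetch - 1] = "F"
    for cycle in range(fetch + 1, decode):
        cells[cycle - 1] = "-"            # waiting to decode
    for offset, stage in enumerate("DEMW"):
        cells[decode - 1 + offset] = stage
    return " ".join(cells).rstrip()


# (instruction, fetch cycle, decode cycle), as read from the notes above.
schedule = [
    ("lw r1, 0(r4)",    1, 2),
    ("lw r2, 400(r4)",  2, 3),
    ("addi r3, r1, r2", 3, 5),
    ("sw r3, 0(r4)",    5, 6),
    ("subi r4, r4, 4",  6, 7),
]

width = max(decode + 3 for _, _, decode in schedule)   # cycle of the last W
print(" " * 18 + " ".join(str(c % 10) for c in range(1, width + 1)))
for text, fetch, decode in schedule:
    print(f"{text:<18}" + render_row(fetch, decode, width))
```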

    lw  r1, 0(sp)
    add r1, r1, r2
    sw  r1, 0(sp)

CYCLE 1
    lw r1, 0(sp)      F

CYCLE 2
    lw r1, 0(sp)      F D

CYCLE 3
    lw r1, 0(sp)      F D E
    add r1, r1, r2    - (stall)
If add decodes in cycle 3, it will need r1 to execute in cycle 4. The value isn't available until the end of cycle 4 (when lw finishes MEM), so we need to stall in cycle 3.

CYCLE 4
    lw r1, 0(sp)      F D E M
We will forward from the output of MEM to the input of EX in the next cycle (r1 for add).

CYCLE 5
    lw r1, 0(sp)      F D E M W
The MEM->EX forward of r1 for add happens in this cycle.
sw needs the new r1 at the beginning of MEM, which will be in cycle 7; we can get it from the output of MEM.

CYCLE 6
sw computes the memory address 0 + sp in EX, so no forward is needed.

CYCLE 7
sw writes r1 to mem[0+sp] in cycle 7. The value of r1 is at the output of MEM, so forward it to the input of MEM.

CYCLE 8
Done. IPC = 3 / 8 = 0.375
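
The stall rules used in these walk-throughs can be collected into a small scheduler. This is a sketch under stated assumptions, not the course's method: it assumes the register file is written in the first half of a cycle and read in the second, that forwarding exists for EX-EX, MEM-EX, and MEM-MEM only, and that branch operands are read in ID with no forwarding. The dict encoding of instructions is hypothetical.

```python
# Cycles after its own decode at which a producer's result exists (end of that cycle).
READY_AFTER_DECODE = {"alu": 1, "load": 2}   # end of EX, end of MEM


def schedule(program):
    """Return the decode cycle of each instruction.

    Each instruction is a dict with:
      kind - "alu", "load", "store", or "branch"
      dest - register written, or None
      ex   - registers needed at the start of EX (forwarding allowed)
      mem  - registers needed at the start of MEM (store data)
      id   - registers needed in ID with no forwarding (branch operands)
    """
    decode = []
    producer = {}                                 # register -> index of latest earlier writer
    for i, ins in enumerate(program):
        d = 2 if i == 0 else decode[-1] + 1       # earliest in-order decode slot
        for need, slack in (("ex", 0), ("mem", 1)):
            for reg in ins.get(need, ()):
                if reg in producer:
                    p = producer[reg]
                    ready = decode[p] + READY_AFTER_DECODE[program[p]["kind"]]
                    d = max(d, ready - slack)
        for reg in ins.get("id", ()):
            if reg in producer:
                d = max(d, decode[producer[reg]] + 3)   # wait for the producer's WB cycle
        decode.append(d)
        if ins.get("dest"):
            producer[ins["dest"]] = i
    return decode


# lw r1, 0(sp) / add r1, r1, r2 / sw r1, 0(sp)
program = [
    {"kind": "load",  "dest": "r1", "ex": ["sp"]},
    {"kind": "alu",   "dest": "r1", "ex": ["r1", "r2"]},
    {"kind": "store", "dest": None, "ex": ["sp"], "mem": ["r1"]},
]
decode_cycles = schedule(program)
total = decode_cycles[-1] + 3                     # last instruction's WB cycle
print(decode_cycles, total)                       # [2, 4, 5] 8
```

Treating store data as needed only at the start of MEM is what lets sw follow add without a stall here, while the load-use pair lw/add still costs one cycle.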
