Delayed Load
What happens if a load is immediately followed by an instruction that uses the loaded register?
    lw  $6 $0($1)
    add $4 $6 $1
Critical path: forwarding from "DM" to "EX"?
    0x30  lw  $6 $0($1)
    0x34  add $4 $6 $1
    0x38  add $7 $6 $2
[Pipeline diagram: one row per instruction, each drawn as IM, Reg, DM, Reg]
The Model We Use
[Datapath figure; labels: Zero ext., Branch logic, ALU (inputs A, B), 4, 31, Sgn/Ze extend]
Fix or Not?
With DM-to-EX forwarding the critical path would be 2T (ALU + DM), so the clock speed would be only half.
WE CHOOSE NOT TO FIX.
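The 2T figure can be checked with a small calculation. This is only a sketch: it assumes every block (IM, Reg, ALU, DM) has the same delay T, and the names `STAGE_DELAY` and `clock_period` are made up for illustration.

```python
# Assumption: every major block takes the same time T.
T = 1.0
STAGE_DELAY = {"IM": T, "Reg": T, "ALU": T, "DM": T}

def clock_period(paths):
    """The clock period is set by the slowest path crossed within one cycle."""
    return max(sum(STAGE_DELAY[block] for block in path) for path in paths)

# Without DM-to-EX forwarding, each cycle crosses exactly one block.
no_forward = [["IM"], ["Reg"], ["ALU"], ["DM"]]

# With DM-to-EX forwarding, loaded data leaves DM and goes through
# the ALU within the same cycle, so that path is DM followed by ALU.
dm_to_ex = no_forward + [["DM", "ALU"]]

period_plain = clock_period(no_forward)
period_fwd = clock_period(dm_to_ex)
print(period_plain, period_fwd)
```

Under these assumptions the forwarded design needs a clock period twice as long, which matches the "clock speed only half" conclusion.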
Forwarding from "DM" to "DE"
    0x30  lw  $6 $0($1)
    0x34  add $4 $6 $1
    0x38  add $7 $6 $2
[Pipeline diagram: one row per instruction, each drawn as IM, Reg, DM, Reg]
Delayed Load
One "delayed load" slot. Still better than NO forwarding.
    lw  $6 $0($1)
    <other useful operation, or nop>
    add $4 $6 $1
    add $7 $6 $4
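A minimal sketch of how an assembler could honour the single delayed-load slot: if the instruction right after a lw reads the loaded register, a nop is placed between them. The function names are hypothetical, and the parser assumes the simple operand order used on these slides (op, destination, sources).

```python
import re

NOP = "nop"

def loaded_register(instr):
    """Return the destination register of a lw, or None for any other instruction."""
    parts = instr.replace(",", " ").split()
    return parts[1] if parts and parts[0] == "lw" else None

def source_registers(instr):
    """Return the registers an instruction reads (assumed format: op rd rs rt)."""
    parts = instr.replace(",", " ").split()
    if not parts or parts[0] == NOP:
        return []
    if parts[0] == "lw":
        # A load reads only its base register, written inside parentheses.
        return re.findall(r"\((\$\w+)\)", instr)
    return parts[2:]

def fill_delay_slots(program):
    """Insert a nop after each lw whose result is used by the very next instruction."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        dest = loaded_register(instr)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if dest is not None and nxt is not None and dest in source_registers(nxt):
            out.append(NOP)
    return out

program = ["lw $6 $0($1)", "add $4 $6 $1", "add $7 $6 $4"]
for line in fill_delay_slots(program):
    print(line)
```

A smarter scheduler would first look for an independent instruction to move into the slot and fall back to nop only when none exists; this sketch covers the fallback case only.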
Pipeline Efficiency
The critical path is cut to 1/4. Can we do the same with only three stages?
4 Stage Pipe:  IM  Reg  DM  Reg
3 Stage Pipe:  IM  Reg  Reg  DM
[Pipeline diagrams for the two organizations]
4 Stage Pipe
[Datapath figure; labels: Zero ext., Branch logic, ALU (inputs A, B), 4, 31, Sgn/Ze extend]
[Datapath figure; labels: Zero ext., Branch logic, ALU (inputs A, B), 4, 31, Sgn/Ze extend]
Critical Path?
ALU + DM? No, that path is too long. No can do!
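What about the instruction set?
    lw $t2 4($t4)   NO: the offset has to be added by the ALU, and the ALU is not in the path to DM.
    lw $t2 $t4      OK: the register already holds the address, so there is no need for the ALU.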
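The rule above can be written as a small check. This is a sketch only: the two operand patterns and the function names are assumptions for illustration, and the parser follows the comma-free notation of the slide.

```python
import re

# Assumed operand forms, following the notation on the slide.
OFFSET_FORM = re.compile(r"^-?\d+\(\$\w+\)$")   # e.g. 4($t4)
REGISTER_FORM = re.compile(r"^\$\w+$")          # e.g. $t4

def load_needs_alu(instr):
    """Return True if the lw address has to be computed by the ALU."""
    op, _dest, address = instr.replace(",", " ").split()
    if op != "lw":
        raise ValueError("not a load: " + instr)
    if OFFSET_FORM.match(address):
        return True    # base + offset requires an addition
    if REGISTER_FORM.match(address):
        return False   # the register value is used directly as the address
    raise ValueError("unrecognised address operand: " + address)

def fits_three_stage_pipe(instr):
    """With DM beside the ALU, a load is usable only if it needs no ALU result."""
    return not load_needs_alu(instr)

print(fits_three_stage_pipe("lw $t2 4($t4)"))
print(fits_three_stage_pipe("lw $t2 $t4"))
```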
Avoid Delayed Load?
Yes. By moving DM into the EX stage, we can forward the loaded result.
Different Pipe Length/Depth
Is it possible to implement both versions in one structure (MIPS pipe)?
NO! There might be collisions: an instruction in EX and an instruction in DM could both access memory at the same time.
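The collision can be made concrete with a small sketch. It assumes stages are numbered from 0 (IM), that a four-stage style load reaches memory in stage 3 (DM) and a three-stage style load in stage 2 (EX), and that one instruction is issued per cycle. The labels "offset" and "register" and the function name are made up for illustration.

```python
from collections import defaultdict

# Assumed stage index at which each load style accesses data memory.
MEM_STAGE = {
    "offset": 3,     # four-stage style: IM, Reg, ALU, DM
    "register": 2,   # three-stage style: IM, Reg, ALU/DM side by side
}

def memory_collisions(issued):
    """issued: list of (issue_cycle, kind). Return the cycles in which
    more than one instruction accesses data memory."""
    users = defaultdict(list)
    for cycle, kind in issued:
        users[cycle + MEM_STAGE[kind]].append((cycle, kind))
    return {c: u for c, u in users.items() if len(u) > 1}

# A four-stage style load followed one cycle later by a three-stage style load:
# both reach data memory in cycle 3.
print(memory_collisions([(0, "offset"), (1, "register")]))
```

Two loads of the same style issued in consecutive cycles never collide under these assumptions, which is why each pipe works on its own but the mixture does not.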
Pipeline Efficiency
Did we change the critical path? NO! The ALU and DM are not in sequence.