Datorteknik DelayedLoad bild 1 Delayed Load
Datorteknik DelayedLoad bild 2 All problems solved? NO. What will happen with this sequence?
    lw  $6, 0($1)
    add $4, $6, $1
    add $7, $6, $2
Datorteknik DelayedLoad bild 3 Critical path “DM” to “EX”?
[Pipeline diagram: three overlapping instructions, each drawn with the blocks IM, Reg, DM, Reg]
    0x30  lw  $6, 0($1)
    0x34  add $4, $6, $1
    0x38  add $7, $6, $2
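The timing behind the question on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's model: the stage list `STAGES` (with write-back left out) and the helper `stage_cycle` are assumptions made for the example, and one instruction is assumed to issue per cycle.

```python
# Assumed stage order, using the block names from the diagram; the final
# register write is left out because it does not affect this comparison.
STAGES = ["IM", "Reg", "EX", "DM"]

def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction issued at issue_cycle occupies stage."""
    return issue_cycle + STAGES.index(stage)

lw_dm = stage_cycle(0, "DM")           # lw at 0x30 reads data memory
first_add_ex = stage_cycle(1, "EX")    # add at 0x34 needs $6 in the ALU
second_add_ex = stage_cycle(2, "EX")   # add at 0x38 needs $6 in the ALU

# Same cycle: the loaded value would have to pass through DM and then the
# ALU within one clock period, which is the DM-to-EX path in question.
print(lw_dm, first_add_ex)    # 3 3
# One cycle later: the value can come from a pipeline register instead.
print(second_add_ex)          # 4
```

Under these assumptions only the instruction immediately after the `lw` is affected; the second `add` can be served by ordinary forwarding.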
Datorteknik DelayedLoad bild 4 The Model We Use
[Datapath diagram: branch logic, sign/zero extend, zero extend, ALU with inputs A and B, four “=” comparator blocks]
Datorteknik DelayedLoad bild 5 Fix or Not? The critical path would be 2T (ALU + DM), so the clock speed would be only half. WE CHOOSE NOT TO FIX.
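The trade-off can be put into rough numbers. The sketch below assumes every stage takes about one unit T, that the fix chains DM and ALU into a single 2T stage, and that pipeline fill time can be ignored; the instruction counts are hypothetical and only there to make the comparison concrete.

```python
def clock_frequency(stage_delays):
    """Clock frequency (in units of 1/T) when the slowest stage sets the period."""
    return 1.0 / max(stage_delays)

def run_time(n_instr, extra_slots, period):
    """Approximate run time: one instruction or wasted slot per clock period."""
    return (n_instr + extra_slots) * period

T = 1.0
no_fix = [T, T, T, T]          # assumed: every stage fits in T
with_fix = [T, T, 2 * T, T]    # assumed: DM followed by ALU in one cycle

print(clock_frequency(with_fix) / clock_frequency(no_fix))   # 0.5

# Hypothetical program: 100 instructions, 20 loads, every load slot wasted.
print(run_time(100, 20, max(no_fix)))     # 120.0
print(run_time(100, 0, max(with_fix)))    # 200.0
```

Even in this worst case, where every load slot holds a nop, the unfixed pipeline finishes sooner than the one with the halved clock.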
Datorteknik DelayedLoad bild 6 Delayed Load
One “delayed load” slot:
    lw  $6, 0($1)
    (other useful operation, or nop)
    add $4, $6, $1
    add $7, $6, $4
Still better than NO forwarding:
    lw  $6, 0($1)
    (other useful operation, or nop)
    add $4, $6, $1
    (other useful operation, or nop)
    add $7, $6, $4
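A simple way to respect the load delay slot is to let a tool insert a `nop` whenever the instruction right after a `lw` reads the loaded register. The sketch below is a hypothetical helper, not part of any real assembler; it assumes the first register of each instruction is its destination, which holds for `lw` and `add` but not for stores, and it never tries to fill the slot with useful work.

```python
import re

def regs(line):
    """Register names in order of appearance, e.g. ['$6', '$1']."""
    return re.findall(r"\$\w+", line)

def insert_load_delay_nops(program):
    """Insert a nop after each lw whose result is read by the next instruction."""
    out = []
    for line in program:
        prev = out[-1] if out else "nop"
        # Sources are assumed to be every register after the first one.
        if prev.split()[0] == "lw" and regs(prev)[0] in regs(line)[1:]:
            out.append("nop")
        out.append(line)
    return out

program = ["lw $6, 0($1)", "add $4, $6, $1", "add $7, $6, $4"]
for line in insert_load_delay_nops(program):
    print(line)
```

For this input the result has a single `nop` between the `lw` and the first `add`, matching the first listing on the slide. A scheduler that moves an independent instruction into the slot would do better than the `nop`.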
Datorteknik DelayedLoad bild 7 Pipeline Efficiency: critical path cut to 1/4. Can we do the same with only three stages?
Datorteknik DelayedLoad bild 8 4 Stage Pipe versus 3 Stage Pipe
[Diagram: the two pipelines side by side, each drawn with the blocks IM, Reg, DM, Reg]
Datorteknik DelayedLoad bild 9 4 Stage Pipe
[Datapath diagram: branch logic, sign/zero extend, zero extend, ALU with inputs A and B, four “=” comparator blocks]
Datorteknik DelayedLoad bild 10 [Datapath diagram: branch logic, sign/zero extend, zero extend, ALU with inputs A and B, four “=” comparator blocks]
Datorteknik DelayedLoad bild 11 Critical Path? ALU + DM. No, it’s too long, no can do!
Datorteknik DelayedLoad bild 12 [Datapath diagram: branch logic, sign/zero extend, zero extend, ALU with inputs A and B, four “=” comparator blocks]
Datorteknik DelayedLoad bild 13 What about the instruction set?
    lw $t2, 4($t4)   NO, the ALU is not in the path
    lw $t2, ($t4)    OK, no need for the ALU
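If offset loads are not available, the address has to be computed by a separate instruction first. The sketch below shows one way this could be done; it is an assumption for illustration, not something the slides prescribe. The function `expand_offset_load` is hypothetical, and the use of `$at` as scratch register simply follows the usual MIPS assembler convention.

```python
import re

# Matches loads of the form "lw rt, imm(rs)" with a decimal offset.
LW = re.compile(r"lw\s+(\$\w+),\s*(-?\d+)\((\$\w+)\)")

def expand_offset_load(line, temp="$at"):
    """Rewrite an offset load as an explicit add followed by an offset-free load."""
    m = LW.fullmatch(line.strip())
    if not m:
        return [line]                      # not an offset load: keep as is
    rt, imm, rs = m.group(1), int(m.group(2)), m.group(3)
    if imm == 0:
        return [f"lw {rt}, ({rs})"]        # offset 0 needs no ALU work
    return [f"addi {temp}, {rs}, {imm}", f"lw {rt}, ({temp})"]

print(expand_offset_load("lw $t2, 4($t4)"))
# ['addi $at, $t4, 4', 'lw $t2, ($at)']
```

The cost is one extra instruction per offset load, which has to be weighed against the load delay slot that the shorter pipe removes.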
Datorteknik DelayedLoad bild 14 Avoid Delayed Load? Yes: by moving DM into the EX stage, we can forward the result.
Datorteknik DelayedLoad bild 15 Different Pipe Length/Depth. Is it possible to implement both versions in one structure (MIPS pipe)? NO! There might be collisions: both EX and DM would access memory at the same time.
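The collision can be made concrete with a small model. The sketch assumes one instruction issues per cycle, that the data memory has a single port, and that EX and DM are the third and fourth stage (indices 2 and 3); the function `memory_collisions` is hypothetical.

```python
from collections import Counter

EX, DM = 2, 3   # assumed stage indices counted from the fetch stage

def memory_collisions(mem_stage_per_instr):
    """Cycles in which more than one instruction accesses data memory.

    mem_stage_per_instr[i] is the stage in which instruction i (issued in
    cycle i) accesses memory, or None if it does not access memory.
    """
    use = Counter(i + s for i, s in enumerate(mem_stage_per_instr)
                  if s is not None)
    return sorted(cycle for cycle, n in use.items() if n > 1)

print(memory_collisions([DM, EX]))   # [3]  mixed pipe lengths collide
print(memory_collisions([DM, DM]))   # []   4-stage style only
print(memory_collisions([EX, EX]))   # []   3-stage style only
```

Each version on its own never produces two memory accesses in the same cycle; only the mix of a DM-stage access followed by an EX-stage access does.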
Datorteknik DelayedLoad bild 16 Pipeline Efficiency. Did we change the critical path? NO! ALU and DM are not in sequence.