1 Pipelining CDA 3101 Discussion Section 08
2 Question 1 – 6.1
Suppose that the time for an ALU operation can be shortened by 25% in the following figure.
a. Will it affect the speedup obtained from pipelining? If yes, by how much? If no, why?
b. What if the ALU operation now takes 25% more time?
3 Question 1
a. Shortening the time for an ALU operation by 25% (200 ps to 150 ps)
Yes, it affects the speedup. The slowest pipeline stage is still 200 ps (the IF and MEM stages), so the pipelined clock period is unchanged, but the single-cycle time drops from 800 ps to 750 ps.
Original speedup = 800/200 = 4
New speedup = 750/200 = 3.75
4 Question 1
b. Lengthening the time for an ALU operation by 25% (200 ps to 250 ps)
Yes, it affects the speedup. The ALU stage becomes the slowest stage at 250 ps, and the single-cycle time grows from 800 ps to 850 ps.
New speedup = 850/250 = 3.4
The speedup is therefore 15% lower than the original 4.
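The arithmetic in both parts can be checked with a short sketch. The 200 ps figures for IF, the ALU (EX) stage, and MEM follow from the solution above; the 100 ps values for ID and WB are assumptions chosen so that the stages sum to the stated 800 ps. The names `BASE_STAGES`, `speedup`, and `scale_alu` are illustrative only.

```python
# Stage latencies in picoseconds.
# IF, EX (ALU) and MEM = 200 ps follow from the solution above.
# ID and WB = 100 ps each are ASSUMED so that the total is 800 ps.
BASE_STAGES = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def speedup(stages):
    """Single-cycle instruction time divided by the pipelined clock period."""
    single_cycle = sum(stages.values())     # one long cycle covers every stage
    pipelined_clock = max(stages.values())  # clock is set by the slowest stage
    return single_cycle / pipelined_clock


def scale_alu(stages, factor):
    """Return a copy of the stage table with the ALU (EX) latency scaled."""
    scaled = dict(stages)
    scaled["EX"] = stages["EX"] * factor
    return scaled


original = speedup(BASE_STAGES)                  # 800 / 200
faster = speedup(scale_alu(BASE_STAGES, 0.75))   # 750 / 200
slower = speedup(scale_alu(BASE_STAGES, 1.25))   # 850 / 250
print(original, faster, slower)
```

Shortening the ALU only shrinks the numerator, while lengthening it changes both the numerator and the denominator, which is why the second case loses more speedup.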
5 Question 2 – 6.4
Identify all of the data dependencies in the following code.
    add $3, $4, $2
    sub $5, $3, $1
    lw  $6, 200($3)
    add $7, $3, $6
a. Which dependencies are data hazards that will be resolved via forwarding?
b. Which dependencies are data hazards that will cause a stall?
6 Question 2
Data dependencies
1. Through $3, between the first instruction (add) and each subsequent instruction.
2. Through $6, between the lw instruction and the last instruction (add).
a. Dependencies resolved via forwarding: the dependencies on $3 between the first instruction and each subsequent instruction.
b. Dependencies that cause a stall: the dependency on $6 between lw and the last instruction. The loaded value is not available until after lw's MEM stage, so forwarding alone cannot resolve it.
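The dependency search above can be sketched mechanically. This is a minimal illustration, not the textbook's hazard detection unit: it handles only the instruction forms used in this question, and it assumes that a value needed by the instruction immediately after a load always costs one stall. The helper names `parse` and `raw_dependencies` are hypothetical.

```python
import re


def parse(instr):
    """Return (opcode, destination, sources) for the MIPS forms used here."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                 # sw rt, off(rs): reads both, writes none
        return op, None, regs
    return op, regs[0], regs[1:]   # add/sub/addi/lw: first register is written


def raw_dependencies(program):
    """List (producer_index, consumer_index, register, needs_stall) tuples."""
    deps = []
    last_writer = {}               # register -> index of most recent writer
    for i, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for reg in dict.fromkeys(sources):   # de-duplicate, keep order
            if reg in last_writer:
                j = last_writer[reg]
                producer_op = program[j].split()[0]
                # ASSUMPTION: only a load followed directly by its user stalls.
                needs_stall = producer_op == "lw" and i == j + 1
                deps.append((j, i, reg, needs_stall))
        if dest is not None:
            last_writer[dest] = i
    return deps


program = [
    "add $3, $4, $2",
    "sub $5, $3, $1",
    "lw $6, 200($3)",
    "add $7, $3, $6",
]
deps = raw_dependencies(program)
for producer, consumer, reg, stall in deps:
    print(producer, consumer, reg, "stall" if stall else "no stall")
```

For this program the sketch reports three dependencies on $3 from the first instruction and one on $6 from lw to the final add, with only the last one flagged as a stall.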
7 Question 2
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
add $3,$4,$2      IF   ID   EX   MEM  WB
sub $5,$3,$1           IF   ID   EX   MEM  WB
lw $6,200($3)               IF   ID   EX   MEM  WB
stall
add $7,$3,$6                          IF   ID   EX   MEM  WB
8 Question 3 – 6.22
Consider executing the following code on the pipelined datapath of Figure 6.36.
    lw  $4, 100($2)
    sub $6, $4, $3
    add $2, $3, $5
a. Draw a diagram that illustrates the dependencies that need to be resolved.
b. Provide another diagram that illustrates how the code will actually be executed.
c. How many cycles will it take to execute this code?
9 Question 3 Cont. Figure 6.36
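10 Question 3-1
Diagram that illustrates the dependencies that need to be resolved
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7
lw $4,100($2)     IF   ID   EX   MEM  WB
sub $6,$4,$3           IF   ID   EX   MEM  WB
add $2,$3,$5                IF   ID   EX   MEM  WB
The dependency to resolve is on $4: lw produces it at the end of its MEM stage (CC4), but sub needs it at the start of its EX stage (CC4).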
11 Question 3-2
Diagram that illustrates how the code will actually be executed
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw $4,100($2)     IF   ID   EX   MEM  WB
stall
sub $6,$4,$3                IF   ID   EX   MEM  WB
add $2,$3,$5                     IF   ID   EX   MEM  WB
12 Question 3-3
Total cycles to execute the code: 8 (3 instructions + 4 cycles for the first instruction to pass through the remaining stages + 1 stall), as shown in the diagram for Question 3-2.
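The cycle counts in these diagrams follow one pattern, sketched below for a five-stage pipeline. The function name `total_cycles` is illustrative, and the number of stalls is supplied by the caller rather than detected.

```python
def total_cycles(num_instructions, stalls, stages=5):
    """Cycles needed to run a straight-line sequence on an in-order pipeline.

    The first instruction needs `stages` cycles; every later instruction
    finishes one cycle after its predecessor, plus one cycle per stall.
    """
    return num_instructions + (stages - 1) + stalls


question_2 = total_cycles(4, stalls=1)   # four instructions, one load-use stall
question_3 = total_cycles(3, stalls=1)   # three instructions, one load-use stall
print(question_2, question_3)
```

The two calls reproduce the last cycle shown in the Question 2 chart (CC9) and the total given for Question 3 (8).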
13 Question 4-1
Look at the following code. The pipelined datapath is the one given in the previous question.
    addi $t0, $t0, 4
    lw   $v0, 0($t0)
    sw   $v0, 20($t1)
    lw   $s0, 60($t0)
    add  $s1, $s0, $s0
    sub  $s1, $s1, $s0
    sub  $s4, $s1, $s5
a. Determine where the hazards occur and on which register, and write it next to each instruction above.
14 Question 4-1
    addi $t0, $t0, 4
    lw   $v0, 0($t0)       data hazard on $t0
    sw   $v0, 20($t1)      data hazard on $v0
    lw   $s0, 60($t0)
    add  $s1, $s0, $s0     data hazard on $s0
    sub  $s1, $s1, $s0     data hazard on $s1
    sub  $s4, $s1, $s5     data hazard on $s1
15 Question 4-2
b. Indicate in the chart below which stage each instruction is in during each cycle, where forwarding occurs (draw an arrow from where the value is produced to where the value is used), and where stalls or flushes occur. The chart may have more or fewer spaces than you need. Make sure you fill in the instructions in the first column in the order in which they execute.
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9  CC10 CC11
16 Question 4-2
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9  CC10 CC11 CC12
addi $t0,$t0,4    IF   ID   EX   MEM  WB
lw $v0,0($t0)          IF   ID   EX   MEM  WB
sw $v0,20($t1)              IF   ID   EX   MEM  WB
lw $s0,60($t0)                   IF   ID   EX   MEM  WB
stall
add $s1,$s0,$s0                            IF   ID   EX   MEM  WB
sub $s1,$s1,$s0                                 IF   ID   EX   MEM  WB
sub $s4,$s1,$s5                                      IF   ID   EX   MEM  WB
17 Question 4-3
c. Which two registers of the register file are being read in cycle 8?
$s0 and $s1. In the chart for Question 4-2, sub $s1,$s1,$s0 is in its ID stage in cycle 8.
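The lookup "which instruction is in ID in cycle 8" can be sketched by generating the chart programmatically. This follows the drawing convention of these slides, where a stalled instruction is simply shown starting one cycle later; real hardware fetches it on time and holds it in place. The single stall before the first add is taken from the chart, not detected, and the names `schedule` and `in_stage` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(program, stalls_before):
    """Map each instruction index to {stage: cycle}.

    stalls_before[i] is the number of bubble cycles drawn before instruction i.
    """
    table = {}
    start = 1
    for i in range(len(program)):
        start += stalls_before.get(i, 0)
        table[i] = {stage: start + k for k, stage in enumerate(STAGES)}
        start += 1                      # next instruction enters a cycle later
    return table


def in_stage(program, table, stage, cycle):
    """Return the instruction occupying `stage` in `cycle`, or None."""
    for i, cycles in table.items():
        if cycles[stage] == cycle:
            return program[i]
    return None


program = [
    "addi $t0, $t0, 4",
    "lw $v0, 0($t0)",
    "sw $v0, 20($t1)",
    "lw $s0, 60($t0)",
    "add $s1, $s0, $s0",
    "sub $s1, $s1, $s0",
    "sub $s4, $s1, $s5",
]
# ASSUMPTION (from the chart): one stall, before the add that uses $s0.
table = schedule(program, {4: 1})
reader = in_stage(program, table, "ID", 8)
last_cycle = table[len(program) - 1]["WB"]
print(reader, last_cycle)
```

The instruction found in ID at cycle 8 reads $s1 and $s0, and the final write-back cycle gives the total length of the chart.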
18 Question 4-4
d. How many cycles does it take to execute the above code completely?
12 (7 instructions + 4 cycles for the first instruction to pass through the remaining stages + 1 stall), as shown in the chart for Question 4-2.
19 Question 4-5
e. Can you reorder the code to reduce the number of cycles it takes to execute this code, without changing the code result? If yes, show how.
Yes. Swap sw $v0,20($t1) with lw $s0,60($t0). The stall disappears and the code finishes in 11 cycles.
Instruction       CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9  CC10 CC11
addi $t0,$t0,4    IF   ID   EX   MEM  WB
lw $v0,0($t0)          IF   ID   EX   MEM  WB
lw $s0,60($t0)              IF   ID   EX   MEM  WB
sw $v0,20($t1)                   IF   ID   EX   MEM  WB
add $s1,$s0,$s0                       IF   ID   EX   MEM  WB
sub $s1,$s1,$s0                            IF   ID   EX   MEM  WB
sub $s4,$s1,$s5                                 IF   ID   EX   MEM  WB
20 Question 4-5
Swapping a load and a store instruction is VERY dangerous! The load address and the store address may be the same under certain conditions. This is another kind of data dependence, through memory rather than through registers. Do not change the order of lw/sw unless you are 100% sure the two addresses can never be equal.
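The danger can be made concrete with a tiny simulation of just the two swapped instructions. This is a sketch under stated assumptions: registers and memory are modeled as Python dictionaries, and the register and memory values are made up for illustration.

```python
def run(order, regs, memory):
    """Execute a list of (op, rt, offset, base) tuples on copies of the state."""
    regs, memory = dict(regs), dict(memory)
    for op, rt, offset, base in order:
        addr = regs[base] + offset
        if op == "lw":
            regs[rt] = memory.get(addr, 0)   # load word into rt
        else:
            memory[addr] = regs[rt]          # sw: store rt to memory
    return regs, memory


store = ("sw", "$v0", 20, "$t1")   # sw $v0, 20($t1)
load = ("lw", "$s0", 60, "$t0")    # lw $s0, 60($t0)


def same_result(regs, memory):
    """True if swapping the store and the load leaves the final state unchanged."""
    return run([store, load], regs, memory) == run([load, store], regs, memory)


memory = {160: 7}                  # hypothetical initial memory contents
# Hypothetical register values: addresses 20 and 160 differ, so no aliasing.
no_alias = {"$t0": 100, "$t1": 0, "$v0": 42, "$s0": 0}
# Here $t1 + 20 == $t0 + 60 == 160, so the store and load hit the same word.
alias = {"$t0": 100, "$t1": 140, "$v0": 42, "$s0": 0}

print(same_result(no_alias, memory), same_result(alias, memory))
```

When the addresses differ, both orders give the same final state. When they coincide, the original order loads the freshly stored value into $s0, while the swapped order loads the old memory contents, so the reordering changes the program's result.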