10/11: Lecture Topics Execution cycle Introduction to pipelining Data hazards
Office Hours Changing Th 8:30-9:30 to Mo 2-3 New office hours are Tu 2:30-3:30
Execution Cycle Five steps to executing an instruction: 1. Fetch IF ID EX MEM WB Five steps to executing an instruction: 1. Fetch Get the next instruction to execute from memory onto the chip 2. Decode Figure out what the instruction says to do Get values from registers 3. Execute Do what the instruction says; for example, On a memory reference, add up base and offset On an arithmetic instruction, do the math
More Execution Cycle 4. Memory Access 5. Write back IF ID EX MEM WB If it’s a load or store, access memory If it’s a branch, replace the PC with the destination address Otherwise do nothing 5. Write back Place the result of the operation in the appropriate register
add $s0, $s1, $s2 IF get instruction at PC from memory it’s 000000 10001 10010 10000 00000 100000 ID determine what 000000 … 100000 is 000000 … 100000 is add get contents of $s1 and $s2 ($s1=7, $s2=12) EX add 7 and 12 = 19 MEM do nothing WB store 19 in $s0
lw $t2, 16($s0) IF get instruction at PC from memory it’s 010111 10000 01000 0000000000010000 ID determine what 010111 is 010111 is lw get contents of $s0 and $t2 (we don’t know that we don’t care about $t2) $s0=0x200D1C00, $t2=77763 EX add 16 to 0x200D1C00 = 0x200D1C10 MEM load the word stored at 0x200D1C10 WB store loaded value in $t2
Latency & Throughput IF ID EX MEM WB 1 2 3 4 5 6 7 8 9 10 inst 1 inst 2 Latency—the time it takes for an individual instruction to execute What’s the latency for this implementation? Throughput—the number of instructions that execute per minute What’s the throughput of this implementation?
A case for pipelining The functional units are being underutilized the instruction fetcher is used once every five clock cycles why not have it fetch a new instruction every clock cycle? Pipelining overlaps the stages of execution so every stage has something to due each cycle A pipeline with N stages could speedup by N times, but each stage must take the same amount of time each stage must always have work to do Also, latency for each instruction may go up, but why don’t we care?
Unpipelined Assembly Line What is the latency of this assembly line, i.e. for how many cycles is the plane on the assembly line? What is the throughput of this assembly line, i.e. how many planes are manufactured each cycle?
Pipelined Assembly Line The assembly line has 5 stages If a plane isn’t ready to go to the next stage then the pipeline stalls that stage and all stages before it freeze The gap in the assembly line is known as a bubble
Pipelined Analysis What is the latency? What is the throughput? What is the speed up? (Speed up = Old Time / New Time)
Pipeline Example 1 2 3 4 5 6 7 8 9 10 IF ID EX MEM WB add $s0, $s1, $s2 sub $s3, $s2, $s3 lw $s2, 20($t0) sw $s0, 16($s1) and $t1, $t2, $t3 IF ID EX MEM WB
Pipelined Xput and Latency 1 2 3 4 5 6 7 8 9 IF ID EX MEM WB inst 1 inst 2 inst 3 inst 4 inst 5 What’s the throughput of this implementation? What’s the latency of this implementation?
Data Hazards What happens in the following code? IF ID EX MEM WB add $s0, $s1, $s2 IF ID EX MEM WB add $s4, $s3, $s0 $s0 is read here $s0 is written here This is called as a data dependency When it causes a pipeline stall it is called a data hazard
Solution: Stall Stall the pipeline until the result is available add s0,s1,s2 IF ID EX MEM WB add s4,s3,s0 IF stall ID EX MEM WB Stall the pipeline until the result is available
Solution: Read & Write in same Cycle Write the register in the first part of the clock cycle Read it in the second part of the clock cycle add s0,s1,s2 add s4,s3,s0 IF stall ID EX MEM WB write $s0 read $s0 A stall of two cycles is still required
Solution: Forwarding The value of $s0 is known after cycle 3 (after the first instruction’s EX stage) The value of $s0 isn’t needed until cycle 4 (before the second instruction’s EX stage) If we forward the result there isn’t a stall add s0,s1,s2 add s4,s3,s0 IF ID EX MEM WB
Another data hazard What if the first instruction is lw? lw s0,0(s2) add s4,s3,s0 IF ID EX MEM WB s0 isn’t known until after the MEM stage We can’t forward back into the past Either stall or reorder instructions
Solutions to the lw hazard We can stall for one cycle, but we hate to stall lw s0,0(s2) add s4,s3,s0 IF ID EX MEM WB Try to execute an unrelated instruction between the two instructions lw s0,0(s2) IF ID EX MEM WB IF ID EX MEM WB sub t4,t2,t3 IF ID EX MEM WB add s4,s3,s0 sub t4,t2,t3
Reordering Instructions Reordering instructions is a common technique for avoiding pipeline stalls Sometimes the compiler does the reordering statically Almost all modern processors do this reordering dynamically they can see several instructions and they execute anyone that has no dependency this is known as out-of-order execution and is very complicated to implement
Control Hazards Branch instructions cause control hazards because we don’t know which instruction to execute next IF ID EX MEM WB bne $s0, $s1, next add $s4, $s3, $s0 ... IF ID EX MEM WB next: sub $s4, $s3, $s0 do we fetch add or sub? we don’t know until here