Chapter 4 The Processor Part 3
Can pipelining get us into trouble?
Yes: pipeline hazards
- Structural hazards: an attempt to use the same resource in two different ways at the same time
  - E.g., two instructions try to read the same memory at the same time
- Data hazards: an attempt to use an item before it is ready
  - An instruction depends on the result of a prior instruction still in the pipeline
      add r1, r2, r3
      sub r4, r2, r1
- Control hazards: an attempt to make a decision before the condition is evaluated
  - Branch instructions
      beq r1, loop
Hazards can always be resolved by waiting
- Pipeline control must detect the hazard
- and take action (or delay action) to resolve it

Morgan Kaufmann Publishers 10 November, 2018
Structure Hazards
- Conflict for use of a resource
- In a MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
  - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches

Structural Hazards limit performance
- Example: if there are 1.3 memory accesses per instruction and only one memory access per cycle, then
  - average CPI >= 1.3
  - otherwise the resource is more than 100% utilized
- Solution 1: Use separate instruction and data memories
- Solution 2: Allow memory to read and write more than one word per cycle
- Solution 3: Stall

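The CPI bound in the example can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original slides: the function name `min_cpi` and the idea of modeling memory as a number of "ports" per cycle are assumptions made for illustration, and the ideal CPI of 1 assumes a single-issue pipeline.

```python
def min_cpi(mem_accesses_per_instr: float, mem_ports_per_cycle: float = 1.0) -> float:
    """Lower bound on CPI imposed by memory bandwidth alone (hypothetical helper)."""
    # Each instruction needs `mem_accesses_per_instr` memory slots, and the
    # memory system supplies `mem_ports_per_cycle` slots every cycle.
    bandwidth_bound = mem_accesses_per_instr / mem_ports_per_cycle
    # A single-issue pipeline cannot do better than one instruction per cycle.
    return max(1.0, bandwidth_bound)


print(min_cpi(1.3))        # one shared memory, one access per cycle
print(min_cpi(1.3, 2.0))   # two accesses per cycle (rough model of Solutions 1 and 2)
```

With one access per cycle the bound is 1.3; with two it drops back to the ideal 1.0. Treating separate instruction and data memories as "two ports" is a simplification: it works here because the instruction memory sees 1.0 accesses per instruction and the data memory only 0.3.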
Single Memory is a Structural Hazard
[Figure: pipeline diagram, time (clock cycles) across and instruction order down, for Load, Instr 1, Instr 2, Instr 3, Instr 4. The Load's data read from memory and a later instruction's fetch from memory fall in the same cycle.]
- Detection is easy in this case! (Right-half highlight means read, left-half highlight means write.)

How About Register File Access?
- Fix the register file access hazard by doing writes in the first half of the cycle and reads in the second half
[Figure: pipeline diagram, time (clock cycles) across and instruction order down, for add $1, ; Inst 1; Inst 2; add $2,$1, . Labels: internal bypassing path; clock edge that controls register writing; clock edge that controls loading of pipeline state registers.]

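The write-then-read ordering can be modeled in software to see why it removes the hazard. The following is an illustrative sketch only: the class name `RegisterFile` and the method `cycle` are hypothetical, and a real register file does this with clock edges rather than sequential statements.

```python
class RegisterFile:
    """Toy model: writes land in the first half of a cycle, reads in the second."""

    def __init__(self):
        self.regs = [0] * 32  # 32 MIPS registers, $0 stays 0

    def cycle(self, reg_write, write_reg, write_data, read_reg1, read_reg2):
        # First half of the cycle: the WB-stage instruction writes its result.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data
        # Second half of the cycle: the ID-stage instruction reads its operands,
        # so it already sees the value written a moment ago.
        return self.regs[read_reg1], self.regs[read_reg2]


rf = RegisterFile()
# An instruction writing $1 is in WB while another reading $1 is in ID.
print(rf.cycle(True, 1, 42, 1, 2))
```

Because the write happens first, the reader in the same cycle gets the new value of `$1` without any extra forwarding hardware.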
Data Hazards
- An instruction depends on completion of data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Forwarding (aka Bypassing)
- Use result when it is computed
  - Don't wait for it to be stored in a register
  - Requires extra connections in the datapath

Loads Can Cause Data Hazards
- Dependencies backward in time cause hazards
[Figure: pipeline diagram, instruction order down, for the sequence below; the dependence of sub on lw is marked "Load-use data hazard".]
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5

Load-Use Data Hazard
- Can't always avoid stalls by forwarding
  - If value not computed when needed
  - Can't forward backward in time!

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction
- C code for A = B + E; C = B + F;

Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

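The 13-cycle and 11-cycle totals can be reproduced with a small counting script. This is a sketch under stated assumptions: a 5-stage pipeline with full forwarding, one stall whenever an instruction uses the result of the load immediately before it, and no other hazards. The function names `source_and_dest` and `count_cycles` are hypothetical.

```python
import re


def source_and_dest(line):
    """Split one MIPS instruction into (source registers, destination register)."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\w+", operands)
    if op == "sw":
        return regs, None        # sw reads the stored value and the base register
    return regs[1:], regs[0]     # lw and R-type: first register is the destination


def count_cycles(program, stages=5):
    """Cycles for a straight-line sequence: fill time plus load-use stalls."""
    stalls = 0
    prev_load_dest = None
    for line in program:
        sources, dest = source_and_dest(line)
        if prev_load_dest is not None and prev_load_dest in sources:
            stalls += 1          # load-use hazard: one bubble
        prev_load_dest = dest if line.split()[0] == "lw" else None
    return len(program) + (stages - 1) + stalls


original = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
reordered = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
print(count_cycles(original), count_cycles(reordered))  # prints "13 11"
```

Seven instructions need 7 + 4 = 11 cycles to drain through five stages; the original order adds two load-use stalls on top of that. The model also charges a stall for a store that immediately follows the load of its data, which does not occur in either sequence here.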
Control Hazards
- Branch determines flow of control
  - Fetching next instruction depends on branch outcome
  - Pipeline can't always fetch correct instruction
    - Still working on ID stage of branch
- In MIPS pipeline
  - Need to compare registers and compute target early in the pipeline
  - Add hardware to do it in ID stage

Stall on Branch
- Wait until branch outcome determined before fetching next instruction

Branch Prediction
- Longer pipelines can't readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict outcome of branch
  - Only stall if prediction is wrong
- In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction after branch, with no delay

MIPS with Predict Not Taken
[Figures: pipeline timing when the prediction is correct, and when the prediction is incorrect.]

More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    - When wrong, stall while re-fetching, and update history

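Both schemes are small enough to sketch in code. The static rule below follows the slide directly. The dynamic predictor is an assumption: the slide only says hardware records recent history, and a table of 2-bit saturating counters is one common way to do that. The names `static_predict`, `TwoBitPredictor`, and `mispredictions`, and the initial counter value, are hypothetical.

```python
def static_predict(branch_pc, target_pc):
    """Static rule: backward branches predicted taken, forward not taken."""
    return target_pc < branch_pc


class TwoBitPredictor:
    """Dynamic predictor sketch: one 2-bit saturating counter per branch PC."""

    def __init__(self):
        self.counters = {}  # branch PC -> counter in 0..3 (assumed start: 1)

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2  # 2 or 3 means "predict taken"

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


def mispredictions(outcomes, pc=0x40):
    """Count wrong predictions for one branch over a list of actual outcomes."""
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)  # record history after the outcome is known
    return wrong


# A loop branch that is taken nine times, then falls through once.
print(mispredictions([True] * 9 + [False]))  # prints 2
```

In this run the predictor is wrong once while warming up and once at loop exit. The two-bit counter means a single loop exit does not flip the prediction for the next time the loop runs.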
Data Hazard
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Are there any problems?

Data Hazard on $2
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Problem: $2 cannot be read by other instructions before it is written by the sub.

Dependencies
- Problem with starting the next instruction before the first is finished
  - Dependencies that go backward in time are data hazards

Software Solution
- Have the compiler guarantee no hazards
- Where do we insert the stalls (nops)?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Problem: this really slows us down!

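One way to answer the question is to let a small script place the nops. This sketch assumes a 5-stage pipeline with no forwarding, where the register file writes in the first half of a cycle and reads in the second half, so a consumer must issue at least three slots after its producer. The function name `insert_nops` and the `min_distance` parameter are hypothetical.

```python
import re


def insert_nops(program, min_distance=3):
    """Pad with nops so each reader is at least `min_distance` slots after its writer."""
    out = []
    last_write = {}  # register -> position in `out` of the instruction that wrote it
    for line in program:
        op, operands = line.split(None, 1)
        regs = re.findall(r"\$\w+", operands)
        # sw only reads registers; every other instruction here writes its first one.
        sources, dest = (regs, None) if op == "sw" else (regs[1:], regs[0])
        for src in sources:
            if src in last_write:
                # len(out) is the slot this instruction would occupy next.
                while len(out) - last_write[src] < min_distance:
                    out.append("nop")
        if dest is not None:
            last_write[dest] = len(out)
        out.append(line)
    return out


code = [
    "sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
    "add $14, $2, $2", "sw $15, 100($2)",
]
for line in insert_nops(code):
    print(line)
```

Under these assumptions two nops go between the `sub` and the `and`; the later readers of `$2` are then already far enough away. That is a 40% increase in instruction count for this five-instruction sequence, which is why the slide calls the approach slow.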
Stall: One Way to “Fix” a Data Hazard
[Figure: pipeline diagram, instruction order down: sub $2, $1, $3; stall; stall; and $12, $2, $5; or $13, $6, $2.]
- Can fix a data hazard by waiting (a stall), but this affects throughput

Another Way to “Fix” a Data Hazard
- Use temporary results; don't wait for them to be written
  - Register file forwarding to handle read/write to the same register
  - ALU forwarding
- What if this $2 was $13?

Forwarding
- Can fix a data hazard by forwarding results as soon as they are available to where they are needed
[Figure: pipeline diagram, instruction order down, for the sequence below.]
    add r1,r2,r3
    sub r4,r1,r5
    and r6,r7,r1
    or  r8,r1,r1
    sw  r4,100(r1)
- Note: Forwarding from the ALU is supplied by the EX/MEM register

Data Forwarding (aka Bypassing)
- Applies to any data dependence line that goes backwards in time
  - EX stage generating R-type ALU results or effective address calculation
  - MEM stage generating lw results
- Forward by taking the inputs to the ALU from any pipeline register rather than just ID/EX, by
  - adding multiplexors to the inputs of the ALU so that Rd data can be passed to either (or both) of the EX stage's Rs and Rt ALU inputs
      00: normal input (ID/EX pipeline registers)
      10: forward from previous instr (EX/MEM pipeline registers)
      01: forward from instr 2 back (MEM/WB pipeline registers)
  - adding the proper control hardware
- With forwarding, can run at full speed even in the presence of data dependencies

Data Forwarding Control Conditions (1/4)
EX/MEM hazard: forwards the result from the previous instr. to either input of the ALU.
    if (EX/MEM.RegisterRd == ID/EX.RegisterRs) ForwardA = 10
    if (EX/MEM.RegisterRd == ID/EX.RegisterRt) ForwardB = 10
MEM/WB hazard: forwards the result from the second previous instr. to either input of the ALU.
    if (MEM/WB.RegisterRd == ID/EX.RegisterRs) ForwardA = 01
    if (MEM/WB.RegisterRd == ID/EX.RegisterRt) ForwardB = 01
Notation:
- “RegisterRd” is the number of the register to be written (RD or RT)
- “RegisterRs” is the number of the RS register
- “RegisterRt” is the number of the RT register
- “ForwardA, ForwardB” control the forwarding muxes

Data Forwarding Control Conditions (2/4)
EX/MEM hazard: forwards the result from the previous instr. to either input of the ALU, provided it writes.
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd == ID/EX.RegisterRt)) ForwardB = 10
MEM/WB hazard: forwards the result from the second previous instr. to either input of the ALU, provided it writes.
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01

Data Forwarding Control Conditions (3/4)
EX/MEM hazard: forwards the result from the previous instr. to either input of the ALU, provided it writes and the destination != R0.
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == ID/EX.RegisterRt)) ForwardB = 10
MEM/WB hazard: forwards the result from the second previous instr. to either input of the ALU, provided it writes and the destination != R0.
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01
- What's wrong with this hazard control?

Yet Another Complication!
- Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the result of the MEM stage instruction. Which should be forwarded?
  - The more recent result!
[Figure: pipeline diagram, instruction order down, for the sequence below.]
    add $1,$1,$2
    add $1,$1,$3
    add $1,$1,$4

Corrected Data Forwarding Control Conditions
MEM/WB hazard:
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRs)
        and (EX/MEM.RegisterRd != ID/EX.RegisterRs || ~EX/MEM.RegWrite))
      ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd != 0)
        and (MEM/WB.RegisterRd == ID/EX.RegisterRt)
        and (EX/MEM.RegisterRd != ID/EX.RegisterRt || ~EX/MEM.RegWrite))
      ForwardB = 01

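The complete forwarding decision, EX/MEM conditions plus the corrected MEM/WB conditions, can be written as one priority check per ALU operand. This is a software sketch of the logic, not the hardware itself: the names `StageReg`, `forward_select`, and `forwarding_unit` are hypothetical. Testing EX/MEM first and falling through to MEM/WB has the same effect as the extra "EX/MEM does not also match" term in the corrected condition.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    """The two fields of a pipeline register that the forwarding unit reads."""
    reg_write: bool = False
    rd: int = 0


def forward_select(src, ex_mem, mem_wb):
    """Return the 2-bit mux select for one ALU operand read from register `src`."""
    # EX/MEM hazard: the previous instruction writes the register we need.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10
    # MEM/WB hazard: only reached when EX/MEM does not supply a newer value.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01
    return 0b00  # no hazard: use the value read from the register file


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return forward_select(rs, ex_mem, mem_wb), forward_select(rt, ex_mem, mem_wb)


# add $1,$1,$4 is in EX; add $1,$1,$3 is in MEM; add $1,$1,$2 is in WB.
fwd_a, fwd_b = forwarding_unit(1, 4, StageReg(True, 1), StageReg(True, 1))
print(format(fwd_a, "02b"), format(fwd_b, "02b"))  # prints "10 00"
```

For the three-add sequence, both older instructions write `$1`, and the unit picks the EX/MEM copy because it is the more recent result.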
Datapath with Forwarding Hardware
[Figure: pipelined datapath (PC, instruction memory, register file, sign extend, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers) with a Forward Unit. Forward Unit inputs shown: EX/MEM.RegisterRd, MEM/WB.RegisterRd, IF/ID.RegisterRs, IF/ID.RegisterRt. The control line inputs to the Forward Unit, EX/MEM.RegWrite and MEM/WB.RegWrite, are not shown on the diagram.]
- How many bits wide is each pipeline register now?
  - ID/EX = 9 + 32x4 + 10 = 147 + 10 = 157

Memory-to-Memory Copies
- For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input
  - Would need to add a Forward Unit to the memory access stage
  - Should avoid stalling on such a load
[Figure: pipeline diagram, instruction order down, for the sequence below.]
    lw $1,10($2)
    sw $1,10($3)

Forwarding (or Bypassing): What about Loads?
- Dependencies backwards in time are hazards
  - Can't solve with forwarding
  - Must delay/stall the instruction dependent on the load
[Figure: pipeline diagram (stages IF, ID/RF, EX, MEM, WB; time in clock cycles across) for the sequence below.]
    lw  $1, 0($2)
    sub $4, $1, $3

Can't always forward
- Load word can still cause a hazard: an instruction tries to read a register immediately after a load instruction that writes to the same register
- Thus, we need a hazard detection unit to stall the instruction that follows the load

Stalling We can stall the pipeline by keeping an instruction in the same stage
Stall/Bubble in the Pipeline
[Figure: pipeline diagram; label: "Stall inserted here".]

Stall/Bubble in the Pipeline
Or, more accurately…
[Figure: pipeline diagram showing the stall as a bubble.]

Load-use Hazard Detection Unit
- Need a hazard detection unit in the ID stage that inserts a stall between the load and its use
ID Hazard Detection:
    if (ID/EX.MemRead
        and ((ID/EX.RegisterRt == IF/ID.RegisterRs)
          or (ID/EX.RegisterRt == IF/ID.RegisterRt)))
      stall the pipeline
- The first line tests to see if the instruction is a load; the next two lines check to see if the destination register of the load in the EX stage matches either source register of the instruction in the ID stage
- After this 1-cycle stall, the forwarding logic can handle the remaining data hazards

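The detection condition translates directly into a boolean function. This is a minimal sketch: the function name `load_use_stall` and its parameter names are hypothetical stand-ins for the pipeline register fields ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID must wait one cycle for a load in EX."""
    # MemRead is asserted only for loads, and a load's destination is its Rt field.
    return bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# lw $1, 0($2) is in EX (Rt = 1); sub $4, $1, $3 is in ID (Rs = 1, Rt = 3).
print(load_use_stall(True, 1, 1, 3))   # prints True
# The same sub behind a non-load instruction needs no stall.
print(load_use_stall(False, 1, 1, 3))  # prints False
```

Like the slide's condition, this sketch compares against IF/ID.RegisterRt even for instructions that do not read Rt, so it can stall conservatively in that case.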
Stall Hardware
- In addition to the hazard detection unit, we have to implement the stall
- Prevent the IF and ID stage instructions from making progress down the pipeline, done by preventing the PC register and the IF/ID pipeline register from changing
  - Hazard detection unit controls the writing of the PC and IF/ID registers
- Insert a bubble into the back half of the pipeline, starting with the EX stage (the bubble executes as a noop)
  - Must deassert the control signals (setting them to 0) in the EX, MEM, and WB control fields of the ID/EX pipeline register
  - Hazard detection unit controls the multiplexer that chooses between the real control values and 0's
- Assume that 0's are benign values in the datapath: nothing changes

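The three actions the stall requires can be summarized as a function from the stall decision to the signals it drives. This is a sketch only: `stall_controls` is a hypothetical name, and the control signals are represented as a dictionary purely for illustration.

```python
def stall_controls(stall, decoded_control):
    """Return (PCWrite, IF/ID write enable, control bits latched into ID/EX)."""
    pc_write = not stall      # hold the PC so the same instruction is fetched again
    if_id_write = not stall   # hold IF/ID so the same instruction is decoded again
    if stall:
        # Mux selects zeros: the instruction entering EX becomes a bubble (noop).
        id_ex_control = {name: 0 for name in decoded_control}
    else:
        id_ex_control = dict(decoded_control)
    return pc_write, if_id_write, id_ex_control


decoded = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0, "Branch": 0}
print(stall_controls(True, decoded))
print(stall_controls(False, decoded))
```

When `stall` is true the front of the pipeline is frozen and every control bit passed to ID/EX is zero, so the bubble changes no register or memory state as it moves through EX, MEM, and WB.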
Adding the Hazard Detection Unit Hardware
[Figure: pipelined datapath (PC, instruction memory, register file, sign extend, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers) with the Forward Unit and a Hazard Unit. Hazard Unit inputs shown: ID/EX.MemRead and ID/EX.RegisterRt.]
- In reality, only the signals RegWrite and MemWrite need to be 0; the other control signals can be don't cares.