Pipeline Hazards
Pipeline hazards are situations that prevent the next instruction from being processed in the next stage of the pipeline. They interrupt the synchronous execution in the pipeline and thus reduce performance.
Solution: suspend the execution of the instruction (pipeline stall).
– If an instruction is suspended in a certain stage of the pipeline, all subsequent instructions are stopped as well.
– The pipeline logic inserts NOP operations into the next pipeline stage.
– The processing of all earlier instructions continues.

Resource Hazards
Structural hazards result from two instructions in different pipeline stages that require the same resource. Not every component can be replicated to make sure that this never happens.
Examples:
– Parallel writes to the register file, e.g., if arithmetic operations can write their result directly while a load writes in the memory access phase.
– Parallel access to memory in IF and MA.
– Consecutive instructions need the FP division hardware, which is not pipelined.

Data Hazards and Control Hazards
Data hazards
– An instruction accesses the same data as an earlier instruction that has not yet finished, e.g., an operand computed by a previous instruction is not yet available.
– Data hazards result from data dependences between instructions.
Branch (control) hazards
– The next instruction cannot be fetched because of a jump in the control flow.

Resolving Pipeline Hazards
The simple solution is to stop the pipeline:
– Insertion of NOPs or pipeline bubbles.
– This reduces the pipeline throughput.
Many hardware and software techniques have been developed to reduce the effect of hazards on performance.

Pipeline Hazards and Data Dependences
Data dependences occur between statements in the program.
Example:
    add R1,R2,R3
    sub R4,R5,R6
    and R6,R1,R8
    xor R9,R1,R11

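Data Dependence
An instruction j is data dependent on an instruction i if there is a path from i to j and the two instructions access common data, with at least one of the accesses being a write, where
– I(i) = set of data read by i
– O(i) = set of data written by i
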
True Dependence
True or flow dependence: first write, then read.
Example:
    LOOP: load F0,0(R1)
          add  F4,F0,F2

Anti Dependence
Anti dependence: first read, then write.
Instruction i reads an operand from a register or memory location that is overwritten by a later instruction.
    ADD R2,R3,R4
    XOR R3,R5,R6

Output Dependence
Output dependence: both write.
Instructions i and j write the same register or memory address.
    ADD R2,R3,R4
    XOR R2,R5,R6
Anti and output dependences are called name dependences.

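The three kinds of dependence can be checked mechanically from the read set I and the write set O of two instructions. The following is a minimal sketch under stated assumptions: each instruction is given as a pair of register-name sets (written, read), memory operands are ignored, and the function name `classify` is hypothetical.

```python
def classify(i, j):
    """Return the kinds of dependence of the later instruction j on the earlier i.

    Each instruction is a pair (written_registers, read_registers) of sets,
    i.e. (O, I) in the notation of the slides.
    """
    o_i, i_i = i
    o_j, i_j = j
    kinds = set()
    if o_i & i_j:      # i writes what j reads
        kinds.add("true")
    if i_i & o_j:      # i reads what j overwrites
        kinds.add("anti")
    if o_i & o_j:      # both write the same location
        kinds.add("output")
    return kinds


add = ({"R2"}, {"R3", "R4"})      # ADD R2,R3,R4
xor_a = ({"R3"}, {"R5", "R6"})    # XOR R3,R5,R6
xor_o = ({"R2"}, {"R5", "R6"})    # XOR R2,R5,R6
print(classify(add, xor_a))       # {'anti'}
print(classify(add, xor_o))       # {'output'}
```

A pair of instructions can carry several dependences at once, which is why the sketch returns a set rather than a single label.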
Dependences and Pipeline Hazards
Data dependences are properties of the program. Whether a data dependence leads to a pipeline hazard depends on the pipeline organization and on the timing of instruction execution.
Data dependences may induce hazards; they indicate where a hazard is possible.
Data dependences determine the execution order of instructions:
– Independent instructions can be reordered and even executed in parallel.
– They determine the maximum degree of parallelism.

Data Hazards
Data hazards can occur if data dependent instructions are executed with only a short delay between them, so that their accesses overlap in the pipeline.
Example: true dependence
    load R1, A
    load R2, B
    add  R2,R1,R2
    mul  R1,R2,R1
[Figure: pipeline diagram of the four instructions, each passing through IF, ID, EX, MA, WB; time axis t_i to t_i+4.]

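Data Hazards
Example: true dependence
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram of both instructions through IF, ID, EX, MA, WB; time axis t_i to t_i+4. The mul reads the old value of R2 before the add has written the new value, so it reads the wrong value.]
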
Data Hazards Classification
Instruction i precedes instruction j in program order (inst i … inst j).
Read-after-Write (RAW)
– Happens if instruction j reads a source register before instruction i has written its result.
– Implied by a true dependence.
Write-after-Read (WAR)
– Happens if instruction j writes the target register before instruction i has read the operand.
– Implied by an anti dependence.
Write-after-Write (WAW)
– Happens if instruction j writes its target register before instruction i has written its result to the same register.
– Implied by an output dependence.
– Can happen in pipelines where multiple stages can write, or where an instruction can proceed without waiting for a stalled previous instruction.

Handling Hazards
Software solutions (static solutions), implemented by the compiler
Insertion of NOPs
– Detection of potential data hazards
– Insertion of NOPs after instructions that might lead to hazards
Reordering of instructions
– Done in the instruction scheduling phase of the compiler
– Reorders instructions so that independent instructions are executed between dependent instructions

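The NOP insertion performed by a compiler can be sketched as follows. This is only an illustration under stated assumptions: instructions are given as hypothetical (text, destination, sources) triples, and `min_distance` is an assumed pipeline parameter stating how many instruction slots a consumer must lie behind its producer. The real value depends on the pipeline.

```python
def insert_nops(program, min_distance):
    """Insert NOPs so that every consumer is at least `min_distance` slots
    behind the instruction that produces one of its source registers.

    program: list of (text, dest, sources); dest may be None.
    """
    out = []           # scheduled instruction texts
    last_write = {}    # register -> index in `out` of its latest producer
    for text, dest, sources in program:
        for src in sources:
            if src in last_write:
                # pad until the required distance to the producer is reached
                while len(out) - last_write[src] < min_distance:
                    out.append("nop")
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out


program = [
    ("add R1,R2,R3", "R1", ["R2", "R3"]),
    ("sub R4,R5,R6", "R4", ["R5", "R6"]),
    ("and R6,R1,R8", "R6", ["R1", "R8"]),
    ("xor R9,R1,R11", "R9", ["R1", "R11"]),
]
for line in insert_nops(program, min_distance=3):
    print(line)
```

In this run the independent `sub` already fills one of the slots between `add` and `and`, so only a single NOP is needed. Instruction scheduling aims to fill the remaining slots with useful work as well.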
Handling Hazards
Hardware solutions (dynamic solutions)
Detection of conflicts
– Requires appropriate hardware logic
Handling
– Interlocking, stalling
– Forwarding
– Forwarding with interlocking

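Handling Hazards in the Hardware
Pipeline interlocking
– Detection of hazards.
– Stops instruction j and all subsequent instructions for one or more cycles.
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. The add passes through IF, ID, EX, MA, WB without delay; the mul is stalled after IF and ID until R2 is available and only then continues with EX, MA, WB.]
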
Handling Hazards in the Hardware
Forwarding
– Forwards ALU results directly to the ALU input.
– Eliminates stall cycles.
– Requires additional hardware (forwarding logic).
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. Both instructions pass through IF, ID, EX, MA, WB without a stall; the result of the add is forwarded to the EX phase of the mul.]

Forwarding and Interlocking
Not all hazards can be handled by forwarding.
Example: true dependence with a load operation
    load R2,A
    add  R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. Without a stall, the add would need R2 before the load has produced it.]
Solution: forwarding + interlocking
[Figure: the same two instructions; the add is stalled after IF and ID and then continues with EX, MA, WB.]

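The effect of interlocking and forwarding on a RAW hazard can be estimated with a small timing model. The sketch below is not taken from the slides; it assumes a five-stage pipeline IF, ID, EX, MA, WB, a register file that can be read in ID only in the cycle after the producer's WB, and forwarding paths from the end of EX (ALU results) or the end of MA (loads) to the start of EX. The function name `stalls` is hypothetical.

```python
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MA": 3, "WB": 4}


def stalls(producer_kind, distance, forwarding):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    producer_kind is "alu" or "load". The producer enters IF in cycle 0 and
    the consumer in cycle `distance`.
    """
    if forwarding:
        # value leaves the ALU (EX) or the memory (MA) and is needed in EX
        ready = STAGE["EX"] if producer_kind == "alu" else STAGE["MA"]
        needed = STAGE["EX"]
    else:
        # value is written in WB and read from the register file in ID
        ready = STAGE["WB"]
        needed = STAGE["ID"]
    earliest = ready + 1             # first cycle in which the value is usable
    unstalled = distance + needed    # cycle in which the consumer wants it
    return max(0, earliest - unstalled)


print(stalls("alu", 1, forwarding=False))   # 3
print(stalls("alu", 1, forwarding=True))    # 0
print(stalls("load", 1, forwarding=True))   # 1
```

Under these assumptions forwarding removes all stalls between dependent ALU instructions, while a load followed directly by its consumer still costs one stall cycle, which is the case that needs forwarding plus interlocking. A register file that writes in the first half of a cycle and reads in the second half would reduce the count without forwarding by one.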
MIPS Pipeline
Note: see the lecture notes (Skript) by Wismüller.

Branch Hazards
– The branch target and the condition are computed in the EX phase; the target replaces the PC in the MA phase.
– The condition typically depends on the result of the previous instruction's EX phase, which requires forwarding.
– Thus, the correct instruction can only be fetched after three cycles.

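Branch Hazards
[Figure: pipeline diagram of a jump instruction and its target instruction, each passing through IF, ID, EX, MA, WB over time. The target can only be fetched after the PC has been updated; the cycles in between are stall cycles.]
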
Branch Hazards
The condition and the target should already be computed in ID. This has three consequences:
Structural hazard
– The ALU cannot be used to compute the target, so an additional ALU is required in ID.
Data dependence on a previous arithmetic instruction
– RAW hazard
The critical path in the ID phase becomes longer
– Decoding, computing the branch target, and updating the PC form the critical path.

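Resolving Branch Hazards
Insertion of independent instructions
– Done by the instruction scheduling of the compiler
– Fills the stall cycle with an independent instruction (delay slot)
Unscheduled code (NOP in the delay slot):
    add R1,R2,R3
    br addr
    nop
    ...
Scheduled code (independent instruction moved into the delay slot):
    br addr
    add R1,R2,R
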
Branch Prediction
– The branch decision is predicted when a jump is encountered.
– Instructions that depend on the predicted outcome are executed speculatively.
– Once the condition has been computed, execution either continues without delay because the prediction was correct, or the started instructions are deleted and the correct ones are fetched.
Two classes
– Static branch prediction by the hardware or the compiler
– Dynamic branch prediction by the hardware

Static Branch Prediction
Hardware
– Static prediction built into the processor: backward jumps are always predicted to be taken.
Compiler
– The prediction is specified via a bit in the jump opcode.
– The prediction can be guided by program analysis or profiling (feedback-directed compilation).

Dynamic Branch Prediction
Properties
– Based on the dynamic behavior of the application: the history of a jump is taken into account.
– Leads to more precise predictions.
– Expensive in terms of hardware.
Branch Prediction Buffer
– Cache for information about conditional jumps.
– Requires that the target can be computed fast.

Branch Prediction Buffer
Cache organization
[Figure: table of buffer entries, each with an address tag, an invalid flag, and a prediction bit.]
Index of an entry: (instruction address >> 2) % entries

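Single Bit Predictor
Single prediction bit
– If the bit is set, the branch is predicted to be taken.
– If the prediction is wrong, the bit is inverted.
[Figure: state diagram with the two states "Predict Taken" and "Predict Not Taken"; transitions are labeled T (taken) and NT (not taken).]
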
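Single Bit vs Double Bit Predictors
The single bit predictor is suboptimal for nested loops.
– After the inner loop has been left, the bit predicts "not taken", so the first iteration of the next execution of the inner loop is predicted wrongly.
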
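The behavior of the single bit predictor on a nested loop can be reproduced with a few lines of simulation. This is a minimal sketch, assuming the branch in question closes the inner loop (taken to repeat, not taken on exit) and that the bit is initially set; the function name is hypothetical.

```python
def mispredictions_1bit(outcomes, initial=True):
    """Count mispredictions of a single-bit predictor.

    outcomes: list of booleans, True = branch taken.
    The bit is inverted whenever the prediction was wrong.
    """
    bit = initial
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
            bit = taken    # wrong prediction: invert the bit
    return misses


# Inner-loop branch: taken 3 times, then not taken on loop exit.
# The outer loop executes the inner loop 5 times.
inner = [True, True, True, False] * 5
print(mispredictions_1bit(inner))   # 9
```

Apart from the very first execution, every execution of the inner loop costs two mispredictions: one at the loop exit and one in the first iteration of the next execution.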
Two Bit Predictor
Two bits allow four states:
– strongly taken
– weakly taken
– weakly not taken
– strongly not taken
Starting from a strong state, two consecutive mispredictions are required to switch the prediction.

Two Bit Predictor
[Figure: state diagram with four states: (11) predict taken, (10) predict taken (weakly taken), (01) predict not taken (weakly not taken), (00) predict not taken. Transitions are labeled T (taken) and NT (not taken).]

Two-Bit Predictor with Saturation Scheme
– Count the taken jumps.
– If the sum is >= 2, predict that the jump is taken.
– Extensible to n bits; experiments showed that this has no big impact.
[Figure: state diagram with four states: (11) predict taken, (10) predict taken, (01) predict not taken, (00) predict not taken. Transitions are labeled T (taken) and NT (not taken).]

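The saturation scheme can be sketched as a counter in the range 0 to 3. The code below assumes the usual saturating-counter behavior, which the slide only hints at: the counter is incremented on a taken jump, decremented on a not-taken jump, and never leaves the range 0..3. The function name and the initial counter value are assumptions.

```python
def mispredictions_2bit(outcomes, counter=3):
    """Count mispredictions of a two-bit saturating counter.

    outcomes: list of booleans, True = branch taken.
    The branch is predicted taken if counter >= 2.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # count up on taken, down on not taken, saturating at 0 and 3
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# Same nested-loop pattern as for the single bit predictor:
# taken 3 times, not taken on exit, inner loop executed 5 times.
inner = [True, True, True, False] * 5
print(mispredictions_2bit(inner))   # 5
```

Only the loop exit is mispredicted in each execution of the inner loop, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays "taken".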
Size of Prediction Buffer (SPEC89)
[Figure: misprediction rate in % on the SPEC89 benchmarks for different prediction buffer sizes.]

Correlation Predictors
– The prediction is also based on the history of other jumps.
– A simple two bit predictor is not sufficient to predict the third branch in the example.
– Taking the preceding jumps into account enables a correct prediction.
    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb) { … }

(m,n)-Predictors
An (m,n)-predictor uses the history of the last m jumps to select one of 2^m n-bit predictors.
Branch History Register (BHR)
– m-bit shift register
– Stores the global history of the last m jumps.
– Each bit records whether the corresponding jump was taken.
– After each jump, the outcome is shifted into the BHR.
– The BHR gives the index into the Pattern History Table (PHT).

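A small simulation makes the interplay of BHR and PHTs concrete. This is a sketch under stated assumptions, not the organization of a particular processor: n is fixed to 2 (saturating counters), the BHR selects one of 2^m tables, the table entry is selected by (address >> 2) % entries, and all counters start as weakly not taken. The names `CorrelatingPredictor` and `count_misses` are hypothetical.

```python
class CorrelatingPredictor:
    """Sketch of an (m,2)-predictor with a global branch history register."""

    def __init__(self, m, entries=16):
        self.m = m
        self.entries = entries
        self.bhr = 0    # m-bit global history, newest outcome in bit 0
        # one pattern history table per history value; counters start at 1
        self.pht = [[1] * entries for _ in range(2 ** m)]

    def predict(self, address):
        return self.pht[self.bhr][(address >> 2) % self.entries] >= 2

    def update(self, address, taken):
        table = self.pht[self.bhr]
        idx = (address >> 2) % self.entries
        table[idx] = min(3, table[idx] + 1) if taken else max(0, table[idx] - 1)
        # shift the outcome into the history register, keeping m bits
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(predictor, address, outcomes):
    """Run one branch through the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


pattern = [True, False] * 20    # strictly alternating branch
print(count_misses(CorrelatingPredictor(m=2), 0x40, pattern))   # 2
print(count_misses(CorrelatingPredictor(m=0), 0x40, pattern))   # 40
```

With m = 0 the sketch degenerates to a plain two bit predictor, which mispredicts every outcome of the alternating pattern from this starting state. With two history bits, each history value gets its own counter, and the pattern is predicted correctly after a short warm-up.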
(m,n)-Predictors
Example: (2,2)-predictor
[Figure: the Branch History Register (BHR, a 2-bit shift register) selects one of the Pattern History Tables (PHTs, made of 2-bit predictors); the jump address selects the 2-bit predictor within the table.]

Branch Target Buffer
Also called Branch Target Address Cache.
– Required if the target address is computed late in the pipeline.
– Stores the address of the jump instruction and the target address.
– Can be used in the IF phase.
– Can be combined with a predictor.
Entry layout: address of jump instruction | target address | prediction bits

Branch Target Buffer (BTB): Prediction in IF
[Flowchart over cycles i, i+1 and i+2]
Send the PC to memory and to the BTB. Entry found?
– No: fetch the next instruction. Is the instruction a taken branch?
    – Yes: update the BTB, kill the fetched instructions, update the PC.
    – No: normal instruction execution.
– Yes: fetch the instruction at the target. Is the branch taken?
    – Yes: branch correctly predicted; continue execution with no stalls.
    – No: mispredicted branch; kill the fetched instructions, update the PC, delete the entry from the BTB.

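The decision flow of the BTB can be sketched in software as a lookup table keyed by the branch address. The sketch below makes several simplifying assumptions: the buffer is an unbounded dictionary without tags or prediction bits, instructions are 4 bytes long, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Sketch of a BTB as a mapping from branch address to predicted target."""

    def __init__(self):
        self.targets = {}

    def next_fetch(self, pc, instr_size=4):
        """IF phase: predicted address of the next instruction to fetch."""
        return self.targets.get(pc, pc + instr_size)

    def resolve(self, pc, is_branch, taken, target, instr_size=4):
        """Called once the outcome of the instruction at `pc` is known.

        Returns (correct_next_pc, mispredicted) and updates the buffer.
        """
        predicted = self.next_fetch(pc, instr_size)
        actual = target if (is_branch and taken) else pc + instr_size
        if is_branch and taken:
            self.targets[pc] = target    # enter or refresh the entry
        elif pc in self.targets:
            del self.targets[pc]         # predicted taken, but not taken
        return actual, predicted != actual


btb = BranchTargetBuffer()
print(btb.resolve(0x100, True, True, 0x200))    # (512, True): first encounter
print(btb.resolve(0x100, True, True, 0x200))    # (512, False): hit in the BTB
print(btb.resolve(0x100, True, False, 0x200))   # (260, True): entry is deleted
```

When `mispredicted` is true, the pipeline would have to kill the wrongly fetched instructions and restart the fetch at `correct_next_pc`; the sketch only reports that condition.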