Pipeline Hazards
Pipeline hazards are situations that prevent the next instruction from being processed in the next stage of the pipeline. This interrupts the synchronous execution in the pipeline and thus decreases performance.
Solution: suspend the execution of the instruction (pipeline stall).
- If an instruction is suspended in a certain stage of the pipeline, all subsequent instructions are also stopped.
- The pipeline logic inserts NOP operations into the next pipeline stage.
- The processing of all earlier instructions continues.

Resource Hazards
Structural hazards result from two instructions in different pipeline stages that require the same resource. Not all components can be replicated to make sure that this never happens.
Examples:
- Parallel writes to the register file, e.g., if arithmetic operations can write their result directly and loads write in the memory access phase.
- Parallel access to memory in IF and MA.
- Subsequent instructions need the FP division hardware, which is not implemented as a pipeline.

Data Hazards and Control Hazards
Data hazards: an instruction accesses the same data as earlier instructions that have not yet finished, e.g., an operand computed by a previous instruction is not yet available. Data hazards result from data dependences between the instructions.
Branch (control) hazards: the next instruction cannot be fetched due to a jump in the control flow.

Resolving Pipeline Hazards
The simple solution is to stop the pipeline by inserting NOPs (pipeline bubbles). This reduces the pipeline throughput. Many hardware and software techniques have been developed to reduce the effect of hazards on performance.

Pipeline Hazards and Data Dependences
Data dependences occur between statements in the program.
Example:
    add R1,R2,R3
    sub R4,R5,R6
    and R6,R1,R8
    xor R9,R1,R11

Data Dependence
An instruction j is data dependent on instruction i if there is a path from i to j and both instructions access common data, with at least one of the two accesses being a write, where
- I(i) = set of read data
- O(i) = set of written data

True Dependence
True or flow dependence: first write, then read.
Example:
    LOOP: load F0,0(R1)
          add F4,F0,F2

Anti Dependence
Anti dependence: first read, then write. Instruction i reads an operand from a register or memory location that is overwritten by a later instruction.
    ADD R2,R3,R4
    XOR R3,R5,R6

Output Dependence
Output dependence: both write. Instructions i and j write the same register or memory address.
    ADD R2,R3,R4
    XOR R2,R5,R6
Anti and output dependences are called name dependences.
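The three kinds of dependence can be checked mechanically from the read set I(i) and the write set O(i). The following is a minimal sketch, not part of the original slides: the names Instr, instr and dependences are made up for illustration, the first operand is assumed to be the destination (as in the slide examples), and the "path from i to j" condition is assumed to hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction reduced to the data it reads, I(i), and writes, O(i)."""
    text: str
    reads: frozenset
    writes: frozenset


def instr(text, writes, reads):
    """Hypothetical helper that builds an Instr from plain lists."""
    return Instr(text, frozenset(reads), frozenset(writes))


def dependences(i, j):
    """Kinds of dependence of the later instruction j on the earlier instruction i."""
    kinds = set()
    if i.writes & j.reads:
        kinds.add("true")    # first write, then read
    if i.reads & j.writes:
        kinds.add("anti")    # first read, then write
    if i.writes & j.writes:
        kinds.add("output")  # both write
    return kinds


# The instruction pairs from the three slides above.
load = instr("load F0,0(R1)", ["F0"], ["R1"])
addf = instr("add F4,F0,F2", ["F4"], ["F0", "F2"])
add = instr("ADD R2,R3,R4", ["R2"], ["R3", "R4"])
xor_anti = instr("XOR R3,R5,R6", ["R3"], ["R5", "R6"])
xor_out = instr("XOR R2,R5,R6", ["R2"], ["R5", "R6"])

for first, second in [(load, addf), (add, xor_anti), (add, xor_out)]:
    print(first.text, "->", second.text, ":", sorted(dependences(first, second)))
```

Each pair yields exactly the dependence named on its slide; a pair of instructions can in general carry more than one kind at once.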

Dependences and Pipeline Hazards
Data dependences are properties of the program. Whether a data dependence leads to a pipeline hazard depends on the pipeline organization and on the timing of instruction execution.
Data dependences
- may induce hazards, i.e., they indicate where a hazard is possible,
- determine the execution order of instructions: independent instructions can be reordered and even executed in parallel,
- determine the maximum degree of parallelism.

Data Hazards
Data hazards can occur if data-dependent instructions follow each other closely in the pipeline, so that their accesses overlap.
Example: true dependence
    load R1, A
    load R2, B
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram showing the four instructions passing through IF, ID, EX, MA, WB, one cycle apart, over time t_i to t_i+4.]

Data Hazards
Example: true dependence
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. The new value of R2 is written by add only in WB, while mul has already read the old value of R2: it reads the wrong value.]
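The timing behind this example can be sketched in a few lines. This is an illustrative model under stated assumptions, not the slides' own material: one instruction enters IF per cycle, operands are read in ID, results are written in WB, and a read in the same cycle as the write is assumed to see the new value. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction fetched at issue_cycle occupies the stage (no stalls)."""
    return issue_cycle + STAGES.index(stage)


def raw_hazard(producer_issue, consumer_issue):
    """True if the consumer reads its operand in ID before the producer's WB."""
    return stage_cycle(consumer_issue, "ID") < stage_cycle(producer_issue, "WB")


# add R2,R1,R2 is fetched in cycle 0, mul R1,R2,R1 directly after it in cycle 1.
print("add writes R2 in cycle", stage_cycle(0, "WB"))
print("mul reads R2 in cycle", stage_cycle(1, "ID"))
print("RAW hazard:", raw_hazard(0, 1))
```

Under these assumptions the hazard disappears once the consumer is fetched at least three cycles after the producer.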

Data Hazards Classification
In the following, instruction i precedes instruction j in program order.
Read-after-Write (RAW): happens if instruction j reads a source register before instruction i has written its result. Implied by a true dependence.
Write-after-Read (WAR): happens if instruction j writes the target register before instruction i has read the operand. Implied by an anti dependence.
Write-after-Write (WAW): happens if instruction j writes its target register before instruction i has written its result to the same register. Implied by an output dependence. Can happen in pipelines where multiple stages can write, or where an instruction can proceed without waiting for a stalled previous instruction.

Handling Hazards
Software solutions (static solutions), implemented by the compiler:
Insertion of NOPs
- Detection of potential data hazards
- Insertion of NOPs after instructions that might lead to hazards
Reordering of instructions
- Instruction scheduling phase of the compiler
- Reorders instructions so that independent instructions are executed between dependent instructions
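The NOP insertion step can be sketched as a single pass over the instruction list. This is a simplified illustration, not a real compiler pass: insert_nops and its tuple format are hypothetical, only RAW dependences through registers are considered, and the default minimum distance of three instructions assumes a five-stage pipeline that reads in ID and writes in WB, with a same-cycle write and read being safe.

```python
def insert_nops(program, min_distance=3):
    """Insert NOPs so that each reader is at least min_distance slots after its writer.

    program is a list of (text, dest, sources) tuples; dest may be None.
    """
    out = []          # scheduled instruction texts, including NOPs
    last_write = {}   # register -> position in out of its latest writer
    for text, dest, sources in program:
        # Earliest slot at which all source registers are available.
        earliest = max(
            (last_write[r] + min_distance for r in sources if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append("nop")
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out


program = [
    ("add R1,R2,R3", "R1", ["R2", "R3"]),
    ("sub R4,R5,R6", "R4", ["R5", "R6"]),
    ("and R6,R1,R8", "R6", ["R1", "R8"]),
    ("xor R9,R1,R11", "R9", ["R1", "R11"]),
]
print("\n".join(insert_nops(program)))
```

For the four-instruction example from the data dependence slide, the independent sub already fills one of the two required slots, so only a single NOP is needed before the and instruction. This is the effect that instruction reordering tries to achieve systematically.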

Handling Hazards
Hardware solutions (dynamic solutions):
Detection of conflicts
- Requires appropriate hardware logic
Handling
- Interlocking, stalling
- Forwarding
- Forwarding with interlocking

Handling Hazards in the Hardware
Pipeline interlocking: the hardware detects the hazard and stops instruction j and all subsequent instructions for multiple cycles.
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. mul is stalled after IF and ID until add has produced R2.]

Handling Hazards in the Hardware
Forwarding: ALU results are forwarded directly to the ALU input. This eliminates stall cycles but requires additional hardware (forwarding logic).
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram over time t_i to t_i+4. Both instructions pass through IF, ID, EX, MA, WB without a stall.]

Forwarding and Interlocking
Not all hazards can be handled by forwarding.
Example: true dependence with a load operation
    load R2,A
    add R1,R2,R1
Solution: forwarding + interlocking
[Figure: two pipeline diagrams over time t_i to t_i+4. In the first, add follows load without a stall; in the second, add is stalled after IF and ID until the loaded value can be forwarded.]
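How many stall cycles the interlock has to insert can be estimated with a small timing model. The sketch below rests on assumptions that go beyond the slides: a five-stage pipeline, ALU results available at the end of EX and loaded values at the end of MA, forwarded values needed at the start of the consumer's EX, and, without forwarding, a register read in ID that may share a cycle with the producer's WB. The function stall_cycles is hypothetical.

```python
STAGE_INDEX = {"IF": 0, "ID": 1, "EX": 2, "MA": 3, "WB": 4}


def stall_cycles(producer_is_load, distance, forwarding):
    """Stall cycles for a consumer fetched `distance` instructions after its producer."""
    if forwarding:
        # First cycle in which the forwarded value can feed the ALU input.
        ready = STAGE_INDEX["MA" if producer_is_load else "EX"] + 1
        needed = distance + STAGE_INDEX["EX"]
    else:
        # The value reaches the register file in WB and is read in ID.
        ready = STAGE_INDEX["WB"]
        needed = distance + STAGE_INDEX["ID"]
    return max(0, ready - needed)


cases = [
    ("add -> mul, interlocking only", False, False),
    ("add -> mul, forwarding", False, True),
    ("load -> add, forwarding + interlocking", True, True),
]
for label, is_load, fwd in cases:
    print(label, ":", stall_cycles(is_load, distance=1, forwarding=fwd), "stall cycle(s)")
```

In this model forwarding removes the stalls between two ALU instructions, while a load followed directly by its consumer still needs one stall cycle, which is why forwarding has to be combined with interlocking.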

MIPS Pipeline
Note: see the lecture notes by Wismüller.

Branch Hazards
The branch target and condition are computed in the EX phase, and the result replaces the PC in the MA phase. The condition typically depends on the EX phase of the previous instruction, which requires forwarding. Thus, the correct instruction can be loaded only after three cycles.
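The cost of such a delay can be estimated with the usual stall-cycle formula. The sketch below is illustrative only: it treats the three cycles mentioned above as the stall penalty of every branch, assumes an ideal CPI of 1, and uses a made-up branch frequency of 20 percent.

```python
def effective_cpi(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls for penalty_cycles."""
    return base_cpi + branch_fraction * penalty_cycles


# Hypothetical workload: one instruction in five is a branch.
cpi = effective_cpi(branch_fraction=0.2, penalty_cycles=3)
print(f"effective CPI: {cpi:.2f}")
```

Reducing the penalty or the fraction of branches that actually pay it is what the techniques on the following slides aim at.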

Branch Hazards
[Figure: pipeline diagram over time for a jump and its target instruction. The PC is updated late, so stall cycles pass before the target instruction enters IF.]

Branch Hazards
The condition and the target should already be computed in ID. This has consequences:
Structural hazard
- The ALU cannot be used to compute the target, so an additional ALU is required in ID.
Data dependence with a previous arithmetic instruction
- RAW hazard
The critical path in the ID phase is lengthened
- Decoding, computing the branch target, and updating the PC now lie on the critical path.

Resolving Branch Hazards
Insertion of independent instructions: the instruction scheduling phase of the compiler fills the stall cycle with an independent instruction (delay slot).
Without scheduling:
    add R1,R2,R3
    br addr
    nop
With the delay slot filled:
    br addr
    add R1,R2,R

Branch Prediction
The branch decision is predicted when a jump is encountered, and the instructions that depend on the predicted outcome are executed speculatively. Once the condition has been computed, execution either continues without delay because the prediction was correct, or the started instructions are deleted and the correct ones are fetched.
Two classes:
- Static branch prediction by the hardware or the compiler
- Dynamic branch prediction by the hardware

Static Branch Prediction
Hardware
- Static prediction in the processor, e.g., backward jumps are always predicted to be taken.
Compiler
- Specification via a bit in the jump opcode
- The prediction can be guided by program analysis or profiling (feedback-directed compilation).

Dynamic Branch Prediction
Properties
- Based on the dynamic behavior of the application: the history of a jump is taken into account.
- Leads to more precise predictions
- Expensive in terms of hardware
Branch Prediction Buffer
- Cache for information about conditional jumps
- Requires that the target can be computed fast

Branch Prediction Buffer
[Figure: cache organization. Each entry holds an address tag, an invalid flag, and the prediction bit. The entry is selected by (instruction address >> 2) modulo the number of entries.]

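Single Bit Predictor
A single prediction bit is stored per branch. If the bit is set, the branch is predicted to be taken. If the prediction is wrong, the bit is inverted.
[Figure: state diagram with the two states Predict Taken and Predict Not Taken; transitions are labeled T (taken) and NT (not taken).]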

Single Bit vs Double Bit Predictors
The single bit predictor is suboptimal for nested loops: the loop exit inverts the bit, so the prediction is also wrong in the first iteration of the next execution of the inner loop.
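The effect on nested loops can be reproduced with a few lines of simulation. This is a sketch under assumptions not taken from the slides: the loop branch is taken three times and then falls through, the inner loop is executed four times, and the bit starts as not taken. The function name is hypothetical.

```python
def one_bit_mispredictions(outcomes, initial_taken=False):
    """Count wrong predictions of a single-bit predictor for one branch."""
    prediction = initial_taken
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
            prediction = taken  # a wrong prediction inverts the bit
    return misses


# Inner loop branch: taken three times, then not taken; four runs of the inner loop.
inner_loop = [True, True, True, False] * 4
print(one_bit_mispredictions(inner_loop), "mispredictions in", len(inner_loop), "branches")
```

Every execution of the inner loop costs two mispredictions here, one at the loop exit and one at the next loop entry.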

Two Bit Predictor
Two bits allow four states:
- strongly taken
- weakly taken
- weakly not taken
- strongly not taken
Two mispredictions are required to switch the prediction.

Two Bit Predictor
[Figure: state diagram with four states: (11) predict taken, (10) predict taken, (01) predict not taken, (00) predict not taken. The intermediate states are labeled weakly taken and weakly not taken; transitions are labeled T (taken) and NT (not taken).]

Two-Bit Predictor with Saturation Scheme
Count the taken jumps with a saturating counter. If the count is >= 2, predict a taken jump. The scheme is extensible to n bits, but experiments showed that this has no big impact.
[Figure: state diagram with four states: (11) predict taken, (10) predict taken, (01) predict not taken, (00) predict not taken; transitions are labeled T (taken) and NT (not taken).]
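A saturating counter of this kind takes only a few lines. The sketch below is illustrative: the function name, the initial counter value, and the branch pattern (taken three times, then not taken, repeated four times) are assumptions chosen for the example.

```python
def two_bit_mispredictions(outcomes, counter=0):
    """Count wrong predictions of a two-bit saturating counter for one branch."""
    misses = 0
    for taken in outcomes:
        predict_taken = counter >= 2
        if predict_taken != taken:
            misses += 1
        # Count up on taken, down on not taken, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# Inner loop branch: taken three times, then not taken; four runs of the inner loop.
inner_loop = [True, True, True, False] * 4
print(two_bit_mispredictions(inner_loop), "mispredictions in", len(inner_loop), "branches")
```

After the warm-up phase only the loop exit is mispredicted, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays at taken.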

Size of Prediction Buffer: SPEC 89
[Figure: percentage of mispredictions on SPEC 89 for different prediction buffer sizes.]

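Correlation Predictors
The prediction is also based on the history of other jumps. In the example below, a simple two-bit predictor is not sufficient to predict the third branch. Taking the preceding jumps into account enables a correct prediction: if both of the first two conditions hold, aa and bb are both set to 0, so the third condition is false.
    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb) { … }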

(m,n)-Predictors
An (m,n)-predictor uses the history of the last m jumps to select one of 2^m n-bit predictors.
Branch History Register (BHR)
- m-bit shift register
- Stores the global history of the last m jumps; each bit records whether the jump was taken.
- After each jump, the outcome is shifted into the BHR.
- The BHR gives the index into the Pattern History Table (PHT).

(m,n) Predictors
Example: (2,2) predictor
[Figure: the Branch History Register (BHR, a 2-bit shift register) selects one of the Pattern History Tables (PHTs) of 2-bit predictors; the jump address selects the 2-bit predictor within that table.]
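The interplay of BHR and PHTs can be sketched as a small class. This is a minimal illustration under assumptions that are not in the slides: the class and method names, the table size of 1024 entries, the index function (address >> 2) % entries, counters that start at 0, and the alternating test pattern are all chosen for the example.

```python
class CorrelatingPredictor:
    """(m,n) predictor: a global m-bit BHR selects one of 2**m tables of n-bit counters."""

    def __init__(self, m=2, n=2, entries=1024):
        self.m = m
        self.entries = entries
        self.max_count = 2 ** n - 1
        self.bhr = 0  # global history, newest outcome in the lowest bit
        # One pattern history table per history value.
        self.pht = [[0] * entries for _ in range(2 ** m)]

    def _index(self, address):
        return (address >> 2) % self.entries

    def predict(self, address):
        counter = self.pht[self.bhr][self._index(address)]
        return counter > self.max_count // 2  # upper half of the range means taken

    def update(self, address, taken):
        table = self.pht[self.bhr]
        idx = self._index(address)
        if taken:
            table[idx] = min(self.max_count, table[idx] + 1)
        else:
            table[idx] = max(0, table[idx] - 1)
        # Shift the outcome into the branch history register.
        self.bhr = ((self.bhr << 1) | int(taken)) & (2 ** self.m - 1)


def count_misses(predictor, address, outcomes):
    """Run one branch through the predictor and count the wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


alternating = [True, False] * 20
print(count_misses(CorrelatingPredictor(), 0x40, alternating), "mispredictions in 40 branches")
```

A branch that alternates between taken and not taken defeats a plain two-bit counter, but here the history bits distinguish the two situations, so the predictor is wrong only during warm-up.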

Branch Target Buffer
Also called Branch Target Address Cache.
- Required if the target address is computed late in the pipeline
- Stores the address of the jump instruction and the target address
- Can be used in the IF phase
- Can be combined with a predictor
Entry layout: address of jump instruction, target address, prediction bits

Branch Target Buffer (BTB): Prediction in IF
Cycle i: send the PC to memory and to the BTB. Entry found?
- No: fetch the next instruction.
- Yes: fetch the instruction at the target.
Cycle i+1:
- No entry found: is the instruction a taken branch? If not, normal instruction execution.
- Entry found: is the branch taken? If yes, the branch was correctly predicted; continue execution with no stalls.
Cycle i+2:
- Taken branch without an entry: update the BTB, kill the instructions, update the PC.
- Mispredicted branch: kill the fetched instructions, update the PC, delete the entry from the BTB.
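The lookup and update steps can be sketched with a dictionary standing in for the hardware table. This is a simplified illustration, not the slides' design: the class and method names are hypothetical, the BTB is unbounded and stores only taken branches, no prediction bits are kept, and instructions are assumed to be 4 bytes long.

```python
class BranchTargetBuffer:
    """Maps the address of a jump instruction to its predicted target address."""

    def __init__(self):
        self.entries = {}

    def next_fetch(self, pc, instr_size=4):
        """IF phase: fetch at the stored target on a hit, else the next instruction."""
        return self.entries.get(pc, pc + instr_size)

    def resolve(self, pc, taken, target, instr_size=4):
        """Update the BTB once the outcome is known; True if the fetch was correct."""
        predicted = self.next_fetch(pc, instr_size)
        actual = target if taken else pc + instr_size
        if taken:
            self.entries[pc] = target    # enter or refresh a taken branch
        else:
            self.entries.pop(pc, None)   # delete the entry of a not-taken branch
        return predicted == actual


btb = BranchTargetBuffer()
first = btb.resolve(0x100, taken=True, target=0x200)   # no entry yet
second = btb.resolve(0x100, taken=True, target=0x200)  # entry present
print("first encounter correct:", first, "| second encounter correct:", second)
```

A real BTB is a fixed-size, tagged cache, so entries can also be lost through replacement; the dictionary hides that detail.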