Pipelining. The execution of an instruction may be decomposed into several suboperations, each of which may be executed by a dedicated segment.

Pipelining

The execution of an instruction may be decomposed into several suboperations. Each suboperation may be executed by a dedicated segment that operates concurrently with all other segments.
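To make "operates concurrently" concrete, here is a minimal Python sketch that prints a space-time diagram of an ideal pipeline. It assumes one cycle per segment and no hazards. The segment names FI/DO/EX/WB are borrowed from the 4-stage example in this presentation, and `space_time` is a hypothetical helper, not part of any library.

```python
def space_time(n_instr: int, stages: list[str]) -> list[list[str]]:
    """Return rows[cycle][segment] = label of the instruction in that segment, or '.'.

    Ideal pipeline (assumption): one cycle per segment, no stalls, so
    instruction i (0-based) occupies segment s during cycle i + s.
    """
    k = len(stages)
    rows = []
    for cycle in range(k + n_instr - 1):
        row = []
        for s in range(k):
            i = cycle - s  # which instruction sits in segment s this cycle
            row.append(f"I{i + 1}" if 0 <= i < n_instr else ".")
        rows.append(row)
    return rows


if __name__ == "__main__":
    stages = ["FI", "DO", "EX", "WB"]
    print("cycle " + " ".join(f"{s:>3}" for s in stages))
    for c, row in enumerate(space_time(5, stages), start=1):
        print(f"{c:>5} " + " ".join(f"{x:>3}" for x in row))
```

For 5 instructions and 4 segments the diagram has 4 + 5 - 1 = 8 rows, whereas executing the same instructions one after another would take 5 × 4 = 20 cycles.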

Instruction pipelining: idea of instruction pipelining

Instruction pipelining: speedup

Machine without pipelining vs. machine with instruction pipelining
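Under the usual textbook assumptions (stated here, not taken from the slides): k stages of equal duration t, a non-pipelined machine that needs k·t per instruction, and no hazards, n instructions take n·k·t without pipelining and (k + n - 1)·t with it. The speedup is therefore S = nk / (k + n - 1). A minimal sketch with exact fractions:

```python
from fractions import Fraction


def pipeline_speedup(n: int, k: int) -> Fraction:
    """Speedup of an ideal k-stage pipeline over a non-pipelined machine for n instructions.

    Assumes every stage takes the same time t, which therefore cancels out:
      non-pipelined: n * k * t        (each instruction runs all k stages alone)
      pipelined:     (k + n - 1) * t  (k cycles to fill, then one result per cycle)
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    return Fraction(n * k, k + n - 1)


for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: S = {float(pipeline_speedup(n, k=4)):.3f}")
```

For k = 4 the speedup is 1 for a single instruction and 40/13 ≈ 3.08 for 10 instructions, and it approaches 4 as n grows, because the k - 1 cycles needed to fill the pipeline are amortized over more instructions.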

Instruction pipelining: ideal pipeline
- All objects go through the same stages.
- No resources are shared between any two stages.
- The propagation delay through every pipeline stage is equal.
- The scheduling of an object entering the pipeline is not affected by the objects in other stages.

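Instruction pipelining problems: structural hazard
Structural hazard: an instruction in the pipeline may need a resource that is being used by another instruction in the pipeline.
Example: Princeton vs. Harvard memory architecture (unified vs. split cache design). With a single unified memory, an instruction fetch and another instruction's data access in the same cycle compete for the same port; split instruction and data memories remove this conflict.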
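The timing model below is a hypothetical sketch of the Princeton-style conflict: a 4-stage pipeline with one memory port, where an instruction that touches data memory does so `mem_stage` cycles after it is fetched. We assume this happens in DO ("prepare operands"), i.e. `mem_stage = 1`. When a data access and a fetch fall into the same cycle, the data access wins and the fetch is stalled. With split (Harvard) memories there is no conflict.

```python
def cycles_with_memory_port(needs_data: list[bool], unified: bool,
                            k: int = 4, mem_stage: int = 1) -> int:
    """Total cycles for a k-stage pipeline (hypothetical timing model).

    needs_data[i] says whether instruction i accesses data memory, which it does
    `mem_stage` cycles after being fetched.
    unified=True : one port shared by fetches and data accesses (Princeton);
                   the data access wins and the conflicting fetch is stalled.
    unified=False: separate instruction and data memories (Harvard).
    """
    if not needs_data:
        return 0
    busy = set()      # cycles in which the shared port is reserved for data
    last_fetch = -1
    for data in needs_data:
        fetch = last_fetch + 1
        if unified:
            while fetch in busy:   # structural hazard: wait for the port
                fetch += 1
            if data:
                busy.add(fetch + mem_stage)
        last_fetch = fetch
    return last_fetch + k          # the last instruction still needs k cycles


# LOAD, LOAD, ADD, STORE, BRANCH: the loads and the store touch data memory.
loop = [True, True, False, True, False]
print(cycles_with_memory_port(loop, unified=False), cycles_with_memory_port(loop, unified=True))
```

For this loop body the model gives 8 cycles with split memories and 11 with a unified one: each memory access in DO pushes back the fetch that would have used the port in the same cycle.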

Instruction pipelining problems: data hazard
Data hazard: an instruction may produce data that is needed by a later instruction.
Examples: (No bubble, why?)
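A data hazard can be detected by comparing the registers an instruction reads with the registers written by the instructions just ahead of it. The sketch below does this for the loop body used later in the presentation. The `Instr` record, the `window` parameter (how many preceding instructions are still in flight), and the reading of `ADD R1, R2` as R2 ← R1 + R2 are assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, or None
    srcs: tuple[str, ...]    # registers read


# Loop body from the speedup example; ADD R1, R2 is assumed to compute R2 <- R1 + R2.
LOOP_BODY = [
    Instr("LOAD A(i), R1", "R1", ()),
    Instr("LOAD B(i), R2", "R2", ()),
    Instr("ADD R1, R2", "R2", ("R1", "R2")),
    Instr("STORE R2, C(i)", None, ("R2",)),
    Instr("BRANCH n, LOOP", None, ()),
]


def raw_hazards(program: list[Instr], window: int) -> list[tuple[int, int, str]]:
    """Find read-after-write dependences closer than `window` instructions.

    Returns (producer, consumer, register) index triples. Scanning backwards and
    dropping a register once its nearest writer is found means only the most
    recent write of each source register is reported.
    """
    hazards = []
    for j, consumer in enumerate(program):
        pending = set(consumer.srcs)
        for i in range(j - 1, max(j - window, -1), -1):
            writer = program[i].dest
            if writer in pending:
                hazards.append((i, j, writer))
                pending.discard(writer)
    return hazards


print(raw_hazards(LOOP_BODY, window=3))
```

With `window=3` this reports the ADD depending on both loads and the STORE depending on the ADD; with `window=2` only the adjacent pairs (LOAD B(i) -> ADD, ADD -> STORE) remain.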

Instruction pipelining problems: data hazard
Resolving data hazards:
- Interlocks: freeze the earlier pipeline stages until the data becomes available.
- Bypasses: if the data is available somewhere in the datapath, provide a bypass to get it to the right stage.
- Data flow analysis (out-of-order execution, register renaming): address the problem before it takes its toll.
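Interlocks and bypasses can be compared with a small timing model. Everything about the model is an assumption made for illustration: the 4-stage FI/DO/EX/WB pipeline with a single-cycle EX for every instruction, an interlock that stalls only in DO (freezing FI and DO behind it), a register file written in the first half of WB and read in DO, and a bypass that forwards an EX result directly into the next instruction's EX. Instructions are given as hypothetical `(dest, sources)` pairs.

```python
from typing import Optional


def total_cycles(program: list[tuple[Optional[str], tuple[str, ...]]], bypass: bool) -> int:
    """Cycles to run `program` (cycle 0 = first fetch) under the model described above."""
    ready: dict[str, int] = {}  # register -> earliest DO cycle for a consumer
    prev_do = 0                 # so the first instruction is decoded in cycle 1
    wb = -1
    for dest, srcs in program:
        do = prev_do + 1
        for reg in srcs:
            do = max(do, ready.get(reg, 0))  # interlock: wait in DO until the operand is ready
        ex, wb = do + 1, do + 2
        if dest is not None:
            # With a bypass, the consumer's EX may directly follow this EX;
            # otherwise the consumer must read the register file during this WB.
            ready[dest] = ex if bypass else wb
        prev_do = do
    return wb + 1


prog = [("R1", ()), ("R2", ("R1",)), ("R3", ())]
print(total_cycles(prog, bypass=False), total_cycles(prog, bypass=True))
```

For a producer immediately followed by its consumer, the model gives 7 cycles with interlocks only and 6 (the hazard-free k + n - 1) with a bypass; a consumer two instructions later needs no stall even without a bypass.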

Instruction pipelining problems: control hazard
Control hazard: an instruction may determine the next instruction to be executed.
Examples: branch instructions, interrupts.
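The cost of a control hazard is usually expressed as extra cycles per instruction. The sketch below assumes (this is not stated on the slides) that the 4-stage pipeline keeps fetching sequentially and resolves a branch in EX, so the two instructions already in FI and DO are discarded for every taken branch, while not-taken branches cost nothing.

```python
def effective_cpi(branch_fraction: float, taken_fraction: float, penalty: int) -> float:
    """Average cycles per instruction for an otherwise ideal pipeline (CPI = 1).

    Hypothetical model: each taken branch flushes `penalty` wrongly fetched
    instructions; not-taken branches cost nothing.
    """
    return 1.0 + branch_fraction * taken_fraction * penalty


# Assumed FI/DO/EX/WB pipeline resolving branches in EX: 2 instructions are flushed.
PENALTY_RESOLVE_IN_EX = 2

# Loop body from the speedup example: 1 branch in 5 instructions, (almost) always taken.
print(effective_cpi(branch_fraction=1 / 5, taken_fraction=1.0, penalty=PENALTY_RESOLVE_IN_EX))
```

In this model the loop branch alone raises the CPI from 1 to about 1.4, which is the penalty the techniques below try to remove.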

Instruction pipelining problems: resolving control hazards
- Prefetch target instruction: fetch instructions in both streams, branch not taken and branch taken. Both are saved until the branch is executed; then the right stream is selected and the wrong one is discarded.
- Branch target buffer (BTB):
  - Entries: addresses of previously executed branches, together with the target instruction and the next few instructions.
  - When fetching an instruction, search the BTB.
  - If found, fetch the instruction stream from the BTB.
  - If not, fetch the new stream and update the BTB.
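A software model can make the BTB lookup/update sequence above concrete. The sketch is hypothetical: a dictionary keyed by branch address stands in for the hardware table, `capacity` and oldest-first eviction are arbitrary choices, and `memory` is a plain list of instruction labels indexed by address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BTBEntry:
    target: int            # address of the branch target
    prefetched: list[str]  # target instruction and the next few instructions


class BranchTargetBuffer:
    """Toy BTB keyed by the address of a previously executed branch."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.entries: dict[int, BTBEntry] = {}

    def lookup(self, pc: int) -> Optional[BTBEntry]:
        return self.entries.get(pc)

    def update(self, pc: int, target: int, stream: list[str]) -> None:
        if pc not in self.entries and len(self.entries) >= self.capacity:
            # Table full: evict the oldest entry (dicts preserve insertion order).
            del self.entries[next(iter(self.entries))]
        self.entries[pc] = BTBEntry(target, list(stream))


def fetch_after_branch(btb: BranchTargetBuffer, pc: int, target: int,
                       memory: list[str], depth: int = 3) -> tuple[list[str], bool]:
    """Return (instruction stream, hit?) for a taken branch at `pc`."""
    entry = btb.lookup(pc)
    if entry is not None:
        return entry.prefetched, True          # hit: stream comes from the BTB
    stream = memory[target:target + depth]     # miss: fetch the new stream ...
    btb.update(pc, target, stream)             # ... and record it in the BTB
    return stream, False


memory = ["I0", "I1", "I2", "I3", "I4", "BRANCH"]
btb = BranchTargetBuffer()
print(fetch_after_branch(btb, pc=5, target=1, memory=memory))  # first time: miss
print(fetch_after_branch(btb, pc=5, target=1, memory=memory))  # second time: hit
```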

Instruction pipelining problems: resolving control hazards
- Loop buffer: a high-speed register file that holds an entire loop.
- Branch prediction: guess the branch condition; a correct guess eliminates the branch penalty.
  - Dynamic branch prediction: a finite state machine (FSM) added to the BTB.
  - Static branch prediction: compiler support, instruction set support.
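One widely used FSM for dynamic prediction is the 2-bit saturating counter. It is swapped in here as an example; the slide does not name a specific FSM. The sketch keeps one counter for a single branch (a real BTB would keep one per entry), and the start state is an arbitrary choice.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict "not taken", 2 and 3 predict "taken"."""

    def __init__(self, state: int = 1):   # start in "weakly not taken" (arbitrary choice)
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def count_mispredictions(outcomes: list[bool], initial_state: int = 1) -> int:
    predictor = TwoBitPredictor(initial_state)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


loop_branch = ([True] * 9 + [False]) * 2   # a 10-iteration loop, executed twice
print(count_mispredictions(loop_branch))   # 3
```

The counter mispredicts 3 times: once while warming up from the start state and once at each loop exit. Because one wrong outcome only moves it from "strongly" to "weakly taken", it still predicts "taken" when the loop is re-entered; a 1-bit predictor would also miss that first iteration of the second run.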

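Instruction pipelining problems: resolving control hazards
- Delayed branch (branch delay slots): exposes the control hazard to software. The instruction(s) in the slot(s) right after a branch are always executed, and the compiler tries to fill them with useful work.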
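The effect of a single branch delay slot on execution order can be shown with a toy interpreter. The instruction encoding (`("op", name)` or `("br", name, target, taken)`) and the helper name are hypothetical. The branch outcome is fixed per instruction, and branches placed inside a delay slot are not modeled.

```python
def run_with_delay_slot(program: list[tuple], max_steps: int = 50) -> list[str]:
    """Return the names of executed instructions, in order, for a machine with
    one branch delay slot: the instruction after a taken branch always runs."""
    trace = []
    pc = 0
    pending_target = None  # set by a taken branch, used after its delay slot
    while 0 <= pc < len(program) and len(trace) < max_steps:
        instr = program[pc]
        trace.append(instr[1])
        next_pc = pc + 1
        if pending_target is not None:   # this instruction occupied the delay slot
            next_pc, pending_target = pending_target, None
        if instr[0] == "br" and instr[3]:  # taken branch: jump after the slot
            pending_target = instr[2]
        pc = next_pc
    return trace


program = [
    ("op", "I1"),
    ("br", "BR", 4, True),  # taken branch to address 4
    ("op", "SLOT"),         # delay slot: executed even though the branch is taken
    ("op", "SKIPPED"),
    ("op", "TARGET"),
]
print(run_with_delay_slot(program))  # ['I1', 'BR', 'SLOT', 'TARGET']
```

Because SLOT runs on both paths, the compiler only needs to find an instruction that is useful (or harmless) whichever way the branch goes.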

Instruction pipelining: arithmetic pipelining
The process of performing a complex arithmetic operation may also be divided into stages (example: floating-point addition):
A. The exponents of the two floating-point numbers to be added are compared to find the number with the smaller magnitude.
B. The significand of the number with the smaller magnitude is shifted so that the exponents of the two numbers agree.
C. The significands are added.
D. The result of the addition is normalized.
E. Checks are made to see whether any floating-point exceptions, such as overflow, occurred during the addition.
F. The result is rounded.

Instruction pipelining: arithmetic pipelining
Example: s = x + y, with x = 1234.00 and y = -567.8
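Steps A-F can be traced on this example with a small decimal model. The following are assumptions, not taken from the slides. The significand has 4 decimal digits and is read as a fraction, so x = 0.1234 × 10^4 and y = -0.5678 × 10^3 are passed as `(1234, 4)` and `(-5678, 3)`. Alignment keeps three guard digits, rounding is half away from zero, and step E checks only exponent overflow. Real IEEE 754 hardware works in binary with guard, round, and sticky bits, so this illustrates the stages rather than any particular FPU.

```python
# Hypothetical decimal format: value = sig / 10**P * 10**exp, i.e. 0.d1d2...dP x 10**exp.
P = 4       # significant digits (assumption)
G = 3       # guard digits kept during alignment (assumption)
EMAX = 99   # largest allowed exponent (assumption)


def shift_right(v: int, d: int) -> int:
    """Drop d decimal digits, truncating toward zero."""
    q = abs(v) // 10**d
    return q if v >= 0 else -q


def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    (xs, xe), (ys, ye) = x, y
    # A. Compare exponents; make x the operand with the larger exponent.
    if xe < ye:
        (xs, xe), (ys, ye) = (ys, ye), (xs, xe)
    # B. Align: shift the other significand right by the exponent difference.
    xs *= 10**G
    ys = shift_right(ys * 10**G, xe - ye)
    # C. Add the significands.
    s, e = xs + ys, xe
    if s == 0:
        return (0, 0)
    # D. Normalize so that the leading digit of the P + G digit fraction is nonzero.
    while abs(s) >= 10**(P + G):
        s, e = shift_right(s, 1), e + 1
    while abs(s) < 10**(P + G - 1):
        s, e = s * 10, e - 1
    # E. Exception check (only exponent overflow is modeled).
    if e > EMAX:
        raise OverflowError("exponent overflow")
    # F. Round away the guard digits (half away from zero).
    q, r = divmod(abs(s), 10**G)
    if 2 * r >= 10**G:
        q += 1
    if q == 10**P:          # rounding carried out, e.g. 0.9999|9 -> 0.1000 x 10**(e + 1)
        q, e = q // 10, e + 1
        if e > EMAX:
            raise OverflowError("exponent overflow after rounding")
    return (q if s > 0 else -q, e)


print(fp_add((1234, 4), (-5678, 3)))  # (6662, 3), i.e. 0.6662 x 10**3 = 666.2
```

Here step B shifts y one digit (-0.05678 × 10^4), step C gives 0.06662 × 10^4, and step D normalizes it to 0.6662 × 10^3 = 666.2; no rounding is needed. Because each step only depends on the previous one, the steps map naturally onto pipeline segments.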

Instruction & arithmetic pipelining: speedup
Example:
LOOP: LOAD A(i), R1
      LOAD B(i), R2
      ADD R1, R2
      STORE R2, C(i)
      BRANCH n, LOOP
4-stage pipeline:
- FI: fetch instruction (1t)
- DO: decode instruction and prepare operands (1t)
- EX: execute instruction; general-purpose and fixed-point arithmetic instructions (1t), floating-point arithmetic instructions (6t)
- WB: write back (1t)
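Using the stage times above, the non-pipelined time of one loop iteration can be tallied instruction by instruction. The sketch assumes that ADD is the only floating-point arithmetic instruction in the loop (so its EX takes 6t) and that every other instruction needs 1t per stage.

```python
# Stage times in units of t, from the 4-stage pipeline above.
FI, DO, EX_INT, EX_FP, WB = 1, 1, 1, 6, 1

# (instruction, is it floating-point arithmetic?); ADD is assumed to be the FP add.
LOOP_BODY = [
    ("LOAD A(i), R1", False),
    ("LOAD B(i), R2", False),
    ("ADD R1, R2", True),
    ("STORE R2, C(i)", False),
    ("BRANCH n, LOOP", False),
]


def unpipelined_time(body: list[tuple[str, bool]]) -> int:
    """Time (in t) for one iteration when each instruction passes through all
    four stages before the next instruction starts."""
    return sum(FI + DO + (EX_FP if is_fp else EX_INT) + WB for _, is_fp in body)


print(unpipelined_time(LOOP_BODY))  # 25, so n iterations take 25 * t * n
```

The per-instruction terms are 4t, 4t, 9t, 4t, 4t, which is exactly the sum used for the non-pipelined time t_C.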

Instruction & arithmetic pipelining: dedicated vector instruction

Instruction & arithmetic pipelining: speedup
Without instruction pipelining: $t_C = (4t + 4t + 9t + 4t + 4t)\,n = 25\,t\,n$
With instruction pipelining: $t_p = 15\,t\,n$
With a dedicated vector instruction VADD A, B, C, n: $t_v = t_{ini} + 8t + (n - 1)\,t = t_s + (n - 1)\,t$
Speedup: $S_p = t_C / t_p$, $S_v = t_C / t_v$
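Taking speedup as the ratio of execution times, the figures above give S_p = 25/15 ≈ 1.67 for instruction pipelining, while S_v = 25tn / (t_s + (n - 1)t) approaches 25 for long vectors. The sketch evaluates both with exact fractions, in units of t. Since t_ini is not given, the start-up time t_s = t_ini + 8t is a free parameter, and the value used in the example (t_s = 18t) is hypothetical.

```python
from fractions import Fraction


def speedups(n: int, t_s: Fraction, t: Fraction = Fraction(1)) -> tuple[Fraction, Fraction]:
    """Return (S_p, S_v) for n loop iterations / vector elements."""
    t_c = 25 * t * n            # no pipelining
    t_p = 15 * t * n            # instruction pipelining
    t_v = t_s + (n - 1) * t     # dedicated vector instruction
    return t_c / t_p, t_c / t_v


for n in (10, 100, 10_000):
    s_p, s_v = speedups(n, t_s=Fraction(18))   # hypothetical t_s = 18t
    print(n, float(s_p), round(float(s_v), 2))
```

S_p does not depend on n, whereas S_v grows with n because the start-up time t_s is paid only once per vector instruction.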

Instruction & arithmetic pipelining: speedup
Throughput [Mflops]: number of floating-point operations per second, in millions
Efficiency
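For the vector instruction, n floating-point additions complete in t_v = t_s + (n - 1)t, so the throughput is n / t_v and can never exceed one result per t. The sketch computes throughput in Mflops and an efficiency defined as achieved throughput divided by that peak. This definition of efficiency, the nanosecond units, and the example values (t = 10 ns, t_s = 180 ns) are assumptions, not figures from the slides.

```python
def vector_throughput_mflops(n: int, t_ns: float, t_s_ns: float) -> float:
    """Mflops achieved when n additions finish in t_v = t_s + (n - 1) * t (times in ns)."""
    t_v_seconds = (t_s_ns + (n - 1) * t_ns) * 1e-9
    return n / t_v_seconds / 1e6


def vector_efficiency(n: int, t_ns: float, t_s_ns: float) -> float:
    """Achieved throughput divided by the peak of one result per t (assumed definition)."""
    return n * t_ns / (t_s_ns + (n - 1) * t_ns)


for n in (10, 100, 1000):
    print(n, round(vector_throughput_mflops(n, 10.0, 180.0), 1),
          round(vector_efficiency(n, 10.0, 180.0), 3))
```

With t = 10 ns the peak is 100 Mflops; for n = 100 and t_s = 180 ns the sketch gives about 85.5 Mflops, an efficiency of about 0.85, and both approach their peaks as n grows.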
