Published by Clement Robertson. Modified over 8 years ago.
Pipelining
The execution of an instruction may be decomposed into several suboperations. Each suboperation may be executed by a dedicated segment that operates concurrently with all the other segments.
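The overlap between segments can be drawn as a space-time diagram. Below is a minimal Python sketch, assuming every segment takes exactly one clock cycle and a new instruction enters the pipeline every cycle. The stage names FI, DO, EX, WB are borrowed from the four-stage example later in this deck; the function `space_time` is hypothetical.

```python
def space_time(n_instr: int, stages: list[str]) -> list[str]:
    """Return one text row per stage showing which instruction occupies it in each cycle."""
    k = len(stages)
    total = k + n_instr - 1            # cycles needed when one instruction enters per cycle
    rows = []
    for s, name in enumerate(stages):
        cells = []
        for c in range(total):
            i = c - s                  # instruction i reaches stage s in cycle i + s
            cells.append(f"I{i + 1}" if 0 <= i < n_instr else "--")
        rows.append(f"{name:>3} " + " ".join(cells))
    return rows

for row in space_time(4, ["FI", "DO", "EX", "WB"]):
    print(row)
```

With k stages and n instructions the diagram spans k + n - 1 cycles, whereas executing the same instructions one after another would take n * k cycles.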
Instruction pipelining: Idea of instruction pipelining
Instruction pipelining: Speedup
[Figure: a machine without pipelining vs. a machine with instruction pipelining, and the resulting speedup]
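A common textbook formulation of this speedup, assuming k segments of equal delay t, n instructions and no stalls (the symbols k, n, T_1, T_k and S are introduced here, not taken from the slide):

$$
T_1 = n\,k\,t, \qquad T_k = (k + n - 1)\,t, \qquad S = \frac{T_1}{T_k} = \frac{n\,k}{k + n - 1} \;\longrightarrow\; k \quad (n \to \infty)
$$

The first instruction needs k cycles to fill the pipeline and each of the remaining n - 1 instructions completes one cycle later, so for long instruction streams the speedup approaches the number of stages.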
Instruction pipelining: Ideal pipeline
- All objects go through the same stages.
- No resources are shared between any two stages.
- The propagation delay is equal for all pipeline stages.
- The scheduling of an object entering the pipeline is not affected by the objects in other stages.
Instruction pipelining: Problems – Structural hazard
Structural hazard: an instruction in the pipeline may need a resource that is being used by another instruction in the pipeline.
Example: Princeton vs. Harvard memory architecture (unified vs. split cache design).
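A minimal Python sketch of the unified-versus-split memory conflict. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB; one cycle each) rather than this deck's four stages, and models only structural hazards: with a single memory port, an instruction fetch must wait whenever an earlier load or store is in its MEM stage. `total_cycles` is a hypothetical helper.

```python
# Assumed classic 5-stage pipeline: IF=0, ID=1, EX=2, MEM=3, WB=4 (one cycle each).
DEPTH, MEM_STAGE = 5, 3

def total_cycles(uses_memory: list[bool], split_memory: bool) -> int:
    """Cycles to run a straight-line program when only structural hazards are modeled.

    Unified (Princeton) memory: a fetch cannot use the single port in a cycle where an
    earlier load/store is in MEM, so the fetch is delayed (a bubble).
    Split (Harvard) memories: instruction and data accesses never conflict.
    """
    port_busy: set[int] = set()   # cycles in which a MEM-stage access holds the port
    cycle = 0                     # earliest cycle for the next instruction fetch
    last_fetch = -1
    for mem_op in uses_memory:
        if not split_memory:
            while cycle in port_busy:   # structural hazard: wait for the port
                cycle += 1
        last_fetch = cycle
        if mem_op:
            port_busy.add(cycle + MEM_STAGE)
        cycle += 1
    return last_fetch + DEPTH if uses_memory else 0

print(total_cycles([True] * 4, split_memory=True))   # 8
print(total_cycles([True] * 4, split_memory=False))  # 11
```

Four consecutive loads take 8 cycles with split memories but 11 with one unified port, because the fourth fetch has to wait out the MEM accesses of the first three.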
Instruction pipelining: Problems – Data hazard
Data hazard: an instruction may produce data that is needed by a later instruction.
Examples: (No bubble, why?)
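Read-after-write (RAW) dependences of this kind can be found mechanically. A minimal Python sketch over the loop used later in this deck, assuming `ADD R1, R2` writes its result to R2 (the following `STORE R2, C i` suggests this). `Instr`, `raw_hazards` and the `window` parameter are hypothetical; `window` stands for how many following instructions could read a register before its new value is written back, which depends on the pipeline.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str
    dest: Optional[str]        # register written (None if nothing is written)
    srcs: tuple[str, ...]      # registers read

def raw_hazards(program: list[Instr], window: int) -> list[tuple[int, int, str]]:
    """List (producer, consumer, register) where the consumer reads a register
    written by an instruction at most `window` positions earlier."""
    found = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
            if consumer.dest == producer.dest:
                break  # a newer write to the same register hides the older one
    return found

loop = [
    Instr("LOAD A i, R1", "R1", ()),
    Instr("LOAD B i, R2", "R2", ()),
    Instr("ADD R1, R2", "R2", ("R1", "R2")),
    Instr("STORE R2, C i", None, ("R2",)),
    Instr("BRANCH n, LOOP", None, ()),
]
print(raw_hazards(loop, window=3))  # [(0, 2, 'R1'), (1, 2, 'R2'), (2, 3, 'R2')]
```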
Instruction pipelining: Problems – Resolving data hazards
- Interlocks: freeze the earlier pipeline stages until the data becomes available.
- Bypasses: if the data is available somewhere in the datapath, provide a bypass that gets it to the right stage.
- Data-flow analysis (out-of-order execution, register renaming): address the problem before it takes its toll.
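Interlocks and bypasses can be compared by counting stall cycles. A minimal Python sketch, assuming the classic five-stage pipeline (IF, ID, EX, MEM, WB), a register file written in the first half of a cycle and read in the second, and results forwarded from the end of EX (ALU operations) or MEM (loads). `raw_stalls` is hypothetical, and data-flow techniques are not modeled.

```python
# Assumed classic 5-stage pipeline, one cycle per stage.
IF, ID, EX, MEM, WB = range(5)

def raw_stalls(distance: int, producer_is_load: bool, forwarding: bool) -> int:
    """Stall cycles before a consumer `distance` instructions after its producer
    (distance=1 means immediately following) can obtain the value it reads."""
    if forwarding:
        # Bypass: the value leaves the end of EX (ALU op) or MEM (load) and is
        # delivered to the start of the consumer's EX stage.
        ready = MEM if producer_is_load else EX
        return max(0, ready + 1 - (distance + EX))
    # Interlock only: the consumer reads registers in ID, which may share a cycle
    # with the producer's WB because the register file is written first.
    return max(0, WB - (distance + ID))

for fwd in (False, True):
    for load in (False, True):
        print(f"forwarding={fwd} load={load} stalls={raw_stalls(1, load, fwd)}")
```

Under these assumptions an interlock alone costs two stalls for an immediately dependent instruction, while bypassing removes the stall for ALU results and leaves one stall after a load.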
Instruction pipelining: Problems – Control hazard
Control hazard: an instruction may determine the next instruction to be executed.
Examples: branch instructions, interrupts.
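One way to quantify the cost of a control hazard is its effect on cycles per instruction (CPI). A minimal Python sketch, assuming the pipeline keeps fetching sequentially and discards `penalty` wrongly fetched instructions after every taken branch; `effective_cpi` and the instruction-mix numbers are hypothetical.

```python
def effective_cpi(branch_fraction: float, taken_fraction: float,
                  penalty: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when every taken branch wastes `penalty` cycles
    on instructions fetched from the wrong stream (assumed model)."""
    return base_cpi + branch_fraction * taken_fraction * penalty

# Hypothetical mix: 20% branches, 60% of them taken, 2-cycle penalty.
print(effective_cpi(0.2, 0.6, 2))  # about 1.24
```

The techniques that follow all aim at shrinking either the penalty or the fraction of branches that incur it.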
Instruction pipelining: Problems – Resolving control hazards
Prefetch target instruction
- Fetch instructions in both streams, branch not taken and branch taken.
- Both are saved until the branch is executed; then the right instruction stream is selected and the wrong one is discarded.
Branch target buffer (BTB)
- Entries: addresses of previously executed branches, together with the target instruction and the next few instructions.
- When fetching an instruction, search the BTB.
- If the branch is found, fetch the instruction stream from the BTB.
- If not, fetch the new stream and update the BTB.
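The BTB steps above can be sketched as a small lookup table. This is a toy Python model under stated assumptions: entries are keyed by branch address, each stores the target and the first `depth` target instructions, replacement is FIFO, and the branch's target is passed in rather than computed. `BranchTargetBuffer` and its parameters are hypothetical; real BTBs are small hardware tables that also verify the predicted target.

```python
from dataclasses import dataclass, field

@dataclass
class BranchTargetBuffer:
    """Toy BTB: branch address -> (target address, first few target instructions)."""
    capacity: int = 4                              # number of entries (assumed)
    depth: int = 2                                 # instructions stored per entry (assumed)
    entries: dict = field(default_factory=dict)    # insertion order = age, for FIFO

    def fetch(self, pc: int, memory: dict, target: int) -> tuple:
        """Return (hit, stream) for the branch at `pc`. `target` stands for the address
        the branch resolves to; this toy never mispredicts a target."""
        if pc in self.entries:                     # hit: stream comes from the BTB
            return True, self.entries[pc][1]
        stream = [memory[a] for a in range(target, target + self.depth)]  # miss: fetch
        if len(self.entries) >= self.capacity:     # full: evict the oldest entry
            del self.entries[next(iter(self.entries))]
        self.entries[pc] = (target, stream)        # update the BTB
        return False, stream

memory = {0: "LOAD A i, R1", 1: "LOAD B i, R2", 2: "ADD R1, R2",
          3: "STORE R2, C i", 4: "BRANCH n, LOOP"}
btb = BranchTargetBuffer()
print(btb.fetch(4, memory, target=0))  # first iteration: miss, stream read from memory
print(btb.fetch(4, memory, target=0))  # later iterations: hit, stream supplied by the BTB
```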
Instruction pipelining: Problems – Resolving control hazards
Loop buffer
- A high-speed register file that holds an entire loop.
Branch prediction
- Guess the branch outcome; a correct guess eliminates the branch penalty.
- Dynamic branch prediction: a finite state machine (FSM) added to the BTB.
- Static branch prediction: compiler support, instruction set support.
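The slide does not say which FSM is used. A widely used choice is a 2-bit saturating counter per branch, and the Python sketch below assumes that FSM, starting every branch in the "weakly not taken" state. `TwoBitPredictor` and `mispredictions` are hypothetical names.

```python
class TwoBitPredictor:
    """2-bit saturating counter per branch: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self, initial: int = 1) -> None:
        self.initial = initial
        self.counters: dict[int, int] = {}   # branch address -> counter state

    def predict(self, pc: int) -> bool:
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

def mispredictions(outcomes: list[bool], pc: int = 4) -> int:
    """Count wrong guesses for one branch over a sequence of actual outcomes."""
    p = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if p.predict(pc) != taken:
            misses += 1
        p.update(pc, taken)
    return misses

# A loop branch taken 9 times and then not taken, with the loop run twice.
print(mispredictions(([True] * 9 + [False]) * 2))  # 3
```

The two-bit hysteresis means a single loop exit does not flip the prediction, so after warm-up the predictor misses only once per loop execution.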
Instruction pipelining: Problems – Resolving control hazards
Delayed branch (branch delay slots)
- Exposes the control hazard to software.
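What "exposing the hazard to software" means can be seen in a toy interpreter with one delay slot. A minimal Python sketch, assuming SPARC-like pc/npc sequencing: the instruction right after a jump always executes, so the compiler must fill that slot with useful work or a NOP. The instruction format and `trace` are hypothetical.

```python
def trace(program: list[tuple[str, object]], delay_slot: bool) -> list[int]:
    """Return the addresses executed, in order, until a 'halt'."""
    pc, npc = 0, 1
    executed = []
    while 0 <= pc < len(program):
        op, target = program[pc]
        executed.append(pc)
        if op == "halt":
            break
        if delay_slot:
            # The jump only changes npc, so the next sequential instruction still runs.
            new_npc = target if op == "jump" else npc + 1
            pc, npc = npc, new_npc
        else:
            pc = target if op == "jump" else pc + 1
    return executed

program = [("jump", 3), ("add", None), ("sub", None), ("halt", None)]
print(trace(program, delay_slot=False))  # [0, 3]
print(trace(program, delay_slot=True))   # [0, 1, 3]: "add" runs in the delay slot
```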
Instruction pipelining: Arithmetic pipelining
The process of performing a complex arithmetic operation may also be divided into stages (example: FP addition):
A. The exponents of the two floating-point numbers to be added are compared to find the number with the smaller magnitude.
B. The significand of the number with the smaller magnitude is shifted so that the exponents of the two numbers agree.
C. The significands are added.
D. The result of the addition is normalized.
E. Checks are made to see whether any floating-point exceptions, such as overflow, occurred during the addition.
F. Rounding occurs.
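Stages A to F can be followed one by one in a small decimal model. This is a minimal Python sketch, not a hardware design: it assumes numbers of the form m * 10**e with a 4-digit integer significand m, an exponent limit `E_MAX` of 99, round-half-up rounding, and only overflow as an exception. `fp_add`, `P` and `E_MAX` are hypothetical.

```python
P = 4        # significand precision in decimal digits (assumed)
E_MAX = 99   # largest exponent before overflow (assumed)

def fp_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Add (mx, ex) and (my, ey), each meaning m * 10**e with |m| a P-digit integer."""
    (mx, ex), (my, ey) = x, y
    # A. Compare exponents; swap so (my, ey) is the operand with the smaller exponent.
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    d = ex - ey
    # B. Align. Shifting the smaller significand right by d digits is done exactly
    #    here by rescaling the larger one to exponent ey, so no digits are lost.
    mx *= 10 ** d
    e = ey
    # C. Add the aligned significands.
    s = mx + my
    if s == 0:
        return (0, 0)
    sign, s = (-1 if s < 0 else 1), abs(s)
    # D. Normalize: keep P leading digits; `drop` low-order digits are left to round.
    drop = len(str(s)) - P
    if drop < 0:
        s, e, drop = s * 10 ** -drop, e + drop, 0
    # E. Exception check: the normalized exponent must not exceed E_MAX.
    if e + drop > E_MAX:
        raise OverflowError("exponent overflow")
    # F. Round half up to P digits; a carry needs one more normalization step.
    q, r = divmod(s, 10 ** drop)
    if 2 * r >= 10 ** drop:
        q += 1
    e += drop
    if q == 10 ** P:
        q, e = q // 10, e + 1
        if e > E_MAX:
            raise OverflowError("exponent overflow")
    return (sign * q, e)

print(fp_add((1234, 0), (-5678, -1)))  # (6662, -1), i.e. 666.2
```

In an arithmetic pipeline each lettered step would be a separate segment, so a new pair of operands could enter every cycle while earlier additions finish.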
Instruction pipelining: Arithmetic pipelining
Example: s = x + y, with x = 1234.00 and y = -567.8
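Worked through the stages in decimal, assuming normalized significands in [0.1, 1) with four digits (the slide does not state the number format):

$$
\begin{aligned}
&x = 0.1234 \times 10^{4}, \qquad y = -0.5678 \times 10^{3} \\
&\text{A, B:}\quad y = -0.05678 \times 10^{4} \\
&\text{C:}\quad s = (0.1234 - 0.05678) \times 10^{4} = 0.06662 \times 10^{4} \\
&\text{D:}\quad s = 0.6662 \times 10^{3} = 666.2
\end{aligned}
$$

No overflow occurs in stage E, and the normalized result already fits in four digits, so stage F leaves it unchanged.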
Instruction & Arithmetic pipelining: Speedup
Example:
LOOP:  LOAD   A i, R1
       LOAD   B i, R2
       ADD    R1, R2
       STORE  R2, C i
       BRANCH n, LOOP
4-stage pipeline:
- FI – Fetch instruction (1t)
- DO – Decode instruction and prepare operands (1t)
- EX – Execute instruction: general-purpose and fixed-point arithmetic instructions (1t); floating-point arithmetic instructions (6t)
- WB – Write back (1t)
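Without pipelining, one loop iteration costs the sum of all stage times of its five instructions. A minimal Python sketch of that sum, assuming the ADD is a floating-point addition (which is what the 9t term in the non-pipelined time implies) and the other four instructions use the 1t EX time; the names are hypothetical.

```python
# Stage times in units of t, taken from the slide; EX depends on the instruction type.
FI, DO, WB = 1, 1, 1
EX_FIXED, EX_FP = 1, 6

# (mnemonic, is_floating_point); the ADD is assumed to be a floating-point add.
LOOP = [("LOAD", False), ("LOAD", False), ("ADD", True), ("STORE", False), ("BRANCH", False)]

def unpipelined_iteration_time(loop: list[tuple[str, bool]]) -> int:
    """Time for one loop iteration, in units of t, when instructions run one at a time."""
    return sum(FI + DO + (EX_FP if fp else EX_FIXED) + WB for _, fp in loop)

print(unpipelined_iteration_time(LOOP))  # 25
```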
Instruction & Arithmetic pipelining: Dedicated vector instruction
Instruction & Arithmetic pipelining: Speedup
Without instruction pipelining: $t_C = (4t + 4t + 9t + 4t + 4t)\,n = 25\,t\,n$
With instruction pipelining: $t_p = 15\,t\,n$
With a dedicated vector instruction VADD A,B,C,n: $t_v = t_{\mathrm{ini}} + 8t + (n - 1)\,t = t_s + (n - 1)\,t$
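A minimal Python sketch that turns the three timings into speedups, assuming speedup is measured as t_C / t_p and t_C / t_v, and using a hypothetical start-up time t_ini = 10t (so t_s = t_ini + 8t = 18t); `loop_speedups` is a hypothetical helper.

```python
def loop_speedups(n: int, t_s: float, t: float = 1.0) -> tuple[float, float]:
    """Return (t_C / t_p, t_C / t_v) for n loop iterations, using the slide's timings."""
    t_c = 25 * t * n           # no pipelining
    t_p = 15 * t * n           # instruction pipelining
    t_v = t_s + (n - 1) * t    # dedicated vector instruction VADD
    return t_c / t_p, t_c / t_v

# Hypothetical start-up time: t_s = t_ini + 8t with t_ini = 10t, i.e. 18t.
for n in (1, 10, 1000):
    s_p, s_v = loop_speedups(n, t_s=18.0)
    print(f"n={n:5}  pipelined: {s_p:.2f}  vector: {s_v:.2f}")
```

Instruction pipelining gives a constant 25/15 regardless of n, while the vector instruction amortizes its start-up time t_s, so its speedup approaches 25 as n grows.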
Instruction & Arithmetic pipelining: Speedup
Throughput [Mflops]: the number of floating-point operations per second, in millions
Efficiency
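For the vector instruction, n additions finish in t_v = t_s + (n - 1)t. A minimal Python sketch of throughput and efficiency, assuming efficiency is defined as the achieved rate divided by the ideal rate of one result per t (i.e. n * t / t_v; the slide's own definition is not in the text) and a hypothetical stage time t = 10 ns with t_s = 18t.

```python
def vector_throughput_mflops(n: int, t_s: float, t: float) -> float:
    """MFLOPS of VADD over n elements: n additions finished in t_v = t_s + (n - 1) t seconds."""
    t_v = t_s + (n - 1) * t
    return n / t_v / 1e6

def efficiency(n: int, t_s: float, t: float) -> float:
    """Achieved rate relative to the ideal one result per t (assumed definition)."""
    t_v = t_s + (n - 1) * t
    return n * t / t_v

t = 10e-9  # hypothetical stage time: 10 ns, so the peak rate is 100 MFLOPS
for n in (10, 1000):
    print(n, round(vector_throughput_mflops(n, 18 * t, t), 1), round(efficiency(n, 18 * t, t), 3))
```

Both measures rise toward their limits (1/t operations per second and an efficiency of 1) as the vector length n grows and the start-up time is amortized.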