Published by Crystal Gallagher. Modified over 8 years ago.
Chapter One: Introduction to Pipelined Processors
Principles of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Instruction Prefetch and Branch Handling
The instructions in computer programs can be classified into four types:
- Arithmetic/Load operations (60%)
- Store-type instructions (15%)
- Branch-type instructions (5%)
- Conditional-branch type (Yes: 12%, No: 8%)
Arithmetic/Load operations (60%): these require one or two operand fetches. The execution of different operations requires a different number of pipeline cycles.
Store-type instructions (15%): these require a memory access to store the data. Branch-type instructions (5%): these correspond to an unconditional jump.
Conditional-branch type (Yes: 12%, No: 8%): the Yes path requires calculation of the new branch address, while the No path proceeds to the next sequential instruction.
Arithmetic/load and store instructions do not alter the execution order of the program. Branch instructions and interrupts, however, have damaging effects on the performance of pipelined computers.
Interrupts
When instruction I is being executed, the occurrence of an interrupt postpones instruction I+1 until the ISR has been serviced. There are two types of interrupts:
- Precise: caused by illegal operation codes; can be detected at the decoding stage.
- Imprecise: caused by faults from the storage, address, and execution functions.
Handling Interrupts
- Precise: since decoding is the first stage, instruction I prevents I+1 from entering the pipeline, and all preceding instructions are executed before the ISR.
- Imprecise: no new instructions are allowed into the pipeline, and all incomplete instructions, whether they precede or follow I, are executed before the ISR.
Handling Example: the Interrupt System of the Cray-1
Cray-1 System
The interrupt system is built around an exchange package. When an interrupt occurs, the Cray-1 saves 8 scalar registers, 8 address registers, the program counter and the monitor flags. These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register. Since the exchange package does not hold all state information, the software interrupt handler has to save the remaining state.
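The exchange-package mechanism amounts to swapping a fixed block of processor state with a block in memory. A minimal sketch, assuming a dictionary stands in for the 16-word package (the `exchange` helper and the state layout are illustrative, not the Cray-1's actual packing):

```python
def exchange(cpu_state, memory, xa):
    """Swap the running processor's state with the exchange package
    stored in memory at address xa (the hardware exchange address
    register). Returns the state that was swapped in."""
    incoming = memory[xa]
    memory[xa] = dict(cpu_state)  # outgoing state saved in the package
    return incoming

# On an interrupt: swap the user state out and the monitor state in.
memory = {0x100: {"S": [0] * 8, "A": [0] * 8, "PC": 0x2000, "FLAGS": 1}}
user = {"S": list(range(8)), "A": [9] * 8, "PC": 0x0040, "FLAGS": 0}
monitor = exchange(user, memory, 0x100)
print(monitor["PC"])  # 0x2000

# On return from the ISR, the same swap restores the user state.
restored = exchange(monitor, memory, 0x100)
print(restored["PC"] == 0x0040)  # True
```

Note that a single swap serves both interrupt entry and return, which is the appeal of the exchange-package design.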
In general, the higher the percentage of branch-type instructions in a program, the slower the program will run on a pipelined processor.
Effect of Branching on Pipeline Performance
Consider a linear pipeline of 5 stages: Fetch Instruction, Decode, Fetch Operands, Execute, Store Results.
Overlapped Execution of Instructions without Branching
I5 is a branch instruction
Estimation of the effect of branching on an n-segment instruction pipeline
Consider an instruction cycle with n pipeline clock periods. Let
p = probability that an instruction is a conditional branch (20%)
q = probability that a conditional branch is successful (12/20 = 0.6, i.e. 60% of the conditional branches)
Suppose there are m instructions. Then the number of successful branches is m·p·q (m × 0.2 × 0.6). Each successful branch requires a delay of (n−1)/n of an instruction cycle to flush the pipeline.
Thus, the total number of instruction cycles required for m instructions is
1 + (m−1)/n + p·q·m·(n−1)/n
As m becomes large, the average number of instructions per instruction cycle is given by
m / [1 + (m−1)/n + p·q·m·(n−1)/n],
which tends to
n / (1 + p·q·(n−1))
When p = 0, this measure reduces to n, the ideal case. In reality it is always less than n.
Solution: with n = 5, p = 0.2 and q = 0.6, n / (1 + p·q·(n−1)) = 5 / (1 + 0.12 × 4) = 5 / 1.48 ≈ 3.38 instructions per instruction cycle, compared with the ideal value of 5.
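The throughput formula above is easy to check numerically. A minimal sketch in Python (the function name is illustrative):

```python
def avg_instructions_per_cycle(n, p, q):
    """Asymptotic average number of instructions completed per
    instruction cycle on an n-segment pipeline, where p is the
    probability of a conditional branch and q the probability
    that the branch is successful (taken)."""
    return n / (1 + p * q * (n - 1))

# Ideal pipeline (no conditional branches): throughput equals n.
print(avg_instructions_per_cycle(5, 0.0, 0.6))  # 5.0

# With the instruction mix from the text: p = 0.2, q = 0.6.
print(avg_instructions_per_cycle(5, 0.2, 0.6))  # ≈ 3.38
```

Sweeping p from 0 upward shows how quickly even a modest branch frequency erodes the ideal n-fold speedup.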
Multiple Prefetch Buffers
Buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
- Sequential buffers: hold instructions fetched in sequence (in-sequence pipelining)
- Target buffers: hold instructions fetched from a branch target (out-of-sequence pipelining)
A conditional branch causes both the sequential and the target buffers to fill; once the branch condition is resolved, one buffer is selected and the other is discarded.
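The fill-then-select behaviour can be sketched as follows; this is a toy model of the idea, not any particular machine's prefetch logic (class and method names are assumptions):

```python
from collections import deque

class PrefetchUnit:
    """Dual prefetch buffers around a conditional branch: both paths
    are prefetched; resolving the branch keeps one buffer and
    discards the other."""
    def __init__(self):
        self.sequential = deque()  # fall-through path
        self.target = deque()      # branch-target path

    def prefetch(self, fallthrough_instrs, target_instrs):
        self.sequential.extend(fallthrough_instrs)
        self.target.extend(target_instrs)

    def resolve(self, taken):
        # Keep the buffer on the chosen path, discard the other.
        chosen = self.target if taken else self.sequential
        discarded = self.sequential if taken else self.target
        discarded.clear()
        return list(chosen)

pf = PrefetchUnit()
pf.prefetch(["I6", "I7"], ["I20", "I21"])
print(pf.resolve(taken=True))  # ['I20', 'I21']
```

The cost of this scheme is doubled fetch bandwidth while the branch is unresolved, traded for zero refill delay on either outcome.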
Data Buffering and Busing Structures
Speeding up of pipeline segments
The processing speeds of pipeline segments are usually unequal. Consider three segments S1, S2 and S3 with delays T1, T2 and T3, respectively.
If T1 = T3 = T and T2 = 3T, then S2 becomes the bottleneck and we need to remove it. How? One method is to subdivide the bottleneck stage. Two possible subdivisions are:
First method: subdivide S2 into two subsegments with delays T and 2T, giving the sequence S1 (T), S2a (T), S2b (2T), S3 (T).
Second method: subdivide S2 into three subsegments with delay T each, giving five segments S1, S2a, S2b, S2c, S3, all with delay T.
If the bottleneck is not subdivisible, we can instead duplicate S2 in parallel: three copies of S2 (each with delay 3T) are placed between S1 (T) and S3 (T), and successive initiations are distributed among the copies.
Control and synchronization are more complex for parallel segments than for a purely linear pipeline.
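A quick way to compare these schemes is to compute the resulting pipeline clock period, which is set by the slowest segment. A minimal sketch, assuming the three-way subdivision and three-copy duplication discussed above:

```python
def clock_period(segment_delays):
    """The pipeline clock must accommodate the slowest segment."""
    return max(segment_delays)

T = 1.0
original   = [T, 3 * T, T]    # S1, S2 (bottleneck), S3
subdivided = [T, T, T, T, T]  # S2 split into three T-subsegments

# Duplicating S2 three times in parallel gives an effective initiation
# rate of one result every 3T / 3 = T for the S2 stage.
duplicated = [T, 3 * T / 3, T]

print(clock_period(original))    # 3.0 -> one result every 3T
print(clock_period(subdivided))  # 1.0 -> one result every T
print(clock_period(duplicated))  # 1.0 -> one result every T
```

Both remedies reach the same throughput; subdivision lengthens the pipeline (more stages, longer fill time), while duplication keeps the stage count but complicates control, as noted above.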
Data Buffering
Instruction and data buffering provides a continuous flow of instructions and operands to the pipeline units. Example: the 4X (four-pipeline) TI ASC.
Example: TI ASC
The system uses a memory buffer unit (MBU), which:
- supplies the arithmetic unit with a continuous stream of operands, and
- stores results back into memory.
The MBU has three double buffers X, Y and Z (one octet per buffer): X and Y buffer inputs, Z buffers output.
This supports pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.
Busing Structures
Problem: ideally, the subfunctions in a pipeline should be independent; otherwise the pipeline must be halted until the dependency is removed.
Solution: an efficient internal busing structure. Example: TI ASC.
In the TI ASC, once an instruction dependency is recognized, an update capability is incorporated by transferring the contents of the Z buffer to the X or Y buffer, so the dependent instruction can proceed without a round trip through memory.
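The Z-to-X/Y update can be sketched as a simple buffer transfer. A toy model under the assumptions above (class and method names are illustrative, not the ASC's actual interface):

```python
class MemoryBufferUnit:
    """Toy MBU: X and Y are input octet buffers, Z is the output
    octet buffer. A recognized dependency is resolved by copying Z
    into an input buffer instead of going back through memory."""
    def __init__(self):
        self.x, self.y, self.z = [], [], []

    def write_result(self, octet):
        self.z = list(octet)

    def forward_z(self, dest):
        # Resolve a dependency: the result in Z becomes an input operand.
        if dest == "X":
            self.x = list(self.z)
        elif dest == "Y":
            self.y = list(self.z)

mbu = MemoryBufferUnit()
mbu.write_result([1, 2, 3, 4, 5, 6, 7, 8])  # result octet lands in Z
mbu.forward_z("X")                          # dependent octet now in X
print(mbu.x)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The design choice here is the same as modern forwarding paths: short-circuiting the output buffer back to an input avoids stalling the pipeline on the memory round trip.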