Principles of Linear Pipelining

In pipelining, we divide a task into a set of subtasks. The precedence relation of a set of subtasks {T1, T2, …, Tk} for a given task T implies that a subtask Tj cannot start until some earlier subtask Ti finishes. The interdependencies of all subtasks form the precedence graph.

Principles of Linear Pipelining With a linear precedence relation, subtask Tj cannot start until all earlier subtasks {Ti} for i < j finish. A linear pipeline can process subtasks with a linear precedence graph.

Principles of Linear Pipelining A pipeline can process successive subtasks if (1) the subtasks have a linear precedence order and (2) each subtask takes nearly the same time to complete.

Basic Linear Pipeline L: latches, the interface between different stages of the pipeline. S1, S2, etc.: pipeline stages.

Basic Linear Pipeline It consists of a cascade of processing stages. Stages: pure combinational circuits performing arithmetic or logic operations over the data flowing through the pipe. The stages are separated by high-speed interface latches. Latches: fast registers holding intermediate results between stages. Information flow is under the control of a common clock applied to all latches.

Basic Linear Pipeline The flow of data in a linear pipeline having four stages for the evaluation of a function on five inputs is as shown below:

Basic Linear Pipeline The vertical axis represents the four stages; the horizontal axis represents time in units of the clock period of the pipeline.
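As a rough illustration (not part of the original slides), the following Python sketch prints such a space-time diagram for a 4-stage pipeline fed 5 inputs; the stage and task labels are arbitrary.

```python
# Space-time diagram of a k-stage linear pipeline processing n tasks.
# Task j (0-based) occupies stage i (0-based) during clock period i + j.
def space_time_diagram(k=4, n=5):
    total = k + n - 1                      # total clock periods needed
    for i in range(k):                     # one row per stage
        row = ["  "] * total
        for j in range(n):
            row[i + j] = f"T{j+1}"         # task j+1 uses stage i+1 in period i+j+1
        print(f"S{i+1}: " + " ".join(row))

space_time_diagram()
```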

Clock Period (τ) for the pipeline Let τi be the time delay of the circuitry in stage Si and tl be the time delay of a latch. Then the clock period of a linear pipeline is defined by τ = max{τi} + tl. The reciprocal of the clock period is called the clock frequency (f = 1/τ) of a pipeline processor.
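For instance (illustrative numbers, not from the slides), if a four-stage pipeline has stage delays τ1 = 10 ns, τ2 = 12 ns, τ3 = 9 ns, τ4 = 11 ns and the latch delay is tl = 1 ns, then τ = max{τi} + tl = 12 + 1 = 13 ns, and the clock frequency is f = 1/τ ≈ 77 MHz.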

Performance of a linear pipeline Consider a linear pipeline with k stages. Let T be the clock period and assume the pipeline is initially empty. Starting at any time, we feed n inputs and wait until the results come out of the pipeline. The first result takes k clock periods, and the remaining (n-1) results come out one after another in successive clock periods. Thus the computation time for the pipeline, Tp, is Tp = kT + (n-1)T = [k + (n-1)]T

Performance of a linear pipeline For example, if the linear pipeline has four stages and five inputs: Tp = [k + (n-1)]T = [4 + 4]T = 8T

Example : Floating Point Adder Unit

Floating Point Adder Unit This pipeline is linearly constructed with 4 functional stages. The inputs to this pipeline are two normalized floating point numbers of the form A = a × 2^p and B = b × 2^q, where a and b are two fractions and p and q are their exponents. For simplicity, base 2 is assumed (the numeric example that follows uses base 10).

Floating Point Adder Unit Our purpose is to compute the sum C = A + B = c × 2^r = d × 2^s, where r = max(p, q) and 0.5 ≤ d < 1. For example: A = 0.9504 × 10^3 and B = 0.8200 × 10^2, so a = 0.9504, b = 0.8200, p = 3 and q = 2.

Floating Point Adder Unit Operations performed in the four pipeline stages are: 1. Compare p and q, choose the larger exponent r = max(p, q) and compute t = |p - q|. Example: r = max(p, q) = 3, t = |p - q| = |3 - 2| = 1

Floating Point Adder Unit 2. Shift right the fraction associated with the smaller exponent by t units to equalize the two exponents before fraction addition. Example: the fraction with the smaller exponent is b = 0.8200; shifting b right by 1 unit gives 0.0820

Floating Point Adder Unit 3. Perform fixed-point addition of the two fractions to produce the intermediate sum fraction c, where 0 ≤ c < 2. Example: a = 0.9504, b = 0.0820 (after alignment), c = a + b = 0.9504 + 0.0820 = 1.0324

Floating Point Adder Unit 4. Count the number of leading zeros (u) in fraction c and shift c left by u units to produce the normalized fraction sum d = c × 2^u, with a leading bit of 1. Update the larger exponent by computing s = r - u to produce the output exponent. Example: c = 1.0324, so u = -1 (a right shift by one digit), d = 0.10324, s = r - u = 3 - (-1) = 4, and C = 0.10324 × 10^4

Floating Point Adder Unit The above 4 steps can all be implemented with combinational logic circuits, and the 4 stages are: 1. Comparator / Subtractor 2. Shifter 3. Fixed-Point Adder 4. Normalizer (leading-zero counter and shifter)

4-STAGE FLOATING POINT ADDER

Example for floating-point adder (Figure: four-segment pipeline with interstage registers R.) Segment 1: compare exponents by subtraction and choose the larger exponent (difference = 3 - 2 = 1). Segment 2: align mantissas. Segment 3: add mantissas. Segment 4: normalize result and adjust exponent. For X = 0.9504 × 10^3 and Y = 0.8200 × 10^2, the aligned mantissas sum to 0.9504 + 0.0820 = 1.0324, giving S = 0.10324 × 10^4 after normalization.
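A minimal Python sketch of these four stages, working in base 10 as in the example above (the function and variable names are illustrative, not from the slides):

```python
def fp_add(a, p, b, q, base=10):
    """Four-stage floating-point addition of A = a * base**p and B = b * base**q."""
    # Stage 1: compare exponents, choose the larger, compute the difference.
    r, t = max(p, q), abs(p - q)
    # Stage 2: shift the fraction with the smaller exponent right by t digits.
    if p < q:
        a /= base ** t
    else:
        b /= base ** t
    # Stage 3: fixed-point addition of the aligned fractions.
    c = a + b
    # Stage 4: normalize the sum and adjust the exponent (u < 0 means a right shift).
    u = 0
    while c >= 1:              # fraction overflow: shift right
        c, u = c / base, u - 1
    while 0 < c < 1 / base:    # leading zeros: shift left
        c, u = c * base, u + 1
    return c, r - u            # d and s, so that A + B = d * base**s

print(fp_add(0.9504, 3, 0.8200, 2))   # ~ (0.10324, 4), i.e. 0.10324 x 10^4
```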

Performance Parameters The various performance parameters of a pipeline are: 1. Speed-up 2. Throughput 3. Efficiency

Speedup Speedup is defined as Speedup = (time taken for a given computation by a non-pipelined functional unit) / (time taken for the same computation by a pipelined version). Assume a function of k stages of equal complexity, each taking the same amount of time T. A non-pipelined unit then takes kT time per input, or nkT for n inputs, while the k-stage pipeline takes [k + (n-1)]T. Then Speedup = nkT / [k + (n-1)]T = nk / [k + (n-1)]

Speed-up For example, if a pipeline has 4 stages and 5 inputs, its speedup factor is Speedup = nk / [k + (n-1)] = (5 × 4) / (4 + 5 - 1) = 20 / 8 = 2.5

Efficiency It is an indicator of how efficiently the resources of the pipeline are used. If a stage is available during a clock period, then its availability becomes the unit of resource. Efficiency can be defined as the ratio of busy stage-time units to the total stage-time units available:

Efficiency = nk / ( k × [k + (n-1)] ) = n / [k + (n-1)]

No. of busy stage-time units = nk – there are n inputs and each input uses k stages. Total no. of stage-time units available = k[k + (n-1)] – the product of the number of stages in the pipeline (k) and the number of clock periods taken for the computation [k + (n-1)].

Throughput It is the average number of results computed per unit time. For n inputs, a k-stage pipeline takes [k + (n-1)]T time units. Then Throughput = n / [k + (n-1)]T = nf / [k + (n-1)], where f is the clock frequency. – Throughput = Efficiency × Frequency
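A small sketch (my own naming, not from the slides) that ties the three parameters together for the running 4-stage, 5-input example, using the 13 ns clock period assumed earlier:

```python
def pipeline_metrics(k, n, clock_period):
    """Speedup, efficiency and throughput of a k-stage linear pipeline fed n inputs."""
    periods = k + (n - 1)                       # clock periods to finish all n inputs
    speedup = (n * k) / periods                 # nk / [k + (n-1)]
    efficiency = (n * k) / (k * periods)        # busy stage-time units / available units
    throughput = n / (periods * clock_period)   # results per second = efficiency * f
    return speedup, efficiency, throughput

print(pipeline_metrics(k=4, n=5, clock_period=13e-9))
# -> (2.5, 0.625, ~4.8e7), i.e. speedup 2.5, efficiency 62.5%, ~48 million results/s
```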

Point No. 2: Classification of Pipelining

Handler's Classification Based on the level of processing, the pipelined processors can be classified as: 1. Arithmetic Pipelining 2. Instruction Pipelining 3. Processor Pipelining

Arithmetic Pipelining The arithmetic logic units of a computer can be segmented for pipelined operations in various data formats. Example : Star 100

Arithmetic Pipelining

Instruction Pipelining The execution of a stream of instructions can be pipelined by overlapping the execution of the current instruction with the fetch, decode and operand fetch of the subsequent instructions. It is also called instruction look-ahead.

Processor Pipelining This refers to the processing of the same data stream by a cascade of processors, each of which processes a specific task. The data stream passes through the first processor, with results stored in a memory block that is also accessible by the second processor. The second processor then passes the refined results to the third, and so on.

Processor Pipelining

Li and Ramamurthy's Classification According to pipeline configurations and control strategies, Li and Ramamurthy classify pipelines under three schemes: – Unifunctional vs. Multifunctional Pipelines – Static vs. Dynamic Pipelines – Scalar vs. Vector Pipelines

Uni-function v/s Multi-function Pipelines

Unifunctional Pipelines A pipeline unit with a fixed and dedicated function is called unifunctional. Example: CRAY-1 (supercomputer). It has 12 unifunctional pipelines described in four groups: – Address Functional Units: Address Add Unit, Address Multiply Unit

Unifunctional Pipelines – Scalar Functional Units: Scalar Add Unit, Scalar Shift Unit, Scalar Logical Unit, Population/Leading Zero Count Unit – Vector Functional Units: Vector Add Unit, Vector Shift Unit, Vector Logical Unit

Unifunctional Pipelines – Floating Point Functional Units: Floating Point Add Unit, Floating Point Multiply Unit, Reciprocal Approximation Unit

Multifunctional A multifunction pipe may perform different functions, either at different times or at the same time, by interconnecting different subsets of stages in the pipeline. Example: TI-ASC (supercomputer), which has four such multifunction pipelines.

Static Vs Dynamic Pipeline

Static Pipeline It may assume only one functional configuration at a time. Static pipelines are preferred when instructions of the same type are to be executed continuously. A unifunction pipe must be static.

Dynamic Pipeline It permits several functional configurations to exist simultaneously. A dynamic pipeline must be multifunctional. The dynamic configuration requires more elaborate control and sequencing mechanisms than static pipelining.

Scalar Vs Vector Pipeline

Scalar Pipeline It processes a sequence of scalar operands under the control of a DO loop. Instructions in a small DO loop are often prefetched into the instruction buffer. The required scalar operands are moved into a data cache to continuously supply the pipeline with operands. Example: IBM System/360 Model 91

Vector Pipelines They are specially designed to handle vector instructions over vector operands. Computers having vector instructions are called vector processors. The design of a vector pipeline is expanded from that of a scalar pipeline. The handling of vector operands in vector pipelines is under firmware and hardware control. Example : Cray 1

Point No. 3: Generalized Pipeline and Reservation Table

3 stage non-linear pipeline It has 3 stages, Sa, Sb and Sc, separated by latches. Multiplexers (shown as crossed circles) can take more than one input and pass one of the inputs to the output. The outputs of the stages are tapped and used for feedback and feed-forward connections. (Figure: three-stage pipeline with Input, Output A and Output B.)

3 stage non-linear pipeline The above pipeline can perform a variety of functions. Each functional evaluation can be represented by a particular sequence of usage of the stages. Some examples are: 1. Sa, Sb, Sc 2. Sa, Sb, Sc, Sb, Sc, Sa 3. Sa, Sc, Sb, Sa, Sb, Sc

Reservation Table Each functional evaluation can be represented using a diagram called a Reservation Table (RT). It is the space-time diagram of a pipeline corresponding to one functional evaluation. X-axis: time units. Y-axis: stages.

Reservation Table For the sequence Sa, Sb, Sc, Sb, Sc, Sa, called function A, we have:

Stage | t1 | t2 | t3 | t4 | t5 | t6
Sa    | A  |    |    |    |    | A
Sb    |    | A  |    | A  |    |
Sc    |    |    | A  |    | A  |

Reservation Table For the sequence Sa, Sc, Sb, Sa, Sb, Sc, called function B, we have:

Stage | t1 | t2 | t3 | t4 | t5 | t6
Sa    | B  |    |    | B  |    |
Sb    |    |    | B  |    | B  |
Sc    |    | B  |    |    |    | B

3 stage non-linear pipeline (Figure: the pipeline diagram alongside an empty reservation table, with the stages Sa, Sb, Sc on the vertical axis and time on the horizontal axis.)

Function A

3 stage pipeline: Sa, Sb, Sc, Sb, Sc, Sa. (Figure: the reservation table is filled one clock period at a time.) Sa is marked at time 1, Sb at time 2, Sc at time 3, Sb again at time 4, Sc again at time 5, and Sa again at time 6, giving the completed reservation table for function A shown above.

Function B

3 stage pipeline: Sa, Sc, Sb, Sa, Sb, Sc. (Figure: the reservation table is filled one clock period at a time.) Sa is marked at time 1, Sc at time 2, Sb at time 3, Sa again at time 4, Sb again at time 5, and Sc again at time 6, giving the completed reservation table for function B shown above.

Reservation Table After starting a function, the stages need to be reserved in the corresponding time units. Each function supported by a multifunction pipeline is represented by a different RT. The time taken for a function evaluation, in units of the clock period, is its compute time (for both A and B, it is 6).

Reservation Table Markings in the same row indicate that a stage is used more than once (in different time units). Markings in the same column indicate that more than one stage is in use at the same time.
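The reservation tables above can be generated mechanically from a stage-usage sequence. Here is a minimal Python sketch (the names are illustrative, not from the slides) that prints the RTs for functions A and B:

```python
def reservation_table(sequence, stages=("Sa", "Sb", "Sc"), label="X"):
    """Print a reservation table: one row per stage, one column per clock period."""
    n = len(sequence)                          # compute time in clock periods
    table = {s: [" "] * n for s in stages}
    for t, stage in enumerate(sequence):       # mark the stage used in each period
        table[stage][t] = label
    for s in stages:
        print(s, "|", " | ".join(table[s]), "|")
    print()

reservation_table(["Sa", "Sb", "Sc", "Sb", "Sc", "Sa"], label="A")  # function A
reservation_table(["Sa", "Sc", "Sb", "Sa", "Sb", "Sc"], label="B")  # function B
```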