
Presentation transcript:

Speedup Speedup is defined as Speedup = (time taken for a given computation by a non-pipelined functional unit) / (time taken for the same computation by a pipelined version). Assume a function of k stages of equal complexity, each taking the same amount of time T. The non-pipelined function takes kT time for one input, and therefore nkT for n inputs. The pipelined version takes kT for the first input and T for each of the remaining n-1 inputs, i.e. (k+n-1)T in total. Then Speedup = nkT / [(k+n-1)T] = nk / (k+n-1)

Speed-up For example, if a pipeline has 4 stages and 5 inputs, its speedup factor is Speedup = ? The maximum value of speedup is lim (n → ∞) Speedup = ?

Speed-up The maximum value of speedup is lim (n → ∞) Speedup = k
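The speedup formula can be checked numerically. The following is a minimal Python sketch; the function name `speedup` is chosen here for illustration and does not come from the slides.

```python
def speedup(k: int, n: int) -> float:
    """Speedup of a k-stage pipeline over a non-pipelined unit for n inputs."""
    # Non-pipelined: n * k stage times. Pipelined: k + (n - 1) stage times.
    return (n * k) / (k + n - 1)


print(speedup(4, 5))  # 4 stages, 5 inputs -> 2.5

# As n grows, the speedup approaches the number of stages k = 4.
for n in (10, 1_000, 1_000_000):
    print(n, speedup(4, n))
```

For a single input (n = 1) the function returns 1.0, i.e. pipelining gives no gain until the pipeline can overlap several inputs.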

Efficiency It is an indicator of how efficiently the resources of the pipeline are used. If a stage is available during a clock period, then its availability becomes the unit of resource. Efficiency can be defined as Efficiency = (no. of stage-time units used) / (total no. of stage-time units available)

Efficiency No. of stage-time units used = nk, since there are n inputs and each input uses k stages. Total no. of stage-time units available = k[k+(n-1)], which is the product of the no. of stages in the pipeline (k) and the no. of clock periods taken for the computation (k+(n-1)).
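The two counts above can be visualised with a small space-time diagram. This is an illustrative Python sketch (the helper `space_time` is a name assumed here, not taken from the slides): each row is a stage, each column a clock period, and a cell shows which input occupies that stage.

```python
def space_time(k: int, n: int):
    """Return a k x (k+n-1) grid; cell [s][c] is the input in stage s at clock c, or None."""
    cycles = k + n - 1
    grid = [[None] * cycles for _ in range(k)]
    for i in range(n):          # input i enters stage 0 at clock i
        for s in range(k):      # and reaches stage s at clock i + s
            grid[s][i + s] = i
    return grid


grid = space_time(4, 5)
for s, row in enumerate(grid, start=1):
    print(f"S{s}: " + " ".join("." if x is None else str(x + 1) for x in row))

busy = sum(x is not None for row in grid for x in row)
total = len(grid) * len(grid[0])
print(busy, total)  # 20 stage-time units used out of 32 available
```

The idle cells (printed as dots) are the fill and drain periods of the pipeline; they are what keeps the efficiency below 1 for finite n.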

Efficiency Thus efficiency is expressed as follows: Efficiency = nk / (k[k+(n-1)]) = n / (k+n-1). The maximum value of efficiency is lim (n → ∞) Efficiency = 1

Efficiency Efficiency is minimum when n = 1. Minimum value of Efficiency = ? For k = 4 and n = 5, Efficiency = ?
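The two questions above can be worked out from the stage-time counts (nk used out of k[k+(n-1)] available). A minimal Python sketch, using exact fractions so the results are not blurred by rounding; the function name `efficiency` is an assumption made for this example.

```python
from fractions import Fraction


def efficiency(k: int, n: int) -> Fraction:
    """Fraction of available stage-time units actually used by n inputs in k stages."""
    used = n * k
    available = k * (k + (n - 1))
    return Fraction(used, available)


print(efficiency(4, 1))         # minimum case n = 1 -> 1/4, i.e. 1/k
print(efficiency(4, 5))         # k = 4, n = 5 -> 5/8
print(float(efficiency(4, 5)))  # 0.625
```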

Throughput It is the average number of results computed per unit time. For n inputs, a k-stage pipeline takes [k+(n-1)]T time units. Then Throughput = n / ([k+n-1]T) = nf / [k+n-1], where f = 1/T is the clock frequency

Throughput The maximum value of throughput is lim (n → ∞) Throughput = ?

Throughput The maximum value of throughput is lim (n → ∞) Throughput = f. Throughput = Efficiency x Frequency
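The relation Throughput = Efficiency x Frequency can be verified numerically. The sketch below is illustrative only: the function names and the 100 MHz clock are assumptions chosen for the example, not values from the slides.

```python
import math


def efficiency(k: int, n: int) -> float:
    return n / (k + n - 1)


def throughput(k: int, n: int, f_hz: float) -> float:
    """Results per second for n inputs on a k-stage pipeline clocked at f_hz."""
    return n * f_hz / (k + n - 1)


f = 100e6  # hypothetical 100 MHz clock, for illustration only
print(throughput(4, 5, f))  # 62500000.0 results per second
print(math.isclose(throughput(4, 5, f), efficiency(4, 5) * f))  # True

# For large n the throughput approaches the clock frequency f.
print(throughput(4, 10**9, f) / f)
```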

Problem Consider the execution of a program of instructions by a linear pipeline processor with a clock rate of 25 MHz. Assume that the instruction pipeline has 5 stages and that one instruction is issued per clock cycle. The penalties due to branch instructions and out-of-sequence executions are ignored. a) Calculate the speedup factor as compared with a non-pipelined processor. b) What are the efficiency and throughput of this pipelined processor?
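One way to set up this problem is sketched below in Python. The number of instructions is not stated in the transcript, so it is left as the parameter n; the value 1000 in the usage line is a placeholder chosen purely for illustration, and the function name `pipeline_metrics` is likewise an assumption.

```python
def pipeline_metrics(n: int, k: int = 5, f_hz: float = 25e6):
    """Return (speedup, efficiency, throughput) for n instructions.

    k = 5 stages and f = 25 MHz come from the problem statement;
    n must be supplied by the reader.
    """
    cycles = k + n - 1                # clock periods used by the pipeline
    speedup = n * k / cycles          # versus n * k periods without pipelining
    efficiency = n / cycles
    throughput = n * f_hz / cycles    # instructions per second
    return speedup, efficiency, throughput


# n = 1000 is a placeholder, not the value from the original problem.
s, e, w = pipeline_metrics(n=1000)
print(f"speedup={s:.3f} efficiency={e:.4f} throughput={w / 1e6:.2f} MIPS")
```

For any large n the answers sit close to the limits derived earlier: speedup near 5, efficiency near 1 and throughput near 25 MIPS.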

Example : Floating Point Adder Unit

Floating Point Adder Unit This pipeline is linearly constructed with 4 functional stages. The inputs to this pipeline are two normalized floating point numbers of the form A = a x 2^p and B = b x 2^q, where a and b are two fractions and p and q are their exponents. For simplicity, base 2 is assumed

Floating Point Adder Unit Our purpose is to compute the sum C = A + B = c x 2^r = d x 2^s, where r = max(p,q) and 0.5 ≤ d < 1. For example (in base 10 for readability): A = 0.9504 x 10^3 and B = 0.8200 x 10^2, so a = 0.9504, b = 0.8200, p = 3 and q = 2

Floating Point Adder Unit Operations performed in the four pipeline stages are: 1. Compare p and q, choose the larger exponent, r = max(p,q), and compute t = |p - q|. Example: r = max(p, q) = 3, t = |p-q| = |3-2| = 1

Floating Point Adder Unit 2. Shift right the fraction associated with the smaller exponent by t units to equalize the two exponents before fraction addition. Example: the smaller exponent is q, so its fraction b = 0.8200 is shifted right by 1 unit, giving 0.082

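Floating Point Adder Unit 3. Perform fixed-point addition of the two fractions to produce the intermediate sum fraction c, where 0 ≤ c < 1. Example: a = 0.9504, b = 0.0820, c = a + b = 0.9504 + 0.0820 = 1.0324 (in this example the sum overflows past 1, which the next stage corrects)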

Floating Point Adder Unit 4. Count the number of leading zeros (u) in fraction c and shift left c by u units to produce the normalized fraction sum d = c x 2^u, with a leading bit 1. Update the larger exponent r by computing s = r - u to produce the output exponent. Example: c = 1.0324, u = -1 → right shift, d = 0.10324, s = r - u = 3-(-1) = 4, C = 0.10324 x 10^4

Floating Point Adder Unit The above 4 steps can all be implemented with combinational logic circuits and the 4 stages are: 1. Comparator / Subtractor 2. Shifter 3. Fixed Point Adder 4. Normalizer (leading zero counter and shifter)
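The four stages can be traced in software. The following Python sketch is an illustration under stated assumptions, not the hardware design: it works in base 10 (like the worked example, whereas the formulas assume base 2), handles only positive fractions, and the function name `fp_add` is chosen here. In base 10 the normalized range is 0.1 ≤ d < 1.

```python
from decimal import Decimal


def fp_add(a: Decimal, p: int, b: Decimal, q: int):
    """Add a x 10^p and b x 10^q; return (d, s) with the sum equal to d x 10^s."""
    # Stage 1: comparator / subtractor
    r = max(p, q)
    t = abs(p - q)

    # Stage 2: shifter, applied to the fraction with the smaller exponent
    if p < q:
        a = a.scaleb(-t)   # scaleb(-t) multiplies by 10^-t (shift right)
    else:
        b = b.scaleb(-t)

    # Stage 3: fixed-point adder
    c = a + b

    # Stage 4: normalizer
    d, u = c, 0
    while d >= 1:                          # overflow: shift right, u goes negative
        d, u = d.scaleb(-1), u - 1
    while d != 0 and d < Decimal("0.1"):   # leading zeros: shift left
        d, u = d.scaleb(1), u + 1
    s = r - u
    return d, s


d, s = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(d == Decimal("0.10324"), s)  # True 4
```

`Decimal` is used instead of `float` so that the shifted fractions stay exact decimal digits, which mirrors the digit-by-digit shifts in the example.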

4-STAGE FLOATING POINT ADDER

Example for floating-point adder [Figure: four-segment pipeline (Segment 1 to Segment 4) with registers R between segments, taking exponents a, b and mantissas A, B as inputs. Operations shown, in order: compare exponents by subtraction (difference = 3 - 2 = 1), choose exponent, align mantissas, add mantissas, normalize result, adjust exponent.] For example: X = 0.9504 x 10^3, Y = 0.8200 x 10^2

Classification of Pipeline Processors There are various schemes for classifying pipeline processors. Two important schemes are: 1. Handler’s Classification 2. Li and Ramamoorthy's Classification

Handler’s Classification Based on the level of processing, the pipelined processors can be classified as: 1. Arithmetic Pipelining 2. Instruction Pipelining 3. Processor Pipelining

Arithmetic Pipelining The arithmetic logic units of a computer can be segmented for pipelined operations in various data formats. Example : Star 100

Arithmetic Pipelining

Example: Star 100. It has two pipelines where arithmetic operations are performed. First: floating point adder and multiplier. Second: multifunctional, for all scalar instructions, with floating point adder, multiplier and divider. Both pipelines are 64-bit and can be split into four 32-bit pipelines at the cost of precision.

Star 100 Architecture