Pipeline Design Problems

Slides:



Advertisements
Similar presentations
CS5365 Pipelining. Divide task into a sequence of subtasks. Each subtask is executed by a stage (segment)of the pipe.
Advertisements

Midwestern State University Department of Computer Science Dr. Ranette Halverson CMPS 2433 – CHAPTER 4 GRAPHS 1.
CSE Advanced Computer Architecture Week-4 Week of Feb 2, 2004 engr.smu.edu/~rewini/8383.
Chapter 3 Pipelining. 3.1 Pipeline Model n Terminology –task –subtask –stage –staging register n Total processing time for each task. –T pl =, where t.
© 2006 Pearson Addison-Wesley. All rights reserved14 A-1 Chapter 14 excerpts Graphs (breadth-first-search)
Constant-Time LCA Retrieval
June 3, 2015Windows Scheduling Problems for Broadcast System 1 Amotz Bar-Noy, and Richard E. Ladner Presented by Qiaosheng Shi.
Module #1 - Logic 1 Based on Rosen, Discrete Mathematics & Its Applications. Prepared by (c) , Michael P. Frank and Modified By Mingwu Chen Trees.
Copyright © Cengage Learning. All rights reserved. CHAPTER 2 THE LOGIC OF COMPOUND STATEMENTS THE LOGIC OF COMPOUND STATEMENTS.
© 2006 Pearson Addison-Wesley. All rights reserved14 A-1 Chapter 14 Graphs.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 January Session 3.
Content Addressable Network CAN. The CAN is essentially a distributed Internet-scale hash table that maps file names to their location in the network.
Anshul Kumar, CSE IITD CSL718 : Pipelined Processors  Types of Pipelines  Types of Hazards 16th Jan, 2006.
Chapter One Introduction to Pipelined Processors.
1 Dr. Ali Amiri TCOM 5143 Lecture 8 Capacity Assignment in Centralized Networks.
Multiple-Cycle Hardwired Control Digital Logic Design Instructor: Kasım Sinan YILDIRIM.
Agenda Review: –Planar Graphs Lecture Content:  Concepts of Trees  Spanning Trees  Binary Trees Exercise.
 Lecture 2 Processor Organization  Control needs to have the  Ability to fetch instructions from memory  Logic and means to control instruction sequencing.
Principles of Linear Pipelining. In pipelining, we divide a task into set of subtasks. The precedence relation of a set of subtasks {T 1, T 2,…, T k }
Chapter One Introduction to Pipelined Processors
Multiplying Decimals © Math As A Second Language All Rights Reserved next #8 Taking the Fear out of Math 8.25 × 3.5.
Chapter 9. Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Ver Chapter 13: Graphs Data Abstraction & Problem Solving with C++
© 2006 Pearson Addison-Wesley. All rights reserved 14 A-1 Chapter 14 Graphs.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February 2, 2006 Session 6.
Chapter One Introduction to Pipelined Processors
De Bruijn sequences 陳柏澍 Novembers Each of the segments is one of two types, denoted by 0 and 1. Any four consecutive segments uniquely determine.
SCHOOL OF ENGINEERING Introduction to Electrical and Electronic Engineering Part 2 Pr. Nazim Mir-Nasiri and Pr. Alexander Ruderman.
Business System Development
Relations Chapter 9.
Graphs: Definitions and Basic Properties
Control Flow Testing Handouts
Solving Systems of Equations in Two Variables; Applications
ANALYSIS OF SEQUENTIAL CIRCUITS
Greedy Technique.
Clocks A clock is a free-running signal with a cycle time.
Chap 7. Register Transfers and Datapaths
Modeling Arithmetic, Computation, and Languages
COMPUTING FUNDAMENTALS
Introduction to Pipelined Processors
Chapter One Introduction to Pipelined Processors
Amortized Analysis The problem domains vary widely, so this approach is not tied to any single data structure The goal is to guarantee the average performance.
Lecture 2 Introduction to Programming
Outline of the Chapter Basic Idea Outline of Control Flow Testing
AVL Tree.
Structural testing, Path Testing
Chapter One Introduction to Pipelined Processors
Chapter One Introduction to Pipelined Processors
S Digital Communication Systems
Registers and Counters Register : A Group of Flip-Flops. N-Bit Register has N flip-flops. Each flip-flop stores 1-Bit Information. So N-Bit Register Stores.
Computer Network Performance Measures
Unit# 9: Computer Program Development
Subject Name: Information Theory Coding Subject Code: 10EC55
College Algebra Fifth Edition
Greedy Algorithms TOPICS Greedy Strategy Activity Selection
If a DRAM has 512 rows and its refresh time is 9ms, what should be the frequency of row refresh operation on the average?
Chapter 15 Graphs © 2006 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Chapter 14 Graphs © 2006 Pearson Addison-Wesley. All rights reserved.
ECE 352 Digital System Fundamentals
Copyright © Cengage Learning. All rights reserved.
AVL-Trees.
Chapter 14 Graphs © 2011 Pearson Addison-Wesley. All rights reserved.
EGR 2131 Unit 12 Synchronous Sequential Circuits
C H A P T E R 11 A.C. Fundamentals.
ECE 352 Digital System Fundamentals
Error Correction Coding
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Pipelining and Superscalar Techniques
Algorithms Tutorial 27th Sept, 2019.
Presentation transcript:

Pipeline Design Problems

Job Sequencing and Collision Prevention for the Design of Static Pipeline

Job Sequencing and Collision Prevention Consider reservation table given below at t=0   1 2 3 4 5 Sa A Sb Sc

Job Sequencing and Collision Prevention Consider next initiation made at t=1 The second initiation easily fits in the reservation table   1 2 3 4 5 6 7 Sa A1 A2 Sb Sc

Job Sequencing and Collision Prevention Now consider the case when first initiation is made at t = 0 and second at t = 2. Here both markings A1 and A2 falls in the same stage time units and is called collision and it must be avoided   1 2 3 4 5 6 7 Sa A1 A2 Sb A1A2 Sc

Terminologies

Terminologies Latency: Time difference between two initiations in units of clock period Forbidden Latency: Latencies resulting in collision Forbidden Latency Set: Set of all forbidden latencies

General Method of finding Latency Considering all initiations: Forbidden Latencies are 2 and 5   1 2 3 4 5 6 7 8 9 10 Sa A1 A2 A3 A4 A5 A6A1 A6 Sb A1A3 A2A4 A3A5 A4A6 Sc

Shortcut Method of finding Latency Forbidden Latency Set = {0,5} U {0,2} U {0,2} = { 0, 2, 5 }

Terminologies Initiation Sequence : Sequence of time units at which initiation can be made without causing collision Example : { 0,1,3,4 ….} Latency Sequence : Sequence of latencies between successive initiations Example : { 1,2,1….} For a RT, number of valid initiations and latencies are infinite

Terminologies Initiation Rate : The average number of initiations done per unit time It is a positive fraction and maximum value of IR is 1 Average Latency : The average of latency of a given latency sequence IR = 1/AL

Terminologies Latency Cycle: Among the infinite possible latency sequence, the periodic ones are significant. E.g. { 2, 3, 4, 2, 3, 4,… } The subsequence that repeats itself is called latency cycle. E.g. {2, 3, 4}

Terminologies Period of cycle: The sum of latencies in a latency cycle (2+3+4=9) Average Latency: The average taken over its latency cycle (AL=9/3=3) To design a pipeline, we need a control strategy that maximize the throughput (no. of results per unit time) Maximizing throughput is minimizing AL

Terminologies Control Strategy Initiate pipeline as specified by latency sequence. Latency sequence which is aperiodic in nature is impossible to design Thus design problem is arriving at a latency cycle having minimal average latency.

Terminologies Stage Utilization Factor (SUF): SUF of a particular stage is the fraction of time units the stage used while following a latency sequence. Example: Consider 5 initiations of function A as below   1 2 3 4 5 6 7 8 9 10 11 12 13 Sa A1 A2 A3 A4 A5 Sb Sc

Terminologies SUF of stage Sa is number of markings present along Sa divided by the time interval over which marking is counted. SUF(Sa) = SUF(Sb) = SUF(Sc) = 10/14

Terminologies Let SU(i) be the stage utilization factor of stage i Let N(i) be no. of markings against stage i in the reservation table Suppose we initiate pipeline with initiation rate (IR), then SU(i) is given by

SUF

Terminologies Minimum Average Latency (MAL) Thus SU(i) = IR x N(i) N(i) ≤ 1/IR  N(i) ≤ AL Therefore

State Diagram Suppose a pipeline is initially empty and make an initiation at t = 0. Now we need to check whether an initiation possible at t=i for i > 0. bi is used to note possibility of initiation bi = 1  initiation not possible bi = 0  initiation possible

State Diagram bi 1 0 1 0 0 1

State Diagram The above binary representation (binary vector) is called collision vector(CV) The collision vector obtained made at first initiation is called initial collision vector(ICV) ICVA = (101001) The graphical representation of states (CVs) that a pipeline can reach and the relation is given by state diagram

State Diagram States (CVs) are denoted by nodes The node representing CVt-1 is connected to CVt by a directed graph from CVt-1 to CVt and similarly for CVt* with a * on arc

Procedure to draw state diagram Start with ICV For each unprocessed state, say CVt-1, do as follows: Find CVt from CVt-1 by the following steps Left shift CVt-1 by 1 bit Drop the leftmost bit Append the bit 0 at the right-hand end

Procedure to draw state diagram If the 0th bit of CVt is 0, then obtain CV* by logically ORing CVt with ICV. Make a new node for CVt and join with CVt-1 with an arc if the state CVt does not already exist. If CV* exists, repeat step (c), but mark the arc with a *.

State Diagram 1 0 1 0 0 1

State Diagram 1 0 1 0 0 1 0 1 0 0 1 0 Left Shift

State Diagram 1 0 1 0 0 1 0 1 0 0 1 0 Zero  CV* exists

State Diagram 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * ICV – 101001 OR CVi – 010010 CV* 111011

State Diagram 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 No CV* 1 1 0 1 1 0 Left Shift

State Diagram No CV* Left Shift 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * ICV – 101001 OR CVi – 001000 CV* 101001 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 Zero  CV* exists 1 0 1 1 0 0

State Diagram 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * ICV – 101001 CVi – 010000 CV* 111001 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 1 1 1 0 0 1 Zero  CV* exists

1 0 0 1 0 0 ICV – 101001 CVi – 011000 CV* 111001 0 1 0 0 1 0 * 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 0 1 0 1 1 0 0 0 1 1 0 0 0 Zero  CV* exists

1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 No CV* *

* 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 0 No CV*

1 1 0 0 0 0 * 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 1 1 0 1 1 0 *

* 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 1 1 0 0 0 0 1 0 0 0 0 0

1 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0

State Diagram From the above diagram, closed loops can be identified as latency cycles. To find the latency corresponding to a loop, start with any initial * count the number of states before we encounter another * and reach back to initial *.

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (3)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (1,3,3)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (4,3)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (1,6)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (1,7)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (4)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (6)

1 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 1 * 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 Latency = (7)

State Diagram The state with all zeros has a self-loop which corresponds to empty pipeline and it is possible to wait for indefinite number of latency cycles of the form (1,8), (1,9),(1,10) etc. Simple Cycle: latency cycle in which each state is encountered only once. Complex Cycle: consists of more than one simple cycle in it. It is enough to look for simple cycles

State Diagram In the above example, the cycle that offers MAL is (1, 3, 3) From A cycle arrived so is called greedy cycle, which minimize latency between successive initiation

Modified State Diagram The state diagram becomes cumbersome for longer ICVs. In modified state diagrams, we represent only states obtained of initiations.

Modified State Diagram The procedure is as follows: Start with the ICV For each unprocessed state, For each bit I in the CVi which is 0, do the following: Shift CVi left by i bits Drop i leftmost bits

Modified State Diagram Append zeros to right Logically OR with ICV If step(d) results in a new state then form a new node for this state and join it with node of CVi by an arc with a marking i. Join this new node with node of ICV with an arc having the marking ≥ d (length of ICV)

Modified State Diagram 1 0 1 0 0 1

Modified State Diagram 1 0 1 0 0 1 1 1 1 0 1 1 ICV – 101001 OR CVi – 010010 CV* 111011 i =1 1

Modified State Diagram 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 1

Modified State Diagram 1 0 1 0 0 1 1 1 1 0 1 1 i = 3 ≥6 1 ICV – 101001 OR CVi – 001000 CV* 101001

Modified State Diagram 3 1 0 1 0 0 1 1 1 1 0 1 1 i = 3 ≥6 1

Modified State Diagram 3 1 0 1 0 0 1 1 1 1 0 1 1 i = 4 ≥6 1 ICV – 101001 OR CVi – 010000 CV* 111001

Modified State Diagram 3 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 4 1 1 1 1 0 0 1 ICV – 101001 OR CVi – 010000 CV* 111001

Modified State Diagram 3 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 1 1 1 1 0 0 1

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 1 1 1 1 0 0 1

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 1 1 1 1 0 0 1 i = 3 ICV – 101001 OR CVi – 011000 CV* 111001

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 1 3 1 1 1 0 0 1

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 1 3 1 1 1 0 0 1 i = 3 ICV – 101001 OR CVi – 001000 CV* 101001

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 3 1 3 1 1 1 0 0 1

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 3 1 3 1 1 1 0 0 1 i = 4 ICV – 101001 OR CVi – 010000 CV* 111001

Modified State Diagram 3 ≥6 1 0 1 0 0 1 1 1 1 0 1 1 ≥6 ≥6 4 3 1 3 1 1 1 0 0 1 4

Dynamic Pipeline and Reconfigurability Two methods to improve the throughput of dynamic pipeline: Insertion of non-compute delays Use of Internal Buffers

End