

Introduction  The speed of execution of a program is influenced by many factors. i) One way is to use faster circuit technology to build the processor and the main memory. ii) Another way is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

What Is Pipelining?  Pipelining is a key implementation technique whereby multiple instructions are overlapped in execution.  Pipelining offers an economical way to realize temporal parallelism in digital computers.  Pipelining has led to a tremendous improvement of system throughput in digital computers.

Pipelining (contd.)  In this technique a sequential process is decomposed into sub-operations, each carried out in its own segment.  Each segment performs partial processing. The result obtained from one segment is transferred to the next segment.

Implementation of Pipelining  Temporal parallelism is obtained in the execution of instructions.  It is an important method of increasing the speed of a PE (processing element).  However, it is effective only when some ideal conditions are satisfied.

Ideal conditions for pipelining  Successive instructions are such that the work done during the execution of one instruction can be used effectively by the next and subsequent instructions.  Successive instructions are independent of one another.  Sufficient resources are available in the processor, so that a resource required by successive instructions in the pipeline is readily available.

Tasks are broken into a number of independent sub-tasks. These sub-tasks should take nearly equal time to execute.  There should be locality in instruction execution, i.e. instructions are executed in sequence, one after another, in the order in which they are written. If the instructions are not in sequence but involve many branch and jump instructions, pipelining is not effective.  This technique is efficient for applications that need to repeat the same task.

Practical Situation  In practice it is not possible to break every task into sub-tasks that take exactly the same time. For example, a floating-point division takes much longer than decoding an instruction. However, delays are introduced where required so that the different parts of an instruction take an equal amount of time.  All real programs contain branch instructions, which disturb the sequential execution of instructions. But statistical studies show that the probability of encountering a branch is less than 17%.

Successive instructions are not always independent. The result produced by one instruction may be required by the next instruction and must be made available at the right time.  There are always resource conflicts in the processor because of limited chip size. It is not cost-effective to duplicate resources indiscriminately; for example, it is not useful to have more than two floating-point arithmetic units.

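Example: compute A_i * B_i + C_i for i = 1, 2, ..., 7.
[Figure: three-segment pipeline. Segment 1 loads A_i and B_i into registers R1 and R2. Segment 2 multiplies R1 by R2 into R3 and loads C_i into R4. Segment 3 adds R3 and R4 into R5.]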

Contents of the registers after each clock pulse:

clk   seg1: R1  R2    seg2: R3     R4    seg3: R5
1           A1  B1          -      -           -
2           A2  B2          A1*B1  C1          -
3           A3  B3          A2*B2  C2          A1*B1+C1
4           A4  B4          A3*B3  C3          A2*B2+C2
5           A5  B5          A4*B4  C4          A3*B3+C3
6           A6  B6          A5*B5  C5          A4*B4+C4
7           A7  B7          A6*B6  C6          A5*B5+C5
8           -   -           A7*B7  C7          A6*B6+C6
9           -   -           -      -           A7*B7+C7
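The register behaviour of this example can be reproduced with a short simulation. The following is a minimal sketch, not part of the original slides: the function name `run_pipeline` and the use of `None` for an empty register are assumptions made for illustration.

```python
def run_pipeline(a, b, c):
    """Simulate the three-segment multiply-add pipeline clock by clock.

    Returns (results, trace). `results` lists each value that reaches R5;
    `trace` holds one (clk, R1, R2, R3, R4, R5) tuple per clock pulse.
    None stands for an empty register (an illustrative convention).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results, trace = [], []
    for clk in range(1, n + 3):          # n inputs plus 2 clocks to drain
        # Update the last segment first so every register sees the value
        # its predecessor held before this clock pulse.
        r5 = r3 + r4 if r3 is not None else None      # segment 3: add
        if r1 is not None:                            # segment 2: multiply, load C
            r3, r4 = r1 * r2, c[clk - 2]
        else:
            r3 = r4 = None
        if clk <= n:                                  # segment 1: load A and B
            r1, r2 = a[clk - 1], b[clk - 1]
        else:
            r1 = r2 = None
        trace.append((clk, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return results, trace


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7]
    b = [2] * 7
    c = [10] * 7
    results, trace = run_pipeline(a, b, c)
    for row in trace:
        print(row)
    print(results)
```

With seven inputs the trace has nine rows: the first result appears in R5 at clock 3 and the last at clock 9, matching the table.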

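Space-time diagram
[Figure: space-time diagram. Clock cycles run along the horizontal axis and segments 1, 2, 3, ... along the vertical axis. Segment 1 holds T1, T2, T3, T4, ... starting at clock 1; segment 2 holds the same tasks one clock later; segment 3 one clock after that.]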

Hardware Organization  The simplest way of viewing the basic pipeline structure is to imagine that each segment consists of an input register and a combinational circuit.  The register holds the data and the combinational circuit performs the sub-operation of that particular segment. The output of the combinational circuit in a given segment is connected to the input register of the next segment.

[Figure (b): Hardware organization. The instruction fetch unit feeds the execution unit through an interstage buffer.]

Consider a computer that has two separate hardware units, one for fetching instructions and another for executing them. The instructions fetched by the fetch unit are deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute one instruction while the fetch unit is fetching the next.

Let Fi and Ei refer to the fetch and execute steps for instruction Ii. Execution of a program consists of a sequence of fetch and execute steps.
[Figure (a): Sequential execution. Along the time axis: F1 E1 for I1, then F2 E2 for I2, F3 E3 for I3, and F4 E4 for I4.]

(c) Pipelined execution

Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3

Space-time diagram  The behaviour of a pipeline can be illustrated using a space-time diagram.  In this diagram, segment utilization is shown as a function of time.  The space-time diagram for a 4-segment pipeline is shown in the figure.

Pipelining Hazards  DEFINITION: Situations that prevent the parallel execution of subtasks within the pipeline are called pipelining hazards.  CLASSIFICATION: There are three classes of pipeline hazards: structural hazards, data hazards and control hazards.

Structural Hazards  Basic reason: resource conflict.  Definition: When a machine is pipelined, the overlapped execution of instructions requires several resources concurrently. If the hardware cannot supply them all at once, a resource conflict arises and the machine is said to have a structural hazard.

Contd.  If some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute concurrently, that resource may cause a structural hazard. For example, suppose a system has a single memory for both data and instructions. When an instruction makes a data-memory reference, it conflicts with the instruction fetch of a later instruction, and a hazard results.

Solution: The common solution is to stall the pipeline for one clock cycle when the data-memory access occurs.  Stall: A stall is commonly called a pipeline bubble, or just a bubble, since it floats through the pipeline without carrying any useful work.
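The one-cycle stall can be illustrated with a small model. This is a sketch under stated assumptions, not a description of any particular processor: it assumes a five-stage pipeline in which the memory stage comes three cycles after fetch, a single memory port shared by fetches and data accesses, and the hypothetical helper name `fetch_cycles`.

```python
def fetch_cycles(uses_data_memory, mem_stage_offset=3):
    """Return the clock cycle in which each instruction is fetched.

    uses_data_memory: one bool per instruction, True if it accesses data
    memory. A fetch is delayed by one cycle (a bubble) whenever the single
    memory port is already taken by an earlier instruction's data access.
    mem_stage_offset is an assumed distance between fetch and memory stage.
    """
    fetch = []
    port_busy = set()                    # cycles claimed by data accesses
    for uses_mem in uses_data_memory:
        cycle = fetch[-1] + 1 if fetch else 1
        while cycle in port_busy:        # structural hazard: stall the fetch
            cycle += 1
        fetch.append(cycle)
        if uses_mem:
            port_busy.add(cycle + mem_stage_offset)
    return fetch


if __name__ == "__main__":
    # First instruction is a load; the fourth fetch collides with its
    # data access and slips by one cycle.
    print(fetch_cycles([True, False, False, False, False]))
    print(fetch_cycles([False] * 5))
```

In this model each data-memory access costs at most one bubble, and only when a later instruction is waiting to be fetched in that cycle.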

Data Hazards  Definition: A data hazard occurs when an instruction modifies a register or memory location and a succeeding instruction attempts to access that register or memory location before it has actually been updated.  There are three types of data hazards, described here for an instruction i followed by an instruction j: i) RAW (Read After Write): j tries to read a source before i writes it, so j incorrectly gets the old value.

ii) WAW (Write After Write): a later instruction j tries to write an operand before it is written by an earlier instruction i, i.e. the writes end up being performed in the wrong order, leaving the value written by i rather than j in the destination. iii) WAR (Write After Read): j tries to write a destination before it is read by i, so i incorrectly gets the new value. This cannot happen here because all reads are early (in ID) and all writes are late (in WB).
Example:
ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
Here the ADD instruction writes the sum of the contents of registers R2 and R3 into register R1, and the SUB instruction uses that result from R1.

The ADD instruction does not update R1 until the end of stage 5, while the SUB instruction needs the value in stage 2. Hence there is a RAW data hazard.

What is the solution?  Special hardware is required to detect the hazard and to determine the separation needed between the two instructions; stalls can then be inserted accordingly. But SUB must then be held at stage 1 until ADD has completed stage 5, which makes this an undesirable solution.

Contd.  Data forwarding: If the result can be moved from where ADD produces it (the EX/MEM register) to where SUB needs it, the stall can be avoided. If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source of the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
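The selection rule for forwarding can be sketched in a few lines. This is an illustrative model only: the `Instr` record, the `select_alu_inputs` helper and the dictionary used as a register file are assumptions, and real forwarding hardware also handles results from older pipeline registers.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str
    dest: str
    src1: str
    src2: str


def select_alu_inputs(current, previous, previous_result, regfile):
    """Choose each ALU operand for `current`.

    If the previous ALU operation wrote the register that a source of the
    current operation names, take the forwarded result; otherwise take the
    value read from the register file.
    """
    def pick(src):
        if previous is not None and previous.dest == src:
            return previous_result       # forwarded from the EX/MEM register
        return regfile[src]              # normal register-file read
    return pick(current.src1), pick(current.src2)


if __name__ == "__main__":
    regfile = {"R1": 0, "R2": 3, "R3": 4, "R5": 1}   # R1 still holds a stale value
    add = Instr("ADD", "R1", "R2", "R3")
    sub = Instr("SUB", "R4", "R1", "R5")
    add_result = regfile[add.src1] + regfile[add.src2]
    print(select_alu_inputs(sub, add, add_result, regfile))
```

Here SUB receives the fresh sum for R1 even though the register file has not yet been updated.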

Performance issues of pipelining  Pipelining increases the CPU instruction throughput. Instruction throughput: the number of instructions completed per unit time.  Pipelining does not reduce the execution time of an individual instruction; in fact, the overhead of controlling the pipeline usually increases it slightly.

Pipelined instruction processing in Pentium  The Pentium has five pipelines: integer pipeline (5 stages), FP pipeline (8 stages), load pipeline (5 stages), store pipeline (4 stages) and branch pipeline (5 stages).  Two stages are common to all pipelines: instruction fetch (F) and first decode (D1). After D1, instructions are decoded and either a single control word is generated (for a simple operation) or a sequence of control words is initiated. These control words are decoded in the D2 stage of the pipeline.

[Figure: pipeline timing diagram. Successive instructions start one clock apart (t, t+1, t+2, t+3) and use the IM, RF, ALU and DM units in overlapped fashion.]

Six tasks: T1, T2, T3, T4, T5, T6.  During the 1st clock, segment 1 is busy with T1.  During the 2nd clock, segment 1 is busy with T2 and segment 2 is busy with T1, and so on.
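The occupancy pattern described above can be generated programmatically. The sketch below builds the space-time grid for k segments and n tasks, assuming each task spends exactly one clock in each segment; the function names `space_time` and `render` are illustrative.

```python
def space_time(k, n):
    """Return a k x (k + n - 1) grid of segment occupancy.

    grid[s][t] names the task held by segment s + 1 during clock t + 1,
    or "--" if the segment is idle in that clock.
    """
    clocks = k + n - 1
    grid = [["--"] * clocks for _ in range(k)]
    for task in range(n):
        for seg in range(k):
            grid[seg][task + seg] = f"T{task + 1}"   # task enters seg one clock later each time
    return grid


def render(grid):
    """Format the grid as plain text with a clock header row."""
    header = "clk   " + " ".join(f"{t + 1:>3}" for t in range(len(grid[0])))
    rows = [
        f"seg{s + 1}  " + " ".join(f"{cell:>3}" for cell in row)
        for s, row in enumerate(grid)
    ]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    print(render(space_time(4, 6)))   # four segments, six tasks
```

For four segments and six tasks the grid spans 4 + 6 - 1 = 9 clock cycles, and every segment is busy only in the middle of the run.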

Speedup
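The formula on this slide did not survive extraction. As a stand-in, the sketch below uses the standard textbook relation for a k-segment pipeline processing n tasks: the pipeline needs k + n - 1 clock periods, so the speedup over non-pipelined execution is S = n * tn / ((k + n - 1) * tp). The symbols tn (time per task without pipelining) and tp (clock period) and the default tn = k * tp are assumptions, not taken from the slide.

```python
def pipeline_speedup(k, n, tp=1.0, tn=None):
    """Speedup of a k-segment pipeline over non-pipelined execution.

    k  : number of segments
    n  : number of tasks
    tp : clock period of the pipeline
    tn : time per task without pipelining (assumed k * tp if not given)
    """
    if tn is None:
        tn = k * tp
    pipelined_time = (k + n - 1) * tp    # k clocks for the first task, 1 for each later one
    return (n * tn) / pipelined_time


if __name__ == "__main__":
    print(pipeline_speedup(4, 6))        # four segments, six tasks
    print(pipeline_speedup(4, 10_000))   # approaches k as n grows
```

As n grows, the speedup approaches k, the number of segments, which is the theoretical maximum under these assumptions.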

What is a Hazard?  Any condition that causes the pipeline to stall is called a hazard. Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle.

Classification of Hazards  Hazards may occur for different reasons. There are three types of hazards: 1) Structural hazard 2) Data hazard 3) Control hazard (or instruction hazard)

Structural hazard: arises from resource conflicts, when the hardware cannot support the simultaneous, overlapped execution of instructions.  Data hazard: due to dependencies between sequential instructions.  Control hazard: due to branch instructions.

Structural Hazard  This is a situation in which two instructions need to use the same hardware resource at the same time. It arises when some functional units are not fully pipelined or not sufficiently duplicated, i.e.  an un-pipelined unit  not enough duplication of a resource  a single memory used for both data and instructions.

How to tackle structural hazards?  Duplicate important resources.  Use modular memory: a separate data memory and instruction memory.

Data hazard  There are three types of data hazards, for an instruction I1 followed by an instruction I2: RAW (Read After Write): the destination register of I1 and a source register of I2 are the same. WAW (Write After Write): the destination registers of I1 and I2 are the same. WAR (Write After Read): a source register of I1 and the destination register of I2 are the same.

Example
RAW:  add r0, r1, r2
      sub r4, r3, r0
WAW:  add r0, r1, r2
      sub r0, r4, r5
WAR:  add r2, r1, r0
      sub r0, r3, r4
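The register comparisons behind these three cases can be written as a small check. This is a minimal sketch with an assumed representation: each instruction is a (destination, source1, source2) tuple, and `classify_hazards` is a hypothetical helper name. It reports dependences between two instructions; whether a dependence actually causes a stall depends on the pipeline.

```python
def classify_hazards(first, second):
    """Return the data-hazard types between two instructions.

    first, second: (dest, src1, src2) register-name tuples, with `first`
    earlier in program order.
    """
    d1, *sources1 = first
    d2, *sources2 = second
    found = []
    if d1 in sources2:       # second reads what first writes
        found.append("RAW")
    if d1 == d2:             # both write the same register
        found.append("WAW")
    if d2 in sources1:       # second writes what first reads
        found.append("WAR")
    return found


if __name__ == "__main__":
    print(classify_hazards(("r0", "r1", "r2"), ("r4", "r3", "r0")))
    print(classify_hazards(("r0", "r1", "r2"), ("r0", "r4", "r5")))
    print(classify_hazards(("r2", "r1", "r0"), ("r0", "r3", "r4")))
```

The three calls correspond to the RAW, WAW and WAR instruction pairs in the example.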

Control hazard  Due to branch and jump instructions.  Branches may be conditional or unconditional.

Handling of control hazards  Pre-fetch target instruction: both the target instruction and the instruction following the branch are fetched.  Branch Target Buffer (BTB): an associative memory that holds previously executed branch instructions.  Loop buffer: a small, very high-speed register file.  Branch prediction: guess the outcome of the condition.  Delayed branch: rearrange the machine-language instructions.

Contd.  Stall until the decision is made: insert "no-op" instructions, which accomplish nothing and just take time. Drawback: branches take 3 clock cycles each (assuming the comparator is put in the ALU stage).
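The cost of stalling on every branch can be estimated with a simple average. The sketch below assumes an ideal pipeline that otherwise completes one instruction per cycle; the 3-cycle branch cost comes from the slide, the 17% figure is the upper bound on branch frequency quoted earlier in the slides, and the function name `effective_cpi` is illustrative.

```python
def effective_cpi(branch_fraction, branch_cycles=3, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls.

    branch_fraction : fraction of instructions that are branches
    branch_cycles   : cycles taken by each branch (3 on the slide)
    base_cpi        : cycles for every other instruction (assumed 1)
    """
    extra_per_branch = branch_cycles - base_cpi      # stall cycles added by one branch
    return base_cpi + branch_fraction * extra_per_branch


if __name__ == "__main__":
    # Worst case under the stated 17% bound on branch frequency.
    print(effective_cpi(0.17))
```

Under these assumptions, branches occurring 17% of the time raise the average to about 1.34 cycles per instruction, which is why the other techniques listed for control hazards aim to avoid the stall.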
