
Chapter 3 Pipelining

3.1 Pipeline Model
- Terminology: task, subtask, stage, staging register
- Total processing time for each task:
  T_pl = Σ_{i=1..k} (t_i + d_i), where t_i is the processing time of stage i, d_i is the delay introduced by the staging register, and k is the number of stages

3.1 Pipeline Model (continued)
- Total processing time for each task without pipelining:
  T_seq = Σ_{i=1..k} t_i
- Pipeline cycle time: t_max = max(t_i + d_i), 1 ≤ i ≤ k
- Clock frequency = 1/t_max
- With balanced stages, the pipeline cycle time t_cyc can be denoted by T_seq/k + d
- Speedup: S = (N · T_seq) / ((k + N - 1) · t_cyc), where N is the number of tasks
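The timing formulas above can be checked with a short sketch. The stage times and register delay are assumed illustration values, not figures from the text:

```python
# Hypothetical stage processing times t_i and a common staging-register delay d.
t = [4, 5, 3, 6]   # t_i for an assumed k = 4 stage pipeline
d = 1              # staging-register delay
k = len(t)

T_seq = sum(t)                      # non-pipelined time per task
t_max = max(ti + d for ti in t)     # pipeline cycle time (slowest stage)
f = 1.0 / t_max                     # clock frequency

N = 100                             # number of tasks
T_pipe = (k + N - 1) * t_max        # time to complete N tasks in the pipeline
S = (N * T_seq) / T_pipe            # speedup over the sequential unit

print(t_max, round(S, 2))
```

Note that the speedup falls well short of k = 4 here because the stages are unbalanced: the cycle time is set by the slowest stage, not the average.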

3.1 Pipeline Model (continued)
- If the staging register delay is ignored and the processing times of the stages are equal, t_cyc = T_seq/k. Therefore, the ideal speedup becomes
  S_ideal = (N · k) / (k + N - 1)
- If N → ∞, S_ideal → k; a k-stage pipeline can at best speed up processing k-fold.
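The limiting behaviour of the ideal speedup can be verified numerically (the stage count k below is an assumed value):

```python
# Ideal speedup S_ideal = N*k / (k + N - 1), assuming zero register delay
# and perfectly balanced stages.
def s_ideal(n_tasks, k):
    return n_tasks * k / (k + n_tasks - 1)

k = 5  # hypothetical stage count
# As the number of tasks N grows, S_ideal approaches the stage count k.
vals = [s_ideal(n, k) for n in (1, 10, 100, 10000)]
print(vals)
```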

3.1 Pipeline Model (continued)
- The total cost of the pipeline is given by C = L·k + C_p, where C_p = Σ_{i=1..k} c_i is the total cost of the stage logic and L is the cost of each staging register.
- To minimize the composite cost per computation rate, C · t_cyc, the optimal number of stages is
  k = sqrt((C_p · T_seq) / (L · d))
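A sketch of this cost model, using assumed parameter values, showing that the closed-form k minimizes the composite cost C · t_cyc:

```python
import math

# Assumed cost/timing parameters for illustration only.
Cp = 200.0    # total cost of the stage logic (sum of c_i)
L = 10.0      # cost of one staging register
T_seq = 80.0  # total sequential processing time
d = 1.0       # staging-register delay

def composite(k):
    # (pipeline cost) x (cycle time): C * t_cyc = (L*k + Cp) * (T_seq/k + d)
    return (L * k + Cp) * (T_seq / k + d)

# Closed-form minimizer from the slide above
k_opt = math.sqrt(Cp * T_seq / (L * d))
print(k_opt, composite(k_opt))
```

Differentiating (L·k + C_p)(T_seq/k + d) with respect to k and setting the result to zero gives L·d = C_p·T_seq/k², which yields the square-root expression.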

3.1 Pipeline Model (continued)
- In practice, making the delays of the pipeline stages equal is a complicated and time-consuming process.
  - Nearly balanced stages are essential for maximum performance.
  - It is done for commercial processors, although it is neither easy nor cheap.
- Another problem with pipelines is the overhead of handling exceptions and interrupts.
  - A deep pipeline increases the interrupt-handling overhead.

Pipeline Types
- Pipeline types (Händler's classification):
  - Instruction pipelines: FI, DI, CA, FO, EX, ST
  - Arithmetic pipelines
  - Processor pipelines: a cascade of processors, each executing a specific module of the application program

Instruction Pipeline
- Reservation table (RT):
  - rows: stages
  - columns: pipeline cycles
- The cycle time of instruction pipelines is often determined by the stages requiring memory access.

Control Hazards
- Caused by conditional branch instructions: the target address of the branch is known only after the condition has been evaluated.
- Ways to handle control hazards:
  - Freeze the pipeline until the branch is resolved.
  - Predict that the branch will not be taken and continue fetching sequentially.
  - Start fetching the target instruction sequence into a buffer while the non-branch sequence is being fed into the pipeline.

Arithmetic Pipelines
- Floating-point addition:
  - Consider S = A + B, where A = (Ea, Ma), B = (Eb, Mb), and S = (Es, Ms)
  - Addition steps (Figure 3.5):
    1. Equalize the exponents
    2. Add the mantissas
    3. Normalize Ms and adjust Es
    4. Round Ms
    5. Renormalize Ms and adjust Es
  - Modified floating-point add pipeline (Figures 3.6 and 3.7)
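The addition steps can be sketched with a toy decimal (exponent, mantissa) representation. This is only an illustration of the dataflow, not the pipeline of Figure 3.5: rounding is omitted (normalization truncates), so steps 4 and 5 are collapsed, and the representation is hypothetical:

```python
def fp_add(a, b, digits=4):
    """Toy floating-point add on (E, M) pairs, where the mantissa M is an
    integer holding `digits` decimal digits: value = M * 10**(E - digits)."""
    (Ea, Ma), (Eb, Mb) = a, b
    # Step 1: equalize the exponents (shift the smaller operand's mantissa right)
    if Ea < Eb:
        Ma //= 10 ** (Eb - Ea)
        Es = Eb
    else:
        Mb //= 10 ** (Ea - Eb)
        Es = Ea
    # Step 2: add the mantissas
    Ms = Ma + Mb
    # Step 3: normalize Ms and adjust Es (rounding omitted in this sketch)
    while Ms >= 10 ** digits:
        Ms //= 10
        Es += 1
    return (Es, Ms)

# 0.5000 x 10^2 + 0.8000 x 10^1  ->  0.5800 x 10^2
print(fp_add((2, 5000), (1, 8000)))
```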

Arithmetic Pipelines (continued)
- Floating-point multiplication:
  - Consider P = A × B, where A = (Ea, Ma), B = (Eb, Mb), and P = (Ep, Mp)
  - Multiplication steps (Figure 3.8):
    1. Add the exponents
    2. Multiply the mantissas
    3. Normalize Mp and adjust Ep
    4. Round Mp
    5. Renormalize Mp and adjust Ep
  - Modified floating-point multiply pipeline (Figure 3.9)

Arithmetic Pipelines (continued)
- Multifunction pipeline:
  - Performs more than one operation
  - A control input is needed for proper operation of the multifunction pipeline
  - Figure 3.10: floating-point add/multiply pipeline

Classification Scheme of Ramamoorthy and Li
- Functionality: unifunctional vs. multifunctional
- Configuration: static vs. dynamic
- Mode of operation: scalar vs. vector

3.2 Pipeline Control and Performance
- To provide the maximum possible throughput, the pipeline must be kept full and flowing smoothly.
- Two conditions for the smooth flow of a pipeline:
  - the rate at which data are input
  - data interlocks between the stages
- Example 3.1: the pipeline completes one operation per cycle (once it is full)
- Example 3.2: non-linear pipeline

Structural Hazards
- Caused by the non-availability of the appropriate hardware resource.
- One obvious way of avoiding a structural hazard is to insert additional hardware into the pipeline.

Example 3.3
- Figure 3.12 depicts the operation of the pipeline:
  - In cycles 3, 4, 5, and 6, simultaneous memory accesses are needed.
  - If the machine has separate data and instruction caches, the conflicts in cycles 5 and 6 disappear.
  - One way to resolve the conflict in cycle 4 is to stall the ADD instruction (Figure 3.13).
- Stalling degrades pipeline performance.

Collision Vectors
- Initiation: the launching of an operation into the pipeline.
- Latency: the number of cycles that elapse between two initiations.
- Latency sequence: the latencies between successive initiations.
- Collision: occurs if a stage of the pipeline is required to perform more than one task at the same time.

Collision Vectors (continued)
- Forbidden set F: the set of all possible column distances between two entries in some row of the reservation table.
- The collision vector can be derived from the forbidden set F and used to control the initiation of operations in the pipeline:
  - CV = (v_{n-1}, v_{n-2}, ..., v_2, v_1)
  - v_i = 1 if i is in the forbidden set
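The forbidden set and collision vector follow mechanically from a reservation table. A sketch, using a hypothetical two-stage RT (rows = stages, columns = cycles) rather than one of the book's figures; the CV is stored as an integer bitmask with bit i-1 set iff latency i is forbidden:

```python
def forbidden_set(rt):
    """All column distances between two marks on the same row of a
    reservation table (rows = stages, columns = pipeline cycles)."""
    F = set()
    for row in rt:
        cols = [j for j, used in enumerate(row) if used]
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                F.add(cols[j] - cols[i])
    return F

def collision_vector(rt):
    """CV as a bitmask: bit i-1 is set iff latency i is in the forbidden set."""
    return sum(1 << (i - 1) for i in forbidden_set(rt))

# Hypothetical RT: stage 1 used in cycles 0 and 3, stage 2 in cycles 1 and 2.
rt = [[1, 0, 0, 1],
      [0, 1, 1, 0]]
print(forbidden_set(rt), bin(collision_vector(rt)))
```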

Examples
- Example 3.4: (a) overlapped reservation table, (b) collision vector (CV)
- Examples 3.5 and 3.6: a collision case and a no-collision case

Control
- How to control the initiation of the pipeline using the CV:
  - Place the CV in a shift register.
  - If the LSB of the shift register is 1, do not initiate an operation in that cycle; shift the register right once, inserting 0 at the vacant MSB position.
  - If the LSB of the shift register is 0, initiate a new operation in that cycle; shift the register right once, inserting 0 at the vacant MSB position. To reflect the superposition of the new initiation over the original one, OR the original CV bit-by-bit with the contents of the shift register.
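The shift-register control above can be simulated directly. The CV is a bitmask (bit i-1 = latency i forbidden); the register starts empty, so the first initiation loads the CV, as in the description above. The CV value used below, (00111), anticipates the one quoted for Figure 3.11:

```python
def simulate(cv, cycles):
    """Run the shift-register initiation control for `cycles` clock cycles;
    return the cycle numbers at which operations were initiated."""
    sr = 0          # empty pipeline: the first initiation is always allowed
    started = []
    for cycle in range(cycles):
        if sr & 1 == 0:            # LSB 0: safe to initiate this cycle
            started.append(cycle)
            sr = (sr >> 1) | cv    # shift right, then OR in the original CV
        else:                      # LSB 1: a collision would occur
            sr >>= 1               # shift right, 0 enters at the MSB
    return started

print(simulate(0b00111, 12))
```

With forbidden latencies {1, 2, 3}, initiations land 4 cycles apart, as expected.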

3.2.3 Performance
- The CV of Figure 3.11 is (00111).
- Figure 3.15(a) shows the state transitions.

3.2.3 Performance (continued)
- Average latency
- Simple cycle
- Greedy cycle
- MAL (minimum average latency)
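A greedy cycle can be found by repeatedly taking the smallest allowed latency from the current state until a state repeats; its average latency is an upper bound on the MAL. A sketch with the CV as a bitmask (the second CV below is a made-up value for contrast):

```python
def greedy_cycle(cv):
    """Follow the minimum allowed latency from the initial state until a
    state repeats; return the greedy cycle of latencies and its average."""
    s, seen, lats = cv, {}, []
    while s not in seen:
        seen[s] = len(lats)
        i = 1
        while s & (1 << (i - 1)):   # smallest latency not forbidden in state s
            i += 1
        lats.append(i)
        s = (s >> i) | cv           # state transition: shift by i, OR the CV
    cyc = lats[seen[s]:]            # the cycle starts where the state repeats
    return cyc, sum(cyc) / len(cyc)

print(greedy_cycle(0b00111))   # forbidden latencies {1, 2, 3}
print(greedy_cycle(0b10001))   # forbidden latencies {1, 5} (hypothetical)
```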

3.2.4 Multifunction Pipelines
- Figure 3.17
- Cross collision vectors: Vxx, Vxy, Vyx, Vyy

3.3 Other Pipeline Problems
- Data interlock (data hazard): due to the sharing of resources between instructions.
  - data forwarding
  - internal forwarding: write-read forwarding, read-read forwarding, write-write forwarding
- Load/store architectures versus memory/memory architectures

3.3 Other Pipeline Problems (continued)
- Conditional branches:
  - branch prediction
  - delayed branch
  - branch-prediction buffer
  - branch history
  - multiple instruction buffers
- Interrupts:
  - precise interrupt scheme

3.4 Dynamic Pipelines
- Instruction deferral: scoreboard
- Tomasulo's algorithm
- Performance evaluation:
  - maximizing the total number of initiations per unit time
  - minimizing the total time required to handle a specific sequence of initiation table types

3.5 Example Systems
- CDC STAR-100
- CDC 6600
- MIPS R4000

3.6 Summary
- Three approaches have been tried to improve performance beyond the ideal CPI case:
  - superpipelining
  - superscalar execution
  - VLIW (Very Long Instruction Word)

End of Chapter 3