Pipelining and Superscalar Techniques


Chapter 6: Pipelining and Superscalar Techniques

Linear pipeline processors A linear pipeline processor is a cascade of processing stages which are linearly connected to perform a fixed function over a stream of data flowing from one end to the other. Depending on how the flow of data along the pipeline is controlled, pipelines fall into two categories: the asynchronous model and the synchronous model.

Asynchronous model: data flow between adjacent stages is controlled by a handshaking protocol. When a stage is ready to transmit, it sends a ready signal to the next stage; the next stage receives the data and returns an acknowledgement. Different stages may therefore experience different delays.
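The ready/acknowledge exchange above can be sketched as a toy simulation; the `Stage` class and the string signals are illustrative assumptions, not part of any real hardware model.

```python
# Toy sketch of the asynchronous handshake between two adjacent pipeline
# stages: the sender raises "ready", ships its data item, and waits for
# the receiver's acknowledgement. Names and signals are hypothetical.
class Stage:
    def __init__(self, name):
        self.name = name
        self.buffer = None      # data item currently held by this stage

    def receive(self, data):
        self.buffer = data      # accept the transmitted data
        return "ack"            # return the acknowledgement

def transmit(data, next_stage):
    # The sender only completes the transfer once "ack" comes back.
    ack = next_stage.receive(data)
    assert ack == "ack"
    return ack

s2 = Stage("S2")
transmit("item-0", s2)
print(s2.buffer)  # item-0
```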

Synchronous model: clocked latches are used between stages. Upon arrival of the clock signal, all latches transfer data to the next stage simultaneously. The utilization pattern of successive stages in a synchronous pipeline is specified by a reservation table.

Clock cycle: let τ be the clock cycle of the pipeline, τi the time delay of stage Si, and d the time delay of a latch. Then τ = max{τi} + d. Pipeline frequency: f = 1/τ.
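A quick worked example of the clock-period formula; the stage and latch delays below are assumed values for illustration.

```python
# Sketch: computing the pipeline clock period tau = max{tau_i} + d and
# the pipeline frequency f = 1/tau. All delay values are hypothetical.
stage_delays = [10, 8, 12, 9]   # tau_i for stages S1..S4, in ns (assumed)
d = 1                            # latch delay in ns (assumed)

tau = max(stage_delays) + d      # the slowest stage sets the clock period
f = 1 / tau                      # pipeline frequency
print(tau)  # 13
```

Note that the slowest stage alone determines τ, which is why balancing stage delays matters when partitioning a pipeline.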

Total time: Tk = [k + (n-1)]τ, where k = number of stages, n = number of tasks, and τ = clock cycle. Speed-up factor: Sk = T1/Tk = nkτ/[k + (n-1)]τ = nk/[k + (n-1)]
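The total-time and speedup formulas can be checked numerically; k, n, and τ below are assumed values.

```python
# Sketch: total time T_k = [k + (n-1)] * tau for n tasks through a
# k-stage pipeline, and speedup S_k = T_1 / T_k = nk / [k + (n-1)].
# The parameter values are hypothetical.
k, n, tau = 4, 100, 13           # stages, tasks, clock cycle in ns (assumed)

T1 = n * k * tau                 # non-pipelined: every task takes k cycles
Tk = (k + (n - 1)) * tau         # pipelined: fill k cycles, then 1 per task
Sk = T1 / Tk                     # speedup approaches k as n grows
print(Tk)  # 1339
```

For large n the speedup Sk = nk/[k + (n-1)] tends to k, the ideal speedup of a k-stage pipeline.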

Performance/cost ratio (given by Larson): PCR = f/[c + kh] = 1/[(t/k + d)(c + kh)], where c = cost of all logic stages and h = cost of each latch. The optimal number of stages is k0 = √(t·c/(d·h)).
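A brute-force check that the closed-form optimum k0 = √(t·c/(d·h)) really maximizes the PCR; all cost and delay values are assumptions chosen for illustration.

```python
import math

# Sketch: Larson's performance/cost ratio PCR = 1 / [(t/k + d)(c + k*h)],
# maximized at k0 = sqrt(t*c / (d*h)). Parameter values are hypothetical.
t, d = 120.0, 1.0    # total flow-through delay and latch delay (ns)
c, h = 50.0, 5.0     # cost of all logic stages, cost of one latch

def pcr(k):
    return 1.0 / ((t / k + d) * (c + k * h))

k0 = math.sqrt(t * c / (d * h))       # closed-form optimal stage count
best = max(range(1, 100), key=pcr)    # brute-force integer optimum
print(round(k0, 1), best)
```

The integer k maximizing PCR lands next to the real-valued k0, as expected.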

Efficiency: Ek = Sk/k = n/[k + (n-1)]. Throughput: the number of tasks performed per unit time: Hk = n/[k + (n-1)]τ = nf/[k + (n-1)].
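Efficiency and throughput are tied together by Hk = Ek·f, which is easy to verify numerically; the parameter values below are assumed.

```python
# Sketch: efficiency E_k = S_k / k = n / [k + (n-1)] and throughput
# H_k = n*f / [k + (n-1)], so that H_k = E_k * f. Values are hypothetical.
k, n, tau = 4, 100, 13
f = 1 / tau

Ek = n / (k + (n - 1))           # fraction of the ideal speedup achieved
Hk = n * f / (k + (n - 1))       # completed tasks per unit time
assert abs(Hk - Ek * f) < 1e-12  # throughput = efficiency * frequency
print(round(Ek, 3))  # 0.971
```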

Dynamic/non-linear pipelines Linear pipelines are static pipelines. Dynamic pipelines can be reconfigured to perform variable functions at different times, and they allow feedforward and feedback connections in addition to the streamline connections.

Reservation table A row may contain multiple checkmarks, which means repeated usage of the same stage in different cycles. Latency: the number of time units between two initiations of a pipeline. A latency of k means that the two initiations are separated by k clock cycles. Any attempt by two or more initiations to use the same pipeline stage at the same time will cause a collision.

Some latencies will cause collisions and some will not. Latencies that cause collisions are called forbidden latencies.
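The forbidden latencies follow directly from the reservation table: a latency p is forbidden whenever some stage row has two marks p cycles apart, since the second initiation would then collide with the first on that stage. The three-stage table below is a hypothetical example.

```python
# Sketch: deriving forbidden latencies from a reservation table.
# Each row lists the clock cycles in which that stage is used by one
# initiation. The table itself is an assumed example.
table = {
    "S1": [0, 5],
    "S2": [1, 2],
    "S3": [3, 4],
}

forbidden = set()
for cycles in table.values():
    for i in range(len(cycles)):
        for j in range(i + 1, len(cycles)):
            # Two marks p cycles apart in one row forbid latency p.
            forbidden.add(abs(cycles[j] - cycles[i]))

print(sorted(forbidden))  # [1, 5]
```

Any initiation latency outside this set (here 2, 3, 4, ...) is permissible for this table.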

Instruction pipeline design Instruction execution phases. Mechanisms for the instruction pipeline: three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate. Prefetch buffers: sequential buffers and target buffers. Loop buffer: holds the sequential instructions contained in a small loop.

Multiple functional units: a certain pipeline stage can become a bottleneck. This problem can be alleviated by using multiple copies of the same stage simultaneously.

Mechanisms for Instruction pipelining

Internal data forwarding The throughput of a pipelined processor can be further improved with internal data forwarding among multiple functional units. Some memory-access operations can be replaced by register-transfer operations: 1. Store-load forwarding. 2. Load-load forwarding. 3. Store-store forwarding.
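Store-load forwarding, the first case above, can be sketched as a tiny peephole pass: a load from an address that was just stored to is replaced by a register-to-register move, eliminating one memory access. The `(op, dst, src)` tuple encoding of instructions is a hypothetical convenience, not a real ISA.

```python
# Sketch of store-load forwarding as a peephole rewrite. When a load
# reads the address a preceding store just wrote, the load becomes a
# register move from the store's source register. Encoding is assumed.
def forward_store_load(prog):
    out = list(prog)
    for k in range(len(out) - 1):
        op1, addr1, src1 = out[k]        # e.g. ("store", "M[100]", "R1")
        op2, dst2, addr2 = out[k + 1]    # e.g. ("load",  "R2", "M[100]")
        if op1 == "store" and op2 == "load" and addr1 == addr2:
            out[k + 1] = ("move", dst2, src1)   # memory access eliminated
    return out

prog = [("store", "M[100]", "R1"), ("load", "R2", "M[100]")]
print(forward_store_load(prog))
```

Load-load and store-store forwarding admit analogous rewrites (reusing a loaded value, or dropping a store that is immediately overwritten).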

Hazard avoidance The read and write of shared variables by different instructions in a pipeline may lead to different results if these instructions are executed out of order. 1. RAW (read after write): flow dependence. 2. WAW (write after write): output dependence. 3. WAR (write after read): antidependence.

R(i) ∩ D(j) ≠ ∅ : RAW (flow dependence) R(i) ∩ R(j) ≠ ∅ : WAW (output dependence) D(i) ∩ R(j) ≠ ∅ : WAR (antidependence) Where D = domain (the input set) and R = range (the output set) of an instruction.
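The three set-intersection conditions above translate directly into code; the register names in the usage example are hypothetical.

```python
# Sketch: detecting hazards between instruction i and a later instruction
# j from their domains D (input set) and ranges R (output set), using the
# conditions R(i)&D(j) -> RAW, R(i)&R(j) -> WAW, D(i)&R(j) -> WAR.
def hazards(Di, Ri, Dj, Rj):
    found = []
    if Ri & Dj:
        found.append("RAW")   # flow dependence: j reads what i writes
    if Ri & Rj:
        found.append("WAW")   # output dependence: both write the same place
    if Di & Rj:
        found.append("WAR")   # antidependence: j writes what i reads
    return found

# i: R1 <- R2 + R3   then   j: R4 <- R1 + R2
print(hazards({"R2", "R3"}, {"R1"}, {"R1", "R2"}, {"R4"}))  # ['RAW']
```

A hazard is detected whenever the corresponding intersection is non-empty, exactly as in the conditions above.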

Instruction scheduling Methods for scheduling instructions through an instruction pipeline: 1. Static scheduling (performed by the compiler). 2. Dynamic scheduling: Tomasulo's register-tagging scheme and the scoreboarding scheme.