5-Stage Pipelining: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI), Write Operand (WO)

[Figure: space-time diagram of the five-stage pipeline, showing stages S1-S5 overlapping across successive clock cycles]

Five Stage Instruction Pipeline: Fetch instruction, Decode instruction, Fetch operands, Execute instruction, Write result
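The benefit of overlapping these five operations can be sketched numerically (an illustrative model, not from the slides): with one instruction entering the pipeline per cycle, n instructions complete in k + n - 1 cycles instead of the n × k cycles a non-pipelined unit would need.

```python
# Sketch: cycle counts for a linear k-stage pipeline versus no pipelining.
# Assumes one instruction enters per cycle and no hazards (idealized model).

def pipelined_cycles(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to drain through;
    # each subsequent instruction completes one cycle later.
    return n_stages + n_instructions - 1

def unpipelined_cycles(n_instructions, n_stages):
    # Without overlap, each instruction occupies all stages in sequence.
    return n_instructions * n_stages

n, k = 7, 5
print(pipelined_cycles(n, k))    # 11 cycles with overlap
print(unpipelined_cycles(n, k))  # 35 cycles without
```

As n grows, the speedup approaches k, the number of stages, which is why deeper pipelines are attractive until hazards intervene.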

Two major difficulties: data dependency and branch difficulties. Solutions for branch difficulties: prefetch target instruction, delayed branch, branch target buffer (BTB), branch prediction

Data Dependency: use delayed load to solve. Example:
LOAD: R1 ← M[Addr1]
LOAD: R2 ← M[Addr2]
ADD: R3 ← R1 + R2
STORE: M[Addr3] ← R3

Delay Load
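The delayed-load idea can be sketched as a small scheduling pass (a hypothetical model; the instruction tuple format `(opcode, dest, sources)` is an assumption for illustration): when an instruction reads a register loaded by the immediately preceding LOAD, a no-op is inserted as the delay slot so the loaded value is ready.

```python
# Hypothetical sketch of delayed-load handling: insert a NOP between a
# LOAD and an immediately following instruction that uses its destination.

def insert_load_delays(program):
    out = []
    for instr in program:
        opcode, dest, srcs = instr
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in srcs:
                out.append(("NOP", None, ()))  # fill the load delay slot
        out.append(instr)
    return out

program = [
    ("LOAD", "R1", ()),            # R1 <- M[Addr1]
    ("LOAD", "R2", ()),            # R2 <- M[Addr2]
    ("ADD", "R3", ("R1", "R2")),   # R3 <- R1 + R2 (needs R2 just loaded)
    ("STORE", None, ("R3",)),      # M[Addr3] <- R3
]
print(insert_load_delays(program))  # a NOP appears between LOAD R2 and ADD
```

Note that the ADD also reads R1, but R1 was loaded two instructions earlier, so with a one-cycle load delay only the LOAD R2 / ADD pair needs the slot.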

Example: five instructions need to be carried out:
Load from memory to R1
Increment R2
Add R3 to R4
Subtract R5 from R6
Branch to address X

Delayed Branch

Rearrange the Instruction

Delayed Branch: in this procedure, the compiler detects the branch instruction and rearranges the machine-language code sequence by inserting useful instructions that keep the pipeline operating without interruption
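A minimal sketch of that rearrangement (hypothetical instruction format; a real compiler must first verify that the moved instruction is independent of the branch condition): move the instruction just before the branch into the branch's delay slot, so it executes while the branch resolves.

```python
# Illustrative delayed-branch scheduling: fill the delay slot after a
# branch with a useful independent instruction instead of a NOP.

def fill_delay_slot(program):
    # Naive scheme: swap the branch with its predecessor, so the former
    # predecessor now executes in the delay slot after the branch issues.
    out = list(program)
    for i, (op, *rest) in enumerate(out):
        if op == "BRANCH" and i > 0:
            out[i - 1], out[i] = out[i], out[i - 1]
            break
    return out

prog = [("INC", "R2"), ("ADD", "R3", "R4"), ("BRANCH", "X")]
print(fill_delay_slot(prog))  # ADD now sits in the branch delay slot
```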

Prefetch target instruction: prefetch the target instruction in addition to the instruction following the branch. If the branch is taken, the pipeline continues from the already-fetched branch target instruction

Branch target buffer (BTB): the BTB is an associative memory. Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch
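One way to picture the BTB is as an associative map from branch address to target (a simplified sketch; real BTBs are fixed-size, set-associative hardware tables, and the field names here are illustrative):

```python
# Sketch of a branch target buffer: branch PC -> predicted target address.

class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}  # associative lookup: branch PC -> target

    def lookup(self, pc):
        # Hit: fetch can redirect to the predicted target immediately,
        # without waiting for the branch to execute. Miss returns None.
        return self.entries.get(pc)

    def update(self, pc, target):
        # Record the target of a branch once it has actually executed.
        self.entries[pc] = target

btb = BranchTargetBuffer()
btb.update(0x100, 0x200)        # branch at 0x100 previously jumped to 0x200
print(hex(btb.lookup(0x100)))   # hit: predicted target 0x200
```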

Loop Buffer: very fast memory maintained by the fetch stage of the pipeline. Check the buffer before fetching from memory. Very good for small loops or jumps. The loop buffer is similar in principle to a cache dedicated to instructions; the differences are that the loop buffer only retains instructions in sequence and is much smaller in size (and lower in cost)
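A loop buffer can be sketched as a tiny instruction store that the fetch stage checks before going to memory (capacity and the hit counter are illustrative assumptions, not details from the slides):

```python
from collections import OrderedDict

# Sketch of a loop buffer: holds the most recently fetched instructions
# in sequence; a small backward branch can hit entirely in the buffer.

class LoopBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.buf = OrderedDict()  # address -> instruction, in fetch order
        self.hits = 0

    def fetch(self, addr, memory):
        if addr in self.buf:
            self.hits += 1            # hit: no memory access needed
            return self.buf[addr]
        instr = memory[addr]          # miss: fetch from memory and retain
        self.buf[addr] = instr
        if len(self.buf) > self.capacity:
            self.buf.popitem(last=False)  # evict the oldest instruction
        return instr

# A 3-instruction loop executed twice: the second pass hits entirely.
memory = {0: "CMP", 1: "ADD", 2: "JMP 0"}
lb = LoopBuffer(capacity=4)
for addr in [0, 1, 2, 0, 1, 2]:
    lb.fetch(addr, memory)
print(lb.hits)  # 3 hits on the second pass
```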

Branch Prediction: a pipeline with branch prediction uses some additional logic to guess the outcome of a conditional branch instruction before it is executed

Branch Prediction: various techniques can be used to predict whether a branch will be taken or not: prediction never taken, prediction always taken, prediction by opcode, branch history table. The first three approaches are static: they do not depend on the execution history up to the time of the conditional branch instruction. The last approach is dynamic: it depends on the execution history
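The branch-history-table approach can be sketched with the common 2-bit saturating counter scheme (the slides do not specify a particular scheme; table size and PC indexing here are illustrative assumptions). Each branch maps to a counter in 0..3; the branch is predicted taken when the counter is 2 or more, and the counter moves toward the actual outcome after each execution.

```python
# Sketch of a dynamic predictor: a branch history table of 2-bit
# saturating counters, indexed by (a hash of) the branch address.

class TwoBitPredictor:
    def __init__(self, size=1024):
        self.table = [0] * size  # all counters start "strongly not taken"
        self.size = size

    def predict(self, pc):
        # Counter >= 2 means predict taken.
        return self.table[pc % self.size] >= 2

    def update(self, pc, taken):
        # Saturate at 0 and 3 so one anomalous outcome (e.g. a loop exit)
        # does not immediately flip a well-established prediction.
        i = pc % self.size
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

p = TwoBitPredictor()
p.update(0x40, True)      # train on a branch taken twice
p.update(0x40, True)
print(p.predict(0x40))    # now predicts taken
```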

Floating Point Arithmetic Pipeline: pipeline arithmetic units are usually found in very high-speed computers. They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems

Floating Point Arithmetic Pipeline: example for floating-point addition and subtraction. The inputs are two normalized floating-point binary numbers, X = A × 2^a and Y = B × 2^b, where A and B are fractions representing the mantissas and a and b are the exponents. Consider how the pipeline segments are used to perform the add operation

Floating Point Arithmetic Pipeline: four segments are needed:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result

Floating Point Arithmetic Pipeline: X = 0.9504 × 10^3 and Y = 0.8200 × 10^2. The two exponents are subtracted in the first segment to obtain 3 - 2 = 1. The larger exponent, 3, is chosen as the exponent of the result. Segment 2 shifts the mantissa of Y to the right to obtain Y = 0.0820 × 10^3; the mantissas are now aligned. Segment 3 produces the sum Z = 1.0324 × 10^3. Segment 4 normalizes the result by shifting the mantissa once to the right and incrementing the exponent by one to obtain Z = 0.10324 × 10^4
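The four segments can be sketched in decimal, mirroring the worked example (a simplified model for adding positive normalized numbers only; real hardware works in binary with bounded-width shifters):

```python
# Sketch of the four pipeline segments for floating-point addition,
# operating on (mantissa, exponent) pairs in base 10.

def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents; keep the larger as the result exponent.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Segment 2: align the smaller mantissa by shifting it right.
    mb = mb / (10 ** (ea - eb))
    # Segment 3: add the mantissas.
    m = ma + mb
    e = ea
    # Segment 4: normalize so the mantissa is a fraction less than 1.
    while m >= 1.0:
        m /= 10
        e += 1
    return m, e

print(fp_add((0.9504, 3), (0.8200, 2)))  # approximately (0.10324, 4)
```

Because each segment is independent, four such additions can be in flight at once, one per segment, which is the point of pipelining the arithmetic unit.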