PIPELINE AND VECTOR PROCESSING

CHAPTER 9: PIPELINE AND VECTOR PROCESSING

CONTENTS Parallel Processing Pipelining Arithmetic Pipeline Instruction Pipeline RISC Pipeline Vector Processing Array Processors

Figure 9-1 Processor with multiple functional units. Processor registers (connected to memory) feed the following units: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point divide.

Classification by instruction stream and data stream (Flynn's classification). Single instruction stream, single data stream (SISD). Single instruction stream, multiple data stream (SIMD). Multiple instruction stream, single data stream (MISD). Multiple instruction stream, multiple data stream (MIMD).

Figure 9-2 Example of pipelining. Inputs: Ai, Bi, Ci.
R1 ← Ai, R2 ← Bi           Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci      Multiply and input Ci
R5 ← R3 + R4               Add Ci to product
Data path: R1 and R2 feed the multiplier, which loads R3; R3 and R4 feed the adder, which loads R5.

Table 9-1 Content of registers in pipeline example.

Clock pulse   Segment 1     Segment 2       Segment 3
number        R1     R2     R3       R4     R5
1             A1     B1     ----     ----   ----
2             A2     B2     A1*B1    C1     ----
3             A3     B3     A2*B2    C2     A1*B1+C1
4             A4     B4     A3*B3    C3     A2*B2+C2
5             A5     B5     A4*B4    C4     A3*B3+C3
6             A6     B6     A5*B5    C5     A4*B4+C4
7             A7     B7     A6*B6    C6     A5*B5+C5
8             ----   ----   A7*B7    C7     A6*B6+C6
9             ----   ----   ----     ----   A7*B7+C7
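The register contents in the pipeline example can be reproduced with a short simulation. This is only a sketch under stated assumptions: the function name `simulate_pipeline` and the use of `None` for an empty register are illustrative choices, not part of the slides. All five registers are updated from their old values on each clock pulse, which is what lets the three segments work on different data at the same time.

```python
def simulate_pipeline(a, b, c):
    """Simulate the three-segment pipeline that computes a[i] * b[i] + c[i].

    Returns (history, results). Each history entry is
    (clock, R1, R2, R3, R4, R5); None models an empty register.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    history, results = [], []
    for clock in range(1, n + 3):  # n items need n + 2 clock pulses
        # Compute every new register value from the OLD contents first,
        # because all registers load simultaneously on the clock pulse.
        new_r5 = r3 + r4 if r3 is not None else None      # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None      # segment 2: multiply
        new_r4 = c[clock - 2] if r1 is not None else None  # segment 2: input Ci
        i = clock - 1
        new_r1 = a[i] if i < n else None                  # segment 1: input Ai
        new_r2 = b[i] if i < n else None                  # segment 1: input Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        history.append((clock, r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return history, results


if __name__ == "__main__":
    hist, out = simulate_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7)
    for row in hist:
        print(row)
    print(out)
```

With seven input triples the loop runs for nine clock pulses, and the first result reaches R5 on the third pulse.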

Figure 9-3 Four segment pipeline. Clock Input S1 R1 S2 R2 S3 R3 S4 R4

Figure 9-4 Space-time diagram for pipeline.

Clock cycle:   1   2   3   4   5   6   7   8   9
Segment 1      T1  T2  T3  T4  T5  T6
Segment 2          T1  T2  T3  T4  T5  T6
Segment 3              T1  T2  T3  T4  T5  T6
Segment 4                  T1  T2  T3  T4  T5  T6
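A space-time diagram like the one in Figure 9-4 can be generated programmatically. The sketch below assumes an ideal pipeline in which every segment takes exactly one clock cycle and a new task enters on every cycle; the function name `space_time_diagram` is hypothetical. Under that assumption, k segments finish n tasks in k + n - 1 clock cycles, which gives the 9 cycles of the figure for 4 segments and 6 tasks.

```python
def space_time_diagram(segments, tasks):
    """Return (cycles, rows) for an ideal pipeline.

    rows[s - 1][clock - 1] names the task that occupies segment s
    during the given clock cycle, or "--" if the segment is idle.
    """
    cycles = segments + tasks - 1
    rows = []
    for s in range(1, segments + 1):
        row = []
        for clock in range(1, cycles + 1):
            t = clock - s + 1  # task number sitting in segment s at this cycle
            row.append(f"T{t}" if 1 <= t <= tasks else "--")
        rows.append(row)
    return cycles, rows


if __name__ == "__main__":
    cycles, rows = space_time_diagram(4, 6)
    print("cycles:", cycles)
    for s, row in enumerate(rows, start=1):
        print(f"Segment {s}:", " ".join(f"{cell:>2}" for cell in row))
```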

Figure 9-5 Multiple functional units in parallel. Ii+3 P3 Ii+2 P2 Ii+1 P1 Ii

Arithmetic Pipeline. Compare the exponents. Align the mantissas. Add or subtract the mantissas. Normalize the result.
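The four steps of floating-point addition can be written out one after another. This is a minimal sketch, not the hardware design: it assumes decimal numbers held as (mantissa, exponent) pairs with value mantissa * 10^exponent, uses `fractions.Fraction` to keep the arithmetic exact, and the name `fp_add` and the sample operands are illustrative. Subtraction is handled by passing a negative mantissa.

```python
from fractions import Fraction


def fp_add(x, y, base=10):
    """Add two numbers given as (mantissa, exponent) pairs.

    The result is normalized so that 1/base <= |mantissa| < 1.
    """
    (ma, ea), (mb, eb) = x, y

    # Step 1: compare the exponents by subtraction and keep the larger one.
    diff = ea - eb
    exp = max(ea, eb)

    # Step 2: align the mantissa that belongs to the smaller exponent.
    if diff > 0:
        mb = mb / base ** diff
    elif diff < 0:
        ma = ma / base ** (-diff)

    # Step 3: add the mantissas (a negative mantissa gives subtraction).
    m = ma + mb

    # Step 4: normalize the result.
    if m == 0:
        return Fraction(0), 0
    while abs(m) >= 1:
        m /= base
        exp += 1
    while abs(m) < Fraction(1, base):
        m *= base
        exp -= 1
    return m, exp


if __name__ == "__main__":
    # 0.9504 * 10^3 + 0.8200 * 10^2
    mantissa, exponent = fp_add((Fraction("0.9504"), 3), (Fraction("0.8200"), 2))
    print(float(mantissa), exponent)
```

In a pipelined unit each of these steps would be a separate segment, so that four different pairs of operands can be in progress at once.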

Figure 9-6 Pipeline for floating-point addition and subtraction. Inputs: exponents a, b and mantissas A, B, with registers (R) between segments. Segment 1: compare exponents by subtraction (produces the difference). Segment 2: choose exponent; align mantissas. Segment 3: add or subtract mantissas. Segment 4: adjust exponent; normalize result.

Instruction Pipeline Fetch the instruction from memory. Decode the instruction. Calculate the effective address. Fetch the operands from memory. Execute the instruction. Store the result in the proper place.

Figure 9-7 Four-segment CPU pipeline. Segment 1: fetch instruction from memory. Segment 2: decode instruction and calculate effective address; Branch? Segment 3: fetch operand from memory. Segment 4: execute instruction; Interrupt? On yes: interrupt handling, update PC, empty pipe. On no: continue with the next instruction.

Segments and their purpose. FI is the segment that fetches an instruction. DA is the segment that decodes the instruction and calculates the effective address. FO is the segment that fetches the operand. EX is the segment that executes the instruction.

Figure 9-8 Timing of instruction pipeline.

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1    FI  DA  FO  EX
            2        FI  DA  FO  EX
   (Branch) 3            FI  DA  FO  EX
            4                FI  --  --  FI  DA  FO  EX
            5                    --  --  --  FI  DA  FO  EX
            6                                    FI  DA  FO  EX
            7                                        FI  DA  FO  EX

Pipeline Conflicts. Resource conflicts. Data dependency conflicts. Branch difficulties.

Three-segment instruction pipeline I: Instruction fetch A: ALU operation E: Execute instruction

Delayed Load
LOAD:  R1 ← M[address 1]
LOAD:  R2 ← M[address 2]
ADD:   R3 ← R1 + R2
STORE: M[address 3] ← R3

Figure 9-9 Three-segment pipeline timing.

(a) Pipeline timing with data conflict
Clock cycles:    1   2   3   4   5   6
1. Load R1       I   A   E
2. Load R2           I   A   E
3. Add R1+R2             I   A   E
4. Store R3                  I   A   E

(b) Pipeline timing with delayed load
Clock cycles:    1   2   3   4   5   6   7
1. Load R1       I   A   E
2. Load R2           I   A   E
3. No-operation          I   A   E
4. Add R1+R2                 I   A   E
5. Store R3                      I   A   E
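The delayed-load fix can be sketched as a small pass over the instruction list. This is a simplified illustration, not an actual compiler: instructions are modeled as hypothetical (operation, destination, sources) tuples, and the function name `insert_delayed_load_nops` is made up for this example. It assumes the three-segment I, A, E pipeline, where a loaded value is not yet available to the instruction that immediately follows the load, so one no-operation slot is enough.

```python
NOP = ("NOP", None, ())


def insert_delayed_load_nops(program):
    """Insert a no-operation after a LOAD whose destination register
    is read by the instruction that immediately follows it."""
    out = []
    for instr in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            _, _, sources = instr
            if prev_op == "LOAD" and prev_dest in sources:
                out.append(NOP)  # give the load time to deliver its data
        out.append(instr)
    return out


# The delayed-load example: R3 <- M[address 1] + M[address 2]
program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]

if __name__ == "__main__":
    for op, dest, sources in insert_delayed_load_nops(program):
        print(op, dest, sources)
```

For this program the only conflict is between the second load and the add, so a single no-operation is placed between them.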

Figure 9-10 Examples of delayed branch. (a) Using no-operation instructions

Clock cycles:         1   2   3   4   5   6   7   8   9   10
1. Load               I   A   E
2. Increment              I   A   E
3. Add                        I   A   E
4. Subtract                       I   A   E
5. Branch to X                        I   A   E
6. No-operation                           I   A   E
7. No-operation                               I   A   E
8. Instruction in X                               I   A   E

Figure 9-10 Examples of delayed branch. (b) Rearranging the instructions

Clock cycles:         1   2   3   4   5   6   7   8
1. Load               I   A   E
2. Increment              I   A   E
3. Branch to X                I   A   E
4. Add                            I   A   E
5. Subtract                           I   A   E
6. Instruction in X                       I   A   E

Applications of Vector Processing. Long-range weather forecasting. Petroleum exploration. Seismic data analysis. Medical diagnosis. Aerodynamics and space flight simulations.

Figure 9-11 Instruction format for vector processor. Fields: Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length

Figure 9-12 Pipeline for calculating an inner product. Source A and Source B feed the multiplier pipeline, whose output feeds the adder.
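The inner-product data flow can be sketched in software as a stream of products feeding an accumulator. The version below makes one assumption that is not stated on the slide: the adder is itself pipelined with `adder_segments` stages (4 here, purely as an illustration), so it keeps that many interleaved partial sums in flight and combines them at the end. The function name `inner_product` is hypothetical.

```python
def inner_product(a, b, adder_segments=4):
    """Compute sum(a[k] * b[k]) using interleaved partial sums.

    Each product goes to partial sum number k mod adder_segments, which
    mirrors an adder pipeline holding several running sums at once.
    """
    partial = [0] * adder_segments
    for k, (x, y) in enumerate(zip(a, b)):
        product = x * y                          # output of the multiplier pipeline
        partial[k % adder_segments] += product   # accumulate in the adder
    return sum(partial)                          # final reduction of the partial sums


if __name__ == "__main__":
    print(inner_product([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

Keeping several partial sums avoids waiting for each addition to leave the adder before the next product can be accepted.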

Figure 9-13 Multiple module memory organization AR DR Memory array Address bus Data bus

Types of Array Processors Attached Array Processor SIMD Array Processor

Figure 9-14 Attached array processor with host computer. General-purpose computer, input-output interface, attached array processor. Main memory and local memory are connected by a high-speed memory-to-memory bus.

Figure 9-15 SIMD array processor organization Master control unit Main memory PE1 PE2 PE3 PEn M1 M2 M3 Mn