CONCEPTS-1 (Computer Engg, IIT(BHU), 3/12/2013)

Pipelining Pipelining is used to increase the speed of processing. It uses temporal parallelism. In pipelining, a computation is divided into a number of steps called stages. Each stage works at full speed on a particular part of the computation.

Pipelining Often the output of one stage becomes the input to the next stage. When all stages work at the same speed and the pipe is full, the work rate of the pipeline equals the sum of the work rates of the stages.
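As a rough sketch (Python, not part of the original slides), the ideal timing of a k-stage pipeline can be worked out as follows: the pipe needs k cycles to fill, then finishes one item per cycle, giving a speedup of nk/(k + n − 1) over non-pipelined execution.

```python
def pipeline_cycles(n_items, k_stages):
    """Cycles to push n items through a k-stage pipeline (ideal case)."""
    return k_stages + n_items - 1

def speedup(n_items, k_stages):
    """Speedup over non-pipelined execution (k cycles per item)."""
    return (n_items * k_stages) / pipeline_cycles(n_items, k_stages)

print(pipeline_cycles(100, 4))    # 103 cycles instead of 400
print(round(speedup(100, 4), 2))  # 3.88, approaching k = 4 as n grows
```

Note how the speedup approaches the number of stages only for long streams of work; for short bursts the fill time dominates.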

An Example There are 3 stages: A, B and C. In time slot T3, all the stages can operate simultaneously because they work on different parts of the computation. Situations where this does not hold lead to challenges for pipelining to work.

Instruction Pipeline F: Fetch instruction. D: Decode instruction. Ex: Execute instruction. W: Write the result back (to a register or memory).
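A small simulation (Python, illustrative only) of this four-stage instruction pipeline shows which stage each instruction occupies in each clock cycle; instruction i enters F at cycle i:

```python
STAGES = ["F", "D", "Ex", "W"]

def timing(n_instr):
    """Map each clock cycle to the (instruction, stage) pairs active in it."""
    table = {}
    for i in range(n_instr):                 # instruction i enters F at cycle i
        for s, stage in enumerate(STAGES):   # and occupies stage s at cycle i + s
            table.setdefault(i + s, []).append((f"I{i+1}", stage))
    return table

for cycle, active in sorted(timing(3).items()):
    print(f"cycle {cycle + 1}: {active}")
```

In cycle 3, for example, I1 is in Ex, I2 in D and I3 in F, all at once, which is exactly the overlap pipelining exploits.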

Non-Ideal Situations It is not always possible to break up an instruction's execution into stages that take the same time. Successive instructions are not always independent. There may also be resource constraints due to the limited size of the chip.
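To put a number on the second point, here is a back-of-the-envelope sketch (Python; the 2-cycle stall penalty and the no-forwarding assumption are illustrative, not from the slides) of cycles lost when an instruction reads a value written by its predecessor:

```python
def cycles_with_stalls(deps, k=4, stall=2):
    """Cycles for a k-stage pipeline where deps[i] is True when
    instruction i depends on instruction i-1 (each such dependence
    assumed to cost `stall` extra cycles with no forwarding)."""
    n = len(deps)
    return (k + n - 1) + stall * sum(deps)

print(cycles_with_stalls([False, False, False, False]))  # 7: ideal
print(cycles_with_stalls([False, True, False, True]))    # 11: two stalls
```

Real processors reduce this cost with operand forwarding, but the sketch shows why dependent instructions erode the ideal speedup.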

Making a Parallel Program Important concepts: Task: a piece of work of a parallel program that cannot be decomposed further. Process: an abstract entity that performs the tasks assigned to it. We first write a program in terms of processes and then map the processes to processors.

Making a Parallel Program (Contd.) Groups of tasks are mapped to different processors.

An Example: Merge Sort using two Processors:
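The two-processor merge sort can be sketched as follows (Python; a thread pool stands in for the two processors, which is a simplification):

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(data):
    """Sort each half on its own worker ('processor'), then merge."""
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(sorted, [data[:mid], data[mid:]])
    return merge(left, right)

print(parallel_merge_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```

Sorting the two halves is the part mapped to the two processors; the final merge is inherently sequential, which limits the achievable speedup.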

Mapping Decides which process will run on which processor. Takes the processor architecture into account. Also considers load-balancing and fault-tolerance issues.

Dependency Analysis To decompose an application into tasks, we look for dependencies among the statements of the application's sequential program. There are mainly two types of dependency: - Data dependency - Control dependency
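Data dependencies between two statements can be classified mechanically from their read and write sets; this small sketch (Python, with the standard flow/anti/output terminology, which goes beyond the slides) illustrates the idea:

```python
def data_dependences(w1, r1, w2, r2):
    """Classify dependences from statement S1 (writes w1, reads r1)
    to a later statement S2 (writes w2, reads r2)."""
    deps = []
    if w1 & r2: deps.append("flow (RAW)")    # S2 reads what S1 wrote
    if r1 & w2: deps.append("anti (WAR)")    # S2 overwrites what S1 read
    if w1 & w2: deps.append("output (WAW)")  # both write the same location
    return deps

# S1: a = b + c    S2: d = a * 2   ->  S2 reads the 'a' written by S1
print(data_dependences({"a"}, {"b", "c"}, {"d"}, {"a"}))  # ['flow (RAW)']
```

Statements with no dependence between them are candidates for separate tasks.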

Super Pipelining In pipelining we assume that each stage takes one clock cycle, but this holds only in the ideal case. In practice some pipeline stages take less than one clock time. So we can divide each clock cycle into two phases and allocate stages to these shorter intervals. Pipelining becomes faster if the two phases need different resources (to avoid resource conflicts); this is the notion of super-pipelining.
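The throughput effect can be quantified with a small sketch (Python; it assumes the phases use different resources and ignores latch overhead, so it is an upper bound, not a hardware model):

```python
def superpipelined_throughput(clock_ns, phases):
    """Results completed per nanosecond when each clock cycle is
    split into `phases` equal intervals (ideal, no overhead)."""
    return phases / clock_ns

print(superpipelined_throughput(2.0, 1))  # 0.5 per ns: plain pipeline
print(superpipelined_throughput(2.0, 2))  # 1.0 per ns: two phases per clock
```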

Superscalar Processing In superscalar processing, data parallelism and temporal parallelism are combined to increase the speed of the processor. This is achieved by issuing more than one instruction in each clock cycle. The hardware should be able to fetch several instructions at a time.

Superscalar Processing

Vector Processors Specially designed to perform vector operations. A vector operation involves a large array of operands, i.e., the same operation is performed on many different data elements. Excellent compilers are available for vector code written in a programming language.

Register Based Vector Operation (An Example) Let * be a vector operator, and let V1 and V2 be vector operands stored in registers. V3 ← V1 * V2. V3 may be a vector or a scalar. The vector length should be equal across all vector operands.
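The element-by-element semantics of V3 ← V1 * V2 can be sketched in a few lines (Python; real vector registers hold the operands in hardware, so this is only a functional model):

```python
import operator

def vector_op(op, v1, v2):
    """Apply a binary operation element by element: V3 <- V1 op V2."""
    assert len(v1) == len(v2), "vector lengths must be equal"
    return [op(a, b) for a, b in zip(v1, v2)]

print(vector_op(operator.mul, [1, 2, 3], [4, 5, 6]))  # [4, 10, 18]
```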

Array Processors A vector processor works by streaming the vectors through a pipelined unit. Another architecture for performing vector operations is to use an array of n Processing Elements (PEs). Each PE stores a pair of operands. The operation is broadcast to all PEs simultaneously. Such an organization of PEs is called an array processor.
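The broadcast model can be sketched as follows (Python; the PEs actually operate simultaneously, which a sequential comprehension can only simulate):

```python
def broadcast(op, pes):
    """Broadcast one operation to every PE's stored operand pair;
    each PE applies it to its own data (simulated sequentially here)."""
    return [op(a, b) for a, b in pes]

pes = [(1, 4), (2, 5), (3, 6)]             # each PE holds one operand pair
print(broadcast(lambda a, b: a + b, pes))  # [5, 7, 9]
```

The contrast with the pipelined vector unit is where the parallelism lives: the array processor replicates hardware across space, the vector pipeline overlaps work in time.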

VLIW Architecture In superscalar processing, it is quite difficult to duplicate the instruction register, decoder and arithmetic unit. A VLIW (Very Long Instruction Word) processor instead has instruction words hundreds of bits in length. Multiple functional units are used simultaneously in a VLIW processor: an integer unit, FP unit, branch unit, load/store unit, etc. The objective is to keep all these units busy.
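A VLIW word can be pictured as one slot per functional unit, filled in by the compiler; this sketch (Python; the unit names and assembly strings are illustrative, not a real encoding) shows the idea, including how empty slots become no-ops:

```python
UNITS = ["int", "fp", "branch", "load_store"]  # one slot per functional unit

def pack(ops):
    """Build one VLIW word: ops maps unit name -> operation; any unit
    the compiler could not fill gets an explicit 'nop'."""
    return tuple(ops.get(u, "nop") for u in UNITS)

word = pack({"int": "add r1,r2,r3", "load_store": "ld r4,0(r5)"})
print(word)  # ('add r1,r2,r3', 'nop', 'nop', 'ld r4,0(r5)')
```

The nop slots illustrate the last challenge below: when there is not enough instruction-level parallelism, much of the long word carries no useful work.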

A Typical VLIW Instruction Format

Major Challenges in Designing VLIW Processors Lack of sufficient instruction-level parallelism. Hardware complexity (high memory and register bandwidth are needed). Inefficient use of bits in a very long instruction word.