\course\eleg652-03F\Topic1a- 03F.ppt1 Vector and SIMD Computers Vector computers SIMD.

Presentation transcript:

\course\eleg652-03F\Topic1a- 03F.ppt1 Vector and SIMD Computers Vector computers SIMD

\course\eleg652-03F\Topic1a- 03F.ppt2 A vector processor is capable of adding two vectors by streaming the two vectors through a pipelined adder. Figure labels: Multiport Memory System; Stream A; Stream B; Pipelined Adder; Stream C = A + B.
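The streaming idea on this slide can be sketched as a small cycle-by-cycle simulation. This is only an illustration: the function name `pipelined_add` and the default of 4 stages are assumptions, not details from the slides. One operand pair enters per cycle and, once the pipeline is full, one result leaves per cycle.

```python
def pipelined_add(a, b, stages=4):
    """Simulate C = A + B streamed through a pipelined adder.

    `stages` is a hypothetical pipeline depth. Returns (c, cycles).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    pipeline = [None] * stages   # pipeline[0] is the first stage
    c = []
    cycles = 0
    next_in = 0
    while len(c) < n:
        cycles += 1
        # One new operand pair enters per cycle while input remains.
        if next_in < n:
            entering = a[next_in] + b[next_in]
            next_in += 1
        else:
            entering = None      # bubble while the pipeline drains
        # Every element in flight advances by one stage.
        pipeline = [entering] + pipeline[:-1]
        # Whatever sits in the last stage completes in this cycle.
        if pipeline[-1] is not None:
            c.append(pipeline[-1])
    return c, cycles


c, cycles = pipelined_add([1, 2, 3], [10, 20, 30], stages=4)
print(c, cycles)
```

In this model, n elements take n + stages - 1 cycles, so the startup cost is amortized over long vectors.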

\course\eleg652-03F\Topic1a- 03F.ppt3 Key to Performance: keeping up the bandwidth for C := A + B. Problem: RAM can support only 1 word/cycle, but the adder needs 3 memory references per cycle (two operands and one result).
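The mismatch can be put in numbers with a rough model. The helper names below are hypothetical, and the model assumes the adder could otherwise accept one operand pair per cycle.

```python
import math


def memory_bound_cycles(n, words_per_cycle=1, refs_per_element=3):
    """Cycles the memory needs to move all operands and results."""
    return math.ceil(n * refs_per_element / words_per_cycle)


def adder_utilization(n, words_per_cycle=1):
    """Fraction of cycles in which the adder can do useful work."""
    if n == 0:
        return 0.0
    # The adder alone would need n cycles; memory may need more.
    return n / max(n, memory_bound_cycles(n, words_per_cycle))


print(memory_bound_cycles(64))                   # memory cycles at 1 word/cycle
print(adder_utilization(64, words_per_cycle=1))  # adder mostly idle
print(adder_utilization(64, words_per_cycle=3))  # memory keeps up
```

Under these assumptions a 1 word/cycle memory keeps the adder busy only one third of the time, which is why the slide calls for a memory system that can sustain three references per cycle.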

\course\eleg652-03F\Topic1a- 03F.ppt4 Figure labels: MEMORY; intermediate “buffer” memory; Arithmetic pipeline. Multiple uses of each data item are favorable for bandwidth. Must avoid a bottleneck here!
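To see why reuse through a buffer helps, main-memory traffic can be counted for a chained computation such as D = (A + B) * C. Both the expression and the function names are illustrative assumptions, not taken from the slides.

```python
def traffic_memory_to_memory(n, num_ops):
    """Words moved when every operation reads 2 vectors from main
    memory and writes 1 vector back (no buffering)."""
    return 3 * n * num_ops


def traffic_with_buffer(n, num_inputs, num_outputs):
    """Words moved when intermediates stay in the buffer: each input
    is loaded once and each final output is stored once."""
    n_vectors = num_inputs + num_outputs
    return n * n_vectors


n = 64
# D = (A + B) * C: two operations, three inputs, one output.
print(traffic_memory_to_memory(n, num_ops=2))
print(traffic_with_buffer(n, num_inputs=3, num_outputs=1))
```

In this count the buffered version moves 4n words instead of 6n, because the intermediate A + B never travels to main memory and back.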

\course\eleg652-03F\Topic1a- 03F.ppt5 The Architecture of a Vector Computer. Figure blocks: Scalar Processor (Scalar Control Unit, Scalar Functional Pipelines); Vector Processor (Vector Control Unit, Vector Registers, Vector Func. Pipe.); Main Memory (Program and Data); Host Computer; Mass Storage; I/O (User). Figure paths: Instruction; Scalar Instructions; Vector Instructions; Scalar Data; Vector Data; Control.

\course\eleg652-03F\Topic1a- 03F.ppt6 SIMD Architectures

\course\eleg652-03F\Topic1a- 03F.ppt7 ILLIAC IV (Univ. of Illinois) + BSP. Objective: 10^9 op/sec with 256 PE + 4 CU. Achieved: 64 PE + 1 CU, x 10^6 op/sec. Applications: weather forecasting, nuclear engineering.

\course\eleg652-03F\Topic1a- 03F.ppt8
Function of CU
- store user program
- decode all instructions and determine where they are to be executed
- execute scalar instructions
- broadcast vector instructions
Function of PE: all PEs perform the same function
- lock-step
- masking scheme
- data routing
Function of interconnection network
- communication between PEs (data exchanges)
- data bus, broadcast from CU
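Lock-step execution with a masking scheme can be sketched as follows. This is a toy model: `simd_step` is a hypothetical name, and a Python list stands in for the array of PEs, each holding one local value.

```python
def simd_step(op, local_data, mask):
    """The CU broadcasts one operation; every PE whose mask bit is set
    applies it to its own local value, and masked-off PEs sit idle."""
    return [op(x) if enabled else x
            for x, enabled in zip(local_data, mask)]


data = [3, -1, 4, -5]             # one value per PE
mask = [x < 0 for x in data]      # enable only PEs holding a negative value
result = simd_step(lambda x: -x, data, mask)
print(result)                     # elementwise absolute value
```

The mask is how a single instruction stream expresses data-dependent branches: all PEs see the same instruction, and only the enabled ones change state.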

\course\eleg652-03F\Topic1a- 03F.ppt9 Configuration I (Illiac IV). Figure labels: CU; CU memory; Data & Instructions; Data bus; Control bus; Cont; PE 0, PE 1, ..., PE n-1, each with its own memory PEM 0, PEM 1, ..., PEM n-1; Interconnection network.

\course\eleg652-03F\Topic1a- 03F.ppt10 Configuration II (BSP). Figure labels: CU; CU memory; I/O; Data bus; Cont; PE 0, PE 1, ..., PE n-1; Alignment network; memory modules M_0, M_1, ..., M_{p-1}.

\course\eleg652-03F\Topic1a- 03F.ppt PE routing connections

\course\eleg652-03F\Topic1a- 03F.ppt (a) Electrical connectivity Layout for ILLIAC-IV

\course\eleg652-03F\Topic1a- 03F.ppt Shifts of 20 Shifts of 21 (b) The physical layout
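The slides name the routing connections without listing them. The sketch below assumes the pattern commonly documented for the 64-PE ILLIAC IV, in which PE i is wired to PEs i+1, i-1, i+8 and i-8 (mod 64); that pattern and the function names are assumptions, not content from the slides. A breadth-first search then gives the number of routing steps between any two PEs.

```python
from collections import deque

NUM_PES = 64   # assumed array size (8 x 8)
ROW = 8        # assumed row length


def neighbors(i, n=NUM_PES, row=ROW):
    """PEs directly reachable from PE i under the assumed pattern."""
    return [(i + 1) % n, (i - 1) % n, (i + row) % n, (i - row) % n]


def routing_steps(src, n=NUM_PES, row=ROW):
    """Breadth-first search: minimum routing steps from src to each PE."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        pe = queue.popleft()
        for nxt in neighbors(pe, n, row):
            if nxt not in dist:
                dist[nxt] = dist[pe] + 1
                queue.append(nxt)
    return dist


print(sorted(neighbors(0)))
print(max(routing_steps(0).values()))   # worst-case steps from PE 0
```

Under this assumed pattern, any PE can reach any other in at most 7 routing steps.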

\course\eleg652-03F\Topic1a- 03F.ppt14 The data flow and processor/memory structure of the Burroughs Scientific Processor (BSP). Figure labels: 17 Memories (M); Input Alignment Network (17 inputs, 16 outputs); 16 Processors (P); Output Alignment Network (16 inputs, 17 outputs).
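One way to see why 17 memories serve 16 processors well is to map addresses to memories by a simple modulo and check which strides cause conflicts. The modulo mapping and the function names below are simplifying assumptions, not the exact BSP addressing scheme.

```python
def banks_touched(start, stride, count=16, num_banks=17):
    """Memory modules hit by `count` accesses of a strided vector,
    assuming module = address mod num_banks."""
    return [(start + k * stride) % num_banks for k in range(count)]


def conflict_free(start, stride, count=16, num_banks=17):
    """True if all `count` accesses land in distinct modules."""
    banks = banks_touched(start, stride, count, num_banks)
    return len(set(banks)) == len(banks)


# With a prime number of modules, every stride from 1 to 16 is conflict-free.
print(all(conflict_free(0, s) for s in range(1, 17)))
# With 16 modules, even strides collide.
print(conflict_free(0, 2, num_banks=16))
```

Because 17 is prime, 16 processors can fetch rows, columns, or diagonals of an array in one memory cycle under this mapping, as long as the stride is not a multiple of 17.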

\course\eleg652-03F\Topic1a- 03F.ppt15 Mesh connected, multi-dimensional (cont’d). ICL DAP: ICL 2980 host; 2D nearest-neighbor connection; 64 x 64 (4096 PEs); 16 PE/board. AMT VLSI DAP 500, DAP 510: 32 x 32 array; 64 PE/chip; logic in memory.

\course\eleg652-03F\Topic1a- 03F.ppt16 The AMT DAP 500. Figure labels: User Interface; Host Connection Unit; Master Control Unit; Code Memory; Processor Elements with registers O (accumulator), A (activity control), C (carry), D (data); Array Memory (32, 32K bits); Fast Data Channel.

\course\eleg652-03F\Topic1a- 03F.ppt17
(a) Multivector track: CDC 7600 (CDC, 1970); CDC Cyber 205 (Levine, 1982); Cray 1 (Russell, 1978); ETA 10 (ETA, Inc. 1989); Cray Y-MP (Cray Research, 1989); Cray/MPP (Cray Research, 1993); Fujitsu, NEC, Hitachi Models
(b) SIMD track: Illiac IV (Barnes et al, 1968); Goodyear MPP (Batcher, 1980); BSP (Kuck and Stokes, 1982); DAP 610 (AMT, Inc. 1987); CM2 (TMC, 1990); MasPar MP1 (Nickolls, 1990); IBM GF/11 (Beetem et al, 1985)