SIMD Architectures
Laxmi Narayan Bhuyan
http://www.cs.ucr.edu/~bhuyan

Data Parallel Model
Operations can be performed in parallel on each element of a large regular data structure, such as an array.
One control processor broadcasts each instruction to many PEs (see Ch. 1, Fig. 1-25, page 45 of [CSG99]).
When computers were large, the cost of the control portion could be amortized over many replicated PEs.
A condition flag per PE lets that PE skip an instruction.
Data is distributed across the PE memories.
Early 1980s VLSI brought a SIMD rebirth: 32 1-bit PEs plus memory on a chip was the PE.
Data parallel programming languages lay out data to processors.
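The broadcast-plus-condition-flag idea can be sketched in a few lines of Python. This is only an illustrative simulation: the `PE` class and the `broadcast` and `set_flags` helpers are hypothetical names, and the loop stands in for what the hardware does in lockstep.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """One processing element: a local data value and a condition flag."""
    value: int
    flag: bool = True  # True means the PE takes part in the next instruction


def broadcast(pes, op):
    """Control processor broadcasts one operation; flagged-off PEs skip it."""
    for pe in pes:  # hardware does this in lockstep; the loop only simulates it
        if pe.flag:
            pe.value = op(pe.value)


def set_flags(pes, predicate):
    """Each PE evaluates the same predicate on its own local data."""
    for pe in pes:
        pe.flag = predicate(pe.value)


pes = [PE(v) for v in [3, -1, 4, -5]]
set_flags(pes, lambda v: v < 0)   # only PEs holding negative data stay active
broadcast(pes, lambda v: -v)      # negate on active PEs: abs() over the array
result = [pe.value for pe in pes]
```

Here a data-dependent branch becomes two broadcast steps: one that sets the flags and one that only the flagged PEs execute.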

Data Parallel Model
Vector processors have similar ISAs but no data placement restriction.
SIMD led to data parallel programming languages.
Advancing VLSI led to single-chip FPUs and whole fast microprocessors, which made SIMD less attractive.
The SIMD programming model led to the Single Program Multiple Data (SPMD) model, in which all processors execute an identical program.
Data parallel programming languages are still useful: they do communication all at once, in "bulk synchronous" phases in which all processors communicate after a global barrier.
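A minimal sketch of the SPMD style with bulk synchronous phases, assuming Python threads stand in for processors and `threading.Barrier` stands in for the global barrier. The names `program`, `inbox`, and `NUM_PROCS` are illustrative, not taken from the slides.

```python
import threading

NUM_PROCS = 4                      # assumed processor count for the sketch
data = [1, 2, 3, 4]                # one element owned by each processor
inbox = [0] * NUM_PROCS            # values received from the left neighbor
barrier = threading.Barrier(NUM_PROCS)


def program(rank):
    """The single program; every processor runs it with its own rank."""
    # Compute phase: purely local work.
    local = data[rank] * 10
    barrier.wait()                 # global barrier ends the compute phase
    # Communication phase: everyone sends at once to the right neighbor.
    inbox[(rank + 1) % NUM_PROCS] = local
    barrier.wait()                 # all messages are delivered before reading
    data[rank] = local + inbox[rank]


threads = [threading.Thread(target=program, args=(r,)) for r in range(NUM_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread runs the same function and differs only in its rank, and no thread reads its inbox until the second barrier guarantees that all sends have completed.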

SIMD Programming – High-Performance Fortran (HPF)
Single Program Multiple Data (SPMD).
The FORALL construct is similar to Fork:
    FORALL (I=1:N)
        A(I) = B(I) + C(I)
    END FORALL
Data mapping in HPF has two goals:
1. To reduce interprocessor communication
2. Load balancing among processors
http://www.npac.syr.edu/hpfa/
http://www.crpc.rice.edu/HPFF/
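To make the two ideas concrete, here is a rough Python analogue of the FORALL statement together with two common data mappings, block and cyclic, which correspond to HPF's BLOCK and CYCLIC distributions. The function names are hypothetical, and indices are 0-based here rather than 1-based as in Fortran.

```python
def forall_add(b, c):
    """Analogue of FORALL (I=1:N) A(I) = B(I) + C(I): each element is independent."""
    return [bi + ci for bi, ci in zip(b, c)]


def block_owner(i, n, p):
    """Owner of 0-based index i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division gives the block size
    return i // block


def cyclic_owner(i, p):
    """Owner of 0-based index i when elements are dealt out round-robin."""
    return i % p


a = forall_add([1, 2, 3, 4], [10, 20, 30, 40])
block_map = [block_owner(i, 8, 4) for i in range(8)]
cyclic_map = [cyclic_owner(i, 4) for i in range(8)]
```

A block mapping keeps neighboring elements on the same processor, which tends to reduce communication for nearest-neighbor access, while a cyclic mapping spreads work more evenly when the cost per element varies with the index.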

How does an SIMD computer work?
A host computer is necessary to do the I/O operations.
The user program is loaded into the control memory.
The data is distributed to all the memory modules.
The control unit (CU) decodes each instruction and executes it itself if it is a scalar instruction.
If it is a vector instruction, the CU broadcasts the control signals to the PEs, which carry out the execution.
Before broadcasting the control signals, the CU broadcasts an enable vector that selects which PEs take part.
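The control flow described above can be sketched as a tiny interpreter. The instruction names (`scalar_add`, `set_enable`, `vector_add`) are invented for this sketch and do not correspond to any real machine's instruction set.

```python
def run(program, pe_memory):
    """Simulate the CU: execute scalar instructions itself, broadcast vector ones."""
    scalar_acc = 0
    enable = [True] * len(pe_memory)      # enable vector, one bit per PE
    for kind, arg in program:
        if kind == "scalar_add":          # scalar instruction: runs in the CU
            scalar_acc += arg
        elif kind == "set_enable":        # CU broadcasts a new enable vector
            enable = list(arg)
        elif kind == "vector_add":        # vector instruction: broadcast to PEs
            for i, on in enumerate(enable):
                if on:                    # disabled PEs ignore the broadcast
                    pe_memory[i] += arg
    return scalar_acc, pe_memory


program = [
    ("scalar_add", 5),
    ("set_enable", [True, False, True, False]),
    ("vector_add", 100),
]
acc, mem = run(program, [1, 2, 3, 4])
```

Only the PEs enabled by the most recent enable vector apply the broadcast `vector_add`; the scalar instruction never reaches the PEs at all.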

Masking and Data Routing Mechanisms
A, B, C: working registers
Si: status flag (1 active, 0 inactive)
Ri: data routing register
Di: holds address
Ii: index register
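A small sketch of how the status flags and routing registers might interact, with one list entry per PE. It assumes that the mask applies to the sending PE during routing and that routing goes to the right-hand neighbor with wraparound; both are assumptions made for illustration, and the helper names are hypothetical.

```python
def route_right(R, S):
    """Each active PE i sends R[i] to PE (i+1) mod N; other R values are kept."""
    n = len(R)
    new_R = list(R)
    for i in range(n):
        if S[i]:                          # masked-off PEs do not send
            new_R[(i + 1) % n] = R[i]
    return new_R


def masked_add(A, R, S):
    """A[i] <- A[i] + R[i] only on PEs whose status flag S[i] is 1."""
    return [a + r if s else a for a, r, s in zip(A, R, S)]


A = [1, 2, 3, 4]
S = [1, 1, 1, 0]                          # PE 3 is inactive
R = route_right(list(A), S)               # load R from A, then route one step
A = masked_add(A, R, S)
```

PE 3 is masked off, so it neither sends its value to PE 0 nor updates its own A register.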

Example

Matrix Multiplication

N * N Mesh
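One well-known way to multiply two N × N matrices on an N × N mesh with wraparound links is Cannon's algorithm, sketched below with one matrix element per PE. It is offered as an illustration of the idea and may differ from the algorithm worked through in the slides; the function name is hypothetical and the nested loops simulate what all PEs would do simultaneously.

```python
def cannon_matmul(A, B):
    """Multiply n x n matrices on a simulated n x n wraparound mesh."""
    n = len(A)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Compute step: every PE multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
        # Routing step: A moves one PE left, B moves one PE up, with wraparound.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


product = cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

After the skew, PE (i, j) holds a matching pair A[i][k] and B[k][j] at every step, so N compute-and-route steps produce the full product using only nearest-neighbor communication.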

The Illiac IV Architecture
Distributed memory architecture.
64 PEs connected as an 8×8 2-D mesh with end-around connections.
LDB: Local Data Buffer, 64 entries of 64 bits each.
PEM: PE memory, 2K × 64 bits.

The Illiac IV Network
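In the usual textbook description of this network, PE i is linked to PEs i+1, i-1, i+8, and i-8, all taken modulo 64. A short sketch of that neighbor function follows; the function name is hypothetical.

```python
N = 64   # number of PEs
R = 8    # side length of the 8 x 8 mesh


def illiac_neighbors(i):
    """Return the four PEs directly connected to PE i (0-based numbering)."""
    return [(i + 1) % N, (i - 1) % N, (i + R) % N, (i - R) % N]


first = illiac_neighbors(0)
last = illiac_neighbors(63)
```

Because the horizontal links are taken modulo 64 rather than modulo 8, the last PE of one row connects to the first PE of the next row.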

MasPar MP-1 Architecture
Configurations with 1K to 16K PEs are available.
Each PE has a 4-bit ALU, a 1-bit logic unit, a 64-bit mantissa unit, a 16-bit exponent unit, and communication input and output ports.
Each PE has 40 32-bit registers available to the programmer.
Each processor board has 1024 PEs arranged as 64 PE clusters (PECs) with 16 PEs per cluster.
Each PEC is a chip connected to 8 neighbors via an octagonal mesh.
Another network, called the Multistage Crossbar Network, has three router stages and provides the function of a 1024×1024 crossbar for routing from any PEC to another PEC.
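The board and cluster arithmetic above can be checked with a short sketch. The linear PE numbering used here (board first, then cluster, then slot within the cluster) is an assumption made for illustration and is not taken from MasPar documentation.

```python
PES_PER_CLUSTER = 16
CLUSTERS_PER_BOARD = 64
PES_PER_BOARD = PES_PER_CLUSTER * CLUSTERS_PER_BOARD  # 1024 PEs per board


def locate(pe):
    """Map an assumed linear PE number to (board, cluster, slot in cluster)."""
    board, rest = divmod(pe, PES_PER_BOARD)
    cluster, slot = divmod(rest, PES_PER_CLUSTER)
    return board, cluster, slot


def boards_needed(num_pes):
    """Number of processor boards required for a given PE count."""
    return -(-num_pes // PES_PER_BOARD)  # ceiling division


where = locate(1040)
smallest = boards_needed(1024)
largest = boards_needed(16 * 1024)
```

Under these figures the 1K configuration fits on one board and the 16K configuration needs sixteen.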