Array computers. A Single Instruction Stream, Multiple Data Streams (SIMD) computer. There are two general structures of array processors: shared-memory SIMD and distributed-memory SIMD.

Presentation transcript:

Array computers

Single Instruction Stream, Multiple Data Streams (SIMD) computer
There are two general structures of array processors:
Distributed-memory SIMD
Shared-memory SIMD

Array computers
Shared-memory SIMD (figure legend):
CP – Control Processor
CPMM – Control Processor Memory Module
PE – Processing Element
MM – Memory Module
IS – Instruction Stream
DS – Data Stream

Array computers
Distributed-memory SIMD (figure legend):
CP – Control Processor
CPMM – Control Processor Memory Module
PE – Processing Element
MM – Memory Module
IS – Instruction Stream
DS – Data Stream

Array computers
Interconnection networks
Static interconnection topologies:
2-D mesh
Star
Tree
Ring
2-D torus
Illiac mesh (in a minute...)

Array computers
Interconnection networks
Static interconnection topologies (continued): Hypercube
Network size – 16 nodes
Node degree – 4 links
Network diameter – at most 4 links need to be traversed between any 2 nodes
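These hypercube numbers follow from labeling the 16 nodes with 4-bit addresses and linking two nodes exactly when their labels differ in one bit. A minimal Python sketch (my own illustration, not from the slides) checks the degree and diameter:

```python
# In a d-dimensional hypercube, each of the 2**d nodes connects to the d
# nodes whose labels differ in exactly one bit, so node degree = d and
# network diameter (worst-case hop count) = d.

def hypercube_neighbors(node, d):
    """Labels of the d nodes directly linked to `node`."""
    return [node ^ (1 << bit) for bit in range(d)]

def hypercube_distance(a, b):
    """Minimum links between two nodes = Hamming distance of the labels."""
    return bin(a ^ b).count("1")

d = 4                       # 16-node hypercube, as on the slide
nodes = range(2 ** d)
degree = len(hypercube_neighbors(0, d))
diameter = max(hypercube_distance(0, n) for n in nodes)
print(degree, diameter)     # both equal d = 4
```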

Array computers
Interconnection networks
Dynamic interconnection topologies (various solutions), three main categories:
Buses
Crossbar
Multistage switches
Example: 8x8 Omega network with 2x2 switches (figure shows the possible switch connections)
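Destination-tag routing through an Omega network can be sketched as follows; this simulation is my own illustration (names and structure assumed, not from the slides). Each of the log2(8) = 3 stages performs a perfect shuffle of the port addresses and then lets a 2x2 switch pick its upper or lower output from the next destination bit:

```python
# Destination-tag routing through an 8x8 Omega network of 2x2 switches.
# Each stage applies a perfect shuffle (rotate the 3 address bits left),
# then the switch sets the low address bit from the destination address.

N_BITS = 3                      # 8 inputs/outputs

def shuffle(addr):
    """Perfect shuffle: rotate the 3-bit address left by one."""
    msb = (addr >> (N_BITS - 1)) & 1
    return ((addr << 1) | msb) & ((1 << N_BITS) - 1)

def omega_route(src, dst):
    """Ports visited from input `src` to output `dst`, one per stage."""
    addr, path = src, []
    for stage in range(N_BITS):
        addr = shuffle(addr)
        # The 2x2 switch forwards to its upper (0) or lower (1) output
        # according to the next destination bit, MSB first.
        dst_bit = (dst >> (N_BITS - 1 - stage)) & 1
        addr = (addr & ~1) | dst_bit
        path.append(addr)
    return path

print(omega_route(5, 3))        # the final port is always the destination
```

After three stages every source bit has been rotated out and replaced by a destination bit, which is why this self-routing scheme always terminates at the requested output.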

Array computers
Illiac IV
In 1966 the Department of Defense's Advanced Research Projects Agency contracted the University of Illinois to build a large parallel processing computer, the Illiac IV
It did not become operational until 1976, at NASA's Ames Research Center

Array computers
Illiac IV
One of the most infamous supercomputers ever
It used early ideas on SIMD
The project started in 1965; it used 64 processors and a 13 MHz clock
In 1976 it ran its first successful application
It had 1 MB of memory (64 x 16 KB)
Its actual performance was 15 MFLOPS; initial predictions estimated 1000 MFLOPS
Only a quarter of the fully planned machine was ever built; costs escalated from the $8 million estimated in 1966 to $31 million by 1972, and the computer took three more years of engineering before it was operational

Array computers
Illiac IV – functional structure
Functional structure of the Illiac IV system (figure): the Illiac IV array (array processor: control unit, 64 PEs, 64 PEMs) and the Illiac IV I/O side (I/O subsystem, in/out switch, buffer, control descriptor, disk file system), attached to a B-6500 general-purpose computer with peripherals

Array computers
Illiac IV – general structure
In the original design four arrays of processing elements were planned
Only one of the four arrays was actually built

Illiac IV – control unit
Instruction buffer (128)
Local data buffer (64 64-bit words)
4 64-bit accumulator registers
64-bit arithmetic logic unit
24-bit address adder
Queue of addresses and data sent to processing elements

Array computers
Illiac IV – routing network (figure)

Illiac IV – processing element
Accumulator (RGA, 64-bit)
B register (RGB, 64-bit) – holds the second operand in a binary operation (such as ADD, SUBTRACT, MULTIPLY, or DIVIDE)
Routing register (RGR, 64-bit) – used to transmit information from one PE to another
S register (RGS, 64-bit) – temporary storage register
Index register (RGX, 16-bit) – used to modify the address field of an instruction
Mode register (RGD, 8-bit) – controls the active or inactive status of each PE independently

Array computers
Illiac IV – processing element memory
Each PE has its own 2048-word random-access memory, with 64 bits per word
Each memory is called a PEM, and they are numbered 0 through 63; a PE and its PEM taken together are called a processing unit, or PU
PE i may only access PEM i, so one PU cannot modify the memory of another PU
Information can, however, be passed from one PU to another via the routing network, which is one of the 4 paths by which data flow through the Illiac IV array

Array computers
Illiac IV – data paths
Data paths in the array of Illiac IV (figure)

Array computers
Illiac IV – data paths
Control Unit Bus (CUB)
Instructions or data from the PEMs can be sent to the CU via the CU bus in blocks of eight words
The operating system takes care of fetching and executing instructions; data can also be fetched in blocks of eight words under program control using the CU bus
Common Data Bus (CDB)
Information stored in the CU can be "broadcast" to the entire 64-PE array simultaneously via the CDB
A value such as a constant to be used as a multiplier need not be stored 64 times, once in each PEM; instead the value can be stored in a CU register and then broadcast to each enabled PE in the array
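The CDB broadcast idea can be sketched in a few lines of Python (my own illustration with assumed names, not actual Illiac IV code): one value held in a CU register reaches every enabled PE at once, while disabled PEs ignore it.

```python
# A multiplier held once in a CU register is broadcast over the Common
# Data Bus to every *enabled* PE, instead of being stored 64 times,
# once in each PEM.

NUM_PES = 64
rga = [float(i) for i in range(NUM_PES)]   # each PE's accumulator (RGA)
enabled = [True] * NUM_PES                 # on/off mode bit from each RGD
enabled[3] = False                         # a disabled PE ignores the CDB

def broadcast_multiply(cu_value):
    """CU broadcasts one value; all enabled PEs apply it simultaneously."""
    for pe in range(NUM_PES):
        if enabled[pe]:
            rga[pe] *= cu_value

broadcast_multiply(2.0)
print(rga[0], rga[3], rga[4])              # 0.0 3.0 8.0
```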

Array computers
Illiac IV – data paths
Routing network
Information in one PE register can be sent to another PE register by special routing instructions (information can be transferred between a PE register and a PEM by standard LOAD or STORE instructions)
High-speed routing lines run between every RGR of every PE and its nearest left and right neighbors (distances of -1 and +1, respectively) and its neighbors 8 positions to the left and 8 positions to the right (-8 and +8, respectively)
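A ROUTE can be modeled as an end-around shift of all 64 RGR contents; the sketch below is my own illustration (assumed behavior, not Illiac IV code) showing how a distance not directly wired, such as +9, is composed from the +8 and +1 steps.

```python
# A ROUTE of distance r moves the contents of every PE's routing register
# (RGR) r positions around the 64-PE ring.  Only r in {-8, -1, +1, +8}
# is wired directly; other distances are composed from these step sizes.

NUM_PES = 64

def route(rgr, distance):
    """RGR contents after every PE forwards to the neighbor `distance` away."""
    return [rgr[(i - distance) % NUM_PES] for i in range(NUM_PES)]

rgr = list(range(NUM_PES))
moved = route(route(rgr, 8), 1)   # distance +9 composed as +8 then +1
print(moved[9])                   # the value that started in PE 0
```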

Array computers
Illiac IV – data paths
Mode-bit line
The mode-bit line consists of one line coming from the RGD of each PE in the array
It can transmit one of the eight mode bits of each RGD in the array up to an ACAR in the CU
If this bit is the one which indicates whether a PE is on or off, a "mode pattern" reflecting the status of each PE in the array can be transmitted to an ACAR
There are instructions, executed completely within the CU, that can test this mode pattern and branch on a zero or nonzero condition
In this way branching in the instruction stream can occur based on the mode pattern of the entire 64-PE array
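The mode-pattern test can be sketched as follows (my own illustration with assumed names): one bit per PE is packed into a single CU word, and a CU-only branch then inspects whether that word is zero.

```python
# The mode-bit line gathers one RGD bit from every PE into a 64-bit
# "mode pattern" in a CU register (ACAR); CU-only instructions can then
# branch on whether the whole pattern is zero or nonzero.

NUM_PES = 64

def mode_pattern(mode_bits):
    """Pack one on/off bit per PE into a single CU word, PE 0 = bit 0."""
    word = 0
    for pe, bit in enumerate(mode_bits):
        word |= (bit & 1) << pe
    return word

mode_bits = [0] * NUM_PES
mode_bits[5] = 1                  # only PE 5 is still enabled
pattern = mode_pattern(mode_bits)
if pattern != 0:                  # branch taken in the CU instruction stream
    print("at least one PE is still enabled")
```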

Array computers
Programming the Illiac IV
Hardware-first design, hence relatively poor software support
Rudimentary operating system (for example, no support for disk management)
GLYPNIR (an ALGOL derivative) – provided statements which identified vector arithmetic suitable for parallel implementation
CFD (Computational Fluid Dynamics) – a Fortran-based language written for work at NASA
None of these languages hid the architecture of the machine very well, but they allowed the development of machine-specific programs in a high-level language

Examples of algorithms executed on array computers
'*' in an array subscript indicates that all the elements of a vector are to be operated upon simultaneously
The programmer needs to be aware that only 64 processors are available, and so needs to structure the computation to make use of this
The compiler will split a computation into a number of 64-element iterations of the instruction where required

Examples of algorithms executed on array computers

      J=1
      DO 1 I=1,3
      A(*) = A(*) + A(* + J)
      J = J * 2
    1 CONTINUE

(The accompanying figure shows the array contents at Time 0 through Time 3.)
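A plain-Python rendering of this loop (my own illustration; the eight sample data values are assumed) makes the simultaneous-update semantics explicit: the right-hand side of A(*) = A(*) + A(* + J) must read the old contents of A.

```python
# Each iteration adds, in parallel, the element J positions to the right,
# with J doubling each time, so after 3 steps the first element holds the
# sum of all eight values.

a = [1, 2, 3, 4, 5, 6, 7, 8]     # assumed sample data
j = 1
for _ in range(3):
    # All elements update simultaneously, so build the new vector from
    # the old one.  Out-of-range elements are treated as contributing
    # nothing (an assumption of this sketch).
    a = [a[i] + (a[i + j] if i + j < len(a) else 0) for i in range(len(a))]
    j *= 2
print(a[0])                       # sum of the original eight values: 36
```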

Decoupling sequential code

      DO 10 I = 2,64
   10 A(I) = B(I) + A(I-1)

This expands to 63 statements:
A(2) = B(2) + A(1)
A(3) = B(3) + A(2)
...
Substituting each result into the next:
A(2) = B(2) + A(1)
A(3) = B(3) + A(2) = B(3) + B(2) + A(1)
A(4) = B(4) + A(3) = B(4) + B(3) + B(2) + A(1)
...
A(N) = B(N) + B(N-1) + ... + B(2) + A(1)

Decoupling sequential code

      S = A(1)
      DO 10 N = 2,8
      S = S + B(N)
   10 A(N) = S

(The figure shows A(1), B(2), ..., B(8) distributed across PE 0 through PE 7, and the partial sums formed in Step 1, Step 2, and Step 3.)

Decoupling sequential code
1. Enable all PEs (turn ON all PEs)
2. All PEs LOAD RGA from location α
3. i ← 0
4. All PEs LOAD RGR from their RGA (this instruction is performed by all PEs, whether they are ON (enabled) or OFF (disabled))
5. All PEs ROUTE their RGR contents a distance of 2^i to the right (this instruction is also performed by all PEs, regardless of whether they are ON or OFF)
6. j ← 2^i − 1
7. Disable PEs number 0 through j (turn them OFF)
8. All enabled PEs ADD to RGA the contents of RGR
9. i ← i + 1
10. If i < 3 go back to step 4, otherwise go to step 11
11. Enable all PEs
12. All PEs STORE the contents of RGA to location α + 1
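These twelve steps can be simulated directly; the sketch below is my own (the PE count is reduced to 8 to match the figure, and the sample data are assumed) and shows that the routine leaves the running sums in the RGAs.

```python
# Recursive-doubling prefix sum across 8 PEs, following the numbered
# steps: copy RGA to RGR, route right by 2**i, disable the leftmost
# 2**i PEs, and let the enabled PEs add RGR into RGA.

NUM_PES = 8
data = [1, 2, 3, 4, 5, 6, 7, 8]        # assumed contents of location alpha
rga = list(data)                        # step 2: all PEs LOAD RGA

for i in range(3):                      # steps 3-10
    rgr = list(rga)                     # step 4: all PEs copy RGA to RGR
    dist = 2 ** i                       # step 5: ROUTE right by 2**i,
    rgr = [rgr[(k - dist) % NUM_PES]    #         end-around, performed by
           for k in range(NUM_PES)]     #         all PEs, ON or OFF
    for k in range(NUM_PES):            # steps 6-8: PEs 0..2**i - 1 are
        if k >= dist:                   #            disabled; the rest ADD
            rga[k] += rgr[k]

print(rga)                              # running (prefix) sums of the input
```

The wraparound values routed into the disabled PEs are garbage, but they are never added, which is exactly why steps 6 and 7 turn those PEs off.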

Array computers
Subsequent designs
BSP – Burroughs Scientific Processor (1977)
MPP – Massively Parallel Processor (1983)
CM-1, CM-2 – Connection Machines from Thinking Machines