Architecture of a Massively Parallel Processor, Kenneth E. Batcher, 1980. Presented by Yao Wu, April 25, 2003.


Kenneth E. Batcher

OUTLINE
Background: data-level parallelism and SIMD design
Architecture of the MPP: ARU, ACU, PDMU, and staging memory
Performance
Conclusion

Design Goals
Application domain? Image processing, i.e., data-level parallelism.
Expected workload? Between 10^9 and 10^10 operations per second: very fast (massive parallelism).
Cost? A special-purpose machine.

Data-Level Parallelism
Each task performs the same series of calculations but applies them to different data.
B(I) = A(I) * 4  →  LOAD A(I); MULT 4; STORE B(I)
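The B(I) = A(I) * 4 example can be sketched in ordinary code (a hypothetical illustration, not MPP code): the same LOAD/MULT/STORE sequence is applied to every element, and a SIMD machine would perform each step for all I at once.

```python
def data_parallel_scale(a, factor=4):
    """Apply the same instruction sequence to every element of a."""
    b = [0] * len(a)
    for i in range(len(a)):   # on a SIMD machine, all PEs do this step together
        reg = a[i]            # LOAD A(I)
        reg = reg * factor    # MULT 4
        b[i] = reg            # STORE B(I)
    return b

print(data_parallel_scale([1, 2, 3]))  # [4, 8, 12]
```

The loop body is identical for every i; only the data differs, which is exactly what makes the computation a candidate for SIMD execution.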

Data Parallelism: execution over time, with processors P1, P2, P3 in lockstep
P1: LOAD(1)  MULT 4  STORE(1)
P2: LOAD(2)  MULT 4  STORE(2)
P3: LOAD(3)  MULT 4  STORE(3)

SIMD Architecture

Advantages vs. Disadvantages of SIMD
Advantages: simplicity of concept and programming; SIMD architectures are deterministic; scalability of size and performance; no explicit synchronization is required.
Disadvantages: lack of applicability to a wide variety of problems; places enormous demand on processor-memory interconnection bandwidth.

Massively Parallel Processor
Built by Goodyear Aerospace Corp. and delivered in 1983.
Target performance: 10^9 to 10^10 operations per second, to process an average of 10^13 bits per day.
Retired in March 1991 after 8 years of service to the NASA scientific community.
On October 29, 1996, NASA officially handed over the world's first Massively Parallel Processor to the Smithsonian Collection in a ceremony held in Maryland.

Block Diagram of MPP

Array Unit (ARU)
2D processing problems are treated as 2D planes rather than as a number of words or bytes.
Logically, 16,384 processing elements (PEs) organized in a 128 x 128 square.
A redundant rectangle of 128 x 4 PEs for fault recovery.
Each PE is bit-serial, to handle operands of any length.
PEs are connected in a 2D mesh where each PE communicates with its four neighbors: up, down, left, and right.
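The 2D mesh routing can be illustrated with a small sketch (not MPP code, and much smaller than 128 x 128): each PE holds one value and can pass it to a neighbor, so a "shift east" moves every value to its right-hand neighbor in one step. Edges here simply wrap around; the real ARU's edge topology was configurable.

```python
def shift_east(plane):
    """Shift every element of a square plane one position to the east
    (right), with wraparound at the edges. Models one mesh routing step."""
    n = len(plane)
    return [[plane[r][(c - 1) % n] for c in range(n)] for r in range(n)]

p = [[1, 0],
     [0, 1]]
print(shift_east(p))  # [[0, 1], [1, 0]]
```

One such step moves all 16,384 values simultaneously on the real machine, which is why mesh routing is cheap for neighborhood operations like image filtering.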

ARU figure

Processing Element (PE)

[Figure: the A-plane, B-plane, C-plane, and P-plane registers of the PE]

ARU (S-plane)
Handles data input and output for the ARU: data is shifted in on input and shifted out on output, and the S-plane can handle input and output simultaneously with processing.

ARU (Memory Plane)
The capacity is 16,777,216 data bits (2 MB).
A memory plane of 16,384 bits can be randomly accessed and transferred in one machine cycle.
Supports bit-serial processing.
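Bit-serial processing can be sketched as follows (an assumed illustration of the idea, not the exact PE logic): each PE adds its operands one bit per machine cycle, least-significant bit first, keeping a one-bit carry, so an n-bit add takes on the order of n cycles and an n-bit operand yields an (n+1)-bit sum.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two equal-length numbers given as bit lists, LSB first,
    processing one bit pair per 'cycle' as a bit-serial PE would."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):          # one bit per machine cycle
        out.append(a ^ b ^ carry)             # sum bit
        carry = (a & b) | (carry & (a ^ b))   # carry out
    out.append(carry)                         # n-bit operands -> (n+1)-bit sum
    return out

# 5 (binary 101 -> [1,0,1] LSB first) + 3 ([1,1,0]) = 8 ([0,0,0,1])
print(bit_serial_add([1, 0, 1], [1, 1, 0]))  # [0, 0, 0, 1]
```

This matches the performance slide's convention that adding 8-bit integers produces a 9-bit sum.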

ARU (Processing Planes)
There are 35 processing planes in the ARU; 30 of them form a planar shift register.
P-plane: logic and routing operations.
G-plane: mask operations.
A-plane, B-plane, C-plane: sum-or and full-add operations.

An example of the G-plane (mask): clear all negative items to 0.
[Figure: the sign plane is loaded into the G-plane; the masked clear zeroes only the flagged elements.]
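The masked-clear idea can be sketched in a few lines (an illustration of the concept, not the plane-level operation): the G-plane acts as a per-PE mask bit, so an operation takes effect only where the mask is 1. Here the mask marks negative items, which are then cleared to 0.

```python
def masked_clear(values):
    """Clear negative items to 0 using an explicit mask, as the G-plane
    would: first build the mask (the 'sign plane'), then apply it."""
    mask = [1 if v < 0 else 0 for v in values]  # G-plane: 1 where negative
    return [0 if m else v for v, m in zip(values, mask)]

print(masked_clear([3, -1, 4, -5]))  # [3, 0, 4, 0]
```

On the MPP this takes the same number of cycles whether one element or all 16,384 are masked, since every PE evaluates the operation in lockstep.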

Array Control Unit (ACU)
Controls operations in the ARU and performs scalar arithmetic. Three independent control units:
Processing Element Control Unit (PECU): controls operations in the processing planes of the ARU.
I/O Control Unit (IOCU): controls S-plane operations in the ARU.
Main Control Unit (MCU): executes the main application program of the MPP and performs scalar processing.

ACU figure

Program and Data Management Unit (PDMU)
Controls the overall flow of program and data in the system.
The PDMU is a minicomputer (DEC PDP-11) with custom interfaces to the ACU and ARU.

Staging Memory
Transfers data between the PDMU and ARU.
Reorders arrays of data, e.g., from pixel format to bit-serial format.
Reordering is done via a common 2^19-bit multidimensional-access (MDA) memory.
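The reordering (often called "corner turning") can be sketched as follows, under the assumption that external devices deliver whole pixels while each PE's bit-serial memory wants one bit of every pixel per plane. This hypothetical helper converts a list of n-bit pixels into n bit-planes, LSB plane first.

```python
def pixels_to_bit_planes(pixels, nbits):
    """Reorder pixel-format data into bit-plane (bit-serial) format:
    plane b holds bit b of every pixel."""
    return [[(p >> b) & 1 for p in pixels] for b in range(nbits)]

# Two 3-bit pixels: 5 = 101b, 3 = 011b
print(pixels_to_bit_planes([5, 3], 3))  # [[1, 1], [0, 1], [1, 0]]
```

The inverse transform (bit-planes back to pixels) is what the staging memory performs on output; the MDA memory lets both directions run at full bandwidth.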

Speed of typical operations (MOPS)
Addition of arrays:
  8-bit integers (9-bit sum): 6553
  12-bit integers (13-bit sum): 4428
  32-bit floating point: 430
Multiplication of arrays:
  8-bit integers (16-bit product): 1861
  12-bit integers (24-bit product): 910
  32-bit floating point: 216
Multiplication of array by scalar:
  8-bit integers (16-bit product): 2340
  12-bit integers (24-bit product): 1260
  32-bit floating point: 373

Conclusion
The MPP is an ultra-high-speed SIMD processor designed to process 2D image data.
It is fully programmable.
It lacks applicability to a wide variety of problems: "We never found any other customers for the MPP even though it was one of the fastest machines available at that time."

References
J. L. Potter, ed., "The Massively Parallel Processor," The MIT Press, 1985.
G. Sohi, ed., "25 Years of the International Symposia on Computer Architecture," selected papers.
J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach."

Thank you! Questions?