Published by Ferdinand Wilfred Carter. Modified over 8 years ago.
1
Architecture of a Massively Parallel Processor
Kenneth E. Batcher, 1980
Presented by Yao Wu, April 25, 2003
2
Kenneth E. Batcher
3
OUTLINE
- Background
  - Data-level parallelism
  - SIMD design
- Architecture of MPP
  - ARU, ACU, PDMU and staging memory
- Performance
- Conclusion
4
Design Goal
- Application domain? Image processing (data-level parallelism)
- Expected workload? Between 10^9 and 10^10 operations per second: very fast (massive parallelism)
- Cost? Special-purpose machine
5
Data Level Parallelism
Each task performs the same series of calculations but applies them to different data.
Example: B(I) = A(I) * 4
  LOAD A(I)
  MULT 4
  STORE B(I)
6
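Data Parallelism
Execution time runs left to right; all processors execute the same instruction at each step.
  P1:  LOAD(1)   MULT 4   STORE(1)
  P2:  LOAD(2)   MULT 4   STORE(2)
  P3:  LOAD(3)   MULT 4   STORE(3)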
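The lockstep LOAD / MULT 4 / STORE pattern can be sketched in a few lines of Python. This is only an illustration: the function name `data_parallel_scale` and the list-based "processors" are assumptions made for the example, not part of the MPP.

```python
# Hypothetical sketch of data-level parallelism for B(I) = A(I) * 4.
# Each list position stands in for one processor (P1, P2, P3, ...).

def data_parallel_scale(a):
    # Step 1: every processor LOADs its own element A(I).
    registers = [a[i] for i in range(len(a))]
    # Step 2: every processor executes the same MULT 4 instruction.
    registers = [r * 4 for r in registers]
    # Step 3: every processor STOREs its result into B(I).
    b = list(registers)
    return b

A = [1, 2, 3]
B = data_parallel_scale(A)
print(B)  # [4, 8, 12]
```

The three list comprehensions run sequentially here; on a SIMD machine each step would be a single instruction applied to all elements at once.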
7
SIMD Architecture
8
Advantages vs. Disadvantages of SIMD
Advantages:
- Simplicity of concept and programming
- SIMD architectures are deterministic
- Scalability of size and performance
- No explicit synchronization is required
Disadvantages:
- Lack of applicability to a wide variety of problems
- Places enormous demand on processor-memory interconnection bandwidth
9
Massively Parallel Processor
- Built by Goodyear Aerospace Corp. and delivered to NASA in 1983
- Target performance: 10^9 to 10^10 operations per second, to process an average of 10^13 bits per day
- Retired in March 1991 after 8 years of service to the NASA scientific community
- On October 29, 1996, NASA officially handed over the world's first Massively Parallel Processor to the Smithsonian Collection in a ceremony held in Maryland
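As a rough sanity check on the target figures, the arithmetic below converts 10^13 bits per day into a per-second rate and asks how many operations per input bit the stated throughput range would allow. It assumes the data arrives evenly over a 24-hour day, which the slide does not state.

```python
# Back-of-the-envelope check of the MPP target workload.
# Assumption: the 10^13 bits arrive uniformly over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400 s

bits_per_day = 10**13
bits_per_second = bits_per_day / SECONDS_PER_DAY

ops_per_bit_low = 10**9 / bits_per_second    # at 10^9 operations per second
ops_per_bit_high = 10**10 / bits_per_second  # at 10^10 operations per second

print(f"{bits_per_second:.3e} bits/s")
print(f"{ops_per_bit_low:.2f} to {ops_per_bit_high:.1f} operations per input bit")
```

Under that assumption the machine has a budget of roughly 9 to 86 operations for every bit of input.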
10
Block Diagram of MPP
11
Array Unit (ARU)
- Built for 2D processing problems: data is handled as 2D planes rather than as a number of words or bytes
- Logically, 16,384 processing elements (PEs) organized in a 128 x 128 square
- A redundant rectangle of 128 x 4 PEs provides fault recovery
- Each PE is bit-serial, so it can handle operands of any length
- PEs are connected in a 2D mesh where each PE communicates with its four neighbors: up, down, left, and right
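The nearest-neighbor mesh can be pictured as shifting an entire plane of bits by one position. The sketch below is a toy model: the function name `shift_plane` is hypothetical, and filling the edges with zeros is an assumption, since the slide does not say how the array edges are treated.

```python
# Toy model of one routing step on a 2D mesh of PEs.
# Assumption: PEs on the edge receive `fill` (0) from outside the array.

def shift_plane(plane, direction, fill=0):
    """Move every bit one PE in `direction`; row 0 is the top row."""
    rows, cols = len(plane), len(plane[0])
    # Offset from a PE to the neighbor it receives its new bit from.
    source_offset = {
        "up": (1, 0),      # data moves up, so read from the PE below
        "down": (-1, 0),   # data moves down, so read from the PE above
        "left": (0, 1),    # data moves left, so read from the PE to the right
        "right": (0, -1),  # data moves right, so read from the PE to the left
    }[direction]
    dr, dc = source_offset
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            sr, sc = r + dr, c + dc
            if 0 <= sr < rows and 0 <= sc < cols:
                new_row.append(plane[sr][sc])
            else:
                new_row.append(fill)
        result.append(new_row)
    return result

plane = [[1, 2],
         [3, 4]]
print(shift_plane(plane, "right"))  # [[0, 1], [0, 3]]
print(shift_plane(plane, "up"))     # [[3, 4], [0, 0]]
```

The example uses a 2 x 2 plane with distinct values so the movement is easy to follow; the real array is 128 x 128 and holds one bit per PE.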
12
ARU figure
13
Processing Element (PE)
14
[Figure: example bit patterns in the A-plane, P-plane, C-plane, and B-plane]
15
ARU (S-plane)
- Handles data input and output for the ARU
- On input
- On output
- Handles input and output simultaneously
16
ARU (Memory Plane)
- Capacity: 16,777,216 data bits (2 MiB), i.e. 1,024 bits for each of the 16,384 PEs
- A memory plane of 16,384 bits can be randomly accessed and transferred in one machine cycle
- Bit-serial processing
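The capacity figures on this slide fit together arithmetically, as the short calculation below shows. The variable names are chosen for this example only.

```python
# Consistency check of the ARU memory figures quoted on the slide.
NUM_PES = 128 * 128          # one bit per PE in each memory plane
TOTAL_BITS = 16_777_216      # total ARU data memory

bits_per_plane = NUM_PES                     # 16,384 bits
num_planes = TOTAL_BITS // bits_per_plane    # memory planes, i.e. bits per PE
total_bytes = TOTAL_BITS // 8

print(bits_per_plane)  # 16384
print(num_planes)      # 1024
print(total_bytes)     # 2097152
```

So the memory can be viewed as 1,024 planes of 16,384 bits each, and one plane (one bit per PE) moves per machine cycle.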
17
ARU (Processing Plane)
- There are 35 processing planes in the ARU
- 30 of the processing planes form a planar shift register
- P-Plane (logic and routing operations)
- G-Plane (mask operation)
- A-Plane, B-Plane, C-Plane: Sum-or full-add operation
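Bit-serial addition over bit planes can be sketched as follows. Each plane holds one bit per PE, and a full-add step is applied to one bit position at a time across all PEs. The helper names are hypothetical, and the code does not claim to reproduce how the MPP assigns operands to its named planes; it only shows the technique.

```python
# Sketch of bit-serial addition across bit planes (least significant bit first).
# Each plane is a flat list with one bit per PE.

def to_bit_planes(values, width):
    """Split non-negative integers into `width` bit planes, LSB first."""
    return [[(v >> i) & 1 for v in values] for i in range(width)]

def from_bit_planes(planes):
    """Reassemble integers from bit planes, LSB first."""
    num_pes = len(planes[0])
    return [sum(planes[i][pe] << i for i in range(len(planes)))
            for pe in range(num_pes)]

def bit_serial_add(x_planes, y_planes):
    """Add two equal-width operands; returns width + 1 sum planes."""
    num_pes = len(x_planes[0])
    carry = [0] * num_pes          # carry plane, cleared before the add
    sum_planes = []
    for xp, yp in zip(x_planes, y_planes):
        # One full-add step performed by every PE at the same time.
        sum_bits = [x ^ y ^ c for x, y, c in zip(xp, yp, carry)]
        carry = [(x & y) | (c & (x ^ y)) for x, y, c in zip(xp, yp, carry)]
        sum_planes.append(sum_bits)
    sum_planes.append(carry)       # final carry is the extra high-order bit
    return sum_planes

x = [200, 3, 255]
y = [100, 4, 255]
result = from_bit_planes(bit_serial_add(to_bit_planes(x, 8), to_bit_planes(y, 8)))
print(result)  # [300, 7, 510]
```

Adding two 8-bit operands this way takes eight full-add steps and yields a 9-bit sum, which is why the time per operation grows with operand length.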
18
An example of G-plane (mask)
Clear all negative items to 0.
[Figure: sign plane, G-plane, and the result of the masked-clear operation]
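A masked clear can be sketched in Python as below. The assumption here is that the mask is simply a copy of the sign plane, so a PE takes part in the clear only where its value is negative; the function names are hypothetical and the actual MPP instruction sequence is not shown on the slide.

```python
# Sketch of a masked operation: clear all negative items to 0.
# Assumption: the mask (G-plane) is loaded with the sign plane.

def sign_plane(values):
    """One bit per PE: 1 where the PE's value is negative."""
    return [1 if v < 0 else 0 for v in values]

def masked_clear(values, g_plane):
    """Clear the value only in PEs whose mask bit is 1."""
    return [0 if g else v for v, g in zip(values, g_plane)]

values = [3, -2, 0, -7, 5]
g = sign_plane(values)
print(g)                        # [0, 1, 0, 1, 0]
print(masked_clear(values, g))  # [3, 0, 0, 0, 5]
```

Every PE receives the same clear instruction; the mask bit decides which PEs actually act on it.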
19
Array Control Unit (ACU)
- Controls operations in the ARU
- Performs scalar arithmetic
- Three independent control units:
  - Processing Element Control Unit (PECU): controls operations in the processing planes of the ARU
  - I/O Control Unit (IOCU): controls S-plane operations in the ARU
  - Main Control Unit (MCU): executes the main application program of MPP and performs scalar processing
20
ACU figure
21
Program and Data Management Unit (PDMU)
- Controls the overall flow of programs and data in the system
- The PDMU is a minicomputer (DEC PDP-11) with a custom interface to the ACU and ARU
22
Staging Memory
- Transfers data between the PDMU and the ARU
- Reorders arrays of data, from pixel format to bit-serial format
- Reordering is done via a common 2^19-bit multidimensional-access (MDA) memory
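The pixel-to-bit-serial reordering can be illustrated with a small Python sketch. It converts an image stored as whole pixel values into a stack of bit planes, one plane per bit position. The function names are hypothetical and the sketch says nothing about how the MDA memory performs the reordering in hardware.

```python
# Sketch of reordering from pixel format to bit-plane (bit-serial) format.

def pixels_to_bit_planes(image, bits):
    """Return `bits` planes, LSB first; each plane has the image's 2D shape."""
    return [[[(pixel >> b) & 1 for pixel in row] for row in image]
            for b in range(bits)]

def bit_planes_to_pixels(planes):
    """Inverse reordering: rebuild pixel values from LSB-first bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[b][r][c] << b for b in range(len(planes)))
             for c in range(cols)]
            for r in range(rows)]

image = [[5, 2],
         [7, 0]]
planes = pixels_to_bit_planes(image, bits=3)
print(planes[0])  # [[1, 0], [1, 0]]  least significant bits
print(planes[1])  # [[0, 1], [1, 0]]
print(planes[2])  # [[1, 0], [1, 0]]  most significant bits
print(bit_planes_to_pixels(planes) == image)  # True
```

In pixel format all bits of one pixel are adjacent; in bit-serial format all pixels' bits of the same significance are adjacent, which is the layout the bit-serial PEs consume.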
23
Speed of typical operations

  Operation                               Speed (MOPS)
  Addition
    8-bit integer (9-bit sum)             6553
    12-bit integer (13-bit sum)           4428
    32-bit floating point                 430
  Multiplication
    8-bit integer (16-bit product)        1861
    12-bit integer (24-bit product)       910
    32-bit floating point                 216
  Multiplication by scalar
    8-bit integer (16-bit product)        2340
    12-bit integer (24-bit product)       1260
    32-bit floating point                 373
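The aggregate rates can be turned into an approximate time per operation for a single PE. The calculation below assumes all 16,384 PEs are active and each contributes one operation to the quoted figure; the slide does not state how the rates were measured, so treat the results as estimates.

```python
# Rough per-PE operation time derived from the aggregate MOPS figures.
# Assumption: all 128 x 128 PEs work in parallel, one operation each.
NUM_PES = 128 * 128

ADDITION_MOPS = {
    "8-bit integer": 6553,
    "12-bit integer": 4428,
    "32-bit floating point": 430,
}

def per_pe_time_us(aggregate_mops, num_pes=NUM_PES):
    """Microseconds one PE spends per operation.

    aggregate_mops million operations per second equals aggregate_mops
    operations per microsecond, shared across num_pes PEs.
    """
    return num_pes / aggregate_mops

for name, mops in ADDITION_MOPS.items():
    print(f"{name}: {per_pe_time_us(mops):.2f} microseconds per addition")
```

Under this assumption an 8-bit addition takes about 2.5 microseconds in each PE: every individual PE is slow, and the throughput comes entirely from having 16,384 of them.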
24
Conclusion
- The MPP is an ultra-high-speed SIMD processor designed to process 2D image data
- It is fully programmable
- It lacks applicability to a wide variety of problems
- "We never found any other customers for the MPP even though it was one of the fastest machines available at that time."
25
References
- "The Massively Parallel Processor", J. L. Potter, ed., The MIT Press, 1985
- "25 Years of the International Symposia on Computer Architecture", selected papers, Gurindar Sohi, ed., 1998
- "Computer Architecture: A Quantitative Approach"
26
Thank you! Questions?