Architecture of a Massively Parallel Processor
Kenneth E. Batcher, 1980
Presented by Yao Wu, April 25, 2003



2 Kenneth E. Batcher

3 Outline
- Background
  - Data-level parallelism
  - SIMD design
- Architecture of MPP
  - ARU, ACU, PDMU, and staging memory
- Performance
- Conclusion

4 Design Goals
- Application domain? Image processing, which exhibits data-level parallelism
- Expected workload? Between 10^9 and 10^10 operations per second, so the machine must be very fast (massive parallelism)
- Cost? A special-purpose machine

5 Data-Level Parallelism
Each task performs the same series of calculations but applies them to different data. For example, B(I) = A(I) * 4 becomes the same instruction sequence for every I:
LOAD A(I)
MULT 4
STORE B(I)

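6 Data Parallelism
Execution timeline: every processor executes the same instruction at the same time, each on its own element.
Time   P1         P2         P3
1      LOAD(1)    LOAD(2)    LOAD(3)
2      MULT 4     MULT 4     MULT 4
3      STORE(1)   STORE(2)   STORE(3)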
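The lockstep pattern of slides 5 and 6 can be sketched as a small Python simulation. This is an illustration, not MPP code: the instruction tuples and the function name `run_simd` are hypothetical, and the inner loop over PEs stands in for work that SIMD hardware performs simultaneously.

```python
# Hypothetical instruction list for B(I) = A(I) * 4 (names are illustrative).
PROGRAM = [("LOAD", None), ("MULT", 4), ("STORE", None)]


def run_simd(program, a):
    """Simulate SIMD execution: one control unit broadcasts each
    instruction, and every PE applies it to its own element of `a`."""
    n = len(a)
    acc = [0] * n      # one accumulator per PE
    b = [0] * n        # one result slot per PE
    for op, operand in program:      # instructions are issued one at a time
        for i in range(n):           # hardware would do this loop in parallel
            if op == "LOAD":
                acc[i] = a[i]
            elif op == "MULT":
                acc[i] *= operand
            elif op == "STORE":
                b[i] = acc[i]
            else:
                raise ValueError(f"unknown instruction {op!r}")
    return b


B = run_simd(PROGRAM, [1, 2, 3])
print(B)  # [4, 8, 12]
```

Because there is only one instruction stream, no PE can get ahead of another, which is why SIMD needs no explicit synchronization.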

7 SIMD Architecture

8 Advantages vs. Disadvantages of SIMD
Advantages:
- Simplicity of concept and programming
- SIMD architectures are deterministic
- Scalability of size and performance
- No explicit synchronization is required
Disadvantages:
- Lack of applicability to a wide variety of problems
- Places enormous demands on processor-memory interconnection bandwidth

9 Massively Parallel Processor
- Designed by Goodyear Aerospace Corp. and delivered in 1983
- Target performance: 10^9 to 10^10 operations per second, to process an average of 10^13 bits per day
- Retired in March 1991 after 8 years of service to the NASA scientific community
- On October 29, 1996, NASA officially handed over the world's first Massively Parallel Processor to the Smithsonian collection in a ceremony held in Maryland
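To put the daily data volume in perspective, it can be converted to an average bit rate, assuming the load is spread evenly over the 86,400 seconds in a day:

```latex
\frac{10^{13}\ \text{bits/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{8}\ \text{bits/s}
```

That is roughly 116 Mbit/s sustained on average, before any peaks in the actual workload.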

10 Block Diagram of MPP

11 Array Unit (ARU)
- Two-dimensional processing problems are handled as 2D planes of data rather than as collections of words or bytes
- Logically, 16,384 processing elements (PEs) are organized in a 128 x 128 square
- A redundant rectangle of 128 x 4 PEs provides fault recovery
- Each PE is bit-serial, so it can handle operands of any length
- PEs are connected in a 2D mesh in which each PE communicates with its four neighbors: up, down, left, and right
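Nearest-neighbor communication in the 2D mesh can be sketched as shifting an entire bit-plane one PE in a chosen direction. This is a minimal sketch under stated assumptions: `shift_plane` is a hypothetical helper, a 3 x 3 plane stands in for the 128 x 128 array, and the zero-fill rule at the edges is an assumption (the slide does not say how edge PEs are connected).

```python
# Hypothetical helper; a 3 x 3 plane stands in for the 128 x 128 array.
OFFSETS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def shift_plane(plane, direction):
    """Move every bit of `plane` one PE in `direction`.

    PE (r, c) receives the bit from its neighbor on the opposite side.
    Edge PEs with no such neighbor receive 0 (an assumed edge rule).
    """
    dr, dc = OFFSETS[direction]
    rows, cols = len(plane), len(plane[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = r - dr, c - dc          # source neighbor
            if 0 <= sr < rows and 0 <= sc < cols:
                result[r][c] = plane[sr][sc]
    return result


center = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
print(shift_plane(center, "right"))  # [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(shift_plane(center, "up"))     # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Every PE moves its bit in the same direction at the same time, so one shift costs one step regardless of array size.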

12 ARU figure

13 Processing Element (PE)

14 [Figure: example bit patterns in the A-, P-, C-, and B-planes]

15 ARU (S-Plane)
- Handles data input and output for the ARU
- On input: [figure]
- On output: [figure]
- Input and output can be handled simultaneously
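Simultaneous input and output follow naturally if the S-plane behaves as a shift register: each shift pushes a new column in at one edge while an old column falls out at the other. The sketch below assumes left-to-right shifting and a 4 x 4 toy plane; `shift_s_plane` is a hypothetical name, and the shift direction is an assumption rather than a detail from the slide.

```python
def shift_s_plane(s_plane, in_column):
    """Shift each row of the S-plane one position to the right, in place.

    in_column[r] enters row r at the left edge; the bit pushed off the
    right edge of row r is returned in out_column[r]. Direction and
    plane size are assumptions for illustration.
    """
    out_column = []
    for row, bit_in in zip(s_plane, in_column):
        out_column.append(row.pop())   # rightmost bit leaves the array
        row.insert(0, bit_in)          # new bit enters on the left
    return out_column


N = 4  # toy size; the real S-plane is 128 x 128
old = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
new = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]

s_plane = [row[:] for row in old]   # S-plane currently holds results to output
outputs = []
# Feed the new plane's columns rightmost first so each lands in place.
for c in range(N - 1, -1, -1):
    outputs.append(shift_s_plane(s_plane, [new[r][c] for r in range(N)]))

print(s_plane == new)  # True: the new plane is fully loaded
print(outputs[0])      # [1, 0, 1, 0]: old rightmost column came out first
```

After N shifts the old plane has been fully output and the new plane fully loaded, using the same N steps for both.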

16 ARU (Memory Planes)
- Total capacity is 16,777,216 data bits (2,097,152 bytes, or 2 MiB), that is, 1,024 bits for each of the 16,384 PEs
- A memory plane of 16,384 bits can be randomly accessed and transferred in one machine cycle
- Bit-serial processing

17 ARU (Processing Planes)
- There are 35 processing planes in the ARU; 30 of them form a planar shift register
- P-plane: logic and routing operations
- G-plane: mask operations
- A-, B-, and C-planes: sum or full-add operations
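Bit-serial, plane-parallel arithmetic can be sketched as follows: each step processes one bit position (one bit-plane) of both operands in every PE at once, and a carry plane is kept between steps. This is an illustrative sketch, not the MPP's microcode; the function names are hypothetical, and treating the carry plane as playing the role of the C-plane is an assumption. Note that adding two 8-bit operands yields a 9-bit sum, matching the "8-bit int (9-bit sum)" row of slide 23.

```python
def full_add(a, b, c):
    """One-bit full adder: returns (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def to_planes(values, bits):
    """planes[k][p] = bit k (LSB first) of PE p's unsigned value."""
    return [[(v >> k) & 1 for v in values] for k in range(bits)]


def from_planes(planes):
    """Reassemble unsigned values from LSB-first bit-planes."""
    return [sum(plane[p] << k for k, plane in enumerate(planes))
            for p in range(len(planes[0]))]


def add_bit_planes(x_planes, y_planes):
    """Bit-serial, plane-parallel addition: one bit position per step,
    all PEs at once, with a carry plane held between steps."""
    carry = [0] * len(x_planes[0])          # carry plane, cleared first
    sum_planes = []
    for xp, yp in zip(x_planes, y_planes):  # LSB plane first
        plane = []
        for p in range(len(xp)):            # parallel across PEs in hardware
            s, carry[p] = full_add(xp[p], yp[p], carry[p])
            plane.append(s)
        sum_planes.append(plane)
    sum_planes.append(carry[:])             # final carry = extra sum bit
    return sum_planes


x = [200, 17, 255, 0]
y = [100, 3, 255, 0]
total = from_planes(add_bit_planes(to_planes(x, 8), to_planes(y, 8)))
print(total)  # [300, 20, 510, 0]
```

The number of steps grows with operand length, not with the number of PEs, which is the trade-off a bit-serial design makes.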

18 An Example of the G-Plane (Mask)
Task: clear all negative items to 0.
[Figure: sign plane, G-plane, and masked-clear result]
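The masked-clear example can be sketched with two's-complement bit-planes: copy the sign plane into the mask, then clear every bit-plane only in PEs whose mask bit is set. This is a hypothetical sketch: the function names are illustrative, and the convention that a mask bit of 1 enables the operation is an assumption made here, not a detail from the slide.

```python
def to_planes(values, bits):
    """Two's-complement bit-planes, LSB first; the last plane is the sign plane."""
    return [[(v >> k) & 1 for v in values] for k in range(bits)]


def from_planes_signed(planes):
    """Reassemble signed values from two's-complement bit-planes."""
    bits = len(planes)
    values = []
    for p in range(len(planes[0])):
        u = sum(planes[k][p] << k for k in range(bits))
        values.append(u - (1 << bits) if planes[-1][p] else u)
    return values


def masked_clear(planes, g_plane):
    """Clear every bit in PEs whose mask bit is 1; other PEs are untouched."""
    return [[0 if g else bit for bit, g in zip(plane, g_plane)]
            for plane in planes]


data = to_planes([5, -3, 0, -128, 127], 8)
g_plane = data[-1][:]            # load the mask from the sign plane
print(from_planes_signed(masked_clear(data, g_plane)))  # [5, 0, 0, 0, 127]
```

One mask load followed by one masked operation per bit-plane handles all elements at once, with no data-dependent branching in the control unit.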

19 Array Control Unit (ACU)
- Controls operations in the ARU
- Performs scalar arithmetic
- Three independent control units:
  - Processing Element Control Unit (PECU): controls operations in the processing planes of the ARU
  - I/O Control Unit (IOCU): controls S-plane operations in the ARU
  - Main Control Unit (MCU): executes the main application program of the MPP and performs scalar processing

20 ACU figure

21 Program and Data Management Unit (PDMU)
- Controls the overall flow of programs and data in the system
- The PDMU is a minicomputer (DEC PDP-11) with a custom interface to the ACU and ARU

22 Staging Memory
- Transfers data between the PDMU and the ARU
- Reorders arrays of data from pixel format to bit-serial (bit-plane) format
- Reordering is done in a common 2^19-bit multidimensional-access (MDA) memory
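The reordering that the staging memory performs can be illustrated logically: an image stored one pixel value at a time is turned into a stack of bit-planes, one plane per bit position. The sketch below shows only the data rearrangement, not the MDA hardware that does it; the function names are hypothetical.

```python
def pixels_to_bit_planes(image, bits):
    """Reorder pixel format into bit-plane format: planes[k][r][c] is
    bit k (LSB first) of pixel (r, c)."""
    return [[[(px >> k) & 1 for px in row] for row in image]
            for k in range(bits)]


def bit_planes_to_pixels(planes):
    """Inverse reordering, from bit-plane format back to pixel format."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[k][r][c] << k for k in range(len(planes)))
             for c in range(cols)]
            for r in range(rows)]


image = [[0, 255],
         [128, 7]]
planes = pixels_to_bit_planes(image, 8)
print(planes[7])                              # [[0, 1], [1, 0]] (MSB plane)
print(bit_planes_to_pixels(planes) == image)  # True
```

Each resulting plane has the same shape as the image, so it maps directly onto one memory plane of the ARU when the image is 128 x 128.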

23 Speed of Typical Operations
Operation                 Operands (result)                 Speed (MOPS)
Addition                  8-bit integer (9-bit sum)         6553
                          12-bit integer (13-bit sum)       4428
                          32-bit floating point             430
Multiplication            8-bit integer (16-bit product)    1861
                          12-bit integer (24-bit product)   910
                          32-bit floating point             216
Multiplication by scalar  8-bit integer (16-bit product)    2340
                          12-bit integer (24-bit product)   1260
                          32-bit floating point             373
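One way to read these figures is as the time for a single whole-array operation. The sketch below assumes that each MOPS figure counts one element operation per PE across all 16,384 PEs, so the time per array operation is 16,384 divided by the rate in operations per microsecond. That interpretation is an assumption, and the helper name is hypothetical.

```python
N_PES = 128 * 128  # 16,384 PEs in the logical array

# Speeds in MOPS, copied from slide 23.
SPEED_MOPS = {
    "add, 8-bit int": 6553,
    "add, 12-bit int": 4428,
    "add, 32-bit fp": 430,
    "multiply, 8-bit int": 1861,
    "multiply, 12-bit int": 910,
    "multiply, 32-bit fp": 216,
    "scalar multiply, 8-bit int": 2340,
    "scalar multiply, 12-bit int": 1260,
    "scalar multiply, 32-bit fp": 373,
}


def microseconds_per_array_op(rate_mops, n_pes=N_PES):
    """Time for one operation on every PE, ASSUMING the MOPS figure counts
    one element operation per PE (1 MOPS = 1 operation per microsecond)."""
    return n_pes / rate_mops


for name, rate in SPEED_MOPS.items():
    print(f"{name:28s} {microseconds_per_array_op(rate):6.1f} us")
```

Under that assumption, an 8-bit addition across the whole array takes about 2.5 µs and a 32-bit floating-point addition about 38 µs. The roughly 1.5x drop from 8-bit to 12-bit addition is consistent with bit-serial PEs whose time grows with operand length.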

24 Conclusion
- The MPP is an ultra-high-speed SIMD processor designed to process 2D image data
- It is fully programmable
- It lacks applicability to a wide variety of problems
- "We never found any other customers for the MPP even though it was one of the fastest machines available at that time."

25 References
- J. L. Potter, ed., "The Massively Parallel Processor," The MIT Press, 1985.
- Gurindar Sohi, ed., "25 Years of the International Symposia on Computer Architecture" (selected papers), 1998.
- J. L. Hennessy and D. A. Patterson, "Computer Architecture: A Quantitative Approach."

26 Thank you! Questions?

