Parallel Computers: Past and Present
Yenchi Lin
Apr 17, 2003
Outline
- Concepts/Background on Parallel Computers
- Connection Machines
- Earth Simulator
- Conclusion
Quick architecture overview
- SIMD, MIMD
- Shared memory, distributed memory
- MPP, PVP, SMP
- NOW: Network of Workstations (clusters)
SIMD, MIMD
- SIMD: Single Instruction, Multiple Data
  - All processors perform the same instruction on different pieces of data
  - Some processors can be masked out from executing certain instructions
- MIMD: Multiple Instruction, Multiple Data
  - Each processor can execute a different instruction on different data
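
The masking idea can be illustrated with a minimal sketch in plain Python. The function name `simd_step` and the list-based "lanes" are assumptions made for illustration only; real SIMD hardware applies the instruction to all lanes in lockstep rather than in a loop.

```python
def simd_step(op, data, mask):
    """Apply one instruction `op` to every lane whose mask flag is set.

    Lanes that are masked out keep their old value, which mimics
    processors sitting out an instruction on a SIMD machine.
    """
    return [op(x) if active else x for x, active in zip(data, mask)]


data = [1, -2, 3, -4]
# Build a mask that selects only the negative elements ...
mask = [x < 0 for x in data]
# ... then issue a single "negate" instruction to all lanes at once.
result = simd_step(lambda x: -x, data, mask)
print(result)
```

The same instruction reaches every lane; only the mask decides which lanes actually change, so the effect is an element-wise absolute value without any per-lane branching.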
Memory
- Shared memory: single, unified address space across all processors
- Distributed memory: each processor has its own address space
- Hybrid: the processors within a computing node share one address space, while the system as a whole has many separate address spaces
Processors
- PVP (parallel vector processors): Cray, NEC, Hitachi
- MPP (massively parallel processors): Connection Machines
- SMP (symmetric multiprocessors): Sun Fire, DEC (Compaq/HP) AlphaServer
Trends
Figure source: D.E. Culler, J.P. Singh, A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”
Trends (cont.)
Figure source: D.E. Culler, J.P. Singh, A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”
- The trend of MPPs overtaking SMPs has continued, as the number of NOWs (clusters) in the TOP500 list grows.
Connection Machines
- Invented by Danny Hillis of Thinking Machines Corp. while at MIT
- Originally designed to run artificial intelligence applications
- First working application on the CM-1: Game of Life
- CM-1 (1985), CM-2 (1986) and CM-5 (1992)
- Richard Feynman helped build the first CM-1s
- At its peak, 70 machines were installed around the world, all of them on the TOP500 list
- Thinking Machines Corp. filed for bankruptcy in 1994, became a pure software company in 1996, and was bought by Oracle in 1999
CM-2 (1986)
- SIMD
- Hypercube connection
- 1-bit processors in groups of 16
- 9 dimensions for the 8192-processor configuration, 12 dimensions for the 65536-processor configuration
- Programming languages: C*, *Lisp, CM Fortran
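
The relation between processor count and hypercube dimension can be checked with a few lines of arithmetic. This is a sketch under one assumption: each hypercube vertex holds one group of 16 processors (the group size given on the Sprint node slide). The function names are hypothetical.

```python
PROCESSORS_PER_VERTEX = 16  # assumption: one 16-processor group per hypercube vertex


def hypercube_dimension(num_processors):
    """Return d such that 2**d vertices hold `num_processors` processors."""
    if num_processors <= 0 or num_processors % PROCESSORS_PER_VERTEX:
        raise ValueError("processor count must be a positive multiple of 16")
    vertices = num_processors // PROCESSORS_PER_VERTEX
    dim = vertices.bit_length() - 1  # integer log2
    if 2 ** dim != vertices:
        raise ValueError("vertex count is not a power of two")
    return dim


def processors_for_dimension(dim):
    """Return the processor count of a `dim`-dimensional configuration."""
    return (2 ** dim) * PROCESSORS_PER_VERTEX


print(hypercube_dimension(8192))      # 9
print(processors_for_dimension(12))   # 65536
```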
Sprint Node in CM-2
- 1-bit serial processors, 16 in a group, two groups on the board
- The two groups share the same memory and floating-point unit
- Router has limited processing power
- 12-degree connectivity!
Hypercube Connection in CM-2
- Maximum hop count in a hypercube = dimension of the hypercube
- Router randomly picks the next hop
- High wire count
- Figure: four-dimensional hypercube
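
The hop-count bound follows from how hypercube addresses work: two nodes are neighbors exactly when their binary addresses differ in one bit, so a message needs one hop per differing bit. The sketch below illustrates this with a router that picks at random among the dimensions that still need correcting. That selection policy is an assumption made for illustration, not a description of the actual CM-2 router, and the function names are hypothetical.

```python
import random


def neighbors(node, dim):
    """All nodes one hop away: flip each of the `dim` address bits in turn."""
    return [node ^ (1 << d) for d in range(dim)]


def route(src, dst, dim, rng=random):
    """Return the list of nodes visited from `src` to `dst`.

    At every hop the router picks, at random, one of the dimensions in
    which the current address still differs from the destination.
    """
    path = [src]
    current = src
    while current != dst:
        differing = [d for d in range(dim) if ((current ^ dst) >> d) & 1]
        current ^= 1 << rng.choice(differing)
        path.append(current)
    return path


# Opposite corners of a four-dimensional hypercube: the worst case.
path = route(0b0000, 0b1111, dim=4)
print(len(path) - 1)  # hop count equals the dimension
```

Because every hop fixes exactly one differing bit, the hop count equals the Hamming distance between source and destination, which can never exceed the dimension. The wire count is high for the same reason: every node needs one link per dimension.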
CM-5 (1992)
- Distributed-memory multiprocessor
- SPARC + custom vector units
- Fat-tree structure
- Programming languages: C*, *Lisp, CM Fortran, HPF, C++, etc.
- Supports partitioning and multiple users
Processing Element in CM-5
- 33 MHz SPARC
- Vector processor
- Network interface
- 32 MB memory
- Connected using Sun MBus
- Network access is treated the same as memory access, which is expensive for larger messages
Fat-Tree of CM-5
- Three networks (data, control and diagnostic), synchronized on a 40 MHz clock
- 4-ary fat tree, with each processor as a leaf
- Two parents per child for the first two levels
- Four parents per child for higher levels
- Figure: data network of CM-5
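
A rough sketch of what the 4-ary fat tree implies for routing: a message climbs until it reaches a switch that is an ancestor of both leaves, then descends. The leaf numbering, the function names, and the reading of "first two levels" as the two switch levels directly above the leaves are assumptions made for illustration.

```python
ARITY = 4  # each switch has four children


def lca_height(a, b):
    """Height of the lowest common ancestor of leaves `a` and `b`."""
    height = 0
    while a != b:
        a //= ARITY  # move both leaves up one level
        b //= ARITY
        height += 1
    return height


def hop_count(a, b):
    """Hops from leaf `a` up to the common ancestor level and down to `b`."""
    return 2 * lca_height(a, b)


def upward_choices(height):
    """Number of distinct upward routes to `height`, assuming two parents
    per child on levels 1 and 2 and four parents per child above that."""
    choices = 1
    for level in range(1, height + 1):
        choices *= 2 if level <= 2 else 4
    return choices


print(hop_count(0, 3))    # same first-level switch
print(hop_count(0, 16))   # must climb three levels
print(upward_choices(3))  # alternative routes on the way up
```

The multiple parents are what make the tree "fat": the higher a message has to climb, the more alternative upward routes exist, which is where random routing can spread the load.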
Transition from CM-2 to CM-5
- 1-bit serial processors -> 32-bit SPARCs with 64-bit vector units
- SIMD -> MIMD
  - SPMD is used to emulate SIMD behavior
- Hypercube -> fat tree
  - Randomness preserved by random routing
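
The SPMD idea (Single Program, Multiple Data) can be sketched with Python threads: every worker runs the same program on its own slice of the data and waits at a barrier after each step, which gives a SIMD-like step-by-step behavior on MIMD hardware. The function `spmd_run` and the cyclic data distribution are assumptions for illustration, not the CM-5 runtime.

```python
import threading


def spmd_run(data, num_workers, steps):
    """Run the same sequence of `steps` on every worker's slice of `data`."""
    # Cyclic distribution: worker r owns elements r, r + n, r + 2n, ...
    chunks = [data[r::num_workers] for r in range(num_workers)]
    barrier = threading.Barrier(num_workers)

    def program(rank):
        # Every worker executes this same program on its own chunk.
        for step in steps:
            chunks[rank] = [step(x) for x in chunks[rank]]
            barrier.wait()  # nobody starts the next step early

    workers = [threading.Thread(target=program, args=(r,))
               for r in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Gather the chunks back into the original element order.
    result = [None] * len(data)
    for rank, chunk in enumerate(chunks):
        result[rank::num_workers] = chunk
    return result


print(spmd_run([1, 2, 3, 4, 5], 2, [lambda x: x + 1, lambda x: x * 2]))
```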
Earth Simulator (2002)
- Collection of modified NEC SX nodes, 8-way each
- 12.3 GB/s x 2 network
- Theoretical peak throughput: 40 TFLOPS
- Maximum measured throughput: 36 TFLOPS running Linpack
Programming Models of ES
- MPI/HPF at node level and process level
- OpenMP, threads
- Automatic vectorization
Organization of ES
- 320 processor node (PN) cabinets, 2 nodes each
- 65 interconnect (IN) cabinets
- Crossbar of 640 nodes
- 12.3 GB/s x 2 (bidirectional) node-to-node, 8 TB/s aggregate
- 900 TB disk space, 1.6 PB tape storage
PN of ES
- Arithmetic Processor (SX-6)
- Memory (512 MB)
Arithmetic Processor
- Total of 640 x 8 = 5120 arithmetic processors
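
The headline figures for the Earth Simulator can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers quoted on these slides (640 nodes, 8 processors per node, 40 TFLOPS peak, 36 TFLOPS Linpack, 12.3 GB/s per node link), so the derived per-processor peak is approximate. The variable names are illustrative.

```python
NODES = 640              # 320 PN cabinets x 2 nodes each
APS_PER_NODE = 8         # "8-way" nodes
PEAK_TFLOPS = 40.0       # theoretical peak, as quoted (rounded)
LINPACK_TFLOPS = 36.0    # measured Linpack result, as quoted (rounded)
LINK_GB_PER_S = 12.3     # node-to-node bandwidth, one direction

total_aps = NODES * APS_PER_NODE
peak_gflops_per_ap = PEAK_TFLOPS * 1000 / total_aps
linpack_efficiency = LINPACK_TFLOPS / PEAK_TFLOPS
aggregate_tb_per_s = NODES * LINK_GB_PER_S / 1000

print(total_aps)                      # 5120 arithmetic processors
print(round(peak_gflops_per_ap, 2))   # roughly 7.81 GFLOPS per processor
print(round(linpack_efficiency, 2))   # 0.9, i.e. about 90% of peak
print(round(aggregate_tb_per_s, 2))   # about 7.87 TB/s, quoted as 8 TB/s
```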
Remarks
- Initial cost
  - Development: 40 billion yen (USD $400M)
  - Physical building: 7 billion yen (USD $70M)
- Operating cost
  - Maintenance: 8 billion yen/year (USD $80M), or about USD $2.54/sec
  - Electricity: 800 million yen/year (USD $8M)
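
The per-second figure follows directly from the yearly maintenance cost. A small sketch of the conversion, assuming a 365-day year and the exchange rate of 100 yen per US dollar that the slide's yen/dollar pairs imply:

```python
YEN_PER_USD = 100                      # rate implied by the figures above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

maintenance_usd_per_year = 8_000_000_000 / YEN_PER_USD   # 8 billion yen
electricity_usd_per_year = 800_000_000 / YEN_PER_USD     # 800 million yen

maintenance_usd_per_sec = maintenance_usd_per_year / SECONDS_PER_YEAR
electricity_usd_per_sec = electricity_usd_per_year / SECONDS_PER_YEAR

print(round(maintenance_usd_per_sec, 2))  # 2.54
print(round(electricity_usd_per_sec, 2))  # 0.25
```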
Eye Candy
- Photo: PN cabinet, 9 APs in one
- Photo: back of a PN cabinet
- Photo: 1 AP, 9 in one cabinet
- Photo: SX-6i
Conclusion
- Connection Machines were interesting
- Earth Simulator is also interesting
- Early designs versus recent designs: GigaFlops vs. TeraFlops
- When will Americans take back the crown in supercomputing?
