COCOA(1/19) Real Time Systems LAB. COCOA MAY 31, 2001 김경임, 박성호
COCOA(2/19) Real Time Systems LAB. Contents
Background
COCOA Overview
System Architecture
Key Technologies
Application Area
Evaluation
Conclusion
References
COCOA(3/19) Real Time Systems LAB. Background
A thesis in Aerospace Engineering, Pennsylvania State Univ., by Anirudh Modi, 1999
–“Unsteady separated flow simulations using a cluster of workstations”
Need for a suitable platform to run PUMA (a parallel flow solver) efficiently and accurately
–Resolving several steady solutions
–A fully three-dimensional unsteady separated flow around a sphere
PUMA: Parallel Unstructured Maritime Aerodynamics
Financial support: the Rotorcraft Center of Excellence (RCOE) at Penn State
COCOA(4/19) Real Time Systems LAB. COCOA Overview
The COst effective COmputing Array (COCOA)
A Beowulf cluster that has 50 processors
Aims to bring low-cost parallel computing
–The whole system cost approximately $100,000 (1998 US dollars)
Performance
–Benchmarks show that COCOA was almost twice as fast as the Penn State IBM SP (older RS/ nodes) supercomputer for these applications
COCOA(5/19) Real Time Systems LAB. System Architecture
Computing nodes (26 Dell WS-410 workstations)
–Dual 400MHz Intel Pentium II processors w/512K L2 cache
–512MB SDRAM
–4GB UW-SCSI2 disk
–3Com 3c905B 100Mbits/sec Fast Ethernet card
–32x SCSI CD-ROM drive
–1.44MB FDD
–Cables
In addition
–One Baynetworks 450T 24-way 100Mbits/sec switch
–Two 16-way monitor/keyboard/mouse switches
–Four 500 kVA APC UPS units
–For the one server: one monitor, keyboard, mouse, and an extra 54GB UW-SCSI2 HDD
COCOA(6/19) Real Time Systems LAB. System Architecture cont. Setting up H/W
COCOA(7/19) Real Time Systems LAB. System Architecture cont.
Operating System
–RedHat Linux 5.1
Software
–Base packages from RedHat Linux 5.1, Kernel#
–Freeware GNU C/C++ compilers (gcc, pgcc)
–Fortran77/90 compiler & debugger by Portland Group
–Freeware MPI libraries for parallel programming in C/C++/Fortran77/90
–ssh for secure access
–DQS v3.0, a queueing system
–Scientific visualization software TECPLOT from Amtec Corp.
COCOA(8/19) Real Time Systems LAB. Key Technologies
Beowulf Cluster
–A system that usually consists of one server node and one or more client nodes, connected via Ethernet or some other fast network
–Developed for large-scale computing in fields such as aerodynamics, atmospheric science, physics, etc.
–First developed in 1994 at NASA
–Low-price supercomputing is possible thanks to high-performance, low-price processors and the availability of high-speed network devices
–Numerous Beowulf clusters have been developed and are used in various computational science fields
COCOA(9/19) Real Time Systems LAB. Key Technologies cont.
DQS (Distributed Queuing System)
–Developed as an experimental batch queuing system at the Supercomputer Computations Research Institute, Florida State Univ.
–Provides single, coherent allocation and management of computing resources
MPI (Message Passing Interface)
–Standard for parallel programming
SSH (Secure Shell)
–Program for logging into a remote machine and executing commands on it
–Provides secure, encrypted communication between untrusted hosts over an insecure network
COCOA(10/19) Real Time Systems LAB. Application Area
Analysis of maritime aerodynamics
–Analyzes flows over complex configurations (such as ships and helicopter fuselages)
–Uses PUMA
–Details of the problem: a helicopter can safely land on a frigate in the North Sea only 10 percent of the time in winter
COCOA(11/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics)
Program for the analysis of internal and external non-reacting compressible flows over arbitrarily complex 3D geometries
Written entirely in ANSI C, using the MPI library for message passing, and hence highly portable while giving good performance
COCOA(12/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
Uses domain decomposition
–Domain decomposition: distributes data across processes, with each process performing approximately the same operations on its data; problem-level parallelism rather than loop-level (not SIMD); minimizes communication cost
–Functional decomposition: divides a problem into several distinct tasks that may be executed in parallel
Parallelization in PUMA
–Each compute node reads its own portion of the grid file at startup
–Each compute node generates the flow solution over its given grid, in parallel
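The idea of giving each process its own portion of the grid can be sketched as follows. This is only a minimal illustration, assuming a contiguous block split of cell indices; the function name `block_partition` is hypothetical and this is not PUMA's actual partitioning scheme. A real unstructured-grid solver would typically also choose the partition so that few faces are shared between processes.

```python
def block_partition(n_cells, n_procs):
    """Split cell indices 0..n_cells-1 into n_procs contiguous blocks.

    Returns a list of half-open (start, stop) ranges, one per process.
    Block sizes differ by at most one cell, so each process does
    approximately the same amount of work.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    base, extra = divmod(n_cells, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Hypothetical example: 10 cells shared among 4 processes.
    print(block_partition(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each process would then read and solve only the cells in its own range, exchanging data with neighbors only for cells on block boundaries.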
COCOA(13/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
COCOA(14/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
Modifications to PUMA
–Modify PUMA to read several hundred lines at a time and broadcast the combined data to every processor using a reasonably sized buffer
–Modify the MPI messaging to combine several small messages into one before starting communication
[Figure: Mbits/sec vs packet size on COCOA for the MPI_Send/Recv test]
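Why combining small messages helps can be illustrated with a simple latency-plus-bandwidth cost model. This is a sketch under stated assumptions: the 100 microsecond per-message latency is a hypothetical value chosen for illustration, not a measurement on COCOA, and the function names are made up for this example. Only the 100 Mbits/sec link speed comes from the hardware list.

```python
# Assumed, illustrative link parameters (not measured on COCOA).
LATENCY_S = 100e-6           # hypothetical per-message latency: 100 microseconds
BANDWIDTH_BPS = 100e6 / 8    # 100 Mbits/sec Fast Ethernet, in bytes per second


def transfer_time(n_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Time for one message: fixed latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth


def effective_mbits_per_s(n_bytes, latency_s=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Throughput actually achieved for a single message of n_bytes."""
    return (n_bytes * 8) / transfer_time(n_bytes, latency_s, bandwidth) / 1e6


def time_separate(n_messages, msg_bytes):
    """Total time when every small message pays the latency itself."""
    return n_messages * transfer_time(msg_bytes)


def time_combined(n_messages, msg_bytes):
    """Total time when the messages are packed into one buffer first."""
    return transfer_time(n_messages * msg_bytes)


if __name__ == "__main__":
    # Throughput rises with packet size as the fixed latency is amortized.
    for size in (64, 1024, 16384, 65536):
        print(size, "bytes:", round(effective_mbits_per_s(size), 1), "Mbits/sec")
    # 100 messages of 64 bytes each, sent separately vs. combined.
    print("separate:", time_separate(100, 64), "s")
    print("combined:", time_combined(100, 64), "s")
```

In this model the combined transfer pays the latency once instead of once per message, which is the effect the modification to PUMA exploits.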
COCOA(15/19) Real Time Systems LAB. PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
[Figure: Improvement in PUMA performance after combining several small MPI messages into one]
COCOA(16/19) Real Time Systems LAB. Evaluation
[Figure: Total Mflops vs number of processors on COCOA for PUMA test case]
[Figure: Speed-up vs number of processors on COCOA for PUMA test case]
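For reading a speed-up plot of this kind, the usual definitions are speed-up = (time on one processor) / (time on p processors) and efficiency = speed-up / p. A minimal sketch follows; the timings in it are invented purely for illustration and are not COCOA or PUMA results.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speed-up divided by processor count (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by processor count.
    # These numbers are made up for the example, not measured data.
    timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0}
    t1 = timings[1]
    for p in sorted(timings):
        s = speedup(t1, timings[p])
        e = efficiency(t1, timings[p], p)
        print(f"p={p:2d}  speed-up={s:5.2f}  efficiency={e:4.2f}")
```

An efficiency that stays close to 1.0 as p grows corresponds to a speed-up curve that stays close to the ideal straight line.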
COCOA(17/19) Real Time Systems LAB. Evaluation cont.
[Figure: NAS Parallel Benchmark on COCOA: comparison with other machines for Class “C” LU test]
COCOA(18/19) Real Time Systems LAB. Conclusion
Beowulf-class supercomputer (PC, Linux, MPI, DQS, SSH)
Cost-effective supercomputer for numerical simulations
–Almost twice as fast as the Penn State IBM-SP supercomputer for our production codes, including PUMA, given the same number of processors, while being built at a fraction of the cost ($100,000 in 1998 US dollars)
Suitable only for numerical simulations (weather, fluids, ...) that do not have high communication-to-computation ratios, because of the high communication latency
Good scalability with most of the MPI applications used
The objective, to build a cost-effective supercomputer for the numerical simulations dealt with at Penn State, has been fulfilled
COCOA(19/19) Real Time Systems LAB. References
COCOA
NAS Parallel Benchmarks
Beowulf
RedHat
MPI
DQS
Tons of references…