HELICS
Petteri Johansson & Ilkka Uuhiniemi

HELICS
COW (cluster of workstations)
–AMD Athlon MP 1.4 GHz
–512 CPUs (2 per compute node)
–Rank 35 at top500.org
–Linpack benchmark: 825 GFlops
–COTS components -> 1.3M euros

HELICS
256 GBytes ECC RAM
10 TB local disks
Myrinet 2000 (fiber)
6 switches (128 port)
Ethernet
Peak performance: 512 * 2.8 GFlops = 1.43 TFlops

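The peak figure can be checked against the Linpack result with a few lines of Python. This is only a sketch of the arithmetic: the numbers come from the slides, while the function and constant names are made up for illustration.

```python
# Figures quoted on the slides.
CPUS = 512                  # 2 CPUs per compute node
PEAK_PER_CPU_GFLOPS = 2.8   # per-CPU peak as quoted
LINPACK_GFLOPS = 825.0      # measured Linpack result


def peak_gflops(cpus: int, per_cpu_gflops: float) -> float:
    """Theoretical peak: every CPU running at its quoted peak rate."""
    return cpus * per_cpu_gflops


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak reached by the benchmark."""
    return measured_gflops / peak


peak = peak_gflops(CPUS, PEAK_PER_CPU_GFLOPS)
print(f"peak: {peak / 1000:.2f} TFlops")
print(f"Linpack efficiency: {efficiency(LINPACK_GFLOPS, peak):.1%}")
```

With the slide figures, Linpack reaches a bit more than half of the theoretical peak.
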
Interconnections
–Myrinet 2000
–10 ns latency (one way)
–2+2 Gb/s full-duplex bandwidth
–bisection bandwidth: 128 x (2+2) Gb/s

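The bisection figure is a simple product, worked out here as a small sketch. The link count and per-direction rate are the slide's values; the names are illustrative only.

```python
LINKS_ACROSS_CUT = 128      # link count used on the slide
GBIT_PER_DIRECTION = 2.0    # Myrinet 2000 rate per direction, from the slide


def bisection_gbit(links: int, per_direction_gbit: float) -> float:
    """Full duplex: each link carries traffic in both directions at once."""
    return links * 2 * per_direction_gbit


total = bisection_gbit(LINKS_ACROSS_CUT, GBIT_PER_DIRECTION)
print(f"bisection bandwidth: {total:.0f} Gb/s ({total / 8:.0f} GB/s)")
```
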
Additional equipment
32 double-node Myrinet cluster for interactive development
2 front-end PCs as access, compilation and job distribution hosts
1 administration server
1 file server (Sun Fire 880) + 2 TByte RAID 5 disk array
10 TByte tape backup
Remote power control device

Problems
Hardware errors: 3 power supplies, 3 hard disks, 2 motherboards, 8 Myrinet network cards
Software: kernel (stable), 2 node crashes due to daemon crashes

Clustering
What is needed?
–Booting concept: network boot (DHCP)
–Cluster installation: installation via network
–Power control: remote access to power supplies, sequential power off/on, reset
–BIOS control: update and settings via network, direct access via serial link
–Health control of nodes: fan speed, CPU temperature and disk status gathered via network

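The health-control item can be pictured as a threshold check over readings collected from each node. The sketch below is an assumption about how such a check might look, not the tooling used on HELICS: the `NodeHealth` record, the limits and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """One health sample as it might arrive over the network (hypothetical)."""
    host: str
    fan_rpm: int
    cpu_temp_c: float
    disk_ok: bool


# Hypothetical limits; real values depend on the hardware.
MIN_FAN_RPM = 2000
MAX_CPU_TEMP_C = 70.0


def problems(sample: NodeHealth) -> list:
    """Return the list of limits this sample violates."""
    found = []
    if sample.fan_rpm < MIN_FAN_RPM:
        found.append("fan slow")
    if sample.cpu_temp_c > MAX_CPU_TEMP_C:
        found.append("cpu hot")
    if not sample.disk_ok:
        found.append("disk fault")
    return found


def unhealthy(samples) -> dict:
    """Map host name to its problems, skipping healthy nodes."""
    report = {}
    for sample in samples:
        found = problems(sample)
        if found:
            report[sample.host] = found
    return report


samples = [
    NodeHealth("node001", 4200, 48.5, True),
    NodeHealth("node002", 1200, 74.0, True),
    NodeHealth("node003", 4100, 51.0, False),
]
for host, found in unhealthy(samples).items():
    print(host, ", ".join(found))
```

A report like this could feed the spare-host and power-control mechanisms, for example by draining a node before it is reset.
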
Clustering
Reliability of resources
–spare hosts, redundant servers
Availability monitoring & accounting
–gathering system + job status and accounting information via network
Batching concepts
–SCore cluster software

Clustering
Application optimization
–tracing + profiling tools (Vampir, Paraver)
Debugging of parallel applications
–debuggers: TotalView, P2D2, PGI

Software
SCore Cluster System Software is a high-performance parallel programming environment for workstation and PC clusters.

SCore
Heterogeneous programming language
Multiple programming paradigms
Parallel programming support
–Real-time process activity monitor
–Deadlock detection
–Automatic debugger attachment

SCore
Fault tolerance
–Preemptive checkpoint
–Parallel process migration
Flexible job scheduling
–Gang scheduling
–Batch scheduling

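Gang scheduling means that all processes of a parallel job are given their time slice at the same moment, so they can communicate without waiting for a descheduled peer. The toy sketch below illustrates the idea with first-fit packing of jobs into time slots and round-robin over the slots. It is not SCore's scheduler; the job names, widths and function names are hypothetical.

```python
def build_slots(jobs: dict, nodes: int) -> list:
    """Pack jobs (name -> nodes needed) into time slots, first fit.

    Every job in a slot runs on disjoint nodes during the same time slice.
    """
    slots = []  # each entry: [free_nodes, [job names]]
    for name, width in jobs.items():
        if width > nodes:
            raise ValueError(f"job {name} needs {width} nodes, cluster has {nodes}")
        for slot in slots:
            if slot[0] >= width:
                slot[0] -= width
                slot[1].append(name)
                break
        else:
            # No existing slot has room: open a new time slot.
            slots.append([nodes - width, [name]])
    return [names for _, names in slots]


def schedule(jobs: dict, nodes: int, ticks: int) -> list:
    """Round-robin over the slots: one slot owns the cluster per tick."""
    slots = build_slots(jobs, nodes)
    if not slots:
        return []
    return [slots[t % len(slots)] for t in range(ticks)]


# Hypothetical workload on a 4-node cluster.
jobs = {"A": 4, "B": 2, "C": 2, "D": 3}
for tick, running in enumerate(schedule(jobs, nodes=4, ticks=4)):
    print(tick, running)
```

Batch scheduling, by contrast, would run each job to completion before starting the next one on the same nodes.
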
Usage
Reactive flows
Optimization problems
Technical simulations
Image processing
Bio-computing/Bioinformatics