Gravitational N-body Simulation
Major Design Goals
  -Efficiency
  -Versatility (ability to use different numerical methods)
  -Scalability
Lesser Design Goals
  -Flexibility (control parameters must be configurable)
  -Persistence (pause and continue)
  -Visualization

Hardware
Single Computer Configuration
  -1-4 CPUs
  -1-4 Cores
  -3-4 GHz CPUs
  -bit FP IPC
  -bit FP IPC
  -Windows
Cluster Configurations
  -LION-XO (80x2xOpteron/8GB + 40x4xOpteron/16GB; 2.4 GHz)
  -1.6 TFlops (32-bit); 800 GFlops (64-bit); single-core assumed
  -Gigabit Ethernet
  -GNU/Linux
  -Single or dual core CPUs? CPU Model?
6 GFlops average desktop
256 GFlops top-line server

Algorithms
Direct Methods: O(N^2)
  + very simple
  + scalable
  - inefficient (~30,000 particles at 256 GFlops)
Treecode / Multipole: O(N log N)
  - more difficult to implement
  - scalability harder to achieve
  + efficient ( particles)
Field Methods: O(N log N) or O(N)
  Involves solving Poisson’s equation
  Area of active research

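The direct method listed above can be sketched in a few lines, which also shows where the O(N^2) cost comes from: every pair of particles is visited once. This is a minimal illustration in plain Python, not the implementation behind these slides. The function name, the gravitational constant G = 1, and the softening length eps are assumptions introduced for the example.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-summation gravity: O(N^2) pairwise interactions.

    pos  : list of (x, y, z) tuples
    mass : list of masses, same length as pos
    G    : gravitational constant (assumed 1.0, i.e. code units)
    eps  : softening length (an assumption; avoids division by zero
           when two particles come very close)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # each pair is visited once
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # Equal and opposite forces: update both particles of the pair.
            si = G * mass[j] * inv_r3      # scale for acceleration of i
            sj = G * mass[i] * inv_r3      # scale for acceleration of j
            acc[i][0] += si * dx
            acc[i][1] += si * dy
            acc[i][2] += si * dz
            acc[j][0] -= sj * dx
            acc[j][1] -= sj * dy
            acc[j][2] -= sj * dz
    return acc
```

The inner loop runs N(N-1)/2 times, so doubling the particle count roughly quadruples the work. The inner loop body is also the part that SIMD and multi-threading would target.
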
Levels of Parallelization
1) SIMD: up to 4 threads
  -4x32-bit flops/cycle
  -2x64-bit flops/cycle
2) SMP/MPU: up to 4 threads
  -1-4 cores
  -1-4 CPUs
3) Cluster: up to N nodes

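The levels above multiply, so a theoretical peak rate can be estimated as CPUs x cores x clock x flops per cycle. The sketch below is only a back-of-the-envelope helper; the function name is hypothetical, and the specific input combinations are assumptions chosen because their products coincide with the desktop, server and cluster figures quoted on the Hardware slide. The slides do not state how those figures were derived.

```python
def peak_gflops(cpus, cores_per_cpu, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops: all parallel levels multiplied together."""
    return cpus * cores_per_cpu * clock_ghz * flops_per_cycle

# Assumed desktop: 1 CPU, 1 core, 3 GHz, 2 flops/cycle (64-bit SIMD).
desktop = peak_gflops(1, 1, 3.0, 2)

# Assumed top-line server: 4 CPUs, 4 cores each, 4 GHz, 4 flops/cycle (32-bit SIMD).
server = peak_gflops(4, 4, 4.0, 4)

# Cluster: 80 nodes x 2 CPUs + 40 nodes x 4 CPUs at 2.4 GHz, single-core.
# The 2 and 1 flops/cycle values are inferred, not stated on the slides.
cluster_cpus = 80 * 2 + 40 * 4
cluster_32bit = peak_gflops(cluster_cpus, 1, 2.4, 2)
cluster_64bit = peak_gflops(cluster_cpus, 1, 2.4, 1)

print(desktop, server, cluster_32bit, cluster_64bit)
```

These are upper bounds: real code reaches them only if every SIMD lane, core and node is kept busy, which the memory hierarchy and network usually prevent.
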
Memory Requirements
1) Position: x, y, z
2) Velocity: vx, vy, vz
  -6x4 = 24 bytes (32-bit fp)
  -6x8 = 48 bytes (64-bit fp)
  -~2,500 points per 64 KB (32-bit)
  -~1,300 points per 64 KB (64-bit)

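The per-particle footprint is easy to check with a few lines of arithmetic. The sketch below assumes 1 KB = 1024 bytes, stores only the six components listed above (no mass, no padding), and uses the 64 KB L1 cache size from the memory-levels slide as the second capacity. The helper names are hypothetical.

```python
COMPONENTS = 6          # x, y, z, vx, vy, vz
KB = 1024               # bytes per KB (binary convention, an assumption)

def bytes_per_particle(fp_bytes):
    """Storage for one particle at the given floating-point width."""
    return COMPONENTS * fp_bytes

def particles_that_fit(capacity_bytes, fp_bytes):
    """Whole particles that fit into a memory region of the given size."""
    return capacity_bytes // bytes_per_particle(fp_bytes)

for label, fp_bytes in (("32-bit", 4), ("64-bit", 8)):
    print(label,
          bytes_per_particle(fp_bytes), "bytes/particle,",
          particles_that_fit(KB, fp_bytes), "per KB,",
          particles_that_fit(64 * KB, fp_bytes), "per 64 KB")
```

Under these assumptions a single KB holds a few dozen particles, while a 64 KB L1 cache holds on the order of a couple of thousand at 32-bit and about half that at 64-bit.
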
Levels of Memory
1) L1 cache: 64 KB
  -CPU clock-speed
  -no latency
2) L2 cache: 1 MB
  -CPU clock-speed
  -low latency
3) RAM: GBs
  -reduced speed (up to 12-24 GB/s)
  -huge latency
4) Network (weakest link)
  -1 Gbit/sec

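To see why the network is the weakest link, it helps to compare how long the same particle data takes to move at each level. The sketch below uses the bandwidths quoted above and ignores latency and protocol overhead entirely, so it is an optimistic lower bound. The particle count of one million and the 24-byte 32-bit particle are illustrative assumptions.

```python
def transfer_seconds(n_particles, bytes_per_particle, bytes_per_second):
    """Bandwidth-only transfer time; latency and overhead are ignored."""
    return n_particles * bytes_per_particle / bytes_per_second

GIGABIT_ETHERNET = 1e9 / 8      # 1 Gbit/s expressed in bytes per second
RAM_LOW, RAM_HIGH = 12e9, 24e9  # 12-24 GB/s range for main memory

n = 1_000_000                   # assumed particle count for the example
particle_bytes = 24             # 6 components x 4 bytes (32-bit fp)

for name, bandwidth in (("RAM (12 GB/s)", RAM_LOW),
                        ("RAM (24 GB/s)", RAM_HIGH),
                        ("Gigabit Ethernet", GIGABIT_ETHERNET)):
    t = transfer_seconds(n, particle_bytes, bandwidth)
    print(f"{name}: {t * 1e3:.1f} ms")
```

With these inputs the network is roughly two orders of magnitude slower than main memory, which is why cluster-level algorithms try to exchange as little particle data as possible per step.
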
10^9 Particles Require…
Memory: 24 GB (32-bit)
Instructions per iteration: log2(10^9) x 10^9 x const ~ 3x10^12 ops = 3 TFlops
Time: ~ GFlops
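The estimate above can be reproduced numerically. In the sketch below, const = 100 operations per particle per tree level is an assumption inferred from the ~3x10^12 total; it is not stated on the slide. The two sustained rates are the desktop and server figures from the Hardware slide, used here only as illustrative inputs, and the function names are hypothetical.

```python
import math

def treecode_ops(n, const=100.0):
    """O(N log N) cost model: const * N * log2(N) operations per iteration."""
    return const * n * math.log2(n)

def seconds_per_iteration(ops, flops_per_second):
    """Ideal time per iteration at a sustained floating-point rate."""
    return ops / flops_per_second

n = 10**9
memory_gb = n * 24 / 1e9        # 24 bytes per particle at 32-bit precision
ops = treecode_ops(n)           # operations for one iteration

print(f"memory: {memory_gb:.0f} GB")
print(f"ops per iteration: {ops:.2e}")
for label, rate in (("6 GFlops", 6e9), ("256 GFlops", 256e9)):
    print(f"time at {label}: {seconds_per_iteration(ops, rate):.1f} s")
```

These times assume the peak rate is fully sustained; memory and network limits make real iterations slower.
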