The performance of NAMD on a large Power4 system

1 The performance of NAMD on a large Power4 system
Joachim Hein EPCC, The University of Edinburgh

2 Measurement based load balancing
NAMD measures its performance for the first 200 steps.
It then redistributes the workload to optimise the performance.
The performance benefit appears for larger numbers of processors.
Benchmark time: better estimate for production jobs from short jobs.
NAMD on Power4 11 May 2019
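
The idea of measuring first and redistributing afterwards can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NAMD's actual load balancer: the cost values are made-up numbers standing in for timings measured during the first steps, and the greedy largest-first placement is a textbook heuristic swapped in for the purpose of the example. The names `rebalance` and `measured_costs` are hypothetical.

```python
import heapq

def rebalance(measured_costs, n_procs):
    """Greedy largest-first placement of measured work units onto processors.

    measured_costs: per-unit times (seconds) measured during the first steps.
    Returns a list with the total load assigned to each processor.
    """
    # Min-heap of (current load, processor index); the least loaded pops first.
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    loads = [0.0] * n_procs
    for cost in sorted(measured_costs, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads

# Hypothetical measured costs for 8 work units (illustrative values only).
measured_costs = [4.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 0.5]

# Naive placement: units dealt out round-robin, ignoring the measurements.
naive = [sum(measured_costs[p::4]) for p in range(4)]
balanced = rebalance(measured_costs, 4)

print("naive max load:   ", max(naive))
print("balanced max load:", max(balanced))
```

Since the step time is set by the most heavily loaded processor, lowering the maximum load is what matters; the total amount of work is unchanged by the redistribution.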

3 Measurement based load balancing

4 Loadbalance Example
128 CPUs
96769 atoms
32000 iterations
All but one CPU lie in a narrow window
Effect of the “slow” CPU is negligible

5 MP_EAGER_LIMIT
The environment variable MP_EAGER_LIMIT changes the behaviour of MPI.
Messages smaller than MP_EAGER_LIMIT are sent immediately.
Messages larger than MP_EAGER_LIMIT are sent using a “hand-shake”.
The default value is small and not optimal for NAMD.
Tune it!
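
The size threshold described on this slide can be pictured with a small Python sketch. It is a toy model, not the MPI library's implementation: `choose_protocol` is a hypothetical helper, and treating a message of exactly the limit size as eager is an assumption, since the slide only speaks of "smaller" and "larger".

```python
def choose_protocol(message_bytes, eager_limit):
    """Return which send path a message of the given size would take."""
    # Assumption: a message exactly at the limit is still sent eagerly.
    if message_bytes <= eager_limit:
        # Small message: sent straight away without waiting for the receiver.
        return "eager"
    # Large message: sender and receiver shake hands before the data moves.
    return "rendezvous"

# 65536 is the value used in the sample LoadLeveler script in this talk.
eager_limit = 65536
for size in (1024, 65536, 200000):
    print(size, choose_protocol(size, eager_limit))
```

Raising the limit moves more messages onto the immediate path, which is why the setting is worth tuning for an application that sends many medium-sized messages.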

6 MP_EAGER_LIMIT

7 Sample LoadLeveler script
#@ shell = /bin/ksh
#@ job_type = parallel
#@ network.MPI = csss,shared,us
#@ account_no = z001
#@ output = namd_run.$(schedd_host)_$(jobid).out
#@ error = namd_run.$(schedd_host)_$(jobid).err
#@ wall_clock_limit = 00:30:00
#@ node = 1
#@ tasks_per_node = 8
#@ queue
# Communication: shared memory
export MP_SHARED_MEMORY=yes
# Setting MP_EAGER_LIMIT
export MP_EAGER_LIMIT=65536
# Set path & inputfile
poe path/namd2 inputfile

8 Benchmarks
Joint Amber Charm (JAC) benchmark: Dihydrofolate reductase in water, 23558 atoms
Apo A-1 benchmark: Apolipoprotein A-1, 92224 atoms
TCR peptide-MHC: 96796 atoms
F1-ATP synthase: F1 subunit of ATP synthase, atoms

9 The HPCx system
Presently:
  40 IBM p690 Regatta H frames
  32 POWER4 processors per frame (1.3 GHz)
  Frames subdivided into LPARs of 8 processors
  8 GB of main memory per LPAR
  IBM SP Switch2 (Colony) network
  2 switch adapters per LPAR
  Dual plane
Future (Summer 2004):
  Upgrade to p690+ frames (1.7 GHz)
  LPARs of 32 processors
  IBM HPS (Federation) network
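
The totals implied by the present configuration follow from simple arithmetic, sketched here in Python. The input numbers come from the slide; the derived totals assume that every frame is fully divided into 8-processor LPARs, which the slide does not state explicitly.

```python
# Figures taken from the slide (present configuration).
frames = 40
procs_per_frame = 32
procs_per_lpar = 8
mem_per_lpar_gb = 8

# Derived quantities, assuming every frame is fully split into LPARs.
total_procs = frames * procs_per_frame
lpars_per_frame = procs_per_frame // procs_per_lpar
total_lpars = frames * lpars_per_frame
mem_per_proc_gb = mem_per_lpar_gb / procs_per_lpar

print("processors:          ", total_procs)
print("LPARs per frame:     ", lpars_per_frame)
print("LPARs in total:      ", total_lpars)
print("GB memory per proc.: ", mem_per_proc_gb)
```

With 8 processors per LPAR, any job using more than 8 tasks has to communicate over the switch network, which is relevant when reading the scaling results.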

10 Time per step for 32 processors
Columns: Benchmark, NAMD 2.4, NAMD 2.5, Comment
dhf reductase, 23558 atoms: 0.051s, 0.032s, too small for 32 cpus
APO A-1, 92224 atoms: 0.28s, 0.19s
TCR MHC, 96796 atoms: 0.30s, 0.21s
F1-ATP, atoms: 0.58s
NAMD 2.5 is substantially faster than NAMD 2.4
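
To put a number on "substantially faster", the ratios of the listed step times can be computed as in the following Python sketch. The timings are copied from the slide; F1-ATP is left out because only a single timing is given for it, and the helper name `speedup` is just for illustration.

```python
# Time per step in seconds on 32 processors: (NAMD 2.4, NAMD 2.5).
timings = {
    "dhf reductase": (0.051, 0.032),
    "APO A-1": (0.28, 0.19),
    "TCR MHC": (0.30, 0.21),
}

def speedup(t_old, t_new):
    """Ratio of old to new time per step; values above 1 mean faster."""
    return t_old / t_new

speedups = {name: speedup(t24, t25) for name, (t24, t25) in timings.items()}
for name, s in speedups.items():
    print(f"{name}: NAMD 2.5 is {s:.2f}x faster than NAMD 2.4")
```

For these three benchmarks the ratio comes out between roughly 1.4 and 1.6.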

11 Large number of processors

12 Further Reading
Full technical report: The performance of NAMD on HPCx, Joachim Hein

