Slide 1: MSc in High Performance Computing, Computational Chemistry Module
Parallel Molecular Dynamics (i)
Bill Smith, CCLRC Daresbury Laboratory, w.smith@daresbury.ac.uk
Slide 2: Parallel Computers: Shared Memory
[Figure: processors p0-p3 all connected to a single shared memory M]
Slide 3: Parallel Computers: Distributed Memory
[Figure: processors p0-p8, each with its own local memory m0-m8]
Slide 4: Parallel Computers: Virtual Shared Memory
[Figure: processors p0-p8 with local memories m0-m8, the distributed memories being presented as a single virtual shared memory]
Slide 5: Parallel Computers: Beowulf Clusters
[Figure: processors P0-P3, each with its own memory M, connected by Ethernet/FDDI]
Slide 6: Important Issues in Parallel Processing
● Load balancing:
  – Sharing work equally between processors
  – Sharing the memory requirement equally
  – Maximum concurrent use of each processor
● Communication:
  – Maximum size of messages passed
  – Minimum number of messages passed
  – Local versus global communications
  – Asynchronous communication
Slide 7: Scaling in Parallel Processing
● Type 1 scaling
  – Fixed number of processors
  – Scaling of elapsed time with total workload
  – Ideal: elapsed time directly proportional to workload
● Type 2 scaling (strong scaling)
  – Fixed total workload
  – Performance scaling with number of processors
  – Ideal: double the processor count, double the performance
● Type 3 scaling (weak scaling)
  – Fixed workload per processor
  – Scaling of elapsed time with number of processors
  – Ideal: double the processor count, constant elapsed time
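To pin down the "ideal" cases above, the standard scaling measures can be stated explicitly (these definitions are standard usage rather than quoted from the slide; T(P) is the elapsed time on P processors and W the total workload):

    % Standard definitions assumed here (not shown on the slide):
    % T(P) = elapsed time on P processors, W = total workload.
    \[
      S(P) = \frac{T(1)}{T(P)}, \qquad E(P) = \frac{S(P)}{P}
    \]
    % Type 1: fixed P;             ideal: T \propto W
    % Type 2 (strong): fixed W;    ideal: S(P) = P, i.e. E(P) = 1
    % Type 3 (weak): W \propto P;  ideal: T(P) = T(1)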
Slide 8: Performance Analysis (i)
Time required per step: T_s = T_p + T_c, where
  – T_s is the time per step
  – T_p is the processing (computation) time per step
  – T_c is the communication time per step
Slide 9: Performance Analysis (ii)
Can also write: T_s = T_p (1 + R_cp), where R_cp = T_c / T_p.
R_cp is the Fundamental Ratio.
NB: Assume synchronous communications, without overlap of communication and computation.
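As a tiny numerical illustration of this timing model (an illustrative sketch; the function and variable names are not from the lecture):

    # Illustrative sketch of the slide's timing model; names are mine, not the lecture's.
    def time_per_step(t_proc: float, t_comm: float) -> tuple[float, float]:
        """Return (T_s, R_cp) given the processing and communication times per step."""
        r_cp = t_comm / t_proc           # fundamental ratio R_cp = T_c / T_p
        t_step = t_proc * (1.0 + r_cp)   # T_s = T_p (1 + R_cp), identical to T_p + T_c
        return t_step, r_cp

    # Example: 90 ms of computation and 10 ms of communication per step
    print(time_per_step(0.090, 0.010))   # -> (0.1, 0.111...)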
Slide 10: Molecular Dynamics Basics
[Flow diagram: Initialize -> Forces -> Motion -> Properties -> (loop) -> Summarize]
● Key stages in an MD simulation:
  – Set up the initial system
  – Calculate atomic forces
  – Calculate atomic motion
  – Calculate physical properties
  – Repeat!
  – Produce the final summary
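A minimal serial skeleton of these stages (purely illustrative; the force routine is a placeholder supplied by the caller, and the integrator shown is velocity Verlet):

    import numpy as np

    def run_md(r, v, masses, n_steps, dt, force_fn):
        """Minimal serial MD loop illustrating the stages on the slide.
        r, v: (N, 3) positions and velocities; force_fn(r) -> (N, 3) forces."""
        f = force_fn(r)                                      # Forces for the initial system
        kinetic = 0.0
        for step in range(n_steps):
            v += 0.5 * dt * f / masses[:, None]              # Motion: velocity Verlet half kick
            r += dt * v                                      # Motion: drift
            f = force_fn(r)                                  # Forces at the new positions
            v += 0.5 * dt * f / masses[:, None]              # Motion: second half kick
            kinetic = 0.5 * np.sum(masses[:, None] * v**2)   # Properties (e.g. kinetic energy)
        return r, v, kinetic                                 # Summarize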
Slide 11: Basic MD Parallelization Strategies
● This lecture:
  – Computing Ensemble
  – Hierarchical Control
  – Replicated Data
● Next lecture:
  – Systolic Loops
  – Domain Decomposition
Slide 12: Parallel MD Algorithms: Computing Ensemble
[Figure: Proc 0 to Proc 3 each run a complete, independent pipeline: Setup -> Forces -> Motion -> Stats -> Results]
Slide 13: Computing Ensemble
● Advantages:
  – Simple to implement: no comms!
  – Maximum parallel efficiency, excellent throughput
  – Perfect load balancing
  – Good scaling behaviour (types 1 and 3)
  – Suitable method for Monte Carlo
  – Suitable for parallel replica applications (e.g. hyperdynamics)
● Disadvantages:
  – Limited to current physical systems
  – Limited to short timescale dynamics
  – Algorithmic sterility: offers no new intellectual challenges
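A sketch of the ensemble strategy with mpi4py (the library choice is an assumption; the lecture does not prescribe one). Every processor runs its own complete simulation with a different random seed and never communicates:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    rng = np.random.default_rng(seed=rank)   # each processor holds an independent replica
    # ...set up the same physical system, but draw initial velocities from this rank's rng...
    # ...run the full Setup -> Forces -> Motion -> Stats pipeline on this rank only...
    # Each rank writes its own results; no messages are exchanged at any point.
    print(f"rank {rank}: independent replica, seed {rank}")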
Slide 14: Parallel Replica Dynamics (i)
[Figure: timeline of parallel replica dynamics on processors p0 … pM ({p_i}): the original configuration is minimized (m0, m1, m2, …), replicated, equilibrated, then run through production periods of length t_block, with a decorrelation period t_corr once a transition (X) is detected]
Slide 15: Parallel Replica Dynamics (ii)
Procedure:
1. Replicate the system on M processors. Minimize to get the `initial' state.
2. Equilibrate the system using different {v_i} on all M processors (check each remains in the same state). Accumulated time t_sum = 0.
3. Run for time t_block, then minimize and check for a transition. Accumulated time t_sum = t_sum + M t_block.
4. If no transition, repeat step 3.
5. If a transition is found on processor i, continue that run for time t_corr. Accumulated time t_sum = t_sum + t_corr.
6. Take the configuration on processor i as the new state and proceed to step 1.
Accuracy of the transition time: +/- t_block.
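The bookkeeping in steps 2 to 5 can be sketched as follows (illustrative only; transition detection is modelled as a random event rather than a real minimization check):

    import numpy as np

    def parallel_replica_time(M, t_block, t_corr, p_transition, rng):
        """Schematic accounting of the accumulated time t_sum in parallel replica dynamics."""
        t_sum = 0.0                              # step 2: accumulated time starts at zero
        while True:
            t_sum += M * t_block                 # step 3: M replicas each advance by t_block
            transitions = rng.random(M) < p_transition
            if transitions.any():                # step 5: a transition found on some replica i
                t_sum += t_corr                  #         continue that run for t_corr
                i = int(np.argmax(transitions))
                return i, t_sum                  # step 6: replica i supplies the new state
            # step 4: no transition -> repeat step 3

    rng = np.random.default_rng(0)
    print(parallel_replica_time(M=8, t_block=1.0, t_corr=0.5, p_transition=0.02, rng=rng))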
Slide 16: Parallel Tempering (i)
[Figure: replicas of the original configuration on processors p0 … pM are equilibrated at temperatures T0 … TM, then run through a production period with Monte Carlo trial swaps between replicas, before close down]
Slide 17: Parallel Tempering (ii)
Procedure:
● Start M simulations (n = 0 to M-1) of the model system at different temperatures T = n ΔT + T_0.
● Equilibrate the systems for N_equil steps.
● At intervals of N_sample steps, attempt a Monte Carlo controlled swap of the configurations of two processors chosen at random.
● Continue the simulation until the distribution of configuration energies in the lowest-temperature system follows Boltzmann.
● Calculate the physical properties of the low-temperature system(s).
● Save all replica configurations for a possible restart.
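The slide does not spell out the acceptance rule for the "Monte Carlo controlled swap"; the usual replica-exchange criterion (stated here as an assumption, not quoted from the slide) accepts the swap with probability min(1, exp[(1/kT_i - 1/kT_j)(E_i - E_j)]):

    import math, random

    # Usual detailed-balance swap test for parallel tempering (an assumption, not from the slide).
    def attempt_swap(E_i, E_j, T_i, T_j, k_B=1.0):
        """Return True if the configurations at temperatures T_i and T_j should be exchanged."""
        beta_i, beta_j = 1.0 / (k_B * T_i), 1.0 / (k_B * T_j)
        delta = (beta_i - beta_j) * (E_i - E_j)
        return delta >= 0.0 or random.random() < math.exp(delta)

    # Example: a trial swap between two replicas picked at random every N_sample steps
    print(attempt_swap(E_i=-105.0, E_j=-98.0, T_i=1.0, T_j=1.2))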
Slide 18: Parallel MD Algorithms: Hierarchical Control - Task Farming
[Figure: Proc 0 performs Setup, Allocate, Motion, Stats and Results; the Forces work is farmed out to Proc 1, Proc 2, Proc 3, Proc 4, …, Proc n]
Slide 19: Parallel MD Algorithms: Task Farming
● Advantages:
  – Can work with heterogeneous computers
  – Historical precedent
● Disadvantages:
  – Poor comms hierarchy
  – Hard to load balance
  – Poor scaling (types 1 & 2)
  – Danger of deadlock
Slide 20: Hierarchical Control - Master-Slave
[Figure: a master processor (Proc 0) at the top of a hierarchy of processors Proc 1 to Proc 6]
Slide 21: Master-Slave MD Algorithm
● Advantages:
  – Can work on heterogeneous computers
  – Better comms strategy than task farming
● Disadvantages:
  – Poor load balancing characteristics
  – Difficult to scale with system size and processor count
Slide 22: Parallel MD Algorithms: Replicated Data
[Figure: Proc 0, Proc 1, Proc 2, …, Proc N-1 each run the full pipeline Initialize -> Forces -> Motion -> Statistics -> Summary on a replicated copy of the data]
Slide 23: Replicated Data MD Algorithm
● Features:
  – Each node has a copy of all atomic coordinates (R_i, V_i, F_i)
  – Force calculations are shared equally between nodes (i.e. N(N-1)/2P pair forces per node; see the sketch below)
  – Atomic forces are summed globally over all nodes
  – Motion is integrated for all or some atoms on each node
  – Updated atom positions are circulated to all nodes
  – An example of Algorithmic Decomposition
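A minimal sketch of one replicated-data force step with mpi4py and NumPy (the library choice, the cyclic work split and the pair "force" itself are all placeholders of mine, not DL_POLY code):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, P = comm.Get_rank(), comm.Get_size()

    N = 64
    rng = np.random.default_rng(42)        # same seed everywhere: every node holds all coordinates
    r = rng.random((N, 3))

    # Each node computes roughly N(N-1)/2P of the pair forces (here a simple cyclic share of i).
    f_local = np.zeros((N, 3))
    for i in range(rank, N, P):
        for j in range(i + 1, N):
            rij = r[i] - r[j]
            fij = rij / (np.dot(rij, rij) + 1e-12)   # placeholder pair force, not a real potential
            f_local[i] += fij
            f_local[j] -= fij

    # Atomic forces are summed globally; afterwards every node holds the complete force array
    # and can integrate the motion of all (or its share of) the atoms.
    f_total = np.zeros_like(f_local)
    comm.Allreduce(f_local, f_total, op=MPI.SUM)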
Slide 24: Performance Analysis with Replicated Data
Processing time: [equation in the original slide image]
Communications time (hypercube comms): [equation in the original slide image]
NB: O(N^2) algorithm
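The expressions themselves are images in the original slide and do not survive the transcript. From the N(N-1)/2P pair-force count on slide 23 and the log2(P)-stage global sum implied by "hypercube comms", they presumably take a form like the following (a reconstruction, not a copy of the slide; t_p is the time per pair force, t_c the time to pass one force component):

    % Reconstructed forms, based on slides 9 and 23 rather than the slide image itself.
    \[
      T_p \;\approx\; \frac{N(N-1)}{2P}\, t_p ,
      \qquad
      T_c \;\approx\; 3N \log_2 P \; t_c
      \quad \text{(global force sum over a hypercube of } P \text{ processors)}
    \]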
Slide 25: Fundamental Ratio for Replicated Data
Fundamental ratio: [equation in the original slide image]
Large N (N >> P): [limiting form in the original slide image]
Small N (N ~ P): [limiting form in the original slide image]
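Again the expressions are images; with the reconstructed T_p and T_c above they would read roughly as follows (a reconstruction consistent with the previous slide, not a verbatim copy):

    % Reconstructed fundamental ratio and its limits (assumptions carried over from above).
    \[
      R_{cp} \;=\; \frac{T_c}{T_p}
             \;\approx\; \frac{6\,P \log_2 P}{N-1}\,\frac{t_c}{t_p}
    \]
    % Large N (N >> P):  R_cp = O(P log_2 P / N)  -> small, so good parallel efficiency.
    % Small N (N ~ P):   R_cp = O(log_2 P)        -> communication comparable to computation,
    %                                                so poor scaling with processor count.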
Slide 26: Replicated Data MD Algorithm
● Advantages:
  – Simple to implement
  – Good load balancing
  – Highly portable programs
  – Suitable for complex force fields
  – Good scaling with system size (Type 1)
  – Dynamic load balancing possible
● Disadvantages:
  – High communication overhead
  – Sub-optimal scaling with processor count (Type 2)
  – Large memory requirement
  – Unsuitable for massive parallelism
Slide 27: RD Load Balancing: the DL_POLY Brode-Ahlrichs decomposition
[Figure: the Brode-Ahlrichs assignment of pair interactions to processors, as used in DL_POLY; a sketch of the assignment rule follows]
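A sketch of a Brode-Ahlrichs-style assignment (my reading of the scheme, assuming N odd to keep the arithmetic simple): each atom i is paired with the next (N-1)/2 atoms in cyclic order, and the atoms are dealt to processors round-robin, so every processor handles the same number of pairs:

    # Brode-Ahlrichs-style pair assignment (sketch; assumes N odd).
    def brode_ahlrichs_pairs(N: int, P: int, rank: int):
        """Yield the (i, j) pairs handled by processor `rank` out of P, for N atoms."""
        half = (N - 1) // 2
        for i in range(rank, N, P):          # atoms dealt to processors round-robin
            for k in range(1, half + 1):
                yield i, (i + k) % N         # atom i pairs with the next (N-1)/2 atoms, cyclically

    # Every pair appears exactly once across all processors, and the load is equal:
    N, P = 9, 3
    print([sum(1 for _ in brode_ahlrichs_pairs(N, P, r)) for r in range(P)])
    # -> [12, 12, 12]   (N(N-1)/2 = 36 pairs shared equally)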
Slide 28: RD Load Balancing: Atom Decomposition
[Figure: the force matrix F_ij partitioned by rows of atoms i across processors p0-p4; each processor computes its own atoms i against all atoms j]
Slide 29: Force Decomposition
[Figure: the force matrix F_ij partitioned into blocks over both atoms i and atoms j, with one block per processor (p0-p14)]
Note: need not be confined to the replicated data approach!
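A sketch of the block partition behind force decomposition (illustrative; the processor-grid shape is an assumption and the pair kernel is omitted):

    import numpy as np

    def my_block(N, rank, rows, cols):
        """Return the (i-range, j-range) of the F_ij block owned by `rank` on a rows x cols grid."""
        r, c = divmod(rank, cols)
        i_edges = np.linspace(0, N, rows + 1, dtype=int)
        j_edges = np.linspace(0, N, cols + 1, dtype=int)
        return range(i_edges[r], i_edges[r + 1]), range(j_edges[c], j_edges[c + 1])

    # Example: 1000 atoms on 16 processors arranged as a 4 x 4 grid.
    for rank in range(4):
        i_rng, j_rng = my_block(1000, rank, rows=4, cols=4)
        print(rank, (i_rng.start, i_rng.stop), (j_rng.start, j_rng.stop))
    # Each processor only needs the coordinates of its i-block and j-block, so the communication
    # per processor scales as O(N / sqrt(P)) rather than O(N) as in a pure replicated-data sum.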
Slide 30: Replicated Data: Intramolecular Forces (i)
Slide 31: Replicated Data: Intramolecular Forces (ii)
[Figure: the molecular force field definition (the global force field) is split so that each processor P0, P1, P2, … holds only its local force terms]
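One simple way to realise this split (a sketch; the round-robin rule is an illustrative choice, not a statement of DL_POLY's internals) is to deal the global list of bonded terms out across the processors:

    # Dealing a global list of intramolecular (bonded) terms out to processors.
    bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]   # global force-field definition

    def local_terms(global_terms, rank, P):
        """Return the bonded terms this processor evaluates; partial forces are summed globally later."""
        return global_terms[rank::P]

    P = 3
    for rank in range(P):
        print(rank, local_terms(bonds, rank, P))
    # Because the data are replicated, every processor has the coordinates needed for any term;
    # the partial bonded forces then join the same global force sum as the pair forces.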
Slide 32: Long Ranged Forces: The Ewald Summation
[Equations: the Ewald summation, with its real-space, reciprocal-space and self-interaction terms]
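The formulae on this slide are images in the transcript. For reference, the standard Ewald decomposition they correspond to can be written, in Gaussian units, as (a reconstruction, not copied from the slide):

    % Standard Ewald summation (Gaussian units); alpha is the splitting parameter and
    % S(k) the charge structure factor. Reconstructed for reference, not taken from the slide image.
    \[
      U \;=\; \tfrac{1}{2}\sum_{i \ne j} q_i q_j\,
              \frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}}
        \;+\; \frac{2\pi}{V} \sum_{\mathbf{k} \ne 0}
              \frac{e^{-k^2/4\alpha^2}}{k^2}\,\bigl|S(\mathbf{k})\bigr|^2
        \;-\; \frac{\alpha}{\sqrt{\pi}} \sum_i q_i^2 ,
      \qquad
      S(\mathbf{k}) \;=\; \sum_j q_j\, e^{\,i\,\mathbf{k}\cdot\mathbf{r}_j}.
    \]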
Slide 33: Parallel Ewald Summation (i)
● Self-interaction correction: as is.
● Real-space terms:
  – Handle as for the short-ranged forces
  – For excluded atom pairs, replace erfc by -erf
● Reciprocal-space terms:
  – Distribute over atoms, or
  – Distribute over k-vectors
Slide 34: Parallel Ewald Summation (ii)
Partition over atoms:
[Figure: for each k-vector, processors p0 … pP each sum the contributions of their own atoms; a Global Sum combines the partial results, which are added to the Ewald sum on all processors; repeat for each k-vector]
Note: stack the sums for efficiency.
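A sketch of the atom partition for the reciprocal-space structure factor (mpi4py again; the array layout and library are my choices). Several k-vectors are "stacked" into a single global sum, as the slide recommends:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, P = comm.Get_rank(), comm.Get_size()

    N = 64
    rng = np.random.default_rng(1)               # replicated data: identical on every processor
    r = rng.random((N, 3))
    q = rng.choice([-1.0, 1.0], size=N)
    kvecs = 2.0 * np.pi * np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)

    my_atoms = slice(rank, N, P)                 # each processor sums over its own atoms only
    phases = np.exp(1j * (r[my_atoms] @ kvecs.T))            # shape (n_local, n_k)
    S_local = (q[my_atoms, None] * phases).sum(axis=0)       # partial S(k), all k at once ("stacked")

    S_global = np.zeros_like(S_local)
    comm.Allreduce(S_local, S_global, op=MPI.SUM)            # one global sum for the whole stack
    # Every processor now holds the full S(k) and adds the k-space terms to its Ewald sum.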
Slide 35: Parallel Ewald Summation (iii)
Partition over k-vectors:
[Figure: processors p0, p1, p2, …, pP each handle a different set of k-vectors]
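And the alternative: each processor takes a disjoint subset of k-vectors, evaluates the full structure factor for those k only, and the per-processor contributions are combined with a single reduction at the end (an illustrative sketch, using the Gaussian-units prefactor from the reconstruction above):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, P = comm.Get_rank(), comm.Get_size()

    N, alpha, V = 64, 0.5, 1.0
    rng = np.random.default_rng(1)
    r = rng.random((N, 3))
    q = rng.choice([-1.0, 1.0], size=N)

    # Build the full k-vector list (replicated), then keep every P-th vector on this processor.
    nmax = 4
    grid = [(nx, ny, nz) for nx in range(-nmax, nmax + 1) for ny in range(-nmax, nmax + 1)
            for nz in range(-nmax, nmax + 1) if (nx, ny, nz) != (0, 0, 0)]
    my_k = 2.0 * np.pi * np.array(grid[rank::P], dtype=float)

    k2 = np.einsum('ij,ij->i', my_k, my_k)
    S = np.exp(1j * (r @ my_k.T)).T @ q                  # full S(k), but only for this rank's k's
    E_local = (2.0 * np.pi / V) * np.sum(np.exp(-k2 / (4.0 * alpha**2)) / k2 * np.abs(S)**2)

    E_recip = comm.allreduce(E_local, op=MPI.SUM)        # combine the per-processor k contributions
    if rank == 0:
        print("reciprocal-space energy:", E_recip)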
Slide 36: The End