Flexibility and Interoperability in a Parallel MD Code
Robert Brunner, Laxmikant Kale, Jim Phillips
University of Illinois at Urbana-Champaign

Contributors
Principal investigators
  – Laxmikant Kale, Klaus Schulten, Robert Skeel
Development team
  – Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, Ari Shinozaki, …

Middle layers
[Figure: layered diagram: Applications / “Middle Layers”: Languages, Tools, Libraries / Parallel Machines]

Molecular Dynamics
Collection of [charged] atoms, with bonds
Newtonian mechanics
At each time-step
  – Calculate forces on each atom
      bonds:
      non-bonded: electrostatic and van der Waals
  – Calculate velocities and advance positions
1 femtosecond time-step, millions needed!
Thousands of atoms (1, ,000)

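The per-time-step loop above (forces, then velocities and positions) can be sketched with velocity Verlet, a standard integrator for Newtonian MD. The slide does not say which integrator NAMD uses, so treat that choice as an assumption; the names System, ForceFn and velocityVerletStep are hypothetical, and units are left abstract.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec3 = std::array<double, 3>;

// Per-atom state in arbitrary but consistent units (hypothetical layout).
struct System {
    std::vector<Vec3> pos;
    std::vector<Vec3> vel;
    std::vector<double> mass;
};

// Force callback: writes the total force on every atom for the given positions.
using ForceFn = std::function<void(const std::vector<Vec3>&, std::vector<Vec3>&)>;

// One velocity Verlet step of length dt.
// On entry 'forces' holds F(t); on exit it holds F(t + dt).
void velocityVerletStep(System& s, std::vector<Vec3>& forces,
                        const ForceFn& computeForces, double dt) {
    const std::size_t n = s.pos.size();
    assert(s.vel.size() == n && s.mass.size() == n && forces.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        for (int d = 0; d < 3; ++d) {
            s.vel[i][d] += 0.5 * dt * forces[i][d] / s.mass[i];  // half-step velocity update
            s.pos[i][d] += dt * s.vel[i][d];                     // advance position
        }
    }
    computeForces(s.pos, forces);                                // forces at new positions
    for (std::size_t i = 0; i < n; ++i) {
        for (int d = 0; d < 3; ++d) {
            s.vel[i][d] += 0.5 * dt * forces[i][d] / s.mass[i];  // complete velocity update
        }
    }
}
```

Each call performs exactly one force evaluation, which is why force computation dominates the cost when millions of steps are needed.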
Further MD
Use of cut-off radius to reduce work
  – Å
  – Faraway charges ignored!
% of work is non-bonded force computations
Some simulations need faraway contributions

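A minimal sketch of how a cut-off radius removes faraway pairs from the non-bonded computation, assuming plain Coulomb and Lennard-Jones (van der Waals) terms with one epsilon/sigma pair for all atoms. Atom, NonbondedParams and nonbondedForces are hypothetical names. The all-pairs loop is kept for clarity; a production code would use cells or pair lists so that skipped pairs are never visited at all.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

struct Atom {
    Vec3 r;    // position
    double q;  // partial charge
};

// Hypothetical parameters: Coulomb constant, one Lennard-Jones epsilon/sigma
// pair shared by all atoms, and the cut-off radius.
struct NonbondedParams {
    double coulombK;
    double epsilon;
    double sigma;
    double cutoff;
};

// All-pairs non-bonded forces with a cut-off: pairs farther apart than
// 'cutoff' are skipped entirely. Returns how many pairs were evaluated.
std::size_t nonbondedForces(const std::vector<Atom>& atoms,
                            const NonbondedParams& p, std::vector<Vec3>& f) {
    assert(p.cutoff > 0.0);
    f.assign(atoms.size(), Vec3{0.0, 0.0, 0.0});
    const double rc2 = p.cutoff * p.cutoff;
    std::size_t evaluated = 0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            Vec3 d{};
            double r2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                d[k] = atoms[i].r[k] - atoms[j].r[k];
                r2 += d[k] * d[k];
            }
            if (r2 >= rc2) continue;  // faraway pair: ignored
            ++evaluated;
            const double invR2 = 1.0 / r2;
            const double invR = std::sqrt(invR2);
            const double sr2 = p.sigma * p.sigma * invR2;
            const double sr6 = sr2 * sr2 * sr2;
            // F_i = scale * (r_i - r_j): Coulomb k*qi*qj/r^3 plus LJ 24*eps*(2*sr^12 - sr^6)/r^2
            const double scale = p.coulombK * atoms[i].q * atoms[j].q * invR2 * invR
                               + 24.0 * p.epsilon * (2.0 * sr6 * sr6 - sr6) * invR2;
            for (int k = 0; k < 3; ++k) {
                f[i][k] += scale * d[k];
                f[j][k] -= scale * d[k];  // Newton's third law
            }
        }
    }
    return evaluated;
}
```

Shrinking the cut-off reduces the number of evaluated pairs but drops the faraway contributions that some simulations need.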
NAMD Design Objectives
Performance
Scalability
  – To small and large numbers of processors
  – To small and large molecular systems
Modifiable and extensible design
  – Ability to incorporate new algorithms
  – Reuse of new libraries without re-implementation
  – Experimentation with alternate strategies

Force Decomposition
Distribute the force matrix to processors
  – Matrix is sparse, non-uniform
  – Each processor has one block
Communication: N/sqrt(P)
Communication-to-computation ratio: sqrt(P)
Better scalability (can use 100+ processors)
Hwang, Saltz, et al.: 6% on 32 PEs, 36% on 128 processors

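The block layout can be sketched as follows, assuming P is a perfect square so that the force matrix splits into a sqrt(P) x sqrt(P) grid of blocks. A processor owning block (row, col) needs the coordinates of one row block and one column block, about 2N/sqrt(P) atoms, which is the source of the N/sqrt(P) communication term. blockRange, forceBlockFor and atomsNeeded are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Half-open range [begin, end) of atom indices.
using Range = std::pair<std::size_t, std::size_t>;

// Split n atoms into q nearly equal contiguous blocks and return block b.
Range blockRange(std::size_t n, std::size_t q, std::size_t b) {
    const std::size_t base = n / q, extra = n % q;
    const std::size_t begin = b * base + (b < extra ? b : extra);
    const std::size_t size = base + (b < extra ? 1 : 0);
    return {begin, begin + size};
}

// Block of the N x N force matrix owned by processor 'pe': pairs (i, j)
// with i in the row block and j in the column block.
struct ForceBlock {
    Range rows;
    Range cols;
};

ForceBlock forceBlockFor(std::size_t n, std::size_t P, std::size_t pe) {
    const auto q = static_cast<std::size_t>(std::lround(std::sqrt(static_cast<double>(P))));
    assert(q * q == P && pe < P);  // sketch assumes P is a perfect square
    return {blockRange(n, q, pe / q), blockRange(n, q, pe % q)};
}

// Atoms whose coordinates this processor must receive (rows + columns;
// on diagonal blocks the two coincide, so this is an upper bound).
std::size_t atomsNeeded(const ForceBlock& b) {
    return (b.rows.second - b.rows.first) + (b.cols.second - b.cols.first);
}
```

For N = 1000 and P = 16 each processor needs 500 atoms; quadrupling P to 64 halves that to 250, while per-processor computation drops by a factor of four, so the ratio grows as sqrt(P).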
Spatial Decomposition
Spatial decomposition modified
Implementation
Multiple objects per processor
  – Different types: patches, pairwise forces, bonded forces
  – Each may have its data ready at different times
  – Need ability to map and remap them
  – Need prioritized scheduling
Charm++ supports all of these

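One way to handle objects whose data become ready at different times is a per-step dependency counter: the compute runs as soon as its last input arrives, whatever the arrival order. PairCompute, Inputs and deliver are hypothetical names, and coordinates are flattened to plain doubles; this illustrates the idea rather than NAMD's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical compute object fed by several patches. Inputs may arrive in
// any order; the force work runs once all inputs for the step are present.
class PairCompute {
public:
    using Inputs = std::vector<std::vector<double>>;

    PairCompute(std::size_t numInputs, std::function<void(const Inputs&)> work)
        : inputs_(numInputs), arrived_(numInputs, false),
          remaining_(numInputs), work_(std::move(work)) {
        assert(numInputs > 0);
    }

    // Called when input 'slot' arrives (e.g. from a message handler).
    // Returns true if this arrival completed the set and triggered the work.
    bool deliver(std::size_t slot, std::vector<double> coords) {
        assert(slot < inputs_.size() && !arrived_[slot]);
        inputs_[slot] = std::move(coords);
        arrived_[slot] = true;
        if (--remaining_ > 0) return false;
        work_(inputs_);
        arrived_.assign(arrived_.size(), false);  // ready for the next step
        remaining_ = inputs_.size();
        return true;
    }

private:
    Inputs inputs_;
    std::vector<bool> arrived_;
    std::size_t remaining_;
    std::function<void(const Inputs&)> work_;
};
```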
Charm++
Data driven objects
Object groups:
  – Global object with a “representative” on each PE
Asynchronous method invocation
Prioritized scheduling
Mature, robust, portable

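A single-process sketch of the object-group idea: one logical object whose representative on each PE can be reached locally or all at once by a broadcast. ObjectGroup, local, broadcast and numPes are hypothetical names; this does not use the Charm++ API, and PEs are simulated by vector slots.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical single-process model of an object group: one logical object
// with a representative on each PE (PEs are simulated by vector slots).
template <typename T>
class ObjectGroup {
public:
    ObjectGroup(std::size_t numPes, const std::function<T(std::size_t)>& makeRep) {
        reps_.reserve(numPes);
        for (std::size_t pe = 0; pe < numPes; ++pe) reps_.push_back(makeRep(pe));
    }

    // The representative that code running on 'pe' would access directly.
    T& local(std::size_t pe) {
        assert(pe < reps_.size());
        return reps_[pe];
    }

    // Invoke the same operation on every representative (a broadcast).
    void broadcast(const std::function<void(T&)>& op) {
        for (T& rep : reps_) op(rep);
    }

    std::size_t numPes() const { return reps_.size(); }

private:
    std::vector<T> reps_;
};
```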
Data driven execution
[Figure: Scheduler, Message Queue]

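Data driven execution with prioritized scheduling can be sketched as a priority queue of messages drained by a scheduler loop. Here send() plays the role of an asynchronous method invocation: it only enqueues work and returns. Scheduler, send and run are hypothetical, and the convention that a smaller number means a higher priority (FIFO among equals) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical message-driven scheduler with prioritized delivery.
class Scheduler {
public:
    // "Asynchronous invocation": enqueue and return immediately.
    void send(int priority, std::function<void()> handler) {
        queue_.push(Msg{priority, seq_++, std::move(handler)});
    }

    // Deliver messages until the queue is empty; returns how many ran.
    std::size_t run() {
        std::size_t n = 0;
        while (!queue_.empty()) {
            Msg m = queue_.top();  // copy, since top() is const
            queue_.pop();
            m.handler();           // handlers may send() further messages
            ++n;
        }
        return n;
    }

private:
    struct Msg {
        int priority;
        std::size_t seq;
        std::function<void()> handler;
    };
    // Orders the heap so the smallest priority (then earliest seq) is on top.
    struct Later {
        bool operator()(const Msg& a, const Msg& b) const {
            return a.priority != b.priority ? a.priority > b.priority : a.seq > b.seq;
        }
    };
    std::priority_queue<Msg, std::vector<Msg>, Later> queue_;
    std::size_t seq_ = 0;
};
```

A message sent from inside a handler with an urgent priority is delivered before older, less urgent messages, which is the behavior prioritized scheduling is meant to provide.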
Object-oriented design
Two top-level classes:
  – Patches: cubes containing atoms
  – Computes: force calculation
Home patches and proxy patches
  – Home patch sends coordinates to proxies, and receives forces from them
  – Each compute interacts with local patches only

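A sketch of the home/proxy pattern, with one scalar coordinate and force per atom for brevity. In a parallel code these calls would be messages between processors; here they are direct calls on pointers. HomePatch, ProxyPatch, receiveCoords, sendCoords and collectForces are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Proxy on another processor: a read-only copy of the home patch's
// coordinates for local computes, plus the partial forces they produce.
struct ProxyPatch {
    std::vector<double> coords;
    std::vector<double> forces;

    void receiveCoords(const std::vector<double>& c) {
        coords = c;
        forces.assign(c.size(), 0.0);  // fresh accumulator for this step
    }
};

// Home patch: owns its atoms and the authoritative force totals.
struct HomePatch {
    std::vector<double> coords;
    std::vector<double> forces;
    std::vector<ProxyPatch*> proxies;

    // Step 1: send coordinates to every proxy.
    void sendCoords() {
        for (ProxyPatch* p : proxies) p->receiveCoords(coords);
    }

    // Step 2: add the forces returned by each proxy to the home totals.
    void collectForces() {
        for (ProxyPatch* p : proxies) {
            assert(p->forces.size() == forces.size());
            for (std::size_t i = 0; i < forces.size(); ++i) forces[i] += p->forces[i];
        }
    }
};
```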
Compute hierarchy
Many compute subclasses:
  – Allow reuse of coordination code
  – Reuse of bookkeeping tasks
  – Easy to add new types of force objects
Example: steered molecular dynamics
  – Implementor focuses on the new force functionality

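The compute hierarchy can be sketched as a base class that keeps the shared coordination code and exposes a virtual hook for the physics. As the new force type, the sketch uses a harmonic spring whose anchor moves at constant velocity, one common form of steered MD; that choice, the 1-D positions, and the names Compute, doWork, addForces and SteeredSpring are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base class: shared coordination (argument checks, step counting) lives
// here; subclasses only implement addForces().
class Compute {
public:
    virtual ~Compute() = default;

    void doWork(const std::vector<double>& pos, std::vector<double>& force) {
        assert(pos.size() == force.size());
        addForces(pos, force, step_);
        ++step_;
    }

protected:
    virtual void addForces(const std::vector<double>& pos,
                           std::vector<double>& force, long step) = 0;

private:
    long step_ = 0;
};

// New force type: pull one atom toward a target moving at velocity v.
class SteeredSpring : public Compute {
public:
    SteeredSpring(std::size_t atom, double k, double x0, double v, double dt)
        : atom_(atom), k_(k), x0_(x0), v_(v), dt_(dt) {}

protected:
    void addForces(const std::vector<double>& pos,
                   std::vector<double>& force, long step) override {
        assert(atom_ < pos.size());
        const double target = x0_ + v_ * dt_ * static_cast<double>(step);
        force[atom_] += k_ * (target - pos[atom_]);  // harmonic restoring force
    }

private:
    std::size_t atom_;
    double k_, x0_, v_, dt_;
};
```

The subclass contains only the force law; step counting and argument checking come from the base class, which is the kind of reuse the slide describes.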
Multi-paradigm programming
Long-range electrostatic interactions
  – Some simulations require this feature
  – Contributions of faraway atoms can be computed infrequently
  – PVM-based library, DPMTA
      Developed at Duke by John Board et al.
Patch life cycle
  – Better expressed as a thread

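Computing faraway contributions infrequently can be sketched as a simplified, first-order variant of impulse multiple time stepping: cheap short-range forces every step, the expensive long-range force only every k steps, applied as a kick scaled by k. How NAMD actually couples DPMTA to its integrator is not described on the slide, so the scheme and the names Particle and runMts are assumptions.

```cpp
#include <cassert>
#include <functional>

// One particle in 1-D (hypothetical, for brevity).
struct Particle {
    double x;  // position
    double v;  // velocity
    double m;  // mass
};

// Returns the number of (expensive) slow-force evaluations performed.
int runMts(Particle& p, int steps, int k, double dt,
           const std::function<double(double)>& fastForce,
           const std::function<double(double)>& slowForce) {
    assert(k > 0);
    int slowEvals = 0;
    for (int s = 0; s < steps; ++s) {
        if (s % k == 0) {
            // Long-range contribution: evaluated once per k steps,
            // applied as an impulse covering the whole k-step interval.
            p.v += k * dt * slowForce(p.x) / p.m;
            ++slowEvals;
        }
        p.v += dt * fastForce(p.x) / p.m;  // short-range force, every step
        p.x += dt * p.v;
    }
    return slowEvals;
}
```

With k = 4 the long-range routine runs a quarter as often, while the total impulse it delivers over each interval is unchanged.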
Converse
Supports multi-paradigm programming
Provides portability
Makes it easy to implement run-time systems (RTS) for new paradigms
Several languages/libraries:
  – Charm++, threaded MPI, PVM, Java, md-perl, pc++, nexus, Path, Cid, CC++, …

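The idea that lets several paradigms share one runtime can be sketched as a handler table: each message carries a handler index, and each language or library registers its own handlers, so all messages flow through one dispatcher. Message, HandlerTable, registerHandler and deliver are hypothetical; Converse's actual C interface is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Every message names the handler that should process it.
struct Message {
    std::size_t handler;
    std::string payload;
};

class HandlerTable {
public:
    using Handler = std::function<void(const Message&)>;

    // Each paradigm's runtime registers its handlers and keeps the indices.
    std::size_t registerHandler(Handler h) {
        handlers_.push_back(std::move(h));
        return handlers_.size() - 1;
    }

    // A single dispatcher serves messages from all paradigms.
    void deliver(const Message& m) const {
        assert(m.handler < handlers_.size());
        handlers_[m.handler](m);
    }

private:
    std::vector<Handler> handlers_;
};
```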
Namd2 with Converse
Separation of concerns
Different developers, with different interests and knowledge, can contribute effectively
  – Separation of communication and parallel logic
  – Threads to encapsulate the “life-cycle” of patches
  – Adding a new integrator, improving performance, or trying new MD ideas can be done modularly and independently

Load balancing
Collect timing data for several cycles
Run heuristic load balancer
  – Several alternative ones
Re-map and migrate objects accordingly
  – Registration mechanisms facilitate migration
Needs a separate talk!

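As an example of a heuristic load balancer, a greedy strategy takes objects in decreasing order of measured time and places each on the currently least-loaded processor. This is one common choice, not necessarily one of the balancers used in NAMD; greedyMap is a hypothetical name, and objTime stands for the timing data collected over several cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedy measurement-based mapping: heaviest objects first, each to the
// least-loaded processor. Returns the new processor for every object.
std::vector<std::size_t> greedyMap(const std::vector<double>& objTime, std::size_t numPes) {
    assert(numPes > 0);
    std::vector<std::size_t> order(objTime.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return objTime[a] > objTime[b]; });

    using Load = std::pair<double, std::size_t>;  // (current load, processor)
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> pes;  // min-heap
    for (std::size_t pe = 0; pe < numPes; ++pe) pes.push({0.0, pe});

    std::vector<std::size_t> assignment(objTime.size());
    for (std::size_t obj : order) {
        const Load least = pes.top();
        pes.pop();
        assignment[obj] = least.second;
        pes.push({least.first + objTime[obj], least.second});
    }
    return assignment;
}
```

Objects whose entry in the returned vector differs from their current processor are the ones that would then be migrated.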
Performance: size of system
Performance: various machines
Speedup
Conclusion
Multi-domain decomposition works well for dynamically evolving or irregular apps
  – When supported by data driven objects (Charm++), user-level threads, and callbacks
Multi-paradigm programming is effective!
Object-oriented parallel programming:
  – Promotes reuse
  – Good performance
Measurement-based load balancing