N-Body Gravitational Simulations Joshua White Patrick Loftus
Overview Purpose and types of n-body simulations Gravitational simulations Derivation of direct gravitational algorithm HOT n-body simulation algorithm Derivation of tree-reduction algorithm Results Conclusions Questions
Purpose and types of n-body simulations A simulation of a dynamical system of particles Usually under the effects of a physical force (e.g., gravity, Coulomb force, etc.) Used to model interactions between bodies and forces Used over scales from the absolute smallest possible (quantum many-body simulations) to the absolute largest possible (cosmological evolution simulations) Common applications: galaxy formation and evolution, star cluster interactions, subatomic and quantum interactions, thermodynamic fluid simulations We will focus on gravitational simulations, but the principles are generally applicable to all types of n-body simulations
Gravitational simulations 32,768 solar mass bodies 1.8 million solar mass central core 1 billion year simulation time Video source: Wikimedia Commons
Gravitational simulations
Newton’s Law of Universal Gravitation
The force between two objects is proportional to the product of their masses and inversely proportional to the square of the distance between them
Formally:
$$\vec{F}_{12} = -\vec{F}_{21} = G \frac{m_1 m_2}{r^2} \hat{r}$$
where the Universal Gravitational Constant is $G = 6.67384 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$
Image source: Wikimedia Commons
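As a worked instance of the law above, here is a minimal Python sketch. The function name `gravitational_force`, the sign convention (the force on body 1 points from body 1 toward body 2), and the approximate Earth and Sun values are assumptions for illustration, not part of the slides.

```python
import math

G = 6.67384e-11  # m^3 kg^-1 s^-2, the value quoted on the slide

def gravitational_force(m1, m2, p1, p2):
    """Force vector on body 1 due to body 2.

    Assumed convention: the unit vector points from body 1 toward body 2,
    so the returned force is attractive.
    """
    d = [b - a for a, b in zip(p1, p2)]        # separation vector, 1 -> 2
    r = math.sqrt(sum(c * c for c in d))       # distance between the bodies
    magnitude = G * m1 * m2 / (r * r)          # G m1 m2 / r^2
    return [magnitude * c / r for c in d]      # magnitude times unit vector

# Illustrative, approximate values (not from the slides): Earth and Sun, 1 AU apart.
earth_mass = 5.972e24   # kg
sun_mass = 1.989e30     # kg
au = 1.496e11           # m

f_on_earth = gravitational_force(earth_mass, sun_mass, [au, 0.0, 0.0], [0.0, 0.0, 0.0])
f_on_sun = gravitational_force(sun_mass, earth_mass, [0.0, 0.0, 0.0], [au, 0.0, 0.0])
```

The two results are equal in magnitude and opposite in direction, which is the $\vec{F}_{12} = -\vec{F}_{21}$ part of the formula.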
Gravitational simulations
For interactions involving multiple bodies, the resultant force acting on a body is equal to the sum of the contributions from all other bodies.
$$\vec{F}_{res} = m_i \vec{a}_i = \sum_{j \neq i} G \frac{m_i m_j}{r_{ij}^2} \hat{r}_{ij}$$
The resultant acceleration of the body can be found by canceling its mass contribution, yielding:
$$\vec{a}_i = \sum_{j \neq i} G \frac{m_j}{r_{ij}^2} \hat{r}_{ij}$$
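The summation above can be sketched as a double loop over bodies. This is a minimal illustration, not the authors' implementation: the function name `accelerations` and the list-of-lists layout are assumptions, and no softening is applied, so two bodies at exactly the same position would cause a division by zero.

```python
import math

G = 6.67384e-11  # m^3 kg^-1 s^-2

def accelerations(masses, positions):
    """Direct-sum acceleration of every body due to all other bodies."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            # Separation vector from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            r = math.sqrt(r2)
            # G m_j / r^2 times the unit vector d / r.
            scale = G * masses[j] / (r2 * r)
            for k in range(3):
                acc[i][k] += scale * d[k]
    return acc

# Two bodies 10 m apart on the x axis (illustrative values).
masses = [1.0e10, 2.0e10]
positions = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
acc = accelerations(masses, positions)
```

Each of the n bodies visits the other n - 1 bodies, which is where the quadratic cost of the direct method comes from.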
Gravitational simulations
For direct simulation, the acceleration is evaluated at discrete time steps, then integrated.
$$x(t) = \iint_{t_0}^{t} a_x(t)\, dt\, dt$$
Given velocity $v_x$, initial position $x_0$, and treating all variables except $t$ as constant over a single time step, the above can be directly integrated.
$$x(t) = x_0 + v_x t + \frac{1}{2} a_x t^2$$
The velocity is then updated for the next iteration.
$$v_x(t) = v_0 + a_x t$$
Evaluating the acceleration of each of the $n$ bodies against all others yields a time complexity of $\mathcal{O}(n^2)$ per time step
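The per-step update can be sketched as follows, treating the acceleration as constant over one step of length `dt`. The function name `step` and the use of plain lists for vectors are assumptions made for illustration.

```python
def step(position, velocity, acceleration, dt):
    """Advance one body by one time step with constant acceleration.

    Position uses x0 + v*dt + (1/2)*a*dt^2; velocity uses v0 + a*dt.
    The position is computed from the velocity at the start of the step.
    """
    new_position = [x + v * dt + 0.5 * a * dt * dt
                    for x, v, a in zip(position, velocity, acceleration)]
    new_velocity = [v + a * dt for v, a in zip(velocity, acceleration)]
    return new_position, new_velocity

# Illustrative values: moving along x, accelerating along y.
pos, vel = step([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], 0.5)
```

With these inputs the body moves 0.5 along x and 0.25 along y, and its y velocity grows from 0 to 1.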
Derivation of direct gravitational algorithm
The “heart” of the algorithm
Determine instantaneous acceleration of each body
Direct integration to determine position at next time step
Write positions to file
Selection of OpenMP
Necessity to synchronize at each time step
Message passing between processors/threads would be excessive
The OpenMP “parallel for” pragma creates an implicit barrier at the end of each parallel loop, so threads synchronize at every time step
Time complexity $\mathcal{O}(n^2/p)$ with $p$ threads
Derivation of direct gravitational algorithm The “heart” of the algorithm
Derivation of direct gravitational algorithm Updating the acceleration vectors
Derivation of direct gravitational algorithm Updating the position and velocity vectors
HOT n-body simulation algorithm
Breaks the universe into octants using a hashed oct-tree
Interactions with non-adjacent octants are “smoothed” using system’s center of mass
Assumes fixed universe size
Tree must be rebuilt for each body
Load balancing issues
Synchronization issues
Time complexity $\mathcal{O}(n \log n)$
HOT n-body simulation algorithm Universe is partitioned into octants (quadrants shown for clarity) Each octant repartitioned until each contains one body Center of mass for partition calculated Image source: Burtscher and Pingali, 2011
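The partitioning described above can be sketched in two dimensions (quadrants rather than octants, as on the slide). This is a simplified illustration and not the HOT algorithm itself: there is no key hashing, the `Cell` class and the `build`, `quadrant`, and `leaves` helpers are hypothetical, all bodies are assumed to lie inside the fixed root square, and a depth limit guards against coincident bodies.

```python
class Cell:
    """Square region of a fixed-size 2D universe (hypothetical helper)."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # center and half-width
        self.children = []   # four sub-cells once the cell is subdivided
        self.bodies = []     # bodies held by a leaf, as (mass, x, y)
        self.mass = 0.0      # total mass inside the cell
        self.com = (cx, cy)  # center of mass of the bodies inside the cell

def quadrant(cell, x, y):
    """Index 0..3 of the quadrant of `cell` that contains point (x, y)."""
    return (1 if x >= cell.cx else 0) | (2 if y >= cell.cy else 0)

def build(cell, bodies, max_depth=32):
    """Subdivide until each leaf holds at most one body; record mass and center of mass."""
    cell.mass = sum(m for m, _, _ in bodies)
    if cell.mass > 0.0:
        cell.com = (sum(m * x for m, x, _ in bodies) / cell.mass,
                    sum(m * y for m, _, y in bodies) / cell.mass)
    if len(bodies) <= 1 or max_depth == 0:
        cell.bodies = list(bodies)
        return cell
    h = cell.half / 2.0
    for q in range(4):
        child = Cell(cell.cx + (h if q & 1 else -h),
                     cell.cy + (h if q & 2 else -h), h)
        inside = [b for b in bodies if quadrant(cell, b[1], b[2]) == q]
        cell.children.append(build(child, inside, max_depth - 1))
    return cell

def leaves(cell):
    """All leaf cells below `cell`, including empty ones."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]

# Three illustrative bodies in a universe spanning [-1, 1] x [-1, 1].
bodies = [(1.0, -0.5, -0.5), (1.0, 0.5, 0.5), (2.0, 0.25, 0.75)]
root = build(Cell(0.0, 0.0, 1.0), bodies)
occupied = [leaf for leaf in leaves(root) if leaf.bodies]
```

The root square has to be chosen up front, which is the fixed-universe assumption the slides point out.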
HOT n-body simulation algorithm Interaction with non-adjacent regions calculated based on center of mass Interactions with bodies in adjacent regions calculated directly Preserves resolution of short distance interactions Problematic because tree is dependent on fixed universe size Image source: Burtscher and Pingali, 2011
Derivation of tree-reduction algorithm
Resultant force points to center of mass
Magnitude determined by combined masses of bodies
Recall
$$\vec{a}_i = \sum_{j \neq i} G \frac{m_j}{r_{ij}^2} \hat{r}_{ij}$$
Center of mass given by
$$\vec{s}_{cm} = \frac{1}{M_{tot}} \sum_i \vec{s}_i m_i$$
Thus, large-scale behavior can be approximated using center of mass of the entire cluster
Derivation of tree-reduction algorithm
By sacrificing small-scale resolution, the number of trees per iteration can be reduced to one
Using a binary reduction, the center of mass of the entire cluster is calculated
Each body removes its contribution to mass total and center of mass when calculating force interaction
Has the advantage of making no assumptions about overall universe size
Not suitable where fine resolution interactions are required, but preserves large-scale behavior of system
Time complexity $\mathcal{O}(n + \log n)$
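One way to write out the step in which each body removes its own contribution is given below. This is a sketch derived from the description above; the primed symbols $M_i'$ and $\vec{s}_i'$ are notation introduced here, not taken from the slides.

```latex
% Mass and center of mass of the cluster with body i removed
M_i' = M_{tot} - m_i
\qquad
\vec{s}_i' = \frac{M_{tot}\,\vec{s}_{cm} - m_i \vec{s}_i}{M_{tot} - m_i}

% Approximate acceleration of body i toward the remaining center of mass
\vec{a}_i \approx G \frac{M_i'}{\lvert \vec{s}_i' - \vec{s}_i \rvert^{2}}
\cdot \frac{\vec{s}_i' - \vec{s}_i}{\lvert \vec{s}_i' - \vec{s}_i \rvert}
```

Once $M_{tot}$ and $\vec{s}_{cm}$ are known, each body needs only a constant amount of work, which accounts for the $n$ term in the stated complexity.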
Derivation of tree-reduction algorithm Tree reduction to determine center of mass
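A binary reduction for the center of mass can be sketched as repeated pairwise merging of partial sums. The version below runs serially and assumes at least one body; the names `combine` and `tree_reduce_center_of_mass` are hypothetical. In a parallel setting the merges within one level are independent of each other, so they could be carried out concurrently.

```python
def combine(a, b):
    """Merge two partial sums of the form (mass, mass-weighted position)."""
    return (a[0] + b[0], [p + q for p, q in zip(a[1], b[1])])

def tree_reduce_center_of_mass(masses, positions):
    """Total mass and center of mass via pairwise (binary-tree) reduction."""
    # Leaves of the reduction tree: one partial sum per body.
    partial = [(m, [m * c for c in s]) for m, s in zip(masses, positions)]
    while len(partial) > 1:
        merged = []
        # Merge neighbors pairwise; each pass halves the number of partial sums.
        for k in range(0, len(partial) - 1, 2):
            merged.append(combine(partial[k], partial[k + 1]))
        if len(partial) % 2 == 1:
            merged.append(partial[-1])  # odd element carries over to the next level
        partial = merged
    total_mass, weighted = partial[0]
    return total_mass, [w / total_mass for w in weighted]

# Five illustrative bodies along the x axis.
masses = [1.0, 2.0, 3.0, 4.0, 5.0]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0],
             [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
total_mass, center = tree_reduce_center_of_mass(masses, positions)
```

The number of passes grows with the logarithm of the number of bodies, matching the $\log n$ term in the stated complexity.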
Derivation of tree-reduction algorithm Updating acceleration, position, and velocity vectors
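The acceleration update for the tree-reduction approach can be sketched as below: each body subtracts its own mass and mass-weighted position from the cluster totals, then accelerates toward the center of mass of what remains. This is an illustration under assumptions, not the authors' code: the name `tree_accelerations` is hypothetical, the totals are computed with plain sums for brevity rather than a binary reduction, at least two bodies are required, and a body sitting exactly at the remaining center of mass would divide by zero. The position and velocity updates are not shown.

```python
import math

G = 6.67384e-11  # m^3 kg^-1 s^-2

def tree_accelerations(masses, positions):
    """Approximate each body's acceleration using the center of mass of all other bodies."""
    total_mass = sum(masses)
    # Mass-weighted position sum of the whole cluster, per coordinate.
    weighted = [sum(m * s[k] for m, s in zip(masses, positions)) for k in range(3)]
    acc = []
    for m, s in zip(masses, positions):
        rest_mass = total_mass - m                      # remove own mass
        rest_com = [(weighted[k] - m * s[k]) / rest_mass
                    for k in range(3)]                  # remove own position term
        d = [rest_com[k] - s[k] for k in range(3)]      # vector toward remaining center of mass
        r = math.sqrt(sum(c * c for c in d))
        scale = G * rest_mass / r ** 3                  # G M' / r^2 times unit vector d / r
        acc.append([scale * c for c in d])
    return acc

# With only two bodies the approximation coincides with the direct result.
masses = [1.0e10, 2.0e10]
positions = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
acc = tree_accelerations(masses, positions)
```

For more than two bodies the result is only an approximation of the direct sum, since every other body is replaced by a single point mass.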
Results
N-body simulation results
What are our bodies doing and why are they behaving that way?
Timing results
Was the tree algorithm significantly faster than the direct algorithm?
As we added threads, did we see speedup for both algorithms?
N-body simulation results Direct algorithm simulation Tree algorithm simulation
N-body simulation results Direct algorithm
N-body simulation results Tree algorithm
Barnes-Hut approximation
White-Loftus approximation
Timing results
Timing results: Direct algorithm
Timing results: Tree algorithm
Conclusions
Data confirm that the tree algorithm is significantly faster than the direct algorithm, consistent with its $\mathcal{O}(n + \log n)$ versus $\mathcal{O}(n^2)$ time complexity
Both algorithms showed significant speedup when ported to OpenMP
N-body simulations are fun!
2048 bodies!
References
Martin Burtscher and Keshav Pingali. An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In GPU Computing Gems Emerald Edition, pages 75-92. Morgan Kaufmann, 2011.
Michael S. Warren. 2HOT: an improved parallel hashed oct-tree n-body algorithm for cosmological simulation. In SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, article no. 72.
Michael S. Warren. A parallel hashed oct-tree n-body algorithm. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing, pages 12-21.