Simulating Self-Gravitating Planetesimal Disks on the Graphics Processing Unit (GPU). Alice C. Quillen and Alex Moore, University of Rochester. Poster, DPS 2008.

[Title-slide figure: eccentricity vs. semi-major axis]

Abstract Planetesimal and dust dynamical simulations require computations of many gravitational force pairs as well as collision and nearest-neighbor detection. These are order N², or O(N²), computations for N particles. I describe an efficient parallel implementation of an N-body integrator capable of integrating a self-gravitating planetesimal disk in single-precision floating point. The software runs on a GPU (graphics processing unit) and is written in CUDA, NVIDIA's programming language for video cards. The integrator is a parallel second-order symplectic integrator that works in heliocentric coordinates and barycentric momenta. A brute-force implementation for sorting interparticle distances requires O(N²) computations, limiting the number of particles that can be simulated. Algorithms recently developed for the GPU, such as the radix sort, can run as fast as O(N) and sort distances between a million particles in a few hundred milliseconds. We explore improvements in collision and nearest-neighbor detection algorithms and how we can incorporate them into our integrator in the future.

Parallel computations on the GPU Graphics cards are a cheap, new parallel computing platform. A single chip carries a large number of arithmetic computation units (ALUs) and a large on-board memory; the fraction of the chip devoted to arithmetic computation units exceeds that on CPUs. Optimizing code means maximizing the number of computations while minimizing slow memory transfers. Having a large on-board memory is extremely efficient, as it bypasses many of the man-hours involved in writing code that shares information across processors, a problem of MPI (Message Passing Interface) approaches to parallel computing.

N-body planetesimal systems in single precision floating point Most video applications do not require double-precision floating point calculations. Current GPUs either are not capable of double-precision calculations or have a reduced number of units devoted to them, and so are not as fast in double precision. The precision limit for single precision is ~10⁻⁷. Compare this to the ratio of the mass of the Earth to that of the Sun, ~3×10⁻⁶. The force from the Sun swamps that from the Earth if the terms are added in single precision. To carry out N-body calculations in a circumstellar disk in single precision, the central force term must be removed from the force computation sum. We work in a coordinate system that allows interaction terms to be separated from Keplerian evolution.
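To make the size of the effect concrete, the short program below (illustrative only, not from the poster) adds a planet-scale acceleration to a solar-scale one in single precision and checks how much of the planetary term survives:

    #include <cstdio>

    int main()
    {
        // Normalized accelerations: the solar term is ~1, a terrestrial-planet
        // term is ~3e-6 (of order the Earth/Sun mass ratio).
        float sun    = 1.0f;
        float planet = 3.0e-6f;
        float total  = sun + planet;      // rounded to ~7 significant digits
        float back   = total - sun;       // what survives of the planet term
        std::printf("planet term      = %.8e\n", planet);
        std::printf("recovered term   = %.8e\n", back);
        std::printf("fractional error = %.2e\n", (back - planet) / planet);
        // The fractional error is of order 1e-2 to 1e-3: the planetary force is
        // retained to only a few significant digits once the solar term enters
        // the same single-precision sum.  Removing the central force term from
        // the interaction sum avoids this loss.
        return 0;
    }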

The Hamiltonian for the N-body system in heliocentric coordinates and associated barycentric canonical momenta separates into three terms, H = H_Kep + H_Int + H_Sun (Duncan et al. 1998).
H_Int: the interaction term, containing the interparticle force pairs but lacking the force from the central mass; O(N²) computations.
H_Sun: a correction caused by working in a heliocentric rather than an inertial frame; O(N) computations.
H_Kep: Keplerian evolution for each particle separately; O(N) computations.
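For reference, the standard form of this splitting (Duncan et al. 1998), written here from the published reference rather than copied from the poster, for heliocentric positions Q_i and barycentric momenta P_i of N bodies orbiting a star of mass M_⊙:

    \begin{aligned}
    H_{\mathrm{Kep}} &= \sum_{i=1}^{N}\left(\frac{|\mathbf{P}_i|^{2}}{2m_i}
                        - \frac{G M_\odot m_i}{|\mathbf{Q}_i|}\right),
                        &&\text{(Kepler evolution, } O(N)\text{)}\\
    H_{\mathrm{Int}} &= -\sum_{i=1}^{N}\sum_{j>i}\frac{G m_i m_j}{|\mathbf{Q}_i-\mathbf{Q}_j|},
                        &&\text{(interparticle force pairs, } O(N^{2})\text{)}\\
    H_{\mathrm{Sun}} &= \frac{1}{2M_\odot}\left|\sum_{i=1}^{N}\mathbf{P}_i\right|^{2}.
                        &&\text{(heliocentric-frame correction, } O(N)\text{)}
    \end{aligned}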

Second order Symplectic Integrator To advance by one timestep we apply a series of evolution operators, each based on one term in the Hamiltonian. The interaction evolution kernel is called once per timestep because it is the most computationally intensive step. The drift and Keplerian evolution kernels are called twice per timestep, but they require only O(N) computations.
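In operator notation, with E_X(τ) denoting evolution under H_X for a time τ, one symmetric second-order ordering consistent with this description (the poster does not spell out the exact sequence, so treat this as a sketch) is:

    E(\Delta t) \;\approx\;
    E_{\mathrm{Sun}}\!\left(\tfrac{\Delta t}{2}\right)
    E_{\mathrm{Kep}}\!\left(\tfrac{\Delta t}{2}\right)
    E_{\mathrm{Int}}(\Delta t)\,
    E_{\mathrm{Kep}}\!\left(\tfrac{\Delta t}{2}\right)
    E_{\mathrm{Sun}}\!\left(\tfrac{\Delta t}{2}\right)
    \;+\; O(\Delta t^{3})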

Memory layout and Kernels All particle coordinates reside in GPU global memory, in heliocentric coordinates and barycentric momenta. Particle coordinates are transferred on and off the GPU only for initial setup and for outputs. Separate kernels are written for each term in the Hamiltonian; a kernel is code that runs on the GPU but is called from the CPU. Shared memory on the GPU is used for the computationally intensive force-pair calculation.
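A minimal sketch of this layout under stated assumptions (the Particle struct, kernel names, and launch parameters are hypothetical, and the kernel body is omitted): particle data are allocated once in GPU global memory and per-term kernels are launched every timestep without host transfers.

    #include <cuda_runtime.h>

    // Hypothetical particle record: heliocentric position, barycentric momentum, mass.
    struct Particle { float x, y, z, px, py, pz, m; };

    // One kernel per Hamiltonian term; the Kepler advance itself is omitted here.
    __global__ void kepler_kernel(Particle *p, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // advance particle i along its Kepler orbit by dt (f and g functions)
        }
    }

    int main()
    {
        const int N = 1 << 16;
        Particle *d_p = nullptr;
        cudaMalloc(&d_p, N * sizeof(Particle));   // coordinates live on the GPU
        // ... one cudaMemcpy host->device for the initial conditions ...

        const int threads = 256;
        const int blocks  = (N + threads - 1) / threads;
        for (int step = 0; step < 1000; ++step) {
            // drift, Kepler and interaction kernels are launched here each step;
            // nothing is copied back to the host except at output times.
            kepler_kernel<<<blocks, threads>>>(d_p, N, 1.0e-3f);
        }
        cudaDeviceSynchronize();
        cudaFree(d_p);
        return 0;
    }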

Interaction Kernel on the GPU All force pairs are calculated as efficiently as on a GRAPE board, but using the ~20 processors on board a video card. The code is faster when shared memory is used: for each computation tile, a block of p particle coordinates is transferred to shared memory. The same tiling, and almost identical code, can be used for calculating the total energy as for calculating the forces (Nyland et al. 2008, GPU Gems 3).
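A sketch of the tiled force computation in the spirit of Nyland et al. (2008); this is not the authors' kernel. It assumes a launch with blockDim.x == TILE, masses stored as G*m in the w component, and a small softening term eps2 added for this sketch (it also keeps the i == j term finite, where it then contributes zero force).

    #include <cuda_runtime.h>

    #define TILE 256   // tile size; launch the kernel with blockDim.x == TILE

    // pos[i] = (x, y, z, G*m_i); acc[i] receives the gravitational acceleration.
    __global__ void interaction_kernel(const float4 *pos, float4 *acc, int n, float eps2)
    {
        __shared__ float4 tile_pos[TILE];

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float4 pi = (i < n) ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
        float ax = 0.f, ay = 0.f, az = 0.f;

        for (int start = 0; start < n; start += TILE) {
            int j = start + threadIdx.x;
            tile_pos[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
            __syncthreads();            // a tile of coordinates is now in shared memory

            for (int k = 0; k < TILE && start + k < n; ++k) {
                float4 pj = tile_pos[k];
                float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
                float r2 = dx * dx + dy * dy + dz * dz + eps2;
                float inv_r = rsqrtf(r2);
                float s = pj.w * inv_r * inv_r * inv_r;   // G m_j / r^3
                ax += s * dx;  ay += s * dy;  az += s * dz;
            }
            __syncthreads();
        }
        if (i < n) acc[i] = make_float4(ax, ay, az, 0.f);
    }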

Kepstep Kernel The f and g functions are used to advance particle positions and velocities. These functions are computed with an iterative solution of the universal form of Kepler's equation (e.g., see Prussing & Conway 1993, chapter 2). The universal form is used so that hyperbolic, parabolic, and elliptic orbits can be computed with the same equation. It is important that the algorithm be robust and never return NaNs. No case statements are required for different classes of orbit, which facilitates the parallel implementation.
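The f and g functions here are the standard Lagrange f and g functions (e.g., Prussing & Conway 1993); this block is standard orbital mechanics, included for completeness rather than taken from the poster:

    \mathbf{r}(t_0+\Delta t) = f\,\mathbf{r}(t_0) + g\,\mathbf{v}(t_0), \qquad
    \mathbf{v}(t_0+\Delta t) = \dot f\,\mathbf{r}(t_0) + \dot g\,\mathbf{v}(t_0),
    \qquad\text{with}\qquad f\dot g - \dot f g = 1 .

The last identity expresses conservation of the orbital angular momentum and is used in the angular momentum slide below.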

7 Iteration solution of the Universal Kepler's equation on the GPU Kepler's equation must be solved iteratively. Usually the iteration is repeated until the equation is solved to within a desired precision:

    compute_first_iteration();
    while (computed_difference() > desired_precision) {
        compute_next_iteration();
    }

However, this makes little sense on a GPU, where all particles are integrated simultaneously. If one particle pops out of the while loop early it does not speed up the code, because that particle has to wait for the other particles to exit the loop. Fortunately we can choose an iteration method that converges rapidly. Using the cubically convergent Laguerre method, I find that 7 iterations are enough for all possible initial conditions to achieve double-precision accuracy. On the GPU we simply iterate 7 times no matter what, trading a few extra computations for branching complexity that could have slowed the code:

    compute_first_iteration();
    compute_next_iteration();
    compute_next_iteration();
    compute_next_iteration();
    compute_next_iteration();
    compute_next_iteration();
    compute_next_iteration();
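As an illustration of the fixed-iteration idea, the sketch below applies Laguerre's method to the ordinary elliptic Kepler equation E - e sin E = M rather than the universal-variable form used in the poster; the function name and the starting guess are my choices, not the authors'.

    #include <cmath>

    // Fixed 7-iteration Laguerre solve of E - e*sin(E) = M for an elliptic orbit (e < 1).
    // Every thread performs the same number of iterations, so there is no branching
    // on a convergence test.
    __host__ __device__ inline float solve_kepler(float M, float e)
    {
        const int   NITER = 7;
        const float n     = 5.0f;                                 // Laguerre "order" parameter
        float E = M + 0.85f * e * copysignf(1.0f, sinf(M));       // common starting guess
        for (int it = 0; it < NITER; ++it) {
            float se  = e * sinf(E), ce = e * cosf(E);
            float f   = E - se - M;                               // residual
            float fp  = 1.0f - ce;                                // dF/dE  (> 0 for e < 1)
            float fpp = se;                                       // d2F/dE2
            float disc  = fabsf((n - 1.0f) * (n - 1.0f) * fp * fp - n * (n - 1.0f) * f * fpp);
            float denom = fp + copysignf(sqrtf(disc), fp);        // larger-magnitude denominator
            E -= n * f / denom;
        }
        return E;
    }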

Angular momentum conservation enforcement in the Kepstep kernel In our first implementation we noted a steady loss of angular momentum after 10⁶ timesteps were taken. We pinpointed the problem: it was caused by errors in the single-precision computation in the Kepstep that do not average to zero. We removed the problem by insisting that angular momentum be conserved during the step. Angular momentum conservation implies that f ġ - ḟ g = 1. We solve for one of these four functions after using the universal Kepler's equation to iteratively find the other three. This ensures that errors in angular momentum average to zero rather than accumulating.

Drift Kernel The drift step requires a sum of all the momenta. This is done with a parallel reduction sum (Harris et al. 2008) on the GPU; a sketch of the reduction pattern follows the references.

References:
Duncan, M. J., Levison, H. F., & Lee, M. H. 1998, A multiple time step symplectic algorithm for integrating close encounters, AJ, 116, 2067
Nyland, L., Harris, M., & Prins, J. 2008, Fast N-body simulation with CUDA, Chap. 31 in GPU Gems 3, ed. H. Nguyen, Addison-Wesley, Upper Saddle River, NJ, p. 677
Prussing, J. E., & Conway, B. A. 1993, Orbital Mechanics, Oxford University Press, New York
Harris, M., Sengupta, S., & Owens, J. D. 2008, Parallel prefix sum (scan) with CUDA, Chap. 39 in GPU Gems 3, ed. H. Nguyen, Addison-Wesley, Upper Saddle River, NJ
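A minimal sketch of the shared-memory tree reduction used for such a sum (following the pattern popularized by Harris; kernel and variable names are hypothetical, not the authors'):

    // Each block reduces up to 2*blockDim.x values into one partial sum.
    // Launch with shared memory size = blockDim.x * sizeof(float) and
    // blockDim.x a power of two; the per-block results are then reduced
    // again (or summed on the host).
    __global__ void reduce_sum(const float *in, float *block_sums, int n)
    {
        extern __shared__ float sdata[];
        unsigned int tid = threadIdx.x;
        unsigned int i   = blockIdx.x * (blockDim.x * 2) + threadIdx.x;

        float v = 0.0f;
        if (i < n)              v += in[i];
        if (i + blockDim.x < n) v += in[i + blockDim.x];
        sdata[tid] = v;
        __syncthreads();

        for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }
        if (tid == 0) block_sums[blockIdx.x] = sdata[0];          // one partial sum per block
    }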

Going beyond N-body: Identifying Collisions and close approaches Regime of sparse collisions: the collision timescale is longer than an orbital timescale. For gravitating bodies: N-body codes use a smoothing length to prevent spurious large velocity kicks caused by close approaches; the smoothing length is chosen based on the typical mean particle spacing or on Hill sphere radii. We would rather not use a smoothing length but instead identify which particles are likely to approach within a Hill sphere. The two-body problem can then be solved exactly, including collisions leading to planetary growth or planetesimal destruction. For dust particles: we can integrate particles that feel only the gravity of the massive bodies (and not of each other) but are sufficiently numerous to collide.

Algorithms for close approach and collision detection For gravitating bodies: the typical distance traveled in each timestep can be chosen to be small compared to the Hill sphere. For planetesimals the Hill sphere is large enough that a grid with cells of this size can be built. Procedure for rapidly identifying nearest neighbors: assign each particle a hash based on its grid cell, then perform a parallel sort (a sketch is given after this slide). This can be done faster than the interaction step. For dust: dust grains are so numerous and small that we could never simulate them all. Possible approaches:
– Sample the density distribution with a grid so that collision probabilities can be computed from the density field. The drawback is that this is sensitive to the grid size and to the number of particles and the time used to construct the density distribution.
– Inflate the dust particle size and use the approach above for gravitating bodies.
– Create an SPH-type kernel and use it to compute collision probabilities. Creating an SPH kernel can also be done rapidly in parallel on the GPU, since sorting by distance can be done rapidly.
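A minimal sketch of the grid-hash plus parallel-sort step for gravitating bodies (hypothetical names; Thrust's sort_by_key stands in for the radix sort mentioned in the abstract, and the grid dimension is assumed to be a power of two):

    #include <cuda_runtime.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>

    // Assign each particle a linear cell id on a uniform grid with cells of order
    // a Hill radius; coordinates are wrapped periodically onto the grid.
    __global__ void compute_cell_hash(const float4 *pos, unsigned int *hash,
                                      unsigned int *index, int n,
                                      float cell_size, int grid_dim)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        int cx = (int)floorf(pos[i].x / cell_size) & (grid_dim - 1);
        int cy = (int)floorf(pos[i].y / cell_size) & (grid_dim - 1);
        int cz = (int)floorf(pos[i].z / cell_size) & (grid_dim - 1);
        hash[i]  = ((unsigned int)cz * grid_dim + cy) * grid_dim + cx;
        index[i] = (unsigned int)i;
    }

    // After the sort, particles in the same cell are contiguous, so close-approach
    // candidates come only from a particle's own cell and its 26 neighboring cells.
    void sort_particles_by_cell(thrust::device_vector<unsigned int> &hash,
                                thrust::device_vector<unsigned int> &index)
    {
        thrust::sort_by_key(hash.begin(), hash.end(), index.begin());
    }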

More information Alex Moore's talk on Tuesday at the DPS 2008 meeting: Planet Migration through a Self-Gravitating Planetesimal Disk (preprint at http://arxiv.org/abs/ ). For code examples see http://astro.pas.rochester.edu/~aquillen/research1.html