Timestepping and Parallel Computing in Highly Dynamic N-body Systems Joachim Stadel University of Zürich Institute for Theoretical Physics

Astrophysical N-body Simulations. Applications: LSS surveys, galaxy formation, solar system formation. Physics: gravity, hydro, collisions, near-integrable systems, solar-system stability.

Outline
- Collisionless Simulations and Resolution
- Parallel Computers
- Tree Codes - Tree Codes on Parallel Computers
- PKDGRAV (and Gasoline)
- Applications - Various Movies
- Warm Dark Matter
- Multistepping Part 1
- New Parallelization Problems
- Multistepping Part 2
- Initial Conditions - Shells
- Blackhole "Mergers"
- Fast Multipole Method
- PKDGRAV2
- Cosmo Initial Conditions
- GHALO Simulation
- GHALO Prelim. Results: Density Profile, Phase-Space Density, Subhalos & Reionization
- What next?

WMAP Satellite 2003 Fluctuations in the Microwave Background Radiation The initial conditions for structure formation. The Universe is completely smooth to one part in 1,000 at z=1000.

Greenbank radio galaxy survey (1990) 31,000 galaxies At z=0 and on the very largest scales the distribution of galaxies is in fact homogeneous.

On 'smaller' scales: redshift surveys

Numerical Simulation From the microwave background fluctuations to the present day structure seen in galaxy redshift surveys.

N-body simulations as models of stellar systems. The N-body equations of motion are
$$\ddot{x}_i = -\sum_{j \neq i}^{N} \nabla \Phi(x_i, x_j), \qquad \frac{dx}{dt} = v, \quad \frac{dv}{dt} = -\nabla\Phi.$$
Typically $N_{\rm simulation} \ll N_{\rm real}$, so these are NOT the equations we should be solving. What we should solve is the Collisionless Boltzmann Equation (CBE),
$$\frac{\partial f}{\partial t} + [f, H] = 0, \qquad \int f\, dz = 1.$$
The CBE is a 1st-order non-linear PDE. Such equations can be solved by the method of characteristics; the characteristics are the paths along which information propagates, and for the CBE they are defined by $dx/dt = v$, $dv/dt = -\nabla\Phi$. But these are just the equations of motion we had above! $f$ is constant along the characteristics, thus each particle carries a piece of $f$ along its trajectory.
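To make the link between the CBE characteristics and an N-body code concrete, here is a minimal direct-summation acceleration routine. It is only a sketch: a plain O(N^2) sum with G = 1 and no softening (softening is the subject of a later slide); it is not PKDGRAV's solver.

```c
/* Minimal O(N^2) evaluation of dv/dt = -grad Phi for every particle.
   Sketch only: G = 1, no softening, no tree. */
#include <math.h>

typedef struct { double x[3], v[3], a[3], mass; } Body;

void accelerations(Body *b, int n) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++) b[i].a[k] = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double d[3], r2 = 0.0;
            for (int k = 0; k < 3; k++) {
                d[k] = b[j].x[k] - b[i].x[k];
                r2 += d[k] * d[k];
            }
            double inv_r3 = 1.0 / (r2 * sqrt(r2));
            for (int k = 0; k < 3; k++)
                b[i].a[k] += b[j].mass * d[k] * inv_r3;
        }
    }
}
```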

The only difficulty is in evaluating $\Phi$. In terms of the distribution function, the Poisson integral is
$$\Phi(x) = -GM \int dz'\, \frac{f(z')}{|x - x'|}.$$
Monte Carlo: for any reasonable function $g(z)$,
$$\int dz\, g(z) = \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{g(z_i)}{f_s(z_i)},$$
where the $z_i$ are randomly chosen with sampling probability density $f_s$. Applying this to the Poisson integral,
$$\Phi(x) \approx -\frac{GM}{N} \sum_{i=1}^{N} \frac{f(z_i)}{f_s(z_i)} \frac{1}{|x - x'_i|}.$$
So in a conventional N-body simulation $f_s(z) = f(z)$, and the particle density represents the underlying phase-space density.
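A minimal sketch of this Monte Carlo estimate for the conventional case $f_s = f$ (equal-mass particles), so that $\Phi(x) \approx -(GM/N)\sum_i 1/|x - x_i|$; G and M are set to 1 for brevity.

```c
/* Monte Carlo estimate of the Poisson integral for the potential at x,
   assuming the sampling density equals f itself (equal-mass particles).
   G = M = 1 (sketch). */
#include <math.h>

double potential(const double x[3], const double (*xp)[3], int n) {
    double phi = 0.0;
    for (int i = 0; i < n; i++) {
        double r2 = 0.0;
        for (int k = 0; k < 3; k++) {
            double d = x[k] - xp[i][k];
            r2 += d * d;
        }
        phi += 1.0 / sqrt(r2);   /* diverges as x -> x_i: see the next slide */
    }
    return -phi / n;
}
```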

Softening. The singularity at $x = x'$ in the Poisson integral causes very large scatter in the estimation of $\Phi$. This results in fluctuations in the potential, $\delta\Phi$, which have two effects.
1. A change in the particle's energy along its orbit:
$$\frac{dE}{dt} = \dot{x}_i \frac{\partial E}{\partial x_i} + \dot{v}_i \frac{\partial E}{\partial v_i} + \frac{\partial E}{\partial t} = \frac{\partial \Phi}{\partial t}.$$
Fluctuations in $\Phi$ due to discrete sampling cause a random walk in energy for the particle: this is two-body relaxation.
2. Mass segregation: if more and less massive particles are present, the less massive ones will typically recoil from an encounter with more velocity than a massive particle.
Softening, either explicitly introduced or as part of the numerical method, lessens these effects.
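One common way to soften is the Plummer form, replacing $1/r$ by $1/\sqrt{r^2+\epsilon^2}$. The slides do not specify which softening kernel is used (spline softening is another standard choice), so the form and the parameter $\epsilon$ below are illustrative.

```c
/* Plummer-softened potential and acceleration magnitude for a unit-mass
   particle at the origin, evaluated at distance r.  Illustrative only;
   other kernels (e.g. spline softening) are also in common use. */
#include <math.h>

double phi_soft(double r, double eps) {
    return -1.0 / sqrt(r * r + eps * eps);
}

double acc_soft(double r, double eps) {   /* magnitude of the attractive acceleration */
    double s2 = r * r + eps * eps;
    return r / (s2 * sqrt(s2));
}
```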

All N-body simulations of the CBE suffer from 2-body relaxation! This is even more important for cosmological simulations where all structures formed from smaller initial objects. All particles experienced a large relative degree of relaxation in the past. Diemand and Moore 2002
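For context, a standard order-of-magnitude estimate (e.g. Binney & Tremaine; not taken from the slides) relates the two-body relaxation time to the crossing time:
$$t_{\mathrm{relax}} \;\approx\; \frac{N}{8\ln N}\, t_{\mathrm{cross}},$$
so within a collisionless simulation the only cure is to increase $N$ (and soften), which is why resolution matters so much in what follows.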

Increasing Resolution. Cluster resolved: 67,500. Galaxy halos resolved: 1,300,000. Dwarf galaxy halos resolved: 10,500,000.

zBox (Stadel & Moore): AMD MP2200+ processors, 144 GB RAM, 10 TB disk. Compact, easy to cool and maintain. Very fast Dolphin/SCI interconnects: 4 Gbit/s, microsecond latency. A teraflop computer for $500,000 ($250,000 with MBit). Roughly one cubic meter, one ton, and requires 40 kilowatts of power.

Parallel supercomputing

500 CPUs / 640 GB RAM, ~100 TB of disk. A parallel computer is currently still mostly wiring. The human brain (Garry Kasparov) is no exception. However, wireless CPUs are now under development, which will revolutionize parallel computer construction.

Tree structures: spatial binary tree, k-D tree, spatial binary tree with squeeze.

Forces are calculated using 4th-order multipole expansions. The Ewald summation technique is used to introduce periodic boundary conditions (also based on a 4th-order expansion). Work is tracked and fed back into the domain decomposition.
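For orientation, here is a minimal Barnes-Hut-style tree walk using only monopole terms and a simple opening-angle test theta. PKDGRAV itself uses 4th-order multipoles and a different cell-acceptance criterion, so this is a sketch of the general idea only; the Cell layout, walk() and theta are illustrative names, and softening and self-interaction handling are omitted.

```c
/* Minimal Barnes-Hut-style tree walk: if a cell is distant enough (or is a
   leaf), use its monopole; otherwise open it and recurse.  G = 1. */
#include <math.h>

typedef struct Cell {
    double cm[3];            /* centre of mass of the cell        */
    double mass;             /* total mass in the cell            */
    double size;             /* linear size of the cell           */
    struct Cell *child[2];   /* NULL children mark a leaf         */
} Cell;

void walk(const Cell *c, const double pos[3], double acc[3], double theta) {
    double d[3], r2 = 0.0;
    for (int k = 0; k < 3; k++) { d[k] = c->cm[k] - pos[k]; r2 += d[k] * d[k]; }
    double r = sqrt(r2) + 1e-12;
    if (c->child[0] == NULL || c->size / r < theta) {
        double f = c->mass / (r2 * r);        /* monopole acceleration */
        for (int k = 0; k < 3; k++) acc[k] += f * d[k];
    } else {
        for (int i = 0; i < 2; i++)
            if (c->child[i]) walk(c->child[i], pos, acc, theta);
    }
}
```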

Compute time vs. Accuracy

Parallelizing Gravity (PKDGRAV). Spatial locality = computational locality (forces fall off as 1/r^2). This means it is beneficial to divide space in order to achieve load balance; it also minimizes communication with other processors. But we must add a constraint on the number of particles per processor, since memory is limited! Domain decomposition is a global optimization of these requirements, which is solved dynamically with every step. (Example: division of space for 8 processors.)
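A minimal sketch of one such decomposition, orthogonal recursive bisection: split the particle set at the median of one axis, cycling axes, until each domain maps to one processor. PKDGRAV additionally weights particles by the work measured in the previous step and balances that work; here every particle counts equally (an assumption for brevity).

```c
/* ORB-style domain decomposition sketch.  owner[i] receives the processor
   id of particle p[i]; particles are (re)ordered by the recursive splits. */
#include <stdlib.h>

typedef struct { double x[3]; } Particle;

static int g_axis;   /* axis used by the comparator */
static int cmp(const void *a, const void *b) {
    double d = ((const Particle *)a)->x[g_axis] - ((const Particle *)b)->x[g_axis];
    return (d > 0) - (d < 0);
}

void orb(Particle *p, int *owner, int n, int first, int nProc, int axis) {
    if (nProc == 1) {
        for (int i = 0; i < n; i++) owner[i] = first;
        return;
    }
    g_axis = axis;
    qsort(p, n, sizeof(Particle), cmp);      /* a median-select would suffice */
    int nLeft = (int)((long long)n * (nProc / 2) / nProc);  /* proportional split */
    orb(p, owner, nLeft, first, nProc / 2, (axis + 1) % 3);
    orb(p + nLeft, owner + nLeft, n - nLeft, first + nProc / 2,
        nProc - nProc / 2, (axis + 1) % 3);
}
```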

Other decomposition strategies...

How are non-local parts of the tree walked by PKDGRAV? (Diagram: CPU i and CPU j exchange data via low-latency message passing, each holding a local cache of remote data elements.) PKDGRAV does not attempt to determine in advance which data elements are going to be required in a step (as a locally essential tree, LET, approach would). The hit rate in the cache is very good with as little as 10 MB.
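A toy sketch of such a read-only software cache for remote tree cells: look the cell up locally, and only on a miss go to the network. The names, the direct-mapped layout and fetch_remote() are all illustrative assumptions; fetch_remote() stands in for the actual low-latency message exchange and is stubbed out here so the sketch is self-contained.

```c
/* Direct-mapped software cache for remote tree cells (sketch). */
#define CACHE_LINES 4096

typedef struct { double cm[3], mass; } RemoteCell;
typedef struct { int proc, index, valid; RemoteCell data; } CacheLine;

static CacheLine cache[CACHE_LINES];

/* Placeholder for the real message-passing request to processor 'proc';
   here it just returns an empty cell. */
static RemoteCell fetch_remote(int proc, int index) {
    (void)proc; (void)index;
    RemoteCell c = {{0.0, 0.0, 0.0}, 0.0};
    return c;
}

const RemoteCell *cache_get(int proc, int index) {
    unsigned h = ((unsigned)proc * 2654435761u + (unsigned)index) % CACHE_LINES;
    CacheLine *line = &cache[h];
    if (!line->valid || line->proc != proc || line->index != index) {
        line->data  = fetch_remote(proc, index);   /* cache miss */
        line->proc  = proc;
        line->index = index;
        line->valid = 1;
    }
    return &line->data;
}
```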

PKDGRAV Scaling. On the T3E it was possible to obtain 80% of linear scaling on 512 processors. (PKDGRAV: Joachim Stadel, Thomas Quinn)

GASOLINE: Wadsley, Stadel & Quinn, NewA 2003. A fairly standard SPH formulation is used in GASOLINE. SPH is very well matched to a particle-based gravity code like PKDGRAV, since all the core data structures and many of the same algorithms can be used; for example, the neighbour searching can simply use the parallel distributed tree structure. (Evrard 1988; Benz 1989; Hernquist & Katz 1989; Monaghan 1992)

Algorithms within GASOLINE. We perform two nearest-neighbour (NN) operations: 1. find the 32 NN and calculate densities; 2. calculate forces in a second pass. For active particles we do a gather over the k-NN and a scatter from the k inverse-NN; we never store the nearest neighbours (similar to Springel 2001). Cooling, heating and ionization are quite efficient.
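A minimal sketch of the density gather, $\rho_i = \sum_j m_j W(|r_i - r_j|, h_i)$, using the standard cubic spline kernel with support 2h (Monaghan 1992). Gasoline finds the 32 nearest neighbours with the tree; here we simply loop over all particles, an assumption made only for brevity.

```c
/* SPH density gather with the cubic spline kernel (support 2h).  Sketch:
   brute-force neighbour loop instead of the tree-based 32-NN search. */
#include <math.h>

typedef struct { double x[3], mass, h, rho; } SphParticle;

static double W_cubic(double r, double h) {
    double q = r / h, c = 1.0 / (M_PI * h * h * h);
    if (q < 1.0) return c * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    if (q < 2.0) { double t = 2.0 - q; return c * 0.25 * t * t * t; }
    return 0.0;
}

void density_gather(SphParticle *p, int n) {
    for (int i = 0; i < n; i++) {
        p[i].rho = 0.0;
        for (int j = 0; j < n; j++) {
            double d2 = 0.0;
            for (int k = 0; k < 3; k++) {
                double d = p[i].x[k] - p[j].x[k];
                d2 += d * d;
            }
            p[i].rho += p[j].mass * W_cubic(sqrt(d2), p[i].h);
        }
    }
}
```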

The Large Magellanic Cloud (LMC) in gas and stars. Chiara Mastropietro (University of Zürich). With a fully dynamical Milky Way halo (dark matter, hot gas, stellar disk and bulge), which is not shown here. Both tidal and ram-pressure stripping of gas is taking place.

Collisional Physics Derek C. Richardson Gravity with hard spheres including surface friction, coefficient of restitution and aggregates; the Euler equations for solid bodies.

Asteroid Collisions

Part of an asteroid disk, where the outcomes of the asteroid impact simulations are included.

Movies of 1000 years of evolution.

The power spectrum of density fluctuations in three different dark matter models, from the horizon and CMB scales through large scales (galaxy clusters) down to small scales (dwarf galaxies).

40Mpc N=10^7 Andrea Maccio et al CDM T=GeV

40Mpc N=10^7 Andrea Maccio et al WDM T=2keV

40Mpc N=10^7 Andrea Maccio et al WDM T=0.5keV

CDM: ~500 satellites; 1 keV WDM: ~10 satellites. This is a very strong constraint on the lowest-mass WDM candidate: we need to form at least one Draco-sized substructure halo. Halo density profiles are unchanged; Liouville's constraint gives cores ≲ 50 pc.

CDM: n(M) ∝ M^-2. WDM: n(M) ∝ M^-1. Data: n(L) ∝ L^-1.

With fixed timesteps these codes all scale very well. However, this is no longer the only measure, since the scaling of a very "deep" multistepping run can be a lot worse. How do we do multistepping now, and why does it have problems?

Drift-Kick-Drift Multistepping Leapfrog. (Diagram: Drift and Kick operators on rungs 0, 1 and 2 versus time, with Select operators in between.) Note that none of the Kick tick marks align, meaning that gravity is calculated for a single rung at a time, despite the fact that the tree is built for all particles. The Select operators are performed top-down until all particles end up on appropriate timestep rungs. The operator sequences are 0: DSKD, 1: DS(DSKDDSKD)D, 2: DS(DS(DSKD..., and so on (see the sketch below).
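The operator strings at the end of the slide follow a simple recursion (D = drift, S = select, K = kick); a tiny sketch that reproduces them:

```c
/* Emit the DKD multistepping operator sequence for a given rung depth,
   matching the slide's pattern: 0:DSKD, 1:DS(DSKDDSKD)D, ... */
#include <stdio.h>

void dkd_pattern(int depth) {
    if (depth == 0) { printf("DSKD"); return; }
    printf("DS(");
    dkd_pattern(depth - 1);   /* first half-step, one rung deeper  */
    dkd_pattern(depth - 1);   /* second half-step, one rung deeper */
    printf(")D");
}

int main(void) {
    for (int d = 0; d <= 2; d++) {
        printf("%d: ", d);
        dkd_pattern(d);
        printf("\n");
    }
    return 0;
}
```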

Kick-Drift-Kick Multistepping Leapfrog. This method is more efficient, since it performs half the number of tree-build operations. It also exhibits somewhat lower errors than the standard DKD integrator. It is the only scheme used in production at present.
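A minimal single-rung KDK step, just to fix notation; accel() is a stand-in for the full tree-based gravity calculation (here a toy harmonic force), and the real scheme interleaves rungs as in the diagram.

```c
/* One kick-drift-kick step for a single particle in 1D. */
static double accel(double x) { return -x; }   /* placeholder test force */

void kdk_step(double *x, double *v, double dt) {
    *v += 0.5 * dt * accel(*x);   /* opening half kick */
    *x += dt * (*v);              /* full drift        */
    *v += 0.5 * dt * accel(*x);   /* closing half kick */
}
```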

Choice of Timestep. We want a criterion which commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities. The criteria may be local, or non-local and based on the maximum acceleration in moderate densities, and we can take the minimum of any or all of them.
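For concreteness, two commonly used velocity-independent criteria of this kind are Δt = η √(ε/|a|) and a local dynamical time Δt ∝ 1/√(Gρ); these are standard forms, not necessarily the exact expressions on the slide, and η, η_rho are illustrative accuracy parameters.

```c
/* Illustrative velocity-independent timestep criteria; take the minimum. */
#include <math.h>

double dt_accel(double eps, double a_mag, double eta) {
    return eta * sqrt(eps / a_mag);        /* softening / acceleration */
}

double dt_density(double G, double rho, double eta_rho) {
    return eta_rho / sqrt(G * rho);        /* local dynamical time */
}

double choose_dt(double eps, double a_mag, double G, double rho,
                 double eta, double eta_rho) {
    double dt1 = dt_accel(eps, a_mag, eta);
    double dt2 = dt_density(G, rho, eta_rho);
    return dt1 < dt2 ? dt1 : dt2;
}
```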

Multistepping: the real parallel computing challenge.
- T ~ 1/sqrt(Gρ), and the spread is even more dramatic in SPH.
- This implies N_active << N.
- A global approach to load balancing fails.
- Less compute per unit of communication.
- Too many synchronization points between all processors.
We want all algorithms of the simulation code to scale as O(N_active log N)! Everything that doesn't introduces a fixed cost which limits the speed-up attainable from multistepping.
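A small sketch of the block-step bookkeeping that makes N_active << N explicit: each particle is placed on the power-of-two rung whose step does not exceed its own criterion, and only particles whose rung is due are active on a given substep (standard block-timestep logic; the function names are illustrative).

```c
/* Assign a particle to a power-of-two rung and test whether it is active. */
int rung_for(double dt_i, double dt_max, int max_rung) {
    int r = 0;
    double dt = dt_max;
    while (dt > dt_i && r < max_rung) { dt *= 0.5; r++; }   /* dt = dt_max/2^r */
    return r;
}

/* A particle on rung r is active on substep s (0 <= s < 2^max_rung) when
   s is a multiple of 2^(max_rung - r). */
int is_active(int rung, int step, int max_rung) {
    return step % (1 << (max_rung - rung)) == 0;
}
```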

The Trends
- Parallel computers are getting ever more independent computing elements, e.g. Bluegene (100,000s of elements), multicore CPUs.
- Our simulations are always increasing in resolution, and hence we need many more timesteps than were required in the past.
- Multistepping methods have ever more potential to speed up calculations, but introduce new complexities into codes, particularly for large parallel machines.

What can be done?
- Tree repair instead of rebuild.
- Don't drift all particles; only drift terms that appear on the interaction list!
- Do smart updates of local cache information instead of flushing at each timestep.
- Use some local form of achieving load balancing, perhaps scheduling? Remote walks?
- Allow different parts of the simulation to get somewhat out of sync?
- Use O(N^2) for very active regions.
- Hybrid methods: Block+Symba.

"Take-away's" on Parallel Computing in N-body Simulations Multistepping is a key ingredient of higher resolution simulations. Multistepping is a key ingredient of higher resolution simulations. Multistepping creates challenging parallel computing problems, particularly as machines as machines grow in number of CPUs. Multistepping creates challenging parallel computing problems, particularly as machines as machines grow in number of CPUs. Multistepping must also be done carefully with algorithms that try to preserve time reversal or other symmetries. Multistepping must also be done carefully with algorithms that try to preserve time reversal or other symmetries. As adaptivity in space pushes us to hybrid approaches, adaptivity in time also push us to hybrid techniques (TreeSymba later). As adaptivity in space pushes us to hybrid approaches, adaptivity in time also push us to hybrid techniques (TreeSymba later).