1
The use of High Performance Computing in Astrophysics: an experience report. Paolo Miocchi, in collaboration with R. Capuzzo-Dolcetta, P. Di Matteo, A. Vicari, Dept. of Physics, Univ. of Rome “La Sapienza” (Rome, Italy). Work supported by the INAF-CINECA agreement (http://inaf.cineca.it, grant inarm033).
2
The needs of HPC in Globular Cluster dynamics. Theoretical study of a system made up of N ~ 10^5 – 10^7 gravitationally bound stars (a self-gravitating system).
3
The needs of HPC in Globular Cluster dynamics. Theoretical study of a system made up of N ~ 10^5 – 10^7 gravitationally bound stars (a self-gravitating system): O(N^2) force computations to perform.
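To see where the O(N^2) cost comes from, here is a minimal direct-summation sketch (illustrative only, not the authors' code): the acceleration of every particle needs a contribution from every other particle, so the double loop scales quadratically with N.

    import numpy as np

    def direct_accelerations(pos, mass, G=1.0, eps=1e-3):
        """Direct O(N^2) summation of softened gravitational accelerations."""
        acc = np.zeros_like(pos)
        for i in range(len(mass)):
            d = pos - pos[i]                      # vectors from particle i to all others
            r2 = (d * d).sum(axis=1) + eps**2     # softened squared distances
            r2[i] = np.inf                        # exclude the self-interaction
            acc[i] = G * (mass[:, None] * d / r2[:, None]**1.5).sum(axis=0)
        return acc

    # With N = 10^6 this already implies ~10^12 pair interactions per force evaluation.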
4
The needs of HPC in Globular Cluster dynamics. Gravity is a long-range and attractive force: very unstable dynamical states.
5
The needs of HPC in Globular Cluster dynamics. [Figure: evolution of the virial ratio and of the Lagrangian radii as functions of time, in crossing-time units.]
6
The needs of HPC in Globular Cluster dynamics. Gravity is a long-range and attractive force: inhomogeneous mass distributions imply a very wide range of time-scales, t ~ (Gρ)^{-1/2}, and hence a numerically “expensive” time integration of the particle motion. Individual and variable time-steps should be adopted.
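To make the range of time-scales concrete, a back-of-the-envelope sketch (illustrative densities, not values from the presentation): the local dynamical time t ~ (Gρ)^{-1/2} in a dense cluster core and in the sparse outskirts differs by more than two orders of magnitude, so a single global time-step would be wasteful.

    G = 4.5e-3  # gravitational constant in pc^3 / (Msun * Myr^2), approximate

    def t_dyn_myr(rho_msun_per_pc3):
        """Local dynamical time t ~ (G*rho)^(-1/2), in Myr."""
        return (G * rho_msun_per_pc3) ** -0.5

    print(t_dyn_myr(1e4))   # dense core, ~10^4 Msun/pc^3   -> ~0.15 Myr
    print(t_dyn_myr(1e-1))  # sparse outskirts, ~0.1 Msun/pc^3 -> ~47 Myr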
7
The needs of HPC in Globular Cluster dynamics. [Figure: snapshots at t = 0 and t = 4 t_cross; the central region becomes ~1000 times denser.]
8
The needs of HPC in Globular Cluster dynamics. Gravity is a long-range and attractive force: very unstable dynamical states, inhomogeneous mass distributions, fully 3D problems. An analytical approach is arduous!
9
The needs of HPC in Globular Cluster dynamics. The dynamical evolution of self-gravitating systems with N > 10^5 stars requires tens of Gflops: PARALLELIZATION of the codes is required.
10
The tree-code (see Barnes & Hut 1986, Nature 324, 446). [Figure: the force F exerted on a particle of mass m by a distant box of n particles is approximated by a multipole expansion about the box centre of mass r_cm, with M = total mass and Q = quadrupole moment of the box; the computational cost is independent of n.]
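For reference, the standard monopole-plus-quadrupole approximation of the potential generated by a box (a generic Barnes-Hut-style expression; the slides only name the quantities M, Q and r_cm) is

    \Phi(\mathbf{r}) \simeq -\frac{GM}{d} - \frac{G}{2 d^5} \sum_{i,j} Q_{ij}\, d_i d_j ,
    \qquad \mathbf{d} = \mathbf{r} - \mathbf{r}_{\rm cm}, \quad
    Q_{ij} = \sum_{k=1}^{n} m_k \left( 3\, x_{k,i} x_{k,j} - |\mathbf{x}_k|^2 \delta_{ij} \right),

where the x_k are the particle positions relative to r_cm. Evaluating this costs O(1), independently of n.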
11
The tree-code. [Figure: recursive subdivision of space in ‘boxes’ and the corresponding ‘tree’ logical structure; each node corresponds to a box.]
12
Recursive subdivision in ‘boxes’: the multipole coefficients are evaluated for each box, giving O(N log N) computations.
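A minimal sketch of the idea (2-D for brevity, monopole coefficients only; not the actual tree-code): boxes are subdivided recursively until few particles remain, and each box stores the quantities needed to approximate the force it exerts.

    import numpy as np

    class Box:
        """One tree node: total mass and centre of mass (monopole) plus child boxes."""
        def __init__(self, center, half, idx, pos, mass, leaf_max=8):
            self.center, self.half = center, half
            self.mass = mass[idx].sum()
            self.com = (mass[idx, None] * pos[idx]).sum(axis=0) / self.mass
            self.children = []
            if len(idx) > leaf_max:
                # recursive subdivision: split the particles by quadrant
                quad = (pos[idx] >= center).astype(int)
                for qx in (0, 1):
                    for qy in (0, 1):
                        sub = idx[(quad[:, 0] == qx) & (quad[:, 1] == qy)]
                        if len(sub):
                            offset = (np.array([qx, qy]) - 0.5) * half
                            self.children.append(Box(center + offset, half / 2,
                                                     sub, pos, mass, leaf_max))

    rng = np.random.default_rng(0)
    pos = rng.uniform(-1.0, 1.0, size=(10_000, 2))
    root = Box(np.zeros(2), 1.0, np.arange(len(pos)), pos, np.ones(len(pos)))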
13
Problems in the tree-code parallelization. Gravity is a long-range interaction: inter-processor data transfer is unavoidable (heavy overhead on distributed-memory platforms, DMP). Inhomogeneous mass distributions: the assignment of particles to PEs has to be done according to the work-load. Hierarchical force evaluation: most force contributions come from closer bodies, suggesting a spatial domain decomposition.
14
The ‘Adaptive Tree Decomposition’ method. Domain decomposition is performed ‘on the fly’ during the tree construction, at a low computational cost. The adaptivity of the tree structure is exploited to obtain good load-balancing and data-locality in the force evaluation. The locally essential tree is built ‘dynamically’ during the tree-walking: remote boxes are linked only when really needed.
15
The ‘Adaptive Tree Decomposition’ method (see Miocchi & Capuzzo-Dolcetta 2002, A&A 382, 758). Two different parallelization strategies: LOWER-TREE, few boxes containing many particles; UPPER-TREE, many boxes with few particles inside. [Figure: tree levels and their assignment to PEs 0-3.]
16
The ‘Adaptive Tree Decomposition’ approach: some definitions. UPPER-tree = made up of boxes with fewer than kp particles inside; LOWER-tree = made up of boxes with more than kp particles; a pseudo-terminal (PTERM) box is a box in the upper-tree whose ‘parent box’ is in the lower-tree (p = no. of processors, k = fixed coefficient).
17
The ‘Adaptive Tree Decomposition’ method: parallelization of the lower-tree construction.
1. Preliminary “random” distribution of the particles to the PEs.
2. All PEs work, starting from the root box, constructing in synchrony the same lower boxes (by a recursive procedure).
3. When a PTERM box is found, it is assigned to a certain PE (so as to preserve a good load-balance in the subsequent force evaluation) and no further ‘branches’ are built up; a sketch of one possible assignment criterion is given below.
Load balancing: in this stage it is ensured by setting k sufficiently large, so that the number of particles in a box is always much greater than the number of processors.
Domain decomposition: communications among PEs during the tree-walking are minimized by the particular order in which PTERM boxes are met. The lower-tree is stored in the local memories of ALL the PEs.
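A minimal sketch of one possible realization of step 3 (a greedy assignment using particle counts as the work-load estimate; the actual criterion of the ATD code is not detailed in the slides):

    import heapq

    def assign_pterm_boxes(pterm_counts, n_pe):
        """Assign PTERM boxes (particle counts given in the order they are met during
        the lower-tree construction) to PEs, always picking the least-loaded PE."""
        heap = [(0, pe) for pe in range(n_pe)]   # (current load, PE id)
        heapq.heapify(heap)
        owner = []
        for count in pterm_counts:
            load, pe = heapq.heappop(heap)
            owner.append(pe)
            heapq.heappush(heap, (load + count, pe))
        return owner

    # Example: 6 PTERM boxes met in tree order, distributed over 4 PEs.
    print(assign_pterm_boxes([120, 80, 200, 60, 150, 90], n_pe=4))   # -> [0, 1, 2, 3, 3, 1]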
18
The ‘Adaptive Tree Decomposition’ method. [Figure: example of a uniform 2-D distribution with PTERM boxes at the 3rd subdivision level, shown in PTERM order.] Every spatial domain is (nearly) contiguous, so the data transfer among PEs is minimized.
19
The ‘Adaptive Tree Decomposition’ method. [Figure: example of domain decomposition for a Plummer distribution of 16K particles on 4 processors.]
20
The ‘Adaptive Tree Decomposition’ method: parallelization of the upper-tree construction. The PTERM boxes have already been distributed to the PEs. Each PE works independently and asynchronously, starting from every PTERM box in its domain and building the descendant portion of the upper-tree, up to the terminal boxes.
21
The ‘Adaptive Tree Decomposition’ method: parallelization of the tree-walking. Each PE independently evaluates the forces on the particles belonging to its domain (i.e. those contained in the PTERM boxes previously assigned to it). Each PE holds in its memory the local tree, i.e. the whole lower-tree plus the portion of the upper-tree descending from the PTERM boxes of its domain. When a ‘remote’ box is met, it is linked to the local tree by copying it into the local memory.
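A schematic sketch of such a walk, reusing the toy Box nodes from the earlier sketch (standard opening-angle criterion; fetch_remote_box is a hypothetical stand-in for the inter-PE communication, which the slides do not detail):

    import numpy as np

    THETA = 0.7   # opening-angle parameter; 0.7 is a typical choice

    def acceleration(p, box, G=1.0, fetch_remote_box=None):
        """Walk the tree from `box`, accumulating the acceleration at position p.
        A box is opened only if it appears too large on the sky; child entries left
        as None (remote boxes) are fetched and linked into the local tree on demand."""
        d = box.com - p
        r = max(np.linalg.norm(d), 1e-12)
        if box.children and 2 * box.half / r > THETA:      # too close: open the box
            acc = np.zeros_like(p)
            for i, child in enumerate(box.children):
                if child is None and fetch_remote_box is not None:
                    child = box.children[i] = fetch_remote_box(box, i)
                acc += acceleration(p, child, G, fetch_remote_box)
            return acc
        return G * box.mass * d / r**3                     # far enough: monopole term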
22
Code performance on an IBM SP4. Performance on one ‘main’ time-step (T) with complete force evaluation and time integration of the motion, for a self-gravitating system with N = 10^6 particles. WARNING: each particle has its own variable time-step, depending on the local mass density and typical velocity; individual steps are T/2^n, according to the block time-step scheme (Aarseth 1985). Dynamical tree reconstruction is implemented: the tree is re-built when the number of interactions evaluated is > N/10 (Springel et al. 2001, New Astr., 6, 51).
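A minimal sketch of the block time-step quantization (the criterion here simply rounds a desired step down to the nearest allowed value T/2^n; in the code the desired step depends on the local density and typical velocity, as stated above):

    import numpy as np

    def block_timestep(dt_desired, T):
        """Largest allowed step T/2^n that does not exceed the desired time-step."""
        n = np.maximum(0, np.ceil(np.log2(T / dt_desired))).astype(int)
        return T / 2 ** n

    T = 1.0
    print(block_timestep(np.array([0.9, 0.3, 0.06, 0.001]), T))
    # -> [0.5, 0.25, 0.03125, 0.0009765625], i.e. T/2, T/4, T/32, T/1024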
23
Code performance on an IBM SP4. Performance on one ‘main’ time-step (T) with complete force evaluation and time integration of the motion, for a self-gravitating system with N = 10^6 particles: 2,100,000 particle time-advances performed.
24
Code performance on an IBM SP4. [Figure: CPU time (sec) for one ‘main’ time-step with complete force evaluation and time integration of the motion, for a self-gravitating system with N = 10^6 particles (tree opening parameter θ = 0.7, k = 256, up to 16 PEs per node); about 25,000 particles per second.]
25
Code performance on an IBM SP4. The speedup behaviour is very good up to 16 PEs (speedup ≈ 10). The load imbalance is low (10% with 64 PEs). Data transfer and communications still penalize the overall performance when the number of particles per PE is low (34% with 64 PEs). An MPI-2 version could fully exploit the ATD parallelization strategy.
26
Merging of Globular Clusters in galactic central regions. Motivation: the study of the dynamical evolution and fate of young GCs within the bulge. To what extent can GCs survive the strong tidal interaction with the bulge? Do they eventually merge? What features will the final merging product have? To what extent can the bulge accrete the mass lost by the GCs?
27
Merging of Globular Clusters in galactic central regions. Motivation: the study of the dynamical evolution and fate of young GCs within the bulge. 30,000 CPU-hours on an IBM SP4, provided by the INAF-CINECA agreement for a scientific ‘key-project’ (under grant inarm033).
28
Merging of Globular Clusters in galactic central regions. Features of the numerical approach: accurate N-body (tree-code) simulations with a high number of ‘particles’ (10^6); dynamical friction and mass function included; self-consistent triaxial bulge model (Schwarzschild). [Table: initial parameters of the simulated clusters (a-d) for the two simulations A and B, which differ in cluster concentration: mass M (10^6 M_sun), tidal radius r_t (pc), concentration c, core radius r_c (pc), crossing time t_cr (Kyr), velocity scale (km/s).]
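Dynamical friction on a cluster is usually modelled with the Chandrasekhar formula; as an indication (the slides do not specify the exact prescription adopted in these simulations), the deceleration of a cluster of mass M moving with velocity v_M through field stars of local density ρ and velocity dispersion σ is

    \frac{d\mathbf{v}_M}{dt} = -\,\frac{4\pi G^2 M \rho \ln\Lambda}{v_M^3}
    \left[ \operatorname{erf}(X) - \frac{2X}{\sqrt{\pi}} e^{-X^2} \right] \mathbf{v}_M ,
    \qquad X = \frac{v_M}{\sqrt{2}\,\sigma},

with lnΛ the Coulomb logarithm.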
29
Merging of Globular Clusters in galactic central regions. Quasi-radial orbits: the clusters cross each other at every passage (twice per period). [Figure: orbital coordinate x (pc) versus time t (Myr).]
30
Merging of Globular Clusters in galactic central regions. Tidal tails structure and formation. [Figure: “tidal tails” around Pal 5 (after Odenkirchen et al. 2002) compared with our simulation of a cluster on a circular orbit; the tidal tails are reproduced by our simulation.]
31
Merging of Globular Clusters in galactic central regions. Tidal tails structure and formation. [Figure: “ripples” around a cluster in our simulations and “ripples” around NGC 3923.]
32
Merging of Globular Clusters in galactic central regions. Tidal tails structure and formation. [Figure: “ripples” around a cluster and around NGC 3923.] What are the “ripples”? How do they form? 3D visualization tools can help to provide answers!
33
Merging of Globular Clusters in galactic central regions. [Figure: density profiles of the most compact cluster (solid lines) at t = 0 and t = 17 Myr, fitted with a single-mass King model (dotted lines); the least compact cluster is shown at t = 15 Myr; the dashed black line marks the bulge central density; tidal tails are indicated.]
34
Merging of Globular Clusters in galactic central regions. Fraction of mass lost. Two estimators are used: one counts the mass in regions where the density drops below p/100 of the central cluster density, the other counts the stars with positive energy, E_i > 0. [Figure: fraction of mass lost for cluster concentrations c = 0.8, 0.9, 1.2, 1.3; the bulge stellar density is indicated for comparison.]