The use of High Performance Computing in Astrophysics: an experience report
Paolo Miocchi, in collaboration with R. Capuzzo-Dolcetta, P. Di Matteo, A. Vicari
Dept. of Physics, Univ. of Rome “La Sapienza” (Rome, Italy)
Work supported by the INAF-CINECA agreement (grant inarm033).

The needs of HPC in Globular Cluster dynamics: theoretical study of a system made up of N ~ 10^5 – 10^7 gravitationally bound stars (a self-gravitating system) ⇒ O(N^2) force computations to perform at each time-step.
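For reference, a minimal sketch (not taken from the original code) of the direct-summation approach whose O(N^2) cost motivates the tree-code; the particle layout and the softening parameter eps are illustrative assumptions.

    #include <cstddef>
    #include <cmath>
    #include <vector>

    struct Particle { double m, x, y, z, ax, ay, az; };

    // Direct summation: every pair interacts once per step -> O(N^2) work.
    void compute_forces_direct(std::vector<Particle>& p, double G, double eps) {
        for (auto& pi : p) { pi.ax = pi.ay = pi.az = 0.0; }
        for (std::size_t i = 0; i < p.size(); ++i) {
            for (std::size_t j = i + 1; j < p.size(); ++j) {
                double dx = p[j].x - p[i].x;
                double dy = p[j].y - p[i].y;
                double dz = p[j].z - p[i].z;
                double r2 = dx*dx + dy*dy + dz*dz + eps*eps;   // softened distance
                double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
                // Newton's third law: one evaluation updates both particles.
                p[i].ax += G * p[j].m * dx * inv_r3;
                p[i].ay += G * p[j].m * dy * inv_r3;
                p[i].az += G * p[j].m * dz * inv_r3;
                p[j].ax -= G * p[i].m * dx * inv_r3;
                p[j].ay -= G * p[i].m * dy * inv_r3;
                p[j].az -= G * p[i].m * dz * inv_r3;
            }
        }
    }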

The needs of HPC in Globular Cluster dynamics: gravity is a long-range and attractive force ⇒ very unstable dynamical states.

The needs of HPC in Globular Cluster dynamics — [Figure: evolution of the virial ratio and of the Lagrangian radii as a function of time (in crossing-time units).]

The needs of HPC in Globular Cluster dynamics: gravity is a long-range and attractive force ⇒ inhomogeneous mass distributions ⇒ very wide range of time-scales ~ (Gρ)^(-1/2) ⇒ numerically “expensive” time integration of the particle motion; individual and variable time-steps should be adopted.
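A minimal sketch of how an individual time-step could be tied to the local dynamical time ~ (Gρ)^(-1/2); the accuracy parameter eta and the local density estimate are illustrative assumptions, not the criterion actually used in the code.

    #include <algorithm>
    #include <cmath>

    // Individual time-step proportional to the local dynamical time (G*rho)^(-1/2).
    // 'rho_local' would come e.g. from a neighbour-based density estimate.
    double individual_timestep(double rho_local, double G, double eta, double dt_max) {
        double t_dyn = 1.0 / std::sqrt(G * rho_local);
        return std::min(eta * t_dyn, dt_max);
    }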

The needs of HPC in Globular Cluster dynamics — [Figure: snapshots of the system at t = 0 and t = 4 t_cross; the central region becomes ~1000 times denser.]

The needs of HPC in Globular Cluster dynamics: gravity is a long-range and attractive force ⇒ very unstable dynamical states and inhomogeneous mass distributions ⇒ fully 3-D problems ⇒ arduous analytical approach!

The needs of HPC in Globular Cluster dynamics: dynamical evolution of self-gravitating systems with N > 10^5 stars ⇒ tens of Gflops needed ⇒ code PARALLELIZATION required.

The tree-code: the force F exerted on a particle m by a distant group of n particles is approximated by a multipole expansion about the group’s centre of mass (M = total mass, Q = quadrupole) ⇒ computational cost independent of n. See Barnes & Hut 1986, Nature 324, 446.
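In one common convention (a sketch, not necessarily the exact form used in the code), the potential of a group of total mass M, expanded about its centre of mass so that the dipole term vanishes, reads:

    \Phi(\mathbf{r}) \simeq -\,\frac{GM}{r} \;-\; \frac{G}{2}\,\frac{\sum_{ij} Q_{ij}\, x_i x_j}{r^5},
    \qquad
    Q_{ij} = \sum_k m_k \left( 3\, x^{(k)}_i x^{(k)}_j - \lvert \mathbf{x}^{(k)} \rvert^2 \, \delta_{ij} \right),

where x^(k) is the position of particle k relative to the centre of mass and r is the distance from the centre of mass to the point where the force is evaluated.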

The tree-code: ‘tree’ logical structure — recursive subdivision of space into ‘boxes’; each node of the tree corresponds to a box.
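A minimal, schematic sketch of the recursive box subdivision (an octree in 3-D); the termination rule (at most a few particles per terminal box) and all names are illustrative assumptions, not the actual data structures of the code.

    #include <array>
    #include <memory>
    #include <vector>

    struct Vec3 { double x, y, z; };

    struct Box {
        Vec3 centre;                                   // geometric centre of the box
        double half_size;                              // half of the box side length
        std::vector<int> particles;                    // indices of particles inside this box
        std::array<std::unique_ptr<Box>, 8> children;  // 8 sub-boxes (octree)
    };

    // Recursively split a box into 8 children until few particles remain inside.
    // (A real code would keep particle lists only in terminal boxes.)
    void subdivide(Box& box, const std::vector<Vec3>& pos, std::size_t max_per_box) {
        if (box.particles.size() <= max_per_box) return;       // terminal box
        for (int c = 0; c < 8; ++c) {
            auto child = std::make_unique<Box>();
            child->half_size = 0.5 * box.half_size;
            child->centre = { box.centre.x + ((c & 1) ? 1 : -1) * child->half_size,
                              box.centre.y + ((c & 2) ? 1 : -1) * child->half_size,
                              box.centre.z + ((c & 4) ? 1 : -1) * child->half_size };
            box.children[c] = std::move(child);
        }
        for (int i : box.particles) {                 // distribute particles to children
            int c = (pos[i].x > box.centre.x ? 1 : 0)
                  | (pos[i].y > box.centre.y ? 2 : 0)
                  | (pos[i].z > box.centre.z ? 4 : 0);
            box.children[c]->particles.push_back(i);
        }
        for (auto& ch : box.children) subdivide(*ch, pos, max_per_box);
    }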

Multipolar coefficients are evaluated for each box of the recursive subdivision into ‘boxes’ ⇒ O(N log N) computations.
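A schematic sketch of how the lowest-order coefficients of a box — total mass and centre of mass — could be computed from its member particles; higher-order (quadrupole) coefficients would be accumulated analogously, and in a real tree-code the coefficients of a box are usually combined from those of its children.

    #include <vector>

    struct Vec3d { double x, y, z; };

    // Monopole coefficients of one box: total mass M and centre of mass.
    void box_monopole(const std::vector<int>& members,
                      const std::vector<double>& mass,
                      const std::vector<Vec3d>& pos,
                      double& M, Vec3d& com) {
        M = 0.0; com = {0.0, 0.0, 0.0};
        for (int i : members) {
            M     += mass[i];
            com.x += mass[i] * pos[i].x;
            com.y += mass[i] * pos[i].y;
            com.z += mass[i] * pos[i].z;
        }
        if (M > 0.0) { com.x /= M; com.y /= M; com.z /= M; }
    }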

Problems in the tree-code parallelization:
- Gravity is a long-range interaction: inter-processor data transfer is unavoidable (heavy overhead on distributed-memory platforms).
- Inhomogeneous mass distributions: the assignment of particles to PEs has to be done according to the work-load.
- Hierarchical force evaluation: most force contributions come from closer bodies ⇒ a spatial domain decomposition is needed.

The ‘Adaptive Tree Decomposition’ method:
- Domain decomposition is performed ‘on-the-fly’ during the tree construction, at a low computational cost.
- The adaptivity of the tree structure is exploited to achieve good load-balancing and data-locality in the force evaluation.
- The locally essential tree is built ‘dynamically’ during the tree walk: remote boxes are linked only when really needed.

The ‘Adaptive Tree Decomposition’ method — two different parallelization strategies:
- LOWER-TREE: few boxes, each containing many particles.
- UPPER-TREE: many boxes, with few particles inside.
See Miocchi & Capuzzo-Dolcetta 2002, A&A 382, 758.

The ‘Adaptive Tree Decomposition’ approach — some definitions (p = no. of processors, k = fixed coefficient):
- UPPER-tree = made up of boxes with fewer than kp particles inside;
- LOWER-tree = made up of boxes with more than kp particles;
- a pseudo-terminal (PTERM) box is a box in the upper-tree whose ‘parent box’ is in the lower-tree.
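A minimal sketch of how these definitions translate into a box classification; the names and the exact threshold test are illustrative assumptions.

    #include <cstddef>

    enum class BoxClass { Lower, Upper };

    // A box belongs to the lower tree if it holds at least k*p particles,
    // to the upper tree otherwise (k = fixed coefficient, p = no. of processors).
    BoxClass classify_box(std::size_t n_in_box, std::size_t k, std::size_t p) {
        return (n_in_box >= k * p) ? BoxClass::Lower : BoxClass::Upper;
    }

    // A PTERM box is an upper-tree box whose parent box is in the lower tree.
    bool is_pterm(std::size_t n_in_box, std::size_t n_in_parent, std::size_t k, std::size_t p) {
        return classify_box(n_in_box, k, p) == BoxClass::Upper
            && classify_box(n_in_parent, k, p) == BoxClass::Lower;
    }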

Parallelization of the lower-tree construction (the ‘Adaptive Tree Decomposition’ method):
1. Preliminary “random” distribution of the particles to the PEs.
2. All PEs work, starting from the root box, constructing in synchrony the same lower boxes (by a recursive procedure).
3. When a PTERM box is found, it is assigned to a certain PE (so as to preserve a good load-balancing in the subsequent force evaluation) and no further ‘branches’ are built up; see the sketch below.
- Load balancing: in this stage it is ensured by setting k sufficiently large that the number of particles in a box is always much greater than the number of processors.
- Domain decomposition: communications among PEs during the tree walk are minimized by the particular order in which PTERM boxes are met. The lower-tree is stored in the local memory of ALL PEs.
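A schematic sketch (not the actual implementation) of the idea behind step 3: assign each PTERM box, as it is encountered, to the PE with the least accumulated work, here estimated simply by the number of particles already assigned.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Greedy assignment: give the next PTERM box to the least-loaded PE so that
    // the subsequent force evaluation is roughly balanced across processors.
    int assign_pterm_box(std::size_t n_particles_in_box, std::vector<std::size_t>& load_per_pe) {
        auto it = std::min_element(load_per_pe.begin(), load_per_pe.end());
        *it += n_particles_in_box;
        return static_cast<int>(it - load_per_pe.begin());
    }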

The ‘Adaptive Tree Decomposition’ method — [Figure: example of a uniform 2-D distribution with PTERM boxes at the 3rd subdivision level, numbered in the order in which they are met.] Every spatial domain is (nearly) contiguous ⇒ the data transfer among PEs is minimized.

The ‘Adaptive Tree Decomposition’ method — [Figure: example of domain decomposition for a Plummer distribution of 16K particles on 4 processors.]

The ‘Adaptive Tree Decomposition’ method — parallelization of the upper-tree construction:
- PTERM boxes have already been distributed to the PEs.
- Each PE works independently and asynchronously, starting from every PTERM box in its domain and building the descendant portion of the upper-tree, down to the terminal boxes.

The ‘Adaptive Tree Decomposition’ method — parallelization of the tree walking:
- Each PE independently evaluates the forces on the particles belonging to its domain (i.e. those contained in the PTERM boxes previously assigned to it).
- Each PE has in its memory the local tree, i.e. the whole lower-tree plus the portion of the upper-tree descended from the PTERM boxes of the PE’s domain.
- When a ‘remote’ box is met, it is linked to the local tree by copying it into the local memory.
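A schematic sketch of the “link on demand” idea during the walk: the node structure, the cache, and fetch_remote_box (which would wrap the actual one-sided communication, e.g. MPI-2 or LAPI get operations) are hypothetical placeholders, not the code’s real interfaces.

    #include <cstddef>
    #include <functional>
    #include <unordered_map>

    struct TreeNode;   // multipole coefficients, children links, etc. (omitted)

    struct RemoteCache {
        std::unordered_map<std::size_t, TreeNode*> cached;        // boxes already copied locally
        std::function<TreeNode*(std::size_t)> fetch_remote_box;   // wraps the actual get operation
    };

    // Return a locally usable pointer to a box: local boxes are returned as they are,
    // remote boxes are fetched (only once) and linked into the local tree.
    TreeNode* resolve_box(TreeNode* local_box, std::size_t remote_id, bool is_remote,
                          RemoteCache& cache) {
        if (!is_remote) return local_box;
        auto it = cache.cached.find(remote_id);
        if (it != cache.cached.end()) return it->second;        // already linked
        TreeNode* copy = cache.fetch_remote_box(remote_id);     // one-sided get from the owner PE
        cache.cached[remote_id] = copy;
        return copy;
    }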

Code performance on an IBM SP4: performance on one ‘main’ time-step (T) with complete force evaluation and time integration of the motion, for a self-gravitating system with N = 10^6 particles. NOTE:
- each particle has its own variable time-step, depending on the local mass density and typical velocity; according to the block time scheme, the particle step can be T/2^n (Aarseth 1985);
- dynamical tree reconstruction is implemented: the tree is re-built when the number of interactions evaluated exceeds N/10 (Springel et al. 2001, New Astr., 6, 51).
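A minimal sketch of the block time-step idea described above: each particle’s desired step is rounded down to the nearest power-of-two subdivision T/2^n of the main step, so that groups of particles can be advanced in synchrony; the function name and the cap n_max are illustrative assumptions.

    #include <cmath>

    // Quantize an individually desired time-step onto the block scheme:
    // the particle is advanced with dt = T / 2^n, the largest such value <= dt_desired.
    double block_timestep(double dt_desired, double T, int n_max = 30) {
        int n = 0;
        while (n < n_max && T / std::pow(2.0, n) > dt_desired) ++n;
        return T / std::pow(2.0, n);
    }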

Code performance on an IBM SP4: performance on one ‘main’ time-step (T) with complete force evaluation and time integration of the motion for a self-gravitating system with N = 10^6 particles; 2,100,000 particle time-advances were performed.

Code performance on an IBM SP4 — [Figure: CPU time (sec) for one ‘main’ time-step with complete force evaluation and time integration of the motion for a self-gravitating system with N = 10^6 particles (θ = 0.7, k = 256, up to 16 PEs per node); about 25,000 particles advanced per second.]

Code performance on an IBM SP4:
- The speedup behaviour is very good up to 16 PEs (speedup ≈ 10).
- The load imbalance is low (10% with 64 PEs).
- Data transfer and communications still penalize the overall performance at low N/PEs ratios (34% with 64 PEs).
- An MPI-2 version could fully exploit the ATD parallelization strategy.

Merging of Globular Clusters in galactic central regions — Motivation: the study of the dynamical evolution and fate of young GCs within the bulge.
- To what extent can GCs survive the strong tidal interaction with the bulge?
- Do they eventually merge?
- What features will the final merger product have?
- To what extent can the bulge accrete the mass lost by the GCs?

Merging of Globular Clusters in galactic central regions — Motivation: the study of the dynamical evolution and fate of young GCs within the bulge. 30,000 CPU-hours on an IBM SP4 were provided by the INAF-CINECA agreement for a scientific ‘key-project’ (under grant inarm033).

Merging of Globular Clusters in galactic central regions — Features of the numerical approach:
- accurate N-body (tree-code) simulations with a high number of ‘particles’ (10^6);
- dynamical friction and a mass function included;
- self-consistent triaxial bulge model (Schwarzschild).
[Table: simulations A and B, clusters a–d, listing M (10^6 M_sun), r_t (pc), c, r_c (pc), t_cr (Kyr) and σ (km/s); simulation B uses clusters of higher concentration.]

Merging of Globular Clusters in galactic central regions: quasi-radial orbits; the clusters cross each other at every passage (twice per period). [Figure: x (pc) vs. t (Myr).]

Merging of Globular Clusters in galactic central regions — Tidal tails structure and formation: [Figure: “tidal tails” around Pal 5 (after Odenkirchen et al. 2002), compared with our simulation of a cluster on a circular orbit; the tidal tails are reproduced by our simulation.]

Merging of Globular Clusters in galactic central regions — Tidal tails structure and formation: [Figure: “ripples” around a cluster in our simulations, compared with the “ripples” around NGC 3923.]

Merging of Globular Clusters in galactic central regions — Tidal tails structure and formation: [Figure: “ripples” around a cluster and around NGC 3923.] What are the “ripples”? How do they form? 3-D visualization tools can help provide answers!

Merging of Globular Clusters in galactic central regions — [Figure: density profiles of the most compact cluster (solid lines) at t = 0 and t = 17 Myr, fitted with a single-mass King model (dotted lines); dashed black line: bulge central density; the least compact cluster is shown at t = 15 Myr; tidal tails are indicated.]

Merging of Globular Clusters in galactic central regions — Fraction of mass lost, two estimators: δ_p = fraction of mass lost where ρ/ρ_c < p/100 (ρ_c = central cluster density, ρ_b = bulge stellar density); δ_E = fraction of mass lost in stars with E_i > 0.
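A minimal sketch of the energy-based estimator δ_E: the fraction of cluster mass carried by stars whose total (kinetic plus potential) energy in the cluster frame is positive; how the cluster-frame velocities and potential are obtained is an assumption left outside the sketch.

    #include <cstddef>
    #include <vector>

    // delta_E: fraction of the cluster mass carried by stars with E_i > 0,
    // where E_i = 0.5*m_i*v_i^2 + m_i*phi_i is measured in the cluster frame.
    double mass_loss_fraction_E(const std::vector<double>& mass,
                                const std::vector<double>& v2,    // |v_i|^2 in the cluster frame
                                const std::vector<double>& phi) { // cluster potential at star i
        double m_tot = 0.0, m_lost = 0.0;
        for (std::size_t i = 0; i < mass.size(); ++i) {
            m_tot += mass[i];
            double E = 0.5 * mass[i] * v2[i] + mass[i] * phi[i];
            if (E > 0.0) m_lost += mass[i];
        }
        return (m_tot > 0.0) ? m_lost / m_tot : 0.0;
    }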