
1 OWLS: OverWhelmingly Large Simulations Joop Schaye, Craig Booth, Claudio Dalla Vecchia, Alan Duffy, Marcel Haas, Volker Springel, Tom Theuns, Luca Tornatore, Rob Wiersma, … The formation of galaxies and the evolution of the intergalactic medium

2 Outline 1. Introducing cosmological simulations and OWLS 2. Preparing for and coping with lots of data 3. Data format and units 4. Microsimulations 5. Virtual observations of simulations 6. Discussion points

3 Example science questions What determines the SF history of the universe? Where are the baryons and how can they be detected? Where are the metals and how can they be detected? How does galaxy formation depend on environment? What are the large-scale effects of galactic winds driven by stars and AGN? Virtual observations

4 Cosmological hydro simulations Evolution from z >~ 100 to z ~< 10 of a representative part of the universe Boundary conditions: periodic Expansion solved analytically and scaled out Initial conditions from the CMB Components: cold dark matter, gas, stars, radiation (optically thin) Scales ~< kpc to ~ 100 Mpc Sub-grid modules are a crucial part of the game Discretization: time and mass (SPH)

6 OWLS subgrid physics improvements New star formation New wind model Added chemodynamics New cooling

7 OWLS Numerical Resolution 25 Mpc/h (run to z=2): –m_bar = 1x10^6 Msun/h –Softening = 2 kpc/h comoving < 0.5 kpc/h proper 100 Mpc/h (run to z=0): –m_bar = 7x10^7 Msun/h –Softening = 8 kpc/h comoving < 2.0 kpc/h proper
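One way to read the softening entries above is that the softening length is fixed in comoving units at high redshift and capped at a maximum proper length at low redshift. The following is a minimal sketch under that assumption; the function name is hypothetical and is not taken from the OWLS code.

```python
def proper_softening(z, eps_comoving, eps_max_proper):
    """Proper softening length at redshift z (same units as the inputs).

    Assumption: the softening is constant in comoving units until its
    proper size, eps_comoving / (1 + z), reaches the cap eps_max_proper.
    """
    return min(eps_comoving / (1.0 + z), eps_max_proper)


# 25 Mpc/h box: 2 kpc/h comoving, at most 0.5 kpc/h proper
for z in (7.0, 3.0, 2.0):
    print(z, proper_softening(z, 2.0, 0.5))
```

Under this reading the cap takes over at z = 3 for both boxes, since 2/(1+3) = 0.5 kpc/h and 8/(1+3) = 2 kpc/h.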

10 Gas density zoom

11 Zoom CDV, OWLS project


13 OWLS data volume 7 - 42 floats per particle → up to 168 bytes/particle 268 M particles per snapshot → 26 GB/snapshot Up to 35 snapshots per simulation → 900 GB/simulation ~60 large simulations → 54 TB of snapshot data
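The arithmetic behind these numbers can be retraced in a few lines. This is a sketch, not the authors' accounting: it assumes 4-byte floats, that the 268 M particles are 2 x 512^3, and that half of them carry the minimum and half the maximum number of floats (an assumption that happens to reproduce the quoted 26 GB/snapshot).

```python
BYTES_PER_FLOAT = 4  # single precision

# Assumption: 512^3 dark matter + 512^3 baryonic particles (~268 M)
n_particles = 2 * 512**3
min_bytes = 7 * BYTES_PER_FLOAT   # 28 bytes/particle
max_bytes = 42 * BYTES_PER_FLOAT  # 168 bytes/particle

# Assumption: half the particles store 7 floats, half store 42
mean_bytes = (min_bytes + max_bytes) / 2
snapshot_gb = n_particles * mean_bytes / 1e9

simulation_gb = 35 * 26     # slide values: 35 snapshots of 26 GB each
total_tb = 60 * 900 / 1000  # slide values: 60 simulations of ~900 GB each

print(max_bytes, round(snapshot_gb, 1), simulation_gb, total_tb)
```

With these assumptions a snapshot comes to about 26.3 GB, a simulation to 910 GB (rounded to 900 GB on the slide), and the full set to 54 TB.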

14 How do we analyze 50+ TB of data? First analyze lower resolution versions Use hdf5 → only read what is actually needed Use fast visualization software (e.g. avoid SPH interpolation) Produce as much as possible on the fly: –Logs (e.g. SF histories) –Grids (e.g. baryon distribution) –Sight lines (e.g. QSO absorption spectra) –Images (e.g. for videos) –Zooms saved on the fly (e.g. most massive object) –Diagnostic particle arrays (e.g. max temperature) –Group catalogues

15 Analysis that can be done on a notebook: Low-resolution versions Evolution of globally averaged (or gridded) quantities Halo integrated properties Halo profiles Absorption spectra Most things related to zooms


17 Hdf5 – Hierarchical Data Format Binary → fast I/O and compressed Machine-independent Intuitive hierarchical “directory-like” structure Individual elements can be read, modified and saved Easy to add meta-data Utilities available to view the data in ascii or graphical form (incl. tables, images) Very easy to read/write from IDL, C, F90 Free

18 Lay-out of an hdf5 file

19 hdf5 groups

20 hdf5 data sets

21 Data format: Units Data should use code units to allow easy debugging, restarting But data should be in physically sensible units to allow analysis by external users Cosmological data should clarify h and aexp dependence for external users Solution: use code units but include conversion factors to cgs and aexp and h dependencies as meta-data.
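The proposed solution can be illustrated with a small conversion routine. This is a minimal sketch: a plain dictionary stands in for the data set attributes, and the key names, the value h = 0.73, and the function name are illustrative assumptions rather than the actual OWLS conventions.

```python
def to_physical_cgs(value, attrs, a, h):
    """Convert a code-unit value to proper cgs units using metadata.

    attrs is a dict with hypothetical keys:
      "cgs_factor": size of the code unit in cgs
      "a_exponent": power of the expansion factor a carried by the quantity
      "h_exponent": power of the Hubble parameter h carried by the quantity
    """
    return (value * attrs["cgs_factor"]
            * a ** attrs["a_exponent"]
            * h ** attrs["h_exponent"])


MPC_IN_CM = 3.0857e24

# A comoving length stored in Mpc/h: proper length = value * a / h
length_attrs = {"cgs_factor": MPC_IN_CM, "a_exponent": 1.0, "h_exponent": -1.0}

# 25 Mpc/h box at z = 2 (a = 1/3), with h = 0.73 assumed for illustration
box_cm = to_physical_cgs(25.0, length_attrs, a=1.0 / 3.0, h=0.73)
print(box_cm / MPC_IN_CM)  # proper box size in Mpc
```

Because the exponents travel with each data set, an external user never has to know the code's internal unit system to obtain physical values.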

22 Attributes to data sets


25 Virtual Observations: Uses Guide the design of observational campaigns and instruments Test data analysis pipelines Determine selection effects Test theoretical models PR/education

26 Virtual Observations: Types Galaxy magnitudes –Population synthesis, which depends on the assumed IMF and on the age and composition of each star particle –Dust column and assumed reddening law Stellar light images Absorption spectra Gas emission maps (2-D: images, 3-D: IFU datacubes) Other maps: e.g. lensing, SZ, dust absorption/emission

27 Creating Virtual Observations Compute emission and/or absorption of each resolution element Project and grid data, optionally combine different time slices (e.g. lightcones) Observe with chosen instrument (easiest step)
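The three steps above can be sketched as follows. This is an illustration only: the emission of each element is taken as given, projection uses nearest-grid-point assignment instead of SPH kernel interpolation, and the "instrument" is reduced to a coarser pixel grid. All function names and the sample particles are hypothetical.

```python
def project_to_image(particles, box_size, n_pix):
    """Deposit particle luminosities onto an n_pix x n_pix image along z.

    particles: iterable of (x, y, z, luminosity) in a periodic box.
    Uses nearest-grid-point assignment, not SPH kernel interpolation.
    """
    image = [[0.0] * n_pix for _ in range(n_pix)]
    pix = box_size / n_pix
    for x, y, _z, lum in particles:
        # Wrap positions periodically, then find the pixel indices
        i = int((x % box_size) / pix) % n_pix
        j = int((y % box_size) / pix) % n_pix
        image[j][i] += lum
    return image


def rebin(image, factor):
    """Mimic a coarser detector by summing factor x factor pixel blocks."""
    n = len(image) // factor
    return [[sum(image[j * factor + dj][i * factor + di]
                 for dj in range(factor) for di in range(factor))
             for i in range(n)] for j in range(n)]


# Step 1: emission per resolution element (given here as the last entry)
particles = [(1.0, 1.0, 5.0, 2.0), (1.2, 1.1, 9.0, 3.0), (24.9, 0.1, 2.0, 1.0)]
# Step 2: project and grid
ideal = project_to_image(particles, box_size=25.0, n_pix=8)
# Step 3: observe with a (trivial) instrument
observed = rebin(ideal, 2)
```

Both steps conserve the total luminosity, which is a convenient sanity check for any projection code.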

28 Virtual Observations – Star light Population synthesis –Save initial mass, age, and composition of individual star particles –Assume IMF Dust column –Save mass and composition of individual gas particles –Assume reddening law –Require fast calculation of SPH column densities

29 Virtual Observations – Gas absorption Require gas mass, position, velocity, density, temperature, composition Assume ionization equilibrium and use ionization balance tables (e.g. CLOUDY) –Collisional ionization → temperature –Photo-ionization → density, temperature, radiation field
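To show how these quantities enter an absorption spectrum, here is a minimal sketch for H I Lyman-alpha. It assumes the neutral hydrogen column of each absorber is already known (in practice it would come from the ionization balance tables, which are not shown), uses a purely thermal Gaussian line profile, and ignores damping wings and peculiar-velocity gradients. Function names are hypothetical.

```python
import math

K_B = 1.380649e-16    # Boltzmann constant [erg/K]
M_H = 1.6735e-24      # hydrogen atom mass [g]
F_LYA = 0.4164        # Lyman-alpha oscillator strength
LAMBDA_LYA = 1215.67  # Lyman-alpha rest wavelength [Angstrom]


def thermal_b(temperature):
    """Thermal Doppler parameter of hydrogen in km/s."""
    return math.sqrt(2.0 * K_B * temperature / M_H) / 1.0e5


def lya_optical_depth(v, v0, column_density, b):
    """Optical depth at velocity v [km/s] of one absorber centred at v0.

    column_density is the H I column in cm^-2, b the Doppler parameter
    in km/s. Gaussian profile only; damping wings are ignored.
    """
    tau0 = 1.497e-15 * column_density * F_LYA * LAMBDA_LYA / b
    return tau0 * math.exp(-((v - v0) / b) ** 2)


def spectrum(velocities, absorbers):
    """Normalized flux exp(-tau) for a list of (v0, N_HI, T) absorbers."""
    flux = []
    for v in velocities:
        tau = sum(lya_optical_depth(v, v0, n_hi, thermal_b(temp))
                  for v0, n_hi, temp in absorbers)
        flux.append(math.exp(-tau))
    return flux


velocities = [float(v) for v in range(-100, 101, 5)]
flux = spectrum(velocities, [(0.0, 1.0e13, 1.0e4)])
```

Temperature sets the line width through the Doppler parameter, while density, temperature and the radiation field set the neutral column through the ionization balance.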

30 Virtual Observations – Gas emission Require gas mass, position, density, temperature, composition, (velocity) Assume ionization equilibrium and use emissivity tables (e.g. CLOUDY) Dust columns

31 On-demand Virtual Observations of cosmological simulations Computed from physical properties (e.g. gas density, temperature): –Example: narrow band gas emission for various assumed radiation fields, chemical compositions, or density/temperature cuts –Allows one to vary both physical assumptions and instrumental characteristics –Computationally expensive Computed from pre-calculated “ideal” observables: –Example: narrow band gas emission for a fixed radiation field and chemical composition –Cannot vary physical assumptions –Relatively cheap

32 VOs from physical properties Challenge: Analysis of 3-D simulations is typically memory-intensive (512^3 → 1 single precision array is 0.5 GB) Solution: Prepare data with reduced dimensionality, calculate on demand –1-D: Absorption spectra, dust columns –2-D: Images from integrated quantities Problem: Emission & (gas) absorption depend on local 3-D rather than projected 2-D properties
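The memory argument can be checked directly. A minimal sketch, assuming 4-byte single-precision values and counting in binary gigabytes (GiB); the function name is hypothetical.

```python
def grid_memory_gib(n_cells_per_side, n_dims, bytes_per_value=4):
    """Memory of one regular grid in GiB (default: single precision)."""
    return n_cells_per_side ** n_dims * bytes_per_value / 2 ** 30


print(grid_memory_gib(512, 3))  # one 3-D array
print(grid_memory_gib(512, 2))  # one 2-D image
print(grid_memory_gib(512, 1))  # one 1-D sight line
```

Each dimension removed cuts the memory by a factor of 512, which is why 1-D and 2-D products are practical for on-demand use.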

33 On-demand VOs Feasible now or in the near future: –(VOs based on) object catalogues –Absorption spectra –SZ effect –Images of stellar light (though maybe not full lightcones) –Other VOs from pre-calculated observables

34 Discussion points: 1. Are raw cosmological simulation data sets too large to put in the VO? 2. Is there a place for non-VO, processed data products in the VO? (e.g. physical properties) 3. Must VO software for non-simulators be like a black box? 4. Is it desirable to create black boxes for non-simulators? 5. How can we prevent resources from being wasted through incorrect use of VO software?

