
Analysing Cosmological Simulations in the Virtual Observatory: Designing and Mining the Millennium Simulation Database
Gerard Lemson
German Astrophysical Virtual Observatory (GAVO)
ARI, Heidelberg
MPE, Garching bei München

Garching, June 26,

Acknowledgments
- Alex Szalay
- Virgo consortium, in particular:
  - Volker Springel, Simon White, Gabriella De Lucia, Jeremy Blaizot (MPA, Munich, Germany)
  - Carlos Frenk, Richard Bower, John Helly (ICC, Durham, UK)
- Similar efforts/sites to the Millennium Database:
  - Durham (mirror of the Millennium DB)
  - Horizon/GalICS (Lyon)
  - ITVO (Trieste)
- GAVO is funded by the German Federal Ministry for Education and Research (BMBF)

Summary
- The VO aims to provide access to remote data for use and analysis by third parties.
- Data analysis requires:
  - advanced methods for analysis
  - data
- Data sets are often very large and often far away (which makes them effectively even larger!).
- To analyse remote datasets, one needs to be able to bring the analysis to the data.
- The "standard" approach using flat files and C/IDL/etc. code is sub-optimal.
- To analyse very large datasets we also need advanced methods of data organisation and data access.
- A structured approach supported by a relational database system allows one to concentrate on the science, instead of worrying about I/O optimisation etc.
- And the questions can become pretty complex!

Case study: The Millennium Simulation
Springel V. et al., Nature 435, 629 (2005)

Millennium Simulation
- Virgo consortium
- Gadget 3
- 10 billion particles, dark matter only
- 500 Mpc/h periodic box
- Concordance model (as of 2004) initial conditions
- 64 snapshots
- CPU hours
- O(30 TB) raw + post-processed data
- Post-processing data products are complex and large
  - a challenge to analyse, even locally!
  - a SimDAP-like approach is required for remote access.

Intermezzo: Data Access is Hitting a Wall (courtesy Alex Szalay)
FTP and GREP are not adequate:
- You can GREP/FTP 1 MB in a second
- You can GREP/FTP 1 GB in a minute
- You can GREP/FTP 1 TB in 2 days
- You can GREP/FTP 1 PB in 3 years
- SFTP is much slower
- and 1 PB is ~2,000 disks
At some point you need indices to limit the search, and parallel data search and analysis.
This is where databases can help.
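A back-of-the-envelope sketch of why sequential scanning hits a wall: scan time is simply data volume divided by throughput, so it grows linearly with the data. The 10 MB/s sustained rate below is an assumption for illustration, not a figure from the talk; at that rate 1 MB takes about 0.1 seconds, 1 GB about 1.7 minutes, 1 TB about 1.2 days and 1 PB about 3.2 years, the same orders of magnitude as the list above.

```python
# Assumed sustained sequential throughput (illustrative, not from the talk).
THROUGHPUT_BYTES_PER_S = 10e6  # ~10 MB/s

SIZES = {"1 MB": 1e6, "1 GB": 1e9, "1 TB": 1e12, "1 PB": 1e15}


def scan_seconds(n_bytes: float, rate: float = THROUGHPUT_BYTES_PER_S) -> float:
    """Time to read n_bytes once, front to back, at a fixed rate: t = N / R."""
    return n_bytes / rate


def human(seconds: float) -> str:
    """Express a duration in the largest unit that is at least 1."""
    for unit, size in (("years", 365.25 * 86400), ("days", 86400.0),
                       ("hours", 3600.0), ("minutes", 60.0)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for label, n_bytes in SIZES.items():
    print(f"{label}: {human(scan_seconds(n_bytes))}")
```

No constant rate makes a full scan of a petabyte interactive, which is why indices and parallelism, rather than faster FTP/GREP, are the way out.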

Analysis and Databases (courtesy Alex Szalay)
Much statistical analysis deals with:
- Creating uniform samples (data filtering)
- Assembling relevant subsets
- Estimating completeness
- Censoring bad data
- Counting and building histograms
- Generating Monte-Carlo subsets
- Likelihood calculations
- Hypothesis testing
Traditionally these are performed on files.
Most of these tasks are much better done inside a database.

Advantages of relational databases
- Encapsulation of data in terms of logical structure, no need to know about internals of data storage
- Standard query language for finding information
- Advanced query optimizers (indexes, clustering)
- Transparent internal parallelization
- Authenticated remote access for multiple users at same time
- Forces one to think carefully about data structure
- Speeds up path from science question to answer
- Facilitates communication (query code is cleaner)
- Facilitates adaptation to IVOA standards (ADQL)

Millennium Simulation Phenomenology
- Density field on mesh
  - CIC
  - Gaussian smoothed: 1.25, 2.5, 5, 10 Mpc/h
- Friends-of-Friends (FOF) groups
- SUBFIND subhalos
- Galaxies from 2 semi-analytical models (SAMs)
  - MPA (L-Galaxies; De Lucia & Blaizot, 2006; Bertone et al., 2007)
  - Durham (GalForm; Bower et al., 2006)
- Subhalo and galaxy formation histories: merger trees
- Mock catalogues on the light-cone
  - Pencil beams (Kitzbichler & White, 2006)
  - All-sky (depth of SDSS spectral sample) (Blaizot et al., 2005)
- In preparation: spectra for light-cone galaxies
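The friends-of-friends (FOF) groups in the list above link any two particles closer than a chosen linking length, and take groups as the connected components of that "friendship" graph. A minimal sketch in plain Python, assuming a periodic cubic box and a brute-force O(N²) pair search (production group finders use trees or grids); `fof_groups` is a hypothetical helper, and the linking length in the usage example is arbitrary.

```python
from itertools import combinations


def fof_groups(positions, box, linking_length):
    """Friends-of-friends grouping in a periodic cubic box of side `box`.

    Two particles are 'friends' if their minimum-image separation is below
    `linking_length`; groups are the connected components of the friendship
    graph, found with union-find. Brute-force O(N^2) pairs: small N only.
    """
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    def periodic_dist2(a, b):
        d2 = 0.0
        for ai, bi in zip(a, b):
            d = abs(ai - bi) % box
            d = min(d, box - d)  # shortest separation across the periodic boundary
            d2 += d * d
        return d2

    b2 = linking_length ** 2
    for i, j in combinations(range(len(positions)), 2):
        if periodic_dist2(positions[i], positions[j]) < b2:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two groups

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Usage: particle 1 sits across the periodic boundary from particle 0.
pts = [(0.2, 5.0, 5.0), (9.9, 5.0, 5.0), (1.0, 5.0, 5.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
groups = fof_groups(pts, box=10.0, linking_length=1.0)
```

Here particles 0, 1 and 2 form one group and 3 and 4 another. Particle 2 is more than one linking length from particle 1 but joins its group through particle 0; this transitivity is what defines an FOF group.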

FOF groups, (sub)halos and galaxies

Time evolution: merger trees

Mock catalogues

Synthetic spectra (not yet available)

Hierarchy of Data Products
[Diagram] Objects and their relationships:
- Density Field Mesh Cell
- FOF Group: located in a mesh cell
- Subhalo: parent FOF group (SUBFIND result); merger tree
- SAM Galaxy: parent halo; merger tree
- Light Cone Galaxy: original (SAM galaxy)
- Spectrum (of a light-cone galaxy)

Designing the Database
- Need a model for the data, including relations between different objects.
- The model needs to support science: "20 questions" (following Gray & Szalay):
  1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
  2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
  3. Return the complete halo merger tree for a halo identified at z=0.
  4. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8 and B/T > 0.5).
  5. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
  6. Find all the z=2 galaxies that were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
  7. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
  8. Find the dependence of halo properties on environment.

Data model features
- Each object type has its own table
  - properties are columns
  - each object has a unique identifier
- Relations are implemented through foreign keys (pointers to the unique identifier column):
  - FOF group to the mesh cell it lies in
  - subhalo to its FOF group
  - galaxy to its subhalo, etc.
- Special design needed for:
  - hierarchical relations: merger trees
  - spatial relations: multi-dimensional indexes required
  - support for random sample selection
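A minimal sketch of these features in SQLite, used here only because it ships with Python (the Millennium Database itself runs on SQL Server). The three tables, their columns and the toy rows are hypothetical simplifications, not the real Millennium schema; "halo mass" for question 1 is taken to be an assumed `mass` column on the subhalo table, in solar masses.

```python
import sqlite3

# Hypothetical, heavily simplified schema (not the real Millennium tables):
# one table per object type, properties as columns, relations as foreign keys.
SCHEMA = """
CREATE TABLE fof (
    fofId INTEGER PRIMARY KEY,
    np    INTEGER NOT NULL                                  -- number of particles
);
CREATE TABLE subhalo (
    subhaloId INTEGER PRIMARY KEY,
    fofId     INTEGER NOT NULL REFERENCES fof(fofId),       -- parent FOF group
    mass      REAL NOT NULL                                 -- solar masses (assumed unit)
);
CREATE TABLE galaxy (
    galaxyId    INTEGER PRIMARY KEY,
    subhaloId   INTEGER NOT NULL REFERENCES subhalo(subhaloId),  -- host subhalo
    stellarMass REAL NOT NULL
);
"""

# Question 1: galaxies residing in halos of mass between 1e13 and 1e14 solar masses.
QUERY_1 = """
SELECT g.galaxyId, g.stellarMass, s.mass
  FROM galaxy g JOIN subhalo s ON g.subhaloId = s.subhaloId
 WHERE s.mass BETWEEN 1e13 AND 1e14
 ORDER BY g.galaxyId
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO fof VALUES (?, ?)", [(1, 5000), (2, 40)])
conn.executemany("INSERT INTO subhalo VALUES (?, ?, ?)",
                 [(10, 1, 5e13), (11, 1, 2e12), (20, 2, 3e11)])
conn.executemany("INSERT INTO galaxy VALUES (?, ?, ?)",
                 [(100, 10, 1e11), (101, 10, 2e10), (102, 11, 5e9), (200, 20, 1e9)])

rows = conn.execute(QUERY_1).fetchall()
```

Because the foreign keys are enforced, a galaxy cannot point at a subhalo that does not exist, and question 1 reduces to a single join.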

Formation histories: subhalo and galaxy merger trees
- Tree structure:
  - halos have a single descendant
  - halos have a main progenitor
- Hierarchical structures are usually handled using recursive code:
  - inefficient for data access
  - not (well) supported in RDBs
- Tree indexes:
  - depth-first ordering of nodes defines the identifier
  - pointer to the last progenitor in the subtree
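The tree index can be built in one depth-first pass. The sketch below is one way to do it, assuming each node lists its progenitors with the main progenitor first; `Node` and `assign_tree_ids` are hypothetical names, not part of the Millennium code. Each node's id is its position in the traversal, and its last-progenitor id is the largest id assigned inside its subtree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One (sub)halo or galaxy; `progenitors` point back in time.
    Assumed convention: progenitors[0] is the main progenitor."""
    name: str
    progenitors: List["Node"] = field(default_factory=list)
    node_id: int = -1
    last_progenitor_id: int = -1
    descendant_id: int = -1  # stays -1 for the root (no descendant)


def assign_tree_ids(root: Node, start: int = 0) -> int:
    """Number the tree depth-first (main branch first) and store, for each
    node, the largest id in its subtree. Returns the next unused id.
    Uses an explicit stack so very deep trees do not hit recursion limits."""
    next_id = start
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # Every progenitor has been numbered: the subtree is [node_id, next_id - 1].
            node.last_progenitor_id = next_id - 1
            continue
        node.node_id = next_id
        next_id += 1
        stack.append((node, True))  # revisit after the whole subtree
        for prog in reversed(node.progenitors):
            prog.descendant_id = node.node_id
            stack.append((prog, False))  # reversed, so progenitors[0] is visited first
    return next_id


# Usage: A at z=0; B is its main progenitor (with progenitor D); C merged into A.
d = Node("D")
b = Node("B", [d])
c = Node("C")
a = Node("A", [b, c])
assign_tree_ids(a)
```

Here A's progenitors occupy ids 0 to 3 and B's occupy 1 to 2: every subtree is a contiguous id range, which is what lets a `between galaxyId and lastProgenitorId` range query replace recursive code.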


Merger trees:
  select prog.*
    from galaxies des, galaxies prog
   where des.galaxyId = 0
     and prog.galaxyId between des.galaxyId and des.lastProgenitorId
Branching points:
  select descendantId
    from galaxies des
   where descendantId != -1
   group by descendantId
  having count(*) > 1
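Both queries can be tried on a toy tree with SQLite (substituted here for SQL Server). The table and column names follow the slide; the four rows are hypothetical and are numbered depth-first as described above, with -1 marking a galaxy that has no descendant.

```python
import sqlite3

# Toy merger tree (hypothetical data), ids already in depth-first order:
#   galaxy 0 at z=0; main branch 0 <- 1 <- 2; galaxy 3 merged into 0.
ROWS = [  # (galaxyId, lastProgenitorId, descendantId, snapnum)
    (0, 3, -1, 63),
    (1, 2, 0, 62),
    (2, 2, 1, 61),
    (3, 3, 0, 62),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE galaxies (
    galaxyId         INTEGER PRIMARY KEY,
    lastProgenitorId INTEGER NOT NULL,
    descendantId     INTEGER NOT NULL,
    snapnum          INTEGER NOT NULL)""")
conn.executemany("INSERT INTO galaxies VALUES (?, ?, ?, ?)", ROWS)


def progenitor_ids(conn, galaxy_id):
    """The full progenitor tree of one galaxy: a single range scan, no recursion."""
    rows = conn.execute("""
        SELECT prog.galaxyId
          FROM galaxies des, galaxies prog
         WHERE des.galaxyId = ?
           AND prog.galaxyId BETWEEN des.galaxyId AND des.lastProgenitorId
         ORDER BY prog.galaxyId""", (galaxy_id,)).fetchall()
    return [r[0] for r in rows]


# Branching points: descendants that have more than one progenitor.
branching = [r[0] for r in conn.execute("""
    SELECT descendantId
      FROM galaxies
     WHERE descendantId != -1
     GROUP BY descendantId
    HAVING COUNT(*) > 1""")]
```

With an index on `galaxyId` (here the primary key), the tree query is a single range lookup regardless of how deep the tree is.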

Spatial queries, random samples
- Spatial queries require multi-dimensional indexes.
  - An index on (x,y,z) does not work: discretisation is needed, e.g. an index on (ix,iy,iz) with ix=floor(x/10) etc.
  - More sophisticated: space-filling curves
    - bit-interleaving/oct-tree/Z-index
    - Peano-Hilbert curve
  - Custom functions are needed for range queries (implemented in T-SQL).
- Random sampling uses a RANDOM column
  - RANDOM from [0, ]
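A minimal sketch of the bit-interleaving (Z-index, Morton) variant, assuming integer cell coordinates from the `floor(x/10)` discretisation and an arbitrary 10-bit resolution per axis. The Peano-Hilbert curve preserves locality better but needs more code and is not shown; `cell_index`, `spread_bits` and `z_index` are hypothetical helpers, not the custom T-SQL functions used in the database.

```python
def cell_index(x: float, cell_size: float = 10.0) -> int:
    """Coarse discretisation as on the slide: ix = floor(x / cell_size)."""
    return int(x // cell_size)


def spread_bits(n: int, bits: int = 10) -> int:
    """Move bit b of n to bit 3*b, leaving two zero bits between neighbours."""
    out = 0
    for b in range(bits):
        out |= ((n >> b) & 1) << (3 * b)
    return out


def z_index(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Morton (Z-order) key of a cell: bits interleaved as ... z1 y1 x1 z0 y0 x0."""
    return spread_bits(ix, bits) | (spread_bits(iy, bits) << 1) | (spread_bits(iz, bits) << 2)


# A point at (25.3, 5.0, 12.0) falls in cell (2, 0, 1).
key = z_index(cell_index(25.3), cell_index(5.0), cell_index(12.0))

# The 8 fine cells inside one coarser (octree) cell get consecutive keys.
children = sorted(z_index(ix, iy, iz) for ix in (2, 3) for iy in (0, 1) for iz in (0, 1))
```

Because the 8 sub-cells of an octree cell receive consecutive keys, a cubic range query can be translated into a small set of key ranges on an ordinary one-dimensional index.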

The Millennium Database web site
- SQL Server 2005 database
- Web application (Java in Apache Tomcat web server)
  - portal:
  - public DB access:
  - private access:
  - MyDB
- Access methods:
  - browser with plotting capabilities through VOPlot applet
  - wget + IDL, R
  - TOPCAT (3.1)


Usage statistics
- Up since August 2006 (astro-ph/ )
- ~225 registered users
- > 5 million queries
- > 40 billion rows
- ~130 papers, ~50% not related to Virgo consortium (see )

Some science questions and their implementation as SQL
If time permits; in any case a one-to-one demo is possible.

Find light-cone galaxies in a slice in redshift, RA and Dec
  select ra, dec, redshift_obs
    from kitzbichler2006a_obs
   where redshift_obs between 1 and 1.1
     and dec between -.05 and .0

Color-magnitude for a random sample of galaxies
  select mag_bdust, mag_bdust - mag_vdust as color, type
    from delucia2006a
   where snapnum = 63
     and random between 0 and 100
     and mag_b < 0


Get merger tree for an identified galaxy
  select p.snapnum, p.x, p.y, p.z, p.stellarmass, p.mag_b - p.mag_v as color
    from delucia2006a d, delucia2006a p
   where d.galaxyid = 0
     and p.galaxyid between d.galaxyid and d.lastprogenitorid


Histogram of density field at redshifts 0, 1, 2, 3; Gaussian smoothing 5 Mpc/h
  select snapnum, .01*floor(f.g5/.01) as g5, count(*) as num
    from mfield f
   where f.snapnum in (63, 41, 32, 27)
   group by snapnum, .01*floor(f.g5/.01)
   order by 1, 2


FOF multiplicity function at redshifts 0, 1, 2, 3
  select snapnum, .1*floor(log10(np)/.1) as lognp, count(*) as num
    from fof
   where snapnum in (63, 41, 32, 27)
   group by snapnum, .1*floor(log10(np)/.1)
   order by 1, 2
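The same binning can be tried locally with SQLite (the Millennium DB itself runs on SQL Server). This is a sketch with hypothetical toy data; the table and column names follow the slide, and `log10`/`floor` are registered from Python's `math` module because not every SQLite build provides them. The expression `.1*floor(log10(np)/.1)` maps each group to the lower edge of its 0.1-dex bin in particle number.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite builds do not always ship log10/floor, so register Python versions.
conn.create_function("log10", 1, math.log10)
conn.create_function("floor", 1, math.floor)

conn.execute("CREATE TABLE fof (snapnum INTEGER, np INTEGER)")
conn.executemany("INSERT INTO fof VALUES (?, ?)", [  # hypothetical toy groups
    (63, 20), (63, 25), (63, 160), (63, 170), (63, 2000),
    (41, 22), (41, 300),
])

# Query from the slide: counts per 0.1-dex bin of particle number, per snapshot.
rows = conn.execute("""
    SELECT snapnum, .1*floor(log10(np)/.1) AS lognp, COUNT(*) AS num
      FROM fof
     WHERE snapnum IN (63, 41, 32, 27)
     GROUP BY snapnum, .1*floor(log10(np)/.1)
     ORDER BY 1, 2""").fetchall()

for snapnum, lognp, num in rows:
    print(f"snap {snapnum}: log10(np) in [{lognp:.1f}, {lognp + 0.1:.1f}) -> {num}")
```

Since the bin edges are computed in floating point (0.1*13 is not exactly 1.3), comparisons on `lognp` outside the database are best done after rounding.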


FOF mass multiplicity function, conditioned on density in environment
  select .1*floor(log10(fof.np)/.1) as lognp, count(*) as num
    from mfield f, fof
   where fof.snapnum = f.snapnum
     and fof.phkey = f.phkey
     and f.snapnum = 63
     and f.g5 between 1 and 1.1
   group by .1*floor(log10(fof.np)/.1)
   order by 1


Thank you!