Cosmological simulations in a relational database: modelling and storing merger trees Gerard Lemson, GAVO, Max-Planck-Institut für extraterrestrische Physik,


Gerard Lemson, GAVO, Max-Planck-Institut für extraterrestrische Physik, Garching, Germany
Volker Springel, Max-Planck-Institut für Astrophysik, Garching, Germany

Abstract: We present a method for storing tree-like data structures in a relational database that allows fast querying of the children and parents of any node, from and down to any level. We have used this method to store halo merger trees derived from a large cosmological N-body simulation, together with the merger trees of model galaxy catalogues derived from the halo catalogues using semi-analytical methods. We give SQL queries corresponding to typical science questions that can be asked of such a database and present an online query interface available through the web portal of the German Astrophysical Virtual Observatory.

References:
[1] Springel V., White S.D.M., et al., 2005, Nature, 435, 629
[2]
[3] Croton D.J., Springel V., et al., 2005, MNRAS, submitted (astro-ph/ )
[4] de Lucia G., Kauffmann G. & White S.D.M., 2004, MNRAS, 349, 1101
[5] Gray J., Szalay A., et al.,
[6] Gao L., Springel V., White S.D.M., 2005, MNRAS, in press (astro-ph/ )
[7] Lemson G., Kauffmann G., 1999, MNRAS, 302, 111

Background and goals: This work was done in the context of the German Astrophysical Virtual Observatory (GAVO). GAVO pays special attention to the introduction of theory data (simulations) into the Virtual Observatory (VO). To test our ideas we have created various prototype implementations. Our main goal for the project presented here was to investigate the use of relational database technology both in the analysis of results of large scale structure simulations and in their online publication. The former may bring direct scientific benefits to the owners of the data; the latter benefits the larger community, which gets access to the data in a well-defined and standardized manner.
Database Implementation: The design of the information system started with the construction of an analysis model, shown in Fig. 3a. It contains the important concepts of the domain under investigation and their relationships (see [8] and references therein). Important for this project are the simulator (the code), the simulation (a run of the simulator with particular input parameters) and its snapshots. The actual data stored in the database are the results of post-processing: cluster extraction and galaxy formation. In our model all of these specialize a common pattern that in [8] is identified with the basic concepts protocol, experiment and result. These concepts are especially important for describing the provenance of the data.

The physical database model is restricted to the data part of the conceptual model. It is more constrained in that it must fit the data into a relational model, must enable translation of the science questions into (relatively easy) SQL, and must do so efficiently. The science questions deal with relations between different types of objects, between an object and its environment and, especially, with the formation history of objects. The history is embodied in the merger trees of both halos and galaxies.

One can store trees using a single link from progenitor to descendant, but this requires recursion to retrieve a complete progenitor tree. Recursion is not a standard feature of all relational databases, and a more efficient solution is desirable even where it is supported. Fig. 2 illustrates our solution. Each object gets an identifier corresponding to its position in a depth-first ordering of the trees rooted in objects at the final snapshot. Each object furthermore gets a pointer (foreign key) to the last progenitor in the ordering of the sub-tree rooted in that object. The complete progenitor tree rooted in a given object (at any snapshot!) is now precisely the set of objects whose identifier lies between the root object's id and the id of its last progenitor. In SQL the relevant query is as follows:

    select prog.*
    from halo des, halo prog
    where des.haloId = example value
      and prog.haloId between des.haloId and des.lastProgId

This is the query corresponding to science question 1. In the database the tables are clustered (ordered) according to the id columns, which ensures that merger trees are stored sequentially on disk, speeding up retrieval.

One other feature of the data model is the spatial indexing based on the Peano-Hilbert space-filling curve. The Millennium simulation's files are organized around this index (see [9]), which is a higher-dimensional equivalent of the recursive HTM [10] or HEALPix [11] indexes on the sky. In the database it will likewise allow efficient spatial searches, though for now it is used only to link the objects and the density field.

Simulation and science questions: The simulation used in this prototype is a relatively small, dark-matter-only cosmological N-body simulation that was created in preparation for the Millennium simulation [1,2]. For this project we were interested in post-processing products of this simulation: density fields, halo catalogues including halo merger trees, and mock galaxy catalogues. The latter were produced using semi-analytical galaxy formation (SAGF) routines that take the merger trees as input (see [3,4] for descriptions of the SAGF algorithms). The database was designed to answer a number of science questions, similar to the approach in [5]. We polled astrophysicists associated with the simulation project; the following list is a subset of the resulting questions:

1. Return the complete halo merger tree for a halo identified at z=0.
2. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
3. Return the B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
4. Return the formation time of halos, defined as the highest redshift (earliest time) at which a halo still has a progenitor with at least half its mass, as a function of the matter density in its environment, defined by the matter density smoothed on a scale of 10 Mpc (inspired by [6,7]).

Webportal and example queries: The database is accessible online through a special-purpose web application on the GAVO portal, which follows design ideas from the SkyServer [12] and GalICS [13] web applications. The user can type in free-form SQL queries and retrieve the result in a variety of formats: HTML, CSV, VOTable (Fig. 4b). A particular feature is the ability to visualise the results directly via a VOPlot [14] applet (see Fig. 4c). A number of example queries are available. In Fig. 5 we show the queries corresponding to the other three science questions above.

DEMO: This GAVO web application is being demonstrated at this conference.

Fig. 1: Slice through the density field of the Millennium simulation at redshift z=0. The slice is 15 Mpc/h thick.

Fig. 2: Illustration of the merger tree structure of objects (halos/galaxies) in the simulation. The black lines indicate the traditional descendant pointers. The red lines indicate the pointer structure used in the database model.

References (continued):
[8] Lemson G., Dowler P., Banday A.J., 2003, in ASP Conf. Ser., Vol. 314, Astronomical Data Analysis Software and Systems XIII, eds. F. Ochsenbein, M. Allen & D. Egret (San Francisco: ASP), 472
[9] Springel V., 2005, MNRAS, submitted (astro-ph/ )
[10]
[11]
[12]
[13]
[14]

Fig. 3: Formal data models used in the design of the Millennium database. (a) shows an analysis model (UML), detailing the important domain concepts and their interrelationships. (b) shows a schematic relational model (ER) for the tables in the database and their foreign key relations.
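The depth-first identifier scheme described under Database Implementation can be sketched as follows. This is an illustration only, not the code used to build the database: the six-halo tree, the halo names and the helper functions are hypothetical, and SQLite (via Python's standard library) stands in for the Postgres server so that the example runs on its own. Only the column names haloId and lastProgId and the shape of the range query are taken from the text.

```python
import sqlite3

# Hypothetical toy merger tree: halo name -> name of its descendant.
# "A" is the root at the final snapshot, so it has no descendant.
descendant = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}


def label_tree(descendant):
    """Return {name: (haloId, lastProgId)} from a depth-first traversal."""
    progenitors, roots = {}, []
    for halo, desc in descendant.items():
        if desc is None:
            roots.append(halo)
        else:
            progenitors.setdefault(desc, []).append(halo)

    labels = {}
    next_id = 0

    def visit(halo):
        # Recursive for clarity; a deep real tree would need an explicit stack.
        nonlocal next_id
        halo_id = next_id
        next_id += 1
        for prog in progenitors.get(halo, []):
            visit(prog)
        # After the whole sub-tree is numbered, the last id handed out
        # belongs to the last progenitor of this halo (or to the halo itself).
        labels[halo] = (halo_id, next_id - 1)

    for root in roots:
        visit(root)
    return labels


labels = label_tree(descendant)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table halo (haloId integer primary key, lastProgId integer, name text)"
)
conn.executemany(
    "insert into halo values (?, ?, ?)",
    [(hid, last, name) for name, (hid, last) in labels.items()],
)


def progenitor_tree(conn, halo_id):
    """Names of all halos in the progenitor tree rooted at halo_id."""
    cur = conn.execute(
        "select prog.name from halo des, halo prog "
        "where des.haloId = ? "
        "and prog.haloId between des.haloId and des.lastProgId "
        "order by prog.haloId",
        (halo_id,),
    )
    return [row[0] for row in cur]


print(labels)
print(progenitor_tree(conn, 1))  # sub-tree rooted in halo "B"
```

Because every sub-tree occupies a contiguous range of identifiers, the same query works for a root at any snapshot, and no recursion is needed at query time.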
Fig. 4: Snapshots of the GAVO portal web pages providing access to the simulation database. (a) shows the query page, with demo queries and links to the schema and documentation. (b) shows the result of the query in (a) in VOTable format. (c) shows the same result plotted with VOPlot. The query implements science question 1, and the plot shows the evolution of the merger tree below a given halo at redshift 0 by plotting the X-position against the snapshot number. This illustrates the orbits of the halos and their merging behaviour.

2.  select x, y, z, velX, velY, velZ
    from MMGalaxy
    where mag_b between -23 and -18
      and bulgeMass >= .1*stellarMass

3.  select .2*round(5*g.mag_b) as magB, count(*) as num
    from MMGalaxy g, MMHalo h
    where g.haloId = h.haloId
      and h.mTopHat between 1000 and and h.redshift=0
    group by magB

4.  select zForm, avg(g10) as g10
    from MMField f,
         ( select des.haloId, des.phkey, max(prog.redshift) as zForm
           from MMHalo prog, MMHalo des
           where des.redshift = 0
             and prog.haloId between des.haloId and des.lastProgenitorId
             and prog.np >= des.np/2
             and des.np between 100 and 200
           group by des.haloId, des.phkey ) t
    where t.phkey = f.phkey
      and f.snapnum = 63
    group by zForm

Fig. 5: SQL implementations of science questions 2-4. The database dialect is Postgres.
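The expression .2*round(5*mag_b) in query 3 bins magnitudes into intervals 0.2 mag wide: multiplying by 5 maps each bin onto one integer, round picks that integer, and multiplying by 0.2 converts it back to the bin centre. The sketch below reproduces this on a handful of made-up magnitudes. The table name galaxy and the values are assumptions for illustration, and SQLite is used in place of Postgres so that the example is self-contained.

```python
import sqlite3

# Made-up B-band magnitudes, for illustration only.
mags = [-20.07, -20.01, -19.93, -19.55, -19.62, -18.31]

db = sqlite3.connect(":memory:")
db.execute("create table galaxy (mag_b real)")
db.executemany("insert into galaxy values (?)", [(m,) for m in mags])

# Same binning expression as in query 3. Grouping is done on the integer
# bin index round(5*mag_b) rather than on the floating-point bin centre.
histogram = db.execute(
    "select 0.2*round(5*mag_b) as magB, count(*) as num "
    "from galaxy "
    "group by round(5*mag_b) "
    "order by magB"
).fetchall()

for mag_bin, num in histogram:
    print(f"{mag_bin:6.1f} {num}")
```

Changing the factors 5 and 0.2 to any pair whose product is 1 changes the bin width; values that fall exactly on a bin edge may be assigned differently by different database engines, since their rounding rules for ties are not identical.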