Mining Virtual Universes Simulations in a relational database.

Computer simulations. Why?

Simple observations

Simple model

Simple, analytical solution

Complex observations

Galaxy merger. Image credits: John Hibbard, http://www.cv.nrao.edu/~jhibbard/n4038/n4038.html; NASA/CXC/SAO/G. Fabbiano et al.

X-ray cluster: electron density, gas temperature, gas pressure. Courtesy Alexis Finoguenov, Ulrich Briel, Peter Schuecker (MPE).

Galaxy survey

N-Body simulations

Simple dynamics

Complex solutions
An analytical solution exists only for N=2.
The 3-body problem has no general analytical solution, let alone 10 billion bodies.
Need computer simulations and approximations: direct pairwise force summation scales like N^2.
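To make the N^2 scaling concrete, here is a minimal Python sketch of direct summation, in which every body accumulates the pull of every other body. The function name, the unit choice G = 1, and the softening length are assumptions made for this illustration; they are not taken from the slides or from any particular simulation code.

```python
import math

def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-summation accelerations: every body feels every other body.

    G and softening are illustrative assumptions, not values from the slides.
    Returns the accelerations and the number of pair evaluations performed.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
            pair_evaluations += 1
    return acc, pair_evaluations

# Two equal masses one unit apart pull on each other with opposite accelerations.
two_body_acc, two_body_pairs = direct_accelerations([(0, 0, 0), (1, 0, 0)], [1.0, 1.0])
```

The double loop performs N(N-1) pair evaluations, which is why this approach becomes impractical long before 10 billion bodies.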

Adding hydrodynamics and gas physics. Di Matteo, Springel and Hernquist, 2005. Courtesy Volker Springel.

Millennium-II Simulation (2015-06-03, CMU, CosmoMLStat)
100 Mpc/h box, 10^10 particles, particle mass 6.9 × 10^6 M_sun/h, ~10 million halos, ~300 GB/snapshot. Boylan-Kolchin et al. 2009.

Millennium Simulation
500 Mpc/h box, 10^10 particles, particle mass 8.6 × 10^8 M_sun/h, ~18 million halos, ~300 GB/snapshot. Springel et al. 2005. (Figure label: MRII.)

MR-XXL
3 Gpc/h box, 3 × 10^11 particles, 750 million halos/snapshot, 9 TB/snapshot. (Figure label: MR.)

FOF groups, (sub)halos and galaxies

Raw data: particles; FOF groups and subhalos; density fields; subhalo merger trees; synthetic galaxies (SAM); mock catalogues.

millimil@CasJobs

Revisit relational databases: http://www.sdss.jhu.edu/~szalay/class/2015/gl/IntroRDB.html
Indexing: trees and spatial.

INDEX-ing
Performance: disk IO is the bottleneck. Avoid it as much as possible, but the whole DB cannot be stored in memory.
To find rows of interest, avoid scanning complete tables: a sequential scan is ~O(N), ~10 min for galaxy tables (10^9 rows, 250 GB).
Binary search speeds this up but requires ordering: ~O(log(N)), B-trees.
A table can only be ordered in one way, so create an external data structure, an INDEX, ordered according to >=1 columns, with a direct pointer to each row.
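The contrast between a sequential scan and an index lookup can be sketched in a few lines of Python. This is only an illustration: a sorted list searched with bisect stands in for a B-tree, and the table contents, column names and function names are made up for the example.

```python
import bisect
import random

random.seed(0)
# Hypothetical table: rows stored in arbitrary (insertion) order.
table = [{"galaxyId": i, "stellarMass": random.uniform(0.0, 100.0)} for i in range(10_000)]

def scan(table, lo, hi):
    """Sequential scan: inspect every row, O(N)."""
    return [r["galaxyId"] for r in table if lo <= r["stellarMass"] < hi]

# The "index": keys sorted by stellarMass, each paired with a pointer (row position).
index = sorted((r["stellarMass"], pos) for pos, r in enumerate(table))
index_keys = [k for k, _ in index]

def index_lookup(table, lo, hi):
    """Binary search for the range boundaries, O(log N), then follow the pointers."""
    start = bisect.bisect_left(index_keys, lo)
    stop = bisect.bisect_left(index_keys, hi)
    return [table[pos]["galaxyId"] for _, pos in index[start:stop]]

scan_result = sorted(scan(table, 40.0, 40.5))
index_result = sorted(index_lookup(table, 40.0, 40.5))
```

Both paths return the same rows; the indexed path only touches the matching slice of the sorted keys instead of the whole table.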

Indexes: (snapnum, stellarMass, galaxyid); (mag_b); (snapnum, x)
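A multi-column index can be tried out with Python's built-in sqlite3 module. SQLite is used here purely as a convenient stand-in for the database behind the slides; the table, the index name ix_snap_mass and the generated values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table galaxies (galaxyId integer, snapnum integer, stellarMass real, x real)"
)
# Hypothetical, deterministic toy data.
conn.executemany(
    "insert into galaxies values (?, ?, ?, ?)",
    [(i, i % 64, (i * 37 % 1000) / 10.0, (i * 91 % 625) / 10.0) for i in range(5000)],
)
# Index ordered on snapnum first, then stellarMass within each snapshot.
conn.execute("create index ix_snap_mass on galaxies (snapnum, stellarMass)")

query = "select galaxyId, x from galaxies where snapnum = 63 and stellarMass > 90"
# The last column of each plan row is the human-readable description of the step.
plan = [row[-1] for row in conn.execute("explain query plan " + query)]
matches = conn.execute(query).fetchall()
```

Printing plan shows whether the query planner searches the index rather than scanning the table; the exact wording of the plan text varies between SQLite versions.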

B-tree

Special indexes: trees, spatial

Time evolution: merger trees

Formation histories: subhalo and galaxy merger trees
Tree structure: halos have a single descendant; halos have a main progenitor.
Hierarchical structures are usually handled using recursive code, which is inefficient for data access and not (well) supported in RDBs.
Tree indexes:
depth-first ordering of trees
label by rank in order
pointer to “last progenitor” below each node
all progenitors have label BETWEEN label of root AND that of last progenitor
cluster table on label
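The tree index described above can be sketched in Python as follows. The toy tree, the node names and the helper functions are hypothetical; the recursion is fine for a small example, though a production loader would more likely use an explicit stack for deep trees.

```python
def label_depth_first(children, root):
    """Assign depth-first ranks and, per node, the rank of its last progenitor.

    children maps a node to its progenitors, main progenitor first.
    """
    halo_id = {}
    last_progenitor_id = {}
    counter = 0

    def visit(node):
        nonlocal counter
        halo_id[node] = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        # Every node visited since entering this one lies in its subtree.
        last_progenitor_id[node] = counter - 1

    visit(root)
    return halo_id, last_progenitor_id

# Hypothetical toy tree: A is the z=0 halo; B and E merge into A; C and D merge into B.
toy_tree = {"A": ["B", "E"], "B": ["C", "D"]}
halo_id, last_prog = label_depth_first(toy_tree, "A")

def progenitors(node):
    """All nodes whose label lies BETWEEN the node's label and its last progenitor's."""
    lo, hi = halo_id[node], last_prog[node]
    return sorted(n for n, rank in halo_id.items() if lo <= rank <= hi)
```

With these labels a whole subtree becomes one contiguous range, so progenitors("B") needs no recursion at query time. As with SQL BETWEEN, the range is inclusive and therefore contains the node itself.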

select prog.snapnum, prog.x, prog.y, prog.np
  from millimil..mpahalo des, millimil..mpahalo prog
 where prog.haloId between des.haloId and des.lastProgenitorId
   and des.haloId = 0

Galaxies (see topcat):
select prog.snapnum, prog.x, prog.y, prog.mag_b-prog.mag_v as color
  from millimil..delucia2006a des, millimil..delucia2006a prog
 where prog.galaxyId between des.galaxyId and des.lastProgenitorId
   and des.galaxyId = 0

Some more features of the merger tree data model (2007-01-17/19, Leiden, Millennium DB Tutorial)
Leaves:
select galaxyId as leaf
  from galaxies des
 where galaxyId = lastProgenitorId
Branching points:
select descendantId
  from galaxies des
 where descendantId != -1
 group by descendantId
having count(*) > 1
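The leaf and branching-point queries can be tried on a tiny tree with Python's sqlite3 module. SQLite is only a stand-in here, and the five-row galaxies table is a made-up example in depth-first order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table galaxies (galaxyId integer, lastProgenitorId integer, descendantId integer)"
)
# Hypothetical toy tree in depth-first order: 0 <- {1, 4}, 1 <- {2, 3}.
# The root has descendantId -1.
conn.executemany(
    "insert into galaxies values (?, ?, ?)",
    [(0, 4, -1), (1, 3, 0), (2, 2, 1), (3, 3, 1), (4, 4, 0)],
)

# A leaf has no progenitors, so its last progenitor is itself.
leaves = [r[0] for r in conn.execute(
    "select galaxyId from galaxies where galaxyId = lastProgenitorId order by galaxyId"
)]
# A branching point is a descendant shared by more than one galaxy, i.e. a merger.
branching_points = [r[0] for r in conn.execute(
    "select descendantId from galaxies where descendantId != -1 "
    "group by descendantId having count(*) > 1 order by descendantId"
)]
```

For this toy tree the leaves are galaxies 2, 3 and 4, and the branching points are galaxies 0 and 1.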

Main branches
Roots and leaves:
select des.galaxyId as rootId, min(prog.lastProgenitorId) as leafId
  into rootLeaf
  from mpagalaxies..delucia2006a des, mpagalaxies..delucia2006a prog
 where des.galaxyId = 0
   and prog.galaxyId between des.galaxyId and des.lastProgenitorId
 group by des.galaxyId
Main branch:
select rl.rootId, b.*
  from rootLeaf rl, mpagalaxies..delucia2006a b
 where b.galaxyId between rl.rootId and rl.leafId
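Why the smallest lastProgenitorId marks the end of the main branch can be checked on a toy tree. The sketch below uses Python's sqlite3 module as a stand-in database and assumes, as the depth-first tree index implies, that the main progenitor is always labelled first; the six-row table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table galaxies (galaxyId integer, lastProgenitorId integer)")
# Hypothetical tree in depth-first order, main progenitor first:
# 0 <- {1, 5}, 1 <- {2, 4}, 2 <- {3}; the main branch is 0, 1, 2, 3.
conn.executemany(
    "insert into galaxies values (?, ?)",
    [(0, 5), (1, 4), (2, 3), (3, 3), (4, 4), (5, 5)],
)

# Root and the leaf that terminates its main branch.
root_id, leaf_id = conn.execute(
    "select des.galaxyId, min(prog.lastProgenitorId) "
    "from galaxies des, galaxies prog "
    "where des.galaxyId = 0 "
    "and prog.galaxyId between des.galaxyId and des.lastProgenitorId "
    "group by des.galaxyId"
).fetchone()

# The main branch is the contiguous label range from root to that leaf.
main_branch = [r[0] for r in conn.execute(
    "select galaxyId from galaxies where galaxyId between ? and ? order by galaxyId",
    (root_id, leaf_id),
)]
```

Every galaxy on the main branch contains the first leaf in its subtree, and every galaxy off it is labelled after that leaf, so the minimum over lastProgenitorId is exactly the first leaf's label.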

Query particles in volume

Find all halos in a subvolume of space: 10 <= x < 20, 20 <= y < 30, 0 <= z < 10

Inefficient, even when indexed:
select x,y,z
  from mpahalotrees..mhalo
 where snapnum = 63
   and x between 10 and 20
   and y between 20 and 30
   and z between 0 and 10

Why inefficient: rows ordered by x have y and z values scattered across the whole box.
x            y            z
15.001083    42.471325    24.673561
15.001247    58.420914    42.722874
15.002215    38.042484    29.557423
15.002735    50.487785    57.716877
15.002753    20.000177    8.21466
15.005095    13.637599    16.135191
15.006593    22.170828    48.242783
15.011488    24.824438    19.773285
15.011741    48.099907    11.500685
15.011868    23.312265    27.858799
15.013065    23.969515    18.883507
15.013158    56.041866    40.82894
15.014361    59.503357    45.31733
15.017322    46.257664    44.37695
15.018202    27.333895    9.441319   [y/z digits fused in source as 27.3338959.441319; split uncertain]
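The cost of relying on an x-ordered index alone can be estimated with a small Python experiment. The random points, the box size of 62.5 and the variable names are assumptions made for this sketch; they are not data from the simulation.

```python
import bisect
import random

random.seed(1)
BOX = 62.5  # assumed box size for this toy example
# Points sorted by x, which is the order an index on x would deliver them in.
points = sorted(
    (random.uniform(0, BOX), random.uniform(0, BOX), random.uniform(0, BOX))
    for _ in range(20_000)
)

xs = [p[0] for p in points]
start = bisect.bisect_left(xs, 10.0)
stop = bisect.bisect_left(xs, 20.0)
slab = points[start:stop]  # everything with 10 <= x < 20, regardless of y and z

# The y and z conditions can only be applied by inspecting each row of the slab.
in_box = [p for p in slab if 20.0 <= p[1] < 30.0 and 0.0 <= p[2] < 10.0]
rows_read, rows_kept = len(slab), len(in_box)
```

The index narrows the search to a slab spanning the full box in y and z, so most of the rows read are thrown away again; comparing rows_read with rows_kept shows the waste.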

Spatial indexes
Performance of finding things is improved if those things are co-located on disk: ordering, indices.
Co-locating a 3D configuration of points on a 1D disk can only be done approximately.
Space-filling curves (Peano-Hilbert) require user-defined functions to use.
Simpler: zones.

Query particles in volume

Index cells using space filling curve

Query particles in sphere/box
Calculate the overlap of the space-filling curve with the query volume.
Decide (from the index table) which files to query, where to seek, and how far to scan.
Implement as a SQLCLR table-valued function.
Run from the database.
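The overlap step can be illustrated with a small Python sketch. To keep it short it uses a 2D Hilbert curve on a power-of-two grid instead of the 3D Peano-Hilbert curve named in the slides, and it leaves out the index table and file layout entirely; the function names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a 2D Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-curve in this quadrant is oriented correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def curve_ranges(n, ix_range, iy_range):
    """Merge the curve indexes of the selected cells into contiguous (first, last) runs."""
    keys = sorted(hilbert_index(n, ix, iy) for ix in ix_range for iy in iy_range)
    runs = []
    for k in keys:
        if runs and k == runs[-1][1] + 1:
            runs[-1][1] = k
        else:
            runs.append([k, k])
    return [tuple(r) for r in runs]

all_cells = sorted(hilbert_index(8, x, y) for x in range(8) for y in range(8))
compact_box = curve_ranges(4, range(0, 2), range(0, 2))  # cells (0..1, 0..1)
split_box = curve_ranges(4, range(1, 3), range(0, 1))    # cells (1, 0) and (2, 0)
```

Each run corresponds to one seek followed by a sequential read. The first box maps to a single run, while the second shows that neighbouring cells can still sit far apart on the curve, which is the sense in which co-location is only approximate.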

Simpler: Zones

Zone index
Coarse sampling of points in multiple dimensions allows simple multi-dimensional ordering:
ix = floor(x/10Mpc)
iy = floor(y/10Mpc)
iz = floor(z/10Mpc)
index on (snapnum,ix,iy,iz,x,y,z,galaxyId)
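A minimal Python sketch of the zone assignment and the resulting ordering follows. The four sample rows and the helper name zone are invented for illustration; only the floor formula and the index column order come from the slide.

```python
import math

ZONE_SIZE = 10.0  # zone width in Mpc, as in the definition above

def zone(x, y, z, size=ZONE_SIZE):
    """Integer zone coordinates (ix, iy, iz) of a point."""
    return (math.floor(x / size), math.floor(y / size), math.floor(z / size))

# Hypothetical rows: (snapnum, x, y, z, galaxyId)
rows = [
    (63, 15.06, 20.89, 4.42, 1),
    (63, 3.10, 55.00, 61.00, 2),
    (63, 15.91, 27.14, 4.29, 3),
    (63, 20.00, 25.00, 5.00, 4),
]

# Sorting by the index key (snapnum, ix, iy, iz, x, y, z, galaxyId)
# places rows of the same zone next to each other.
indexed = sorted(rows, key=lambda r: (r[0], *zone(r[1], r[2], r[3]), r[1], r[2], r[3], r[4]))
same_zone = [r[4] for r in indexed if zone(r[1], r[2], r[3]) == (1, 2, 0)]
```

Galaxies 1 and 3 share zone (1, 2, 0) and end up adjacent in the ordering, while galaxy 4 at x = 20.00 already belongs to zone ix = 2.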

IX  IY  IZ  X            Y            Z
1   2   0   15.061804    20.891907    4.4156647
1   2   0   15.069336    23.437601    9.812217
1   2   0   15.100678    20.905642    4.613036
1   2   0   15.173968    22.36883     8.01832
1   2   0   15.194122    20.67583     4.8034463
1   2   0   15.2500305   24.246683    1.6651521
1   2   0   15.365576    23.290754    9.404872
1   2   0   15.372606    20.203691    2.0006201
1   2   0   15.524696    21.03997     4.280077
1   2   0   15.583943    22.344622    9.421347
1   2   0   15.6358385   26.785904    9.881406
1   2   0   15.66383     22.829983    7.137772
1   2   0   15.673803    26.918291    3.302736
1   2   0   15.717824    22.365341    9.221828
1   2   0   15.847992    24.700747    1.389664
1   2   0   15.883896    22.593819    7.277129
1   2   0   15.91041     26.531118    2.5693457
1   2   0   15.916905    27.137867    4.289855
1   2   0   16.047333    28.93811     5.414605

Using zones
select x,y,z
  from mpahalo
 where snapnum = 63
   and ix = 1
   and iy = 2
   and iz = 0
NB: unlike the BETWEEN query, this does NOT include galaxies with x=20 exactly (they fall in zone ix = 2)!
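The boundary behaviour noted above can be reproduced with Python's sqlite3 module. SQLite is a stand-in for the actual database, the four points are hypothetical, and the galaxyId column of the original index is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mpahalo (snapnum integer, ix integer, iy integer, iz integer, "
    "x real, y real, z real)"
)
# Hypothetical points; the last one sits exactly on the x = 20 boundary.
points = [(15.06, 20.89, 4.42), (15.91, 27.14, 4.29), (12.00, 35.00, 4.00), (20.00, 25.00, 5.00)]
conn.executemany(
    "insert into mpahalo values (63, ?, ?, ?, ?, ?, ?)",
    [(int(x // 10), int(y // 10), int(z // 10), x, y, z) for x, y, z in points],
)
conn.execute("create index ix_zone on mpahalo (snapnum, ix, iy, iz, x, y, z)")

# Zone query: equality on the integer zone columns only.
zone_rows = conn.execute(
    "select x, y, z from mpahalo "
    "where snapnum = 63 and ix = 1 and iy = 2 and iz = 0 order by x"
).fetchall()
# Coordinate query: BETWEEN is inclusive at both ends.
between_rows = conn.execute(
    "select x, y, z from mpahalo where snapnum = 63 "
    "and x between 10 and 20 and y between 20 and 30 and z between 0 and 10 order by x"
).fetchall()
```

The zone query returns the two points inside the half-open box, whereas the BETWEEN query also returns the point at x = 20, because BETWEEN includes its upper bound.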

“20 questions”
1. Return the (B-band luminosity function of) galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10^14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass > 10^9 Msun which is a progenitor of BCGs in a z=0 cluster of mass > 10^14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
14. Find the dependency of halo formation times on environment.
