Mining Virtual Universes Simulations in a relational database.


1 Mining Virtual Universes Simulations in a relational database

2 Computer simulations. Why?

3 Simple observations

4 Simple model

5 Simple, analytical solution

6 Complex observations

7 Galaxy merger. John Hibbard (http://www.cv.nrao.edu/~jhibbard/n4038/n4038.html); NASA/CXC/SAO/G. Fabbiano et al.

8 X-Ray cluster: electron density, gas temperature, gas pressure. Courtesy Alexis Finoguenov, Ulrich Briel, Peter Schuecker (MPE)

9 Galaxy survey

10 N-Body simulations

11 Simple dynamics

12 Complex solutions. An analytical solution exists only for N=2; there is none in general for 3 bodies, let alone 10 billion bodies. Need computer simulations and approximations: direct summation scales like N^2.
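The N^2 scaling can be made concrete with a minimal sketch of direct pairwise summation. The function name `direct_accelerations`, the unit choice G=1 and the softening length `eps` are assumptions made for this illustration; they are not taken from the slides.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Gravitational acceleration on every body by direct pairwise summation.

    pos  : list of [x, y, z] positions
    mass : list of masses
    eps  : softening length (an assumption of this sketch) to avoid division by zero
    Returns (accelerations, number_of_pairs_evaluated).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps  # softened squared distance
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3  # pull of j on i
                acc[j][k] -= G * mass[i] * dx[k] * inv_r3  # equal and opposite pull of i on j
            pair_count += 1
    return acc, pair_count
```

Each call evaluates N(N-1)/2 pairs, so doubling the number of bodies roughly quadruples the work per time step.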

13

14 Adding hydrodynamics and gas physics. Di Matteo, Springel and Hernquist, 2005. Courtesy Volker Springel

15 Millennium-II Simulation (2015-06-03, CMU, CosmoMLStat): 100 Mpc/h box, 10^10 particles, particle mass 6.9x10^6 M_sun/h, ~10 million halos, ~300 GB/snapshot. Boylan-Kolchin et al. 2009

16 Millennium Simulation (figure label: MRII): 500 Mpc/h box, 10^10 particles, particle mass 8.6x10^8 M_sun/h, ~18 million halos, ~300 GB/snapshot. Springel et al. 2005

17 MR-XXL (figure label: MR): 3 Gpc/h box, 3x10^11 particles, 750 million halos/snapshot, 9 TB/snapshot

18 FOF groups, (sub)halos and galaxies

19

20 Raw data: particles; FOF groups and subhalos; density fields; subhalo merger trees; synthetic galaxies (SAM); mock catalogues

21 click

22 millimil@CasJobs

23 Revisit relational databases (http://www.sdss.jhu.edu/~szalay/class/2015/gl/IntroRDB.html). Indexing: trees and spatial

24

25 INDEX-ing. Performance: disk I/O is the bottleneck. Avoid it as much as possible, but the whole DB cannot be stored in memory. To find rows of interest, avoid scanning complete tables: a sequential scan is ~O(N), ~10 min for the galaxy tables (10^9 rows, 250 GB). Binary search speeds this up to ~O(log(N)) but requires ordering: B-trees. A table can only be ordered in one way, so create an external data structure, an INDEX, ordered according to >=1 columns, with a direct pointer to each row.
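As a rough illustration of the difference between a sequential scan and an index seek, the sketch below uses a sorted Python list of (key, row position) pairs as a stand-in for a B-tree. The function names and the row layout (a list of dicts) are assumptions for this example; a real database index is a paged on-disk structure.

```python
import bisect

def build_index(rows, column):
    """Sorted list of (key, row_position) pairs: a simplified stand-in for a B-tree index."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def scan(rows, column, lo, hi):
    """Sequential scan: inspects every row, O(N)."""
    return [row for row in rows if lo <= row[column] <= hi]

def index_seek(rows, index, lo, hi):
    """Index seek: binary search for the first key >= lo, then walk forward.

    Cost is O(log N) for the search plus one step per matching row.
    """
    start = bisect.bisect_left(index, (lo, -1))  # -1 sorts before any real row position
    out = []
    for i in range(start, len(index)):
        key, pos = index[i]
        if key > hi:
            break
        out.append(rows[pos])  # follow the "pointer" back to the row
    return out
```

Both functions return the same rows; the seek simply avoids touching rows outside the requested key range.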

26 Indexes: (snapnum, stellarMass, galaxyid); (mag_b); (snapnum, x)

27 B-tree

28 Special indexes: trees, spatial

29 Time evolution: merger trees

30

31 Formation histories: subhalo and galaxy merger trees. Tree structure: halos have a single descendant; halos have a main progenitor. Hierarchical structures are usually handled using recursive code, which is inefficient for data access and not (well) supported in RDBs. Tree indexes: depth-first ordering of trees; label each node by its rank in that order; store a pointer to the “last progenitor” below each node; all progenitors have a label BETWEEN the label of the root AND that of the last progenitor; cluster the table on the label.
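The labelling scheme can be sketched in a few lines. Here `label_tree`, `all_progenitors` and the dictionary-based tree are hypothetical names and structures chosen for the illustration, and the main progenitor is assumed to be listed first among each node's progenitors. The recursion runs once when the labels are assigned; afterwards a subtree lookup is a plain range test.

```python
def label_tree(root, progenitors):
    """Assign depth-first labels to a merger tree.

    progenitors maps a node name to the list of its progenitors (main progenitor first).
    Returns {node: (label, last_progenitor_label)}.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        my_label = counter          # rank of this node in depth-first order
        counter += 1
        for p in progenitors.get(node, []):
            visit(p)
        # The last label handed out inside this subtree is the "last progenitor".
        labels[node] = (my_label, counter - 1)

    visit(root)
    return labels

def all_progenitors(labels, node):
    """Nodes whose label lies BETWEEN the node's label and its last progenitor's label.

    Like the SQL BETWEEN, the range is inclusive, so the node itself is returned too.
    """
    lo, hi = labels[node]
    return sorted(n for n, (lab, _) in labels.items() if lo <= lab <= hi)
```

For a tree where A has progenitors B and C, B has D and E, and C has F, the labels come out as A=0, B=1, D=2, E=3, C=4, F=5, and the subtree of B is exactly the label range 1 to 3.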

32

33
    select prog.snapnum, prog.x, prog.y, prog.np
      from millimil..mpahalo des,
           millimil..mpahalo prog
     where prog.haloId between des.haloId and des.lastProgenitorId
       and des.haloId = 0

34

35 Galaxies (see TOPCAT)
    select prog.snapnum, prog.x, prog.y,
           prog.mag_b - prog.mag_v as color
      from millimil..delucia2006a des,
           millimil..delucia2006a prog
     where prog.galaxyId between des.galaxyId and des.lastProgenitorId
       and des.galaxyId = 0

36 Some more features of the merger tree data model (2007-01-17/19, Leiden, Millennium DB Tutorial)
Leaves:
    select galaxyId as leaf
      from galaxies des
     where galaxyId = lastProgenitorId
Branching points:
    select descendantId
      from galaxies des
     where descendantId != -1
     group by descendantId
    having count(*) > 1
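The leaf and branching-point queries of slide 36 can be mirrored on a small in-memory table to see what they return. The six rows below are made-up illustration data (not from the Millennium database), laid out as (galaxyId, lastProgenitorId, descendantId) with -1 meaning "no descendant".

```python
from collections import Counter

# Hypothetical tree: 0 is the root; 1 and 4 merge into 0; 2 and 3 merge into 1; 5 precedes 4.
GALAXIES = [
    (0, 5, -1),
    (1, 3, 0),
    (2, 2, 1),
    (3, 3, 1),
    (4, 5, 0),
    (5, 5, 4),
]

def leaves(rows):
    """A leaf has no progenitors, so it is its own last progenitor."""
    return [gid for gid, last, _ in rows if gid == last]

def branching_points(rows):
    """Descendants that are pointed to by more than one progenitor, i.e. mergers."""
    counts = Counter(desc for _, _, desc in rows if desc != -1)
    return sorted(d for d, c in counts.items() if c > 1)
```

On this data `leaves` returns 2, 3 and 5, and `branching_points` returns 0 and 1.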

37 Main branches
Roots and leaves:
    select des.galaxyId as rootId,
           min(prog.lastprogenitorid) as leafId
      into rootLeaf
      from mpagalaxies..delucia2006a des,
           mpagalaxies..delucia2006a prog
     where des.galaxyId = 0
       and prog.galaxyId between des.galaxyId and des.lastProgenitorId
     group by des.galaxyId
Main branch:
    select rl.rootId, b.*
      from rootLeaf rl,
           mpagalaxies..delucia2006a b
     where b.galaxyId between rl.rootId and rl.leafId
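The main-branch trick of slide 37 relies on the depth-first order visiting the main progenitor first, which is assumed here: the smallest lastProgenitorId in a tree then belongs to the leaf at the end of the main branch. A minimal sketch on made-up (galaxyId, lastProgenitorId) rows, with hypothetical names:

```python
# Hypothetical depth-first labelled tree, rows are (galaxyId, lastProgenitorId).
ROWS = [(0, 5), (1, 3), (2, 2), (3, 3), (4, 5), (5, 5)]

def main_branch(rows, root_id):
    """Galaxy ids on the main branch below root_id.

    Step 1 (the "roots and leaves" query): the leaf of the main branch is the minimum
    lastProgenitorId over the root's whole tree.
    Step 2 (the "main branch" query): the branch is the id range [root_id, leaf_id].
    """
    root_last = next(last for gid, last in rows if gid == root_id)
    tree = [(gid, last) for gid, last in rows if root_id <= gid <= root_last]
    leaf_id = min(last for _, last in tree)
    return [gid for gid, _ in tree if root_id <= gid <= leaf_id]
```

For the root 0 this yields 0, 1, 2; the side branches 3 and 4, 5 are excluded because their ids lie beyond the main leaf.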

38 Query particles in volume

39 Find all halos in a subvolume of space: 10 <= x < 20, 20 <= y < 30, 0 <= z < 10

40 Inefficient, even when indexed
    select x, y, z
      from mpahalotrees..mhalo
     where snapnum = 63
       and x between 10 and 20
       and y between 20 and 30
       and z between 0 and 10

41 Why inefficient: in rows ordered by x, the y and z values are scattered, so an index ordered by x does not narrow down y or z.
x          y          z
15.001083  42.471325  24.673561
15.001247  58.420914  42.722874
15.002215  38.042484  29.557423
15.002735  50.487785  57.716877
15.002753  20.000177  8.21466
15.005095  13.637599  16.135191
15.006593  22.170828  48.242783
15.011488  24.824438  19.773285
15.011741  48.099907  11.500685
15.011868  23.312265  27.858799
15.013065  23.969515  18.883507
15.013158  56.041866  40.82894
15.014361  59.503357  45.31733
15.017322  46.257664  44.37695
15.018202  27.333895  9.441319

42 Spatial indexes. Finding things is faster if those things are co-located on disk (ordering, indices). Co-locating a 3D configuration of points on a 1D disk can only be done approximately. Space-filling curves (Peano-Hilbert) require user-defined functions to use. Simpler: zones.
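To show what a space-filling curve does, the sketch below computes the position of a grid cell along a Hilbert curve. It is a 2D illustration only (the simulations need the 3D version), it uses the standard bit-manipulation formulation, and the function name `hilbert_index` is chosen for this example.

```python
def hilbert_index(n, x, y):
    """Position along a 2D Hilbert curve of cell (x, y) in an n-by-n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant at this level, in curve order
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Cells that are consecutive along the curve are always neighbours in the grid, which is why sorting rows by this index keeps nearby points close together on disk. The converse holds only approximately: some neighbouring cells end up far apart along the curve.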

43 Query particles in volume

44 Index cells using space filling curve

45 Query particles in sphere/box. Calculate the overlap of the space-filling curve with the query volume. Decide (from the index table) which files to query, where to seek, and how far to scan. Implement as a SQLCLR table-valued function. Run from the database.
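The overlap step can be sketched as follows: find the grid cells a query box touches, convert them to curve keys, and merge consecutive keys into ranges, each of which would correspond to one seek plus one scan. For brevity this sketch swaps in a Morton (Z-order) key instead of the Peano-Hilbert key named in the slides; `morton_key`, `key_ranges` and the cell size are assumptions of the example, not the actual SQLCLR implementation.

```python
import math

def morton_key(ix, iy, iz, bits=4):
    """Interleave the bits of three cell indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def key_ranges(lo, hi, cell_size, bits=4):
    """Contiguous key ranges covering every cell that overlaps the box [lo, hi).

    lo and hi are (x, y, z) corners; each returned (first_key, last_key) pair
    stands for one seek followed by one sequential read.
    """
    first = [math.floor(l / cell_size) for l in lo]
    last = [math.ceil(h / cell_size) - 1 for h in hi]
    keys = sorted(
        morton_key(ix, iy, iz, bits)
        for ix in range(first[0], last[0] + 1)
        for iy in range(first[1], last[1] + 1)
        for iz in range(first[2], last[2] + 1)
    )
    ranges = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k          # extend the current run
        else:
            ranges.append([k, k])      # start a new run: another seek
    return [tuple(r) for r in ranges]
```

A box aligned with the curve, such as the 2x2x2 block of cells at the origin, collapses into a single range, while a badly aligned box breaks into several; a better curve means fewer ranges for typical queries.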

46 Simpler: Zones

47 Zone index. Coarse sampling of points in multiple dimensions allows simple multi-dimensional ordering: ix = floor(x/10Mpc), iy = floor(y/10Mpc), iz = floor(z/10Mpc). Index on (snapnum, ix, iy, iz, x, y, z, galaxyId).
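A minimal sketch of the zone key, assuming rows are tuples (snapnum, x, y, z, galaxyId) with positions in Mpc. The helper names and sample rows are hypothetical; the 10 Mpc zone size and the column order of the key follow the slide.

```python
import math

ZONE_SIZE = 10.0  # Mpc, as on the slide

def zone(coord, size=ZONE_SIZE):
    """Zone number of a coordinate: floor(coord / size)."""
    return math.floor(coord / size)

def zone_key(snapnum, x, y, z, galaxy_id):
    """Sort key matching the index (snapnum, ix, iy, iz, x, y, z, galaxyId)."""
    return (snapnum, zone(x), zone(y), zone(z), x, y, z, galaxy_id)

def cluster(rows):
    """Order rows the way an index on the zone key would store them."""
    return sorted(rows, key=lambda r: zone_key(*r))
```

After sorting, all rows of one (ix, iy, iz) cell sit next to each other, so a query on a cell reads one contiguous stretch instead of picking rows from all over the table.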

48
IX  IY  IZ  X           Y          Z
1   2   0   15.061804   20.891907  4.4156647
1   2   0   15.069336   23.437601  9.812217
1   2   0   15.100678   20.905642  4.613036
1   2   0   15.173968   22.36883   8.01832
1   2   0   15.194122   20.67583   4.8034463
1   2   0   15.2500305  24.246683  1.6651521
1   2   0   15.365576   23.290754  9.404872
1   2   0   15.372606   20.203691  2.0006201
1   2   0   15.524696   21.03997   4.280077
1   2   0   15.583943   22.344622  9.421347
1   2   0   15.6358385  26.785904  9.881406
1   2   0   15.66383    22.829983  7.137772
1   2   0   15.673803   26.918291  3.302736
1   2   0   15.717824   22.365341  9.221828
1   2   0   15.847992   24.700747  1.389664
1   2   0   15.883896   22.593819  7.277129
1   2   0   15.91041    26.531118  2.5693457
1   2   0   15.916905   27.137867  4.289855
1   2   0   16.047333   28.93811   5.414605

49 Using zones
    select x, y, z
      from mpahalo
     where snapnum = 63
       and ix = 1
       and iy = 2
       and iz = 0
NB: does NOT include galaxies with x = 20 exactly!
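The boundary remark can be checked directly. In the sketch below, `in_box_between` mimics the closed ranges of the BETWEEN query on slide 40 and `in_zone` mimics the zone predicate on slide 49; both function names are made up for the illustration.

```python
import math

def in_box_between(x, y, z):
    """Closed ranges, as produced by SQL BETWEEN: both end points are included."""
    return 10 <= x <= 20 and 20 <= y <= 30 and 0 <= z <= 10

def in_zone(x, y, z, size=10.0):
    """Zone predicate ix = 1 and iy = 2 and iz = 0, which is a half-open range per axis."""
    return (math.floor(x / size), math.floor(y / size), math.floor(z / size)) == (1, 2, 0)
```

A point with x = 20 exactly passes the BETWEEN test but has ix = floor(20/10) = 2, so the zone query skips it. The zone query therefore matches the half-open volume 10 <= x < 20 stated on slide 39.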

50 “20 questions”
1. Return the (B-band luminosity function of) galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10^14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBGs (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass >10^9 Msun which is a progenitor of BCGs in z=0 clusters of mass >10^14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of an LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
14. Find the dependency of halo formation times on environment.

