1
2007-01-17/19 Leiden: Millennium DB Tutorial
Introduction to the Millennium Database with an SQL tutorial
2
Overview
Why a relational database?
Overview of relational databases
  – general
  – Millennium DB design
SQL tutorial
Science queries
Tools
Advanced subjects (not now)
3
Website documentation
http://www.g-vo.org/Millennium/Help
4
Why use a relational database?
Encapsulation of data in terms of a rigorous logical model
  – no need to know about the internals of data storage
  – forces one to think carefully about data structure
ANSI standard query language (SQL) for finding the information one is interested in
  – remote filtering
  – speeds up the path from science question to answer
  – facilitates communication
Many implementations, commercial and open source
  – advanced query optimizers (indexes, clustering)
5
Relational database concepts
Millennium database design
6
A relational database stores data in relations (= tables)
7
Tables
Tables have names
Related data values are stored in rows
Rows have columns
  – all the same for a given table
Columns have names and data types
Rows often have a unique identifier consisting of the values of one or more columns: the primary key
8
[Figure: example table with the labels Column, Row, Primary Key Column, Foreign Key Columns]
9
Foreign keys
A database can contain many tables
The set of table definitions in a database is called the schema of the database
Tables can be related by foreign keys: pointers (by value) from a row in one table to a row in another (or possibly the same) table
Why not combine these rows into one table?
Consider storing galaxies, with info about their sub-halo as well as the FOF groups these live in.
Note: a subhalo contains >= 1 galaxies, a FOF halo >= 0 subhalos
10
One table: redundancy

GalaxyEtc
galId  mStar  magB   X     haloId  np   hX    vMax  fofId  nSub  m200    fX
112    0.215  -17.9  7.6   6625    100  7.6   165   123    2     445.77  7.6
113    0.038  -15.6  7.4   6625    100  7.6   165   123    2     445.77  7.6
154    0.173  -17.1  7.65  6626    65   7.9   130   123    2     445.77  7.6
221    1.20   -20.7  35.1  7883    452  35.1  200   456    2     101.32  35.1
223    0.225  -19.7  35.0  7883    452  35.1  200   456    2     101.32  35.1
225    0.04   -17.5  34.9  7883    452  35.1  200   456    2     101.32  35.1
278    1.54   -19.4  35.2  7884    255  35.2  190   456    2     101.32  35.1
…
11
Normalization

FOF
fofId  nSub  m200    x     …
123    2     445.77  7.6   …
456    2     101.32  35.1  …
789    1     70.0    67.0  …
…

SubHalo
haloId  fofId  np   x     vMax  …
6625    123    100  7.6   165   …
6626    123    65   7.9   130   …
7883    456    452  35.1  200   …
7884    456    255  35.2  190   …
9885    789    30   67.0  110   …
…

Galaxy
galId  haloId  mStar  magB   x     …
112    6625    0.215  -17.9  7.6   …
113    6625    0.038  -15.6  7.4   …
154    6626    0.173  -17.1  7.65  …
221    7883    1.20   -20.7  35.1  …
223    7883    0.225  -19.7  35.0  …
225    7883    0.04   -17.5  34.9  …
278    7884    1.54   -19.4  35.2  …
…
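The split into Galaxy, SubHalo and FOF tables can be tried out at toy scale. The following is a minimal sketch in Python using the standard-library sqlite3 module, which is an assumption made for illustration and not the database engine behind the Millennium site. It loads a subset of the slide's columns into three tables linked by foreign keys, then joins them to recover the wide, redundant view on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table FOF     (fofId  integer primary key, nSub integer);
    create table SubHalo (haloId integer primary key,
                          fofId  integer references FOF(fofId),       -- foreign key
                          np     integer, vMax real);
    create table Galaxy  (galId  integer primary key,
                          haloId integer references SubHalo(haloId),  -- foreign key
                          mStar  real, magB real);
""")

# Each FOF group and each subhalo is stored exactly once.
conn.executemany("insert into FOF values (?, ?)",
                 [(123, 2), (456, 2), (789, 1)])
conn.executemany("insert into SubHalo values (?, ?, ?, ?)",
                 [(6625, 123, 100, 165), (6626, 123, 65, 130),
                  (7883, 456, 452, 200), (7884, 456, 255, 190),
                  (9885, 789, 30, 110)])
conn.executemany("insert into Galaxy values (?, ?, ?, ?)",
                 [(112, 6625, 0.215, -17.9), (113, 6625, 0.038, -15.6),
                  (154, 6626, 0.173, -17.1), (221, 7883, 1.20, -20.7),
                  (223, 7883, 0.225, -19.7), (225, 7883, 0.04, -17.5),
                  (278, 7884, 1.54, -19.4)])

# Following the foreign keys rebuilds the wide table when it is needed.
wide = conn.execute("""
    select g.galId, g.mStar, h.haloId, h.np, f.fofId, f.nSub
    from Galaxy g, SubHalo h, FOF f
    where g.haloId = h.haloId and h.fofId = f.fofId
    order by g.galId
""").fetchall()

for row in wide:
    print(row)
```

Subhalo 9885 hosts no galaxy in this sample, so it does not appear in the joined result, although it is still stored in SubHalo.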
12
Millennium database
[Schema diagram with the tables DHalo, DSubHalo, SubHalo, Bower2006a, DeLucia2006a, MPAHalo, FOF, MField, MPAMocks]
13
Web browser:
http://www.g-vo.org/Millennium
http://www.g-vo.org/MyMillennium
14
SQL Tutorial
15
SQL
Structured Query Language
Filtering, combining, sub-setting of tables
Functions, procedures, aggregations
Data manipulation: insert/update/delete
A query produces tabular results, which can be used as tables again in sub-queries, or stored in a database
Table creation...
16
Table creation statement
  create table MPAHalo (
    haloId long not null,
    descendantId long,        -- foreign key
    lastProgenitorId long,    -- foreign key
    snapnum integer,
    redshift real,
    x real, y real, z real,
    np integer,
    velDisp real,
    vmax real,
    ...,
    primary key (haloId)
  );
17
SELECT ... FROM ... WHERE ...
  1. select * from MPAHalo
  2. select snapnum, redshift, np from MPAHalo
  3. select * from MPAHalo where redshift = 0
18
WHERE conditions
  =  <>  !=  <  >  <=  >=
  np between 100 and 200
  name like '%Frenk'
  a=b and d=e
  a=b or e=d
  id in (1,2,3)
  a is null
  a is not null
  exists ... (later)
19
Custom column names
  select snapnum as snapshotIndex,
         redshift as z,
         np as numberOfParticles
  from MPAHalo
20
Demo queries
  select x, y
  from MPAHalo
  where z between 10 and 12
    and np > 50
    and snapnum = 63

  select haloid, snapnum
  from MPAHalo
  where np = 100

  select * from snapshots
21
ORDER BY ... [ASC | DESC]
  select h.*
  from MPAHalo h
  order by h.snapnum desc, h.x asc
22
TOP
  select top 10 haloid, np
  from mpahalo
  where snapnum = 63
  order by np desc
23
Aggregation: count, sum, max, min, avg, stddev
  select count(*) as num,
         max(stellarmass) as maxmass,
         avg(stellarmass) as avgmass
  from delucia2006a
  where snapnum = 63
    and type = 1
24
JOIN (note the aliases)
  select h.haloid, g.stellarMass
  from delucia2006a g, mpahalo h
  where h.np = 1000
    and g.haloid = h.haloid

haloId  fofId  np   x     vMax
6625    123    100  7.6   165
6626    123    65   7.9   130
7883    456    452  35.1  200
7884    456    255  35.2  190
9885    789    30   67.0  110

galId  haloId  mStar  magB   x
112    6625    0.215  -17.9  7.6
113    6625    0.038  -15.6  7.4
154    6626    0.173  -17.1  7.65
221    7883    1.20   -20.7  35.1
223    7883    0.225  -19.7  35.0
225    7883    0.04   -17.5  34.9
278    7884    1.54   -19.4  35.2
25
Demo: galaxies in massive halos
  select h.haloId, g.*
  from DeLucia2006a g, MPAHalo h
  where h.snapnum = 63
    and h.np between 10000 and 11000
    and g.haloId = h.haloId
26
Demo: direct progenitors of massive halos
  select prog.*
  from MPAHalo prog, MPAHalo des
  where des.haloId = prog.descendantId
    and des.np > 10000
    and des.snapnum = 63
27
GROUP BY
  select redshift, type,
         count(*) as numGal,
         avg(stellarMass) as m_avg,
         max(stellarMass) as m_max
  from DeLucia2006a
  group by redshift, type
  order by redshift, type
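To see what such a grouped query returns, here is a small sketch that runs the same statement against a hypothetical six-row table in SQLite, via Python's standard sqlite3 module. The table name gal and all values are invented for illustration; only the query shape follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table gal (redshift real, type integer, stellarMass real)")
# Invented sample rows: (redshift, type, stellarMass)
conn.executemany("insert into gal values (?, ?, ?)", [
    (0.0, 0, 1.0), (0.0, 0, 3.0), (0.0, 1, 0.5),
    (1.0, 0, 2.0), (1.0, 1, 1.5), (1.0, 1, 4.0),
])

# One output row per distinct (redshift, type) pair.
summary = conn.execute("""
    select redshift, type,
           count(*)         as numGal,
           avg(stellarMass) as m_avg,
           max(stellarMass) as m_max
    from gal
    group by redshift, type
    order by redshift, type
""").fetchall()

for row in summary:
    print(row)
```

Every column in the select list is either part of the group by clause or wrapped in an aggregate function, which is the rule that keeps each output row well defined.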
28
Sub-selects
  select g.galaxyId
  from DeLucia2006a g,
       (select top 10 haloId
        from mpahalo
        where snapnum = 63
        order by np desc) mh
  where g.haloId = mh.haloId
29
Science questions as SQL
30
Motivation for data model
1. Return the galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
2. Return the galaxy content at z=3 of the progenitors of a halo identified at z=0.
3. Return all the galaxies within a sphere of radius 3 Mpc around a particular halo.
4. Return the complete halo merger tree for a halo identified at z=0.
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
6. Find properties of all galaxies in haloes of mass 10**14 at redshift 1 which have had a major merger (mass-ratio < 4:1) since redshift 1.5.
7. Find all the z=3 progenitors of z=0 red ellipticals (i.e. B-V > 0.8, B/T > 0.5).
8. Find the descendants at z=1 of all LBG's (i.e. galaxies with SFR > 10 Msun/yr) at z=3.
9. Make a list of all haloes at z=3 which contain a galaxy of mass > 10**9 Msun which is a progenitor of BCG's in z=0 clusters of mass > 10**14.5.
10. Find all z=3 galaxies which have NO z=0 descendant.
11. Return the complete galaxy merging history for a given z=0 galaxy.
12. Find all the z=2 galaxies which were within 1 Mpc of a LBG (i.e. SFR > 10 Msun/yr) at some previous redshift.
13. Find the multiplicity function of halos depending on their environment (overdensity of the density field smoothed on a certain scale).
14. Find the dependency of halo formation times on environment ("Gao effect").
31
5. Find positions and velocities for all galaxies at redshift zero with B-luminosity, colour and bulge-to-disk ratio within given intervals.
  select x, y, z, velX, velY, velZ
  from DeLucia2006a
  where mag_b between -23 and -18
    and bulgeMass >= .9*stellarMass
    and snapnum = 50
32
4. Return the complete halo merger tree for a halo identified at z=0
33
Efficient storage of merger trees in a relational database
Goal: allow queries for the formation history of any object
No recursion possible in an RDB, nor desired
Method:
  – depth-first ordering of trees
  – label by rank in order
  – pointer to "last progenitor" below each node
  – all progenitors have a label BETWEEN the label of the root AND that of the last progenitor
  – cluster table on label
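The labelling scheme listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the loader used for the Millennium database: the function names label_tree and subtree, the dictionary-based tree and the node names A to F are all hypothetical. The recursion happens once, when labels are assigned; afterwards every "all progenitors of X" question is a plain range test.

```python
def label_tree(progenitors, root):
    """Assign depth-first labels and last-progenitor labels.

    progenitors maps a node to the list of its direct progenitors.
    Returns {node: (label, lastProgenitorLabel)}.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        my_label = counter          # rank of this node in depth-first order
        counter += 1
        for prog in progenitors.get(node, []):
            visit(prog)
        # The last label handed out below this node is its last progenitor.
        labels[node] = (my_label, counter - 1)

    visit(root)
    return labels


def subtree(labels, node):
    """All nodes whose label lies BETWEEN the node's label and its last progenitor."""
    lo, hi = labels[node]
    return sorted(n for n, (label, _) in labels.items() if lo <= label <= hi)


# Toy tree: A is the z=0 root; B and E merge into A; C and D into B; F into E.
tree = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F"]}
labels = label_tree(tree, "A")
print(labels)
print(subtree(labels, "B"))
```

A leaf ends up with a last-progenitor label equal to its own label, and storing rows in label order places every subtree in one contiguous block, which is what clustering the table on the label exploits.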
34
Merger trees
35
  select prog.snapnum, prog.x, prog.y, prog.np
  from millimil..mpahalo des,
       millimil..mpahalo prog
  where prog.haloId between des.haloId and des.lastProgenitorId
    and des.haloId = 0
36
Some more features of the merger tree data model
Leaves:
  select galaxyId as leaf
  from galaxies des
  where galaxyId = lastProgenitorId
Branching points:
  select descendantId
  from galaxies des
  where descendantId != -1
  group by descendantId
  having count(*) > 1
37
Main branches
Roots and leaves:
  select des.galaxyId as rootId,
         min(prog.lastprogenitorid) as leafId
  into rootLeaf
  from mpagalaxies..delucia2006a des,
       mpagalaxies..delucia2006a prog
  where des.galaxyId = 0
    and prog.galaxyId between des.galaxyId and des.lastProgenitorId
  group by des.galaxyId
Main branch:
  select rl.rootId, b.*
  from rootLeaf rl,
       mpagalaxies..delucia2006a b
  where b.galaxyId between rl.rootId and rl.leafId
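The leaf, branching-point and main-branch queries can be checked on a toy tree. The sketch below uses SQLite through Python's sqlite3 module, an assumption made for illustration. The table halo and its six rows are hypothetical, labelled in depth-first order with the first progenitor visited first. Because SQLite has no "select ... into", the root/leaf step is written as a sub-select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table halo (haloId integer primary key, "
             "descendantId integer, lastProgenitorId integer)")
# Toy tree in depth-first order: 0 <- {1, 4}, 1 <- {2, 3}, 4 <- {5}
conn.executemany("insert into halo values (?, ?, ?)", [
    (0, -1, 5),
    (1, 0, 3),
    (2, 1, 2),
    (3, 1, 3),
    (4, 0, 5),
    (5, 4, 5),
])

# Leaves: nodes that are their own last progenitor.
leaves = [r[0] for r in conn.execute(
    "select haloId from halo where haloId = lastProgenitorId order by haloId")]

# Branching points: descendants that have more than one direct progenitor.
branching = [r[0] for r in conn.execute("""
    select descendantId
    from halo
    where descendantId != -1
    group by descendantId
    having count(*) > 1
    order by descendantId""")]

# Main branch: labels between the root and the smallest lastProgenitorId
# in its tree, which is the first leaf reached in depth-first order.
main_branch = [r[0] for r in conn.execute("""
    select b.haloId
    from halo b,
         (select des.haloId as rootId,
                 min(prog.lastProgenitorId) as leafId
          from halo des, halo prog
          where des.haloId = 0
            and prog.haloId between des.haloId and des.lastProgenitorId
          group by des.haloId) rl
    where b.haloId between rl.rootId and rl.leafId
    order by b.haloId""")]

print(leaves, branching, main_branch)
```

The main-branch result is only meaningful if the progenitor regarded as the main one is always visited first when labels are assigned, which is the assumption built into the toy data.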
38
Find all halos in a subvolume of space:
  15 <= x <= 20
  20 <= y <= 25
   5 <= z <= 10
39
  select x, y, z
  from mpahalo
  where snapnum = 63
    and x between 10 and 20
    and y between 20 and 30
    and z between 0 and 10
Inefficient, even when indexed!
40
x          y          z
15.001083  42.471325  24.673561
15.001247  58.420914  42.722874
15.002215  38.042484  29.557423
15.002735  50.487785  57.716877
15.002753  20.000177  8.21466
15.005095  13.637599  16.135191
15.006593  22.170828  48.242783
15.011488  24.824438  19.773285
15.011741  48.099907  11.500685
15.011868  23.312265  27.858799
15.013065  23.969515  18.883507
15.013158  56.041866  40.82894
15.014361  59.503357  45.31733
15.017322  46.257664  44.37695
15.018202  27.333895  9.441319
41
Spatial indexes
Performance of finding things is improved if those things are co-located on disk: ordering, indices
Co-locating a 3D configuration of points on a 1D disk can only be done approximately
Space-filling curves: Peano-Hilbert, Z-curve
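As an illustration of the simpler of the two curves named above, the Z-curve key of a grid cell is obtained by interleaving the bits of its three integer cell indices. The sketch below is a generic version of that idea, not the key used in the Millennium database: the function names, the 10-unit cell size, the 10-bit depth and the choice of x as the lowest bit are all assumptions, and the sample points are invented.

```python
import math


def morton3(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative cell indices into one Z-curve key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)        # x takes the lowest bit of each triple
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def zcurve_key(x, y, z, cell=10.0):
    """Z-curve key of the grid cell that contains the point (x, y, z)."""
    return morton3(math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))


points = [(35.1, 7.0, 2.0), (7.6, 7.4, 7.65), (15.0, 22.0, 8.0), (7.9, 12.0, 3.0)]
# Sorting by key gives the 1D storage order along the curve.
ordered = sorted(points, key=lambda p: zcurve_key(*p))
for p in ordered:
    print(zcurve_key(*p), p)
```

Points in the same cell share a key, and neighbouring cells often, but not always, receive nearby keys, which is the approximate co-location the slide refers to.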
42
Zones
43
Zone index
Coarse sampling of points in multiple dimensions allows simple multi-dimensional ordering
  ix = floor(x/10Mpc)
  iy = floor(y/10Mpc)
  iz = floor(z/10Mpc)
  index on (snapnum,ix,iy,iz,x,y,z,galaxyId)
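A small experiment shows how the zone columns are used in a box query. The sketch below relies on SQLite through Python's sqlite3 module and on randomly generated positions; the table layout, the 30-unit toy box, the helper box_query and the index name zoneIdx are assumptions made for illustration. The query first restricts the integer zone columns, which an index on (snapnum, ix, iy, iz, ...) can match directly, and then applies the exact coordinate limits.

```python
import math
import random
import sqlite3

ZONE = 10.0   # zone width, the 10 Mpc of the slide
BOX = 30.0    # toy box size (assumption, kept small for the example)

rng = random.Random(42)
rows = []
for halo_id in range(5000):
    x, y, z = (rng.uniform(0.0, BOX) for _ in range(3))
    rows.append((halo_id, 63,
                 math.floor(x / ZONE), math.floor(y / ZONE), math.floor(z / ZONE),
                 x, y, z))

conn = sqlite3.connect(":memory:")
conn.execute("create table halo (haloId integer primary key, snapnum integer, "
             "ix integer, iy integer, iz integer, x real, y real, z real)")
conn.executemany("insert into halo values (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.execute("create index zoneIdx on halo (snapnum, ix, iy, iz, x, y, z)")


def box_query(lo, hi, snapnum=63):
    """Ids of halos with lo <= (x, y, z) <= hi, pre-filtered by zone."""
    params = [snapnum]
    for a, b in zip(lo, hi):                       # zone range per axis
        params += [math.floor(a / ZONE), math.floor(b / ZONE)]
    for a, b in zip(lo, hi):                       # exact limits per axis
        params += [a, b]
    sql = """
        select haloId from halo
        where snapnum = ?
          and ix between ? and ? and iy between ? and ? and iz between ? and ?
          and x between ? and ? and y between ? and ? and z between ? and ?
        order by haloId"""
    return [r[0] for r in conn.execute(sql, params)]


found = box_query((15, 20, 5), (20, 25, 10))
# Brute-force check in Python, without any zone information.
brute = sorted(r[0] for r in rows
               if 15 <= r[5] <= 20 and 20 <= r[6] <= 25 and 5 <= r[7] <= 10)
print(len(found), found == brute)
```

The zone limits are derived from the box limits with the same floor rule used to fill ix, iy and iz, so the pre-filter can never exclude a point that lies inside the box.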
44
IX  IY  IZ  X           Y          Z
1   2   0   15.061804   20.891907  4.4156647
1   2   0   15.069336   23.437601  9.812217
1   2   0   15.100678   20.905642  4.613036
1   2   0   15.173968   22.36883   8.01832
1   2   0   15.194122   20.67583   4.8034463
1   2   0   15.2500305  24.246683  1.6651521
1   2   0   15.365576   23.290754  9.404872
1   2   0   15.372606   20.203691  2.0006201
1   2   0   15.524696   21.03997   4.280077
1   2   0   15.583943   22.344622  9.421347
1   2   0   15.6358385  26.785904  9.881406
1   2   0   15.66383    22.829983  7.137772
1   2   0   15.673803   26.918291  3.302736
1   2   0   15.717824   22.365341  9.221828
1   2   0   15.847992   24.700747  1.389664
1   2   0   15.883896   22.593819  7.277129
1   2   0   15.91041    26.531118  2.5693457
1   2   0   15.916905   27.137867  4.289855
1   2   0   16.047333   28.93811   5.414605
45
Return the B-band luminosity function of galaxies residing in halos of mass between 10^13 and 10^14 solar masses.
  select .2*floor(5*g.mag_b) as magB,
         count(*) as num
  from DeLucia2006a g, MPAHalo h
  where g.haloId = h.haloId
    and h.m_TopHat between 1000 and 10000
    and h.redshift = 0
  group by .2*floor(5*g.mag_b)
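The expression .2*floor(5*mag_b) is what turns the grouped count into a histogram: it maps every magnitude to the lower edge of a bin 0.2 mag wide. The arithmetic can be followed in a short Python sketch. The function name mag_bin and the sample magnitudes are invented, and the rounding to one decimal is added here only to keep floating-point noise out of the printed bin edges.

```python
import math
from collections import Counter


def mag_bin(mag_b):
    """Lower edge of the 0.2 mag wide bin that contains mag_b."""
    # Same arithmetic as .2*floor(5*mag_b) in the SQL query.
    return round(0.2 * math.floor(5 * mag_b), 1)


# Invented sample magnitudes
mags = [-17.9, -17.95, -18.05, -20.7, -20.75]

# Equivalent of "count(*) ... group by .2*floor(5*mag_b)"
histogram = Counter(mag_bin(m) for m in mags)
for edge in sorted(histogram):
    print(edge, histogram[edge])
```

For example, -17.9 gives 5 * -17.9 = -89.5, the floor is -90, and 0.2 * -90 = -18.0, so the galaxy is counted in the bin from -18.0 to -17.8.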
46
14. Find the dependency of halo formation times on environment
47
  select zForm,
         avg(g5) as g5,
         avg(g10) as g10
  from MMField f,
       (select des.haloId, des.phkey,
               max(prog.redshift) as zForm
        from MPAHalo prog, MPAHalo des
        where des.snapnum = 63
          and prog.haloId between des.haloId and des.lastProgenitorId
          and prog.np >= des.np/2
          and des.np between 100 and 200
        group by des.haloId, des.phkey) t
  where t.phkey = f.phkey
    and f.snapnum = 63
  group by zForm
48
Tools
49
Other tools
wget, a UNIX/Linux command:
  wget "http://www.g-vo.org/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum,x,y,z,np from mpahalo"
Use in R (similar in IDL)...
TOPCAT
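The same HTTP request can be prepared from a script. The sketch below, in Python, only builds the request URL from the address and the two parameters shown in the wget example (action=doQuery and SQL); the function names are hypothetical, and nothing is assumed about whether the service still answers at this address or about the format of its reply. The download helper is defined but not called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.g-vo.org/Millennium"   # address as given on the slide


def build_query_url(sql, base_url=BASE_URL):
    """Return the doQuery URL for an SQL statement, with the SQL properly escaped."""
    return base_url + "?" + urlencode({"action": "doQuery", "SQL": sql})


def run_query(sql, timeout=60):
    """Fetch the raw text reply for a query (not called in this sketch)."""
    with urlopen(build_query_url(sql), timeout=timeout) as reply:
        return reply.read().decode("utf-8", errors="replace")


sql = "select top 10 haloid,snapnum,x,y,z,np from mpahalo"
url = build_query_url(sql)
print(url)
```

Escaping the statement with urlencode avoids the unquoted spaces and commas of a hand-written URL, which some clients reject.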
50
Thank you.