SimDB and SimTAP Dealing with a complex data model Gerard Lemson, Nara,
SimDB and SimDAL Protocols to support describing simulations –Simulation Data Model: Model for N-body 3+1D any simulations publishing simulations –Simulation Database (SimDB): protocol for accessing a database built according to SimDM. finding simulations –SimDB/TAP –queryData in SimDAL –SimTAP retrieving simulation data, whole, in parts, manipulated –SimDAL getData services (not in this talk) Btw: simulation can be –simulation run –simulation result –simulation data –post-processing of simulation results
SimDB/REST simple access to SimDB Uses XML representation of model –XML schema Examples –PDR –Gadget2 –TODO more (SVO) VO-URP –validator –upload –download
SimDB/TAP Model is complex –Too(?) complex for a trivial (parameter-based) query language –Need special navigation tools –Need a powerful query language Implement TAP on a database built according to SimDM Map UML to RDB model –TAP_SCHEMA for SimDM (old) –create table + inserts –VODataService VO-URP SQL query Not always easy!
Model complex Normalised (see image) General Abstract –e.g. parameters must be fully defined, no assumptions Hard to deal with quantities with a priori unknown units –ParameterSetting table has value AND unit attributes (Quantity datatype)
Example queries Find synthetic spectra of white dwarf stars Find cosmological simulations with Ω = 0.9, Ω_Λ = 0.7 and Ω_b = 0.02 Find all SPH simulations containing a galaxy cluster with mass around 10^14 M_sun
select e.* from experiment e, targetObject t, result r, product p where t.label='white_dwarf' and t.containerid=e.id and r.containerid=e.id and r.targetId=t.id and p.containerid=r.id and p.productType='spectrum'
Example queries Find synthetic spectra of white dwarf stars Find (cosmological) simulations with Ω = 0.9, Ω_Λ = 0.7 and Ω_b = 0.02 Find all SPH simulations containing a galaxy cluster with mass around 10^14 M_sun
select e.* from Experiment e, InputParameter ip1, ParameterSetting ps1, InputParameter ip2, ParameterSetting ps2, InputParameter ip3, ParameterSetting ps3 where ps1.containerId = e.id and ps1.parameterId = ip1.id and ip1.label = 'omega_lambda' and ps1.numericalValue_value=0.7 and ps2.containerId = e.id and ps2.parameterId = ip2.id and ip2.label = 'omega_baryon' and ps2.numericalValue_value=0.02 and ps3.containerId = e.id and ps3.parameterId = ip3.id and ip3.label = 'omega' and ps3.numericalValue_value=0.9
Example queries Find synthetic spectra of white dwarf stars Find (cosmological) simulations with Ω = 0.9, Ω_Λ = 0.7 and Ω_b = 0.02 Find all SPH simulations containing a galaxy cluster with mass around 10^14 M_sun
select e.* from Experiment e, ExperimentRepresentationObject ero, RepresentationObjectType rot, TargetObject tobj, Property p, StatisticalSummary s where ero.containerId = e.id and ero.typeId = rot.id and rot.label='sph.particle' and tobj.containerId = e.id and tobj.label = 'galaxy.cluster' and p.containerId = tobj.id and p.label='mass' and s.propertyId = p.id and s.statistic = 'value' and s.numericalValue_value=1e14 and s.numericalValue_unit='M_sun'
SELECT r.id as id, r.publisherdid as publisherdid, s0.numericValue_value as mass, s1.numericValue_value as x, s2.numericValue_value as y, s3.numericValue_value as z FROM result r, product o, statisticalsummary s0, property p0, statisticalsummary s1, property p1, statisticalsummary s2, property p2, statisticalsummary s3, property p3 WHERE r.containerid = 6 AND o.containerid = r.id and s0.containerid = o.id and s1.containerid = o.id and s2.containerid = o.id and s3.containerid = o.id and p0.publisherdid = 'mass' and s0.propertyid = p0.id and s0.statistic = 'nominal' and p1.publisherdid = 'x' and s1.propertyid = p1.id and s1.statistic = 'nominal' and p2.publisherdid = 'y' and s2.propertyid = p2.id and s2.statistic = 'nominal' and p3.publisherdid = 'z' and s3.propertyid = p3.id and s3.statistic = 'nominal' An example from Paris. Find typical values of the mass, x, y, z properties in a given simulation result
SELECT r.id as id, r.publisherdid, max(case when p.publisherdid = 'mass' and s.statistic = 'nominal' then s.numericValue_value else null end) as mass, max(case when p.publisherdid = 'x' and s.statistic = 'nominal' then s.numericValue_value else null end) as x, max(case when p.publisherdid = 'y' and s.statistic = 'nominal' then s.numericValue_value else null end) as y, max(case when p.publisherdid = 'z' and s.statistic = 'nominal' then s.numericValue_value else null end) as z FROM result r, product o, statisticalsummary s, property p WHERE r.containerid = 6 AND o.containerid = r.id and s.containerid = o.id and p.id = s.propertyid group by r.id, r.publisherdid, o.id
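The single-join pivot above can be tried on a toy database. The following is a minimal sketch, assuming an in-memory SQLite database with only the columns the query touches; the table contents and the `res-1` identifier are invented for illustration and are not taken from any real SimDB instance.

```python
import sqlite3

# Toy stand-in for the SimDB tables used by the pivot query (assumed columns only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (id INTEGER PRIMARY KEY, containerid INTEGER, publisherdid TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, containerid INTEGER);
CREATE TABLE property (id INTEGER PRIMARY KEY, publisherdid TEXT);
CREATE TABLE statisticalsummary (
    containerid INTEGER, propertyid INTEGER, statistic TEXT, numericValue_value REAL);
INSERT INTO result VALUES (1, 6, 'res-1');
INSERT INTO product VALUES (10, 1), (11, 1);
INSERT INTO property VALUES (100, 'mass'), (101, 'x'), (102, 'y'), (103, 'z');
INSERT INTO statisticalsummary VALUES
    (10, 100, 'nominal', 1.0e14), (10, 101, 'nominal', 0.1),
    (10, 102, 'nominal', 0.2),    (10, 103, 'nominal', 0.3),
    (11, 100, 'nominal', 2.0e13), (11, 101, 'nominal', 0.5),
    (11, 102, 'nominal', 0.6),    (11, 103, 'nominal', 0.7),
    (10, 100, 'max', 5.0e14);  -- non-nominal row, must be ignored by the pivot
""")

# One row per product: each CASE picks out one property, MAX collapses the group.
PIVOT_SQL = """
SELECT r.id, r.publisherdid, o.id AS product_id,
       MAX(CASE WHEN p.publisherdid = 'mass' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS mass,
       MAX(CASE WHEN p.publisherdid = 'x' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS x,
       MAX(CASE WHEN p.publisherdid = 'y' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS y,
       MAX(CASE WHEN p.publisherdid = 'z' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS z
FROM result r
JOIN product o ON o.containerid = r.id
JOIN statisticalsummary s ON s.containerid = o.id
JOIN property p ON p.id = s.propertyid
WHERE r.containerid = 6
GROUP BY r.id, r.publisherdid, o.id
ORDER BY o.id
"""
rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Compared with the eight-way self-join, this form joins each table once and adds one CASE expression per property, so it scales to more properties without adding joins.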
Conclusions Some queries can be phrased nicely Others can be written in standard SQL, but the level of normalisation and abstraction means MANY joins are required Can we simplify this a bit?
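[Figure: pivoting parameter rows into columns of simtap.Experiment]
ParameterSetting columns: containerId, value, unit, parameterId
InputParameter columns: id, name, label, datatype, description
Example cells shown: 456; omega_b, omega.baryon, real; omega_l, omega.lambda, real; omega, real, ...
simtap.Experiment columns: id, omega_b, omega_l, omega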
SimTAP When the Protocol is fixed, the TAP schema can be simplified –parameters become columns in the simtap.Experiment table –property characterisations become columns in product-specific characterisation table(s) –...
select e.* from Experiment e, InputParameter ip1, ParameterSetting ps1, InputParameter ip2, ParameterSetting ps2, InputParameter ip3, ParameterSetting ps3 where ps1.containerId = e.id and ps1.parameterId = ip1.id and ip1.label = 'omega_lambda' and ps1.numericalValue_value=0.7 and ps2.containerId = e.id and ps2.parameterId = ip2.id and ip2.label = 'omega_baryon' and ps2.numericalValue_value=0.02 and ps3.containerId = e.id and ps3.parameterId = ip3.id and ip3.label = 'omega' and ps3.numericalValue_value=0.9 Instead of this
this: select e.* from simtap.Experiment e where e.omegaLambda=0.7 and e.omegaBaryon=0.02 and e.omega=0.9
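To illustrate that the short query and the long join query can select the same experiments, here is a minimal sketch on an in-memory SQLite database. The view name `simtap_Experiment` stands in for `simtap.Experiment` (SQLite has no schema namespaces without attaching a database), and the experiment names and parameter values are invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Experiment (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE ParameterSetting (
    containerId INTEGER, parameterId INTEGER, numericalValue_value REAL);
INSERT INTO Experiment VALUES (1, 'run-A'), (2, 'run-B');
INSERT INTO InputParameter VALUES
    (1, 'omega_baryon'), (2, 'omega_lambda'), (3, 'omega');
INSERT INTO ParameterSetting VALUES
    (1, 1, 0.02), (1, 2, 0.7), (1, 3, 0.9),
    (2, 1, 0.04), (2, 2, 0.7), (2, 3, 1.0);

-- Hand-written pivot: one column per input parameter of the (assumed) protocol.
CREATE VIEW simtap_Experiment AS
SELECT e.id, e.name,
       MAX(CASE WHEN ip.label = 'omega_lambda' THEN ps.numericalValue_value END) AS omegaLambda,
       MAX(CASE WHEN ip.label = 'omega_baryon' THEN ps.numericalValue_value END) AS omegaBaryon,
       MAX(CASE WHEN ip.label = 'omega' THEN ps.numericalValue_value END) AS omega
FROM Experiment e
JOIN ParameterSetting ps ON ps.containerId = e.id
JOIN InputParameter ip ON ip.id = ps.parameterId
GROUP BY e.id, e.name;
""")

# Query against the normalised SimDB-style tables: two joins per parameter.
NORMALISED_SQL = """
SELECT e.id, e.name
FROM Experiment e,
     InputParameter ip1, ParameterSetting ps1,
     InputParameter ip2, ParameterSetting ps2,
     InputParameter ip3, ParameterSetting ps3
WHERE ps1.containerId = e.id AND ps1.parameterId = ip1.id
  AND ip1.label = 'omega_lambda' AND ps1.numericalValue_value = 0.7
  AND ps2.containerId = e.id AND ps2.parameterId = ip2.id
  AND ip2.label = 'omega_baryon' AND ps2.numericalValue_value = 0.02
  AND ps3.containerId = e.id AND ps3.parameterId = ip3.id
  AND ip3.label = 'omega' AND ps3.numericalValue_value = 0.9
"""

# The same question against the pivoted view: plain column predicates.
SIMPLE_SQL = """
SELECT e.id, e.name
FROM simtap_Experiment e
WHERE e.omegaLambda = 0.7 AND e.omegaBaryon = 0.02 AND e.omega = 0.9
"""

normalised_rows = conn.execute(NORMALISED_SQL).fetchall()
simple_rows = conn.execute(SIMPLE_SQL).fetchall()
print(normalised_rows, simple_rows)
```

Exact floating-point equality works here because the stored and queried literals are identical; a real service would probably want range predicates instead.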
Table definitions can be derived from a Protocol definition –input parameters –for each Representation object type, a table with statistical summaries of properties –target object type à la SimDM (units in ADQL required) pivoted per project? –input data sets (urls) Pivoting queries can be generated
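As a rough illustration of generating a pivoting query from a Protocol's input parameters, the sketch below builds a CREATE VIEW statement from a mapping of parameter labels to column names. The function `pivot_view_sql`, the mapping format and the view name `simtap_Experiment` are hypothetical, chosen only for this example; the table and column names follow the queries shown earlier, and the data are invented.

```python
import re
import sqlite3

# Labels and column names are interpolated into SQL, so restrict them to safe characters.
_SAFE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def pivot_view_sql(view_name, parameters):
    """Build a CREATE VIEW statement with one column per input parameter.

    parameters maps an InputParameter label to the output column name.
    """
    for name in [view_name, *parameters.keys(), *parameters.values()]:
        if not _SAFE.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ",\n".join(
        f"       MAX(CASE WHEN ip.label = '{label}' "
        f"THEN ps.numericalValue_value END) AS {column}"
        for label, column in parameters.items()
    )
    # LEFT JOINs keep experiments that lack a setting; the column is then NULL.
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT e.id,\n{columns}\n"
        "FROM Experiment e\n"
        "LEFT JOIN ParameterSetting ps ON ps.containerId = e.id\n"
        "LEFT JOIN InputParameter ip ON ip.id = ps.parameterId\n"
        "GROUP BY e.id"
    )


# Hypothetical protocol definition: parameter label -> column name.
protocol = {"omega_lambda": "omegaLambda", "omega_baryon": "omegaBaryon"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Experiment (id INTEGER PRIMARY KEY);
CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE ParameterSetting (
    containerId INTEGER, parameterId INTEGER, numericalValue_value REAL);
INSERT INTO Experiment VALUES (1), (2);
INSERT INTO InputParameter VALUES (1, 'omega_lambda'), (2, 'omega_baryon');
INSERT INTO ParameterSetting VALUES (1, 1, 0.7), (1, 2, 0.02), (2, 1, 0.7);
""")
view_sql = pivot_view_sql("simtap_Experiment", protocol)
conn.execute(view_sql)
pivoted = conn.execute(
    "SELECT id, omegaLambda, omegaBaryon FROM simtap_Experiment ORDER BY id"
).fetchall()
print(view_sql)
print(pivoted)
```

Whether such tables are materialised or exposed as views, and how units are carried along, is left open here, as on the slide.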
Proposal SimDAL services MAY include a SimTAP service 1 SimTAP schema per Protocol Each such schema contains –1 Experiment table with columns for parameters –>=1 Product tables with characterisation of properties –Possibly other tables from SimDB/TAP