SimDB and SimTAP: Dealing with a complex data model. Gerard Lemson, Nara, 2010-12-10.



SimDB and SimDAL
Protocols to support
describing simulations
  – Simulation Data Model: Model for N-body 3+1D any simulations
publishing simulations
  – Simulation Database (SimDB): protocol for accessing a database built according to SimDM
finding simulations
  – SimDB/TAP
  – queryData in SimDAL
  – SimTAP
retrieving simulation data, whole, in parts, manipulated
  – SimDAL getData services (not in this talk)
Btw: simulation can be
  – simulation run
  – simulation result
  – simulation data
  – post-processing of simulation results

SimDB/REST
simple access to SimDB
Uses XML representation of model
  – XML schema
Examples
  – PDR
  – Gadget2
  – TODO more (SVO)
VO-URP
  – validator
  – upload
  – download

SimDB/TAP
Model complex
  – Too(?) complex for trivial (parameter based) query language
  – Need special navigation tools
  – Need powerful query language
Implement TAP on database built according to SimDM
Map UML to RDB model
  – TAP_SCHEMA for SimDM (old)
  – create table + inserts
  – VODataService
VO-URP
SQL query
Not always easy!

Model complex
Normalised (see image)
General
Abstract
  – e.g. parameters must be fully defined, no assumptions
Hard to deal with quantities with a priori unknown units
  – ParameterSetting table has value AND unit attributes (Quantity datatype)

Example queries
Find synthetic spectra of white dwarf stars
Find cosmological simulations with Ω = 0.9, Ω_Λ = 0.7 and Ω_b = 0.02
Find all SPH simulations containing a galaxy cluster with mass around 10^14 M_sun

select e.*
from experiment e, targetObject t, result r, product p
where t.label = 'white_dwarf'
  and t.containerid = e.id
  and r.containerid = e.id
  and r.targetId = t.id
  and p.containerid = r.id
  and p.productType = 'spectrum'

Example query: Find (cosmological) simulations with Ω = 0.9, Ω_Λ = 0.7 and Ω_b = 0.02

select e.*
from Experiment e,
     InputParameter ip1, ParameterSetting ps1,
     InputParameter ip2, ParameterSetting ps2,
     InputParameter ip3, ParameterSetting ps3
where ps1.containerId = e.id
  and ps1.parameterId = ip1.id
  and ip1.label = 'omega_lambda'
  and ps1.numericalValue_value = 0.7
  and ps2.containerId = e.id
  and ps2.parameterId = ip2.id
  and ip2.label = 'omega_baryon'
  and ps2.numericalValue_value = 0.02
  and ps3.containerId = e.id
  and ps3.parameterId = ip3.id
  and ip3.label = 'omega'
  and ps3.numericalValue_value = 0.9
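A minimal sketch of this kind of parameter query, run with Python's built-in sqlite3 module rather than against a real TAP service. Table and column names follow the slide; the toy rows, the id values and the `name` column are assumptions made purely for illustration. Each ParameterSetting is joined to its own InputParameter row, so three constraints cost six extra tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Experiment (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE ParameterSetting (
    containerId INTEGER,          -- the Experiment that owns the setting
    parameterId INTEGER,          -- the InputParameter being set
    numericalValue_value REAL
);
-- Toy data (hypothetical): two runs that differ only in omega_baryon.
INSERT INTO Experiment VALUES (1, 'run_a'), (2, 'run_b');
INSERT INTO InputParameter VALUES
    (4, 'omega_baryon'), (5, 'omega_lambda'), (6, 'omega');
INSERT INTO ParameterSetting VALUES
    (1, 4, 0.02), (1, 5, 0.7), (1, 6, 0.9),
    (2, 4, 0.04), (2, 5, 0.7), (2, 6, 0.9);
""")

QUERY = """
SELECT e.id, e.name
FROM Experiment e,
     InputParameter ip1, ParameterSetting ps1,
     InputParameter ip2, ParameterSetting ps2,
     InputParameter ip3, ParameterSetting ps3
WHERE ps1.containerId = e.id AND ps1.parameterId = ip1.id
  AND ip1.label = 'omega_lambda' AND ps1.numericalValue_value = 0.7
  AND ps2.containerId = e.id AND ps2.parameterId = ip2.id
  AND ip2.label = 'omega_baryon' AND ps2.numericalValue_value = 0.02
  AND ps3.containerId = e.id AND ps3.parameterId = ip3.id
  AND ip3.label = 'omega' AND ps3.numericalValue_value = 0.9
"""
matches = conn.execute(QUERY).fetchall()
print(matches)
```

Only the run whose three settings all match is returned; exact equality on floating-point values works here only because the stored and queried literals are identical.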

Example query: Find all SPH simulations containing a galaxy cluster with mass around 10^14 M_sun

select e.*
from Experiment e, ExperimentRepresentationObject ero, RepresentationObjectType rot,
     TargetObject t, Property p, StatisticalSummary s
where ero.containerId = e.id
  and ero.typeId = rot.id
  and rot.label = 'sph.particle'
  and t.containerId = e.id
  and t.label = 'galaxy.cluster'
  and p.containerId = t.id
  and p.label = 'mass'
  and s.propertyId = p.id
  and s.statistic = 'value'
  and s.numericalValue_value = 1e14
  and s.numericalValue_unit = 'M_sun'
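The question asks for a mass "around" 10^14 M_sun, and the unit stored with each value is not known in advance. One possible way to handle both, sketched with Python's sqlite3, is to join against a small conversion table and compare within a range. The `UnitFactor` table, its contents, the tolerance and the toy rows are all hypothetical; none of them is part of SimDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StatisticalSummary (
    propertyId INTEGER, statistic TEXT,
    numericalValue_value REAL, numericalValue_unit TEXT);
-- Hypothetical helper: factor that converts a stored unit to M_sun.
CREATE TABLE UnitFactor (unit TEXT PRIMARY KEY, to_msun REAL);
INSERT INTO UnitFactor VALUES ('M_sun', 1.0), ('1e10 M_sun', 1e10);
INSERT INTO StatisticalSummary VALUES
    (1, 'value', 1.2e14, 'M_sun'),
    (2, 'value', 9.0e3,  '1e10 M_sun'),   -- i.e. 9e13 M_sun
    (3, 'value', 3.0e12, 'M_sun');
""")

def masses_around(conn, target_msun, rel_tol=0.5):
    """Return propertyIds whose mass lies within rel_tol of target_msun."""
    lower = target_msun * (1 - rel_tol)
    upper = target_msun * (1 + rel_tol)
    rows = conn.execute(
        """
        SELECT s.propertyId
        FROM StatisticalSummary s, UnitFactor u
        WHERE u.unit = s.numericalValue_unit
          AND s.statistic = 'value'
          AND s.numericalValue_value * u.to_msun BETWEEN ? AND ?
        ORDER BY s.propertyId
        """,
        (lower, upper),
    )
    return [r[0] for r in rows]

cluster_like = masses_around(conn, 1e14)
print(cluster_like)
```

The conversion join adds yet another table to an already long query, which illustrates why quantities with a priori unknown units are hard to deal with.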

An example from Paris. Find typical values of mass, x, y, z properties in a given simulation result:

SELECT r.id as id, r.publisherdid as publisherdid,
       s0.numericValue_value as mass,
       s1.numericValue_value as x,
       s2.numericValue_value as y,
       s3.numericValue_value as z
FROM result r, product o,
     statisticalsummary s0, property p0,
     statisticalsummary s1, property p1,
     statisticalsummary s2, property p2,
     statisticalsummary s3, property p3
WHERE r.containerid = 6
  AND o.containerid = r.id
  and s0.containerid = o.id
  and s1.containerid = o.id
  and s2.containerid = o.id
  and s3.containerid = o.id
  and p0.publisherdid = 'mass' and s0.propertyid = p0.id and s0.statistic = 'nominal'
  and p1.publisherdid = 'x' and s1.propertyid = p1.id and s1.statistic = 'nominal'
  and p2.publisherdid = 'y' and s2.propertyid = p2.id and s2.statistic = 'nominal'
  and p3.publisherdid = 'z' and s3.propertyid = p3.id and s3.statistic = 'nominal'

SELECT r.id as id, r.publisherdid,
       max(case when p.publisherdid = 'mass' and s.statistic = 'nominal' then s.numericValue_value else null end) as mass,
       max(case when p.publisherdid = 'x' and s.statistic = 'nominal' then s.numericValue_value else null end) as x,
       max(case when p.publisherdid = 'y' and s.statistic = 'nominal' then s.numericValue_value else null end) as y,
       max(case when p.publisherdid = 'z' and s.statistic = 'nominal' then s.numericValue_value else null end) as z
FROM result r, product o, statisticalsummary s, property p
WHERE r.containerid = 6
  AND o.containerid = r.id
  and s.containerid = o.id
  and p.id = s.propertyid
group by r.id, r.publisherdid, o.id
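A minimal sketch of the max(case ...) pivot on toy data, again using Python's sqlite3. Table and column names follow the slide; the rows, ids and the extra 'max' statistic are invented for illustration, and the added ORDER BY only makes the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE result (id INTEGER, containerid INTEGER, publisherdid TEXT);
CREATE TABLE product (id INTEGER, containerid INTEGER);
CREATE TABLE property (id INTEGER, publisherdid TEXT);
CREATE TABLE statisticalsummary (
    containerid INTEGER, propertyid INTEGER,
    statistic TEXT, numericValue_value REAL);
-- Toy data (hypothetical): one result with two products.
INSERT INTO result VALUES (10, 6, 'res10');
INSERT INTO product VALUES (100, 10), (101, 10);
INSERT INTO property VALUES (1, 'mass'), (2, 'x'), (3, 'y'), (4, 'z');
INSERT INTO statisticalsummary VALUES
    (100, 1, 'nominal', 1e14), (100, 1, 'max', 3e14),
    (100, 2, 'nominal', 1.0), (100, 3, 'nominal', 2.0), (100, 4, 'nominal', 3.0),
    (101, 1, 'nominal', 5e13),
    (101, 2, 'nominal', 4.0), (101, 3, 'nominal', 5.0), (101, 4, 'nominal', 6.0);
""")

PIVOT = """
SELECT r.id, r.publisherdid,
       MAX(CASE WHEN p.publisherdid = 'mass' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS mass,
       MAX(CASE WHEN p.publisherdid = 'x' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS x,
       MAX(CASE WHEN p.publisherdid = 'y' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS y,
       MAX(CASE WHEN p.publisherdid = 'z' AND s.statistic = 'nominal'
                THEN s.numericValue_value END) AS z
FROM result r, product o, statisticalsummary s, property p
WHERE r.containerid = 6
  AND o.containerid = r.id
  AND s.containerid = o.id
  AND p.id = s.propertyid
GROUP BY r.id, r.publisherdid, o.id
ORDER BY o.id
"""
rows = conn.execute(PIVOT).fetchall()
for row in rows:
    print(row)
```

Each product collapses to one row, and the non-nominal summary is ignored because its CASE branch yields NULL, which MAX skips.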

Conclusions
Some queries can be phrased nicely
Others can be phrased using standard SQL, but due to the level of normalisation and abstraction MANY joins are required
Can we simplify this a bit?

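Zoom: ParameterSetting and InputParameter versus simtap.Experiment

ParameterSetting: containerId | value | unit | parameterId

InputParameter: id | name | label | datatype | description
  4 | omega_b | omega.baryon | real |
  5 | omega_l | omega.lambda | real |
  6 | omega   |              | real |
  ...

simtap.Experiment: id | omega_b | omega_l | omega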

SimTAP
When Protocol is fixed, TAP schema can be simplified
  – parameters: columns in simtap.Experiment table
  – property characterisation: columns in product specific characterisation table(s)
  – ...

Instead of this:

select e.*
from Experiment e,
     InputParameter ip1, ParameterSetting ps1,
     InputParameter ip2, ParameterSetting ps2,
     InputParameter ip3, ParameterSetting ps3
where ps1.containerId = e.id
  and ps1.parameterId = ip1.id
  and ip1.label = 'omega_lambda'
  and ps1.numericalValue_value = 0.7
  and ps2.containerId = e.id
  and ps2.parameterId = ip2.id
  and ip2.label = 'omega_baryon'
  and ps2.numericalValue_value = 0.02
  and ps3.containerId = e.id
  and ps3.parameterId = ip3.id
  and ip3.label = 'omega'
  and ps3.numericalValue_value = 0.9

this:

select e.*
from simtap.Experiment e
where omegaLambda = 0.7 and omegaBaryon = 0.02 and omega = 0.9
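A flattened table like this could be produced from the normalised tables by pivoting once, after which the short query works directly. The sketch below uses Python's sqlite3, with an attached in-memory database standing in for a separate `simtap` schema; that stand-in, the toy rows and the label-to-column mapping are assumptions for illustration, not part of any SimTAP specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
ATTACH DATABASE ':memory:' AS simtap;   -- stand-in for a separate TAP schema
CREATE TABLE main.Experiment (id INTEGER PRIMARY KEY);
CREATE TABLE main.InputParameter (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE main.ParameterSetting (
    containerId INTEGER, parameterId INTEGER, numericalValue_value REAL);
-- Toy data (hypothetical).
INSERT INTO main.Experiment VALUES (1), (2);
INSERT INTO main.InputParameter VALUES
    (4, 'omega_baryon'), (5, 'omega_lambda'), (6, 'omega');
INSERT INTO main.ParameterSetting VALUES
    (1, 4, 0.02), (1, 5, 0.7), (1, 6, 0.9),
    (2, 4, 0.04), (2, 5, 0.7), (2, 6, 0.9);

-- Pivot: one row per experiment, one column per input parameter.
CREATE TABLE simtap.Experiment AS
SELECT e.id AS id,
       MAX(CASE WHEN ip.label = 'omega_lambda'
                THEN ps.numericalValue_value END) AS omegaLambda,
       MAX(CASE WHEN ip.label = 'omega_baryon'
                THEN ps.numericalValue_value END) AS omegaBaryon,
       MAX(CASE WHEN ip.label = 'omega'
                THEN ps.numericalValue_value END) AS omega
FROM main.Experiment e
JOIN main.ParameterSetting ps ON ps.containerId = e.id
JOIN main.InputParameter ip ON ip.id = ps.parameterId
GROUP BY e.id;
""")

simple = conn.execute(
    "SELECT e.* FROM simtap.Experiment e "
    "WHERE omegaLambda = 0.7 AND omegaBaryon = 0.02 AND omega = 0.9"
).fetchall()
print(simple)
```

The joins are paid once when the table is built, so users of the flattened schema no longer need to know about InputParameter or ParameterSetting at all.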

Table definitions can be derived
From a Protocol definition
  – input parameters
  – for each Representation object type a table with statistical summaries of properties
  – target object type à la SimDM (units in ADQL required) pivoted per project?
  – input data sets (urls)
Pivoting queries can be generated
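As a rough sketch of deriving table definitions, the Python below turns a heavily reduced protocol description into CREATE TABLE statements and checks them with sqlite3. The `PROTOCOL` dictionary layout, the `Halo` object type, the `_summary` table suffix and the `property_statistic` column naming are all hypothetical choices made for this example; a real implementation would read them from a SimDM Protocol document.

```python
import sqlite3

# Hypothetical, reduced stand-in for a SimDM Protocol definition.
PROTOCOL = {
    "input_parameters": {
        "omegaLambda": "REAL", "omegaBaryon": "REAL", "omega": "REAL",
    },
    "representation_object_types": {"Halo": ["mass", "x", "y", "z"]},
    "statistics": ["nominal", "min", "max"],
}

def table_definitions(protocol):
    """Return CREATE TABLE statements for one flattened schema."""
    # One Experiment table with a column per input parameter.
    param_cols = [
        f"{name} {sqltype}"
        for name, sqltype in protocol["input_parameters"].items()
    ]
    ddl = [
        "CREATE TABLE Experiment (id INTEGER PRIMARY KEY, "
        + ", ".join(param_cols) + ")"
    ]
    # One summary table per representation object type,
    # with a column per (property, statistic) pair.
    for rot, properties in protocol["representation_object_types"].items():
        cols = [
            f"{prop}_{stat} REAL"
            for prop in properties
            for stat in protocol["statistics"]
        ]
        ddl.append(
            f"CREATE TABLE {rot}_summary "
            "(productId INTEGER, experimentId INTEGER, "
            + ", ".join(cols) + ")"
        )
    return ddl

statements = table_definitions(PROTOCOL)
conn = sqlite3.connect(":memory:")
for stmt in statements:
    conn.execute(stmt)  # verifies that the generated DDL is valid SQL

experiment_cols = [r[1] for r in conn.execute("PRAGMA table_info(Experiment)")]
halo_cols = [r[1] for r in conn.execute("PRAGMA table_info(Halo_summary)")]
print(experiment_cols)
print(halo_cols)
```

The same protocol description could also drive generation of the pivoting queries that populate these tables.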

Proposal
SimDAL services MAY include a SimTAP service
1 SimTAP schema per Protocol
Each such schema contains
  – 1 Experiment table with columns for parameters
  – >=1 Product tables with characterisation of properties
  – Possibly other tables from SimDB/TAP