Metadata Concepts / Use in Climate Research Stephan Kindermann, Martina Stockhause German Climate Computing Center (DKRZ) Hamburg, Germany.

1 Metadata Concepts / Use in Climate Research Stephan Kindermann, Martina Stockhause German Climate Computing Center (DKRZ) Hamburg, Germany

2 Overview  Metadata descriptions: sources, usage  data level, preservation level, model level, domain knowledge level  Metadata standards, IT-principles

3 A) Metadata descriptions: sources, usage  (I) Data Description Level: source: model run output; format: GRIB, NetCDF-3/4 container formats (including basic metadata); metadata homogenization (Climate and Forecast (CF) Convention conformance, CMOR2 compliance, controlled vocabularies); usage: analysis tools, data access scripts, data search (see the "linked data" principle)  (II) Data Preservation Level: target: legacy data centers (e.g. WDCC); format: internal DB, various external formats, e.g. ISO 19139, DIF, ...; usage: long-term data storage and access, citation, e.g. using DOIs

4 A) Metadata descriptions: sources, usage  (III) Model Description Level: source: researcher interviews, online questionnaire; format: CIM (Common Information Model, from the Metafor FP7 project "Common Metadata for Climate Modelling Digital Repositories"; CONCIM: UML, APPCIM: XSD + vocabularies); usage: model intercomparison, scientific portals, information space browsing / search  (IV) Semantic Annotation Level: source: data metadata, model metadata, domain knowledge metadata; format: OWL (RDF); usage: user navigation in portals, "faceted search", etc.; deployments: Earth System Grid CMIP5 portal, IS-ENES portal

5 Short background info: The Fifth Coupled Model Intercomparison Project (CMIP5). Sponsored by the WMO WGCM. Quality-controlled data to (eventually) appear in the IPCC Data Distribution Centre. Worldwide effort to build a data management infrastructure with a consistent workflow from producers to consumers. In numbers: simulations: ~90,000 years; ~60 experiments; ~20 modelling centres using ~30 major(*) model configurations; ~2 million output datasets; tens of petabytes of output; ~2 petabytes of CMIP5 requested output; ~1 petabyte of CMIP5 "replicated" output, which will be replicated at BADC & DKRZ, to arrive in 2010/2011; ~10 TB of land-biochemistry output (from the long-term experiments alone).

6 B) Metadata standards, IT principles  (I) Data Description Level: GRIB, NetCDF data containers (tens of PBytes): metadata + data. File naming convention based on CVs, building uniform URIs (DRS, Data Reference Syntax): Activity/Product/Institute/Model/Exp/frequ/realm/Variable/ensemble. Data servers, MD catalogue servers. Example: wget http://server.org/Activity/Product/../ensemble  Enabling "linked data"
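The DRS idea on this slide (a fixed sequence of controlled-vocabulary components that forms a uniform URI) can be sketched as follows. This is a minimal illustration, not the official DRS definition: the component order is copied from the slide, the tiny controlled vocabularies are invented for the example, and `build_drs_url` is a hypothetical helper name. The authoritative component list and vocabularies are defined by the CMIP5 DRS document, which this sketch does not reproduce.

```python
from urllib.parse import quote

# Component order as listed on the slide (not checked against the DRS document).
DRS_COMPONENTS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "variable", "ensemble",
)

# Deliberately tiny, illustrative controlled vocabularies (assumed values).
CONTROLLED_VOCABS = {
    "frequency": {"day", "mon", "yr"},
    "realm": {"atmos", "ocean", "land"},
}


def build_drs_url(base_url, **facets):
    """Join DRS components into one URL, validating against the toy CVs."""
    missing = [c for c in DRS_COMPONENTS if c not in facets]
    if missing:
        raise ValueError(f"missing DRS components: {missing}")
    for name, allowed in CONTROLLED_VOCABS.items():
        if facets[name] not in allowed:
            raise ValueError(f"{name}={facets[name]!r} is not in the controlled vocabulary")
    # Percent-encode each component so it stays a single path segment.
    path = "/".join(quote(str(facets[c]), safe="") for c in DRS_COMPONENTS)
    return base_url.rstrip("/") + "/" + path


url = build_drs_url(
    "http://server.org",
    activity="CMIP5", product="output", institute="MPI-M", model="MPI-ESM-LR",
    experiment="historical", frequency="mon", realm="atmos",
    variable="tas", ensemble="r1i1p1",
)
print(url)
```

Because every dataset gets a predictable address, a client can construct the URL from search facets alone, which is what makes the `wget` style of access on the slide possible.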

7 B) Metadata standards, IT principles  (II) Data Preservation Level: WDCC metadata concept. Diagram labels: CERA2 DB schema; OWL conceptual model; tape archive; search API; QC, DOI assignment, ...; CERA GUI; IS-ENES portal; ...; OAI-PMH; ISO 19139; ... Scalability, sustainability, common CV, flexibility, user-friendly GUIs.

8 B) Metadata standards, IT principles (III) Model Description Level: Metafor FP7 project: Common Information Model (CIM)  Formal metadata model of the climate modelling process  It includes descriptions of the experiments being undertaken, the simulations being run in support of these experiments, the software models and tools being used to implement the simulations and the data generated by the software.  CMIP5 use case: CV collection, CMIP5 questionnaire

9 Metafor CIM overview. Diagram labels: CONCIM (UML); automatic translation; APPCIM (XSD); CIM instances (interlinked XML files); ISO, Geography Markup Language (GML) series; CMIP5 portal(s); IS-ENES portal; Metafor catalogue

10 Metadata collection

11

12 Automatic XML-to-RDF translation. Diagram labels: CMIP5 gateway(s); IS-ENES portal (IS-ENES: Infrastructure for the European Network for Earth System Modelling); ESG OWL instances
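As a rough illustration of what an automatic XML-to-RDF translation step might look like, the sketch below turns a small XML description of a model component into N-Triples lines. Everything specific here is an assumption: the XML element names are invented and do not follow the real CIM schema, the `http://example.org/isenes#` namespace is a placeholder, and `component_to_ntriples` is a hypothetical helper. Only the `rdf:type`, `rdfs:label`, and `rdfs:comment` IRIs are standard W3C vocabulary.

```python
import xml.etree.ElementTree as ET

ISENES = "http://example.org/isenes#"  # placeholder namespace (assumption)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Invented input document; NOT the real CIM XML layout.
SAMPLE_XML = """
<modelComponent id="echam">
  <shortName>ECHAM</shortName>
  <description>Global circulation model</description>
  <responsibleParty ref="dkrz"/>
</modelComponent>
""".strip()


def literal(text):
    """Render a plain string literal with basic N-Triples escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def component_to_ntriples(xml_text):
    """Translate one component description into N-Triples lines."""
    root = ET.fromstring(xml_text)
    subject = f"<{ISENES}{root.attrib['id']}>"
    triples = [(subject, f"<{RDF_TYPE}>", f"<{ISENES}ComponentModel>")]
    name = root.findtext("shortName")
    if name:
        triples.append((subject, f"<{RDFS}label>", literal(name)))
    description = root.findtext("description")
    if description:
        triples.append((subject, f"<{RDFS}comment>", literal(description)))
    for party in root.findall("responsibleParty"):
        obj = f"<{ISENES}{party.attrib['ref']}>"
        triples.append((subject, f"<{ISENES}hasResponsible>", obj))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


ntriples = component_to_ntriples(SAMPLE_XML)
print(ntriples)
```

A production translation would more likely be driven by the schema itself (for example via XSLT or generated bindings) rather than hand-written per element; the sketch only shows the mapping idea.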

13 (CON)CIM overview. Diagram labels: Quality; ISO; Shared; Data; Activity (simulations in support of experiments); Software (hierarchical model components, coupled together); Grids

14 B) Metadata standards, IT principles  (IV) Semantic Annotation Level. Diagram labels: CIM XML; RDF; data object XML; community content; content management system; RDF triple store; portal(s): ESG gateways, IS-ENES portal; evolving OWL model; triple store; relational DB. OWL ontologies: http://ontologies.ucar.edu/owl

15 CMIP5 Quality Control. Diagram labels: files: data, metadata, CIM metadata; data in prescribed DRS syntax; data quality checks L2; MD quality checks L2; THREDDS Data Server: MD on data; Metafor / CIM questionnaire: MD on model + simulation; QC DB: quality MD; metadata repository: data MD, information MD

16 CMIP5 STD-DOI Publication. Diagram labels: TIB: DOI registration agency; data node; metadata; THREDDS Data Server: MD on data; QC DB: quality MD; data MD, information MD; filesystem: data; long-term archive: data; quality checks L3: double check, cross checks; STD-DOI catalogue: STD-DOI MD, information MD; WDCC: DOI publication agent; DOI target page: access to data and metadata; Metafor / CIM: MD on model + simulation + data + quality

17 B) Metadata standards, IT principles  (IV) Semantic Annotation Level. Diagram labels: CIM XML; RDF; data object XML; community content; content management system; RDF triple store; portal(s): ESG gateways, IS-ENES portal; evolving OWL model; triple store; relational DB. OWL ontologies: http://ontologies.ucar.edu/owl

18 IS-ENES Info Portal

19

20
2010-07-07 16:49:13 INFO triplestorefill.utility Adding item with ID echam at http://localhost:8080/test7/echam
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector Storing RDF... (1118 byte)
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector RDF data:
@prefix foaf:.
@prefix owl:.
@prefix rdfs:.
@prefix dc:.
@prefix rdf:.
@prefix xsd:.
@prefix isenes:.
isenes:echam rdf:type isenes:ComponentModel.
isenes:echam foaf:page.
foaf:topic isenes:echam.
isenes:echam dc:title "ECHAM".
isenes:echam rdfs:label "ECHAM".
isenes:echam rdfs:comment "Global circulation model".
isenes:dkrz isenes:isResponsibleFor isenes:echam.
isenes:echam isenes:hasResponsible isenes:dkrz.
isenes:joachim-biercamp rdfs:label "Joachim Biercamp".
isenes:joachim-biercamp rdf:type foaf:Person.
isenes:dkrz rdfs:label "DKRZ".
isenes:dkrz rdf:type foaf:Organization.
isenes:joachim-biercamp isenes:isMemberOf isenes:dkrz.
isenes:dkrz isenes:hasMember isenes:joachim-biercamp.
isenes:dkrz dc:title "DKRZ".
isenes:joachim-biercamp foaf:mbox "biercamp@dkrz.de"
"save" Triple Store
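The log stores each relationship in both directions (`isResponsibleFor` / `hasResponsible`, `isMemberOf` / `hasMember`). A minimal sketch of that pattern is shown below, using a plain in-memory set of triples as a stand-in for the Sesame triple store. The class `TinyTripleStore`, its methods, and the `INVERSE` table are hypothetical names introduced for illustration; only the property and resource names are taken from the log, and the prefixed names are kept as plain strings because the namespace IRIs are not shown in the transcript.

```python
# Pairs of inverse properties as they appear in the log output.
INVERSE = {
    "isenes:isResponsibleFor": "isenes:hasResponsible",
    "isenes:hasResponsible": "isenes:isResponsibleFor",
    "isenes:isMemberOf": "isenes:hasMember",
    "isenes:hasMember": "isenes:isMemberOf",
}


class TinyTripleStore:
    """In-memory stand-in for a triple store (not the Sesame API)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        """Store a triple and, if the predicate has an inverse, the mirrored one."""
        self.triples.add((subject, predicate, obj))
        inverse = INVERSE.get(predicate)
        if inverse is not None:
            self.triples.add((obj, inverse, subject))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TinyTripleStore()
store.add("isenes:echam", "rdf:type", "isenes:ComponentModel")
store.add("isenes:dkrz", "isenes:isResponsibleFor", "isenes:echam")
store.add("isenes:joachim-biercamp", "isenes:isMemberOf", "isenes:dkrz")

# Everything known about ECHAM, including the automatically added inverse.
for triple in store.match(subject="isenes:echam"):
    print(triple)
```

Materializing both directions at write time keeps portal queries simple ("what is related to this page?") at the cost of storing each link twice; an OWL reasoner with `owl:inverseOf` declarations would be the alternative.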

21 (B) From a user's perspective. Figure: Plone page with "related info" portlet

22 (B) From a user's perspective. Figure: Plone page after clicking the "related" link: faceted search
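Faceted search, as offered by the portal, boils down to two operations: counting how many records fall under each value of a facet, and narrowing the result set when the user selects a value. The sketch below shows both on a handful of invented metadata records. The records, field names, and the helpers `facet_counts` and `filter_records` are assumptions for illustration and do not reflect the portal's actual implementation.

```python
from collections import Counter

# Invented sample records; field names loosely follow the DRS components.
RECORDS = [
    {"model": "MPI-ESM-LR", "experiment": "historical", "realm": "atmos", "variable": "tas"},
    {"model": "MPI-ESM-LR", "experiment": "rcp45", "realm": "atmos", "variable": "pr"},
    {"model": "HadGEM2-ES", "experiment": "historical", "realm": "ocean", "variable": "tos"},
]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


def filter_records(records, **selected):
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


# The user clicks "historical"; the remaining facets are recounted on the hits.
hits = filter_records(RECORDS, experiment="historical")
print(facet_counts(hits, "realm"))
```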

23 Summary: The international CMIP5 / IPCC effort is the key driver for the collection and standardization of CVs, metadata, and conceptual models (ontologies). Metadata is mainly used for model intercomparison, uniform data search / access, and data processing.  Prepare for climate impact community use cases!

24 Workshop reminder: we would like the presenters to focus on a few points, allowing all of us to draw conclusions at the end:
- Usage and quality of the descriptive, keyword type of metadata used in your domain to manage data.
- Types of usage of this metadata (management, retrieval, research statistics, machine processing, etc.).
- The standards used for your metadata descriptions (structure, elements, vocabularies).
- Adherence to common IT principles (explicit syntax, registered semantics, use of PIDs, etc.).
- Compliance with the recommendations to be found in the report of the e-IRG task force on Data Management: http://www.e-irg.eu/publications/e-irg-task-force-reports.html

25 Methodology to create CMIP5 CIM instances

26 Motivation: Virtual Earth System Modeling Resource Centre portal. Producers: providers of models, tools, model results, HPC ecosystem, Grid, ..., community. Consumers: ENES community, impact community. E-infrastructure components: ticketing, collaboration, metadata (CIM, ...), protocols, APIs, AAI, CMIP5/AR5/+ data services. Governance: agreements, commitments, sociology, ...

27 IS-ENES vERC Portal (columns: requirement; e-infra component; technology used)
(A) Community info presentation (models, tools, descriptions, ...); Content Management System (CMS, collaboration tool); Plone + IS-ENES "content-types"
(B) Community development support; project management / ticketing tool; Redmine
(C) Data portal to AR5 archives; web framework; Zope/Plone plugin(s)
(D) CIM metadata; (external) Metafor service(s), (external) ESG gateway
(E) External content / metadata collection; web service (proxies), info (XML) harvester; Python info collector based on Atom, OAI-PMH, ... protocols
(F) Additional value provisioning ("cross-selling"); semantic interlinking; RDF triple store (Sesame)
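The harvesting component in row (E) is described only as a Python info collector speaking Atom and OAI-PMH. The sketch below shows what the OAI-PMH side of such a collector might look like: it builds a `ListRecords` request URL and extracts record identifiers from a response. The `verb` and `metadataPrefix` parameters and the XML namespace come from the OAI-PMH 2.0 protocol; the endpoint `http://example.org/oai`, the sample response, and the function names are invented, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, heavily shortened response used instead of a live endpoint.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:echam</identifier></header></record>
    <record><header><identifier>oai:example.org:mpiom</identifier></header></record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Extract the identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:record/oai:header/oai:identifier"
    return [element.text for element in root.findall(path, OAI_NS)]


print(list_records_url("http://example.org/oai"))
print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets and would handle protocol-level error responses; both are omitted here to keep the sketch short.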

