Research Data Collections WG Plenary 9 Barcelona


Introduction: Goals and Deliverables
Our goal is to facilitate:
- cross-collection interoperability
- development of common tools and services for sharing and expanding data collections across repositories and disciplines
Our deliverables are:
- An abstract data model for data collections
- A Create/Read/Update/Delete/List (CRUD/L) API
  - Many solutions already exist, but none offers a full set of generic CRUD operations
  - Individual API implementations will be able to declaratively express, via a standard set of capabilities, the operations available for their collections
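The idea of declaring available operations through capabilities can be illustrated with a minimal sketch. The flag names (`membership_is_mutable`, `supports_set_operations`) and the operation names below are hypothetical, chosen only for illustration; the slides do not list the actual capability set.

```python
# Hypothetical mapping from client operations to the capability flag each one needs.
# None of these names come from the slides; they only illustrate the pattern.
REQUIRED_FLAG = {
    "add_member": "membership_is_mutable",
    "remove_member": "membership_is_mutable",
    "union": "supports_set_operations",
}


def is_allowed(operation, capabilities):
    """Return True if the advertised capabilities permit the operation."""
    flag = REQUIRED_FLAG.get(operation)
    if flag is None:
        # Operations with no listed requirement are treated as always permitted.
        return True
    return bool(capabilities.get(flag, False))


# Example: a read-only collection that still supports set operations.
read_only = {"membership_is_mutable": False, "supports_set_operations": True}
```

A client could fetch such a capabilities document once per collection and call `is_allowed` before attempting a write, instead of discovering the restriction through a failed request.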

Current Status: Completed and In Progress
- Definitions established and added to RDA DFT Terms Tool
- Data Model draft completed
- API Specification draft completed
- 4 implementations in development
- Initial datatypes registered in a DTR

API Specification

API: Service Features
- GET /features

API: Collections - Create/Read/Update/Delete/List
- GET /collections
- POST /collections
- GET /collections/{id}
- PUT /collections/{id}
- DELETE /collections/{id}
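As a rough sketch of how a client might map the collection routes onto HTTP requests, the helpers below build (but do not send) requests with Python's standard library. The base URL, the use of JSON request bodies, and the example payload are assumptions, not taken from the slides.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical service root, not from the slides


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for a Collections API route."""
    data = None
    headers = {"Accept": "application/json"}
    if body is not None:
        # Assumption: the service accepts JSON-encoded bodies.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def list_collections():
    return build_request("GET", "/collections")


def create_collection(description):
    return build_request("POST", "/collections", description)


def read_collection(cid):
    return build_request("GET", f"/collections/{cid}")


def update_collection(cid, description):
    return build_request("PUT", f"/collections/{cid}", description)


def delete_collection(cid):
    return build_request("DELETE", f"/collections/{cid}")
```

Passing any of these request objects to `urllib.request.urlopen` would send it; building and sending are kept separate here so the route mapping can be inspected without a running service.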

API: Collections - Set Level Operations
- Capabilities: GET /collections/{id}/capabilities
- Match: POST /collections/{id}/ops/findMatch
- Intersection: GET /collections/{id}/ops/intersection/{otherid}
- Union: GET /collections/{id}/ops/union/{otherid}
- Flatten: GET /collections/{id}/ops/flatten
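The slides name the set-level operations without defining their semantics. The sketch below shows one plausible reading on in-memory data, assuming a collection is an ordered list of member identifiers and that a member may itself be a collection; the sample identifiers are made up.

```python
def union(a, b):
    """Members found in either collection, in first-seen order, without duplicates."""
    return list(dict.fromkeys(a + b))


def intersection(a, b):
    """Members of `a` that also appear in `b`, in the order they occur in `a`."""
    other = set(b)
    return [m for m in a if m in other]


def flatten(cid, collections, _seen=None):
    """Recursively replace members that are collections with their own members.

    `collections` maps a collection id to its list of member ids. Collections
    already visited are skipped, which guards against cyclic nesting.
    """
    seen = _seen if _seen is not None else set()
    if cid in seen:
        return []
    seen.add(cid)
    result = []
    for member in collections[cid]:
        if member in collections:
            result.extend(flatten(member, collections, seen))
        else:
            result.append(member)
    return result


# Made-up sample data: a cruise collection that nests an event collection.
sample = {
    "cruise1": ["s1", "s2", "event1"],
    "event1": ["s3", "s4"],
    "cruise2": ["s2", "s5"],
}
```

With this data, `flatten("cruise1", sample)` expands the nested `event1` collection in place, while `union` and `intersection` treat `event1` as an ordinary member.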

API: Collection Member - CRUD/L
- LIST: GET /collections/{id}/members
- CREATE: POST /collections/{id}/members
- READ: GET /collections/{id}/members/{mid}
- UPDATE: PUT /collections/{id}/members/{mid}
- DELETE: DELETE /collections/{id}/members/{mid}

API: Collection Member - Property Operations
- READ: GET /collections/{id}/members/{mid}/properties/{property}
- UPDATE: PUT /collections/{id}/members/{mid}/properties/{property}
- DELETE: DELETE /collections/{id}/members/{mid}/properties/{property}
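To make the per-member property operations concrete, here is a minimal in-memory sketch of what a server might do behind the READ, UPDATE and DELETE routes. The class name, the storage layout, and the example property name `role` are hypothetical; the real data model defines its own property set.

```python
class MemberPropertyStore:
    """Toy backing store for member properties, keyed by (collection id, member id)."""

    def __init__(self):
        self._props = {}

    def update_property(self, cid, mid, name, value):
        """PUT: create or overwrite a single property of a member."""
        self._props.setdefault((cid, mid), {})[name] = value

    def read_property(self, cid, mid, name):
        """GET: return a single property; a KeyError here would map to a 404."""
        return self._props[(cid, mid)][name]

    def delete_property(self, cid, mid, name):
        """DELETE: remove a single property, leaving the member itself in place."""
        del self._props[(cid, mid)][name]


# Usage with made-up identifiers and a made-up property name.
store = MemberPropertyStore()
store.update_property("photos", "m1", "role", "cover-image")
```

Addressing properties individually lets a client change one attribute of a member without re-sending the whole member record.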

Data Fabric: RDA Collections API, DTR and PIT
- Some Collections API properties (Service/Collection/Member) should use registered data types
- Collections API implementations may delegate PID creation to a PIT API
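One way to picture delegated PID creation is to make the identifier source pluggable, as in the sketch below. The `CollectionFactory` class and the UUID-based fallback are assumptions for illustration only; a real deployment would pass in a function that calls its PIT API or Handle service.

```python
import uuid
from typing import Callable


def local_pid_minter() -> str:
    """Fallback minter: a random UUID URN standing in for a real PID service."""
    return f"urn:uuid:{uuid.uuid4()}"


class CollectionFactory:
    """Creates collection records and delegates identifier creation to a minter."""

    def __init__(self, mint_pid: Callable[[], str] = local_pid_minter):
        self._mint_pid = mint_pid

    def create(self, description: dict) -> dict:
        collection = dict(description)  # copy so the caller's dict is not modified
        if "id" not in collection:
            # Only ask the external service for a PID when none was supplied.
            collection["id"] = self._mint_pid()
        return collection
```

Because the minter is an ordinary callable, the same factory works unchanged whether identifiers come from a local generator or from a remote PID service.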

Data Fabric: RDA Collections in Broker-Driven Workflow

Data Fabric: Collection Types and DTR - Current Status
- Registered data types for Collection API Models
- Mapping of properties to formal types started

Implementations: REPTOR
- Reptor („Repository“) is a data repository developed in line with several RDA recommendations
- Reptor turns a standard web server into a data repository
- Implements DTR, DFT and Collection WG recommendations
- Example instances at:
  http://dft-rda.esc.rzg.mpg.de/reptor/
  http://reptor.thomas-zastrow.de/
- Collection API calls:
  curl -X GET http://dft-rda.esc.rzg.mpg.de/reptor/collections/api.php/collections
  curl -X GET http://dft-rda.esc.rzg.mpg.de/reptor/collections/api.php/collections/Photos/members
  ...

Implementations: Tufts (Python + LDP)
- Python/Flask implementation for the Perseids Project at the Department of Classics, Tufts University
- Main payload is linked data annotations
- Multiple data backends:
  - Filesystem
  - RDF/LDP
  - MongoDB
- Shared programming interface and data model
- Demo endpoint: http://collections.perseids.org/
- Docker image: http://hub.docker.com/fbaumgardt/rda-collections-api

Use Cases: GEOFON
GEOFON (GFZ) is one of the biggest seismological data archives in Europe and a member of ORFEUS/EPOS-Seismology.
Context:
- Almost all seismological data centres expose an API for users to request data (an international standard).
- At GEOFON alone, we have more than 6 million successful requests per year, which are created “on-the-fly” (not predefined).
- It is not trivial to create a predefined dataset of scientific interest to be shared with users (e.g. seismic waveforms related to an earthquake).
- It is also difficult to create huge datasets (e.g. all data from a station) because of the size limitations on requests to our services.
- We cannot afford to keep a copy of every request made in order for it to be frozen and (somehow) reproducible.

Use Cases: GEOFON
Aim
- Be able to define datasets (collections) that point to data files in our archive, either via PIDs or regular URLs.
- Create big datasets providing all the data related to a project, or different (overlapping) subsets of it.
- Keep a copy of the request definitions made by users so that datasets can be re-created if needed (e.g. if shared with another user). We want to be able to “reproduce” a dataset.
- Set up restrictions on access to some of the datasets.
- Provide a single (and simple) link for each collection, so that users can download the whole collection with one click (an extension to the specification!).
- Support download resume (related to the previous point).
- Do this in as standard a way as possible (e.g. within this RDA WG).
Current Status
- The system is already implemented and being tested *today*.
- We expose around 6000 collections and more than 1.5 million members.

Use Cases: Perseids

Use Cases: DKRZ / Climate data management
Needed: Common collection builder solution across multiple use cases, e.g.:
- ESGF/CMIP6 dataset PID generator + custom collections
- Climate data processing services - together with CMCC (pre/post Ophidia calls - ECAS)
- Copernicus Data Services
Best option so far: Collection builder (client) library as a reusable open source component
- Extension of B2HANDLE/pyhandle or add-on library
- Back end: Handle System + “x”: storage of collection structure still open
- Front end: The use cases from above - but not all are Python-based (e.g. CMIP6 is Java)
- PID Kernel Info profile and backing by DTR to be included
- Offering as a B2HANDLE service for long-term adoption by (unknown) others
- Server-side pathway for e.g. B2SHARE?

Use Cases: CAU Kiel/IGSN
- Kiel University is setting up the local IGSN registration service for samples in Kiel
- In a collaboration between the GEOMAR Helmholtz Centre for Ocean Research (Central Sample Repository) and Kiel University, the aim is to create a central registration service for all samples
- A central allocating service will be available to clients from discipline-specific repositories
- The central service holds:
  - the sample birth certificate
  - Collection information (?)
  - Link to detailed record information
- An initial inventory of samples was carried out at GEOMAR from 2014 to 2015
- Various kinds of possible and helpful collection scenarios are already available

Use Cases: CAU Kiel/IGSN
[Figure: diagram of sample collection scenarios. Labels: Lab1 Collection; South Pacific Collection; Seamount Collection; Cruise1 Collection; Cruise2 Collection; Type1 Collection; Type2 Collection; Event Collection; Storage Collection; Section; Half/liner; Core; Geological age Collection; Samples]

Use Cases: HTRC [ … Beth to Provide]

Next Steps
- Finalize Data Model and API Specification
  - Incorporate feedback from this session and implementers
  - Examine implications for existing collections solutions and repositories
- Write final Specification Document
  - Include best practices for workflows and coordination with RDA DTR and PID Types APIs
  - Include best practices for the use of URL-unfriendly URIs
- Follow up with existing and potential adopters
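The point about URL-unfriendly URIs arises because collection and member identifiers may themselves be URIs or PIDs containing `/` and `:`. The slides do not say which practice the WG will recommend; the sketch below shows percent-encoding each identifier as a single path segment, which is one common approach. The identifiers are made-up examples.

```python
from urllib.parse import quote, unquote


def member_path(collection_id, member_id):
    """Build a member route with both identifiers percent-encoded as path segments.

    safe="" makes quote() encode "/" as well, so an identifier that is itself
    a URI cannot be mistaken for additional path segments.
    """
    return "/collections/{}/members/{}".format(
        quote(collection_id, safe=""), quote(member_id, safe="")
    )


def decode_segment(segment):
    """Recover the original identifier from an encoded path segment."""
    return unquote(segment)
```

For example, `member_path("http://example.org/coll/1", "m 1")` yields `/collections/http%3A%2F%2Fexample.org%2Fcoll%2F1/members/m%201`. Some web servers and proxies reject or silently decode `%2F` in paths, so this approach needs to be checked against the deployment stack.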