1
Research Data Collections WG, RDA Plenary 9, Barcelona
2
Introduction: Goals and Deliverables
Our goal is to facilitate:
- cross-collection interoperability
- development of common tools and services for sharing and expanding data collections across repositories and disciplines
Our deliverables are:
- An abstract data model for data collections
- A Create/Read/Update/Delete/List (CRUD/L) API
Existing solutions are many, but none offers a full set of generic CRUD operations.
Individual API implementations will be able to declaratively express, via a standard set of capabilities, the operations available for their collections.
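As a rough illustration of declaratively expressed capabilities, the Python sketch below models a capabilities record and a guard that a client could consult before attempting an operation, instead of probing the server with a request that will fail. The field names (`is_ordered`, `membership_is_mutable`, `properties_are_mutable`, `max_length`) and the `check_add_member` helper are hypothetical assumptions, not names taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capabilities record a service might declare for one collection."""
    is_ordered: bool = False
    membership_is_mutable: bool = True
    properties_are_mutable: bool = True
    max_length: Optional[int] = None  # None means no declared upper bound


def check_add_member(caps: Capabilities, current_size: int) -> None:
    """Raise if the declared capabilities rule out adding one more member."""
    if not caps.membership_is_mutable:
        raise PermissionError("collection membership is immutable")
    if caps.max_length is not None and current_size >= caps.max_length:
        raise ValueError(f"collection is full (max_length={caps.max_length})")


# Hypothetical examples: a frozen collection and one limited to two members.
frozen = Capabilities(membership_is_mutable=False)
bounded = Capabilities(max_length=2)
check_add_member(bounded, current_size=1)  # allowed, since 1 < 2
```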
3
Current Status: Completed and In Progress
Definitions established and added to the RDA DFT Terms Tool
Data Model draft completed
API Specification draft completed
4 implementations in development
Initial data types registered in a DTR
4
API Specification
5
API: Service Features
GET /features
6
API: Collections - Create/Read/Update/Delete/List
LIST: GET /collections
CREATE: POST /collections
READ: GET /collections/{id}
UPDATE: PUT /collections/{id}
DELETE: DELETE /collections/{id}
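To make the mapping between routes and operations concrete, here is a minimal Python sketch of an in-memory store whose methods mirror the five collection routes. It is an illustration under assumptions: the class name `InMemoryCollections`, the record layout (a dict carrying an `id` key), and server-side UUID assignment are hypothetical choices, and a real implementation would sit behind an HTTP framework.

```python
import uuid
from typing import Any, Dict, List


class InMemoryCollections:
    """Hypothetical in-memory store; each method notes the HTTP route it stands in for."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def list_collections(self) -> List[Dict[str, Any]]:
        # LIST: GET /collections
        return list(self._store.values())

    def create(self, description: Dict[str, Any]) -> Dict[str, Any]:
        # CREATE: POST /collections (an id is generated if none is supplied)
        coll_id = description.get("id") or str(uuid.uuid4())
        if coll_id in self._store:
            raise ValueError(f"collection {coll_id} already exists")
        record = {**description, "id": coll_id}
        self._store[coll_id] = record
        return record

    def read(self, coll_id: str) -> Dict[str, Any]:
        # READ: GET /collections/{id}; a KeyError plays the role of a 404
        return self._store[coll_id]

    def update(self, coll_id: str, description: Dict[str, Any]) -> Dict[str, Any]:
        # UPDATE: PUT /collections/{id} replaces the stored description wholesale
        if coll_id not in self._store:
            raise KeyError(coll_id)
        record = {**description, "id": coll_id}
        self._store[coll_id] = record
        return record

    def delete(self, coll_id: str) -> None:
        # DELETE: DELETE /collections/{id}
        del self._store[coll_id]
```

Treating PUT as a full replacement (rather than a partial merge) is a design assumption here; it keeps the operation idempotent.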
7
API: Collections - Set Level Operations
Capabilities: GET /collections/{id}/capabilities
Match: POST /collections/{id}/ops/findMatch
Intersection: GET /collections/{id}/ops/intersection/{otherid}
Union: GET /collections/{id}/ops/union/{otherid}
Flatten: GET /collections/{id}/ops/flatten
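The set-level operations can be read as ordinary set algebra over member identifiers. The Python sketch below is one possible interpretation under stated assumptions: member lists are plain identifier lists, results keep first-seen order, intersection and union look only at direct members, and `flatten` recursively expands members that are themselves collections while skipping cycles. The sample data and function names are hypothetical, and `findMatch` is not modelled.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical sample data: collection id -> ordered list of member ids.
# A member id that is also a key of COLLECTIONS denotes a nested collection.
COLLECTIONS: Dict[str, List[str]] = {
    "quake-2016": ["wf-1", "wf-2", "station-A"],
    "station-A": ["wf-2", "wf-3"],
    "project-X": ["wf-2", "wf-4"],
}


def intersection(cid: str, other: str) -> List[str]:
    """Intersection: members of `cid` that are also direct members of `other`."""
    other_members = set(COLLECTIONS[other])
    return [m for m in COLLECTIONS[cid] if m in other_members]


def union(cid: str, other: str) -> List[str]:
    """Union: members of either collection, first-seen order, no duplicates."""
    seen: Set[str] = set()
    result: List[str] = []
    for m in COLLECTIONS[cid] + COLLECTIONS[other]:
        if m not in seen:
            seen.add(m)
            result.append(m)
    return result


def flatten(cid: str, visiting: Optional[FrozenSet[str]] = None) -> List[str]:
    """Flatten: replace nested collections by their leaf members, skipping cycles."""
    path = (visiting or frozenset()) | {cid}
    result: List[str] = []
    for m in COLLECTIONS[cid]:
        if m in COLLECTIONS:
            if m in path:
                continue  # a collection that (indirectly) contains itself
            leaves = flatten(m, path)
        else:
            leaves = [m]
        for leaf in leaves:
            if leaf not in result:
                result.append(leaf)
    return result
```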
8
API: Collection Member - CRUD/L
LIST: GET /collections/{id}/members
CREATE: POST /collections/{id}/members
READ: GET /collections/{id}/members/{mid}
UPDATE: PUT /collections/{id}/members/{mid}
DELETE: DELETE /collections/{id}/members/{mid}
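One way a server could map these member routes to handlers is a small table-driven matcher, sketched below in Python. The operation names (`list_members`, `read_member`, and so on) and the `dispatch` helper are hypothetical; the only assumption about the specification is that `{id}` and `{mid}` each occupy a single path segment.

```python
import re
from typing import Dict, Optional, Tuple

# The member routes listed on this slide, paired with hypothetical handler names.
ROUTES = [
    ("GET", "/collections/{id}/members", "list_members"),
    ("POST", "/collections/{id}/members", "create_member"),
    ("GET", "/collections/{id}/members/{mid}", "read_member"),
    ("PUT", "/collections/{id}/members/{mid}", "update_member"),
    ("DELETE", "/collections/{id}/members/{mid}", "delete_member"),
]


def _compile(template: str) -> "re.Pattern[str]":
    # Turn each "{name}" placeholder into a named group matching one path segment.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


COMPILED = [(method, _compile(template), op) for method, template, op in ROUTES]


def dispatch(method: str, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (operation name, path parameters), or None if no route matches."""
    for route_method, regex, op in COMPILED:
        match = regex.match(path)
        if route_method == method and match:
            return op, match.groupdict()
    return None
```

Because each placeholder matches a single segment, identifiers that contain `/` (for example, URIs used as member IDs) would have to be percent-encoded before being placed in the path, which is the URL-unfriendly URI issue raised under next steps.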
9
API: Collection Member - Property Operations
READ: GET /collections/{id}/members/{mid}/properties/{property}
UPDATE: PUT /collections/{id}/members/{mid}/properties/{property}
DELETE: DELETE /collections/{id}/members/{mid}/properties/{property}
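A single-collection Python sketch of these property operations follows. It assumes that each member's properties form a flat key/value map; the class name `MemberProperties` and the property names used in the example (`datatype`, `role`) are hypothetical.

```python
from typing import Any, Dict


class MemberProperties:
    """Hypothetical per-member property store for one collection: mid -> {property: value}."""

    def __init__(self) -> None:
        self._members: Dict[str, Dict[str, Any]] = {}

    def add_member(self, mid: str, **properties: Any) -> None:
        self._members[mid] = dict(properties)

    def properties(self, mid: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored map directly.
        return dict(self._members[mid])

    def read(self, mid: str, prop: str) -> Any:
        # READ: GET /collections/{id}/members/{mid}/properties/{property}
        return self._members[mid][prop]

    def update(self, mid: str, prop: str, value: Any) -> None:
        # UPDATE: PUT .../properties/{property} sets or replaces one value
        self._members[mid][prop] = value

    def delete(self, mid: str, prop: str) -> None:
        # DELETE: DELETE .../properties/{property} removes the property, not the member
        del self._members[mid][prop]
```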
10
Data Fabric: RDA Collections API, DTR and PIT
Some Collections API properties (at the service, collection, and member levels) should use registered data types.
Collections API implementations may delegate PID creation to a PIT API.
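Delegating PID creation can be pictured as dependency injection: the collection service asks an injected "minter" for an identifier and never builds PIDs itself. The Python sketch below assumes that design; `PidMinter`, `CountingMinter`, `CollectionService`, and the `example-prefix` value are hypothetical stand-ins, and a real deployment would plug in a client for an actual PIT or Handle service instead of the counting stub.

```python
import itertools
from typing import Dict, Protocol


class PidMinter(Protocol):
    """Hypothetical interface for whatever issues PIDs (e.g. a client for a PIT API)."""

    def mint(self, record: Dict[str, str]) -> str: ...


class CountingMinter:
    """Local stand-in minter for testing; the prefix is an arbitrary placeholder."""

    def __init__(self, prefix: str = "example-prefix") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)

    def mint(self, record: Dict[str, str]) -> str:
        return f"{self._prefix}/coll-{next(self._counter)}"


class CollectionService:
    """Creates collections but leaves identifier assignment to the injected minter."""

    def __init__(self, minter: PidMinter) -> None:
        self._minter = minter
        self.collections: Dict[str, Dict[str, str]] = {}

    def create(self, description: Dict[str, str]) -> str:
        pid = self._minter.mint(description)  # the delegation point
        self.collections[pid] = {**description, "id": pid}
        return pid
```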
11
Data Fabric: RDA Collections in Broker-Driven Workflow
12
Data Fabric: Collection Types and DTR - Current Status
Data types registered for the Collections API models
Mapping of properties to formal types has started
13
Implementations: REPTOR
Reptor ("Repository") is a data repository developed following some RDA recommendations.
Reptor turns a standard web server into a data repository.
Implements the DTR, DFT, and Collections WG recommendations.
Example instance at:
Collection API calls:
curl -X GET
curl -X GET ...
14
Implementations: Tufts (Python + LDP)
Python/Flask implementation for the Perseids Project at the Department of Classics, Tufts University
Main payload: linked data annotations
Multiple data backends: filesystem, RDF/LDP, MongoDB
Shared programming interface and data model
Demo endpoint:
Docker image:
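A shared programming interface over several storage backends is commonly expressed as an abstract base class. The Python sketch below shows that pattern with two hypothetical backends (in-memory and one-JSON-file-per-collection); it is not the Tufts code, the class and method names are assumptions, and the file backend assumes collection ids are safe to use as file names.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List


class Backend(ABC):
    """Hypothetical storage interface that every backend implements."""

    @abstractmethod
    def save(self, coll_id: str, record: Dict[str, Any]) -> None: ...

    @abstractmethod
    def load(self, coll_id: str) -> Dict[str, Any]: ...

    @abstractmethod
    def ids(self) -> List[str]: ...


class MemoryBackend(Backend):
    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def save(self, coll_id: str, record: Dict[str, Any]) -> None:
        self._data[coll_id] = dict(record)

    def load(self, coll_id: str) -> Dict[str, Any]:
        return dict(self._data[coll_id])

    def ids(self) -> List[str]:
        return sorted(self._data)


class FileBackend(Backend):
    """Stores one JSON file per collection inside a directory."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def save(self, coll_id: str, record: Dict[str, Any]) -> None:
        (self._root / f"{coll_id}.json").write_text(json.dumps(record))

    def load(self, coll_id: str) -> Dict[str, Any]:
        return json.loads((self._root / f"{coll_id}.json").read_text())

    def ids(self) -> List[str]:
        return sorted(p.stem for p in self._root.glob("*.json"))


def store_two(backend: Backend) -> List[str]:
    """The same calls work against any backend; only construction differs."""
    backend.save("c1", {"description": "annotations"})
    backend.save("c2", {"description": "texts"})
    return backend.ids()


with tempfile.TemporaryDirectory() as tmp:
    file_ids = store_two(FileBackend(Path(tmp)))
memory_ids = store_two(MemoryBackend())
```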
15
Use Cases: GEOFON
GEOFON (GFZ) is one of the biggest seismological data archives in Europe and a member of ORFEUS/EPOS-Seismology.
Context:
- Almost all seismological data centres expose an API for users to request data (an international standard).
- At GEOFON alone, we serve more than 6 million successful requests per year, which are created on the fly (not predefined).
- It is not trivial to create a predefined dataset of scientific interest to share with users (e.g. seismic waveforms related to an earthquake).
- It is also difficult to create very large datasets (e.g. all data from a station) because our services limit the size of requests.
- We cannot afford to keep a copy of every request made just so that it is frozen and (somehow) reproducible.
16
Use Cases: GEOFON
Aim:
- Be able to define datasets (collections) that point to data files in our archive, either via PIDs or via regular URLs.
- Create big datasets providing all the data related to a project, or different (overlapping) subsets of it.
- Keep a copy of the request definitions made by users so that datasets can be re-created if needed (e.g. if shared with another user). We want to be able to "reproduce" a dataset.
- Set up access restrictions on some of the datasets.
- Provide a single (and simple) link for each collection, so that users can download the whole collection with one click (an extension to the specification!).
- Support download resume (related to the previous point).
- Do it in as standard a way as possible (e.g. within this RDA WG).
Current Status:
The system is already implemented and being tested *today*. We expose around 6000 collections and more than 1.5 million members.
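One way to read the aim of keeping request definitions rather than data is to store each collection member as the parameters of a data request and rebuild the request URL on demand. The Python sketch below assumes the request API in question is the FDSN dataselect web service: `network`, `station`, `location`, `channel`, `starttime`, and `endtime` are standard FDSN dataselect parameter names, while the host in `DATASELECT_URL`, the class name, and the example station are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass
from urllib.parse import urlencode

# Hypothetical host; the path follows the FDSN dataselect convention.
DATASELECT_URL = "https://fdsn.example.org/fdsnws/dataselect/1/query"


@dataclass(frozen=True)
class RequestDefinition:
    """A collection member that stores the request, not the waveform data itself."""
    network: str
    station: str
    location: str
    channel: str
    starttime: str  # ISO 8601, e.g. "2016-11-13T11:00:00"
    endtime: str

    def to_url(self, base: str = DATASELECT_URL) -> str:
        # Field order is fixed, so equal definitions always yield identical URLs
        # and the same data can be re-requested later.
        return f"{base}?{urlencode(asdict(self))}"


# Hypothetical member: one hour of vertical-component data from station STA1.
example = RequestDefinition("GE", "STA1", "--", "BHZ",
                            "2016-11-13T11:00:00", "2016-11-13T12:00:00")
```

Storing a few hundred bytes of parameters per member, instead of the returned waveforms, is what makes keeping every definition affordable; whether the archived data behind the URL stays unchanged is a separate guarantee the archive has to provide.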
17
Use Cases: Perseids
18
Use Cases: DKRZ / Climate data management
Needed: a common collection builder solution across multiple use cases, e.g.:
- ESGF/CMIP6 dataset PID generator + custom collections
- Climate data processing services, together with CMCC (pre/post Ophidia calls, ECAS)
- Copernicus Data Services
Best option so far: a collection builder (client) library as a reusable open source component
- Extension of B2HANDLE/pyhandle or an add-on library
- Back end: Handle System + "x" (storage of the collection structure is still open)
- Front end: the use cases above, though not all are Python-based (e.g. CMIP6 is Java)
- PID Kernel Info profile, backed by the DTR, to be included
- Offered as a B2HANDLE service for long-term adoption by (unknown) others
- Server-side pathway, e.g. for B2SHARE?
19
Use Cases: CAU Kiel/IGSN
Kiel University is setting up the local IGSN registration service for samples in Kiel.
In a collaboration between the GEOMAR Helmholtz-Centre for Ocean Research (Central Sample Repository) and Kiel University, the aim is to create a central registration service for all samples.
A central allocating service will be available to clients from discipline-specific repositories.
The central service holds:
- the sample birth certificate
- collection information (?)
- a link to detailed record information
An initial inventory of samples was carried out at GEOMAR from 2014 to 2015.
Various kinds of possible and helpful collection scenarios are already available.
20
Use Cases: CAU Kiel/IGSN
[Figure: example collection hierarchies for samples, with collections such as Lab1, South Pacific, Seamount, Cruise1/Cruise2, Type1/Type2, Event, Storage, Section Half/liner, Core, and Geological age, each ultimately containing samples]
21
Use Cases: HTRC [ … Beth to Provide]
22
Next Steps: Finalize Data Model and API Specification
- Incorporate feedback from this session and from implementers
- Examine implications for existing collections solutions and repositories
- Write the final Specification Document
- Include best practices for workflows and for coordination with the RDA DTR and PID Types APIs
- Include best practices for using URL-unfriendly URIs
- Follow up with existing and potential adopters