Published by Dana King. Modified over 5 years ago.
1
Leveraging PIDs for object management in data infrastructures
RDA UK Node Workshop, July 16 2019
Tobias Weigel (DKRZ) - @resdatall CC BY-SA 4.0
2
https://rd-alliance.org/ - https://twitter.com/resdatall
Motivational use case
CMIP6: Coupled Model Intercomparison Project, phase 6
Data infrastructure support: Earth System Grid Federation (ESGF)
Automated assignment of Handle PIDs as part of e-infrastructure workflow
Specific feature requested by users to improve:
  tracking of data versions and replicas
  referenceability before formal publication
[Figure: ESGF nodes]
Balaji, V., Taylor, K., Juckes, M. et al. (2018): Requirements for a global data infrastructure in support of CMIP6. Geosci. Model Dev. doi: /gmd -
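As a rough illustration of how version and replica information could be read from a Handle record, the sketch below parses a response shaped like the JSON returned by the Handle proxy REST API (`https://hdl.handle.net/api/handles/<handle>`). The handle name, the field types other than `URL`, and all values are assumptions made up for this example; they are not the CMIP6 or ESGF record layout.

```python
import json

# Hard-coded sample in the general shape of a Handle proxy REST response.
# The handle, the type names VERSION / PREVIOUS_VERSION / REPLICA_URL and
# all values are hypothetical and only serve the illustration.
SAMPLE_RESPONSE = json.dumps({
    "responseCode": 1,
    "handle": "21.EXAMPLE/file-v2",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://example.org/data/file_v2.nc"}},
        {"index": 2, "type": "VERSION",
         "data": {"format": "string", "value": "2"}},
        {"index": 3, "type": "PREVIOUS_VERSION",
         "data": {"format": "string", "value": "21.EXAMPLE/file-v1"}},
        {"index": 4, "type": "REPLICA_URL",
         "data": {"format": "string", "value": "https://mirror.example.org/data/file_v2.nc"}},
    ],
})


def record_to_fields(response_text):
    """Group the values of a Handle record by their type name."""
    response = json.loads(response_text)
    if response.get("responseCode") != 1:  # 1 signals success
        raise ValueError(f"handle lookup failed: {response.get('responseCode')}")
    fields = {}
    for entry in response["values"]:
        # A type may occur several times (e.g. multiple replicas), so keep lists.
        fields.setdefault(entry["type"], []).append(entry["data"]["value"])
    return fields


fields = record_to_fields(SAMPLE_RESPONSE)
print(fields["VERSION"], fields["PREVIOUS_VERSION"], fields["REPLICA_URL"])
```

A client could follow the hypothetical `PREVIOUS_VERSION` entry from record to record to reconstruct a version history without touching the data files themselves.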
3
Motivational use case: basic workflow
[Workflow diagram; labels: HPC, QC, early sharing, community review, B2FIND, PID pages, tools]
4
FAIR Digital Object Management
[Figure label: CRUD operations]
What shall we do if number, variety and complexity of objects increase?
Make tools and services more efficient and effective – create the Intelligent Data Fabric
Automated management of data objects and their metadata
Early-to-mid data life cycle – pre-publication
Record connections between (meta)data objects, software, workflows, ...
5
Combining multiple RDA groups and recommendations
Data Fabric IG: Architectural vision for DO management
  PIDs as key element: e-infrastructure layer of stable references
PID Kernel Information WG: Required metadata to enable machine actions
  Based on work of earlier PID Information Types WG
Data Type Registries WG / Data Typing WG: Register (meta)data types and provide machine-oriented API
  Enable service chaining/orchestration
Research Data Collections WG: CRUD operations for managing groups of objects
6
Data Fabric IG
Group has existed formally since P4
Concerned with cross-group/discipline infrastructural challenges
Co-chairs: LI Jianhui (CNIC, CAS), Robert Quick (IU), Tobias Weigel (DKRZ)
Aims to promote the creation of an ecosystem of reusable components and approaches built with them
Recognizes that challenges are driven by community demand, but overarching solutions provide larger benefits
Based on an understanding of Digital Objects and PIDs as foundational technologies
7
RDA Recommendation on PID Kernel Information
7 Guiding Principles for PID Kernel Information
  Independent of specific infrastructure or technologies
  Geared towards minimizing human interaction, long-term stability of processes relying on Kernel Information
Draft Kernel Information profile
  15 elements, 6 aligned with W3C PROV
Exemplary high-level architecture
Use cases and community adoption
  x, y, z

Principle 1: The primary purpose of a PID KI record is to serve machine actionable services.
Principle 2: The PID KI record is a non-authoritative source for arbitrary metadata. If the information for an attribute duplicates metadata maintained elsewhere, the external source is the authority.
Principle 3: PID Kernel Information is stored directly at the resolving service and not referenced.
Principle 4: A PID KI record can be changed only by the data object owner or owner delegate (e.g., PID record manager).
Principle 5: PID KI record values should change infrequently with update initiated only by an appropriate authority, avoiding human interaction on updates where possible.
Principle 6: Attributes (items) in the profile are expressed as key-value pairs where the values are simple (indivisible) (first normal form).
Principle 7: Any profile should follow the second and third normal form. Doing so may reduce migration issues if records need to be migrated to a revised profile in the future.
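Principle 6 can be illustrated with a small check: a Kernel Information record is a flat mapping of keys to simple values, and any nested structure violates first normal form. The sketch below is only an illustration under assumptions. The attribute names are placeholders chosen for this example and are not quoted from the draft profile, and treating strings, numbers and booleans as the "simple" values is one possible reading of the principle.

```python
# Value types treated as "simple (indivisible)" in this sketch.
SIMPLE_TYPES = (str, int, float, bool)


def non_atomic_attributes(record):
    """Return the names of attributes whose values are not simple scalars."""
    return [key for key, value in record.items()
            if not isinstance(value, SIMPLE_TYPES)]


# Flat record: every value is a single indivisible item.
flat_record = {
    "objectType": "21.EXAMPLE/type-netcdf",        # hypothetical attribute names
    "objectLocation": "https://example.org/data/file_v2.nc",
    "dateCreated": "2019-07-16T09:00:00Z",
    "version": "2",
}

# Nested record: the location attribute bundles several facts into one value.
nested_record = {
    "objectType": "21.EXAMPLE/type-netcdf",
    "objectLocation": {"primary": "https://example.org/a",
                       "replica": "https://mirror.example.org/a"},
}

print(non_atomic_attributes(flat_record))    # no offending attributes
print(non_atomic_attributes(nested_record))  # reports the nested attribute
```

Keeping values atomic means a machine agent can act on a record after a single lookup, without knowing how to unpack community-specific structures.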
8
Unified management of Research Data Collections
The RDA Research Data Collections WG Recommendations
Not just describe collections, but enable actions on them
  Create, Read, Update, Delete, List plus some others
Machine agents as primary users: built-in scalability and automation
Flexible hierarchy of Services, Collections, and Members
  Manage a collection of objects transparently just as another object
API specification against which tools and services can be built across community boundaries
  RESTful API specification and elemental metadata model
  Pilot implementations in several application areas exist
Collection storage agnostic: Able to work with multiple backends
Multiple points for custom extensions, such as ontologies, typing and specialized operations
(Research) data management beyond single objects
[Diagram; labels: Actions (Create, Read, Update, Delete, List); Service, Collection, Member]
Further information:
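The idea of acting on collections, and of treating a collection like any other object, can be sketched with a small in-memory model. This is not the RDA Collections API specification: the class, its method names and the identifiers are all hypothetical, and a real service would expose such operations over a RESTful interface with a storage backend behind it.

```python
class CollectionStore:
    """Toy in-memory store offering create/read/update/delete/list actions."""

    def __init__(self):
        self._members = {}  # collection id -> ordered list of member ids

    def create(self, collection_id):
        if collection_id in self._members:
            raise KeyError(f"collection already exists: {collection_id}")
        self._members[collection_id] = []

    def read(self, collection_id):
        return list(self._members[collection_id])

    def update(self, collection_id, member_ids):
        if collection_id not in self._members:
            raise KeyError(f"unknown collection: {collection_id}")
        self._members[collection_id] = list(member_ids)

    def delete(self, collection_id):
        del self._members[collection_id]

    def list_collections(self):
        return sorted(self._members)

    def expand(self, collection_id, _seen=None):
        """Resolve nested collections down to their non-collection members."""
        seen = set() if _seen is None else _seen
        if collection_id in seen:  # guard against cyclic membership
            return []
        seen.add(collection_id)
        leaves = []
        for member in self._members[collection_id]:
            if member in self._members:  # the member is itself a collection
                leaves.extend(self.expand(member, seen))
            else:
                leaves.append(member)
        return leaves


store = CollectionStore()
store.create("coll/run-a")
store.update("coll/run-a", ["pid/file-1", "pid/file-2"])
store.create("coll/experiment")
# A collection id is used as a member, just like any other object id.
store.update("coll/experiment", ["coll/run-a", "pid/readme"])
all_objects = store.expand("coll/experiment")
print(all_objects)
```

Because members are plain identifiers, the store does not need to know where the objects live, which loosely mirrors the storage-agnostic design goal on the slide.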
9
Upcoming: PID Kernel Information Profiles WG
Basic metadata profile defined – community profiles encouraged
Proliferation of profiles is a challenge for long-term adoption
  How do we manage profiles?
  What are good life cycle models for profiles?
  Are additional technical interfaces necessary?
Follow via the group:
10
Thank you for your attention.