GEDE Focus Area Repositories - motivation -

Peter Wittenburg

PID Focus Area Successful
- many differing documents about PIDs - confusing for communities
- all communities suggested key documents (about 30) for this topic
- all documents have been studied and transformed into 61 essential assertions
- these assertions were aggregated, grouped and compared
- documents from expert organisations/initiatives were also included
- results:
  - clarification of many uncertainties and terminology across communities
  - detecting agreement on core messages if wording differences are ignored
  - some ongoing discussions on usage aspects (granularity, versioning, etc.)
- GEDE Focus Group is very important to overcome barriers and confusions
- GEDE Focus Group helped to turn to action

Repository Focus Area - What are repositories?
- store and give access to data and metadata via standard protocols
- may give services on top of the stored data (products)
- have a team that does data management and stewardship on stored data
- activities are guided by openly described policies and procedures
- take care of long-term preservation of data and metadata
- take care that schemas and concepts being used are registered
- register themselves in repository registries such as re3data
- trustworthy repositories are key pillars of our emerging data infrastructure
  - they are trustworthy if they participate in regular quality assessments according to widely accepted standards (DSA/WDS)
  - it is the task of the certification standards to include essentials (long term funding, etc.)

Repository Focus Area - What are registries?
- registries also store and give access to data via standard protocols
- is there a difference?
  - from an IT perspective the difference is in functions and roles
  - in general registries do not store the resources (data) but aggregate metadata about data
- there is a wide variety of metadata types and thus registries
- registries have a team that takes care of proper management of metadata
- activities of registries are guided by openly described policies and procedures
- registries take care that schemas and concepts being used are registered
- registries should be findable and thus be registered
- trustworthy registries are key pillars of our emerging data infrastructure
  - they are trustworthy if they participate in regular quality assessments

Repository Focus Area - How are repositories organised?
- repositories can be organised as discipline specific entities with deep knowledge about the discipline or domain
- repositories can be organised according to organisational boundaries (institutes, research organisations, countries, etc.)
- repositories can be organised according to commercial interests
  - metadata should be open and free to use
  - they may give access to data on condition that users sign a license, describe the intended use, adhere to ethical and rights norms, pay a certain fee, etc.
- increasingly often repositories are part of a number of trust federations

Repository Focus Area - What are the open Questions?
- what are the requirements, tasks and roles in the different communities?
- are there special functional needs?
- what are the typical policies and procedures being applied?
- can we specify ONE generic API to repositories?
- do all repositories need to participate in quality assessment procedures?
- what is the status of data of not certified repositories?
- is there typical software that can be recommended?
- which standards are out there describing the characteristics of repositories?

Repository Focus Area - Why a topic in GEDE?
- there are a few RDA groups working on matters related to repositories
  - do they have a comprehensive picture of the view on repositories in the different communities?
  - if not, can GEDE contribute to improving their results?
- there are already good recommendations and best practices to be disseminated between research infrastructures
- there could be certification needs that go beyond DSA/WDS rules
- there could be an exchange of policies and procedures between the communities and also the RDA Practical Policy group
- specific training sessions on this topic can be organised
- standard software installations could be discussed (such as OAI-PMH, etc.)

The data citation landscape needs clarification: a potential GEDE topic
Carlo Maria Zwölf VAMDC e-infrastructure

Persistent Identifiers and data citation/publication
- The initial GEDE activity was focused on the Persistent Identifiers (PIDs) bundle
- We converged on a wide consensus set of statements
  - PIDs are a key element for stable data referencing
  - PIDs are a fundamental building block for data citation mechanisms
- The road linking PIDs with data citation/publication is not direct:
  - The data citation landscape is highly fragmented (several standards & platforms exist)
  - It changes rapidly (new standards appear, some platforms disappear, new ones arrive)
  - Orientation may be difficult for data producers, providers and consumers

The data citation/publication landscape
Data citation and publication are two faces of the same coin:
- One cites published (in the sense of reachable) material
- Since some material is published, it can be referred to

Existing recommendations and best practices landscape
- RDA Data Citation Working Group
- RDA/WDS Publishing Data Services WG (ICSU/WDS)
- Joint Declaration of Data Citation Principles (WDS/Force11)
- Force11 data citation manifesto
- CODATA-ICSTI Data Citation Standards and Practices

Existing platforms and services for data citation/publication (may be based on the recalled recommendations)
- DataCite (assigns DOIs)
- Zenodo
- EUDAT
- OpenAIRE
- The Dataverse Project (formerly thedata.org)
- Scholix

A disoriented community
[Diagram: "Data citation/publication recommendations" and "Data citation/publication services and platforms", each linked by question marks to "Users and data providers communities"]

A true (very recent) story, between a very well-known editor and an EU e-infrastructure:
- Editor: "We would like to cite your data. What could you propose?"
- EU e-infrastructure: "We have implemented the RDA recommendation on citation of dynamic data…"
- Editor: "What about using Scholix instead?"
- EU e-infrastructure: "…"

A suggestion for the next GEDE activity
- The data citation "wild" context is similar to the PIDs one
- In the PIDs context GEDE succeeded in clearing up the confusion
  - by summarizing and clustering existing solutions and best practices
- A similar rationalization work may be done for the fragmented data citation landscape
  - It is a logical continuation of the PIDs work
  - It will answer a real, urgent need

Versioning of data - in support of FAIRness and trustability
Maggie Hellström ICOS Carbon Portal

Why version stuff?
The W3C document "Data on the Web Best Practices" (from 31 January 2017) says:
- data sets may change over time
  - new data is appended (in time or space)
  - some old data no longer relevant
  - error(s) discovered & fixed
- no consensus on
  - when changes constitute a new version (major or minor) of an existing data set
  - when a new data set should be defined
  - who is responsible
- maintaining trust and reusability requires
  - consistent approach to versioning
  - clear information
  - metadata pointers ('is-deprecated', 'is-replaced-by', 'replaces', ...)
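
The metadata pointers named on this slide ('is-deprecated', 'is-replaced-by', 'replaces') can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DatasetVersion` record, its field names, the in-memory `registry` dictionary and the placeholder identifiers are hypothetical and are not taken from the W3C document or from any repository software.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DatasetVersion:
    """Hypothetical metadata record for one version of a data set."""
    pid: str                               # placeholder identifier, not a real PID
    replaces: Optional[str] = None         # identifier of the version this one supersedes
    is_replaced_by: Optional[str] = None   # identifier of the newer version, if any

    @property
    def is_deprecated(self) -> bool:
        # In this sketch a version counts as deprecated once a successor exists.
        return self.is_replaced_by is not None


def publish_new_version(registry: Dict[str, DatasetVersion],
                        old_pid: str, new_pid: str) -> DatasetVersion:
    """Register a new version and set the pointers in both directions."""
    old = registry[old_pid]
    if old.is_replaced_by is not None:
        raise ValueError(f"{old_pid} already has a successor")
    new = DatasetVersion(pid=new_pid, replaces=old_pid)
    old.is_replaced_by = new_pid
    registry[new_pid] = new
    return new


def resolve_latest(registry: Dict[str, DatasetVersion], pid: str) -> DatasetVersion:
    """Follow 'is-replaced-by' pointers until the current version is reached."""
    current = registry[pid]
    while current.is_replaced_by is not None:
        current = registry[current.is_replaced_by]
    return current
```

With such pointers in place, a citation of an old version still resolves to exactly that version, while a reader can be told that a newer one exists and where to find it.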

Recent RDA discussions
- Proposed interest group (IG) on data versioning
  - discussions started at P8 (Denver) & continued at P9 (Barcelona)
  - current effort led by Lesley Wyborn & Jens Klump
- Example use cases
  - Australia: large remote sensing & geospatial datasets; "composites" of numerous (tiled) sources. If a (small) subset is updated, how to best describe this?
  - USA: NASA's Socioeconomic Data and Applications Center (SEDAC)

Related initiatives
- RDA Data Citation WG
  - Dynamic data
  - Query-centered approach: store actual queries together with timestamp information, and PID them
- RDA Data Collection WG
  - Collection objects can be used to point to "dynamic" datasets that belong together
  - Collections can themselves be versionable
- EUDAT/RDA/COOPEUS: Identification & citation of open-ended data, subsets
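
The query-centered approach listed under the RDA Data Citation WG can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the working group's reference implementation: the `Row` and `QueryStore` names, the single `station` query parameter and the hash-based `query:` identifier are all hypothetical, and the identifier merely stands in for a real PID minted by a PID service. The sketch also assumes the data itself is versioned, so that corrected rows are closed with a timestamp rather than overwritten.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Row:
    """One timestamped record; old rows are closed, never overwritten."""
    station: str
    value: float
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means "still current"


def rows_as_of(rows: List[Row], station: str, when: datetime) -> List[float]:
    """Return the values for `station` that were valid at time `when`."""
    return [r.value for r in rows
            if r.station == station
            and r.valid_from <= when
            and (r.valid_to is None or when < r.valid_to)]


class QueryStore:
    """Stores each query together with its execution timestamp."""

    def __init__(self) -> None:
        self._queries: Dict[str, dict] = {}

    def register(self, station: str, executed_at: datetime) -> str:
        record = {"station": station, "executed_at": executed_at.isoformat()}
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        # Placeholder identifier derived from the query; a real system would mint a PID.
        query_id = "query:" + hashlib.sha256(payload).hexdigest()[:12]
        self._queries[query_id] = record
        return query_id

    def resolve(self, query_id: str, rows: List[Row]) -> List[float]:
        """Re-execute the stored query against the data as it was at that time."""
        record = self._queries[query_id]
        when = datetime.fromisoformat(record["executed_at"])
        return rows_as_of(rows, record["station"], when)
```

Resolving a stored identifier after the data has been corrected still returns the subset that was originally cited, because the stored timestamp selects the rows that were valid when the query ran.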

What can GEDE do?
- Survey European RIs
  - current needs & requirements
  - applied practices
  - available support from e-services
- Investigate global efforts (search white & gray literature)
- Gather concrete use cases
  - already defined via RDA groups
  - new ones
