Metadata for research outputs management, Part 2
Susanna Mornati – 4Science, ORCID 0000-0001-9931-3637
Basic Training Workshop, 6-8 September 2017
FAIR metadata: the origins
Data management is not a goal in itself, but a means to foster the advancement of knowledge. The existing digital ecosystem surrounding the publication of scholarly output has many barriers that prevent optimal discovery and reuse: a variety of approaches, fragmented repositories, different access policies, uncertain license conditions, a lack of machine interfaces, a lack of metadata standardization, and so on.
To overcome these obstacles, a workshop was held in the Netherlands in 2014. It brought together a wide group of academic and private stakeholders and set out the foundational principle that all research objects should be FAIR: Findable, Accessible, Interoperable, Reusable.
FAIR metadata: the principles
F = Findable
A = Accessible
I = Interoperable
R = Reusable
FAIR metadata: materials
• The FAIR Guiding Principles for scientific data management and stewardship: https://www.nature.com/articles/sdata201618
• FAIR Data Principles (FORCE11): https://www.force11.org/group/fairgroup/fairprinciples and https://www.force11.org/fairprinciples
• Guidelines on FAIR Data Management in Horizon 2020 (covering the Data Management Plan, DMP): http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
FAIR metadata: Findable
TO BE FINDABLE:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
FAIR metadata: Accessible
TO BE ACCESSIBLE:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
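As an illustration of A1 and A1.1, the sketch below (Python, standard library only) prepares an HTTPS request that asks the doi.org resolver for machine-readable metadata about a DOI through HTTP content negotiation. It assumes the DOI is registered with an agency such as Crossref or DataCite whose metadata services answer such requests; the function name and the default Accept type are choices made for this example, and the request is only built here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"


def doi_metadata_request(
    doi: str, accept: str = "application/vnd.citationstyles.csl+json"
) -> Request:
    """Build (but do not send) an HTTPS request asking the DOI resolver for
    machine-readable metadata about `doi` via content negotiation.
    Hypothetical helper written for this example."""
    # Accept bare DOIs ("10.1038/...") as well as resolver URLs or "doi:" forms.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" unescaped: it separates the DOI prefix from its suffix.
    return Request(DOI_RESOLVER + quote(doi, safe="/"), headers={"Accept": accept})
```

Passing the result to `urllib.request.urlopen()` would perform the lookup over plain HTTPS, an open and free protocol; public DOI metadata needs no authentication, so A1.2 does not come into play. The example DOI 10.1038/sdata.2016.18 is the FAIR Guiding Principles article listed under the materials.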
FAIR metadata: Interoperable
TO BE INTEROPERABLE:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
FAIR metadata: Reusable
TO BE REUSABLE:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
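One rough way to connect these principles to a concrete record is a presence check over the fields that several of them ask for. The record type and field names below are hypothetical, and the check only tells you whether something is filled in: it cannot judge whether an identifier is really persistent (F1), whether a vocabulary is itself FAIR (I2), or whether the metadata meet community standards (R1.3).

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Hypothetical, simplified record; field names are illustrative only.
    identifier: str = ""       # F1: persistent identifier, e.g. a DOI
    title: str = ""            # F2 (crude proxy for "rich metadata")
    description: str = ""      # F2
    data_identifier: str = ""  # F4: identifier of the data being described
    access_protocol: str = ""  # A1: e.g. "https" or "oai-pmh"
    license_url: str = ""      # R1.1: clear data usage license
    provenance: str = ""       # R1.2: who produced the data, and how
    related: list[str] = field(default_factory=list)  # I3: qualified references


def fair_gaps(rec: MetadataRecord) -> list[str]:
    """Return the FAIR sub-principles for which the record has an empty field.
    This is a presence check only, not an assessment of quality."""
    checks = {
        "F1": rec.identifier,
        "F2": rec.title and rec.description,
        "F4": rec.data_identifier,
        "A1": rec.access_protocol,
        "I3": rec.related,
        "R1.1": rec.license_url,
        "R1.2": rec.provenance,
    }
    return [principle for principle, value in checks.items() if not value]
```

For example, a record with only an identifier, a title and a description would be reported as missing F4, A1, I3, R1.1 and R1.2.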
OpenAIRE guidelines for Literature Repositories
https://guidelines.openaire.eu/en/latest/literature/index.html
OpenAIRE uses the OAI-PMH v2.0 protocol for harvesting publication metadata.
OpenAIRE expects metadata to be encoded in the Dublin Core metadata format (metadataPrefix oai_dc).
OpenAIRE relies on a specific syntax in the values of standard Dublin Core metadata fields to identify projects, funders, referenced publications, and datasets. This syntax takes the form of URIs and is defined as the info:eu-repo namespace.
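A minimal sketch of an oai_dc record that carries info:eu-repo values, written in Python with the standard ElementTree module. The value patterns (info:eu-repo/grantAgreement/Funder/Programme/ProjectID for the project, info:eu-repo/semantics/... for access rights and publication type) are assumed from the OpenAIRE literature guidelines and should be checked against the application profile; the project number 123456 and the helper names are hypothetical, and a real record would also carry identifiers, dates and a schemaLocation declaration.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def grant_agreement_uri(funder: str, programme: str, project_id: str) -> str:
    """info:eu-repo project identifier (hypothetical helper). The optional
    trailing parts (jurisdiction, project name, acronym) are omitted."""
    return f"info:eu-repo/grantAgreement/{funder}/{programme}/{project_id}"


def oai_dc_record(title, creators, grant_uri,
                  access="openAccess", pub_type="article") -> str:
    """Serialize a small oai_dc record using info:eu-repo values."""
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(tag: str, text: str) -> None:
        ET.SubElement(root, f"{{{DC}}}{tag}").text = text

    add("title", title)
    for name in creators:
        add("creator", name)
    add("relation", grant_uri)                           # links to the funded project
    add("rights", f"info:eu-repo/semantics/{access}")    # access-rights term
    add("type", f"info:eu-repo/semantics/{pub_type}")    # publication type
    return ET.tostring(root, encoding="unicode")
```

Usage: `oai_dc_record("A title", ["Doe, Jane"], grant_agreement_uri("EC", "H2020", "123456"))` returns an oai_dc:dc element whose dc:relation, dc:rights and dc:type values all live in the info:eu-repo namespace, which is how OpenAIRE's harvester can recognise funding and access information inside plain Dublin Core.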
OpenAIRE guidelines for Literature Repositories
Application profile overview: https://guidelines.openaire.eu/en/latest/literature/application_profile.html
OpenAIRE guidelines for Data Archives
https://guidelines.openaire.eu/en/latest/data/index.html
OpenAIRE uses the OAI-PMH v2.0 protocol for harvesting dataset metadata.
OpenAIRE expects metadata to be encoded in the DataCite metadata format (metadataPrefix oai_datacite).
OpenAIRE shares the goal of the DataCite Metadata Schema: to provide a domain-agnostic metadata schema that achieves interoperability through a small number of properties, in the simplest possible way, and thereby keeps the technical barriers to implementation as low as possible.
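A harvester talking to a data archive over OAI-PMH issues ListRecords requests and follows resumption tokens until the list is complete. The sketch below builds those request URLs and extracts the token from each response page. The repository base URL is hypothetical, and fetching is left to a caller-supplied `fetch` function (for example a wrapper around `urllib.request.urlopen`) so that the paging logic can run without network access; OAI-PMH error responses such as noRecordsMatch are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_datacite",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH 2.0 a follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None when the
    token is absent or empty (i.e. the list is complete)."""
    root = ET.fromstring(response_xml)
    token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest(base_url: str, fetch: Callable[[str], str],
            metadata_prefix: str = "oai_datacite") -> Iterator[str]:
    """Yield each ListRecords response page, following resumption tokens."""
    token = None
    while True:
        page = fetch(list_records_url(base_url, metadata_prefix, token))
        yield page
        token = next_token(page)
        if token is None:
            break
```

For instance, `list_records_url("https://repository.example.org/oai")` gives `https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_datacite`, the first request an OpenAIRE-style harvester would send to a data archive.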
OpenAIRE guidelines for Data Archives
Application profile overview: https://guidelines.openaire.eu/en/latest/data/application_profile.html
DataCite Metadata Kernel v3.1 documentation: http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf
Examples:
• https://purr.purdue.edu/publications/1118/2
• http://schema.datacite.org/meta/kernel-3/example/datacite-example-dataset-v3.0.xml
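For orientation, here is a sketch that writes a minimal DataCite kernel-3 record containing the five properties that kernel 3 marks as mandatory: Identifier, Creator, Title, Publisher and PublicationYear. It assumes the kernel-3 namespace http://datacite.org/schema/kernel-3; the DOI uses the DataCite test prefix 10.5072 as a placeholder, the other values are invented, and the OpenAIRE application profile may require further properties (such as access rights), so compare the output with the kernel-3 example file linked above.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-3"
ET.register_namespace("", DATACITE_NS)  # serialize as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the DataCite kernel-3 namespace."""
    return f"{{{DATACITE_NS}}}{tag}"


def datacite_record(doi: str, creators: list[str], title: str,
                    publisher: str, year: int) -> str:
    """Minimal kernel-3 record with only the mandatory properties
    (hypothetical helper; no schemaLocation or optional properties)."""
    resource = ET.Element(q("resource"))
    ident = ET.SubElement(resource, q("identifier"), identifierType="DOI")
    ident.text = doi
    creators_el = ET.SubElement(resource, q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name
    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = title
    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    return ET.tostring(resource, encoding="unicode")
```

Usage: `datacite_record("10.5072/example-dataset", ["Doe, Jane"], "Example dataset", "Example University", 2017)`. Keeping the record this small mirrors the DataCite design goal quoted above: a handful of domain-agnostic properties that any archive can fill in.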
OpenAIRE Guidelines for CRIS Managers based on CERIF-XML
https://guidelines.openaire.eu/en/latest/cris/index.html
https://zenodo.org/record/17065
https://zenodo.org/record/17065/files/OpenAIRE_Guidelines_for_CRIS_Managers_v.1.0.pdf
The Guidelines give managers of CRIS (Current Research Information Systems) orientation on how to expose their metadata in a way that is compatible with the OpenAIRE infrastructure.
CERIF (Common European Research Information Format) is a standard data model for research information and a recommendation by the European Union to its Member States. The OpenAIRE data model is CERIF-compliant, and OpenAIRE has adopted CERIF XML as the basis for harvesting and importing metadata from CRIS.
CERIF subset for OpenAIRE
OpenAIRE Guidelines for CRIS Managers based on CERIF-XML
The model comprises the following CERIF research entities:
• Publication: cfResultPublication (cfResPubl);
• Product/Dataset: cfResultProduct (cfResProd);
• Person: cfPerson (cfPers);
• Organisation: cfOrganisationUnit (cfOrgUnit);
• Project: cfProject (cfProj);
• Funding: cfFunding (cfFund);
• Equipment: cfEquipment (cfEquip);
• Service: cfService (cfSrv).
OpenAIRE Guidelines for CRIS Managers based on CERIF-XML
The following tables define the CERIF data elements used to exchange data between individual CRIS and the OpenAIRE infrastructure.
Example: in the OpenAIRE context, the CERIF entity cfProject (cfProj) is used to represent funded projects.
CERIF for OpenAIRE: e.g. Projects
Internal Identifier: cfProj.cfProjId
Start Date: cfProj.cfStartDate
End Date: cfProj.cfEndDate
Acronym: cfProj.cfAcro
Title: cfProj.cfTitle
Abstract: cfProj.cfAbstr
Subject: cfProj.cfKeyw; cfProj.cfProj_Class
Open Access Requirements: cfProj.cfProj_Class (currently: OA mandated, OA not mandated)
Federated Identifiers: cfProj.cfFedId.cfFedId (the identifier type is given by cfProj.cfFedId.cfFedId_Class)
Relations (e.g.):
• Product / Dataset: cfProj.cfProj_ResProd
• Person: cfProj.cfProj_Pers
• Organisation: cfProj.cfProj_OrgUnit
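A CRIS that stores projects under its own field names needs a crosswalk to the cfProj elements in this table before it can export CERIF XML. The sketch below is a simplified illustration: the internal field names are hypothetical, and it deliberately omits the CERIF namespace, the root wrapper, classifications, federated identifiers, relations and any attributes the real format may place on multilingual fields such as the title, so its output should be compared with the projects example file before use.

```python
import xml.etree.ElementTree as ET

# Crosswalk from hypothetical internal CRIS field names to the cfProj
# element names listed in the table.
PROJECT_CROSSWALK = {
    "internal_id": "cfProjId",
    "start_date": "cfStartDate",
    "end_date": "cfEndDate",
    "acronym": "cfAcro",
    "title": "cfTitle",
    "abstract": "cfAbstr",
}


def to_cerif_project(project: dict) -> str:
    """Serialize one internal project record as a simplified cfProj element."""
    proj = ET.Element("cfProj")
    for internal_field, cerif_tag in PROJECT_CROSSWALK.items():
        value = project.get(internal_field)
        if value:  # skip missing or empty fields rather than emit empty elements
            ET.SubElement(proj, cerif_tag).text = str(value)
    return ET.tostring(proj, encoding="unicode")
```

Keeping the mapping in one dictionary, rather than scattering element names through the export code, makes it easy to review against the guidelines' tables and to extend when further entities (publications, persons, organisations) are exported.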
CERIF for OpenAIRE: e.g. Projects
Example: https://zenodo.org/record/17065/files/openaire_cerif_xml_example_projects.xml
Tool to implement the guidelines automatically
DSpace-CRIS: https://wiki.duraspace.org/display/DSPACECRIS/DSpace-CRIS+Home
DSpace-CRIS is the open-source extension of DSpace for Research Data and Information Management.
Examples:
• http://ira.lib.polyu.edu.hk/cris/rp/rp00068
• http://ktisis.cut.ac.cy/handle/10488/7613
• http://dspacecris.eurocris.org/
• https://portalrecerca.csuc.cat/
Practical exercise (small groups, 30 minutes)
Choose a publication at https://www.openaire.eu/search/find and describe it according to the OpenAIRE Guidelines for Literature Repositories: https://guidelines.openaire.eu/en/latest/literature/index.html
Choose a dataset from a repository listed at http://www.re3data.org/ and describe it according to the OpenAIRE Guidelines for Data Archives: https://guidelines.openaire.eu/en/latest/data/index.html