FAIR data in practice: From FAIRy tale to FAIR enough
Peter Doorn, Eliane Fankhauser, Mustapha Mokrane
Webinar, 11 December 2018
Twitter: @pkdoorn @MokraneMA @DANSKNAW
Who we are
Peter Doorn, Eliane Fankhauser, Mustapha Mokrane
Once upon a time, two years ago...
FAIR Data in Trustworthy Data Repositories: Everybody wants to play FAIR, but how do we put the principles into practice?
Peter Doorn, Director DANS; Ingrid Dillo, Deputy Director DANS
EUDAT/OpenAIRE webinar, 12-13 December 2016
https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar
From the previous episode...
DSA Principles (for data repositories) and FAIR Principles (for data sets):
- data can be found on the internet: Findable
- data are accessible: Accessible
- data are in a usable format: Interoperable
- data are reliable: Reusable
- data can be referred to (citable)
FAIR badging scheme (F, A, I, R): https://www.surveymonkey.com/r/fairdat
What have we done since?
- Test prototype FAIRdat within DANS, within 4 other repositories, and at Open Science FAIR in Athens
- Participate in FAIR metrics group: see http://fairmetrics.org/
- 14 metrics on GitHub: https://github.com/FAIRMetrics/Metrics
- Wilkinson, M. D. et al. ‘A design framework and exemplar metrics for FAIRness’. Sci. Data 5:180118, doi: 10.1038/sdata.2018.118 (2018)
- Evaluate DANS archive against FAIR metrics
Testing the FAIRdat prototype
Test in 4 repositories, summer 2017 (repository; number of datasets; number of reviewers; number of reviews):
- VirginiaTech; 5; 1; (not stated)
- MendeleyData; 10; 3 (for 8 datasets), 2 (for 2 datasets); 28
- Dryad; 9; 3 (for 2 datasets), 2 (for 3 datasets); 16
- CCDC; 11; ? (no names), 2 (for 1 dataset); 12
Test at Open Science FAIR, Athens 2017
17 participants + tests within DANS
Pros and Cons of FAIRdat prototype
FAIR Metrics: STARRING YOUR DATA
Pros, positive feedback:
- Simple/easy to use questionnaire
- Well-documented
- Useful
Cons, negative feedback:
- Questionnaire oversimplified?
- Some requirements of Reusability missing/shifted
Other observations:
- Variances in FAIR scores across multiple reviewers due to subjectivity
- Some like starring datasets, others not (should open data score higher than closed data?)
- Assessing multi-file data sets with different properties
Other challenges
- Subjectivity in assessment of principles: F2 “rich metadata”; I1 “broadly applicable language for knowledge representation”; R1 “plurality of attributes”; R1.2 “detailed provenance”; R1.3 “domain relevant community standards”
- Use of standard vocabularies: how to define?
- Misunderstandings of question/meaning of principle
- Most FAIR metrics can be measured at the level of the repository
A month ago we had the opportunity to run a pilot test of the prototype with 4 data repositories (VirginiaTech, MendeleyData, Dryad and CCDC) to see whether the questionnaire design is easy to use and effective. We asked reviewers to assess multiple datasets from different domains, and we also had different reviewers assess the same datasets. The results showed some variance in the FAIR scores, caused by the subjectivity of some questions (for example, difficulty assessing whether metadata are sufficient or rich), by misinterpretation of what was asked, and by difficulty assessing the sustainability of multi-file datasets (preferred vs. accepted file formats). There was also concern that sensitive or restricted datasets will never be able to score highly, even if all their metadata are available and machine readable, or even if the data can be made available once the data holder grants permission. So we probably need to find a path for those datasets too! Despite these challenges, all the repositories are willing to participate in a second round of testing once adjustments and improvements are made.
Slide credits: Eleftheria Tsoupra
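The variance between reviewers described above can be made concrete with a small sketch. The dataset names and star scores below are invented for illustration and are not results from the pilot. The spread is taken here as the difference between the highest and lowest score per dataset, which is only one of several possible measures, and the threshold of 2 stars is an arbitrary assumption.

```python
from statistics import mean

# Hypothetical star scores (1-5) given by different reviewers to the same
# dataset. These values are illustrative only, not results from the pilot.
reviews = {
    "dataset-a": [4, 4, 5],
    "dataset-b": [2, 5, 3],
    "dataset-c": [3, 3],
}


def score_spread(scores):
    """Return (mean, max - min) for one dataset's reviewer scores."""
    return mean(scores), max(scores) - min(scores)


def flag_disagreement(reviews, threshold=2):
    """List datasets whose reviewers differ by at least `threshold` stars."""
    return sorted(
        name
        for name, scores in reviews.items()
        if score_spread(scores)[1] >= threshold
    )


for name, scores in reviews.items():
    avg, spread = score_spread(scores)
    print(f"{name}: mean={avg:.2f} spread={spread}")

print("Needs discussion:", flag_disagreement(reviews))
```

Datasets flagged this way would be natural candidates for checking whether a question was misread or is simply too subjective.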
(Self) assessment of DANS archive on the basis of the FAIR principles (& metrics)
Delft University: DANS EASY complies with 11 out of 15 principles; for 2 DANS does not comply (I2 & R1.2); for 2 more it is unclear (A2 & R1.3)
Self assessment:
- Some metrics: FAIRness of DANS archive could be improved. E.g.: machine accessibility; interoperability requirements; use of standard vocabularies; provenance
- Some metrics: we are not sure how to apply them. E.g.: PID resolves to landing page (metadata), not to dataset; dataset may consist of multiple files without standard PID
- Sometimes the FAIR principle itself is not clear. E.g.: principle applies to both data and metadata; what does interoperability mean for images or PDFs? Are some data types intrinsically UNFAIR?
- Some terms are inherently subjective (plurality, richly)
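The 11 / 2 / 2 split reported by Delft University can be reproduced with a short tally. This is a sketch under one assumption: the slide names only the four principles that are non-compliant or unclear, so the remaining eleven of the fifteen FAIR principles are inferred to be the compliant ones.

```python
from collections import Counter

# The 15 FAIR principles (F1-F4, A1-A2, I1-I3, R1-R1.3).
PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]

# Outcomes named on the slide. Everything else is ASSUMED to be compliant,
# because the slide gives only the total of 11.
NOT_COMPLIANT = {"I2", "R1.2"}
UNCLEAR = {"A2", "R1.3"}


def status(principle):
    """Return the assessment outcome for one principle."""
    if principle in NOT_COMPLIANT:
        return "does not comply"
    if principle in UNCLEAR:
        return "unclear"
    return "complies"


assessment = {p: status(p) for p in PRINCIPLES}
tally = Counter(assessment.values())

for outcome, count in tally.items():
    print(f"{outcome}: {count} of {len(PRINCIPLES)}")
```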
What are we working on now? A fork in the road ahead
- Eliane: FAIR enough? A checklist for researchers to evaluate the FAIRness of data(sets)
- Mustapha: CoreTrustSeal: Enabling FAIR Data Repositories
FAIR enough? A checklist for researchers to evaluate the FAIRness of data(sets)
Eliane Fankhauser, Project Leader DANS
FAIR assessment tools: An overview
- Checklist “How FAIR are your data?”: “A Checklist produced for use at the EUDAT summer school to discuss how FAIR the participant's research data were…”
- FAIR self-assessment tool, provided by ANDS / Nectar / RDS: “… designed predominantly for data librarians and IT staff…”
- FAIRdat tool: “Using this tool you will be able to score the 'FAIRness' of a dataset.”
- Checklist for Evaluation of Dataset Fitness for Use, provided by the RDA Working Group “Assessment of Data Fitness for Use”: “This checklist is meant to supplement the CoreTrustSeal Repository Certification process.” Not yet available.
- FAIR enough? Checklist to evaluate FAIRness of data(sets), provided by DANS
FAIR checklist for researchers
- Short and concise checklist for researchers who are planning to deposit their data
- Covers different levels of FAIRness (repository, metadata, dataset, files)
- Embraces two core concepts: FAIR data; trustworthy repository
- Current state: beta version (Google Forms)
Checklist demonstration
Summary
- Questions formulated as simply as possible
- No direct “translation” of FAIR principles
- Short explanations of terms and concepts
- Reference to trustworthy repositories and CTS
- Overall score at the end
- “Recommendations” for questions answered with “no”
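The scoring behaviour summarised above (an overall score, plus a recommendation for each question answered with “no”) can be sketched as follows. The questions and recommendation texts are hypothetical placeholders, not the wording of the DANS checklist, and counting one point per “yes” is an assumption about how the score is built.

```python
# Hypothetical questions and recommendations; the real checklist wording
# is not reproduced here.
CHECKLIST = [
    ("Is the repository certified as trustworthy (e.g. CoreTrustSeal)?",
     "Consider depositing in a certified repository."),
    ("Does the dataset get a persistent identifier?",
     "Choose a repository that assigns persistent identifiers."),
    ("Is a usage licence attached to the data?",
     "Add a clear usage licence."),
]


def evaluate(answers):
    """Score yes/no answers and collect a recommendation for each 'no'.

    `answers` holds one boolean per checklist question, in order.
    Assumes one point per 'yes'; the actual weighting may differ.
    """
    if len(answers) != len(CHECKLIST):
        raise ValueError("one answer per question is required")
    score = sum(1 for answer in answers if answer)
    recommendations = [
        rec for (_, rec), answer in zip(CHECKLIST, answers) if not answer
    ]
    return score, recommendations


score, recs = evaluate([True, False, True])
print(f"Score: {score}/{len(CHECKLIST)}")
for rec in recs:
    print("Recommendation:", rec)
```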
The FAIR checklist for researchers online: https://dans.knaw.nl/nl/projecten
FAIR checklist factsheet (draft version)
CoreTrustSeal: Enabling FAIR Data Repositories
Mustapha Mokrane, Consultant at DANS, and CoreTrustSeal Board
This presentation will address the alignment between FAIR Principles and the certification of Trustworthy Data Repositories, and more specifically the CoreTrustSeal certification. The main idea I will articulate is that Trustworthy Data Repositories are the natural home and provide an ecosystem for FAIR data.
Mari Kleemola, the Secretary of the CoreTrustSeal Board, wrote recently in a blog post on the occasion of World Digital Preservation Day:
“Research data will not become nor stay FAIR by magic. We need skilled people, transparent processes, interoperable technologies and collaboration to build, operate and maintain research data infrastructures.”
Mari Kleemola, Finnish Social Science Data Archive, Finland; CoreTrustSeal Board, Secretary
https://tietoarkistoblogi.blogspot.com/2018/11/being-trustworthy-and-fair.html
FAIR GUIDING PRINCIPLES
Focus: Enable discovery and reuse of data
Process: Data management & stewardship
As you might already be aware, the FAIR Principles focus on enabling discovery and reuse of data, and the essential processes behind the principles are data management and stewardship.
FAIR RESEARCH DATA LIFECYCLE
Research data management and stewardship is a cyclic process, and the FAIR Principles relate to all stages of this lifecycle. An important point here is that FAIR data must be born FAIR and remain FAIR in this lifecycle.
FAIR RESEARCH DATA LIFECYCLE: Research Data Repositories
Data repositories are key research infrastructures involved in all stages of this lifecycle. Their main missions are to provide access, enable reuse and ensure the preservation of data for the long term. From this perspective they play a critical role in enabling the FAIR Principles.
FAIR GUIDING PRINCIPLES
In the seminal FAIR paper, the Principles define characteristics that apply to a continuum of objects. It is also recognized that FAIR data cannot exist without FAIR tools, vocabularies and infrastructures such as data repositories! This has been emphasized in the recent Turning FAIR data into reality report and Action Plan from the European Commission Expert Group on FAIR Data.
Figures: A model for FAIR Digital Objects; The components of a FAIR Ecosystem
Source: Turning FAIR data into reality, Final report and Action Plan from the European Commission Expert Group on FAIR Data, https://doi.org/10.2777/54599
FAIR ASSESSMENT: FINDABILITY
(META)DATA
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata
F3. metadata clearly and explicitly include the identifier of the data it describes
DATA REPOSITORY (TECHNOLOGIES, PROCEDURES, EXPERTISE, PEOPLE)
F4. (meta)data are registered or indexed in a searchable resource
An assessment of data FAIRness cannot be done only at the level of the metadata/data. The context around the data must be considered, and data repositories are key elements of that context. Some FAIR Principles cannot actually be assessed without looking at the data repository level. Among the Findability Principles, F4 is an example: compliance with it cannot be assessed at the metadata or data level. Even for F1, F2 and F3, ensuring and maintaining compliance depends on technologies, procedures, expertise and people at the data repository level.
FAIR ASSESSMENT: ACCESSIBILITY
DATA REPOSITORY
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
(META)DATA: none listed
For Accessibility, all principles require looking at the data repository.
FAIR ASSESSMENT: INTEROPERABILITY
(META)DATA
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
DATA REPOSITORY (TECHNICAL INFRASTRUCTURE, PROCEDURES, EXPERTISE, PEOPLE)
For Interoperability, an assessment can be done at the data and metadata level, but once again, ensuring that compliance is maintained depends on data repositories.
FAIR ASSESSMENT: REUSABILITY
(META)DATA
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
DATA REPOSITORY (TECHNICAL INFRASTRUCTURE, PROCEDURES, EXPERTISE, PEOPLE)
R1.3. (meta)data meet domain-relevant community standards
For the Reusability Principles, all but R1.3 can be assessed at the data and metadata level, but once again, ensuring that compliance is maintained depends on data repositories.
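The four assessment slides amount to a lookup from each principle to the level at which it can be assessed. The sketch below encodes that reading; the dictionary and function names are illustrative, and the two level labels are a simplification, since the slides stress that even "(meta)data" principles rely on the repository to stay compliant over time.

```python
# Level at which each principle can be assessed, as read from the four
# assessment slides. The labels are a simplification for illustration.
ASSESSMENT_LEVEL = {
    "F1": "(meta)data", "F2": "(meta)data", "F3": "(meta)data",
    "F4": "repository",
    "A1": "repository", "A1.1": "repository",
    "A1.2": "repository", "A2": "repository",
    "I1": "(meta)data", "I2": "(meta)data", "I3": "(meta)data",
    "R1": "(meta)data", "R1.1": "(meta)data", "R1.2": "(meta)data",
    "R1.3": "repository",
}


def principles_at(level):
    """Return the principles assessed at the given level, in slide order."""
    return [p for p, lvl in ASSESSMENT_LEVEL.items() if lvl == level]


repo_level = principles_at("repository")
data_level = principles_at("(meta)data")

print(f"Repository level ({len(repo_level)}):", ", ".join(repo_level))
print(f"(Meta)data level ({len(data_level)}):", ", ".join(data_level))
```

On this reading, six of the fifteen principles cannot be checked by looking at a dataset alone.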
CORETRUSTSEAL ASSESSMENT
Digital Object Management; Organizational Infrastructure; Technology
At the level of the data repository, the CoreTrustSeal Requirements define minimum requirements for research data repositories to be recognized as trustworthy and thus aligned with the FAIR data principles.
CoreTrustSeal Data Repositories Requirements: https://www.coretrustseal.org/why-certification/requirements/
CORETRUSTSEAL: FAIR ALIGNMENT
F
- Offer persistent identifiers [F1 and F3]
- Recommended data citations [F1]
- Searchable metadata catalogue to appropriate standards [F2, F3]
- Search facilities, inclusion in disciplinary or generic registries of resources [F4]
A (CoreTrustSeal requirements R4, R10, R13, R15, R16)
- Facilitate machine harvesting of the metadata [A1]
- Uses international and/or community standards [A1.1]
- Searchable metadata catalogue to appropriate standards [A1 and A1.1]
- Technical infrastructure: protection of facility, data, products, services, users [A1.2]
- Data managed in compliance with discipline and ethical norms [A1.2]
- Responsibility for long-term preservation [A2]
I (CoreTrustSeal requirements R14, R11)
- Metadata required when the data are provided [I1]
- Formats used by the Designated Community [I1]
- Measures and plans for the possible evolution and migration of formats [I2]
- Ensure understandability of the data [I2]
- Ability to comment on, and/or rate data and metadata [I3]
- Provide citations to related works or links to citation indices [I3]
R (CoreTrustSeal requirements R2, R7, R8, R11)
- Integrity and authenticity of the data [R1]
- Documentation of the completeness of the data and metadata [R1]
- Links to metadata and to other datasets [R1]
- Provenance data and related audit trails [R1.2]
- Maintains licenses covering data access and use and monitors compliance [R1.1]
- Defined data and metadata: ensure relevance and understandability for users [R1.3]
- Technical data and metadata quality and assessment of adherence to schema [R1.3]
This alignment can be illustrated by mapping the FAIR Principles to the CoreTrustSeal requirements, as you can see in this table. This work will be published and will be used as a foundation to facilitate building and assessing the FAIR data ecosystem.
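The alignment table can also be read in the other direction, from a CoreTrustSeal requirement to the FAIR letters it supports. The sketch below assumes that the requirement numbers printed next to A, I and R on the slide belong to those letters; no requirement numbers are legible for F in the source, so F is left out rather than guessed. The names `FAIR_TO_CTS` and `cts_to_fair` are illustrative.

```python
# Requirement groups as printed next to each FAIR letter on the slide.
# ASSUMPTION: each group belongs to the letter it appears beside.
# The group for F is not legible in the source and is deliberately omitted.
FAIR_TO_CTS = {
    "A": ["R4", "R10", "R13", "R15", "R16"],
    "I": ["R14", "R11"],
    "R": ["R2", "R7", "R8", "R11"],
}


def cts_to_fair(mapping):
    """Invert the mapping: CoreTrustSeal requirement -> list of FAIR letters."""
    inverted = {}
    for letter, requirements in mapping.items():
        for requirement in requirements:
            inverted.setdefault(requirement, []).append(letter)
    return inverted


by_requirement = cts_to_fair(FAIR_TO_CTS)

# Requirements that support more than one FAIR letter.
shared = sorted(
    req for req, letters in by_requirement.items() if len(letters) > 1
)

for req, letters in by_requirement.items():
    print(f"{req}: {', '.join(letters)}")
print("Shared across letters:", shared)
```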
FAIR ECOSYSTEM
Rec. 20: Deposit in Trusted Digital Repositories. Research data should be made available by means of Trusted Digital Repositories, and where possible in those with a mission and expertise to support a specific discipline or interdisciplinary research community.
Rec. 9: Develop assessment frameworks to certify FAIR services. Data services must be encouraged and supported to obtain certification, as frameworks to assess FAIR services emerge. Existing community-endorsed methods to assess data services, in particular CoreTrustSeal (CTS) for trusted digital repositories, should be used as a starting point to develop assessment frameworks for FAIR services. Repositories that steward data for a substantial period of time should be encouraged and supported to achieve CTS certification.
Source: Turning FAIR data into reality, Final report and Action Plan from the European Commission Expert Group on FAIR Data, doi.org/10.2777/54599
TAKE HOME MESSAGES
- FAIR Principles apply to more than (meta)data
- FAIR data assessments must include infrastructure
- FAIR data live in Trustworthy Data Repositories
- CoreTrustSeal Requirements are FAIR aligned
To be continued...
Work at DANS on FAIR is in full progress. In 2019 we are planning to continue our work in a European project to formulate FAIR rules of participation for the EOSC. Subjects to work on include:
- Strengthen certification of repositories for FAIR data
- FAIR data policies
- Assessment of FAIRness of data and metadata within certified repositories, focusing on those metrics that vary within such repositories
- FAIR software and services
- Training on FAIR data (management), both for students within regular academic curricula and for others
Thank you for listening
@pkdoorn, Peter.Doorn@dans.knaw.nl
Eliane.Fankhauser@dans.knaw.nl
@MokraneMA, Mustapha.Mokrane@dans.knaw.nl
@DANSKNAW, www.dans.knaw.nl