From FAIRy tale to FAIR enough

Presentation transcript:

FAIR data in practice: From FAIRy tale to FAIR enough
Peter Doorn, Eliane Fankhauser, Mustapha Mokrane
Webinar, 11 December 2018
Twitter: @pkdoorn @MokraneMA @DANSKNAW

Who we are
Peter Doorn, Eliane Fankhauser, Mustapha Mokrane

Once upon a time, two years ago...
FAIR Data in Trustworthy Data Repositories: Everybody wants to play FAIR, but how do we put the principles into practice?
Peter Doorn, Director DANS; Ingrid Dillo, Deputy Director DANS
EUDAT/OpenAIRE webinar, 12-13 December 2016
https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar

From the previous episode...
DSA Principles (for data repositories) alongside FAIR Principles (for data sets):
- data can be found on the internet: Findable
- data are accessible: Accessible
- data are in a usable format: Interoperable
- data are reliable: Reusable
- data can be referred to (citable)
FAIR badging scheme: https://www.surveymonkey.com/r/fairdat

What have we done since?
- Test the FAIRdat prototype within DANS, within 4 other repositories, and at the Open Science FAIR in Athens
- Participate in the FAIR metrics group: see http://fairmetrics.org/
  - 14 metrics on GitHub: https://github.com/FAIRMetrics/Metrics
  - Wilkinson, M. D. et al. ‘A design framework and exemplar metrics for FAIRness’. Sci. Data 5:180118 doi: 10.1038/sdata.2018.118 (2018)
- Evaluate the DANS archive against the FAIR metrics

Testing the FAIRdat prototype
Test in 4 repositories, summer 2017:
Name of Repository | Number of Datasets | Number of Reviewers | Number of Reviews
VirginiaTech | 5 | 1 |
MendeleyData | 10 | 3 (for 8 datasets), 2 (for 2 datasets) | 28
Dryad | 9 | 3 (for 2 datasets), 2 (for 3 datasets) | 16
CCDC | 11 | ? (no names), 2 (for 1 dataset) | 12
Test at Open Science FAIR, Athens 2017: 17 participants
Plus tests within DANS
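The review totals in the table appear consistent with the reviewer breakdown, under one assumption that the slide does not state: every dataset not mentioned in the "Number of Reviewers" column was reviewed exactly once. A minimal sketch of that check (the function name is illustrative, and VirginiaTech is left out because its review total does not appear in the transcript):

```python
# Per repository: (number of datasets,
#                  {reviewers per dataset: number of such datasets},
#                  reported number of reviews), copied from the table.
# ASSUMPTION (not on the slide): datasets not listed in the breakdown
# were each reviewed by exactly one reviewer.
REPOSITORIES = {
    "MendeleyData": (10, {3: 8, 2: 2}, 28),
    "Dryad": (9, {3: 2, 2: 3}, 16),
    "CCDC": (11, {2: 1}, 12),
}


def expected_reviews(n_datasets, breakdown):
    """Total reviews implied by the breakdown plus one review per remaining dataset."""
    listed_datasets = sum(breakdown.values())
    listed_reviews = sum(reviewers * count for reviewers, count in breakdown.items())
    return listed_reviews + (n_datasets - listed_datasets)


for name, (n_datasets, breakdown, reported) in REPOSITORIES.items():
    print(name, expected_reviews(n_datasets, breakdown), reported)
```

Under that assumption the computed and reported totals agree for all three repositories.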

Pros and cons of the FAIRdat prototype
FAIR Metrics: STARRING YOUR DATA
Pros, positive feedback:
- Simple, easy-to-use questionnaire
- Well-documented
- Useful
Cons, negative feedback:
- Questionnaire oversimplified?
- Some requirements of Reusability missing/shifted
Other observations:
- Variances in FAIR scores across multiple reviewers due to subjectivity
- Some like starring datasets, others do not (should open data score higher than closed data?)
- Difficulty assessing multi-file data sets with different properties

Other challenges
- Subjectivity in the assessment of principles:
  - F2 “rich metadata”
  - I1 “broadly applicable language for knowledge representation”
  - R1 “plurality of attributes”
  - R1.2 “detailed provenance”
  - R1.3 “domain relevant community standards”
- Use of standard vocabularies: how to define?
- Misunderstandings of the question or of the meaning of a principle
- Most FAIR metrics can be measured at the level of the repository
A month ago we had the opportunity to run a pilot test of the prototype with 4 data repositories (VirginiaTech, MendeleyData, Dryad and CCDC) to see whether the questionnaire design would be easy to use and effective. We asked reviewers to assess multiple datasets from different domains, and we also had different reviewers assess the same datasets. The results showed some variance in the FAIR scores, caused by the subjectivity of some of the questions (for example, difficulty assessing whether metadata are sufficient or rich), misinterpretation of what was asked, and difficulty assessing the sustainability of multi-file datasets (preferred vs. accepted file formats). There was also concern that sensitive or restricted datasets will never be able to score highly, even if all their metadata are available and machine readable, or even if the data can be made available once the data holder grants the requested permission. So we probably need to find a path for those datasets too! Despite these challenges, all the repositories are willing to participate in a second round of testing once adjustments and improvements are made.
Slide credits: Eleftheria Tsoupra
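One way to make reviewer disagreement visible is to compare, per FAIR letter, the highest and lowest rating a dataset received. The sketch below assumes a 1 to 5 star rating per letter; the ratings, the threshold and the function names are all hypothetical and are not taken from the pilot.

```python
# HYPOTHETICAL star ratings (1 to 5) for one dataset from three reviewers.
# These numbers are invented for illustration; they are not pilot results.
reviews = [
    {"F": 5, "A": 4, "I": 3, "R": 4},
    {"F": 3, "A": 4, "I": 2, "R": 4},
    {"F": 4, "A": 4, "I": 5, "R": 3},
]


def score_range(reviews):
    """Per FAIR letter, the gap between the highest and the lowest rating."""
    letters = reviews[0].keys()
    return {
        letter: max(r[letter] for r in reviews) - min(r[letter] for r in reviews)
        for letter in letters
    }


def contested(reviews, threshold=2):
    """Letters whose ratings differ by at least `threshold` stars (assumed cut-off)."""
    return [letter for letter, gap in score_range(reviews).items() if gap >= threshold]


print(score_range(reviews))
print(contested(reviews))
```

Letters that come out as contested would be natural candidates for rewording the corresponding questions before a second round of testing.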

(Self-)assessment of the DANS archive on the basis of the FAIR principles (& metrics)
Delft University: DANS EASY complies with 11 out of 15 principles; for 2 DANS does not comply (I2 & R1.2), and for 2 more it is unclear (A2 & R1.3).
Self-assessment:
- Some metrics: FAIRness of the DANS archive could be improved. E.g.: machine accessibility; interoperability requirements; use of standard vocabularies; provenance
- Some metrics: we are not sure how to apply them. E.g.: PID resolves to landing page (metadata), not to dataset; dataset may consist of multiple files without standard PID
- Sometimes the FAIR principle itself is not clear. E.g.: principle applies to both data and metadata; what does interoperability mean for images or PDFs? Are some data types intrinsically UNFAIR?
- Some terms are inherently subjective (plurality, richly)
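The "11 out of 15" figure can be reproduced by elimination from the 15 FAIR principles. The slide names only the four exceptions, so the list of compliant principles below is derived, not quoted; a minimal sketch:

```python
# The 15 FAIR Guiding Principles, by identifier.
PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]

# Outcomes named on the slide (Delft University assessment of DANS EASY).
NOT_COMPLIANT = {"I2", "R1.2"}
UNCLEAR = {"A2", "R1.3"}

# Derived by elimination; the slide itself does not list these.
compliant = [p for p in PRINCIPLES if p not in NOT_COMPLIANT | UNCLEAR]

print(len(compliant), "of", len(PRINCIPLES), "principles:", ", ".join(compliant))
```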

What are we working on now? A fork in the road ahead
- Eliane: FAIR enough? A checklist for researchers to evaluate the FAIRness of data(sets)
- Mustapha: CoreTrustSeal: Enabling FAIR Data Repositories

FAIR enough? A checklist for researchers to evaluate the FAIRness of data(sets)
Eliane Fankhauser, Project Leader, DANS

FAIR assessment tools: An overview
- Checklist “How FAIR are your data?”: “A Checklist produced for use at the EUDAT summer school to discuss how FAIR the participant's research data were…”
- FAIR self-assessment tool, provided by ANDS / Nectar / RDS: “… designed predominantly for data librarians and IT staff…”
- FAIRdat tool: “Using this tool you will be able to score the 'FAIRness' of a dataset.”
- Checklist for Evaluation of Dataset Fitness for Use, provided by the RDA Working Group “Assessment of Data Fitness for Use”: “This checklist is meant to supplement the CoreTrustSeal Repository Certification process.” Not yet available.
- FAIR enough? Checklist to evaluate FAIRness of data(sets), provided by DANS

FAIR checklist for researchers
- Short and concise checklist for researchers who are planning to deposit their data
- Covers different levels of FAIRness (repository, metadata, dataset, files)
- Embraces two core concepts: FAIR data and trustworthy repository
- Current state: beta version (Google Forms)

Checklist demonstration

Summary
- Questions formulated as simply as possible
- No direct “translation” of FAIR principles
- Short explanations of terms and concepts
- Reference to trustworthy repositories and CTS
- Overall score at the end
- “Recommendations” for questions answered with “no”
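The scoring behaviour summarised above (an overall score, plus a recommendation for every question answered with "no") can be sketched in a few lines. The questions, the recommendation texts and the names `Question` and `evaluate` are hypothetical; the actual checklist wording and scoring rules are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str            # yes/no question shown to the researcher
    recommendation: str  # advice shown when the answer is "no"


# HYPOTHETICAL questions, invented for illustration only.
QUESTIONS = [
    Question("Is the repository you chose certified (e.g. CoreTrustSeal)?",
             "Consider depositing in a certified repository."),
    Question("Will your dataset receive a persistent identifier?",
             "Ask the repository whether it assigns persistent identifiers."),
    Question("Is a usage licence attached to the data?",
             "Choose a clear usage licence before depositing."),
]


def evaluate(questions, answers):
    """Return (score, recommendations) for a list of yes/no answers (True = yes)."""
    if len(questions) != len(answers):
        raise ValueError("one answer per question is required")
    score = sum(1 for answer in answers if answer)
    recommendations = [q.recommendation for q, a in zip(questions, answers) if not a]
    return score, recommendations


score, recommendations = evaluate(QUESTIONS, [True, False, True])
print(f"Score: {score}/{len(QUESTIONS)}")
for recommendation in recommendations:
    print("-", recommendation)
```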

The FAIR checklist for researchers online: https://dans.knaw.nl/nl/projecten

FAIR checklist factsheet (draft version)

CoreTrustSeal: Enabling FAIR Data Repositories
Mustapha Mokrane, Consultant at DANS, and CoreTrustSeal Board
This presentation will address the alignment between the FAIR Principles and the certification of Trustworthy Data Repositories, and more specifically the CoreTrustSeal certification. The main idea I will articulate is that Trustworthy Data Repositories are the natural home of FAIR data and provide an ecosystem for it.

Mari Kleemola, the Secretary of the CoreTrustSeal Board, wrote recently in a blog post on the occasion of World Digital Preservation Day:
“Research data will not become nor stay FAIR by magic. We need skilled people, transparent processes, interoperable technologies and collaboration to build, operate and maintain research data infrastructures.”
Mari Kleemola, Finnish Social Science Data Archive, Finland; CoreTrustSeal Board, Secretary
https://tietoarkistoblogi.blogspot.com/2018/11/being-trustworthy-and-fair.html

FAIR GUIDING PRINCIPLES
Focus: Enable discovery and reuse of data
Process: Data management & stewardship
As you might already be aware, the FAIR Principles focus on enabling discovery and reuse of data, and the essential processes behind the principles are data management and stewardship.

FAIR RESEARCH DATA LIFECYCLE
Research data management and stewardship is a cyclic process, and the FAIR Principles relate to all stages of this lifecycle. An important point here is that FAIR data must be born FAIR and remain FAIR throughout this lifecycle.

FAIR RESEARCH DATA LIFECYCLE: Research Data Repositories
Data repositories are key research infrastructures involved in all stages of this lifecycle. Their main missions are to provide access, enable reuse and ensure the preservation of data for the long term. From this perspective they play a critical role in enabling the FAIR Principles.

FAIR GUIDING PRINCIPLES
In the seminal FAIR paper, the Principles define characteristics that apply to a continuum of objects. It is also recognized that FAIR data cannot exist without FAIR tools, vocabularies and infrastructures such as data repositories! This has been emphasized in the recent Turning FAIR data into reality report and Action Plan from the European Commission Expert Group on FAIR Data.
Figures: A model for FAIR Digital Objects; The components of a FAIR Ecosystem
Source: Turning FAIR data into reality, Final report and Action Plan from the European Commission Expert Group on FAIR Data, https://doi.org/10.2777/54599

FAIR ASSESSMENT: FINDABILITY
(META)DATA:
- F1. (meta)data are assigned a globally unique and persistent identifier
- F2. data are described with rich metadata
- F3. metadata clearly and explicitly include the identifier of the data it describes
DATA REPOSITORY (technologies, procedures, expertise, people):
- F4. (meta)data are registered or indexed in a searchable resource
An assessment of data FAIRness cannot be done only at the level of the metadata/data. The context around the data needs to be considered, and data repositories are key elements of that context. Some FAIR Principles cannot actually be assessed without looking at the data repository level. Among the Findability Principles, for example, compliance with Principle F4 cannot be assessed at the metadata or data level. And even for F1, F2 and F3, ensuring and maintaining compliance will depend on technologies, procedures, expertise and people at the data repository level.

FAIR ASSESSMENT: ACCESSIBILITY
DATA REPOSITORY:
- A1. (meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1 the protocol is open, free, and universally implementable
- A1.2 the protocol allows for an authentication and authorization procedure, where necessary
- A2. metadata are accessible, even when the data are no longer available
For Accessibility, all principles require looking at the data repository.

FAIR ASSESSMENT: INTEROPERABILITY
(META)DATA:
- I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- I2. (meta)data use vocabularies that follow FAIR principles
- I3. (meta)data include qualified references to other (meta)data
DATA REPOSITORY: technical infrastructure, procedures, expertise, people
For Interoperability, an assessment can be done at the data and metadata level, but once again, ensuring that compliance is maintained depends on data repositories.

FAIR ASSESSMENT: REUSABILITY
(META)DATA:
- R1. (meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (meta)data are released with a clear and accessible data usage license
- R1.2. (meta)data are associated with detailed provenance
DATA REPOSITORY (technical infrastructure, procedures, expertise, people):
- R1.3. (meta)data meet domain-relevant community standards
For the Reusability Principles, the assessment of all but R1.3 can be done at the data and metadata level, but once again, ensuring that compliance is maintained depends on data repositories.
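The four assessment slides can be condensed into a lookup from each principle to the level at which the speaker says it can be assessed. The sketch below encodes that reading of the slides; the constant and function names are illustrative.

```python
METADATA = "(meta)data"
REPOSITORY = "data repository"

# Assessment level per principle, as stated on the four slides above.
# Note: the speaker adds that *maintaining* compliance depends on the
# repository for every principle; this table covers assessment only.
ASSESSMENT_LEVEL = {
    "F1": METADATA, "F2": METADATA, "F3": METADATA, "F4": REPOSITORY,
    "A1": REPOSITORY, "A1.1": REPOSITORY, "A1.2": REPOSITORY, "A2": REPOSITORY,
    "I1": METADATA, "I2": METADATA, "I3": METADATA,
    "R1": METADATA, "R1.1": METADATA, "R1.2": METADATA, "R1.3": REPOSITORY,
}


def assessed_at(level):
    """Principles whose assessment requires looking at the given level."""
    return [p for p, lvl in ASSESSMENT_LEVEL.items() if lvl == level]


print("Repository level:", assessed_at(REPOSITORY))
print("(Meta)data level:", assessed_at(METADATA))
```

On this reading, 6 of the 15 principles cannot be assessed without looking at the repository.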

CORETRUSTSEAL ASSESSMENT
Digital Object Management; Organizational Infrastructure; Technology
At the level of the data repository, the CoreTrustSeal Requirements define minimum requirements for research data repositories to be recognized as trustworthy and thus aligned with the FAIR data principles.
CoreTrustSeal Data Repositories Requirements: https://www.coretrustseal.org/why-certification/requirements/

CORETRUSTSEAL - FAIR ALIGNMENT
F:
- Offer persistent identifiers [F1 and F3]
- Recommended data citations [F1]
- Searchable metadata catalogue to appropriate standards [F2, F3]
- Search facilities, inclusion in disciplinary or generic registries of resources [F4]
A (R4, R10, R13, R15, R16):
- Facilitate machine harvesting of the metadata [A1]
- Uses international and/or community standards [A1.1]
- Searchable metadata catalogue to appropriate standards [A1 and A1.1]
- Technical infrastructure: protection of facility, data, products, services, users [A1.2]
- Data managed in compliance with discipline and ethical norms [A1.2]
- Responsibility for long-term preservation [A2]
I (R14, R11):
- Metadata required when the data are provided [I1]
- Formats used by the Designated Community [I1]
- Measures and plans for the possible evolution and migration of formats [I2]
- Ensure understandability of the data [I2]
- Ability to comment on, and/or rate data and metadata [I3]
- Provide citations to related works or links to citation indices [I3]
R (R2, R7, R8, R11):
- Integrity and authenticity of the data [R1]
- Documentation of the completeness of the data and metadata [R1]
- Links to metadata and to other datasets [R1]
- Provenance data and related audit trails [R1.2]
- Maintains licenses covering data access and use and monitors compliance [R1.1]
- Defined data and metadata: ensure relevance and understandability for users [R1.3]
- Technical data and metadata quality and assessment of adherence to schema [R1.3]
This alignment can be illustrated by mapping the FAIR Principles to the CoreTrustSeal requirements, as you can see in this table. This work will be published and will be used as a foundation to facilitate building and assessing the FAIR data ecosystem.
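The alignment table can also be read in the other direction: given a CoreTrustSeal requirement, which FAIR letters does it support? The sketch below inverts the mapping. It rests on an assumption about how the requirement numbers on the slide group with the letters A, I and R, since the flattened slide text does not make the grouping explicit; no requirement numbers are recoverable for F, so that row is omitted rather than guessed.

```python
# FAIR letter -> CoreTrustSeal requirement numbers, as read from the slide.
# ASSUMPTION: each run of requirement numbers on the slide belongs to the
# letter that follows it. The Findable row is omitted because its
# requirement numbers do not appear in the transcript.
FAIR_TO_CTS = {
    "A": ["R4", "R10", "R13", "R15", "R16"],
    "I": ["R14", "R11"],
    "R": ["R2", "R7", "R8", "R11"],
}


def invert(fair_to_cts):
    """Build requirement -> list of FAIR letters from letter -> requirements."""
    cts_to_fair = {}
    for letter, requirements in fair_to_cts.items():
        for requirement in requirements:
            cts_to_fair.setdefault(requirement, []).append(letter)
    return cts_to_fair


CTS_TO_FAIR = invert(FAIR_TO_CTS)
for requirement, letters in CTS_TO_FAIR.items():
    print(requirement, "->", ", ".join(letters))
```

On this reading, R11 is the only requirement listed under more than one letter (I and R).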

FAIR ECOSYSTEM
Rec. 20: Deposit in Trusted Digital Repositories. Research data should be made available by means of Trusted Digital Repositories, and where possible in those with a mission and expertise to support a specific discipline or interdisciplinary research community.
Rec. 9: Develop assessment frameworks to certify FAIR services. Data services must be encouraged and supported to obtain certification, as frameworks to assess FAIR services emerge. Existing community-endorsed methods to assess data services, in particular CoreTrustSeal (CTS) for trusted digital repositories, should be used as a starting point to develop assessment frameworks for FAIR services. Repositories that steward data for a substantial period of time should be encouraged and supported to achieve CTS certification.
Source: Turning FAIR data into reality, Final report and Action Plan from the European Commission Expert Group on FAIR Data, https://doi.org/10.2777/54599

TAKE HOME MESSAGES
- FAIR Principles apply to more than (meta)data
- FAIR data assessments must include infrastructure
- FAIR data live in Trustworthy Data Repositories
- CoreTrustSeal Requirements are FAIR aligned

To be continued...
Work at DANS on FAIR is well under way. In 2019 we plan to continue our work in a European project to formulate FAIR rules of participation for the EOSC. Subjects to work on include:
- Strengthening the certification of repositories for FAIR data
- FAIR data policies
- Assessment of the FAIRness of data and metadata within certified repositories, focusing on those metrics that vary within such repositories
- FAIR software and services
- Training on FAIR data (management), both for students within regular academic curricula and for others

Thank you for listening @pkdoorn Peter.Doorn@dans.knaw.nl Eliane.Fankhauser@dans.knaw.nl @MokraneMA Mustapha.Mokrane@dans.knaw.nl @DANSKNAW www.dans.knaw.nl