eScience - FAIR Science

Presentation transcript:

FAIR enough: A researcher-friendly checklist and a not-so-friendly assessment of the FAIRness of repositories
Peter Doorn, Director, DANS
PLAN-E Workshop @ IEEE eScience Conference, Amsterdam, 29 October 2018
@pkdoorn @dansknaw

DANS is about keeping data FAIR (https://dans.knaw.nl)
- EASY: certified long-term archive
- NARCIS: portal aggregating research information and institutional repositories
- DataverseNL: supports data storage during research and until 10 years after

Last spring in Berlin: "One Step Forward, Two Steps Back: A Design Framework and Exemplar Metrics for Assessing FAIRness in Trustworthy Data Repositories"
Peter Doorn, Director, DANS
WG RDA/WDS Assessment of Data Fitness for Use, RDA 11th Plenary meeting, Berlin, 22-03-2018
@pkdoorn @dansknaw

Towards a FAIR Data Assessment Tool (previously on RDA: Barcelona 2017)
The DSA principles (for data repositories) map onto the FAIR principles (for data sets):
- data can be found on the internet -> Findable
- data are accessible -> Accessible
- data are in a usable format -> Interoperable
- data are reliable -> Reusable
- data can be referred to (citable)
FAIR badging scheme: https://www.surveymonkey.com/r/fairdat
https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar
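
The slides do not spell out how the star badges are computed, so the following is only a minimal sketch, assuming one 0-5 star rating per FAIR dimension and a plain-text badge; the function name and levels are invented for illustration, not the actual FAIRdat scheme.

```python
# Hypothetical star-badge rendering for a FAIRdat-style assessment.
# The 0-5 scale per F/A/I/R dimension is an assumption, not the real scheme.

def badge(scores):
    """Render one star rating per FAIR dimension, e.g. {'F': 4, ...}."""
    return " ".join(f"{dim}:{'*' * stars if stars else '-'}"
                    for dim, stars in scores.items())

print(badge({"F": 4, "A": 3, "I": 2, "R": 3}))  # F:**** A:*** I:** R:***
```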

Testing the FAIRdat prototype
- Test in 4 repositories, summer 2017
- Test at Open Science Fair, Athens 2017: 17 participants, plus tests within DANS

Repository    | Datasets | Reviewers                              | Reviews
VirginiaTech  | 5        | 1                                      |
MendeleyData  | 10       | 3 (for 8 datasets), 2 (for 2 datasets) | 28
Dryad         | 9        | 3 (for 2 datasets), 2 (for 3 datasets) | 16
CCDC          | 11       | ? (no names), 2 (for 1 dataset)        | 12

Pros and cons of the FAIRdat prototype (FAIR metrics: starring your data)
Pros (positive feedback):
- Simple, easy-to-use questionnaire
- Well documented
- Useful
Cons (negative feedback):
- Questionnaire oversimplified?
- Some requirements of Reusability missing/shifted
Other observations:
- Variance in FAIR scores across multiple reviewers due to subjectivity
- Some like starring datasets, others do not (should open data score higher than closed data?)
- Assessing multi-file data sets with different properties is difficult

Other challenges
- Subjectivity in the assessment of principles:
  - F2 "rich metadata"
  - I1 "broadly applicable language for knowledge representation"
  - R1 "plurality of attributes"
  - R1.2 "detailed provenance"
  - R1.3 "domain-relevant community standards"
- Use of standard vocabularies: how to define?
- Misunderstanding of the question or the meaning of the principle
- Several questions can be answered at the level of the repository

Speaker notes: A month ago we had the opportunity to run a pilot test of the prototype with 4 data repositories: VirginiaTech, MendeleyData, Dryad and CCDC, in order to see whether the questionnaire design would be easy to use and effective. We asked reviewers to assess multiple datasets from different domains, and we also had different reviewers assessing the same datasets. The results showed some variance in the FAIR scores, caused by the subjectivity of some of the questions (difficulties with assessing the extent of metadata: sufficient/rich), misinterpretation of what was asked, and difficulties with assessing the sustainability of multi-file datasets (preferred vs. accepted file formats). There was also concern that sensitive or restricted datasets will never be able to score highly, even if all their metadata are available and machine readable, or the data can be made available once permission is granted by the data holder. So we probably need to find a path for those datasets too! Despite these challenges, all the repositories are willing to participate in a second round of testing once adjustments and improvements are made. Slide credits: Eleftheria Tsoupra

(Self-)assessment of the DANS archive on the basis of the FAIR principles (and metrics)
Delft University: DANS EASY complies with 11 out of 15 principles; for 2 it does not comply (I2 & R1.2), and for 2 more it is unclear (A2 & R1.3)
Self-assessment:
- Some metrics show that the FAIRness of the DANS archive could be improved, e.g. machine accessibility, interoperability requirements, use of standard vocabularies, provenance
- Some metrics we are not sure how to apply, e.g. the PID resolves to a landing page (metadata), not to the dataset; a dataset may consist of multiple files without a standard PID
- Sometimes the FAIR principle itself is not clear, e.g. a principle applies to both data and metadata; what does interoperability mean for images or PDFs? Are some data types intrinsically unFAIR?
- Some terms are inherently subjective (plurality, richly)

FAIR enough checklist: sneak preview, by Eliane Fankhauser @ DANS
- Quiz/checklist format
- Easy to understand and fill out by any researcher
- Focus on awareness raising rather than on FAIR orthodoxy

FAIR assessment at the level of the repository
- The FAIR principles make no claim about the level of granularity they pertain to: repository, collection, data set, file, record, triple, ...
- However, they often mention "(meta)data", which we interpret as pertaining to both data and metadata
- For data in trustworthy (CoreTrustSeal-certified) repositories, most FAIR principles are taken care of for all data and metadata by the repository
- Only F2, I2, I3 and R1 (R1.2 and R1.3) can vary for metadata in a CTS-certified repository
- Only I1, I2, I3 and R1 (R1.3) can vary for data (sets and files) in a CTS-certified repository
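
The division above is easy to carry around as a lookup table; a minimal sketch in which the names are ours but the principle lists come straight from the slide:

```python
# Which FAIR principles can still vary per object once the repository
# itself is CoreTrustSeal certified (as stated in the slide above).
VARIES_IN_CTS_REPOSITORY = {
    "metadata": {"F2", "I2", "I3", "R1", "R1.2", "R1.3"},
    "data": {"I1", "I2", "I3", "R1", "R1.3"},
}

def needs_per_object_check(level, principle):
    """True if the principle must be assessed per dataset or metadata record."""
    return principle in VARIES_IN_CTS_REPOSITORY[level]

print(needs_per_object_check("data", "F1"))  # False: handled by the repository
```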

FAIR principles turned into questions at three levels: Findable
Levels: (R) = for all data and metadata in the repository; (M) = for metadata; (D) = for data (sets and files, or other subunits of information)
- F1 (R): Does the repository assign globally unique and persistent identifiers to the data (sets and files) in the repository?
- F1 (M): Does the repository assign a globally unique and persistent identifier to the metadata?
- F2 (M): Are the metadata that the repository offers machine readable, that is to say, in a format that can easily be processed by machines?
- F2 (D): Are the data described with rich metadata?
- F3 (M): Do the metadata that the repository offers clearly and explicitly include the identifier of the data they describe?
- F4 (D): Are the data in the repository registered or indexed in a searchable resource? *
- F4 (M): Are the metadata that the repository offers registered or indexed in a searchable resource?
* This is only possible for openly accessible data
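
Several of the Findable questions can be checked mechanically. A minimal sketch of an F1-style resolvability check, assuming the PID is a DOI; the DOI below is a made-up placeholder, and requests is a third-party HTTP library:

```python
import requests

def pid_resolves(doi):
    """F1 in practice: does the persistent identifier resolve, via the
    doi.org proxy, to a landing page? Some landing pages reject HEAD
    requests, in which case a GET would be needed instead."""
    resp = requests.head(f"https://doi.org/{doi}",
                         allow_redirects=True, timeout=10)
    return resp.status_code == 200

print(pid_resolves("10.17026/dans-xxx-xxxx"))  # placeholder, not a real DOI
```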

FAIR principles turned into questions at three levels: Accessible
Levels: (R) = repository; (M) = metadata; (D) = data (sets and files, or other subunits of information)
- A1 (M): Are the metadata retrievable by their identifier using a standardized communications protocol?
- A1 (D): Are the data (sets and files) retrievable by their identifier using a standardized communications protocol?
- A1.1 (R): Is the protocol open, free, and universally implementable?
- A1.2 (R): Does the protocol allow for an authentication and authorization procedure, when required?
- A2 (M): Are the metadata accessible, even when the data are no longer [or not] available?
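
A1 and A2 in practice: for DOIs, the metadata record can usually be retrieved over plain HTTPS via content negotiation on the doi.org proxy, independent of whether the data files themselves are open. A sketch, assuming a DataCite-registered DOI that serves schema.org JSON-LD (not guaranteed for every DOI):

```python
import requests

def fetch_metadata(doi):
    """Retrieve the metadata record for a DOI via content negotiation.
    This can work even when access to the data is restricted (A2),
    since only the metadata are requested."""
    resp = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/ld+json"},  # schema.org metadata
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```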

FAIR principles turned into questions at three levels: Interoperable
Levels: (R) = repository; (M) = metadata; (D) = data (sets and files, or other subunits of information)
- I1 (M): Are the metadata in a formal, accessible, shared, and broadly applicable language for knowledge representation?
- I1 (D): Are the data in a formal, accessible, shared, and broadly applicable language for knowledge representation? (2)
- I2 (R): Does the metadata schema used by the repository use vocabularies that follow the FAIR principles?
- I2 (M): Do the metadata use vocabularies that follow the FAIR principles? (1)
- I2 (D): Do the data use vocabularies that follow the FAIR principles? (3)
- I3 (R): Does the metadata schema the repository uses include one or more elements to refer to other metadata or data?
- I3 (M): Do the metadata include qualified references to other metadata?
- I3 (D): Do the data include qualified references to other data?
(1) in particular F and A
(2) answer may vary for individual files within a dataset
(3) especially F and A; answer may depend on the level of granularity
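
What a "formal, broadly applicable language", "FAIR vocabularies" and "qualified references" look like in practice: a minimal RDF sketch using the Dublin Core vocabulary. The dataset URIs are invented placeholders, and the rdflib package is assumed to be installed:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
dataset = URIRef("https://doi.org/10.17026/dans-xxx-xxxx")  # placeholder PID

# I1: statements in RDF, a formal knowledge-representation language
g.add((dataset, DCTERMS.title, Literal("Example survey dataset")))
# I2: properties come from Dublin Core, a vocabulary that is itself
# findable and resolvable
g.add((dataset, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))
# I3: a qualified reference to related data (the relation type is explicit)
g.add((dataset, DCTERMS.isVersionOf,
       URIRef("https://doi.org/10.17026/dans-yyy-yyyy")))  # placeholder

print(g.serialize(format="turtle"))
```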

FAIR principles turned into questions at three levels: Reusable
Levels: (R) = repository; (M) = metadata; (D) = data (sets and files, or other subunits of information)
- R1 (R): Are the attributes of the metadata schema relevant and sufficient?
- R1 (M): Are the metadata sufficient, accurate and relevant?
- R1 (D): Are the data accurate (or fit for purpose)?
- R1.1 (R): Does the repository supply clear and accessible data usage licenses?
- R1.1 (M): Does the repository supply clear and accessible licenses for the metadata?
- R1.2 (R): Does the metadata schema of the repository contain one or more elements to describe the provenance of the data?
- R1.2 (M): Is the provenance of the metadata accurately described?
- R1.2 (D): Is the provenance of the data accurately described in the metadata or additional documentation?
- R1.3 (R): Does the metadata schema of the repository meet the domain standards of the community or communities it serves?
- R1.3 (M): Do the metadata or does the additional documentation meet domain-relevant community standards?
- R1.3 (D): Do the data meet domain-relevant community standards?
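
The R1.1-R1.3 questions lend themselves to a simple automated pre-check on a metadata record; a sketch with invented field names (real metadata schemas will differ):

```python
# Illustrative only: report which reuse-related fields a metadata record
# is missing. The field names are hypothetical, not from a real schema.
REUSE_FIELDS = {
    "license": "R1.1: clear and accessible usage license",
    "provenance": "R1.2: provenance of the data",
    "communityStandard": "R1.3: domain-relevant community standard",
}

def reuse_gaps(metadata):
    """Return the R-principle hints for fields absent from the record."""
    return [hint for field, hint in REUSE_FIELDS.items()
            if not metadata.get(field)]

print(reuse_gaps({"license": "CC-BY-4.0"}))  # flags provenance and standard
```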

Next steps
- Paper on the FAIR self-assessment of DANS EASY
- Establish where and how trustworthy repositories can improve their FAIRness; perhaps propose extensions to CoreTrustSeal
- "FAIR enough" checklist for researchers as a tool to raise awareness
- Deal with as many FAIR principles/metrics as possible at the level of the repository
- Focus FAIR assessment on what varies within trustworthy repositories
- Have separate metrics/questions at the level of the dataset, the metadata, and the single data file (or other subunits of data collections)
- The questionnaire approach remains useful as a FAIR data review tool