One Step Forward, Two Steps Back: A Design Framework and Exemplar Metrics for Assessing FAIRness in Trustworthy Data Repositories

Presentation transcript:

One Step Forward, Two Steps Back: A Design Framework and Exemplar Metrics for Assessing FAIRness in Trustworthy Data Repositories
Peter Doorn, Director DANS
WG RDA/WDS Assessment of Data Fitness for Use
RDA 11th Plenary meeting, Berlin, 22-03-2018
@pkdoorn @dansknaw

DANS is about keeping data FAIR: https://dans.knaw.nl
- EASY: certified long-term archive
- NARCIS: portal aggregating research information and institutional repositories
- DataverseNL: to support data storage during research and until 10 years after

Previously on RDA: Barcelona 2017
DSA Principles (for data repositories) + FAIR Principles (for data sets):
- Findable: data can be found on the internet
- Accessible: data are accessible
- Interoperable: data are in a usable format
- Reusable: data are reliable
- and data can be referred to (citable)
FAIR Badging scheme: https://www.surveymonkey.com/r/fairdat
Webinar: https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar

Some measuring problems encountered in tests of the FAIR data assessment tool
- Assessing multi-file data sets (see the sketch below):
  - the files are in different formats, some open, some proprietary
  - some files are well documented, others less so
  - some are openly accessible, others are protected
- Quality of metadata: when is metadata minimal / insufficient / sufficient / extensive / rich?
- Use of standard vocabularies: how to define? Often these apply only to a subset of the data, e.g. specific variables
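The multi-file problem can be made concrete with a small sketch. The snippet below is purely illustrative and not part of the FAIRdat tool: it classifies the files of a dataset against example lists of open ("preferred") and proprietary ("accepted") formats; the function names and format lists are assumptions made for the sake of the example.

```python
# Hypothetical illustration of one measuring problem: a multi-file dataset
# whose files fall into different format categories. The format lists below
# are examples only, not DANS's official preferred/accepted format lists.
from collections import Counter
from pathlib import Path

OPEN_FORMATS = {".csv", ".txt", ".xml", ".json", ".tif"}         # assumed "preferred"
PROPRIETARY_FORMATS = {".xlsx", ".sav", ".dta", ".mdb", ".psd"}  # assumed "accepted"

def classify_file(path: str) -> str:
    """Classify a single file by its extension."""
    ext = Path(path).suffix.lower()
    if ext in OPEN_FORMATS:
        return "open"
    if ext in PROPRIETARY_FORMATS:
        return "proprietary"
    return "unknown"

def summarize_dataset(file_paths: list[str]) -> Counter:
    """Count how many files fall into each format category."""
    return Counter(classify_file(p) for p in file_paths)

if __name__ == "__main__":
    dataset = ["interviews.csv", "codebook.pdf", "analysis.sav", "images/scan01.tif"]
    print(summarize_dataset(dataset))
    # e.g. Counter({'open': 2, 'proprietary': 1, 'unknown': 1})
```

Whatever lists are used, the result is a mixture of categories per dataset, and a single FAIR score has to aggregate over that mixture somehow, which is exactly the measuring problem described above.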

Video: https://www.youtube.com/watch?v=w-3oLA7kjFY&t=0m18s

What have we done since?
- Test prototype FAIRdat within DANS, within 4 other repositories, and at the Open Science FAIR in Athens
- Participate in the FAIR metrics group: see http://fairmetrics.org/
- 14 metrics on GitHub: https://github.com/FAIRMetrics/Metrics
- Preprint of the paper 'A design framework and exemplar metrics for FAIRness': https://www.biorxiv.org/content/early/2017/12/01/225490
- Evaluate the DANS archive against the FAIR metrics

FAIRdat Prototype Testing (4 repositories)

Name of Repository | Number of Datasets | Number of Reviewers                    | Number of Reviews
VirginiaTech       | 5                  | 1                                      |
MendeleyData       | 10                 | 3 (for 8 datasets), 2 (for 2 datasets) | 28
Dryad              | 9                  | 3 (for 2 datasets), 2 (for 3 datasets) | 16
CCDC               | 11                 | ? (no names), 2 (for 1 dataset)        | 12

Results: variances in FAIR scores across multiple reviewers because of:
- subjectivity of some questions (e.g. sufficiency of metadata)
- misunderstanding of what was asked
Worry that sensitive data will never get a high FAIR rating, even if all their metadata are available and machine-readable.

A month ago we had the opportunity to run a pilot test of the prototype with 4 data repositories, VirginiaTech, MendeleyData, Dryad and CCDC, in order to see whether the questionnaire design is easy to use and effective. We asked reviewers to assess multiple datasets from different domains, and we also had different reviewers assess the same datasets. The results showed some variance in the FAIR scores because of the subjectivity of some of the questions (difficulties with assessing the extent of metadata: sufficient vs. rich), misinterpretation of what was asked, and difficulties with assessing the sustainability of multi-file datasets (preferred vs. accepted file formats). There was also concern that sensitive or restricted datasets will never be able to score highly, even if all their metadata are available and machine-readable, or the data can be made available once the data holder grants permission. So we probably need to find a path for those datasets too! Despite these challenges, all the repositories are willing to participate in a second round of testing once adjustments and improvements are made. Slide credits: Eleftheria Tsoupra

Prototype Testing (Open Science FAIR)
Feedback from 17 participants
Pros:
- simple, easy-to-use questionnaire
- well documented
- useful
Cons:
- oversimplified questionnaire structure
- some subjective indicators
- some requirements related to Reusability may be missing from the current operationalization

Furthermore, 2 weeks ago the pilot version of the assessment tool was tested by the participants of a workshop that was part of the Open Science FAIR conference in Athens. This time we gathered feedback from a diverse group of people and used a set of questions (such as 'What was best?', 'What was the main obstacle?', etc.) rather than just asking for open input. Pros: according to most of the participants the tool is simple and easy to use, well documented and useful. Cons: for some others the questionnaire structure appeared oversimplified and they suggested adding more questions. A couple of participants think that treating R as the average might mean that some requirements are missing, while others… Slide credits: Eleftheria Tsoupra

Can we align the DANS FAIRdat questions with the new FAIR metrics?

DANS FAIRdat metrics  | FAIR metrics
K.I.S.S.              | Aspirational
Questionnaire based   | Fully automatic assessment
F + A + I = R         | R also operationalized
Works for men & women | Aimed at men & machines
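To make the "F + A + I = R" row concrete: the sketch below shows how a questionnaire-based badge of this kind might aggregate scores, assuming each F, A and I question is answered on a 0-4 star scale and R is derived as the average of the other three (the "treating R as the average" remark above). It is an illustration only, not the actual FAIRdat scoring code.

```python
# Minimal sketch of a questionnaire-based FAIR badge, assuming each of the
# F, A and I questions is answered on a 0-4 star scale and R is derived as
# the average of the other three ("F + A + I = R"). Illustration only.
from statistics import mean

def fair_badge(f_answers: list[int], a_answers: list[int], i_answers: list[int]) -> dict[str, float]:
    """Aggregate per-question star scores into F, A, I and a derived R score."""
    scores = {
        "F": mean(f_answers),
        "A": mean(a_answers),
        "I": mean(i_answers),
    }
    scores["R"] = mean(scores.values())  # reusability as the average of F, A, I
    return scores

if __name__ == "__main__":
    # Hypothetical review of a single dataset (3 F questions, 2 A, 2 I)
    print(fair_badge(f_answers=[4, 3, 4], a_answers=[2, 3], i_answers=[1, 2]))
    # {'F': 3.67, 'A': 2.5, 'I': 1.5, 'R': 2.56} (approximately)
```

Under this kind of aggregation a restricted-access dataset can never obtain a high A or R score, which is exactly the concern the reviewers raised in the prototype tests.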

(Self) assessment of the DANS archive on the basis of the FAIR principles (& metrics)
Delft University: DANS EASY complies with 11 out of 15 principles; for 2 DANS does not comply (I2 & R1.2), for 2 more it is unclear (A2 & R1.3)
Self-assessment:
- For some metrics the FAIRness of the DANS archive could be improved, e.g. machine accessibility, interoperability requirements, use of standard vocabularies, provenance
- For some metrics we are not sure how to apply them, e.g. the PID resolves to a landing page (metadata), not to the dataset (see the sketch below); a dataset may consist of multiple files without a standard PID
- Sometimes the FAIR principle itself is not clear, e.g. a principle applies to both data and metadata; what does interoperability mean for images or PDFs? Are some data types intrinsically UNFAIR?
- Some terms are inherently subjective (plurality, richly)
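As an illustration of the machine-accessibility and PID points above, the sketch below checks whether a (placeholder) DOI resolves and whether a machine can retrieve structured metadata for it via DOI content negotiation. It assumes the DOI is registered with an agency that supports content negotiation and is not one of the official FAIR metrics implementations.

```python
# Sketch of a machine-actionable check in the spirit of the FAIR metrics:
# does a dataset's PID resolve, and can a machine retrieve structured metadata
# via DOI content negotiation? The DOI below is a placeholder, and the check
# assumes the registration agency supports content negotiation.
import requests

def check_pid(doi: str) -> dict[str, bool]:
    url = f"https://doi.org/{doi}"
    result = {}

    # 1. Does the PID resolve at all (typically to a landing page, not the data)?
    landing = requests.get(url, allow_redirects=True, timeout=10)
    result["resolves"] = landing.ok

    # 2. Can a machine get structured metadata instead of the HTML landing page?
    meta = requests.get(url, headers={"Accept": "application/ld+json"},
                        allow_redirects=True, timeout=10)
    result["machine_readable_metadata"] = (
        meta.ok and "json" in meta.headers.get("Content-Type", "")
    )
    return result

if __name__ == "__main__":
    print(check_pid("10.1234/example-dataset"))  # hypothetical DOI
```

Note that even when both checks pass, the PID still resolves to metadata about the dataset rather than to the data files themselves, which is the ambiguity flagged in the self-assessment.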

General conclusion
Before we started thinking about implementing the FAIR principles, we were confused about the subject. Having tried to implement them, we are still confused, but on a higher level. This confusion is partly due to the very good ideas underlying FAIR, but we may need a clearer FAIR 2.0 specification of the principles.

Steps forward again
- DANS paper to be out soon
- Deal with as many FAIR principles/metrics as possible at the repository level (i.e. through CoreTrustSeal certification)
- Have separate metrics/questions at the level of the dataset (= collection), the metadata, and the single data file (sketched below)
- The questionnaire approach remains useful as a FAIR data review tool, and we doubt whether automatic testing will work in practice
- Some new FAIR metrics are too ambitious to be applicable to legacy data
- Relax on tying questions directly to F, A, I and R and on separate scores for each letter
- Keep some form of badging for the data user to get an impression of fitness for use!
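The proposal to assess at three separate levels could look roughly like the following hypothetical data model; the class and field names are assumptions for illustration, not a DANS specification.

```python
# Hypothetical data model for assessing FAIRness at three levels:
# the dataset (= collection), its metadata record, and each single data file.
# Field names are illustrative assumptions, not a DANS design.
from dataclasses import dataclass, field

@dataclass
class FileAssessment:
    filename: str
    open_format: bool          # preferred (open) vs. accepted (proprietary) format
    documented: bool

@dataclass
class MetadataAssessment:
    has_pid: bool
    machine_readable: bool
    uses_standard_vocabulary: bool

@dataclass
class DatasetAssessment:
    dataset_id: str
    metadata: MetadataAssessment
    files: list[FileAssessment] = field(default_factory=list)

    def summary(self) -> str:
        """Report per-level results instead of one collapsed score."""
        ok_files = sum(f.open_format and f.documented for f in self.files)
        return (f"{self.dataset_id}: metadata PID={self.metadata.has_pid}, "
                f"{ok_files}/{len(self.files)} files in open, documented form")
```

Keeping the levels separate in this way avoids collapsing a heterogeneous multi-file dataset into a single score, which was one of the measuring problems identified earlier.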