1
Towards a FAIR Assessment Tool for Datasets
@dansknaw
Towards a FAIR Assessment Tool for Datasets
Lisa de Leeuw, DANS
21 Sept 2017, RDA Plenary, Montreal
Thanks to Peter Doorn, Ingrid Dillo, Emily Thomas and Eleftheria Tsoupra

…I'd like to talk to you about an assessment tool DANS has been developing for assessing how FAIR the datasets in a Trusted Digital Repository (TDR) are… Let's move on to the next slide to see what FAIR means…
2
…based on the FAIR Data Principles
During the 2014 workshop "Designing a data FAIRport" for the life sciences in Leiden, a minimal set of community-agreed guiding principles was formulated. The FAIR Data Principles make data:

- Findable: easy to find by both humans and machines, based on metadata
- Accessible: with a well-defined use license and access conditions (Open Access when possible)
- Interoperable: ready to be linked with other datasets
- Reusable: ready to be re-used for future research and to be processed further using computational methods and tools

FAIR refers to a set of guiding principles, formulated in 2014 by a diverse set of stakeholders during a workshop in Leiden, to make data Findable, Accessible, Interoperable and Re-usable. The intent was (and still is) that the principles act as guidance for those wishing to enhance the reusability of their data holdings. The FAIR Data Principles put specific emphasis on enhancing the ability of machines to automatically find and re-use the data, in addition to supporting its reuse by individuals.
3
A 5 Star Scale Specification
Findable (defined by metadata, including a PID, and documentation)
1. No URI or PID, and no metadata/documentation
2. PID without, or with insufficient, metadata
3. Sufficient or limited metadata without a PID
4. PID with sufficient metadata
5. Extensive metadata and rich additional documentation available

Accessible (defined by the presence of a user license)
1. Neither metadata nor data are accessible
2. Metadata are accessible but the data are not (no clear terms of reuse in a license)
3. User restrictions apply (e.g. privacy, commercial interests, embargo period)
4. Public access (after registration)
5. Unrestricted open access

Interoperable (defined by data format)
1. Proprietary (privately owned), non-open data format
2. Proprietary format, accepted by a Certified Trustworthy Data Repository
3. Non-proprietary, open format, i.e. a 'preferred format'
4. In addition to the preferred format, the data are standardised using a standard vocabulary format (for the research field to which the data pertain)
5. Data additionally linked to other data to provide context

Badge scheme: F A I R

DANS has been working on an operationalization of the FAIR Data Principles that could be used for assessing datasets with a star rating system. At this point we have outlined 5 criterion levels each for the principles Findable, Accessible and Interoperable, and each criterion level represents a star level in the FAIR profile. We have decided to use R as the average of the scores on the F, A and I principles, resulting in an overall FAIRness or Re-usability score. Why? Because we felt R isn't a separate dimension, and scoring highly on the F, A and I principles in turn makes the data more reusable. For example, if a dataset is easily findable, accessible with some restrictions, and has low interoperability, it may receive a FAIR profile resulting in moderate reusability -> 'badge scheme'. The sketch below illustrates this scoring rule.
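The following is a minimal sketch (in Python, not taken from the actual tool) of the scoring rule just described: F, A and I are each rated on the 1-5 star scale above, and R is derived as their average. The function name and the rounding to one decimal are illustrative assumptions.

```python
def fair_profile(findable: int, accessible: int, interoperable: int) -> dict:
    """Combine per-principle star ratings (1-5) into a FAIR profile.

    R is not scored separately; it is derived as the average of F, A and I,
    mirroring the operationalization described on this slide.
    """
    for score in (findable, accessible, interoperable):
        if not 1 <= score <= 5:
            raise ValueError("star levels must be between 1 and 5")
    # Rounding to one decimal is an assumption for display purposes.
    reusable = round((findable + accessible + interoperable) / 3, 1)
    return {"F": findable, "A": accessible, "I": interoperable, "R": reusable}

# The slide's example: easily findable (4), access with some restrictions (3),
# low interoperability (2) -> moderate overall reusability.
print(fair_profile(findable=4, accessible=3, interoperable=2))
# {'F': 4, 'A': 3, 'I': 2, 'R': 3.0}
```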
4
Towards a FAIR data assessment tool
A first pilot version (a demonstrator) of the tool, based on the operationalized FAIR Data Principles, has been developed in SurveyMonkey. The tool runs a series of questions (a maximum of 5 per principle) to determine the star rating per principle. The assessment tool is accompanied by a guidance document (work in progress) that provides examples and additional explanations for each question in the tool, so that people know how to answer. At the end of the assessment, the tool will display the star score for each principle and will also calculate and display the overall 'R' FAIRness score. For this we will integrate another tool (the Parthenos wizard), since SurveyMonkey can't achieve what we want… Although the current tool is quite basic, having been created in SurveyMonkey, we plan to move to another medium in order to display the scores and generate the badge and the FAIR report on the last page of the assessment; a sketch of the questionnaire scoring idea follows below. Link to the prototype!
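As a hypothetical illustration of the questionnaire logic (the question texts below are placeholders, not the prototype's actual wording), each principle gets at most five yes/no questions, and the additive scheme mentioned later in this deck awards one star per positive answer, capped at five:

```python
# Placeholder questions for the Findable principle; the real wording
# and routing live in the SurveyMonkey prototype.
FINDABLE_QUESTIONS = [
    "Does the dataset have a persistent identifier (PID)?",
    "Is metadata available?",
    "Is the metadata sufficient for discovery?",
    "Is additional documentation available?",
    "Is the documentation rich and extensive?",
]

def star_rating(answers: list[bool]) -> int:
    """Additive scoring: one star per positive answer, clamped to 1-5."""
    return max(1, min(sum(answers), 5))

# A dataset with a PID and sufficient metadata but no extra documentation:
print(star_rating([True, True, True, False, False]))  # 3 stars
```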
5
Testing the prototype (Data Repositories)
Name of Repository | Number of Datasets | Number of Reviewers | Total Number of Reviews
Virginia Tech | 5 | 1 |
Mendeley Data | 10 | 3 (for 8 datasets), 2 (for 2 datasets) | 28
Dryad | 9 | 3 (for 2 datasets), 2 (for 3 datasets) | 16
CCDC | 11 | ? (no names), 2 (for 1 dataset) | 12

A month ago we ran a pilot of the prototype FAIRdat tool with 4 data repositories: Virginia Tech (USA), Mendeley Data (NL), Dryad (USA) and CCDC (UK), in order to see whether the questionnaire design is easy to use and effective. We asked reviewers to assess multiple datasets from different domains; in addition, we had different reviewers assess the same datasets.
6
Data Repository testing results
Some variances in FAIR scores across multiple reviewers of the same dataset, due to:
- misunderstandings of what was being asked (more elaborate help text needed)
- valuation difficulties, e.g. sufficient metadata, sustainability of multi-file datasets, licenses, linked data
Concerns that sensitive data will never be able to get a high FAIR rating, even if all its metadata is available and machine-readable.

According to the results there were some variances in the FAIR scores, caused by misinterpretation of what was asked, or by difficulties participants encountered in assessing the extent of the metadata (sufficient vs. rich; metadata may depend on the repository and/or the scientific domain), the sustainability of multi-file datasets (preferred vs. accepted file formats), or in identifying and assessing the dataset's access conditions. There was also concern that sensitive or restricted datasets will never be able to score highly, even if all their metadata is available and machine-readable, or even if the data can be made available once the data holder grants permission on request. So we probably need to find a path for those datasets too! (Despite these challenges, all the repositories are willing to participate in a second round of testing once adjustments and improvements have been made.)
7
Prototype Testing (Open Science FAIR)
Feedback from 17 participants:
- Simple, easy-to-use questionnaire
- Well documented
- Useful
- Very specific questions
- Oversimplified questionnaire structure
- Subjective indicator
- Some requirements based on Reusability may be missing from the current operationalization
- Yes/no answers instead of a scale

Furthermore, two weeks ago the pilot version of the tool was tested by participants of a workshop that was part of the Open Science FAIR conference in Athens. This time we gathered feedback from a diverse group of people (mainly data publishers and curators) using a set of questions such as 'What was best?' and 'What was the main obstacle?'. Pros: according to most of the participants, the tool is simple and easy to use, well documented and useful. Cons: for some others the questionnaire structure appeared oversimplified, and they suggested adding more questions. A couple of participants think that treating R as the average might mean that some requirements are missing, while others worry that the result will be just a subjective indicator.
8
Next Steps and Challenges
Explore the possibilities of:
- operationalization of the R
- different routing of questions for single-file and multi-file datasets
- further cooperation and liaison with the other FAIR initiatives out there
- changing the rating system to ensure that datasets which are not open because of privacy/restricted access, but do have open metadata and a license agreement, can get a higher rating
- different views/guidance for the 3 designated users:
  - depositor: for self-assessment
  - data expert at the repository: quality, fitness for reuse of the data
  - data re-users: quality, fitness for reuse of the data

Operationalization of the R is a good solution on the one hand; on the other hand we would like criteria for R just as for F, A and I. The system is additive now, but could be revised: if you answer a question positively you get 1 star (a maximum of 5 stars per letter). Single-file and multi-file datasets should get different routing of questions. GO FAIR is also working on metrics, and we have a cooperation going with this group. The tool is a proof of concept/demonstrator, not a service; we are very open to the other initiatives working on this and to suggestions on how to improve the tool. Access: datasets that are not open because of privacy (restricted access) can never get 5 stars, because Open Data is separate from FAIR; they should get more stars if the metadata and the license agreement are there. For future development, more reviews per dataset should be possible, showing an average, as on Amazon, booking.com, etc. Three groups will use the tool, so the tool should ask for your role when you start: the depositor, for self-assessment (a test; the outcome will not be preserved or counted); the data expert at the repository, assessing the FAIRness (quality, fitness for reuse) of the data (done upon ingest, and usable to advise the depositor on improving the dataset); and data re-users, assessing the FAIRness of the data (possibly with different help text in layman's terms). We would like to position the tool as a stepping stone in a larger framework of FAIR initiatives.
9
Longer term envisioned developments
- Automate the answers to some of the questions, especially the ones related to trusted digital repository certification, enhancing the objectivity of the scoring
- Enable more reviews per dataset and show the multiple reviews, including an average (see the sketch below)

Automated filling-in of some answers will be worked on at a later stage; this could improve the objectivity of the score.
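A minimal sketch of the envisioned multi-review display, assuming each review is a FAIR profile of per-principle star ratings and that the shown average is a simple mean per principle (the Amazon/booking.com style mentioned in the previous slide):

```python
from statistics import mean

def average_reviews(reviews: list[dict]) -> dict:
    """Average several reviewers' star ratings per principle."""
    return {
        key: round(mean(review[key] for review in reviews), 1)
        for key in ("F", "A", "I", "R")
    }

# Two hypothetical reviews of the same dataset:
reviews = [
    {"F": 4, "A": 3, "I": 2, "R": 3.0},
    {"F": 5, "A": 3, "I": 3, "R": 3.7},
]
print(average_reviews(reviews))
# {'F': 4.5, 'A': 3.0, 'I': 2.5, 'R': 3.4}
```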
10
@dansknaw Questions?? Link to the prototype:
11