The WDS/RDA Assessment of Data Fitness for Use Working Group


1 The WDS/RDA Assessment of Data Fitness for Use Working Group
Jonathan Petters (Virginia Tech), Marina Soares e Silva (Elsevier), Claire Austin (Department of the Environment, Government of Canada), Michael Diepenbroek (PANGAEA). RDA 12 - November 2018

2 Shared Google Doc notes

3 Problem: I have the data but can’t use it
I have found data in a domain/generic repository that I can access, but… I can’t be sure it’s complete. The metadata contains conflicting information. I am having issues with the format… and I just wasted 6 hours of my time figuring out I can’t use it!

5 Problem: I have the data but can’t use it
A provider gives access to a dataset that is FAIRly deposited by its creator, BUT the same dataset might not be fit for the data user! Challenge: How to make research data fit for the widest possible use?

6 Our working group’s approach
Data fitness for use: Assessment of the fitness for use of individual data sets should consolidate current efforts and be thorough & comprehensive, reliable & efficient to apply, and of high impact & visibility.

10 Our working group’s approach
Data fitness for use RDA/WDS Working Group. Specify criteria of reusability, expanding on FAIR, to support providers in assessing data quality → a checklist + rating system! Our target group: data repositories → for use by repository managers or external evaluators such as CoreTrustSeal.

12 Criteria to assess data fitness for use
Categories: Metadata completeness (R). Accessibility (A). Data completeness and correctness (R). Findability & interoperability (F, I). Curation (leading to FAIRness). → Expanding on reusability of FAIR.

13 Assessing data fitness for use (data correctness)
A repository hosts weather observation data in a spreadsheet. The spreadsheet is findable and accessible. But is it fit for use?
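One way to picture a data-correctness check on such a spreadsheet is the minimal sketch below. It is not the working group's checklist: the column names (`date`, `temp_c`), the sample rows, and the plausibility bounds are all invented for illustration.

```python
import csv
import io

# Hypothetical sample: column names and values are invented for illustration.
SAMPLE = """date,temp_c
2018-11-01,4.2
2018-11-02,
2018-11-03,412.0
2018-11-04,3.9
"""


def check_rows(text, low=-90.0, high=60.0):
    """Return (row_number, problem) pairs for missing or implausible temperatures.

    The bounds are assumed plausibility limits in degrees Celsius, not a standard.
    """
    problems = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        raw = (row.get("temp_c") or "").strip()
        if not raw:
            problems.append((i, "missing value"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((i, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((i, "outside plausible range"))
    return problems


problems = check_rows(SAMPLE)
for row_number, problem in problems:
    print(f"row {row_number}: {problem}")
```

A file can pass findability and accessibility checks and still be flagged here, which is the gap the slide points at.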

14 Assessing data fitness for use (data correctness)

15 Assessing data fitness for use (data correctness)

21 Initial Feedback on Checklist - ICPSR
How to evaluate the level of curation for a dataset? Through the repository's standard curation procedures. Some questions are general, making them hard to evaluate. Might envision multiple reviewers, à la CoreTrustSeal certification. Evaluation (and time to evaluate) of dataset properties will vary with the heterogeneity of the dataset – how to address?

22 Challenges
Volunteer effort. Inherent to our approach: Level of expertise of repository manager matters. How do repository managers currently evaluate data fitness? Sample size might influence result of assessment. Manual labor.

23 Challenges
Rating system: How to weigh criteria to determine … How to implement: potential automation. Resources to implement.
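To make the weighting question concrete, here is a minimal sketch of one possible rating: a weighted mean over the checklist categories. The category keys echo the criteria slide, but the weights, the 0-to-1 scoring scale, and the function name `fitness_rating` are assumptions for illustration; the slides do not specify any of them.

```python
# Hypothetical weights per category; the slides do not specify any.
WEIGHTS = {
    "metadata_completeness": 0.25,
    "accessibility": 0.15,
    "data_completeness_and_correctness": 0.30,
    "findability_and_interoperability": 0.15,
    "curation": 0.15,
}


def fitness_rating(scores, weights=WEIGHTS):
    """Weighted mean of per-category scores, each expected in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Invented scores for one dataset, e.g. the fraction of checklist items met.
example_scores = {
    "metadata_completeness": 1.0,
    "accessibility": 1.0,
    "data_completeness_and_correctness": 0.5,
    "findability_and_interoperability": 1.0,
    "curation": 0.0,
}
rating = fitness_rating(example_scores)
print(f"rating: {rating:.2f}")
```

Changing the weights changes the ranking of datasets, which is why the choice of weights is listed as an open challenge rather than a detail.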

24 Outlook
Implementation of rating system. Maybe (semi-)automation of assessment → refer as an example of something that could work for semi-automated assessment (users evaluate datasets). Draft article for peer-reviewed journal.

25 Outlook
Roll work into new RDA working groups? Proposed RDA WG on a FAIR Data Maturity Model. Propose new RDA WG for automating data quality verification - coordinating role? Spin off new WG from Domain Repositories IG on data/metadata standards in communities. Collaboration with other groups such as GOFAIR? Get involved!!!

26 Next in this session
Luiz Bonino (GOFAIR – automation approach). Wade Bishop (Univ. Tennessee, USA – data provider perspective). Discussion time.

27 The WDS/RDA Assessment of Data Fitness for Use Working Group
Jonathan Petters, Marina Soares e Silva

28 Working group wrapping up…
Where do we go from here? Do we have the interest and resources to continue checklist development?

29 Challenges
Efforts on volunteer basis - here was the plan: Terminology & Definition of Terms. Pilot assessment of criteria. Development/design of badge system and integration with current certification schemes. Concept for integration of data repository service components. Piloting. Integration of badge system.

30 Challenges
What has been accomplished: Terminology for data fitness. Creation and comparison of data fitness criteria (spreadsheet). Data fitness for use checklist (Google Form). Minimal testing.

31 Presented at Domain Repositories IG
Heterogeneity of datasets leads to difficulty in evaluating datasets with domain expertise (not just a time sink). Sampling 6 to 12 datasets is not representative for a repository with 40,000 datasets. Should we expect the same level of curation for all datasets? Not all have the same perceived value. For some repositories, use analytics for datasets are available and should be used. Need for agreement on data/metadata standards within communities – could roll out of this work.
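The sampling concern can be illustrated with a rough calculation. The sketch below assumes simple random sampling, a single yes/no criterion, the worst-case proportion of 0.5, and the usual normal approximation with a finite population correction; none of this comes from the slides, the approximation is itself crude at such small sample sizes, and the sample size of 400 is only a hypothetical comparison point.

```python
import math


def margin_of_error(sample_size, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion.

    Uses the normal approximation with a finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction


# 6 and 12 are the sample sizes named on the slide; 400 is a hypothetical comparison.
for n in (6, 12, 400):
    print(n, round(margin_of_error(n, 40_000), 3))
```

Under these assumptions, a sample of 12 leaves an uncertainty of roughly 28 percentage points on the share of datasets meeting a criterion, while about 400 sampled datasets would be needed to get near 5 points.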

32 Annex

33 Intensive work on data quality
There are many initiatives to define data standards: Data publishing and repository certification. Principles of data FAIRness. There are various approaches to assess quality: Ratings vs. good practices. Data can be assessed by different entities: Independent certification bodies. Data curators (e.g. in repositories). Users (e.g. through social media). [Graphic: F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads]

34 Implementing FAIR principles
Requirements to create new data. Assessing existing data. Transformation tools to make data FAIR (Go-FAIR initiative). [Graphic: F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads] nestor - network of expertise in long-term storage of digital resources in Germany (like DANS… but then German)

35 Implementing FAIR principles
Requirements to create new data. Assessing existing data. Transformation tools to make data FAIR (Go-FAIR initiative). [Graphic labels: Certification; Reviewer; Data center/repository; Curation; Downloads, social tagging; Users; F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads]

36 Assessing data fitness for use (data correctness)

37 Assessing data fitness for use (data correctness)
This spreadsheet could be seen as one where the content of the dataset seemingly agrees nicely with the dataset content.

38 Before getting started: who is who
Initiatives. Data reusability: why? Data Fitness for use. Defining fitness for use. Assessing data fitness for use. Outlook.

39 Many approaches to assess data quality
Also: Open Data Institute (UK), Center for Open Science (US) … BUT these do not define good practice. They certify that a particular practice was followed. [Graphic: Open Data rating by Tim Berners-Lee. Labels: Portable Document Format; XLS - Binary Interchange File Format; comma-separated values; Resource Description Framework - standard model for data interchange on the web; Linked Open Data]

40 Defining fitness for use
Glossaries: Data Quality Vocabulary, W3C Working Group Note. Science Europe Data Glossary. RDA Term Definition Tool (TeD-T). Standard Glossary for Research Data Management (IRiDiuM). Data quality: “degree to which a set of characteristics of data fulfills requirements” (ISO 9000). Any data are usable as long as they fit the purpose. Assessment of usability implies definition of requirements.

41 Initiatives
RDA/WDS Data Publishing Workflows WG. Certification of data centers/repositories. GEO label facets (DMP). FAIR principles. ICSU: International Council for Science. DMP: Data Management Principles. GEO: Group on Earth Observations.

42 Data Reusability: why? https://www.nature.com/articles/sdata201618
BECAUSE The elements of the FAIR Principles are related, but independent and separable

43 Data Fitness for Use and FAIR
Challenge: Data provider & user not necessarily aligned. Define fitness for use. How/Who. [Graphic labels: Provider; User]

