The WDS/RDA Assessment of Data Fitness for Use Working Group


The WDS/RDA Assessment of Data Fitness for Use Working Group. Jonathan Petters (Virginia Tech), Marina Soares e Silva (Elsevier), Claire Austin (Department of the Environment, Government of Canada), Michael Diepenbroek (PANGAEA). RDA 12 - November 2018

Shared Google Doc notes: https://tinyurl.com/DataFitnessForUse

Problem: I have the data but can’t use it. I have found data in a domain/generic repository that I can access, but… I can’t be sure it’s complete; the metadata contains conflicting information; I am having issues with the format; … and I just wasted 6 hours of my time figuring out I can’t use it!

Problem: I have the data but can’t use it. A provider gives access to a dataset that was FAIRly deposited by its creator, BUT the same dataset might not be fit for the data user! Challenge: how can research data be made fit for the widest possible use?

Our working group’s approach: data fitness for use. Assessment of the fitness for use of individual data sets should consolidate current efforts and be thorough and comprehensive, reliable and efficient to apply, and of high impact and visibility.

Our working group’s approach: data fitness for use (RDA/WDS Working Group). Specify criteria of reusability, expanding on FAIR, to support providers in assessing data quality → a checklist + rating system! Our target group: data repositories → for use by repository managers or external evaluators such as CoreTrustSeal.

Criteria to assess data fitness for use. Categories: metadata completeness (R); accessibility (A); data completeness and correctness (R); findability & interoperability (F, I); curation (leading to FAIRness) → expanding on the reusability of FAIR.
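
As a rough illustration of how the categories above could back a checklist, the sketch below groups yes/no items by category and reports the share of items passed in each. The category names come from the slide; the questions, the `ChecklistItem` class and the `category_scores` function are hypothetical placeholders, not the working group's actual checklist.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    category: str   # one of the categories named on the slide
    question: str   # hypothetical wording, for illustration only
    passed: bool    # the reviewer's yes/no answer


def category_scores(items):
    """Return the fraction of passed items for each category."""
    totals = {}
    for item in items:
        passed, count = totals.get(item.category, (0, 0))
        totals[item.category] = (passed + int(item.passed), count + 1)
    return {cat: passed / count for cat, (passed, count) in totals.items()}


# Invented answers for a single dataset.
example = [
    ChecklistItem("Metadata completeness", "Are units documented?", True),
    ChecklistItem("Metadata completeness", "Is the collection method described?", False),
    ChecklistItem("Accessibility", "Can the files be opened without special software?", True),
    ChecklistItem("Data completeness and correctness", "Are missing values flagged?", False),
    ChecklistItem("Findability & interoperability", "Is there a persistent identifier?", True),
    ChecklistItem("Curation", "Was the dataset reviewed by a curator?", True),
]

for category, score in category_scores(example).items():
    print(f"{category}: {score:.0%}")
```

Keeping the per-category shares separate, rather than collapsing them at once into a single number, leaves room to decide later how the categories should be weighed.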

Assessing data fitness for use (data correctness). A repository hosts weather observation data in a spreadsheet. The spreadsheet is findable and accessible. But is it fit for use?
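
A minimal sketch of what an automated correctness check on such a spreadsheet might look like, assuming it has been exported to CSV. The column names, the sample rows and the plausibility limits are all invented for illustration; they are not taken from the dataset shown in the presentation.

```python
import csv
import io

# Hypothetical spreadsheet export; column names and values are invented.
SAMPLE_CSV = """date,station,temp_c,precip_mm
2018-11-01,A1,7.5,0.0
2018-11-02,A1,,1.2
2018-11-03,A1,75.0,0.4
2018-11-04,A1,6.1,-3.0
"""

# Assumed plausibility limits per column (not from the source).
LIMITS = {"temp_c": (-90.0, 60.0), "precip_mm": (0.0, 2000.0)}


def find_issues(text):
    """Return (row_number, column, problem) tuples for suspicious cells."""
    issues = []
    # The header is row 1 of the file, so data rows start at 2.
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        for column, (low, high) in LIMITS.items():
            raw = row[column].strip()
            if raw == "":
                issues.append((row_number, column, "missing value"))
                continue
            try:
                value = float(raw)
            except ValueError:
                issues.append((row_number, column, "not a number"))
                continue
            if not low <= value <= high:
                issues.append((row_number, column, "out of range"))
    return issues


for issue in find_issues(SAMPLE_CSV):
    print(issue)
```

Checks like these only catch gross errors; whether a plausible-looking value is actually correct still calls for domain knowledge.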

Initial feedback on checklist (ICPSR). How to evaluate the level of curation for a dataset? Through the repository's standard curation procedures. Some questions are general, making them hard to evaluate. Might envision multiple reviewers, à la CoreTrustSeal certification. Evaluation of dataset properties (and the time needed to evaluate them) will vary with the heterogeneity of the dataset; how to address this?

Challenges: Volunteer effort. Inherent to our approach: the level of expertise of the repository manager matters (how do repository managers currently evaluate data fitness?); sample size might influence the result of the assessment; manual labor.

Challenges: Rating system. How to weigh criteria to determine the rating. How to implement: potential automation. Resources to implement.
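
To make the weighting question concrete, here is a small sketch of one possible scheme, a weighted average of per-category scores mapped onto stars. Nothing in it is the working group's method: the category keys, the example scores, the weights and the five-star scale are all assumptions chosen for illustration.

```python
def weighted_score(scores, weights):
    """Weighted average of per-category scores (each between 0 and 1)."""
    total_weight = sum(weights[category] for category in scores)
    return sum(scores[category] * weights[category] for category in scores) / total_weight


def to_stars(score, max_stars=5):
    """Map a 0..1 score onto a star rating (an assumed scale)."""
    return round(score * max_stars)


# Invented per-category scores for one dataset.
scores = {
    "metadata": 0.5,
    "accessibility": 1.0,
    "correctness": 0.0,
    "findability": 1.0,
    "curation": 0.5,
}

# Two hypothetical weightings: all categories equal, or correctness counted three times.
equal = {category: 1 for category in scores}
correctness_heavy = dict(equal, correctness=3)

for label, weights in (("equal", equal), ("correctness-heavy", correctness_heavy)):
    score = weighted_score(scores, weights)
    print(f"{label}: score={score:.2f}, stars={to_stars(score)}")
```

With these made-up numbers the same dataset moves from three stars to two once correctness is weighted more heavily, which is why the choice of weights needs agreement before a rating can be compared across repositories.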

Outlook: Implementation of the rating system. Maybe (semi-)automation of the assessment → see https://fairshake.cloud/ as an example of something that could work for semi-automated assessment (users evaluate datasets). Draft an article for a peer-reviewed journal.

Outlook: Roll work into new RDA working groups? Proposed RDA WG on a FAIR Data Maturity Model. Propose a new RDA WG for automating data quality verification (coordinating role?). Roll off a new WG from the Domain Repositories IG on data/metadata standards in communities. Collaboration with other groups such as GO FAIR? Get involved!!! https://www.rd-alliance.org/groups/assessment-data-fitness-use

Next in this session: Luiz Bonino (GO FAIR): automation approach; Wade Bishop (Univ. Tennessee, USA): data provider perspective; discussion time.

The WDS/RDA Assessment of Data Fitness for Use Working Group. Jonathan Petters: jpetters@vt.edu; Marina Soares e Silva: m.soaresesilva@elsevier.com; working group: data-fitness@rda-groups.org

Working group wrapping up… Where do we go from here? Do we have the interest and resources to continue checklist development?

Challenges: efforts on a volunteer basis. Here was the plan: 2017-08: terminology and definition of terms; 2017-12: pilot assessment of criteria; 2018-02: development/design of a badge system and integration with current certification schemes; 2018-08: concept for integration of data repository service components, piloting integration of the badge system.

Challenges: what has been accomplished. Terminology for data fitness; creation and comparison of data fitness criteria (spreadsheet); data fitness for use checklist (Google Form); minimal testing.

Presented at Domain Repositories IG. Heterogeneity of datasets makes evaluation difficult, since it calls for domain expertise (it is not just a time sink). Sampling 6 to 12 datasets is not representative for a repository with 40,000 datasets. Should we expect the same level of curation for all datasets? Not all have the same perceived value. For some repositories, usage analytics for datasets are available and should be used. Need for agreement on data/metadata standards within communities; this could roll out of this work.
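
The sampling concern can be put in rough numbers. The sketch below assumes a simple random sample and a single pass/fail criterion, neither of which is stated in the source, and uses the standard worst-case margin of error for a proportion with a finite population correction. The function names are illustrative.

```python
import math


def sampling_fraction(sample_size, population_size):
    """Share of the repository's datasets that was actually assessed."""
    return sample_size / population_size


def worst_case_margin(sample_size, population_size, z=1.96):
    """Approximate 95% margin of error for an estimated pass rate.

    Assumes simple random sampling and the worst case p = 0.5,
    with a finite population correction.
    """
    standard_error = math.sqrt(0.25 / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


for n in (6, 12):
    fraction = sampling_fraction(n, 40_000)
    margin = worst_case_margin(n, 40_000)
    print(f"n={n}: fraction={fraction:.3%}, margin=±{margin:.0%}")
```

Under these assumptions a sample of 6 to 12 datasets covers well under a tenth of a percent of a 40,000-dataset repository, and an estimated pass rate would carry a margin of error of roughly 28 to 40 percentage points, before heterogeneity between datasets is even taken into account.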

Annex

Intensive work on data quality. There are many initiatives to define data standards: data publishing and repository certification; principles of data FAIRness. There are various approaches to assess quality: ratings vs. good practices. Data can be assessed by different entities: independent certification bodies; data curators (e.g. in repositories); users (e.g. through social media). (Graphic: F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads.)

Implementing FAIR principles. Requirements to create new data; assessing existing data; transformation tools to make data FAIR (GO FAIR initiative). (Graphic: F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads.) nestor: network of expertise in long-term storage of digital resources in Germany (like DANS… but German).

Implementing FAIR principles. Requirements to create new data; assessing existing data; transformation tools to make data FAIR (GO FAIR initiative). (Graphic labels: Certification; Reviewer; Data center/repository; Curation; Downloads, social tagging; Users; F A I R; 2 User Reviews; 1 Archivist Assessment; 24 Downloads.)

Assessing data fitness for use (data correctness). This spreadsheet could be seen as one where the content of the dataset seemingly agrees nicely with the dataset content.

Before getting started: who is who; initiatives; data reusability: why?; data fitness for use; defining fitness for use; assessing data fitness for use; outlook.

Many approaches to assess data quality. Also: Open Data Institute (UK), Center for Open Science (US), … BUT these do not define good practice. They certify that a particular practice was followed. Open Data rating by Tim Berners-Lee (https://5stardata.info/en/): PDF, Portable Document Format; XLS, Binary Interchange File Format; CSV, comma-separated values; RDF, Resource Description Framework, a standard model for data interchange on the web; LOD, Linked Open Data.

Defining fitness for use. Glossaries: Data Quality Vocabulary (W3C Working Group Note); Science Europe Data Glossary; RDA Term Definition Tool (TeD-T); Standard Glossary for Research Data Management (IRiDiuM). Data quality: “degree to which a set of characteristics of data fulfills requirements” (ISO 9000). Any data are usable as long as they fit the purpose. Assessment of usability implies a definition of requirements.

Initiatives: RDA/WDS Data Publishing Workflows WG; certification of data centers/repositories; GEO label facets (DMP); FAIR principles. Abbreviations: ICSU, International Council for Science; DMP, Data Management Principles; GEO, Group on Earth Observations.

Data reusability: why? (https://www.nature.com/articles/sdata201618) BECAUSE the elements of the FAIR Principles are related, but independent and separable.

Data fitness for use and FAIR. Challenge: data provider and user are not necessarily aligned. Define fitness for use: how, and who? (Graphic labels: Provider, User.)