Assessment of Data Fitness for Use
Introduction
Helena Cousijn, Claire Austin, Jonathan Petters & Michael Diepenbroek
WDS/RDA Publishing Data IG
WDS/RDA Certification of Digital Repositories IG
Initiatives
- 5 ★ Open Data (Tim Berners-Lee)
- FAIR principles
- A design framework & exemplar metrics for FAIRness
- GEO label facets
- ESIP Information Quality Cluster
- Enabling FAIR Data Across the Earth & Space Sciences
- Certification of data centers/repositories
[Figure: example FAIR badge with counts of user reviews, archivist assessments, and downloads]
Criteria I
- Inherent properties: objectively verifiable or even measurable, e.g. validity of the methodologies used, completeness of metadata
- Non-inherent properties: largely subjective descriptions assigned to data, e.g. social tagging, downloads as an indicator of data quality
Criteria II
- Properties directly related to data objects, e.g. PIDs, citation, precision of data values
- Properties related to data findability & accessibility, e.g. quality of services for data discovery and interoperability
- Properties characterizing data management processes, e.g. curation workflows, tools, human resources. Note: these are not transparent to users
Metrics
- Dimensions should be independent
- Evaluation and ranking should be practical
- Automatic versus manual evaluation
- Direct versus indirect evaluation (proxies)
Agenda
- Data Fitness for Use as part of the CoreTrustSeal: Mustapha Mokrane (ICSU WDS), Chair of the CoreTrustSeal Board
- Assessing FAIRness within the Enabling FAIR Data project: Shelley Stall, Director of the AGU Data Program
- A design framework and exemplar metrics for FAIRness: Peter Doorn, Director of Data Archiving and Networked Services (DANS)
- Proposed criteria, Data Fitness for Use WG: Michael Diepenbroek (PANGAEA), Co-Chair of the Data Fitness for Use WG
- Discussion on governance (30 minutes)
Dimensions & evidence required
- Completeness & Quality of Content. Evidence: metadata & data
- Findability. Evidence: services exposed, metadata & data
- Accessibility. Evidence: services exposed, metadata, documentation of system & services
- Interoperability. Evidence: services exposed, metadata
- Curation. Evidence: services exposed, metadata & data, documentation of system & services
Completeness & Quality of Content
Evidence: metadata
Metadata completeness
- Citation (authorship, year, comprehensive title, PID)
- Content description (listing of measurement & observation types, incl. methods used)
- Coverage (spatial, temporal)
- Provenance
  - authorship (PIs, institutions, labs)
  - data collection/generation (sampling events, processing steps, experimental setup)
  - references to related work (literature)
- Terms of usage: licenses, other conditions, protection (ethical issues)
- Persistent identifier (for the data set; others for literature, authors, projects, terms, etc.)
- Metadata adequate to science domain (domain expertise needed)
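As an illustration of how metadata completeness could be evaluated automatically, the following is a minimal sketch and not a method proposed in the slides. The field names in `REQUIRED_FIELDS`, the helper names, and the example record are all hypothetical; a real repository would map the checklist onto its own metadata schema, and the domain-adequacy item still needs expert review.

```python
from typing import Any, List, Mapping, Tuple

# Hypothetical field names loosely following the completeness checklist;
# they are assumptions, not part of any standard schema.
REQUIRED_FIELDS = (
    "authors", "year", "title", "pid",
    "content_description", "spatial_coverage", "temporal_coverage",
    "provenance", "license",
)

def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True

def metadata_completeness(record: Mapping[str, Any]) -> Tuple[float, List[str]]:
    """Return (fraction of required fields filled, names of missing fields)."""
    missing = [f for f in REQUIRED_FIELDS if not is_filled(record.get(f))]
    score = 1 - len(missing) / len(REQUIRED_FIELDS)
    return score, missing

# Made-up record for illustration only.
example = {
    "authors": ["A. Author"],
    "year": 2018,
    "title": "Example data set",
    "pid": "",                       # left empty on purpose
    "content_description": ["temperature", "salinity"],
    "spatial_coverage": None,        # missing on purpose
    "temporal_coverage": "2017-01/2017-12",
    "provenance": {"sampling": "CTD cast"},
    "license": "CC-BY-4.0",
}

score, missing = metadata_completeness(example)
print(f"completeness: {score:.2f}, missing: {missing}")
```

A score of this kind only measures presence of fields. Whether the content of a field is correct or adequate for the science domain is a separate question.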
Completeness & Quality of Content
Evidence: metadata & data
Data completeness
- Difficult to evaluate
- Minimum: the content description should match the data content (for data matrices, compare the column headers with the content description)
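The minimum check named here, comparing the column headers of a data matrix with the content description, might be sketched as follows. This assumes the data are delimited text with a single header row and that the content description is available as a list of parameter names; the function names and the sample data are hypothetical.

```python
import csv
import io

def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())

def compare_headers(described, csv_text, delimiter=","):
    """Compare parameters listed in the metadata with the header row of the data."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])  # empty list if the file has no rows
    in_data = {normalize(h) for h in header if h.strip()}
    in_metadata = {normalize(d) for d in described}
    return {
        # columns present in the data but not described in the metadata
        "undescribed_columns": sorted(in_data - in_metadata),
        # parameters described in the metadata but absent from the data
        "missing_columns": sorted(in_metadata - in_data),
    }

# Made-up data matrix and content description for illustration only.
data = "Depth,Temperature,Salinity\n10,4.2,34.9\n"
result = compare_headers(["depth", "temperature", "oxygen"], data)
print(result)
```

Exact string matching after normalization is a deliberately simple choice. In practice, headers often carry units or abbreviations, so matching against a controlled vocabulary would likely be more robust.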
Completeness & Quality of Content
Evidence: metadata & data
Metadata & data correctness
- Content description matches data content
- Validity of the methods used (needs domain expertise)
Completeness & Quality of Content
Evidence: metadata & data
Machine readability of data & metadata
- Data & metadata consistently structured (consistent, standard formatting)
- Data & metadata harmonized (consistent use of metadata elements); possibly needs complementary information from other requirements, e.g. usage of RDBMS & standard terminologies (ontologies), type of curation
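One possible proxy for harmonization is whether metadata records across a repository use the same element names. The sketch below assumes records are available as dictionaries; the function names and sample records are hypothetical, and this is only one of several ways the criterion could be operationalized.

```python
from collections import Counter

def element_usage(records):
    """Count how many records use each metadata element name."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts

def inconsistent_elements(records):
    """Return element names that appear in some but not all records."""
    counts = element_usage(records)
    total = len(records)
    return sorted(name for name, n in counts.items() if n < total)

# Made-up records: "creator" and "author" are used interchangeably,
# and one record lacks a license element.
records = [
    {"title": "A", "creator": "X", "license": "CC-BY-4.0"},
    {"title": "B", "author": "Y", "license": "CC-BY-4.0"},
    {"title": "C", "creator": "Z"},
]

print(inconsistent_elements(records))
```

Elements flagged this way are candidates for review, not proof of poor harmonization, since some elements are legitimately optional.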
All other dimensions
Do supplied services match results from certification?
- Findability: sufficient discovery metadata. Metadata are usually enriched with related terms, e.g. using terminologies/ontologies; such terms might not be visible in the metadata or data
- Curation: the curation level claimed by the repository matches the completeness, correctness, structuring, and harmonization of metadata & data
Service