DSA and FAIR: a perfect couple Rob Hooft (DTL, ELIXIR NL), Peter Doorn and Ingrid Dillo (DANS) SciDataCon, session Data Fitness for Reuse Denver, 12 September 2016
DANS and the Data Seal of Approval 2005: DANS to promote and provide permanent access to digital research information Formulate quality guidelines for digital repositories including DANS (TRAC, Nestor) 2006: 5 basic principles as basis for 16 DSA guidelines 2009: international DSA Board Today: over 62 seals acquired and many in progress
DSA principles The DSA is intended to ensure that: The data can be found on the internet The data are accessible (clear rights and licenses) The data are in a usable format The data are reliable The data are identified in a unique and persistent way so that they can be referred to (guidelines: metadata, long-term preservation (LTP), integrity and authenticity, persistency, technical infrastructure, workflows and processes, ...)
FAIR principles Leiden 2014: minimal set of community agreed guiding principles to make data more easily discoverable, accessible, appropriately integrated and re-usable, and adequately citable. FAIR Principles: Findable Accessible Interoperable Re-usable (all four both for machines and for people)
Resemblance
DSA Principles                      FAIR Principles
data can be found on the internet   findable
data are accessible                 accessible
data are in a usable format         interoperable
data are reliable                   reusable
data can be referred to             (citable)
usable format (DSA) is just an aspect of interoperability (FAIR)
reliability (DSA) is a condition for reuse (FAIR)
FAIR explicitly addresses machine readability
citability is in FAIR an aspect of findability
From the FAIR perspective: Findable DSA does Well known location on the internet (meta)Data gets persistent identifier FAIR also requires Well described what it is (not only what was done with it) Also computer readable
* Especially institutional or special: catalog it!
* Keywords for re-use. What is in there, not why you collected it
* Keywords for subsets?
* Librarians/archivists can help! Go look for them!
* Talk to the catalog people early to find out what they need.
====
Especially for data deposited in institutional repositories, it is very important that the data are also listed in a catalog: some place where researchers like yourself would look for existing data sets. The keywords under which your data can be found should be driven by the possibilities for re-use: they should indicate what is in the data and what could be done with it. It is not enough to mention why you collected the data. Subsets of your data may be usable by researchers in different fields, so it is a good idea to think of keywords and descriptions that make such subsets findable by themselves. Librarians and archival specialists can often help you find suitable keywords and write good descriptions. If cataloging is applicable to your data, select the catalogs in which you want your data listed, and check as early as possible what you need to do to be included.
From the FAIR perspective: Accessible DSA does Define protocol, require reliability FAIR also requires Standardized authentication where needed Metadata are kept even if data are deleted Also computer readable
From the FAIR perspective: Interoperable DSA does Require usable format FAIR also requires Standardized vocabulary (mapped?) FAIR vocabularies Also computer readable
From the FAIR perspective: Reusable DSA does Require license FAIR also requires Rich metadata Rich provenance Adherence to community standards Also computer readable
Data must be stored in a DSA certified repository in order to become FAIR
From the DSA perspective: Combine and operationalize Growing demand for quality criteria for research datasets Combine the ideas of DSA and FAIR Focus of the principles as quality criteria: DSA – digital repositories FAIR – research data(sets) Operationalize the principles to make them easily implementable in any trustworthy digital repository
From the DSA perspective: How could it work?
Each principle a separate dimension of data quality
Score data on each dimension, e.g. for Findable, defined by metadata, documentation (and identifier for citation):
0 = No URI or PID and no documentation
1 = PID without or with insufficient metadata
2 = PID with limited metadata present to understand the data
3 = PID with extensive metadata and rich additional documentation available
Total score of FAIRness as an indicator of data quality
Scoring by humans and machines
Scoring:
* at ingest by data archivists of the TDR
* after reuse by data users (community review)
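The 0 to 3 scale for Findable could be sketched in code roughly as follows. This is only an illustration, not the authors' implementation: the function names, the metadata level labels, and the choice to add the per-dimension scores into a total are assumptions made for the example. The slide also does not say how to score a dataset that has documentation but no PID, or extensive metadata without rich documentation; the sketch assigns 0 and 2 respectively, which is again an assumption.

```python
# Hypothetical sketch of the Findable rubric; all names are illustrative.

METADATA_LEVELS = ("none", "insufficient", "limited", "extensive")


def findable_score(has_pid: bool, metadata: str, rich_docs: bool = False) -> int:
    """Return a 0-3 Findable score following the rubric on the slide."""
    if metadata not in METADATA_LEVELS:
        raise ValueError(f"unknown metadata level: {metadata!r}")
    if not has_pid:
        # Assumption: without a URI/PID the score is 0, even with documentation.
        return 0
    if metadata in ("none", "insufficient"):
        return 1
    if metadata == "extensive" and rich_docs:
        return 3
    # Limited metadata, or (assumption) extensive metadata without rich docs.
    return 2


def total_fairness(scores: dict) -> int:
    """Assumption: the total FAIRness score is the plain sum over dimensions."""
    return sum(scores.values())


if __name__ == "__main__":
    f = findable_score(has_pid=True, metadata="limited")
    # Scores for the other three dimensions are made-up placeholders.
    print(total_fairness({"F": f, "A": 3, "I": 1, "R": 2}))
```

The same function could be applied by an archivist at ingest or fed from a machine-readable metadata record; a community review after reuse might simply produce a second set of scores for comparison.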
DSA and FAIR: a perfect couple To sum up: DSA and FAIR: a perfect couple Data must be stored in a DSA certified repository in order to become FAIR DSA and FAIR together offer great possibilities for quality assessment of research data
Thank you for listening!
rob.hooft@dtls.nl
peter.doorn@dans.knaw.nl
ingrid.dillo@dans.knaw.nl
http://www.datasealofapproval.org/en/
https://rd-alliance.org/group/repository-audit-and-certification-dsa–wds-partnership-wg/outcomes/dsa-wds-partership
http://www.nature.com/articles/sdata201618