Published by Per-Erik Bergqvist. Modified over 5 years ago.
1
Assessing FAIRness within the Enabling FAIR Data project
WG WDS/RDA Assessment of Data Fitness for Use
RDA 11th Plenary, 22 March 2018
Shelley Stall, AGU Director, Data Programs, @ShelleyStall

Short introduction describing the scope of the group and any previous activities
The increasing availability of research data and its evolving role as a first-class scientific output in scholarly communication require a better understanding of data quality and the ability to assess it. Data quality can in turn be described as the conformance of data properties to data usability, or fitness for use. These properties are multifaceted and cover various aspects of data objects, access services, and data management processes, such as the level of annotation, curation, peer review, and citability or machine readability of datasets. Moreover, the compliance of a data repository or data center providing datasets, for example with certification requirements, could serve as a useful proxy.
Firstly, a concept of data fitness requires deciding which quality criteria to include and how to weigh each of them. The process should preferably lead to the development of a corresponding metric. Secondly, we want to find effective ways to expose and communicate this metric, for example by using a labelling or tagging system whereby different usability levels are made explicit.

Additional links to informative material related to the group
Group page:
CoreTrustSeal page:
A design framework and exemplar metrics for FAIRness:
Enabling FAIR Data:

Meeting objectives
Through several presentations and a discussion including a wider audience, we aim to establish the best possible approaches for the assessment of data fitness for use. Objectives are:
- To inform about the various possible approaches for the assessment of data fitness for use.
- To compare and evaluate possible approaches with respect to later adoption and application by the different stakeholders, namely data repositories, science publishers, and data users.

Meeting agenda
The session will be introduced with an overview of the progress of the group, in particular presenting the consolidated list of criteria characterizing fitness for use. The following presentations will outline various approaches for the evaluation of fitness for use of individual data sets. We will end with a discussion on the different possibilities for governance.
Presentations (60 minutes)
- Data Fitness for Use as part of the CoreTrustSeal: Mustapha Mokrane, Chair of the CoreTrustSeal Board
- Assessing FAIRness within the Enabling FAIR Data project: Shelley Stall, Director of the AGU Data Program
- A design framework and exemplar metrics for FAIRness: Peter Doorn, Director of Data Archiving and Networked Services (DANS)
- Proposed criteria Data Fitness for Use WG: Michael Diepenbroek, Co-Chair of the Data Fitness for Use WG
Discussion on governance (30 minutes)
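
The introduction's idea of weighing quality criteria into a metric and exposing it as a label can be sketched as follows. This is only an illustration: the criterion names, weights, and label thresholds are hypothetical and are not values proposed by the working group.

```python
from typing import Dict

# Hypothetical criteria and weights, chosen only for illustration.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "annotation": 0.3,
    "curation": 0.3,
    "peer_review": 0.2,
    "machine_readability": 0.2,
}


def fitness_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted mean of per-criterion ratings, each in [0, 1].

    Criteria missing from `ratings` count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = 0.0
    for name, weight in weights.items():
        rating = ratings.get(name, 0.0)
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {name!r} must be between 0 and 1")
        weighted_sum += weight * rating
    return weighted_sum / total_weight


def fitness_label(score: float) -> str:
    """Map a score to a usability label (thresholds are assumptions)."""
    if score >= 0.75:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"
```

A repository could then display `fitness_label(fitness_score(ratings, EXAMPLE_WEIGHTS))` next to a dataset, which is one possible reading of the labelling system mentioned above.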
2
American Geophysical Union
Galvanizes a community of Earth and space scientists that collaboratively advances and communicates science and its power to ensure a sustainable future.
American Geophysical Union
- More than 60,000 members across 144 countries
- 20 peer-reviewed scholarly journals
- 100-year anniversary coming in 2019
- Scientific meetings
- EOS.org: online and print magazine
3
AGU’s position statement on data affirms that
“Earth and space sciences data are a world heritage. Properly documented, credited, and preserved, they will help future scientists understand the Earth, planetary, and heliophysics systems.”
Statement adopted by the American Geophysical Union 29 May 1997; reaffirmed May 2001, May 2005, May 2006; revised and reaffirmed May 2009, February 2012 and September 2015.
Photo credit: Rick Meyers on Unsplash, Smith County I-40 Rest Area, Lancaster, United States
4
FAIR Guiding Principles
FAIR is…
- Findable
- Accessible
- Interoperable
- Reusable
Article in the Nature journal Scientific Data: Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3: doi: /sdata (2016).
5
New Grant from the Laura and John Arnold Foundation (LJAF)
Align publishers and repositories in following best practices to enable FAIR and open data and to create workflows so that researchers will have a simplified, common experience when submitting their paper to any leading Earth and space science journal. This will accelerate scientific discovery and enhance the integrity, transparency, and reproducibility of this data.
6
Enabling FAIR Data Project - Objectives
- FAIR-compliant data repositories will add value to research data, provide metadata and landing pages for discoverability, and support researchers with documentation guidance, citation support, and data curation.
- FAIR-compliant Earth and space science publishers will align their policies to establish a similar experience for researchers.
- Data will be available through citations that resolve to repository landing pages. Data are not placed in the supplement.
7
FAIR-Compliant Repositories
Services provided, each with its benefit:
- Metadata support (repository, datasets, citation). Benefit: supports discovery, understanding, and reuse.
    Repository: NSF 418 project, re3data.org
    Dataset: repository determined, community-driven
    Data/software citation: Roadmap, ESIP, RDA, etc.
- Persistent identifiers. Benefit: supports data citation and credit for data and reuse.
- Data citation / landing page compliance. Benefit: supports best practices and a common experience for researchers. [Roadmap for Data Citation for Scientific Repositories, elements 1-6]
- Publication peer review support. Benefit: supports access for publication peer reviewers even if data isn’t public yet.
- Licensing policies (data and software). Benefit: supports reuse of data and software.
- Common list of approved FAIR-compliant repositories. Benefit: supports researchers locating compliant repositories [re3data / FairSharing]; supports publishers individually determining endorsement.
- CoreTrustSeal.org certification. Benefit: validates that many of the elements described above are implemented correctly in the repository.
8
FAIR-Compliant Journals
Services provided, each with its benefit:
- Common data and software citation policies and practices (TOP Guidelines, FORCE 11 Joint Declaration of Data Citation Principles). Benefit: supports best practices and provides a common experience for all researchers.
- Data are no longer placed in the Supplemental Information. [Roadmap for Data Citation for Scholarly Publishers] Benefit: improves research credit for data and reuse. [Scholix]
- Common workflows for data citations. Benefit: supports best practices and provides a common experience for all researchers. [THOR Project Outcomes – Data Linking with publication]
- Common expectations for publication peer review when evaluating science and determining if the data and metadata are adequate. Benefit: supports the publication peer review process; identifies the role of reviewer vs. repository when it comes to evaluating the cited data (and software).
9
Research Data Ecosystem
Other roles:
- Research labs
- Service providers to the ecosystem (e.g. PID providers like DataCite, GitHub/Zenodo, Crossref, FundRef, Scholix)
- Research offices not at institutions (e.g. Ronin)
Make Data FAIR Open and Persistent Data Store
ESIP – Research as Art – where can there be better connections?
Each oval represents a critical role within the data ecosystem. The sub-bullets give a general description of the nature of the role. The processes that are shared between the roles are influenced by the items in the list on the right. Additionally, the shared processes can be informed and guided by the collaborations listed.
Authors: Shelley Stall and Erin Robinson
10
Targeted Adoption Group (TAG) Report Outs
- TAG A/D - Repository Guidance for Researchers: Danie Kinkade (BCO-DMO, WHOI), Michael Witt (Purdue, re3data)
- TAG B - Publishers in the ESS team: Anita DeWaard (Elsevier), Helena Cousijn (Elsevier), Joerg Heber (PLOS)
- TAG C - FAIR Resources and Training for Researchers: Nancy Hoebelheinrich (Knowledge Motifs, ESIP DM Training), Jon Petters (Virginia Tech)
- TAG E - Data and DOI Workflows and Handoffs: Trisha Cruse (DataCite), Mark Servilla (Environmental Data Initiative)
- TAG F - Culture Change through Credit: Denise Hills (Geological Survey of Alabama), Reid Boehm (Johns Hopkins University)
- TAG G - Key Elements of Active DMPs: Raleigh Martin (National Science Foundation)
Report outs: Michael - TAG A; Brooks - TAG B; Nancy - TAG C; Denise - TAG F; Stephanie/Sophie - TAG G
11
TAG A update: Repository Guidance
Process:
- Initial launch meeting November 17; second in-person meeting at AGU in December; seven web meetings (every other week minus holidays); draft deliverables submitted on March 6 & 7 for comment
- Pile of working documents: TAG A Workspace on Google Drive
Products:
- Decision Tree (link): can we diagram the decision points for a researcher in selecting a repository to deposit their data?
    Requesting comments now
    Informing design of data repository recommendation tool
- Interview Questions (link): how are repositories currently implementing, or planning to implement, the FAIR Principles?
    Pilot interviews with repository managers: volunteers needed
    Summary-level to be included in overall project report
12
TAG A: People
Danie Kinkade (co-chair), Michael Witt (co-chair), Tim Ahern, Wenbo Chu, Don Collins, Merce Crosas, Ruth Duerr (lead decision tree), Kirsten Elger, Douglas Fils, Kerstin Lehnert, Mustapha Mokrane, Mark Parsons, Jonathan Petters, Beth Plale, Ray Plante (lead interview questions), Mohan Ramamurthy, Erin Robinson, Linda Rowan, Shelley Stall (TAG support), Karen Stocks, Mark Uhen, Robert Vocke, Lesley Wyborn, Lynn Yarmey (TAG support), Eva Zanzerkia
13
Review Distribution Soon…
14
FAIR-Compliant Repository Inventory
- To help determine where repositories are challenged to meet the Enabling FAIR Data project criteria. We want to support closing the gap, and this will be informative as to how to do that.
- To inform updates to re3data to enrich the existing metadata elements so the community can benefit from this information.
- To clarify, through the process of doing the inventory and conducting interviews, that we understand what it means to be a FAIR repository.
Lead: Ray Plante, NIST
15
Introductory
- What is the name of your repository and how is it accessed (URL)?
- What organization is considered the publisher/operator of the repository?
- Has your repository attained (or are you working toward) any community certifications? If yes: which ones?
- Do you have (or are you working toward) repository certification (e.g. CoreTrustSeal, WDS, DSA, TRAC, NESTOR, ISO 16363)?
- From whom do you accept data for deposit, generally?
16
Findable Criteria (1 of 2)
F1. PIDs
- Do you assign a persistent identifier to data products in your repository? If so, which PID type/scheme (e.g. DOI)? Do you assign more than one type?
- What kinds of things do you assign PIDs to?
- What is the granularity (or granularities) of the things you assign PIDs to? Do you assign them to individual data values or items, to individual files, to coherent collections of files, and/or multiple granularities?
F2. Data described by rich metadata
- Do your available data products come with metadata accessible or browsable by human users?
- Do you attempt to capture good coverage of what are known as Dublin Core concepts?
- Do you attempt to capture detailed metadata that is more specifically relevant to your user community (i.e., that goes beyond Dublin Core)?
- What standard or community metadata schemas do you support? Do you specifically support any of the following: If you support any, do you support specific profiles of these metadata schemas?
- Does your metadata include geolocation information? Does it include temporal information (e.g. coverage in time)?
- Does your repository accept metadata that is applicable to a specific discipline (and not just generally applicable to all disciplines)? Does your repository disallow or reject metadata that is specific to a particular discipline?
- Does your metadata include a concept of authors? Contact points? Are these separate metadata elements?
- Do you capture ORCID or other PIDs for authors? If yes: which? Researcher-ID? Scopus-ID?
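
A repository that captures ORCID iDs for authors may want to reject mistyped ones at deposit time. The sketch below checks the final character of an ORCID iD using the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are hypothetical, and the check only confirms that an iD is well formed, not that it is registered.

```python
import re

# 15 digits followed by a check character (a digit or "X").
_ORCID_PATTERN = re.compile(r"^\d{15}[\dX]$")
_ORCID_URL_PREFIX = "https://orcid.org/"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(value: str) -> bool:
    """Return True if `value` is a syntactically valid ORCID iD.

    Accepts the bare form (0000-0002-1825-0097) or the URL form.
    """
    text = value.strip()
    if text.startswith(_ORCID_URL_PREFIX):
        text = text[len(_ORCID_URL_PREFIX):]
    compact = text.replace("-", "").upper()
    if not _ORCID_PATTERN.match(compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]
```

For example, `is_well_formed_orcid("0000-0002-1825-0097")` accepts the iD that ORCID uses in its own documentation, while changing the last digit makes the check fail.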
17
Findable Criteria (2 of 2)
F3. Metadata includes identifier
- Does your exportable metadata include the data’s PID (e.g., the data DOI)?
- Does your exportable metadata include other persistent identifiers? ORCiDs? Literature (Crossref) DOIs? Sample IGSNs? Author contributions (CRediT)?
F4. Metadata is registered or indexed
- Does your repository provide search capabilities of its contents?
- Do you make your metadata searchable and/or indexable by any external systems? Which ones?
- Do you export your metadata through any of the following mechanisms:
    OAI-PMH
    Linked Data Platform
    ResourceSync
    Landing Page meta tags or similar embedding mechanisms
- Have you reviewed and ensured the existence and accuracy of the re3data record for your repository?
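
To make the OAI-PMH export mechanism above concrete, here is a minimal sketch of how an external indexer might build a `ListRecords` request and read record identifiers from the response. The base URL and the sample response are made up for illustration; the `verb`, `metadataPrefix`, `set`, and `from` parameters and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlencode

OAI_NAMESPACE = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml: str) -> List[str]:
    """Return the text of every <identifier> element in the OAI-PMH namespace."""
    root = ET.fromstring(response_xml)
    return [element.text or ""
            for element in root.iter(f"{{{OAI_NAMESPACE}}}identifier")]


# Hypothetical response fragment, used only to show the parsing step.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repo.example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>
"""
```

A real harvester would also follow `resumptionToken` elements to page through large result sets, which this sketch leaves out.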
18
Accessible Criteria
A1. Support for standard data/metadata retrieval
- Do you provide a landing page accessible by resolving a PID assigned by your repository?
- Do you support any machine-actionable data access mechanisms in which data can be retrieved given its identifier? Which standard mechanisms do you support? Are any considered specific to your repository?
- Do you support access to metadata via URLs and content negotiation?
- Do you embed machine-readable metadata in your Landing Pages? Do you embed via HTML <meta> tags? Do you embed JSON-LD data? XML?
- Does access to any data in your repository require authentication and authorization? Which machine-actionable access mechanisms support authentication? Do you support any open standards for authentication and authorization?
A2. Accessible metadata in absence of data
- Do you expose publicly any metadata for data with restricted access (i.e. requiring authentication and authorization to access)?
- Do you preserve access to metadata about data products that are no longer available?
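
The question about embedding JSON-LD in landing pages can be checked mechanically. The sketch below, using only the Python standard library, pulls every `application/ld+json` script block out of a landing page's HTML. The class and function names are hypothetical, and the sample page is invented for illustration.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._inside_json_ld = False
        self._buffer: List[str] = []
        self.documents: List[Any] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside_json_ld:
            self._inside_json_ld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than failing the whole page.


def extract_json_ld(html: str) -> List[Any]:
    """Return all JSON-LD documents embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Invented landing page, used only to show the extraction.
SAMPLE_PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Dataset", "name": "Example dataset"}'
    "</script></head><body></body></html>"
)
```

An empty result from `extract_json_ld` for a landing page would suggest that the repository does not embed JSON-LD there, though it may still expose metadata through `<meta>` tags or content negotiation.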
19
Interoperable Criteria
I1. Data and metadata formats and schemas
- What formats is metadata from your repository available in (e.g. XML, JSON, JSON-LD)?
- Do you support metadata export expressed in any of the following community schemas?
- Do you make metadata available in a schema specific to your repository?
I2. Vocabularies
- Do you leverage any community vocabularies in your metadata?
- Are these vocabularies registered in a community vocabulary registry (e.g. Linked Open Vocabularies at lov.okfn.org)?
I3. Qualified references
- Do you support links between data in your repository and other data, either within your repository or external to it?
- Do you share literature links with scholix.org (either directly or indirectly)?
- Do you include references to literature in metadata submitted to DataCite?
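
A qualified reference is a link that states how two objects are related, not just that they are. The sketch below models such a link and serialises it in the shape of DataCite's `relatedIdentifiers` JSON property (`relatedIdentifier`, `relatedIdentifierType`, `relationType`). The class name, helper name, and DOIs are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QualifiedReference:
    """A typed link from a dataset to another object."""
    target: str        # identifier of the related object
    target_type: str   # identifier scheme, e.g. "DOI"
    relation: str      # DataCite relationType, e.g. "IsSupplementTo"

    def to_datacite(self) -> Dict[str, str]:
        """Serialise using DataCite's relatedIdentifiers field names."""
        return {
            "relatedIdentifier": self.target,
            "relatedIdentifierType": self.target_type,
            "relationType": self.relation,
        }


def related_identifiers(references: List[QualifiedReference]) -> List[Dict[str, str]]:
    """Build the relatedIdentifiers list for a dataset's metadata record."""
    return [reference.to_datacite() for reference in references]


# Placeholder DOIs, not real identifiers.
EXAMPLE_REFERENCES = [
    QualifiedReference("10.1234/example-article", "DOI", "IsSupplementTo"),
    QualifiedReference("10.1234/source-dataset", "DOI", "IsDerivedFrom"),
]
```

Here the first entry records a literature link (the dataset supplements an article) and the second a data-to-data link, which correspond to the two kinds of references the I3 questions ask about.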
20
Reusable Criteria
R1. Reuse-relevant attributes
- Do you allow authors to provide domain-specific metadata? Do you disallow them?
- Do you allow authors to provide documentation that aids in proper use of the data? Do you place specific requirements on authors for providing documentation?
- Do you provide any other services that help users make use of the data (e.g. higher level product generation services, subsetting, aggregation, coordinate transformations, mapping services, format conversion, data dictionaries, quality assurance, compliance services, help desk)?
- Do you provide data citation recommendations to users?
- Do you support exporting data citation information via any of the following?
    Formatted text
    BibTeX
    Endnote
R1.1. Licenses
- Do you allow or require data in your repository to be associated with a license?
- Do you support data that is explicitly in the public domain?
- Do you display license information on Landing Pages? Do you encode license information in metadata?
- Do you assign a particular license, or does the provider specify a license? Do you restrict the set of allowed licenses? Which ones do you allow or support?
R1.2. Provenance
- Do you capture data provenance information for data in your repository? Does that include history preceding its ingestion into your repository?
- Do you encode and export provenance metadata using a community standard schema/format (e.g. W3C’s PROV, etc.)?
R1.3. Domain-specific standards
- Do you support any domain-specific metadata schemas, vocabularies, or formats?
- Do you support any domain-specific data export formats?
- Do you support any other domain-specific standards or protocols?
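
As an illustration of the BibTeX citation export asked about under R1, the sketch below formats a dataset citation as a BibTeX entry. The choice of the `@misc` entry type and of these particular fields is an assumption, not a recommendation from the project, and the example values are invented.

```python
from typing import List


def dataset_to_bibtex(key: str,
                      authors: List[str],
                      title: str,
                      year: int,
                      publisher: str,
                      doi: str) -> str:
    """Format a dataset citation as a BibTeX @misc entry.

    Authors are given as "Family, Given" strings and joined with " and ",
    as BibTeX expects. Values are not escaped, so special characters in
    titles would need extra handling in real use.
    """
    fields = {
        "author": " and ".join(authors),
        "title": title,
        "year": str(year),
        "publisher": publisher,
        "doi": doi,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# Invented example values.
EXAMPLE_ENTRY = dataset_to_bibtex(
    key="doe2018example",
    authors=["Doe, Jane", "Roe, Richard"],
    title="Example Dataset",
    year=2018,
    publisher="Example Repository",
    doi="10.1234/example",
)
```

Putting the repository in the `publisher` field and the PID in `doi` keeps the entry aligned with the elements a data citation usually carries: creators, title, year, repository, and identifier.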
21
A Data Citation Roadmap for Scholarly Data Repositories
Video:
Authors: Martin Fenner, Mercè Crosas, Jeffrey Grethe, David Kennedy, Henning Hermjakob, Philippe Rocca-Serra, Gustavo Durand, Robin Berjon, Sebastian Karcher, Maryann Martone, Timothy Clark
doi:
22
LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories, May 15, 2017
23
LIBER Webinar: A Data Citation Roadmap for Scholarly Data Repositories, May 15, 2017
24
How To Participate – Enabling FAIR Data
- General Mailing List: send request to
- Review the Commitment Statement and consider being a signatory.
- Learn more about how to include your organization! Send to
- Continue promoting the importance of Research Data Management!
25
Thank you! Shelley Stall Director, AGU Data Program @ShelleyStall
26
Enabling FAIR Data – Project Orientation Material
Article describing the Enabling FAIR Data Project:
Outcome of the initial Stakeholder Meeting from Nov 16-17, 2017:
DataONE webinar recording: Enabling FAIR Data (high-level)
Project Site: