A Funder's Perspective on Sustainability of Digital Data Repositories

Allen Dearry, PhD
Director, Office of Scientific Information Management
National Institute of Environmental Health Sciences, NIH
SciDataCon, Denver, CO, September 13, 2016

The Challenge

US NCBI (National Center for Biotechnology Information) holds 20 PB of data, and this is expected to increase 50% this year alone.
EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) estimates that biological data double every 12-18 months.
Dark data: only 12% of data described in published papers is in recognized archives.
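The growth figures above can be related with simple compound-growth arithmetic. This is a minimal sketch that assumes steady exponential growth (an assumption; real archive growth is lumpier), and the function names are hypothetical. It projects the NCBI volume one year out, converts a 50% annual increase into an equivalent doubling time, and expresses the EMBL-EBI 12-18 month doubling range as annual growth factors.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Year-over-year growth factor for data that doubles every `doubling_months` months."""
    return 2 ** (12 / doubling_months)


def doubling_time_months(annual_growth: float) -> float:
    """Months to double, given a year-over-year growth factor (1.5 means +50% per year)."""
    return 12 * math.log(2) / math.log(annual_growth)


def project_volume(volume_pb: float, annual_growth: float, years: float) -> float:
    """Volume after `years` of compound growth at a constant annual factor."""
    return volume_pb * annual_growth ** years


# NCBI figures from the slide: 20 PB now, +50% this year.
print(project_volume(20, 1.5, 1))             # 30.0
print(round(doubling_time_months(1.5), 1))    # 20.5
# EMBL-EBI's 12-18 month doubling range, expressed as annual growth factors.
print(annual_growth_factor(12))               # 2.0
print(round(annual_growth_factor(18), 2))     # 1.59
```

Under this assumption, 50% annual growth corresponds to doubling roughly every 20.5 months, somewhat slower than even the 18-month end of the EMBL-EBI estimate.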

BD2K Sustainability Workgroup
Function: Develop an NIH vision for the economic, technical, and social stewardship of biomedical data repositories.
Goal 1: Define metrics for evaluating biomedical data repositories and assessing their value.
Goal 2: Develop a sustainable lifecycle and coherent funding plan in support of biomedical research data.

What Solutions Are We Exploring?
Metrics for review and evaluation
Enhancing the efficiency and effectiveness of curation
International collaboration on business models
Pilots: Model Organism Databases (MODs); Commons

Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
Which data should be preserved, and for how long? Qualitative and quantitative metrics, such as:
Utilization at multiple levels (repository, dataset, data item); size and demand of the community served
Indicators of repository quality and impact: publications, citations, altmetrics, patents
Quality of service: data quality measures; user support and training
Infrastructure and governance: advisory board; legal structure
Qualitative metrics for the above categories, e.g., use cases and case studies
Case studies demonstrating value: what would happen in the absence of the repository? If the repository weren’t available, how would that impact your work?
Responses due September 30
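Reporting utilization "at multiple levels (repository, dataset, data item)" implies rolling raw usage up a hierarchy. As a minimal sketch, assuming a hypothetical access log in which each event records a dataset, an item, and a user (the `AccessEvent` record and `utilization` function are illustrative and not part of any NIH specification), the roll-up might look like this:

```python
from collections import Counter
from typing import NamedTuple


class AccessEvent(NamedTuple):
    """One hypothetical access-log entry: a user retrieved one item of one dataset."""
    dataset_id: str
    item_id: str
    user_id: str


def utilization(events: list[AccessEvent]) -> dict:
    """Roll access events up to repository, dataset, and data-item levels."""
    return {
        # Repository level: total accesses and number of distinct users served.
        "repository": {
            "accesses": len(events),
            "distinct_users": len({e.user_id for e in events}),
        },
        # Dataset level: accesses per dataset.
        "dataset": dict(Counter(e.dataset_id for e in events)),
        # Data-item level: accesses per (dataset, item) pair.
        "data_item": dict(Counter((e.dataset_id, e.item_id) for e in events)),
    }


# Illustrative events only; no real usage data.
events = [
    AccessEvent("ds1", "a", "u1"),
    AccessEvent("ds1", "a", "u2"),
    AccessEvent("ds1", "b", "u1"),
    AccessEvent("ds2", "x", "u3"),
]
report = utilization(events)
print(report["repository"])  # {'accesses': 4, 'distinct_users': 3}
print(report["dataset"])     # {'ds1': 3, 'ds2': 1}
```

Distinct users serve here only as a crude proxy for the size of the community served; a real repository would likely need to account for anonymous access, automated harvesting, and repeat downloads before such counts become meaningful.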

Interagency Workshop on Measuring the Impact of Data Repositories
Organized by the Big Data Interagency Work Group; planning group: NIH, NSF, NIST, NARA, DOT, NTIS, DHS
December 8 and 9, Washington, DC
Participants: repository managers, data producers and users, funders, publishers, metrics/evaluation experts
Workshop objectives:
Identify current metrics, tools, and methodologies for assessing and communicating the impact of digital repositories.
Identify technical, social, and financial obstacles.
Synthesize results into best practices for both near- and long-term success.

Big Data to Knowledge (BD2K): Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
Efficient tools: tools and templates that facilitate consistent use of community-defined standards, such as common data elements, and of standards used by archival resources such as GenBank, SRA, BioSample, etc.
Automated or semi-automated approaches to merging (harmonizing) disparate or heterogeneous data sets for purposes of new research.
Approaches that improve the speed and accuracy of extracting metadata from text or other digital sources, and of linking that information to a data set or other digital asset.
Approaches that support data annotation at points throughout the research lifecycle (data gathering, preparation of data for sharing, public sharing of data sets, submission or review of articles supported by data sets, etc.).
Distributed (e.g., crowdsourced) approaches to curation processes that increase the efficiency, completeness, accuracy, or quality of the digital asset.
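One of the curation approaches named above, semi-automated harmonization of heterogeneous data sets, can be sketched as mapping each source's local field names onto shared common data elements and flagging anything unmapped for a human curator. The source names, field names, and mapping table below are hypothetical; a real project would presumably draw element definitions from a community standard rather than a hard-coded dictionary.

```python
# Hypothetical mapping from each source's local field names to common data elements (CDEs).
CDE_MAP = {
    "lab_a": {"subj": "subject_id", "age_yrs": "age_years", "sex": "sex"},
    "lab_b": {"participant": "subject_id", "age": "age_years", "gender": "sex"},
}


def harmonize(source: str, records: list[dict]) -> tuple[list[dict], set[str]]:
    """Rename fields to CDE names; return harmonized records plus unmapped field names."""
    mapping = CDE_MAP[source]
    harmonized: list[dict] = []
    unmapped: set[str] = set()
    for record in records:
        out = {"source": source}  # keep provenance of every harmonized record
        for field, value in record.items():
            if field in mapping:
                out[mapping[field]] = value
            else:
                unmapped.add(field)  # left for a curator to review (the "semi" in semi-automated)
        harmonized.append(out)
    return harmonized, unmapped


# Illustrative records only.
lab_a = [{"subj": "A1", "age_yrs": 34, "sex": "F", "notes": "fasting"}]
lab_b = [{"participant": "B7", "age": 51, "gender": "M"}]

merged_a, todo_a = harmonize("lab_a", lab_a)
merged_b, todo_b = harmonize("lab_b", lab_b)
merged = merged_a + merged_b
print(merged)
print(todo_a | todo_b)  # {'notes'}
```

This only renames fields; value-level differences (for example, different codes for sex or different age units) would need a separate normalization step.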

Sustainable Business Models for Data Repositories
Organized by the OECD Global Science Forum; co-chaired with Simon Hodson (CODATA) and Ingrid Dillo (DANS)
Landscape survey, July-September: a broad spectrum of 60 repositories worldwide, covering characteristics, metrics, income streams, future funding, alternative income, and cost optimization
Workshops:
1.1 Innovative income streams, November 3, Paris
1.2 Cost restraint, November 4, Paris
2.0 Business models, March 2017, Brussels
Session at SciDataCon, September, Denver
Report and recommendations, April 2017

Future of Life Sciences and Biomedical Databases
Organized by the International Human Frontier Science Program
Information gathering, June-September: 20 life science repositories; funding model, user community, usage, metrics, sustainability challenges, contingency plans
Workshop: “Life sciences data resources and the future,” November 18-19, Strasbourg; international data resources, public and private funding agencies, scientific organizations, publishers
White paper: NIH, GA4GH, EBI, academic experts; data management/curation, QA, IP, commons, DOI, FAIR, sustainable funding, improved efficiencies

Example: Model Organism Databases
Highly curated and valuable data
Siloed; not interoperable
Cumbersome to compute over all the data
Costly to maintain as individual resources

Pilot a New Infrastructure Model: Alliance of Genome Resources
Member resources: SGD, FlyBase, WormBase (WB), MGD, ZFIN, and the GO Consortium (GOC)
Current problems: different user access and analyses for each resource; redundancy of operations
User confusion from lack of homogeneity: each resource's access interface requires different navigation skills and data access approaches
Semantic inconsistencies and different data structures for the same genomic entities
Shared analyses: human/model organism associations for disease and phenotypes; functional annotation; homology representation
MODs support biomedical research across NIH and internationally.
Aim: support findable, accessible, interoperable, and reusable (FAIR) model organism data, facilitated by the NIH Commons platform
Approach: standardize interfaces; standardize curation and display of shared data
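The slide contrasts each resource's own data structures and interfaces with a standardized interface over shared data. One common way to build such a layer is an adapter per resource that maps native records onto a single shared type, so that one query runs across all of them. The sketch below assumes two hypothetical resources with made-up field names; it does not describe the actual schemas of SGD, FlyBase, WormBase, MGD, ZFIN, or the Alliance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical shared representation of a gene, regardless of source resource."""
    resource: str
    gene_id: str
    symbol: str
    species: str


# One adapter per resource translates its native (hypothetical) record layout.
def from_resource_a(raw: dict) -> GeneRecord:
    return GeneRecord("resource_a", raw["id"], raw["name"], raw["organism"])


def from_resource_b(raw: dict) -> GeneRecord:
    return GeneRecord("resource_b", raw["accession"], raw["symbol"], raw["taxon"])


ADAPTERS: dict[str, Callable[[dict], GeneRecord]] = {
    "resource_a": from_resource_a,
    "resource_b": from_resource_b,
}


def search_by_symbol(sources: dict[str, list[dict]], symbol: str) -> list[GeneRecord]:
    """Run one case-insensitive symbol query across every resource via its adapter."""
    hits: list[GeneRecord] = []
    for name, raw_records in sources.items():
        adapt = ADAPTERS[name]
        hits.extend(r for r in map(adapt, raw_records) if r.symbol.lower() == symbol.lower())
    return hits


# Illustrative records only.
sources = {
    "resource_a": [{"id": "A:0001", "name": "pax6", "organism": "species_x"}],
    "resource_b": [
        {"accession": "B:0042", "symbol": "Pax6", "taxon": "species_y"},
        {"accession": "B:0043", "symbol": "Shh", "taxon": "species_y"},
    ],
}
for hit in search_by_symbol(sources, "PAX6"):
    print(hit.resource, hit.gene_id, hit.symbol)
```

Matching on symbol alone is a simplification; relating the same gene across species would rely on curated homology data rather than string comparison.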

The Data Commons Framework
Components: compute platform (cloud or HPC); services (APIs, containers, indexing); software (services and tools, scientific analysis tools/workflows); data ("reference" data sets, user-defined data); digital object compliance; app store/user interface
Treats products of research (data, software, metadata, workflows, papers, etc.) as digital objects
Digital objects exist in a shared virtual space, where they can be deposited, managed, found, shared, and reused
Conforms to FAIR principles: Findable, Accessible, Interoperable, Reusable
A detailed description of the Commons Framework can be found at: https://datascience.nih.gov/commons
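The Commons treats every research product as a digital object that should be Findable, Accessible, Interoperable, and Reusable. As a rough sketch only, assuming a hypothetical record layout and a deliberately small set of checks (the FAIR principles are far more detailed, the grouping of checks under FAIR letters is loose, and nothing here reflects actual Commons compliance rules), a repository might flag obvious gaps like this:

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical minimal record for any research product (data, software, workflow, paper)."""
    identifier: str                                # persistent identifier, e.g. a DOI
    kind: str                                      # "data", "software", "workflow", "paper", ...
    access_url: str                                # where the object can be retrieved
    metadata: dict = field(default_factory=dict)   # descriptive metadata
    license: str = ""                              # terms of reuse


REQUIRED_METADATA = {"title", "format"}  # hypothetical minimum, not a standard


def fair_gaps(obj: DigitalObject) -> list[str]:
    """Return human-readable gaps, loosely grouped under the FAIR letters."""
    gaps = []
    if not obj.identifier:
        gaps.append("Findable: no persistent identifier")
    if not obj.access_url.startswith("https://"):
        gaps.append("Accessible: no HTTPS access URL")
    missing = sorted(REQUIRED_METADATA - set(obj.metadata))
    if missing:
        gaps.append(f"Interoperable: missing metadata {missing}")
    if not obj.license:
        gaps.append("Reusable: no license")
    return gaps


# Placeholder identifiers and URLs, for illustration only.
complete = DigitalObject("doi:10.1234/example", "data", "https://example.org/objects/1",
                         {"title": "Example set", "format": "text/csv"}, "CC-BY-4.0")
incomplete = DigitalObject("", "software", "ftp://example.org/tool.tar.gz", {"title": "Tool"})
print(fair_gaps(complete))         # []
print(len(fair_gaps(incomplete)))  # 4
```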

Data Science at NIH
https://datascience.nih.gov
allen.dearry@nih.gov | bd2k@nih.gov | @NIH_BD2K | #BD2K