BD2K @ NIH – A Vision Through 2020
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
philip.bourne@nih.gov
Yes, these are uncertain times, but …
  First and foremost, you should see this meeting as a celebration of the hard work of the past two years.
  There is a commitment to the BD2K program through 2020.
BD2K cannot be viewed in isolation, but rather as part of a broader view of data science @ NIH, particularly as funding increasingly comes from the ICs.
A View Which Includes:
  A vibrant research program of:
    Fundamental developments in data science
    Application of those fundamental developments
  Flagship projects to which developments are applied: PMI, Brain, Moonshot, ECHO
  A sustainable data ecosystem
  Commons and the FAIR Principles adoption
  Cross-cutting activities
  Increased workforce training
  A changing governance model
A Strategic Response Can Be Modeled on Three Axes: Research, Resources, Outcomes
A Strategic Response
  Research
    Fundamental: machine learning, data mining, indexing, predictive modeling, …
    Applied: sustainability, governance, economics of data; privacy and security; effective use of clouds; …
  Resources
    Standards; Commons; APIs; reference data sets; workflows; access & authentication; workforce
  Outcomes
    Evaluated pilots; FAIR data; trained workforce; best practices; policies; effective use of clouds; on-ramps for all ICs
The Current Situation
  NIH-funded data: Total data from NIH-funded research is currently estimated at 650 PB*. 20 PB of that is in NCBI/NLM (3%), and it is expected to grow by 10 PB this year.
  Dark data: Only 12% of data described in published papers is in recognized archives; 88% is dark data^.
  Cost: 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives. $1.25Bn per year to capture all data.
  After a significant effort at reduction, intramural data is spread across > 60 data centers; imagine the extramural situation.
  * In 2012 the Library of Congress was 3 PB
  ^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
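As a rough cross-check of the shares quoted on this slide, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only, and the inputs are just the estimates stated above (650 PB total, 20 PB at NCBI/NLM, 12% of published data in recognized archives).

```python
# Estimates quoted on the slide; variable names are illustrative only.
total_pb = 650       # estimated total data from NIH-funded research, in PB
ncbi_pb = 20         # portion held at NCBI/NLM, in PB
archived_pct = 12    # share of data described in papers that is in recognized archives

# Share of the total held at NCBI/NLM (the slide rounds this to 3%).
ncbi_share_pct = 100 * ncbi_pb / total_pb

# Whatever is not in recognized archives is counted as dark data.
dark_pct = 100 - archived_pct

print(f"NCBI/NLM share of total: {ncbi_share_pct:.1f}%")
print(f"Dark data share: {dark_pct}%")
```

The two printed shares should agree with the rounded 3% and 88% figures on the slide.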
The Commons – Status
  Commons and FAIR principles* adopted across NIH
  Development and public release of a prototype Data Discovery Index, DataMed: v1.0 in Feb., v1.5 in Nov.
  Cloud credits being issued for work in the Commons
  FOAs for the Commons Framework being issued
  Commons pilots under way
  * https://www.ncbi.nlm.nih.gov/pubmed/26978244
Sustainability – Sample Other Activities
  Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133). To be discussed at the Sustainability Session, Wed 1pm.
  RFA to support community-based standards work was released in the fall for May 2017 award; session today 1pm.
  Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001). Applications due Dec 15.
Sustainability – Looking Forward
  International collaboration on business models for sustainable data repositories:
    Sustainable Business Models for Data Repositories (OECD Global Science Forum)
    Future of Life Sciences and Biomedical Databases (International Human Frontier Science Program)
  NIH long-term data repository support:
    Federal interagency Workshop on Measuring the Impact of Data Repositories, 2017
    Recommend mechanism(s), review criteria, implementation plan
Example Cross-cutting Activities
  International partnerships
  Count everything – Secure count query framework
  California centers regional meetings
  GA4GH – Beacon project
NLM Working Group Report
  http://acd.od.nih.gov/reports/Report-NLM-06112015-ACD.pdf
  Recommendation – NLM should become the programmatic epicenter for data science at NIH …
  Patti Brennan – New NLM director
What We Hope to See in 2020
  New innovations brought about by large and complex data
  Evidence of translation, i.e. real application at the point of care
  Broad Commons adoption leading to improved sharing, reuse and hence cost effectiveness and reproducibility
  A balance between what is spent on data vs what is gained from that data
  Policies that are supportive of the above
… for your hard work, and to the NIH staff from the ADDS office and from across the ICs who have toiled to make BD2K a success