Data Science at the NIH Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health.

Presentation transcript:

Data Science at the NIH Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health

Data Science Timeline
6/12 Findings:
• Sharing data & software through catalogs
• Support methods and applications development
• Need more training
• Need campus-wide IT strategy
• Hire CSIO
• Continued support throughout the lifecycle

Data Science Timeline
6/12
• U54 Centers of Excellence – under review
• U54 BD2K-LINCS – under review
• U24 Data Discovery Index – under review
• R01, R41, R42, R43, R44, U01 software and analysis methods grants – on-going
• T32, T15, K01, R25 and R26 training awards – under review
2/14
3/14

ADDS Activities Thus Far: Talked to Stakeholders (Examples)
– 20/27 IC Directors
– Agencies: NSF, DOE, DARPA, NIST
– Government: OSTP, HHS, HDI, ONC
– Private sector: PhRMA, Google, Amazon
– Organizations: PCORI, CCC, CATS, FASEB, Biophysical Society, Sloan Foundation, Moore Foundation

ADDS Activities Thus Far: Some Initial Observations
• Bad news
  – We do not yet have a data sustainability plan
  – OSTP has defined the why but not the how
  – We do not know how all the data we currently have are used
  – We can’t estimate future supply and demand
  – Hence we have not projected the resources that will be required to store and analyze data in the future
• Good news
  – Genuine willingness to address the problem across IC’s
  – Efficiencies can be achieved
  – BD2K is the beginning of a plan
  – We are beginning to quantify the issues
  – We have some of the best data scientists in the world to work on the problems

Based on this data gathering we have defined 5 thematic areas to pursue towards a vision…

[Diagram: Associate Director for Data Science and The Biomedical Research Digital Enterprise]
Legend: Programmatic Theme, Deliverable, Example Features (* Hires made)
• Commons / Sustainability*: Cloud – Data & Compute, Search, Security, Reproducibility Standards, App Store
• Training Center / Education*: Coordinate, Hands-on, Syllabus, MOOCs, Community
• BD2K / Innovation*: Centers, Training Grants, Catalogs, Standards, Analysis
• Modified Review / Process: Data Resource Support, Metrics, Best Practices, Evaluation, Portfolio Analysis
Communication; Collaboration
IC’s, Researchers, Federal Agencies, International Partners, Computer Scientists
Scientific Data Council; External Advisory Board

Some Goals of the Digital Enterprise
• Cost savings through sharing of best practices associated with longitudinal clinical studies
• Collaboration through identification of collaborators at the point of data collection, not publication
• Improved reproducibility through data and methods sharing
• Integration of data types and data and literature to accelerate discovery
• Availability of clinical data while respecting patient privacy

The Commons (Vivien Bonazzi & George Komatsoulis (NCBI))
• Public/private partnership
• Work with IC’s, NCBI and CIT to identify and run pilots – cloud, HPC centers
• Port dbGaP to the cloud?
• Experiment with new funding strategies
• Evaluate

Sustainability and Sharing: The Commons
[Diagram labels]
Data: The Long Tail; Core Facilities/HS Centers; Clinical/Patient
The Why: Data Sharing Plans
The Commons; Government
The How: Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy
Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment
The End Game: Knowledge
NIH Awardees; Private Sector; Metrics/Standards; Rest of Academia; Software Standards Index; BD2K Centers
Cloud, Research Objects, Business Models

What Does the Commons Enable?
• Dropbox-like storage
• The opportunity to apply quality metrics
• Bring compute to the data
• A place to collaborate
• A place to discover

Training (Michelle Dunn)
• Training Goals:
  – Develop a sufficient cadre of researchers skilled in the science of Big Data
  – Elevate general competencies in data usage and analysis across the biomedical research workforce
  – Combat the Google bus
• How:
  – Traditional training grants
  – Work with IC’s on a needs assessment
  – Work with institutions on raising awareness
  – Training center(s)?

BD2K Innovation (Jennie Larkin and Mark Guyer)
• Data Discovery Index Coordination Consortium (U24) (under review)
• Metadata standards (under development)
• Targeted Software Development: Development of Software and Analysis Methods for Biomedical Big Data in Targeted Areas of High Need (U01)
  – RFA-HG
  – Application receipt date: June 20, 2014
  – Topics: data compression/reduction, visualization, provenance, or wrangling
  – Contact: Jennifer Couch (NCI) and Dave Miller (NCI)

BD2K Innovation (Jennie Larkin and Mark Guyer)
• BISTI PARs
  – BISTI: Biomedical Information Science and Technology Initiative
  – Joint BISTI-BD2K effort
  – R01s and SBIRs
  – Contacts: Peter Lyster (NIGMS) and Jennifer Couch (NCI)
• Workshops:
  – Software Index (last week): Need to be able to find and cite software, as well as data, to support reproducible science.
  – Cloud Computing (Summer/Fall 2014): Biomedical big data are becoming too large to be analyzed on traditional localized computing systems.
  – Contact: Vivien Bonazzi (NHGRI)

BD2K Innovation (Jennie Larkin and Mark Guyer)
FY14:
• Investigator-initiated Centers of Excellence for Big Data Computing in the Biomedical Sciences (U54) RFA-HG (closed)
• BD2K-LINCS-Perturbation Data Coordination and Integration Center (DCIC) (U54) RFA-HG (closed)

Process (All / OD / CSR)
• Goals:
  – Better data sharing, e.g., genomic data sharing plan
  – Capture the best investigators
• How:
  – Machine-readable data sharing plans?
  – Open review?
  – Micro funding?
  – Standing data committees to explore best practices?
  – Crowd sourcing?

Data Science Timeline
[Timeline diagram labels, in order] FY15; 6/14; 9/14; Internal Retreat; External Advisory; 11/14; FY14 Awards; Commons Pilots; 11/15; FY15 Awards; Commons Launch; Clinical Hire; Training Assessment; Training Center; Best Practices; New Metrics

Some Acknowledgements
• Eric Green & Mark Guyer (NHGRI)
• Jennie Larkin (NHLBI)
• Leigh Finnegan (NHGRI)
• Vivien Bonazzi (NHGRI)
• Michelle Dunn (NCI)
• Mike Huerta (NLM)
• David Lipman (NLM)
• Jim Ostell (NLM)
• Andrea Norris (CIT)
• Peter Lyster (NIGMS)
• All of the more than 100 folks on the BD2K team

NIH … Turning Discovery Into Health