Belinda Seto, Ph.D. Deputy Director National Institute of Biomedical Imaging and Bioengineering NIH and Biomedical ‘Big Data’
Myriad Data Types: Other ‘Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure
Data and Informatics Working Group acd.od.nih.gov/diwg.htm
At a pivotal point: Risk failing to capitalize on technology advances Bordering on “institutional malpractice” Cultural changes at NIH are essential Aim to develop new opportunities for: Data sharing Data analysis Data integration Long-term NIH commitment is required Overarching Themes
NIH is Tackling the ‘Big Data’ Problem 1. New NIH Leadership Position: Associate Director for Data Science (ADDS) 2. New Internal NIH Governing/Oversight Body: Scientific Data Council (SDC) 3. New Trans-NIH Initiative: Big Data to Knowledge (BD2K)
What’s in a Name? Computational Biology Big Data Information Science Bioinformatics Biomedical Informatics Quantitative Biology Data Science Biostatistics
NIH Data Science ‘Programmatic Czar’ (aka, Point Person, Strategic Leader, etc.) Reports to NIH Director Eric Green, Acting Search underway (Eric Green & Jim Anderson, Co-Chairs of Search Committee) Associate Director for Data Science: Overview
Principal advisor to NIH Director and NIH leadership Provides vision and leadership in data science Chair, Scientific Data Council (and thus chief steward of Scientific Data Council responsibilities) Program lead for Big Data to Knowledge (BD2K) Coordinates data science activities, both within and outside of NIH Leads long-term NIH strategic planning in data science NIH leader responsible for promoting trans-NIH, national, and global policies for data sharing Coordination with NIH Chief Information Officer Associate Director for Data Science: Responsibilities
High-level internal NIH group Chaired by Associate Director for Data Science Reports to NIH Steering Committee Trans-NIH representation Scientific Data Council: Overview
Acting Chair: Eric Green (Acting ADDS & NHGRI) Members: James Anderson (DPCPSI), Sally Rockey (OER), Michael Gottesman (OIR), Kathy Hudson (OD), Andrea Norris (CIT), Judith Greenberg (NIGMS), Betsy Humphreys (NLM), Douglas Lowy (NCI), John J. McGowan (NIAID), Alan Koretsky (NINDS), Michael Lauer (NHLBI), Belinda Seto (NIBIB) Acting Executive Secretary: Allison Mandich (NHGRI) Scientific Data Council: Membership
Trans-NIH programmatic leadership and coordination of data science activities Oversight of BD2K Trans-NIH intellectual and programmatic ‘Hub’ for data science (coordination and convening functions) Coordination with data science activities beyond NIH (e.g., other government agencies, other funding agencies, and private sector) Long-term NIH strategic planning in data science Major role in data sharing policy development and oversight Coordination with ‘parallel’ Administrative Data Council Scientific Data Council: Responsibilities
Big Data to Knowledge (BD2K): Overview Major trans-NIH initiative addressing an NIH imperative and key roadblock Aims to be catalytic and synergistic Overarching goal: By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data
I. Facilitating Broad Use of Biomedical Big Data II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data III. Enhancing Training for Biomedical Big Data IV. Establishing Centers of Excellence for Biomedical Big Data BD2K: Four Programmatic Areas
IA. Facilitating Broad Use of Biomedical Big Data – Data Catalog RFI responses received – June Data Catalog Workshop held Aug 21, 22 Fran Berman, chair Jenny Larkin (NHLBI), Ron Margolis (NIDDK), co-organizers BD2K: Four Programmatic Areas
IB. Facilitating Broad Use of Biomedical Big Data – Data/Metadata Standards Frameworks for Community-based Standards Efforts Workshop September 25, 26 Susanna Santone & David Kennedy, co-chairs Mike Huerta (NLM), Leslie Derr (OD), co-organizers BD2K: Four Programmatic Areas
IC. Facilitating Broad Use of Biomedical Big Data – Enabling research use of clinical data Workshop September 11, 12 Robert Cardiff & Dan Masys, co-chairs Leslie Derr (OD), Jerry Sheehan (NLM), co-organizers Webcast w/ real-time, online discussion forum To identify actionable steps that NIH can take to accelerate the use of clinical data in research Near and long-term needs for research, infrastructure, standards and policies Organizers are collecting information about relevant initiatives BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data FOAs for BD2K-specific software needs in FY15 RFI issued August 8, responses due Sept 6 4 topic areas: data visualization, compression/reduction, provenance, wrangling Software Catalogue Workshop: Feb 18-19, 2014 Chairs: Asif Dhar and Owen White BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data Updated broad-based software development FOAs (“BISTI”), notice of intent to publish Cloud computing: joint BD2K-Infrastructure Plus working group initiated on-going discussion with NCI, joint survey results being written up on-going discussion with commercial providers. BD2K: Four Programmatic Areas
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data Dynamic Community Engagement: micro-blog and Twitter developed for BD2K workshops BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data RFI, >100 responses received Workshop held July 29, 30 Karen Bandeen-Roche, Zak Kohane, co-chairs Michelle Dunn (NCI), Bettie Graham (NHGRI), organizers Webcast, archived BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data – Workshop recommendations Opportunity for extraction of knowledge from Big Data is often highest at the interface of at least two disciplines; training programs should be designed to work at interfaces Training programs should be designed to provide skills to work effectively in Team Science Dual mentoring should be encouraged Flexibility needed to encourage innovation and to take best advantage of local expertise and talent Trainees need access to large data sets BD2K: Four Programmatic Areas
III. Enhancing Training for Biomedical Big Data – Workshop recommendations Training in quantitative science and experimental design will be increasingly important to clinical researchers and even clinicians Principles of reproducible research must be stressed There are training needs across the full spectrum of scientists, in terms of both experience and activities The jobs that need to be done in effective Big Data science may not correspond to traditional academic jobs A diverse workforce should be a major goal of data science training activities BD2K: Four Programmatic Areas
IV. Establishing Centers of Excellence for Biomedical Big Data Investigator-initiated centers FOA released July 22 Applications due November 20 Technical Information Webinar Sept 12 NIH-Initiated centers LINCS-BD2K Data Coordination and Integration Center (+ $2.5M from Common Fund) Principles being developed BD2K: Four Programmatic Areas
Nature | News & Views: Alzheimer's disease: From big data to mechanism Vivek Swarup & Daniel H. Geschwind This work is also exemplary in demonstrating the extraordinary value of publicly available data resources. Published data on human gene expression, Alzheimer's disease GWAS and neuroimaging provide the pillars of Rhinn and collaborators' paper. Integrative analyses of these data by the authors, and previously by others, weaken the view that substantive biological experimentation only takes place at the wet bench, and highlight the value of innovative re-analyses of existing data.
The biomedical research enterprise is undergoing a major ‘phase change’ with respect to Big Data and data science Trans-NIH problem needing trans-NIH solutions Solutions include multifaceted cultural changes New NIH plans are: Mission critical Transformational Transitional-- en route to longer-term commitment Closing Thoughts
Questions?