Vivien Bonazzi, Ph.D., Program Director: Computational Biology (NHGRI); Co-Chair: Software Methods & Systems (BD2K). Biomedical Big Data Initiative (BD2K)

Myriad Data Types: Other ‘Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure

Data and Informatics Working Group acd.od.nih.gov/diwg.htm

What Are the Big Problems to Solve?
1. Locating the data
2. Getting access to the data
3. Extending policies and practices for data sharing
4. Organizing, managing, and processing biomedical Big Data
5. Developing new methods for analyzing biomedical Big Data
6. Training researchers who can use biomedical Big Data effectively

Overarching Strategy and Goals
Two initiatives being proposed to overcome roadblocks:
- Big Data to Knowledge (BD2K): enable the biomedical research enterprise to maximize the value of biomedical data
- InfrastructurePlus: create an adaptive environment at NIH to sustain world-class biomedical research

Big Data to Knowledge (BD2K): Overview
- Major trans-NIH initiative addressing an NIH imperative and key roadblock
- Aims to be catalytic and synergistic
- Overarching goal: By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data

BD2K: Four Programmatic Areas
I. Facilitating Broad Use of Biomedical Big Data
II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data
III. Enhancing Training for Biomedical Big Data
IV. Establishing Centers of Excellence for Biomedical Big Data

Area 1: Data Sharing & Access
A. Policies to Facilitate Data Sharing
B. Data Catalog: Data Discovery, Citation, Links to Literature
C. Frameworks for Community-Based Solutions to Developing Data Standards
D. Enabling Research Use of Clinical Data
Facilitating usage and sharing of biomedical big data:
- New Policies to Encourage Data & Software Sharing
- Index of Research Datasets to Facilitate Data Location & Citation
- Community-based Development of Data & Metadata Standards

Area 2: Software and Systems Development
A. Grants for software development
B. Software Registry: Making biomedical software findable and citable
C. Cloud computing: Facilitating Data Analysis
D. Dynamic Social Engagement via social media
Development of analysis methods and software:
- Software to Meet Needs of the Biomedical Research Community
- Facilitating Data Analysis: Access to Large-scale Computing
- Dynamic Community Engagement of Users and Developers

Software Grants (Area 2: Software and Systems Development)
Current and emerging needs for using, managing, and analyzing the larger and more complex data sets inherent to biomedical Big Data:
- Compression/Reduction
- Visualization
- Provenance
- Data Wrangling

Big Data needs Big Computing (Area 2: Software and Systems Development)
Cloud Computing
- Leveraging the cloud
- Storing and analyzing huge data sets
- Collaborative environment
- Developing appropriate policies for use of controlled access data in the cloud (dbGaP)
- Developing working relationships with major cloud providers: AWS, Google, Microsoft (Azure)
HPC
- More exploration with Supercomputing facilities

Area 3: Training
Enhancing computational training:
- Increase Number of Computationally Skilled Trainees
- Strengthen the Quantitative Skills of All Researchers
- Enhance NIH Review and Program Oversight

Area 4: Centers
A. Investigator-initiated Centers
B. NIH-specified Centers
Establishing centers of excellence:
- Collaborative environments & technologies
- Data integration
- Analysis & modeling methods
- Computer science & statistical approaches

Big Data to Knowledge (BD2K) bd2k.nih.gov

Biomedical Research as Part of the Digital Enterprise. Philip E. Bourne, Ph.D., Associate Director for Data Science, National Institutes of Health

Myriad Data Types: Other ‘Omic, Imaging, Phenotypic, Clinical, Genomic, Exposure

Components of The Academic Digital Enterprise
- Consists of digital assets
  - E.g. datasets, papers, software, lab notes
- Each asset is uniquely identified and has provenance, including access control
  - E.g. publishing simply involves changing the access control
- Digital assets are interoperable across the enterprise
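
The asset model on this slide can be illustrated with a minimal Python sketch. It is not taken from the presentation: the names `DigitalAsset`, `Access`, `record`, and `publish` are hypothetical, and the use of a random UUID as the unique identifier is an assumption made only for illustration. It shows one way an asset could carry an identifier, a provenance trail, and an access-control state, so that "publishing" reduces to a change of that state.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class Access(Enum):
    """Hypothetical access-control states for an asset."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """Illustrative model of one digital asset (not an NIH or BD2K API)."""
    kind: str    # e.g. "dataset", "paper", "software", "lab notes"
    title: str
    access: Access = Access.PRIVATE
    # Assumption: a random UUID stands in for whatever identifier scheme is used.
    asset_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    provenance: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an event to the asset's provenance trail."""
        self.provenance.append(event)

    def publish(self) -> None:
        """Publishing only changes the access control and logs that change."""
        self.access = Access.PUBLIC
        self.record("access changed to public")


dataset = DigitalAsset(kind="dataset", title="Example expression data")
dataset.record("created")
dataset.publish()
print(dataset.access.value, dataset.provenance)
```

In this sketch the asset itself is never copied or moved when it is published; only its access state and provenance trail change, which is the point the slide makes.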

Let’s Break Down the Silos
- New policies, regulations e.g. data sharing
- Economic drivers
- The promise of shared data

The NIH is Starting to Think About the Digital Enterprise: Big Data to Knowledge (BD2K), bd2k.nih.gov

This is great, but BD2K is just a start. What will the end product look like?

To get to that end point we have to consider the complete research lifecycle

The Research Life Cycle will Persist
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

Tools and Resources Will Continue To Be Developed
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication

Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication

New/Extended Support Structures Will Emerge
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training

Thank You. Questions?