Office of Science Office of Biological and Environmental Research Susan K. Gregurick, Ph.D. Program Manager Computational Biology & Bioinformatics Biological.

Presentation transcript:

Office of Science, Office of Biological and Environmental Research
Susan K. Gregurick, Ph.D., Program Manager, Computational Biology & Bioinformatics, Biological and Environmental Research
DOE Systems Biology Knowledgebase: A community effort in microbial, plant and metagenomic sciences
BERAC, September 16-17, 2010

Department of Energy, Office of Science, Biological and Environmental Research. Kbase, September 2010
A Systems Biology Knowledgebase for Energy and the Environment
• Knowledgebase: Cyberinfrastructure to integrate, search and visualize, in an open environment, experimental data, associated information (metadata), corresponding models and analysis tools.
• Enables researchers to (i) ask questions about experiments and data, (ii) construct new experiments or new models and simulations, and (iii) collaborate with colleagues.
• Enables DOE program(s) to maintain a digital archive of our research data and the corresponding information and analysis methods.
• Unlike other database efforts, the DOE Systems Biology Knowledgebase is focused on DOE Science Objectives in Microbial, Plant and Community sciences.

DOE Systems Biology Knowledgebase: Establishing a systems biology modeling framework
Figure labels: Data; Metadata; Tools; Data generators; Data users; Software and tool developers
• Open-Access Data and Information Exchange: flexible user interfaces; easy data retrieval; environment for in silico experimentation
• Open Development of Open-Source Software and Tools: analysis and visualization; in silico experimentation; tracking and evaluation of tool use
• Community-Wide Stewardship: user, standards, and advisory committees; value-added analysis; training, tutorials, and support
• Seamless Submission and Incorporation of Diverse Data: standards for data and metadata; quality control and assurance; automated data handling

The Knowledgebase leverages Genomic Sciences as much as it serves Genomic Sciences
Figure labels: JGI Sequencing; Bioenergy Research; Plant Feedstocks for Bioenergy; Carbon Cycling Processes; Genome Annotation; Metabolic Modeling; Computational Biology; Foundational Research; KBASE: Integrate Science Across Activities
There is a tremendous wealth of data and information in the Genomic Sciences program. The Knowledgebase is an opportunity to integrate this data and information both within individual activities and across different activities.

The Process to formulate an Implementation Plan for the Knowledgebase
• March 2009: DOE Systems Biology Knowledgebase for a New Era in Biology Workshop Report. This was a mission needs workshop establishing community need for a Knowledgebase.
• July 2009: Recovery Act funds the Knowledgebase R&D project to support the research and development of an implementation strategy for the Systems Biology Knowledgebase.
• Community Workshops ( participants each): Supercomputing, Nov; Plant and Animal Genome XVIII, Jan; DOE Genomic Science Grantee Workshop, Feb; JGI Users Meeting, March 2010
• Synthesis Workshop (80 participants), June 2010
• Pilot Projects and Infrastructure: Develop bioinformatics software and capabilities for the ASCR Magellan cloud architecture. Kandinsky, a cloud cluster, serves as a test bed for storing and analyzing experimental data.

Outline for Overall Architecture of Knowledgebase: Science Objectives, Implementation and Computing Architecture
During the design process for Kbase, scientists were asked to:
1) Define a long-term measure for science in their area
2) Define 6-8 key objectives that could be met in the near, mid and longer term
3) Prioritize these objectives from High to Moderate to Low
4) Develop a detailed implementation strategy for the high-priority objectives
• Biological scientists worked with computer scientists, data management and partner scientists to develop a correspondingly detailed computer architecture implementation strategy.

Knowledgebase Science Objectives in Three Key Areas: Microbial Sciences
Long Term Goal: Rapidly reconstruct metabolic and regulatory pathways for microbes, with comparative reconstructions at 90% accuracy for growth and phenotypic characteristics.
• Integrate Data with Genomic Function: Relate experimental data to inferred knowledge about genes and genomes
• Reconstruction, Prediction, and Manipulation of Metabolic Networks: Integrate new experimental data and automatically create metabolic reconstructions
• Gene Expression Regulatory Networks: Enable automated inference of gene expression and regulatory networks and extend networks to include additional experimental data types

Knowledgebase Science Objectives in Three Key Areas: Plant Sciences
Long Term Goal: Integrate experimental data with key plant genomes, including real-time field data. Associate experimental data with plant phenotype and predict the relationships among phenotype, genotype and environment.
• Integrate experimental data with plant genomic sequences: Integrate key types of ‘omics data, with associated quality data and metadata, with DOE priority plant genomes, including integration of field data
• Assemble Regulatory ‘Omics Data: For target plant species, enable analysis, comparisons and modeling
• Semi-automated inference and simulation of plant metabolic and regulatory networks

Knowledgebase Science Objectives in Three Key Areas: Microbial Community Sciences (Integrated Meta ‘Omics)
Long Term Goal: Integrate experimental ‘omics data with reference metagenomics sample sequences. Develop capabilities for metabolic reconstructions and modeling in natural microbial communities.
• Understand microbial diversity and poorly characterized genes: Link physiological and metabolic data sets to metagenome sequences
• Enable modeling of metabolic processes within a microbial community
• From partial single microbial genomes found within microbial communities, predict isolated or community growth

Knowledgebase Implementation Timeline
• Construct repository for experimental microbial data
• Develop workflows
• Analysis and programs repository
• Develop methods for growth simulations
• Integrate field data into Kbase (with iPlant)
• Develop reference metagenomic data sets repository
• Extend phylogenetic analysis methods for metagenomes
• Develop new methods for metabolic modeling of microbial communities
• Develop methods for metabolic and regulatory modeling of plants
• Extend data integration for plant phenotypes
• Develop on-the-fly data analysis capabilities
• Extend repository for imaging and spectroscopic data
• Comparative data and analysis methods

Critical Partnerships:
• Joint Genome Institute: DOE’s premier high-throughput sequencing user facility for Energy and the Environment.
• Advanced Scientific Computing Research: DOE’s office of computing research places a high priority on computing at the exascale.
• National Center for Biotechnology Information: The major repository of primary sequence and related ‘omics and biomedical data.
• NSF-funded iPlant Collaborative (iPlant): A 5-year, $50 million project driven by the needs of the plant science research community.
• Discussion of involvement also includes NCI, Google and Amazon.

Knowledgebase Architecture
• Host and integrate diverse biological data sets
• Provide both high-performance and scalable computational resources
• Support a large user community with tools and services
To meet these requirements, the Kbase must be designed with a highly elastic architecture that enables continual expansion and scaling to accommodate new data, computational platforms and software innovations.
Figure labels: User Environment; Core Kbase Services; Data Management; Workflow Services; Federated Kbase Computational Platform; Operations and Support; Software Engineering

Knowledgebase Architecture Milestones:
• Computational Platform: federated system from Cloud to HPC
• Data and Workflow Services, including data access and searching
• Core Kbase Services: Application Programming Interface (API) and tools for analysis
• User Environment, including linking to community analysis programs
• Operational Support and Maintenance
Figure labels: FOA Enabling Methods and Pilots; (release 2); (18 month release version 1)

Knowledgebase Architecture Overview
A summary of the Kbase Cloud:
• User access through a Kbase Core Front End
• Kbase Core creates a virtual environment that lets users work on different problems seamlessly
• Cloud resources support data storage and analysis at many locations, independent of users
• Leverage ASCR Magellan, HPC, NERSC, and Amazon EC2 and S3
Figure labels: Kbase Core Front End; User Access and Infrastructure Layer
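The idea on this slide, one front end in front of many resources whose location the user never sees, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the Kbase design: the class names (`Backend`, `FrontEnd`), the capacity figures, the backend names and the "most free slots" placement rule are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """A hypothetical compute/storage resource (a cloud or an HPC site)."""
    name: str
    capacity: int                      # max concurrent jobs (illustrative number)
    jobs: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.jobs) < self.capacity


class FrontEnd:
    """Hypothetical single entry point that hides where a job runs."""

    def __init__(self, backends):
        self.backends = list(backends)

    def submit(self, job_id: str) -> str:
        # Pick the backend with the most free slots; the caller never names a site.
        candidates = [b for b in self.backends if b.has_room()]
        if not candidates:
            raise RuntimeError("no backend has free capacity")
        target = max(candidates, key=lambda b: b.capacity - len(b.jobs))
        target.jobs.append(job_id)
        return target.name


# Placeholder resources; the names are assumptions for the example only.
front_end = FrontEnd([
    Backend("magellan-cloud", capacity=2),
    Backend("hpc-cluster", capacity=1),
])
placements = [front_end.submit(f"job-{i}") for i in range(3)]
```

The user-facing call is just `front_end.submit(...)`. Adding a new resource means registering another `Backend`, which is one simple way to read the "elastic" and "federated" wording in the architecture slides.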

Description of Existing Pilots funded by Recovery Act
Analysis Tools:
• Arkin, LBNL: Develop Microbes-On-Line metabolic modeling interface for analysis and visualization within the Google Framework (Google-line Application for Metabolic Maps (GLAMM)).
• Meyer, ANL: Benchmark bioinformatics analysis programs on HPC and Cloud systems.
• Markowitz, LBNL/JGI: Develop JGI Metagenomic analysis pipelines for HPC and Cloud systems.
Infrastructure Tools:
• Gorton, PNNL: Prototyping a Service Oriented Architecture (SOA) for storing and accessing biology data in a Cloud computing environment.
• Kleese van Dam, PNNL: Develop semantic technologies to ease, speed up and improve scientific workflows in systems biology.

DOE Office of Science FOA
DE-FOA : Computational Biology and Bioinformatic Methods to Enable a Systems Biology Knowledgebase
Total $15 million over three years, funds 11 projects, starting 9/15/2010
• Annotation: New methods for computational gene annotation that include integration of data and information into the assignment of gene functions
• 'Omic Data Integration: New computational methods to integrate multiple data types including (meta)genomic, proteomic, metabolomic, transcriptomic, expression and phenotypic data
• Integrated Pathway Reconstructions: Significant improvements in methodologies to couple metabolic and regulatory pathways, including integration of data and information
• Whole Cellular Simulations: New methods to model complex cellular processes

Figure labels: Better Interpretation and Design of Future Experiments; Systems Biology Experiments; Computational Bioinformatics; Data Processed and Inferred; Application Programming Interface (API); Bioinformatics Tool Development; Knowledgebase Core Infrastructure; Scientific Community

Thank you!
Susan Gregurick, Office of Science, Office of Biological and Environmental Research