Systems Biology


Existing and future genome sequencing projects, and the follow-on structural and functional analysis of complete genomes, will produce an unprecedented collection of molecular and functional information for a wide range of model organisms. The collected output of these efforts will provide the basis for assembling a detailed description of the cell. From that description we may begin to build models that simulate intracellular molecular and biochemical processes, in order to understand and predict the dynamic behavior of living cells.

Previous work in biochemical network, genomic, and cell simulation has led to models of well-characterized biochemical pathways. These projects, however, have fallen short of developing a scalable, hierarchical, integrative model of the cell that incorporates gene regulation, metabolism, signaling, and transport in a spatial modeling framework designed to scale to petaflops computer platforms and beyond.

We are developing an integrated environment that will enable the construction of multilevel computational metabolic models for prokaryotic organisms and microbial communities, and that will allow researchers to perform multilevel comparative and evolutionary analysis of biological data. This environment will contain the data and computational tools required for all steps of in silico metabolic modeling of microbial strains.

We believe that the development of a comprehensive model of biosystems and biosimulation requires the following:
1. Model development coupled with experimental groups developing new cell and molecular biology analysis and assay methods.
2. Multiple levels of abstraction.
3. Interface definitions for sharing model components and descriptions.
4. Model integration, with the various pieces coming from disparate labs and multiple disciplines.

The goal of this effort is to gain a comprehensive understanding of the microbial metabolism of single organisms and of microbial communities. Development of such models is based on close interaction and extensive data exchange with the experimental component of the project.

[Diagram: analysis modules and the data products exchanged between them. Labels recovered from the slide:]
Modules: Sequence Analysis Module; Whole Genome Analysis and Architecture Module; Networks Analysis Module; Metabolic Simulation Phenotypes Module.
Related activities: Experimentation; Proteomics; Metabolic Engineering.
Data products: Gene Functions Assignments; Conjectures about Gene Functions; Gene Annotations; Annotated Data Sets; Visualization; Genome Features; Annotated Genome Maps; Genomes Comparisons; Metabolic Reconstructions (Annotated Stoichiometric Matrices); Operons, regulons, networks; Predictions of Regulation; Predictions of New Pathways; Functions of Hypotheticals; Networks Comparisons; Conserved Chromosomal Gene Clusters.
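The "annotated stoichiometric matrices" named among the metabolic reconstruction outputs can be illustrated with a minimal sketch. The three-reaction network, the metabolite names, and both helper functions below are hypothetical and are not taken from the slides; the sketch only shows the general idea of encoding reactions as a matrix S and checking the steady-state condition S·v = 0 for a flux vector v.

```python
# Hypothetical toy network (illustration only, not from the slides):
#   R1:   -> A      (uptake)
#   R2: A -> B      (conversion)
#   R3: B ->        (secretion)
metabolites = ["A", "B"]
reactions = {
    "R1": {"A": 1},
    "R2": {"A": -1, "B": 1},
    "R3": {"B": -1},
}


def stoichiometric_matrix(metabolites, reactions):
    """Build S with one row per metabolite and one column per reaction."""
    return [[reactions[r].get(m, 0) for r in reactions] for m in metabolites]


def net_production(S, fluxes):
    """Compute S.v, the net production rate of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]


S = stoichiometric_matrix(metabolites, reactions)

# Equal fluxes through the linear chain leave every metabolite balanced.
balanced = net_production(S, [2.0, 2.0, 2.0])

# Taking up A faster than it is converted makes A accumulate.
unbalanced = net_production(S, [2.0, 1.0, 1.0])
```

A flux vector for which every entry of S·v is zero is a steady state of the toy network; a nonzero entry shows which metabolite accumulates or is depleted.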

GADU Framework: Solution Implementation

GADU: an Automated Pipeline to Support Analysis of the Genomes in GWiz

Pipeline modules: Data Acquisition Module; Data Analysis Module; Data Storage Module.

The integrated environment for high-throughput analysis of genomes and reconstruction of genetic networks from sequence data includes:
1. A supporting database containing:
   - data obtained from various electronic data sources;
   - computational models;
   - results of computational and (in the future) experimental analyses of the genomes and gene networks.
2. A high-throughput genetic sequence analysis module consisting of:
   - a computational infrastructure, tools, and algorithms for high-throughput assignment of function to the genes in sequenced genomes (SVM- and HMM-based, voting algorithms, etc.);
   - the results of automated and interactive genetic sequence analysis of ~80 prokaryotic genomes.
3. A metabolic and regulatory network reconstruction module containing:
   - tools and algorithms for reconstruction, representation, navigation, and analysis of metabolic and regulatory networks;
   - reconstructions of metabolic and regulatory networks for at least 50 organisms.
4. A library of hypotheses for experimental validation concerning the functions of hypothetical proteins and the architecture of metabolic and regulatory networks.
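The slide lists "voting algorithms" among the function-assignment methods but does not describe them. The sketch below is one plausible reading, a simple majority vote over the labels proposed by several independent predictors; it is an assumption for illustration, not GADU's actual algorithm. The function name `vote`, the `min_support` parameter, and the predictor and label names are all hypothetical.

```python
from collections import Counter


def vote(predictions, min_support=2):
    """Return the label proposed by the most methods.

    `predictions` maps a method name to its predicted function label,
    or to None when the method made no call. Returns None when no label
    reaches `min_support` votes or when the top two labels are tied.
    """
    counts = Counter(label for label in predictions.values() if label is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    top_label, top_votes = ranked[0]
    if top_votes < min_support:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None  # ambiguous: leave the gene for interactive analysis
    return top_label


# Hypothetical predictor outputs for a single gene.
consensus = vote({"svm": "aminotransferase",
                  "hmm": "aminotransferase",
                  "blast": "decarboxylase"})
no_call = vote({"svm": "aminotransferase", "hmm": "decarboxylase"})
```

Returning None on ties or weak support keeps uncertain genes out of the automated assignments, which fits a pipeline that combines automated and interactive analysis.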

Tools and Algorithms for Genetic Sequence Analysis

Sources of protein families: Knowledge Base (ANL); Subunits DB (ANL); COGs; BLAST; Hobacgen.

SVM-Based Classification: Characterization of Protein Families (ANL/ORNL)
Motivation: The current resolution of tools such as BLOCKs and Pfam is insufficient to discriminate between closely related homologous sequences.
Output:
1. A library of SVM models for identification of certain enzymatic functions.
2. Computational tools for prediction of protein functions based on these models.
Applications:
1. Identification of conserved amino acid residues responsible for the functionality of a protein sequence.
2. Automated classification and prediction of protein function.

PhyloBLOCKS (ANL)
Motivation: To develop a library of BLOCKs HMM profiles corresponding to particular enzymatic functions or evolutionary versions of enzymes.
Output: Refined BLOCKs specific for particular enzymatic functions.
Applications:
1. Identification, classification, and characterization of proteins utilizing refined BLOCKs.
2. Phylogenetic analysis using the distribution of BLOCKs to identify convergent/divergent evolution.

The availability of phylogenetically diverse sequence data and the development of comprehensive bioinformatics methods now allow thorough investigation of the evolutionary origins of metabolic pathways. A number of evolutionary mechanisms participate in establishing enzymatic functions. These include divergent evolution or enzyme recruitment, convergent evolution or non-homologous replacement, horizontal transfer of genes from one organism to another, and inheritance of a biological pathway from an ancestor. Uncovering and understanding the evolutionary history of metabolic pathways could provide information about the past and present metabolic and evolutionary potential of a species. It can also help to guide engineering and the discovery of new metabolic activities.

Challenges:
1. Incompleteness of the domain and motif libraries. Currently available domain and motif libraries (e.g. InterPro, BLOCKs) contain a wealth of information for the characterization and identification of proteins, but they are still incomplete and do not yet cover a large number of enzymatic functions.
2. Low resolution of some sequence profiles. Some of the sequence profiles from the domain libraries can identify large protein families (e.g. aminotransferases) but are unable to discern specific enzymatic functions.

Hierarchical Simulation

[Diagram: levels of the simulation hierarchy. Labels recovered from the slide: Molecular Machines; Gene & Chemical Networks; Earth's Macro cycles; Whole Cells; Cell-Cell Interactions; Communities.]

The simulation software will represent models at multiple scales and will provide toolkits for building corresponding simulations. There will be a systems interconnect to allow reuse of model components, reuse of simulation components, and workflow spanning bioinformatics (GWiz), simulation, analysis, and visualization.
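The slides do not specify what the systems interconnect or the shared component interfaces look like. As a purely illustrative sketch, reuse of model components could rest on a small common interface that every component implements, so that a generic driver can combine components written by different groups. All names below (`ModelComponent`, `Uptake`, `Conversion`, `simulate`) and the toy kinetics are assumptions, not part of the described software.

```python
from typing import Protocol


class ModelComponent(Protocol):
    """Hypothetical shared interface: report state changes for one time step."""

    name: str

    def step(self, state: dict, dt: float) -> dict: ...


class Uptake:
    """Toy transport component: constant-rate substrate uptake."""

    name = "uptake"

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def step(self, state: dict, dt: float) -> dict:
        return {"substrate": self.rate * dt}


class Conversion:
    """Toy metabolic component: first-order substrate -> product."""

    name = "conversion"

    def __init__(self, k: float) -> None:
        self.k = k

    def step(self, state: dict, dt: float) -> dict:
        amount = self.k * state.get("substrate", 0.0) * dt
        return {"substrate": -amount, "product": amount}


def simulate(components, state, dt, n_steps):
    """Explicit-Euler driver that only relies on the shared step() interface."""
    state = dict(state)
    for _ in range(n_steps):
        # Evaluate every component against the same state, then apply changes.
        deltas = [c.step(state, dt) for c in components]
        for delta in deltas:
            for key, change in delta.items():
                state[key] = state.get(key, 0.0) + change
    return state


final = simulate([Uptake(1.0), Conversion(0.5)],
                 {"substrate": 0.0, "product": 0.0}, dt=1.0, n_steps=2)
```

Because the driver depends only on `step`, a component can be swapped for a more detailed model of the same process without changing the rest of the simulation, which is the kind of reuse the slide describes.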