National Microbial Pathogen Data Resource Connecting Bioinformatics to the Bench Leslie Klis McNeil NCSA, University of Illinois, Urbana.

NMPDR is a BRC
- NIAID Bioinformatics Resource Centers (BRCs)
  - common goals
  - different focus organisms
- Provide annotations and tools to develop diagnostics and therapeutics against Priority Pathogens
- NMPDR core organisms, all category B:
  - Campylobacter jejuni
  - Listeria monocytogenes
  - Staphylococcus aureus
  - Streptococcus pyogenes and S. pneumoniae
  - Vibrio cholerae, V. vulnificus, and V. parahaemolyticus

Sister BRCs focus on other priority pathogens
- Unified port of entry at
- Eight BRCs curate viruses, protozoa, and bacteria, or insect vectors of disease

Who is NMPDR
- Fellowship for Interpretation of Genomes
  - Primary software developers
  - Curators who do manual annotation
- Computation Institute at University of Chicago
  - Software developers
  - Hardware managers
- Argonne National Laboratory
  - Software developers
- NCSA, University of Illinois at Urbana
  - Education, outreach, training

What is NMPDR
- Genome database with value added
  - Manual annotation in context of systems biology
  - Comparative analysis tools
    - Bidirectional Best Hits: select and align
    - Functional clusters: genes with conserved proximity
    - Compare regions: adjust size of region, number of genomes
    - Pinned regions: phylogenetic comparison with all genomes
    - Signature genes: find genes in common or that distinguish user-selected groups of genomes; groups may contain one or many
  - Essential genes page
  - Drug target discovery and in silico screening
  - Organism pages with phenotype information
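
The Bidirectional Best Hits tool named above pairs genes from two genomes when each gene is the other's top-scoring match. A minimal Python sketch of that idea follows. The score dictionaries, gene names, and function names are illustrative assumptions, not NMPDR's implementation or data.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict of (query, subject) -> similarity score (hypothetical input,
    for example bit scores parsed from a BLAST report).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up scores for two small genomes A and B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 210, ("b2", "a2"): 95, ("b2", "a1"): 40}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

Gene a3 is dropped because its best hit, b1, prefers a1. That reciprocity requirement is what distinguishes a bidirectional best hit from a plain one-way best hit.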

Pathogen-specific gateways to data

Outreach services in the user interface
- User forum links to iLabs with Inquiry Units for teaching and training
- PathInfo (VBI’s PIML project): info about
  - General info and strain descriptions
  - Lab handling and safety
  - Epidemiology
- Journals button opens most recent, relevant ASM articles
- Google news: RSS feed of popular press
- Links to resources such as strain collections

Annotation Status Table
- Immediate access to genes whose functions are known with some degree of certainty
  - Named genes in subsystems
  - Named genes not in subsystems
  - Hypothetical genes in subsystems
- Gateway to genes about which nothing is known
  - Hypothetical genes not in subsystems
- List of genes with links to NMPDR analysis tools
- Exploration in a comparative framework is the first step to formulating working hypotheses about functions
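
The four categories of the annotation status table come from crossing two yes/no properties of a gene. The Python sketch below tallies them. The rule for calling a gene "hypothetical" (its function text contains that word) and the example genes are assumptions made for illustration.

```python
from collections import Counter


def annotation_status(function_name, in_subsystem):
    """Place a gene in one of the four status categories.

    Assumption: a gene is 'hypothetical' when its assigned function
    contains the word 'hypothetical'; the real rule may differ.
    """
    kind = "hypothetical" if "hypothetical" in function_name.lower() else "named"
    where = "in subsystems" if in_subsystem else "not in subsystems"
    return f"{kind}, {where}"


def status_table(genes):
    """Count genes per category; genes is an iterable of (function, in_subsystem)."""
    return Counter(annotation_status(name, flag) for name, flag in genes)


# Made-up gene list.
example_genes = [
    ("Histidine ammonia-lyase", True),
    ("Cell division protein FtsZ", False),
    ("conserved hypothetical protein", True),
    ("hypothetical protein", False),
    ("hypothetical protein", False),
]
table = status_table(example_genes)
for category, count in sorted(table.items()):
    print(category, count)
```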

Pathways to Data
- Start with a keyword search for the name of a gene or protein
- Start with the sequence of your gene or protein and BLAST it against any complete genome
- Start by browsing an organism of interest
  - View lists of proteins with/without functional names; included/not in biological subsystem. Choose one from the list to investigate with comparative tools.
- Start from the subsystems tree to view the phylogenetic distribution of an interesting biological process
- Start from the essential genes page to view essential genes in model organisms and to project essentiality to closely or distantly related organisms
- Start from virtual structural proteomes to investigate proteins about which structural information is available in PDB

Subsystems approach to genome annotation
- Subsystems annotation provides researchers with corrected functional annotations in a structured biological context
- Consistency across genomes is achieved by vertical annotation of functions rather than a horizontal focus on single genomes
- More than 500 distinct subsystems have been developed
  - Metabolic pathways
  - Complex structures
  - Genotype-phenotype associations
- Subsystems integrate genomic and functional contexts of genes in metabolic reconstructions or populated subsystem spreadsheets
- Metabolic reconstructions summarize all subsystems in a given genome
- Populated subsystems compare all genomes in a given subsystem

What is a Subsystem?
- A subsystem is a generalization of a pathway
  - Collection of functional roles jointly involved in a biological process or complex
  - May be metabolic, signaling, regulatory, or structural
- A functional role is the abstract biological function of a gene product
  - Atomic or fundamental; examples:
    - 6-phosphofructokinase (EC )
    - LSU ribosomal protein L31p
    - cell division protein FtsZ

Expert-Defined Subsystems
- Curator is a researcher with first-hand knowledge of the biological system
- Functional roles are defined and grouped into subsystem and subsets by the curator
  - universal groups of roles include all organisms
  - functional variants are subsets of roles found in a limited number of organisms; they often represent alternative paths

Populated Subsystems
- Two-dimensional integration of functional roles with genomes
  - universal groups of roles include all organisms
  - functional variants are subsets of roles found in a limited number of organisms
- Spreadsheet
  - Columns of functional roles
  - Rows of organisms
  - Cells of annotated genes
- Table of functional roles with GO terms
- Diagram
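
One way to picture the populated spreadsheet is as a nested mapping: genome, then functional role, then the annotated genes. The Python sketch below is only an illustration of that layout. The role names, genome names, and gene IDs are placeholders, and it does not describe how NMPDR stores its data.

```python
# Placeholder roles, genomes, and gene IDs, for illustration only.
ROLES = ["RoleA", "RoleB", "RoleC"]

spreadsheet = {
    "Genome 1": {"RoleA": ["g1_0012"], "RoleB": ["g1_0013"]},
    "Genome 2": {"RoleA": ["g2_0450"], "RoleC": ["g2_0452", "g2_0900"]},
}


def row(sheet, genome):
    """One row: every role of the subsystem as filled in for one genome."""
    return {role: sheet[genome].get(role, []) for role in ROLES}


def column(sheet, role):
    """One column: a single functional role compared across all genomes."""
    return {genome: cells.get(role, []) for genome, cells in sheet.items()}


print(row(spreadsheet, "Genome 1"))
print(column(spreadsheet, "RoleC"))
```

Reading along a row shows what one organism has for the subsystem, and reading down a column compares one role across organisms. An empty cell is a role with no annotated gene in that genome.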

Simple Example: Histidine Degradation Subsystem
- Conversion of histidine to glutamate is the organizing principle
- Functional roles defined in table:

Subsystem Diagram
- Three functional variants
- Universal subset has three roles, followed by three alternative paths from IV to VI

Subsystem Spreadsheet
- Column headers taken from table of functional roles
- Rows are selected genomes, or organisms
- Cells are populated with specific, annotated genes
- Shared background color indicates proximity of genes
- Functional variants defined by the annotated roles
- Variant code -1 indicates subsystem is not functional

Organism                      Variant  HutH    HutU    HutI    GluF    HutG    NfoD    ForI
Bacteroides thetaiotaomicron  1        Q8A4B3  Q8A4A9  Q8A4B1  Q8A4B0  -       -       -
Desulfotalea psychrophila     1        gi      gi      gi      gi      -       -       -
Halobacterium sp.             2        Q9HQD5  Q9HQD8  Q9HQD6  -       Q9HQD7  -       -
Deinococcus radiodurans       2        Q9RZ06  Q9RZ02  Q9RZ05  -       Q9RZ04  -       -
Bacillus subtilis             2        P10944  P25503  P42084  -       P42068  -       -
Caulobacter crescentus        3        P58082  Q9A9M1  P58079  -       -       Q9A9M0  -
Pseudomonas putida            3        Q88CZ7  Q88CZ6  Q88CZ9  -       -       Q88D00  -
Xanthomonas campestris        3        Q8PAA7  P58988  Q8PAA6  -       -       Q8PAA8  -
Listeria monocytogenes
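
Because functional variants are defined by which roles are annotated, a variant code can in principle be read off a row of the spreadsheet. The Python sketch below is a simplified guess at such a rule for this histidine degradation example. It assumes HutH, HutU, and HutI form the universal subset and that GluF, HutG, and NfoD mark variants 1, 2, and 3. In practice curators assign variant codes, so treat this as an illustration only.

```python
UNIVERSAL_ROLES = {"HutH", "HutU", "HutI"}

# Assumed marker role for each variant; an insertion-ordered dict.
VARIANT_MARKERS = {1: "GluF", 2: "HutG", 3: "NfoD"}


def variant_code(roles_present):
    """Return 1, 2, or 3 for a recognized variant, or -1 if none applies."""
    roles = set(roles_present)
    if not UNIVERSAL_ROLES <= roles:
        return -1  # universal subset incomplete: treated as not functional
    for code, marker in VARIANT_MARKERS.items():
        if marker in roles:
            return code
    return -1  # no alternative path annotated


print(variant_code(["HutH", "HutU", "HutI", "GluF"]))  # 1
print(variant_code(["HutH", "HutU", "HutI", "NfoD"]))  # 3
print(variant_code(["HutH"]))                          # -1
```

Under this rule a genome gets variant 3 even when no ForI gene is annotated, which is the situation that exposes ForI as a missing gene.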

Missing Genes Noticed by Subsystems Annotation
- No genes were annotated “ForI (EC ) Formiminoglutamic iminohydrolase” when the Histidine Degradation subsystem was populated
- Organisms missing ForI nevertheless convert His to Glu
- Candidate genes that could perform the role “ForI” must be identified
- Strategy for finding genes is based on chromosomal clustering and occurrence profiling

Finding Genes that Cluster with NfoD
- Green gene is NfoD of Xanthomonas
- Blue genes are within 10 kb of NfoD in at least four other species
  - finds biggest clusters in other species
  - fc-sc shows table of homologous pairs in other genomes
  - displays homologous regions in other genomes
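
The criterion on this slide (a neighbor within 10 kb of NfoD in at least four other species) can be sketched in Python as below. Representing each genome as a single contig of (family, start, end) tuples, and using family names such as famX, are assumptions for illustration. The actual NMPDR cluster computation is not described in the slides.

```python
from collections import Counter


def families_near(genes, focus_family, max_distance):
    """Families with a gene within max_distance bp of a focus-family gene.

    genes: list of (family, start, end) tuples on one contig.
    """
    focus = [(s, e) for fam, s, e in genes if fam == focus_family]
    near = set()
    for fam, start, end in genes:
        if fam == focus_family:
            continue
        for f_start, f_end in focus:
            gap = max(start - f_end, f_start - end, 0)  # 0 when genes overlap
            if gap <= max_distance:
                near.add(fam)
                break
    return near


def conserved_neighbors(genomes, focus_genome, focus_family,
                        max_distance=10_000, min_other_genomes=4):
    """Neighbors of the focus gene that stay nearby in enough other genomes."""
    candidates = families_near(genomes[focus_genome], focus_family, max_distance)
    counts = Counter()
    for name, genes in genomes.items():
        if name != focus_genome:
            counts.update(families_near(genes, focus_family, max_distance) & candidates)
    return {fam for fam in candidates if counts[fam] >= min_other_genomes}


# Made-up coordinates: famX stays close to NfoD everywhere, famZ only twice.
genomes = {
    "focus": [("NfoD", 1000, 2000), ("famX", 3000, 4000),
              ("famZ", 5000, 6000), ("famY", 50000, 51000)],
    "sp1": [("NfoD", 100, 900), ("famX", 1500, 2500), ("famZ", 3000, 3900)],
    "sp2": [("NfoD", 100, 900), ("famX", 1500, 2500), ("famZ", 3000, 3900)],
    "sp3": [("NfoD", 100, 900), ("famX", 1500, 2500)],
    "sp4": [("NfoD", 100, 900), ("famX", 1500, 2500), ("famY", 40000, 41000)],
}
result = conserved_neighbors(genomes, "focus", "NfoD")
print(result)  # {'famX'}
```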

What are Pinned Regions?
- Focus gene is number 1, colored red
- Most frequently co-localized homolog is numbered 2, colored green
- Homologous genes are presented in the same color with the same numerical label
- Numerical labels correspond to rank-ordered frequency of co-localization with the focus gene
  - Focus gene labeled 1
  - Gene 17 is the homolog 16th most frequently co-localized with the focus gene
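
The labeling rule above amounts to a ranking: the focus gene gets label 1, and every other gene family gets 1 plus its rank by co-localization frequency. A small Python sketch of the rule follows. The counts are made up, and breaking ties alphabetically is an assumption.

```python
def pinned_labels(focus_family, colocalization_counts):
    """Assign numerical labels: focus gene is 1, then by descending frequency.

    colocalization_counts: dict of family -> number of genomes in which that
    family is found near the focus gene (hypothetical input).
    """
    ranked = sorted(colocalization_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = {focus_family: 1}
    for rank, (family, _count) in enumerate(ranked, start=1):
        labels[family] = rank + 1  # the k-th most frequent neighbor gets label k + 1
    return labels


labels = pinned_labels("NfoD", {"HutH": 9, "candidate ForI": 5, "HutU": 7})
print(labels)  # {'NfoD': 1, 'HutH': 2, 'HutU': 3, 'candidate ForI': 4}
```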

Candidate ForI in Context with NfoD
- Homologous regions around NfoD (red, center)
- Same color indicates homology; BLAST cutoff 1e-20
- HutH, the first functional role in the subsystem, is green, 2
- Candidate ForI is pink, 4, “conserved hypothetical”

Annotation of ForI EC
- Metabolic context proves need for role
  - Organisms missing annotated ForI degrade His to Glu
- Chromosomal context points to candidate
  - Clusters with NfoD and other genes in subsystem
- Occurrence context supports candidate
  - Organisms containing NfoD lack GluF and HutG, required for functional variants 1 and 2, respectively
  - Organisms containing candidate ForI also contain NfoD, indicating functional variant 3
- Phylogenetic trees of candidate ForI genes are coherent
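
The occurrence argument above makes two checks over presence/absence profiles: genomes with the candidate also have NfoD, and genomes with NfoD lack GluF and HutG. A minimal Python sketch of those checks follows. The genome names and profiles are invented for illustration.

```python
def occurrence_supports(profiles, candidate, partner, excluded_roles):
    """Check whether occurrence profiles are consistent with the candidate.

    profiles: dict of genome -> set of roles or gene families present.
    True when (a) the candidate occurs somewhere, (b) every genome with the
    candidate also has the partner, and (c) no genome with the partner has
    any of the excluded roles.
    """
    with_candidate = [roles for roles in profiles.values() if candidate in roles]
    with_partner = [roles for roles in profiles.values() if partner in roles]
    candidate_implies_partner = all(partner in roles for roles in with_candidate)
    partner_excludes_others = all(not (roles & excluded_roles) for roles in with_partner)
    return bool(with_candidate) and candidate_implies_partner and partner_excludes_others


# Invented profiles covering the three functional variants.
profiles = {
    "genome_v1": {"HutH", "HutU", "HutI", "GluF"},
    "genome_v2": {"HutH", "HutU", "HutI", "HutG"},
    "genome_v3a": {"HutH", "HutU", "HutI", "NfoD", "candidate"},
    "genome_v3b": {"HutH", "HutU", "HutI", "NfoD", "candidate"},
}
supported = occurrence_supports(profiles, "candidate", "NfoD", {"GluF", "HutG"})
print(supported)  # True
```

A consistent profile supports the candidate but does not prove it, which is why the slides pair it with chromosomal context and phylogenetic trees.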

Conjectures archived in HOPS
- Hypotheses and Open Problems identified by Subsystems
  - HOPS linked from NMPDR’s FAQ
- Subsystems point to missing or alternative genes
- Bioinformatic predictions need to be tested at the bench
- ForI candidate now verified experimentally
- Connections forged between bench and bioinformatics

Bioinformatics to Bench
- Essential genes page at NMPDR
  - Click bar to search for essential genes
  - Follow NMPDR link to compare with other genomes

Candidate Drug Targets
- First-draft table (manually derived) links to biochemical data in BRENDA or TCDB
- Candidate proteins
  - essential in at least one of the NMPDR pathogens
  - included in subsystems by our curators
  - orthologs in the Protein Data Bank
  - orthologs in a substantial number of bacterial priority pathogens curated in the BRC system
- Second-draft table to be automatically generated
  - annotations include essential for growth or virulence
  - PDB and pathogen orthologs
  - No good hit in host
  - targets without crystallized orthologs suggested to HTS project at Argonne National Laboratory
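
The selection criteria listed above read naturally as a filter over protein records. The Python sketch below combines the first-draft criteria with the "no good hit in host" rule from the second draft. The record fields, the threshold of five pathogens standing in for "a substantial number", and the example proteins are all assumptions. It is not the procedure NMPDR used.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    essential_in_pathogens: int       # NMPDR pathogens where the gene is essential
    in_subsystem: bool                # included in a curated subsystem
    has_pdb_ortholog: bool            # ortholog with a structure in PDB
    priority_pathogen_orthologs: int  # priority pathogens carrying an ortholog
    good_host_hit: bool               # close homolog found in the host


MIN_PATHOGEN_ORTHOLOGS = 5  # assumption: the slide only says "substantial number"


def screen_targets(proteins):
    """Split passing proteins into candidates and structure requests."""
    candidates, structure_requests = [], []
    for p in proteins:
        passes = (p.essential_in_pathogens >= 1
                  and p.in_subsystem
                  and p.priority_pathogen_orthologs >= MIN_PATHOGEN_ORTHOLOGS
                  and not p.good_host_hit)
        if not passes:
            continue
        # Targets lacking a crystallized ortholog go to the structure project.
        (candidates if p.has_pdb_ortholog else structure_requests).append(p.name)
    return candidates, structure_requests


# Invented example records.
proteins = [
    Protein("target_A", 2, True, True, 8, False),
    Protein("target_B", 1, True, False, 6, False),
    Protein("host_like", 3, True, True, 9, True),
    Protein("rare", 1, True, True, 2, False),
]
candidates, structure_requests = screen_targets(proteins)
print(candidates, structure_requests)  # ['target_A'] ['target_B']
```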

NMPDR efforts feed into high-throughput structure project at Argonne

In Silico Screening
- Targets docked with 10K random compounds as training set
- Neural network program tracks 9 properties of compounds to learn characteristics of those that bind and those that do not
- ZINC compound db screened to find 10K likely binders predicted to be ligands
- Targets docked against 10K predicted ligands on BlueGene with Dock5
- Top 1000 docked compounds soon to be linked to NMPDR

IBM BlueGene Supercomputer
- World’s fastest supercomputer
- 280 TeraFLOPS

Live Demo of NMPDR
- From essential genes, click H. pylori, then click NMPDR for first protein
- Show compare regions
  - Possible to increase/decrease size of region
  - Possible to “walk” chromosome
  - Possible to include more genomes: type in 10 and click resubmit
- Click on the homologous gene 1 in the second genome, Campylobacter
- Ask: is this function also essential in Campy? Is this a good drug target?
- Investigate the Campy homolog by using Pins, Compare Regions, find best clusters (CL)
- What is the pathway or biological system that this protein is essential for?
  - If not included in a subsystem by NMPDR curators, follow alias link to KEGG
- Pathway is lysine biosynthesis. Ask:
  - Does this protein catalyze the rate-limiting step?
  - Is this the best function in this pathway to target for inhibition by a drug?
  - Does this protein have a close structural/functional homolog in human or PDB? Use BLAST to find homologs.
  - Is this a broad or narrow spectrum target? Show all homologs using Bidirectional Best Hits button.