An Automated System for Deep Proteome Annotation Gary Van Domselaar September 27, 2003.



The Problem
Most existing biological databases cover a narrow biological aspect:
– PDB: biomolecular coordinate data
– Ensembl: human gene predictions
– GO: Gene Ontology (process, function, location)
Each has a custom interface. Each can answer questions in its own domain but cannot answer questions that span multiple domain boundaries, for example: ‘Which human gene products located in the endoplasmic reticulum have experimental coordinate data?’

The Solution: Integrated Biological Databases
Three main approaches:
1. Link Integration. Researchers begin their query with one data source, then follow hypertext links to related information in other data sources. Examples: DAS, NCBI LinkOut.
2. View Integration. A ‘super interface’ is created that makes the source databases appear as one. Example: Kleisli.
3. Data Warehousing. All the data is brought under one roof. Examples: GeneCards, GeneMine, CyberCell database.
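To make the data warehousing idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The schema, table names, and records are hypothetical placeholders invented for illustration; they do not come from PDB, Ensembl, GO, or any system named in the slides. The point is only that once the sources sit under one roof, a cross-domain question of the kind quoted in The Problem becomes a single join.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory warehouse with three hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene_product (id TEXT PRIMARY KEY, organism TEXT);
        CREATE TABLE location (gene_product_id TEXT, compartment TEXT);
        CREATE TABLE structure (gene_product_id TEXT, method TEXT);
        """
    )
    # Placeholder records (not real identifiers), one per case of interest.
    conn.executemany(
        "INSERT INTO gene_product VALUES (?, ?)",
        [
            ("P_A", "Homo sapiens"),
            ("P_B", "Homo sapiens"),
            ("P_C", "Escherichia coli"),
            ("P_D", "Homo sapiens"),
            ("P_E", "Homo sapiens"),
        ],
    )
    conn.executemany(
        "INSERT INTO location VALUES (?, ?)",
        [
            ("P_A", "endoplasmic reticulum"),
            ("P_B", "endoplasmic reticulum"),
            ("P_C", "cytoplasm"),
            ("P_D", "nucleus"),
            ("P_E", "endoplasmic reticulum"),
        ],
    )
    conn.executemany(
        "INSERT INTO structure VALUES (?, ?)",
        [
            ("P_A", "experimental"),
            ("P_C", "experimental"),
            ("P_D", "experimental"),
            ("P_E", "model"),
        ],
    )
    return conn


def er_products_with_structures(conn):
    """Human gene products in the ER that have experimental coordinate data."""
    rows = conn.execute(
        """
        SELECT DISTINCT g.id
        FROM gene_product AS g
        JOIN location AS l ON l.gene_product_id = g.id
        JOIN structure AS s ON s.gene_product_id = g.id
        WHERE g.organism = 'Homo sapiens'
          AND l.compartment = 'endoplasmic reticulum'
          AND s.method = 'experimental'
        ORDER BY g.id
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(er_products_with_structures(build_toy_warehouse()))
```

With link integration the same question would require following links record by record across three sites; the warehouse trades that effort for the cost of keeping local copies current.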

An Automated Proteome Annotation System for Proteome Analyst
Proteome Analyst provides annotations in the form of a ‘PA Card’. This system will provide a much fuller set of annotations.

Annotations
2D_Gel_Image Accession_No. Alternate_Names Availability Centisome Position Cofactors Copy Number Cys/Met_Content
EC_Number Entry_ID Following_Gene
Gene_Name Gene_Ontology Gene_Position General_Function General_Reaction Gene_Sequence
Homologues Important_Sites Inhibitor Interacting_Partners
Kcat_Value_[1/min] Km_Value_[mM] Location
Metabolic_Importance Metals_Ions Molecular_Weight No._of_Amino_Acids Other_Databases
Paralogues Pfam_Domain/Function Preceding_Gene Products PROSITE_Motif
Quaternary_Structure Resolution Riley_Cell_Function Riley_Gene_Function RNA_Copy_No.
Secondary_Structure Sequence Similarity Specific_Activity Specific_Function Specific_Reaction Structure_CLASS Substrates SWISS_PROT_(AC_&_ID)
Theoretical_pI Transmembrane Upstream_100_bases

Concept (diagram): Genomic Sequence Data
Genomic data analysis must be tailored to the major kingdoms:
– viruses and prokaryotes: Glimmer
– eukaryotes: Genscan

Concept (diagram): Genomic Sequence Data; Gene Identification and Translation; Proteomic Sequence Data; Processing; Internal Processing (Secondary Structure, Homology Modeling, Mol. Wt, pI, etc.)
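Of the internal processing steps named on this slide, molecular weight is the simplest to illustrate. The sketch below sums standard average residue masses and adds one water for the free termini. The function name and table are illustrative assumptions, not the system's actual module, and the calculation ignores post-translational modifications and non-standard residues.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # added once for the N- and C-terminal H and OH


def molecular_weight(sequence):
    """Return the average molecular weight (Da) of an unmodified protein.

    Hypothetical helper for illustration; raises ValueError on an empty
    sequence or on any letter outside the 20 standard amino acids.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        residue_total = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq)
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return residue_total + WATER_MASS


if __name__ == "__main__":
    print(round(molecular_weight("GA"), 3))
```

A property like this needs nothing but the sequence itself, which is why it belongs to internal processing rather than to a lookup in another database.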

Concept (diagram): Genomic Sequence Data; Gene Identification and Translation; Proteomic Sequence Data; Processing; Internal Processing; Internal DBs
Internal DBs:
– CCDB: a deeply annotated database for E. coli
– CCDB++: other deeply annotated model organisms from each kingdom
– SWISS-PROT
– PDB

CyberCell (CCDB)
A comprehensive collection of detailed enzymatic, biological, chemical, genetic, and molecular biological data about E. coli (strain K12, MG1655).

Concept (diagram): Genomic Sequence Data; Gene Identification and Translation; Proteomic Sequence Data; Processing; Internal Processing; Internal DBs; External Processing; External DBs

Data Sources GenBank SwissProt Prosite pI/MW Tool Geneiz PIR PEC/Shigen Echobase Wisconsin ExpressDB GeneOntology GenProtEC EcoGene PsiPred EcoCyc PDB CATH Swiss2D PAGE SwissModel BRENDA TargetDB Rosetta PsortB KEGG Chemfinder Babel

Concept (diagram): Genomic Sequence Data; Gene Identification and Translation; Proteomic Sequence Data; Processing; Internal Processing; Internal DBs; External Processing; External DBs; Annotated Proteomic Sequence Data; Viewing and Mining Software
Viewing and Mining Software: Proteome Analyst Multiple Protein Extraction and Report System

Data Mining and Visualization

Concept (diagram): Genomic Sequence Data; Gene Identification and Translation; Proteomic Sequence Data; Processing; Internal Processing; Internal DBs; External Processing; External DBs; Annotated Proteomic Sequence Data; Viewing and Mining Software; Discoveries

Progress
Currently working on the H. influenzae reference genome.
Written modules for generating protein sequence data from gene predictions (using Glimmer).
Currently writing the analysis modules and automation scripts.
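The step of generating protein sequence data from gene predictions can be sketched roughly as follows. This is an illustrative assumption, not the author's module: it takes a hypothetical prediction as 1-based inclusive coordinates plus a strand symbol (it does not parse real Glimmer output), translates with the standard codon assignments, and treats the alternative bacterial start codons GTG and TTG as methionine.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALTERNATIVE_STARTS = ("GTG", "TTG")  # read as Met when used as a start codon


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_gene(genome, start, end, strand):
    """Translate one predicted gene into a protein sequence.

    start and end are 1-based inclusive coordinates with start <= end;
    strand is "+" or "-". Translation stops at the first stop codon.
    Codons containing ambiguous bases are written as "X".
    """
    dna = genome[start - 1:end].upper()
    if strand == "-":
        dna = reverse_complement(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    if protein and dna[:3] in ALTERNATIVE_STARTS:
        protein[0] = "M"
    return "".join(protein)


if __name__ == "__main__":
    print(translate_gene("CCATGAAATTTTAAGG", 3, 14, "+"))
```

Running each prediction through a function like this yields the proteomic sequence data that the later processing stages consume.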


Acknowledgments
P.I.s: David Wishart, Duane Szafron, Paul Lu, Russell Greiner
CyberCell Database: Shan Sundararaj, An Chi Guo, Bahram Habibi Nazhad
Proteome Analyst: Alona Fyshe, David Meeuwig, Roman Eisner, Brett Poulin, Zhiyong Lu, John Anvik, Cam Macdonell
