An Automated System for Deep Proteome Annotation Gary Van Domselaar September 27, 2003.

1 An Automated System for Deep Proteome Annotation Gary Van Domselaar September 27, 2003

2 The Problem
Most existing biological databases have a narrow biological focus:
– PDB: biomolecular coordinate data
– Ensembl: human gene predictions
– GO: Gene Ontology (process, function, location)
Each has a custom interface, and each can answer questions in its own domain but cannot answer questions that span multiple domains, e.g. ‘Which human gene products located in the endoplasmic reticulum have experimental coordinate data?’
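
To make the problem concrete, the sketch below models three single-domain sources as in-memory tables keyed by a shared protein identifier. Each table can answer only its own part of the example question, and the cross-domain answer comes from intersecting their results. All identifiers and records are hypothetical toy data, not real Ensembl, GO or PDB entries. The shared key is also an assumption: in practice each source uses its own identifier scheme, and those schemes must be mapped onto each other first.

```python
# Hypothetical single-domain "databases", all keyed by an assumed shared protein ID.
ENSEMBL_HUMAN = {"P1", "P2", "P3"}            # human gene products
GO_LOCATION = {                                # cellular-location annotations
    "P1": {"endoplasmic reticulum"},
    "P2": {"nucleus"},
    "P3": {"endoplasmic reticulum", "membrane"},
    "P9": {"endoplasmic reticulum"},           # not a human product in this toy data
}
PDB_HAS_COORDS = {"P3", "P9"}                 # entries with experimental coordinates


def human_er_with_structure() -> set[str]:
    """Answer the cross-domain question by intersecting per-source results."""
    # Each source answers only its own sub-question...
    in_er = {pid for pid, locs in GO_LOCATION.items()
             if "endoplasmic reticulum" in locs}
    # ...and the cross-domain answer is the intersection of all three sets.
    return ENSEMBL_HUMAN & in_er & PDB_HAS_COORDS


print(sorted(human_er_with_structure()))  # ['P3']
```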

3 The Solution: Integrated Biological Databases
There are three main approaches:
1. Link integration. Researchers begin their query with one data source, then follow hypertext links to related information in other data sources. Examples: DAS, NCBI LinkOut.
2. View integration. A ‘super interface’ is created that makes the source databases appear as one. Example: Kleisli.
3. Data warehousing. All the data is brought under one roof. Examples: GeneCards, GeneMine, the CyberCell database.
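
Link integration can be pictured as a graph walk. Each record carries cross-references to records in other sources, and the researcher follows those links outward from a starting entry. The following is a minimal sketch that assumes a hypothetical cross-reference map. Every `(source, id)` pair is invented for illustration and is not real DAS or LinkOut data.

```python
from collections import deque

# Hypothetical cross-reference graph: (source, record_id) -> linked records.
XREFS = {
    ("GenBank", "G1"): [("SwissProt", "S1")],
    ("SwissProt", "S1"): [("PDB", "pdb-entry-1"), ("GO", "go-term-ER")],
    ("PDB", "pdb-entry-1"): [],
    ("GO", "go-term-ER"): [],
}


def follow_links(start, max_hops=2):
    """Breadth-first walk of cross-references; returns {record: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand records already at the hop limit
        for neighbour in XREFS.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen
```

Nothing is copied here. In a real link-integrated system each hop would typically be a separate request to another site. That makes set-based questions like the one in slide 2 awkward to answer this way.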

4 An Automated Proteome Annotation System for Proteome Analyst. Proteome Analyst provides annotations in the form of a ‘PA Card’.

5 An Automated Proteome Annotation System for Proteome Analyst

6 Proteome Analyst provides annotations in the form of a ‘PA Card’. This system will provide a much fuller set of annotations.

7 Annotations: 2D_Gel_Image, Accession_No., Alternate_Names, Availability, Centisome Position, Cofactors, Copy Number, Cys/Met_Content, EC_Number, Entry_ID, Following_Gene, Gene_Name, Gene_Ontology, Gene_Position, General_Function, General_Reaction, Gene_Sequence, Homologues, Important_Sites, Inhibitor, Interacting_Partners, Kcat_Value_[1/min], Km_Value_[mM], Location, Metabolic_Importance, Metals_Ions, Molecular_Weight, No._of_Amino_Acids, Other_Databases, Paralogues, Pfam_Domain/Function, Preceding_Gene, Products, PROSITE_Motif, Quaternary_Structure, Resolution, Riley_Cell_Function, Riley_Gene_Function, RNA_Copy_No., Secondary_Structure, Sequence Similarity, Specific_Activity, Specific_Function, Specific_Reaction, Structure_CLASS, Substrates, SWISS_PROT_(AC_&_ID), Theoretical_pI, Transmembrane, Upstream_100_bases
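
A card like this could be held in code as a record with one optional field per annotation, which makes missing values explicit. The sketch below covers only a handful of the fields listed above. The Python field names are an assumed mapping of the slide's labels. The completeness score is an illustrative metric and is not defined by the system.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProteinCard:
    """Subset of the slide's annotation fields; None means 'not yet annotated'."""
    accession_no: str
    gene_name: Optional[str] = None
    ec_number: Optional[str] = None
    location: Optional[str] = None
    molecular_weight: Optional[float] = None    # Da
    theoretical_pi: Optional[float] = None
    km_value_mm: Optional[float] = None         # Km_Value_[mM]
    kcat_value_per_min: Optional[float] = None  # Kcat_Value_[1/min]

    def completeness(self) -> float:
        """Fraction of the optional annotation fields that have a value."""
        optional = [f.name for f in fields(self) if f.name != "accession_no"]
        filled = sum(getattr(self, name) is not None for name in optional)
        return filled / len(optional)


card = ProteinCard("X0001", gene_name="abcD", location="cytoplasm")
print(round(card.completeness(), 3))  # 0.286 (2 of 7 fields filled)
```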

8 Concept: Genomic Sequence Data

9 Concept: Genomic Sequence Data. Genomic data analysis must be tailored to the major groups of organisms: viruses and prokaryotes (Glimmer); eukaryotes (Genscan).
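
The grouping on this slide amounts to a dispatch table: Glimmer for viral and prokaryotic genomes, Genscan for eukaryotic ones. The following is a minimal sketch of that table. The group labels and the function name are hypothetical, and the tools are only named here, not invoked.

```python
# Gene predictor per organism group, following the grouping on the slide.
GENE_FINDER = {
    "virus": "Glimmer",
    "prokaryote": "Glimmer",
    "eukaryote": "Genscan",
}


def choose_gene_finder(group: str) -> str:
    """Return the gene-prediction tool for an organism group (case-insensitive)."""
    try:
        return GENE_FINDER[group.strip().lower()]
    except KeyError:
        raise ValueError(f"no gene finder configured for group {group!r}") from None
```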

10 Concept: Genomic Sequence Data, Proteomic Sequence Data, Gene Identification and Translation
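
The gene identification and translation step turns predicted gene coordinates into protein sequences. The sketch below does this with the standard genetic code. It assumes the predictions have already been normalised to 1-based, inclusive `(start, end, strand)` tuples with `start <= end`. That input format is an assumption and is not Glimmer's exact output.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    cds = cds.upper()
    protein = [CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)]
    return "".join(protein).rstrip("*")  # drop the terminal stop codon


def translate_prediction(genome: str, start: int, end: int, strand: str) -> str:
    """Extract a predicted gene (1-based, inclusive, start <= end) and translate it."""
    region = genome[start - 1:end].upper()
    if strand == "-":
        region = region.translate(COMPLEMENT)[::-1]  # reverse complement
    return translate(region)


print(translate_prediction("ccTTAGGCCATgg", 3, 11, "-"))  # MA
```

This sketch translates every codon with table 1. It therefore ignores the fact that bacterial genes starting with GTG or TTG are initiated with methionine, which a production module would need to handle.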

11 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Gene Identification and Translation

12 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Gene Identification and Translation, Internal Processing

13 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Gene Identification and Translation, Internal Processing

14 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Gene Identification and Translation, Internal Processing (secondary structure, homology modeling, Mol. Wt, pI, etc.)
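
Two of the internal processing steps, molecular weight and pI, can be computed directly from the sequence. The sketch below uses standard average residue masses for the weight. For the pI it bisects on the Henderson-Hasselbalch net charge, using an illustrative textbook pKa set. The pKa values are an assumption: calculators such as the pI/MW tool listed among the data sources may use different values, so results will not match exactly.

```python
# Average residue masses in Da (amino acid minus water); one water is added
# back for the free termini.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative textbook pKa values (an assumption; real tools differ).
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified peptide, in Da."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the peptide at the given pH."""
    seq = seq.upper()
    counts = {k: seq.count(k) for k in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies above mid
        else:
            high = mid
    return (low + high) / 2
```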

15 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Internal Processing, Gene Identification and Translation, Internal DBs

16 Concept: Genomic Sequence Data, Proteomic Sequence Data, Processing, Internal Processing, Gene Identification and Translation, Internal DBs: CCDB (a deeply annotated database for E. coli); CCDB++ (deeply annotated databases for other model organisms from each kingdom); SWISS-PROT; PDB
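
One simple way to combine several internal databases is to consult them in a fixed order of trust and keep the first value found for each field. The sketch below assumes the order CCDB, then SWISS-PROT, then PDB, and uses hypothetical in-memory stand-ins for the databases. It is not the system's actual merge rule.

```python
from collections.abc import Mapping

# Hypothetical stand-ins for internal databases: accession -> {field: value}.
CCDB = {"X1": {"location": "cytoplasm", "kcat_per_min": 1200.0}}
SWISS_PROT = {"X1": {"location": "periplasm", "gene_name": "abcD"}}
PDB = {"X1": {"structure_id": "pdb-entry-1"}}

# Assumed precedence: earlier databases win over later ones.
INTERNAL_DBS: list[tuple[str, Mapping[str, Mapping[str, object]]]] = [
    ("CCDB", CCDB), ("SWISS-PROT", SWISS_PROT), ("PDB", PDB),
]


def merge_annotations(accession: str) -> dict[str, tuple[object, str]]:
    """Return {field: (value, source)}, keeping the highest-precedence value."""
    merged: dict[str, tuple[object, str]] = {}
    for source, db in INTERNAL_DBS:
        for field, value in db.get(accession, {}).items():
            merged.setdefault(field, (value, source))  # never overwrite earlier sources
    return merged
```

Each merged value keeps the name of the database it came from. This lets a conflicting value, such as the two locations above, be traced back to its source.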

17 CyberCell Database (CCDB): a comprehensive collection of detailed enzymatic, biological, chemical, genetic, and molecular biological data about E. coli (strain K-12, MG1655).

18 Concept: External DBs, Genomic Sequence Data, Proteomic Sequence Data, Processing, External Processing, Internal Processing, Internal DBs, Gene Identification and Translation

19 Data Sources: GenBank, SwissProt, Prosite, pI/MW Tool, Geneiz, PIR, PEC/Shigen, Echobase, Wisconsin, ExpressDB, GeneOntology, GenProtEC, EcoGene, PsiPred, EcoCyc, PDB, CATH, SWISS-2DPAGE, SwissModel, BRENDA, TargetDB, Rosetta, PsortB, KEGG, Chemfinder, Babel
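
With this many external sources, failures are routine: a site is down, or an output format changes. The sketch below treats external processing as a set of independent annotator functions. A failing source is recorded instead of aborting the whole run. The annotator functions are hypothetical placeholders and do not wrap any of the real services listed above.

```python
from typing import Callable

Annotator = Callable[[str], dict]


def annotate_length(seq: str) -> dict:
    # Placeholder for a real external tool; here just a local computation.
    return {"no_of_amino_acids": len(seq)}


def annotate_unreachable(seq: str) -> dict:
    # Simulates an external service that is unavailable.
    raise ConnectionError("service unavailable")


def run_annotators(seq: str, annotators: dict[str, Annotator]) -> tuple[dict, dict]:
    """Run every annotator; collect results and per-source errors separately."""
    results, errors = {}, {}
    for name, annotate in annotators.items():
        try:
            results.update(annotate(seq))
        except Exception as exc:  # keep going if one source fails
            errors[name] = repr(exc)
    return results, errors
```

Recording errors per source means a later run could retry only the sources that failed.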

20 Concept: External DBs, Genomic Sequence Data, Proteomic Sequence Data, Processing, External Processing, Internal Processing, Internal DBs, Annotated Proteomic Sequence Data, Gene Identification and Translation

21 Concept: External DBs, Genomic Sequence Data, Proteomic Sequence Data, Processing, External Processing, Internal Processing, Internal DBs, Annotated Proteomic Sequence Data, Viewing and Mining Software, Gene Identification and Translation

22 Concept: External DBs, Genomic Sequence Data, Proteomic Sequence Data, Processing, External Processing, Internal Processing, Internal DBs, Annotated Proteomic Sequence Data, Viewing and Mining Software, Gene Identification and Translation, Proteome Analyst, Multiple Protein Extraction and Report System
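
The report side can be pictured as extracting a chosen set of fields for many proteins at once and laying them out as a table. The following is a minimal sketch. The record format, the field names and the tab-separated layout are all assumptions made for illustration.

```python
import csv
import io


def protein_report(records: list[dict], columns: list[str]) -> str:
    """Render selected fields of many protein records as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore",  # skip fields not requested
                            restval="-",            # mark missing annotations
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(protein_report([{"accession": "X1", "location": "cytoplasm"}, {"accession": "X2"}],
                     ["accession", "location"]))
```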

23 Data Mining and Visualization


25 Concept: External DBs, Genomic Sequence Data, Proteomic Sequence Data, Processing, External Processing, Internal Processing, Internal DBs, Annotated Proteomic Sequence Data, Viewing and Mining Software, Gene Identification and Translation, Discoveries

26 Progress: Currently working on the H. influenzae reference genome. Have written modules for generating protein sequence data from gene predictions (using Glimmer). Currently writing the analysis modules and automation scripts.

27 Progress

28 Acknowledgments. P.I.s: David Wishart, Duane Szafron, Paul Lu, Russell Greiner. CyberCell Database: Shan Sundararaj, An Chi Guo, Bahram Habibi Nazhad. Proteome Analyst: Alona Fyshe, David Meeuwis, Roman Eisner, Brett Poulin, Zhiyong Lu, John Anvik, Cam Macdonell.

29 An Automated System for Deep Proteome Annotation Gary Van Domselaar September 27, 2003

