Eukaryotic Secretome Prediction and Knowledge-Base Development. Xiang-Jia “Jack” Min, Ph.D., Assistant Professor. 2nd International Conference on Proteomics & Bioinformatics.

Presentation transcript:

1 Eukaryotic Secretome Prediction and Knowledge-Base Development. Xiang-Jia “Jack” Min, Ph.D., Assistant Professor. 2nd International Conference on Proteomics & Bioinformatics, Las Vegas, July 2-4, 2012

2 DNA → RNA → protein → phenotype

3 Genome
  ↓ Transcription
Transcriptome: mRNA (protein-coding DNA sequences)
  ↓ Translation
Proteome: protein sequences
  ↓ Secretion
Secretome: proteins with secretory signal peptide

4 Günter Blobel

5

6

7

8 Fungi (yeasts, moulds, mushrooms) and their secreted enzymes. Diagram labels: biomaterials, small molecules, bio-fuels, enzymes.

9 How to identify secreted proteins?
Genome → (transcription) → Transcriptome → (translation) → Proteome → (secretion) → Secretome
(1) Direct identification using proteomics methods (Tsang et al. 2009)
(2) Computational prediction from the predicted proteome
(3) EST data mining

10 Secreted Proteins
Classical secreted proteins have a signal peptide at the N-terminus.
Not all proteins that have a signal peptide are secreted: signal peptide ≠ secreted protein.

11 Prediction tools
SignalP: predicts whether a protein contains a signal peptide.
Phobius: signal peptide and transmembrane domain prediction.
WolfPsort: a multiple subcellular location predictor.
TargetP: detects proteins targeted to mitochondria.
TMHMM: transmembrane domain prediction.
PS-Scan: detection of ER-retention signals.
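One way the tools listed above could be combined is as a simple rule-based filter. The sketch below is illustrative only: the `ToolCalls` record, its field names, and the rule itself are assumptions made for this example, not the rule set used in the talk, and it does not parse real tool output.

```python
from dataclasses import dataclass


@dataclass
class ToolCalls:
    """Hypothetical per-protein summary of tool results (field names are illustrative)."""
    has_signal_peptide: bool   # e.g. a positive SignalP or Phobius call
    tm_domains: int            # e.g. TMHMM helices, assumed to exclude the signal peptide
    mitochondrial: bool        # e.g. a mitochondrial call from TargetP
    er_retention: bool         # e.g. an ER-retention motif found by PS-Scan


def is_candidate_secreted(calls: ToolCalls) -> bool:
    """Assumed rule: keep signal-peptide proteins with no competing evidence."""
    if not calls.has_signal_peptide:
        return False           # classical secretion requires a signal peptide
    if calls.tm_domains > 0:
        return False           # likely membrane-anchored rather than secreted
    if calls.mitochondrial or calls.er_retention:
        return False           # targeted or retained elsewhere in the cell
    return True


if __name__ == "__main__":
    print(is_candidate_secreted(ToolCalls(True, 0, False, False)))
    print(is_candidate_secreted(ToolCalls(True, 2, False, False)))
```

The filter reflects the point that a signal peptide alone does not imply secretion: each additional tool removes one class of false positives.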

12

13

14 Human cytochrome c oxidase subunit 1 (COX1)

15

16 Data
Kingdom     Secreted   Non-secreted
Fungi       241        5,992
Animals     5,568      19,048
Plants      216        7,528
Protists    32         1,979
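As a quick check on class balance, the share of secreted proteins in each kingdom can be computed from the counts on this slide. This is a minimal sketch; the `DATASET` dictionary and the `secreted_fraction` helper are names introduced here for illustration.

```python
# Counts from the slide: (secreted, non-secreted) per kingdom.
DATASET = {
    "Fungi": (241, 5992),
    "Animals": (5568, 19048),
    "Plants": (216, 7528),
    "Protists": (32, 1979),
}


def secreted_fraction(secreted: int, non_secreted: int) -> float:
    """Percentage of proteins in the set that are labelled secreted."""
    return 100.0 * secreted / (secreted + non_secreted)


if __name__ == "__main__":
    for kingdom, (sec, non_sec) in DATASET.items():
        print(f"{kingdom}: {secreted_fraction(sec, non_sec):.2f}% secreted")
```

Secreted proteins are a small minority in every set, which is one reason a balanced measure such as MCC is informative alongside sensitivity and specificity.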

17 Method
\[ \text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100 \]
\[ \text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100 \]
Matthews' Correlation Coefficient (MCC):
\[ \text{MCC (\%)} = \frac{(TP \times TN - FP \times FN) \times 100}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
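The three measures on this slide can be computed directly from the confusion-matrix counts. The following is a minimal sketch; the function names are chosen here, and returning 0 when the MCC denominator is zero is a convention assumed for this example, not something stated in the talk.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Sn (%) = TP / (TP + FN) x 100."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Sp (%) = TN / (TN + FP) x 100."""
    return 100.0 * tn / (tn + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC (%) = (TP*TN - FP*FN) x 100 / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return 100.0 * (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy counts for illustration only; not taken from the talk's tables.
    tp, fp, tn, fn = 90, 10, 880, 20
    print(f"Sn={sensitivity(tp, fn):.1f}  Sp={specificity(tn, fp):.1f}  "
          f"MCC={mcc(tp, fp, tn, fn):.1f}")
```

MCC uses all four counts, so it stays informative when non-secreted proteins greatly outnumber secreted ones.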

18 Table 1. Prediction accuracies of secreted proteins in fungi. Min XJ (2010) JPB 3:
Columns: TP, FP, TN, FN, Sn (%), Sp (%), MCC (%)
Methods compared:
SignalP
Phobius
TargetP
WolfPsort
SignalP/TMHMM
Phobius/TMHMM
TargetP/TMHMM
WolfPsort/TMHMM
SignalP/TMHMM/WolfPsort
SignalP/TMHMM/WolfPsort/Phobius
SignalP/TMHMM/WolfPsort/Phobius/PS-Scan
SignalP/TMHMM/WolfPsort/Phobius/TargetP/PS-Scan
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

19 Table 2. Prediction accuracies of secreted proteins in animals. Min XJ (2010) JPB 3:
Columns: TP, FP, TN, FN, Sn (%), Sp (%), MCC (%)
Methods compared:
SignalP
Phobius
TargetP
WolfPsort
SignalP/TMHMM
Phobius/TMHMM
TargetP/TMHMM
WolfPsort/TMHMM
Phobius/WolfPsort
Phobius/WolfPsort/TMHMM
Phobius/WolfPsort/TMHMM/SignalP
Phobius/WolfPsort/TMHMM/TargetP
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan
Phobius/WolfPsort/TMHMM/TargetP/PS-Scan/SignalP
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

20 Table 3. Prediction accuracies of secreted proteins in plants. Min XJ (2010) JPB 3:
Columns: TP, FP, TN, FN, Sn (%), Sp (%), MCC (%)
Methods compared:
SignalP
Phobius
TargetP
WolfPsort
SignalP/TMHMM
Phobius/TMHMM
TargetP/TMHMM
WolfPsort/TMHMM
SignalP/HMM/TargetP
Phobius/TargetP/TMHMM
SignalP/TMHMM/WolfPsort
SignalP/TMHMM/Phobius
SignalP/HMM/Phobius/TargetP
SignalP/HMM/Phobius/TargetP/PS-Scan
SignalP/HMM/Phobius/TargetP/WolfPsort/PS-Scan
TP: true positives; FP: false positives; TN: true negatives; FN: false negatives. Sn: sensitivity; Sp: specificity; MCC: Matthews' correlation coefficient.

21 Summary
Different prediction tools have different accuracies for predicting secretomes in different kingdoms.
Combining these tools often increases prediction accuracy. However, different combinations are needed for species in different kingdoms.
Optimal methods are proposed.

22

23

24

25 FunSecKB. Lum G & Min XJ (2011) Database.
Database sources: RefSeq, UniProt
Prediction tools: SignalP, Phobius, WolfPsort, TargetP, TMHMM, PS-SCAN, fragAnchor
User inputs: gi, accession, UniProt ID, keywords, species
Other components: views, manual curation, subcellular location, external links

26 Summary of FunSecKB
Currently the database contains:
A total of 478,073 fungal protein sequences
23,878 predicted and/or curated secreted proteins
A total of 118 fungal species, including 52 with a complete proteome

27 Lum G & Min XJ (2011) Database.

28 Lum G & Min XJ (2011) Database.

29 Lum G & Min XJ (2011) Database.

30

31

32

33

34

35

36 Plant secretomes and other subcellular proteins
Values are protein counts, with percentages in parentheses.

Category                    Vitis vinifera   Populus trichocarpa   Arabidopsis thaliana   Oryza sativa    Sorghum bicolor
Total proteins
Secreted proteins           1892 (6.3)       2487 (6.0)            2835 (8.8)             3085 (7.7)      2394 (7.3)
Mitochondria, membrane      490 (1.6)        566 (1.4)             415 (1.3)              832 (2.1)       666 (2.0)
Mitochondria, non-membrane  3877 (13.0)      5238 (12.5)           3729 (11.6)            7187 (18.0)     5768 (17.6)
Chloroplast, membrane       565 (1.9)        601 (1.4)             671 (2.1)              720 (1.8)       610 (1.9)
Chloroplast, non-membrane   3675 (12.3)      4850 (11.6)           4865 (15.1)            6318 (15.8)     5385 (16.4)
ER proteins                 29 (0.1)         37 (0.1)              60 (0.2)               32 (0.1)        25 (0.1)
Other membrane proteins     3251 (10.9)      4532 (10.8)           3649 (11.3)            3672 (9.2)      2900 (8.8)
Others (unknown)            (53.8)           23483 (56.2)          15990 (49.64)          18151 (45.4)    15048 (45.9)

37

38

39 Acknowledgements
Gengkon Lum (M.S. Graduate)
Jessica Orr (Undergraduate)
Docylyne Shelton (Undergraduate)
Braden Walters (Undergraduate)