Proteome Analyst Transparent High-throughput Protein Annotation: Function, Localization and Custom Predictors.


1 Proteome Analyst Transparent High-throughput Protein Annotation: Function, Localization and Custom Predictors

2 Proteome Analyst Duane Szafron, Paul Lu, Russell Greiner, David Wishart, Zhiyong Lu, Brett Poulin, Roman Eisner, John Anvik, Cam Macdonell

3 Proteome Analyst Proteome: one of many ‘-omes’; the set of all proteins in an organism. Analysis: prediction of protein function or localization from sequence data

4 Analyze a Protein We have examples of annotated proteins in various protein classes. We have more examples of unannotated proteins.

5 Analyze a Protein We have examples of annotated proteins in various protein classes. We have more examples of unannotated proteins. What do we do?

6 Analyze a Protein We have examples of annotated proteins in various protein classes. We have more examples of unannotated proteins. What do we do? Find homologues to each protein and assume similar function.

7 Analyze a Protein We have examples of annotated proteins in various protein classes. We have more examples of unannotated proteins. What do we do? Find homologues to each protein and assume similar function. Find characteristics of each protein that affect function.

8 Analyzing Proteins One Protein?

9 Analyzing Proteins One Protein? Just do it.

10 Analyzing Proteins One Protein? Just do it. 5 Proteins?

11 Analyzing Proteins One Protein? Just do it. 5 Proteins? Post-doc familiar with protein classes.

12 Analyzing Proteins One Protein? Just do it. 5 Proteins? Post-doc familiar with protein classes. 50 Proteins?

13 Analyzing Proteins One Protein? Just do it. 5 Proteins? Post-doc familiar with protein classes. 50 Proteins? grad student

14 Analyzing Proteins One Protein? Just do it. 5 Proteins? Post-doc familiar with protein classes. 50 Proteins? grad student 5000 proteins?

15 Analyzing Proteins One Protein? Just do it. 5 Proteins? Post-doc familiar with protein classes. 50 Proteins? grad student 5000 proteins? summer students

16 Proteome Analyst

17 High-throughput Transparent Prediction of Protein Function Protein Localization Custom Classification

18 Machine Learning Task Training: INPUT: sequences, classes; OUTPUT: Classifier. Analysis: INPUT: sequences, Classifier; OUTPUT: classes

19 Machine Learning Task Training: INPUT: sequences, classes; OUTPUT: Classifier. Analysis: INPUT: sequences, Classifier; OUTPUT: classes, explanation

20 Training INPUT: sequences, classes. PA Tools: sequences → features. ML Algorithm: features, classes → Classifier. OUTPUT: Classifier

21 Training: INPUT >class A<Training Seq 1 MVGSGLLWLALVSCILTQASAVQRGYGN PIEASSYGL... >class B<Training Seq 2 LLDEPFRSTENSAGSQGCDKNMSGWYRF VGEGGVRMS... >class B<Training Seq 3 EVIAYLRDPNCSSILQTEERNWVSVTSP VQASACRNI....

22 Training: INPUT >class A<Training Seq 1 MVGSGLLWLALVSCILTQASAVQRGYGN PIEASSYGL... >class B<Training Seq 2 LLDEPFRSTENSAGSQGCDKNMSGWYRF VGEGGVRMS... >class B<Training Seq 3 EVIAYLRDPNCSSILQTEERNWVSVTSP VQASACRNI.... (callouts: classes; protein sequences)
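
The training input on the slide looks like FASTA with the class label placed between `>` and `<` in each header. A minimal sketch of reading that layout follows; the function name `parse_labelled_fasta` is hypothetical, the header grammar is inferred from the three examples shown, and the sequences are only the truncated prefixes visible on the slide.

```python
import re
from typing import Iterable, List, Tuple

# Header shape inferred from the slide: ">class A<Training Seq 1".
HEADER = re.compile(r"^>(?P<label>[^<]*)<(?P<name>.*)$")


def parse_labelled_fasta(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (class_label, name, sequence) triples from labelled FASTA text."""
    records: List[Tuple[str, str, str]] = []
    label = name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A new header closes the previous record, if there was one.
            if label is not None:
                records.append((label, name, "".join(chunks)))
            label = match.group("label").strip()
            name = match.group("name").strip()
            chunks = []
        elif label is not None:
            # Sequence lines may wrap; join them without separators.
            chunks.append(line)
    if label is not None:
        records.append((label, name, "".join(chunks)))
    return records


# Only the residues visible on the slide; the real sequences are longer.
example = """\
>class A<Training Seq 1
MVGSGLLWLALVSCILTQASAVQRGYGN
PIEASSYGL
>class B<Training Seq 2
LLDEPFRSTENSAGSQGCDKNMSGWYRF
VGEGGVRMS
"""
records = parse_labelled_fasta(example.splitlines())
```

Each triple pairs a class with a sequence, which is the "sequences, classes" input that training expects.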

23 Training: PA Tools sequences → features

24 Training: PA Tools sequences → features. Homology Tools (BLAST): sequence → homologues; homologues → annotations; annotations → features

25 Homology Tool sequence → features. Diagram: sequence → BLAST (against seq DB) → homologues → retrieve → annotations → parse → features

26 Homology Tool sequence → features. Diagram: sequence → BLAST (against seq DB) → homologues → retrieve → annotations → parse → features. Example annotation: DBSOURCE swissprot: locus MPPB_NEUCR,... xrefs (non-sequence databases):... InterPro IPR001431,... KEYWORDS Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide; Oxidoreductase; Electron transport; Respiratory chain.

27 Homology Tool sequence → features. Diagram: sequence → BLAST (against seq DB) → homologues → retrieve → annotations → parse → features
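
The last step of the homology tool, annotations → features, can be illustrated with the KEYWORDS field from the example annotation. The sketch below is an assumption about how such a parse might look, not Proteome Analyst's actual parser: the function names are hypothetical, the flat-file layout (field name at the start of a line, indented continuation lines) is assumed, and the BLAST and retrieval steps are left out entirely.

```python
from typing import Iterable, Set


def keywords_to_features(record_text: str) -> Set[str]:
    """Collect lower-cased KEYWORDS tokens from one annotation record."""
    features: Set[str] = set()
    in_keywords = False
    for line in record_text.splitlines():
        if line.startswith("KEYWORDS"):
            in_keywords = True
            body = line[len("KEYWORDS"):]
        elif in_keywords and line[:1].isspace():
            # Indented line directly after KEYWORDS: treat as a continuation.
            body = line
        else:
            in_keywords = False
            continue
        for token in body.split(";"):
            token = token.strip().rstrip(".").strip().lower()
            if token:
                features.add(token)
    return features


def features_for_sequence(homologue_records: Iterable[str]) -> Set[str]:
    """Union the keyword features over all homologues of one query sequence."""
    features: Set[str] = set()
    for record in homologue_records:
        features |= keywords_to_features(record)
    return features


# Keywords copied from the slide; the line layout is assumed.
record = """\
DBSOURCE    swissprot: locus MPPB_NEUCR
KEYWORDS    Hydrolase; Metalloprotease; Zinc; Mitochondrion; Transit peptide;
            Oxidoreductase; Electron transport; Respiratory chain.
"""
features = features_for_sequence([record])
```

Treating each keyword as a binary feature (present or absent) keeps the representation simple and easy to inspect.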

28 Training: PA Tools sequences → features. Homology Tools (BLAST): sequence → homologues; homologues → annotations; annotations → features. Pattern Tools (PFAM, ProSite, …): sequences → motifs; motifs → features

29 Pattern Tool sequence → features. Diagram: sequence → find (in pattern DB) → patterns → parse → features

30 Pattern Tool sequence → features. Diagram: sequence → find (in pattern DB) → patterns → parse → features. Example matches: Pfam; PF00234; tryp_alpha_amyl; 1. PROSITE; PS00940; GAMMA_THIONIN; 1. PROSITE; PS00305; 11S_SEED_STORAGE; 1.

31 Pattern Tool sequence → features (not included in current results). Diagram: sequence → find (in pattern DB) → patterns → parse → features
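
The pattern matches shown on the slide are semicolon-separated records, so the motifs → features step could be sketched as below. The `PatternHit` type, the function names, and the `DATABASE:ACCESSION` feature naming are all hypothetical, and reading the trailing number as a hit count is an assumption.

```python
from typing import List, NamedTuple


class PatternHit(NamedTuple):
    database: str
    accession: str
    name: str
    count: int  # assumed to be the number of matches in the sequence


def parse_pattern_line(line: str) -> PatternHit:
    """Parse one line such as 'Pfam; PF00234; tryp_alpha_amyl; 1.'."""
    fields = [field.strip() for field in line.strip().rstrip(".").split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    database, accession, name, count = fields
    return PatternHit(database, accession, name, int(count))


def hits_to_features(hits: List[PatternHit]) -> List[str]:
    """Name each feature by database and accession, e.g. 'PFAM:PF00234'."""
    return [f"{hit.database.upper()}:{hit.accession}" for hit in hits]


# The three example matches from the slide.
lines = [
    "Pfam; PF00234; tryp_alpha_amyl; 1.",
    "PROSITE; PS00940; GAMMA_THIONIN; 1.",
    "PROSITE; PS00305; 11S_SEED_STORAGE; 1.",
]
hits = [parse_pattern_line(line) for line in lines]
pattern_features = hits_to_features(hits)
```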

32 Training: ML Algorithm features, classes → Classifier

33 Training: ML Algorithm features, classes → Classifier. Any ML algorithm may be used; default = naïve Bayes: consistently near-best accuracy (SVM, ANN slightly better); efficient (for high-throughput); easy to interpret
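
To make "features, classes → Classifier" concrete, here is a small from-scratch naïve Bayes over sets of binary features. It is a sketch under stated assumptions, not the Proteome Analyst implementation: it uses the Bernoulli variant with add-one (Laplace) smoothing, and the toy training data and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def train_naive_bayes(examples: List[Tuple[Set[str], str]], alpha: float = 1.0) -> dict:
    """Fit a Bernoulli naive Bayes model on (feature_set, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    feature_counts: Dict[str, Counter] = defaultdict(Counter)
    vocabulary: Set[str] = set()
    for features, label in examples:
        vocabulary |= features
        feature_counts[label].update(features)
    model = {"vocabulary": sorted(vocabulary), "log_prior": {}, "p_feature": {}}
    for label, n in class_counts.items():
        model["log_prior"][label] = math.log(n / len(examples))
        # Smoothed estimate of P(feature present | class).
        model["p_feature"][label] = {
            f: (feature_counts[label][f] + alpha) / (n + 2 * alpha)
            for f in vocabulary
        }
    return model


def predict_proba(model: dict, features: Set[str]) -> Dict[str, float]:
    """Return the posterior probability of every class for one feature set."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for f in model["vocabulary"]:
            p = model["p_feature"][label][f]
            score += math.log(p if f in features else 1.0 - p)
        scores[label] = score
    # Normalise in log space (log-sum-exp) to avoid underflow.
    top = max(scores.values())
    norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {label: math.exp(s - norm) for label, s in scores.items()}


# Invented toy data: two classes, features in the style of annotation keywords.
training = [
    ({"hydrolase", "zinc"}, "A"),
    ({"hydrolase", "metalloprotease"}, "A"),
    ({"electron transport", "oxidoreductase"}, "B"),
    ({"oxidoreductase", "respiratory chain"}, "B"),
]
model = train_naive_bayes(training)
probs = predict_proba(model, {"hydrolase", "zinc"})
```

Training is a single counting pass over the examples, which is one reason the method suits high-throughput use.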

34 Training: OUTPUT Classifier

35 Analysis (Classification) INPUT: sequences. PA Tools: sequences → features. Classifier: features → classes, explanation. OUTPUT: classes

36 Analysis: INPUT >Seq 1 DTILNINFQCAYPLDMKVSLQAALQPIV SSLNVSVDG... >Seq 2 AVELSVESVLYVGAILEQGDTSRFNLVL RNCYATPTE... >Seq 3 HVEENGQSSESRFSVQMFMFAGHYDLVF LHCEIHLCD....

37 Analysis: INPUT >Seq 1 DTILNINFQCAYPLDMKVSLQAALQPIV SSLNVSVDG... >Seq 2 AVELSVESVLYVGAILEQGDTSRFNLVL RNCYATPTE... >Seq 3 HVEENGQSSESRFSVQMFMFAGHYDLVF LHCEIHLCD.... (callout: protein sequences)

38 Analysis: PA Tools sequences → features

39 Analysis: PA Tools sequences → features. Homology Tools (BLAST): sequence → homologues; homologues → annotations; annotations → features. Pattern Tools (PFAM, ProSite, …): sequences → motifs; motifs → features

40 Analysis: Classification features → classes

41 Analysis: Classification features → classes. Naïve Bayes returns probabilities of each class for each sequence; efficient (for high-throughput); easy to interpret
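
The slides do not write out how the per-class probabilities are obtained. For reference, the standard naïve Bayes posterior for a class $c$ given features $f_1, \dots, f_n$ is shown below; the symbols are the usual textbook ones, not notation taken from the talk.

```latex
P(c \mid f_1, \dots, f_n)
  = \frac{P(c) \prod_{i=1}^{n} P(f_i \mid c)}
         {\sum_{c'} P(c') \prod_{i=1}^{n} P(f_i \mid c')}
```

The "naïve" part is the product in the numerator, which assumes the features are conditionally independent given the class.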

42 Analysis: Classification features → classes, explanation

43 Analysis: Classification features → classes, explanation

44 Analysis: Classification features → classes, explanation

45 Analysis: Classification features → classes, explanation

46 Analysis: Classification features → classes, explanation
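
The transcript does not preserve what these explanation slides showed. One common way to explain a naïve Bayes prediction is to rank features by how strongly they favour the predicted class over a rival class, using per-feature log-likelihood ratios. The sketch below illustrates that general idea only; the function name and the probability table are invented, and nothing here is confirmed as Proteome Analyst's method.

```python
import math
from typing import Dict, List, Set, Tuple


def feature_contributions(
    p_feature: Dict[str, Dict[str, float]],
    features: Set[str],
    predicted: str,
    rival: str,
) -> List[Tuple[str, float]]:
    """Rank present features by |log P(f | predicted) - log P(f | rival)|."""
    rows = []
    for f in sorted(features):
        p_pred = p_feature[predicted].get(f)
        p_rival = p_feature[rival].get(f)
        if p_pred is None or p_rival is None:
            continue  # feature never seen in training: no evidence either way
        rows.append((f, math.log(p_pred) - math.log(p_rival)))
    # Largest absolute contribution first; the sign says which class it favours.
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Hypothetical values of P(feature present | class) for two localization classes.
p_feature = {
    "mitochondrion": {"transit peptide": 0.8, "zinc": 0.3, "hydrolase": 0.4},
    "nucleus": {"transit peptide": 0.1, "zinc": 0.3, "hydrolase": 0.5},
}
rows = feature_contributions(
    p_feature,
    {"transit peptide", "zinc", "hydrolase"},
    predicted="mitochondrion",
    rival="nucleus",
)
```

A positive value is evidence for the predicted class and a negative value is evidence for the rival, so a user can see which annotations drove the call.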

47 Results: General Function GeneQuiz classification: 5-fold cross-validation accuracy on 14 classes

48 Results: General Function GeneQuiz classification: 5-fold cross-validation accuracy on 14 classes. E. coli (2370): 82.5%; Yeast (2359): 78.8%; Fly (3842): 76.6%
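
The accuracies above come from 5-fold cross-validation. A minimal sketch of that procedure follows, assuming accuracy is pooled over all held-out examples (the slides do not say whether it was pooled or averaged per fold). The `k_fold_accuracy` helper is hypothetical, and a majority-class baseline stands in for the real classifier.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def k_fold_accuracy(
    examples: List[Tuple[object, str]],
    train: Callable,
    predict: Callable,
    k: int = 5,
    seed: int = 0,
) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    order = list(range(len(examples)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        model = train([examples[i] for i in order if i not in held])
        correct += sum(
            predict(model, examples[i][0]) == examples[i][1] for i in held_out
        )
    return correct / len(examples)


def train_majority(training: List[Tuple[object, str]]) -> str:
    """Placeholder learner: remember the most frequent class."""
    return Counter(label for _, label in training).most_common(1)[0][0]


def predict_majority(model: str, _features: object) -> str:
    """Placeholder predictor: always answer with the remembered class."""
    return model


# Toy data: 8 examples of class "A" and 2 of class "B".
toy = [(None, "A")] * 8 + [(None, "B")] * 2
accuracy = k_fold_accuracy(toy, train_majority, predict_majority, k=5)
```

Any pair of `train` and `predict` functions with the same shape can be substituted for the baseline.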

49 Results: Specific Function K+ Ion Channel Proteins: 5-fold cross-validation accuracy on 78 sequences, 4 classes

50 Results: Specific Function K+ Ion Channel Proteins: 5-fold cross-validation accuracy on 78 sequences, 4 classes. Accuracy: 1st effort 97.4%; 2nd effort 100%

51 Results: Localization Sub-cellular localization prediction: 3146 sequences from 10 classes

52 Results: Localization Sub-cellular localization prediction: 3146 sequences from 10 classes. Nair and Rost: accuracy 81.5%, coverage 36.9%. Proteome Analyst: accuracy 87.8%, coverage 100%

53 Results Sub-cellular localization prediction: 3146 sequences from 10 classes. Nair and Rost: accuracy 81.5%, coverage 36.9%. Proteome Analyst: accuracy 87.8%, coverage 100%
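
The comparison reports both accuracy and coverage, but the slides do not define either term. The sketch below assumes a common reading: coverage is the fraction of sequences for which a predictor returns any prediction, and accuracy is the fraction of those predictions that are correct. The function name and the toy labels are illustrative only.

```python
from typing import List, Optional, Tuple


def accuracy_and_coverage(
    predictions: List[Optional[str]], truth: List[str]
) -> Tuple[float, float]:
    """Return (accuracy over attempted predictions, fraction attempted)."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    # None marks a sequence the predictor declined to classify.
    attempted = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    coverage = len(attempted) / len(truth)
    if not attempted:
        return 0.0, coverage
    accuracy = sum(p == t for p, t in attempted) / len(attempted)
    return accuracy, coverage


# Toy example: 5 sequences, 3 predictions attempted, 2 of them correct.
truth = ["nucleus", "cytoplasm", "nucleus", "mitochondrion", "cytoplasm"]
predictions = ["nucleus", None, "cytoplasm", "mitochondrion", None]
accuracy, coverage = accuracy_and_coverage(predictions, truth)
```

Under this reading, a predictor with 100% coverage is scored on every sequence, including the hard ones that a lower-coverage predictor may skip.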

54 Proteome Analyst High-throughput Transparent Prediction of Protein Function Protein Localization Custom Classification

55 Acknowledgements Student developers: Cynthia Luk, Samer Nassar, Kevin McKee. Biologists: Warren Gallin, Kathy Magor. Data: Nair and Rost

56 Acknowledgements Funding: PENCE (Protein Engineering Network of Centres of Excellence); NSERC (Natural Sciences and Engineering Research Council); Sun Microsystems; AICML (Alberta Ingenuity Centre for Machine Learning)

57 Acknowledgements Many ‘-ome’ jokes: my wife, Jen

58 Contact http://www.cs.ualberta.ca/~bioinfo/PA poulin@cs.ualberta.ca

