1 A Neural Network Predictor for Peptide Fragmentation in Mass Spectrometry
Arunima Ram
Advisor: Dr. Predrag Radivojac
Co-Advisor: Dr. Haixu Tang
Co-Advisor: Dr. Randy J. Arnold
Indiana University, Bloomington, Indiana

2 Outline
• Introduction to Proteomics
• Introduction to Neural Networks
• Objective
• Data and Process
• Results
• Future Work
• Acknowledgments

3 Introduction to Proteomics
• Proteins are the molecules of life, made up of chains of amino acids. There are 20 standard amino acids, each represented by a single letter
• The proteome is the sum of all proteins in an organism, tissue, or sample under study
[Figure: amino acid]

4 Introduction to Proteomics
• Proteomics is the study of the protein composition of an organelle, cell, or entire organism, with the following goals:
  - Identification
  - Quantification
  - Expression changes
  - Modifications
  - Interaction with other proteins and molecules
• Mass spectrometers are the instruments used for proteomics studies

5 Introduction to Proteomics
• Proteins are separated
• Proteins are digested into peptides by a specific enzyme, trypsin
• Peptides are separated and charged
• The mass spectrometer selects a peptide based on mass
• The mass spectrum (MS) of the peptides is recorded
• Each peptide is fragmented and sent through a second MS stage to record MS/MS data
Source: R. Aebersold & M. Mann, Nature 422, 198-207 (2003)

6 Introduction to Proteomics
• Fragmentation of peptides follows certain rules
• b ion: N-terminal fragment
• y ion: C-terminal fragment
• b and y ions are the most abundant
• A multiply charged peptide can generate multiply charged fragment ions
• Certain residues lose water, ammonia, or both, generating less abundant ions
Source: http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm
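The b/y nomenclature on this slide can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the presentation: it assumes standard monoisotopic residue masses (rounded to five decimals), unmodified residues, and a hypothetical helper named fragment_mz. The example peptide LNVWGK is the one shown later in the deck.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of one charge-carrying proton


def fragment_mz(peptide, charge=1):
    """Return (b_ions, y_ions): m/z values for every cleavage point.

    b_ions[i] is the N-terminal fragment with i + 1 residues.
    y_ions[i] is the complementary C-terminal fragment, so the last
    entry of y_ions is the single-residue y1 ion.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = sum(masses[:i])           # N-terminal residues only
        y_neutral = sum(masses[i:]) + WATER   # C-terminal residues plus H2O
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


b, y = fragment_mz("LNVWGK")
print("b ions:", [round(v, 4) for v in b])
print("y ions:", [round(v, 4) for v in y])
```

Neutral-loss ions (loss of water or ammonia) would be obtained by subtracting the corresponding mass from the neutral fragment before adding the protons; that step is left out to keep the sketch short.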

7 Introduction to Proteomics
[Figure: b ions]
Source: http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm

8 Introduction to Proteomics
[Figure: y ions]
Source: http://www.ionsource.com/tutorial/DeNovo/nomenclature.htm

9 Introduction to Artificial Neural Networks
• Neural networks are composed of interconnected neurons working in unison to solve specific problems, analogous to animal brains
• ANNs learn from examples to extract patterns and detect trends too complex to be noticed otherwise
• Benefits
  - Can learn real-valued and discrete-valued functions
  - Robust to noise in the data
[Figures: components of a neuron cell; single neuron]
Source: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

10 Introduction to Artificial Neural Networks
• Training examples are fed to the input layer
• A weight is associated with each input in each layer
• The weighted inputs are combined at each layer to give an output
• The hidden layer computes its output using a logistic function and feeds it to the output layer
• The error between the network output and the desired output is determined
• The weights in each layer are adjusted accordingly, and the process iterates
[Figure: two-layer feed-forward neural network]
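The training loop outlined on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: it assumes a squared-error loss, a single logistic output unit, plain per-example gradient descent, and a toy OR dataset; the names train and predict and all hyperparameters are made up for the example.

```python
import math
import random


def sigmoid(x):
    """Logistic function used by the hidden and output units."""
    return 1.0 / (1.0 + math.exp(-x))


def train(examples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    """Train a one-hidden-layer feed-forward network by backpropagation."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Every weight vector carries one trailing bias weight.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    for _ in range(epochs):
        for x, target in examples:
            # Forward pass: weighted sums passed through the logistic function.
            xb = list(x) + [1.0]
            hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
            hb = hid + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
            # Error terms for the output unit, then for each hidden unit.
            d_out = (target - out) * out * (1.0 - out)
            d_hid = [d_out * w_out[j] * hid[j] * (1.0 - hid[j])
                     for j in range(n_hidden)]
            # Adjust the weights in each layer.
            w_out = [w + rate * d_out * v for w, v in zip(w_out, hb)]
            for j in range(n_hidden):
                w_hid[j] = [w + rate * d_hid[j] * v
                            for w, v in zip(w_hid[j], xb)]
    return w_hid, w_out


def predict(x, w_hid, w_out):
    """Run one forward pass and return the network output in (0, 1)."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w_out, hb)))


# Toy task: logical OR of two binary inputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hid, w_out = train(data)
for x, t in data:
    print(x, t, round(predict(x, w_hid, w_out), 3))
```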

11 Objective
• Identifying peptides by matching their fragmentation spectra through “Database Matching” relies on ad hoc rules or probabilistic models, and cannot match proteins that are not present in the database
• Aim: use machine learning to learn peptide fragmentation rules from a set of examples, predict fragmentation spectra, and use the predictions to better identify peptides and proteins

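12 Dataset
Organism     Charge 1         Charge 2         Charge 3         Search engine
             Total / Unique   Total / Unique   Total / Unique
Shewanella   7175 / 7175      17647 / 17647    3489 / 3489      Sequest
Rat          150 / 583047 / 1782421 / 305 [column boundaries not recoverable]       Mascot
Human        4433 / 47263012 / 226114384 / 775 [column boundaries not recoverable]  Sequest
Drosophila   -------  2331 / 123428 / 25 [column boundaries not recoverable]        Mascot
Mouse        1562 / 41977030 / 877931974 / 3961 [column boundaries not recoverable] Sequest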

13 Process
• 202 features extracted for 8 ions at charge 1 and 10 ions at charge 2 and charge 3
  - Amino acids in the peptide
  - Amino acids on both sides of the cleavage
  - Amino acid at the C-terminal
  - Amino acid at the N-terminal
  - Number of arginine and lysine residues in the peptide
  - Basicity, hydrophobicity, isoelectric point, and helix propensity for the peptide, for the charged ion, and for neighboring amino acids
  - Mass of the peptide and mass of the fragment ion
[Figure: peptide L N V W G K with cleavage point (b-3 ion)]
Source: R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac. A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp. 219-230
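To give a feel for what a per-cleavage feature vector might look like, here is a small sketch covering a handful of the feature categories listed on this slide. It is an assumption-laden illustration: the function name cleavage_features, the dictionary keys, and the choice to return raw residue letters rather than numeric encodings are all hypothetical, and the physico-chemical and mass features are omitted.

```python
from collections import Counter


def cleavage_features(peptide, site):
    """Describe the cleavage between peptide[site - 1] and peptide[site].

    `site` is the number of residues in the N-terminal fragment, so
    site=3 on "LNVWGK" corresponds to the b-3 ion (LNV | WGK).
    """
    if not 1 <= site < len(peptide):
        raise ValueError("site must fall between two residues")
    return {
        "composition": Counter(peptide),        # amino acids in the peptide
        "left_residue": peptide[site - 1],      # residue before the cleavage
        "right_residue": peptide[site],         # residue after the cleavage
        "n_terminal_residue": peptide[0],
        "c_terminal_residue": peptide[-1],
        "num_arg_lys": peptide.count("R") + peptide.count("K"),
    }


features = cleavage_features("LNVWGK", 3)
for name, value in features.items():
    print(name, value)
```

A real predictor would still need to turn the residue letters into numbers (for example one indicator per amino acid) before feeding them to a neural network.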

14 Process
• Target values
  - Intensity >= 1% of total intensity: target = 1
  - Intensity < 1% of total intensity: target = 0
  - Positives are far fewer than negatives, so a class-balanced dataset is created
• 10-fold cross-validation
  - Input data is partitioned into 10 disjoint sets
  - One set becomes the test set and the remaining 9 become the training set
• Feature set reduction
  - Unrelated features removed using a t-test
  - Correlated features removed using Principal Component Analysis dimensionality reduction
• The learning task is reduced to a classification problem: does the ion exist or not
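The labeling and cross-validation steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slide does not say how the class-balanced dataset was built, so random downsampling of the larger class is assumed here, and the helper names label_peaks, balance, and k_folds are hypothetical.

```python
import random


def label_peaks(intensities, threshold=0.01):
    """Return 1 for peaks with at least `threshold` of total intensity, else 0."""
    total = sum(intensities)
    return [1 if total > 0 and x / total >= threshold else 0
            for x in intensities]


def balance(examples, rng):
    """Downsample the larger class so both classes have equal size.

    Each example is a (features, label) pair with label 0 or 1.
    Downsampling is an assumption; the slides do not name the method.
    """
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def k_folds(examples, k=10, seed=0):
    """Yield (train, test) pairs over k disjoint partitions of the data."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]


print(label_peaks([50, 0.4, 49.6]))
for train, test in k_folds(list(range(25)), k=10):
    print(len(train), len(test))
```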

15 Process
• Train an ensemble of 10 neural networks, using the best-performing number of hidden neurons, for EACH ion at EACH charge
• Report statistics for each cross-validation fold and the average across folds
  - Sensitivity (Sn): % of correctly identified positive examples
  - Specificity (Sp): % of correctly identified negative examples
  - Accuracy: ( Sn + Sp ) / 2
  - AUC: area under the ROC curve
[Figure: ensemble of neural networks. Source: Acta Chim. Slov. 2005, 52, 440-449]
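The four statistics defined on this slide can be computed directly from labels and predictions. The sketch below is a plain illustration with hypothetical function names; the AUC is computed through its rank interpretation (the probability that a random positive example scores above a random negative one, with ties counted as half), which equals the area under the ROC curve.

```python
def sensitivity_specificity(labels, predictions):
    """Return (Sn, Sp) in percent for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


def balanced_accuracy(sn, sp):
    """Accuracy as defined on the slide: the mean of Sn and Sp."""
    return (sn + sp) / 2.0


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
sn, sp = sensitivity_specificity(labels, predictions)
print(sn, sp, balanced_accuracy(sn, sp), auc(labels, scores))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a demonstration; a sort-based version would be preferable on large datasets.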

16 Process - Predictor
• Final training done with ALL data
• Neural net architecture saved for future use
• Steps
  - Input a peptide with its charge to the predictor
  - The peptide is decomposed into features
  - The saved ANN architecture is loaded for each ion at each charge
  - Predict with the 10 ensemble members and output the averaged prediction
  - Score intensities from the prior probability p and the predicted output o
Source: R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac. A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp. 219-230
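The ensemble-averaging step can be sketched as below. This is a hypothetical illustration, not the saved predictor: each trained network is stood in for by a plain Python callable, the names ensemble_predict and predict_all_ions are invented, and the final intensity-scoring step is not reproduced because its formula is not given in the transcript.

```python
def ensemble_predict(models, features):
    """Average the outputs of several trained models for one feature vector."""
    outputs = [model(features) for model in models]
    return sum(outputs) / len(outputs)


def predict_all_ions(models_by_ion, features):
    """Return one averaged prediction per ion type.

    `models_by_ion` maps an ion name (for example "b" or "y") to the
    list of ensemble members trained for that ion.
    """
    return {ion: ensemble_predict(models, features)
            for ion, models in models_by_ion.items()}


# Stand-in "networks": constant functions used only to show the averaging.
models_by_ion = {
    "b": [lambda f: 0.2, lambda f: 0.4, lambda f: 0.9],
    "y": [lambda f: 0.8, lambda f: 0.6],
}
print(predict_all_ions(models_by_ion, features=[0.0, 1.0]))
```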

17 Reproducibility Analysis
• Among mouse liver replicates, treat one replicate as the actual spectrum and another as the prediction, and compute AUC values
• This estimates the maximum accuracy that any fragmentation predictor can achieve

18 Results - Reproducibility Analysis
Ion          Charge 1 AUC   Charge 2 AUC   Charge 3 AUC
b            97.91          90.90          95.53
b-H2O        96.95          89.66          95.45
b-NH3        93.38          87.97          93.65
b-H2O-NH3    96.70          91.35          96.26
b++          ---            93.80          92.13
y            97.27          92.90          95.76
y-H2O        91.89          89.57          93.21
y-NH3        94.41          86.79          91.11
y-H2O-NH3    96.41          93.67          96.94
y++          ---            93.66          93.18

19 Results - Cross-Validation Accuracies
Ion             Charge 1                    Charge 2                    Charge 3
                Sn / Sp       Acc / AUC     Sn / Sp       Acc / AUC     Sn / Sp       Acc / AUC
precursor-H2O   57.6 / 60.2   58.9 / 61.1   64.9 / 66.1   65.5 / 71.4   59.6 / 58.5   59 / 62.7
b               83.2 / 78.8   81 / 89       78.7 / 75.4   77 / 84.6     81.1 / 75.8   78.4 / 85.9
b-H2O           79.7 / 76.4   78.1 / 86.1   76.8 / 75.8   76.3 / 83.9   82.5 / 65.5   74 / 81.6
b-NH3           77.2 / 75.1   76.1 / 83.5   73.1 / 76.9   75 / 82.6     83.2 / 61.2   72.2 / 78.8
b-H2O-NH3       74.3 / 76.1   75.2 / 82     72.3 / 64.2   68.2 / 75.2   84.2 / 59.2   71.7 / 77.8
b++             ---           ---           77.4 / 72.8   75.1 / 83.1   80.6 / 72.2   76.4 / 84.4
y               82.6 / 82.3   82.4 / 90.1   84.4 / 79.6   82 / 89.7     84.4 / 81.1   82.8 / 90
y-H2O           79.1 / 77.8   78.4 / 86.1   77.8 / 73.2   75.5 / 82.6   82.5 / 64.7   73.6 / 80.5
y-NH3           76.5 / 68.3   72.4 / 79.5   69.6 / 66.9   68.3 / 75.3   81.7 / 61.9   71.8 / 78.4
y-H2O-NH3       70.4 / 76.4   73.4 / 80.7   74.2 / 64.3   69.3 / 75.7   82.5 / 62.4   72.4 / 77.3
y++             ---           ---           86.6 / 81.5   84 / 90.9     85.9 / 74.9   80.4 / 87.9
Sensitivity/specificity and accuracy/AUC for all ions at all charges under cross-validation

20 Results - Cross-Testing Accuracies on Drosophila Data for Charge 2
Ion          Sn     Sp     Acc    AUC    Combined AUC
b            69.4   78.5   73.9   82.1   84.6
b-H2O        79.4   72.8   76.1   83.6   83.9
b-NH3        71.1   75.0   73.1   81.8   82.6
b-H2O-NH3    69.9   66.0   67.9   76.8   75.2
b++          65.7   77.0   71.4   77.2   83.1
y            68.0   86.8   77.4   86.8   89.7
y-H2O        63.0   70.4   66.7   72.5   82.6
y-NH3        54.1   75.2   64.6   72.7   75.3
y-H2O-NH3    53.9   70.2   62.1   67.5   75.7
y++          90.3   82.2   86.2   92.6   90.9

21 MassAnalyzer - Peptide Fragmentation Tool
• Uses a mathematical model to predict fragmentation
• Uses one model for charge 1 and charge 2 and a separate model for higher charges
Sources: Z. Zhang, Anal. Chem. 2004, 76(14), 3908-3922; Z. Zhang, Anal. Chem. 2005, 77(19), 6364-6373

22 Results - Prediction Comparison (MA = MassAnalyzer, ANN = neural network predictor)
Ion          Charge 1            Charge 2            Charge 3
             AUC MA    AUC ANN   AUC MA    AUC ANN   AUC MA    AUC ANN
b            90.44     91.74     85.61     86.24     90        90.24
b-H2O        89.37     91.87     86.84     85.85     88.99     86.97
b-NH3        85.82     89.01     85.40     85.17     83.61     86.51
b-H2O-NH3    61.31     90.12     71.20     80.15     77.45     85.58
b++          ---       ---       85.34     87.20     86.01     88.56
y            86.65     88.20     85.98     88.57     91.72     91.82
y-H2O        85.96     81.38     77.58     72.95     82.88     87.89
y-NH3        87.62     78.43     78.30     75.73     85.10     85.03
y-H2O-NH3    64.82     77.50     69.51     77.28     76.55     84.47
y++          ---       ---       90.92     93.34     85.60     87.55

23 Results - Prediction Comparison
[Figure: ROC curves, charge 2]

24 Results – Spectrum Comparison

25 Future Work
• Reproducibility analysis on other datasets, accounting for replicate size (the number of replicates for each spectrum)
• Use the predicted spectra to build another predictor that learns to score a given spectrum

26 Acknowledgements
• Dr. Predrag Radivojac
• Dr. Haixu Tang
• Dr. Randy J. Arnold
• Lab mates
  - Amrita Mohan
  - Nils Schimmelmann
  - Wyatt Clark
  - Yong Li
• Linda Hostetter
• Bioinformatics faculty at SOI
• School of Informatics
