
A Neural Network Predictor for Peptide Fragmentation in Mass Spectrometry
Arunima Ram
Advisor: Dr. Predrag Radivojac
Co-Advisor: Dr. Haixu Tang
Co-Advisor: Dr. Randy J. Arnold
Indiana University, Bloomington, Indiana

Outline
- Introduction to Proteomics
- Introduction to Neural Networks
- Objective
- Data and Process
- Results
- Future Work
- Acknowledgments

Introduction to Proteomics
- Proteins are the molecules of life, made up of chains of amino acids. There are 20 standard amino acids, each represented by a one-letter code
- The proteome is the complete set of proteins in an organism, tissue or sample under study
[Figure: amino acid]

Introduction to Proteomics
- Proteomics is the study of the protein composition of an organelle, a cell or an entire organism, with the following goals:
  - identification
  - quantification
  - expression changes
  - modifications
  - interactions with other proteins and molecules
- Mass spectrometers are the instruments used for proteomics studies

Introduction to Proteomics
- Proteins are separated
- Proteins are digested into peptides by a specific enzyme, trypsin
- Peptides are separated and ionized (charged)
- The mass spectrometer records a mass spectrum (MS) of the peptides and selects individual peptides based on their mass-to-charge ratio (m/z)
- Each selected peptide is fragmented, and a second stage of MS records its fragmentation (MS/MS) spectrum
Ruedi Aebersold & Matthias Mann, Nature 422, 198-207
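The digestion step can be simulated in silico. The sketch below assumes the textbook trypsin rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative and not taken from the slides.

```python
def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Split a protein sequence into tryptic peptides.

    Assumes the classic rule: trypsin cleaves C-terminal to K or R,
    except when the following residue is P.
    """
    # Fully cleaved pieces.
    pieces = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece not ending in K/R

    # Join up to `missed_cleavages` adjacent pieces to model incomplete digestion.
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


print(tryptic_digest("MKWVTFISLLRPAEK"))
# ['MK', 'WVTFISLLRPAEK']  (no cut after R because it is followed by P)
```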

Introduction to Proteomics
- Peptide fragmentation follows certain rules:
  - b ions: N-terminal fragments
  - y ions: C-terminal fragments
- b and y ions are the most abundant fragment ions
- Multiply charged peptides can generate multiply charged fragment ions
- Certain residues lose water, ammonia or both, generating less abundant ions
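The b/y rules can be made concrete by computing fragment m/z values. This is a minimal sketch using standard monoisotopic residue masses. The function name `fragment_mz` is hypothetical, and the neutral losses are applied generically: the code does not model which residues actually lose water or ammonia.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da)
LOSS_MASS = {"H2O": 18.010565, "NH3": 17.026549}


def fragment_mz(peptide: str, i: int, series: str, charge: int = 1,
                losses: tuple = ()) -> float:
    """m/z of the b_i ion (first i residues) or the y_i ion (last i residues).

    `losses` lists neutral losses such as ("H2O",) or ("H2O", "NH3");
    they are subtracted regardless of which residues are present.
    """
    if not 1 <= i < len(peptide):
        raise ValueError("i must be between 1 and len(peptide) - 1")
    if series == "b":
        neutral = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
    elif series == "y":
        # y fragments carry the C-terminus, so they include one extra water.
        neutral = sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + LOSS_MASS["H2O"]
    else:
        raise ValueError("series must be 'b' or 'y'")
    neutral -= sum(LOSS_MASS[name] for name in losses)
    # Each charge adds one proton; divide by the charge to get m/z.
    return (neutral + charge * PROTON) / charge


print(round(fragment_mz("LNVWGK", 3, "b"), 4))  # b3 ion, cleavage LNV | WGK
```

For example, `fragment_mz("LNVWGK", 3, "b")` gives about 327.2027, `fragment_mz("LNVWGK", 3, "y")` gives about 390.2136, and `charge=2` yields the doubly charged version of the same fragment.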

Introduction to Proteomics
[Figure: b ions]

Introduction to Proteomics
[Figure: y ions]

Introduction to Artificial Neural Networks
- Artificial neural networks (ANNs) are composed of interconnected neurons that work in unison to solve specific problems, loosely analogous to animal brains
- ANNs learn from examples to extract patterns and detect trends that are too complex to be noticed by other means
- Benefits:
  - can learn real-valued and discrete-valued functions
  - robust to noise in the data
[Figures: components of a neuron cell; single neuron]

Introduction to Artificial Neural Networks
- Training examples are fed to the input layer
- Each input in each layer has an associated weight
- The weighted inputs are combined at each layer to produce an output
- The hidden layer computes its output using a logistic function and feeds it to the output layer
- The error between the network's output and the desired output is computed
- The weights in each layer are adjusted accordingly, and the process iterates
[Figure: 2-layered feed-forward neural network]
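The training loop above can be sketched in NumPy. This is a minimal illustration that assumes a single logistic output unit, a cross-entropy error and plain full-batch gradient descent (backpropagation). The slides do not specify the loss, learning rate or training algorithm, so these choices, the function names and the toy OR data set are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the hidden and output units."""
    return 1.0 / (1.0 + np.exp(-z))


def train_two_layer(X, t, n_hidden=4, lr=1.0, epochs=5000, seed=0):
    """Train a 2-layer feed-forward network with backpropagation (assumed setup)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))           # hidden -> output weights
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: weighted inputs are combined and squashed at each layer.
        h = sigmoid(X @ W1 + b1)
        o = sigmoid(h @ W2 + b2).ravel()
        # Error between network output and desired output (mean cross-entropy).
        eps = 1e-12
        losses.append(-np.mean(t * np.log(o + eps) + (1 - t) * np.log(1 - o + eps)))
        # Backward pass: gradients with respect to the pre-activations.
        d_out = (o - t)[:, None] / len(t)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        # Adjust the weights in each layer, then iterate.
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Network output in (0, 1) for each row of X."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)  # toy OR data set
params, losses = train_two_layer(X, t)
print(predict(params, X).round(2))
```

The output lies in (0, 1), which fits the classification framing used later ("ion exists or not").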

Objective
- Matching peptide fragmentation spectra through "database matching" relies on ad hoc rules or probabilistic models and cannot identify proteins that are not present in the database
- Aim: use machine learning to learn peptide fragmentation rules from a set of examples, predict fragmentation spectra, and use the predicted spectra to better identify peptides and proteins

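Dataset

| Organism | Charge 1 (total / unique) | Charge 2 (total / unique) | Charge 3 (total / unique) | Search engine |
|---|---|---|---|---|
| Shewanella | 7175 / … | … / … | … / 3489 | Sequest |
| Rat | 150 / … | … / … | … / 305 | Mascot |
| Human | 4433 / … | … / … | … / 775 | Sequest |
| Drosophila | … / … | … / … | … / 25 | Mascot |
| Mouse | 1562 / … | … / … | … / 3961 | Sequest |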

Process
- 202 features are extracted for 8 ion types at charge 1 and 10 ion types at charges 2 and 3:
  - amino acids in the peptide
  - amino acids on both sides of the cleavage point
  - amino acid at the C-terminus
  - amino acid at the N-terminus
  - number of arginine and lysine residues in the peptide
  - basicity, hydrophobicity, isoelectric point and helix propensity for the peptide, for the charged fragment ion and for the neighboring amino acids
  - mass of the peptide and mass of the fragment ion
[Figure: example peptide L N V W G K with the cleavage point of the b3 ion marked]
R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac. A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp. …
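A per-cleavage feature vector of this kind can be sketched as follows. This is a hypothetical subset (105 features, not the 202 used in the study). It covers composition, flanking residues, terminal residues, the R+K count and one property scale. Kyte-Doolittle hydropathy stands in for "hydrophobicity" as an assumption, because the slides do not name the scale. Basicity, isoelectric point, helix propensity and mass features are left out for brevity.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed choice of hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def one_hot(residue: str) -> list[float]:
    """20-dimensional indicator vector for a single residue."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def mean_hydropathy(seq: str) -> float:
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def cleavage_features(peptide: str, i: int) -> list[float]:
    """Hypothetical features for the bond after residue i (the b_i cleavage)."""
    n = len(peptide)
    if not 1 <= i < n:
        raise ValueError("cleavage index must be between 1 and len(peptide) - 1")
    composition = [float(peptide.count(aa)) for aa in AMINO_ACIDS]  # 20 counts
    left, right = peptide[i - 1], peptide[i]  # residues flanking the cleavage
    return (
        composition
        + one_hot(left) + one_hot(right)              # 40 flanking-residue indicators
        + one_hot(peptide[0]) + one_hot(peptide[-1])  # 40 N-/C-terminal indicators
        + [
            float(peptide.count("R") + peptide.count("K")),  # basic residues
            mean_hydropathy(peptide),      # whole peptide
            mean_hydropathy(peptide[:i]),  # N-terminal (b) fragment
            float(i),                      # cleavage position
            float(n),                      # peptide length
        ]
    )


f = cleavage_features("LNVWGK", 3)  # b3 cleavage: LNV | WGK
print(len(f))  # 105
```

One vector is produced per candidate cleavage. In the study, a separate model is trained for each ion type and charge.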

Process
- Target values: an ion is labeled 1 if its intensity is >= 1% of the total intensity, and 0 if it is < 1%
- Positives are far fewer than negatives, so a class-balanced dataset is created
- 10-fold cross-validation: the input data are partitioned into 10 disjoint sets; each set in turn becomes the test set while the remaining 9 form the training set
- Feature set reduction (dimensionality reduction):
  - unrelated features are removed using a t-test
  - correlation among features is removed using principal component analysis (PCA)
- The learning task reduces to a classification problem: does the ion exist or not
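The preparation steps above can be sketched in NumPy. Several details are assumptions because the slides state only the goals. Class balance comes from undersampling the negatives. The t-test is Welch's t statistic with a hypothetical cutoff `t_min`. PCA is computed by SVD and keeps a hypothetical 95% of the variance. All function names are illustrative.

```python
import numpy as np


def label_peaks(intensities, threshold=0.01):
    """Target = 1 if an ion's intensity is >= 1% of the total, else 0."""
    intensities = np.asarray(intensities, dtype=float)
    return (intensities >= threshold * intensities.sum()).astype(int)


def balance_classes(X, y, rng):
    """Undersample the larger class so positives and negatives are equal in number."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    k = min(len(pos), len(neg))
    keep = np.concatenate([rng.choice(pos, k, replace=False),
                           rng.choice(neg, k, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def kfold_indices(n, k, rng):
    """Yield (train, test) index arrays for k disjoint folds."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


def ttest_filter(X, y, t_min=2.0):
    """Indices of features whose Welch t statistic between classes exceeds t_min."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.where(se > 0, se, np.inf)
    return np.flatnonzero(np.abs(t) > t_min)


def pca_fit(X, var_kept=0.95):
    """Mean and principal axes that retain var_kept of the variance (via SVD)."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_comp = int(np.searchsorted(ratio, var_kept)) + 1
    return mu, Vt[:n_comp]  # project new data with (X - mu) @ Vt[:n_comp].T


print(label_peaks([50, 30, 0.5, 19.5]))  # the 0.5 peak is below 1% of the total
```

One design choice worth noting: fitting `ttest_filter` and `pca_fit` on each training fold only, then applying them to the held-out fold, keeps test information out of feature selection.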

Process
- Train an ensemble of 10 neural networks, using the best-performing number of hidden neurons, for EACH ion type at EACH charge
- Report statistics on each cross-validation fold and their average across folds:
  - Sensitivity (Sn): % of positive examples correctly identified
  - Specificity (Sp): % of negative examples correctly identified
  - Accuracy: (Sn + Sp) / 2
  - AUC: area under the ROC curve
[Figure: Ensemble of Neural Networks]
Acta Chim. Slov. 2005, 52, 440–449
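The evaluation statistics can be computed as shown below. This is a sketch. The 0.5 decision threshold is an assumption. The ensemble members are represented by arbitrary callables, which in practice would be the 10 trained networks. AUC is computed in its pairwise (Mann-Whitney) form. Note that "accuracy" here is the mean of Sn and Sp, also known as balanced accuracy.

```python
import numpy as np


def ensemble_predict(models, X):
    """Average the outputs of all ensemble members (callables X -> scores in [0, 1])."""
    return np.mean([model(X) for model in models], axis=0)


def sn_sp_acc(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and their mean (the slides' 'accuracy')."""
    y_true = np.asarray(y_true)
    predicted = np.asarray(y_score) >= threshold
    tp = np.sum(predicted & (y_true == 1))
    fn = np.sum(~predicted & (y_true == 1))
    tn = np.sum(~predicted & (y_true == 0))
    fp = np.sum(predicted & (y_true == 0))
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return sn, sp, (sn + sp) / 2


def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative.

    This equals the area under the ROC curve; ties count as 1/2.
    """
    y_true = np.asarray(y_true)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y_true == 1], s[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(sn_sp_acc(y, scores), auc(y, scores))
```

AUC is threshold-free, which is why it is a useful complement to Sn and Sp when comparing ion types.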

Process - Predictor
- Final training is done with ALL data
- The trained neural network architectures are saved for future use
- Steps:
  1. Input a peptide and its charge to the predictor
  2. Decompose the peptide into features
  3. Load the saved ANN architecture for each ion type at that charge
  4. Predict with each of the 10 ensemble networks and output the averaged prediction
  5. Score intensities using a formula in p (the prior probability) and o (the predicted output)
R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang, P. Radivojac. A machine learning approach to predicting peptide fragmentation spectra. PSB 2006, pp. …
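The scoring formula on this slide did not survive extraction. One standard way to combine a prior p with the output o of a network trained on class-balanced data is the Bayes prior correction sketched below. This is an assumption, not necessarily the formula used in the study. It also assumes that p is the fraction of positive examples for that ion type in the unbalanced data. The function names are hypothetical.

```python
def average_output(ensemble_outputs):
    """Mean of the ensemble members' outputs for one ion (step 4)."""
    return sum(ensemble_outputs) / len(ensemble_outputs)


def prior_corrected_score(o, p):
    """Rescale an output o learned under a 50/50 class balance to prior p.

    If o approximates P(ion | features) under balanced classes, Bayes' rule
    gives the posterior under prior p as p*o / (p*o + (1 - p)*(1 - o)).
    """
    return (p * o) / (p * o + (1.0 - p) * (1.0 - o))


o = average_output([0.4, 0.6])          # hypothetical outputs of two members
print(prior_corrected_score(o, p=0.2))  # an uninformative o = 0.5 returns the prior
```

With p = 0.5 the correction leaves o unchanged, and an output of 0.5 always maps back to the prior.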

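Reproducibility Analysis
- Among mouse liver replicates, pick one replicate as the actual spectrum and another as the prediction, and compute AUC values
- This estimates the maximum accuracy that can be achieved by any fragmentation predictor, since no predictor is expected to match a spectrum more closely than an experimental replicate of it does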
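The replicate comparison can be sketched as follows. The sketch assumes several things the slides do not state. Both replicates list intensities for the same candidate fragment ions in the same order. Peaks in the "actual" replicate are labeled with the same 1% rule used for training. The other replicate's raw intensities serve as prediction scores. The function name `replicate_auc` is hypothetical.

```python
import numpy as np


def replicate_auc(actual_intensities, other_intensities, threshold=0.01):
    """AUC of one replicate's intensities at predicting the other's peaks."""
    actual = np.asarray(actual_intensities, dtype=float)
    scores = np.asarray(other_intensities, dtype=float)
    # Label "actual" peaks with the >= 1% of total intensity rule.
    labels = actual >= threshold * actual.sum()
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative ion")
    # Pairwise (Mann-Whitney) AUC; ties count as 1/2.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


print(replicate_auc([50, 30, 0.5, 19.5], [40, 35, 1, 24]))
```

Because labels come from one replicate and scores from the other, the value generally changes when the two roles are swapped.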

Results - Reproducibility Analysis
[Table: AUC at charge 1, charge 2 and charge 3 for ions b, b-H2O, b-NH3, b-H2O-NH3, b (2+), y, y-H2O, y-NH3, y-H2O-NH3 and y (2+)]

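Results – Cross-Validation Accuracies

| Ion | Charge 1 Sn / Sp | Charge 1 Acc / AUC | Charge 2 Sn / Sp | Charge 2 Acc / AUC | Charge 3 Sn / Sp | Charge 3 Acc / AUC |
|---|---|---|---|---|---|---|
| precursor-H2O | 57.6 / … | … / … | … / … | … / … | … / … | … / 62.7 |
| b | 83.2 / … | … / … | … / … | … / … | … / … | … / 85.9 |
| b-H2O | 79.7 / … | … / … | … / … | … / … | … / … | … / 81.6 |
| b-NH3 | … / … | … / … | … / … | … / … | … / … | … / 78.8 |
| b-H2O-NH3 | … / … | … / … | … / … | … / … | … / … | … / 77.8 |
| b (2+) | n/a | n/a | … / … | … / … | … / … | … / 84.4 |
| y | 82.6 / … | … / … | … / … | … / … | … / … | … / 90 |
| y-H2O | 79.1 / … | … / … | … / … | … / … | … / … | … / 80.5 |
| y-NH3 | … / … | … / … | … / … | … / … | … / … | … / 78.4 |
| y-H2O-NH3 | … / … | … / … | … / … | … / … | … / … | … / 77.3 |
| y (2+) | n/a | n/a | … / … | … / … | … / … | … / 87.9 |

Sensitivity/specificity and accuracy/AUC for every ion type at every charge, from cross-validation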

Results – Cross-Testing Accuracies on Drosophila Data (Charge 2)
[Table: Sn, Sp, Acc, AUC and combined AUC for ions b, b-H2O, b-NH3, b-H2O-NH3, b (2+), y, y-H2O, y-NH3, y-H2O-NH3 and y (2+)]

MassAnalyzer – Peptide Fragmentation Tool
- Uses a mathematical model to predict fragmentation
- Uses one model for charges 1 and 2 and a separate model for higher charges
Z. Zhang, Anal. Chem. 2004, 76(14), …; Z. Zhang, Anal. Chem. 2005, 77(19), …

Results – Prediction Comparison
[Table: AUC of MassAnalyzer (MA) vs. AUC of the ANN predictor at charges 1, 2 and 3, for ions b, b-H2O, b-NH3, b-H2O-NH3, b (2+), y, y-H2O, y-NH3, y-H2O-NH3 and y (2+)]

Results – Prediction Comparison
[Figure: ROC curves, charge 2]

Results – Spectrum Comparison

Future Work
- Reproducibility analysis on various other datasets, accounting for replicate size (the number of replicates for each spectrum)
- Use the predicted spectra to build another predictor that would learn to score a given spectrum

Acknowledgements
- Dr. Predrag Radivojac
- Dr. Haixu Tang
- Dr. Randy J. Arnold
- Lab mates:
  - Amrita Mohan
  - Nils Schimmelmann
  - Wyatt Clark
  - Yong Li
- Linda Hostetter
- Bioinformatics faculty at SOI
- School of Informatics