CSE182 CSE182-L12 Mass Spectrometry Peptide identification.


CSE182 CSE182-L12 Mass Spectrometry Peptide identification

CSE182 General isotope computation
Definition:
– Let $p_{i,a}$ be the abundance of the isotope of element a with mass i Da above the least mass.
– Ex: $p_{0,C}$ is the abundance of C-12, $p_{2,O}$ that of O-18, etc.
– Let $N_a$ denote the number of atoms of element a in the sample.
Goal: compute the heights of the isotopic peaks. Specifically, compute $P_i = \mathrm{Prob}\{M+i\}$, for i = 0, 1, 2, …

CSE182 Characteristic polynomial
We define the characteristic polynomial of a peptide as follows:
$\phi(x) = \prod_a \left(p_{0,a} + p_{1,a} x + p_{2,a} x^2 + \dots\right)^{N_a}$
$\phi(x)$ is a concise representation of the isotope profile.

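CSE182 Characteristic polynomial computation
Suppose carbon was the only atom with an isotope, C-13. Then $\phi(x) = (p_{0,C} + p_{1,C} x)^{N_C}$, so $\mathrm{Prob}\{M+i\} = \binom{N_C}{i}\, p_{1,C}^{\,i}\, p_{0,C}^{\,N_C - i}$.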

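CSE182 General isotope computation
Definition:
– Let $p_{i,a}$ be the abundance of the isotope of element a with mass i Da above the least mass.
– Ex: $p_{0,C}$ is the abundance of C-12, $p_{2,O}$ that of O-18, etc.
Characteristic polynomial: $\phi(x) = \prod_a \left(p_{0,a} + p_{1,a} x + p_{2,a} x^2 + \dots\right)^{N_a}$
$\mathrm{Prob}\{M+i\}$: the coefficient of $x^i$ in $\phi(x)$ (a binomial convolution).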
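The coefficient extraction above can be sketched as repeated polynomial multiplication. This is a minimal illustration, not the lecture's own code: the function names, the `max_terms` cutoff, and the rounded abundance values are assumptions made for the example.

```python
# Approximate natural isotope abundances (rounded textbook values, an assumption
# of this sketch). Index i holds the abundance of the isotope that is i Da
# heavier than the lightest isotope of the element.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def poly_mul(a, b, max_terms):
    """Multiply two polynomials (coefficient lists), keeping the first max_terms terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_terms)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_profile(composition, max_terms=5):
    """Return [Prob{M+0}, Prob{M+1}, ...] for an elemental composition.

    composition maps an element symbol to its atom count N_a. The result holds
    the low-order coefficients of the characteristic polynomial phi(x).
    """
    profile = [1.0]
    for element, count in composition.items():
        for _ in range(count):  # one factor of the polynomial per atom
            profile = poly_mul(profile, ABUNDANCE[element], max_terms)
    return profile


if __name__ == "__main__":
    # Hypothetical composition used only to exercise the function.
    print(isotope_profile({"C": 20, "H": 30, "N": 5, "O": 6}))
```

Truncating to `max_terms` does not change the retained coefficients, because a term of degree i only receives contributions from terms of degree at most i.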

CSE182 Isotopic Profile Application (Not in Syllabus)
– In DxMS, hydrogen atoms are exchanged with deuterium.
– The rate of exchange indicates how buried the peptide is (in the folded state).
– Consider the observed characteristic polynomials of the isotope profile, $\phi_{t_1}, \phi_{t_2}, \dots$, at various time points. The estimates of $p_{1,H}$ can then be obtained by a deconvolution.
– Such estimates at various time points should give the rate of incorporation of deuterium, and therefore the accessibility.

CSE182 Quiz
– How can you determine the charge on a peptide?
– The difference between the first and second isotope peaks is 1/Z.
– Proposal: given a mass, predict a composition and the isotopic profile; do a 'goodness of fit' test to isolate the peaks corresponding to the isotopes; compute the difference.
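The last step of the proposal, turning the isotope spacing into a charge, might look like the following. The function name is hypothetical, and the sketch assumes the two isotope peaks have already been isolated and that neighbouring isotopes differ by about 1 Da.

```python
def charge_from_isotope_spacing(mz_first, mz_second):
    """Estimate the charge Z from the m/z of the first two isotope peaks.

    Neighbouring isotope peaks are about 1 Da apart in mass, so they are
    1/Z apart on the m/z axis.
    """
    spacing = mz_second - mz_first
    if spacing <= 0:
        raise ValueError("the second isotope peak must lie above the first")
    return round(1.0 / spacing)


if __name__ == "__main__":
    print(charge_from_isotope_spacing(500.0, 500.5))
```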

CSE182 Ion mass computations
Amino acids are linked into peptide chains by forming peptide bonds.
Residue mass: ResMass(aa) = MolMass(aa) - 18 (loss of water).

CSE182 Peptide chains
MolMass(SGFAL) = ResMass(S) + … + ResMass(L) + 18
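As a worked instance of the peptide-chain formula, the sketch below uses nominal (integer) residue masses, matching the integer offsets 18, 1 and 19 used on these slides. The table and function names are illustrative assumptions; real software would use monoisotopic masses.

```python
# Nominal (integer) residue masses in Da: amino-acid mass minus one water.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18


def residue_mass_sum(peptide):
    """Sum of the residue masses of a peptide (the parent mass M used for PRMs)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def mol_mass(peptide):
    """Molecular mass of the intact peptide: residue masses plus one water."""
    return residue_mass_sum(peptide) + WATER


if __name__ == "__main__":
    print(mol_mass("SGFAL"))          # 87 + 57 + 147 + 71 + 113 + 18
    print(residue_mass_sum("SGEK"))   # the M = 401 used in the later example
```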

CSE182 M/Z values for b/y-ions
Singly charged b-ion = ResMass(prefix) + 1
Singly charged y-ion = ResMass(suffix) + 18 + 1
What if the ions have higher units of charge?
[Figure: chemical structures of the ionized peptide and its fragment ions.]
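One way to answer the question about higher charge states: each extra charge adds one proton to the mass, and the total is divided by the charge. The sketch below assumes a nominal proton mass of 1 Da, as the slide does, and the function names are hypothetical.

```python
PROTON = 1   # nominal proton mass, as in the "+1" on the slide
WATER = 18   # nominal mass of water


def b_ion_mz(prefix_residue_mass, charge=1):
    """m/z of a b-ion: prefix residues plus `charge` protons, divided by the charge."""
    return (prefix_residue_mass + charge * PROTON) / charge


def y_ion_mz(suffix_residue_mass, charge=1):
    """m/z of a y-ion: suffix residues plus water plus `charge` protons, over the charge."""
    return (suffix_residue_mass + WATER + charge * PROTON) / charge


if __name__ == "__main__":
    print(b_ion_mz(87))        # singly charged b-ion of prefix S
    print(y_ion_mz(128))       # singly charged y-ion of suffix K
    print(b_ion_mz(144, 2))    # doubly charged b-ion of prefix SG
```

With `charge=1` both functions reduce to the singly charged formulas on the slide.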

CSE182 De novo interpretation
Given a spectrum (a collection of b-y ions), compute the peptide that generated the spectrum. A database of peptides is not given!
Useful?
– Many genomes have not been sequenced
– Tagging/filtering
– PTMs

CSE182 De Novo Interpretation: Example
[Figure: spectrum of the peptide SGEK, with b-ions and y-ions marked along the M/Z axis.]
Ion offsets: b = P + 1; y = S + 19 = M - P + 19

CSE182 Computing possible prefixes
We know the parent mass M = 401. Consider a mass value 88. Assume that it is a b-ion, or a y-ion.
– If b-ion, it corresponds to a prefix of the peptide with residue mass 88 - 1 = 87.
– If y-ion, y = M - P + 19. Therefore the prefix has mass P = M - y + 19 = 401 - 88 + 19 = 332.
Compute all possible Prefix Residue Masses (PRM) for all ions.
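The two offsets can be applied to every peak in one pass. This is a small sketch with a hypothetical function name; it uses the nominal offsets from the slide and assumes singly charged ions.

```python
def prefix_residue_masses(peaks, parent_mass):
    """Map each peak to its two putative PRMs: (if b-ion, if y-ion).

    b-ion: P = peak - 1
    y-ion: P = M - peak + 19
    """
    return {peak: (peak - 1, parent_mass - peak + 19) for peak in peaks}


if __name__ == "__main__":
    # The slide's example: M = 401 and a peak at 88.
    print(prefix_residue_masses([88], 401))
```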

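CSE182 Putative Prefix Masses
[Table: putative prefix masses for each peak of the SGEK spectrum (M = 401), one column assuming a b-ion and one assuming a y-ion.]
Only a subset of the prefix masses are correct. The correct mass values form a ladder of amino-acid residues.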

CSE182 Spectral Graph
Each prefix residue mass (PRM) corresponds to a node. Two nodes are connected by an edge if the mass difference is a residue mass. A path in the graph is a de novo interpretation of the spectrum.

CSE182 Spectral Graph
Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
Spectral graph:
– Each node u defines a putative prefix residue mass M(u).
– (u,v) is in E if M(v) - M(u) is the residue mass of an a.a. (tag) or 0.
– Paths in the spectral graph correspond to an interpretation.
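The graph definition can be sketched directly from a list of PRMs. The names below are illustrative, the residue masses are nominal integers, and as a simplification of this sketch, PRMs with equal mass are merged into one node rather than linked by a zero-mass edge.

```python
# Nominal (integer) residue masses in Da.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
RESIDUE_MASSES = set(RESIDUE_MASS.values())


def spectral_graph(prms):
    """Build the spectral graph as an adjacency list keyed by PRM.

    There is an edge u -> v whenever v - u equals the mass of one residue.
    """
    nodes = sorted(set(prms))
    return {u: [v for v in nodes if v - u in RESIDUE_MASSES] for u in nodes}


if __name__ == "__main__":
    # PRMs of the peaks 88, 145, 147, 276 for M = 401, plus the end points 0 and M.
    graph = spectral_graph([0, 87, 332, 144, 275, 146, 273, 401])
    for node, successors in graph.items():
        print(node, "->", successors)
```

In this example the path 0, 87, 144, 273, 401 spells S, G, E, K, while spurious edges such as 87 -> 273 (a mass difference of 186, tryptophan) show why additional constraints are needed.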

CSE182 Re-defining de novo interpretation
Find a subset of nodes in the spectral graph s.t.
– 0 and M are included;
– each peak contributes at most one node (interpretation) (*);
– each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass);
– an appropriate objective function (ex: the number of peaks interpreted) is maximized.

CSE182 Two problems
Too many nodes.
– Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem).
Multiple interpretations.
– Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
– In general, the forbidden pairs problem is NP-hard.

CSE182 Too many nodes
We will use other properties to decide if a peak is a b-y peak or not. For now, assume that $\delta(u)$ is a score function for a peak u being a b-y ion.

CSE182 Multiple Interpretation
Each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem). In general, the forbidden pairs problem is NP-hard. However, the b,y ions have a special non-interleaving property.
Consider pairs $(b_1, y_1)$, $(b_2, y_2)$:
– If $b_1 < b_2$, then $y_1 > y_2$.

CSE182 Non-Intersecting Forbidden pairs
If we consider only b,y ions, 'forbidden' node pairs are non-intersecting. The de novo problem can then be solved efficiently using a dynamic programming technique.

CSE182 The forbidden pairs method
Sort the PRMs according to increasing mass values.
– For each node u, f(u) represents its forbidden pair.
– Let m(u) denote the mass value of the PRM.
– Let $\delta(u)$ denote the score of u.
Objective: find a path of maximum score with no forbidden pairs.

CSE182 D.P. for forbidden pairs
Consider all pairs u, v with m[u] < M/2 and m[v] > M/2.
Define S(u,v) as the best score of a pair of paths, 0->u and v->M, that contains no forbidden pair.
Is it sufficient to compute S(u,v) for all u,v?

CSE182 D.P. for forbidden pairs
Note that the best interpretation is given by $\max_{(u,v) \in E} S(u,v)$.

CSE182 D.P. for forbidden pairs
Note that we have one of two cases:
1. Either u > f(v) (and f(u) < v),
2. Or u < f(v) (and f(u) > v).
Case 1: extend u, do not touch f(v): $S(u,v) = \max_{u' : (u',u) \in E,\ u' \neq f(v)} S(u',v) + \delta(u)$.

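CSE182 The complete algorithm
for all u /* increasing mass values from 0 to M/2 */
  for all v /* decreasing mass values from M to M/2 */
    if (u < f[v])
      S[u,v] = max over v' with (v,v') ∈ E, v' ≠ f[u] of S[u,v'] + δ(v)
    else if (u > f[v])
      S[u,v] = max over u' with (u',u) ∈ E, u' ≠ f[v] of S[u',v] + δ(u)
    if (u,v) ∈ E /* maxI is the score of the best interpretation */
      maxI = max {maxI, S[u,v]}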
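A runnable sketch of this dynamic program is given below, under several stated assumptions rather than as the lecture's implementation: masses are nominal integers, each peak simply adds 1 to the score of both of its putative PRMs (standing in for the score function of a node), PRMs of equal mass are merged, and the function and variable names are hypothetical. With the slide's offsets, the two PRMs of one peak sum to M + 18, so the sketch tests "u versus f(v)" by comparing u + v with that sum.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
MASS_TO_AA = {}
for _aa, _m in RESIDUE_MASS.items():
    MASS_TO_AA.setdefault(_m, _aa)  # on ties (L/I, K/Q) keep the first letter


def forbidden_pairs_denovo(peaks, parent_mass):
    """Return (score, PRM path, peptide) of the best path with no forbidden pair."""
    # Node scores: every peak votes for its b- and its y-interpretation.
    score = {0: 0, parent_mass: 0}
    for p in peaks:
        for prm in (p - 1, parent_mass - p + 19):
            if 0 < prm < parent_mass:
                score[prm] = score.get(prm, 0) + 1

    mirror = parent_mass + 18  # u and v are a forbidden pair iff u + v == mirror
    left = sorted(m for m in score if 2 * m < mirror)
    right = sorted((m for m in score if 2 * m > mirror), reverse=True)
    neg = float("-inf")

    def edge(a, b):
        return (b - a) in MASS_TO_AA

    S = {}  # S[u, v] = (best score of paths 0->u and v->M, predecessor state)
    best = (neg, None)
    for u in left:          # increasing mass
        for v in right:     # decreasing mass
            if (u, v) == (0, parent_mass):
                S[u, v] = (0, None)
            elif u + v == mirror:
                S[u, v] = (neg, None)            # u and v are a forbidden pair
            elif u + v > mirror:                 # u is innermost: it was added last
                cands = [(S[w, v][0] + score[u], (w, v))
                         for w in left if w < u and edge(w, u)]
                S[u, v] = max(cands, default=(neg, None))
            else:                                # v is innermost: it was added last
                cands = [(S[u, w][0] + score[v], (u, w))
                         for w in right if w > v and edge(v, w)]
                S[u, v] = max(cands, default=(neg, None))
            if edge(u, v) and S[u, v][0] > best[0]:
                best = (S[u, v][0], (u, v))

    if best[1] is None:
        return None
    nodes, state = set(), best[1]
    while state is not None:                     # walk back through predecessors
        nodes.update(state)
        state = S[state][1]
    path = sorted(nodes)
    peptide = "".join(MASS_TO_AA[b - a] for a, b in zip(path, path[1:]))
    return best[0], path, peptide


if __name__ == "__main__":
    print(forbidden_pairs_denovo([88, 145, 147, 276], 401))
```

On the four peaks 88, 145, 147, 276 with M = 401, the sketch recovers the path 0, 87, 144, 273, 401, that is, SGEK. The state (87, 332) is rejected as a forbidden pair because both nodes come from the peak at 88.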

CSE182 De Novo: Second issue
Given only b,y ions, a forbidden pairs path will solve the problem. However, recall that there are MANY other ion types.
– Typical length of peptide: 15
– Typical # peaks?
– # b/y ions?
– Most ions are "other": a-ions, neutral losses, isotopic peaks, …

CSE182 De novo: Weighting nodes in Spectrum Graph Factors determining if the ion is b or y –Intensity (A large fraction of the most intense peaks are b or y) –Support ions –Isotopic peaks

CSE182 De novo: Weighting nodes
A probabilistic network to model support ions (PepNovo)

CSE182 De Novo Interpretation Summary
The main challenge is to separate b/y ions from everything else (weighting nodes), and to separate the prefix ions from the suffix ions (forbidden pairs).
As always, the abstract idea must be supplemented with many details.
– Noise peaks, incomplete fragmentation
– In reality, a PRM is first scored on its likelihood of being correct, and the forbidden pairs method is applied subsequently.
In spite of these algorithms, de novo identification remains an error-prone process. When the peptide is in the database, db search is the method of choice.

CSE182 The dynamic nature of the cell
– The proteome of the cell is changing.
– Various extra-cellular and other signals activate pathways of proteins.
– A key mechanism of protein activation is post-translational (PT) modification.
– These pathways may lead to other genes being switched on or off.
– Mass spectrometry is key to probing the proteome.

CSE182 What happens to the spectrum upon modification?
Consider the peptide MSTYER. Either S, T, or Y (one or more) can be phosphorylated. Upon phosphorylation, the b- and y-ions shift in a characteristic fashion. Can you determine where the modification has occurred?
If T is phosphorylated, $b_3, b_4, b_5, b_6$ and $y_4, y_5, y_6$ will shift.

CSE182 Effect of PT modifications on identification
The shifts do not affect de novo interpretation too much. Why?
Database matching algorithms are affected, and must be changed.
Given a candidate peptide and a spectrum, can you identify the sites of modification?

CSE182 Db matching in the presence of modifications
Consider MSTYER. The number of modifications can be obtained from the difference in parent mass.
If there is 1 phosphorylation, we have 3 possibilities:
– MS*TYER
– MST*YER
– MSTY*ER
Which of these is the best match to the spectrum?
If 2 phosphorylations occurred, we would again have 3 possibilities (MS*T*YER, MS*TY*ER, MST*Y*ER), and the number of candidates grows combinatorially with the number of sites and modifications. Can you compute more efficiently?

CSE182 Scoring spectra in the presence of modification
Can we predict the sites of the modification? A simple trick can let us predict the modification sites.
– Consider the peptide ASTYER. The peptide may have 0, 1, or 2 phosphorylation events.
– The difference of the parent mass will give us the number of phosphorylation events. Assume it is 1.
– Create a table with the number of b,y ions matched at each breakage point, assuming 0 or 1 modifications.
– Arrows determine the possible paths. Note that there are only 2 downward arrows.
– The max scoring path determines the phosphorylated residue.
[Figure: scoring table over the residues A S T Y E R, one row per number of modifications.]
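The table-and-arrows trick can be sketched as a small dynamic program over breakage points and the number of modifications placed so far. The details here are assumptions of the sketch, not taken from the slide: nominal integer masses, a phosphorylation shift of 80 Da, singly charged b/y ions, at most one modification per residue, and hypothetical function names.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def localize_modifications(peptide, peaks, n_mods, shift=80, sites="STY"):
    """Return (matched b/y ions, modified positions) for the best placement.

    Positions are 1-based. best[j] is the best path through the table that has
    placed j modifications in the prefix processed so far.
    """
    peaks = set(peaks)
    n = len(peptide)
    total = sum(RESIDUE_MASS[aa] for aa in peptide) + n_mods * shift
    neg = float("-inf")
    best = [(0, ())] + [(neg, ())] * n_mods
    prefix = 0
    for i, aa in enumerate(peptide, start=1):
        prefix += RESIDUE_MASS[aa]
        new = []
        for j in range(n_mods + 1):
            options = [best[j]]                 # horizontal arrow: residue i unmodified
            if j > 0 and aa in sites:           # downward arrow: modify residue i
                s, pos = best[j - 1]
                options.append((s, pos + (i,)))
            s, pos = max(options, key=lambda t: t[0])
            if i < n:                           # score the breakage after residue i
                p = prefix + j * shift          # prefix residue mass with j modifications
                s += (p + 1 in peaks) + (total - p + 19 in peaks)
            new.append((s, pos))
        best = new
    return best[n_mods]


if __name__ == "__main__":
    # Hypothetical spectrum: all b- and y-ions of ASTYER phosphorylated on T.
    spectrum = [72, 159, 340, 503, 632, 175, 304, 467, 648, 735]
    print(localize_modifications("ASTYER", spectrum, n_mods=1))
```

Each cell is filled once, so the work grows with (peptide length) × (number of modifications) instead of with the number of explicit site combinations.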

CSE182 Modifications
Modifications significantly increase the time of search. The algorithm speeds it up somewhat, but the search is still expensive.