Protein sequencing and Mass Spectrometry
Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation
Single Stage MS Mass Spectrometry LC-MS: 1 MS spectrum / second
Tandem MS Secondary Fragmentation Ionized parent peptide
The peptide backbone
[Figure: N-terminus H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH C-terminus, showing AA residues i-1, i, i+1 with side chains R_{i-1}, R_i, R_{i+1}]
The peptide backbone breaks to form fragments with characteristic masses.
Ionization
[Figure: the same backbone, N-terminus H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH C-terminus, carrying a proton H+; labeled "Ionized parent peptide"]
Fragment ion generation
[Figure: the backbone broken between the CO of residue i-1 and the NH of residue i, H...-HN-CH-CO | NH-CH-CO-NH-CH-CO-…OH, carrying a proton H+; labeled "Ionized peptide fragment"]
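Tandem MS for Peptide ID
[Figure: MS/MS spectrum, % Intensity vs. m/z, with the b-ion and y-ion ladders marked; residue labels K, L, E, D, E, E, L, F, G, S run between m/z 147 and 1166, and the [M+2H]2+ peak is indicated]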
Peak Assignment
[Figure: the same spectrum, % Intensity vs. m/z, with peaks labeled y2 to y9, b3 to b9, and [M+2H]2+]
Peak assignment implies sequence (residue tag) reconstruction!
Database Searching for peptide ID
For every peptide from a database
–Generate a hypothetical spectrum
–Compute a correlation between the hypothetical and experimental spectra
–Choose the best-scoring peptide
Database searching is very powerful and is the de facto standard for MS.
–SEQUEST, Mascot, and many others
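The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses nominal (integer) residue masses, singly charged b and y ions with the offsets b = P+1 and y = S+19, and a plain shared-peak count as a stand-in for the correlation score. The function names are hypothetical, and this is not how SEQUEST or Mascot actually score.

```python
# Nominal (integer) residue masses: a simplification for illustration.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def hypothetical_spectrum(peptide):
    """Singly charged b- and y-ion masses (b = prefix + 1, y = suffix + 19)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = set()
    prefix = 0
    for m in masses[:-1]:            # every internal cleavage site
        prefix += m
        ions.add(prefix + 1)             # b-ion
        ions.add(total - prefix + 19)    # complementary y-ion
    return ions


def shared_peak_score(observed, peptide, tol=0):
    """Number of observed peaks within tol of some hypothetical ion."""
    theo = hypothetical_spectrum(peptide)
    return sum(1 for p in observed if any(abs(p - t) <= tol for t in theo))


def best_match(observed, database):
    """Database peptide with the highest shared-peak score."""
    return max(database, key=lambda pep: shared_peak_score(observed, pep))
```

For instance, feeding the hypothetical spectrum of SGFLEEDELK back in as the "observed" peaks and searching a three-peptide toy database should return SGFLEEDELK. Real spectra contain noise and missing ions, so a practical score has to be more forgiving than an exact peak count.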
Spectra: the real story
Noise peaks
Ions, not prefixes & suffixes
Mass-to-charge ratio, and not mass
–Multiply charged ions
Isotope patterns, not single peaks
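Peptide fragmentation possibilities (ion types)
[Figure: backbone -HN-CH-CO-NH-CH-CO-NH- with side-chain labels $R_i$, CH-R', R'' and cleavage sites marked. Low energy fragments: $a_i$, $b_i$, $c_i$, $x_{n-i}$, $y_{n-i}$, $z_{n-i}$, $y_{n-i-1}$, $b_{i+1}$. High energy fragments: $d_{i+1}$, $v_{n-i}$, $w_{n-i}$.]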
Ion types, and offsets
P = prefix residue mass
S = suffix residue mass
b-ions = P+1
y-ions = S+19
a-ions = P-27
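A small worked example of these offsets, as a sketch only. It assumes nominal (integer) residue masses and that the peptide in the "Tandem MS for Peptide ID" figure reads SGFLEEDELK, which is consistent with the 147 and 1166 labels there (K + 19 = 147, total residue mass 1147 + 19 = 1166). `ion_ladder` is a hypothetical helper name.

```python
# Nominal residue masses for the residues used in this example only.
RESIDUE_MASS = {"S": 87, "G": 57, "F": 147, "L": 113, "E": 129, "D": 115, "K": 128}


def ion_ladder(peptide):
    """One row per cleavage site i: b_i, a_i and the complementary y_(n-i)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    rows = []
    prefix = 0
    for i, m in enumerate(masses[:-1], start=1):
        prefix += m                  # P: mass of the first i residues
        suffix = total - prefix      # S: mass of the remaining n - i residues
        rows.append({
            "i": i,
            "b": prefix + 1,         # b-ions = P + 1
            "a": prefix - 27,        # a-ions = P - 27
            "y": suffix + 19,        # y-ions = S + 19
            "y_index": len(masses) - i,
        })
    return rows


for row in ion_ladder("SGFLEEDELK"):
    print(row)
```

Because P + S is the total residue mass M, every complementary pair satisfies b + y = M + 20. That relation is what lets a single peak be read either as a b-ion or as a y-ion, yielding two candidate prefix residue masses.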
Mass-Charge ratio
The X-axis is (M+Z)/Z
–Z=1 implies that the peak is at M+1
–Z=2 implies that the peak is at (M+2)/2
–Example: M=1000, Z=2, peak position is at 501
–Suppose you see a peak at 501. Is the mass 500, or is it 1000?
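The ambiguity in the last bullet can be made concrete with a short sketch. It takes the proton mass as exactly 1, as the slide does, and the function names are hypothetical.

```python
def peak_position(mass, charge):
    """Position on the m/z axis of a peptide of mass M carrying Z protons."""
    return (mass + charge) / charge


def candidate_masses(mz, max_charge=3):
    """Masses consistent with an observed m/z, one per assumed charge state."""
    return {z: mz * z - z for z in range(1, max_charge + 1)}


print(peak_position(1000, 2))   # the slide's example: M=1000, Z=2
print(candidate_masses(501))    # every charge state gives a different mass
```

A single peak position therefore does not determine the mass. In practice the charge is usually inferred from additional evidence, such as the spacing of isotope peaks, which is 1/Z on the m/z axis.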
Spectral Graph
Each prefix residue mass (PRM) corresponds to a node.
Two nodes are connected by an edge if the mass difference is a residue mass.
A path in the graph is a de novo interpretation of the spectrum.
Spectral Graph
Each peak, when assigned to a prefix/suffix ion type, generates a unique prefix residue mass.
Spectral graph:
–Each node u defines a putative prefix residue mass M(u).
–(u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0.
–Paths in the spectral graph correspond to an interpretation.
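The construction above can be sketched as follows. The assumptions are loud ones: nominal (integer) residue masses, only b- and y-ion interpretations per peak (PRM = peak - 1, or PRM = M - (peak - 19)), and nodes with equal mass merged into one, so the zero-difference edges are not represented. All names are hypothetical.

```python
from itertools import combinations

RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Invert the table: a mass difference may match several residues (L/I, K/Q).
MASS_TO_RESIDUES = {}
for aa, mass in RESIDUE_MASS.items():
    MASS_TO_RESIDUES.setdefault(mass, []).append(aa)


def peak_to_prms(peak, parent_mass):
    """Two putative PRMs for one peak: read as a b-ion, or as a y-ion."""
    return (peak - 1, parent_mass - (peak - 19))


def spectral_graph(peaks, parent_mass):
    """Nodes are PRMs (plus 0 and M); edges join PRMs one residue apart."""
    nodes = {0, parent_mass}
    for p in peaks:
        nodes.update(peak_to_prms(p, parent_mass))
    nodes = sorted(n for n in nodes if 0 <= n <= parent_mass)
    edges = {}
    for u, v in combinations(nodes, 2):   # nodes are sorted, so u < v
        residues = MASS_TO_RESIDUES.get(v - u)
        if residues:
            edges[(u, v)] = residues
    return nodes, edges
```

With peaks 88, 145 and 292 and a parent residue mass of 1147, the graph contains the edges 0 to 87 (S), 87 to 144 (G) and 144 to 291 (F), alongside edges created by the competing y-ion readings of the same peaks.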
Re-defining de novo interpretation
Find a subset of nodes in the spectral graph s.t.
–0, M are included
–Each peak contributes at most one node (interpretation) (*)
–Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass)
–An appropriate objective function (ex: the number of peaks interpreted) is maximized
Two problems
Too many nodes.
–Only a small fraction correspond to b/y ions (leading to true PRMs) (learning problem)
–Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
–In general, the forbidden pairs problem is NP-hard
However...
The b,y ions have a special non-interleaving property.
Consider pairs $(b_1, y_1)$, $(b_2, y_2)$
–If $b_1 < b_2$, then $y_1 > y_2$
Non-Intersecting Forbidden pairs
If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting.
The de novo problem can then be solved efficiently using a dynamic programming technique.
The forbidden pairs method There may be many paths that avoid forbidden pairs. We choose a path that maximizes an objective function, –EX: the number of peaks interpreted
The forbidden pairs method
Sort the PRMs according to increasing mass values.
For each node u, f(u) represents its forbidden pair partner (the other PRM generated by the same peak).
Let m(u) denote the mass value of the PRM u.
D.P. for forbidden pairs
Consider all pairs u,v
–$m[u] < M/2$, $m[v] > M/2$
Define S(u,v) as the best score of a pair of partial paths, 0->u and v->M, that together avoid forbidden pairs.
Is it sufficient to compute S(u,v) for all u,v?
D.P. for forbidden pairs
Note that the best interpretation is given by $\max_{(u,v) \in E} S(u,v)$
D.P. for forbidden pairs
Note that we have one of two cases.
1. Either $u < f(v)$ (and $f(u) > v$)
2. Or $u > f(v)$ (and $f(u) < v$)
Case 1.
–Extend u, do not touch f(v)
The complete algorithm
for all u /* increasing mass values from 0 to M/2 */
  for all v /* decreasing mass values from M to M/2 */
    if (u > f[v])
      /* Case 2: update S[u,v] */
    else if (u < f[v])
      /* Case 1: update S[u,v] (extend u, do not touch f(v)) */
    if (u,v) ∈ E
      /* maxI is the score of the best interpretation */
      maxI = max {maxI, S[u,v]}
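One way to make the loop structure above concrete is the following sketch. It is not necessarily the recurrence on the original slide. It assumes the forbidden pairs are already nested and indexed from the outside in (pair 0 is the pair of endpoints 0 and M, which are both always used), uses nominal residue masses, scores a path by the number of peaks interpreted, and fills the table by "pulling" from states in which the innermost node had not yet been added. All function names are hypothetical.

```python
RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}


def is_edge(a, b):
    """Edge a -> b (a < b) if the mass difference is one residue."""
    return (b - a) in RESIDUE_MASSES


def pairs_from_peaks(peaks, total):
    """Forbidden pairs (b-reading, y-reading) per peak, sorted outside-in."""
    pairs = {(0, total)}
    for p in peaks:
        as_b, as_y = p - 1, total - (p - 19)
        pairs.add((min(as_b, as_y), max(as_b, as_y)))
    pairs = sorted(pairs)
    return [lo for lo, _ in pairs], [hi for _, hi in pairs]


def best_forbidden_pair_path(lower, upper, edge=is_edge):
    """Max number of peaks on a 0 -> M path using each pair at most once.

    Assumes lower[0] < lower[1] < ... < upper[1] < upper[0] (nested pairs).
    S[i][j] scores a prefix path 0 -> lower[i] plus a suffix path upper[j] -> M.
    """
    k = len(lower)
    neg = float("-inf")
    S = [[neg] * k for _ in range(k)]
    S[0][0] = 0                       # only the endpoints 0 and M are used
    best = neg
    for i in range(k):                # u: increasing mass
        for j in range(k):            # v: decreasing mass
            if i > j:                 # lower[i] is innermost: it was added last
                S[i][j] = max((S[w][j] + 1 for w in range(i)
                               if edge(lower[w], lower[i])), default=neg)
            elif j > i:               # upper[j] is innermost: it was added last
                S[i][j] = max((S[i][w] + 1 for w in range(j)
                               if edge(upper[j], upper[w])), default=neg)
            # i == j > 0 stays -inf: both nodes of one pair are never used
            if S[i][j] > neg and edge(lower[i], upper[j]):
                best = max(best, S[i][j])   # the two halves join into a path
    return None if best == neg else best
```

As a usage example, for the toy peptide SGFL (total residue mass 404) with observed peaks 88 (b1), 279 (y2) and 292 (b3), `pairs_from_peaks` yields the nested pairs (87, 335), (131, 291) and (144, 278), and the best path 0, 87, 144, 291, 404 interprets all three peaks. The table has O(k^2) entries and each is filled in O(k) time, so the sketch runs in O(k^3) for k peaks.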