Published by Alivia Goodison. Modified over 10 years ago.
1
The peptide de novo sequencing from MS/MS spectrum
Kaizhong Zhang
Department of Computer Science, University of Western Ontario, London, Ontario, Canada
Joint work with Bin Ma, Gilles Lajoie, Amanda Doherty-Kirby, Chengzhi Liang, Ming Li
2
Introduction
• Tandem mass spectrometry (MS/MS) now plays a very important role in protein identification because of its speed and high sensitivity.
• Deriving the sequence of a peptide from its MS/MS spectrum is an important task in proteomics.
• Derivation without the help of a protein database is called de novo sequencing; it is especially important for identifying unknown proteins.
3
Introduction (2)
• The basic experimental steps of this method are the following:
  1. The proteins are digested with an enzyme to produce peptides;
  2. The peptides are charged (ionized) and separated according to their different mass-to-charge (m/z) ratios;
  3. Each peptide is fragmented into fragment ions, and the m/z values of the fragment ions are measured.
4
Introduction (3)
• Steps 2 and 3 are both performed within a tandem mass spectrometer.
• Since many copies of each peptide are fragmented and the fragmentation can occur anywhere along the peptide, a spectrum of the observed m/z values is obtained.
5
Mass spectrum
• For each possible fragment ion there could be a peak at the corresponding m/z value.
• The height of the peak is proportional to the frequency with which that m/z value is observed by the mass spectrometer.
• In general, proteins consist of 20 different types of amino acids, most of which have different masses (the one exception is the pair leucine and isoleucine).
6
Mass spectrum (2)
• Consequently, different peptides usually produce different spectra.
• It is therefore possible, and now common practice, to use the spectrum of a peptide to determine its sequence.
7
Peptide fragmentation
• A charged peptide may be fragmented into two pieces in three ways, which may produce a pair of a- and x-ions, a pair of b- and y-ions, or a pair of c- and z-ions.
• Theoretically, a fragmentation can occur at any place in a peptide, and a spectrum is expected to contain all the possible ion peaks.
• In practice, because the bonds at different positions differ in strength, different ions occur with different frequencies.
8
Peptide fragmentation (2)
9
Peptide fragmentation (3)
• The most abundant ions are y-ions, which often form a complete series in a spectrum.
• Next most abundant are a- and b-ions, many of which are not observed.
• The c-, x-, and z-ions occur much less frequently.
• In addition, these ions can often form new ions through loss of water or loss of ammonia.
10
The approximate masses of some atoms that appear in peptides, where ¹³C is an isotope of C

Atom            C     ¹³C   H     O     N
Mass (Dalton)   12    13    1     16    14
11
Mass of an amino acid
• For any amino acid $a$, we use $\|a\|$ to denote the mass of $\mathrm{C_2H_2RNO}$, i.e., the amino acid $a$ after the loss of one water molecule.
• For a sequence of amino acids $P = a_1 a_2 \dots a_k$, let $\|P\| = \sum_{1 \le j \le k} \|a_j\|$.
• The actual mass of peptide $P$ is therefore $18 + \|P\|$, because of the extra $\mathrm{H_2O}$ in it.
12
The approximate masses of the 20 amino acids

Amino acid      A        R        N        D
Mass (Dalton)   71.04    156.10   114.04   115.03

Amino acid      C        E        Q        G
Mass (Dalton)   103.01   129.04   128.06   57.02

Amino acid      H        I        L        K
Mass (Dalton)   137.06   113.08   113.08   128.09

Amino acid      M        F        P        S
Mass (Dalton)   131.04   147.07   97.05    87.03

Amino acid      T        W        Y        V
Mass (Dalton)   101.05   186.08   163.06   99.07
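The residue-mass definition and the table can be combined into a small lookup. The following is a minimal sketch in Python; the names `RESIDUE_MASS`, `residue_sum` and `peptide_mass` are illustrative and do not come from the slides.

```python
# Approximate residue masses (Dalton), copied from the slide's table.
# I and L share the same mass, so they cannot be told apart by mass alone.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03,
    "C": 103.01, "E": 129.04, "Q": 128.06, "G": 57.02,
    "H": 137.06, "I": 113.08, "L": 113.08, "K": 128.09,
    "M": 131.04, "F": 147.07, "P": 97.05, "S": 87.03,
    "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}
WATER = 18.0  # approximate mass of H2O (2 * 1 + 16)


def residue_sum(peptide):
    """||P||: the sum of the residue masses of the peptide."""
    return sum(RESIDUE_MASS[a] for a in peptide)


def peptide_mass(peptide):
    """Actual peptide mass, 18 + ||P||."""
    return WATER + residue_sum(peptide)
```

For example, `residue_sum("GA")` adds 57.02 and 71.04, and `peptide_mass("GA")` adds 18 for the extra water.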
13
The hypothetical spectrum of P
• For a sequence of amino acids $A = a_1 a_2 \dots a_n$, we introduce two notations:
  $\|A\|_b = 1 + \|A\|$
  $\|A\|_y = 19 + \|A\|$
14
The hypothetical spectrum of P (2)
• Let $b_i$ be the mass of the b-ion of $P$ with $i$ amino acids; then $b_i = \|a_1 a_2 \dots a_i\|_b$ $(1 \le i < k)$.
• Let $y_i$ be the mass of the y-ion of $P$ with $i$ amino acids; then $y_i = \|a_{k-i+1} \dots a_k\|_y$ $(1 \le i < k)$.
Clearly, $y_{k-i} + b_i = 20 + \|P\|$.
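The b- and y-ion definitions, and the identity linking them, can be checked numerically. This is a minimal sketch in Python; `MASS` holds only four residues from the table to keep the example short, and the function names are illustrative.

```python
# Subset of the residue-mass table, enough for the example peptide (assumption).
MASS = {"G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05}


def b_ions(peptide):
    """b_i = 1 + ||a_1 ... a_i|| for 1 <= i < k."""
    return [1 + sum(MASS[a] for a in peptide[:i]) for i in range(1, len(peptide))]


def y_ions(peptide):
    """y_i = 19 + ||a_{k-i+1} ... a_k|| for 1 <= i < k."""
    return [19 + sum(MASS[a] for a in peptide[-i:]) for i in range(1, len(peptide))]


peptide = "GASP"
k = len(peptide)
total = sum(MASS[a] for a in peptide)  # ||P||
b, y = b_ions(peptide), y_ions(peptide)

# Check y_{k-i} + b_i = 20 + ||P|| for every cut position i (lists are 0-based).
for i in range(1, k):
    assert abs(b[i - 1] + y[k - i - 1] - (20 + total)) < 1e-6
```

Each cut position splits the peptide into one prefix (the b-ion) and one suffix (the y-ion), which is why the two masses always add up to the same constant.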
15
The hypothetical spectrum of P (3)
• Around each y-ion peak, it is possible to have other peaks.
• For each y-ion with mass $x$, the corresponding x-ion and z-ion weigh $x+26$ and $x-17$.
• An ion may lose a water molecule, generating a peak at mass $x-18$.
• An ion with mass $x$ usually has a peak at $x+1$, corresponding to the isotopic ion that contains a ¹³C.
16
The hypothetical spectrum of P (4)
• Therefore, for each y-ion with mass $x$, there are possible peaks at the masses in the following set:
  $Y(x) = \{x-18,\ x-17,\ x,\ x+1,\ x+26\}$
• Similarly, for each b-ion with mass $x$, the possible masses are from the following set:
  $B(x) = \{x-28,\ x-18,\ x,\ x+1,\ x+17\}$
17
The hypothetical spectrum of P (5)
• Therefore, the hypothetical spectrum of the peptide $P$ has peaks at each mass in the following set, where $k$ is the number of amino acids in $P$:
  $S(P) = \bigcup_{0 < i < k} \bigl( B(b_i) \cup Y(y_i) \bigr)$
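The construction of S(P) can be sketched directly from the sets Y(x) and B(x). The Python sketch below is illustrative only: the function names are assumptions, `MASS` is a two-residue subset of the table, and the ion-type labels on the B offsets follow standard ion nomenclature rather than anything stated on the slides.

```python
# Two residues from the mass table, enough for the example (assumption).
MASS = {"G": 57.02, "A": 71.04}

# Offsets around a y-ion of mass x: water loss, z-ion, y-ion, isotope, x-ion.
Y_OFFSETS = (-18, -17, 0, 1, 26)
# Offsets around a b-ion of mass x: a-ion, water loss, b-ion, isotope, c-ion.
B_OFFSETS = (-28, -18, 0, 1, 17)


def hypothetical_spectrum(peptide):
    """S(P): union of B(b_i) and Y(y_i) over all cut positions 0 < i < k."""
    masses = set()
    for i in range(1, len(peptide)):
        b_i = 1 + sum(MASS[a] for a in peptide[:i])    # prefix of length i
        y_i = 19 + sum(MASS[a] for a in peptide[-i:])  # suffix of length i
        masses.update(round(b_i + d, 2) for d in B_OFFSETS)
        masses.update(round(y_i + d, 2) for d in Y_OFFSETS)
    return sorted(masses)


spectrum = hypothetical_spectrum("GA")
print(spectrum)
```

Using a set means that a mass produced by two different ions appears only once in S(P).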
18
The de novo sequencing problem
• Let $P$ be a peptide and $M = \|P\| + 20$.
• Given a solution containing peptide $P$, a tandem mass spectrometer can measure a peak list $L$.
• $L$ is a set of pairs $\{(x_i, h_i) \mid 1 \le i \le n\}$, where $0 < x_1 < \dots < x_n$ are the masses and $h_i$ is the intensity of the peak at $x_i$.
• The total mass of $P$, which equals $M - 2$, can also be measured.
19
The de novo sequencing problem (2)
• The masses given by the spectrometer are not accurate.
• The maximum error varies from 0.01 dalton to 0.5 dalton, depending on the type of spectrometer used.
20
The de novo sequencing problem (3)
• Let $\delta$ be the error of the spectrometer.
• Let $S$ be a set of masses. We say a peak $(x, h)$ in $L$ is supported by $S$ if there is a $y$ in $S$ such that $|x - y| < \delta$.
• The subset of peaks in $L$ supported by $S$ is denoted by $L_S$:
  $L_S = \{(x, h) \in L \mid \text{there is } y \in S \text{ s.t. } |x - y| < \delta\}$
21
The de novo sequencing problem (4)
• Therefore $L_{S(P)}$ consists of all the peaks in $L$ that are supported by the masses of the hypothetical ions of $P$.
• The more high-intensity peaks there are in $L_{S(P)}$, the more likely it is that $L$ is the mass spectrum of $P$.
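The notion of supported peaks can be sketched as a filter over the peak list. The Python sketch below assumes peaks are `(mass, intensity)` tuples; the name `supported_peaks` and the binary-search lookup are illustrative choices, not taken from the slides.

```python
import bisect


def supported_peaks(peaks, masses, delta):
    """Return the peaks (x, h) with some mass y in `masses` where |x - y| < delta."""
    ordered = sorted(masses)
    result = []
    for x, h in peaks:
        j = bisect.bisect_left(ordered, x)
        # The closest mass to x is either just below or just above position j.
        neighbours = ordered[max(j - 1, 0):j + 1]
        if any(abs(x - y) < delta for y in neighbours):
            result.append((x, h))
    return result


peaks = [(58.0, 10), (90.3, 7), (100.0, 5)]
masses = [58.02, 90.04]
print(supported_peaks(peaks, masses, delta=0.1))
print(supported_peaks(peaks, masses, delta=0.5))
```

With the tighter tolerance only the peak near 58.02 is supported; widening the tolerance to 0.5 also admits the peak at 90.3.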
22
The de novo sequencing problem (5)
• For any peak list $L'$, we define $h(L') = \sum_{(x, h) \in L'} h$.
• The de novo sequencing problem is defined as follows.
• Given a mass spectrum $L$, a positive number $M$, and an error bound $\delta$, construct a peptide $P$ such that $\bigl|\, \|P\| + 20 - M \,\bigr| < \delta$ and $h(L_{S(P)})$ is maximized.
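To make the objective concrete, the problem statement can be solved by exhaustive search on a toy instance. The Python sketch below is not the algorithm presented in this talk: it enumerates every candidate sequence, which takes exponential time, and it uses a three-residue alphabet and a made-up peak list purely for illustration. All function names are assumptions.

```python
# Three residues only, to keep the exhaustive search tiny (assumption).
MASS = {"G": 57.02, "A": 71.04, "S": 87.03}
B_OFFSETS = (-28, -18, 0, 1, 17)  # B(x) from the slides
Y_OFFSETS = (-18, -17, 0, 1, 26)  # Y(x) from the slides


def residue_sum(peptide):
    return sum(MASS[a] for a in peptide)


def hypothetical_spectrum(peptide):
    """S(P): masses around every b_i and y_i for 0 < i < k."""
    masses = set()
    for i in range(1, len(peptide)):
        b_i = 1 + residue_sum(peptide[:i])
        y_i = 19 + residue_sum(peptide[-i:])
        masses.update(b_i + d for d in B_OFFSETS)
        masses.update(y_i + d for d in Y_OFFSETS)
    return masses


def score(peaks, peptide, delta):
    """h(L_{S(P)}): total intensity of supported peaks, each counted once."""
    s = hypothetical_spectrum(peptide)
    return sum(h for x, h in peaks if any(abs(x - m) < delta for m in s))


def candidates(target, delta, prefix="", total=0.0):
    """Yield every sequence whose residue sum is within delta of target."""
    if prefix and abs(total - target) < delta:
        yield prefix
    for a, m in MASS.items():
        if total + m < target + delta:
            yield from candidates(target, delta, prefix + a, total + m)


def brute_force_de_novo(peaks, M, delta):
    """Best (peptide, score) with | ||P|| + 20 - M | < delta, or None if none exists."""
    best = max(candidates(M - 20, delta),
               key=lambda p: score(peaks, p, delta), default=None)
    return None if best is None else (best, score(peaks, best, delta))


# Made-up peak list: the b- and y-ions of GAS with arbitrary intensities.
peaks = [(58.02, 5), (106.03, 10), (129.06, 5), (177.07, 10)]
result = brute_force_de_novo(peaks, M=235.09, delta=0.1)
print(result)
```

On this instance the mass constraint leaves only the six orderings of G, A and S, and GAS is the only one supported by all four peaks.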
23
Algorithms
• There are two major difficulties in the de novo sequencing problem.
• First, each fragmentation may produce a pair of ions.
• This means that both ends of the spectrum must be considered at the same time.
24
Algorithms (2)
• Second, the types of the peaks are unknown, and a peak may be matched by zero, one, or two different types of ions.
• When a peak is matched by two ions, the height of the peak can be counted only once.
25
Algorithms (3)
• The straightforward approach of “growing” the peptide from one terminal to the other does not work.
• We use a more sophisticated dynamic programming algorithm for the de novo sequencing problem.
• Our algorithm gradually “grows” a prefix and a suffix of the optimal solution along a carefully designed pathway until the prefix and the suffix are sufficiently long to form the optimal solution.
26
Experiments
• Our model and algorithm account for most of the ion types that have been observed in practice.
• Overlaps of two different ions are correctly modeled.
• The algorithm tolerates mass error and handles missing ions in the spectrum.
27
Experiments (2)
• Experimental results demonstrated that our algorithm performed extremely well.
• The program has been integrated into a software package, PEAKS, which is now accessible online at http://www.BioinformaticsSolutions.com