PEAKS: De Novo Sequencing using Tandem Mass Spectrometry Bin Ma Dept. of Computer Science University of Western Ontario.


1 PEAKS: De Novo Sequencing using Tandem Mass Spectrometry Bin Ma Dept. of Computer Science University of Western Ontario

2 Outline: Background; Sandwich algorithm for de novo sequencing; Software implementation – PEAKS

3 Background: Diseases are closely related to abnormal proteins. Identifying the proteins in a given tissue, together with their post-translational modifications, is a fundamental problem in proteomics. Tandem mass spectrometry (MS/MS) is the most common method for protein identification.

4 Sample Preparation [Figure: tissue → fraction → gel → peptides (add trypsin; e.g. MPSER ……, GTDIMR, PAK ……) → HPLC → to MS/MS]

5 Tandem Mass Spectrometer [Figure: ESI QTOF instrument. Labels: quadrupole mass analyzer, collision, parent ions, fragment ions, TOF mass analyzer, ion detector, peptide sequencing. Example parent ions: MPSER, SG…, PAK; the parent ion PAK fragments into P + AK or PA + K.]

6 [Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometry → MS/MS spectrum → de novo sequencing or database → LGSSEVEQVQLVVDGVK]

7 How Does a Peptide Fragment? For a peptide A_1 A_2 A_3 A_4:
–y-ions: m(y_1) = 19 + m(A_4); m(y_2) = 19 + m(A_4) + m(A_3); m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)
–b-ions: m(b_1) = 1 + m(A_1); m(b_2) = 1 + m(A_1) + m(A_2); m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)
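The mass relations on this slide can be checked with a few lines of code. The following is a minimal sketch, assuming nominal (integer) residue masses and the slide's offsets of 1 for b-ions and 19 for y-ions. The function name `fragment_masses` and the small `RESIDUE_MASS` table are illustrative choices, not part of PEAKS.

```python
# Nominal residue masses in Da for a few amino acids (illustrative subset).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "L": 113, "K": 128, "E": 129, "R": 156, "Y": 163,
}


def fragment_masses(peptide):
    """Return (b_ions, y_ions) for a peptide, using the slide's offsets.

    b_i = 1 + mass of the first i residues
    y_i = 19 + mass of the last i residues
    Only proper fragments (i = 1 .. n-1) are listed.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [1 + sum(masses[:i]) for i in range(1, n)]
    y_ions = [19 + sum(masses[n - i:]) for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    # PAK is the example peptide from the spectrometer slide.
    print(fragment_masses("PAK"))
```

For PAK this gives b-ions at 98 and 169 and y-ions at 147 and 218 under the nominal masses assumed here.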

8 Matching Sequence with Spectrum

9 De Novo Sequencing (Dancik et al., JCB 6:327-342.) –Given a spectrum and a mass value M, compute a sequence P such that m(P)=M and the matching score is maximized. –We take the matching score of P to be the sum of the scores of the matched peaks. –We use the intensity of a peak as its score to illustrate PEAKS’ algorithm.

10 Spectrum Graph Approach: Convert the peak list to a graph. A peptide sequence corresponds to a path in the graph.
–Bartels (1990), Biomed. Environ. Mass Spectrom. 19:363-368.
–Taylor and Johnson (1997), Rapid Comm. Mass Spec. 11:1067-1075. (Lutefisk)
–Dancik et al. (1999), JCB 6:327-342.
–Chen et al. (2001), JCB 8:325-337.
–……
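To make the idea of a spectrum graph concrete, here is a minimal sketch. It is not the construction used by any of the cited tools. It assumes integer masses, exact mass matching, and nodes that are all y-ion masses; `build_spectrum_graph` and the residue table are hypothetical names introduced for illustration.

```python
# Nominal residue masses in Da (illustrative subset with distinct masses).
RESIDUE_MASS = {"A": 71, "S": 87, "P": 97, "K": 128}


def build_spectrum_graph(node_masses, residues=RESIDUE_MASS):
    """Connect two nodes when their mass difference equals one residue mass.

    Returns a list of edges (low_mass, high_mass, amino_acid).
    Residues with equal mass would collide in by_mass; the table avoids that.
    """
    by_mass = {mass: aa for aa, mass in residues.items()}
    nodes = sorted(node_masses)
    edges = []
    for i, low in enumerate(nodes):
        for high in nodes[i + 1:]:
            aa = by_mass.get(high - low)
            if aa is not None:
                edges.append((low, high, aa))
    return edges


if __name__ == "__main__":
    # Nodes: 19 (empty suffix), the y-ions of PAK (147, 218), and 19 + M = 315.
    for edge in build_spectrum_graph([19, 147, 218, 315]):
        print(edge)
```

In this toy case the only path from 19 to 315 spells K, A, P. Because the nodes are y-ions, the path is read from the C-terminus, so the peptide is PAK.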

11 Warm up – Counting Only Y-ions

12 The Score of a Suffix: Let Q be a suffix of the peptide. It determines some y-ions. score(Q) is the sum of the scores of those y-ions of Q. [Figure: y-ions y_1, y_2, y_3 of the suffix; label 19]

13 Recursive Computation of DP(m): Suppose Q is such that DP(m)=score(Q). Then score(Q’)=DP(m(Q’)). Do not know a? [Figure labels: a, Q’, 19]

14 Dynamic Programming
1. for m from 0 to M
2. backtracking
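The warm-up case (y-ions only) can be sketched in code. The recurrence on the slide did not survive, so the one below is an assumption consistent with slides 12 and 13: DP(m) is the best score of a suffix of mass m, obtained by prepending one letter a to a best suffix of mass m - m(a) and adding the intensity of the peak at m + 19. Masses are nominal integers, and `denovo_y_only` is a hypothetical name.

```python
NEG_INF = float("-inf")

# Nominal residue masses in Da (illustrative subset).
RESIDUE_MASS = {"P": 97, "A": 71, "K": 128, "S": 87}


def denovo_y_only(peaks, total_mass, residues=RESIDUE_MASS):
    """Find a sequence of mass total_mass maximizing the matched y-ion intensity.

    peaks maps an integer peak mass to its intensity.
    Returns (sequence, score), or (None, -inf) if no sequence has that mass.
    """
    dp = [NEG_INF] * (total_mass + 1)
    first_letter = [None] * (total_mass + 1)
    dp[0] = 0.0
    # Step 1: fill DP(m) for m from 0 to M.
    for m in range(1, total_mass + 1):
        for aa, w in residues.items():
            if w <= m and dp[m - w] > NEG_INF:
                cand = dp[m - w] + peaks.get(m + 19, 0.0)
                if cand > dp[m]:
                    dp[m] = cand
                    first_letter[m] = aa
    if dp[total_mass] == NEG_INF:
        return None, NEG_INF
    # Step 2: backtracking from M, peeling off the first letter each time.
    seq = []
    m = total_mass
    while m > 0:
        aa = first_letter[m]
        seq.append(aa)
        m -= residues[aa]
    return "".join(seq), dp[total_mass]


if __name__ == "__main__":
    # y-ion peaks of PAK: y_1 = 147, y_2 = 218; total residue mass M = 296.
    print(denovo_y_only({147: 10.0, 218: 20.0}, 296))
```

The table has total_mass + 1 entries and each entry tries every letter, so this sketch runs in time proportional to M times the alphabet size.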

15 Counting Both y and b Ions

16 Good News [Figure: y-ions y_1, y_2, y_3 and b-ions b_{n-1}, b_{n-2}, b_{n-3}]

17 Bad News

18 Ions Determined By a Pair: P=LGEY, Q=LLVR. score(P,Q) is the sum of the matched peak intensities. A peak can only count once.
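The rule that a peak can only count once is easy to get wrong, so a small sketch may help. The slide does not spell out exactly which ions a pair determines. The code below assumes the simplest reading: the b-ions of the prefixes of P and the y-ions of the suffixes of Q, with nominal masses and the offsets 1 and 19 from slide 7. `pair_score` is a hypothetical name.

```python
# Nominal residue masses in Da (illustrative subset).
RESIDUE_MASS = {
    "G": 57, "P": 97, "V": 99, "L": 113, "D": 115,
    "E": 129, "R": 156, "Y": 163,
}


def pair_score(prefix, suffix, peaks):
    """Sum the intensities of the peaks matched by a (prefix, suffix) pair.

    Ion masses are collected in a set, so a peak explained by both a b-ion
    and a y-ion contributes its intensity only once.
    """
    ion_masses = set()
    running = 0
    for aa in prefix:  # b-ions of the prefixes of P
        running += RESIDUE_MASS[aa]
        ion_masses.add(1 + running)
    running = 0
    for aa in reversed(suffix):  # y-ions of the suffixes of Q
        running += RESIDUE_MASS[aa]
        ion_masses.add(19 + running)
    return sum(peaks.get(mass, 0.0) for mass in ion_masses)


if __name__ == "__main__":
    # The slide's pair: b-ions 114, 171, 300, 463 and y-ions 175, 274, 387, 500.
    print(pair_score("LGEY", "LLVR", {114: 1.0, 175: 2.0, 500: 4.0}))
    # b_1 of D and y_1 of P both fall at mass 116; the peak is counted once.
    print(pair_score("D", "P", {116: 5.0}))
```

Summing per-ion scores separately for P and Q would count the peak at 116 twice in the second call, which is the problem the chummy-pair machinery on the following slides is designed to avoid.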

19 Chummy Pairs: Two strings P and Q are called a chummy pair iff either of the following two conditions is true: (C1) (C2)

20 Recursive Computation of score(P,Q): P=LGEY, Q=LLVR, u=m(P), v=m(Q)

21 Chummy pairs Lemma 1 – Suppose P and Q are a chummy pair. u=m(P), v=m(Q). If (C1) is true, If (C2) is true,

22 Chummy Pairs
Lemma 2 – Let (P,Q) be a chummy pair, a be a letter.
–(C1) ⇒ (P,aQ) is a chummy pair but (Pa,Q) is not.
–(C2) ⇒ (Pa,Q) is a chummy pair but (P,aQ) is not.
Lemma 3 – Let S be the optimal solution. Then there is a chummy pair (P,Q) and a letter a such that S=PaQ. Also, there is a chummy pair series such that

23 Dynamic Programming: Combining Lemmas 1, 2, and 3, we can compute DP(u,v). Suppose (P,Q) is the pair maximizing DP(u,v) under the condition m(P)+m(Q)+m(a)=M. Then PaQ is the optimal peptide.

24 Algorithm Sandwich
DP(0,0) = 0; DP(u,v) = -infinity for (u,v)!=(0,0);
for u from 1 to M/2 step d do
  for v from u-m(W) to u+m(W) step d do
    for a in Σ do
      if u<v then
      else
find u,v,a, s.t. u+v+m(a)=M and DP(u,v) maximized;
backtracking;
Time:

25 PEAKS – The Software

26 Comparison
–LCQ data (ion trap instrument): generously provided by Dr. Richard Johnson. 144 spectra.
–Micromass Q-Tof data: measured in UWO’s Protein ID lab. 61 spectra.
–Sciex Q-Star data: provided by U. Victoria’s Genome BC Proteomics Centre. 13 good/okay spectra.

27 PEAKS vs. Lutefisk
–completely correct sequences: 38/144 vs. 15/144
–correct amino acids: 1067/1702 vs. 767/1702
–partially correct sequences with 5 or more contiguous correct amino acids: 94/144 vs. 64/144
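The counts on this slide are easier to compare as percentages. The short sketch below only does that arithmetic on the numbers given for the LCQ data set; the helper name `percent` and the dictionary layout are illustrative.

```python
def percent(correct, total):
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


# (PEAKS, Lutefisk) counts from the LCQ comparison, as (correct, total).
LCQ_RESULTS = {
    "completely correct sequences": ((38, 144), (15, 144)),
    "correct amino acids": ((1067, 1702), (767, 1702)),
    "5 or more contiguous correct amino acids": ((94, 144), (64, 144)),
}

if __name__ == "__main__":
    for label, (peaks, lutefisk) in LCQ_RESULTS.items():
        print(f"{label}: PEAKS {percent(*peaks)}% vs. Lutefisk {percent(*lutefisk)}%")
```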

28 PEAKS vs. Micromass PLGS
–completely correct sequences: 13/61 vs. 7/61
–correct amino acids: 456/764 vs. 232/764
–partially correct sequences with 5 or more contiguous correct amino acids: 38/61 vs. 24/61

29 PEAKS vs. Sciex BioAnalyst
–completely correct sequences: 7/13 vs. 1/13
–correct amino acids: 115/150 vs. 86/150
–partially correct sequences with 5 or more contiguous correct amino acids: 12/13 vs. 7/13

30 Users: The company logos have been deleted from the original presentation. Please visit http://www.bioinformaticssolutions.com for a list of users.

31 Other Techniques Used by PEAKS
Preprocess the MS/MS spectra
–Deconvolution, noise reduction, and signal enhancement.
–It does a better job than the spectrometer vendors’ software.
Recalibration
–Compress/stretch the spectrum to correct for calibration error.
Positional Confidence
–Estimate the confidence level of individual amino acids.

32 Sophisticated Ion Matching Score Score of one peak matching b ion

33 PEAKS 2.x’s Additional Feature Identify the proteins by matching the de novo (partial) sequences. Then further match the spectra with the peptides of the proteins.

34 Collaborators and References
Sandwich algorithm:
–B. Ma, K. Zhang, C. Liang, CPM’03. (sandwich algorithm)
PEAKS:
–B. Ma, K. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, G. Lajoie, Rapid Comm. Mass Spec. (software feature, score function, experiments)
Acknowledgement:
–PEAKS development team (Bioinformatics Solutions Inc.).


