Algorithmic Problems in Peptide Sequencing

(Journal of Computational Biology, 2001) (SODA, 2000)

De Novo Sequencing for Peptide Identification
Outline: Basics of Proteomics; Roles and Anatomy of Proteins; Tandem Mass Spectrometry; Algorithms for Peptide Identification; De Novo Sequencing; An Algorithm for Perfect Spectra; Peptide Identification in the Real World; Discussions.

Briefings
We mainly focus on the following result: Ting Chen, Ming-Yang Kao, Matthew Tepel, John Rush and George Church, A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry, Journal of Computational Biology, 8(3): 325-337, 2001. A preliminary version appeared in the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pages 389-398, 2000. It is one of the most-cited algorithm articles in the computational proteomics community.

Outline: Basics of Proteomics; Roles and Anatomy of Proteins; Tandem Mass Spectrometry; Algorithms for Peptide Identification; De Novo Sequencing; An Algorithm for Perfect Spectra; An Improved Version; Peptide Identification in the Real World; Discussions.

Anatomy of Protein Molecules
[Figure: two structures side by side. One, labeled "Neutral peptide", is H-NH-CH(Rx)-CO-OH, the stable state in nature. The other, labeled "Residue (of the peptides)", is -NH-CH(Rx)-CO-, the basic building block.]

Proteins and Peptides
[Figure: a peptide chain H2N-...-COOH with side chains R1 to R5 is hydrolyzed by trypsin (+ H2O) into two shorter peptides. Rectangles stand for amino acid residues.]
Trypsin cleaves at arginine (R) or lysine (K); barring surprises, the cut should fall at K or R. Masses shown on the slide: K 146.19 (128.17 as a residue), R 174.13 (156.11 as a residue). In both cases the two values differ by one water molecule (about 18.02).

Amino Acid Molecules
Please visit http://www.ionsource.com/ for more information.

Tandem Mass Spectrometry
Mass spectrometers measure the mass-to-charge ratio (m/z) of charged ions. A mass spectrometer has three major components: an ionizer, a mass analyzer, and a detector. [Figure: the sample passes through the ionizer, the mass analyzer, and the detector.] (Adapted from Nathan Edwards' slides.)

Proteomics via Mass Spectrometers
[Figure: workflow with the stages enzymatic digest and fractionation, first-stage MS, precursor selection and dissociation, and MS/MS. The slide is annotated "Nobel Prize in 2002".] (Adapted from Nathan Edwards' slides.)

Peptide Identification
Given: an MS/MS spectrum (m/z and intensity values, possibly along with the retention time) and the precursor mass.
Output: the amino acid sequence of the peptide.
Analogy: imagine a deck of cards that you can cut many times, each time obtaining the sum of the upper or the lower half.

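Peptide Fragmentation Mechanism
Fragmentation yields b-ions, which contain the N-terminus, and y-ions, which contain the C-terminus. With w(aa) denoting the total residue mass of the fragment, a b-ion appears at 1 + w(aa) and a y-ion at 19 + w(aa). [Figure: the peptide L-G-E-R on an m/z axis with its b-ions and y-ions marked.]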

Peaks in a Spectrum
Peptide: L-G-E-R
Weight | Ion | Amino Acids | Amino Acids | Ion | Weight
114.2 | b1 | L | GER | y3 | 361.3
171.2 | b2 | LG | ER | y2 | 304.3
300.3 | b3 | LGE | R | y1 | 175.2
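The table above can be reproduced from the two fragmentation formulas. The following is a minimal sketch, not the presenter's code: the average residue masses in `AVG_RESIDUE_MASS` and the function name `b_y_ions` are assumptions introduced for illustration, and the offsets 1 and 19 are the rounded values used on the slides.

```python
# Average residue masses (assumed values, rounded to two decimals).
AVG_RESIDUE_MASS = {"L": 113.16, "G": 57.05, "E": 129.12, "R": 156.19}

def b_y_ions(peptide):
    """Return one row per cleavage site: (b label, prefix, b m/z, y label, suffix, y m/z)."""
    masses = [AVG_RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    rows = []
    prefix_mass = 0.0
    for i in range(1, len(peptide)):
        prefix_mass += masses[i - 1]
        b = 1 + prefix_mass             # b-ion: 1 + residue mass of the prefix
        y = 19 + (total - prefix_mass)  # y-ion: 19 + residue mass of the suffix
        rows.append((f"b{i}", peptide[:i], b, f"y{len(peptide) - i}", peptide[i:], y))
    return rows

for b_label, prefix, b, y_label, suffix, y in b_y_ions("LGER"):
    print(f"{b:6.1f} {b_label} {prefix:<3} | {suffix:<3} {y_label} {y:6.1f}")
```

Because the residue masses here are rounded, the last decimal may differ slightly from the slide. Each row also satisfies b + y = (sum of residue masses + 18) + 2, the complementarity relation that the dynamic programming slides rely on.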

Manual De Novo Sequencing
The difference between two adjacent peaks, 667.27 - 536.24 = 131.03, matches the residue mass of methionine (M). Likewise, a peak at 147.11 gives 147.11 - 19 ≈ 128.09, the residue mass of lysine (K).
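The manual lookup can be sketched as a tolerance search over a residue mass table. This is an illustration only: the monoisotopic masses in `MONO_RESIDUE_MASS`, the function name `match_residue`, and the 0.02 Da tolerance are assumptions, not values taken from the slides.

```python
# Monoisotopic residue masses in Da (assumed table; L and I are indistinguishable by mass).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def match_residue(delta, tol=0.02):
    """Return every residue whose mass lies within tol of the mass difference delta."""
    return [aa for aa, mass in MONO_RESIDUE_MASS.items() if abs(delta - mass) <= tol]

print(match_residue(667.27 - 536.24))  # gap between two adjacent peaks
print(match_residue(147.11 - 19))      # y1 ion minus the y-ion offset used on the slide
```

With this tolerance the first call returns only M and the second only K. A looser tolerance would also admit Q for the second call, which is why mass accuracy matters for de novo sequencing.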

De Novo Sequencing
De novo is Latin for "from the beginning". Database search tools, by contrast, match spectra against known peptides.
Problem definition: given a spectrum (a set of real intervals) and a mass value M, compute a sequence P (an ordered list of real numbers, the residue masses) such that m(P) = M and the matching score is maximized. Here m(P) is the sum of the residue masses.

De Novo Sequencing: An Ideal Case
An ideal tandem mass spectrum is noise-free, contains only b- and y-ions, and every mass peak has the same height. The task is to find paths connecting the two endpoints of a directed acyclic graph. The question is how to construct the ion ladder. We can model this problem as a partial digest problem.

Ion Ladders in an Ideal Case
Based on an ideal ion ladder, we can determine the sequence by concatenating prefixes (or suffixes) in order. However, we cannot determine the ion type of a peak before identifying it. For example, given only the peaks L+, ER+, LGE+ and R+ of the peptide LGER, we do not know in advance which are prefixes and which are suffixes. [Figure: the peaks on an m/z axis, with y1, y2 and y3 marked.]

NC-Spectrum Model
We generate a (superset of a) ladder of ions. The trick: even if we cannot determine the ion types, we know that each ion is either a b-ion or a y-ion. Assume that we want to generate the b-ion ladder. If a peak is a b-ion, add the peak value to the list. If a peak is a y-ion, add the complementary b-ion value to the list. Since we do not know which case holds, we add both, so this phase doubles the number of peaks.

NC-Spectrum Model
For the peptide sequence LGER, we construct all possible b-ions with respect to the current spectrum. Pi are observed peaks (P1 = L, P2 = R, P3 = ER, P4 = LGE) and Qi are the artificial complementary peaks (for example Q3 = LG and Q1 = GER). Both {P1, Q3, P4} and {P2, P3, Q1} are complete ladders. [Figure: the points Pi and Qi on a line from 0 to m, with m/2 marked.]

NC-Spectrum Model
Given a peak list {P1, P2, P3, ..., Pk}, each peak Pi contributes two points along the line: Pi - 1 (the prefix mass if Pi is a b-ion) and Qi = M - Pi + 1 (the prefix mass if Pi is a y-ion). Why? Recall the peptide fragmentation mechanism: a y-ion weighs 19 plus its suffix mass, and prefix mass plus suffix mass equals M - 18, so the complementary prefix mass is (M - 18) - (Pi - 19) = M - Pi + 1. We still have to add the two endpoints, 0 and M - 18.
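The coordinate rule can be sketched in a few lines. This is a hedged illustration: the function name `nc_coordinates` is made up here, the example uses nominal (integer) residue masses L = 113, G = 57, E = 129, R = 156 as an assumption so that the arithmetic is exact, and duplicate coordinates are merged because complementary peaks (such as LGE+ and R+) generate the same pair.

```python
def nc_coordinates(peaks, precursor_mass):
    """Return the sorted vertex coordinates of the NC-spectrum graph.

    Each peak p yields p - 1 (prefix mass if p is a b-ion) and
    precursor_mass - p + 1 (prefix mass if p is a y-ion).
    The endpoints 0 and precursor_mass - 18 are always included.
    """
    coords = {0, precursor_mass - 18}
    for p in peaks:
        coords.add(round(p - 1, 2))
        coords.add(round(precursor_mass - p + 1, 2))
    return sorted(coords)

# Peaks L+ = 114, R+ = 175, LGE+ = 300, ER+ = 304 for LGER with nominal masses;
# neutral peptide mass M = 455 + 18 = 473.
print(nc_coordinates([114, 175, 300, 304], 473))
# → [0, 113, 170, 174, 299, 303, 360, 455]
```

Four peaks give only three distinct pairs here, (113, 360), (170, 303) and (174, 299), which is consistent with the "at most 2k+2 vertices" bound.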

NC-Spectrum Model: A Summary
We are given k peaks, so we now have at most 2k+2 vertices. Two vertices are adjacent if their coordinates differ by the weight of some amino acid. The spectrum graph can be constructed in O(n^2) time, where n is the number of vertices. (Why? It suffices to test every pair of vertices.) De novo sequencing then searches for a good path (or paths) from coordinate 0 to M - 18. Such a path is not necessarily an ion ladder, though. All we can say is that when suffixes are mixed in among prefixes, or prefixes among suffixes, they are less likely to link up into a complete path; a lucky accident still cannot be ruled out.
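The pairwise construction can be sketched as follows. The nominal residue masses in `NOMINAL_MASS`, the function name `spectrum_graph` and the 0.5 Da tolerance are assumptions for illustration; a real implementation would use accurate masses and an instrument-dependent tolerance.

```python
# Nominal (integer) residue masses; an assumed simplification.
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L/I": 113, "N": 114, "D": 115, "K/Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def spectrum_graph(coords, tol=0.5):
    """Return directed edges (low, high, residue) between every pair of
    coordinates whose difference matches a residue mass within tol."""
    coords = sorted(coords)
    edges = []
    for idx, low in enumerate(coords):
        for high in coords[idx + 1:]:          # every pair once: O(n^2) pairs
            for residue, mass in NOMINAL_MASS.items():
                if abs((high - low) - mass) <= tol:
                    edges.append((low, high, residue))
    return edges

for edge in spectrum_graph([0, 113, 170, 174, 299, 303, 360, 455]):
    print(edge)
```

On the LGER coordinates this yields eight edges. One of them is 113 → 299 labeled W (G + E has the same nominal mass as W), so 0 → 113 → 299 → 455 is a path in the graph that skips a vertex pair. That is one concrete way a path can fail to be the intended ion ladder.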

Dynamic Programming Strategy
Dynamic programming can solve this problem efficiently. A uni-directional (forward) DP does not work, since it could produce a solution that contains both candidates of the same peak. [Figure: the points P1 to P4 and Q1 to Q4 on a line from 0 to m, with m/2 marked.]

Dynamic Programming Strategy (Cont'd)
Dynamic programming can solve this problem efficiently if we use a different encoding scheme: we approach the middle from both ends at once. [Figure: the points P1 to P4 and Q1 to Q4 on a line from 0 to m, with m/2 marked.]

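Dynamic Programming Strategy (Cont'd)
Mass(b-ion) + Mass(y-ion) = PrecursorMass + 2, because the b-ion weighs 1 plus the prefix mass, the complementary y-ion weighs 19 plus the suffix mass, and the precursor weighs prefix plus suffix plus 18. As a consequence, the two candidates generated by each peak form nested pairs in the spectrum graph. [Figure: nested pairs around m/2 on a line from 0 to m.]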

Relabeling the Vertices
To encode the spectrum graph by the nested pairs, we relabel the vertices as {0 = x0, x1, x2, ..., xk, yk, ..., y2, y1, y0 = m}, where xi and yi are both generated from the same peak. We go one level further inward in each iteration. Another benefit: as long as we never select both xi and yi while moving inward, the condition that exactly one vertex of each pair is taken is satisfied. [Figure: x0, ..., xk, yk, ..., y0 on a line from 0 to m, with m/2 marked.]

How Dynamic Programming Works
We design a |V|×|V| matrix M that represents partial path candidates. M(i, j) = 1 if and only if a path from x0 to xi and a path from yj to y0 can occur simultaneously in a legal path, that is, for every s with 1 ≤ s ≤ max(i, j), exactly one of xs and ys occurs in the partial paths determined so far. [Figure: the gap between xi and yj around m/2, still undetermined.]

How Dynamic Programming Works (Cont'd)
Vertices in order: x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 (from 0 to m, with m/2 in the middle).
M(0,0) = 1: partial paths x0 and y0.
M(0,1) = 1: partial paths x0 and y1 → y0.
M(1,0) = 1: partial paths x0 → x1 and y0.

How Dynamic Programming Works (Cont'd)
Vertices in order: x0 x1 x2 x3 x4 y4 y3 y2 y1 y0. So far M(0,1) = 1 and M(1,0) = 1.
M(2,0) = 0 (candidate x0 → x1 → x2 and y0): M(1,0) = 1, but we cannot reach x2 from x1, and going from x0 directly would leave pair 1 unused.
M(2,1) = 1 (x0 → x2 and y1 → y0): M(0,1) = 1, and we can reach x2 from x0.

How Dynamic Programming Works (Cont'd)
Vertices in order: x0 x1 x2 x3 x4 y4 y3 y2 y1 y0. So far M(0,1) = 1 and M(1,0) = 1.
M(0,2) = 0 (candidate x0 and y2 → y1 → y0): M(0,1) = 1, but we cannot reach y1 from y2, and going to y0 directly would leave pair 1 unused.
M(1,2) = 1 (x0 → x1 and y2 → y0): M(1,0) = 1, and y2 connects directly to y0.

Dynamic Programming: Preview
In the i-th iteration, we determine and record all possible (partial) paths in [0, xi] and [yi, m]. In other words, the i-th iteration checks every path whose boundary is xi or yi. [Figure: a partial solution x0 ... xi-1 and yt ... y0 with t < i-1; the question is whether to extend it to xi or to yi.]

Dynamic Programming: Preview (Cont'd)
Path extension: how can we reach yi? To calculate M(xj, yi) for all j < i: for every j < i, check whether yi is adjacent to yt and M(xj, yt) = 1 for some t < i. If so, M(xj, yi) = 1; otherwise it is 0. [Figure: x0 ... xj on the left, yi → yt ... y0 on the right.]

Dynamic Programming: Preview (Cont'd)
Path extension: similarly, how can we reach xi? To calculate M(xi, yj) for all j < i: for every j < i, check whether xi is adjacent to xt and M(xt, yj) = 1 for some t < i. If so, M(xi, yj) = 1; otherwise it is 0. [Figure: x0 ... xt → xi on the left, yj ... y0 on the right.]

Dynamic Programming
[Figure: the vertices x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 on a line from 0 to m with m/2 marked, and the matrix M with rows x0 to x4 and columns y0 to y4.]

Dynamic Programming: Initialization
Set M(x0, y0) = 1; all other entries start empty. [Figure: the matrix M with rows x0 to x4 and columns y0 to y4, with a 1 in cell (x0, y0).]

Dynamic Programming: 1st Iteration
We then compute M(1,0) and M(0,1) by checking the arcs (x0, x1) and (y1, y0). [Figure: the vertex line and the matrix M with a 1 in cell (x0, y0).]

Dynamic Programming: Recursion (a)
For j = 2 to k
  For i = 0 to j-2
    (a) If M(i, j-1) = 1 and edge(Xi, Xj) = 1, then M(j, j-1) = 1.
Seen from the y-ion side: check one by one whether the boundary on the x side can be pulled to j. Can we move the leftmost endpoint to xj? [Figure: the vertex line and the matrix M.]

Dynamic Programming: Recursion (b)
For j = 2 to k
  For i = 0 to j-2
    (b) If M(i, j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i, j) = 1.
Seen from the y-ion side: check one by one whether the existing boundary on the y side can be pulled from j-1 to j. Can we move the rightmost endpoint to yj? [Figure: the vertex line and the matrix M.]

Dynamic Programming: Recursion (c)
For j = 2 to k
  For i = 0 to j-2
    (c) If M(j-1, i) = 1 and edge(Xj-1, Xj) = 1, then M(j, i) = 1.
Seen from the b-ion side: check one by one whether the existing boundary on the b side can be pulled from j-1 to j. Can we move the leftmost endpoint to xj? [Figure: the vertex line and the matrix M.]

Dynamic Programming: Recursion (d)
For j = 2 to k
  For i = 0 to j-2
    (d) If M(j-1, i) = 1 and edge(Yj, Yi) = 1, then M(j-1, j) = 1.
Seen from the b-ion side: check one by one whether the boundary on the y side can be pulled to j. Can we move the rightmost endpoint to yj? [Figure: the vertex line and the matrix M.]
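Rules (a) to (d), together with the initialization and first iteration, can be put into one routine. The sketch below is an illustration under stated assumptions, not the authors' implementation: `RESIDUE_MASSES` holds nominal integer masses, `edge` tests exact integer differences, and `x` and `y` are the relabeled coordinate lists with `x[0] = 0` and `y[0] = M - 18`.

```python
# Nominal residue masses (assumed simplification: exact integer matching).
RESIDUE_MASSES = {57, 71, 87, 97, 99, 101, 103, 113, 114, 115,
                  128, 129, 131, 137, 147, 156, 163, 186}

def edge(low, high):
    """True if the two coordinates differ by one residue mass."""
    return (high - low) in RESIDUE_MASSES

def compute_m(x, y):
    """Fill the (k+1) x (k+1) matrix M following rules (a) to (d)."""
    k = len(x) - 1
    M = [[0] * (k + 1) for _ in range(k + 1)]
    M[0][0] = 1                                   # initialization
    if k >= 1:                                    # first iteration
        M[1][0] = int(edge(x[0], x[1]))
        M[0][1] = int(edge(y[1], y[0]))
    for j in range(2, k + 1):
        for i in range(j - 1):                    # i = 0 .. j-2
            if M[i][j - 1] and edge(x[i], x[j]):          # (a)
                M[j][j - 1] = 1
            if M[i][j - 1] and edge(y[j], y[j - 1]):      # (b)
                M[i][j] = 1
            if M[j - 1][i] and edge(x[j - 1], x[j]):      # (c)
                M[j][i] = 1
            if M[j - 1][i] and edge(y[j], y[i]):          # (d)
                M[j - 1][j] = 1
    return M

def feasible(M, x, y):
    """A full path exists if some cell on level k can be closed by an edge xi -> yj."""
    k = len(x) - 1
    return any(M[i][j] and max(i, j) == k and edge(x[i], y[j])
               for i in range(k + 1) for j in range(k + 1))

# LGER with nominal masses: pairs (113, 360), (170, 303), (174, 299).
x = [0, 113, 170, 174]
y = [455, 360, 303, 299]
M = compute_m(x, y)
print(M[1][0], M[2][0], M[2][3], feasible(M, x, y))  # → 1 1 1 True
```

In this example the only cell on the last level is M(2,3), reached through rule (d), and the closing edge x2 → y3 (170 → 299, residue E) completes the path 0, 113, 170, 299, 455.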

Dynamic Programming (Cont'd)
Now for j = 3. As before, seen from the y-ion side, check one by one whether the boundary on the x side can be pulled to j. [Figure: the vertex line and the matrix M.]

Dynamic Programming (Cont'd)
Now for j = 4. As before, seen from the y-ion side, check one by one whether the boundary on the x side can be pulled to j. [Figure: the vertex line and the matrix M.]

Dynamic Programming: Constructing the Answer
Legal path: we start the search from the outermost region of the matrix (the last row and column) and work back level by level: [x4, y4] -> [x3, y3] -> [x2, y2] -> [x1, y1]. We backtrack through M to find each edge of the feasible solution. If M(k, k-1) = 1, search upward for the largest i with M(i, k-1) = 1. If instead M(k, j) = 1 only for some j < k-1, remember that the levels between k and j still have to be traversed, so first step back one level: M(k-1, j) = 1. [Figure: the vertex line and the matrix M.]
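One way to recover the sequence is sketched below. Note the substitution: instead of backtracking through M as the slide describes, this sketch records a predecessor for every reachable cell while filling the table, which is a simpler but equivalent bookkeeping device. The names `RESIDUE`, `edge` and `sequence_ideal` and the nominal integer masses are assumptions for illustration (113 is printed as L although it could be I, and 128 as K although it could be Q).

```python
# Nominal residue masses mapped to one-letter codes (assumed table).
RESIDUE = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
           113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
           137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}

def edge(low, high):
    return (high - low) in RESIDUE

def sequence_ideal(x, y):
    """Return one peptide consistent with the nested pairs, or None."""
    k = len(x) - 1
    prev = {(0, 0): None}          # reachable cell (i, j) -> predecessor cell

    def relax(cell, pred, has_edge):
        if has_edge and pred in prev and cell not in prev:
            prev[cell] = pred

    if k >= 1:
        relax((1, 0), (0, 0), edge(x[0], x[1]))
        relax((0, 1), (0, 0), edge(y[1], y[0]))
    for j in range(2, k + 1):
        for i in range(j - 1):
            relax((j, j - 1), (i, j - 1), edge(x[i], x[j]))          # rule (a)
            relax((i, j), (i, j - 1), edge(y[j], y[j - 1]))          # rule (b)
            relax((j, i), (j - 1, i), edge(x[j - 1], x[j]))          # rule (c)
            relax((j - 1, j), (j - 1, i), edge(y[j], y[i]))          # rule (d)

    for (i, j) in list(prev):
        # A cell on the last level whose two boundaries are joined by an edge.
        if max(i, j) == k and edge(x[i], y[j]):
            used = set()
            cell = (i, j)
            while cell is not None:        # walk the predecessor chain
                used.add(x[cell[0]])
                used.add(y[cell[1]])
                cell = prev[cell]
            path = sorted(used)
            return "".join(RESIDUE[b - a] for a, b in zip(path, path[1:]))
    return None

print(sequence_ideal([0, 113, 170, 174], [455, 360, 303, 299]))  # → LGER
```

The table has O(|V|^2) cells and the final walk visits at most O(|V|) of them, which matches the bounds stated in the review slide.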

Dynamic Programming: Review
Chen et al. create a new NC-spectrum graph G = (V, E), where |V| = 2k+2 and k is the number of mass peaks (ions). Given the NC-spectrum graph, we can solve the ideal de novo peptide sequencing problem in O(|V|^2) time and O(|V|^2) space. Constructing M takes O(|V|^2) time, and constructing a feasible solution from M takes O(|V|) time.

Noise in Real Spectra
The de novo strategy is too fragile to handle frequent errors.
False negative peaks: missing ions break the path, and the algorithm may find a wrong path by concatenating two partial paths.
False positive peaks: the main criticism of the de novo strategy.
The peak value is not the ion mass: peak values represent the mass-to-charge ratio of the ions, and how they are reported depends on the instrument vendor (e.g. Applied Biosystems).

False Positives in Real Spectra
Different types of ions: a-x, b-y, c-z.
Internal fragments and immonium ions.
Neutral losses: loss of water (~18 Da), loss of ammonia (~17 Da).
PTMs (like adding new letters to the alphabet): phosphorylation, glycopeptides.
Isotopes.
Unpurified samples.
Signal processing may help with the first few items, but the last one is really troublesome.

Database Search Tools
MASCOT (http://www.matrixscience.com/): the de facto identification tool.

Database Search Tools (Cont'd)
Brian Searle of Proteome Software informs us:

Peptide and Protein Identification
A brief comparison of popular tools.
Scoring Strategy | Representatives
Correlation, Z-score, posterior probabilities | SEQUEST, MS-Tag, Scope, CIDentify, Popitam, ProbID, PepSearch
Statistical significance (E-values or P-values) | Mascot, Sonar, InsPecT, OMSSA, X!Tandem
De Novo Sequencing:
Pseudo-peaks | PEAKS
Spectrum graphs | Lutefisk, PepNovo, AUDENS
Statistical models | NovoHMM

Wrap Up
MS/MS measures the masses of fragment ions, whereas single-stage MS measures a collection of peptides. We generate ion ladders to reconstruct the peptide sequence. Masses are more reliable than intensities. De novo sequencing is an elegant strategy, but it is not robust, so we need signal preprocessing. Database search tools cannot handle novel proteins, and results from different tools are often inconsistent. Integrating the two approaches may be a way forward.

Some Guys You May Wish to Know
Affiliation | Principal Investigators | Topics
ETH Zurich | Ruedi Aebersold | PeptideAtlas, statistical significance estimation
UCSD | Pavel Pevzner, Vineet Bafna | De novo sequencing: multi-spectra alignment
Waterloo | Bin Ma | De novo sequencing: SPIDER, PEAKS
NIH | Yi-Kuo Yu | Signal calibration, statistical significance estimation
Xerox | Andrew Goldberg, Marshall Bern | PTM
Georgetown | Nathan Edwards | Peptide identification
USC | Ting Chen | De novo sequencing