PEAKS: De Novo Sequencing using Tandem Mass Spectrometry
Bin Ma, Dept. of Computer Science, University of Western Ontario

Outline
– Background
– Sandwich algorithm for de novo sequencing
– Software implementation: PEAKS

Background
Diseases are closely related to abnormal proteins. Given a tissue, identifying the proteins it contains (and their post-translational modifications) is a fundamental problem in proteomics. Tandem mass spectrometry (MS/MS) is the most common method for protein identification.

Sample Preparation
[Figure: tissue → fraction → gel → add trypsin → peptides (MPSER …… GTDIMR PAK ……) → HPLC → to MS/MS]
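The "add trypsin" step cuts each protein into peptides. As a minimal sketch, assuming the commonly used cleavage rule (cut after K or R, but not before P) and no missed cleavages, an in-silico digestion can look like this; `tryptic_digest` is a hypothetical helper, and real digests do contain missed cleavages.

```python
import re


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    # Zero-width split points: preceded by K/R and not followed by P
    # (zero-width splits require Python 3.7 or later).
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if p]  # drop the empty piece after a C-terminal K/R


print(tryptic_digest("MPSERGTDIMRAPAK"))  # ['MPSER', 'GTDIMR', 'APAK']
```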

Tandem Mass Spectrometer
[Figure: ESI QTOF instrument. A quadrupole mass analyzer selects parent ions (e.g. PAK+); collision breaks them into fragment ions (e.g. P + AK, or PA + K); a TOF mass analyzer and ion detector record the fragment ions, from which the peptide sequence is inferred.]

[Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometry → MS/MS spectrum → de novo sequencing (or database search) → LGSSEVEQVQLVVDGVK]

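How Does a Peptide Fragment?
For a peptide $A_1A_2A_3A_4$ (singly charged ions, nominal masses; 19 = water 18 + proton 1, and 1 = proton):
$$m(y_1) = 19 + m(A_4), \quad m(y_2) = 19 + m(A_4) + m(A_3), \quad m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)$$
$$m(b_1) = 1 + m(A_1), \quad m(b_2) = 1 + m(A_1) + m(A_2), \quad m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)$$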
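To make the fragmentation formulas concrete, this sketch computes the b- and y-ion masses of a peptide. It assumes nominal (integer) residue masses and singly charged ions, matching the offsets 1 and 19 above; real spectra need monoisotopic masses and a mass tolerance. `MASS`, `b_ions` and `y_ions` are our own illustrative names.

```python
# Nominal residue masses in Da (assumption: integer masses, as on the slide).
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b_ions(peptide):
    """Masses of b_1 .. b_{n-1}: 1 + sum of the first i residues."""
    out, total = [], 0
    for aa in peptide[:-1]:
        total += MASS[aa]
        out.append(1 + total)
    return out


def y_ions(peptide):
    """Masses of y_1 .. y_{n-1}: 19 + sum of the last i residues."""
    out, total = [], 0
    for aa in reversed(peptide[1:]):
        total += MASS[aa]
        out.append(19 + total)
    return out


print(b_ions("PAK"))  # [98, 169]
print(y_ions("PAK"))  # [147, 218]
```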

Matching Sequence with Spectrum

De Novo Sequencing (Dancik et al., JCB 6: )
– Given a spectrum and a mass value M, compute a sequence P such that m(P) = M and the matching score is maximized. The matching score of P is the sum of the scores of the matched peaks.
– To illustrate PEAKS' algorithm, we use the intensity of a peak as its score.
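The problem statement can be turned into a brute-force reference: score a candidate by summing the intensities of peaks that coincide with one of its b- or y-ions (each peak counted once), and try every sequence of residue mass M. This is a sketch under our assumptions (nominal masses, exact integer matching, a spectrum given as a dict from mass to intensity); it takes exponential time and is only a baseline for tiny M. The function names are hypothetical.

```python
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ion_masses(peptide):
    """All b_i and y_i masses (i = 1..n-1) of a peptide, as a set."""
    ions, prefix = set(), 0
    total = sum(MASS[aa] for aa in peptide)
    for aa in peptide[:-1]:
        prefix += MASS[aa]
        ions.add(1 + prefix)            # b-ion of this prefix
        ions.add(total - prefix + 19)   # y-ion of the complementary suffix
    return ions


def match_score(peptide, spectrum):
    """Sum of intensities of peaks hit by some ion; each peak counts once."""
    return sum(spectrum.get(m, 0.0) for m in ion_masses(peptide))


def brute_force_denovo(M, spectrum):
    """Exhaustive search over all sequences of residue mass M (tiny M only)."""
    best = (float("-inf"), None)

    def extend(seq, mass):
        nonlocal best
        if mass == M:
            score = match_score(seq, spectrum)
            if score > best[0]:
                best = (score, seq)
            return
        for aa, m in MASS.items():
            if mass + m <= M:
                extend(seq + aa, mass + m)

    extend("", 0)
    return best
```

Because peptides such as PAK, PAQ and PAGA share the same ion masses here, the returned sequence is one of several ties; only the score is unique.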

Spectrum Graph Approach
Convert the peak list to a graph; a peptide sequence corresponds to a path in the graph.
– Bartels (1990), Biomed. Environ. Mass Spectrom. 19:
– Taylor and Johnson (1997), Rapid Comm. Mass Spec. 11: (Lutefisk)
– Dancik et al. (1999), JCB 6:
– Chen et al. (2001), JCB 8:
– ……
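A minimal spectrum-graph sketch in the spirit of the approach cited above (not the implementation of any of those tools): each peak contributes two candidate prefix-mass nodes, one reading it as a b-ion (prefix mass x - 1) and one as a y-ion (prefix mass M - (x - 19)); edges join nodes whose difference is a residue mass, and each path from 0 to M spells a peptide as a list of residue masses. Nominal masses and exact matching are assumptions; `spectrum_graph` and `paths` are hypothetical names.

```python
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
RESIDUE_MASSES = set(MASS.values())


def spectrum_graph(M, peaks):
    """Nodes are candidate prefix masses; edges join nodes one residue apart."""
    nodes = {0, M}
    for x in peaks:
        nodes.add(x - 1)           # peak read as a b-ion
        nodes.add(M - (x - 19))    # peak read as a y-ion
    nodes = sorted(n for n in nodes if 0 <= n <= M)
    return {u: [v for v in nodes if v - u in RESIDUE_MASSES] for u in nodes}


def paths(edges, node, M):
    """All residue-mass paths from node to M; each path spells a peptide."""
    if node == M:
        return [[]]
    result = []
    for nxt in edges[node]:
        for rest in paths(edges, nxt, M):
            result.append([nxt - node] + rest)
    return result


edges = spectrum_graph(296, [98, 169, 147, 218])
print(paths(edges, 0, 296))  # [[97, 71, 128]], i.e. P, A, then K or Q
```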

Warm-up – Counting Only y-Ions

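The Score of a Suffix
[Figure: a suffix Q determines the ions $y_1$, $y_2$, $y_3$, each of mass 19 plus the masses of the residues it contains.]
Let Q be a suffix of the peptide. Q determines some y-ions, and score(Q) is the sum of the scores of the y-ions determined by Q.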

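Recursive Computation of DP(m)
Let DP(m) be the maximum of score(Q) over suffixes Q with m(Q) = m, and suppose Q attains it. Write Q = aQ'. We do not know the first letter a in advance, so every letter is tried. Since score(Q') = DP(m(Q')),
$$DP(m) = I(19 + m) + \max_{a} DP\bigl(m - m(a)\bigr), \qquad DP(0) = 0,$$
where I(x) is the intensity of the peak at mass x (0 if there is none).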
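The definition of score(Q) and the recursion it satisfies can be checked directly. In this sketch (nominal masses and exact matching assumed; `suffix_score` is a hypothetical name), a suffix contributes one y-ion per sub-suffix, so prepending a letter adds exactly one new ion at 19 + m(aQ').

```python
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def suffix_score(Q, spectrum):
    """score(Q): summed intensity of the y-ions determined by suffix Q.

    Every sub-suffix of Q (including Q itself) gives a y-ion of mass
    19 + (its residue mass).
    """
    total, score = 0, 0.0
    for aa in reversed(Q):
        total += MASS[aa]
        score += spectrum.get(19 + total, 0.0)
    return score


spec = {147: 1.0, 218: 1.0}
# Recursion: score(aQ') = score(Q') + I(19 + m(aQ')).
m_full = sum(MASS[a] for a in "PAK")
print(suffix_score("PAK", spec) == suffix_score("AK", spec) + spec.get(19 + m_full, 0.0))
```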

Dynamic Programming
1. For m from 0 to M, compute DP(m).
2. Backtrack from DP(M) to recover the peptide.
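The y-ion-only DP and its backtracking can be sketched as follows, assuming nominal masses so that the mass step d is 1 Da. `y_only_dp` is a hypothetical name. As defined, DP(M) also counts the ion at 19 + M (the whole peptide), which normally has no peak.

```python
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
NEG_INF = float("-inf")


def y_only_dp(M, spectrum):
    """DP(m) = I(19 + m) + max_a DP(m - m(a)); returns (score, peptide)."""
    dp = [NEG_INF] * (M + 1)
    choice = [None] * (M + 1)
    dp[0] = 0.0
    for m in range(1, M + 1):                # step d = 1 Da (nominal masses)
        gain = spectrum.get(19 + m, 0.0)     # y-ion of a suffix of mass m
        for aa, ma in MASS.items():
            if ma <= m and dp[m - ma] + gain > dp[m]:
                dp[m] = dp[m - ma] + gain
                choice[m] = aa               # first letter of the best suffix
    if dp[M] == NEG_INF:
        return NEG_INF, None
    seq, m = [], M                           # backtracking, left to right
    while m > 0:
        seq.append(choice[m])
        m -= MASS[choice[m]]
    return dp[M], "".join(seq)


print(y_only_dp(296, {147: 1.0, 218: 1.0})[0])  # 2.0
```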

Counting Both y and b Ions

Good News
[Figure: the ions $y_1$, $y_2$, $y_3$ shown together with the complementary ions $b_{n-1}$, $b_{n-2}$, $b_{n-3}$.]
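The pairing follows from the fragmentation formulas. For a peptide $A_1 \dots A_n$ with residue mass $M = \sum_k m(A_k)$ (nominal masses), fixing $y_i$ also fixes the complementary ion $b_{n-i}$:

```latex
m(b_{n-i}) = 1 + \sum_{k=1}^{n-i} m(A_k)
           = 1 + M - \sum_{k=n-i+1}^{n} m(A_k)
           = 1 + M - \bigl(m(y_i) - 19\bigr)
           = M + 20 - m(y_i)
```

So every suffix mass that places a y-ion also places a b-ion, and both can be scored in the same pass.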

Bad News

Ions Determined by a Pair
Example: P = LGEY (a prefix), Q = LLVR (a suffix).
score(P,Q) is the sum of the intensities of the peaks matched by the ions that P and Q determine. A peak can only count once.
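One reading of score(P,Q), as a sketch: a prefix P fixes the b-ions of its prefixes and, because the total mass M is known, their complementary y-ions; a suffix Q does the same from the other end. Collecting ion masses in a set enforces "a peak can only count once". Assumptions: nominal masses, exact matching, and a non-overlapping prefix/suffix pair; `pair_ions` and `pair_score` are hypothetical names. The example uses the peptide DAP, where m(D) - m(P) = 18 makes the prefix D and the suffix P explain the same two peaks.

```python
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def pair_ions(P, Q, M):
    """Ion masses determined by prefix P and suffix Q of a peptide of mass M."""
    ions, p, q = set(), 0, 0
    for aa in P:                          # every prefix of P, including P
        p += MASS[aa]
        ions.update({1 + p, M + 19 - p})  # b-ion and complementary y-ion
    for aa in reversed(Q):                # every suffix of Q, including Q
        q += MASS[aa]
        ions.update({19 + q, M + 1 - q})  # y-ion and complementary b-ion
    return ions


def pair_score(P, Q, M, spectrum):
    """score(P, Q): matched intensity, each peak counted only once."""
    return sum(spectrum.get(x, 0.0) for x in pair_ions(P, Q, M))


# DAP (M = 283): b1 = y1 = 116 and b2 = y2 = 187, so 8.0 rather than 16.0.
print(pair_score("D", "P", 283, {116: 5.0, 187: 3.0}))  # 8.0
```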

Chummy Pairs
Two strings P and Q are called a chummy pair iff either of the following is true: (C1) (C2)

Recursive Computation of score(P,Q)
P = LGEY, Q = LLVR, u = m(P), v = m(Q)

Chummy Pairs
Lemma 1 – Suppose P and Q are a chummy pair, with u = m(P) and v = m(Q). If (C1) is true, …; if (C2) is true, …

Chummy Pairs
Lemma 2 – Let (P,Q) be a chummy pair and a be a letter.
– (C1) ⇒ (P,aQ) is a chummy pair but (Pa,Q) is not.
– (C2) ⇒ (Pa,Q) is a chummy pair but (P,aQ) is not.
Lemma 3 – Let S be the optimal solution. Then there is a chummy pair (P,Q) and a letter a such that S = PaQ. Also, there is a series of chummy pairs such that …

Dynamic Programming
Combining Lemmas 1, 2 and 3, we can compute DP(u,v), the maximum of score(P,Q) over chummy pairs with m(P) = u and m(Q) = v. Suppose (P,Q) is the pair maximizing DP(u,v) under the condition m(P) + m(Q) + m(a) = M. Then PaQ is the optimal peptide.

Algorithm Sandwich
DP(0,0) = 0; DP(u,v) = -infinity for (u,v) != (0,0);
for u from 1 to M/2 step d do
    for v from u - m(W) to u + m(W) step d do
        for a in Σ do
            if u < v then …
            else …
find u, v, a such that u + v + m(a) = M and DP(u,v) is maximized;
backtracking;
Time: …
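The recurrences inside the pseudocode above are not spelled out, so the following is only a sketch of a two-sided DP in the same spirit, built on our own assumptions: nominal masses, exact matching, the set-based score(P,Q), and a growth rule we chose (always extend the lighter side, ties go to P), which may differ from the chummy-pair conditions (C1)/(C2). Under that rule one can check that a newly added ion pair can only repeat peaks of the other side's most recent ion pair, and does so exactly when the two masses differ by 18 (water). The score increment therefore depends only on (u, v, a), which makes a DP over (u, v) possible. `sandwich`, `MASS` and `MIN_MASS` are hypothetical names.

```python
from collections import defaultdict

MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
MIN_MASS = min(MASS.values())
NEG_INF = float("-inf")


def sandwich(M, spectrum):
    """Two-sided DP over states (u, v) = (m(P), m(Q)); returns (score, P + a + Q)."""
    score, back = {(0, 0): 0.0}, {}
    by_total = defaultdict(list, {0: [(0, 0)]})
    for total in range(M + 1):                 # every move increases u + v
        for u, v in by_total[total]:
            for aa, ma in MASS.items():
                if u <= v:                     # our rule: grow the lighter side, ties -> P
                    nu, nv = u + ma, v
                    # New b-ion 1+nu and y-ion M+19-nu repeat Q's newest
                    # y-ion 19+v and b-ion M+1-v exactly when nu == v + 18.
                    new = set() if nu == v + 18 else {1 + nu, M + 19 - nu}
                else:
                    nu, nv = u, v + ma
                    new = set() if u == nv + 18 else {19 + nv, M + 1 - nv}
                if nu + nv > M - MIN_MASS:     # leave room for the middle letter a
                    continue
                cand = score[(u, v)] + sum(spectrum.get(x, 0.0) for x in new)
                if (nu, nv) not in score:
                    score[(nu, nv)] = NEG_INF
                    by_total[nu + nv].append((nu, nv))
                if cand > score[(nu, nv)]:
                    score[(nu, nv)] = cand
                    back[(nu, nv)] = (u, v, aa)
    # Close the sandwich with u + v + m(a) = M; its ions are already counted.
    finals = [(s, u, v, aa) for (u, v), s in score.items()
              for aa, ma in MASS.items() if u + v + ma == M]
    if not finals:
        return NEG_INF, None
    best, u, v, mid = max(finals)
    left, right = [], []
    while (u, v) != (0, 0):                    # backtracking
        pu, pv, aa = back[(u, v)]
        (left if pu != u else right).append(aa)  # u changed -> P was extended
        u, v = pu, pv
    return best, "".join(reversed(left)) + mid + "".join(right)


print(sandwich(283, {116: 5.0, 187: 3.0})[0])  # 8.0, each peak counted once
```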

PEAKS – The Software

Comparison
LCQ data (ion trap instrument):
– Generously provided by Dr. Richard Johnson; 144 spectra.
Micromass Q-Tof data:
– Measured in UWO's Protein ID lab; 61 spectra.
Sciex Q-Star data:
– Provided by U. Victoria's Genome BC Proteomics Centre; 13 good/okay spectra.

PEAKS vs. Lutefisk
– Completely correct sequences: 38/144 vs. 15/144
– Correct amino acids: 1067/1702 vs. 767/1702
– Partially correct sequences (5 or more contiguous correct amino acids): 94/144 vs. 64/144

PEAKS vs. Micromass PLGS
– Completely correct sequences: 13/61 vs. 7/61
– Correct amino acids: 456/764 vs. 232/764
– Partially correct sequences (5 or more contiguous correct amino acids): 38/61 vs. 24/61

PEAKS vs. Sciex BioAnalyst
– Completely correct sequences: 7/13 vs. 1/13
– Correct amino acids: 115/150 vs. 86/150
– Partially correct sequences (5 or more contiguous correct amino acids): 12/13 vs. 7/13
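Because the three data sets differ in size, the counts are easier to compare as percentages. This is plain arithmetic on the reported numbers; for the BioAnalyst partial-sequence row we assume a denominator of 13, the size of the Q-Star data set. `RESULTS` and `rate` are our own names.

```python
# (metric, PEAKS count, other tool's count, total) as reported on the slides.
RESULTS = {
    "PEAKS vs. Lutefisk (LCQ)": [
        ("completely correct sequences", 38, 15, 144),
        ("correct amino acids", 1067, 767, 1702),
        ("5+ contiguous correct amino acids", 94, 64, 144),
    ],
    "PEAKS vs. PLGS (Q-Tof)": [
        ("completely correct sequences", 13, 7, 61),
        ("correct amino acids", 456, 232, 764),
        ("5+ contiguous correct amino acids", 38, 24, 61),
    ],
    "PEAKS vs. BioAnalyst (Q-Star)": [
        ("completely correct sequences", 7, 1, 13),
        ("correct amino acids", 115, 86, 150),
        ("5+ contiguous correct amino acids", 12, 7, 13),  # assumed denominator 13
    ],
}


def rate(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for comparison, rows in RESULTS.items():
    print(comparison)
    for metric, peaks, other, total in rows:
        print(f"  {metric}: {rate(peaks, total)}% vs. {rate(other, total)}%")
```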

Users
The company logos have been deleted from the original presentation. Please visit … for a list of users.

Other Techniques Used by PEAKS
Preprocessing of the MS/MS spectra:
– Deconvolution, noise reduction, and signal enhancement.
– It does a better job than the spectrometer vendors' software.
Recalibration:
– Compress or stretch the spectrum to correct calibration error.
Positional confidence:
– Estimate the confidence level of individual amino acids.
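PEAKS' actual recalibration procedure is not described here. As a hedged sketch of what "compress or stretch the spectrum" can mean, the code fits a linear map from observed to reference masses by least squares, using peaks assumed to be confidently matched, and applies it to the whole peak list. The linear model and the names `fit_linear_recalibration` and `recalibrate` are our assumptions.

```python
def fit_linear_recalibration(observed, reference):
    """Least-squares fit of reference ~ alpha * observed + beta."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    alpha = sxy / sxx            # stretch (alpha > 1) or compress (alpha < 1)
    beta = mean_y - alpha * mean_x
    return alpha, beta


def recalibrate(masses, alpha, beta):
    """Apply the fitted linear map to every peak mass."""
    return [alpha * m + beta for m in masses]


# Hypothetical matched peaks: each observed mass is 0.1% too high.
alpha, beta = fit_linear_recalibration([100.1, 200.2, 300.3], [100.0, 200.0, 300.0])
print(recalibrate([250.25], alpha, beta))
```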

Sophisticated Ion Matching Score
Score of one peak matching a b-ion

PEAKS 2.x's Additional Feature
Identify the proteins by matching the de novo (partial) sequences against a protein database, then further match the spectra with the peptides of those proteins.
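How PEAKS 2.x matches de novo sequences to proteins is not described here. A minimal sketch of the idea, assuming exact substring matching of de novo tags with I and L treated as identical (they have the same mass); a real tool would also tolerate sequencing errors. `normalize` and `proteins_matching_tags` are hypothetical names.

```python
def normalize(seq):
    """Treat I and L as the same letter: MS/MS cannot tell them apart by mass."""
    return seq.replace("I", "L")


def proteins_matching_tags(tags, proteins):
    """Map each protein name to the de novo tags found verbatim in its sequence."""
    hits = {}
    for name, sequence in proteins.items():
        target = normalize(sequence)
        found = [t for t in tags if normalize(t) in target]
        if found:
            hits[name] = found
    return hits


proteins = {"X": "MPSERGTDIMRPAK", "Y": "GGGGHHHH"}
print(proteins_matching_tags(["GTDLMR", "PAK"], proteins))  # {'X': ['GTDLMR', 'PAK']}
```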

Collaborators and References
Sandwich algorithm:
– B. Ma, K. Zhang, C. Liang, CPM'03 (sandwich algorithm).
PEAKS:
– B. Ma, K. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, G. Lajoie, Rapid Comm. Mass Spec. (software features, score function, experiments).
Acknowledgement:
– PEAKS development team (Bioinformatics Solutions Inc.).