Error tolerant search


Error tolerant search
A large number of spectra remain without a significant score. A reasonable number of fragment ion peaks might not have matched. Possible causes:
– Underestimated mass measurement error (should be visible in the peptide view graphs)
– Incorrect determination of the precursor charge state
– The peptide sequence is not in the database
– Missed cleavage and unexpected cleavage
– Unexpected chemical and post-translational modifications
The biological structure, function and activity of a protein can be determined by the modifications of that protein. An increasing share of the proteins that have been mapped to, e.g., different diseases change not only in expression level but also, or exclusively, in their level of post-translational modification.

Post-Translational Modifications (PTMs)
A PTM alters the mass of an amino acid, and therefore of the peptide, which results in peak shifts in the spectrum.
Fragment ions of the example peptide HQSVMVGMVQK:
b1:  H             y10: QSVMVGMVQK
b2:  HQ            y9:  SVMVGMVQK
b3:  HQS           y8:  VMVGMVQK
b4:  HQSV          y7:  MVGMVQK
b5:  HQSVM         y6:  VGMVQK
…                  …
b10: HQSVMVGMVQ    y1:  K
[Figure: b and y ion peaks of HQSVMVGMVQK along the m/z axis, with the fragment ladder b1/y10, b2/y9, …, b10/y1]
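The peak shift can be made concrete with a small sketch. It computes singly charged b and y ion m/z values for the example peptide HQSVMVGMVQK, then repeats the calculation with an oxidation (+15.9949 Da) placed on the first methionine. The function name `fragment_ions`, the choice of oxidation as the PTM, and the restriction to singly charged ions are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
RESIDUE_MASS = {
    "H": 137.05891, "Q": 128.05858, "S": 87.03203, "V": 99.06841,
    "M": 131.04049, "G": 57.02146, "K": 128.09496,
}
WATER = 18.010565
PROTON = 1.007276
OXIDATION = 15.994915  # assumed example PTM (oxidation of methionine)


def fragment_ions(peptide, mods=None):
    """Return singly charged b- and y-ion m/z lists (b1..b(n-1), y1..y(n-1)).

    mods maps a 0-based residue index to a mass delta in Da
    (hypothetical helper argument used to place a PTM on one residue).
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    b_ions = [sum(masses[:k]) + PROTON for k in range(1, len(masses))]
    y_ions = [sum(masses[-k:]) + WATER + PROTON for k in range(1, len(masses))]
    return b_ions, y_ions


PEPTIDE = "HQSVMVGMVQK"
plain_b, plain_y = fragment_ions(PEPTIDE)
ox_b, ox_y = fragment_ions(PEPTIDE, {4: OXIDATION})  # index 4 is the first M

# Ion numbers whose m/z moved because the fragment contains the modified residue.
shifted_b = [k + 1 for k, (p, m) in enumerate(zip(plain_b, ox_b)) if abs(m - p) > 1e-6]
shifted_y = [k + 1 for k, (p, m) in enumerate(zip(plain_y, ox_y)) if abs(m - p) > 1e-6]
print("shifted b ions:", shifted_b)
print("shifted y ions:", shifted_y)
```

Only the fragments that contain the modified residue move: the b ions from b5 onward and the y ions from y7 onward, while the rest of the ladder stays in place.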

PTMs
– Complete modifications (chemical modifications)
– Variable modifications

PTMs
Obstacles
– Complexity (means longer execution time): can increase the search space many-fold
– Significance

Obstacles - Complexity
Let the theoretical peptide be:
– HQSVMVGMVQK (11 amino acids)
– Each amino acid can be modified by, let's say, 5 PTMs

# included PTMs    # modified theoretical spectra    time
0                  1                                 1 sec
1                  11*5 = 55                         55 seconds (1 min)
2                  11*25 = …                         … mins
3                  11*15*125 = …                     … hours
…                  …                                 … hours (3.5 years)

In general:
Peptide length = L
Included PTMs = K
PTMs/aa = M
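The general expression is not preserved in the transcript. A minimal sketch of one standard way to count the candidates, assuming K of the L residues are chosen and each chosen residue carries one of M PTMs, gives C(L, K) * M^K modified theoretical spectra. The function name `modified_spectra` and the rate of one spectrum per second (taken from the 55 spectra / 55 seconds row) are assumptions for illustration.

```python
from math import comb


def modified_spectra(length, included, ptms_per_residue):
    """Count peptides with exactly `included` modified residues.

    Choose which residues are modified, then one of the PTMs for each.
    """
    return comb(length, included) * ptms_per_residue ** included


SECONDS_PER_SPECTRUM = 1  # assumed rate, matching 55 spectra in 55 seconds

for k in range(4):
    count = modified_spectra(11, k, 5)
    hours = count * SECONDS_PER_SPECTRUM / 3600
    print(f"K={k}: {count} spectra, about {hours:.2f} hours")
```

Under this count, K = 0, 1 and 3 give 1, 55 and 20,625 (equal to 11*15*125) spectra, and K = 2 gives 1,375.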

Significance
Increases the number of random matches
– Inserting many PTMs makes the theoretical spectra too flexible, so that in the end every theoretical spectrum can be aligned to the experimental spectrum.

Computational Identification of PTMs
3 approaches:
– Targeted
– Untargeted, also called restricted
– Unrestricted (de novo, blind search)

Targeted approach
Almost all search engines support it.
– The experimenter needs to guess the PTMs in the sample.
Two-pass strategy
– Two rounds, refinement on a smaller …
– Sequest, Mascot
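As a rough illustration of what a targeted search has to enumerate, the sketch below expands one peptide into every on/off combination of a user-supplied variable modification. It is not how Sequest, Mascot or any other engine is implemented; the function name `modified_forms` and the choice of methionine oxidation are assumptions for the example.

```python
from itertools import product


def modified_forms(peptide, variable_mods):
    """List (placed, total_delta) for every on/off choice at each eligible site.

    variable_mods maps a residue letter to a mass delta in Da.
    placed maps 0-based residue index to the delta applied there.
    """
    sites = [(i, variable_mods[aa]) for i, aa in enumerate(peptide) if aa in variable_mods]
    forms = []
    for choice in product([False, True], repeat=len(sites)):
        placed = {i: delta for (i, delta), on in zip(sites, choice) if on}
        forms.append((placed, sum(placed.values())))
    return forms


# The experimenter's guess: methionine may be oxidised (+15.994915 Da).
forms = modified_forms("HQSVMVGMVQK", {"M": 15.994915})
for placed, delta in forms:
    print(sorted(placed), round(delta, 4))
```

With two methionines the peptide expands into four candidate forms; each additional guessed modification multiplies the number of theoretical spectra to score.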

Targeted approach – X!Tandem

Targeted approach – InsPecT

Untargeted approaches
Use a large list of modifications taken from databases.
– The search space is limited but can be very large.
– If we allow 5 of the 10 most frequent modifications to occur in a peptide at the same time, the search space grows by 3 orders of magnitude.
– The growth is more dramatic if, instead of 10 types of modifications, we wish to consider all of the roughly 500 known types.
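One simple way to read these figures, offered as an assumption rather than as the slide's own derivation, is to count the sets of modification types that may co-occur on a peptide, ignoring where on the peptide they sit. The helper name `modification_sets` is hypothetical.

```python
from math import comb, log10


def modification_sets(n_types, max_per_peptide):
    """Count ways to pick between 0 and max_per_peptide distinct modification types."""
    return sum(comb(n_types, k) for k in range(max_per_peptide + 1))


for n_types in (10, 500):
    count = modification_sets(n_types, 5)
    print(f"{n_types} types, up to 5 at once: {count} sets (10^{log10(count):.1f})")
```

With 10 types this gives 638 sets, close to three orders of magnitude; with 500 types the count exceeds 10^11, before the placement of each modification is even considered.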

Databases of PTMs
– Unimod: contains 906 modifications
– Resid: 559 entries

Untargeted: PILOT_PTM
– Uses a large dataset of modifications.
– Binary linear programming:
  – The objective function is the number of matched peaks.
  – The linear constraint functions guarantee meaningful modifications of the peptide.

Unrestricted
– No a priori information about PTMs.
– De novo identification of PTMs.
– The search space is infinite.
– In practice, no more than one or two PTMs can be identified on the same peptide.

TwinPeaks approach
– Based on the Sequest idea.
– Shifts the experimental spectrum over a range and plots the similarity score as a function of the mass shift.

TwinPeaks approach
[Figure: sum of matched intensity]
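The mass-shift scan can be sketched on toy data. The code below is a simplified illustration of the idea only, not the TwinPeaks implementation: it shifts the experimental peaks over a range of integer mass shifts and records the sum of matched intensity at each shift. The function name `shift_profile`, the tolerance, and the toy peak lists are all assumptions.

```python
def shift_profile(theoretical, experimental, shifts, tol=0.5):
    """Return (shift, summed matched intensity) for each candidate mass shift.

    theoretical: list of m/z values.
    experimental: list of (m/z, intensity) pairs.
    A peak matches if, after removing the shift, it lies within tol of a
    theoretical peak.
    """
    profile = []
    for shift in shifts:
        score = sum(
            intensity
            for mz, intensity in experimental
            if any(abs((mz - shift) - t) <= tol for t in theoretical)
        )
        profile.append((shift, score))
    return profile


# Toy spectra: the last two experimental peaks carry a +16 Da modification.
theoretical = [100.0, 200.0, 300.0, 400.0]
experimental = [(100.0, 1.0), (200.0, 1.0), (316.0, 5.0), (416.0, 5.0)]

profile = shift_profile(theoretical, experimental, range(-20, 21))
best_shift, best_score = max(profile, key=lambda item: item[1])
print("best shift:", best_shift, "score:", best_score)
```

On this toy input the profile has two peaks, one at shift 0 from the unmodified fragments and one at shift 16 from the modified fragments; every other shift scores zero.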

MS-Alignment
Based on the alignment of the theoretical spectrum to the experimental spectrum.

[Figure: theoretical spectrum and experimental spectrum]

MS-Alignment

Comparison of targeted and unrestricted results

X!Tandem targeted
Scan ID   log(-E)   Peptide (flanking residues, start, sequence, end, flanking residues)
…         …         fqyr 295 ILTAAALCHFTSIEVVK 311 kasg (130)
…         …         rihr 159 FVEKPQVFVSNK 170 inag (471)
…         …         rtcr 30 SPEPGPSSSIGSPQASSPPRPN 51 hyll (48)
…         …         dvtr 473 TMHFGTPTAYEK 484 ecft (306)
…         …         ietk 133 FFDDDLLVSTSR 144 vrlf (176)
…         …         pskr 237 QTNGCLNGYTPSR 249 krqa (112)
…         …         ntpr 149 KNGGLGHMNIALLSDLTK 166 qisr (1776)
…         …         pqgr 19 IHQIEYAMEAVK 30 qgsa (10317)
…         …         kefk 80 DREDLVPYTGEK 91 rgkv (137)
…         …         dyhr 131 YLAEFATGNDR 141 keaa (9406)
…         …         grar 16 QYTSPEEIDAQLQAEK 31 qkar (2754)
…         …         rlar 172 QDPQLHPEDPER 183 raai (644)
…         …         iflh 92 ISDVEGEYVPVEGDEVTYK 110 mcsi (73)
…         …         mrsr 328 TASGSSVTSLDGTR 341 srsh (2698)
…         …         lgnk 29 YVQLNVGGSLYYTTVR 44 altr (71)
…         …         dlqk 183 EGEFSTCFTELQR 195 dflk (239)
…         …         pkek 135 QPVAGSEGAQYR 146 kkql (694)
…         …         lsar 446 ASNAWILQQHIATVPSLTHLCR 467 leir (107)
…         …         evyr 175 NSMPASSFQQQK 186 lrvc (7099)
…         …         iygk 81 QFEDELHPDLK 91 ftga (491)

MS-Alignment unrestricted (de novo)
Scan ID   P-value    Peptide
3         1.00E-05   R.ILTAAALCHFTSIEVVK.K
6         1.00E-05   R.FVEKPQVFVSNK.I
…         …E-05      K.FFDDDLLVSTSR.V
…         …E-05      R.IHQIEYAMEAVK.Q
…         …          A.V+172LTAFANGR.S
…         …E-05      K.QFEDELHPDLK.F
…         …          R.ETFY+18LAQDFFDR.F
…         …E-05      R.TCLSQLLDIMK.S
…         …E-05      K.EYFSTFGEVLM+16VQVK.K
…         …E-05      K.QH-18LENDPGSNEDTDIPK.G
…         …          Q.L+128GVSHVFEYIR.S
…         …          C.T+160EDMTEDELR.E
…         …E-05      R.EFFD-18SNGNFLYR.I
…         …E-05      R.LVLESPAPVEVNLK.L
…         …E-05      K.LQEFAYVTDGAC+14SEEDILR.M
…         …E-05      K.SFDENGFDYLLTYSDNPQTVFP+156.R
…         …E-05      R.GPATVEDLPSAFEEK.A
…         …E-05      Y.ITD+163VLTEEDALEILQK.G
…         …E-05      R.IYSYQMALTPVVVTLWYR.A

Validate your results

Summary
What you should remember:
– PTM identification is computationally expensive.
– There are 3 approaches (targeted, untargeted, unrestricted).
– Always examine the results and omit weird PTMs.
– Including PTMs in the search decreases the statistical significance.
– The more you look for, the less you get (due to significance).