Facts and Fallacies about de Novo Sequencing & Database Search.


1. There are a large number of high quality spectra left unassigned after DB search. True False

Unassigned Spectra in ABRF/iPRG 2011 Study

Unassigned Spectra
Causes: nonspecific trypsin cleavages; novel peptide/incomplete database; PTM; mutations.
Tools shown: PEAKS PTM, SPIDER, PEAKS DB, de novo sequencing.

2. Nonspecific cleavage, PTM, mutations and novel peptides are the main reasons for the unassigned spectra. True False

Average Software Misses Peptides

3. De novo sequencing is slow. True False

Speed
PEAKS 6 de novo sequences 15 spectra per second.
– Intel i7 quad core, 8 GB RAM.
– Trypsin.
– Orbitrap CID MS/MS, mostly charge +2/+3.
PEAKS 7 (coming soon):
– Improves speed on high charge states and longer peptides.
– Adds 8-core support in the standard (desktop) license.

4. De novo should be done after DB search. True False
[Diagram labels: DB search, DB peptides, unassigned spectra, de novo seq., de novo peptides]

Order of de Novo and DB
It is better to conduct de novo sequencing on all spectra.
– De novo is not slow, and computing is cheap.
– De novo provides independent validation for the DB result.
[Figure labels: # consensus AA (de novo vs. DB search), score, true, false, without de novo, with de novo]

5. My protein sequence is confirmed with two unique peptide hits. True False

Routine Full Protein Coverage
For regular proteins, full sequence coverage can be routinely achieved with
– 3 or more enzyme digests, and
– multiple algorithms in PEAKS 6.
For highly variable proteins (such as antibodies), BSI offers a data analysis service for antibody sequencing.

6. If a peptide is identified with 1% FDR, then its sequence is 99% correct. True False

Peptide Validation vs. Amino Acid Validation
You are confident about the peptide sequence only if you can de novo sequence it, and the de novo sequence matches the database peptide.
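The check described on this slide, comparing a de novo sequence with the database peptide residue by residue, can be sketched roughly as follows. This is a minimal illustration and not the PEAKS algorithm: the function name `consensus_aa` and the example peptides are hypothetical, the comparison is purely position by position, and I and L are treated as equal because they have the same mass.

```python
def consensus_aa(de_novo: str, db_peptide: str) -> int:
    """Count positions at which the de novo and database sequences agree.

    Assumptions (not from the slides): sequences are compared position by
    position from the N-terminus, and isoleucine/leucine are treated as the
    same residue because they are isobaric.
    """
    def normalize(seq: str) -> str:
        return seq.upper().replace("I", "L")

    a, b = normalize(de_novo), normalize(db_peptide)
    # zip stops at the shorter sequence, so extra residues never count as agreement.
    return sum(1 for x, y in zip(a, b) if x == y)


# Hypothetical example: the de novo result swaps two adjacent residues (Q/N)
# and reports L where the database peptide has I.
db_hit = "LGEYGFQNALIVR"
de_novo_hit = "LGEYGFNQALLVR"
agreement = consensus_aa(de_novo_hit, db_hit)
print(f"{agreement} of {len(db_hit)} residues agree")
```

A low agreement count for a database hit would flag residues that the spectrum does not independently support, even when the peptide as a whole passes the FDR filter.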

7. I don’t need de novo sequencing if I have a protein DB. True False

8. Target-decoy provides a reliable result validation for every DB search engine. True False

Target-Decoy Incompatible with Certain Highly Optimized Search Engines
Adding a “protein bonus” to peptide hits increases accuracy, but it creates a bias between target and decoy.
– In the extreme case, the bonus is so large that only peptides from target proteins are selected.
– This gives the wrong impression that FDR = 0, while there are still false peptides in the result.
[Figure labels: weak hits, confident protein, weak protein]
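The bias described on this slide can be illustrated with a toy calculation. The sketch below assumes the common estimate FDR = decoy hits / target hits above a score threshold. The hit list, the bonus value, and the function names are all hypothetical and do not come from any real search engine.

```python
def estimate_fdr(hits, threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) at or above threshold.

    `hits` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def apply_protein_bonus(hits, bonus):
    """Add `bonus` to every hit whose protein is already confidently identified.

    `hits` is a list of (score, is_decoy, from_confident_protein) triples.
    """
    return [(score + bonus if confident else score, is_decoy)
            for score, is_decoy, confident in hits]


# Hypothetical peptide hits: (score, is_decoy, from_confident_protein).
HITS = [
    (9.0, False, True),
    (8.0, False, True),
    (5.0, False, True),   # weak target hit that may well be false
    (5.5, True, False),
    (5.2, False, False),
    (4.8, True, False),
]

plain = [(score, is_decoy) for score, is_decoy, _ in HITS]
fdr_without_bonus = estimate_fdr(plain, threshold=5.0)

boosted = apply_protein_bonus(HITS, bonus=3.0)
fdr_with_bonus = estimate_fdr(boosted, threshold=8.0)

print(fdr_without_bonus, fdr_with_bonus)
```

In this toy data the bonus lifts the weak target hit above the threshold while no decoy benefits, so the estimate drops to zero even though that weak hit may still be wrong.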

Decoy Fusion Is a More Powerful Validation Method
Decoy fusion appends a decoy sequence to each protein, which restores the balance between target and decoy.
It has been the built-in validation method since PEAKS 5.3.
[Figure labels: weak hits, confident protein, weak protein]
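A minimal sketch of the fusion step might look like the following. The slides do not say how the decoy sequence is generated, so reversing the target sequence is an assumption made here purely for illustration. The function names and the example protein are hypothetical.

```python
def fuse_decoy(protein: str) -> str:
    """Return the target sequence with a decoy appended to it.

    Assumption: the decoy is the reversed target sequence. The slides only
    state that a decoy sequence is appended to each protein.
    """
    return protein + protein[::-1]


def fuse_database(proteins: dict) -> dict:
    """Apply decoy fusion to every entry of a {name: sequence} database."""
    return {name: fuse_decoy(seq) for name, seq in proteins.items()}


def hit_is_decoy(start: int, target_length: int) -> bool:
    """Classify a peptide hit by its 0-based start position in the fused entry.

    Hits starting inside the original sequence count as target; hits starting
    in the appended half count as decoy. Peptides that span the junction are
    not treated specially in this sketch.
    """
    return start >= target_length


# Hypothetical one-protein database.
database = {"protein_X": "MKTAYIAK"}
fused = fuse_database(database)
print(fused["protein_X"])
```

Because target and decoy halves belong to the same entry, any protein-level bonus applies to both, which is the balance the slide refers to.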

9. Combining 1% FDR results of multiple engines gives 1% FDR. True False

Error Accumulation
In PEAKS, the inChorus algorithm automatically selects a common FDR of less than 1% for each engine so that the combined FDR is approximately 1%.

Result set        Target (decoy)   FDR
PEAKS DB          3870 (38)        1%
Mascot            2369 (23)        1%
PEAKS DB only     1696 (37)        2.4%
Both engines      2174 (1)         0.1%
Mascot only       195 (22)         13%

Correct < sum of the two. Error ≈ sum of the two.
Combined FDR = 1.5%
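The arithmetic behind the combined figure can be checked with a short sketch. It uses the target (decoy) counts from the slide and assumes the simple estimate FDR = decoys / targets. That ratio reproduces the 1% and 1.5% figures, but it is not guaranteed to match the per-region percentages on the slide, which may have been computed differently.

```python
# Target and decoy counts per region, taken from the slide.
REGIONS = {
    "PEAKS DB only": (1696, 37),
    "both engines": (2174, 1),
    "Mascot only": (195, 22),
}


def fdr(targets: int, decoys: int) -> float:
    """Simple FDR estimate (an assumption): decoy hits divided by target hits."""
    return decoys / targets


def pooled(*names: str) -> tuple:
    """Sum target and decoy counts over the named regions."""
    targets = sum(REGIONS[name][0] for name in names)
    decoys = sum(REGIONS[name][1] for name in names)
    return targets, decoys


peaks_db = pooled("PEAKS DB only", "both engines")            # (3870, 38)
mascot = pooled("both engines", "Mascot only")                # (2369, 23)
combined = pooled("PEAKS DB only", "both engines", "Mascot only")

for label, counts in [("PEAKS DB", peaks_db), ("Mascot", mascot), ("Combined", combined)]:
    print(f"{label}: {counts[0]} ({counts[1]}) FDR = {100 * fdr(*counts):.1f}%")
```

The union contains fewer correct hits than the two result sets added together, because the overlap is counted once, while almost all decoys sit outside the overlap and therefore add up.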

10. There is no automated way to validate de novo sequencing results. True False