1336 SW Bertha Blvd, Portland OR 97219

Slides:



Advertisements
Similar presentations
Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
Advertisements

In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry Lian Yang 2 ; Baozhen Shan 1 ; Bin Ma 2 1 Bioinformatics.
Database Searches. Peptide mass fingerprinting digestMS Search HIT SCORE Protein X 1000 Protein Y 50 Protein Z 5 Protein X theoretical digestProtein Y.
PepArML: A model-free, result-combining peptide identification arbiter via machine learning Xue Wu, Chau-Wen Tseng, Nathan Edwards University of Maryland,
Improving Statistical Significance Assignment in Mass Spectrometry Based Peptide Identification Overview Statistical Significance in Peptide Identification.
De Novo Sequencing v.s. Database Search Bin Ma School of Computer Science University of Waterloo Ontario, Canada.
Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.
Peptide Identification by Tandem Mass Spectrometry Behshad Behzadi April 2005.
Chapter 26: Comparing Counts. To analyze categorical data, we construct two-way tables and examine the counts of percents of the explanatory and response.
Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)
Previous Lecture: Regression and Correlation
Each results report will contain:
Scaffold Download free viewer:
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Analysis of tandem mass spectra - II Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology.
Spectral Counting. 2 Definition The total number of identified peptide sequences (peptide spectrum matches) for the protein, including those redundantly.
Respiration rate (nmol O 2 /min/mg protein) Succinate (mM) Supplementary Figure S1. Respiration rates of the mitochondria isolated from the florets of.
Identification of regulatory proteins from human cells using 2D-GE and LC-MS/MS Victor Paromov Christian Muenyi William L. Stone.
Multiple testing correction
Proteomics Informatics – Data Analysis and Visualization (Week 13)
Comparison of chicken light and dark meat using LC MALDI-TOF mass spectrometry as a model system for biomarker discovery WP 651 Jie Du; Stephen J. Hattan.
The dynamic nature of the proteome
PROTEIN STRUCTURE NAME: ANUSHA. INTRODUCTION Frederick Sanger was awarded his first Nobel Prize for determining the amino acid sequence of insulin, the.
Introduction The GPM project (The Global Proteome Machine Organization) Salvador Martínez de Bartolomé Bioinformatics support –
INF380 - Proteomics-91 INF380 – Proteomics Chapter 9 – Identification and characterization by MS/MS The MS/MS identification problem can be formulated.
Common parameters At the beginning one need to set up the parameters.
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
A Comprehensive Comparison of the de novo Sequencing Accuracies of PEAKS, BioAnalyst and PLGS Bin Ma 1 ; Amanda Doherty-Kirby 1 ; Aaron Booy 2 ; Bob Olafson.
Laxman Yetukuri T : Modeling of Proteomics Data
Search Engine Result Combining Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology Georgetown University Medical Center.
Chapter 11: Inference for Distributions of Categorical Data Section 11.1 Chi-Square Goodness-of-Fit Tests.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
INF380 - Proteomics-101 INF380 – Proteomics Chapter 10 – Spectral Comparison Spectral comparison means that an experimental spectrum is compared to theoretical.
Peptidesproteinsgenes protein accessionsharedsharedunique gene nameshareduniqueunique Identified by gene unique peptides Identified by protein and gene.
PeptideProphet Explained Brian C. Searle Proteome Software Inc SW Bertha Blvd, Portland OR (503) An explanation.
Poster produced by Faculty & Curriculum Support (FACS), Georgetown University Medical Center Application of meta-search, grid-computing, and machine-learning.
Protein Identification via Database searching Attila Kertész-Farkas Protein Structure and Bioinformatics Group, ICGEB, Trieste.
A Reference Library of Peptide Ion Fragmentation Spectra: Yeast S.E. Stein, L.E. Kilpatrick, P. Neta, Q.L. Pu, J. Roth, X. Yang National Institute of Standards.
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
INF380 - Proteomics-71 INF380 – Proteomics Chap 7 –Protein Identification and Characterization by MS Protein identification in our context means that we.
A Reference Library of Peptide Ion Fragmentation Spectra Stephen Stein 1 ; Lisa Kilpatrick 2 ; Pedatsur Neta 1 ; Jeri Roth 1 ; Xiaoyu Yang 1 National Institute.
Faster, more sensitive peptide identification from tandem mass spectra by sequence database compression Nathan J. Edwards Center for Bioinformatics & Computational.
Error tolerant search Large number of spectra remain without significant score. Reasonable number of fragment ion peaks might have not match. – Underestimated.
Application of meta-search, grid-computing, and machine-learning can significantly improve the sensitivity of peptide identification. The PepArML meta-search.
Deducing protein composition from complex protein preparations by MALDI without peptide separation.. TP #419 Kenneth C. Parker SimulTof Corporation, Sudbury,
ISA Kim Hye mi. Introduction Input Spectrum data (Protein database) Peptide assignment Peptide validation manual validation PeptideProphet.
Minimize Database-Dependence in Proteome Informatics Apr. 28, 2009 Kyung-Hoon Kwon Korea Basic Science Institute.
(Unit 6) Formulas and Definitions:. Association. A connection between data values.
Using Scaffold OHRI Proteomics Core Facility. This presentation is intended for Core Facility internal training purposes only.
김지형. Introduction precursor peptides are dynamically selected for fragmentation with exclusion to prevent repetitive acquisition of MS/MS spectra.
B Monoisotopic mass of neutral peptide M r (calc): Fixed modifications: Carbamidomethyl Ions score: 45 † Expect: ‡ Matches (red): 18/50.
Protein identification by mass spectrometry The shotgun proteomics strategy, based on digesting proteins into peptides and sequencing them using tandem.
Protein identification by mass spectrometry The shotgun proteomics strategy, based on digesting proteins into peptides and sequencing them using tandem.
Post translational modification n- acetylation Peptide Mass Fingerprinting (PMF) is an analytical technique for identifying unknown protein. Proteins to.
9.3 Hypothesis Tests for Population Proportions
MassMatrix Search Results Explained
Protein Identification via Database searching
Peptide & Protein Identification by MS/MS
Volume 53, Issue 3, Pages (February 2007)
Proteomics Informatics –
Metabolic Network Prediction of Drug Side Effects
Softberry Mass Spectra (SMS) processing tools
Sequence comparison: Multiple testing correction
NoDupe algorithm to detect and group similar mass spectra.
Evaluation of the novel peptides derived from MS/MS data.
Supertertiary Structure of the MAGUK Core from PSD-95
High level view of the MAE algorithm.
Xbar Chart By Farrokh Alemi Ph.D
Sim and PIC scoring results for standard peptides and the test shotgun proteomics dataset. Sim and PIC scoring results for standard peptides and the test.
Maria S. Robles, Sean J. Humphrey, Matthias Mann  Cell Metabolism 
Presentation transcript:

1336 SW Bertha Blvd, Portland OR 97219 X!Tandem Explained Brian C. Searle Proteome Software Inc. www.proteomesoftware.com 1336 SW Bertha Blvd, Portland OR 97219 (503) 244-6027 An explanation of the X!Tandem MS/MS spectra search program developed by Craig,R. and Beavis,R.C. (2003) Rapid Commun. Mass Spectrom., 17, 2310–2316.

X!Tandem Theme Central Axiom: For each identifiable protein, there is at least one detectable tryptic peptide.

containing identified X!Tandem, like Mascot and SEQUEST, compares each spectrum to all likely candidate peptides in a protein database. One of X!Tandem’s strengths is its automatic search for modified peptides — but only on proteins it has otherwise identified. The following pages explain how X!Tandem matches peptides, and how this differs from the way SEQUEST matches them. X!Tandem Workflow Quickly identify proteins from tryptic peptides. Create database containing identified proteins only. Extensively search for modified/ non-enzymatic peptides only on identified proteins.

X!Tandem’s Chief Advantage X!Tandem’s multi-stage workflow algorithm results beats a comparable single stage algorithm by a wide margin. Speed: ~200x faster for nonspecific searches ~1000x faster if you also look for oxidation, deamidation, and phosphorylation

Other X!Tandem Advantages Modern search techniques: Considers semi-tryptic peptides. Considers sequence polymorphisms. Uses probability-based scoring.

acquired spectrum model spectrum (y/b ions) Spectra X!Tandem works by matching the acquired MS/MS spectra to a model spectrum based on peptides in a protein database. The model spectrum is very simple, based on the presence or absence of y and b ions. 100% acquired spectrum 0% 1 model spectrum (y/b ions)

x acquired spectrum hypothetical spectrum (y/b ions) Spectra matched Only matching spectral peaks — the ones marked in the figure — are considered. Any peaks that don’t match, in either the model or acquired spectra, are not used. 100% acquired spectrum 0% x 1 hypothetical spectrum (y/b ions)

acquired spectrum similar peaks (y/b ions) Similar peaks The acquired spectrum is simplified to only those peaks that are similar to the peaks in the model spectrum. 100% acquired spectrum 0% 100% similar peaks (y/b ions) 0%

similar peaks (y/b ions) Dot product X!Tandem’s preliminary score is a dot product of the acquired and model spectra. Because only similar peaks are considered, this is the sum of the intensities of the matched y and b ions. spectrum intensities predicted? (1,0) 100% similar peaks (y/b ions) 0%

similar peaks (y/b ions) Hyperscore X!Tandem modifies the preliminary score by multiplying by N factorial for the number of b and y ions assigned. The use of factorials is based on the hypergeometric distribution. spectrum intensities predicted? (1,0) 100% similar peaks (y/b ions) 0%

Histogram of hyperscores Next, X!Tandem makes a histogram of all the hyperscores for all the peptides in the database that might match this spectrum. For example, in this figure, 52 peptides were found with a hyperscore of 19, and one peptide with a hyperscore of 83. X!Tandem assumes that the peptide with the highest hyperscore is correct, and all others are incorrect. incorrect IDs # results hyperscore

Log histogram If the data on the right side of the histogram, (colored in upper figure) is taken and log-transformed, the data fall on a straight line. A straight line is the expected result from a statistical argument that assumes the incorrect results are random. Note: this histogram is calculated independently for each spectrum. # results hyperscore log(# results)

Significant scores significant X!Tandem has already assumed that the top hyperscore is the only possible correct match. This match is significant if it is greater than the point at which the straight line through the log data intersects the log(#results)=0 line. Any hyperscores greater than this are unlikely to have arisen by chance. # results hyperscore significant log(# results)

E-value The E-value expresses just how unlikely a greater hyperscore is. X!Tandem calculates the E-value by extrapolating the red line of the log histogram. For the example shown, a hyperscore of 83 would occur by chance where the red line crosses 83. The log of this value — the E-value — is -8.2, as shown. # results hyperscore log(# results) E-value=e-8.2

X!Tandem vs. SEQUEST Which is better? How close is the spectrum to the peptide model? How far is the top-scoring match from the rest of the pack? X!Tandem hyperscore E-value better measure SEQUEST XCorr DCn

Plotting SEQUEST and X!Tandem scores The sample in this experiment has only 10 proteins. Spectra identifying one of these 10 are plotted in blue. Spectra in red are incorrect matches. X!Tandem and SEQUEST scores for all the spectra in a control experiment X!Tandem –log(E-value) score SEQUEST discriminant score (PeptideProphet, ISB)

Cutoffs for SEQUEST and X!Tandem The yellow box contains the spectra with scores below X!Tandem’s threshold. The green box shows the spectra with scores below SEQUEST’s threshold. Note: Many good identifications (blue dots) are below threshold for one, but not both. X!Tandem –log(E-value) score SEQUEST discriminant score (PeptideProphet, ISB)

X!Tandem Summary X!Tandem is fast. X!Tandem grades on a curve. The hyperscore is much faster to calculate than SEQUEST’s XCorr. This allows X!Tandem to build the histograms that allow it to calculate the E-values. X!Tandem grades on a curve. It’s the E-value that counts, not the hyperscore. X!Tandem and SEQUEST can often match different spectra.