Presentation transcript:

Supplemental Fig. 1: Gene-centric uniqueness calculation. [Figure: peptides mapped to proteins and genes, marked shared or unique at the protein accession and gene name level; outcomes: identified by gene-unique peptides, identified by protein- and gene-unique peptides, not identified.] Peptides matching either one particular protein isoform (green circles, protein unique) or multiple protein isoforms with the same gene name (blue circles, gene unique) are classified as unique peptides. All others, namely peptides matching multiple protein isoforms with different gene names (red circles), are classified as shared. Shared peptides were discarded during protein inference, whereas both protein-unique and gene-unique peptides give rise to the identification of gene products.
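
The classification rule of Supplemental Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes each peptide has already been mapped to a list of (protein accession, gene name) pairs. The accessions and gene names in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A peptide's database matches as (protein_accession, gene_name) pairs.
Matches = List[Tuple[str, str]]


def classify_peptide(matches: Matches) -> str:
    """Return 'protein unique', 'gene unique' or 'shared' for one peptide."""
    if not matches:
        raise ValueError("peptide has no protein matches")
    accessions = {acc for acc, _ in matches}
    genes = {gene for _, gene in matches}
    if len(accessions) == 1:
        return "protein unique"  # exactly one protein isoform
    if len(genes) == 1:
        return "gene unique"     # several isoforms, all with one gene name
    return "shared"              # isoforms with different gene names


def identified_genes(peptides: Dict[str, Matches]) -> Set[str]:
    """Gene names supported by at least one non-shared peptide.

    Shared peptides are discarded; protein-unique and gene-unique
    peptides both count as evidence for their gene product.
    """
    genes: Set[str] = set()
    for matches in peptides.values():
        if classify_peptide(matches) != "shared":
            genes.update(gene for _, gene in matches)
    return genes
```

Consider three example peptides. One maps only to ACC1-1 (GENE1), so it is protein unique. One maps to ACC1-1 and ACC1-2 (both GENE1), so it is gene unique. One maps to ACC1-1 (GENE1) and ACC2 (GENE2), so it is shared. Only GENE1 is then identified.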

Supplemental Fig. 2: Search engine score normalization and variations in score cutoffs to reach 1% PCM FDR. A, B The local, peptide-length-dependent score cutoffs at 5% PSM FDR used for score normalization differ vastly between Mascot (A) and Andromeda (B). The cutoffs determined for Mascot decrease at first and converge at ~17, whereas the cutoffs for Andromeda decrease steadily. C, D To illustrate how strongly data quality depends on technical and biological factors, we plotted histograms of length-normalized Mascot ion scores for (C) a dimethyl-labeled tryptic digest of human embryonic stem cells measured by low-resolution CID and (D) an unlabeled tryptic digest of the melanoma cell line A375 measured by HCD. To reach 1% PCM FDR, the labeled dataset had to be cut at 0.63, whereas the unlabeled dataset had to be cut at …. E, F Differences in data quality require different length-normalized score cutoffs to reach 1% PCM FDR. The range of length-normalized score cutoffs is similar for Mascot (E) and Andromeda (F), but the shape of the distribution varies.
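
The length-dependent normalization of panels A and B can be sketched as follows. The sketch rests on three assumptions. First, the FDR at a threshold is estimated as #decoy/#target among PSMs scoring at or above it. Second, the cutoff for each peptide length is the lowest threshold that meets 5% FDR. Third, a score is normalized by dividing it by the cutoff for its length. The caption does not give the exact normalization formula, so the division in particular is an assumption. The same `fdr_cutoff` helper, applied to normalized scores, gives the cutoff needed for 1% FDR, as in panels C and D.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

PSM = Tuple[int, float, bool]  # (peptide_length, score, is_decoy)


def fdr_cutoff(scores: Sequence[float], is_decoy: Sequence[bool],
               max_fdr: float) -> Optional[float]:
    """Lowest threshold t with #decoy/#target <= max_fdr among hits >= t.

    Accepting all hits at or above t is equivalent to accepting all hits
    whose target-decoy q-value is <= max_fdr. Returns None if no
    threshold reaches the requested FDR.
    """
    ranked = sorted(zip(scores, is_decoy), key=lambda x: x[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, (score, decoy) in enumerate(ranked):
        if decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after all hits tied at this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def length_cutoffs(psms: Iterable[PSM],
                   max_fdr: float = 0.05) -> Dict[int, Optional[float]]:
    """A separate score cutoff for each peptide length at the given FDR."""
    by_length: Dict[int, Tuple[List[float], List[bool]]] = defaultdict(lambda: ([], []))
    for length, score, decoy in psms:
        scores, decoys = by_length[length]
        scores.append(score)
        decoys.append(decoy)
    return {length: fdr_cutoff(s, d, max_fdr) for length, (s, d) in by_length.items()}


def normalize(psms: Iterable[PSM],
              cutoffs: Dict[int, Optional[float]]) -> List[Tuple[int, float, bool]]:
    """Divide each score by the cutoff for its length (assumed normalization).

    PSMs whose length has no usable (positive) cutoff are dropped.
    """
    out = []
    for length, score, decoy in psms:
        cut = cutoffs.get(length)
        if cut is not None and cut > 0:
            out.append((length, score / cut, decoy))
    return out
```

Under this assumed normalization, a normalized score of 1.0 means a PSM sits exactly at the 5% FDR cutoff for its length. A 1% FDR cutoff below 1.0, such as the 0.63 reported for the labeled dataset, would then accept PSMs below their length's 5% cutoff.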

Supplemental Fig. 3: Target and decoy PCM saturation. A Proteins saturate when multiple experiments are accumulated, but the number of unique target PCMs (blue) shows only a slight saturation effect. Furthermore, the number of unique decoy PCMs (red) increases linearly with the amount of data. B This is mirrored by the global PCM FDR. The sharp increase in PCM FDR at ~250 experiments is caused by one experiment that contains multiple LC-MS/MS raw files acquired while optimizing an acquisition method. That experiment therefore contains highly redundant target PCMs but many random decoy PCMs.
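
The effect in panel B can be reproduced with a toy accumulation. The sketch assumes each experiment contributes the sets of target and decoy PCMs that passed its own cutoff. A PCM is represented by any hashable key, for example a hypothetical (sequence, charge, modifications) tuple. The global FDR is taken as unique decoys divided by unique targets. Repeated experiments mostly re-identify the same target PCMs, while decoy hits are random, so the target union grows slowly and the decoy union grows roughly linearly.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def cumulative_pcm_fdr(
    experiments: Iterable[Tuple[Set[Hashable], Set[Hashable]]]
) -> List[Tuple[int, int, float]]:
    """Accumulate unique target and decoy PCMs experiment by experiment.

    Each experiment is a (target_pcms, decoy_pcms) pair of sets. After
    each experiment, records (unique targets, unique decoys,
    global FDR = unique decoys / unique targets).
    """
    targets: Set[Hashable] = set()
    decoys: Set[Hashable] = set()
    history = []
    for target_pcms, decoy_pcms in experiments:
        targets |= target_pcms  # redundant targets add little
        decoys |= decoy_pcms    # random decoys are mostly new
        fdr = len(decoys) / len(targets) if targets else float("nan")
        history.append((len(targets), len(decoys), fdr))
    return history
```

In this toy setup, two experiments that share targets {A, B, C, D} but each contribute a new random decoy give global FDRs of 0.25 and then 0.5, even though no new target evidence was added.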

Supplemental Fig. 4: R factor correction. A Using the number of decoy proteins from the classic TDS massively overestimates the number of false positive protein identifications (decoy proteins in red, target proteins in blue). B The R factor is calculated as the ratio between the numbers of target and decoy hits with a score below 3.6, $r = N_{\text{target}} / N_{\text{decoy}}$. At this score, the local ratio of forward to decoy hits is 1/5. C After applying the R factor correction, the decoy (red) protein distribution agrees better with the target (blue) protein distribution. This yields a more reasonable protein FDR estimate based on the adjusted number of decoy proteins. The distribution of true protein hits (green dashed line) is calculated as the difference between the target and decoy distributions. It is more sensible than for the standard decoy approach, although negative values are observed for low-scoring proteins.
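
The R factor correction can be sketched as follows. The caption defines r as #target/#decoy among hits scoring below 3.6. It does not spell out how the decoy count is adjusted, so this sketch assumes the adjusted count is simply r × #decoy above the threshold. Setting r = 1 recovers the standard target-decoy estimate.

```python
from typing import Sequence, Tuple


def _counts_above(scores: Sequence[float], is_decoy: Sequence[bool],
                  threshold: float) -> Tuple[int, int]:
    """(#target, #decoy) with score >= threshold."""
    n_target = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    n_decoy = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    return n_target, n_decoy


def r_factor(scores: Sequence[float], is_decoy: Sequence[bool],
             low_score: float = 3.6) -> float:
    """r = #target / #decoy among hits scoring below low_score."""
    n_target = sum(1 for s, d in zip(scores, is_decoy) if s < low_score and not d)
    n_decoy = sum(1 for s, d in zip(scores, is_decoy) if s < low_score and d)
    if n_decoy == 0:
        raise ValueError("no decoy hits below the reference score")
    return n_target / n_decoy


def corrected_fdr(scores: Sequence[float], is_decoy: Sequence[bool],
                  threshold: float, r: float) -> float:
    """Protein FDR at threshold using the adjusted decoy count r * #decoy."""
    n_target, n_decoy = _counts_above(scores, is_decoy, threshold)
    return r * n_decoy / n_target if n_target else float("nan")


def true_hits(scores: Sequence[float], is_decoy: Sequence[bool],
              threshold: float, r: float) -> float:
    """Estimated true hits, #target - r * #decoy; may be negative at low scores."""
    n_target, n_decoy = _counts_above(scores, is_decoy, threshold)
    return n_target - r * n_decoy
```

With one target and five decoys below 3.6, r = 0.2. Ten targets and two decoys above a threshold then give an FDR of 0.2 with the standard approach but 0.04 after correction.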

Supplemental Fig. 5: Protein FDR estimation with the classic and picked TDS, using the sum of best Q-scores as protein score. A When the sum of best Q-scores of all PCMs matching a protein is used as protein score, the number of decoy proteins (red) in the classic TDS massively overestimates the number of false positive protein identifications. Furthermore, the target distribution (blue) is not bimodal and is not well separated from the decoy distribution. B After applying the picked approach, the decoy (red) protein distribution overlaps with the target (blue) protein distribution, which allows a more accurate protein FDR estimate. C Comparing the picked (solid) and classic (dashed) approaches when filtering PCMs at various FDRs shows a trend similar to that in Figure 3A. With increasing PCM q-value cutoffs, the number of true positive protein identifications (number of target proteins minus number of decoy proteins) increases and is comparable between the picked and classic approaches. At a PCM q-value cutoff of roughly …, the number of true positive proteins starts to decrease and quickly drops to 0 for the classic approach. In the picked approach, by contrast, true positive protein IDs increase further and converge to a fairly stable plateau. The slight decrease at the end is likely due to the accumulation of false positive PCMs, which further degrades the separation of decoy and target proteins. D The estimated protein FDR of the picked (solid) and classic (dashed) approaches mirrors the trend in panel C. In the classic approach, the estimated protein FDR increases steadily with the PCM q-value cutoff and eventually reaches 100%. In the picked approach, it starts to rise much later and plateaus at roughly 10%.
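
The difference between the classic and picked target-decoy strategies can be sketched as follows. In the picked TDS, each target protein and its decoy counterpart are treated as a pair, and only the better-scoring member is kept. A confidently identified target therefore removes its own decoy from the list instead of letting it count as a false positive. The sketch takes protein scores as given, for example the summed Q-scores used in this figure. Keying each decoy by its target's ID and resolving ties in favor of the target are assumptions. The protein IDs and scores in the usage note are made up.

```python
from typing import Dict, List, Tuple

Entry = Tuple[float, bool]  # (protein score, is_decoy)


def classic_entries(target: Dict[str, float], decoy: Dict[str, float]) -> List[Entry]:
    """Classic TDS: every target and every decoy protein is counted."""
    return [(s, False) for s in target.values()] + [(s, True) for s in decoy.values()]


def picked_entries(target: Dict[str, float], decoy: Dict[str, float]) -> List[Entry]:
    """Picked TDS: keep only the higher-scoring member of each target/decoy pair.

    Both dicts are keyed by the target protein ID, so decoy[p] is the decoy
    counterpart of target[p]. Ties go to the target (an assumption).
    """
    entries: List[Entry] = []
    for prot in set(target) | set(decoy):
        t, d = target.get(prot), decoy.get(prot)
        if d is None or (t is not None and t >= d):
            entries.append((t, False))  # type: ignore[arg-type]
        else:
            entries.append((d, True))
    return entries


def counts_at(entries: List[Entry], threshold: float) -> Tuple[int, int]:
    """(#target, #decoy) proteins with score >= threshold."""
    n_t = sum(1 for s, dec in entries if s >= threshold and not dec)
    n_d = sum(1 for s, dec in entries if s >= threshold and dec)
    return n_t, n_d


def protein_fdr(entries: List[Entry], threshold: float) -> float:
    """Estimated protein FDR = #decoy / #target above the threshold."""
    n_t, n_d = counts_at(entries, threshold)
    return n_d / n_t if n_t else float("nan")
```

Take targets {P1: 50, P2: 40, P3: 5} and decoys {P1: 6, P2: 3, P3: 4, P4: 20} at a threshold of 3. The classic list counts 3 targets and 4 decoys. The picked list counts 3 targets and only 1 decoy (P4, whose target was not observed).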

Supplemental Fig. 6: Enlarged illustrations of the comparison of the classic and picked TDS from Figure 3. A Even when small numbers of experiments are aggregated, the picked (solid) TDS outperforms the classic (dashed) TDS. The numbers of target proteins (blue) are comparable, marginally higher for the classic approach. However, the difference between the numbers of decoy proteins (red) reported by the classic and picked approaches starts to increase. B The overestimation of false positive proteins by the classic approach is particularly apparent when comparing the numbers of target (dashed blue) and decoy (dashed red) proteins at the end of the aggregation process. The number of decoy proteins increases more rapidly than the number of target proteins and approaches the same limit. C The picked approach shows the opposite effect. The number of decoy proteins reported by the picked approach (solid red) decreases because of new evidence introduced by additional experiments, especially at ~1540 experiments. D The trends explained in panels B and C are mirrored by the estimated protein FDR in the picked (solid) and classic (dashed) TDS. In the classic approach, the protein FDR increases and approaches 100%. The picked approach shows a decrease and may reach close to 0% as more data are added.

Supplemental Fig. 7: R factor FDR. A R factor correction produces more reasonable protein FDR curves than the standard decoy strategy. The agreement between the picked and R factor approaches is not perfect, but it is better than the agreement of either approach with the standard approach. B Number of true protein hits as a function of FDR for the standard, picked and R factor approaches. Both the R factor and picked approaches perform better than the standard strategy, and the picked TDS consistently yields higher coverage.

Supplemental Fig. 8: Comparison of best and sum Q-score protein scoring for the classic and picked TDS. A When the best Q-score is used to score proteins, the number of proteins identified at 1% protein FDR increases in both the picked (solid) and classic (dashed) approaches, but the picked approach consistently reports more proteins. B When the sum of best Q-scores of all PCMs matching a protein is used, the number of proteins identified at 1% protein FDR first increases in both the picked (solid) and classic (dashed) approaches. It then starts to decrease and breaks down at high PCM q-value cutoffs. The picked approach shows a delayed behavior, but it also overestimates the number of false positive protein IDs when the decoy proteins are used. Especially at high PCM q-value cutoffs, the decoy and target protein distributions start to blend into each other (data not shown) and show almost no separation anymore.
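
The two protein scoring schemes compared here can be sketched as follows. The sketch assumes a Q-score is a per-PCM number where higher means more confident, and that several rows may report the same PCM for a protein. It is an illustration, not the authors' code. One plausible reading of panel B follows from the scoring: the sum score grows with the number of matching PCMs. Decoy proteins that collect many random PCMs at lenient cutoffs can therefore reach high summed scores, whereas the best score is bounded by a single PCM.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def protein_scores(pcm_hits: Iterable[Tuple[str, Hashable, float]],
                   mode: str = "best") -> Dict[str, float]:
    """Aggregate PCM scores into protein scores.

    pcm_hits holds (protein, pcm_key, score) rows. First the best score per
    PCM is kept; then each protein's PCM scores are combined by max
    ("best") or by sum ("sum").
    """
    best_per_pcm: Dict[Tuple[str, Hashable], float] = {}
    for protein, pcm, score in pcm_hits:
        key = (protein, pcm)
        if key not in best_per_pcm or score > best_per_pcm[key]:
            best_per_pcm[key] = score

    per_protein: Dict[str, List[float]] = defaultdict(list)
    for (protein, _), score in best_per_pcm.items():
        per_protein[protein].append(score)

    if mode == "best":
        return {p: max(v) for p, v in per_protein.items()}
    if mode == "sum":
        return {p: sum(v) for p, v in per_protein.items()}
    raise ValueError(f"unknown mode: {mode!r}")
```

Consider a protein P1 with PCM a reported at 2.0 and 3.0 and PCM b at 1.0. It scores 3.0 under "best" and 4.0 under "sum". A protein with a single PCM scores the same under both schemes.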