Absolute protein quantification estimated by spectral counting using large datasets in PeptideAtlas


Ning Zhang 1*, Eric W. Deutsch 1*, Henry Lam 1, Hamid Mirzaei 1, Paola Picotti 2, Abhishek Pratap, David Shteynberg 1, Luis Mendoza 1, Dave Campbell 1, Julian Watts 1, Ruedi Aebersold 1,2
* equal contributors
1 Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
2 Institute of Molecular Systems Biology, ETH Zurich, CH-8049 Zurich, Switzerland

Introduction

PeptideAtlas [1] is a compendium of peptide observations and associated annotations. It is built from a large number of tandem mass spectrometry datasets contributed by many laboratories and reprocessed through a consistent processing pipeline. Here we report that the datasets currently in PeptideAtlas allow us to estimate absolute protein abundances in yeast and human plasma.

A total of 2,814 yeast proteins were confidently identified from 31 experiments totaling over 3 million spectra using SEQUEST, PeptideProphet and ProteinProphet, with an estimated FDR of 0.7%. The total count of confidently identified spectra for each protein was used to estimate its absolute abundance in yeast cell lysate, using an improved version of a published count normalization method [2]. Our abundances estimated from spectral counts are in reasonable agreement with abundances measured by several methods, including Western blotting [3], flow cytometry [4] and mass spectrometry using heavy synthetic peptides, and they correlate well with codon usage bias.

We applied the same method to the observed human plasma proteome and obtained similar agreement with published protein abundance measurements, which allows us to estimate absolute abundances for around 2000 proteins in human plasma. Spectral counting applied to a sufficiently large set of data can therefore be used to determine approximate absolute protein abundances.
Methods

(1) AQUA peptide design and MRM experiments. Fourteen proteins present in human plasma at various concentrations were validated by MRM experiments. Four unique peptides of each protein were synthesized and subsequently mixed with MARS column depleted plasma. The three most intense daughter ions were used to quantify each peptide. Protein abundance was determined by averaging the quantities of its peptides.

(2) Database search. 31 datasets totaling 3,457,179 spectra were searched with SEQUEST against a yeast database with an appended decoy database. 77 datasets totaling 16 million spectra were searched with SEQUEST against the latest IPI human protein database. ProteinProphet [5] and the number of decoys in the yeast search results were used to estimate the false discovery rate.

Results

(1) PeptideAtlas Protein Abundance (PAPA) estimation. The spectral count (SC) of each positively identified protein was calculated by summing, over its peptide ions, the number of identifications with an adjusted probability of 0.8 or above. The number of theoretical peptides (NTP) was calculated by in silico tryptic digestion of each protein sequence, counting those peptides with a mass between 400 and 5000, a length greater than 5 and a hydrophobicity score [6] between 0 and 70. These heuristic thresholds were determined by examining all peptides observed in the yeast PeptideAtlas. The normalized spectral count for each protein was calculated as in eq. [1], and the PAPA for each positively identified protein was calculated as in eq. [2], where C is the total concentration of all proteins in yeast cells.

[Equations 1 and 2 appeared as images on the original poster and are not preserved in this transcript.]

(2) Comparison of the correlation between protein abundances and codon bias in yeast. In organisms such as E. coli and Saccharomyces cerevisiae (baker's yeast), it is generally thought that codon bias, which correlates with the tRNA pool, reflects the level of protein expression.
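The PAPA estimation described in Results (1) can be sketched roughly as follows. Because the poster's equations are not preserved, the forms used here are assumptions: the normalized spectral count is taken to be SC divided by NTP, and PAPA is taken to be each protein's share of the summed normalized counts scaled by C. The function names, the use of monoisotopic masses, the digestion rule (cleave after K or R unless followed by P, no missed cleavages) and the caller-supplied `hydrophobicity` function are all hypothetical choices made for illustration; the hydrophobicity score of reference [6] is not implemented here.

```python
# Monoisotopic residue masses (assumption: the poster does not state the mass type).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """In silico digestion: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def count_theoretical_peptides(sequence, hydrophobicity):
    """NTP: peptides passing the mass, length and hydrophobicity thresholds."""
    count = 0
    for pep in tryptic_peptides(sequence):
        mass = sum(MONO[a] for a in pep) + WATER
        if 400 <= mass <= 5000 and len(pep) > 5 and 0 <= hydrophobicity(pep) <= 70:
            count += 1
    return count


def papa(spectral_counts, ntp, total_concentration):
    """Assumed form: NSC_i = SC_i / NTP_i; PAPA_i = C * NSC_i / sum_j NSC_j."""
    nsc = {p: sc / ntp[p] for p, sc in spectral_counts.items() if ntp.get(p, 0) > 0}
    total = sum(nsc.values())
    if total == 0:
        return {}
    return {p: total_concentration * v / total for p, v in nsc.items()}
```

Proteins with no qualifying theoretical peptide are skipped rather than assigned an abundance, which is a design choice of this sketch and not something the poster specifies.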
Abundances estimated by spectral counting correlate best with codon bias, with the highest R^2 value of the three methods compared. This suggests that spectral counting on a large amount of data is one of the best ways to estimate protein abundances in a population.

Figure 1: Each panel displays the correlation between the concentration values from Western blotting, GFP, and PAPA, respectively, and the codon bias for each protein. All protein abundances are fitted with a second-order polynomial function. Overall concentrations are expected to correlate with codon bias. The PAPA scores correlate with codon bias with the highest R^2 value, suggesting that PAPA is the most accurate of the three methods for estimating concentration.

(3) Importance of using a large quantity of datasets to estimate protein abundances.

Figure 2: Histogram of protein abundances estimated from PeptideAtlas data (2814 proteins in total) and by Godoy et al. [7] (2404 proteins in total). Overall, a larger share of the proteins estimated from PeptideAtlas data are low-abundance proteins (<10E3 copies/cell): 42%, versus 11% for Godoy et al.

Figure 3: Percentage of proteins that were observed by MS/MS versus those detected by the Western method.

Figure 4: Plasma protein concentrations reported in the literature versus spectral counts from the Human Plasma PeptideAtlas (HPPA). Each blue box represents one of the 140?? proteins found in both sources. The red arrows on the left represent upper limits for proteins with reported concentrations but no detections in the HPPA. The red line represents a best fit to the data points. The error bars at each abundance decade denote the 1 sigma deviation of the residuals; these are approximately a factor of 4 at higher concentrations and a factor of 10 at lower concentrations. The histogram at the right depicts an estimate of the completeness of the HPPA as a function of concentration, calculated as the number of blue boxes divided by the sum of blue boxes and red arrows within each decade.
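The completeness estimate described in the Figure 4 caption (blue boxes divided by the sum of blue boxes and red arrows within each decade) can be sketched as below. The function name, the input format and the use of floor(log10) to assign a concentration to its decade are assumptions made for illustration; the poster does not describe how the binning was implemented.

```python
import math
from collections import Counter


def completeness_by_decade(detected, undetected):
    """Fraction of proteins detected within each concentration decade.

    detected:   reported concentrations of proteins also seen in the atlas
                (the blue boxes in Figure 4)
    undetected: reported concentrations of proteins with no detection
                (the red arrows in Figure 4)
    Concentrations must be positive and share the same unit.
    """
    found = Counter(math.floor(math.log10(c)) for c in detected)
    missed = Counter(math.floor(math.log10(c)) for c in undetected)
    decades = sorted(set(found) | set(missed))
    # Counter returns 0 for absent decades, so every listed decade has a
    # non-zero denominator by construction.
    return {d: found[d] / (found[d] + missed[d]) for d in decades}
```

For example, `completeness_by_decade([2e3, 5e3, 3e1], [4e1, 6e1, 7e3])` places two detected and one undetected protein in decade 3, and one detected and two undetected proteins in decade 1.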
Conclusions

(1) In yeast, we have shown that protein abundances estimated using PAPA correlate best with codon bias.
(2) Although the Western blot method [3] provides abundance estimates for about 1000 more proteins than PAPA, it is much more labor intensive. With the spectral counting method, data acquisition, data analysis, statistical validation and data visualization are streamlined by the TPP [8] and PeptideAtlas.
(3) In human plasma, the protein abundances estimated using spectral counting agree well with published values [9].
(4) The estimated protein concentrations are useful for MRM-type experiments when determining the quantities of synthesized heavy peptides to be spiked into the samples.

Acknowledgements

This work has been funded in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, under contract No. N01-HV-28179, and from P50 GM076547/Center for Systems Biology.

References

[1] Deutsch, E. W., Lam, H. and Aebersold, R., Nat Biotechnol submitted
[2] Lu, P., Vogel, C., Wang, R., Yao, X. and Marcotte, E. M., Nat Biotechnol 2007, 25,
[3] Ghaemmaghami, S., et al. Nature 2003, 425,
[4] Newman, J. R., et al. Nature 2006, 441,
[5] Nesvizhskii, A. I., et al. Anal. Chem. 2003, 75(17),
[6] Spicer, V., et al. Anal. Chem. 2007, 79(22)
[7] Godoy, L. M., et al. Genome Biol. 2006, 7, R50.
[8] Keller, A., et al. Mol. Syst. Biol. 2005, 1: