Shared Peptides in Mass Spectrometry Based Protein Quantification Banu Dost, Nuno Bandeira, Xiangqian Li, Zhouxin Shen, Steve Briggs, Vineet Bafna University.

Slides:



Advertisements
Similar presentations
Genomes and Proteomes genome: complete set of genetic information in organism gene sequence contains recipe for making proteins (genotype) proteome: complete.
Advertisements

Yinyin Yuan and Chang-Tsun Li Computer Science Department
A basic overview of Proteomics Bioinformatics Unit Lab Meeting F.M. Mancuso 21/02/2012.
Protein Quantitation II: Multiple Reaction Monitoring
RIP – T RANSCRIPT E XPRESSION L EVELS. O UTLINE RNA Immuno-Precipitation (RIP) NGS on RIP & its alternatives Alternate splicing Transcription as a graph.
1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.
Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no , pages
MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.
Timothy H. W. Chan, Calum MacAulay, Wan Lam, Stephen Lam, Kim Lonergan, Steven Jones, Marco Marra, Raymond T. Ng Department of Computer Science, University.
Mass Spectrometry in a drug discovery setting Claus Andersen Senior Scientist Sienabiotech Spa.
Yoona Kim University of California, San Diego UCSD Mass Spectrometry Journal Club 12/03/10.
‘Gene Shaving’ as a method for identifying distinct sets of genes with similar expression patterns Tim Randolph & Garth Tan Presentation for Stat 593E.
Clustering (Part II) 10/07/09. Outline Affinity propagation Quality evaluation.
+ Protein and gene model inference based on statistical modeling in k-partite graphs Sarah Gester, Ermir Qeli, Christian H. Ahrens, and Peter Buhlmann.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
A combination of the words Proteomics and Genomics. Proteogenomics commonly refer to studies that use proteomic information, often derived from mass spectrometry,
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
Proteomics Informatics Workshop Part III: Protein Quantitation
Fa 05CSE182 CSE182-L9 Mass Spectrometry Quantitation and other applications.
Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011 Top-down/bottom-up proteomics Post-translational modifications.
A highly abbreviated introduction to proteomics
LECTURE 2 Splicing graphs / Annoteted transcript expression estimation.
PROTEIN STRUCTURE NAME: ANUSHA. INTRODUCTION Frederick Sanger was awarded his first Nobel Prize for determining the amino acid sequence of insulin, the.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
An introduction and possible applications Ariane Kahnt
CSCE555 Bioinformatics Lecture 16 Identifying Differentially Expressed Genes from microarray data Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun.
1 Chemical Analysis by Mass Spectrometry. 2 All chemical substances are combinations of atoms. Atoms of different elements have different masses (H =
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
INF380 - Proteomics-101 INF380 – Proteomics Chapter 10 – Spectral Comparison Spectral comparison means that an experimental spectrum is compared to theoretical.
Metabolomics Metabolome Reflects the State of the Cell, Organ or Organism Change in the metabolome is a direct consequence of protein activity changes.
Quantification of Membrane and Membrane- Bound Proteins in Normal and Malignant Breast Cancer Cells Isolated from the Same Patient with Primary Breast.
Genomics II: The Proteome Using high-throughput methods to identify proteins and to understand their function.
Proteomics What is it? How is it done? Are there different kinds? Why would you want to do it (what can it tell you)?
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
A New Strategy of Protein Identification in Proteomics Xinmin Yin CS Dept. Ball State Univ.
EBI is an Outstation of the European Molecular Biology Laboratory. In silico analysis of accurate proteomics, complemented by selective isolation of peptides.
Isobaric tags for quantitative analysis Joshua J. Coon U. Wisconsin-Madison.
Proteomics Informatics (BMSC-GA 4437) Instructor David Fenyö Contact information
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
Inferring gene regulatory networks from multiple microarray datasets (Wang 2006) Tiffany Ko ELE571 Spring 2009.
Salamanca, March 16th 2010 Participants: Laboratori de Proteomica-HUVH Servicio de Proteómica-CNB-CSIC Participants: Laboratori de Proteomica-HUVH Servicio.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Proteomics Informatics (BMSC-GA 4437) Course Directors David Fenyö Kelly Ruggles Beatrix Ueberheide Contact information
Protein quantitation I: Overview (Week 5). Fractionation Digestion LC-MS Lysis MS Sample i Protein j Peptide k Proteomic Bioinformatics – Quantitation.
Ho-Tak Lau, Hyong Won Suh, Martin Golkowski, and Shao-En Ong
Considerations for multi-omics data integration Michael Tress CNIO,
RANIA MOHAMED EL-SHARKAWY Lecturer of clinical chemistry Medical Research Institute, Alexandria University MEDICAL RESEARCH INSTITUTE– ALEXANDRIA UNIVERSITY.
Custom peptide synthesis services In the quantitative proteomics research, several MS-based methodologies for relative quantification have been introduced.
Custom peptide synthesis services In the quantitative proteomics research, several MS-based methodologies for relative quantification have been introduced.
Shotgun protein identification Creative Proteomics offers iTRAQ protein quantification service suited for unbiased untargeted biomarker discovery. Relative.
Bottom-Up Proteomics Data collection
Functional Genomics in Evolutionary Research
Protein/Peptide Quantification
Thomas BOTZANOWSKI & Blandine CHAZARIN
Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing  Graham Heimberg, Rajat.
Bioinformatics Solutions Inc.
Quantitative proteomics reveals daf‐16‐mediated reduction in protein metabolism in long‐lived daf‐2(e1370) mutants. Quantitative proteomics reveals daf‐16‐mediated.
Proteomics Informatics David Fenyő
Volume 65, Issue 2, Pages (January 2017)
A perspective on proteomics in cell biology
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs BMI/CS Spring 2019 Colin Dewey
Cluster analysis and pathway-based characterization of differentially expressed genes and proteins from integrated proteomics. Cluster analysis and pathway-based.
A, Absolute ion intensities of m/z 322, 922 and 1522 as function of the transfer time. A, Absolute ion intensities of m/z 322, 922 and 1522 as function.
Shotgun Proteomics in Neuroscience
Impact of Alternative Splicing on the Human Proteome
Proteomics Informatics David Fenyő
Validation of DNMT3A splice variant expression.
Presentation transcript:

Shared Peptides in Mass Spectrometry Based Protein Quantification Banu Dost, Nuno Bandeira, Xiangqian Li, Zhouxin Shen, Steve Briggs, Vineet Bafna University of California, San Diego contact:

Mass Spectrometer ION SOURCE DETECTOR MASS ANALYZER positive ion beam magnet sample Quantify the amount of compounds in a sample.

Mass Spectrometry-based Protein Quantification Protein Sample A Protein Sample B Digestion & Labeling Mixing Peptide Sample B Peptide Sample A Peptide Mixture

Mass Spectrometry-based Protein Quantification Intensity Sample B Sample A Mass Spectrometer Peptide Mixture Relative abundance of peptides across samples.

Mass Spectrometry-based Protein Quantification Traditionally, If a protein does not have a unique peptide, its relative abundance across samples is not measured. Relative abundance of 2 different proteins is never measured.

> ATPase 8 MTSLLKSSPGRRRGGDVESGKSEHADSDSDTFYIPSKNASIERLQQWRKAALVLN ASRRFRYTLDLKKEQETREMRQKIRSHAHALLAANRFMDMGRESGVEK...ILMVAA VASLALGIKTEGIKEGWYDGGSIAFAVILVIVVTAVSDYKQSLQFQNLNDEKRNIHLE ………NFASVVKVVRWGRSVYANIQKFIQFQLTVNVAALVINVVAAISSGDVPLTAVQ LLWVNLIMDTLGALALATEPPTDHLMGRPPVGRKEPLITNIMWRNLLIQAIYQVSVLL TLNFRGISILGLEHEVHEHATRVKNTIIFNAFVLCQAFNEFNARKPDEKNIFKGVIKNR LFMGIIVITLVLQVIIVEFLGKFASTTKLNWKQWLICVGIGVISWPLALVGKFIPVPAAP ISNKLKVLKFWGKKKNSSGEGSL > ATPase 10 MSGQFNNSPRGEDKDVEAGTSSFTEYEDSPFDIASTKNAPVERLRRWRQAALVLN ASRRFRYTLDLKREEDKKQMLRKMRAHAQAIRAA…………………………………GIA HNTTGSVFRSESGEIQVSGSPTERAILNWAIKLG………..KSDIIILDDNFESVVKVVR WGRSVYANIQKFIQFQLTVNVAALVINVVAAISAGEVPLTAVQLLWVNLIMDTLGALA LATEPPTDHLMDRAPVGRREPLITNIMWRNLFIQAMYQVTVLLILNFRGISILHLKSKP NAERVKNTVIFNAFVICQVFNEFNARKPDEINIFRGVLRNHLFVGIISITIVLQVVIVEFL GTFASTTKLDWEMWLVCIGIGSISWPLAVIGKLIPVPETPVSQYFRINRWRRNSSG > ATPase 9 MSTSSSNGLLLTSMSGRHDDMEAGSAKTEEHSDHEELQHDPDDPFDIDNTKNASV ESLRRWRQAALVLNASRRFRYTLDLNKEEHYDNRRRMIRAHAQVIRAALLFKLAGE ……….EKEVIDRKNAFGSNTYPKKKGKNFFMFLWEAWQDLTLIILIIAAVTSLALGIKT EGLKEGWLDGGSIAFAVLLVIVVTAVSDYRQSLQFQNLNDEKRNIQLEV…..TLQSIE SQKEFFRVAIDSMAKNSLRCVAIACRTQELNQVPKEQEDLDKWALPEDELILLAIVGI KDPCRPGVREAVRICTSAGVKVRMVTGDNLQTAKAIALECGILSSDTEAVEPTIIEGK VFRELSEKEREQVAKKITVMGRSSPNDKLLLVQALRKNGDVVAVTGDGTNDAPALH EADIGLSMGISGTEVAKESSDIIILDDNFASVVKVVRWGRSVYANIQKFIQFQLTVNVA ALIINVV……..GKLIPVPKTPMSVYFKKPFRKYKASRNA NSSGEGSL YTLDLK SESGEIQVSGSPTER QSLQFQNLNDEK SVYANIQK MVTGDNLQTAK [Based on arabidopsis ITRAQ data] ~50% of Peptides are shared

Shared Peptides ~50% of the peptides are shared by multiple proteins due to [Jin et al., J. Proteome Res., 2008)] Homologues Splicing variants (isoforms) ~50% of the proteins do not have a unique peptide. For protein quantification, only unique peptides are taken into consideration, half of the data is ignored.

Goal Demonstrate that shared peptides are a resource that adds value to protein quantification. 1) Across-samples relative quantification of proteins with no unique peptide 2) Relative quantification of distinct proteins in a sample

Example-I We can compute relative abundances of proteins 1 & 2 within sample A & B.

Example-II We can solve relative abundance of protein 2 across samples, even though it has no unique peptide. 6 unknowns, 5+1 constraints

Protein Quantification via Shared Peptides

Linear Programming (LP) Problem Formulation …. ….. m=#proteins, n=#peptides 2m unknowns, n+1 constraints

Linear Programming (LP) Problem Formulation …. Given peptide ratios r i, we estimate relative protein amounts Q A j and Q B.

Robustness of Estimates Low objective does not necessarily result in robust estimates. 4 unknowns, 3 constraints => under-determined => infinitely many solutions with zero error

Robustness of Estimates Low objective does not necessarily result in robust estimates. Rank(A) under-determined

Rank-threshold 2*m(#proteins) n(#peptides)+1 A = V Σ UTUT xx σ1σ1 σ2σ2 σpσp A good way to characterize the reliability of the estimates. R(A) = ∞ => under-determined system R(A) is high => small singular values => ill-conditioned, poor estimates R(A) is low => large singular values => full-rank, better estimates..

Robust estimates for ill-conditioned systems A = V UTUT xx σ1σ1 σ2σ2 σpσp σkσk.. 2m n+1

Robust estimates for ill-conditioned systems A = VkVk UkUk xx σ1σ1 σ2σ2 σkσk.. n+1 2mkk

Peptide Detectability Not all peptides are detected in mass spectrometer with the same efficiency. mass spectrometer

Incorporating Peptide Detectability Peptide detectability, d i  [0,1] relates peptide abundance to the total abundances of its parent proteins.

Incorporating Peptide Detectability If we know the detectabilities, m=#proteins, n=#peptides 2m variables, 2n+1 constraints (n more constraints) The number of components solved are considerably increased. If we do not know the detectabilities, 2m+n variables, 2n+1 constraints Inference of peptide detectabilities in addition to relative protein abundances.

Incorporating Peptide Detectabilities Current mass spectrometry data do not provide reliable peptide abundance values. Recent developments indicate that peptide abundances can be experimentally estimated. [Bantscheff et al., Anal Bioanal Chem, 2007] peptide detectabilities can be reliably estimated across mass spectrometry runs. [Alves, et al., Pac Symp Biocomput, 2007]

Simulation Protein-peptide mapping based on Arabidopsis ITRAQ data. 257 topologically different components Generate 100 datasets for each component. Q B j = R j x Q A j perturb according to a log-normal N(0, σ). σ : perturbation level Solve each dataset using LP formulation. QB1QB1 QB2QB2 r1r1 r2r2 r3r3 r4r4 r5r5 QB3QB3 R 1 R 2 R 3 …………..….. ….……….. QA1QA1 QA2QA2 QA3QA3

Simulation: Validation Statistics If answer is known, protein abundances distance If answer is unknown, we measure consistency by peptide ratios distance

Simulation: Results With no noise, we achieve ideal case for all full-rank systems at R(A)= full-rank systems at R(A) = 1, σ= % have PAD < 0.16 and LRD < In all cases, objective is close to 0. (<10 -4 ) Performance degrades with less strict rank-thresholds.

Simulation: Incorporating Peptide Detectabilities Peptide detectabilities distance

Arabidopsis ITRAQ Data Two samples before and after nematode infection ~120K spectra, 27K peptides mapping onto 8K protein Close to half of the peptides (10K) are shared. Close to half of the proteins (4K) do not have a unique peptide. Bi-partite mapping graph 4119 connected components 1190 have ≥2 proteins. 257 non-isomorphic topologies, size ranging 2-127

Arabidopsis ITRAQ Data Results R(A)#full-rank comps 199 (8.3%) 2249 (20.9%) 4276 (23.2%) 8277 (23.3%) (23.7%) 99 components with R(A)= proteins, 357 peptides 79 have LRD < 10 -1, 55 have LRD < 10 -4

Arabidopsis ITRAQ Data Results – Example I ATPase 8 ATPase 10 ATPase 9 A system of 3 proteins from P-type Ca+2 ATPase super-family and 6 peptides. Q A 1 : %34, R 1 : 3.39, Q B 1 : %66 Q A 2 : %58, R 2 : 0.95, Q B 2 : %31 Q A 3 : %8, R 3 : 0.61, Q B 3 : %3 ATPase8,10 are co-expressed evenly over all vegetative tissues [Marmagne et al., Mol. Cell Proteomics, 2004].

Arabidopsis ITRAQ Data Results – Example III AtCAD4 AtCAD5 A system of 2 proteins from Cinnamyl-alcohol dehydrogeneases (CAD) family and 3 peptides. Q A 1 : %56, R 1 : 1.5, Q B 1 : %79 Q A 2 : %44, R 2 : 0.5, Q B 2 : %21 Among many genes in CAD family members, only AtCAD4 and AtCAD5 are found to be central in the CAD metabolic network. [Kim et al., Phytochemistry, 2007]

Applications Mass spectrometric data: Accurate peptide-protein mapping Differential regulation of proteins from a family Differential alternative splicing patterns Differential phosphorylation Transcript sequencing data: Accurate gene – exon mapping Differential expression of transcripts

Discussion&Conclusion Shared peptides in protein quantification. Relative abundance of proteins with no unique peptide Relative abundance of distinct proteins. Accuracy of results depends upon the quality of the data. Viability of using shared peptides for peptide detectability computation

Acknowledgements Vineet Bafna (UCSD, CSE) Nuno Bandeira (UCSD, CSE) Xiangqian Li, Zhouxin Shen, Steve Briggs (UCSD, Biology)

Simulation: Results Performance degrades with an increase in perturbation error Full rank systems at R(A)=1, increasing perturbation levels 93% have LRD ≤ 0.1 at σ= % have LRD ≤ 0.1 at σ=0.15

Result I: ill-conditioned systems 339 ill-conditioned systems at most 3 singular values are ≤10 -16, remaining are ≥ Revised LP for ill conditioned systems provides better estimates for those. 65% have LRD ≤ 0.25 under original formulation 88% have LRD ≤ 0.25 under revised formulation