Proteomics Informatics David Fenyő


Course Information http://fenyolab.org/pi2018

Protein Identification and Quantitation: Samples, Peptides, Mass Spectrometry, Identity and Quantity. [Figure: mass spectrum, intensity vs. m/z]

Central Dogma of Molecular Biology. [Figure labels: Replication; Transcription; Translation; Modification (P)]

Central Dogma of Molecular Biology. [Figure labels: Replication; Transcription; Translation; Modification (P); Slow Degradation; Fast Degradation; Slow; Fast]

Motivating Example: Protein Regulation. [Figure labels: Breast Cancer; ERBB2; GRB7; ERBB4]

Motivating Example: Protein Complexes (Alber et al., Nature 2007)

Motivating Example: Signaling (Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010)

Mass Spectrometry Based Proteomics: Lysis, Fractionation, Digestion, Mass spectrometry (MS), Peak Finding, Charge determination, De-isotoping, Integrating Peaks, Searching, Identified and Quantified Proteins.
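
The charge determination step in the workflow above can be sketched as follows. This is a minimal illustration, not the course's implementation: it assumes the isotope peaks of one peptide have already been picked, and it uses the standard spacing of about 1.00335 Da between isotope peaks. The function names and the example peak list are hypothetical.

```python
# Sketch: infer the charge state from the spacing of isotope peaks, then
# convert the monoisotopic m/z to a neutral mass.
PROTON = 1.007276          # mass of a proton in Da
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def charge_from_isotopes(mz_peaks):
    """Estimate the charge z from the mean m/z spacing of isotope peaks."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    # Isotope peaks of a z-charged ion are about ISOTOPE_SPACING / z apart.
    return max(1, round(ISOTOPE_SPACING / mean_spacing))


def neutral_mass(monoisotopic_mz, charge):
    """Convert the m/z of an [M+zH]z+ ion to the neutral mass M."""
    return charge * (monoisotopic_mz - PROTON)


# Hypothetical isotope cluster (values made up for illustration).
cluster = [583.78, 584.28, 584.78]
z = charge_from_isotopes(cluster)
mass = neutral_mass(min(cluster), z)
print(z, round(mass, 3))
```

De-isotoping then amounts to reporting only the monoisotopic peak of each such cluster together with its charge.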

Mass Spectrometry: Ion Source, Mass Analyzer, Detector. [Figure: spectrum, intensity vs. mass/charge]

Mass Spectrometry: Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector. [Figure: b and y fragment ions]

Example data – ESI-LC-MS/MS. [Figure: MS signal over time and an MS/MS spectrum, % Relative Abundance vs. m/z (250 to 1000), with labeled peaks at m/z 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080 and a peak labeled [M+2H]2+]

Information Content in a Single Mass Measurement. [Figure: two panels, Human and S. cerevisiae; number of matching peptides and average number of matching peptides vs. tryptic peptide mass [Da], 1000 to 3000]

Protein Identification by Mass Spectrometry: Samples, Peptides, MS/MS; compare against Protein DB, score, test significance; Identified peptides and proteins.

Tandem MS – Database Search. Experiment: Lysis, Fractionation, Digestion, LC-MS, MS/MS. Database: Sequence DB, Pick Protein, Pick Peptide, All Fragment Masses; repeat for all peptides and all proteins. Compare, Score, Test Significance; repeat for all MS/MS.
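
The "pick protein, pick peptide" loop above can be sketched in a few lines. This is a simplified illustration under stated assumptions: it matches on precursor mass only (a real search engine goes on to compare fragment masses and score the match), it allows no missed cleavages, and the two protein sequences in TOY_DB are made up. The function names are hypothetical.

```python
# Sketch: in-silico tryptic digestion of a sequence database and matching of
# peptide masses to one observed neutral precursor mass.

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_mass(observed_mass, database, tol_da=0.02):
    """Return (protein id, peptide) pairs whose mass is within tol_da."""
    hits = []
    for protein_id, sequence in database.items():      # repeat for all proteins
        for peptide in tryptic_digest(sequence):        # repeat for all peptides
            if abs(peptide_mass(peptide) - observed_mass) <= tol_da:
                hits.append((protein_id, peptide))
    return hits


# Made-up sequences, for illustration only.
TOY_DB = {"protA": "AAKRPGGRSGFLEEDELKTT", "protB": "MKWVTFISLLR"}
print(match_mass(1165.55, TOY_DB))
```

With a realistic database, many peptides share a mass within tolerance, which is why the fragment masses and a significance test are needed.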

Search Results: Most proteins show very reproducible peptide patterns. [Figures not preserved]

Spectrum Library Search. Experiment: Lysis, Fractionation, Digestion, LC-MS/MS. Library: Spectrum Library, Pick Spectrum; repeat for all spectra. Compare, Score, Test Significance; repeat for all MS/MS. Identified Proteins.
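
One common way to "compare and score" a measured spectrum against a library spectrum is a normalized dot product (cosine similarity) of binned intensities. The slide does not say which score is used, so the sketch below is an assumption chosen for illustration; the function names, the 1 m/z bin width, and the example peaks are all hypothetical.

```python
# Sketch: cosine similarity between two MS/MS spectra after binning by m/z.
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Return a score between 0 (no shared bins) and 1 (same pattern)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up peak lists, for illustration only.
query = [(260.2, 50.0), (389.2, 100.0), (504.3, 80.0)]
library_entry = [(260.2, 45.0), (389.2, 110.0), (504.3, 70.0)]
print(round(cosine_score(query, library_entry), 3))
```

Repeating this for every library spectrum and keeping the best score mirrors the "repeat for all spectra" loop on the slide.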

Interpretation of Mass Spectra. [Figure, built up over several slides: MS/MS spectrum, % Relative Abundance vs. m/z (250 to 1000), annotated with the fragment ion ladders of one peptide; residue labels: K, L, E, D, F, G, S]
b ions: 88 (S), 145 (G), 292 (F), 405, 534, 663, 778 (D), 907 (E), 1020 (L), 1166 (K)
y ions: 147 (K), 260 (L), 389 (E), 504 (D), 633, 762, 875, 1022 (F), 1080 (G)
Highlighted mass differences: 113 and 129. A peak is labeled [M+2H]2+.
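
The b and y ion ladders on these slides can be reproduced from residue masses. The sketch below assumes singly charged fragments and monoisotopic masses. Reading the labeled ladders as the peptide SGFLEEDELK is an inference from the mass differences, not something stated on the slides. The function name is hypothetical.

```python
# Sketch: singly charged b and y fragment ion m/z values for a peptide.

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) for charge 1+, excluding the intact peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: the first i residues plus a proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: the last i residues plus water plus a proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


# Peptide inferred from the slide's ladders (an assumption, see lead-in).
b, y = fragment_ions("SGFLEEDELK")
print([round(m, 2) for m in b])
print([round(m, 2) for m in y])
```

The computed values agree with the slide's labels to within 1 Da; the slide appears to show integer-rounded masses.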

De Novo Sequencing: Mass differences between peaks are compared with amino acid masses to find the sequences consistent with the spectrum. [Figure: MS/MS spectrum, % Relative Abundance vs. m/z (250 to 1000), peaks at m/z 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080 and a peak labeled [M+2H]2+]
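
The mass-difference idea can be sketched as follows: take a ladder of fragment peaks, compute the differences between neighbors, and look up which residues fit each difference. This is a toy illustration that assumes the ladder has already been picked out of the spectrum (the hard part in practice); the tolerance of 0.5 Da and the function names are assumptions, and the ladder uses the integer b ion labels from the slides.

```python
# Sketch: read residues off a fragment ion ladder by mass differences.

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_difference(delta, tol_da=0.5):
    """Return all residues whose mass is within tol_da of delta, sorted."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol_da)


def read_ladder(peaks, tol_da=0.5):
    """Map each gap in a sorted peak ladder to its candidate residues."""
    ordered = sorted(peaks)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        candidates = residues_for_difference(high - low, tol_da)
        calls.append("/".join(candidates) if candidates else "?")
    return calls


# b ion ladder as labeled on the slides (integer m/z values).
b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(read_ladder(b_ladder))
```

Leucine and isoleucine have the same mass, so a difference of 113 is reported as "I/L"; several sequences can therefore be consistent with one spectrum.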

Significance Testing: False protein identifications are caused by random matching, so an objective criterion for testing the significance of protein identification results is necessary. The significance of a protein identification can be tested once the distribution of scores for false results is known.
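
Given a sample of scores from false (random) matches, the significance of an observed score can be estimated empirically. The sketch below is one simple way to do that, not necessarily the method used in the course: it assumes the false scores are available as a list (for example from searching sequences known to be absent from the sample, which is an assumption here), and the function name and score values are hypothetical.

```python
# Sketch: empirical p-value of a match score against false-match scores.


def empirical_p_value(score, false_scores):
    """Fraction of false scores at least as high as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero when
    no false score reaches the observed one.
    """
    n_at_least = sum(1 for s in false_scores if s >= score)
    return (n_at_least + 1) / (len(false_scores) + 1)


# Made-up scores, for illustration only.
false_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(empirical_p_value(10, false_scores))  # score above all false scores
print(empirical_p_value(5, false_scores))   # score in the middle
```

A small p-value means the score is unlikely to arise from random matching; the resolution of the estimate is limited by the number of false scores available.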

Protein Quantitation by Mass Spectrometry: Sample i, Protein j, Peptide k; Lysis, Fractionation, Digestion, LC-MS, MS. [Figure symbols: C_ij and I_ik]

Protein Quantitation by Mass Spectrometry (Light and Heavy): Sample i, Protein j, Peptide k; Lysis, Fractionation, Digestion, LC-MS, MS with heavy (H) and light (L) signals. Assumption: All losses after mixing are identical for the heavy and light isotopes and [equation not preserved]. References: Oda et al. PNAS 96 (1999) 6591; Ong et al. MCP 1 (2002) 376.
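
Under the stated assumption that losses after mixing are identical for heavy and light forms, the heavy-to-light intensity ratio of a peptide reflects the relative abundance in the two samples. The sketch below illustrates this; summarizing peptide ratios by their median is a choice made here for robustness, not something taken from the slides, and the function name and intensities are hypothetical.

```python
# Sketch: relative protein abundance from heavy/light peptide intensities.
from statistics import median


def protein_ratio(peptide_pairs):
    """Median heavy/light ratio over a protein's peptides.

    peptide_pairs is a list of (heavy_intensity, light_intensity) tuples;
    pairs with a non-positive intensity are skipped.
    """
    ratios = [h / l for h, l in peptide_pairs if h > 0 and l > 0]
    if not ratios:
        raise ValueError("no usable heavy/light pairs")
    return median(ratios)


# Made-up intensities for three peptides of one protein.
pairs = [(200.0, 100.0), (410.0, 200.0), (90.0, 50.0)]
print(protein_ratio(pairs))
```

Because heavy and light peptides are measured in the same run, sample-handling losses after mixing cancel in the ratio.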

Protein Quantitation: Shotgun proteomics vs. Targeted MS (both: Lysis, Fractionation, Digestion, LC-MS, MS, MS/MS)
Shotgun proteomics, Data Dependent Acquisition (DDA): 1. Records m/z. 2. Selects peptides based on abundance and fragments. 3. Protein database search for peptide identification.
Targeted MS, uses predefined set of peptides: 1. Select precursor ion. 2. Precursor fragmentation. 3. Use precursor-fragment pairs for identification.

Proteogenomics: Samples, Peptides, MS/MS; compare against Protein DB, score, test significance; Identified peptides and proteins.

Proteogenomics: Next-generation sequencing of the genome and transcriptome of the samples provides a sample-specific Protein DB. Samples, Peptides, MS/MS; compare against the sample-specific Protein DB, score, test significance; Identified peptides and proteins.

Proteogenomics. Non-Tumor Sample: genome sequencing; identify germline variants. Tumor Sample: genome sequencing and RNA-Seq; identify alternative splicing, somatic variants and novel expression. Reference Human Database (Ensembl) combined with these gives a Tumor Specific Protein DB. [Figure labels: Variants (TCGAGAGCTG, TCGATAGCTG); Alt. Splicing (Exon 1, Exon 2, Exon 3); Novel Expression (Exon X); Fusion Genes (Gene X, Gene Y)]

Proteogenomics. [Figure labels, built up over two slides: ERBB2; Breast Cancer; Breast; Ovarian Cancer]

Posttranslational Modifications: Peptide with two possible modification sites and a matching MS/MS spectrum (intensity vs. m/z). Which assignment does the data support? 1, 1 or 2, or 1 and 2?

Protein Interactions: Digestion, Mass spectrometry, Identification. [Figure labels: A, B, E, F]

Data Analysis - Normalization. [Figure panels: Raw Data; Normalized (mean=0, std=1)]
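
Normalizing to mean 0 and standard deviation 1 is a z-score transform. The sketch below shows it for one list of values, using the population standard deviation; whether the course normalizes per sample or per protein, and which standard deviation it uses, is not stated on the slide, so those are assumptions. The function name and data are hypothetical.

```python
# Sketch: z-score normalization (mean 0, standard deviation 1).
from statistics import mean, pstdev


def zscore(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; cannot normalize")
    return [(v - mu) / sigma for v in values]


# Made-up raw intensities, for illustration only.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = zscore(raw)
print([round(v, 2) for v in normalized])
```

After the transform, data sets measured on different intensity scales can be compared on a common scale.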

Data Analysis - Normalization. [Figure panels: Normalized 3 replicates; Normalized 3 replicates + one more replicate a few months later]

Data Analysis

Molecular Markers: A molecular signature is a computational or mathematical model that links high-dimensional molecular information to phenotype or other response variable of interest. FDA calls them “in vitro diagnostic multivariate assays”.