Published by Dwain Hines. Modified over 9 years ago.
1
Proteomics Informatics Workshop Part II: Protein Characterization
David Fenyö, February 18, 2011
● Top-down/bottom-up proteomics
● Post-translational modifications
● Protein complexes
● Cross-linking
● The Global Proteome Machine Database
2
Proteomics Informatics (workflow diagram). Labels: Biological System; Experimental Design; Sample Preparation; Samples; Measurements (MS, MS/MS); Data Analysis; Information about each sample (What does the sample contain? How much?); Information Integration; Information about the biological system.
3
Proteomics Informatics (workflow diagram, sample preparation detail). Labels: Biological System; Experimental Design; Sample Preparation (Enrichment, Separation, etc.; Digestion); Samples; Measurements (MS, MS/MS); Data Analysis; Information about each sample (What does the sample contain? How much?); Information Integration; Information about the biological system. Top down: Proteins. Bottom up: Peptides. Fragmentation: Fragments.
4
Top down / bottom up (figure: spectrum panels labeled Top down and Bottom up; axes: mass/charge, intensity)
5
Charge distribution: top down vs. bottom up (figure: two spectra; axes: mass/charge, intensity; charge-state labels 1+, 2+, 3+, 4+ and 27+, 31+)
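The charge-state labels on this slide follow from the relation m/z = (M + z·m_p)/z for positive-mode ions. A minimal sketch, assuming protonation is the only charge carrier; the two example masses (a 2000 Da peptide and a 30 kDa protein) are hypothetical and not taken from the slide:

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge):
    """m/z of an ion that carries `charge` protons (positive mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_series(neutral_mass, charges):
    """Map each charge state to its m/z, rounded to 3 decimals."""
    return {z: round(mz(neutral_mass, z), 3) for z in charges}


if __name__ == "__main__":
    # Hypothetical peptide: low charge states, widely spaced peaks.
    print(charge_series(2000.0, range(1, 5)))
    # Hypothetical intact protein: high charge states, closely spaced peaks.
    print(charge_series(30000.0, range(27, 32)))
```

The same neutral mass appears at several m/z values, one per charge state, which is why an intact-protein spectrum shows a broad envelope of peaks while a peptide spectrum shows only a few.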
6
Isotope distribution: top down vs. bottom up (figure: two spectra; axes: mass/charge, intensity)
7
Fragmentation: top down vs. bottom up (figure)
8
Correlations between modifications Top down Bottom up
9
Alternative Splicing: top down vs. bottom up (figure labels: Exon 1, 2, 3)
10
Top down: protein mass spectra and fragment mass spectra (Kellie et al., Molecular BioSystems 2010)
11
Non-Covalent Protein Complexes Schreiber et al., Nature 2011
12
Dynamic Range in Proteomics
There is a large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome. The goal is to identify and characterize all components of a proteome.
(Figure: Distribution of Protein Amounts, Number of Proteins vs. Log (Protein Amount), with the Experimental Dynamic Range and the Desired Dynamic Range marked.)
13
Experimental Designs Simulated
14
Parameters in Simulation
● Distribution of protein amounts in sample
● Loss of peptides before binding to the column
● Loss of peptides after elution off the column
● Distribution of mass spectrometric response for different peptides present at the same amount
● Total amount of peptides that are loaded on column (limited by column loading capacity)
● # of peptide fractions
● # of proteins in each fraction
● Dynamic range of mass spectrometer
● Detection limit of mass spectrometer
15
Simulation Results for 1D-LC-MS
Workflow: Complex Mixtures of Proteins; Digestion; RPC; MS Analysis.
(Figure panels: Tissue and Body Fluid, each with No Protein Separation and with Protein Separation: 10 fractions.)
16
Success Rate of a Proteomics Experiment
DEFINITION: The success rate of a proteomics experiment is defined as the number of proteins detected divided by the total number of proteins in the proteome.
(Figure: Distribution of Protein Amounts with the Proteins Detected highlighted; axes: Log (Protein Amount), Number of Proteins.)
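The definition above can be sketched in a few lines. This is a simplified illustration, not the simulation used in the slides: it assumes log10 protein amounts are normally distributed and that a protein is detected whenever its amount reaches a single detection limit. The function names and all parameter values are hypothetical.

```python
import random


def success_rate(log_amounts, log_detection_limit):
    """Proteins detected divided by total proteins in the proteome."""
    if not log_amounts:
        raise ValueError("the proteome must contain at least one protein")
    detected = sum(1 for a in log_amounts if a >= log_detection_limit)
    return detected / len(log_amounts)


def simulate_proteome(n_proteins, mean_log=0.0, sd_log=1.5, seed=0):
    """Draw log10 protein amounts from a normal distribution (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_log, sd_log) for _ in range(n_proteins)]


if __name__ == "__main__":
    proteome = simulate_proteome(10000)
    # Lowering the detection limit raises the success rate.
    for limit in (2.0, 0.0, -2.0):
        print(limit, round(success_rate(proteome, limit), 3))
```

A fuller model would add the other parameters listed for the simulation (peptide losses, response distribution, loading capacity, fractionation), each of which shifts which proteins clear the detection limit.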
17
Relative Dynamic Range of a Proteomics Experiment
DEFINITION: Relative dynamic range, RDR_x, where x is e.g. 10%, 50%, or 90%.
(Figure: Distribution of Protein Amounts and Proteins Detected, with the Fraction of Proteins Detected and RDR_90, RDR_50, RDR_10 marked; axes: Log (Protein Amount), Number of Proteins.)
18
Repeat Analysis (figure panels: 1 Analysis, 2 Analyses, 3 Analyses, 4 Analyses, 5 Analyses, 6 Analyses, 7 Analyses, 8 Analyses)
19
Repeat Analysis: Comparison of Simulations and Experiments
20
Number of Proteins in Mixture (figure: RDR_50 and Success Rate for Tissue and Body Fluid)
21
Amount loaded and peptide separation (Tissue)
Two orders of applying the improvements are compared:
● Order A: 1. Protein separation 2. Amount loaded 3. Peptide separation
● Order B: 1. Protein separation 2. Peptide separation 3. Amount loaded
Ranges:
● Protein separation: 30000 – 3000 proteins in each fraction
● Amount loaded: 0.1 µg – 10 µg
● Peptide separation: 100 – 1000 fractions
22
Localization of modifications: Phosphopeptide identification (phosphorylation). Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da
23
Localization of modifications: Localization (d_min = 3), phosphorylation. Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min ≥ 3 for 47% of human tryptic peptides.
24
Localization of modifications: Localization (d_min = 2), phosphorylation. Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min = 2 for 33% of human tryptic peptides.
25
Localization of modifications: Localization (d_min = 1), phosphorylation. Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min = 1 for 20% of human tryptic peptides.
26
Localization of modifications: Localization (d = 1*), phosphorylation. Parameters: m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da
27
Peptide with two possible modification sites Localization of modifications
28
Localization of modifications: Peptide with two possible modification sites (figure: MS/MS spectrum; axes: m/z, Intensity)
29
Localization of modifications: Peptide with two possible modification sites, matching against the MS/MS spectrum (figure axes: m/z, Intensity)
30
Localization of modifications: Peptide with two possible modification sites, matching against the MS/MS spectrum (figure axes: m/z, Intensity). Which assignment does the data support? 1, 1 or 2, or 1 and 2?
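One way to ask which assignment the data support is to generate the theoretical fragment ions for each candidate site and count how many fall within tolerance of an observed peak. The sketch below is an illustration under stated assumptions, not the scoring used in the slides: it considers only singly charged b- and y-ions, uses a plain match count as the score, and defaults to a 0.5 Da tolerance (echoing the 0.5 Da fragment value quoted on the preceding slides, read here as a tolerance). The function names are hypothetical, and the residue table covers only the residues of the example peptide.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptide.
RESIDUE_MASS = {"A": 71.03711, "Y": 163.06333, "Q": 128.05858, "K": 128.09496}
PHOSPHO = 79.96633  # mass added by phosphorylation (HPO3)
WATER = 18.01056
PROTON = 1.00728


def fragment_mzs(sequence, phospho_site):
    """Singly charged b- and y-ion m/z values; phospho_site is 1-based."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    masses[phospho_site - 1] += PHOSPHO
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions + y_ions


def count_matches(theoretical, observed, tol=0.5):
    """Number of theoretical ions with an observed peak within tol (Da)."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def compare_sites(sequence, site1, site2, observed, tol=0.5):
    """Match counts for the two candidate localizations."""
    return (
        count_matches(fragment_mzs(sequence, site1), observed, tol),
        count_matches(fragment_mzs(sequence, site2), observed, tol),
    )


if __name__ == "__main__":
    # Idealized spectrum: every fragment of AAYYQK phosphorylated at Y3.
    observed = fragment_mzs("AAYYQK", 3)
    print(compare_sites("AAYYQK", 3, 4, observed))  # (10, 8)
```

Only the fragments that cut between the two candidate sites (b3 and y3 in this example) differ between the assignments, so the whole decision rests on a small number of site-determining ions.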
31
Visualization of evidence for localization (example peptide: AAYYQK)
32
Visualization of evidence for localization
34
Estimation of global false localization rate using decoy sites
By counting how many times the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency.
(Figure: False localization frequency vs. Amino acid frequency; Y marked.)
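The counting step can be sketched as follows. This is a minimal illustration that assumes S, T, and Y are the only phosphorylatable residues and that each localization is given as a peptide sequence plus a 1-based position; the function names and the input format are hypothetical. It reports only the raw counts and the fraction of localizations that land on decoy residues, and does not attempt the extrapolation by amino acid frequency that the slide's plot describes.

```python
from collections import Counter

TARGET_RESIDUES = set("STY")  # residues assumed to be phosphorylatable


def decoy_localization_counts(assignments):
    """Count localizations per non-phosphorylatable (decoy) amino acid.

    assignments: iterable of (peptide_sequence, one_based_position).
    """
    counts = Counter()
    for peptide, position in assignments:
        residue = peptide[position - 1]
        if residue not in TARGET_RESIDUES:
            counts[residue] += 1
    return counts


def decoy_fraction(assignments):
    """Fraction of all localizations that fall on decoy residues."""
    assignments = list(assignments)
    if not assignments:
        raise ValueError("no localization assignments given")
    decoys = sum(decoy_localization_counts(assignments).values())
    return decoys / len(assignments)


if __name__ == "__main__":
    # Hypothetical localizations, for illustration only.
    example = [("AAYYQK", 3), ("AAYYQK", 1), ("ASK", 2), ("GSK", 1)]
    print(decoy_localization_counts(example))
    print(decoy_fraction(example))
```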
35
How much can we trust a single localization assignment?
If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.
36
Is it a mixture or not?
If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.
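Both questions reduce to comparing an observed score against a distribution of scores obtained when the other assignment is the correct one. A minimal sketch of that comparison, assuming the null scores have already been generated (the slides do not say how) and that higher scores are better; the function name and the add-one correction are choices made for this illustration:

```python
def empirical_p_value(observed_score, null_scores):
    """Estimate P(score >= observed) under the null distribution.

    The add-one correction keeps the estimate above zero when no
    null score reaches the observed one.
    """
    if not null_scores:
        raise ValueError("need at least one null score")
    at_least = sum(1 for s in null_scores if s >= observed_score)
    return (at_least + 1) / (len(null_scores) + 1)


if __name__ == "__main__":
    # Hypothetical null scores for assignment 1 when 2 is correct.
    null = [1, 2, 3, 4]
    print(empirical_p_value(5, null))  # 0.2
    print(empirical_p_value(0, null))  # 1.0
```

The smallest p-value this estimator can report is 1/(N+1) for N null scores, so the number of null scores sets the resolution of the confidence estimate.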
37
Localization of modifications. Possible outcomes: 1 and 2; 1; 1 or 2; Ø
38
Protein Complexes (diagram: proteins A, B, C, D; Digestion; Mass spectrometry)
39
Protein Complexes – specific/non-specific binding (Tackett et al., JPR 2005)
40
Protein Complexes – specific/non-specific binding (Sowa et al., Cell 2009)
41
Choi et al., Nature Methods 2010
42
Analysis of Non-Covalent Protein Complexes Taverner et al., Acc Chem Res 2008
43
Determining the architectures of macromolecular assemblies Alber et al., Nature 2007
44
Interaction Partners by Chemical Cross-Linking (workflow diagram). Labels: Protein Complex; Chemical Cross-Linking; Cross-Linked Protein Complex; Isolation; Enzymatic Digestion; Proteolytic Peptides; MS (Peptides); Fragmentation; MS/MS (Fragments); axis: M/Z.
45
Interaction Sites by Chemical Cross-Linking (workflow diagram). Labels: Protein Complex; Chemical Cross-Linking; Cross-Linked Protein Complex; Isolation; Enzymatic Digestion; Proteolytic Peptides; MS (Peptides); Fragmentation; MS/MS (Fragments); axis: M/Z.
46
Cross-linking
A protein with n peptides with reactive groups has (n-1)n/2 potential ways to cross-link peptides pairwise, plus many additional uninformative forms.
● Protein A + IgG heavy chain: 990 possible peptide pairs
● Yeast NPC: ~10^6 possible peptide pairs
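The pair count is simple arithmetic, sketched below. The slide does not state n for either example; the values of n printed here are only what the formula implies (45 peptides give exactly 990 pairs, and roughly 1400 peptides are needed to reach 10^6 pairs). The function names are hypothetical.

```python
def peptide_pairs(n):
    """Number of distinct unordered pairs among n peptides: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def peptides_needed(pairs):
    """Smallest n whose pair count reaches `pairs`."""
    n = 0
    while peptide_pairs(n) < pairs:
        n += 1
    return n


if __name__ == "__main__":
    print(peptide_pairs(45))         # 990
    print(peptides_needed(990))      # 45
    print(peptides_needed(10 ** 6))  # 1415
```

Because the count grows quadratically, moving from a two-protein system to a large assembly raises the search space by about three orders of magnitude, which is the motivation for limiting the number of possible reactions.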
47
Cross-linking
● Mass spectrometers have a limited dynamic range, and it is therefore important to limit the number of possible reactions so as not to dilute the cross-linked peptides.
● For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and give informative fragmentation.
● High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides.
● Because the cross-linked peptides are often large, CAD is not ideal; ETD is recommended instead.
48
Search Results
51
GPMDB
52
Sequence-spectrum assignments in GPMDB (figure: Assigned spectra vs. Year, as of Jan 1st)
53
Human Genes Observed in GPMDB
54
Proteotypic peptide relative composition
55
Comparison with GPMDB Most proteins show very reproducible peptide patterns
56
Comparison with GPMDB
57
Global frequency of observing a peptide

Peptide Sequence          Observations
FSTVAGESGSADTVR           2633
FNTANDDNVTQVR             2432
AFYVNVLNEEQR              1722
LVNANGEAVYCK              1701
GPLLVQDVVFTDEMAHFDR       1637
LSQEDPDYGIR               1560
LFAYPDTHR                 1499
NLSVEDAAR                 1400
FYTEDGNWDLVGNNTPIFFIR     1386
ADVLTTGAGNPVGDK           1338
58
Global frequency of observing a peptide
If the number of times a peptide sequence (i) has been observed is n_i, then for a particular protein:
59
Global frequency of observing a peptide (ω)
Define a normalized global frequency of observation for a particular peptide sequence from a particular protein as:
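The formula for this slide did not survive in the transcript. A reconstruction that fits the wording "normalized" and is consistent with the catalase values quoted in the deck (2633 observations paired with ω = 0.08) is given below. It is an assumption, and the symbol N for the total number of observations is introduced here for illustration:

```latex
N = \sum_{i} n_i, \qquad
\omega_i = \frac{n_i}{N}, \qquad
\sum_{i} \omega_i = 1
```

Here the sums run over all peptide sequences observed for the protein, so each ω_i is the share of that protein's observations contributed by peptide i.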
60
Global frequency of observation (ω), catalase

Peptide Sequence          ω
FSTVAGESGSADTVR           0.08
FNTANDDNVTQVR             0.07
AFYVNVLNEEQR              0.05
LVNANGEAVYCK              0.05
GPLLVQDVVFTDEMAHFDR       0.05
LSQEDPDYGIR               0.04
LFAYPDTHR                 0.04
NLSVEDAAR                 0.04
FYTEDGNWDLVGNNTPIFFIR     0.04
ADVLTTGAGNPVGDK           0.04
61
Global frequency of observation (ω), catalase (figure: ω vs. peptide sequences)
62
Omega (Ω) value for a protein identification
For any set of peptides observed in an experiment and assigned to a particular protein (1 to j):
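The formula for Ω is not preserved in the transcript. The sketch below assumes that ω for a peptide is its observation count divided by the protein's total observation count, and that Ω is the sum of ω over the j peptides observed in the experiment. Both are assumptions consistent with the Ω values of at most 1 shown in the deck, and the function names and example counts are hypothetical.

```python
def normalized_frequencies(counts):
    """omega_i = n_i / sum(n) for all peptides of one protein (assumed)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total observation count must be positive")
    return {peptide: n / total for peptide, n in counts.items()}


def protein_omega(omegas, observed_peptides):
    """Omega = sum of omega over the distinct peptides observed (assumed)."""
    return sum(omegas[p] for p in set(observed_peptides) if p in omegas)


if __name__ == "__main__":
    # Hypothetical global observation counts for one protein.
    counts = {"PEPTIDEA": 50, "PEPTIDEB": 30, "PEPTIDEC": 20}
    omegas = normalized_frequencies(counts)
    # An experiment that saw the most and the least common peptide.
    print(round(protein_omega(omegas, ["PEPTIDEA", "PEPTIDEC"]), 2))  # 0.7
```

Under these assumptions, an Ω near 1 means the experiment recovered the peptides that usually dominate that protein's observations, while a low Ω means the identification rests on peptides that are rarely seen.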
63
Protein Ω's for a set of identifications

Protein ID    Ω (z=2)    Ω (z=3)
SERPINB1      0.88       0.82
SNRPD1        0.88       0.59
CFL1          0.81       0.87
SNRPE         0.8        0.81
PPIA          0.79       0.64
CSTA          0.79       0.36
PFN1          0.76       0.61
CAT           0.71       0.78
GLRX          0.66       0.8
CALM1         0.62       0.76
FABP5         0.57       0.17
64
Proteomics Consultation
Part of Best Practices Integrative Informatics Consultation Service (BPIC) at the NYU Center for Health Informatics and Bioinformatics (CHIBI)
Contact: InformaticsConsultation@nyumc.org or David.Fenyo@nyumc.org
Walk-in Clinic: Wednesday, February 23, 3-5 pm, 227 E 30th Street, 7th Floor, Room #739
65
Proteomics Informatics Workshop Part III: Protein Quantitation
February 25, 2011
● Metabolic labeling – SILAC
● Chemical labeling
● Label-free quantitation
● Spectrum counting
● Stoichiometry
● Protein processing and degradation
● Biomarker discovery and verification
66
Proteomics Informatics Workshop
● Part I: Protein Identification, February 4, 2011
● Part II: Protein Characterization, February 18, 2011
● Part III: Protein Quantitation, February 25, 2011