1
Bottom-Up Proteomics Data collection
Ruedi Aebersold, Ph.D., Institute of Molecular Systems Biology, ETH-Zürich; Faculty of Science, University of Zürich
2
The proteome: The ensemble of all biochemical reactions
3
Steps of Bottom-up proteomics
[Figure: bottom-up proteomics workflow across three levels: protein level, peptide level, MS/MS spectrum level]
protein sample → enzymatic digestion → peptide mixture → LC/MS/MS → MS/MS spectra → database search → peptide identifications → peptide grouping/validation → protein identifications (proteins A, B, C, D)
Additional labels: Quantitation, Validation, Database
Protein inference: assumptions? Many are possible, none is right.
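The digestion and protein inference steps of the workflow above can be illustrated with a minimal Python sketch. It assumes trypsin as the enzyme (the slide says only "enzymatic digestion") and the common simplified rule of cleaving after K or R unless the next residue is P, with no missed cleavages. The two protein sequences and the names `tryptic_digest` and `map_peptides_to_proteins` are hypothetical, chosen so that one peptide is shared between two proteins.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a trailing K/R
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def map_peptides_to_proteins(proteins):
    """Return {peptide: sorted list of protein names that contain it}."""
    mapping = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            mapping[peptide].add(name)
    return {peptide: sorted(names) for peptide, names in mapping.items()}


# Hypothetical toy sequences, not real database entries.
toy_proteins = {
    "A": "MASTEKLVNELTEFAKGGDSWR",
    "B": "MQQPLRLVNELTEFAKAEEHHY",
}

peptide_map = map_peptides_to_proteins(toy_proteins)
shared = {p: names for p, names in peptide_map.items() if len(names) > 1}
print(shared)
```

Identifying the shared peptide alone cannot distinguish protein A from protein B. Any rule for assigning it to one of them is an inference assumption of the kind the slide questions.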
4
The proteome as seen by a mass spectrometer:
Possibly 10^6 to 10^8 features
[Figure: LC-MS feature map; axes: m/z (400 to 1200) and time (10 to 110 min)]
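Each feature in such a map corresponds to a peptide ion at a given m/z and elution time. As a rough illustration of where a peptide lands on the m/z axis, the sketch below computes m/z for several charge states from standard monoisotopic residue masses. The function name `peptide_mz` and the example sequence are assumptions for illustration, and the calculation ignores modifications.

```python
# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge):
    """m/z of the [M + zH]z+ ion of an unmodified peptide."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Illustrative sequence only; any string of the 20 standard residues works.
for z in (1, 2, 3):
    print(z, round(peptide_mz("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is one reason the number of features exceeds the number of peptides.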
5
Slicing and dicing the proteome
6
Proteomics: The global (quantitative) analysis of the proteins expressed in a cell at a time
Proteome as database (analytical chemistry slant): enumerate all the components of a proteome. Proteome analyzed once.
Proteomics as biological or clinical assay (biology slant): detect dynamic changes in the proteome following external or internal perturbations. Proteome analyzed multiple (infinite) times.
Haynes P, Gygi S, Figeys D, and Aebersold R. (1998) Proteome analysis: biological assay or data archive? Electrophoresis 19:
7
Proteomics: The global (quantitative) analysis of the proteins expressed in a cell at a time
Proteome as database: enumerate all the components of a proteome. Proteome analyzed once.
Proteomics as biological or clinical assay: detect dynamic changes in the proteome following external or internal perturbations. Proteome analyzed multiple (infinite) times.
8
Human PeptideAtlas 2013-2015
[Figure: counts by year (2013, 2014, 2015). Totals shown: 13,230 and 14,274. Increments shown: +377, +110, +4, +34, +3, +516]
9
Human PeptideAtlas
[Figure: counts by year (2013, 2014, 2015). Totals shown: 13,230 and 14,274. Increments shown: +377, +110, +4, +34, +3, +516]
True new identifications or statistical noise?
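One way to frame the question is to compare the reported increments with the number of false entries a given error rate would predict. The sketch below is only arithmetic on the numbers shown on the slide. Reading 13,230 and 14,274 as an earlier and a later total is an assumption (supported by the increments summing to their difference), and the 1% protein-level FDR is a hypothetical value that the slide does not state.

```python
# Numbers as shown on the slide; the earlier/later reading is an assumption.
totals = {"earlier": 13230, "later": 14274}
increments = [377, 110, 4, 34, 3, 516]

net_gain = totals["later"] - totals["earlier"]
print("sum of increments:", sum(increments), "net gain:", net_gain)

ASSUMED_FDR = 0.01  # hypothetical protein-level FDR, not taken from the slide
expected_false = ASSUMED_FDR * totals["later"]
print("expected false entries at assumed FDR:", round(expected_false, 1))

# Flag increments that are smaller than the expected number of false entries.
for gain in increments:
    label = "below" if gain < expected_false else "above"
    print(f"+{gain}: {label} the expected false-entry count")
```

Under that assumed error rate, several of the increments are smaller than the number of false entries expected in the full catalogue, so on their own they cannot be separated from statistical noise.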
10
Open questions re: Proteome catalogue…
When have we reached an endpoint in proteome cataloguing?
Why do we reach apparent saturation before hitting all predicted ORFs?
What are relevant endpoints (one representative per ORF, all proteoforms, other)?
How do we quantify proteins?
How do errors propagate in large datasets, and how do we control FDR at the peptide and protein level?
How do we best complete the catalogue?
What (biology) can we learn from the (complete) catalogue?
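The FDR question above is often approached with a target-decoy search, in which hits to decoy sequences estimate the number of false target hits. The following is a minimal sketch of that idea, not the method used in the talk. The scores, the function names `estimate_fdr` and `lowest_threshold_for_fdr`, and the simple decoys/targets estimator are assumptions for illustration.

```python
def estimate_fdr(psms, threshold):
    """psms: list of (score, is_decoy). Estimate FDR as decoys / targets at or above threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_for_fdr(psms, max_fdr):
    """Lowest observed score whose estimated FDR is within max_fdr, or None."""
    passing = [score for score, _ in psms if estimate_fdr(psms, score) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
toy_psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, False), (7.5, True),
    (7.1, False), (6.8, False), (6.0, True), (5.5, False), (5.1, True),
]

print(estimate_fdr(toy_psms, 7.0))
print(lowest_threshold_for_fdr(toy_psms, 0.17))
```

The same estimator can be applied to peptide or protein lists instead of spectrum matches. The error rate at one level does not carry over unchanged to the next, which is the propagation problem the slide raises.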
11
Proteomics: The global (quantitative) analysis of the proteins expressed in a cell at a time
Proteome as database: enumerate all the components of a proteome. Proteome analyzed once.
Proteomics as biological or clinical assay: detect dynamic changes in the proteome following external or internal perturbations. Proteome analyzed multiple (infinite) times.
12
Data and the scientific method
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
13
Data matrix supporting analyses via association
Accurately and reproducibly quantify proteotypes across samples and conditions
Conditions: clinical cohorts, time courses, dosage courses, samples with structured genomes
[Figure: data matrix, conditions 1-n by proteins 1-n]
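Reproducibility in such a matrix is commonly summarized per protein as a coefficient of variation (CV) across replicate measurements. The sketch below assumes a small hypothetical matrix with proteins as rows and replicate runs of one condition as columns. The intensities and the 20% cutoff are illustrative values, not taken from the talk.

```python
from statistics import mean, stdev

# Hypothetical intensities: rows are proteins, columns are replicate runs.
matrix = {
    "protein_1": [100.0, 100.0, 100.0],
    "protein_2": [90.0, 100.0, 110.0],
    "protein_3": [50.0, 100.0, 150.0],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


cv_by_protein = {name: coefficient_of_variation(v) for name, v in matrix.items()}

MAX_CV = 0.2  # illustrative cutoff, an assumption
reproducible = [name for name, cv in cv_by_protein.items() if cv <= MAX_CV]

for name, cv in cv_by_protein.items():
    print(name, round(cv, 3))
print("reproducible:", reproducible)
```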
14
Open questions re: Association studies
How many proteins are enough? Which ones?
How precisely do proteins need to be quantified?
Which peptides are best suited to quantify a protein?
Should proteins be considered as independent actors (like transcripts) or as parts of modules?
What factors affect protein modules and how?
How do errors propagate in large datasets and how do we control FDR at peptide and protein level?
Are data reliable, robust and accessible enough?
Data integration, dissemination of methods.
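Whether proteins behave as independent actors or as parts of modules can be probed, at the simplest level, by correlating their abundance profiles across conditions. The sketch below computes a Pearson correlation on hypothetical profiles. The protein names and values are invented for illustration, and correlation alone does not establish module membership.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical abundances across four conditions.
abundance = {
    "subunit_1": [1.0, 2.0, 3.0, 4.0],
    "subunit_2": [2.0, 4.0, 6.0, 8.0],
    "unrelated": [3.0, 1.0, 4.0, 2.0],
}

r_module = pearson(abundance["subunit_1"], abundance["subunit_2"])
r_unrelated = pearson(abundance["subunit_1"], abundance["unrelated"])
print(round(r_module, 3), round(r_unrelated, 3))
```

Proteins whose profiles track each other across many conditions are candidates for a shared module. How precisely each protein is quantified sets a limit on how reliably such correlations can be detected.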