Bottom-Up Proteomics Data collection


Ruedi Aebersold, Ph.D.
Institute of Molecular Systems Biology, ETH Zürich; Faculty of Science, University of Zürich

The proteome: The ensemble of all biochemical reactions

Steps of bottom-up proteomics
- Protein sample → enzymatic digestion → peptide mixture
- Peptide mixture → LC-MS/MS → MS/MS spectra (MS/MS spectrum level)
- MS/MS spectra → database search → peptide identifications (peptide level)
- Peptide identifications → peptide grouping/validation → protein inference → protein identifications (protein level), followed by quantitation and validation
Protein inference assumptions? Many are possible; none is right.
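The slide stresses that protein inference rests on assumptions. As one illustration, here is a minimal Python sketch of two steps from the workflow: an in-silico tryptic digest using the simplified rule "cleave after K or R, but not before P" (no missed cleavages, and a hypothetical length cutoff), followed by greedy parsimony (set-cover) inference, which is only one of the many possible inference assumptions. The protein names, sequences and function names are made up for illustration.

```python
import re


def tryptic_digest(sequence, min_length=6):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P.
    Missed cleavages are not modeled; peptides shorter than min_length
    (an arbitrary, hypothetical cutoff) are dropped."""
    # Zero-width split points: preceded by K/R and not followed by P
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def map_peptides(proteins, observed, min_length=6):
    """Map each observed peptide to every protein whose digest contains it."""
    digests = {name: set(tryptic_digest(seq, min_length)) for name, seq in proteins.items()}
    return {pep: {name for name, peps in digests.items() if pep in peps} for pep in observed}


def greedy_parsimony(peptide_to_proteins):
    """Greedy set cover: repeatedly pick the protein that explains the most
    still-unexplained peptides (ties broken alphabetically)."""
    protein_to_peptides = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)
    # Peptides matching no protein cannot be explained, so they are skipped
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    selected = []
    while unexplained:
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        selected.append(best)
        unexplained -= protein_to_peptides[best]
    return selected


# Toy, made-up sequences: B shares one peptide with A and one with C
PROTEINS = {
    "A": "LLLLLKGGGGGRPEEEEK",
    "B": "GGGGGRPEEEEKDDDDDR",
    "C": "DDDDDRSSSSSK",
}
OBSERVED = ["LLLLLK", "GGGGGRPEEEEK", "DDDDDR"]

mapping = map_peptides(PROTEINS, OBSERVED)
print(greedy_parsimony(mapping))  # ['A', 'B']
```

In this toy case {A, B} and {A, C} explain the three observed peptides equally well; the greedy rule returns {A, B} only because of its alphabetical tie-break, which is the kind of arbitrary choice the slide warns about.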

The proteome as seen by a mass spectrometer: possibly 10^6 to 10^8 features
[Figure: LC-MS feature map, m/z 400 to 1200 versus retention time 10 to 110 min]
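One reason an LC-MS map holds far more features than there are peptides is that a single peptide can appear at several charge states, each as an isotope envelope. The minimal sketch below, assuming an unmodified peptide, charges 1 to 4 and four isotope peaks, lists which of those peaks fall inside the 400 to 1200 m/z window shown in the figure. It ignores abundance, detectability and elution profiles; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565           # added once per peptide (the two termini)
PROTON = 1.007276
ISOTOPE_SPACING = 1.003355  # approximate 13C minus 12C mass difference


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(mass, charge, isotope=0):
    """m/z of the [M + zH]z+ ion, optionally shifted to the n-th isotope peak."""
    return (mass + isotope * ISOTOPE_SPACING + charge * PROTON) / charge


def features_in_window(seq, mz_min=400.0, mz_max=1200.0,
                       charges=(1, 2, 3, 4), n_isotopes=4):
    """List (charge, isotope, m/z) peaks of one peptide that fall in the scan window."""
    mass = peptide_mass(seq)
    peaks = []
    for z in charges:
        for i in range(n_isotopes):
            value = mz(mass, z, i)
            if mz_min <= value <= mz_max:
                peaks.append((z, i, round(value, 4)))
    return peaks


for z, i, value in features_in_window("PEPTIDE"):
    print(f"z={z} isotope={i} m/z={value}")
```

For the short test peptide PEPTIDE, eight peaks (charges 1 and 2, four isotopes each) land in the window; longer peptides typically populate more charge states.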

Slicing and dicing the proteome

Proteomics: the global (quantitative) analysis of the proteins expressed in a cell at a given time
- Enumerate all the components of a proteome
  - Proteome as database (analytical chemistry slant)
  - Proteome analyzed once
- Detect dynamic changes in the proteome following external or internal perturbations
  - Proteomics as a biological or clinical assay (biology slant)
  - Proteome analyzed multiple (infinite) times
Haynes P, Gygi S, Figeys D, Aebersold R. (1998) Proteome analysis: biological assay or data archive? Electrophoresis 19:1862-1871.

Proteomics: the global (quantitative) analysis of the proteins expressed in a cell at a given time
- Enumerate all the components of a proteome
  - Proteome as database: proteome analyzed once
- Detect dynamic changes in the proteome following external or internal perturbations
  - Proteomics as a biological or clinical assay: proteome analyzed multiple (infinite) times

Human PeptideAtlas 2013-2015
[Figure: Human PeptideAtlas counts for 2013, 2014 and 2015; labeled values 13,230 and 14,274, with increments +377, +110, +4, +34, +3 and +516]
True new identifications or statistical noise?
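The question "true new identifications or statistical noise?" can be made concrete with a deliberately idealized model, which is an assumption of this sketch and not the PeptideAtlas procedure: every dataset reports the same true proteins, while each dataset's false positives are drawn uniformly at random from a pool of proteins that are absent from the samples. True identifications then saturate, but distinct false ones keep accumulating, so the FDR of the merged catalogue grows even if every individual dataset is filtered at about 1%. All numbers and names below are hypothetical.

```python
def union_fdr(n_datasets, true_per_dataset, false_per_dataset, absent_pool):
    """Expected protein-level FDR after merging n datasets under an idealized model:
    identical true proteins in every dataset, and false positives drawn uniformly
    at random from `absent_pool` proteins that are not actually present."""
    p_hit = false_per_dataset / absent_pool
    # Expected number of distinct absent proteins hit at least once across n datasets
    distinct_false = absent_pool * (1 - (1 - p_hit) ** n_datasets)
    return distinct_false / (true_per_dataset + distinct_false)


# Hypothetical numbers: each dataset alone is at roughly 1% FDR
for n in (1, 10, 50, 200):
    print(n, round(union_fdr(n, 10_000, 100, 10_000), 3))
```

Under these assumptions the merged FDR rises from about 1% for one dataset to well above 20% after 50 datasets, which is why small yearly gains in a cumulative catalogue deserve scrutiny.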

Open questions re: the proteome catalogue
- When have we reached an endpoint in proteome cataloguing?
- Why do we reach apparent saturation before detecting all predicted ORFs?
- What are relevant endpoints? (One representative per ORF? All proteoforms? Other?)
- How do we quantify proteins?
- How do errors propagate in large datasets, and how do we control the false discovery rate (FDR) at the peptide and protein levels?
- How do we best complete the catalogue?
- What (biology) can we learn from the (complete) catalogue?
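A common starting point for FDR control is the target-decoy approach: search against real (target) and reversed or shuffled (decoy) sequences and use decoy hits to estimate false target hits. The minimal sketch below uses the simple estimator FDR = decoys / targets above a score threshold and converts it to q-values; it is one of several estimator variants, the scores are made up, and it does not address how errors propagate from peptide to protein level, which remains the open question on the slide.

```python
def q_values(psms):
    """Target-decoy FDR estimation.

    psms: list of (score, is_decoy); higher scores are better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    FDR at each threshold is estimated as (#decoys) / (#targets) at or above it;
    the q-value is the smallest FDR at which that entry would still be accepted.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    fdr = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))
    # Running minimum from the lowest score upward makes q-values monotone
    q = [0.0] * len(fdr)
    best = float("inf")
    for i in range(len(fdr) - 1, -1, -1):
        best = min(best, fdr[i])
        q[i] = best
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def accepted_targets(psms, max_q=0.01):
    """Scores of target entries whose q-value is at or below max_q."""
    return [s for s, d, qv in q_values(psms) if not d and qv <= max_q]


# Toy scores (made up); the same procedure can be run on PSMs, peptides or proteins
toy = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
print(accepted_targets(toy, 0.01))  # [10, 9, 8]
print(accepted_targets(toy, 0.30))  # [10, 9, 8, 6]
```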


Data and the scientific method
"Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise. But faced with massive data, this approach to science (hypothesize, model, test) is becoming obsolete. There is now a better way. Petabytes allow us to say: 'Correlation is enough.' We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."
(Chris Anderson, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete," Wired, 2008)

Data matrix supporting analyses via association
Accurately and reproducibly quantify proteotypes across samples and conditions.
Conditions:
- Clinical cohorts
- Time courses
- Dosage courses
- Samples with structured genomes
[Figure: data matrix of proteins (1 to n) by conditions (1 to n)]
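One simple way to mine such a proteins-by-conditions matrix by association is to correlate protein profiles across conditions. The minimal sketch below, assuming positive raw intensities that are log2-transformed first, reports protein pairs whose Pearson correlation exceeds a hypothetical cutoff. As the preceding slide cautions, a high correlation flags candidate co-regulation but does not by itself establish a mechanism. Protein names and values are invented.

```python
import math


def log2_matrix(raw):
    """Log2-transform raw intensities (all values must be positive)."""
    return {protein: [math.log2(v) for v in values] for protein, values in raw.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(matrix, min_abs_r=0.9):
    """Protein pairs whose profiles across conditions correlate at |r| >= min_abs_r."""
    names = sorted(matrix)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(matrix[a], matrix[b])
            if abs(r) >= min_abs_r:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical proteins x conditions matrix (4 conditions, e.g. time points)
RAW = {
    "P1": [2, 4, 8, 16],
    "P2": [4, 16, 64, 256],
    "P3": [16, 8, 4, 2],
    "P4": [5, 9, 5, 9],
}
print(correlated_pairs(log2_matrix(RAW)))
# [('P1', 'P2', 1.0), ('P1', 'P3', -1.0), ('P2', 'P3', -1.0)]
```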

Open questions re: association studies
- How many proteins are enough? Which ones?
- How precisely do proteins need to be quantified?
- Which peptides are best suited to quantify a protein?
- Should proteins be considered as independent actors (like transcripts) or as parts of modules?
- What factors affect protein modules, and how?
- How do errors propagate in large datasets, and how do we control the FDR at the peptide and protein levels?
- Are the data reliable, robust and accessible enough?
- Data integration and dissemination of methods
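The questions about quantification precision and peptide choice can be explored with replicate data. The sketch below is one hypothetical heuristic, not an established method: rank a protein's peptides by their coefficient of variation (CV) across replicate runs, keep the k most reproducible ones, and summarize the protein as the mean of their log2 mean intensities. The peptide labels, intensities and the value of k are invented.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def select_peptides(replicates, k=3):
    """Return the k peptides with the lowest replicate CV (ties broken by name)."""
    return sorted(replicates, key=lambda pep: (cv(replicates[pep]), pep))[:k]


def protein_log2_estimate(replicates, k=3):
    """Hypothetical summary: mean of log2(mean intensity) over the selected peptides."""
    chosen = select_peptides(replicates, k)
    return statistics.mean(math.log2(statistics.mean(replicates[p])) for p in chosen)


# Made-up replicate intensities for four peptides of one protein
REPLICATES = {
    "PEP_A": [100, 102, 98],
    "PEP_B": [50, 80, 20],
    "PEP_C": [200, 210, 190],
    "PEP_D": [30, 31, 29],
}
print(select_peptides(REPLICATES))  # ['PEP_A', 'PEP_D', 'PEP_C']
print(round(protein_log2_estimate(REPLICATES), 3))
```

Here PEP_B, whose CV is 0.6, is excluded; whether a CV cutoff, a fixed k or another criterion is appropriate depends on how precisely the protein must be quantified for the study at hand.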