Build Results Plasma-only Build Empirical Observability Scores
Eric W. Deutsch, Nichole L. King, Jimmy K. Eng, Alexey I. Nesvizhskii, David S. Shteynberg, and Ruedi Aebersold

PeptideAtlas is a multi-organism, publicly accessible compendium of peptides identified in a large set of LC-MS/MS proteomics experiments, together with interfaces for accessing the datasets. Researchers around the world contribute both previously published and unpublished raw experimental data. All SEQUEST sequence-search results are then processed through PeptideProphet, which assigns each identification a probability of being correct in a uniform manner and so ensures a high-quality database. All peptides have been mapped to the Ensembl genome and can be viewed as custom tracks on the Ensembl Genome Browser. The peptides and their annotations in PeptideAtlas help in the analysis of new experiments in three ways. They allow comparison with previous work. They contribute to the definition and annotation of the proteome. They also support high-throughput approaches, both by identifying the best peptides to target and by making it faster to recognize spectra that have been observed before.

Data Repository

The Observed Human Proteome in PeptideAtlas

Empirical Observability Score (EOS):

$$\mathrm{EOS} = \frac{N_{\text{samples}}(\text{peptide})}{N_{\text{samples}}(\text{parent protein})}$$

It has often been noted that when a protein is observed in a sample analyzed by LC-MS/MS, some of its component peptides are observed many times. Other component peptides are not observed at all, even though they fall in the observable mass range and otherwise have attributes suitable for MS analysis. Several algorithms have been proposed that try to predict observability from sequence attributes, but these algorithms are often heavily influenced by the data on which they are trained. Peptides that are frequently observed and map uniquely to a single protein have been called "proteotypic".
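The EOS formula can be illustrated with a short Python sketch. It is not the PeptideAtlas build code. It assumes a hypothetical input of (sample_id, protein, peptide) tuples, one per identified spectrum. It also assumes that a protein counts as observed in a sample whenever any of its peptides is observed there. Scores are keyed by (protein, peptide), so a peptide shared between proteins gets one score per parent protein.

```python
from collections import defaultdict


def empirical_observability(observations):
    """Compute EOS = N_samples(peptide) / N_samples(parent protein).

    `observations` is an iterable of (sample_id, protein, peptide) tuples
    (a hypothetical input format). Samples are collected into sets, so
    repeated observations within one sample do not change the score.
    """
    protein_samples = defaultdict(set)   # protein -> samples where it was seen
    peptide_samples = defaultdict(set)   # (protein, peptide) -> samples where seen
    for sample_id, protein, peptide in observations:
        protein_samples[protein].add(sample_id)
        peptide_samples[(protein, peptide)].add(sample_id)

    return {
        key: len(samples) / len(protein_samples[key[0]])
        for key, samples in peptide_samples.items()
    }


# Protein X is seen in 10 samples (via peptide B); peptide A in 5 of them.
obs = [(f"s{i}", "X", "B") for i in range(10)]
obs += [(f"s{i}", "X", "A") for i in range(5)]
obs += [("s0", "X", "A")] * 3  # repeats within one sample do not change the score
print(empirical_observability(obs)[("X", "A")])  # 0.5
```

Using sets of sample IDs, not raw counts, follows the definition directly: only the number of distinct samples matters.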
We now routinely calculate an empirical observability score, defined below, for all peptides in a PeptideAtlas build. These scores do not rely on prediction algorithms. They simply reflect how often a peptide is observed when its parent protein is observed. We define the Empirical Observability Score (EOS) as the number of samples in which the given peptide is observed divided by the number of samples in which its parent protein is observed. For example, suppose Protein X is observed in 10 samples within a PeptideAtlas build and its component peptide A is observed in 5 of those. The EOS of peptide A is then 5/10 = 0.5. The number of times a peptide is observed within any given sample is not a factor.

To visualize the relationships between peptides and proteins, and how useful the peptides are for targeted proteomics, PeptideAtlas information can be exported to the Cytoscape visualization tool. At left, proteins are shown as purple ovals, peptides as blue rectangles, and proteotypic peptides in green. Very frequently observed peptides have a red border.

Shotgun-style experiments on complex samples will sometimes miss proteins because of the large number of peptides present. A targeted experiment, in which the mass spectrometer selects only peptides from specific proteins of interest, can be more successful and more time-efficient. The PeptideAtlas web interface lets you select a list of peptides based on the EOS and other attributes to help design targeted experiments. For example, you can query PeptideAtlas for peptides that meet all of these constraints:
- contained within the desired list of target proteins
- EOS > 0.3
- map to exactly one protein within the proteome
- observation count greater than 5

The resulting list can then be used as an inclusion list for the mass spectrometer.
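The example query above can be sketched as a simple filter in Python. The `PeptideRecord` type and its fields are hypothetical stand-ins for the attributes the PeptideAtlas interface reports, not its actual export schema. The peptide and protein names in the usage example are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideRecord:
    sequence: str
    proteins: frozenset  # every protein in the proteome the peptide maps to (hypothetical field)
    eos: float           # Empirical Observability Score
    n_obs: int           # total observation count in the build


def inclusion_list(records, targets, min_eos=0.3, min_obs=5):
    """Keep peptides that map to exactly one protein, that protein being a
    target, with EOS > min_eos and more than min_obs observations."""
    selected = []
    for rec in records:
        if (len(rec.proteins) == 1
                and next(iter(rec.proteins)) in targets
                and rec.eos > min_eos
                and rec.n_obs > min_obs):
            selected.append(rec.sequence)
    return selected


records = [
    PeptideRecord("PEPTIDEA", frozenset({"ProtX"}), 0.80, 120),          # passes every filter
    PeptideRecord("PEPTIDEB", frozenset({"ProtX", "ProtY"}), 0.90, 50),  # maps to two proteins
    PeptideRecord("PEPTIDEC", frozenset({"ProtX"}), 0.20, 40),           # EOS too low
    PeptideRecord("PEPTIDED", frozenset({"ProtX"}), 0.60, 3),            # too few observations
    PeptideRecord("PEPTIDEE", frozenset({"ProtZ"}), 0.95, 80),           # not a target protein
]
print(inclusion_list(records, targets={"ProtX"}))  # ['PEPTIDEA']
```

The thresholds are parameters, so the same filter can be tightened or relaxed per experiment.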
The PeptideAtlas web interface provides a simple search box. You can enter a protein name, accession number, peptide sequence, or part of one. Searches can be restricted to a specific organism or run across all organisms. From the results, tabs let you move between peptide and protein views within the database and see results from different PeptideAtlas builds. Above is a screenshot of the protein view within the PeptideAtlas web interface. The user has entered a protein of interest and receives a summary of information about it: the number of times it has been observed in the build, its sequence, which peptides map to it, and the samples in which it has been observed. Clicking on a constituent peptide shows additional information. There are also links to the Ensembl Genome Browser, where PeptideAtlas peptides can be viewed as tracks. Links within the Ensembl Genome Browser lead back to the PeptideAtlas interface.

The final PeptideAtlas builds are available on the PeptideAtlas web site for browsing and downloading. We also redistribute the raw instrument output for every experiment whose contributors allow it. Over 130 experiments are currently available for download. For each experiment, we provide:
- Raw data in native instrument format
- Raw data in the popular mzXML format
- A tarball of all SEQUEST search results and Trans-Proteomic Pipeline postprocessing (PeptideProphet and ProteinProphet results, etc.)
- The final identification summary (ProteinProphet output) in protXML format
- A README text file describing the experiment, protocols, and other information
- Links to contributors and publications, where available

Custom tracks for PeptideAtlas builds are available at Ensembl. Any Ensembl user can open the [DAS Sources] pulldown, enable the PeptideAtlas and/or Plasma PeptideAtlas custom tracks, and view peptides overlaid on the genome.
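Each experiment's final identification summary is distributed in protXML format, so a short Python sketch can list the proteins above a probability cutoff. It assumes the usual protXML layout, in which `protein` elements inside `protein_group` elements carry `protein_name` and `probability` attributes. It ignores namespaces by matching local tag names. The inline sample is a hypothetical, heavily trimmed document, not real ProteinProphet output.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{ns}protein' -> 'protein'."""
    return tag.rsplit("}", 1)[-1]


def confident_proteins(protxml_text, min_prob=0.9):
    """Return (protein_name, probability) pairs with probability >= min_prob."""
    root = ET.fromstring(protxml_text)
    hits = []
    for elem in root.iter():  # walks the root and all descendants
        if local_name(elem.tag) == "protein":
            prob = float(elem.get("probability", "0"))
            if prob >= min_prob:
                hits.append((elem.get("protein_name"), prob))
    return hits


# Hypothetical, trimmed protXML-like document for illustration only.
SAMPLE = """<protein_summary xmlns="http://regis-web.systemsbiology.net/protXML">
  <protein_group group_number="1" probability="1.00">
    <protein protein_name="PROT_A" probability="0.99"/>
  </protein_group>
  <protein_group group_number="2" probability="0.40">
    <protein protein_name="PROT_B" probability="0.40"/>
  </protein_group>
</protein_summary>"""

print(confident_proteins(SAMPLE))  # [('PROT_A', 0.99)]
```

Real protXML files can be large. For those, `ET.iterparse` would let the same logic stream through the file without loading it all into memory.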
The above screenshot shows a confirmed instance of alternative splicing: peptides PAp and PAp begin on the same exon but end on different exons. Clicking on a peptide glyph opens a popup window with more information. The popup includes a hyperlink to the PeptideAtlas peptide view page, which gives complete information on the selected peptide.

Protein View

March 2007 Build
  Samples: 52
  MS runs: 48 thousand
  MS/MS spectra searched: 14 million
  MS/MS spectra identified (P > …): … million
  Multobs peptides (P > 0.9): 15,600
  Proteins identified: 1,950
  Percentage of genes mapped to: 8%

Figure: Number of distinct peptides observed as a function of the total number of MS/MS spectra identified with P > 0.9, with the corresponding statistics for the latest published Human build. Each box is an experiment. The total height (red) is the cumulative count of distinct peptides in the build. The blue height is the number of distinct peptides in each experiment individually.

March 2007 Build
  Samples: 143
  MS runs: 52 thousand
  MS/MS spectra searched: 17 million
  MS/MS spectra identified (P > …): … million
  Multobs peptides (P > 0.9): 35,391
  Proteins identified: 10,495
  Percentage of genes mapped to: 30%

Above is a screenshot of the peptide view within the PeptideAtlas web interface. The user has entered (or followed a link to) a peptide of interest and receives a summary of information about it: the number of times it has been observed in the build, its sequence, the charge states and modifications seen, the samples in which it has been observed, and a listing of individual identified spectra. Clicking on a spectrum in the listing displays it, annotated with the identified ions (shown to the right).

Peptide View

The protein content of human plasma is considered important for medical diagnosis and could provide a complete snapshot of an individual's health.
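The two quantities in the build-growth figure can be computed with a small Python sketch: the per-experiment distinct peptides (blue) and the cumulative distinct peptides in the build (red). It assumes each experiment is given as a list of peptide sequences from spectra that already pass P > 0.9, in the order the experiments are added to the build. The sequences in the example are placeholders.

```python
def build_growth(experiments):
    """For each experiment, in build order, return a tuple of
    (cumulative identified spectra, distinct peptides in this experiment,
     cumulative distinct peptides in the build)."""
    seen = set()          # distinct peptides in the build so far
    total_spectra = 0     # x axis: identified spectra added so far
    rows = []
    for peptides in experiments:
        total_spectra += len(peptides)
        distinct_here = set(peptides)   # blue bar
        seen |= distinct_here           # red cumulative total
        rows.append((total_spectra, len(distinct_here), len(seen)))
    return rows


exps = [["AAK", "AAK", "CCR"], ["CCR", "DDK"], ["EEK"]]
for row in build_growth(exps):
    print(row)
# (3, 2, 2)
# (5, 2, 3)
# (6, 1, 4)
```

An experiment that mostly re-identifies known peptides adds a tall blue bar but raises the red total very little. That is how the curve flattens as a build saturates.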
Plasma contains proteins that carry out their function within the circulatory system. It also contains proteins secreted or leaked from cells and organs throughout the body. As a diagnostic tool, plasma is all the more valuable because it is accessible: millions of samples are stored in clinical archives, and more are obtained from patients every year. Human plasma is thought to contain a large number of proteins, perhaps nearly all human proteins, because of low-level tissue leakage. It also contains proteins from foreign organisms and millions of distinct immunoglobulins. However, a mere 22 proteins make up 99% of the protein mass in human serum, which makes it difficult to investigate the thousands of very low-abundance proteins. Because of plasma's medical importance and the large number of human serum and plasma samples made available to us, we have generated a special plasma-only Human PeptideAtlas build. As spectra with P > 0.9 assignments are added, we show the cumulative number of distinct peptides observed more than once. Peptides observed only once across all spectra are excluded because this set contains a higher proportion of false positives. Major contributions of spectra come from the HUPO Plasma Proteome Project, Pacific Northwest National Laboratory (PNNL), NCI, Novartis, and the SPC itself. Of the 14 million input spectra, over 1.3 million have been identified, and these coalesce into over 15,000 distinct peptides observed more than once. The most recent trend suggests that the number of distinct peptides has flattened, despite the extensive addition of identified spectra. We may be close to covering the major fraction of the human plasma peptidome that current MS technologies can reach.

A screenshot of the peptide view for PAp, a tryptic peptide that maps uniquely to Secreted phosphoprotein 24 precursor (SPP2).
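The multiply-observed rule described above can be sketched in Python: only P > 0.9 assignments count, and singly observed peptides are excluded. The input is assumed to be (peptide, probability) pairs in the order spectra are added to the build. A peptide joins the tally on its second accepted observation. The data are placeholders.

```python
from collections import Counter


def multobs_curve(spectra, min_prob=0.9):
    """Cumulative number of distinct peptides observed more than once.

    Returns one value per accepted spectrum (P > min_prob), i.e. the
    y values of a plot against the number of identified spectra added.
    """
    counts = Counter()   # peptide -> accepted observations so far
    multobs = 0          # distinct peptides seen at least twice
    curve = []
    for peptide, prob in spectra:
        if prob <= min_prob:
            continue                 # below threshold: not used in the build
        counts[peptide] += 1
        if counts[peptide] == 2:
            multobs += 1             # second sighting: now multiply observed
        curve.append(multobs)
    return curve


spectra = [("AAK", 0.95), ("CCR", 0.99), ("AAK", 0.97),
           ("DDK", 0.50), ("CCR", 0.92), ("AAK", 0.99)]
print(multobs_curve(spectra))  # [0, 0, 1, 2, 2]
```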
The peptide was observed in only 3 experiments: the large-scale, high-sensitivity experiments from HUPO, NCI, and Novartis. However, it was seen in 3 of the 4 experiments in which SPP2 was detected, so the peptide receives a high Empirical Observability Score. It also maps uniquely to a single protein, which makes it a good candidate for a targeted experimental approach.

Contribution by Number of Experiments
Figure: plasma & serum 50; cell line 29; other 13; saliva 2; bronchial lavage 6; brain tissue 3; cancer tissue 2; B/T cell 7; cell culture 9.

Above is a screenshot of the spectrum viewer. Both individual and consensus spectra are available for viewing (and export). Observed b ions are labeled in red and y ions in blue.

Individual & Consensus Spectra
PeptideAtlas protein view page
Figure: Cytoscape view of proteins & peptides. Legend: proteins; ambiguously mapped peptide; proteotypic peptides (N_prot = 1, N_obs > 1, EOS > 0.3).
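The spectrum viewer annotates observed b and y ions. As an illustration of where those annotation positions come from, the Python sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It is not the PeptideAtlas viewer's code. It ignores modifications (including cysteine alkylation), neutral losses, and higher charge states, all of which a real annotator would need to handle. The example peptide is arbitrary.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added to y ions (C-terminal fragments keep the OH)
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_ladder(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    Element 0 of each list is b1 / y1; full-length fragments are omitted.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ladder("SAMPLER")
print("b:", [round(m, 3) for m in b])
print("y:", [round(m, 3) for m in y])
```

A complementary pair b_i and y_(n-i) always sums to the neutral peptide mass plus two protons. That gives a quick consistency check when labeling peaks.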