Overview

We have developed a complete, end-to-end data analysis pipeline that provides automated, reliable, consistent, and objective analysis of high-throughput quantitative LC-MS/MS data from multiple data sources using multiple search engines. The Trans-Proteomic Pipeline (TPP) is a complete, mature suite of software tools for MS data representation, MS data visualization, peptide identification and validation, protein identification, quantification, and annotation, data storage and mining, and biological inference. The TPP has been adopted throughout the international proteomics community and is in use at many prominent academic and corporate labs. We present an overview of the TPP and describe newly available functionality. All software tools are freely available under an open-source software license at tools.proteomecenter.org

Introduction

High-throughput LC-MS/MS is capable of simultaneously identifying and quantifying thousands of proteins in a complex sample. Consistent and objective analysis of the large amounts of data obtained is challenging and time-consuming. Over the past 5 years, we have developed and refined a data analysis pipeline that facilitates and standardizes such analysis. The Trans-Proteomic Pipeline (TPP) is an open-source software package with well-established community acceptance.

The TPP provides a completely free, open-source proteomics analysis solution, spanning:
- conversion of raw MS/MS data to open formats and standards;
- support for searching MS/MS spectra with various search engines, including the bundled X!Tandem engine as well as Sequest, Mascot, Phenyx, OMSSA, and others;
- conversion of search engine results to a uniform open format;
- statistical validation of peptide identifications with PeptideProphet;
- statistically validated protein identification with ProteinProphet;
- quantitative proteomics (SILAC, ICAT, iTRAQ, etc.) with XPRESS, ASAPRatio, and Libra; and
- tools for visualization of and interaction with results.
Here we present recent updates to the software tools to improve analysis functionality and user experience.

New Developments for Open-Source Shotgun Proteomics Analysis with the Trans-Proteomic Pipeline

Joshua Tasman 1, Luis Mendoza 1, David Shteynberg 1, James Eddes 2, Ning Zhang 1, Nichole King 1, Chee-Hong Wong 3, Brian Pratt 4, Patrick Pedrioli 2, Henry Lam 1, Eric Deutsch 1, Jimmy Eng 5, Xiao-jun Li 6, Alexey Nesvizhskii 7, Andrew Keller 8, and Ruedi Aebersold 2

1 Institute for Systems Biology, Seattle, WA; 2 Institute for Molecular Systems Biology (ETH), Zurich, Switzerland; 3 Bioinformatics Institute, Singapore; 4 Insilicos LLC, Seattle, WA; 5 University of Washington; 6 Homestead Clinical, Seattle, WA; 7 Department of Pathology, University of Michigan, Ann Arbor, MI; 8 Rosetta Biosoftware, Seattle, WA

Methods

Improvements from the Insilicos TPP version (IPP) have been merged into the TPP, and the build system has been improved to allow native Windows deployment. Significant speed improvements have already been seen from moving away from a distribution based on a Unix emulation layer (Cygwin). Discrimination of true versus false-positive peptide IDs has been improved through the addition of the decoy, retention time, and high-mass-accuracy PeptideProphet models, as well as through the use of a semi-parametric distribution for describing peptide population distributions. The open-source search engine X!Tandem is now bundled with the TPP, allowing us to provide a complete and free solution for proteomics analysis. Work has begun on integrating the OMSSA open-source engine as well.

Results and Conclusions

This work is performed under the Seattle Proteome Center, supported by NHLBI contract No. N01-HV
We would also like to thank all Aebersold Lab and external developers who have contributed to this project.
End-to-End MS/MS Proteomics Analysis with the TPP

Pipeline diagram labels: raw MS/MS data file, mzXML document, mzML document, spectral search engine results file, pepXML document, protXML document.

MS/MS data: Conversion from proprietary (vendor) to open formats
- Choice of common open formats: mzXML (SPC/ISB) or mzML (HUPO PSI, SPC/ISB, and others; see flagship poster 001)
- Converters for Thermo Xcalibur (.raw), Waters MassLynx (.raw directory), ABI/MDS Analyst (.wiff), Agilent MassHunter (.d directory), and others

Spectral search engine output: Conversion to open formats
- Supports most common commercial and open-source data formats: Sequest, Mascot, X!Tandem, SpectraST, Phenyx, and others

Peptide ID Validation with PeptideProphet
- The majority of peptide assignments by search engines are incorrect
- Manual validation is time-consuming, subjective, and impossible to compare
- Applies statistical principles to automate peptide validation
- Validates peptide assignments by Sequest, Mascot, X!Tandem, Phenyx, SpectraST, and others
- Robust: learns distributions of search scores and peptide properties among correct and incorrect results
- Accurate: probabilities are true measures of confidence

PeptideProphet performs post-search processing to compute probabilities that peptide assignments from MS/MS spectra are correct.
Plot labels: incorrect, correct, model, results.

Quantitation
- Evaluate peptide ratio from multiple charge states (ASAPRatio)
- Apply statistical methods to evaluate protein ratios and standard deviations
- Quantify ICAT, SILAC, and many other samples

Libra performs quantification on MS/MS spectra of peptides labeled with multi-channel reagents (4 or 8 channels), such as iTRAQ-labeled samples.
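As a rough illustration of multi-channel quantification of the kind described for Libra, the sketch below normalizes each spectrum's reporter intensities to a reference channel and summarizes them with a per-channel median. This is a generic approach under stated assumptions, not Libra's actual algorithm: the function names, the choice of reference channel, and the median summary are all hypothetical.

```python
from statistics import median


def normalize_channels(intensities, reference=0):
    """Divide every channel intensity by the reference channel intensity.

    Hypothetical helper: the reference-channel convention is an assumption.
    """
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel intensity must be positive")
    return [value / ref for value in intensities]


def protein_channel_ratios(spectra, reference=0):
    """Summarize per-spectrum channel ratios for one protein.

    spectra: list of per-spectrum reporter intensity lists, all the same
    length (for example 4 or 8 channels). Spectra whose reference channel
    is not positive are skipped. Returns the median ratio per channel.
    """
    normalized = [
        normalize_channels(s, reference) for s in spectra if s[reference] > 0
    ]
    if not normalized:
        raise ValueError("no spectrum has a usable reference channel")
    n_channels = len(normalized[0])
    return [median(row[i] for row in normalized) for i in range(n_channels)]


if __name__ == "__main__":
    # Three made-up 4-channel spectra assigned to the same protein.
    example = [[100, 200, 50, 100], [10, 30, 5, 10], [50, 100, 25, 0]]
    print(protein_channel_ratios(example))
```

The median is used here only because it is robust to a single outlying spectrum; a real tool may weight spectra or reject outliers differently.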
ASAPRatio and XPRESS calculate the relative abundance of proteins labeled with heavy and light (2-channel) isotope tags:
- Compute protein ratios automatically and accurately

Diagram labels: spectra information (mzXML/mzML/mzData document), pepXML document, protXML document.

Downstream analysis with other TPP-compatible SPC tools

Data storage and mining with PeptideAtlas and SBEAMS (Systems Biology Experiment Analysis Management System):
- Data products of the TPP analysis pipeline are imported into the database
- Data exploration, annotation, and correlation with other experiments can all be managed
- Interface allows flexible analysis of the data, including analysis across multiple experiments

Additional visualization, statistical analysis, and exploration tools enable investigation of biological meaning and significance with Gaggle-compatible tools such as Cytoscape (network visualization), the statistics package R, and the PIPE (Protein Inference and Property Explorer).

The TPP is constantly improved with new functionality. Highlights of major recent developments include:
- Build system improvements and native Windows distribution: Insilicos had previously released their own version of the TPP (the "IPP"). To combine efforts more efficiently, Insilicos has integrated their customizations into the main TPP distribution. The TPP build system has been improved to allow a native Windows distribution, which brings significant performance improvements as well as easier installation;
- Implementation of vendor raw MS/MS-to-mzML data converters and full support for parsing mzML input throughout the TPP;
- PeptideProphet, the TPP module for peptide ID validation, has been updated with additional modeling capabilities that compare observed retention time with calculated peptide hydrophobicity. Additionally, a high-mass-accuracy model improves discrimination of IDs with data from newer instruments.
- Decoy database entries can now be used in PeptideProphet distribution modeling, and a semi-parametric distribution model allows better discrimination of true and false-positive results;
- Inclusion of X!Tandem (from the GPM project) for a complete, end-to-end MS/MS searching and validation solution;
- Upcoming multi-experiment data integration with iProphet (see Poster TPU 669);
- Spectral library searching with SpectraST.

Protein Identification

ProteinProphet takes as input a list of peptides and probabilities and infers the proteins in the sample:
- Groups peptides according to their corresponding protein
- Adjusts individual peptide probabilities to account for new protein grouping information
- Finds the simplest list of proteins sufficient to explain all observed peptides (Occam's razor approach)
- Computes accurate protein probabilities
- Integrates with protein-level quantitation results
- Allows meaningful comparison between results of different experiments

Plot axes: sensitivity and error rate (percentage) versus minimum probability threshold.
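Two of the ideas above can be made concrete with a short sketch. The first function combines peptide probabilities into a protein probability as the chance that at least one peptide assignment is correct; this is a simplification that assumes independent peptides and omits the peptide weighting and probability adjustment steps listed above. The second estimates sensitivity and error rate at a given minimum probability threshold directly from the computed probabilities, which is what a sensitivity/error-rate plot summarizes. Function names are hypothetical and the code is not taken from the TPP.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide assignment is correct.

    Simplified: assumes independent peptides, each belonging to this
    protein only (no shared-peptide weights, no adjusted probabilities).
    """
    return 1.0 - prod(1.0 - p for p in peptide_probs)


def sensitivity_and_error(probs, min_prob):
    """Estimate sensitivity and error rate at a probability threshold.

    probs: probabilities of being correct for all identifications.
    Sensitivity = expected correct IDs accepted / expected correct IDs overall.
    Error rate  = expected incorrect IDs accepted / number of IDs accepted.
    """
    accepted = [p for p in probs if p >= min_prob]
    total_correct = sum(probs)
    if not accepted or total_correct == 0:
        return 0.0, 0.0
    sensitivity = sum(accepted) / total_correct
    error_rate = sum(1.0 - p for p in accepted) / len(accepted)
    return sensitivity, error_rate


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    print(protein_probability([0.9, 0.5]))
    for threshold in (0.05, 0.5, 0.9):
        print(threshold, sensitivity_and_error([0.9, 0.8, 0.1], threshold))
```

Raising the threshold trades sensitivity for a lower error rate, which is why the two curves are typically shown together against the minimum probability threshold.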