10. Standards in Proteomics
MS bioinformatics analysis for proteomics
Salvador Martínez de Bartolomé
Bioinformatics support – ProteoRed Proteomics Facility, National Center for Biotechnology, Madrid

Index
– Need of standards in Proteomics
– HUPO-PSI: Organization, Standard data formats, CVs, MIAPEs
– PEFF: A Common Sequence Database Format in Proteomics
– PRIDE
– Standard data format converters

Need of standards in Proteomics: proteomics data is often made available only as arbitrarily formatted PDF tables, which carries important limitations:
– source data (mass spectra) are not made available
– no peer-review validation is possible
– very little raw material is available for testing innovative in silico techniques
– automated (re-)processing of the identifications is impossible, which rules out objective comparison of techniques

Thoughts on standards:
– Bradshaw RA, Burlingame AL, Carr S, Aebersold R. Reporting protein identification data: the next generation of guidelines. Mol Cell Proteomics May;5(5):
– Wilkins et al. Guidelines for the next 10 years of proteomics. Proteomics Jan;6(1):4-8.
– Nature Biotechnology 2006, Nov: Editorial: Standard Operating Procedures
– Burgoon LD. The need for standards, not guidelines, in biological data reporting and sharing.
– Ball C. Are we stuck in standards?
– Nature Biotechnology: planned focus issue and community consultation on standards

Need of standards in Proteomics: proteomics has no standardized reporting and no standard database submission. Proteomics data is generated at a high rate and lost at a high rate. Experiments are repeated unnecessarily, and the field advances more slowly than necessary.

Need of standards in Proteomics: standards are needed to exchange, compare, review and store data, and to reproduce results.

HUPO PSI: Proteomics Standards Initiative

HUPO PSI meetings: http://

HUPO PSI: Proteomics Standards Initiative
– Open community initiative
– Develops data format standards as well as data representation and annotation standards
– Involves data producers, database providers, software producers and publishers
The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics and to facilitate data comparison, exchange and verification. Orchard, S., Hermjakob, H., Apweiler, R. The proteomics standards initiative. Proteomics 2003, 3(7):

HUPO PSI structure
The main unit is the work group:
– Gel Electrophoresis
– Molecular Interactions
– Sample Processing
– Mass Spectrometry
– Proteomics Informatics (MS-oriented)
– Protein Modifications
Transversal activities:
– One Steering Group
– Controlled vocabulary
– MIAPE guidelines

HUPO PSI structure
– Annual workshop, activity reports at the annual HUPO congress, conference calls and dedicated workshops
– No permanent funding: active members work in their “spare time”
– Website and mailing lists
– PSI document process: Vizcaino, J.A., Martens, L., Hermjakob, H., Julian, R.K. and Paton, N.W. (2007) The PSI formal document process and its implementation on the PSI website. Proteomics 7:

HUPO PSI document process Community consultation at:

HUPO PSI structure

HUPO-PSI Project status

HUPO-PSI deliverables
– Data formats: MIML, mzML, AnalysisXML, gelML, giML, spML
– MIAPE minimal reporting requirements: one parent document, The minimum information about a proteomics experiment (MIAPE), Nature Biotechnology 25 (2007); modules MIAPE MI, MS, MSI, GE, GI, CC, CE, SP
– Formats (XML schema, instance documents, specification documents)
– Controlled vocabularies
– MIAPE documents (representation and annotation standards)

Standard data formats for:
– Experimental data: spectra, acquisition parameters, acquisition equipment, ...
– Analyzed data: identifications, quantitations, data analysis software, ...

Standard data formats: experimental data (spectra, acquisition parameters, acquisition equipment, ...)
[Figure: successive versions of earlier formats converging into mzML 1.0]
mzML: released on June 1st, 2008 (Seattle Proteome Center at the Institute for Systems Biology; HUPO-PSI).
mzML is a HUPO-PSI data format that captures peak list information. Its aim is to unite the large number of current formats (pkl, dta, mgf, ...) into one. It is NOT a substitute for the raw file formats of the instrument vendors; some vendors, if not all, will provide software to convert their raw files to this standard.

Sample instance document mzML 1.0
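As a rough illustration of how an mzML instance document can be read, the sketch below uses Python's standard xml.etree.ElementTree to map each spectrum id to its MS level. It assumes the mzML namespace http://psi.hupo.org/ms/mzml and the PSI-MS term MS:1000511 ("ms level"); the embedded fragment is a hypothetical, heavily trimmed document and is not schema-valid mzML (real files also carry cvList, instrument configurations, binary data arrays and often an indexedmzML wrapper).

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML 1.x documents.
NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hypothetical, trimmed fragment: only the elements this sketch needs.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.0">
  <run id="run1">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def ms_levels(xml_text):
    """Map each spectrum id to its MS level, read from the MS:1000511 cvParam."""
    root = ET.fromstring(xml_text)
    levels = {}
    # './/' also finds spectra nested inside an indexedmzML wrapper.
    for spectrum in root.iterfind(".//mz:spectrum", NS):
        for param in spectrum.iterfind("mz:cvParam", NS):
            if param.get("accession") == "MS:1000511":
                levels[spectrum.get("id")] = int(param.get("value"))
    return levels


print(ms_levels(SAMPLE))
```

Matching on the CV accession rather than the human-readable name is the point of using controlled vocabularies: the accession stays stable even if a label is reworded.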

Standard data formats: analyzed data (identifications, quantitations, data analysis software, ...)
AnalysisXML describes the results of identification and quantitation processes for proteins, peptides and protein modifications from mass spectrometry.
Related formats: protXML and pepXML (Seattle Proteome Center at the Institute for Systems Biology); AnalysisXML (HUPO-PSI), v1.0 candidate (Dec 08).

Sample instance document AnalysisXML (beta)

Standard data formats for other data (XML data format: corresponding MIAPE module):
– GelML: MIAPE GE
– GelInfoML: MIAPE GI
– miXML: MIAPE MI (MIMIX)
– spML: MIAPE SP

Standard data formats
[Figure: mass spectrometers A and B write proprietary formats; a converter turns them into mzML; search engines A and B read mzML and write analysisXML; the results go to a public repository]
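The data flow in this diagram (proprietary formats, a converter to mzML, search engines writing analysisXML, a public repository) can be sketched as a small Python pipeline. This is a toy model under stated assumptions: the Spectrum and Identification classes merely stand in for mzML and analysisXML content, and the converter names, the record fields and the exact-m/z "search engine" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Spectrum:             # stand-in for one spectrum stored in mzML
    scan_id: str
    precursor_mz: float


@dataclass
class Identification:       # stand-in for one result stored in analysisXML
    scan_id: str
    peptide: str
    engine: str


# Hypothetical converters: one per proprietary format, all emitting Spectrum.
CONVERTERS: Dict[str, Callable[[dict], Spectrum]] = {
    "instrument_A": lambda rec: Spectrum(rec["scan"], rec["mz"]),
    "instrument_B": lambda rec: Spectrum(str(rec["id"]), rec["precursor"]),
}


def make_engine(name: str, table: Dict[float, str]):
    """Toy search engine: exact precursor m/z lookup, standing in for real scoring."""
    def search(spectra: List[Spectrum]) -> List[Identification]:
        return [Identification(s.scan_id, table[s.precursor_mz], name)
                for s in spectra if s.precursor_mz in table]
    return search


def run_pipeline(instrument, records, engine, repository):
    spectra = [CONVERTERS[instrument](r) for r in records]  # proprietary -> common format
    results = engine(spectra)                               # common format in and out
    repository.extend(results)                              # deposit in the public repository
    return results


repository: List[Identification] = []
engine_a = make_engine("engine_A", {500.25: "PEPTIDEK"})
results = run_pipeline("instrument_B", [{"id": 7, "precursor": 500.25}], engine_a, repository)
print(results)
```

The design point is that a new instrument needs only one new converter, and a new search engine needs only to read the common format: no engine has to know about any vendor format.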

Controlled Vocabularies
The Controlled Vocabularies (CVs) of the Proteomics Standards Initiative (PSI) provide a consensus annotation system that standardizes the meaning, syntax and formalism of terms used across proteomics, as required by the PSI working groups. Each PSI working group develops the CVs required by the technology or data type it aims to standardize, following common recommendations for development and maintenance. At the PSI meeting in Washington (Sept 06), it was decided that all PSI working groups should adopt the same CVs for some overlapping concepts (units and resources).

Controlled Vocabularies: what is a CV? Each term is listed together with its synonyms, for example:
– Term: time-of-flight
– Synonyms: TOF, T.O.F., time of flight
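The slide's example entry (the term time-of-flight with synonyms TOF, T.O.F. and time of flight) shows why a CV helps software: any of the labels can be mapped to one preferred term. The sketch below is a minimal Python illustration of that normalization; the CV_TERMS dictionary and the normalize function are hypothetical and stand in for synonyms that the real PSI CVs keep in their OBO file.

```python
# Hypothetical excerpt: one CV term with the synonyms from the slide.
CV_TERMS = {
    "time-of-flight": ["TOF", "T.O.F.", "time of flight"],
}

# Reverse index from every label (term or synonym) to the preferred term.
LOOKUP = {}
for term, synonyms in CV_TERMS.items():
    for label in [term, *synonyms]:
        LOOKUP[label.casefold()] = term


def normalize(label: str) -> str:
    """Return the preferred CV term for a free-text label (KeyError if unknown)."""
    return LOOKUP[label.strip().casefold()]


print(normalize("T.O.F."))
```

Raising KeyError for unknown labels, rather than passing them through unchanged, keeps unrecognized vocabulary visible instead of silently storing it.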

PSI CVs are composed of two documents: a description of the design principles, and the implementation of the CVs in OBO (Open Biomedical Ontologies) format. Developing CVs is a process of collecting terms and, if necessary, defining them. Every effort must be made to adopt and reuse existing ontologies or CVs where they exist, to avoid “reinventing the wheel”.
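Because PSI CVs are implemented in OBO, a tool can read terms directly from the flat file. The Python sketch below parses [Term] stanzas and keeps only the id, name and synonym tags, so it is far from a complete OBO 1.2 parser. The stanza is written for illustration (accession MS:1000084 for time-of-flight); check it against the released psi-ms.obo before relying on it.

```python
import re

# Illustrative OBO text; verify accessions against the released psi-ms.obo.
OBO_TEXT = '''format-version: 1.2

[Term]
id: MS:1000084
name: time-of-flight
synonym: "TOF" EXACT []
synonym: "time of flight" EXACT []
'''

# The synonym text is the first double-quoted string (escaped quotes allowed).
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_obo_terms(text):
    """Return a list of dicts (id, name, synonyms), one per [Term] stanza."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"id": None, "name": None, "synonyms": []}
            terms.append(current)
        elif line.startswith("["):
            current = None                  # other stanza types, e.g. [Typedef]
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name"):
                current[tag] = value
            elif tag == "synonym":
                match = QUOTED.match(value)
                if match:
                    current["synonyms"].append(match.group(1))
    return terms


print(parse_obo_terms(OBO_TEXT))
```

Header lines before the first stanza (such as format-version) are ignored because no term is open yet.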

Ontology Lookup Service The OLS provides a web service interface to query multiple ontologies from a single location with a unified output format.
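The idea behind the OLS, one query point over several ontologies with a single output shape, can be illustrated locally. The Python sketch below does NOT call the OLS web service; ONTOLOGIES is a hypothetical in-memory stand-in, the accessions are shown for illustration only, and Hit is the assumed unified result format.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):          # one output shape shared by every ontology
    ontology: str
    accession: str
    label: str


# Hypothetical in-memory stand-ins for separately maintained ontologies.
ONTOLOGIES: Dict[str, Dict[str, str]] = {
    "MS": {"MS:1000084": "time-of-flight"},
    "UNIMOD": {"UNIMOD:21": "Phospho"},
}


def search(query: str) -> List[Hit]:
    """Case-insensitive substring search across all loaded ontologies."""
    q = query.casefold()
    return sorted(Hit(name, acc, label)
                  for name, terms in ONTOLOGIES.items()
                  for acc, label in terms.items()
                  if q in label.casefold())


print(search("flight"))
```

Callers see the same Hit structure whichever ontology a term comes from, which is the convenience a unified lookup service provides.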

Ontology Lookup Service

MIAPE: Minimum Information About a Proteomics Experiment
– Sufficiency and practicability
– Unambiguous description of the experimental context
– Allow understanding of the results and their interpretation
– Sufficient to permit a critical evaluation
– In principle, allow recreation of the work
Taylor, C.F., Paton, N.W., Lilley, K.S., Binz, P.A., Julian, R.K., Jr., Jones, A.R., Zhu, W., Apweiler, R., Aebersold, R., Deutsch, E.W., Dunn, M.J., Heck, A.J., Leitner, A., Macht, M., Mann, M., Martens, L., Neubert, T.A., Patterson, S.D., Ping, P., Seymour, S.L., Souda, P., Tsugita, A., Vandekerckhove, J., Vondriska, T.M., Whitelegge, J.P., Wilkins, M.R., Xenarios, I., Yates, J.R., 3rd and Hermjakob, H. (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25:

MIAPE guidelines: what they are
– A description of the information and data to provide when an experiment is reported (a content descriptor), e.g. peptide sequence, scores, modifications, mass errors
– An aid to assessing quality control, e.g. number of replicates, expected error rate

MIAPE guidelines: what they are not
– A description of how to run an experiment: they do not mandate a particular search engine or force the use of one protocol
– A description of the data representation, e.g. “use Excel to create a table with these five following columns:…”
– A quality judgment, such as “a protein needs 30% sequence coverage to be identified”, “The absence of thorough validation of both analytical and biological results, including error analysis should result in rejection” or “Authors should justify the use of a very small database or database that excludes common contaminants, since this may generate misleading assignments”
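The split between what MIAPE is (a content descriptor) and what it is not (a quality judgment) maps naturally onto a presence check. The Python sketch below assumes a hypothetical REQUIRED_ITEMS tuple built from the examples on the slide; it is not the official MIAPE checklist, and the field names are illustrative.

```python
# Hypothetical subset of reportable items, taken from the examples on the slide;
# this is NOT the official MIAPE checklist.
REQUIRED_ITEMS = ("peptide_sequence", "score", "modifications", "mass_error")


def missing_items(report: dict) -> list:
    """Return required items that were not reported at all.

    Only presence is checked. An empty modification list is a valid report,
    and no thresholds (e.g. minimum sequence coverage) are applied, because
    MIAPE describes content rather than judging quality.
    """
    return [item for item in REQUIRED_ITEMS if report.get(item) is None]


print(missing_items({"peptide_sequence": "PEPTIDEK", "score": 42.0, "modifications": []}))
```

Any acceptance threshold (coverage, score cut-off) would belong to a journal's or reviewer's policy layered on top, not to the checklist itself.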

MIAPE guidelines
– MIAPE Gel Electrophoresis (GE) v1.4
– MIAPE Gel Informatics (GI) v0.5
– MIAPE Mass Spectrometry (MS) v2.22
– MIAPE Mass Spectrometry Informatics (MSI) v0.8
– MIAPE Column Chromatography (CC) v1.0
– MIAPE Capillary Electrophoresis (CE) v0.7
– MIAPE Sample Preparation and handling (SP) v0.2
– MIAPE Molecular Interactions (MI) v1.1.2

Online tool to generate and store MIAPE documents

A MIAPE generator tool (ProteoRed server)
– Fill in all minimal information by hand, or
– Fill in only the changed or new items by hand, and add static information automatically from previous MIAPE documents
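The second way of working (enter only the changes, reuse static information from a previous MIAPE document) amounts to overlaying one document on another. The Python sketch below shows one way to do that with a recursive merge; it is not the ProteoRed tool's implementation, and the document structure and field names are hypothetical.

```python
from copy import deepcopy


def merge_miape(previous: dict, changes: dict) -> dict:
    """Overlay hand-entered changes on a previous document without modifying it."""
    result = deepcopy(previous)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_miape(result[key], value)   # merge nested sections
        else:
            result[key] = deepcopy(value)                   # new or replaced item
    return result


# Hypothetical MIAPE GE-style content; field names are illustrative only.
previous = {"gel": {"matrix": "acrylamide", "stain": "Coomassie"}, "sample": "lysate A"}
changes = {"gel": {"stain": "silver"}, "sample": "lysate B"}
print(merge_miape(previous, changes))
```

Merging section by section means that changing one gel parameter keeps the other, unchanged parameters of that section.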

A MIAPE generator tool

HUPO-PSI: MIAPE Gel Electrophoresis v1.2

Document actions: Generate XML, Generate report, Delete document, Edit document
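The "Generate XML" action turns a stored MIAPE document into an XML file. The Python sketch below illustrates that step with xml.etree.ElementTree for a flat document; the element and attribute names (MIAPEDocument, Item, module, name) are hypothetical and do not reproduce the schema used by the ProteoRed tool.

```python
import xml.etree.ElementTree as ET


def miape_to_xml(module: str, items: dict) -> str:
    """Serialize a flat MIAPE-style document; element names are hypothetical."""
    root = ET.Element("MIAPEDocument", {"module": module})
    for name, value in items.items():
        item = ET.SubElement(root, "Item", {"name": name})
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(miape_to_xml("GE", {"stain": "silver", "replicates": 3}))
```

Building the tree with ElementTree, rather than concatenating strings, ensures that special characters in the values are escaped correctly.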

MIAPE Reports Generate report

MIAPE Reports

PEFF: PSI Extended FASTA Format. A Common Sequence Database Format in Proteomics (P-A Binz, S Seymour, J Shofstahl, D Creasy, E Kapp)
Problem: search engines interpret the current FASTA format differently with respect to:
– protein identifiers
– description
– taxonomy
– other annotation (PTMs, sequence variants, etc.)
Proposal: an alternative to the heterogeneous FASTA format, ideally generated by the database providers or otherwise via an accepted converter, so that one single source sequence database can be submitted to various search engines.
– SwissProt and EBI have already agreed on the principle
– A format proposal has been reached (not only for MS; flexible, extensible)

PEFF: PSI Extended FASTA Format
A unified format for protein and nucleotide sequence databases to be used by sequence search engines and other associated tools (spectral library search tools, sequence alignment software, data repositories, etc.).
– Enables consistent extraction, display and processing of information such as the protein/nucleotide sequence database entry identifier, description and taxonomy across software platforms.
– Allows the representation of structural annotations such as post-translational modifications, mutations and other processing events.
– A flat file that includes a metadata header describing the database(s) from which the sequences were obtained (e.g., name, version).
– Sequence database providers are encouraged to generate this format as part of their release policy, or to provide appropriate converters that can be incorporated into processing tools.
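The two features described above, a metadata header for the whole file and structured per-entry annotation, can be parsed with a few lines of Python. This sketch is loosely modelled on the PEFF convention of "# Key=Value" header lines and backslash-prefixed "\Key=Value" fields in entry descriptions; the sample database, accessions and key names are invented for illustration, and the final specification should be consulted for the exact keys and syntax.

```python
import re

# Hypothetical PEFF-style sample (database and entries are invented).
SAMPLE = r"""# PEFF 1.0
# DbName=ExampleDB
# DbVersion=1.0
>ex:P00001 \PName=Example protein \NcbiTaxId=9606 \Length=8
PEPTIDEK
>ex:P00002 \PName=Second example \NcbiTaxId=10090 \Length=5
ACDEF
"""

# A header field starts at whitespace + backslash + Key= (e.g. " \PName=").
FIELD_START = re.compile(r"\s+\\(?=\w+=)")


def parse_peff(text):
    """Return (metadata dict, list of entries) from PEFF-style text."""
    meta, entries, current = {}, [], None
    for line in text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].strip().partition("=")
            if sep:                                   # skip e.g. the "# PEFF 1.0" line
                meta[key] = value
        elif line.startswith(">"):
            accession, *fields = FIELD_START.split(line[1:].strip())
            current = {
                "accession": accession,
                "fields": dict(f.split("=", 1) for f in fields),
                "sequence": "",
            }
            entries.append(current)
        elif current is not None:
            current["sequence"] += line.strip()       # sequences may span lines
    return meta, entries


meta, entries = parse_peff(SAMPLE)
print(meta["DbName"], entries[0]["accession"], entries[0]["fields"]["PName"])
```

With plain FASTA, every tool must guess how to split the free-text description; with named fields such as PName or NcbiTaxId, all tools extract the same values.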

PRIDE – Protein Identification Database: turns publicly available data into publicly accessible data.
Contents: protein identifications, experimental detail, peak lists, link-out to raw data.
– Fully open source
– Fully open data
– Implementation of PSI standards as they are released

PRIDE
[Figure: mass spectrometers A and B write proprietary formats; a converter turns them into mzML; search engines A and B read mzML and write analysisXML; the results go to a public repository]

PRIDE – Protein Identification Database: covered in more detail tomorrow by Alberto Medina.

Standard data format converters
msconvert (ProteoWizard):
– From: mzML, mzXML, Thermo RAW, MGF
– To: mzML, mzXML
– Vendor format reading restrictions: Thermo RAW requires Windows with the Xcalibur XDK installed
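When converting many files, it helps to build the msconvert command line programmatically. The Python sketch below only assembles the argument list using the --mzML / --mzXML output switches and the -o output-directory option; the file name sample01.RAW and the helper names are hypothetical. Actually running the command requires ProteoWizard on the PATH and, for Thermo RAW input, the vendor library described on the slide.

```python
import subprocess
from typing import List


def msconvert_command(inputs: List[str], out_format: str = "mzML", outdir: str = ".") -> List[str]:
    """Build (but do not run) an msconvert command line."""
    if out_format not in ("mzML", "mzXML"):                # the outputs listed on the slide
        raise ValueError(f"unsupported output format: {out_format}")
    return ["msconvert", *inputs, f"--{out_format}", "-o", outdir]


def run_msconvert(cmd: List[str]) -> None:
    """Run the command; needs ProteoWizard installed (and vendor libraries for RAW)."""
    subprocess.run(cmd, check=True)


cmd = msconvert_command(["sample01.RAW"], "mzXML", "converted")
print(cmd)
```

Passing a list to subprocess.run, rather than a single shell string, avoids quoting problems with file names that contain spaces.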

Standard data format converters
ReAdW version 4.0.2:
– From: Thermo RAW
– Exports to: mzXML, mzML (not yet updated to the final mzML 1.0 standard; try msconvert)
– Requires a valid installation of the Thermo Xcalibur software system, as it relies on the Xcalibur libraries

Standard data format converters
CompassXport:
– From: analysis.baf (instrument families: APEX, micrOTOF, micrOTOF-Q, ...), analysis.yep (esquire/HCT instrument family), AutoXecute runs for LC-MALDI (instrument families: autoFlex, ultraFlex, ...), fid files (flex instrument family)
– Exports to: mzXML version 2.1, mzData version 1.05; mzML in progress
– Does not require installation of Bruker proprietary software
– Replaces mzBruker

Standard data format converters
massWolf (1st July 08):
– From: MassLynx native acquisition files
– Exports to: mzXML
– Requires installation of MassLynx software on the same computer
– You must select the massWolf download that matches your version of MassLynx (4.0 or 4.1)

Standard data format converters
mzWiff (1st July 08):
– From: Analyst native acquisition (.wiff) files
– Exports to: mzXML
– Requires installation of Analyst software

Standard data format converters
T2DExtractor (Dec 07):
– From: data from SCIEX/ABI 4000 series MALDI TOF/TOF instruments
– Exports to: mzXML

Standard data format converters
Trapper (17th July 08):
– From: Agilent MassHunter format (.d directories)
– Exports to: mzXML
– Requires Agilent's MHDAC software to be installed
– This software will be included in the upcoming TPP distribution
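The converter slides above can be condensed into a lookup table for choosing a tool. The Python sketch below transcribes their 2008 status in simplified form: the input labels are shorthand, ReAdW's mzML output is tagged "mzML (pre-1.0)" because it did not yet follow the final standard, and CompassXport's in-progress mzML support is left out. The table and function names are hypothetical.

```python
# Summary of the converter slides (2008 status); labels are simplified shorthand.
CONVERTERS = {
    "msconvert": {"in": {"mzML", "mzXML", "Thermo RAW", "MGF"}, "out": {"mzML", "mzXML"}},
    "ReAdW": {"in": {"Thermo RAW"}, "out": {"mzXML", "mzML (pre-1.0)"}},
    "CompassXport": {"in": {"Bruker baf", "Bruker yep", "Bruker fid"}, "out": {"mzXML", "mzData"}},
    "massWolf": {"in": {"MassLynx"}, "out": {"mzXML"}},
    "mzWiff": {"in": {"Analyst wiff"}, "out": {"mzXML"}},
    "T2DExtractor": {"in": {"ABI 4000 series MALDI TOF/TOF"}, "out": {"mzXML"}},
    "Trapper": {"in": {"Agilent MassHunter .d"}, "out": {"mzXML"}},
}


def converters_for(source: str, target: str) -> list:
    """Names of converters that read `source` and write `target`, case-insensitively sorted."""
    return sorted((name for name, spec in CONVERTERS.items()
                   if source in spec["in"] and target in spec["out"]),
                  key=str.casefold)


print(converters_for("Thermo RAW", "mzXML"))
```

Keeping the table as data makes it easy to update as vendors add mzML 1.0 export.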