EBI is an Outstation of the European Molecular Biology Laboratory. Proteomics repositories PRIDE team, Proteomics Services Group PANDA group European Bioinformatics.


Proteomics repositories
Dr. Juan Antonio VIZCAINO, PRIDE Group coordinator
PRIDE team, Proteomics Services Group, PANDA group
European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
EBI is an Outstation of the European Molecular Biology Laboratory.

EBI Roadshow Rotterdam, 12 June 2012
Juan A. Vizcaíno

Overview …
- Why share proteomics data?
- PRIDE in detail…
- Other proteomics repositories
- ProteomeXchange consortium

THE RATIONALE BEHIND SHARING PROTEOMICS DATA

Key technologies in modern biology
- Genomics: DNA sequencing is a central technology for studying DNA.
- Transcriptomics: microarrays and RNA-seq are central technologies for studying RNA.
- Proteomics: mass spectrometry is a central technology for studying the proteome.

The need for data sharing in the proteomics field

Proteomics data sharing: why?
1) Data producers are not always the best data analysts. Sharing data gives analysts access to real data, which in turn allows better analysis tools to be developed.
2) Meta-analysis of data can recycle previous findings for new tasks. Putting findings in the context of other findings increases their scope.
3) Sharing data allows independent review of the findings. Because actual replication of an experiment is often impossible, a re-analysis or spot checks of the obtained data become vitally important.
4) Direct benefit for the field: fragmentation models, spectral libraries, ...

How do we make this all happen?
- Journal guidelines: Journal guidelines heavily influence the decisions taken by authors; by first requesting and subsequently mandating data submission to established repositories, they provide an important stick.
- Funder support and guidelines: Funders contribute both sticks and carrots. The sticks lie in the grant application guidelines; they can require a plan for data management and dissemination. The carrot is in providing specific funding for this aspect of science.
- Data repositories: The availability of reliable, freely available repositories is key; submission thresholds should be kept low and added value needs to be provided. Furthermore, feedback loops need to be established in order to ensure that accumulated data flows back to the user community. Repositories thus provide mostly carrots.

WHAT CAN BE STORED IN PROTEOMICS REPOSITORIES

MS proteomics: peptide IDs and protein IDs
[Figure labels: MS/MS spectra, search engine, sequence database, peptides, proteins. Example peptides: TDMDNQIVVSDYAQ MDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL. Example proteins: IPI accessions.]
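The step from peptide IDs to protein IDs on this slide can be illustrated with a small sketch. It is not how any particular search engine works: real engines first match MS/MS spectra against theoretical spectra derived from the sequence database, which is omitted here. The sketch only shows the lookup of already-identified peptide sequences in a sequence database. The protein sequences and the accessions PROT_A, PROT_B and PROT_C are invented for illustration; only the peptide strings come from the slide.

```python
from collections import defaultdict

# Hypothetical toy sequence database: accession -> protein sequence.
# Sequences and accessions are made up; they merely embed slide peptides.
TOY_DATABASE = {
    "PROT_A": "MKLFDQAFGLPRSSAKPLMELIERGG",
    "PROT_B": "MTTNGMFFSTYDRAAKPLMELIERQ",
    "PROT_C": "MSDESTNVDMSLAQRLL",
}


def map_peptides_to_proteins(peptides, database):
    """Return {peptide: [accessions]} for peptides found in the database."""
    mapping = defaultdict(list)
    for peptide in peptides:
        for accession, sequence in database.items():
            # Exact substring match stands in for the real matching step.
            if peptide in sequence:
                mapping[peptide].append(accession)
    return dict(mapping)


def unique_peptides(mapping):
    """Peptides that point to exactly one protein in the database."""
    return sorted(p for p, accs in mapping.items() if len(accs) == 1)


PEPTIDES = ["LFDQAFGLPR", "AKPLMELIER", "DESTNVDMSLAQR",
            "DIVVQETMEDIDK", "NGMFFSTYDR"]
MAPPING = map_peptides_to_proteins(PEPTIDES, TOY_DATABASE)
```

In this toy database one peptide maps to two proteins and one maps to none, which is why peptide IDs and protein IDs are stored as separate pieces of information.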

Types of information stored
1) Original experimental data recorded by the mass spectrometer (primary data)

Primary data
- Binary data
- XML-based files: mzData, mzXML, mzML
- Peak lists: .dta, .pkl, .mgf, .ms2
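To make the "peak list" category concrete, here is a minimal sketch of reading the .mgf (Mascot Generic Format) layout, in which each spectrum sits between BEGIN IONS and END IONS, with KEY=value header lines followed by m/z and intensity pairs. The function name parse_mgf and the example spectrum values are assumptions made for illustration; the sketch ignores optional features of the format and is not a replacement for an established parser.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {"params": dict, "peaks": list} spectra."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z followed by an optional intensity.
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Invented example spectrum, for illustration only.
EXAMPLE_MGF = """BEGIN IONS
TITLE=example spectrum 1
PEPMASS=580.81
CHARGE=2+
175.119 1200.0
262.151 800.5
END IONS
"""

SPECTRA = parse_mgf(EXAMPLE_MGF)
```

A peak list like this keeps only centroided m/z and intensity values, which is one reason it is much smaller than the binary data recorded by the instrument.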

Types of information stored
1) Original experimental data recorded by the mass spectrometer (primary data)
2) Identification results inferred from the original primary data

Peptide and Protein Identifications
- mzIdentML, mascot.dat, sequest.out, SpectrumMill.spo, pep.xml, prot.xml
- Only qualitative data!

Types of information stored
1) Original experimental data recorded by the mass spectrometer (primary data)
2) Identification results inferred from the original primary data
3) Experimental and technical metadata

Controlled Vocabularies (CVs)
- Term and synonyms: TOF, T.O.F., time of flight, time-of-flight

Relationships between CV terms
- TOF (T.O.F., time of flight, time-of-flight) is-a mass analyzer
- mass analyzer part-of mass spectrometer
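The two ideas on these slides, synonyms that resolve to one term and typed relationships between terms, can be sketched as a small data structure. The identifiers EX:0001 to EX:0003, the choice of "time-of-flight" as the preferred name, and the helper names are hypothetical; they are not accessions or APIs from any real controlled vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class CVTerm:
    term_id: str                                  # hypothetical identifier
    name: str                                     # preferred name (assumed)
    synonyms: list = field(default_factory=list)
    is_a: list = field(default_factory=list)      # parent term ids
    part_of: list = field(default_factory=list)   # containing term ids


TERMS = {t.term_id: t for t in [
    CVTerm("EX:0001", "mass spectrometer"),
    CVTerm("EX:0002", "mass analyzer", part_of=["EX:0001"]),
    CVTerm("EX:0003", "time-of-flight",
           synonyms=["TOF", "T.O.F.", "time of flight"],
           is_a=["EX:0002"]),
]}


def normalise(label):
    """Lower-case and drop punctuation/spaces so spelling variants collide."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def build_index(terms):
    """Map every normalised name and synonym to its term id."""
    index = {}
    for term in terms.values():
        for label in [term.name, *term.synonyms]:
            index[normalise(label)] = term.term_id
    return index


def resolve(label, index):
    """Return the term id for a free-text label, or None if unknown."""
    return index.get(normalise(label))


def related(term_id, terms):
    """Names of terms reachable by following is_a and part_of links."""
    names, queue, visited = [], [term_id], {term_id}
    while queue:
        current = terms[queue.pop(0)]
        for parent_id in current.is_a + current.part_of:
            if parent_id not in visited:
                visited.add(parent_id)
                names.append(terms[parent_id].name)
                queue.append(parent_id)
    return names


INDEX = build_index(TERMS)
```

With such a structure, an annotation written as "T.O.F." and another written as "time of flight" end up pointing at the same term, and a query for everything related to a mass spectrometer can follow the relationships instead of matching free text.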

CVs, ontologies (here: PSI-MOD)

Types of information stored
1) Original experimental data recorded by the mass spectrometer (primary data)
2) Identification results inferred from the original primary data
3) Experimental and technical metadata
4) Quantification information

Wide variety of quantitative techniques…

Quantification techniques
Label free
Gel-based quantitation approaches
- Different philosophies
- Very heterogeneous data formats
- Techniques not very well established
Very problematic data for proteomics repositories

PROTEOMICS REPOSITORIES

Existing proteomics repositories
Main public repositories:
- PROteomics IDEntifications database (PRIDE)
- Global Proteome Machine (GPMDB)
- PeptideAtlas
- Tranche
Smaller-scale, more specialized repositories include, among others: Human Proteinpedia, Genome Annotation Proteomics Pipeline (GAPP), MAPU, SwedCAD, PepSeeker, Open Proteomics Database, …
Very diverse: different aims, functionalities, …

Other MS proteomics repositories
[Comparison table; the repository names heading the columns were not recoverable]
- Reprocesses data / No reprocessing
- Editorial control / No editorial control
- Limited annotation / Detailed annotation
- ?? / 539 million peptides / 302 million spectra
- ?? / 63 million protein IDs / 9.2 million protein IDs
- ??
- Tranche: No reprocessing, No editorial control, Limited annotation, ??

PeptideAtlas
- Peptide identifications from MS/MS
- Data are reprocessed using the popular Trans Proteomic Pipeline (TPP)
- Uses PeptideProphet to derive a probability for the correct identification for all contained peptides

PeptideAtlas
- All peptides mapped to Ensembl using ProteinProphet (for human)
- Built by the Aebersold lab to help them find proteotypic peptides
- Provides proteotypic peptide predictions
- Limited metadata
- Great support for targeted proteomics approaches (SRM/MRM)

PeptideAtlas
Builds available at present

GPMDB
- End point of the GPM proteomics pipeline, to aid in the process of validating peptide MS/MS spectra and protein coverage patterns
- Data are reprocessed using the popular X!Tandem search engine or the X!Hunter spectral library search algorithm
- Also provides proteotypic peptides

GPMDB
- Powerful visualization features
- Provides very limited annotation with GO, BTO
- Some support for targeted approaches is available

Tranche
- Peer-to-peer distributed filesystem (original name: the DFS)
- Meant to securely store and conveniently deliver large amounts of data
- Provides a highly specialized, but much needed, niche service
- Has already been used by PRIDE to store certain large files
- Very limited annotation (metadata is not mandatory)

The PRIDE database

MS proteomics: Shotgun/bottom-up approaches
[Figure labels: proteins, protocol, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database]

PRIDE database
[Figure labels: proteins, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database]
PRIDE stores:
1) Peptide IDs
2) Protein IDs
3) Mass spectra as peak lists
4) Valuable additional metadata

PRIDE: why is it there?
- Repository to support publications (proteomics MS-derived data)

Journal Submission Recommendations
Journal guidelines now recommend submission to proteomics repositories:
- Proteomics
- Nature Biotechnology
- Nature Methods
- Molecular and Cellular Proteomics
Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.

MCP guidelines
New guidelines from MCP for data deposition: For all proteins identified on the basis of ONE OR TWO unique peptide spectra, the ability to view annotated spectra for these identifications must be made available. This can be achieved in one of three ways:
1) Submission of spectra and search results to a public results repository that is equipped with a spectral viewer (e.g. PRIDE, Peptidome etc.). This information will appear as a hyperlink in the published article…
2) Submission (with the manuscript) of spectra and search results in a file format that allows visualization of the spectra using a freely available viewer.
3) Submission (with the manuscript) of annotated spectra in an ‘office’ or PDF format.
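As a rough illustration of the rule quoted above, the sketch below picks out proteins supported by only one or two unique peptides from a list of identifications. Counting distinct peptide sequences per protein is an assumed reading of "unique peptide spectra", and the function name, record layout, accessions and peptide strings are all hypothetical; the journal's own text remains the authority on what must be provided.

```python
from collections import defaultdict


def proteins_needing_annotated_spectra(identifications, max_unique=2):
    """Return proteins supported by at most `max_unique` distinct peptides.

    `identifications` is an iterable of (protein_accession, peptide_sequence)
    pairs; repeated observations of the same peptide are counted once.
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide in identifications:
        peptides_by_protein[protein].add(peptide)
    return sorted(
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) <= max_unique
    )


# Invented identification records, for illustration only.
EXAMPLE_IDS = [
    ("PROT_A", "PEPTIDEK"), ("PROT_A", "PEPTIDEK"),   # same peptide twice
    ("PROT_B", "AAAK"), ("PROT_B", "CCCK"),
    ("PROT_C", "DDDK"), ("PROT_C", "EEEK"), ("PROT_C", "FFFK"),
]

FLAGGED = proteins_needing_annotated_spectra(EXAMPLE_IDS)
```

Here PROT_A and PROT_B would be flagged as needing viewable annotated spectra, while PROT_C, with three distinct peptides, would not.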

PRIDE: why is it there?
- Repository to support publications (proteomics MS-derived data)
- Source of proteomics data for other data resources
PRIDE: reliable source of MS proteomics data for other resources

Data content in PRIDE

PRIDE data content
[Figure panels: Protein IDs, Peptide IDs]

THE LOOK OF PRIDE

PRIDE web interface – overview

PRIDE web interface – experiment and protein

PRIDE web interface – mass spectra

PRIDE BioMart

PRIDE AND OTHER REPOSITORIES: ProteomeXchange

For sharing, superstructures must be built
- Sequence databases (INSDC): EMBL, DDBJ, NCBI
- Interactions (IMEx): IntAct, BIND, DIP, MINT, …
- Mass spec (ProteomeXchange): PRIDE, PeptideAtlas, GPMDB, Tranche, …
Often, multiple repositories will emerge more or less simultaneously in a particular field. By exchanging data and collaborating on data acquisition, each individual resource obtains greater coverage and a more comprehensive dataset. Such superstructures do require additional infrastructure, however.

ProteomeXchange consortium
- Sharing proteomics data between existing proteomics repositories
- Includes PeptideAtlas, GPMDB, and PRIDE, with data sharing infrastructure provided by different members
- Submission guidelines document finalized and proven on three different datasets
- ProteomeXchange is primarily user-oriented: the idea is to provide a single point of submission, but multiple points of data visualization and analysis

ProteomeXchange Data workflow
[Figure labels: Users, Individual submissions, Large-scale submissions, EBI, PRIDE, Raw files archive, PeptideAtlas, Other DBs (GPMDB, …), Raw, Reprocessed, Published, UniProt]

PX submission tool

Support for quantitative data is now starting…

In the next talk…
- PRIDE Submission pipeline: PRIDE Converter
- PRIDE Inspector

Do you want to know a bit more…?

The PRIDE Team: Who we are
Attila Csordas, David Ovelleiro, Richard Côté, Rui Wang, Daniel Ríos, Florian Reisinger, Joe Foster, Johannes Griss, Jose A. Dianes, Antonio Fabregat, Yasset Perez-Riverol, Juan Antonio Vizcaíno, Henning Hermjakob

Questions?