EBI is an Outstation of the European Molecular Biology Laboratory.
Molecular Interactions – the IntAct Database
Sandra Orchard, EMBL-EBI
Why is it useful to study protein-protein interactions (PPIs) and networks?
Proteins are the workhorses of the cell, and all their activities are controlled through interactions with other molecules.
To understand the biology of a single protein, you have to study its interacting partners.
One way to predict protein function is through identification of binding partners: "guilt by association". If the function of at least one of the molecules the protein interacts with is known, that should help us assign its function(s) and the pathway(s) it takes part in.
Hence, through the intricate network of these interactions we can map cellular pathways, their interconnections and their dynamic regulation.
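Guilt by association can be illustrated with a minimal majority-vote sketch: an uncharacterised protein is assigned the functions that occur most often among its annotated partners. The network, protein names and function labels below are hypothetical, and real function predictors weigh the evidence far more carefully than a simple vote.

```python
from collections import Counter

# Hypothetical toy network: each protein maps to its set of interaction partners.
NETWORK = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}

# Hypothetical functional annotations; P1 is the uncharacterised protein.
KNOWN_FUNCTIONS = {
    "P2": {"DNA repair"},
    "P3": {"DNA repair", "cell cycle"},
    "P4": {"transport"},
}


def predict_functions(protein, network, known):
    """Rank candidate functions for `protein` by counting how many of its
    annotated partners carry each function (a simple majority vote)."""
    votes = Counter()
    for partner in network.get(protein, set()):
        for function in known.get(partner, set()):
            votes[function] += 1
    return votes.most_common()


ranking = predict_functions("P1", NETWORK, KNOWN_FUNCTIONS)
print(ranking[0])  # ('DNA repair', 2): two of P1's three partners share it
```

Functions supported by only one partner ("cell cycle", "transport") come after the top-ranked candidate.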
Why are there so many issues with interaction data?
1. There is a wide variety of methods for demonstrating molecular interactions, and all have their strengths and weaknesses.
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions.
Why do we need interaction databases?
All interaction data have issues: a true picture can only be built up by combining data derived using multiple techniques in multiple laboratories.
This is problematic for any bench researcher to do because of differing data formats and molecular identifiers, and the sheer volume of data.
Molecular interaction databases are publicly funded to collect these data and annotate them in the format most useful to researchers.
Interaction Databases
Deep curation
IntAct – active curation, broad species coverage, all molecule types
MINT – active curation, broad species coverage, PPIs
DIP – active curation, broad species coverage, PPIs
MPACT – curation status unclear, limited species coverage, PPIs
MatrixDB – active curation, extracellular matrix molecules only
BIND – ceased curating in 2006/7, broad species coverage, all molecule types; information becoming dated
Shallow curation
BioGRID – active curation, limited number of model organisms
HPRD – active curation, human-centric, modelled interactions
MPIDB – active curation, microbial interactions
Engineering, 1850
Nuts and bolts fit perfectly together, but only if they originate from the same factory.
Standardisation proposal in 1864 by William Sellers; it took until after WWII for it to be generally accepted, though…
Proteomics, 2003
Proteomics data are perfectly compatible, but only if they come from the same lab / database / software.
“Publish and vanish” by data producers.
Collecting all publicly available data requires a huge effort.
Urgent need for standardisation.
What constitutes a PSI standard?
Documents that make up each individual standard:
Minimal reporting requirements => MIAPE document
XML data exchange format
Domain-specific controlled vocabulary
MIMIx
PSI-MI XML format
Community standard for molecular interactions: XML schema and detailed controlled vocabularies.
Jointly developed by major data providers: BIND, Cellzome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others.
Version 1.0 published in February 2004: The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22.
Version 2.5 published in October 2007: Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions. Samuel Kerrien et al., BioMed Central.
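To show how the XML schema ties interactions to their participants, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The record is heavily trimmed and illustrative (real PSI-MI XML also requires experiment descriptions, interactor types, organisms and more), and the namespace URI is assumed to be the one used by PSI-MI XML 2.5.

```python
import xml.etree.ElementTree as ET

# Assumed PSI-MI XML 2.5 namespace.
NS = {"mif": "http://psi.hupo.org/mi/mif"}

# Heavily trimmed, illustrative record; not schema-valid PSI-MI XML.
PSI_MI_XML = """
<entrySet xmlns="http://psi.hupo.org/mi/mif" level="2" version="5">
  <entry>
    <interactorList>
      <interactor id="1"><names><shortLabel>protein_a</shortLabel></names></interactor>
      <interactor id="2"><names><shortLabel>protein_b</shortLabel></names></interactor>
    </interactorList>
    <interactionList>
      <interaction id="3">
        <participantList>
          <participant id="4"><interactorRef>1</interactorRef></participant>
          <participant id="5"><interactorRef>2</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>
"""


def participant_labels(xml_text):
    """Return, per interaction, the short labels of its participants,
    resolving each interactorRef against the interactorList."""
    root = ET.fromstring(xml_text.strip())
    labels = {
        interactor.get("id"): interactor.findtext("mif:names/mif:shortLabel", namespaces=NS)
        for interactor in root.iterfind(".//mif:interactorList/mif:interactor", NS)
    }
    result = []
    for interaction in root.iterfind(".//mif:interactionList/mif:interaction", NS):
        refs = [
            participant.findtext("mif:interactorRef", namespaces=NS)
            for participant in interaction.iterfind("mif:participantList/mif:participant", NS)
        ]
        result.append(tuple(labels[ref] for ref in refs))
    return result


print(participant_labels(PSI_MI_XML))  # [('protein_a', 'protein_b')]
```

Each interactor is described once and referenced by id from every interaction it takes part in, which keeps large files compact.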
PSI-MI XML benefits
Collecting and combining data from different sources has become easier.
Standardised annotation through PSI-MI ontologies.
Tools from different organisations can be chained, e.g. analysis of IntAct data in Cytoscape.
Controlled vocabularies
Additional benefits
MITAB format – released 2007 by popular demand. Tab-delimited organisation of data.
PSICQUIC – query access that runs across all interaction databases using PSI formats.
PSISCORE – common scoring mechanism in development.
Access to R Bioconductor statistics packages.
Growth industry in “composite” databases, which do no new curation but merge the output of resources producing data in PSI format.
IMEx
Consortium of molecular interaction databases dedicated to producing high-quality, annotated data, curated to the same standards.
Data will be curated once at a single centre, then exchanged between partners.
Users need only go to a single site to obtain all data.
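The MITAB format listed under Additional benefits stores one binary interaction per tab-delimited line. The sketch below assumes the 15-column MITAB 2.5 layout, with "-" for empty fields and "|" between multiple values; the column names are my own labels, and the sample row uses placeholder identifiers rather than a real IntAct record.

```python
# Column order assumed from the 15-column MITAB 2.5 layout; names are illustrative.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_ids", "confidence",
]


def parse_mitab(text):
    """Yield one dict per MITAB 2.5 row. '-' marks an empty field and
    multiple values within a field are separated by '|'."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and the header
            continue
        fields = line.split("\t")
        if len(fields) < len(MITAB25_COLUMNS):
            raise ValueError(f"expected 15 columns, got {len(fields)}")
        # Later MITAB versions append extra columns; zip() ignores them here.
        row = dict(zip(MITAB25_COLUMNS, fields))
        # Naive split on '|': fine for this sketch, but a quoted value
        # containing '|' would be broken apart.
        yield {key: [] if value == "-" else value.split("|") for key, value in row.items()}


# Illustrative row with placeholder identifiers (not a real IntAct record).
SAMPLE = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Author et al. (2010)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000001|imex:IM-00000-1", "intact-miscore:0.50",
])

record = next(parse_mitab("#header line\n" + SAMPLE))
print(record["detection_method"])  # ['psi-mi:"MI:0018"(two hybrid)']
print(record["interaction_ids"])   # ['intact:EBI-0000001', 'imex:IM-00000-1']
```

Because every line is self-contained, MITAB files can also be filtered with ordinary spreadsheet or command-line tools.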
IntAct goals & achievements
1. Publicly available repository of molecular interactions (mainly PPIs): ~300K binary interactions taken from >5,300 publications (May 2012).
2. Data are standards-compliant and available via our website, for download at our FTP site (ftp://ftp.ebi.ac.uk/pub/databases/intact) or via PSICQUIC.
3. Open-access versions of the software are provided to allow installation of local IntAct nodes.
“Lifecycle of an Interaction”
[Figure: workflow from a full-text publication through IntAct curation (guided by the curation manual and CVs), nightly sanity checks with a curator report, and super-curator checking (accept/reject), to release on the public web site and FTP site and exchange with IMEx partners MatrixDB, MINT and DIP.]
UniProt Knowledgebase
Interactions can be mapped to the canonical sequence… to splice variants… or to post-processed chains.
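The three mapping levels are distinguishable from the protein identifier itself. This sketch uses the accession pattern UniProt documents and assumes the suffix conventions "-n" for an isoform (splice variant) and "-PRO_" plus ten digits for a post-processed chain; the chain identifier in the example is illustrative, and `classify` is a hypothetical helper.

```python
import re

# UniProtKB accession pattern (6- or 10-character form).
ACCESSION = r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"

# Assumed suffix conventions: "-<n>" for an isoform, "-PRO_<10 digits>" for a chain.
IDENTIFIER = re.compile(
    rf"(?P<acc>{ACCESSION})(?:-(?P<isoform>\d+)|-(?P<chain>PRO_\d{{10}}))?"
)


def classify(identifier):
    """Return (kind, accession) for a UniProtKB-style protein identifier."""
    match = IDENTIFIER.fullmatch(identifier)
    if match is None:
        return ("unrecognised", None)
    if match.group("isoform"):
        return ("splice variant", match.group("acc"))
    if match.group("chain"):
        return ("processed chain", match.group("acc"))
    return ("canonical", match.group("acc"))


print(classify("P04637"))                 # ('canonical', 'P04637')
print(classify("P04637-2"))               # ('splice variant', 'P04637')
print(classify("P05067-PRO_0000000001"))  # ('processed chain', 'P05067')
```

Keeping the parent accession lets interactions mapped to isoforms or chains still be grouped under the same UniProtKB entry.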
Relationship with UniProtKB
[Figure: data flow between UniProtKB protein sequences, IntAct interaction curation, other IMEx databases and other DBs, with data filters selecting high-confidence PPIs; labelled “In place early 2012”.]
Data model
Support for detailed features, i.e. definition of the interacting interface:
Interacting domains
Overlay of ranges on sequence
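Overlaying ranges on a sequence amounts to interval overlap. As a sketch, assuming 1-based inclusive ranges, the code below reports which annotated domains a mapped binding region touches. The domain names, coordinates and binding region are hypothetical.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """A sequence range, 1-based with both ends inclusive (assumed convention)."""
    start: int
    end: int


def overlap(a, b):
    """Number of residues shared by two ranges (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def domains_hit(region, domains):
    """Map each domain that overlaps `region` to the size of the overlap."""
    hits = {name: overlap(region, span) for name, span in domains.items()}
    return {name: size for name, size in hits.items() if size > 0}


# Hypothetical domain annotations and a hypothetical mapped binding region.
DOMAINS = {"domain_X": Range(20, 110), "domain_Y": Range(150, 240), "domain_Z": Range(300, 380)}
binding_region = Range(90, 160)

print(domains_hit(binding_region, DOMAINS))  # {'domain_X': 21, 'domain_Y': 11}
```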
How to deal with complexes
Some experimental protocols, e.g. tandem affinity purification (TAP), generate complex data.
One may want to convert these complexes into sets of binary interactions; 2 algorithms are available:
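Two expansion models commonly used to turn a co-purified complex into binary pairs are spoke and matrix expansion. The sketch below implements both; the bait and prey names are hypothetical, and the functions are illustrations rather than IntAct's own code.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Spoke model: pair the bait with each prey; no prey-prey pairs are inferred."""
    return [(bait, prey) for prey in preys if prey != bait]


def matrix_expansion(members):
    """Matrix model: treat every pair of complex members as interacting."""
    return list(combinations(members, 2))


# Hypothetical TAP experiment: bait "A" co-purifies with preys "B", "C" and "D".
preys = ["B", "C", "D"]
print(spoke_expansion("A", preys))           # [('A', 'B'), ('A', 'C'), ('A', 'D')]
print(len(matrix_expansion(["A"] + preys)))  # 6
```

For a complex of n members, spoke gives n - 1 pairs and matrix gives n(n - 1)/2. Spoke misses any direct contacts between preys, while matrix adds many pairs that may never have been in direct contact.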
Performing and visualising a Simple Search
EBI Walkthrough, May 2009
EBI Data, Standards and Tools
IntAct – Home Page
Performing a Simple Search
Visualizing - networkView
From search to networkView…
Extend and Visualise your Search
Visualizing - networkView
Cytoscape Web
Cytoscape Web is a web-based network visualisation tool. Modelled after Cytoscape, it is open-source, interactive, customisable and easily integrated into web sites.
It contains none of the plugin architecture functionality of Cytoscape.
Visualization
Opening the network in Cytoscape…
Visualization
Applying a better graph layout…
Visualization
Highlighting network properties…
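Degree, the number of distinct partners a protein has, is one of the simplest network properties one might highlight, and high-degree "hubs" often stand out visually. This sketch computes degrees and the degree distribution from an undirected edge list; the interactions are hypothetical, and repeated evidence for the same pair is counted once.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Number of distinct partners per protein in an undirected interaction list.
    Repeated evidence for the same pair is counted once."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)  # a self-interaction counts the protein as its own partner
    return {node: len(p) for node, p in partners.items()}


def top_hubs(degree_map, n=2):
    """Highest-degree nodes, ties broken alphabetically."""
    return sorted(degree_map.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical interactions; A-B appears twice (two pieces of evidence).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("A", "B")]
deg = degrees(EDGES)
print(top_hubs(deg))                          # [('A', 3), ('B', 2)]
print(sorted(Counter(deg.values()).items()))  # degree distribution: [(1, 1), (2, 3), (3, 1)]
```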
Cytoscape Plugins
Exploring a single interaction in more depth
Interaction detail
First search from the home page…
Choice of UniProtKB or Dasty view
Details of interaction
UniProt
Taxonomy
PubMed/IMEx ID
Detail of interaction
Expansion method
Details of interaction
Interaction score
Interaction Score
All interaction evidences for Protein A interacting with Protein B are clustered.
Evidences are scored according to:
a. interaction detection method
b. interaction type
c. number of publications the interaction has been observed in
The score is normalised on a 0-1 scale: a low score indicates a low-confidence interaction, a high score a high-confidence interaction.
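The clustering and scoring steps can be sketched as follows. Evidences are grouped by unordered protein pair, and each group is scored as the mean of three components, each already on a 0-1 scale: best detection method, best interaction type, and a capped publication count. The weights, the cap and the averaging are all hypothetical choices for illustration and are not IntAct's actual scoring scheme.

```python
from collections import defaultdict

# Hypothetical weights in [0, 1]; these are NOT the values IntAct uses.
METHOD_WEIGHT = {"two hybrid": 0.5, "coimmunoprecipitation": 0.6, "x-ray crystallography": 1.0}
TYPE_WEIGHT = {"association": 0.3, "physical association": 0.5, "direct interaction": 1.0}
MAX_PUBLICATIONS = 3  # hypothetical publication count at which that component saturates


def cluster(evidences):
    """Group evidences by unordered protein pair, so A-B and B-A fall together."""
    groups = defaultdict(list)
    for ev in evidences:
        groups[tuple(sorted((ev["a"], ev["b"])))].append(ev)
    return dict(groups)


def score(group):
    """Average of three components, each scaled to [0, 1]: best detection method,
    best interaction type, and number of distinct publications (capped)."""
    method = max(METHOD_WEIGHT.get(ev["method"], 0.0) for ev in group)
    itype = max(TYPE_WEIGHT.get(ev["type"], 0.0) for ev in group)
    pubs = min(len({ev["publication"] for ev in group}), MAX_PUBLICATIONS) / MAX_PUBLICATIONS
    return (method + itype + pubs) / 3


EVIDENCES = [
    {"a": "P1", "b": "P2", "method": "two hybrid", "type": "physical association", "publication": "pub1"},
    {"a": "P2", "b": "P1", "method": "coimmunoprecipitation", "type": "association", "publication": "pub2"},
    {"a": "P1", "b": "P3", "method": "two hybrid", "type": "physical association", "publication": "pub1"},
]

for pair, group in cluster(EVIDENCES).items():
    print(pair, round(score(group), 3))
# ('P1', 'P2') 0.589
# ('P1', 'P3') 0.444
```

The pair observed with two methods in two publications scores higher than the pair seen only once, which is the behaviour the slide describes.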
Changing the tabular view
Participant information
Search result for ‘RAD1’
Interaction detail
First search from the home page…
Details of interaction
Viewing Interaction Details
Additional information
Interaction Details
IntAct – Home Page: Quick Search
Advanced search
Filtering options
Add more filtering options
Ontology search
Searching with MIQL
First search from the home page…
Using the Molecular Interaction Query Language (MIQL), one can also build complex queries.
List of terms one can query on:
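MIQL queries combine field:value clauses with Boolean operators in a Lucene-style syntax. As a sketch, the hypothetical helper below assembles such a query string from keyword arguments. The field names (`id`, `species`, `detmethod`) are assumptions to check against the term list on this slide, and the quoting rule is a simplification that does not escape embedded quotes.

```python
def _quote(value):
    """Quote a value containing characters that are special in Lucene-style
    syntax (a simplification; embedded quotes are not escaped)."""
    text = str(value)
    return f'"{text}"' if any(ch in text for ch in ' :()') else text


def build_miql(**criteria):
    """AND together field:value clauses; a list value becomes an OR group."""
    clauses = []
    for field, value in criteria.items():
        if isinstance(value, (list, tuple)):
            alternatives = " OR ".join(_quote(v) for v in value)
            clauses.append(f"{field}:({alternatives})")
        else:
            clauses.append(f"{field}:{_quote(value)}")
    return " AND ".join(clauses)


# Field names are assumed; MI:0018 = two hybrid, MI:0019 = coimmunoprecipitation.
query = build_miql(id="P04637", species="human", detmethod=["MI:0018", "MI:0019"])
print(query)
# id:P04637 AND species:human AND detmethod:("MI:0018" OR "MI:0019")
```

The resulting string can be pasted into the search box; building it programmatically mainly helps when the same filters are reused across many proteins.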
Browsing – Molecule View
Binary view of o60671_human
Browsing – extending your search
Interactions, Pathways and Networks
Network analysis
Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. Analyzing protein-protein interaction networks. J Proteome Res 2012, 11.
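A basic first step in network analysis is to split the interaction network into connected components, the groups of proteins linked to each other by some path. This sketch uses breadth-first search over a hypothetical edge list. It illustrates one common step and is not the method of the cited review.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Split an undirected interaction network into connected components
    using breadth-first search; largest component first."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical network with two separate modules.
NETWORK_EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
print([sorted(c) for c in connected_components(NETWORK_EDGES)])  # [['A', 'B', 'C'], ['D', 'E']]
```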