Sandra Orchard: Introduction to Molecular Interaction Data

Presentation transcript:


Living cells contain crowded and diverse molecular environments
- Proteins constitute ~30% of E. coli and ~5% of yeast cytoplasm by weight
- ~2000 protein types are co-expressed and co-localized in yeast cytoplasm

Example of a PPI network
- Nodes: proteins
- Edges: interactions
- More than 80% of proteins are connected in one giant cluster of the PPI network
- Small-world effect: the median network distance is 6 steps
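The two network measures on this slide (the share of proteins in the giant cluster and the median network distance) can be computed with a breadth-first search. The following is a minimal sketch on a made-up six-protein network; the protein names and edges are hypothetical and not taken from any interaction database.

```python
from collections import deque
from statistics import median


def build_graph(edges):
    """Build undirected adjacency sets from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def giant_component_fraction(graph):
    """Fraction of all nodes that sit in the largest connected component."""
    seen, largest = set(), 0
    for node in graph:
        if node not in seen:
            component = bfs_distances(graph, node)
            seen.update(component)
            largest = max(largest, len(component))
    return largest / len(graph)


def median_distance(graph):
    """Median shortest-path length over all connected pairs (each pair once)."""
    lengths = []
    for node in graph:
        for other, d in bfs_distances(graph, node).items():
            if other > node:  # count each unordered pair only once
                lengths.append(d)
    return median(lengths)


# Hypothetical toy network: one cluster of four proteins plus an isolated pair.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("E", "F")]
graph = build_graph(edges)
fraction = giant_component_fraction(graph)
med = median_distance(graph)
```

On a real interactome the all-pairs search is quadratic in the number of proteins, so a sampled subset of start nodes is a common shortcut.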

Why is it useful to study PPI networks?
- Proteins are the workhorses of the cell: enzymes, structural proteins, signal transduction, transport, transcription, translation and degradation, traversing membranes … all done as a functional/regulatory network
- By mapping these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation
- One way to predict protein function is through identification of binding partners: guilt by association
- If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and pathway(s)
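Guilt by association can be illustrated as a simple vote among a protein's direct binding partners. This is a sketch under stated assumptions: the majority-vote rule, the protein names and the annotations are all hypothetical, and real function prediction uses far more evidence than this.

```python
from collections import Counter


def predict_function(protein, interactions, annotations):
    """Return the most common annotation among direct partners, or None."""
    partners = set()
    for a, b in interactions:
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    votes = Counter()
    for partner in partners:
        # Partners without a known function contribute no votes.
        votes.update(annotations.get(partner, ()))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical data: X is uncharacterized, three of its partners are annotated.
interactions = [("X", "P1"), ("X", "P2"), ("P3", "X"), ("P4", "P5")]
annotations = {
    "P1": ["DNA repair"],
    "P2": ["DNA repair", "transcription"],
    "P3": ["transport"],
}
prediction = predict_function("X", interactions, annotations)
```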

Why is it useful to study the structure of PPI networks?
- Common properties of biological networks: we can use these to understand cell behaviour
- Can help us relate network structure to biological function
- Understand a protein's relative position in a network
- Correlate conserved functional modules with protein complexes

Properties of networks
Scale-free effect: the majority of nodes in a scale-free network have only a few connections to other nodes, whereas some nodes (hubs) are connected to many other nodes in the network.
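The scale-free pattern shows up in the degree distribution: many nodes with degree 1 or 2 and a handful with very high degree. A minimal sketch of counting degrees and picking out hubs follows; the toy edge list and the hub threshold of 4 are illustrative assumptions, and the code assumes no self-interactions.

```python
from collections import Counter


def degrees(edges):
    """Number of interaction partners per protein (assumes no self-loops)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_distribution(deg):
    """Map each degree k to the number of proteins that have that degree."""
    return Counter(deg.values())


def hubs(deg, min_degree):
    """Proteins whose degree is at least min_degree, in sorted order."""
    return sorted(node for node, k in deg.items() if k >= min_degree)


# Hypothetical toy network: H binds five partners, A and B also bind each other.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("H", "E"), ("A", "B")]
deg = degrees(edges)
distribution = degree_distribution(deg)
hub_list = hubs(deg, min_degree=4)
```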

Properties of networks
Scale-free networks are stable: if failures occur at random and the vast majority of proteins are those with a small degree of connectivity, the likelihood that a protein hub would be affected is small. Even if a hub failure occurs, the network will generally not lose its connectedness, due to the remaining hubs. However, if we lose a few major hubs from the network, the network is turned into a set of rather isolated graphs. Many cancer-linked proteins are hub proteins.

Properties of networks
Scale-free networks are resistant to random failure but vulnerable to targeted attack, specifically against hubs. This property has been held to account for the robustness of biological networks to perturbations like mutation and environmental stress. One model of a proteome views date hubs as global or 'higher level' connectors between modules, while party hubs function inside modules at a 'lower level' of the organization.
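The contrast between random failure and targeted attack can be shown on a toy network by measuring the largest connected component after nodes are removed. This is only a sketch: the two-hub network below is hypothetical and far smaller than any real interactome, and averaging over every possible single-node removal stands in for random failure.

```python
def largest_component_size(edges, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    graph = {}
    for a, b in edges:
        for node in (a, b):
            if node not in removed:
                graph.setdefault(node, set())  # keep survivors, even if isolated
        if a not in removed and b not in removed:
            graph[a].add(b)
            graph[b].add(a)
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def mean_after_single_failure(edges):
    """Average largest-component size over every possible single-node removal."""
    nodes = {node for edge in edges for node in edge}
    sizes = [largest_component_size(edges, [node]) for node in nodes]
    return sum(sizes) / len(sizes)


# Hypothetical network: two linked hubs, each with four low-degree partners.
edges = [("H1", "H2")]
edges += [("H1", f"a{i}") for i in range(4)]
edges += [("H2", f"b{i}") for i in range(4)]

intact = largest_component_size(edges)                       # all 10 proteins
random_failure = mean_after_single_failure(edges)            # mostly leaves are hit
one_hub_lost = largest_component_size(edges, ["H1"])         # half the network left
both_hubs_lost = largest_component_size(edges, ["H1", "H2"]) # only isolated nodes
```

In this toy case a random single failure usually removes a low-degree protein and leaves nine of ten proteins connected, whereas removing both hubs leaves only isolated nodes.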

Properties of networks
Given the current limited coverage levels (bias towards small soluble cytoplasmic proteins) and variable quality of interaction data, the observed scale-free topology of existing interactome maps cannot be confidently extrapolated to complete interactomes. There is a real need to increase coverage through further experimentation and increased data input into interaction databases.

Why are there so many issues with interaction data?
1. Wide variety of methods for demonstrating molecular interactions: all have their strengths and weaknesses
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions

Interaction detection methods
1. Complementation assays
The function of the readout mechanism can be split into two independent parts and fused to two proteins of interest. The readout is only reconstituted when the two halves are brought into close proximity by fusion protein binding. Typified by Y2H.
Advantages
- Very high numbers of coding sequences assayed in a relatively simple experiment
- Wide variety of interactions detected and characterized following one single commonly used protocol
- Binding sites can be accurately mapped
- In vivo assay

1. Complementation assays
Disadvantages
Technical
- Spurious activation of reporter genes, e.g. self-activators (countered by using multiple reporter genes or swapping the two domains between the two proteins)
- Mutational events leading to an increase in the rate of transcription
- Fusion to irrelevant small peptides
- The cDNA for the interacting protein might not be represented in the library (or may be under-represented)
- No expression of the fusion protein
- Insufficient folding and/or stability of a fusion protein
Biological
- Possibility of indirect interactions: yeast proteins may act as a bridge
- Subcellular location: proteins are brought into proximity in the nucleus. This may not be the physiological location of one of the proteins, so proteins are brought together which would not normally be co-expressed or co-located
- Different environment in yeast and mammalian cells: loss of physiological control
- Absence of the required post-translational modifications
- Toxicity of fusion proteins

2. Affinity-based assays
Techniques which depend upon the strength of the interaction between two entities. Typified by affinity chromatography, pull-down and co-immunoprecipitation.
Advantages
- Proteins can be in their native state and at their native concentration (unless transfected)
- Transfection/prior isolation of proteins allows binding sites to be mapped and binary interactions to be demonstrated

2. Affinity-based assays
Disadvantages
Technical
- Participant determination is more problematic. Antibody detection depends on prior knowledge and good-quality reagents. Mass spec determination is still of variable quality
Biological
- Mixing of compartments during cell lysis/purification, i.e. interacting proteins might not be in the same cellular compartment
- Does not indicate whether the interaction is direct (except when in vitro)
- Can pull down entire pathways, but very transient, weak interactions are probably missed

3. Physical methods
Depend on physical properties of molecules to enable measurement of an interaction. Typified by X-ray crystallography.
Advantages
- High quality data
- Can be measurable (e.g. SPR)
- Can be very detailed

3. Physical methods
Disadvantages
Technical
- Tend to rely on large amounts of purified proteins
- Tend not to work well on hydrophobic proteins, e.g. transmembrane
- Very expensive, very low-throughput
Biological
- In vitro techniques: proteins lose all physiological regulation

4. Enzymatic assays
Enzyme/substrate reaction taken as evidence of interaction.
Advantages
- One of the few ways of identifying transient interactions
Disadvantages
- Can only use in vitro data, too many unknowns if performed in whole cell
- Many enzymes are promiscuous in vitro
- Requires purified protein

5. Co-localization
Advantages
- The only proof that 2 molecules are expressed in the same time and space under 'normal' conditions
Disadvantages
- No actual proof of a physical interaction

Molecular interactions
- All data are artefactual to a greater or lesser extent
- Interaction determinations build a degree of confidence in an interaction
- Users need to understand this before attempting to interpret molecular interaction experiments

Why do we need interaction databases?
- Issues with all interaction data: the true picture can only be built up by combining data derived using multiple techniques and multiple laboratories
- Problematic for any bench researcher to do: issues with data formats, molecular identifiers, sheer volume of data
- Molecular interaction databases are publicly funded to collect this data and annotate it in a format most useful to researchers
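Combining evidence from multiple techniques and laboratories amounts to grouping observations by the unordered protein pair and collecting the distinct methods and publications that support it. The sketch below assumes a simplified record of (protein A, protein B, method, publication); the identifiers and publication labels are placeholders, not real database entries.

```python
def merge_evidence(records):
    """Group observations by unordered pair; collect methods and publications."""
    merged = {}
    for a, b, method, publication in records:
        key = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
        entry = merged.setdefault(key, {"methods": set(), "publications": set()})
        entry["methods"].add(method)
        entry["publications"].add(publication)
    return merged


# Placeholder observations as they might arrive from different sources.
records = [
    ("P1", "P2", "two hybrid", "paper-1"),
    ("P2", "P1", "pull down", "paper-2"),
    ("P1", "P2", "two hybrid", "paper-3"),
    ("P3", "P4", "two hybrid", "paper-1"),
]
merged = merge_evidence(records)

# Pairs seen with more than one independent method carry more confidence.
well_supported = sorted(pair for pair, ev in merged.items() if len(ev["methods"]) > 1)
```

Counting distinct methods is only one possible confidence heuristic; it is used here because it mirrors the point that each determination adds a degree of confidence.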

Interaction databases
Deep curation
- IntAct: active curation, broad species coverage, all molecule types
- MINT: active curation, broad species coverage, PPIs (interactions now in IntAct)
- DIP: active curation, broad species coverage, PPIs
- MatrixDB: active curation, extracellular matrix molecules only
- MPACT: no curation, limited species coverage, PPIs
- BIND: ceased curating 2006/7, broad species coverage, all molecule types (information becoming dated)
Shallow curation
- BioGRID: active curation, limited number of model organisms
- HPRD: ceased curation 2010, human-centric, modelled interactions
- *InnateDB: active curation, interactions involved in innate immunity
- *I2D: active curation, PPIs involved in cancer

Why are data standards essential?
- Prior to 2003, many databases = many formats
- Onus on the user to reformat when merging data
- File conversion inevitably leads to data loss
- Many formats compromised tool development: each tool developed tended to be database specific

PSI-MI XML format
- Community standard for molecular interactions
- XML schema and detailed controlled vocabularies
- Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
- Version 1.0 published in February 2004: "The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data", Henning Hermjakob et al., Nature Biotechnology 2004, 22,
- Version 2.5 published in October 2007: "Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions", Samuel Kerrien et al., BioMed Central

PSI-MI XML benefits
- Collecting and combining data from different sources has become easier
- Standardized annotation through PSI-MI ontologies
- Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape
Home page

Controlled vocabularies

IMEx
- There are many databases providing large amounts of data, BUT IntAct, DIP and MINT provide original curation, whereas many databases (iRefIndex, APID, I2D, STRING, ...) do not curate but merge data from curation resources
- IntAct data is repeated in multiple other resources
- Curation databases formed a consortium to provide users with a single, non-redundant dataset

IMEx
- Independent molecular interaction resources, all separately funded and with their own curation priorities
- Spent several years developing common curation standards for detailed curation and a joint curation manual
- Common data formats: all data downloadable in PSI formats (PSI-MI MITAB/XML)
- IMEx is an instance of PSICQUIC; specific records are tagged as part of the IMEx set, and only these records are searchable and downloadable on the website
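PSI-MI MITAB is the tab-delimited counterpart of the XML format, with one binary interaction per line. The sketch below reads the 15 columns of MITAB 2.5 into a dictionary; the short column names are this example's own, the sample line is fabricated with placeholder identifiers, and splitting on "|" is a simplification that ignores pipes inside quoted text.

```python
# Short names for the 15 MITAB 2.5 columns (names chosen for this example).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]


def parse_mitab_line(line):
    """Parse one MITAB line into {column: [values]}; '-' means no value."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    # zip() stops at 15 columns, so extra columns of later versions are ignored.
    for name, value in zip(MITAB25_COLUMNS, fields):
        record[name] = [] if value == "-" else value.split("|")
    return record


# Fabricated example line; every identifier here is a placeholder.
sample = "\t".join([
    "uniprotkb:PLACEHOLDER_A", "uniprotkb:PLACEHOLDER_B", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Author et al. (2000)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000|imex:IM-00000", "-",
])
record = parse_mitab_line(sample)
```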

IMEx
Coordinated and non-redundant curation: databases ensure that each paper is curated once, and once only, by a single member database. Each paper is registered with a central database, IMEx Central, which ensures curation is not repeated by a second database.
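The "curated once only" rule behaves like a first-come claim on each publication. The class below is a purely hypothetical illustration of that idea and does not reflect how IMEx Central is actually implemented or accessed; the publication identifier is a placeholder.

```python
class CurationRegistry:
    """Hypothetical registry: the first database to claim a paper keeps it."""

    def __init__(self):
        self._claims = {}

    def register(self, publication_id, database):
        """Claim a paper; return the database that holds the claim."""
        # setdefault stores the claim only if the paper is not yet registered.
        return self._claims.setdefault(publication_id, database)

    def curator_of(self, publication_id):
        """Return the claiming database, or None if the paper is unclaimed."""
        return self._claims.get(publication_id)


registry = CurationRegistry()
first = registry.register("pubmed:00000000", "IntAct")
second = registry.register("pubmed:00000000", "MINT")  # already claimed
```

A second database that receives a different owner back from `register` knows to skip the paper.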

IMEx
Common accession number space: all submitted data gets an IMEx ID and is searchable on the IMEx site, the site of submission and multiple member database sites.

IMEx partners
- IntAct: active
- DIP: active
- MINT: active
- MatrixDB: active
- I2D: active
- InnateDB: active
- Molecular Connections: active
- UniProtKB: active
- UCL-BHF: active
- MBInfo: active
- MPACT, BIND, MPIDB: inactive – data in IntAct
- PRIMESDB: observer
- BioGRID: observer

MBInfo


IMEx statistics
May 2014: 311,141 binary interactions from 9,010 publications

Curation levels
1. IMEx: as agreed by the consortium, detailed curation with a full description of constructs (tags, binding sites etc. detailed as features) as well as experimental information
2. MIMIx (Minimal information…): full experimental information but no details of the constructs
3. Minimal: no experimental detail

IMEx
- In production mode since February 2010
- Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH , with additional partners Vital-IT, Nature, Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)
