The Complex Portal - relationship to Gene Ontology Sandra Orchard (IntAct)


Project Aim
- To design an online portal to search and visualise protein complexes
  - Including cross-referencing to source databases and beyond
  - Export to interested parties in a format of their choice
  - Incorporate the data into network analysis tools
- Emphasis on major model organisms, chosen to span the taxonomic range: Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe, Arabidopsis thaliana
- All data held in the IntAct database, sharing its editor, protein update mechanism and QC procedures
- Separate search and visualisation facility: wwwdev.ebi.ac.uk/intact/complex/

The Complex Portal
- Create stable complex identifiers
- Joint curation effort
  → benefit to all collaborating databases: resource sharing, elimination of redundancies
  → benefit to user: one central resource that links to all source databases

Definition: stable protein complexes
- A stable set (2 or more) of interacting protein molecules which can be co-purified and have been shown to exist as a functional unit in vivo.
- Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.
What is not a stable complex?
- Two proteins associated in a pulldown / coimmunoprecipitation with no functional link
- Enzyme/substrate, receptor/ligand or similar transient interactions
  - Exception: an obligate complex that requires substrate/ligand, e.g. PDGF receptors

Source Databases
- PDBe (EBI): almost 1000 complexes imported
- ChEMBL (EBI): 81 complexes imported, more to come with each release
- MatrixDB (Sylvie Ricard-Blum, Univ. of Lyon)
- Mining UniProt: yeast (Bernd Roechert, SIB, manually)
- Reactome: human (EBI)
- Manual curation from IMEx DBs & the literature
- Gramene: Arabidopsis
- Unmaintained web resources: CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)

Data captured currently for IntAct complexes
- Participants: proteins (UniProt), small molecules (ChEBI), nucleic acids (Ensembl, ChEBI, RNAcentral?)
- Species
- Stoichiometry, when known
- Topology (= binding sites), when known

Data captured currently for IntAct complexes
Complex-specific, free-text annotation fields:
- Function and context, UniProt-style (visible in search results)
- Assembly, e.g. homodimer, heterotetramer…
- Physical properties, e.g. MW, size, topology/assembly
- Ligands
- Disease

Data captured currently for IntAct complexes
Complex names:
- Recommended name: most recognisable name from the literature; use the GO component term if the specific complex exists in GO
- Systematic name: based on Reactome’s new CV names, i.e. a ‘string of gene names with stoichiometry’
- Synonyms: all other names the complex may be known as
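To make the ‘string of gene names with stoichiometry’ idea concrete, here is a minimal Python sketch. The slide does not give the actual syntax, so the `GENE(n)` token, the `:` separator, the alphabetical ordering and the function name `systematic_name` are all assumptions made for illustration only.

```python
def systematic_name(stoichiometry):
    """Join gene names into one string, adding copy numbers where known.

    `stoichiometry` maps a gene name to its copy number, or to None when
    the stoichiometry is not known. The "GENE(n)" token and ":" separator
    are illustrative assumptions, not the portal's real naming syntax.
    """
    parts = []
    for gene in sorted(stoichiometry):
        copies = stoichiometry[gene]
        parts.append(f"{gene}({copies})" if copies is not None else gene)
    return ":".join(parts)


print(systematic_name({"PDGFA": 2}))               # PDGFA(2)
print(systematic_name({"GENE2": None, "GENE1": 1}))  # GENE1(1):GENE2
```

Sorting the gene names is one way to make the name deterministic, so that two curators listing the same components in a different order would arrive at the same string.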

Data captured currently for IntAct complexes
- Structured annotation using GO (BP, MF, CC)
- Cross references to experimental evidence: IMEx (+ non-IMEx IntAct & DIP), PDB, EMDB
- Cross references to related complex data:
  - Reactome (human)
  - ChEMBL
  - PubMed (for further information)
  - IntEnz (enzyme EC numbers)
  - OMIM (disease)
  - ECO (Evidence Code Ontology)
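The fields listed on the ‘Data captured’ slides could be held in a record along the following lines. This is only a sketch in Python: the class names, field names and placeholder identifiers are hypothetical and do not reflect the IntAct schema. The `meets_size_criterion` check encodes the ‘2 or more protein molecules’ part of the stable-complex definition, under the assumption that a participant with unknown stoichiometry counts as one copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    database: str                        # e.g. "UniProt", "ChEBI", "Ensembl"
    identifier: str                      # accession in that database
    stoichiometry: Optional[int] = None  # None when the copy number is not known


@dataclass
class ComplexRecord:
    recommended_name: str
    systematic_name: str
    species: str
    participants: List[Participant]
    synonyms: List[str] = field(default_factory=list)
    free_text: Dict[str, str] = field(default_factory=dict)       # e.g. "function", "assembly"
    go_terms: Dict[str, List[str]] = field(default_factory=dict)  # keyed by "BP", "MF", "CC"
    xrefs: Dict[str, List[str]] = field(default_factory=dict)     # e.g. "PDB", "Reactome"

    def protein_molecule_count(self) -> int:
        """Count protein molecules, treating unknown stoichiometry as one copy."""
        return sum(p.stoichiometry or 1
                   for p in self.participants if p.database == "UniProt")

    def meets_size_criterion(self) -> bool:
        """True if the record holds 2 or more protein molecules."""
        return self.protein_molecule_count() >= 2


# Placeholder identifiers, not real accessions.
example = ComplexRecord(
    recommended_name="example homodimer",
    systematic_name="GENE1(2)",
    species="Homo sapiens",
    participants=[
        Participant("UniProt", "PLACEHOLDER-1", stoichiometry=2),
        Participant("ChEBI", "PLACEHOLDER-2"),
    ],
)
print(example.meets_size_criterion())
```

Only the size criterion is checked here; whether a complex is co-purified and functional in vivo is a curation judgement that a record like this cannot decide.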

Parallel Annotation of complexes in GO
- At project start: > 400 complex terms in GO CC, mostly children of GO: protein complex, lacking hierarchical structure
- Good collaboration with GO to provide structured annotation
  - Parent terms mainly based on complex function
  - TermGenie (TG) Standard Form where possible, otherwise TG Free Form
  - Some complexes are still direct children of GO: protein complex
- Adding “logical definitions” / “cross-products” / “extensions”, e.g. “capable_of x activity”

ECO - Evidence Code Ontology
- ECO: physical interaction evidence used in manual assertion (= IPI)
  - Used when full experimental evidence for the complex is present
- ECO: sequence orthology evidence used in manual assertion (= ISO)
  - Used when only limited experimental evidence exists for a complex in one species (e.g. mouse), but the complex has been curated in another species (e.g. human) and orthologous gene products exist in the former species, e.g. PDGFs
- ECO: inference from background scientific knowledge used in manual assertion
  - Used when no or only partial experimental evidence can be found but the complex is generally assumed to exist, e.g. GABA receptors, which exist in ChEMBL
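The three evidence cases on this slide amount to a small decision rule, sketched below in Python. The enum, the function name `choose_evidence` and its three boolean parameters are hypothetical simplifications; in practice the choice is a curator's judgement, and the order of the checks (experimental evidence first) is an assumption.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    PHYSICAL_INTERACTION = "physical interaction evidence used in manual assertion"
    SEQUENCE_ORTHOLOGY = "sequence orthology evidence used in manual assertion"
    BACKGROUND_KNOWLEDGE = (
        "inference from background scientific knowledge used in manual assertion"
    )


def choose_evidence(full_experimental_evidence: bool,
                    curated_in_other_species_with_orthologs: bool,
                    generally_assumed_to_exist: bool) -> Optional[Evidence]:
    """Pick the ECO term for a complex, or None if no case on the slide applies."""
    if full_experimental_evidence:
        return Evidence.PHYSICAL_INTERACTION
    if curated_in_other_species_with_orthologs:
        return Evidence.SEQUENCE_ORTHOLOGY
    if generally_assumed_to_exist:
        return Evidence.BACKGROUND_KNOWLEDGE
    return None


# A complex with limited evidence in mouse that was already curated in human.
print(choose_evidence(False, True, False).value)
```

Returning `None` for the remaining case keeps the sketch honest: the slide does not say what happens when none of the three conditions holds.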

Download
- At present: one PSI-MI XML file for all complexes on the ftp site
- From next IntAct release: one file per complex within a folder per species on the ftp site, and a zip file per species
- Future:
  - Separate files for each complex accessible on each complex details page
  - List of files for complexes from the search results list
  - Database-specific dumps
  - Network analysis appropriate format (as developed by MIPS)
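For readers planning to consume the ‘one file per complex within a folder per species’ layout, a short Python sketch of grouping a file listing by species folder follows. The folder and file names are placeholders invented for the example; the slide does not specify the real paths or naming on the ftp site.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_species(paths):
    """Map each species folder name to the sorted complex files inside it."""
    grouped = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        # The immediate parent folder is assumed to be the species folder.
        grouped[path.parent.name].append(path.name)
    return {species: sorted(files) for species, files in grouped.items()}


# Hypothetical listing; folder and file names are placeholders.
listing = [
    "complexes/homo_sapiens/complex_b.xml",
    "complexes/homo_sapiens/complex_a.xml",
    "complexes/saccharomyces_cerevisiae/complex_c.xml",
]
for species, files in group_by_species(listing).items():
    print(species, files)
```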

Project status
- Website will move to the production site at the end of March
- Further development (particularly graphics) will be made public over the next 6 months
- Curation priorities: human (mouse), yeast, E. coli; user requests
- Exports to GOA (process and component) and UniProt under discussion

Future Plans - Display
- Add search filters, e.g.
  - Species (almost done)
  - GO terms
  - ECO
  → Advanced Search
- Links to ‘experimental evidence’ and ‘related complexes’ searches
- Schematic view of complex
- Add existing widgets/BioJS components to show content from other databases directly in the Complex Portal: crystal structure, pathway, enzyme reactions etc.

Future Plans - Functionality
- Concept of ‘sets’, important for Reactome import
- Hierarchy of complex sets → specific complex → sub-complex
- Introducing features to indicate, e.g., complex-drug binding sites
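The planned hierarchy of complex sets → specific complex → sub-complex can be pictured as a three-level tree. The Python sketch below is one possible representation; the `Node` class, the level labels and the rule that a child sits exactly one level below its parent are assumptions for illustration, since the slide describes the hierarchy only as a future plan.

```python
from dataclasses import dataclass, field
from typing import List

# Level labels taken from the slide, ordered from top to bottom.
LEVELS = ("complex set", "specific complex", "sub-complex")


@dataclass
class Node:
    name: str
    level: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child that sits exactly one level below this node."""
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level!r} cannot be placed under {self.level!r}")
        self.children.append(child)
        return child


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per node."""
    lines = [f"{'  ' * depth}{node.name} [{node.level}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical names, used only to show the shape of the hierarchy.
root = Node("example receptor set", "complex set")
specific = root.add(Node("example receptor, variant 1", "specific complex"))
specific.add(Node("example core dimer", "sub-complex"))
print("\n".join(outline(root)))
```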

Complexes on demand
- Request via ‘Contact us’ button, providing:
  1. Name & components
  2. Experimental paper
  3. Full details including function, stoichiometry and topology
- ... or we give you access to the editor to create your own


Summary of ‘User Survey’ and own goals

Summary of ‘User Survey’ - Search

Summary of ‘User Survey’ - Display

Summary of ‘User Survey’ - Features Expression Atlas?

Summary of ‘User Survey’ - Features Manually for mouse ECOxref to exp-evidence

Summary of ‘User Survey’ - Features Definition??? Reactome

Summary of ‘User Survey’ - Features

Summary of ‘User Survey’ - Downloads

IntAct and Complex Portal homepage

Complex Portal UniProt-style display

Complex Portal tab-style display