Published by Mark Prosper Stokes. Modified over 9 years ago.
1
PRO and IntAct protein complexes
Sandra Orchard
PRO Meeting, June 19, 2014
2
Project aims

IntAct: Reference resource for macromolecular complexes
- Create species-specific stable complex identifiers
- Central reference resource to link all related efforts (UniProt for protein complexes)
- Dedicated online portal to search and visualise (text and graphics); can also export to Cytoscape
- Emphasis on major model organisms
- Stored in relational database (IntAct) with existing update mechanisms
- Download format: PSI-MI XML; can write user-defined format from database

PRO: Reference ontology for protein complexes
- PRO terms can span across species or be species-specific
- Stable ontology term identifier
- Searched and viewed (text-only) in existing PRO website; can export to Cytoscape
- Emphasis on major model organisms
- Stored in an internal database with existing update mechanism
- Download: ontology (OBO, OWL), annotation file
3
Complex definition

IntAct:
A stable set (2 or more, to include homodimers) of interacting protein molecules which
- can be co-purified and
- have been shown to exist as a functional unit in vivo.
Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.
Does not include:
- Molecules associated in a pulldown / coimmunoprecipitation but with no functional link
- Enzyme/substrate, receptor/ligand or similar transient interactions (except when required for stable complex formation)

PRO:
- Protein complexes, including homo complexes (e.g. homodimers)
- Complexes may include non-protein components
4
Data capture

IntAct:
- Participants: proteins (UniProt), small molecules (ChEBI), nucleic acids (ChEBI, (RNAcentral))
- Participant features: binding domains, required PTMs
- Species
- Stoichiometry: when known
- Topology (linked binding domains): when known
- Function: free text
- Assembly, e.g. homodimer, heterotetramer…
- Physical properties, e.g. MW, size, topology/assembly
- Ligands
- Disease

PRO:
- Participants: proteins (PR identifiers), small molecules (ChEBI), nucleic acids (?)
- PTMs: implicit in PR term
- Species
- Cardinality: indicates stoichiometry
- Definition: free text “composed of x number of subunits of various components”
- Disease and functional properties are added as an annotation in PAF if known
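To make the captured fields concrete, here is a minimal sketch of how a complex entry with participants and optional stoichiometry might be held in memory. The class and field names (`Participant`, `ComplexEntry`, `total_subunits`) and the identifier `PLACEHOLDER_ACC` are assumptions for illustration; they are not the IntAct or PRO schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    identifier: str                      # placeholder for a UniProt / ChEBI / PR identifier
    kind: str                            # "protein", "small molecule" or "nucleic acid"
    stoichiometry: Optional[int] = None  # None means "not known"
    features: List[str] = field(default_factory=list)  # binding domains, required PTMs


@dataclass
class ComplexEntry:
    name: str
    species: str                         # NCBI TaxID as a string; 9606 is human
    participants: List[Participant]
    function: str = ""                   # free text

    def total_subunits(self) -> Optional[int]:
        """Sum protein stoichiometries, or return None if any is unknown."""
        counts = [p.stoichiometry for p in self.participants if p.kind == "protein"]
        if not counts or any(c is None for c in counts):
            return None
        return sum(counts)


# Hypothetical homodimer: one protein participant present in two copies.
example = ComplexEntry(
    name="example homodimer",
    species="9606",
    participants=[Participant("PLACEHOLDER_ACC", "protein", stoichiometry=2)],
)
```

Keeping stoichiometry optional mirrors the "when known" qualifier on the slide: an entry without it is still valid, it just cannot report a subunit count.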
5
Data capture - nomenclature

IntAct:
- Recommended name: most recognisable name from literature; use GO component if specific complex exists in GO
- Systematic name: based on Reactome’s new CV names, ‘string of (species-specific) gene names with stoichiometry’
- Synonyms: all other names the complex may be known as

PRO:
- Name: most recognisable name from literature; use GO component if specific complex exists in GO
- Systematic name: based on Reactome’s new CV names (stoichiometry not incorporated)
- Synonyms: all other names the complex may be known as
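The difference between the two systematic-name conventions can be sketched as follows. The slide only says the name is a string of gene names, with or without stoichiometry; the `:` separator, the `(n)` suffix, and the gene names `geneA`/`geneB` are assumptions made up for this example, not the Reactome CV format.

```python
from typing import List, Tuple


def systematic_name(subunits: List[Tuple[str, int]], with_stoichiometry: bool = True) -> str:
    """Join gene names into one string, optionally appending each copy number.

    The "GENE(n):GENE(n)" layout is a hypothetical format for illustration only.
    """
    parts = []
    for gene, copies in subunits:
        parts.append(f"{gene}({copies})" if with_stoichiometry else gene)
    return ":".join(parts)


subunits = [("geneA", 2), ("geneB", 1)]
intact_style = systematic_name(subunits)                         # stoichiometry included
pro_style = systematic_name(subunits, with_stoichiometry=False)  # stoichiometry omitted
```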
6
Data capture - xrefs

IntAct:
- GO (BP, MF, CC): manually curated to complex, not just imported from proteins
- Cross references to experimental evidence: IMEx (+ non-IMEx IntAct, MINT & DIP, MatrixDB), Reactome (human)
- PDB, EMDB
- ChEMBL
- PubMed (for further information)
- IntEnz (enzyme EC numbers)
- OMIM/EFO (disease)
- TaxID

PRO:
- GO: used as parent term
- Reactome (human)
- PubMed
- TaxID
7
Data capture - evidence

IntAct: ECO codes
- ECO:0000353 (physical interaction evidence used in manual assertion): full experimental evidence for the complex added to the entry
- ECO:0000266 (sequence orthology evidence used in manual assertion) + inferred from “complex ID”: across species
- ECO:0000250 (sequence similarity evidence used in manual assertion) + inferred from “complex ID”: within species
- ECO:0000306 (inference from background scientific knowledge used in manual assertion): modelled

PRO: ECO codes
- EXP experimentally verified → ECO:0000269 (experimental evidence used in manual assertion)
- ECO:0000088 (biological system reconstruction): modelled
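The IntAct evidence rules on this slide lend themselves to a simple consistency check: the two inference codes must name the complex they were inferred from, the others need not. The sketch below encodes only the four IntAct codes listed above; the names `ECO_RULES` and `check_evidence` and the identifier `SOURCE_COMPLEX_ID` are hypothetical, and this is not how the IntAct editor necessarily validates entries.

```python
from typing import List, Optional

# ECO codes and labels as given on the slide; the flag records whether the
# slide pairs the code with an 'inferred from "complex ID"' reference.
ECO_RULES = {
    "ECO:0000353": ("physical interaction evidence used in manual assertion", False),
    "ECO:0000266": ("sequence orthology evidence used in manual assertion", True),
    "ECO:0000250": ("sequence similarity evidence used in manual assertion", True),
    "ECO:0000306": ("inference from background scientific knowledge used in manual assertion", False),
}


def check_evidence(eco_code: str, inferred_from: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the annotation looks consistent."""
    rule = ECO_RULES.get(eco_code)
    if rule is None:
        return [f"{eco_code} is not one of the IntAct codes listed on the slide"]
    _label, needs_source = rule
    if needs_source and not inferred_from:
        return [f"{eco_code} requires the ID of the complex it was inferred from"]
    return []


problems = check_evidence("ECO:0000266")  # orthology code without a source complex
```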
11
Linked binding domains
PTMs annotated using MOD
12
Protein Complex Statistics

Species                    IntAct  PRO
Human                      226     215
Mouse                      173     93
Rat                        49      0
Cow                        3       0
Drosophila melanogaster    12      0
C. elegans                 (2)     0
Xenopus laevis             3       0
Arabidopsis thaliana       0       8
Saccharomyces cerevisiae   301     17
S. pombe                   16
E. coli                    87      0
Total                      870     352 (+215 protein agnostic parent terms)
13
IntAct - Parallel annotation of complexes in GO
- At project start: > 400 complex terms in GO Cellular Component branch, mostly children of GO:0043234 protein complex; lacks hierarchical structure
- Collaboration agreed with GO to provide more structured annotation whilst also adding new terms
- Parent terms mainly based on complex function, e.g. enzyme complexes, transcription factor complexes
  - TermGenie (TG) Standard Form
  - Otherwise use TG Free Form
  - Some complexes still direct children of GO:0043234 protein complex
- Adding “logical definitions” / “cross-products” / “extensions”, e.g. “capable of x activity”
14
IntAct data sources/Curation priorities
- PDBe: almost 1000 complexes imported, more planned. Experimental data can be imported at the same time. (N.B. many of these have proven to be partial/sub-complexes so will not directly translate into 1000 finished products. Also many are from non-model organisms.) Curation ongoing. PDB collaborating and may add curation effort.
- ChEMBL: 81 drug-target complexes imported; curation complete, more to come with each release (mostly human/mouse/rat)
- MatrixDB (Sylvie Ricard-Blum, Univ. of Lyon): list of extracellular complexes; curation complete (human/mouse)
- Reactome: mapping into PSI-MI XML → direct import into IntAct ongoing; issue with sets has now been resolved (human)
- Mining UniProt (Bernd Roechert, SIB, manually): curation ongoing (yeast)
- Manual curation from IMEx DBs & the literature
- SGD yeast complex list: SGD contributing curation effort
- EcoCyc: complex list has been dumped into an Excel sheet, useful as a ‘to do’ list but not suitable for import; curation ongoing (E. coli)
15
PRO data sources/Curation priorities
- Toll-like receptor pathway: curation of both human and mouse (Anna Maria Masci at Duke and Veronica Shamovsky/Peter D’Eustachio from Reactome)
- Complexes for the Brassinosteroid signaling pathway in Arabidopsis (Mengxi Lv and Cecilia Arighi at University of Delaware)
- Complexes in TGF-beta signaling pathway (Cecilia, human complexes aligned with Reactome data)
- Complexes in cell cycle spindle checkpoint for human and yeast (Karen Ross, University of Delaware)
- Beta catenin related complexes (Irem Celen, University of Delaware)
16
What else has IntAct to offer?
1. Web-based editorial tool
- Institution/curator management system enables attribution of effort to institute
- APIs to UniProt, ChEBI (RNAcentral when available) allow immediate import of interactors plus selected xrefs
- OLS enables enrichment of CV terms, e.g. GO names when AC no. used for import
- Pull-down menus restrict CV usage to appropriate fields
- Intelligent ‘syntax checker’ limits curator error
17
What else has IntAct to offer?
2. JIRA issue tracker
- Enables tracking of complexes requiring QC by 2nd curator
- Used to request addition of new complex GO terms or hierarchy re-organization; this is then undertaken via TermGenie
- Could additionally be used to request IntAct curation of experimental evidence papers not already in database(s)
18
What else has IntAct to offer?
3. Automated update process
- Protein update system: tracks changes to underlying sequence with every release of UniProt and remaps features (binding domains, PTMs) accordingly. Withdrawn proteins (TrEMBL) remapped.
- CV update system
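As an illustration of what remapping a feature after a sequence change involves, the sketch below relocates a feature range by searching for its residues in the updated sequence and gives up when the match is missing or ambiguous. This is a deliberately naive stand-in, not the IntAct protein update algorithm; the function name `remap_feature` and the toy sequences are assumptions.

```python
from typing import Optional, Tuple


def remap_feature(old_seq: str, new_seq: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Relocate a 1-based, inclusive feature range from old_seq onto new_seq.

    Returns the new (start, end), or None when the feature residues are absent
    or occur more than once, in which case a curator would need to review it.
    """
    if start < 1 or end > len(old_seq) or start > end:
        raise ValueError("feature range lies outside the old sequence")
    segment = old_seq[start - 1:end]
    first = new_seq.find(segment)
    if first == -1:
        return None                       # residues no longer present
    if new_seq.find(segment, first + 1) != -1:
        return None                       # ambiguous: more than one match
    return (first + 1, first + len(segment))


# Toy example: two residues are inserted at the N-terminus, shifting the feature by 2.
old = "MKTAYIAKQR"
new = "MAMKTAYIAKQR"
shifted = remap_feature(old, new, 3, 6)   # feature residues "TAYI"
```

Returning None instead of guessing keeps uncertain cases visible for manual review, which fits a curated resource better than silently moving a binding domain.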
19
Proposal for joint curation
1. IntAct/PRO to align curation rules (discussions ongoing)
2. IntAct to import PRO complexes & update all existing to joint rule set
3. IntAct to produce script to write complexes to flat file format
4. PRO curators to train on IntAct editor; all new complexes curated in IntAct
5. IntAct responsible for long-term data maintenance
20
Proposal for joint curation
6. IntAct to write flat files for new/updated complexes with every release
7. PRO to map UniProt + MOD → PR IDs
8. PRO to create ontology, including addition of parent ‘species-agnostic’ terms (IntAct will have “super-complex (Reactome ‘set’ equivalent)/complex/sub-complex” relationship; OK for PRO?)
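Steps 3, 6 and 7 of the proposal describe a hand-off: IntAct writes complexes to a flat file, and PRO maps each UniProt accession plus its MOD terms to a PR identifier. The sketch below combines both steps in one place purely for illustration. The tab-delimited column layout, the lookup-table approach, the `UNMAPPED` marker, and every identifier shown (`CPX-EXAMPLE`, `ACC1`, `MOD:X`, `PR:PLACEHOLDER1`) are assumptions; the slides do not specify the flat file format or the mapping method.

```python
import csv
import io
from typing import Dict, FrozenSet, Iterable, List, Optional, TextIO, Tuple

PrLookup = Dict[Tuple[str, FrozenSet[str]], str]


def to_pr_id(accession: str, mods: Iterable[str], lookup: PrLookup) -> Optional[str]:
    """Map (accession, set of MOD terms) to a PR ID; None when no mapping exists."""
    return lookup.get((accession, frozenset(mods)))


def write_flat_file(
    complexes: Dict[str, List[Tuple[str, List[str], int]]],
    lookup: PrLookup,
    handle: TextIO,
) -> None:
    """Write one tab-delimited row per participant (hypothetical column layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["complex_id", "participant", "stoichiometry", "pr_id"])
    for complex_id, participants in complexes.items():
        for accession, mods, copies in participants:
            pr_id = to_pr_id(accession, mods, lookup) or "UNMAPPED"
            writer.writerow([complex_id, accession, copies, pr_id])


# All identifiers below are placeholders, not real database entries.
lookup = {("ACC1", frozenset({"MOD:X"})): "PR:PLACEHOLDER1"}
complexes = {"CPX-EXAMPLE": [("ACC1", ["MOD:X"], 2), ("ACC2", [], 1)]}
buffer = io.StringIO()
write_flat_file(complexes, lookup, buffer)
rows = buffer.getvalue().splitlines()
```

Marking unmapped participants explicitly, rather than dropping them, would give PRO curators a worklist of proteoforms that still need a PR term.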