Published by Quentin McCarthy. Modified over 8 years ago.
1
Gene3D, Orthology and Homology-Based Inheritance of Protein-Protein Interactions
Corin Yeats
yeats@biochem.ucl.ac.uk
http://gene3d.biochem.ucl.ac.uk/
2
The Gene3D Protein Family and Annotation Resource:
(1) Identify sequence homologues of CATH domains
 - HMMs & hit resolution protocol DomainFinder.
 - UniProt, RefSeq, Ensembl (with generous help of SIMAP at MIPS).
(2) Integrate with sequence annotation resources.
 - Pfam, GO, KEGG, UniProt annotation, IntAct, String
 - Flexible cross-resource comparisons, including CATH PDB domains.
(3) Import sequence families
 - In-house OrthoFams, HAMAP, SIMAP clusters.
3
Defining Orthology
[Figure labels: {A} Last Common Ancestor; A Species 1; A’ Species 2]
W.M. Fitch (1970) Distinguishing homologous from analogous proteins. Syst. Zool. 19:99–113.
4
Defining Paralogy
[Figure labels: {A} Last Common Ancestor {a}; A Species 1; A’ Species 2; a’; a]
W.M. Fitch (1970) Distinguishing homologous from analogous proteins. Syst. Zool. 19:99–113.
5
Co-orthology
[Figure labels: {A} Last Common Ancestor; A Species 1; A’’ Species 2; A’; A’’’; Co-orthologues]
6
Updating The Terminology:
InParalogues:
–“paralogs in a given lineage that all evolved by gene duplications that happened after the radiation (speciation) event that separated the given lineage from the other lineage under consideration”
OutParalogues:
–“paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event”
* E.L.L. Sonnhammer & E.V. Koonin (2002) Orthology, paralogy and proposed classification for paralog subtypes. TiG 18:619-620.
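The two quoted definitions reduce to one comparison: did the duplication happen after or before the speciation event under consideration? A minimal sketch of that comparison follows. The `Duplication` class, the dating in arbitrary "time before present" units, and the example ages are all hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplication:
    """A gene duplication event (hypothetical record type for illustration)."""
    name: str
    age: float  # time before present, arbitrary units; larger means older


def classify_paralogues(duplication: Duplication, speciation_age: float) -> str:
    """Label the paralogues created by `duplication` relative to one speciation.

    Duplication younger than the speciation -> inparalogues.
    Duplication older than the speciation   -> outparalogues.
    An exact tie is treated as 'before' here, which is an arbitrary choice.
    """
    if duplication.age < speciation_age:
        return "inparalogues"
    return "outparalogues"


# Assumed example ages, chosen only to exercise both branches.
speciation = 100.0
recent = Duplication("lineage-specific duplication", age=40.0)
ancient = Duplication("duplication in the common ancestor", age=250.0)

print(classify_paralogues(recent, speciation))
print(classify_paralogues(ancient, speciation))
```

The classification is always relative to a chosen pair of lineages: the same pair of genes can be inparalogues with respect to one speciation event and outparalogues with respect to a later one.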
7
Defining “Ortholog Families”:
Strict Definition:
–Families split at every duplication event.
–Many small families.
Normal Definition:
–Set root at appropriate level of interest.
–Accept inparalogues.
–More useful for function prediction.
8
Some Example Resources:
Name        # Fams          # Prots     Automated?  Description
HAMAP       1493            200,000     M           Manually curated prokaryotic families.
EggNog      43,582          1,241,751   A           Update and extension to COGs, with fine-grained subsets.
TreeFam     1,400 / 15,000  700,000     M & A       Animal orthologue families and gene trees.
ClusTr      12.6 mill       6,000,000   A           Single-linkage high similarity clusters.
Inparanoid  ?               600,000     A           Specific for pair-wise comparisons.
OrthoFam    300,004,600,000 [# Fams and # Prots fused in source]  A  Large-scale affinity propagation clustering.
9
Making the OrthoFams:
Get similarity matrix from SIMAP.
Create 85% non-redundant sequence DB (CD-HIT).
Cluster sequences using Affinity Propagation Clustering (APC; Frey & Dueck, 2007).
Add back in highly similar sequences.
Sub-cluster families at 10 levels of sequence identity.
–“S-levels”
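The "reduce, cluster, add back" shape of this pipeline can be illustrated with a toy sketch. This is not CD-HIT and not the Gene3D code: `greedy_representatives` is a simplified greedy pass in the spirit of CD-HIT (longest sequence first, join the first representative at or above the threshold), `toy_identity` is a position-wise comparison with no alignment, and the `families` dictionary merely stands in for the output of the affinity propagation step, which is not reproduced here. All names and sequences are hypothetical.

```python
from typing import Callable, Dict, List


def greedy_representatives(
    seqs: Dict[str, str],
    identity: Callable[[str, str], float],
    threshold: float = 0.85,
) -> Dict[str, List[str]]:
    """Map each representative ID to the IDs it absorbs (itself included)."""
    clusters: Dict[str, List[str]] = {}
    # Process longest sequences first; break ties by ID so the result is deterministic.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the longer sequence (no alignment)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def add_back(
    families: Dict[str, List[str]], clusters: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Expand families of representatives into families of all sequences."""
    return {
        fam: [member for rep in reps for member in clusters[rep]]
        for fam, reps in families.items()
    }


# Hypothetical sequences: p1 and p2 differ at one position, p3 is unrelated.
seqs = {
    "p1": "MKTAYIAKQRQISFVK",
    "p2": "MKTAYIAKQRQISFVR",
    "p3": "GSHMLEDPVDAFKQLW",
}
clusters = greedy_representatives(seqs, toy_identity)

# Stand-in for the clustering step, which only ever sees the representatives.
families = {"fam1": ["p1"], "fam2": ["p3"]}
full_families = add_back(families, clusters)
print(clusters)
print(full_families)
```

Clustering only the representatives keeps the expensive step small; the near-identical sequences rejoin their representative's family afterwards without influencing the clustering.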
10
Creating the OrthoFams:
[Figure: UniProt & RefSeq sequences; SIMAP protein similarity matrix over Prot A, Prot B, Prot C, Prot D, … (cell values not recoverable from the extraction); CD-HIT reducing the set to Prot A, Prot C, ….]
11
A Simple Test of the OrthoFams:
99.9% of OrthoFams map to one HAMAP family in bacteria.
Each HAMAP family tends to map to several OrthoFams => Too conservative?
>80% map to a single KEGG Orthologue term.
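A consistency check of this kind can be computed as the fraction of clusters whose annotated members all carry the same reference label. The sketch below assumes two plain protein-to-label dictionaries; the function name and the example data are hypothetical, and the talk does not describe how its percentages were actually calculated.

```python
from typing import Dict, Set


def single_mapping_fraction(
    cluster_of: Dict[str, str], reference_of: Dict[str, str]
) -> float:
    """Fraction of clusters whose annotated members share one reference label.

    Proteins without a reference label are ignored; clusters with no
    annotated member are left out of the denominator.
    """
    refs_per_cluster: Dict[str, Set[str]] = {}
    for protein, cluster in cluster_of.items():
        if protein in reference_of:
            refs_per_cluster.setdefault(cluster, set()).add(reference_of[protein])
    if not refs_per_cluster:
        return 0.0
    single = sum(1 for refs in refs_per_cluster.values() if len(refs) == 1)
    return single / len(refs_per_cluster)


# Hypothetical assignments: O1 and O2 both sit inside reference family H1,
# while O3 straddles H2 and H3.
cluster_of = {"p1": "O1", "p2": "O1", "p3": "O2", "p4": "O3", "p5": "O3"}
reference_of = {"p1": "H1", "p2": "H1", "p3": "H1", "p4": "H2", "p5": "H3"}

print(single_mapping_fraction(cluster_of, reference_of))
```

Swapping the two arguments gives the reverse view (how many reference families fall within a single cluster), which is the direction that would reveal over-splitting such as H1 being spread across O1 and O2.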
12
Inheriting Protein-Protein Interactions:
Protein-protein interactions (including mechanism) can be conserved after gene duplication and speciation events.
Some interactions are ancient and well conserved, many are not.
Interactions within species are better conserved between homologues than between species.
Interactions are not binary, but are based on affinity.
Not all detectable interactions are biologically relevant.
Refs: Mika & Rost 2006, Shoemaker & Panchenko 2007
13
Interaction Inheritance Approaches:
Homology-based approaches have struggled…
–Mika & Rost, 2007
Problems:
–High coverage or high quality input, not both.
–Interaction networks re-arrange rapidly.
–No simple, universal, accurate sequence identity threshold can be found.
Need to separate the interactions that can be inherited reliably from those that can’t.
14
The hiPPI Idea: homology inferred Protein-Protein Interactions
(1) Assume OrthoFams provide more reliable functional groupings than simple similarity measures.
(2) Assume high affinity ~= high conservation ~= low experimental false positive rate.
(3) Require more than one piece of supporting evidence.
15
iLevel  cLevel  icLevel  Species Mod  Exp Mod  Score
10      7       8.5      None                  8.5
10      7       8.5      ½            ½        2.1
3       2       2.5      None                  2.5
3       2                ½            ¼        0.3
[Figure labels: Ofam A and Ofam B, each with Hs, Ce, Mm at S30 … S100; “? ? ?”; iLevel; cLevel]
Poss A  13.3  Yes
Poss B  7.3   No
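The numbers in the score table on this slide appear consistent with a simple rule: icLevel is the mean of iLevel and cLevel, and the score is icLevel multiplied by whichever modifiers are present. The slide does not state this rule or define the columns, so the sketch below is an inference from the arithmetic, not the author's formula; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Optional


def ic_level(i_level: int, c_level: int) -> Fraction:
    """Assumed rule: icLevel is the mean of iLevel and cLevel."""
    return Fraction(i_level + c_level, 2)


def score(
    i_level: int,
    c_level: int,
    species_mod: Optional[Fraction] = None,
    exp_mod: Optional[Fraction] = None,
) -> Fraction:
    """Assumed rule: multiply icLevel by each modifier that is present."""
    result = ic_level(i_level, c_level)
    for modifier in (species_mod, exp_mod):
        if modifier is not None:  # "None" in the table means no modifier applied
            result *= modifier
    return result


half, quarter = Fraction(1, 2), Fraction(1, 4)

print(float(score(10, 7)))                 # 8.5
print(float(score(10, 7, half, half)))     # 2.125, shown as 2.1 on the slide
print(float(score(3, 2)))                  # 2.5
print(float(score(3, 2, half, quarter)))   # 0.3125, shown as 0.3 on the slide
```

All four table rows are reproduced to the one decimal place shown on the slide, which supports the inferred rule but does not confirm it.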
16
Interactions derived from MIPS, IntAct and MINT.
GO Term semantic similarity calculated with the Lord method (Lord et al., 2003).
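The measure used by Lord et al. is generally described as building on Resnik's information-content similarity: two terms are as similar as their most informative shared ancestor, where information content is the negative log of how often a term (or any of its descendants) is used in annotation. The sketch below illustrates that idea on a made-up five-term ontology with made-up annotation counts; none of the terms, counts, or function names come from the talk or from GO itself.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "protein binding": {"binding"},
    "DNA binding": {"binding"},
}

# Hypothetical numbers of proteins annotated directly to each term.
DIRECT_COUNTS: Dict[str, int] = {
    "root": 0,
    "binding": 2,
    "catalysis": 8,
    "protein binding": 6,
    "DNA binding": 4,
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities() -> Dict[str, float]:
    """Probability of each term, counting annotations to its descendants too."""
    total = sum(DIRECT_COUNTS.values())
    usage = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            usage[anc] += count
    return {term: usage[term] / total for term in PARENTS}


def resnik_similarity(t1: str, t2: str, prob: Dict[str, float]) -> float:
    """Information content (-ln p) of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(prob[t]) for t in common)


prob = term_probabilities()
print(resnik_similarity("protein binding", "DNA binding", prob))
```

In this toy ontology the two binding terms share "binding" (probability 12/20), so their similarity is -ln(0.6); terms whose only shared ancestor is the root score zero.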
17
Links and References
http://gene3d.biochem.ucl.ac.uk/
“Gene3D: comprehensive structural and functional annotation of genomes” Corin Yeats, Jonathan Lees, Adam Reid, Paul Kellam, Nigel Martin, Xinhui Liu, and Christine Orengo. NAR (2008) 36:D414–D418.
18
The Algorithm:
For a query protein, at each Ofam S-level (starting at 100%):
- Identify homologues with interactions.
- In the interacting Ofams, are there any proteins from the same species as the query protein?
- If so, score the potential interactions.
- Each piece of supporting evidence is included in the score.
Sum the scores for each potential interaction
–Since each interaction may be predicted through multiple homologues
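The steps above can be sketched as a nested loop over S-levels, homologues, and their known partners. This is a rough illustration under stated assumptions, not the hiPPI implementation: the input dictionaries, the `predict_partners` name, the weight of `level / 100` per piece of evidence, and the `min_evidence` filter (standing in for "require more than one piece of supporting evidence") are all hypothetical, and the real scoring uses modifiers that this sketch omits.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def predict_partners(
    query: str,
    species: Dict[str, str],
    families: Dict[int, Dict[str, str]],
    interactions: Dict[str, Set[str]],
    min_evidence: int = 2,
) -> List[Tuple[str, float]]:
    """Rank same-species candidate partners for `query`.

    species:      protein -> species code
    families:     S-level -> {protein -> family ID at that level}
    interactions: protein -> set of experimentally known partners
    """
    scores: Dict[str, float] = defaultdict(float)
    evidence: Dict[str, int] = defaultdict(int)
    for level in sorted(families, reverse=True):  # start at the strictest level
        family_of = families[level]
        if query not in family_of:
            continue
        members: Dict[str, Set[str]] = defaultdict(set)
        for protein, fam in family_of.items():
            members[fam].add(protein)
        # Homologues: other members of the query's family at this level.
        for homologue in members[family_of[query]] - {query}:
            for partner in interactions.get(homologue, ()):
                partner_family = family_of.get(partner)
                if partner_family is None:
                    continue
                # Candidates: partner-family members from the query's species.
                for candidate in members[partner_family]:
                    if candidate != query and species[candidate] == species[query]:
                        scores[candidate] += level / 100.0  # hypothetical weight
                        evidence[candidate] += 1
    kept = [(c, s) for c, s in scores.items() if evidence[c] >= min_evidence]
    return sorted(kept, key=lambda cs: (-cs[1], cs[0]))


# Hypothetical data: mouse and worm homologues of hsA each interact with a
# homologue of hsB; no human interaction is known.
species = {"hsA": "Hs", "mmA": "Mm", "ceA": "Ce", "hsB": "Hs", "mmB": "Mm", "ceB": "Ce"}
families = {
    100: {"hsA": "a1", "mmA": "a2", "ceA": "a3", "hsB": "b1", "mmB": "b2", "ceB": "b3"},
    60: {"hsA": "A", "mmA": "A", "ceA": "a3", "hsB": "B", "mmB": "B", "ceB": "b3"},
    30: {"hsA": "A", "mmA": "A", "ceA": "A", "hsB": "B", "mmB": "B", "ceB": "B"},
}
interactions = {"mmA": {"mmB"}, "mmB": {"mmA"}, "ceA": {"ceB"}, "ceB": {"ceA"}}

predictions = predict_partners("hsA", species, families, interactions)
print(predictions)
```

In this sketch the same experimental interaction contributes once per S-level at which it is reachable, so evidence found through close homologues counts more than once; whether the original method accumulates across levels in this way is not stated on the slide.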