Download presentation
Presentation is loading. Please wait.
Published byThomasina Robertson Modified over 9 years ago
1
Orthology, paralogy and GO annotation Paul D. Thomas SRI International
2
Outline Why does orthology matter to us? A little background on evolution, orthology and paralogy Practical considerations for RefGenome
3
Why does “orthology” matter to us? Goal –identify genes in reference genomes that have the same or similar functions, so that comprehensive curation can be done simultaneously Why? –Different model organisms have different strengths for exploring different facets of gene function, and these can often inform each other –Most genes did not first evolve within a given extant species: they were INHERITED from a common ancestor. Genes in different organisms have similar functions because they were inherited, and haven’t changed much since the common ancestor.
4
How do we identify genes with similar functions? Evolutionary analysis Where do orthologs fit in, and what do we mean by orthologs?
5
How do we identify genes with similar functions? Evolutionary analysis Where do orthologs fit in, and what do we mean by orthologs? –Simple answer: “The same gene in different organisms” (separated only by speciation) Orthology = similar function
6
How do we identify genes with similar functions? Evolutionary analysis Where do orthologs fit in, and what do we mean by orthologs? –Simple answer: “The same gene in different organisms” (separated only by speciation) Orthology = similar function –Unfortunately, the world is not that simple Orthologous genes can have the different functions Paralogous genes (duplications) can have (to some extent at least) similar functions
7
How do we identify genes with similar functions? Evolutionary analysis Where do orthologs fit in, and what do we mean by orthologs? –Simple answer: “The same gene in different organisms” (separated only by speciation) Orthology = similar function –Unfortunately, the world is not that simple Orthologous genes can have the different functions Paralogous genes (duplications) can have (to some extent at least) similar functions –Fortunately, a slightly more complicated view can get us much closer to addressing the question of gene function
8
Representing evolution of related genes Start with Darwin’s basic model: –Copying An ancestral “species” “splits” into two separate species –Divergence Each copy (species) changes independently over generations –NATURAL SELECTION: adaptation to different environment
9
Darwin’s species tree Number of generations/time along one axis Amount of divergence along other axis Characters in common are due to inheritance –Also tells us something about common ancestor
10
Representing evolution of related genes “Gene families” Add detail from population genetics/molecular evolution to apply to genes –Copying An ancestral species “splits” into two separate species –SPECIATION A gene is duplicated in one population and subsequently inherited –DUPLICATION –Divergence Each copy (gene sequence) changes independently over generations –NATURAL SELECTION: sequence substitutions to adapt to new function/role –NEUTRAL DRIFT: accumulation of “neutral” substitutions
11
How does this relate to gene function? Copying –An ancestral species “splits” into two separate species SPECIATION: likely to continue performing ancestral function –BUT not always –A gene is duplicated in one population and subsequently inherited DUPLICATION: “redundant gene” free from previous constraints can adapt to a new function –BUT still inherits some aspects of ancestral function Divergence –Each “new” (gene sequence) changes independently over generations NATURAL SELECTION: sequence substitutions adapt to new/modified function/role NEUTRAL DRIFT: sequence changes from accumulation of “neutral” substitutions. This is the MAJOR source of sequence differences!
12
A gene tree Only one “informative” axis: rate of sequence evolution –For neutral changes this can often act as a “molecular clock” –Non-neutral changes will speed up the rate of evolution E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m.
13
So what? Practical considerations
14
OrthoMCL “ortholog cluster” An “ortholog cluster” is made by one or more “slices” through the protein family tree Some combination of evolutionary rates and history of duplications Might miss genes that could be efficiently annotated at the same time From a strict evolutionary standpoint, orthologs are separated ONLY by speciation events; TIGRFAMs has coined the term “equivalog” for functionally conserved groups E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m.
15
“ISS” Inference from sequence similarity A class of database search algorithm (e.g. BLAST) has become a metaphor –Implies “genes have similar functions because they have similar sequences” –Function is best determined using pairwise comparison
17
“ISS” More properly, ISS of function is inheritance! –“related genes have a common function because their common ancestor had that function, which was inherited by its descendants” –ISS is not just a statement about one gene. It is also making assertions about The common ancestor Inheritance of a “character” by –Both “pairwise similar” descendants –Other descendants
18
Homology inference in a tree inheritance and divergence of function E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m. “methylene tetrahydrofolate reductase activity” (m.f.) “methionine metabolic process” (b.p.)
19
Homology inference in a tree E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m. “methylene tetrahydrofolate reductase activity” (m.f.) “methionine metabolic process” (b.p.) NOT “methionine metabolic process” (b.p.) NOT “methylene tetrahydrofolate reductase activity” (m.f.)? NOT “methionine metabolic process” (b.p.)? “regulation of homocysteine metabolic process” (b.p.)
20
Homology inference in a tree E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m. COMBINES: 1.Evolutionary information (tree) 2.Experimental knowledge (GO annotations from literature) 3.Organism-specific biological knowledge (curators)
21
This is just an easy, self- consistent way of doing ISS! E.c. A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 D.m. A.g. S.p. S.c. MET12 C.e. D.r. G.g. H.s. MTHFR R.n. M.m. We have a picture of ALL the relationships rather than N flat lists that need to be reconciled
22
Tree annotation tool for RefGenome Pre-computed, searchable “library” of gene trees –Include “outgroup” organisms to help infer evolutionary histories –Gene members in any tree can be modified by curator feedback Tool for viewing tree and selecting “homology group” to be annotated Tool for viewing tree labeled with in-depth GO annotations from all MODs, and inferring ancestral functions and homology annotations Homology annotations will be supported by a tree node as evidence, trees will be available to scientific community HMMs will be constructed to allow other genome projects to infer GO terms, distributed by InterPro
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.