Orthology, paralogy and GO annotation Paul D. Thomas SRI International.



Outline
  Why does orthology matter to us?
  A little background on evolution, orthology and paralogy
  Practical considerations for RefGenome

Why does “orthology” matter to us?
  Goal
    – Identify genes in reference genomes that have the same or similar functions, so that comprehensive curation can be done simultaneously
  Why?
    – Different model organisms have different strengths for exploring different facets of gene function, and these can often inform each other
    – Most genes did not first evolve within a given extant species: they were INHERITED from a common ancestor. Genes in different organisms have similar functions because they were inherited and have not changed much since the common ancestor.

How do we identify genes with similar functions?
  Evolutionary analysis
  Where do orthologs fit in, and what do we mean by orthologs?
    – Simple answer: “The same gene in different organisms” (separated only by speciation)
        Orthology = similar function
    – Unfortunately, the world is not that simple
        Orthologous genes can have different functions
        Paralogous genes (duplications) can have (to some extent at least) similar functions
    – Fortunately, a slightly more complicated view can get us much closer to addressing the question of gene function

Representing evolution of related genes
  Start with Darwin’s basic model:
    – Copying
        An ancestral “species” “splits” into two separate species
    – Divergence
        Each copy (species) changes independently over generations
        – NATURAL SELECTION: adaptation to different environment

Darwin’s species tree
  Number of generations/time along one axis
  Amount of divergence along other axis
  Characters in common are due to inheritance
    – Also tells us something about common ancestor

Representing evolution of related genes: “Gene families”
  Add detail from population genetics/molecular evolution to apply to genes
    – Copying
        An ancestral species “splits” into two separate species
        – SPECIATION
        A gene is duplicated in one population and subsequently inherited
        – DUPLICATION
    – Divergence
        Each copy (gene sequence) changes independently over generations
        – NATURAL SELECTION: sequence substitutions to adapt to new function/role
        – NEUTRAL DRIFT: accumulation of “neutral” substitutions

How does this relate to gene function?
  Copying
    – An ancestral species “splits” into two separate species
        SPECIATION: the gene in each species is likely to continue performing the ancestral function
        – BUT not always
    – A gene is duplicated in one population and subsequently inherited
        DUPLICATION: a “redundant gene”, free from previous constraints, can adapt to a new function
        – BUT still inherits some aspects of ancestral function
  Divergence
    – Each “new” copy (gene sequence) changes independently over generations
        NATURAL SELECTION: sequence substitutions adapt to new/modified function/role
        NEUTRAL DRIFT: sequence changes from accumulation of “neutral” substitutions. This is the MAJOR source of sequence differences!
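The speciation/duplication distinction above can be sketched in code. This is a minimal illustration and not something shown in the talk: it assumes a gene tree whose internal nodes are already labelled as speciation or duplication events, and it uses the common convention that two genes are orthologs when their most recent common ancestor is a speciation node and paralogs when it is a duplication node. The toy tree, the gene names and the `Node`, `path_to` and `relation` helpers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A gene-tree node. Leaves are genes; internal nodes carry an event label."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf_name: str) -> Optional[List[Node]]:
    """Return the nodes from the root down to the named leaf, or None if absent."""
    if root.name == leaf_name and not root.children:
        return [root]
    for child in root.children:
        sub = path_to(child, leaf_name)
        if sub is not None:
            return [root] + sub
    return None


def relation(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    path_a = path_to(root, gene_a)
    path_b = path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise KeyError("gene not found in tree")
    lca = None
    for node_a, node_b in zip(path_a, path_b):
        if node_a is node_b:
            lca = node_a  # still on the shared part of both paths
        else:
            break
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical toy family: one ancient duplication (X1 / X2 copies),
# followed by a speciation into species sp1 and sp2 in each copy.
tree = Node("dup", "duplication", [
    Node("spec_1", "speciation", [Node("X1_sp1"), Node("X1_sp2")]),
    Node("spec_2", "speciation", [Node("X2_sp1"), Node("X2_sp2")]),
])

print(relation(tree, "X1_sp1", "X1_sp2"))  # separated only by speciation
print(relation(tree, "X1_sp1", "X2_sp2"))  # separated by the duplication
```

Note that this only classifies the evolutionary relationship. As the slide stresses, orthologs can still differ in function and paralogs can still share it.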

A gene tree
  Only one “informative” axis: rate of sequence evolution
    – For neutral changes this can often act as a “molecular clock”
    – Non-neutral changes will speed up the rate of evolution
  [Figure: MTHFR gene tree; leaf labels, in order: E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, D.m., A.g., S.p., S.c. MET12, C.e., D.r., G.g., H.s. MTHFR, R.n., M.m.]
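As a rough illustration of the “molecular clock” idea, the sketch below converts a pairwise sequence distance into a divergence time. It assumes a strict clock, that is, a constant neutral substitution rate along both lineages, so the distance between two sequences accumulates at twice the per-lineage rate. The function name and the numbers are hypothetical and are not taken from the talk.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Estimate time since two sequences diverged under a strict molecular clock.

    distance: substitutions per site between the two sequences
    rate:     substitutions per site per unit time along ONE lineage
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate changes independently, hence the factor of 2.
    return distance / (2.0 * rate)


# Hypothetical values: 0.2 substitutions/site, 1e-9 substitutions/site/year.
estimate = divergence_time(0.2, 1e-9)
print(f"{estimate:.3g} years")
```

If selection has changed the rate on one branch, the estimate is biased, which is one reason branch lengths in a gene tree are not a pure time axis.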

So what? Practical considerations

OrthoMCL “ortholog cluster”
  An “ortholog cluster” is made by one or more “slices” through the protein family tree
    – Some combination of evolutionary rates and history of duplications
  Might miss genes that could be efficiently annotated at the same time
  From a strict evolutionary standpoint, orthologs are separated ONLY by speciation events; TIGRFAMs has coined the term “equivalog” for functionally conserved groups
  [Figure: MTHFR gene tree, same leaves as on the “A gene tree” slide]

“ISS”
  Inference from sequence similarity
  A class of database search algorithm (e.g. BLAST) has become a metaphor
    – Implies “genes have similar functions because they have similar sequences”
    – Function is best determined using pairwise comparison

“ISS”
  More properly, ISS of function is inheritance!
    – “related genes have a common function because their common ancestor had that function, which was inherited by its descendants”
    – ISS is not just a statement about one gene. It is also making assertions about
        The common ancestor
        Inheritance of a “character” by
        – Both “pairwise similar” descendants
        – Other descendants

Homology inference in a tree: inheritance and divergence of function
  [Figure: MTHFR gene tree, same leaves as on the “A gene tree” slide, labeled with:]
    “methylene tetrahydrofolate reductase activity” (m.f.)
    “methionine metabolic process” (b.p.)

Homology inference in a tree
  [Figure: MTHFR gene tree, same leaves as on the “A gene tree” slide, labeled with:]
    “methylene tetrahydrofolate reductase activity” (m.f.)
    “methionine metabolic process” (b.p.)
    NOT “methionine metabolic process” (b.p.)
    NOT “methylene tetrahydrofolate reductase activity” (m.f.)?
    NOT “methionine metabolic process” (b.p.)?
    “regulation of homocysteine metabolic process” (b.p.)
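The inheritance-with-exceptions idea on this slide can be sketched as a top-down pass over a tree: a term asserted at an ancestral node is inherited by every descendant unless a NOT annotation on some branch stops it. This is only an illustrative sketch, not the RefGenome tool itself. The `TreeNode` class, the `propagate` function, the tree shape and the gene names are hypothetical; only the GO term names come from the slide, and which branches actually carry the NOT annotations is not recoverable from the transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = field(default_factory=list)
    gained: Set[str] = field(default_factory=set)  # terms asserted at this node
    lost: Set[str] = field(default_factory=set)    # "NOT" terms: inheritance stops here


def propagate(node: TreeNode,
              inherited: FrozenSet[str] = frozenset(),
              out: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Return the inferred term set for every leaf (gene) below `node`."""
    if out is None:
        out = {}
    # Inherit from the parent, add terms gained here, drop terms lost here.
    terms = (set(inherited) | node.gained) - node.lost
    if not node.children:
        out[node.name] = terms
    for child in node.children:
        propagate(child, frozenset(terms), out)
    return out


MTHFR = "methylene tetrahydrofolate reductase activity"
MET = "methionine metabolic process"
HCY = "regulation of homocysteine metabolic process"

# Hypothetical tree: the ancestor has both terms; one clade loses MET and gains HCY.
family = TreeNode("ancestor", gained={MTHFR, MET}, children=[
    TreeNode("clade_A", children=[TreeNode("gene_1"), TreeNode("gene_2")]),
    TreeNode("clade_B", gained={HCY}, lost={MET}, children=[TreeNode("gene_3")]),
])

for gene, terms in sorted(propagate(family).items()):
    print(gene, sorted(terms))
```

Annotating the ancestral node once, rather than each gene pairwise, is what keeps the inferred annotations consistent across the whole family.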

Homology inference in a tree
  [Figure: MTHFR gene tree, same leaves as on the “A gene tree” slide]
  COMBINES:
    1. Evolutionary information (tree)
    2. Experimental knowledge (GO annotations from literature)
    3. Organism-specific biological knowledge (curators)

This is just an easy, self-consistent way of doing ISS!
  [Figure: MTHFR gene tree, same leaves as on the “A gene tree” slide]
  We have a picture of ALL the relationships rather than N flat lists that need to be reconciled

Tree annotation tool for RefGenome
  Pre-computed, searchable “library” of gene trees
    – Include “outgroup” organisms to help infer evolutionary histories
    – Gene members in any tree can be modified by curator feedback
  Tool for viewing tree and selecting “homology group” to be annotated
  Tool for viewing tree labeled with in-depth GO annotations from all MODs, and inferring ancestral functions and homology annotations
  Homology annotations will be supported by a tree node as evidence; trees will be available to the scientific community
  HMMs will be constructed to allow other genome projects to infer GO terms, distributed by InterPro