Orthology & Paralogy (etc. etc.)

Presentation transcript:

1 Orthology & Paralogy (etc. etc.)
Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene
Paralogs: Two or more genes, often thought of as within the same species, that originated by one or more gene duplication events
(note no regard to function! and does NOT require one-to-one relationships)

2 Clear case of orthology
[Figure: SPECIES TREE for species A, B, C, D, E descending from an ancestral species; GENE TREE for genes A1, B1, C1, D1, E1 descending from Ancestral Gene 1]
Each gene 1 in each species is an ortholog of the others - all descended from a single common ancestor

3 Gene duplication along a species branch
[Figure: SPECIES TREE for species A, B, C, D, E descending from an ancestral species; GENE TREE for genes A1, B1, C1, D1, E1 plus C2, D2, descending from Ancestral Gene 1]
Duplication event along the branch to species C & D
C1 and C2 are paralogs, D1 and D2 are paralogs
What about A1 to C1? To C2?
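One common way to answer the question on this slide is to label each internal node of the gene tree as a speciation or a duplication, then look at the last common ancestor of the two genes: a speciation node means orthologs, a duplication node means paralogs. The sketch below assumes that tree-based definition. The `Node` class, the helper names and the exact branching order of species A to E are illustrative assumptions; only the placement of the duplication (on the branch leading to C & D) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Node:
    event: str            # "speciation", "duplication", or "leaf"
    name: str = ""        # gene name, used for leaves only
    children: tuple = ()  # child nodes, empty for leaves


def leaf(name):
    return Node("leaf", name)


def leaf_names(node):
    """Collect the gene names found below a node."""
    if node.event == "leaf":
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def last_common_ancestor(node, gene_a, gene_b):
    """Descend while a single child still contains both genes."""
    for child in node.children:
        if {gene_a, gene_b} <= leaf_names(child):
            return last_common_ancestor(child, gene_a, gene_b)
    return node


def relationship(root, gene_a, gene_b):
    event = last_common_ancestor(root, gene_a, gene_b).event
    return "orthologs" if event == "speciation" else "paralogs"


# Hypothetical topology: the duplication sits on the branch leading to C & D.
copy1 = Node("speciation", children=(leaf("C1"), leaf("D1")))
copy2 = Node("speciation", children=(leaf("C2"), leaf("D2")))
duplication = Node("duplication", children=(copy1, copy2))
gene_tree = Node("speciation", children=(
    leaf("A1"),
    Node("speciation", children=(
        leaf("B1"),
        Node("speciation", children=(duplication, leaf("E1"))),
    )),
))

for pair in [("C1", "C2"), ("A1", "C1"), ("A1", "C2"), ("C1", "D2")]:
    print(pair, relationship(gene_tree, *pair))
```

Under this definition A1 comes out as an ortholog of both C1 and C2, because the duplication happened after the lineage leading to species A had already split off.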

4 Orthology & Paralogy (etc. etc.)
Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene
Paralogs: Two or more genes, within the same species, that originated by one or more gene duplication events (note no regard to function!)
Also now many subtle variants:
- Outparalogs: cross-species paralogs (i.e. gene duplication BEFORE speciation)
- Inparalogs: lineage-specific duplication (i.e. duplication AFTER speciation)
- Ohnolog: duplicates originating from a whole-genome duplication (WGD)
- Xenolog: genes related by horizontal gene transfer between species

5 Phenetics vs. Phylogeny
Phenetics: tree based on similarity of characteristics
1. Align proteins & score the alignment (# of identical and 'conserved' amino acids)
2. Build a tree based on sequence similarity
[Figure: tree of A1, B1, C1, C2] A1 is more similar to C1 than to C2 - A1 & C1 are likely (* but not guaranteed!) more similar functionally
Phylogeny: tree based on evolutionary history
1. Requires inferring history across the species
[Figure: tree of A1, B1, C1, C2] But historically, A1 is equally distant to C1 and C2
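Step 1 of the similarity-based approach (scoring an alignment by identical and 'conserved' amino acids) can be sketched as follows. This is a minimal illustration, assuming the two sequences are already aligned with `-` for gaps. The residue groups and the toy sequences are assumptions made for the example; real tools typically use a substitution matrix rather than fixed groups.

```python
# Illustrative grouping of amino acids with similar chemistry (an assumption,
# not a published scoring scheme).
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "STNQ", "AG"]


def score_alignment(seq_a, seq_b):
    """Count identical and 'conserved' columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = conserved = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are not scored
        compared += 1
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in CONSERVATIVE_GROUPS):
            conserved += 1
    return identical, conserved, compared


def similarity(seq_a, seq_b):
    """Fraction of ungapped columns that are identical or conserved."""
    identical, conserved, compared = score_alignment(seq_a, seq_b)
    return (identical + conserved) / compared if compared else 0.0


# Hypothetical aligned fragments standing in for A1, C1 and C2.
a1 = "MKVLDE-R"
c1 = "MKILDEAR"
c2 = "MRVAEQAR"

print(score_alignment(a1, c1), similarity(a1, c1))
print(score_alignment(a1, c2), similarity(a1, c2))
```

With these toy fragments A1 scores as more similar to C1 than to C2, which is the situation the slide describes. The score says nothing about the shared history of the three genes.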

6-7 Methods of orthology prediction
1. Reciprocal best-BLAST hits (RBH): simplest method
[Figure: Species A with Gene A1, Gene A2 ... Gene An; Species B with Gene B1, Gene B2 ... Gene Bn]
1. BLAST Gene A1 against the Species B genome
2. Take the top BLAST hit in Species B (B4 in this example) and use it as the query against Species A
3. If Gene A1 is the top BLAST hit in the Species A genome, then call A1 & B4 orthologs
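The RBH steps above can be sketched in a few lines once the BLAST searches have been run. The example assumes the results were already parsed into dictionaries mapping each query gene to a list of `(subject, score)` tuples. The function names, gene names and scores are hypothetical, and no BLAST call is made here.

```python
def best_hit(hits, query):
    """Return the top-scoring subject for a query, or None if it has no hits."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    pairs = []
    for gene_a in a_vs_b:
        gene_b = best_hit(a_vs_b, gene_a)
        if gene_b is not None and best_hit(b_vs_a, gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical parsed search results: query -> [(subject, score), ...]
a_vs_b = {
    "A1": [("B4", 310.0), ("B2", 95.0)],
    "A2": [("B2", 120.0), ("B4", 118.0)],
    "A3": [("B2", 198.0)],
}
b_vs_a = {
    "B4": [("A1", 305.0), ("A2", 110.0)],
    "B2": [("A3", 200.0), ("A2", 119.0)],
}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data A2's best hit is B2, but B2's best hit is A3, so A2 gets no ortholog call.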

8 Problems with RBH
* There are clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether

9-12 Methods of orthology prediction
2. Reciprocal Smallest Distance (RSD): slightly more complicated
[Figure: Species A with Gene A1, Gene A2 ... Gene An; Species B with Gene B1, Gene B2 ... Gene Bn]
1. BLAST Gene A1 against the Species B genome
2. Take X number of top BLAST hits (user determined)
3. Do a global multiple alignment - throw out proteins with >Y% gapped positions
4. Take the remaining proteins and find the single one with the closest evolutionary distance
5. Final reciprocal BLAST using the remaining gene in Species B as the query against Genome A
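Steps 3 and 4 of RSD (filter out heavily gapped alignments, then keep the candidate at the smallest distance) can be sketched as below. This is a simplified illustration rather than the published RSD implementation. It assumes each candidate is already aligned to the query, and it uses a plain p-distance (the proportion of differing residues) as a stand-in for a model-based estimate of evolutionary distance. All names and sequences are hypothetical.

```python
def gap_fraction(aligned_a, aligned_b):
    """Fraction of alignment columns in which either sequence has a gap."""
    if not aligned_a:
        return 1.0
    gapped = sum(1 for a, b in zip(aligned_a, aligned_b) if a == "-" or b == "-")
    return gapped / len(aligned_a)


def p_distance(aligned_a, aligned_b):
    """Proportion of differing residues over ungapped columns."""
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 1.0
    return sum(1 for a, b in columns if a != b) / len(columns)


def closest_candidate(alignments, max_gap_fraction):
    """alignments maps candidate name -> (aligned query, aligned candidate)."""
    kept = {name: pair for name, pair in alignments.items()
            if gap_fraction(*pair) <= max_gap_fraction}
    if not kept:
        return None
    return min(kept, key=lambda name: p_distance(*kept[name]))


# Hypothetical alignments of query A1 against its top hits in Species B.
alignments = {
    "B1": ("MKVLDEAR", "MKILDQAR"),
    "B2": ("MKVLDEAR", "MK----AR"),  # short, domain-only match
    "B3": ("MKVLDEAR", "MRVAEQAK"),
}

print(closest_candidate(alignments, max_gap_fraction=0.25))
print(closest_candidate(alignments, max_gap_fraction=1.0))
```

The second call shows why the gap filter matters: without it, the short fragment B2 wins because its few aligned residues happen to be identical. The reciprocal step (step 5) would repeat the search with the chosen Species B gene as the query.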

13 Problems with RSD
* There are clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether

14 Methods of orthology prediction
3. Newest methods take synteny into account
Syntenic = conserved gene/sequence order
[Figure: Genes A1, A2, A3, A4 in one genome shown in the same order as Genes B1, B2, B3, B4 in the other]
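One simple way to use conserved gene order is to ask how many neighbours of a gene have their similarity-based partner sitting next to the candidate ortholog. The sketch below is an illustration of that idea only, not the algorithm of any particular tool. The function name, the `window` parameter, the gene orders and the duplicated gene `B2dup` are all hypothetical.

```python
def synteny_support(order_a, order_b, candidates, gene_a, gene_b, window=1):
    """Count neighbours of gene_a whose candidate partner lies near gene_b.

    order_a / order_b: gene names in chromosomal order for each species.
    candidates: maps a Species A gene to its similarity-based partner in B.
    """
    pos_a = order_a.index(gene_a)
    pos_b = order_b.index(gene_b)
    near_a = order_a[max(0, pos_a - window):pos_a + window + 1]
    near_b = set(order_b[max(0, pos_b - window):pos_b + window + 1]) - {gene_b}
    support = 0
    for neighbour in near_a:
        if neighbour == gene_a:
            continue
        if candidates.get(neighbour) in near_b:
            support += 1
    return support


# Hypothetical gene orders; B2dup is a duplicated copy of B2 located elsewhere.
order_a = ["A1", "A2", "A3", "A4"]
order_b = ["B1", "B2", "B3", "B4", "X1", "B2dup", "X2"]
candidates = {"A1": "B1", "A2": "B2", "A3": "B3", "A4": "B4"}

print(synteny_support(order_a, order_b, candidates, "A2", "B2"))
print(synteny_support(order_a, order_b, candidates, "A2", "B2dup"))
```

Here the flanking genes support B2 rather than the duplicated copy, even if both copies score similarly in a BLAST search.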

15 Problems with Synteny-based Methods
* There are clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species are less likely to obscure orthologous hits
* Orthologs with low sequence similarity that are not part of a larger duplication could still be missed

16 Methods of orthology prediction
4. Clusters of Orthologs (COG) approach:
- Addresses the restriction of 1:1 orthologs
- Identifies inparalogs and then identifies orthologous relationships between groups
[Figure: gene clusters across Species A, B, C, D]
Several approaches can assign COGs across many species at once (InParanoid, Fuzzy RB)
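The idea of grouping genes instead of insisting on 1:1 pairs can be illustrated by linking pairwise ortholog calls and inparalog calls, then reading off the connected groups. The sketch below uses a small union-find for that. It is a deliberate simplification (single-linkage clustering), not the procedure used by COG, InParanoid or any other named tool, and the gene names are hypothetical.

```python
def cluster_orthologs(ortholog_pairs, inparalog_pairs):
    """Group genes connected by ortholog or inparalog links."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    def union(gene_a, gene_b):
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    for gene_a, gene_b in list(ortholog_pairs) + list(inparalog_pairs):
        union(gene_a, gene_b)

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise calls across species A, B, C and D.
ortholog_pairs = [("A1", "B1"), ("B1", "C1"), ("A1", "D1"), ("A2", "B2")]
inparalog_pairs = [("C1", "C1b")]  # lineage-specific duplicate in species C

for group in cluster_orthologs(ortholog_pairs, inparalog_pairs):
    print(group)
```

Because the inparalog C1b is linked to C1, it joins the same group as A1, B1 and D1 instead of being forced into a separate 1:1 pairing.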

17 Lots of different databases of orthologs (esp. for model organisms)

18 Of course, different methods of orthology assignment can give very different results

19 AND ... genome errors can really obscure things
Bad genome annotations can affect orthology & paralogy relationships:
- missing genes, fused genes, incorrect start/stop annotations
Bad assembly can affect ortho clusters:
- amplifications or decreases of gene family numbers

20 Why is orthology-paralogy so important?
Allows us to study the history of protein evolution & infer constraints
[Figure: GENE TREE for genes A1, B1, C1, D1, E1 plus C2, D2 and A2, descending from Ancestral Gene 1; gene duplication along the species branch leading to C & D; separate gene duplication in Species A]

21

22
Receptor                          Ligand                                    Governs
Glucocorticoid Receptor (GR)      Cortisol                                  Stress Response
Mineralocorticoid Receptor (MR)   Aldosterone (tetrapods), DOC (teleosts)   Electrolyte Homeostasis
* Teleosts don't make aldosterone

23 Figure 1
Blue = Aldo binding; Red = Cortisol ONLY

24 Two amino-acid changes in AncCR can alter specificity
[Figure legend: Blue = DOC; Red = Cortisol; Green = Aldo]
S106P likely occurred FIRST, then L111Q

25 Model for evolution of ligand binding & hormone response
1. Ancestral protein could bind Aldo, even though no Aldo present
2. Duplication ~450 mya = redundant receptors
3. Two successive changes in GR = switch to Cortisol specificity
4. Emergence of Aldosterone hormone