Finding Orthologous Groups René van der Heijden
What is this lecture about? What is ‘orthology’? Why do we study gene ancestry and gene trees (phylogenies)? Several approaches to find orthologous genes. High-resolution orthology. Steps involved. Things to think about (homework).
Homology. Genes are homologous if and only if they derive from the same ancestral gene. Sufficient sequence similarity proves homology. For very dissimilar sequences, use more sensitive searches: PSI-BLAST, HMM searches.
Homologous genes tend to have similar functions. [Figure annotation: the usual range]
Homologous genes tend to have similar functions. Accurate function prediction requires something better than homology: orthology.
Orthology. “This gene in that other species …” We don’t have chicken genes! They mean: the corresponding gene? Why that particular gene? Are we sure this actually is the gene? Are we sure that all n orthologs are correct?
Duplications, Speciations, and Orthology. Evolution results in a growing number of genes (gene duplications, horizontal gene transfer, de novo generation) and a growing number of species. The fate of gene duplicates: perish, or find a new functional niche. There is a tendency for functional expansion.
Duplications, Speciations, and Orthology. Two genes in two species are orthologous if they derive from one gene in their last common ancestor. Orthologous genes are likely to have the same function, which is much stronger than “tend to have similar functions”.
The line represents a gene in some ancestral species, a long, long time ago in a land far, far away. There is a speciation event, resulting in two species with the same, orthologous gene. Over time, one of the genes gets duplicated, resulting in two paralogous genes. Another speciation event … but one of the paralogous genes is lost in one of the new species. Another speciation event. The result is the current set of genes with its apparent history. [Figure labels: orthologous genes, orthologs, paralogs]
Duplications, Speciations, and Orthology. [Figure: gene tree running from the primal ancestor to the present genes, plotted against evolutionary distance]
Homologs, Orthologs, and Paralogs. Homologous: derived from one common ancestral gene. Orthologous: separated by a speciation event. Paralogous: separated by a duplication event. Orthologs and paralogs must be homologs. Are there homologous genes that are neither orthologous nor paralogous? The view on orthology and paralogy is relative to a certain speciation event.
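Once every internal node of a gene tree carries a label (speciation or duplication), these definitions become mechanical: look at the event at the last common ancestor of the two genes. The Python sketch below assumes the labels are already known, which in practice is the hard part; the `Node` class and the gene names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node: leaves carry a gene name, internal nodes an event label."""
    name: str = ""
    event: Optional[str] = None  # "speciation" or "duplication" (internal nodes only)
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, gene: str) -> Optional[List[Node]]:
    """Nodes from `node` down to the leaf called `gene`, or None if absent."""
    if not node.children:
        return [node] if node.name == gene else None
    for child in node.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [node] + sub
    return None


def relation(root: Node, a: str, b: str) -> str:
    """Orthologs if the last common ancestor of a and b is a speciation,
    paralogs if it is a duplication."""
    pa, pb = path_to(root, a), path_to(root, b)
    lca = root
    for x, y in zip(pa, pb):
        if x is not y:
            break
        lca = x
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Two species (A, B): an ancestral duplication, then a speciation.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("A1"), Node("B1")]),
    Node(event="speciation", children=[Node("A2"), Node("B2")]),
])
print(relation(tree, "A1", "B1"))  # orthologs
print(relation(tree, "A1", "B2"))  # paralogs
```

All four genes are homologs simply because they sit in the same tree; the event at the split decides between orthology and paralogy.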
Inparalogs and Outparalogs. Both in- and outparalogous genes are separated by a gene duplication event. For inparalogs, the duplication event is not followed by speciation(s). Outparalogs are separated by a duplication event that is followed by speciation(s). Inparalogs are recent paralogs; outparalogs are more ancient paralogs. Are inparalogs orthologs? It depends on your definition. Yes: two genes are orthologous if they derive from one gene in the last common ancestor. No: two genes are orthologous if they are separated only by cell division events.
Reading Gene Trees. Although genes spec1,1 and spec2,1 are closer relatives, the distance between them is larger than that between spec1,1 and spec3,1. The tree suggests at least two gene losses.
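Distances read off a gene tree are sums of branch lengths, so a fast-evolving gene can be further from its sister than from a more distant relative. The sketch below uses a hypothetical three-gene tree, stored as child → (parent, branch length), with invented branch lengths that reproduce this situation.

```python
# Hypothetical rooted gene tree: child -> (parent, branch length).
parent = {
    "spec1,1": ("x", 0.1),
    "spec2,1": ("x", 0.9),      # fast-evolving gene: long branch
    "x": ("root", 0.1),
    "spec3,1": ("root", 0.2),
}


def root_path(node, parent):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]][0])
    return path


def patristic(a, b, parent):
    """Sum of branch lengths on the path between leaves a and b."""
    up_b = root_path(b, parent)
    lca = next(n for n in root_path(a, parent) if n in up_b)

    def climb(node):
        total = 0.0
        while node != lca:
            node, branch = parent[node]
            total += branch
        return total

    return climb(a) + climb(b)


print(patristic("spec1,1", "spec2,1", parent))  # about 1.0, yet they are sisters
print(patristic("spec1,1", "spec3,1", parent))  # about 0.4
```

Relatedness follows the topology (who shares the most recent ancestor), not the path length.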
In- and Outparalogs, Orthologs, and Co-orthologs
www = What, Why, and hoW? What: orthologous genes are separated by cell division only. Why: orthologous genes are likely to have the same function. How: yes, how can orthologous relations be established?
Several approaches: the COG approach, InParanoid, tree-based methods.
COG approach. Based on BLAST hits. Establishment and extension of triangles.
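A minimal sketch of triangle detection, assuming the BLAST results have already been reduced to a best-hit table `best[(gene, genome)] = gene` (a hypothetical format). As a simplification, every side of a triangle is required to be a reciprocal best hit between three different genomes; the published COG procedure has further steps and manual checks that are not modelled here.

```python
from itertools import combinations


def reciprocal(best, species, x, y):
    """True if x and y are each other's best hit in the other's genome."""
    return best.get((x, species[y])) == y and best.get((y, species[x])) == x


def cog_triangles(best, species):
    """All gene triples from three different genomes whose three sides are
    reciprocal best hits (a simplified COG triangle)."""
    triangles = []
    for a, b, c in combinations(sorted(species), 3):
        if len({species[a], species[b], species[c]}) < 3:
            continue  # a triangle must span three genomes
        if (reciprocal(best, species, a, b) and reciprocal(best, species, b, c)
                and reciprocal(best, species, a, c)):
            triangles.append((a, b, c))
    return triangles


species = {"a1": "A", "b1": "B", "c1": "C", "c2": "C"}
best = {("a1", "B"): "b1", ("a1", "C"): "c1", ("b1", "A"): "a1",
        ("b1", "C"): "c1", ("c1", "A"): "a1", ("c1", "B"): "b1",
        ("c2", "A"): "a1", ("c2", "B"): "b1"}
print(cog_triangles(best, species))  # [('a1', 'b1', 'c1')]
```

Gene c2 points at a1 and b1, but they do not point back at it, so it does not close a triangle.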
COG approach II. Extension of orthologous groups.
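Groups can be extended by merging triangles that share a side (two genes); sharing a single gene is not enough. The sketch below does this with a small union-find; the triangle list is assumed to come from a step like the one on the previous slide, and the gene names are hypothetical.

```python
from itertools import combinations


def merge_triangles(triangles):
    """Merge triangles that share a side (a pair of genes) into groups,
    using a small union-find over triangle indices."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # side (gene pair) -> index of the first triangle that has it
    for idx, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            if side in owner:
                parent[find(idx)] = find(owner[side])
            else:
                owner[side] = idx
    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())


print(merge_triangles([("a1", "b1", "c1"), ("b1", "c1", "d1"), ("e1", "f1", "g1")]))
# [['a1', 'b1', 'c1', 'd1'], ['e1', 'f1', 'g1']]
```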
InParanoid I. The method identifies in- and outparalogs, for two species. Find all hits from species A on B. Find all hits from species B on A. Find all bidirectional best hits (BBH); these form putative orthologs.
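The bidirectional-best-hit step can be sketched as follows, assuming the hits from both searches are available as a dictionary `scores[(query, target)]` of similarity scores (for example BLAST bit scores parsed elsewhere; the format is hypothetical). Ties are broken arbitrarily by gene name.

```python
def best_hit(scores, query, targets):
    """Highest-scoring target of `query` among `targets`, or None without hits."""
    hits = [(scores[(query, t)], t) for t in targets if (query, t) in scores]
    return max(hits)[1] if hits else None


def bidirectional_best_hits(scores, genes_a, genes_b):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    pairs = []
    for a in genes_a:
        b = best_hit(scores, a, genes_b)
        if b is not None and best_hit(scores, b, genes_a) == a:
            pairs.append((a, b))
    return pairs


scores = {("a1", "b1"): 300, ("a1", "b2"): 120, ("b1", "a1"): 310,
          ("b1", "a2"): 200, ("a2", "b1"): 250, ("b2", "a1"): 90}
print(bidirectional_best_hits(scores, ["a1", "a2"], ["b1", "b2"]))  # [('a1', 'b1')]
```

Gene a2 also hits b1 best, but b1 prefers a1, so a2 is not paired.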
InParanoid II. Find all hits from A on A. Find all hits from B on B. Find all inparalogs: these are all within-species hits that score better than the orthologs (better means more recently split).
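Given a BBH seed pair (a, b), inparalogs are the genes of the same species that hit the seed better than the seeds hit each other. A minimal sketch using the same hypothetical `scores` dictionary as above, now also holding within-species hits; the real program applies further score and overlap filters that are omitted here.

```python
def add_inparalogs(pair, scores, genes_a, genes_b):
    """Extend a BBH pair (a, b) with inparalogs: genes of the same species
    that hit the seed ortholog better than the seed orthologs hit each other."""
    a, b = pair
    in_a = [x for x in genes_a
            if x != a and scores.get((a, x), 0) > scores[(a, b)]]
    in_b = [y for y in genes_b
            if y != b and scores.get((b, y), 0) > scores[(b, a)]]
    return [a] + in_a, [b] + in_b


scores = {("a1", "b1"): 300, ("b1", "a1"): 310,
          ("a1", "a2"): 450,   # a2 is closer to a1 than b1 is: inparalog
          ("a1", "a3"): 150}   # a3 split off before the speciation
print(add_inparalogs(("a1", "b1"), scores, ["a1", "a2", "a3"], ["b1"]))
# (['a1', 'a2'], ['b1'])
```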
Detecting orthologous genes. The usual methods are based on BLAST hit quality, e.g. the bidirectional best hit (BBH). [Figure: BBH pairs marked as orthologs]
Genes with promiscuous domains. Gene A may hit gene B because of a shared domain X, and gene B may hit gene C because of a shared domain Y, so A and C can end up in one group without sharing any domain. Promiscuous domains require (manual) curation.
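One way to find candidates for curation is to look for non-transitive hits: x hits b and b hits y, but x and y do not hit each other. The sketch below treats a hypothetical list of hit pairs as an undirected graph and reports such triples; it only flags suspects and cannot tell domain-driven links from genuinely weak homology.

```python
from itertools import combinations


def non_transitive_triples(hits):
    """Triples (x, b, y) where x-b and b-y hit but x-y do not: a warning sign
    that b may link x and y only through different (promiscuous) domains."""
    neighbours = {}
    for x, y in hits:
        neighbours.setdefault(x, set()).add(y)
        neighbours.setdefault(y, set()).add(x)
    flagged = []
    for b in sorted(neighbours):
        for x, y in combinations(sorted(neighbours[b]), 2):
            if y not in neighbours[x]:
                flagged.append((x, b, y))
    return flagged


print(non_transitive_triples([("A", "B"), ("B", "C")]))  # [('A', 'B', 'C')]
```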
Tree-based methods: 1. Get all homologous genes. 2. Make multiple alignments. 3. Generate phylogenetic gene trees. 4. Analyze trees. Uncertainty in the multiple alignment? Different methods for distance calculation? Superpose a trusted species tree? How to assess the level of accuracy?
The Phylogenetic Gene Tree. Multiple alignment for all genes. Distance matrix calculation (Kimura correction, PAM model, categories model). Large trees: distance-based methods (neighbor joining).
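Two of the named ingredients in a minimal sketch: Kimura's approximate correction for protein distances, d = -ln(1 - p - 0.2p²) with p the fraction of differing aligned sites, and the neighbor-joining clustering loop. Skipping gap columns and returning the topology without branch lengths are simplifications made here; production tools (and the PAM or categories models) do considerably more.

```python
import math


def kimura_protein_distance(s1, s2):
    """Kimura's approximate protein distance, d = -ln(1 - p - 0.2 p^2),
    where p is the fraction of differing sites in an aligned pair.
    Columns with a gap ('-') in either sequence are skipped."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    p = sum(x != y for x, y in pairs) / len(pairs)
    arg = 1 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(arg)


def neighbor_joining(names, dist):
    """Neighbor-joining topology as a nested tuple (branch lengths omitted).
    `dist` is a symmetric dict of dicts holding pairwise distances."""
    d = {a: dict(dist[a]) for a in names}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) d(i, j) - r(i) - r(j).
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        i, j = min(pairs, key=lambda ab: (n - 2) * d[ab[0]][ab[1]] - r[ab[0]] - r[ab[1]])
        u = (i, j)  # the new internal node
        d[u] = {}
        for k in nodes:
            if k != i and k != j:
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    return tuple(nodes)


print(kimura_protein_distance("ACDEFGHIKL", "ACDEFGHIKM"))  # p = 0.1, d is about 0.108
```

In a full pipeline the Kimura distances for all gene pairs would fill the `dist` matrix passed to `neighbor_joining`.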
Uncertainty in trees. Evolutionary noise: differing rates of evolution; convergent evolution (low complexity, coiled coils); promiscuous domains (recombination, fusion, fission). Use of heuristic methods: multiple alignment, tree making.
Analyze trees … but don’t trust them fully. A rigid analysis suggests many duplications and losses. Presume the scp branch is wrongly placed! If this is correct …, this can’t be.
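A rigid analysis of this kind can be mimicked with the species-overlap rule, a common heuristic that needs no species tree: an internal node is called a duplication when the species sets of its two subtrees overlap, otherwise a speciation. It is not necessarily the procedure used for these slides. In this sketch the nested-tuple tree and the "species_gene" naming are hypothetical.

```python
def species_overlap(tree, species_of):
    """Label internal nodes of a rooted binary gene tree (nested 2-tuples,
    leaves are gene-name strings) using the species-overlap rule.
    Returns (species covered by `tree`, list of (subtree, event))."""
    if isinstance(tree, str):
        return {species_of(tree)}, []
    left, right = tree
    sp_l, ev_l = species_overlap(left, species_of)
    sp_r, ev_r = species_overlap(right, species_of)
    event = "duplication" if sp_l & sp_r else "speciation"
    return sp_l | sp_r, ev_l + ev_r + [(tree, event)]


def species_of(gene):
    return gene.split("_")[0]  # hypothetical naming: "<species>_<gene number>"


tree = (("hs_1", "mm_1"), ("hs_2", "mm_2"))
_, events = species_overlap(tree, species_of)
print([event for _, event in events])  # ['speciation', 'speciation', 'duplication']
```

Because the rule trusts the topology completely, a single misplaced branch makes overlapping species sets appear higher up and inflates the number of inferred duplications.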
Analyze trees … but don’t trust them fully. Three orthologous groups suggest 15 gene losses; considering one wrongly placed gene leaves only 2 gene losses. And if we accept wrong placement of branches …
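Counting losses requires a trusted species tree. A standard textbook approach, sketched here as an illustration rather than the lecture's own procedure, is LCA reconciliation: map each gene-tree node to the smallest species clade that contains all of its genes, call it a duplication if it maps to the same clade as one of its children, and count the species-tree levels skipped along each gene-tree edge as losses. The toy trees and gene names are hypothetical; the second example shows how one misplaced gene turns a loss-free history into one duplication plus three losses.

```python
def clades(tree, depth=0, out=None):
    """Map each species-tree clade (frozenset of species) to its depth."""
    out = {} if out is None else out
    if isinstance(tree, str):
        out[frozenset([tree])] = depth
        return frozenset([tree]), out
    members = frozenset()
    for child in tree:
        sub, _ = clades(child, depth + 1, out)
        members |= sub
    out[members] = depth
    return members, out


def lca_clade(species, depth_of):
    """Smallest (deepest) species clade containing all of `species`."""
    return max((c for c in depth_of if species <= c), key=depth_of.get)


def reconcile(gene_tree, species_of, depth_of):
    """Return (clade, duplications, losses) for a rooted binary gene tree."""
    if isinstance(gene_tree, str):
        return lca_clade(frozenset([species_of(gene_tree)]), depth_of), 0, 0
    results = [reconcile(c, species_of, depth_of) for c in gene_tree]
    child_clades = [r[0] for r in results]
    clade = lca_clade(frozenset().union(*child_clades), depth_of)
    is_dup = clade in child_clades
    dups = sum(r[1] for r in results) + int(is_dup)
    losses = sum(r[2] for r in results)
    for c in child_clades:
        # Species-tree levels skipped on this edge; a speciation itself accounts for one.
        losses += depth_of[c] - depth_of[clade] - (0 if is_dup else 1)
    return clade, dups, losses


_, depth_of = clades((("hs", "mm"), "dm"))      # trusted species tree


def species_of(gene):
    return gene.split("_")[0]


print(reconcile((("hs_1", "mm_1"), "dm_1"), species_of, depth_of)[1:])  # (0, 0)
print(reconcile((("hs_1", "dm_1"), "mm_1"), species_of, depth_of)[1:])  # (1, 3)
```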
Horizontal gene transfer!
Remember … “In- and Outparalogs, Orthologs, and Co-orthologs”
Levels of Orthology
High-res versus low-res. Many, complete, and closely related genomes. Use phylogenetic trees. Challenge: automatic orthology assignment.
Differential gene-loss
Things to think about (homework). Select a partner. Collect a gene tree (and some copies). Carefully deduce which nodes are duplications and which are speciations. Denote which genes are orthologous to each other (orthologous groups). Select interesting parts to predict what the COG procedure would say, what InParanoid would say, and what would have happened if some genes (or species) were not involved in the analysis.
Homework: also think about …