Finding Orthologous Groups René van der Heijden. What is this lecture about? What is ‘orthology’? Why do we study gene-ancestry/gene-trees (phylogenies)?

Presentation transcript:

Finding Orthologous Groups René van der Heijden

What is this lecture about? What is ‘orthology’? Why do we study gene ancestry/gene trees (phylogenies)? Several approaches to find orthologous genes. High-resolution orthology. Steps involved. Things to think about (homework).

Homology. Genes are homologous if and only if they derive from the same ancestral gene. Sufficient sequence similarity proves homology. Very dissimilar sequences call for more sensitive searches: PSI-BLAST, HMM searches.

Homologous genes tend to have similar functions The usual range

Homologous genes tend to have similar functions. Accurate function prediction requires something better than homology: orthology.

Duplications, Speciations, and Orthology. Evolution results in a growing number of genes (gene duplications, horizontal gene transfer, de novo generation) and a growing number of species. The fate of gene duplicates: perish, or find a new functional niche. Tendency for functional expansion.

Duplications, Speciations, and Orthology. Two genes in two species are orthologous if they derive from one gene in their last common ancestor. Orthologous genes are likely to have the same function. This is a much stronger statement than “tend to have similar function”.

Duplications, Speciations, and Orthology. [Figure: gene tree running from the primal ancestor to the present genes, drawn against evolutionary distance.]

Homologs, Orthologs, and Paralogs. Homologous: one common ancestral gene. Orthologous: separated by a speciation event. Paralogous: separated by a duplication event. Orthologs and paralogs must be homologs. Are there homologous genes that are neither orthologous nor paralogous? The view on orthology and paralogy is relative to a certain speciation.
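The definitions above can be turned into a small check: find the last common ancestor of two genes in a gene tree and look at the event stored there. The sketch below is only an illustration. The `Node` class, the `event` labels, and the human/mouse example genes are assumptions made for this example, and the tree is assumed to be already annotated with duplication and speciation events.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Gene-tree node; `event` is set on internal nodes only (hypothetical layout)."""
    name: str = ""
    event: str = ""  # "speciation" or "duplication"
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node):
    """Return the node itself followed by all its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def relation(a, b):
    """Classify two distinct leaf genes by the event at their last common ancestor."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return "orthologs" if n.event == "speciation" else "paralogs"
    return "not homologous"  # the genes are not in the same tree


# An ancient duplication, followed by a human/mouse speciation in each copy.
hum_a, mouse_a = Node("human_A"), Node("mouse_A")
hum_b, mouse_b = Node("human_B"), Node("mouse_B")
copy_a = Node(event="speciation").add(hum_a, mouse_a)
copy_b = Node(event="speciation").add(hum_b, mouse_b)
root = Node(event="duplication").add(copy_a, copy_b)

print(relation(hum_a, mouse_a))  # separated by a speciation
print(relation(hum_a, mouse_b))  # separated by the ancient duplication
```

In this toy tree every pair of homologs is either orthologous or paralogous, because every internal node carries one of the two labels; a node created by horizontal gene transfer would need a third label.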

Inparalogs and Outparalogs. Both in- and outparalogous genes are separated by a gene duplication event. For inparalogs, the duplication event is not followed by speciation(s). Outparalogs are separated by a duplication event, followed by speciation(s). Inparalogs are recent paralogs; outparalogs are more ancient paralogs. Are inparalogs orthologs? Depends on your definition. Yes: two genes are orthologous if they derive from one gene in their last common ancestor. No: two genes are orthologous if they are only separated by cell division events.

Reading Gene-Trees. Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1. The tree suggests at least 2 gene losses.

In-, and Outparalogs, Orthologs, and Co-orthologs

More examples

www = What, Why, and hoW? What: orthologous genes are separated by cell division only. Why: orthologous genes are likely to have the same function. How: yes, how can orthologous relations be established?

Several approaches: the COG approach, InParanoid, tree-based methods.

COG approach. Based on BLAST hits. Establishment and extension of triangles.

COG approach II. Extension of orthologous groups.
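The two COG slides can be sketched in code as follows. This is a simplified illustration, not the COG pipeline itself: it assumes best hits have already been extracted from BLAST output into a dictionary, it requires the three sides of a triangle to be reciprocal best hits (a stricter condition chosen here for simplicity), and it extends groups by merging triangles that share a side. All gene and genome names are made up.

```python
from itertools import combinations


def is_reciprocal(x, y, best_hit, genome):
    """True if x and y are each other's best hit in the other's genome."""
    return (best_hit.get(x, {}).get(genome[y]) == y
            and best_hit.get(y, {}).get(genome[x]) == x)


def find_triangles(best_hit, genome):
    """Triples of genes from three different genomes linked by best hits."""
    found = []
    for a, b, c in combinations(sorted(genome), 3):
        if len({genome[a], genome[b], genome[c]}) < 3:
            continue  # a triangle needs three distinct genomes
        if all(is_reciprocal(x, y, best_hit, genome)
               for x, y in ((a, b), (b, c), (a, c))):
            found.append({a, b, c})
    return found


def merge_triangles(triangles):
    """Extend groups: merge any two groups that share a side (two genes)."""
    groups = [set(t) for t in triangles]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(groups)), 2):
            if len(groups[i] & groups[j]) >= 2:
                groups[i] |= groups[j]
                del groups[j]
                changed = True
                break
    return groups


# Hypothetical input: genome of each gene, and best hit per target genome.
genome = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
best_hit = {
    "a1": {"B": "b1", "C": "c1", "D": "d1"},
    "b1": {"A": "a1", "C": "c1", "D": "d1"},
    "c1": {"A": "a1", "B": "b1", "D": "d1"},
    "d1": {"A": "a1", "B": "b1", "C": "c1"},
    "a2": {"B": "b1", "C": "c1"},  # hits b1 and c1, but is not their best hit
}

triangles = find_triangles(best_hit, genome)
groups = merge_triangles(triangles)
print(groups)
```

Gene `a2` stays outside the group because its hits are not returned; whether such a gene should be added as a paralog is exactly the kind of decision that differs between methods.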

InParanoid I. The method denotes in- and outparalogs, for two species. Find all hits from species A on B. Find all hits from species B on A. Find all bi-directional best hits (BBH): these form putative orthologs.
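The bi-directional best hit step can be sketched as below. This is a minimal illustration that assumes the hits have already been reduced to a dictionary of similarity scores keyed by (query, subject); the function names and the scores are invented for the example and are not part of InParanoid.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical scores for species A against B and B against A.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180, ("b2", "a1"): 150}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here `a2` also hits `b1` best, but `b1` prefers `a1`, so only the pair (`a1`, `b1`) qualifies as a putative ortholog pair.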

InParanoid II. Find all hits from A on A. Find all hits from B on B. Find all inparalogs: these are all within-species hits that score better than the orthologs (better => more recently split).
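The inparalog rule on this slide amounts to a threshold test: a gene from the same species joins the group if it scores better against the seed ortholog than the seed's partner in the other species does. The sketch below assumes precomputed within-species scores; the function name and the numbers are hypothetical.

```python
def inparalogs(seed, ortholog_score, within_scores):
    """Genes of the seed's own species that hit the seed better than its ortholog.

    `within_scores` maps (query, subject) pairs within one species to a score;
    `ortholog_score` is the score between the seed and its ortholog.
    """
    return sorted(
        subject
        for (query, subject), score in within_scores.items()
        if query == seed and subject != seed and score > ortholog_score
    )


# Hypothetical scores: a1 and b1 are a BBH pair scoring 200.
a_vs_a = {("a1", "a1"): 500, ("a1", "a2"): 320, ("a1", "a3"): 120}

found = inparalogs("a1", 200, a_vs_a)
print(found)
```

Gene `a2` is closer to `a1` than `b1` is, which suggests that the `a1`/`a2` split happened after the A/B speciation; `a3` scores below the ortholog and is left out as a likely outparalog.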

InParanoid III. Putative orthologous pairs are curated by an outgroup species C. Inparalogs are given a confidence value. Bootstrapping is used to give confidence values for orthologous pairs.

Genes with promiscuous domains. Gene A may hit on gene B because of a shared domain X. Gene B may hit on gene C because of a shared domain Y. Promiscuous domains require (manual) curation.
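One way to flag such cases before manual curation is to compare where the two hits land on gene B: if the A–B and B–C alignments cover different regions of B, the chain A–B–C probably runs through two unrelated domains. The sketch below is an assumption-laden illustration, not a published procedure: coordinates are 1-based and inclusive, and the 50% overlap threshold is arbitrary.

```python
def overlap_length(first, second):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)


def same_region(hit_ab_on_b, hit_bc_on_b, min_fraction=0.5):
    """True if both hits cover largely the same stretch of gene B.

    `min_fraction` (an arbitrary choice) is the share of the shorter
    alignment that must be covered by the other alignment.
    """
    shorter = min(hit_ab_on_b[1] - hit_ab_on_b[0] + 1,
                  hit_bc_on_b[1] - hit_bc_on_b[0] + 1)
    return overlap_length(hit_ab_on_b, hit_bc_on_b) / shorter >= min_fraction


# A hits residues 10-120 of B (domain X); C hits residues 300-450 (domain Y).
print(same_region((10, 120), (300, 450)))  # different regions: suspicious chain
print(same_region((10, 120), (40, 130)))   # largely the same region
```

A check like this only narrows down the candidates; the slide's point that promiscuous domains need curation still stands.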

Tree-based methods. 1. Get all homologous genes. 2. Make multiple alignments. 3. Generate phylogenetic gene trees. 4. Analyze trees. Open questions: Uncertainty in multiple alignment? Different methods for distance calculations. Superpose a trusted species tree? How to assess a level of accuracy?
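For step 4, one common heuristic that needs no species tree is the species-overlap rule: an internal node is called a duplication if the same species occurs on both sides of it, and a speciation otherwise. The sketch below swaps in this heuristic for illustration; the lecture does not state which rule it uses. The nested-tuple tree format and the `species_gene` leaf naming are assumptions made for the example.

```python
def species_of(leaf):
    """Leaf names are assumed to look like 'spec1_2' (species, then gene copy)."""
    return leaf.split("_")[0]


def label_events(tree, events):
    """Post-order walk of a binary tree given as nested 2-tuples of leaf names.

    Appends (sorted leaves under the node, event) to `events` for every
    internal node and returns (species set, leaf list) of the subtree.
    """
    if isinstance(tree, str):
        return {species_of(tree)}, [tree]
    left, right = tree
    left_species, left_leaves = label_events(left, events)
    right_species, right_leaves = label_events(right, events)
    # Shared species on both sides can only be explained by a duplication.
    event = "duplication" if left_species & right_species else "speciation"
    leaves = left_leaves + right_leaves
    events.append((sorted(leaves), event))
    return left_species | right_species, leaves


# Two gene copies; copy 2 is missing in spec2, which hints at a gene loss.
gene_tree = ((("spec1_1", "spec2_1"), "spec3_1"), ("spec1_2", "spec3_2"))
events = []
label_events(gene_tree, events)
for leaves, event in events:
    print(event, leaves)
```

The rule is only as good as the tree: a single misplaced branch creates a species overlap and therefore a spurious duplication, which is why the later slides warn against trusting trees fully.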

The Phylogenetic Gene-Tree. Multiple alignment for all genes. Distance matrix calculation: Kimura correction, PAM model, categories model. Large trees: distance-based methods (Neighbor Joining).
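As an illustration of the distance-matrix step, the sketch below computes Kimura-corrected distances from aligned protein sequences. It assumes that "Kimura correction" refers to Kimura's protein distance, d = -ln(1 - p - 0.2 p²), where p is the fraction of differing aligned positions; the slide does not specify the variant. The sequences are invented, and columns with a gap in either sequence are skipped.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing positions, ignoring columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions without gaps")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_distance(p):
    """Kimura's protein distance; corrects p for multiple substitutions."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def distance_matrix(alignment):
    """Symmetric matrix of corrected distances for a list of aligned sequences."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = kimura_distance(p_distance(alignment[i], alignment[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical aligned fragments of three homologous proteins.
alignment = ["MKTAYIAKQR", "MKTAYLAKQR", "MRTAYL-KQK"]
matrix = distance_matrix(alignment)
for row in matrix:
    print(["%.3f" % d for d in row])
```

The corrected distance is always at least as large as p, and the gap grows with divergence; a matrix like this is the input that a distance-based method such as Neighbor Joining works from.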

Uncertainty in trees. Evolutionary noise: differing rates of evolution, convergent evolution (low complexity, coiled coils), promiscuous domains (recombination, fusion, fission). Use of heuristic methods: multiple alignment, tree making.

Analyze trees … but don’t trust them fully. Rigid analysis suggests many duplications and losses. Presume scp branch is wrongly placed! If this is correct …. this can’t be.

Analyze trees … but don’t trust them fully. Three orthologous groups suggesting 15 gene losses. Considering one wrongly placed gene leaves only 2 gene losses. And if we accept wrong placement of branches …

High-res versus Low-res. Many, complete, and closely related genomes. Challenge: automatic orthology assignment.

Things to think about (homework). Select a partner. Collect a gene tree (and some copies). Carefully deduce which nodes are duplications and which are speciations. Denote which genes are orthologous to each other (orthologous groups). Select interesting parts to predict what the COG procedure would say, what InParanoid would say, and what would have happened if some genes (or species) were not involved in the analysis.

Homework: also think about …