Finding Orthologous Groups René van der Heijden

What is this lecture about?
What is ‘orthology’?
Why do we study gene ancestry and gene trees (phylogenies)?
Several approaches to find orthologous genes
High-resolution orthology
Steps involved
Things to think about (homework)

Homology
Genes are homologous if and only if they derive from the same ancestral gene
Sufficient sequence similarity proves homology
Very dissimilar sequences: PSI-BLAST, HMM searches

Homologous genes tend to have similar functions The usual range

Homologous genes tend to have similar functions
Accurate function prediction requires something better than homology: orthology

Orthology
“This gene in that other species …”
We don’t have chicken genes!
They mean: the corresponding gene?
Why that particular gene?
Are we sure this actually is the gene?
Are we sure that all n orthologs are correct?

Duplications, Speciations, and Orthology
Evolution results in:
Growing number of genes
–Gene duplications
–Horizontal gene transfer
–De novo generation
Growing number of species
The fate of gene duplicates:
–Perish
–Find a new functional niche
Tendency for functional expansion

Duplications, Speciations, and Orthology
Two genes in two species are orthologous if they derive from one gene in their last common ancestor
Orthologous genes are likely to have the same function
This is much stronger than “tend to have similar function”

[Animated figure: a gene tree built up step by step along a time axis]
The line represents a gene in some ancestral species (a long, long time ago, in a land far, far away)
A speciation event results in two species with the same, orthologous gene
One of the genes gets duplicated, resulting in two paralogous genes
Another speciation event follows, but one of the paralogous genes is lost in one of the new species
After another speciation event, we have the current set of genes with their apparent history: orthologs and paralogs

Duplications, Speciations, and Orthology
[Figure: gene tree from the primal ancestor to the present genes; axis: evolutionary distance]

Homologs, Orthologs, and Paralogs
Homologous: one common ancestral gene
Orthologous: separated by a speciation event
Paralogous: separated by a duplication event
Orthologs and Paralogs must be Homologs
Are there homologous genes which are neither orthologous nor paralogous?
The view on orthology and paralogy is relative to a certain speciation
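The definitions above can be turned into a small lookup on a gene tree: find the last common ancestor of two genes and read off the event at that node. The sketch below is only an illustration; the nested-tuple tree format, the "S"/"D" labels, and the gene names are assumptions made for this example, and it presumes the events on the internal nodes are already known.

```python
# Toy gene tree as nested tuples: (event, left, right); leaves are gene names.
# "S" = speciation, "D" = duplication. Hypothetical data for illustration only.
TREE = ("D",
        ("S", "spec1_geneA", "spec2_geneA"),
        ("S", "spec1_geneB", "spec2_geneB"))

def leaf_set(tree):
    """All gene names below (or equal to) this node."""
    if isinstance(tree, str):
        return {tree}
    return leaf_set(tree[1]) | leaf_set(tree[2])

def last_common_ancestor(tree, a, b):
    """Smallest subtree that still contains both genes a and b."""
    if not isinstance(tree, str):
        for child in tree[1:]:
            if {a, b} <= leaf_set(child):
                return last_common_ancestor(child, a, b)
    return tree

def relation(tree, a, b):
    """'orthologs' if the genes split at a speciation, 'paralogs' if at a duplication."""
    if a == b or not {a, b} <= leaf_set(tree):
        raise ValueError("need two different genes that are both in the tree")
    node = last_common_ancestor(tree, a, b)
    return "orthologs" if node[0] == "S" else "paralogs"
```

For example, relation(TREE, "spec1_geneA", "spec2_geneA") gives "orthologs", while relation(TREE, "spec1_geneA", "spec2_geneB") gives "paralogs" because the two genes trace back to the duplication at the root.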

Inparalogs and Outparalogs
Both Inparalogous and Outparalogous genes are separated by a gene duplication event
For Inparalogs, the duplication event is not followed by speciation(s)
Outparalogs are separated by a duplication event that is followed by speciation(s)
Inparalogs are recent paralogs
Outparalogs are more ancient paralogs
Are Inparalogs Orthologs? Depends on your definition:
–Yes: two genes are orthologous if they derive from one gene in the last common ancestor
–No: two genes are orthologous if they are separated only by cell division events
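One way to make the in/out distinction concrete, as a rough sketch: look at the duplication node that separates two paralogs and check which species occur below it. If only one species occurs, no speciation followed the duplication (inparalogs); if several occur, speciation followed it (outparalogs). The tuple format and the "species|gene" naming are assumptions for this example. Note that gene loss can make an old duplication look recent under this simple rule.

```python
# Toy labelled gene tree: (event, left, right); leaves are "species|gene".
# "S" = speciation, "D" = duplication. Hypothetical data for illustration only.
TREE = ("D",
        ("S", "human|A1", "mouse|A1"),
        ("S", ("D", "human|A2a", "human|A2b"), "mouse|A2"))

def leaf_set(tree):
    if isinstance(tree, str):
        return {tree}
    return leaf_set(tree[1]) | leaf_set(tree[2])

def lca(tree, a, b):
    """Smallest subtree containing both leaves a and b."""
    if not isinstance(tree, str):
        for child in tree[1:]:
            if {a, b} <= leaf_set(child):
                return lca(child, a, b)
    return tree

def paralog_type(tree, a, b):
    """Classify two paralogs by the species found below their duplication node."""
    node = lca(tree, a, b)
    if isinstance(node, str) or node[0] != "D":
        raise ValueError("genes are not separated by a duplication")
    species = {leaf.split("|")[0] for leaf in leaf_set(node)}
    return "inparalogs" if len(species) == 1 else "outparalogs"
```

Here human|A2a and human|A2b come out as inparalogs, whereas human|A1 and human|A2a are outparalogs because the duplication at the root predates the human/mouse split.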

Reading Gene-Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance is larger than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses

In-, and Outparalogs, Orthologs, and Co-orthologs

www = What, Why, and hoW?
What: Orthologous genes are separated by cell division only
Why: Orthologous genes are likely to have the same function
How: Yes, how can orthologous relations be established?

Several approaches
The COG approach
InParanoid
Tree-based methods

COG approach
Based on BLAST hits
Establishment and extension of triangles [figure]

COG approach II Extension of orthologous groups
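The triangle idea can be sketched in a few lines. The version below is a simplification and not the actual COG pipeline: it assumes best hits are given as symmetric pairs, that genes are named "species|id", and that two triangles are merged whenever they share a side (two genes). All names and data are hypothetical.

```python
from itertools import combinations

def cog_groups(best_hit_pairs):
    """Group genes by merging best-hit triangles that share a side.

    best_hit_pairs: iterable of (gene_a, gene_b), genes named 'species|id'.
    """
    edges = {frozenset(pair) for pair in best_hit_pairs}
    genes = sorted({g for e in edges for g in e})

    def species(gene):
        return gene.split("|")[0]

    # A triangle: three genes from three different species, all pairwise best hits.
    triangles = [
        frozenset(trio) for trio in combinations(genes, 3)
        if len({species(g) for g in trio}) == 3
        and all(frozenset(p) in edges for p in combinations(trio, 2))
    ]

    # Union-find over triangles; join two triangles if they share two genes.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())
```

A pair of genes that are each other's best hit but never close a triangle with a third species ends up in no group. The exhaustive search over triples is only meant for toy inputs.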

InParanoid I
The method identifies
–IN- and OUTparalogs
–For TWO species
Find all hits from species A on B
Find all hits from species B on A
Find all bi-directional best hits (BBH)
–These form putative orthologs
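The bi-directional best hit step can be sketched as follows. This is a minimal illustration, not InParanoid's code: it assumes the hits of each search are already available as a dictionary mapping (query, subject) to a similarity score, and the function names are made up for this example.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict {(query, subject): score}, e.g. BLAST bit scores.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A gene whose best hit in the other species prefers a different partner is left out, which is how the bi-directional requirement filters out many paralogous matches.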

InParanoid II
Find all hits from A on A
Find all hits from B on B
Find all InParalogs
–These are all within-species hits that score better than the hit between the orthologs
–Better => more recently split
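The inparalog rule on this slide amounts to a single comparison per within-species hit. The sketch below assumes scores are stored as {(gene, other_gene): score} and that the score between the two orthologs is already known; the function name and data layout are hypothetical, and the real program applies further filters not shown here.

```python
def inparalogs(seed, ortholog_score, within_scores):
    """Genes from the seed's own species that hit the seed better than its ortholog does.

    seed: gene that is part of a bi-directional best hit pair.
    ortholog_score: score between the seed and its ortholog in the other species.
    within_scores: dict {(gene, other_gene): score} for same-species hits.
    """
    return sorted(
        other
        for (gene, other), score in within_scores.items()
        if gene == seed and other != seed and score > ortholog_score
    )
```

A within-species hit that scores higher than the ortholog pair suggests the duplication happened after the two species split, which is what makes the gene an inparalog.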

Detecting orthologous genes
Usual methods are based on BLAST hit quality, e.g. bi-directional best hit (BBH)
[Figure: BBH pairs marked as orthologs]

Genes with promiscuous domains
Gene A may hit on gene B because of a shared domain X
Gene B may hit on gene C because of a shared domain Y
Promiscuous domains require (manual) curation
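Cases like the A, B, C example can at least be flagged automatically before curation. The sketch below lists triples in which a middle gene hits two others that do not hit each other, a pattern that may point to a shared promiscuous domain. The input format (a set of unordered hit pairs) and the function name are assumptions for this illustration; such a flag is a hint for a curator, not proof.

```python
from itertools import combinations

def nontransitive_triples(hits):
    """Triples (a, b, c) where b hits both a and c, but a and c do not hit each other.

    hits: iterable of (gene_x, gene_y) pairs with a significant hit.
    """
    edges = {frozenset(pair) for pair in hits}
    genes = sorted({g for e in edges for g in e})
    flagged = []
    for b in genes:
        neighbours = [g for g in genes if g != b and frozenset((b, g)) in edges]
        for a, c in combinations(neighbours, 2):
            if frozenset((a, c)) not in edges:
                flagged.append((a, b, c))
    return flagged
```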

Tree-based methods
1. Get all homologous genes
2. Make multiple alignments
3. Generate phylogenetic gene trees
4. Analyze trees
Uncertainty in multiple alignment?
Different methods for distance calculations
Superpose a trusted species tree?
How to assess a level of accuracy?
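For step 4, one common heuristic for deciding whether a node is a duplication or a speciation is the species-overlap rule: if the two subtrees below a node share a species, call it a duplication, otherwise a speciation. This is a swapped-in heuristic chosen for illustration, not necessarily the analysis used in this lecture, and it needs no species tree. The nested-tuple format and the "species|gene" leaf names are assumptions.

```python
def label_events(tree):
    """Label each internal node as 'D' (duplication) or 'S' (speciation).

    tree: a leaf string 'species|gene' or a pair (left, right).
    Returns (labelled_tree, species_below), with internal nodes as (event, left, right).
    """
    if isinstance(tree, str):
        return tree, {tree.split("|")[0]}
    left, left_species = label_events(tree[0])
    right, right_species = label_events(tree[1])
    # Shared species on both sides can only be explained by a duplication.
    event = "D" if left_species & right_species else "S"
    return (event, left, right), left_species | right_species
```

Because the rule trusts the tree topology completely, a single misplaced branch can turn a speciation into a duplication, so the result should be read with the same caution as the tree itself.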

The Phylogenetic Gene-Tree
Multiple alignment for all genes
Distance matrix calculation
–Kimura correction
–PAM model
–Categories model
Large trees: distance-based methods
–Neighbor Joining
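Two of the steps named here are easy to write out. The sketch below computes a Kimura-corrected protein distance, d = -ln(1 - p - 0.2 p^2) with p the observed fraction of differing positions, and then picks the pair of genes that neighbor joining would join first using its Q criterion. The function names and the gap handling (gapped columns are skipped) are assumptions for this example; a full neighbor-joining implementation would repeat the selection after merging the pair.

```python
import math

def kimura_distance(seq_a, seq_b):
    """Kimura-corrected distance between two aligned protein sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)  # observed fraction of differences
    inside = 1.0 - p - 0.2 * p * p
    if inside <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(inside)

def nj_pair_to_join(names, dist):
    """Pair of names that minimises the neighbor-joining Q criterion.

    dist: symmetric distance matrix (list of lists), indexed like names.
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    """
    n = len(names)
    row_sum = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best[1], best[2]
```

Note that the pair chosen by the Q criterion is not always the pair with the smallest raw distance, which is how neighbor joining compensates for differing rates of evolution.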

Uncertainty in trees
Evolutionary noise
–Differing rates of evolution
–Convergent evolution (low complexity, coiled coils)
–Promiscuous domains (recombination, fusion, fission)
Use of heuristic methods
–Multiple alignment
–Tree making

Analyze trees … but don’t trust them fully
Rigid analysis suggests many duplications and losses
Presume scp branch is wrongly placed!
If this is correct … this can’t be

Analyze trees … but don’t trust them fully
Three orthologous groups, suggesting 15 gene losses
Considering one gene as wrongly placed leaves only 2 gene losses
And if we accept wrong placement of branches …

Horizontal gene-transfer!

Remember … “ In-, and Outparalogs, Orthologs, and Co-orthologs”

Levels of Orthology

High-res versus Low-res
Many, Complete, and Closely related genomes
Use phylogenetic trees
Challenge: Automatic Orthology assignment

Differential gene-loss

Things to think about (homework)
Select a partner
Collect a gene tree (and some copies)
Carefully deduce which nodes are duplications and which are speciations
Denote which genes are orthologous to each other (orthologous groups)
Select interesting parts to predict what
–The COG procedure would say
–InParanoid would say
–Would have happened if some genes (or species) were not involved in the analysis

Homework: also think about …