FOG: High-Resolution Fungal Orthologous Groups
René van der Heijden
Project 5.10: Comparative genomics for the prediction of protein function and pathways in Saccharomyces cerevisiae

What is this presentation about?
– What is ‘orthology’?
– Why do we study gene ancestry and gene trees (phylogenies)?
– Why high-resolution orthology?
– Automated high-resolution orthology detection
– The FOG database and some applications

Orthology
“This gene in that other species …” We don’t have chicken genes! What is meant is the corresponding gene. But why that particular gene? Are we sure this actually is the corresponding gene? Are we sure that all n orthologs are correct?

The line represents a gene in some ancestral species, a long, long time ago in a land far, far away. A speciation event results in two species, each with the same, orthologous gene. Over time, one of the genes gets duplicated, resulting in two paralogous genes. Another speciation event follows, but one of the paralogous genes is lost in one of the new species. After yet another speciation event, we end up with the current set of genes and their apparent history, in which some pairs of genes are orthologs and others are paralogs.

Duplications, Speciations, and Orthology
Two genes in two species are orthologous if they derive from one gene in their last common ancestor. Orthologous genes are likely to have the same function.
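To make the definition concrete, here is a minimal Python sketch that classifies a pair of genes by the event at their last common ancestor in the gene tree: speciation means orthologs, duplication means paralogs. It assumes the internal nodes are already labeled as speciation or duplication; the class `GeneNode`, the helper `node`, and the gene names are hypothetical, and the toy tree is an ancestral duplication followed by speciation into species X and Y.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeneNode:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for present-day genes
    children: list[GeneNode] = field(default_factory=list)
    # excluded from repr/eq so the parent <-> child links do not recurse forever
    parent: Optional[GeneNode] = field(default=None, repr=False, compare=False)


def node(name: str, event: Optional[str] = None, *children: GeneNode) -> GeneNode:
    """Create a node and link its children back to it."""
    parent = GeneNode(name, event, list(children))
    for child in children:
        child.parent = parent
    return parent


def lineage(n: Optional[GeneNode]) -> list[GeneNode]:
    """The node itself followed by all of its ancestors up to the root."""
    path = []
    while n is not None:
        path.append(n)
        n = n.parent
    return path


def last_common_ancestor(a: GeneNode, b: GeneNode) -> GeneNode:
    ancestors_of_a = {id(n) for n in lineage(a)}
    for n in lineage(b):
        if id(n) in ancestors_of_a:
            return n
    raise ValueError("genes are not in the same tree")


def relationship(a: GeneNode, b: GeneNode) -> str:
    """Orthologs if the last common ancestor is a speciation, paralogs otherwise."""
    return "orthologs" if last_common_ancestor(a, b).event == "speciation" else "paralogs"


x_a, y_a, x_b, y_b = node("X_a"), node("Y_a"), node("X_b"), node("Y_b")
root = node("dup", "duplication",
            node("spec_a", "speciation", x_a, y_a),
            node("spec_b", "speciation", x_b, y_b))
print(relationship(x_a, y_a))  # orthologs
print(relationship(x_a, y_b))  # paralogs
```

Note that X_a and Y_b are paralogs even though they sit in different species: what matters is the event that separated them, not the species they live in.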

Detecting Orthologous Genes
The usual methods are based on BLAST hit quality, e.g. the bidirectional best hit (BBH): two genes from different species are called BBH orthologs when each is the other's best hit.
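A minimal sketch of the BBH criterion, assuming the hit scores (for example BLAST bit scores) have already been collected into a dictionary keyed by (query, subject) for each search direction. Parsing real BLAST output is left out, and the gene names and scores are made up for illustration.

```python
from __future__ import annotations


def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Best-scoring subject for every query in one search direction.

    Ties are broken by keeping the first hit seen.
    """
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a) -> set[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical genes a1, a2 (species A) and b1, b2 (species B) with made-up scores.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b1"): 200, ("a2", "b2"): 60}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 198, ("b2", "a2"): 70, ("b2", "a1"): 88}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # {('a1', 'b1')}
```

Gene a2 has b1 as its best hit, but b1 prefers a1, so a2 gets no BBH partner at all. This is the kind of one-to-one limitation that motivates looking beyond pairwise best hits.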

KOG Clusters
– Based on triangles of BBHs between genes of three species
– Inparalogs are added
– Triangles are extended by other genes and other species
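The triangle step can be sketched as follows: find trios of genes from three different species that are pairwise BBHs, then merge triangles that share a side (two genes) into clusters. The merging rule is our assumption of a COG-style procedure, not a statement of exactly how KOG was built; adding inparalogs is left out, and the gene IDs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def bbh_triangles(bbh_pairs, species_of):
    """Gene trios from three different species that are pairwise BBHs."""
    neighbours = defaultdict(set)
    for a, b in bbh_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    triangles = set()
    for a, b in bbh_pairs:
        for c in neighbours[a] & neighbours[b]:  # genes that are BBHs of both a and b
            trio = frozenset((a, b, c))
            if len({species_of[g] for g in trio}) == 3:
                triangles.add(trio)
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (at least two genes) into clusters."""
    clusters: list[list[frozenset]] = []
    for tri in triangles:
        touching = [c for c in clusters if any(len(tri & t) >= 2 for t in c)]
        clusters = [c for c in clusters if all(c is not hit for hit in touching)]
        clusters.append([tri] + [t for c in touching for t in c])
    return [frozenset().union(*c) for c in clusters]


species_of = {"y1": "Scer", "p1": "Spom", "n1": "Ncra", "a1": "Agos",
              "y2": "Scer", "p2": "Spom"}
pairs = [("y1", "p1"), ("y1", "n1"), ("p1", "n1"),
         ("y1", "a1"), ("p1", "a1"), ("y2", "p2")]
triangles = bbh_triangles(pairs, species_of)
print(len(triangles))                        # 2
print(sorted(merge_triangles(triangles)[0]))  # ['a1', 'n1', 'p1', 'y1']
```

The pair y2/p2 is a BBH but never forms a triangle, so it ends up in no cluster.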

KOG Statistics
These large KOG clusters must contain multiple representatives per species. Low resolution: there must be functional specialization within these clusters!
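One simple way to check a cluster for this kind of low resolution is to count how many genes each species contributes and list the species with more than one. This is a hedged illustration with hypothetical gene IDs and a hypothetical gene-to-species mapping, not the statistic shown on the slide.

```python
from collections import Counter


def copies_per_species(cluster, species_of):
    """Number of genes each species contributes to the cluster."""
    return Counter(species_of[g] for g in cluster)


def multi_copy_species(cluster, species_of):
    """Species with more than one gene in the cluster: a sign of low resolution."""
    return sorted(s for s, n in copies_per_species(cluster, species_of).items() if n > 1)


species_of = {"y1": "Scer", "y2": "Scer", "y3": "Scer",
              "p1": "Spom", "n1": "Ncra", "n2": "Ncra"}
cluster = ["y1", "y2", "y3", "p1", "n1", "n2"]
print(multi_copy_species(cluster, species_of))  # ['Ncra', 'Scer']
```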

High-Res versus Low-Res
High resolution requires many, complete, and closely related genomes. Challenge: automatic orthology assignment.

Gene Families
Use PSI-BLAST to recognize (distant) homologs and split the gene set into families of homologous genes.
Challenge: promiscuous domains. Multi-domain genes occur very often in eukaryotic genomes.

Gene Families
Promiscuous domains cause genes to be only partially homologous:
– gene A-B is partially homologous to gene A-C, as is gene B-C.
Merging everything with homologous parts generates far too large gene families:
– it is then not possible to obtain proper multiple alignments.
A more advanced technique is needed for separating multi-domain genes into gene families.
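The over-merging problem can be reproduced with a toy model: describe each gene by its domains, link any two genes that share a domain, and take connected components (single-linkage clustering via union-find). Representing homology as shared domain labels is a simplifying assumption for illustration; the talk uses PSI-BLAST, and the gene names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def families_by_shared_domain(genes: dict[str, set[str]]) -> list[set[str]]:
    """Single-linkage clustering: two genes join a family if they share any domain."""
    parent = {g: g for g in genes}

    def find(g: str) -> str:
        # union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    names = list(genes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if genes[a] & genes[b]:        # partial homology via a shared domain
                parent[find(a)] = find(b)  # merge the two families

    families: dict[str, set[str]] = defaultdict(set)
    for g in genes:
        families[find(g)].add(g)
    return list(families.values())


# Hypothetical genes described by their domain content (A-B, A-C, B-C, ...).
genes = {
    "g1": {"A", "B"}, "g2": {"A", "C"}, "g3": {"B", "C"},
    "g4": {"C", "D"}, "g5": {"D", "E"}, "g6": {"F"},
}
families = families_by_shared_domain(genes)
print(sorted(sorted(f) for f in families))
# [['g1', 'g2', 'g3', 'g4', 'g5'], ['g6']]
```

Genes g1 (A-B) and g5 (D-E) share no domain at all, yet transitive linking puts them in the same family, which is why such families cannot be aligned properly.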

Generating Gene Families
The more advanced technique for merging genes into gene families is not functional yet, so we fall back on ‘known’ gene families from KOG:
– low-resolution orthology assignments for eukaryotes
– some inclusive families with many genes per species
Some statistics: 15 fungal species with […] genes in total, divided into […] KOG clusters (gene families) involving […] genes (= 68%).

Uncertainty in Trees
Evolutionary noise:
– differing rates of evolution
– convergent evolution (low complexity, coiled coils)
– promiscuous domains (recombination, fusion, fission)
Use of heuristic methods:
– multiple alignment
– tree making

Reading Gene Trees
Although genes spec1,1 and spec2,1 are closer relatives, their distance in the tree is larger than that between spec1,1 and spec3,1. The tree suggests at least 2 gene losses.

Analyze Trees … but Don't Trust Them Fully
A rigid analysis suggests many duplications and losses. If this were correct … well, this can't be. Presume instead that the scp branch is wrongly placed!

Analyze Trees … but Don't Trust Them Fully
Taken rigidly, the tree suggests three orthologous groups and 15 gene losses. If we accept that branches can be wrongly placed, considering one wrongly placed gene leaves only 2 gene losses.
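Counting the duplications and losses implied by a gene tree needs a species tree. The sketch below uses LCA reconciliation, a common technique that is named here plainly and is not necessarily what this project used: each gene-tree node is mapped to the smallest species-tree clade containing all its species, it is a duplication if it maps to the same clade as one of its children, and losses follow from the depth differences between a node's clade and its children's clades. Trees are nested tuples with species names as leaves (an assumption), and the function names are hypothetical. The toy example shows how one misplaced gene turns a loss-free tree into one duplication plus three losses.

```python
from __future__ import annotations

from typing import Union

# A tree is either a species name (leaf) or a tuple of subtrees.
Tree = Union[str, tuple]


def index_species_tree(species_tree: Tree) -> dict[frozenset, int]:
    """Map every clade (set of species) of the species tree to its depth (root = 0)."""
    depth_of: dict[frozenset, int] = {}

    def walk(node: Tree, depth: int) -> frozenset:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child, depth + 1) for child in node))
        depth_of[clade] = depth
        return clade

    walk(species_tree, 0)
    return depth_of


def count_duplications_and_losses(gene_tree: Tree, species_tree: Tree) -> tuple[int, int]:
    depth_of = index_species_tree(species_tree)
    duplications = losses = 0

    def lca(species: frozenset) -> frozenset:
        # smallest species-tree clade that contains all given species
        return min((c for c in depth_of if species <= c), key=len)

    def walk(node: Tree) -> frozenset:
        nonlocal duplications, losses
        if isinstance(node, str):
            return frozenset([node])
        child_clades = [walk(child) for child in node]
        clade = lca(frozenset().union(*child_clades))
        duplication = clade in child_clades
        duplications += int(duplication)
        for child_clade in child_clades:
            gap = depth_of[child_clade] - depth_of[clade]
            # a speciation node already accounts for one level of the gap
            losses += gap if duplication else gap - 1
        return clade

    walk(gene_tree)
    return duplications, losses


species_tree = (("A", "B"), "C")
print(count_duplications_and_losses((("A", "B"), "C"), species_tree))  # (0, 0)
print(count_duplications_and_losses((("A", "C"), "B"), species_tree))  # (1, 3)
```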

Automatic Orthology Assignment
LOFT: Levels of Orthology From Trees
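The slides do not spell out how LOFT analyzes a tree. As a hedged illustration only, the sketch below uses the species-overlap rule (a node counts as a duplication when two of its subtrees contain genes from the same species, so no species tree is needed) and gives every gene a hierarchical level that gains a `.i` suffix below each duplication. This is one simple way to derive levels of orthology from a tree and is not presented as LOFT's exact procedure; `Leaf`, `orthology_levels`, and the gene IDs are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    gene: str
    species: str


def species_in(node) -> set[str]:
    """All species below a node (internal nodes are tuples of subtrees)."""
    if isinstance(node, Leaf):
        return {node.species}
    return set().union(*(species_in(child) for child in node))


def is_duplication(node) -> bool:
    """Species-overlap rule: two subtrees share a species -> duplication."""
    seen: set[str] = set()
    for child in node:
        child_species = species_in(child)
        if seen & child_species:
            return True
        seen |= child_species
    return False


def orthology_levels(node, level: str = "1", out: dict[str, str] | None = None) -> dict[str, str]:
    """Assign every gene a level; each duplication adds a '.i' suffix per subtree."""
    if out is None:
        out = {}
    if isinstance(node, Leaf):
        out[node.gene] = level
    elif is_duplication(node):
        for i, child in enumerate(node, start=1):
            orthology_levels(child, f"{level}.{i}", out)
    else:  # speciation: children stay at the same level
        for child in node:
            orthology_levels(child, level, out)
    return out


tree = (((Leaf("sc1", "Scer"), Leaf("sc2", "Scer")), Leaf("ag1", "Agos")), Leaf("sp1", "Spom"))
print(orthology_levels(tree))
# {'sc1': '1.1', 'sc2': '1.2', 'ag1': '1', 'sp1': '1'}
```

Here sc1 and sc2 are both orthologous to ag1 and sp1 at level 1 but are separated at the finer levels 1.1 and 1.2. Because the rule trusts the topology, a single misplaced branch can create spurious duplications, which is exactly the caution raised on the previous slides.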

Result
The collection of genes is split into KOG families. The KOG families are aligned and phylogenetic trees are derived. The phylogenetic trees are analyzed using LOFT, resulting in high-resolution orthology.

Result

Can LOFT be trusted?

It seems okay!

Applications
We now have FOG: a complete set of high-resolution orthology assignments for fungi. We ‘know’ which orthologous genes are present and absent in which species: their phyletic distribution.
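A phyletic distribution can be written as a presence/absence profile per orthologous group. A minimal sketch, assuming orthologous groups are available as lists of gene IDs together with a gene-to-species mapping; all group IDs, gene IDs, and the species order are hypothetical.

```python
def phyletic_profiles(groups, species_of, species_order):
    """Presence (1) / absence (0) of every orthologous group in each species.

    groups: group id -> gene ids; species_of: gene id -> species name.
    """
    profiles = {}
    for group_id, genes in groups.items():
        present = {species_of[g] for g in genes}
        profiles[group_id] = tuple(int(s in present) for s in species_order)
    return profiles


def absent_species(profile, species_order):
    """Species in which the group has no member."""
    return [s for s, bit in zip(species_order, profile) if not bit]


species_order = ["Scer", "Agos", "Spom"]
species_of = {"sc1": "Scer", "sp1": "Spom", "sc2": "Scer", "ag2": "Agos", "sp2": "Spom"}
groups = {"OG_a": ["sc1", "sp1"], "OG_b": ["sc2", "ag2", "sp2"]}
profiles = phyletic_profiles(groups, species_of, species_order)
print(profiles)  # {'OG_a': (1, 0, 1), 'OG_b': (1, 1, 1)}
print(absent_species(profiles["OG_a"], species_order))  # ['Agos']
```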

Complex I

Phyletic distribution of mitochondrial orthologous groups

Phylogenetic Tree for Mitochondrial Carrier Proteins

Orthologous group 24 is an uncharacterized mitochondrial carrier. It is present in all fungi except Ashbya gossypii. In yeast it is known as YMC1, of unknown function.

YMC1: Predicted Glycine/Serine Antiporter
There are three S. cerevisiae genes with the same phyletic distribution:
– a subunit of glycine decarboxylase
– another subunit of glycine decarboxylase
– a gene with unknown function
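Finding genes with the same phyletic distribution amounts to grouping orthologous groups by their set of species. A minimal sketch that uses exact matching of profiles; the group IDs and species sets below are hypothetical and only mimic the "present everywhere except Ashbya" pattern.

```python
from collections import defaultdict


def group_by_profile(profiles):
    """profiles: group id -> set of species in which the group has a member."""
    by_profile = defaultdict(list)
    for group_id, present in profiles.items():
        by_profile[frozenset(present)].append(group_id)
    return by_profile


def same_distribution_as(query, profiles):
    """Other groups whose species set exactly matches that of `query`."""
    matches = group_by_profile(profiles)[frozenset(profiles[query])]
    return sorted(g for g in matches if g != query)


all_species = {"Scer", "Agos", "Spom", "Ncra"}
without_agos = all_species - {"Agos"}
profiles = {"OG_carrier": without_agos, "OG_x": without_agos,
            "OG_y": without_agos, "OG_z": all_species}
print(same_distribution_as("OG_carrier", profiles))  # ['OG_x', 'OG_y']
```

Exact matching is the simplest choice; with more genomes, allowing a small number of mismatches would make the comparison more tolerant of missed or spurious orthology assignments.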