Comparative Genomics and New Evolutionary Biology

Comparative Genomics and New Evolutionary Biology 09/26/2008

How does comparative genomics change our view of the evolution of life?
Traditionally, genomes were believed to be stable and to evolve gradually through vertical inheritance. Now we believe that genomes are in flux: gene loss and horizontal gene transfer (HGT) are major forces shaping genomes, rather than isolated incidents of little consequence. Comparative genomics is revealing the true complexity of evolution and is shaking many traditional concepts of evolution. For example, it uproots the Tree of Life, but it also provides the data to build a better and more realistic tree. New theoretical and algorithmic developments lie ahead to integrate and interpret these data. The availability of genome sequences from many diverse phylogenetic groups will make it possible to reconstruct the genomes of ancestral life forms, including the Last Universal Common Ancestor (LUCA) of all extant life forms on this planet.

The three domains of life
The theory of the three domains of life was developed by Carl Woese in the late 1970s and 1980s, and formally proposed in 1990, based on the analysis of phylogenetic trees of the rRNA sequences then available. http://www.life.uiuc.edu/micro/images/figures/enlarge/woese.gif

The three domains of life
The theory is also supported by other evidence. Archaea have membrane lipid compositions distinct from those of bacteria. Archaea look phenotypically like bacteria and are clearly prokaryotes, but in several other respects they are more similar to eukaryotes:
---- their ribosomes share a number of proteins with eukaryotes but not with bacteria;
---- their RNA polymerase resembles that of eukaryotes;
---- histones are present in their chromatin;
---- their DNA replication apparatus has a similar organization.

The three domains of life supported by comparative genomics
[Figure: unique core COGs (Clusters of Orthologous Groups) of the three domains of life]

The three domains of life supported by comparative genomics
Sequence similarity decreases in the following order: among archaea (A-A), between archaea and bacteria (A-B), and between archaea and eukaryotes (A-E).
[Figure: hit-score distributions for A-A, A-B and A-E comparisons]

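The three domains of life supported by comparative genomics
The distribution of 310 COGs shared by 13 archaeal genomes reveals that archaea are bacterial in shell but eukaryotic in core.
[Figure: the 310 COGs grouped by phyletic pattern: archaea-specific; A+B-E (present in archaea and bacteria, absent in eukaryotes), metabolism-related; A+B+E; A+E-B (present in archaea and eukaryotes, absent in bacteria), information-processing-related]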

The three domains of life supported by comparative genomics
The distribution of 310 COGs shared by 13 archaeal genomes reveals that archaea are “bacterial in metabolism and much of cell biology and eukaryotic in basal information processing systems”.
[Figure: individual COGs labeled by phyletic pattern, A+E-B or A+B-E]
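The A/B/E notation on these slides labels each COG by the domains in which it has members. A minimal Python sketch of that bookkeeping, assuming each COG has already been mapped to the set of domains it occurs in (the COG identifiers below are hypothetical placeholders, not real COGs):

```python
from collections import Counter

DOMAINS = ("A", "B", "E")  # archaea, bacteria, eukaryotes


def phyletic_pattern(domains_present):
    """Label a COG by domain presence/absence, e.g. {'A', 'E'} -> 'A+E-B'."""
    present = [d for d in DOMAINS if d in domains_present]
    absent = [d for d in DOMAINS if d not in domains_present]
    return "+".join(present) + "".join("-" + d for d in absent)


def pattern_counts(cog_domains):
    """Count COGs per phyletic pattern; cog_domains maps COG id -> set of domains."""
    return Counter(phyletic_pattern(domains) for domains in cog_domains.values())


# Hypothetical input, for illustration only.
example = {
    "COG_x1": {"A", "B"},
    "COG_x2": {"A", "E"},
    "COG_x3": {"A", "B", "E"},
    "COG_x4": {"A", "B"},
    "COG_x5": {"A"},  # archaea-specific, labeled 'A-B-E'
}
print(pattern_counts(example))
```

Tallying real COGs this way yields the pattern breakdown summarized in the figure.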

The three domains of life supported by comparative genomics
Two important conclusions emerge from comparative genomic analyses of the three domains of life:
---- Archaea and bacteria share a substantial gene pool, part of which is the ancient heritage of the common ancestor of these two domains and part of which is the result of HGT;
---- There is a small but critically important core of archaeal proteins, primarily involved in information processing, that reflects the shared history of archaea and eukaryotes.

Prevalence of lineage-specific gene loss and horizontal gene transfer in evolution
Lineage-specific gene loss and horizontal gene transfer are both common evolutionary phenomena, but the prevalence of HGT has been fully recognized only through comparative genomic analyses. There is a correlation between the similarity of organisms' lifestyles and the apparent number of genes they share:
---- bacterial hyperthermophiles have 15-20% of their genes of archaeal origin, while their close mesophilic relatives have only 1-5% archaeal genes;
---- mesophilic methanogenic archaea have ~30% of their genes of bacterial origin, while their hyperthermophilic relatives have only ~3% mesophilic bacterial genes.
Gene sharing on this scale can only be explained by HGT.

Correlation between the similarity in organisms’ lifestyles and the number of genes they share
Bacterial hyperthermophile Thermoanaerobacter tengcongensis: 258 (10%) of 2588 genes (yellow dots) are more similar to archaeal genes than to their bacterial homologs.
Bacterial mesophile Bacillus subtilis: only 174 (4.2%) of 4112 genes are more similar to archaeal genes than to their bacterial homologs.

Correlation between the similarity in organisms’ lifestyles and the number of genes they share
Archaeal hyperthermophile Methanopyrus kandleri: only 98 (6%) of 1687 genes (blue dots) are more similar to bacterial genes than to their archaeal homologs.
Archaeal mesophile Methanosarcina acetivorans: 1453 (32%) of 4540 genes are more similar to bacterial genes than to their archaeal homologs.
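These counts come from comparing, for each gene, its best match among homologs from its own domain with its best match in the other domain. A minimal sketch of that count, assuming precomputed similarity scores per gene (the gene names and scores are hypothetical; the slides do not specify the scoring method):

```python
def foreign_like_genes(best_scores):
    """Count genes more similar to the other domain than to their own.

    best_scores maps gene -> (best score vs own-domain homologs,
    best score vs other-domain homologs); use 0 when no homolog was found.
    Returns (count, percentage of all genes).
    """
    count = sum(1 for own, other in best_scores.values() if other > own)
    return count, 100 * count / len(best_scores)


# Hypothetical scores, for illustration only.
scores = {"g1": (50, 80), "g2": (90, 40), "g3": (0, 30), "g4": (70, 70)}
print(foreign_like_genes(scores))  # ties (g4) are not counted

# Percentages implied by the counts reported on the slides.
for organism, n, total in [
    ("T. tengcongensis", 258, 2588),
    ("B. subtilis", 174, 4112),
    ("M. kandleri", 98, 1687),
    ("M. acetivorans", 1453, 4540),
]:
    print(f"{organism}: {100 * n / total:.1f}%")
```

The recomputed values are 10.0%, 4.2%, 5.8% and 32.0%, consistent with the slides after rounding.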

Indicators of HGT
Surrogate criteria:
---- sequence similarity;
---- codon bias;
---- unexpected conservation of gene order between distant species.
Phylogenetic tree criteria:
---- disagreement between the species tree and the gene tree.
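The phylogenetic criterion can be made concrete by listing the clades that a gene tree contains but the species tree lacks. A minimal sketch, assuming both trees are rooted, share the same leaf set, and are written as nested Python tuples. The number of clades found in only one tree is the rooted form of the Robinson-Foulds distance, a standard incongruence measure swapped in here for illustration (the slides do not name one). Taxon names are hypothetical.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def incongruent_clades(gene_tree, species_tree):
    """Clades supported by the gene tree but absent from the species tree."""
    return clades(gene_tree) - clades(species_tree)


def rooted_rf_distance(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))


# Hypothetical trees: the gene tree places archaeon A1 inside the bacteria.
species_tree = ((("B1", "B2"), "B3"), (("A1", "A2"), "A3"))
gene_tree = ((("B1", "B2"), ("B3", "A1")), ("A2", "A3"))
print(incongruent_clades(gene_tree, species_tree))
print(rooted_rf_distance(gene_tree, species_tree))
```

A clade such as {B3, A1} is the kind of signal that flags a candidate HGT, though incongruence can also stem from reconstruction error or hidden paralogy.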

HGT can occur in essential genes
HGT is more prevalent in nonessential genes, which is explained by the complexity hypothesis: genes coding for subunits of macromolecular complexes or, more generally, for proteins involved in a wide range of interactions are less subject to HGT. But HGT is also found in essential genes, e.g., the glutamate and glutamine aminoacyl-tRNA synthetases (E/Q-aaRS).
[Figure: tRNA charging by E-aaRS (E + tRNA^E → E-tRNA^E) and Q-aaRS (Q + tRNA^Q → Q-tRNA^Q), with an alternative transamidase route]

HGT of aminoacyl-tRNA synthetases
HGT of glutamine aminoacyl-tRNA synthetase (Q-aaRS): γ-proteobacteria acquired Q-aaRS from eukaryotes.

HGT of aminoacyl-tRNA synthetases
HGT of tryptophan aminoacyl-tRNA synthetase (W-aaRS): the archaeon Pyrococcus horikoshii acquired W-aaRS from eukaryotes.

HGT between prokaryotes and animals
[Figure: phylogenetic tree of monoamine oxidases (MAO)]

HGT, gene loss, or vertical inheritance?
If the species tree is known, the distribution of a COG on the tree suggests possible scenarios for the evolution of that COG.
[Figure: species tree of eubacteria and archaea with candidate points of origin (roots 1, 2 and 3) and the distributions of COG 1 and COG 2]
The distribution of COG 1 is easily explained by vertical evolution. Explaining the evolution of COG 2, however, requires invoking HGT, gene loss, or both. If COG 2 emerged at root 2, two HGT events to archaea can explain the distribution. If COG 2 emerged at root 1, then 4 (in eubacteria) + 4 (in archaea) = 8 gene losses are needed to explain the distribution.
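The loss count for a purely vertical scenario can be computed mechanically: every maximal clade under the point of origin that lacks the gene costs one loss on the branch leading to it. A minimal sketch, assuming the species tree is a rooted nested tuple. The slide's own tree is not recoverable from the transcript, so a small hypothetical tree is used instead.

```python
def leaf_set(node):
    """All leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaf_set(child) for child in node))


def _losses(node, present):
    # A clade with no gene-carrying leaves needs one loss on the branch above it.
    if not (leaf_set(node) & present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(_losses(child, present) for child in node)


def vertical_losses(origin, present):
    """Minimum losses if the gene arose at `origin` and was only inherited vertically."""
    outside = present - leaf_set(origin)
    if outside:
        raise ValueError(f"{sorted(outside)} lie outside the origin clade; HGT is required")
    if isinstance(origin, str):
        return 0
    return sum(_losses(child, present) for child in origin)


# Hypothetical species tree: bacteria B1-B4, archaea A1-A4.
tree = ((("B1", "B2"), ("B3", "B4")), (("A1", "A2"), ("A3", "A4")))
print(vertical_losses(tree, {"B1", "B2", "B3", "B4", "A1"}))  # origin at the root
```

Placing the origin lower in the tree (for example at the bacterial clade) raises a ValueError here, signaling that the gene's presence in archaea then requires HGT.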

A simple algorithm for computing the number of gene losses and HGT events
Mixed scenarios can also explain the observed distribution; e.g., if COG 2 emerged at root 3, it would take one gene loss in archaea and one HGT to eubacteria. Designate the Incompatibility Quotient for gene i, IQ_i = l_i + g·h_i, to measure the most parsimonious number of gene-loss and HGT events needed to reconcile its gene tree with a given species tree, where l is the number of gene losses and h is the number of HGT events in the minimal (most parsimonious) evolutionary scenario for the given gene (e.g., a COG), and g is the “HGT penalty”.
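A minimal sketch of finding a most parsimonious scenario, assuming the quotient is the penalty-weighted sum IQ_i = l_i + g·h_i and that a scenario consists of one free origin of the gene, losses costing 1 each, and HGT events costing g each. It uses a Sankoff-style dynamic program over presence/absence states on a rooted species tree given as nested tuples. This is an illustrative simplification (it ignores, for example, whether an HGT donor existed at the time of transfer), not necessarily the algorithm behind the slides; leaf names are hypothetical.

```python
from math import inf


def _add(x, y):
    return (x[0] + y[0], x[1] + y[1], x[2] + y[2])


def _dp(node, present, g):
    """Best (cost, losses, transfers) for the subtree under `node`,
    given that the gene is absent / present at `node` itself."""
    if isinstance(node, str):
        if node in present:
            return (inf, 0, 0), (0.0, 0, 0)
        return (0.0, 0, 0), (inf, 0, 0)
    absent, here = (0.0, 0, 0), (0.0, 0, 0)
    for child in node:
        c_absent, c_here = _dp(child, present, g)
        # Parent lacks the gene: the child also lacks it, or acquires it by HGT (cost g).
        absent = _add(absent, min(c_absent, _add(c_here, (g, 0, 1))))
        # Parent has the gene: the child keeps it, or loses it (cost 1).
        here = _add(here, min(c_here, _add(c_absent, (1, 1, 0))))
    return absent, here


def incompatibility_quotient(tree, present, g=1.0):
    """Return (IQ, losses, transfers), assuming IQ = l + g*h."""
    if not present:
        raise ValueError("gene absent from every species")
    absent, here = _dp(tree, present, g)
    # Scenario 1: the gene emerged at the root and was passed down vertically.
    root_origin = here
    # Scenario 2: the gene emerged below the root; one absent->present change is
    # its (free) origin and every other one is an HGT event.
    lower_origin = (absent[0] - g, absent[1], absent[2] - 1)
    return min(root_origin, lower_origin)


tree = ((("B1", "B2"), ("B3", "B4")), (("A1", "A2"), ("A3", "A4")))
print(incompatibility_quotient(tree, {"B1", "B2", "B3", "B4", "A1"}, g=1.0))
print(incompatibility_quotient(tree, {"B1", "B2", "B3", "B4", "A1"}, g=3.0))
```

With g = 1 the distribution is best explained by one HGT into A1; with g = 3 two losses become cheaper than one transfer, which shows how the HGT penalty shifts the balance between the two kinds of events.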

A simple algorithm for computing the number of gene losses and HGT events
[Figure: numbers of gene-loss and HGT events in the most parsimonious evolutionary scenarios for COGs at g = 1]
These data suggest that most COGs have undergone HGT and gene-loss events.

Tree of life: before and after comparative genomics
Phylogenetic trees in the pre-genomic era
[Figure: Charles Darwin’s tree of life, a conceptual evolutionary tree of life]

Phylogenetic trees in the pre-genomic era
Ernst Haeckel’s tree of life was based on the similarity of morphological features.
Phylogenetic trees based on molecular sequences:
---- the first molecular trees were based on cytochrome c and globins;
---- the widely accepted Tree of Life was based on small-subunit ribosomal RNA (rRNA) sequences.
These trees were built on the molecular clock assumption: a gene evolves at a constant rate as long as the function of its product remains unchanged. A molecular phylogenetic tree was equated with the species tree on the assumption that an optimal molecular marker for deciphering the history of life, e.g., rRNA, could be found.
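One classic distance method that builds the molecular clock assumption directly into the tree is UPGMA, which joins the closest clusters and places every node at equal depth from all of its leaves (an ultrametric tree). A minimal sketch follows; UPGMA is chosen here for illustration and is not necessarily the method used for the trees mentioned above. The input format (a dict keyed by frozenset pairs of leaf names) and the distances are assumptions.

```python
from itertools import combinations


def upgma(dist):
    """Build an ultrametric tree by UPGMA.

    dist maps frozenset({x, y}) -> distance between leaves x and y.
    Returns (tree, root_height); tree is a nested tuple of leaf names.
    """
    # Every cluster starts as a single leaf; the value is its leaf count.
    sizes = {name: 1 for name in sorted({n for pair in dist for n in pair})}
    d = dict(dist)
    root_height = 0.0
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Clock assumption: the new node sits halfway along the distance.
        root_height = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted average of the old distances to cluster c.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree, root_height


# Hypothetical pairwise distances.
pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
tree, height = upgma({frozenset(p): v for p, v in pairs.items()})
print(tree, height)
```

When rates vary among lineages the clock assumption fails and UPGMA can recover the wrong topology, one reason later work moved to methods that do not assume a clock.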

Tree of life: after comparative genomics
Comparative genomics threatens the species tree concept
With multiple complete genome sequences available, detailed analyses of protein families revealed that there was no reliable phylogenetic signal in the trees, even after probable HGTs were removed. There is not even a consensus phylogeny for the archaea based on the conserved core of archaeal genomes. The three main problems with using single genes to infer a species tree are:
--- an insufficient number of informative sites;
--- variability of evolutionary rates in different lineages;
--- the effect of HGT.
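The first problem can be quantified by counting parsimony-informative sites: alignment columns in which at least two different character states each occur in at least two sequences. A minimal sketch, assuming sequences are already aligned and that gap ("-") and unknown ("?") symbols should be ignored; the alignment is hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-?"):
    """Count alignment columns that are parsimony-informative.

    A column is informative if at least two different character states
    each occur in at least two sequences; symbols in `ignore` are skipped.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    informative = 0
    for i in range(length):
        states = Counter(seq[i] for seq in alignment if seq[i] not in ignore)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Hypothetical alignment: columns 2, 4 and 5 are informative.
aln = ["ACGTA", "ACGTT", "ATGCA", "ATCCT"]
print(parsimony_informative_sites(aln))
```

A short, conserved gene may contain only a handful of such columns, too few to resolve deep branches on its own.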

Tree of life: after comparative genomics
Comparative genomics threatens the species tree concept
Thus there is concern that comparative genomics might uproot the very concept of a Tree of Life, at least for prokaryotes. However, this challenge to the Tree of Life might also offer a way to salvage the concept itself, by considering the entire body of information contained in the genomes or a rationally selected, substantial part of that information.

Methods for construction of consensus genome trees
---- Gene content. Method(s): parsimony, distance methods. Principal results: trees reflect partly phylogeny and partly similar lifestyles; the phylogenetic signal is enhanced when distances are normalized by genome size, but resolution is limited. Ref.: [230, 358, 373, 519, 605, 786, 835, 915]
---- Gene order. Principal results: similar to gene content; the effect of HGT is noticeable. Ref.: [470, 915]
---- Mean similarity between orthologs. Method(s): distance methods. Principal results: trees appear to reflect largely phylogenetic relationships; limited resolution, but some putative new lineages detected. Ref.: [148, 470, 915]
---- Concatenated alignments of proteins less prone to HGT (e.g., ribosomal proteins). Method(s): maximum likelihood, distance methods. Principal results: largely compatible with the mean-similarity approach, but with better resolution; several potential new lineages detected. Ref.: [119, 332, 552, 915]
---- Consensus of phylogenetic analyses of multiple orthologous sets. Used to verify the above approaches. Principal results: most of the new lineages strongly suggested by genome trees are supported. Ref.: [915]
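For the gene-content approach, a distance between two genomes can be derived from the orthologous groups they share. A minimal sketch, assuming the normalization divides the shared count by the size of the smaller genome (one common choice; the slide does not say which normalization was used). Genome names and COG sets are hypothetical.

```python
from itertools import combinations


def gene_content_distances(genomes):
    """Pairwise gene-content distances.

    genomes maps genome name -> set of orthologous-group ids (e.g. COGs).
    Distance = 1 - shared / size of the smaller genome, so a small genome
    whose groups all occur in a larger one gets distance 0.
    """
    dist = {}
    for a, b in combinations(sorted(genomes), 2):
        shared = len(genomes[a] & genomes[b])
        smaller = min(len(genomes[a]), len(genomes[b]))
        dist[frozenset((a, b))] = 1 - shared / smaller
    return dist


# Hypothetical genomes, for illustration only.
genomes = {"X": {1, 2, 3, 4}, "Y": {1, 2, 3, 5, 6, 7, 8, 9}, "Z": {7, 8}}
for pair, value in gene_content_distances(genomes).items():
    print(sorted(pair), value)
```

The resulting matrix can be fed to any distance-based tree method; as the table notes, such trees mix phylogenetic signal with the effects of shared lifestyle.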

A consensus genome tree
Based on the results of various genome-tree analyses, in particular
---- trees made using the median similarity between orthologs and
---- those based on concatenated alignments,
different groups have attempted to depict the apparent consensus.

Another consensus genome tree
Francesca D. Ciccarelli et al., Science 311, 1283 (2006). The tree is based on a concatenation of 31 orthologs occurring in 191 species with sequenced genomes. It suggests a thermophilic last universal common ancestor.
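Concatenation joins the per-ortholog alignments into one long "supermatrix" per species before a single tree is inferred. A minimal sketch of that step, assuming each ortholog alignment is a dict from species name to aligned sequence and that species missing from an ortholog are padded with "?" (a common convention, assumed here). The species names and sequences are hypothetical; the actual pipeline of the cited study is not reproduced.

```python
def concatenate(alignments, taxa, missing="?"):
    """Join per-ortholog alignments into one supermatrix.

    alignments: list of dicts, each mapping taxon -> aligned sequence
    for one ortholog. Taxa absent from an ortholog are padded with `missing`.
    """
    parts = {t: [] for t in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment needs sequences of one common length")
        (length,) = lengths
        for t in taxa:
            parts[t].append(aln.get(t, missing * length))
    return {t: "".join(chunks) for t, chunks in parts.items()}


# Hypothetical two-ortholog example.
ortholog_1 = {"A": "MKV", "B": "MRV"}
ortholog_2 = {"A": "LL", "C": "LI"}
matrix = concatenate([ortholog_1, ortholog_2], ["A", "B", "C"])
for taxon, row in matrix.items():
    print(taxon, row)
```

Pooling many genes this way increases the number of informative sites, which is the rationale for building genome trees from concatenated alignments.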

Post-genomic view of the Tree of Life
The simple notion of a single Tree of Life that would accurately and completely depict the evolution of all life forms is gone forever. However, there is a phylogenetic signal in the sequences of prokaryotic proteins, but it is weak because of massive gene loss and HGT. It seems that, to capture this faint signal, analysis of genome-wide protein sets or carefully selected subsets is required. The concept of the Tree of Life is bound to change in the post-genomic world. It cannot be thought of as a definitive “species tree” anymore, but only as a central trend in the rich patchwork of evolutionary history, replete with gene loss and HGT.