MCB 3421 class 26.

Student evaluations: please follow this link to the on-line surveys that are open for you this semester.

Decomposition of Phylogenetic Data. Genomes contain phylogenetic information. Break this information into small quanta (bipartitions or embedded quartets), then analyze the spectra to detect transferred genes and the plurality consensus.

Bipartition of a Phylogenetic Tree. A bipartition (or split) is a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups. [Figure: example tree with bipartitions written as strings of * and . characters. "Yellow vs Rest" (labeled 95) is compatible with the illustrated bipartition; "Orange vs Rest" is incompatible with the illustrated bipartition.]
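The compatibility test mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it uses the standard rule that two bipartitions A|B and C|D of the same taxon set can occur in one tree exactly when at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty. The function name and the six-taxon example are assumptions made for the example.

```python
def is_compatible(split1, split2, taxa):
    """Return True if two bipartitions of `taxa` can occur in the same tree.

    Each split is given as the set of taxa on one side of the branch;
    the other side is the complement within `taxa`.
    """
    a, c = set(split1), set(split2)
    b, d = set(taxa) - a, set(taxa) - c
    # Compatible if at least one of the four pairwise intersections is empty.
    return any(not (x & y) for x in (a, b) for y in (c, d))


taxa = set("ABCDEF")  # hypothetical taxon labels
print(is_compatible({"A", "B"}, {"A", "B", "C"}, taxa))  # True: AB is nested in ABC
print(is_compatible({"A", "B"}, {"B", "C"}, taxa))       # False: the splits conflict
```

Because only the two groups matter, the check never looks at relationships inside either side of a split.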

“Lento”-plot of 34 supported bipartitions (out of 4082 possible) for 13 gamma-proteobacterial genomes (258 putative orthologs): E. coli, Buchnera, Haemophilus, Pasteurella, Salmonella, Yersinia pestis (2 strains), Vibrio, Xanthomonas (2 sp.), Pseudomonas, Wigglesworthia. There are 13,749,310,575 possible unrooted tree topologies for 13 genomes.
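The two counts quoted on this slide follow from standard combinatorial formulas, which the sketch below reproduces: n taxa allow 2^(n-1) - n - 1 non-trivial bipartitions and (2n-5)!! unrooted, fully resolved tree topologies. The function names are chosen for this example only.

```python
def n_nontrivial_bipartitions(n):
    """Number of bipartitions of n taxa with at least two taxa on each side."""
    return 2 ** (n - 1) - n - 1


def n_unrooted_topologies(n):
    """Number of unrooted bifurcating tree topologies for n >= 3 taxa: (2n-5)!!"""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


print(n_nontrivial_bipartitions(13))  # 4082
print(n_unrooted_topologies(13))      # 13749310575
```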

[Figure: trees relating A, B, C, and D, in panels labeled N=4(0), N=5(1), N=8(4), N=13(9), N=23(19), and N=53(49); scale bars 0.01.] From: Mao F, Williams D, Zhaxybayeva O, Poptsova M, Lapierre P, Gogarten JP, Xu Y (2012) BMC Bioinformatics 13:123, doi:10.1186/1471-2105-13-123

Results: maximum bootstrap support value for the bipartition separating (AB) and (CD); maximum bootstrap support value for the embedded quartet (AB),(CD).

Bootstrap support values for embedded quartets. [Figure: a tree calculated from one pseudo-sample generated by bootstrapping from an alignment of one gene family present in 11 genomes, with the embedded quartet for genomes 1, 4, 9, and 10 marked. This bootstrap sample supports the topology ((1,4),9,10). The three possible topologies for this quartet are ((1,4),9,10), ((1,9),4,10), and ((1,10),4,9).] Zhaxybayeva et al. 2006, Genome Research, 16(9):1099-108. Quartet spectral analysis of genomes iterates over three loops: repeat for all bootstrap samples; repeat for all possible embedded quartets; repeat for all gene families.
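The three nested loops can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: each bootstrap tree is represented only by its internal splits (one side of each branch as a set of taxon names), and the data layout, function names, and genome labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, quartet):
    """Return the embedded quartet topology as a set of two pairs, or None if unresolved."""
    q = set(quartet)
    for side in splits:
        inside = q & set(side)
        if len(inside) == 2:  # this branch separates two quartet members from the other two
            return frozenset([frozenset(inside), frozenset(q - inside)])
    return None


def quartet_support(families):
    """families: {name: (taxa in the family, list of bootstrap trees given as split lists)}."""
    support = {}
    for family, (taxa, bootstrap_trees) in families.items():   # all gene families
        for quartet in combinations(sorted(taxa), 4):           # all embedded quartets
            counts = Counter()
            for splits in bootstrap_trees:                       # all bootstrap samples
                topology = quartet_topology(splits, quartet)
                if topology is not None:
                    counts[topology] += 1
            support[(family, quartet)] = {
                t: n / len(bootstrap_trees) for t, n in counts.items()
            }
    return support


# Hypothetical family with four genomes and three bootstrap trees.
families = {
    "family_1": (
        ["g1", "g4", "g9", "g10"],
        [[{"g1", "g4"}], [{"g1", "g4"}], [{"g1", "g9"}]],
    )
}
result = quartet_support(families)
for topology, value in result[("family_1", ("g1", "g10", "g4", "g9"))].items():
    print(sorted(sorted(pair) for pair in topology), round(value, 2))
```

In this toy input, two of the three bootstrap samples support the pairing of g1 with g4, so that topology receives a support value of about 0.67.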

Effective population size about 10^13; 2*Ne generations >> 10 billion years.
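A back-of-the-envelope check of this claim, as a sketch only. The generation time is not given on the slide; one generation per day is an assumption made purely for illustration.

```python
N_E = 1e13                     # effective population size from the slide
GENERATIONS_PER_YEAR = 365.25  # ASSUMPTION: one generation per day (not from the slide)

coalescence_generations = 2 * N_E
coalescence_years = coalescence_generations / GENERATIONS_PER_YEAR

print(f"{coalescence_generations:.0e} generations")
print(f"{coalescence_years:.1e} years")
print(coalescence_years > 10e9)  # compare with 10 billion years
```

Under that assumed generation time, 2*Ne generations corresponds to roughly 5 x 10^10 years, which is longer than 10 billion years; slower generation times make the gap larger.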

Illustration of one component of a quartet spectral analysis: summary of phylogenetic information for one genome quartet for all gene families. [Figure legend: total number of gene families containing the species quartet; number of gene families supporting the same topology as the plurality (colored according to bootstrap support level); number of gene families supporting one of the two alternative quartet topologies.]
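The summary for a single genome quartet can be tallied as in the sketch below. It assumes each gene family has already been assigned one of the three quartet topologies (here written as strings such as "AB|CD"); the input format and function name are assumptions for this example, and bootstrap support levels are left out for brevity.

```python
from collections import Counter


def quartet_spectrum(family_calls):
    """Summarize one genome quartet across gene families.

    family_calls: list with one topology label per gene family that
    contains all four genomes of the quartet.
    """
    counts = Counter(family_calls)
    total = sum(counts.values())
    plurality, supporting = counts.most_common(1)[0]
    return {
        "total": total,                       # families containing the quartet
        "plurality": plurality,               # most frequently supported topology
        "supporting": supporting,             # families agreeing with the plurality
        "alternative": total - supporting,    # families supporting the other two topologies
    }


calls = ["AB|CD"] * 7 + ["AC|BD"] * 2 + ["AD|BC"]  # hypothetical calls for 10 families
print(quartet_spectrum(calls))
```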

Quartet decomposition analysis of 19 Prochlorococcus and marine Synechococcus genomes. Quartets with a very short internal branch or very long external branches, as well as those resolved by less than 30% of gene families, were excluded from the analyses to minimize artifacts of phylogenetic reconstruction.

Plurality consensus calculated as supertree (MRP) from quartets in the plurality topology.
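One common way to turn quartets into an MRP (matrix representation with parsimony) matrix is sketched here: every quartet becomes one column, with "1" for the two taxa on one side of the internal branch, "0" for the two on the other side, and "?" for taxa absent from the quartet. This is an illustrative encoding with hypothetical names, not necessarily the exact coding used for the analysis on the slide; the resulting matrix would then be analyzed with a parsimony program.

```python
def mrp_matrix(quartets, taxa):
    """Encode quartets ((a, b), (c, d)) as MRP columns.

    Returns {taxon: string of '1', '0', '?'} with one character per quartet.
    """
    rows = {t: [] for t in taxa}
    for left, right in quartets:
        for t in taxa:
            if t in left:
                rows[t].append("1")
            elif t in right:
                rows[t].append("0")
            else:
                rows[t].append("?")  # taxon not part of this quartet
    return {t: "".join(chars) for t, chars in rows.items()}


taxa = ["A", "B", "C", "D", "E"]  # hypothetical taxa
quartets = [(("A", "B"), ("C", "D")), (("A", "B"), ("C", "E"))]
for taxon, row in mrp_matrix(quartets, taxa).items():
    print(taxon, row)
```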

NeighborNet (calculated with SplitsTree 4.0) Plurality neighbor-net calculated as supertree (from the MRP matrix using SplitsTree 4.0) from all quartets significantly supported by all individual gene families (1812) without in-paralogs.

From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005 May;6(5):361-75.

Supertree vs. Supermatrix. Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree. From: Alan de Queiroz, John Gatesy: The supermatrix approach to systematics. Trends Ecol Evol. 2007 Jan;22(1):34-41
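The supermatrix side of the comparison amounts to concatenating the individual alignments into one long alignment before a single tree is calculated. A minimal sketch, assuming each data set is a dictionary of equal-length aligned sequences and that taxa missing from a data set are padded with "?" (names and sequences are invented for the example):

```python
def concatenate(alignments, taxa, missing="?"):
    """Concatenate alignments ({taxon: aligned sequence}) into one supermatrix."""
    supermatrix = {t: "" for t in taxa}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))  # all rows share one length
        for t in taxa:
            supermatrix[t] += alignment.get(t, missing * length)
    return supermatrix


taxa = ["A", "B", "C"]
gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACTA"}
gene2 = {"A": "GG", "C": "GC"}  # taxon B lacks this gene
print(concatenate([gene1, gene2], taxa))
```

The supertree side instead calculates one tree per data set and combines the trees afterwards, which is why the two approaches can disagree on clades such as C+D.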

A) Template tree. B) Generate 100 datasets using Evolver with a certain amount of HGTs. C) Calculate 1 tree using the concatenated dataset, or 100 individual trees. D) Calculate a quartet-based tree using Quartet Suite. Repeated 100 times…

Supermatrix versus quartet-based supertree. Inset: simulated phylogeny.

Note: using the same genome seed random number will reproduce the same genome history. From: Lapierre P, Lasek-Nesselquist E, and Gogarten JP (2012) The impact of HGT on phylogenomic reconstruction methods. Brief Bioinform [first published online August 20, 2012] doi:10.1093/bib/bbs050

HGT EvolSimulator Results

See http://bib.oxfordjournals.org/content/15/1/79.full for more information. What is the bottom line?

Johann Heinrich Füssli: Odysseus vor Scilla und Charybdis (Odysseus facing Scylla and Charybdis). From: http://en.wikipedia.org/wiki/File:Johann_Heinrich_F%C3%BCssli_054.jpg

Evolution of the Holobiont. Holobiont: host + all its symbionts (mutualistic, commensal, parasitic). Microbiome: sum of all genes contained in the symbionts. Microbiota: sum of all symbiotic organisms. Hologenome: microbiome + host genome. Selection acts on the holobiont. The holobiont can adapt by changing its symbionts. To what extent do examples of holobiont evolution represent evolution by natural selection, Lamarckian evolution, or constructive neutral evolution?

Holobiont evolution – case A: Bacterial parasites on seaweed -> HGT -> Human gut symbiont

Holobiont Evolution – case B: Hygiene / “old friends” hypothesis; coevolution / arms race between immune system and parasites. Parasite: survive in host -> minimize host’s immune response -> produce immune-response-modulating substances. Host: keep immune system effective -> increase immune response to remain effective in the presence of the parasites’ (or symbionts’) immune-system-modulating influence. Without the parasite/symbiont, the immune system overreacts.

Examples: B1 is an ortholog to C1 and to A1. C2 is a paralog to C3 and to B1; BUT A1 is an ortholog to both B1 and B2, and to C1, C2, and C3. From: Walter Fitch (2000): Homology: a personal view on some of the problems, TIG 16 (5) 227-231
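These relationships follow from one rule: two genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The sketch below applies that rule to a small gene tree. The tree itself is an assumption, reconstructed so that it reproduces the statements on the slide; it is not copied from Fitch's figure.

```python
# Gene tree as nested tuples (event, left, right); "S" = speciation, "D" = duplication.
# ASSUMED topology, chosen to be consistent with the relationships listed on the slide.
TREE = ("S", "A1", ("D", ("S", "B1", "C1"), ("S", "B2", ("D", "C2", "C3"))))


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[1]) | leaves(node[2])


def relationship(node, x, y):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {x, y} <= leaves(child):
            return relationship(child, x, y)  # both genes sit deeper in this subtree
    return "ortholog" if event == "S" else "paralog"


for pair in [("B1", "C1"), ("B1", "A1"), ("C2", "C3"), ("C2", "B1"), ("A1", "C3")]:
    print(pair, relationship(TREE, *pair))
```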

Types of Paralogs: In- and Outparalogs. “… all genes in the HA* set are co-orthologous to all genes in the WA* set. The genes HA* are hence ‘inparalogs’ to each other when comparing human to worm. By contrast, the genes HB and HA* are ‘outparalogs’ when comparing human with worm. However, HB and HA*, and WB and WA* are inparalogs when comparing with yeast, because the animal–yeast split pre-dates the HA*–HB duplication.” From: Sonnhammer and Koonin: Orthology, paralogy and proposed classification for paralog subtypes, TIG 18 (12) 2002, 619-620

Selection of Orthologous Gene Families. All automated methods for assembling sets of orthologous genes are based on sequence similarities: BLAST hits; triangular circular significant BLAST hits (COG, or Clusters of Orthologous Groups); sequence identity of 30% and greater (SCOP database); similarity complemented by HMM-profile analysis (Pfam database); reciprocal BLAST hit method.

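Strict Reciprocal BLAST Hit Method. [Figure: two panels showing genes from genomes 1 to 4 connected by reciprocal best BLAST hits; one panel, which includes an additional paralog 2’, is labeled "0 gene family", the other is labeled "1 gene family".] The method often fails in the presence of paralogs.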
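The failure mode can be made concrete with a small sketch. It assumes the best BLAST hit of every gene in every other genome is already known and stored in a dictionary; a family is accepted only if it has one gene per genome and every pair of members are each other's best hit. All gene names, the data layout, and the function name are hypothetical.

```python
from itertools import combinations


def strict_rbh_family(seed, genomes, genome_of, best_hit):
    """Return {genome: gene} if all members are mutual best hits, else None.

    best_hit maps (gene, target genome) -> best-scoring gene in that genome.
    """
    family = {genome_of[seed]: seed}
    for genome in genomes:
        if genome != genome_of[seed]:
            hit = best_hit.get((seed, genome))
            if hit is None:
                return None
            family[genome] = hit
    for a, b in combinations(family.values(), 2):
        # Every pair must point at each other as best hit.
        if best_hit.get((a, genome_of[b])) != b or best_hit.get((b, genome_of[a])) != a:
            return None
    return family


genes = {"G1": "a1", "G2": "a2", "G3": "a3", "G4": "a4"}
genome_of = {gene: genome for genome, gene in genes.items()}
best_hit = {(x, genome_of[y]): y for x in genes.values() for y in genes.values() if x != y}
print(strict_rbh_family("a1", list(genes), genome_of, best_hit))  # one complete family

# Add a paralog a2p in genome G2 that is the best hit of a1 only.
genome_of["a2p"] = "G2"
best_hit[("a1", "G2")] = "a2p"
best_hit[("a2p", "G1")] = "a1"
print(strict_rbh_family("a1", list(genes), genome_of, best_hit))  # None: family is lost
```

A single paralog that attracts the best hit from one genome is enough to break the all-against-all reciprocity, so the whole family is discarded.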

Families of ATP-synthases. [Figure: phylogenetic tree of ATP-A, ATP-B, and ATP-F sequences from Sulfolobus solfataricus, Methanosarcina mazei, Bacillus subtilis, and Escherichia coli, with the family of ATP-A, the family of ATP-B, and the family of ATP-F marked.]

BranchClust Algorithm. [Figure: a dataset of N genomes (genome 1, genome 2, genome 3, ..., genome i, ..., genome N) is searched with BLAST; the hits are used to build a superfamily tree.] www.bioinformatics.org/branchclust

BranchClust Algorithm www.bioinformatics.org/branchclust

BranchClust Algorithm: Data Flow. 1. Download n complete genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria) in fasta format (*.faa). 2. Put all n genomes in one database. 3. Search all ORFs against the database consisting of n genomes. 4. Parse the BLAST output with the requirement that all members of a superfamily should have an E-value better than a cut-off; this yields the superfamilies. 5. Align with ClustalW. 6. Reconstruct the superfamily tree (ClustalW quick distance method, or Phyml maximum likelihood). 7. Parse with BranchClust; this yields the gene families.
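The BLAST-parsing step can be illustrated with a short sketch. It is not the BranchClust Perl code: it assumes tab-separated BLAST output in the standard 12-column tabular layout (query, subject, ..., E-value in column 11, bit score in column 12), treats each query plus its accepted hits as one superfamily, and uses an arbitrary example cut-off. The function name and sequence identifiers are made up.

```python
from collections import defaultdict


def superfamilies_from_blast(tabular_text, evalue_cutoff=1e-4):
    """Group each query with all hits whose E-value is at or below the cut-off.

    The default cut-off is an arbitrary example value, not taken from the slides.
    """
    members = defaultdict(set)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= evalue_cutoff:
            members[query].update({query, subject})
    return {query: sorted(group) for query, group in members.items()}


# Hypothetical tabular BLAST output for one query ORF.
example = "\n".join([
    "orf1\torf1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "orf1\torf7\t65.0\t290\t100\t2\t5\t295\t3\t292\t2e-80\t250",
    "orf1\torf9\t28.0\t80\t55\t3\t10\t90\t40\t120\t0.5\t30",
])
print(superfamilies_from_blast(example))  # {'orf1': ['orf1', 'orf7']}
```

Each resulting superfamily would then be aligned and turned into a tree before BranchClust splits it into gene families.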

BranchClust Algorithm: Implementation and Usage. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at http://bioinformatics.org/branchclust. Required: 1. BioPerl module for parsing trees (Bio::TreeIO). 2. The taxa recognition file gi_numbers.out must be present in the current directory. For information on how to create this file, read the Taxa recognition file section on the web-site. 3. Blastall from NCBI needs to be installed.