MCB 3421 class 26.


Student evaluations: Please go to HuskyCT and complete the student evaluations! Current count: Friday morning: 3; Friday afternoon: 4.

[Figure: UNC reads and Edinburgh reads, both mapped on the UNC assembly.]

Decomposition of Phylogenetic Data: Phylogenetic information is present in genomes. Break this information into small quanta of information (bipartitions or embedded quartets). Analyze the spectra to detect transferred genes and the plurality consensus.

BIPARTITION OF A PHYLOGENETIC TREE. Bipartition (or split): a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups. [Figure: two example bipartitions in star/dot notation. "Yellow vs. Rest" (labeled 95) is compatible with the illustrated bipartition; "Orange vs. Rest" is incompatible with the illustrated bipartition.]
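The compatibility test mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it relies on the standard criterion that two bipartitions of the same taxon set are compatible when at least one of the four pairwise intersections of their sides is empty. The taxon names and the helper names (`compatible`, `from_star_dot`) are made up for the example.

```python
def from_star_dot(pattern, taxa):
    """Read a star/dot string: taxa marked '*' form one side of the split."""
    return frozenset(t for t, mark in zip(taxa, pattern) if mark == "*")


def compatible(side1, side2, taxa):
    """Two splits can sit on the same tree if at least one of the four
    pairwise intersections of their sides is empty."""
    all_taxa = frozenset(taxa)
    a, b = frozenset(side1), all_taxa - frozenset(side1)
    c, d = frozenset(side2), all_taxa - frozenset(side2)
    return any(not (x & y) for x in (a, b) for y in (c, d))


# Hypothetical eight-taxon example (not the taxa shown on the slide).
taxa = list("ABCDEFGH")
split_1 = from_star_dot("**......", taxa)  # AB | CDEFGH
split_2 = from_star_dot("***.....", taxa)  # ABC | DEFGH
split_3 = from_star_dot(".**.....", taxa)  # BC | ADEFGH

print(compatible(split_1, split_2, taxa))
print(compatible(split_1, split_3, taxa))
```

Here `split_1` and `split_2` are nested, so both can be branches of one tree, whereas `split_1` and `split_3` each claim taxon B for a different group and cannot coexist.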

“Lento”-plot of 34 supported bipartitions (out of 4082 possible) for 13 gamma-proteobacterial genomes (258 putative orthologs): E. coli, Buchnera, Haemophilus, Pasteurella, Salmonella, Yersinia pestis (2 strains), Vibrio, Xanthomonas (2 sp.), Pseudomonas, Wigglesworthia. There are 13,749,310,575 possible unrooted tree topologies for 13 genomes.

“Lento”-plot of supported bipartitions (out of 501 possible) for 10 cyanobacteria: Anabaena, Trichodesmium, Synechocystis sp., Prochlorococcus marinus (3 strains), marine Synechococcus, Thermosynechococcus elongatus, Gloeobacter, Nostoc punctiforme. Based on 678 sets of orthologous genes. [Axis label: number of datasets.] Zhaxybayeva, Lapierre and Gogarten, Trends in Genetics, 2004, 20(5): 254-260.
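The counts quoted on the two Lento-plot slides follow from standard combinatorial formulas, which the sketch below reproduces. The function names are illustrative. For n taxa there are 2^(n-1) - n - 1 non-trivial bipartitions, and (2n-5)!! unrooted, fully resolved tree topologies.

```python
def n_nontrivial_bipartitions(n_taxa):
    """Count splits with at least two taxa on each side.

    There are 2**(n-1) unordered ways to divide n taxa into two sides;
    subtract the one split with an empty side and the n splits that
    separate a single taxon from the rest.
    """
    return 2 ** (n_taxa - 1) - n_taxa - 1


def n_unrooted_topologies(n_taxa):
    """Count unrooted bifurcating topologies: (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


print(n_nontrivial_bipartitions(13))  # 13 gamma-proteobacterial genomes
print(n_nontrivial_bipartitions(10))  # 10 cyanobacterial genomes
print(n_unrooted_topologies(13))
```

The contrast between the two numbers is the motivation for the bipartition spectrum: a few thousand possible splits can be tabulated exhaustively, while the billions of possible trees cannot.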

[Figure: test tree topologies containing taxa A, B, C and D; panels labeled N=4(0), N=5(1), N=8(4), N=13(9), N=23(19) and N=53(49); scale bars 0.01.] From: Mao F, Williams D, Zhaxybayeva O, Poptsova M, Lapierre P, Gogarten JP, Xu Y (2012) BMC Bioinformatics 13:123, doi:10.1186/1471-2105-13-123

Methodology: Start from an input tree. Repeat 100 times: simulate aligned AA sequences (200, 500 and 1000 AA) with Seq-Gen (WAG, Cat=4, Alpha=1); generate 100 bootstraps with Seqboot; calculate ML trees with FastTree (WAG, Cat=4). Then (a) run Consense, extract the bipartitions, and extract the highest bootstrap support separating AB><CD; and (b) for each of the individual trees, count how many trees support the embedded quartet AB><CD.
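The two quantities compared in this methodology can be illustrated with a small sketch. It is not the pipeline used in the study (which runs Seq-Gen, Seqboot, FastTree and Consense). It assumes that each bootstrap tree has already been reduced to a list of its internal splits, and the toy trees with extra taxa X and Y are invented for the example.

```python
from collections import Counter


def canonical(split, taxa):
    """Name a split by the side that lacks the alphabetically first taxon."""
    side = frozenset(split)
    return frozenset(taxa) - side if min(taxa) in side else side


def separates(split, taxa, pair1, pair2):
    """True if pair1 lies entirely on one side and pair2 on the other."""
    side = frozenset(split)
    other = frozenset(taxa) - side
    p1, p2 = frozenset(pair1), frozenset(pair2)
    return (p1 <= side and p2 <= other) or (p1 <= other and p2 <= side)


def max_bipartition_support(trees, taxa, pair1, pair2):
    """Highest bootstrap frequency of any single split separating the pairs."""
    counts = Counter()
    for tree in trees:
        counts.update({canonical(s, taxa) for s in tree})
    hits = [n for s, n in counts.items() if separates(s, taxa, pair1, pair2)]
    return max(hits, default=0) / len(trees)


def quartet_support(trees, taxa, pair1, pair2):
    """Fraction of trees in which some split separates the two pairs."""
    hits = sum(any(separates(s, taxa, pair1, pair2) for s in tree) for tree in trees)
    return hits / len(trees)


# Four hypothetical bootstrap trees on six taxa, given as internal splits.
taxa = {"A", "B", "C", "D", "X", "Y"}
toy_trees = [
    [{"A", "X"}, {"A", "B", "X"}, {"C", "Y"}],
    [{"A", "Y"}, {"A", "B", "Y"}, {"C", "X"}],
    [{"A", "B"}, {"X", "Y"}, {"C", "D"}],
    [{"A", "C"}, {"A", "C", "X"}, {"B", "D"}],
]
bip = max_bipartition_support(toy_trees, taxa, ("A", "B"), ("C", "D"))
quart = quartet_support(toy_trees, taxa, ("A", "B"), ("C", "D"))
print(bip, quart)
```

In this toy set, three of the four trees separate A and B from C and D, but each does so with a different split because X and Y move around. The best single bipartition therefore has lower support than the embedded quartet.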

Results: [Figure panels: maximum bootstrap support value for the bipartition separating (AB) and (CD); maximum bootstrap support value for the embedded quartet (AB),(CD).]

Bootstrap support values for embedded quartets. Legend: a tree calculated from one pseudo-sample generated by bootstrapping from an alignment of one gene family present in 11 genomes; the embedded quartet for genomes 1, 4, 9, and 10. This bootstrap sample supports the topology ((1,4),9,10). [Figure: the three possible topologies for the quartet of genomes 1, 4, 9, and 10.] Zhaxybayeva et al. 2006, Genome Research, 16(9):1099-108. Quartet spectral analysis of genomes iterates over three loops: repeat for all bootstrap samples; repeat for all possible embedded quartets; repeat for all gene families.
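The three nested loops can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: bootstrap trees are assumed to be available as lists of internal splits, and the input layout (`families` mapping a family name to its taxa and bootstrap trees) and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def resolved_topology(tree_splits, taxa, quartet):
    """Return the quartet topology displayed by a tree, or None if unresolved."""
    a, b, c, d = quartet
    all_taxa = frozenset(taxa)
    for pair1, pair2 in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        p1, p2 = frozenset(pair1), frozenset(pair2)
        for split in tree_splits:
            side = frozenset(split)
            other = all_taxa - side
            if (p1 <= side and p2 <= other) or (p1 <= other and p2 <= side):
                return (pair1, pair2)
    return None


def quartet_spectrum(families):
    """families: {name: (taxa, [bootstrap trees, each a list of splits])}."""
    spectrum = {}
    for name, (taxa, samples) in families.items():       # all gene families
        for quartet in combinations(sorted(taxa), 4):    # all embedded quartets
            votes = Counter()
            for tree in samples:                         # all bootstrap samples
                topology = resolved_topology(tree, taxa, quartet)
                if topology is not None:
                    votes[topology] += 1
            spectrum[(name, quartet)] = {t: n / len(samples) for t, n in votes.items()}
    return spectrum


# One hypothetical gene family with five genomes and three bootstrap trees.
families = {
    "fam1": ({1, 2, 4, 9, 10}, [
        [{1, 4}, {9, 10}],
        [{1, 4}, {2, 9}],
        [{1, 9}, {4, 10}],
    ]),
}
spectrum = quartet_spectrum(families)
print(spectrum[("fam1", (1, 4, 9, 10))])
print(comb(11, 4))  # embedded quartets in a family present in 11 genomes
```

Each quartet has only three possible unrooted topologies, so the bootstrap support for a quartet is simply the fraction of pseudo-sample trees that display each of them.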

Illustration of one component of a quartet spectral analysis: summary of phylogenetic information for one genome quartet for all gene families. Shown are the total number of gene families containing the species quartet; the number of gene families supporting the same topology as the plurality (colored according to bootstrap support level); and the number of gene families supporting one of the two alternative quartet topologies.
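The summary for a single genome quartet can be sketched as a simple tally. The input format, the topology labels and the bootstrap bins (90, 70, 0) are assumptions made for this illustration. The slide does not state which support levels were used for coloring.

```python
from collections import Counter


def summarize_quartet(calls, bins=(90, 70, 0)):
    """Tally gene families for one genome quartet.

    calls: {family: (topology_label, bootstrap_support)} for every gene
    family that contains the quartet. bins are hypothetical lower bounds
    for the support levels, listed from highest to lowest.
    """
    votes = Counter(topology for topology, _ in calls.values())
    plurality, n_plurality = votes.most_common(1)[0]
    by_level = Counter()
    for topology, bootstrap in calls.values():
        if topology == plurality:
            level = next(b for b in bins if bootstrap >= b)
            by_level[level] += 1
    return {
        "total": len(calls),
        "plurality": plurality,
        "supporting": n_plurality,
        "by_support_level": dict(by_level),
        "alternatives": len(calls) - n_plurality,
    }


# Hypothetical calls for one quartet across five gene families.
calls = {
    "fam1": ("12|34", 95),
    "fam2": ("12|34", 75),
    "fam3": ("13|24", 60),
    "fam4": ("12|34", 40),
    "fam5": ("14|23", 92),
}
summary = summarize_quartet(calls)
print(summary)
```

A family such as `fam5`, which supports an alternative topology with high bootstrap support, is the kind of signal that is examined as a candidate for gene transfer.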

Quartet decomposition analysis of 19 Prochlorococcus and marine Synechococcus genomes. Quartets with a very short internal branch or very long external branches, as well as those resolved by less than 30% of gene families, were excluded from the analyses to minimize artifacts of phylogenetic reconstruction.

Plurality consensus calculated as supertree (MRP) from quartets in the plurality topology.
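The matrix-representation step of MRP can be sketched as below. It assumes the usual coding, in which each source grouping becomes one binary character, with "?" for taxa absent from that source tree. For an unrooted quartet the choice of which pair receives "1" is arbitrary. The function name and the example quartets are hypothetical, and the parsimony search on the resulting matrix is not shown.

```python
def mrp_matrix(quartets, taxa):
    """Code each quartet ((a, b), (c, d)) as one matrix column.

    Taxa a and b get '1', taxa c and d get '0', and every taxon that is
    not part of the quartet gets '?' (missing data).
    """
    rows = {taxon: [] for taxon in taxa}
    for (a, b), (c, d) in quartets:
        for taxon in taxa:
            if taxon in (a, b):
                rows[taxon].append("1")
            elif taxon in (c, d):
                rows[taxon].append("0")
            else:
                rows[taxon].append("?")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Three hypothetical plurality quartets on five genomes.
taxa = ["A", "B", "C", "D", "E"]
plurality_quartets = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("D", "E")),
    (("A", "C"), ("D", "E")),
]
matrix = mrp_matrix(plurality_quartets, taxa)
for taxon, row in matrix.items():
    print(taxon, row)
```

The resulting character matrix is what a parsimony program (or, for a network, SplitsTree) would take as input to assemble the quartets into a supertree.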

NeighborNet (calculated with SplitsTree 4.0): plurality neighbor-net calculated as supertree (from the MRP matrix using SplitsTree 4.0) from all quartets significantly supported by all individual gene families (1812) without in-paralogs.

From: Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005 May;6(5):361-75.

Supertree vs. Supermatrix. From: Alan de Queiroz and John Gatesy, The supermatrix approach to systematics, Trends Ecol Evol. 2007 Jan;22(1):34-41. Schematic of MRP supertree (left) and parsimony supermatrix (right) approaches to the analysis of three data sets. Clade C+D is supported by all three separate data sets, but not by the supermatrix. Synapomorphies for clade C+D are highlighted in pink. Clade A+B+C is not supported by separate analyses of the three data sets, but is supported by the supermatrix. Synapomorphies for clade A+B+C are highlighted in blue. E is the outgroup used to root the tree.
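For contrast with the supertree route, the first step of the supermatrix approach, concatenating the separate data sets into one alignment, can be sketched as follows. The sequences and the function name are invented, and padding absent taxa with "?" is an assumption of this example.

```python
def build_supermatrix(alignments, missing="?"):
    """Concatenate per-gene alignments (dicts of taxon -> aligned sequence).

    A taxon absent from a gene is padded with the missing-data symbol so
    that all rows of the supermatrix have the same length.
    """
    taxa = sorted({taxon for alignment in alignments for taxon in alignment})
    rows = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Two hypothetical gene alignments with partially overlapping taxa.
gene1 = {"A": "ACGT", "B": "ACGA", "C": "ACTA"}
gene2 = {"A": "GG", "C": "GA", "D": "TA"}
supermatrix = build_supermatrix([gene1, gene2])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

A single tree is then inferred from the concatenated matrix, which is why strongly supported characters in one data set can override a grouping that every separate analysis recovers.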

Johann Heinrich Füssli, Odysseus vor Scilla und Charybdis. From: http://en.wikipedia.org/wiki/File:Johann_Heinrich_F%C3%BCssli_054.jpg

A) Template tree. B) Generate 100 datasets using Evolver with a certain amount of HGTs. C) Calculate 1 tree using the concatenated dataset, or 100 individual trees. D) Calculate a quartet-based tree using Quartet Suite. Repeated 100 times…

Supermatrix versus quartet-based supertree (inset: simulated phylogeny)

Note: Using the same genome seed random number will reproduce the same genome history. From: Lapierre P, Lasek-Nesselquist E, and Gogarten JP (2012) The impact of HGT on phylogenomic reconstruction methods. Brief Bioinform [first published online August 20, 2012] doi:10.1093/bib/bbs050

HGT EvolSimulator Results

See http://bib.oxfordjournals.org/content/15/1/79.full for more information.

Examples: B1 is an ortholog to C1 and to A1. C2 is a paralog to C3 and to B1; BUT A1 is an ortholog to both B1 and B2, and to C1, C2, and C3. From: Walter Fitch (2000): Homology: a personal view on some of the problems, TIG 16 (5) 227-231
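These examples follow from the rule that two genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The sketch below applies that rule to a small gene tree. The tree itself is an assumption: it was reconstructed to be consistent with the statements on this slide and is not copied from Fitch's figure. The function names are illustrative.

```python
def leaves(node):
    """Return the set of gene names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, gene1, gene2):
    """Return the event label at the last common ancestor of two distinct genes."""
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {gene1, gene2} <= leaves(child):
            return lca_event(child, gene1, gene2)
    return event


def relationship(tree, gene1, gene2):
    """Classify a gene pair by the event at its last common ancestor."""
    return "ortholog" if lca_event(tree, gene1, gene2) == "speciation" else "paralog"


# Assumed gene tree: internal nodes are (event, left subtree, right subtree).
gene_tree = (
    "speciation", "A1",
    ("duplication",
     ("speciation", "B1", "C1"),
     ("speciation", "B2", ("duplication", "C2", "C3"))),
)

for pair in [("B1", "C1"), ("B1", "A1"), ("C2", "C3"), ("C2", "B1"), ("A1", "C3")]:
    print(pair, relationship(gene_tree, *pair))
```

Because the lineage leading to A1 split off before any duplication, A1 is orthologous to every other gene in the tree, even though those genes are paralogs of one another.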

Types of Paralogs: In- and Outparalogs. "… all genes in the HA* set are co-orthologous to all genes in the WA* set. The genes HA* are hence ‘inparalogs’ to each other when comparing human to worm. By contrast, the genes HB and HA* are ‘outparalogs’ when comparing human with worm. However, HB and HA*, and WB and WA* are inparalogs when comparing with yeast, because the animal–yeast split pre-dates the HA*–HB duplication." From: Sonnhammer and Koonin: Orthology, paralogy and proposed classification for paralog subtypes, TIG 18 (12) 2002, 619-620