The Cobweb of Life Revealed by Genome-Scale Estimates of Horizontal Gene Transfer
Fan Ge, Li-San Wang, Junhyong Kim
Mourya Vardhan



Outline
- Controversy: the extent to which HGT affects the core genealogical history
- Examination of this controversy by assessing the extent of HGT among core orthologous genes
- A novel statistical method to assess the extent of HGT based on comparisons of tree topology

Introduction
- Horizontal gene transfer (HGT) refers to the transfer of genes between organisms in a manner other than traditional reproduction.
- Whole-genome analyses of different prokaryotes have been thought to indicate rampant HGT.
- There is an ongoing debate over the estimation of HGT frequency and its impact on phylogeny.
- Inference of HGT from tree comparisons should be done under a proper statistical framework.

Methodology to assess the extent
- New method that explicitly tests for phylogenetic incongruence due to horizontal transfer versus statistical tree errors
- Used Clusters of Orthologous Groups (COG) from the NCBI databases
- Extracted the most reliable COGs
- Built a gene tree for every COG and integrated them to construct the whole-genome (W-G) tree
- Compared each gene tree with the W-G tree to infer significant HGT
- Augmented this method with pairwise comparisons of gene trees to detect conflicts

High-Quality Gene Groups and the W-G Tree
- The COG database is built by redoing sequence comparisons over 43 genomes
- This resulted in the retention of 297 high-quality COG entries out of 3852
- To approximate the W-G tree, they used a median tree estimator
- The estimate used bootstrap values from bootstrap sampling

Detection of HGT events
- HGTs are inferred by comparing estimated trees against other gene trees or against trees that represent the history of genomes
- Discrepancies between the trees may be caused by HGT or by other errors
- Distance metrics are used to test for discrepancies
- As an additional precaution, the paper explicitly asks whether the discrepancies are caused by HGT events

Comparison Metrics
- Maximum agreement subtree (MAST): two trees that differ in some branches still share a common subtree; the MAST metric bounds the size of that shared subtree
- Symmetric difference (SD): counts the branch splits that appear in one tree but not in the other
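The SD metric can be illustrated with a minimal sketch. It assumes trees are written as nested Python tuples of taxon names (a representation chosen here for illustration, not the one used in the paper), and the helper names `leaf_sets`, `splits`, and `symmetric_difference` are hypothetical.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of all internal nodes at or below it)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def splits(tree):
    """Non-trivial branch splits of the tree, treated as unrooted."""
    all_leaves, clades = leaf_sets(tree)
    anchor = min(all_leaves)  # always record the side that excludes this taxon
    found = set()
    for clade in clades:
        side = clade if anchor not in clade else all_leaves - clade
        # A split with 0, 1, or n-1 taxa on the recorded side carries no information.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
    return found


def symmetric_difference(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(tree_a, tree_b))
```

Both example trees must contain the same taxa for the comparison to be meaningful; the sketch does not check this.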

Interpretation of HGT events
- Case 1: If both MAST and SD are low, the trees are most likely not different
- Case 2: If both metrics are large, the cause can be either HGT events or errors
- Case 3: If SD is large and MAST is low, the cause is most likely an HGT event
- Case 4: Large MAST with low SD cannot occur, for algorithmic reasons

SD and MAST scores for Gene Tree 1 and the W-G tree are 2 and 2, while the scores for Gene Tree 2 and the W-G tree are 8 and 2
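The four cases can be written as a small decision rule, applied here to the two score pairs quoted above. This is only an illustration: the slides give no numeric boundary between "low" and "large", so the cutoffs of 4 and the function name `interpret` are hypothetical.

```python
def interpret(sd, mast, sd_cutoff=4, mast_cutoff=4):
    """Map a pair of scores to one of the four cases; the cutoffs are hypothetical."""
    sd_large = sd >= sd_cutoff
    mast_large = mast >= mast_cutoff
    if not sd_large and not mast_large:
        return "Case 1: trees most likely not different"
    if sd_large and mast_large:
        return "Case 2: HGT events or errors"
    if sd_large and not mast_large:
        return "Case 3: most likely an HGT event"
    return "Case 4: should not occur"


print(interpret(2, 2))  # Gene Tree 1 versus the W-G tree
print(interpret(8, 2))  # Gene Tree 2 versus the W-G tree
```

The actual method does not rely on fixed cutoffs; it tests the difference of the two metrics against a null distribution.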

The Hypothesis Test
- Test statistic Ɣ: the difference of the two metrics
- The null distribution is generated by bootstrapping gene trees
- HGT was inferred when the observed Ɣ was significant, with a p-value below the 5% level
- Simulation studies applied to each COG showed how often the test detects HGT events in a COG tree at the 5% significance level
[Table: HGT events vs. detection rates]
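The decision step of such a test can be sketched as an empirical p-value. The sketch assumes a one-sided test in which larger values of the statistic count as more extreme, and it takes the bootstrap null values as given; the function name and the toy numbers are hypothetical and do not come from the paper.

```python
def empirical_p_value(observed, null_values):
    """Fraction of null statistics that are at least as large as the observed one."""
    at_least_as_large = sum(1 for value in null_values if value >= observed)
    return at_least_as_large / len(null_values)


# Toy null distribution standing in for statistics computed on bootstrapped gene trees.
null_values = list(range(100))
p_value = empirical_p_value(97, null_values)
print(p_value, p_value < 0.05)
```

With real data, each entry of `null_values` would be the statistic recomputed on one bootstrap replicate.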

- d_s is the SD metric
- d_m is the MAST metric
- m, n are the numbers of branch splits
- X is the number of taxa
- PAUP software was used for the calculation

HGT Estimation via Comparisons between Each Gene Tree and the W-G Tree
- The hypothesis test was applied to each COG
- The results of the test did not vary significantly with the p-value level
- At the 5% level, 33/297 (11.1%) COGs showed putative HGTs
- These COGs are termed hCOGs

The Relationship between Detecting COG entries with HGT and the p-Values

HGT Estimation via Comparisons among Gene Trees
- The problem with comparing each gene tree with the W-G tree is that the results are sensitive to the W-G tree
- COG entries do not all share the same taxa
- If a COG is an hCOG, it should test as different in all of its comparisons
- 14,004 pairs of gene trees that shared at least six taxa were compared
- At the 5% level, 1,764/14,004 (12.6%) pairs were significant

Identification of transferred branches in gene trees
- For each COG that tested positive for HGT events, transferred branches were found by exhaustive enumeration of possible subtree matches
- All combinations of branch prunings were searched to find the "troublesome" branches
- If there is only one way to prune that makes the trees congruent, it is an HGT event
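The exhaustive search can be sketched as follows, under simplifying assumptions that are not from the paper: trees are rooted and written as nested tuples of taxon names, only single taxa (not whole branches) are pruned, and two pruned trees count as congruent when their clade sets match. All function names are hypothetical.

```python
from itertools import combinations


def leaves_of(tree):
    """All taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves_of(child) for child in tree))


def clades(tree, keep):
    """Clades (leaf sets with two or more taxa) after restricting the tree to `keep`."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        under = frozenset().union(*(walk(child) for child in node))
        if len(under) > 1:
            found.add(under)
        return under

    walk(tree)
    return found


def minimal_prunings(tree_a, tree_b):
    """Smallest sets of taxa whose removal makes the two trees congruent."""
    taxa = leaves_of(tree_a)
    for size in range(len(taxa) + 1):
        hits = []
        for drop in combinations(sorted(taxa), size):
            keep = taxa - set(drop)
            if clades(tree_a, keep) == clades(tree_b, keep):
                hits.append(set(drop))
        if hits:
            return hits


wg_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (((("A", "E"), "B"), "C"), "D")
print(minimal_prunings(wg_tree, gene_tree))
```

In this toy example a single pruning restores congruence, which matches the slide's criterion for calling an HGT event. The search is exponential in the number of taxa, so it is only practical for small trees.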

Color    HGT Rates
Red      >4%
Yellow   3%–4%
Pink     2%–3%
Blue     1%–2%
Green    <1%

