Phylogenomics: “The intersection of phylogenetics and genomics.” The reconstruction of evolutionary relationships by comparing sequences of whole genomes or portions of genomes. There are several potential methods and strategies to discuss. We will focus on: ultraconserved element phylogenetics, transposable element phylogenetics, RAD-Seq, and PhylomeDB.
Phylogenomics: UltraConserved Elements (UCEs). Bejerano et al. 2004, Science 304:1321-1325: “481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes.” “Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish.” “more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals.”
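To make the definition concrete, here is a minimal sketch of scanning a multiple alignment for runs of columns that are identical in every sequence and contain no gaps. It is a toy illustration, not the procedure used by Bejerano et al.; the function name `conserved_runs`, the `min_len` parameter, and the short example sequences are all assumptions made for this example.

```python
def conserved_runs(aligned_seqs, min_len=200):
    """Return half-open (start, end) column ranges in which every aligned
    sequence has the same base and no sequence has a gap ('-')."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    run_start = None  # start column of the run currently being extended
    for i in range(length):
        column = {s[i].upper() for s in aligned_seqs}
        identical = len(column) == 1 and "-" not in column
        if identical:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ends here; keep it only if it is long enough.
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and length - run_start >= min_len:
        runs.append((run_start, length))
    return runs


# Hypothetical toy alignment (real UCE scans use whole-genome alignments
# and a threshold such as 200 bp).
toy_alignment = [
    "ACGTACGTAA",  # "human"
    "ACGTACGTCA",  # "rat"
    "ACG-ACGTAA",  # "mouse"
]
print(conserved_runs(toy_alignment, min_len=3))
```

With `min_len=3` the toy alignment yields two runs, one on each side of the gapped column; raising the threshold to 4 keeps only the longer one.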
Phylogenomics: UltraConserved Elements. UCEs have been associated with gene regulation and development. It is generally assumed that UCEs must be important, given their near-universal conservation across extremely divergent taxa. However, gene knockouts of UCE loci in mice resulted in viable, fertile offspring, suggesting that their role in the biology of the genome may be cryptic.
Phylogenomics: By definition, UCEs themselves would be of minimal use in phylogenetics because of their low variability. Linkage predicts that neighboring sequence that is not as highly conserved would be under less constraint, and so should vary more among taxa. UCEs serve as the anchors to access that neighboring sequence.
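A small sketch of the idea that the informative variation sits in the flanks rather than in the UCE core: given an alignment and the core coordinates, compare the fraction of variable columns in each region. The function name `variable_site_fraction`, the coordinates, and the toy sequences are hypothetical and chosen only for illustration.

```python
def variable_site_fraction(aligned_seqs, start, end):
    """Fraction of alignment columns in [start, end) where the sequences
    do not all share the same character."""
    if end <= start:
        raise ValueError("end must be greater than start")
    variable = sum(
        1 for i in range(start, end)
        if len({s[i] for s in aligned_seqs}) > 1
    )
    return variable / (end - start)


# Hypothetical 12-column alignment with an assumed UCE core at columns 4-7.
toy_alignment = [
    "ACTGGGCCATTA",
    "ATTGGGCCACTA",
    "ACTAGGCCATGA",
]
core_start, core_end = 4, 8

left_flank = variable_site_fraction(toy_alignment, 0, core_start)
core = variable_site_fraction(toy_alignment, core_start, core_end)
right_flank = variable_site_fraction(toy_alignment, core_end, 12)
print(left_flank, core, right_flank)
```

In this made-up example the core has no variable columns while half of the flanking columns vary, which is the pattern that makes the flanks useful for tree building.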
UCE workflow: http://ultraconserved.org/. Example studies: Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera. Mol Ecol Res 2014; The evolution of peafowl and other taxa with ocelli (eyespots): A phylogenomic approach. Proc R Soc Lond B Biol Sci 281: 20140823. 2014; Target Capture and Massively Parallel Sequencing of Ultraconserved Elements (UCEs) for Comparative Studies at Shallow Evolutionary Time Scales. Syst Biol 63:83-95. 2014; A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based upon Targeted Sequencing of Ultraconserved Elements (UCEs). PLoS ONE 8: e65923. 2013.
Phylogenomics: SINE accumulation in genomes. [Figure: SINE subfamilies 1, 2, and 3 accumulating in a genome over time.] Because of the way they accumulate in a genome, TEs, especially retrotransposons, make excellent markers for phylogenetic analysis.
SINEs as phylogenetic markers. But… which SINE families do you target, and how do you identify them?
Phylogenomics: Transposable element phylogenetics. Properties of TE insertion markers: identical by descent; known ancestral state (absence of the insertion); simple evolutionary model; neutral; “low-tech”; bi-allelic. Consistency index = 1.00; homoplasy index = 0.00.
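One way to see what a consistency index of 1.00 means for presence/absence markers with a known ancestral state: if absence is ancestral and each insertion arose once, the sets of taxa carrying any two insertions must be either nested or disjoint. The sketch below checks that pairwise condition. The function name `conflicting_loci`, the locus names, and the taxon labels are hypothetical; this is an illustration of the compatibility idea, not a method taken from the slides.

```python
from itertools import combinations


def conflicting_loci(insertions):
    """Given {locus name: set of taxa carrying the insertion}, return the
    pairs of loci whose taxon sets overlap without one containing the other.

    Assumes absence is the ancestral state. An empty result means every
    insertion can be explained by a single gain on one rooted tree,
    i.e. no homoplasy.
    """
    conflicts = []
    for (name_a, taxa_a), (name_b, taxa_b) in combinations(insertions.items(), 2):
        shared = taxa_a & taxa_b
        nested_or_disjoint = (not shared) or taxa_a <= taxa_b or taxa_b <= taxa_a
        if not nested_or_disjoint:
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical presence/absence data for four taxa A-D.
compatible = {
    "SINE_1": {"A", "B", "C"},
    "SINE_2": {"A", "B"},
    "SINE_3": {"D"},
}
incompatible = {
    "SINE_1": {"A", "B", "C"},
    "SINE_2": {"A", "B"},
    "SINE_4": {"C", "D"},  # overlaps SINE_1 at C without being nested
}
print(conflicting_loci(compatible))
print(conflicting_loci(incompatible))
```

The first data set returns no conflicts; the second flags `SINE_1` against `SINE_4`, the kind of pattern that would lower the consistency index below 1.00.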
Phylogenomics ME-Scan
Phylogenomics ME-Scan validation
Phylogenomics: RAD-Seq (Restriction Site Associated DNA Sequencing). Cresko and colleagues (PLoS ONE 2008;3:e3376, PLoS Genet 2010;6:e1000862, PNAS 2010;107:16196–200). Akin to RFLP and AFLP, except that you sequence the fragments. Rapidly identifies genome-wide suites of SNPs and other polymorphisms.
Phylogenomics: (A) Genomic DNA is cut with a restriction enzyme. (B) P1 adapter is ligated to the cut fragments. (C) Samples from multiple individuals are pooled together and randomly sheared. Only a subset of the resulting fragments contains restriction sites and P1 adapters. (D) P2 adapter is ligated to all fragments. The P2 adapter has a divergent end. (E) PCR amplification with P1 and P2 primers. The P2 adapter will be completed only in the fragments ligated with P1 adapter, and so only these fragments will be fully amplified. (F) Pooled samples with different MIDs (molecular identifiers) are separated bioinformatically and SNPs are called (C/G SNP underlined). (G) As fragments are sheared randomly, paired-end sequences from each sequenced fragment will cover a 300-400 bp region downstream of the restriction site.
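Step (F), separating pooled reads by MID, can be sketched as a simple prefix lookup. This is a minimal illustration that assumes the MID sits at the start of each read, that all MIDs have the same length, and that matches must be exact; the function name `demultiplex`, the barcodes, and the reads are hypothetical, and production tools also handle sequencing errors in the barcode.

```python
def demultiplex(reads, mids):
    """Assign reads to samples by their leading MID.

    reads: iterable of read sequences (strings)
    mids:  dict mapping MID sequence -> sample name
    Returns (by_sample, unassigned), where by_sample maps each sample name
    to its reads with the MID trimmed off.
    """
    lengths = {len(barcode) for barcode in mids}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all MIDs have the same length")
    mid_len = lengths.pop()

    by_sample = {sample: [] for sample in mids.values()}
    unassigned = []
    for read in reads:
        sample = mids.get(read[:mid_len])  # exact match on the read prefix
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[mid_len:])
    return by_sample, unassigned


# Hypothetical MIDs and reads.
mids = {"ACGT": "individual_1", "TTAG": "individual_2"}
reads = ["ACGTTGCAGGAAC", "TTAGTGCAGGAAG", "GGGGTGCAGGAAG"]
by_sample, unassigned = demultiplex(reads, mids)
print(by_sample)
print(unassigned)
```

Once reads are grouped per individual, the trimmed sequences from the same locus can be compared across individuals to call SNPs.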
Phylogenomics: RAD-Seq and phylogenetics. There is potential, but there are problems. “the most substantial obstacle to using RAD sequences for phylogenetics is determining orthology” “Deep divergences are problematic for two reasons: first, restriction sites change over time, with losses favored over gains, leading to a reduction in the number of orthologs retained across divergent taxa; second, evolutionary divergence of orthologous RAD sequences compromises the ability to infer their orthology based on sequence similarity. Consequently, taxa that are phylogenetically isolated on long branches are less likely to retain orthologous restriction sites, and the RAD sequences they do retain will be more divergent, diminishing their representation in clusters.” “While correct nodes are more likely in general to be strongly supported, incorrect nodes can also have high bootstrap values, although this is not unique to RAD phylogenetics.” Probably still really good for phylogeography within species and among closely related species.
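The first problem in the quote, loss of restriction sites, can be illustrated with a toy in silico digest: a single substitution inside a recognition site removes that locus from one taxon, so it can no longer be compared. The sketch assumes the two sequences are already in the same coordinates (no indels); the function name `site_positions` and both sequences are hypothetical, with GAATTC (the EcoRI recognition sequence) used as the example site.

```python
def site_positions(sequence, site):
    """Return the set of start positions of every occurrence of `site`."""
    positions = set()
    start = sequence.find(site)
    while start != -1:
        positions.add(start)
        start = sequence.find(site, start + 1)
    return positions


SITE = "GAATTC"  # EcoRI recognition sequence, used here only as an example

# Hypothetical orthologous regions; taxon_b has one substitution (A -> G)
# inside the second restriction site.
taxon_a = "AAGAATTCTTTTGAATTCCC"
taxon_b = "AAGAATTCTTTTGAGTTCCC"

sites_a = site_positions(taxon_a, SITE)
sites_b = site_positions(taxon_b, SITE)
shared = sites_a & sites_b

print(sorted(sites_a), sorted(sites_b), sorted(shared))
```

Here taxon_a has two RAD loci but only one is shared with taxon_b; over longer divergence times such losses accumulate, which is why fewer orthologous loci are recovered for distant taxa.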
Phylogenomics: PhylomeDB. Remember that gene tree/species tree problem? “given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes).” Phylome: the complete collection of evolutionary histories of all genes in a genome (Huerta-Cepas et al. 2007). Latest version of PhylomeDB is v4, Nucleic Acids Research 2013. phylomedb.org
Phylogenomics: Phylome for gene family TP53 (screenshot from Huerta-Cepas et al. 2013). Legend: speciation events; gene duplication events.
Phylogenomics: Alternative topology resolution using phylomes. Support measures: # trees (%) supporting the given phylogeny; # trees (%) with PP > 0.9 supporting the given phylogeny; # gene families (%) supporting the given phylogeny.
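The first of these measures, the number and percentage of gene trees supporting each alternative topology, can be sketched as a simple tally. The example assumes each gene tree has already been reduced to the set of clades it contains and that each competing hypothesis is identified by one diagnostic clade; the function name `topology_support`, the taxon labels, and the four toy gene trees are hypothetical, and this is not PhylomeDB's own implementation.

```python
def topology_support(gene_trees, hypotheses):
    """Count gene trees that contain the clade defining each hypothesis.

    gene_trees: list of sets of clades, each clade a frozenset of taxon names
    hypotheses: dict mapping hypothesis label -> diagnostic clade (frozenset)
    Returns {label: (count, percent of all gene trees)}.
    """
    total = len(gene_trees)
    support = {}
    for label, clade in hypotheses.items():
        count = sum(1 for clades in gene_trees if clade in clades)
        percent = 100.0 * count / total if total else 0.0
        support[label] = (count, percent)
    return support


# Hypothetical data: three ingroup taxa A, B, C and four gene trees.
AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
gene_trees = [{AB}, {AB}, {AC}, {AB}]
hypotheses = {"((A,B),C)": AB, "((A,C),B)": AC, "((B,C),A)": BC}

for label, (count, percent) in topology_support(gene_trees, hypotheses).items():
    print(f"{label}: {count} trees ({percent:.0f}%)")
```

Restricting `gene_trees` to those whose diagnostic node has posterior probability above 0.9 before calling the function would give the second measure under the same assumptions.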