Published by Roy French. Modified over 9 years ago.
1
Three Weeks of Experience at the formatics Institute
Christian Arnold, Bioinformatics Group, University of Leipzig
Bioinformatics Herbstseminar, October 23rd, 2009
2
Contents
1. The 10kTrees Project
2. Phylogenetic Targeting
3. Acknowledgements
3
1. The 10kTrees Project
4
Goals
An updated primate phylogeny that includes phylogenetic uncertainty
– Use the newest available sequence data, include as many primate species as possible, and update regularly
– Use Bayesian methods to produce a set of ≥10,000 primate-wide trees (with branch lengths) suitable for taxonomically broad comparative research on primate behavior, ecology and morphology
Make it accessible to other researchers
5
Methodology
6
Version 1 vs. Version 2

| | Version 1 | Version 2 |
|---|---|---|
| Species | 187 | 231 |
| Genes | 4 mitochondrial (COI, COII, CYTB and ND1) and 1 nuclear gene (SRY) | 6 mitochondrial (12S rRNA, 16S rRNA, COI, COII, CYTB, cluster of other mitochondrial genes) and 3 nuclear genes (SRY, CCR5, MC1R) |
| Genetic loci | 2 | 4 |
| Total no. of sites | 5,134 | ~9,000 |
| Collected sequences | 413 of 935 possible (55.8% missing data) | 1,007 of 2,079 possible (51.6% missing data) |
| No. of constraints | 29 | 1 |
| Generations | 8 million | 60 million |
| Computing time | ~48 days (16 processors in parallel, ~3 days each) | ~2 years (32 processors in parallel, ~3 weeks each) |
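The missing-data percentages in the Version 1 vs. Version 2 comparison follow from treating the data as a species × gene matrix in which every species could, in principle, have a sequence for every gene (5 genes in Version 1, 9 in Version 2). A minimal Python check of that arithmetic, assuming this cell-counting interpretation; the computing-time totals simply multiply processors by days per processor:

```python
def missing_fraction(n_species: int, n_genes: int, n_collected: int) -> float:
    """Fraction of (species, gene) cells for which no sequence was collected."""
    total_cells = n_species * n_genes  # assumption: every species could have every gene
    return 1.0 - n_collected / total_cells

v1 = missing_fraction(n_species=187, n_genes=5, n_collected=413)   # 4 mitochondrial + SRY
v2 = missing_fraction(n_species=231, n_genes=9, n_collected=1007)  # 6 mitochondrial + 3 nuclear

cpu_v1_days = 16 * 3      # 16 processors x ~3 days each
cpu_v2_days = 32 * 3 * 7  # 32 processors x ~3 weeks each, i.e. roughly 1.8 years

print(f"Version 1: {187 * 5} cells, {v1:.1%} missing, {cpu_v1_days} processor-days")
print(f"Version 2: {231 * 9} cells, {v2:.1%} missing, {cpu_v2_days} processor-days")
```

The two fractions reproduce the 55.8% and 51.6% stated for the two versions, which supports reading the totals as species × genes.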
7
Preliminary consensus tree, rooted with Galeopterus variegatus
Green: Cercopithecines; Blue: Hominoids; Red: Platyrrhines; Yellow: Tarsiers; Brown: Strepsirrhines
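A consensus tree summarizes a large sample of trees by the clades that recur across it. The slide does not say how this consensus was computed; majority-rule consensus (keep every clade found in more than half of the sampled trees) is one common choice. A minimal Python sketch of that idea, assuming each sampled tree is written as its set of non-trivial clades; the taxa and the three "posterior" trees are hypothetical:

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]

def clade_frequencies(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of sampled trees that contain each clade (its support)."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def majority_rule(trees: List[Set[Clade]], threshold: float = 0.5) -> Dict[Clade, float]:
    """Clades found in more than `threshold` of the trees, with their support."""
    return {c: f for c, f in clade_frequencies(trees).items() if f > threshold}

# Hypothetical taxa Homo, Pan, Gorilla with Pongo as outgroup; trivial clades omitted.
hp = frozenset({"Homo", "Pan"})
pg = frozenset({"Pan", "Gorilla"})
hpg = frozenset({"Homo", "Pan", "Gorilla"})
sample = [{hp, hpg}, {hp, hpg}, {pg, hpg}]

consensus = majority_rule(sample)
for clade, support in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), f"{support:.2f}")
```

Clades with more than 50% support are always mutually compatible, so the retained set always describes a single (possibly unresolved) tree.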
8
The 10kTrees Website http://10ktrees.fas.harvard.edu/
9
Current Progress
In press at Evolutionary Anthropology
Will be presented at the AAPA conference (April 2010) in Albuquerque, New Mexico
Version 2 is almost finished
Available at http://10kTrees.fas.harvard.edu
10
Summary
The Bayesian approach is time-consuming but works well, even though the data matrix is very sparse
The increased number of sequences in Version 2 dramatically reduces the need for constraints and improves the quality of the tree and of the branch-length estimates
Ongoing project
Total number of trees downloaded since June 2009: 95,800
11
2. Phylogenetic Targeting
12
Which species should we study?
13
Goals?
For which species should we collect data in order to increase the size of comparative data sets?
14
Example 1/2
Hypothesis: Two characters (x and y) show correlated evolution.
Goal: Test this hypothesis comparatively (e.g. using phylogenetically independent contrasts and correlation tests).
Problem 1: Data have been collected only for x, not for y.
Solution 1: Collect data for y and test the hypothesis.
Problem 2: For which species should we collect data on y?
Solution 2: Phylogenetic targeting!?
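Phylogenetically independent contrasts (Felsenstein's method) convert tip values into n − 1 standardized differences at the internal nodes of a bifurcating tree, and these contrasts can then be correlated. The Python sketch below assumes a fully bifurcating tree with known branch lengths and uses hypothetical species A–D and hypothetical trait values; it illustrates the standard contrast recursion, not code from the talk:

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    length: float                       # branch length to the parent node
    name: Optional[str] = None          # tip label; None for internal nodes
    children: List["Node"] = field(default_factory=list)

def contrasts(node: Node, trait: Dict[str, float]) -> Tuple[float, float, List[float]]:
    """Post-order pass returning (node value, extra branch length, contrasts)."""
    if not node.children:
        return trait[node.name], 0.0, []
    left, right = node.children         # assumes a fully bifurcating tree
    x_l, extra_l, c_l = contrasts(left, trait)
    x_r, extra_r, c_r = contrasts(right, trait)
    v_l, v_r = left.length + extra_l, right.length + extra_r
    contrast = (x_l - x_r) / math.sqrt(v_l + v_r)           # standardized contrast
    value = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)  # weighted ancestral value
    extra = v_l * v_r / (v_l + v_r)     # added to the branch above this node
    return value, extra, c_l + c_r + [contrast]

def correlation_through_origin(cx: List[float], cy: List[float]) -> float:
    """Correlation of contrasts, forced through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

def tip(name: str) -> Node:
    return Node(1.0, name)

# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) and hypothetical trait values.
tree = Node(0.0, children=[Node(1.0, children=[tip("A"), tip("B")]),
                           Node(1.0, children=[tip("C"), tip("D")])])
x = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 6.0}
y = {"A": 2.0, "B": 3.0, "C": 5.0, "D": 8.0}
_, _, cx = contrasts(tree, x)
_, _, cy = contrasts(tree, y)
r = correlation_through_origin(cx, cy)
print(f"{len(cx)} contrasts, r = {r:.3f}")
```

Both traits are traversed in the same order with the same left-minus-right sign convention, so the i-th x contrast and the i-th y contrast always refer to the same node.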
15
Example 2/2

| Brain size | Cognitive data |
|---|---|
| 4 | ? |
| 9 | 7 |
| 10 | ? |
| 3 | ? |
| 2 | ? |

Collecting new data is time-consuming and expensive…
16
Methods
1. Systematically generate all possible pairwise comparisons.
2. For every pairwise comparison, calculate the character differences between the two species that form the pair and assign a score.
3. Determine the set of phylogenetically independent pairs that maximizes the sum of the selected pair scores (maximal pairing).
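For a small tree, the three steps listed above can be carried out by brute force. The Python sketch below makes several assumptions that the slide does not specify: the tree, the tips A–E and the trait values are hypothetical, a pair's score is the absolute difference in the already-measured character x, and two pairs count as phylogenetically independent when the paths connecting their members share no branch:

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical rooted tree ((A,B)n1,(C,(D,E)n3)n2)root, stored as child -> parent.
PARENT: Dict[str, str] = {
    "A": "n1", "B": "n1", "C": "n2", "D": "n3", "E": "n3",
    "n1": "root", "n2": "root", "n3": "n2",
}
# Hypothetical values of the character already measured (x, e.g. brain size).
X: Dict[str, float] = {"A": 4.0, "B": 9.0, "C": 10.0, "D": 3.0, "E": 2.0}

Pair = Tuple[str, str]

def lineage(node: str) -> List[str]:
    """The node itself followed by its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_branches(a: str, b: str) -> FrozenSet[str]:
    """Branches on the path a..b, each named by its lower (child) node."""
    up_a, up_b = lineage(a), lineage(b)
    on_b = set(up_b)
    lca = next(n for n in up_a if n in on_b)  # lowest common ancestor
    return frozenset(up_a[:up_a.index(lca)] + up_b[:up_b.index(lca)])

def scored_pairs() -> List[Tuple[Pair, float]]:
    """Steps 1 and 2: every pair of tips, scored by |x_a - x_b| (assumed score)."""
    return [((a, b), abs(X[a] - X[b])) for a, b in combinations(sorted(X), 2)]

def independent(pairs: List[Pair]) -> bool:
    """Assumed criterion: no two pair paths share a branch."""
    used: set = set()
    for a, b in pairs:
        branches = path_branches(a, b)
        if used & branches:
            return False
        used |= branches
    return True

def brute_force_maximal_pairing() -> Tuple[List[Pair], float]:
    """Step 3 by exhaustive search; only feasible for a handful of tips."""
    best: List[Pair] = []
    best_score = 0.0
    candidates = scored_pairs()
    for k in range(1, len(X) // 2 + 1):
        for subset in combinations(candidates, k):
            pairs = [pair for pair, _ in subset]
            total = sum(score for _, score in subset)
            if total > best_score and independent(pairs):
                best, best_score = pairs, total
    return best, best_score

print(brute_force_maximal_pairing())
```

Here the best choice pairs A with B and C with E, so data on y would be collected for those four species. The number of candidate subsets grows combinatorially with the number of species, which is why an efficient algorithm is needed for real phylogenies.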
17
Maximal pairing: Example
18
Decomposition of the maximal pairing
Time complexity: …; for balanced trees: …
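One way to decompose the maximal pairing problem is bottom-up dynamic programming: each subtree reports its best score when the branch above it is unused and, for every tip, its best score when a path starting at that tip continues upward. The Python sketch below is a hypothetical illustration under stated assumptions (rooted binary tree, hypothetical tips and values, score = |x_a − x_b|, pairs independent when their paths share no branch), not necessarily the algorithm presented in the talk:

```python
from typing import Dict, List, Tuple

# Hypothetical rooted binary tree ((A,B)n1,(C,(D,E)n3)n2)root, parent -> children.
CHILDREN: Dict[str, Tuple[str, str]] = {
    "root": ("n1", "n2"), "n1": ("A", "B"), "n2": ("C", "n3"), "n3": ("D", "E"),
}
X: Dict[str, float] = {"A": 4.0, "B": 9.0, "C": 10.0, "D": 3.0, "E": 2.0}  # hypothetical

Pair = Tuple[str, str]
Solution = Tuple[float, List[Pair]]  # (total score, selected pairs)

def score(a: str, b: str) -> float:
    return abs(X[a] - X[b])  # assumed scoring function

def solve(v: str) -> Tuple[Solution, Dict[str, Solution]]:
    """Return (closed, open) for the subtree rooted at v.

    closed:  best pairing inside the subtree with the branch above v unused.
    open[t]: best pairing when one path leaves through the branch above v,
             starting at tip t; that path is paired off further up the tree.
    """
    if v not in CHILDREN:  # a tip: nothing paired yet, one path may start here
        return (0.0, []), {v: (0.0, [])}
    a, b = CHILDREN[v]
    (score_a, pairs_a), open_a = solve(a)
    (score_b, pairs_b), open_b = solve(b)
    # Option 1: no path passes through v.
    closed: Solution = (score_a + score_b, pairs_a + pairs_b)
    # Option 2: a path joins tip i (below a) and tip j (below b) at v.
    for i, (s_i, p_i) in open_a.items():
        for j, (s_j, p_j) in open_b.items():
            total = s_i + s_j + score(i, j)
            if total > closed[0]:
                closed = (total, p_i + p_j + [(i, j)])
    # A path continuing upward uses one child branch; the other subtree stays closed.
    open_v = {i: (s_i + score_b, p_i + pairs_b) for i, (s_i, p_i) in open_a.items()}
    open_v.update({j: (s_j + score_a, p_j + pairs_a) for j, (s_j, p_j) in open_b.items()})
    return closed, open_v

best, _ = solve("root")
print(best)
```

Each pair of tips is scored exactly once, at its lowest common ancestor, so the sketch performs O(n²) score evaluations for n tips; copying the pair lists adds overhead that a production version would avoid by storing traceback pointers instead.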
19
Simulation results 1/2: detecting correlated character evolution, based on a selection of 12 species
Random (Rnd) selection of species
– Type I error rates close to the nominal level
– Power: ~40%, independent of the number of taxa
– Uses 67% of the available variation
Phylogenetic targeting (PT)-induced selection of species
– Type I error rates close to the nominal level
– Power: 67-81%, increasing with the number of taxa
– Uses 89% of the available variation
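Type I error rates and power are estimated by simulating many data sets and counting how often a test rejects the null hypothesis. A minimal Python sketch of that procedure, assuming (hypothetically; the slide does not name the test) that the 12 selected species form 6 independent pairs and that correlation is tested with a two-sided sign test on whether the x and y differences within each pair have the same sign. Pair differences are drawn from a bivariate normal with correlation rho; the sketch ignores the tree and does not reproduce the numbers above:

```python
import math
import random

def sign_test_p(k: int, n: int) -> float:
    """Two-sided exact binomial p-value for k concordant pairs out of n (p = 0.5)."""
    tail = min(k, n - k)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

def rejection_rate(n_pairs: int, rho: float, reps: int = 10_000,
                   alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated data sets in which the sign test rejects 'no correlation'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        concordant = 0
        for _ in range(n_pairs):
            dx = rng.gauss(0.0, 1.0)
            # dy has correlation rho with dx; rho = 0 is the null hypothesis.
            dy = rho * dx + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            concordant += (dx > 0) == (dy > 0)
        if sign_test_p(concordant, n_pairs) <= alpha:
            rejections += 1
    return rejections / reps

type1 = rejection_rate(n_pairs=6, rho=0.0)  # 12 species -> 6 pairs
power = rejection_rate(n_pairs=6, rho=0.8)
print(f"Type I error: {type1:.3f}, power: {power:.3f}")
```

With only 6 pairs the smallest attainable p-value is 2/64 ≈ 0.031, so the test rejects only when all pairs agree in sign; this keeps the Type I error below 5% but limits power, which is one reason larger or better-chosen samples help.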
20
Simulation results 2/2
[Figure: fraction of available variation after sampling, phylogenetic targeting (PT) vs. random selection (Rnd), for 12, 18 and 24 selected species]
21
Current Progress
A revised version will be resubmitted to The American Naturalist in the near future
To do: extend the simulations and clarify some open issues
Available at http://phylotargeting.fas.harvard.edu
22
Summary
A focused selection of species can save valuable time and money
Phylogenetic targeting is a very flexible approach that can address different questions when resources are limited
Dynamic programming algorithms are everywhere
23
3. Acknowledgements
24
Harvard University
Max Planck Institute for Evolutionary Anthropology
University of Leipzig
Charlie Nunn, Luke Matthews, Peter F. Stadler
Thanks!
25
Thank you for your attention! Questions? If not: Cheers (it’s early, but not too early…)