Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar October 23rd, 2009 Three Weeks of Experience at the formatics.

Presentation transcript:

Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar October 23rd, 2009 Three Weeks of Experience at the formatics Institute

Content
1. The 10kTrees Project
2. Phylogenetic Targeting
3. Acknowledgements

1. The 10kTrees Project

Goals
Updated primate phylogeny that includes phylogenetic uncertainty
  – Use the newest available sequence data, include as many primate species as possible, and update regularly
  – Use Bayesian methods to produce a set of >=10,000 primate-wide trees (with branch lengths) that are appropriate for taxonomically broad comparative research on primate behavior, ecology and morphology
Make it accessible to other researchers

Methodology

Version 1 vs. Version 2

                      | Version 1 | Version 2
----------------------|-----------|----------
Species               |           |
Genes                 | 4 mitochondrial (COI, COII, CYTB and ND1) and 1 nuclear gene (SRY) | 6 mitochondrial (12S rRNA, 16S rRNA, COI, COII, CYTB, cluster of other mitochondrial genes) and 3 nuclear genes (SRY, CCR5, MC1R)
Genetic loci          | 2 | 4
Total no. of sites    | 5134 | ~9000
Collected sequences   | 413 out of 935 total (55.8% missing data) | 1007 out of 2079 total (51.6% missing data)
No. of constraints    | 29 | 1
Generations           | 8 million | 60 million
Computing time        | ~48 days (16 processors in parallel, ~3 days each) | ~2 years (32 processors in parallel, ~3 weeks each)
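
As a quick check of the table's arithmetic, the sketch below recomputes the missing-data percentages and the total processor time. It assumes that "computing time" means the sum over all processors and that three weeks equals 21 days; the function names are illustrative and not part of the 10kTrees project.

```python
def missing_pct(collected, total):
    """Percentage of the species-by-gene matrix that has no sequence data."""
    return 100.0 * (1.0 - collected / total)


def cpu_days(processors, days_each):
    """Total processor time in days, assuming every processor runs equally long."""
    return processors * days_each


if __name__ == "__main__":
    print(round(missing_pct(413, 935), 1))    # Version 1, percent missing
    print(round(missing_pct(1007, 2079), 1))  # Version 2, percent missing
    print(cpu_days(16, 3))                    # Version 1, processor-days
    print(cpu_days(32, 21) / 365.0)           # Version 2, processor-years (assumes 3 weeks = 21 days)
```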

Preliminary consensus tree
Green: Cercopithecines
Blue: Hominoids
Red: Platyrrhines
Yellow: Tarsiers
Brown: Strepsirrhines
Rooted with Galeopterus variegatus

The 10kTrees Website

Current Progress
Submitted to Evolutionary Anthropology, in press
Will be presented at the AAPA conference (April 2010) in Albuquerque, New Mexico
Version 2 is almost finished
Available at

Summary
The Bayesian approach is time-consuming but works well, even though the data matrix is very sparse
The increased number of sequences in Version 2 dramatically reduces the need for constraints and improves the quality of the tree and branch-length estimates
Ongoing project
Total number of downloaded trees since June 2009: 95800

2. Phylogenetic Targeting

Which species should we study?

For which species should we collect data in order to increase the size of comparative data sets? Goals?

Example 1/2
Hypothesis: Two characters (x and y) show correlated evolution
Goal: Test this hypothesis comparatively (e.g. by using phylogenetically independent contrasts and correlation tests)
Problem 1: Data have been collected only for x, but not for y
Solution 1: Collect data for y and test the hypothesis
Problem 2: From which species should we collect data for y?
Solution 2: Phylogenetic targeting!?
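
One simple way to test for correlated evolution from pairwise comparisons is a correlation through the origin on the per-pair differences in x and y. The following is a minimal sketch under that assumption. It is not a full independent-contrasts analysis (no branch lengths, no standardization), and the difference values are made up for illustration.

```python
import math


def origin_correlation(dx, dy):
    """Correlation through the origin between paired differences dx and dy."""
    if len(dx) != len(dy) or not dx:
        raise ValueError("dx and dy must be non-empty and of equal length")
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("differences must not all be zero")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical differences in x and y for three independent species pairs.
    dx = [1.0, 2.0, 3.0]
    dy = [2.0, 4.0, 6.0]
    print(origin_correlation(dx, dy))
```

The correlation is forced through the origin because the direction of subtraction within each pair is arbitrary, so the expected mean difference is zero.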

Example 2/2 [Table: columns "Brain size" and "Cognitive data" per species; several entries are unknown (?)] Collecting new data is time-consuming and expensive…

Methods
Systematically generate all possible pairwise comparisons
For every pairwise comparison, calculate character differences for the two species that form the pair and assign a score
Determine the set of phylogenetically independent pairs that maximizes the sum of all selected pair scores (maximal pairing)
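
The three steps above can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions, not the dynamic programming algorithm from the talk: the tree and trait values are hypothetical, the pair score is assumed to be the absolute difference in x, and two pairs are assumed to be phylogenetically independent when the tree paths connecting them share no branch.

```python
from itertools import combinations

# Hypothetical rooted tree as a child -> parent map, with trait x at the leaves.
PARENT = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
X = {"A": 1.0, "B": 4.0, "C": 2.0, "D": 7.0}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_edges(a, b):
    """Branches on the path between leaves a and b, each named by its child node."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)  # last common ancestor and everything above it
    return frozenset(n for n in up_a + up_b if n not in common)


def maximal_pairing(x):
    """Exhaustively search for branch-disjoint pairs with the maximal total score."""
    # Steps 1 and 2: all pairwise comparisons, scored by the assumed |x_a - x_b|.
    pairs = [(abs(x[a] - x[b]), a, b, path_edges(a, b))
             for a, b in combinations(sorted(x), 2)]
    best = (0.0, [])

    def search(start, used, score, chosen):
        nonlocal best
        if score > best[0]:
            best = (score, list(chosen))
        for j in range(start, len(pairs)):
            s, a, b, edges = pairs[j]
            if not (edges & used):  # step 3: keep only independent pairs
                chosen.append((a, b))
                search(j + 1, used | edges, score + s, chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best


if __name__ == "__main__":
    print(maximal_pairing(X))
```

The search is exponential in the number of pairs and is only meant for toy trees. Larger trees need a polynomial-time method, which is where dynamic programming comes in.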

Maximal pairing: Example

Time complexity:, for balanced trees: Decomposition of the maximal pairing

Simulation results 1/2
Detecting correlated character evolution, based on a selection of 12 species
Random (Rnd) selection of species
  – Type I errors close to nominal level
  – Power: ~40%, independent of number of taxa
  – Uses 67% of available variation
Phylogenetic targeting (PT) induced selection of species
  – Type I errors close to nominal level
  – Power: 67-81%, increases with number of taxa
  – Uses 89% of available variation

Simulation results 2/2 [Figure: fraction of available variation after sampling vs. number of selected species, for PT and Rnd]

Current Progress
A revised version will be resubmitted to American Naturalist in the not too distant future
TODO: Extend simulations and clarify some issues
Available at

Summary
A focused selection of species can save valuable time and money
Phylogenetic targeting provides a very flexible approach and can address different questions in the context of limited resources
Dynamic programming algorithms are everywhere

3. Acknowledgements

Harvard University, Max-Planck Institute for Evolutionary Anthropology, University of Leipzig
Charlie Nunn, Luke Matthews, Peter F. Stadler
Thanks!

Thank you for your attention! Any questions? If not: Cheers (it’s early, but not too early…)