Parsimony is Computationally Intensive

Parsimony is Computationally Intensive
The number of possible trees grows faster than exponentially with the number of species, so exhaustive searches are impractical for all but small data sets. Heuristic methods are needed to search for the best tree without evaluating every possible tree. One such method is tree bisection and reconnection (TBR).
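
To make the scale of the search problem concrete, the sketch below counts unrooted, fully bifurcating trees using the standard formula (2n-5)!! for n taxa. The function name is illustrative and not part of the original slides.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees for n_taxa labelled tips.

    Uses the double factorial (2n-5)!! = 3 * 5 * ... * (2n-5), valid for n >= 3.
    """
    if n_taxa < 3:
        return 1  # only one possible (trivial) tree
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3, 5, ..., 2n-5
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, "taxa:", num_unrooted_trees(n), "unrooted trees")
```

Even at 10 taxa there are already more than two million candidate trees, which is why heuristic rearrangement searches are used instead of exhaustive enumeration.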

What if there is a large amount of homoplasy in the data?
Seq 1 AGCGAG
Seq 2 GCGGAC
Sequence data may have multiple, “hidden” substitutions. Use a model of evolution to correct for different rates of substitution, unequal base frequencies, or other parameters.
Maximum-likelihood phylogenetic analysis: L = P(D | T, M)
[Figure: plot of base pair differences between pairs of mammalian species for a representative gene.]

Example: Model of sequence evolution
Simplest model = Jukes-Cantor: assumes all substitutions are equally likely (the same rate, α, for every change among A, G, C, and T).
Example: What is the total number of substitutions?
Seq 1 AGGTCG CATTGC CCCGAT CTCTTG ATCGGG
Seq 2 AGATCG CAACGC CCGGAC TTCTTA ATCGGG
Total observed = 7; p = 7/30 = 0.23
$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) \approx 0.27$
Total expected = K x 30 = 8.24 (using the unrounded K = 0.2747 obtained from p = 0.23)
[Figure: sequence difference plotted against time, showing the expected difference, the observed difference, and the correction between them.]
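
The Jukes-Cantor correction above can be checked with a short sketch. The function names are illustrative; the two sequences are the ones from the slide. Note that the slide's 0.27 and 8.24 come from rounding p to 0.23 before applying the formula; with the unrounded p = 7/30 the corrected distance is slightly larger.

```python
import math


def count_differences(seq1: str, seq2: str) -> int:
    """Number of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance K = -3/4 * ln(1 - 4/3 * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seq1 = "AGGTCGCATTGCCCCGATCTCTTGATCGGG"
seq2 = "AGATCGCAACGCCCGGACTTCTTAATCGGG"

observed = count_differences(seq1, seq2)   # observed differences
p = observed / len(seq1)                   # observed proportion of differing sites
k = jukes_cantor_distance(p)               # expected substitutions per site

if __name__ == "__main__":
    print("observed differences:", observed)
    print("p =", round(p, 4))
    print("K =", round(k, 4))
    print("expected substitutions over", len(seq1), "sites:", round(k * len(seq1), 2))
```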

Phylogenetic Inference Using Maximum Likelihood
A model of sequence evolution, together with estimates of its parameters, allows probabilities to be placed on different types of substitutional change. Likelihood analysis focuses on the data, not the tree: the likelihood is the probability of the data given a tree and a model of evolution.
Seq 1 ATATC
Seq 2 CTAGC
L = P(D | T, M)
The likelihood (i.e., the probability of observing the data) is a sum over all possible assignments of nucleotides to the internal nodes.
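
As a minimal sketch of "a sum over all possible assignments of nucleotides to the internal nodes", the code below computes the likelihood of a single site for the smallest possible case: two sequences joined at one unobserved ancestral node. It assumes the Jukes-Cantor model with equal base frequencies; the branch lengths and function names are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def jc_prob(start: str, end: str, t: float) -> float:
    """Jukes-Cantor probability that `start` is `end` after a branch of length t
    (t measured in expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if start == end else 0.25 - 0.25 * decay


def site_likelihood(base1: str, base2: str, t1: float, t2: float) -> float:
    """P(site data | tree, model) for two tips joined at one ancestral node.

    Sums over the four possible nucleotides x at the unobserved ancestor:
    sum_x  pi_x * P(x -> base1 | t1) * P(x -> base2 | t2), with pi_x = 1/4.
    """
    return sum(0.25 * jc_prob(x, base1, t1) * jc_prob(x, base2, t2) for x in BASES)


if __name__ == "__main__":
    # First site of Seq 1 (A) and Seq 2 (C), with made-up branch lengths.
    print(site_likelihood("A", "C", t1=0.1, t2=0.2))
```

On larger trees the same sum runs over every combination of states at every internal node, which is usually organized with a recursive (pruning) algorithm rather than written out directly.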

Phylogenetic Inference Using Maximum Likelihood
Calculate the likelihood for each base position in the sequence, then combine the values across all base positions (site likelihoods are multiplied, or their logarithms summed). The ML tree is the tree that produces the highest likelihood. ML evaluates both the branching structure of the tree and the branch lengths, using tree-searching strategies similar to those used in parsimony analysis. Branch lengths matter because, under a model-based approach, mutational change is more probable along longer branches than along shorter ones. ML can be extremely computationally intensive.
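
The sketch below illustrates combining per-site likelihoods and picking the branch length with the highest likelihood. It is a toy case, not a full tree search: it assumes the Jukes-Cantor model, treats the two five-base sequences ATATC and CTAGC as a two-tip tree with a single total branch length t, and uses a simple grid search. All function names and the grid settings are illustrative assumptions.

```python
import math


def jc_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in the same base (or one specific
    different base) after a branch of length t substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Sum of per-site log-likelihoods for two aligned sequences separated by t."""
    total = 0.0
    for a, b in zip(seq1, seq2):
        total += math.log(0.25 * jc_prob(a == b, t))  # 0.25 = base frequency
    return total


def ml_branch_length(seq1: str, seq2: str, step: float = 0.001, t_max: float = 3.0) -> float:
    """Grid search for the branch length that maximizes the log-likelihood."""
    candidates = [i * step for i in range(1, int(t_max / step) + 1)]
    return max(candidates, key=lambda t: log_likelihood(seq1, seq2, t))


t_hat = ml_branch_length("ATATC", "CTAGC")

if __name__ == "__main__":
    print("ML branch length:", t_hat)
    print("log-likelihood at ML estimate:", log_likelihood("ATATC", "CTAGC", t_hat))
```

For this two-sequence case the grid-search estimate agrees with the Jukes-Cantor distance formula applied to p = 2/5, which is a useful sanity check on the likelihood function.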

Phylogenetic Inference Using Maximum Likelihood
Important point about ML: the model you choose can have a large impact on the resulting ML tree.
If you flip a coin and get a head, what is its likelihood?
If it’s a two-sided, fair coin (your model), the likelihood is 0.5.
If it’s a two-headed coin (your model), the likelihood is 1.0.
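
The coin example can be written as a small calculation: the same observation (one head in one flip) is scored under two different models. The binomial form and the function name are illustrative assumptions, not part of the slides.

```python
import math


def binomial_likelihood(n_heads: int, n_flips: int, p_heads: float) -> float:
    """Probability of observing n_heads in n_flips, given a model with P(head) = p_heads."""
    n_tails = n_flips - n_heads
    return math.comb(n_flips, n_heads) * p_heads ** n_heads * (1.0 - p_heads) ** n_tails


fair_coin = binomial_likelihood(1, 1, 0.5)        # model: fair, two-sided coin
two_headed_coin = binomial_likelihood(1, 1, 1.0)  # model: two-headed coin

if __name__ == "__main__":
    print("fair coin:", fair_coin)
    print("two-headed coin:", two_headed_coin)
```

The data are identical in both calls; only the model changes, and so does the likelihood.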

Assessing the Robustness of Trees
A number of methods can be used to assess the robustness of particular branches in a tree: bootstrapping, jackknifing, and the decay index.
Bootstrapping: multiple new data sets are made by resampling from the original data set, with replacement. The resampled data sets are subjected to phylogenetic analysis. The proportion of replicate data sets in which a clade appears is called its bootstrap proportion.
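
A minimal sketch of the bootstrap procedure is shown below. Alignment columns are resampled with replacement, each replicate is "analyzed", and the proportion of replicates recovering a grouping is reported. The analysis step here is a deliberately crude stand-in (the pair of taxa with the fewest differences is treated as the recovered clade), and the toy alignment, taxon names, and function names are all hypothetical; a real study would run a full parsimony, ML, or distance analysis on each replicate.

```python
import random
from itertools import combinations


def resample_columns(alignment: dict, rng: random.Random) -> dict:
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment: dict) -> tuple:
    """Toy stand-in for tree inference: the pair of taxa with the fewest differences."""
    def hamming(x: str, y: str) -> int:
        return sum(a != b for a, b in zip(x, y))

    pairs = combinations(sorted(alignment), 2)
    return min(pairs, key=lambda pair: hamming(alignment[pair[0]], alignment[pair[1]]))


def bootstrap_proportion(alignment: dict, clade: tuple, n_reps: int = 200, seed: int = 1) -> float:
    """Proportion of bootstrap replicates in which `clade` is recovered."""
    rng = random.Random(seed)
    hits = sum(closest_pair(resample_columns(alignment, rng)) == clade for _ in range(n_reps))
    return hits / n_reps


toy_alignment = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTTC", "sp3": "TCGAACCTTG"}
support = bootstrap_proportion(toy_alignment, ("sp1", "sp2"))

if __name__ == "__main__":
    print("bootstrap proportion for (sp1, sp2):", support)
```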

Taken from Baldauf, S. L. Phylogeny for the faint of heart: a tutorial. Trends in Genetics 19:345-351.

Bootstrapping
Clades that receive a high bootstrap proportion are considered better supported by the data than clades with a lower one. A value of 70% or greater is good, but many phylogeneticists consider only branches with ≥90% to be strongly supported. Bootstrapping can be performed with any type of phylogenetic analysis: parsimony, ML, or distance-based. It is important to emphasize that a bootstrap value does not reveal the probability that a particular clade is true, only how well it is supported by the particular data set.

Molecular Clocks
The mutation rate for some genes may be relatively constant across species. This idea is based on neutral theory (introduced later in the course): nucleotide or amino acid substitutions occur at a rate equal to the mutation rate. In applying a molecular clock, you generally assume that the mutation rate for a gene does not differ among species.

Molecular Clocks
1) Construct a tree (Outgroup, Species 1, Species 2, Species 3, Species 4).
2) Date a node in the tree. A fossil for Species 4 is ~1 MY old, so you know that the divergence between Species 3 and Species 4 occurred at least 1 MY ago.
3) Calculate divergence: Species 3 and Species 4 show 2% sequence divergence.
4) Calculate a rate: R = 2%/1 MY.

Molecular Clocks
5) Apply the rate to other nodes in the tree (Outgroup, Species 1, Species 2, Species 3, Species 4), here dated at 5 MY, 2 MY, and 1 MY.
Best applied when dates are available for multiple nodes. Can utilize solid geological information as well as fossil information. Must be aware of possible non-clock behavior of genes.
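
The calibration and dating steps amount to simple arithmetic, sketched below under a strict-clock assumption. The 2% divergence and 1 MY calibration come from the slides; the 4% and 10% divergences for the deeper nodes are hypothetical values chosen so that the results match the 2 MY and 5 MY shown on the tree. Function and node names are illustrative.

```python
def calibrate_rate(divergence_pct: float, node_age_my: float) -> float:
    """Clock rate in percent sequence divergence per million years."""
    return divergence_pct / node_age_my


def date_node(divergence_pct: float, rate_pct_per_my: float) -> float:
    """Estimated node age (MY), assuming the calibrated rate applies across the tree."""
    return divergence_pct / rate_pct_per_my


# Calibration from the slides: 2% divergence between Species 3 and 4, fossil at ~1 MY.
rate = calibrate_rate(divergence_pct=2.0, node_age_my=1.0)

# Hypothetical divergences for the two deeper nodes (not given in the slides).
assumed_divergence = {"Species 2 / (3,4) node": 4.0, "Species 1 / (2,3,4) node": 10.0}
node_ages = {node: date_node(d, rate) for node, d in assumed_divergence.items()}

if __name__ == "__main__":
    print("rate:", rate, "% per MY")
    for node, age in node_ages.items():
        print(node, "~", age, "MY")
```

Because a fossil gives only a minimum age for a node, a rate calibrated this way is a maximum estimate, which is one reason multiple calibration points are preferred.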

Phylogeny of North American Black Basses
Near et al., 2003. Evolution 57:1610–1621.
The previous hypothesis was that speciation within the genus Micropterus occurred during the Pleistocene. Micropterus has a very good fossil record. Calibration of a molecular clock and calculation of divergence times among species reveal that most species diverged well before the Pleistocene.

Species Delimitation in Rapidly Radiating Systems
• Accumulation of species diversity over short periods of time.
• Adaptive radiations
• Often of very recent origin
• Difficult to resolve monophyletic species-level lineages.
Salzburger, W. and A. Meyer. 2004. Naturwissenschaften 91:277-290.

Species Delimitation in Rapidly Radiating Systems (Species trees vs gene trees)
Lineage sorting and the retention of ancestral alleles or allelic lineages

Species Delimitation in Rapidly Radiating Systems
Lineage sorting and the retention of ancestral alleles or allelic lineages
East African Cichlid Fish: Moran and Kornfield. 1993. Mol. Biol. Evol. 10:1015-1029. Takahashi et al. 2001. Mol. Biol. Evol. 18:2057-2066.
Darwin’s Finches: Sato et al. 1999. PNAS 96:5101-5106.

Species Delimitation in Rapidly Radiating Systems
Limited reproductive isolation leads to hybridization and introgression

Ambystoma tigrinum species complex
Shaffer & McKnight 1996. Evolution 50:417-433.
[Photo: A. californiense. Gerald and Buff Corsi © California Academy of Sciences]

An early study found that A. ordinarium was not a monophyletic group.

Indeed, more data show extensive mtDNA non-monophyly with respect to A. ordinarium.

Nuclear Genes Summary
• 4 genes yield A. ordinarium monophyly.
• 3 genes yield A. ordinarium paraphyly. (2 are nearly monophyletic.)
• 1 gene yields A. ordinarium polyphyly.
• Nuclear data strongly suggest that A. ordinarium is a monophyletic lineage.

MtDNA Polyphyly
• The mtDNA genealogy offers a strong contrast to the nuclear gene trees.
• mtDNA should achieve monophyly faster than nuclear loci.
• What explains this discrepancy?

Phylogenetic Discordance
Signatures of Rapid Lineage Diversification
• Short internal branches
• Phylogenetic discordance among loci
Poe, S., and A. L. Chubb. 2004. Evolution 58:404-415.

A. dumerilii
Shared and minimally divergent mtDNA haplotypes strongly indicate recent hybrid introgression.