Presented By Dr. Shazzad Hosain Asst. Prof. EECS, NSU


Phylogeny

What is phylogenetics?
Phylogenetics is the study of evolutionary relationships among and within species.
[Figure: an example of a phylogenetic tree with tips crocodiles, birds, lizards, snakes, rodents, primates, marsupials]

Applications of phylogenetics
• Forensics: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• Conservation: How much gene flow is there among local populations of island foxes off the coast of California?
• Medicine: What are the evolutionary relationships among the various prion-related diseases?
Notes:
• Invasive: marked by the tendency to spread, especially into healthy tissue.
• The prion diseases are a large group of related neurodegenerative conditions that affect both animals and humans. They are unique in that they can be inherited, can occur sporadically, or can be infectious.
To be continued…

Phylogenetic concepts: Interpreting a Phylogeny
[Figure: tree with tips Sequence A, Sequence B, Sequence C, Sequence D, Sequence E, drawn along a time axis]
Which sequence is most closely related to B? A, because B diverged from A more recently than from any other sequence.
Physical position in the tree is not meaningful! Only the tree structure matters.

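Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: rooted trees of A, B, C, D drawn along a time axis with the root marked, next to an unrooted tree of A, B, C, D with a candidate root position X = ?]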

Rooting and Tree Interpretation
[Figure: trees of bacteria, archaea (archaebacteria), oak, fruit fly, chicken, human, with branches annotated by gain (+) or loss (-) of cell nuclei and bones]
Note: The Archaea constitute a domain of single-celled microorganisms. These microbes have no cell nucleus or any other membrane-bound organelles.

Rooting Methods: Outgroup rooting of a network of relationships
Given an unrooted network of relationships among four species of Carnivora [left], outgroup rooting uses an additional taxon (the outgroup) known from independent evidence to be less closely related to each of the other species (the ingroup) than they are to each other. The root is then placed on the branch between the outgroup and the ingroup. In this case, Lynx is a feloid carnivore in a separate superfamily from the four canoid carnivores. Including Lynx in the network analysis places it on the internode. This method requires accurate information about ingroup/outgroup relationships.

How Many Trees? (assuming bifurcation only)

                                      Unrooted trees                      Rooted trees
# sequences   # pairwise distances   # trees         # branches/tree     # trees         # branches/tree
3             3                      1               3                   3               4
4             6                      3               5                   15              6
5             10                     15              7                   105             8
6             15                     105             9                   945             10
10            45                     2,027,025       17                  34,459,425      18
30            435                    8.69 × 10^36    57                  4.95 × 10^38    58
N             N(N - 1)/2             see below       2N - 3              see below       2N - 2

Number of unrooted trees for N sequences: (2N - 5)! / (2^(N - 3) (N - 3)!)
Number of rooted trees for N sequences: (2N - 3)! / (2^(N - 2) (N - 2)!)
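The counts in the table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are illustrative, and it uses the double-factorial form (2N - 5)!! and (2N - 3)!!, which is equivalent to the factorial formulas for bifurcating trees.

```python
def double_factorial(n: int) -> int:
    """Product n * (n - 2) * (n - 4) * ... down to 1 (1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def count_unrooted(n_seqs: int) -> int:
    """Number of unrooted bifurcating trees on n_seqs labeled tips."""
    if n_seqs < 3:
        raise ValueError("need at least 3 sequences")
    return double_factorial(2 * n_seqs - 5)


def count_rooted(n_seqs: int) -> int:
    """Number of rooted bifurcating trees on n_seqs labeled tips."""
    if n_seqs < 2:
        raise ValueError("need at least 2 sequences")
    return double_factorial(2 * n_seqs - 3)


def count_pairwise_distances(n_seqs: int) -> int:
    """Number of distinct sequence pairs."""
    return n_seqs * (n_seqs - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 10, 30):
        print(n, count_pairwise_distances(n), count_unrooted(n), count_rooted(n))
```

The rapid growth of these numbers is the reason exhaustive search over topologies is only feasible for a handful of sequences.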

Tree Properties
• Ultrametricity: all tips are an equal distance from the root. [Figure: rooted tree with tips X and Y and branches a, b, c, d, e] a = b + c + d + e
• Additivity: the distance between any two tips equals the total branch length between them. [Figure: rooted tree with tips X and Y and branches a, b, c, d, e] XY = a + b + c + d + e
A phylogram is a phylogenetic tree whose branch lengths are proportional to the amount of character change.
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.

Terminology
• External nodes: the things under comparison; operational taxonomic units (OTUs)
• Internal nodes: ancestral units; hypothetical; the goal is to group current-day units
• Root: common ancestor of all OTUs under study. The path from the root to a node defines an evolutionary path
• Unrooted: specifies relationships but not the evolutionary path. If there is an outgroup (an external reason to believe a certain OTU branched off first), the tree can be rooted
• Topology: branching pattern of a tree
• Branch length: amount of difference that occurred along a branch
• Outgroup: in cladistics or phylogenetics, a (monophyletic) group of organisms that serves as a reference group for determining the evolutionary relationships among three or more monophyletic groups of organisms

Phylogeny Applications
• Tree of Life: analyzing changes that have occurred in the evolution of different organisms (http://tolweb.org/tree/phylogeny.html)
• Phylogenetic relationships among genes can help predict which ones might have similar functions (e.g., ortholog detection)
• Following changes occurring in rapidly changing species (e.g., HIV)
Definitions:
• Homolog: a gene related to a second gene by descent from a common ancestral DNA sequence. The term may apply to the relationship between genes separated by the event of speciation (see ortholog) or to the relationship between genes separated by the event of genetic duplication (see paralog).
• Ortholog: orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. (See also paralogs.)

Phylogeny Packages
• PHYLIP (Phylogeny Inference Package), Felsenstein: evolution.genetics.washington.edu/phylip.html. Free!
• PAUP (Phylogenetic Analysis Using Parsimony), Swofford: paup.csit.fsu.edu

Similarity vs. Homology
• Similarity: sequences resemble one another
• Homolog: sequences derived from a common ancestor
• Ortholog: homologous sequences in different species, separated by speciation
• Paralog: homologous sequences within a genome, separated by gene duplication
Orthologs typically retain the same function in the course of evolution, whereas paralogs tend to evolve new functions, even if these are related to the original one.

Ortholog vs. Paralog
• Ortholog: genomic variation occurs after speciation, hence can be used for phylogeny of organisms
• Paralog: gene duplication occurs before speciation, hence not suitable for phylogeny of organisms

Homoplasy
• Sequence similarity NOT due to common ancestry
• May arise due to parallelism or convergent evolution
• Parallelism (parallel evolution): the development of a similar trait in related, but distinct, species descending from the same ancestor, but from different clades
• Convergent evolution

Parallel evolution Parallel evolution occurs when two species that have descended from the same ancestor remain similar over long periods of time because they independently acquire the same evolutionary adaptations. Parallel evolution occurs because genetically related species adapt to similar environmental changes in similar ways. After many years, the organisms may still resemble each other, even though they speciated in the distant past.

Convergent evolution
When species from different ancestors colonize the same environment, they may independently acquire the same adaptations. The evolution of species descended from different ancestors to become superficially similar because they are adapting to the same environment is called convergent evolution.

Divergent Evolution

Phylogeny of what?
• Organisms
• Whole genome phylogeny
• Ribosomal RNA (surrogate for whole genome)
• Strains (closely related microbes)
• Individual genes (or gene families)
• Repetitive DNA sequences
• Metabolic pathways
• Secondary structures
• Any discrete character(s)
• Human languages
• Microbial communities
Note: A microorganism (from the Greek: μικρός, mikrós, "small" and ὀργανισμός, organismós, "organism") or microbe is a microscopic organism, which may be a single cell or a multicellular organism.

Why compute phylogenetic trees?
• Understand evolutionary history
• Map pathogen strain diversity for vaccines
• Assist in epidemiology of infectious diseases and of genetic defects
• Aid in prediction of function of novel genes
• Biodiversity studies
• Understanding microbial ecologies

Tree Building Exercises

Computational Approaches to Phylogenetic Tree Computation
• Distance-based methods: UPGMA, Neighbor joining
• Character-state methods: Maximum Parsimony Method, Maximum Likelihood Methods
• Tree merging: consensus trees, super-trees

What data is used to build trees? Traditionally: morphological features (e.g., number of legs, beak shape, etc.) Today: Mostly molecular data (e.g., DNA and protein sequences)

Data for Phylogeny
Can be classified into two categories:
• Numerical data: distance between objects, e.g., distance(man, mouse) = 500, distance(man, chimp) = 100. Usually derived from sequence data
• Discrete characters: each character has a finite number of states, e.g., number of legs = 1, 2, 4; DNA = {A, C, T, G}

UPGMA


2. Determine the evolutionary distances and build a distance matrix: a simple example
Sequence 1: AGGCCATGAATTAAGAATAA
Sequence 2: AGCCCATGGATAAAGAGTAA
Sequence 3: AGGACATGAATTAAGAATAA
Sequence 4: AAGCCAAGAATTACGAATAA
In this example the evolutionary distance is expressed as the fraction of nucleotide differences for each sequence pair. For example, sequences 1 and 2 are 20 nucleotides in length and have four differences, corresponding to an evolutionary distance of 4/20 = 0.2.
Distance matrix (counted from the sequences above):
     1      2      3      4
1    -      0.2    0.05   0.15
2           -      0.25   0.35
3                  -      0.2
4                         -
Notes: Given 4 DNA sequences, look for differences. The matrix is symmetrical, with no values on the diagonal. Real distance matrices are usually more complex to generate; that is not covered in detail here. Laboratory methods for examining sequences can be imprecise and often lead to incomplete distance matrices.
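The distance calculation on this slide can be sketched in Python as follows. This is an illustrative sketch only: the function names are assumptions, and the distance is the plain fraction of differing positions (no correction for multiple substitutions), as in the worked 4/20 = 0.2 example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def distance_matrix(seqs):
    """Upper triangle of the pairwise distance matrix, keyed by index pair."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


sequences = [
    "AGGCCATGAATTAAGAATAA",
    "AGCCCATGGATAAAGAGTAA",
    "AGGACATGAATTAAGAATAA",
    "AAGCCAAGAATTACGAATAA",
]

if __name__ == "__main__":
    for (i, j), d in distance_matrix(sequences).items():
        print(f"d({i + 1},{j + 1}) = {d:.2f}")
```

Only the upper triangle is stored because the matrix is symmetric and the diagonal is empty.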

3. Phylogenetic Tree Construction example (UPGMA algorithm)
UPGMA (Michener & Sokal 1957) is a so-called distance-based method and needs complete distance matrices. It is a simple sequential clustering algorithm: it repeatedly joins the two nearest clusters (species) until only one cluster is left.
1. Pick the smallest entry Dij.
2. Join the two intersecting species and assign branch lengths Dij/2 to each of the nodes.
3. Compute new distances to the other species using arithmetic means.
Repeat steps 1 to 3.

3. Phylogenetic Tree Construction example (UPGMA algorithm): first join
Dij       Bear   Raccoon   Weasel   Seal
Bear      -      0.26      0.34     0.29
Raccoon          -         0.42     0.44
Weasel                     -        0.44
Seal                                -
The smallest entry is Bear-Raccoon = 0.26, so Bear and Raccoon are joined with branch lengths 0.13 each.

3. Phylogenetic Tree Construction example (UPGMA algorithm): second join
New matrix, with BR-Weasel = (0.34 + 0.42) / 2 = 0.38 and BR-Seal = (0.29 + 0.44) / 2 = 0.365:
Dij       BR     Weasel   Seal
BR        -      0.38     0.365
Weasel           -        0.44
Seal                      -
The smallest entry is BR-Seal = 0.365, so Seal is joined to BR at 0.365 / 2 = 0.1825.

3. Phylogenetic Tree Construction example (UPGMA algorithm): final join
New matrix, with BRS-Weasel = (0.34 + 0.42 + 0.44) / 3 = 0.4:
Dij       BRS    Weasel
BRS       -      0.4
Weasel           -
Weasel is joined to BRS at 0.4 / 2 = 0.2. Done!
[Figure: final tree of Bear, Raccoon, Seal, Weasel with node heights 0.13, 0.1825, 0.2]
Simple algorithm. Generally satisfactory results. Next talk: more sophisticated algorithms.
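The clustering loop of the bear/raccoon/weasel/seal example can be sketched in Python. This is a minimal sketch under stated assumptions rather than a reference implementation: the function name, the `frozenset`-keyed distance dictionary, and the Weasel-Seal distance of 0.44 (read off the second matrix of the example) are choices made for illustration. New distances are averaged over all original species in each cluster, which is what gives 0.4 for BRS-Weasel.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return the merges as (cluster_a, cluster_b, node_height) tuples.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in labels}  # cluster name -> number of species
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Step 1: pick the smallest entry among the current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        # Step 2: join them; the new node sits at height Dij / 2.
        height = d[frozenset((a, b))] / 2
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        new = f"({a},{b})"
        # Step 3: size-weighted arithmetic mean to every remaining cluster.
        for c in sizes:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b
        merges.append((a, b, height))
    return merges


species = ["Bear", "Raccoon", "Weasel", "Seal"]
distances = {
    frozenset(("Bear", "Raccoon")): 0.26,
    frozenset(("Bear", "Weasel")): 0.34,
    frozenset(("Bear", "Seal")): 0.29,
    frozenset(("Raccoon", "Weasel")): 0.42,
    frozenset(("Raccoon", "Seal")): 0.44,
    frozenset(("Weasel", "Seal")): 0.44,
}

if __name__ == "__main__":
    for a, b, h in upgma(species, distances):
        print(f"join {a} + {b} at height {h:.4f}")
```

The returned heights are distances from the new node down to the tips; the length of an internal branch is the difference between the heights of its two end nodes.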

Downside of UPGMA
• Assumes a molecular clock (i.e., that the evolutionary rate is approximately constant)
• Generates only rooted trees
• Trees are ultrametric
• Doesn’t work in the following case: [Figure]


Neighbor-joining method
• Developed in 1987 by Saitou and Nei
• Works in a similar fashion to UPGMA
• Still fast: works well for large datasets
• Doesn’t require the data to be ultrametric
• Well suited to largely varying evolutionary rates

How to construct a tree with the Neighbor-joining method?
Step 1: For each taxon x, sum all distances from x and divide by (leaves - 2): Sx = (sum of all Dx) / (leaves - 2)
Step 2: Find the pair with the smallest M: Mij = Dij - Si - Sj
Step 3: Create a node U that joins the pair with the lowest Mij, with branch lengths SiU = (Dij / 2) + (Si - Sj) / 2 and SjU = (Dij / 2) + (Sj - Si) / 2
Step 4: Join i and j according to S and arrange all other taxa in the form of a star
Step 5: Recalculate the distances from every other taxon x to U: DxU = (Dix + Djx - Dij) / 2

Example of Neighbor-joining
Distance matrix:
     A    B    C    D    E
B    5
C    4    7
D    7    10   7
E    6    9    6    5
F    8    11   8    9    8
Step 1: S calculation: Sx = (sum of all Dx) / (leaves - 2)
S(A) = (5 + 4 + 7 + 6 + 8) / 4 = 7.5
S(B) = (5 + 7 + 10 + 9 + 11) / 4 = 10.5
S(C) = (4 + 7 + 7 + 6 + 8) / 4 = 8
S(D) = (7 + 10 + 7 + 5 + 9) / 4 = 9.5
S(E) = (6 + 9 + 6 + 5 + 8) / 4 = 8.5
S(F) = (8 + 11 + 8 + 9 + 8) / 4 = 11

Example of Neighbor-joining cont 1
Step 2: Find the pair with the smallest M: Mij = Dij - Si - Sj
     A       B       C       D       E
B    -13
C    -11.5   -11.5
D    -10     -10     -10.5
E    -10     -10     -10.5   -13
F    -10.5   -10.5   -11     -11.5   -11.5
The smallest values are M(AB) = d(AB) - S(A) - S(B) = 5 - 7.5 - 10.5 = -13 and M(DE) = 5 - 9.5 - 8.5 = -13.

Example of Neighbor-joining cont 2
Step 3: Create a node U: SiU = (Dij / 2) + (Si - Sj) / 2
U1 joins A and B:
S(AU1) = d(AB) / 2 + (S(A) - S(B)) / 2 = 5 / 2 + (7.5 - 10.5) / 2 = 1
S(BU1) = d(AB) / 2 + (S(B) - S(A)) / 2 = 5 / 2 + (10.5 - 7.5) / 2 = 4

Example of Neighbor-joining cont 3
Step 4: Join A and B according to S, and arrange all other taxa in the form of a star. Branches in black are of unknown length and branches in red are of known length.

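Example of Neighbor-joining cont 4
Step 5: Calculate the new distance matrix: DxU = (Dix + Djx - Dij) / 2
d(CU) = (d(AC) + d(BC) - d(AB)) / 2 = (4 + 7 - 5) / 2 = 3
d(DU) = (d(AD) + d(BD) - d(AB)) / 2 = (7 + 10 - 5) / 2 = 6
d(EU) = (d(AE) + d(BE) - d(AB)) / 2 = (6 + 9 - 5) / 2 = 5
d(FU) = (d(AF) + d(BF) - d(AB)) / 2 = (8 + 11 - 5) / 2 = 7
Then we get the new distance matrix:
     U1   C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8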
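One full pass through steps 1 to 5 can be sketched in Python using the six-taxon matrix of this example. This is a hedged sketch, not a complete neighbor-joining implementation: `nj_iteration` and the `frozenset`-keyed dictionary are illustrative names, ties in M are broken by taking the first pair encountered (so A and B are chosen rather than D and E), and the termination step for the last remaining nodes is left out.

```python
from itertools import combinations


def nj_iteration(taxa, dist, new_name="U1"):
    """Run one neighbor-joining iteration.

    dist maps frozenset({x, y}) to the distance between taxa x and y.
    Returns the joined pair, their branch lengths to the new node,
    the reduced taxon list, and the reduced distance dictionary.
    """
    n = len(taxa)

    def d(x, y):
        return dist[frozenset((x, y))]

    # Step 1: Sx = (sum of all distances from x) / (leaves - 2)
    S = {x: sum(d(x, y) for y in taxa if y != x) / (n - 2) for x in taxa}
    # Step 2: Mij = Dij - Si - Sj; pick the smallest (first pair wins ties)
    M = {(i, j): d(i, j) - S[i] - S[j] for i, j in combinations(taxa, 2)}
    i, j = min(M, key=M.get)
    # Step 3: branch lengths from i and j to the new node U
    branch_i = d(i, j) / 2 + (S[i] - S[j]) / 2
    branch_j = d(i, j) / 2 + (S[j] - S[i]) / 2
    # Steps 4 and 5: replace i and j by U and recompute distances to U
    rest = [x for x in taxa if x not in (i, j)]
    new_dist = {frozenset((x, y)): d(x, y) for x, y in combinations(rest, 2)}
    for x in rest:
        new_dist[frozenset((new_name, x))] = (d(i, x) + d(j, x) - d(i, j)) / 2
    return (i, j), (branch_i, branch_j), [new_name] + rest, new_dist


taxa = list("ABCDEF")
# Lower triangle of the example matrix, one row per taxon.
rows = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}
dist = {
    frozenset((row, taxa[k])): value
    for row, values in rows.items()
    for k, value in enumerate(values)
}

if __name__ == "__main__":
    pair, branches, new_taxa, new_dist = nj_iteration(taxa, dist)
    print("joined", pair, "with branch lengths", branches)
    for x in new_taxa[1:]:
        print(f"d({x},U1) =", new_dist[frozenset(("U1", x))])
```

Calling the function again on the returned taxa and distances (with a fresh node name) performs the next iteration.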

Example of Neighbor-joining cont 5
Repeat steps 1 to 5 until all branches are done. In this example, we get this at the end: [Figure: final neighbor-joining tree]

Downside of Neighbor-joining
• Generates only one possible tree
• Generates only unrooted trees


Maximum Parsimony Method
• Parsimony score: number of character changes (mutations) along the evolutionary tree (a tree containing labels on its internal vertices)
• Most parsimonious tree: the tree with minimal parsimony score (Minimal Evolution Principle)
Example: [Figure: two labeled trees over the sequences AGA, AAA, AAG, GGA, one with score = 4 and one with score = 3]

Small vs. Large Parsimony
We break the problem into two:
• Small parsimony: given the topology, find the best assignment to internal nodes
• Large parsimony: find the topology which gives the best score
Large parsimony is NP-hard. We’ll show a solution to small parsimony (Fitch’s and Sankoff’s algorithms).
Input to small parsimony: a tree with character-state assignments to leaves. Example:
A (Aardvark): CAGGTA
B (Bison): CAGACA
C (Chimp): CGGGTA
D (Dog): TGCACT
E (Elephant): TGCGTA

Fitch’s Algorithm
Execute independently for each character, in a dynamic programming framework:
1. Bottom-up phase: determine the set of possible states for each internal node
2. Top-down phase: pick states for each internal node
[Figure: tree with leaves Aardvark (CAGGTA), Bison (CAGACA), Chimp (CGGGTA), Dog (TGCACT), Elephant (TGCGTA); arrow 1 shows the bottom-up pass and arrow 2 the top-down pass]

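Fitch’s Algorithm: Bottom-up phase
Determine the set of possible states for each internal node.
• Initialization: Ri = {si} for each leaf i
• Do a post-order (from leaves to root) traversal of the tree
• Determine Ri of internal node i with children j, k: Ri = Rj ∩ Rk if Rj ∩ Rk ≠ ∅, otherwise Ri = Rj ∪ Rk
• Parsimony score = number of union operations
[Figure: tree with leaves C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; score = 3]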

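Fitch’s Algorithm: Top-down phase
Pick states for each internal node.
• Pick an arbitrary state in Rroot for the root
• Do a pre-order (from root to leaves) traversal of the tree
• Determine sj of internal node j with parent i: sj = si if si ∊ Rj, otherwise sj is an arbitrary state in Rj
Complexity: O(mnk), where m = #characters, n = #taxa/nodes, k = #states
[Figure: tree with leaves C, T, G, T, A, T and internal sets CT, GT, AGT, T, T; every internal node is assigned T; score = 3]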
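Both phases of Fitch’s algorithm for a single character can be sketched in Python. This is a minimal sketch under assumptions: trees are nested 2-tuples with one-letter strings as leaves, the function names are illustrative, and the topology below is an assumed one chosen to be consistent with the leaf states (C, T, G, T, A, T) and internal sets (CT, GT, AGT, T, T) shown on the slide, since the figure itself is not available.

```python
def fitch_bottom_up(tree):
    """Post-order pass: return (annotated node, parsimony score)."""
    if isinstance(tree, str):  # leaf: Ri = {si}
        return {"set": {tree}, "children": None}, 0
    left, left_score = fitch_bottom_up(tree[0])
    right, right_score = fitch_bottom_up(tree[1])
    common = left["set"] & right["set"]
    if common:  # intersection is non-empty: no extra change needed
        states, cost = common, 0
    else:  # union operation: counts one change
        states, cost = left["set"] | right["set"], 1
    node = {"set": states, "children": (left, right)}
    return node, left_score + right_score + cost


def fitch_top_down(node, parent_state=None):
    """Pre-order pass: keep the parent's state when allowed, else pick one."""
    if parent_state in node["set"]:
        node["state"] = parent_state
    else:
        node["state"] = min(node["set"])  # arbitrary but deterministic choice
    if node["children"] is not None:
        for child in node["children"]:
            fitch_top_down(child, node["state"])


def count_changes(node):
    """Number of edges whose two endpoints carry different states."""
    if node["children"] is None:
        return 0
    return sum(
        (child["state"] != node["state"]) + count_changes(child)
        for child in node["children"]
    )


# Assumed topology, consistent with the sets shown on the slide.
example_tree = (("C", "T"), ((("G", "T"), "A"), "T"))

if __name__ == "__main__":
    root, score = fitch_bottom_up(example_tree)
    fitch_top_down(root)
    print("root set:", root["set"], "score:", score)
    print("changes after top-down assignment:", count_changes(root))
```

For a multi-character alignment the same two passes would be run once per column and the scores summed.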

Weighted Parsimony: Sankoff’s algorithm
Each mutation a↔b has its own cost S(a,b).
• Bottom-up phase: determine Ri(s), the cost of the optimal state assignment for the subtree of i when i is assigned state s
• Top-down phase: pick optimal states for each internal node
Fitch’s algorithm as a special case: Ri is the set of states which yield a minimal-cost subtree of i.
Same as the algorithm for optimal lifted tree alignment (Tutorial #4).

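Sankoff’s Algorithm: Bottom-up phase
Determine Ri(s) for each internal node.
• Initialization: for each leaf i, Ri(s) = 0 if s = si, otherwise Ri(s) = ∞
• Do a post-order (from leaves to root) traversal of the tree
• Determine Ri of internal node i with children j, k: Ri(s) = min over s’ of [Rj(s’) + S(s,s’)] + min over s’’ of [Rk(s’’) + S(s,s’’)]
• Natural generalization for non-binary trees: one such minimum per child
• Remember pointers s→s’
[Figure: tree with leaves C, T, G, T, A, T]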

Sankoff’s Algorithm: Top-down phase
Pick states for each internal node.
• Select the minimal-cost character for the root (s minimizing Rroot(s))
• Do a pre-order (from root to leaves) traversal of the tree: for internal node j with parent i, select the state that produced the minimal cost at i (use the pointers kept in the 1st stage)
Complexity: O(mnk²), where m = #characters, n = #taxa/nodes, k = #states
[Figure: tree with leaves C, T, G, T, A, T]
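Sankoff’s two phases can be sketched in Python for one character. This is an illustrative sketch only: trees are nested tuples with one-letter leaves, the names are assumptions, the example topology is an assumed one (the slide’s figure is not available), and instead of storing pointers in the bottom-up pass the top-down pass recomputes the minimizing state, which gives the same choices. The transition/transversion cost function is a hypothetical weighting used purely for demonstration.

```python
import math

STATES = "ACGT"


def unit_cost(a, b):
    """Unweighted parsimony: every change costs 1."""
    return 0 if a == b else 1


def ts_tv_cost(a, b):
    """Hypothetical weighting: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    return 1 if {a, b} in ({"A", "G"}, {"C", "T"}) else 2


def sankoff_bottom_up(tree, cost):
    """Post-order pass: attach the table R(s) to every node."""
    if isinstance(tree, str):  # leaf: 0 for its own state, infinity otherwise
        table = {s: 0 if s == tree else math.inf for s in STATES}
        return {"R": table, "children": []}
    children = [sankoff_bottom_up(child, cost) for child in tree]
    table = {
        s: sum(min(ch["R"][t] + cost(s, t) for t in STATES) for ch in children)
        for s in STATES
    }
    return {"R": table, "children": children}


def sankoff_top_down(node, cost, parent_state=None):
    """Pre-order pass: choose the state that realizes the minimal cost."""
    if parent_state is None:
        node["state"] = min(STATES, key=lambda s: node["R"][s])
    else:
        node["state"] = min(
            STATES, key=lambda t: node["R"][t] + cost(parent_state, t)
        )
    for child in node["children"]:
        sankoff_top_down(child, cost, node["state"])


# Assumed topology over the leaf states C, T, G, T, A, T.
example_tree = (("C", "T"), ((("G", "T"), "A"), "T"))

if __name__ == "__main__":
    for cost in (unit_cost, ts_tv_cost):
        root = sankoff_bottom_up(example_tree, cost)
        sankoff_top_down(root, cost)
        print(cost.__name__, "root state:", root["state"],
              "score:", min(root["R"].values()))
```

Because the sum runs over all children of a node, the same code handles non-binary trees without modification.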

Fitch’s Algorithm as a special case of Sankoff’s algorithm
Unweighted parsimony: S(a,b) = 0 if a = b, otherwise S(a,b) = 1
Sankoff’s algorithm:
• Ri(s): cost of the optimal subtree of i when i is assigned state s
Fitch’s algorithm:
• Score(i): cost of the optimal state assignment for the subtree of i
• Ri: set of optimal state assignments for the subtree of i
We need to show that:
1. An optimal tree assigns node i a state from Ri.
2. Fitch’s bottom-up recursive formula for Ri is correct (check for yourselves).

Fitch’s Algorithm as a special case of Sankoff’s algorithm: proof of claim 1
• Trivially true for the root.
• Assume (to the contrary) that in an optimal assignment, some node j is assigned sj ∉ Rj. [Figure: path root, i, j]
• The parsimony score is an integer, so sj ∉ Rj ⇒ Rj(sj) ≥ Score(j) + 1.
• Therefore, by switching from sj to some s ∊ Rj we do not raise the parsimony score.
Why is this not the case for the weighted version?


Maximum likelihood
• Originally developed for statistics by Ronald Fisher between 1912 and 1922
• Therefore, based on an explicit statistical model
• Uses all the data
• Tends to outperform parsimony or distance matrix methods

How to construct a tree with Maximum likelihood?
Step 1: Generate all possible tree topologies for the given number of leaves.
Step 2: For each topology, optimize the branch lengths and calculate the likelihood of the given data: L(Tree) = probability of the data given the tree.
Step 3: Pick the tree that has the highest likelihood.

Sounds really great?
Num of leaves   Num of possible (unrooted) trees
3               1
5               15
10              2027025
13              13749310575
20              221643095476699771875
Maximum likelihood is very expensive and extremely slow to compute.

Comparison of Methods
Distance:
• Uses only pairwise distances
• Minimizes distance between nearest neighbors
• Very fast
• Easily trapped in local optima
• Good for generating a tentative tree, or choosing among multiple trees
Maximum parsimony:
• Uses only shared derived characters
• Minimizes total distance
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, homoplasy rare)
Maximum likelihood:
• Uses all data
• Maximizes tree likelihood given specific parameter values
• Very slow
• Highly dependent on the assumed evolution model
• Good for very small data sets and for testing trees built using other methods

Methods of evaluating trees
• Bootstrap: resample the initial data set with replacement, so that each draw is put back and a datum may appear more than once in the resampled set
• Jackknife: resample the initial data set with one datum left out and not replaced
• MCMC: complex, but generates random numbers to produce a desired probability distribution with which to compare the model
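The two resampling schemes can be sketched in Python. This is a sketch under an assumption that the slide does not state explicitly: each "datum" is taken to be one column of a sequence alignment, which is the usual unit in phylogenetic bootstrapping. The function names and the toy alignment (reusing the small-parsimony example sequences) are illustrative.

```python
import random


def bootstrap_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def jackknife_columns(alignment, drop):
    """Leave out one column (index drop) without replacing it."""
    return [seq[:drop] + seq[drop + 1:] for seq in alignment]


alignment = ["CAGGTA", "CAGACA", "CGGGTA", "TGCACT", "TGCGTA"]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("bootstrap replicate:", bootstrap_columns(alignment, rng))
    print("jackknife replicate:", jackknife_columns(alignment, 2))
```

In practice a tree would be rebuilt from each replicate, and the fraction of replicates that recover a given grouping is reported as its support.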

Phylogeny Flowchart

Difference in Methods
• Maximum-likelihood and parsimony methods have models of evolution
• Distance methods do not necessarily. This is a useful aspect in some circumstances, e.g., trees built based on whole genomes, or on presence or absence of genes
• Religious wars over which methods to use
• Most people now believe ML-based methods are best: most sensitive at large evolutionary distances, but also most time-consuming and dependent on the specific model of evolution used
• Most commonly used packages contain software for all three methods: you may want to use more than one to have confidence in the built tree

Phylip
URL: http://evolution.genetics.washington.edu/phylip.html
• Parsimony: DNApenny or Protpars
• Distance: compute a distance measure using DNAdist or Protdist, then build the tree with Neighbor (can use NJ or UPGMA)
• ML: DNAml

Visualising trees
Treeview: you can change the graphic presentation of a tree (cladogram, rectangular cladogram, radial tree, phylogram), but not the structure of the tree.
http://homopan.wayne.edu/softwares/phoenix/index.html

Reference Mostly from Web