#30 - Phylogenetics Distance-Based Methods
BCB 444/544 11/02/07 Lecture 30 Phylogenetics – Distance-Based Methods #30_Nov02 BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods BCB 444/544 Fall 07 Dobbs

Required Reading (before lecture)
Wed Oct 31 - Lecture 29
  Phylogenetics Basics
  Chp 10 - pp
Thurs Nov 1 - Lab 9
  Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30
  Phylogenetics – Distance-Based Methods
  Chp 11 - pp 142 – 169
Mon Nov 5 - Lecture 31
  Phylogenetics – Parsimony and ML
  Chp 11 - pp

Assignments & Announcements
Mon Oct 29 - HW#5
  HW#5 = Hands-on exercises with phylogenetics and tree-building software
  Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects
Last week of classes will be devoted to Projects
Written reports due: Mon Dec 3 (no class that day)
Oral presentations (20-30') will be: Wed-Fri Dec 5,6,7
  1 or 2 teams will present during each class period
See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due: √ PART 1 - ASAP
     PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, to Drena & Michael
  after response/approval, then:
Part 2 - More detailed outline of project
  Read a few papers and summarize status of problem
  Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  Bob Jernigan, BBMB, ISU
  Control of Protein Motions by Structure

Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
  Evolution and Phylogenetics
  Terminology
  Gene Phylogeny vs. Species Phylogeny
  Forms of Tree Representation
  Why Finding a True Tree is Difficult
  Procedure of Building a Phylogenetic Tree

Tree Building Procedure
Choose molecular markers
Perform MSA
Choose a model of evolution
Determine tree building method
Assess tree reliability

Choice of Molecular Markers
Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences
For individuals within a species - use fast-evolving noncoding regions of mtDNA
More distantly related species - use slowly evolving nucleic acid sequences (e.g., ribosomal RNA) or protein sequences
Very distantly related species - use highly conserved protein sequences

Multiple Sequence Alignment
Most critical step in tree building - cannot build a correct tree without a correct alignment
Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
Most alignments need manual editing
  Make sure important functional residues align
  Align secondary structure elements
  Use full alignment or just parts

Automatic Editing of Alignments
Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
Gblocks – detect and eliminate poorly aligned positions and divergent regions

How do we measure divergence between sequences?
Simple measure – just count the number of substitutions observed between the sequences in the MSA
Problem – the number of observed substitutions may not represent the number of evolutionary events that actually occurred
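The simple measure above can be sketched as a count of differing sites divided by the number of sites compared (often called the p-distance). This is a minimal illustration: `p_distance` is a hypothetical helper name, and skipping columns that contain a gap is an assumption, not something the slides specify.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":  # assumption: ignore gapped columns
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Two toy aligned sequences that differ at 2 of 10 sites
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```

This count only sees the net difference at each site, which is exactly the limitation the "Problem" line points out.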

Multiple Substitutions
[Figure: substitution diagram; surviving labels C, A, T, G]
Just because we see only one difference does not mean that there was only one evolutionary event
[Figure: substitution diagram; surviving labels A, T, G]
Just because we see no difference does not mean that there were no evolutionary events

Choosing Substitution Models
Statistical models of evolution are used to correct for the multiple substitution problem
Focus on DNA models

Jukes-Cantor Model
Jukes-Cantor model assumes all nucleotides are substituted with equal probability
Can be used to correct for multiple substitutions
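The slide does not show the correction itself. A standard way to write it, assumed here, is d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites. The function name `jukes_cantor_distance` is a hypothetical choice for this sketch.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of differences."""
    if not 0.0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation)
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for p in (0.05, 0.2, 0.5):
    print(f"p = {p:.2f} -> corrected d = {jukes_cantor_distance(p):.4f}")
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge, which is the effect of accounting for hidden multiple substitutions.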

Many Other Models

Evolutionary Models for Protein Sequences
PAM and JTT substitution matrices already take into account multiple substitutions
There are also models similar to Jukes-Cantor for protein sequences

What about differences in mutation rates between positions within a sequence?
One of our assumptions was that all positions in a sequence are evolving at the same rate
  Bad assumption
  Third position in a codon changes with higher frequency
  In proteins, some amino acids can change and others cannot
This variation is called among-site rate heterogeneity
Many tree building programs have parameters meant to deal with this problem – adds to complexity of getting the correct tree
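One widely used parameter of this kind is the shape parameter of a gamma distribution of rates across sites. As an illustration only (the slides do not name a specific correction), the gamma-corrected form of the Jukes-Cantor distance is commonly written d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], with a the gamma shape parameter. The names `jc_gamma_distance` and `alpha` are assumptions of this sketch.

```python
import math


def jc_distance(p: float) -> float:
    """Plain Jukes-Cantor distance (all sites assumed to evolve at one rate)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p: float, alpha: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates across sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("requires 0 <= p < 0.75")
    if alpha <= 0.0:
        raise ValueError("gamma shape parameter must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p = 0.2
print(f"uniform rates:  {jc_distance(p):.4f}")
for alpha in (0.5, 2.0, 100.0):
    print(f"alpha = {alpha:>5}: {jc_gamma_distance(p, alpha):.4f}")
```

Small values of the shape parameter (strong rate variation) give larger corrected distances, and large values approach the plain Jukes-Cantor result.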

Chp 11 – Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
  Distance-Based Methods
  Character-Based Methods
  Phylogenetic Tree Evaluation
  Phylogenetic Programs

Tree Construction
Two main categories of tree building methods
  Distance-based
    Overall similarity between sequences
  Character-based
    Consider the entire MSA

Distance-Based Methods
Given an MSA and an evolutionary model, calculate the distance between all pairs of sequences
Construct distance matrix
Construct phylogenetic tree based on the distance matrix
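The first two steps can be sketched as follows, assuming the Jukes-Cantor model as the evolutionary model and a toy alignment invented for illustration. `distance_matrix`, `jc_distance`, and the sequence names are hypothetical.

```python
import math
from itertools import combinations


def jc_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict) -> dict:
    """Return a symmetric nested dict: matrix[name1][name2] -> distance."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = jc_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Toy alignment (made up for this sketch)
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTTCGTAC",
    "seq3": "ACGATCGTAA",
}
matrix = distance_matrix(toy_alignment)
for a in toy_alignment:
    print(a, " ".join(f"{matrix[a][b]:.3f}" for b in toy_alignment))
```

The resulting matrix is the only input the tree-building step sees; the alignment itself is no longer consulted.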

Distance Matrices
     a    b    c    d
a    0
b    6    0
c    7    3    0
d   14   10    9    0
[Figure: tree over taxa a, b, c, d; branch labels not recoverable]

Distance-Based Methods
Two ways to construct a tree based on a distance matrix
  Clustering
  Optimality

Clustering-Based Methods
E.g., UPGMA and Neighbor-Joining
A cluster is a set of taxa
Interspecies distances translate into intercluster distances
Clusters are repeatedly merged
  “Closest” clusters merged first
  Distances are recomputed after merging

UPGMA – Unweighted Pair Group Method Using Arithmetic Average
Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)
This assumption is usually wrong
So why use UPGMA?
  Very fast

UPGMA Example
[Figures from four worked-example slides not preserved]
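A minimal sketch of the UPGMA procedure, applied to the four-taxon matrix (a, b, c, d) from the Distance Matrices slide. This is not necessarily the example the lecture used; the function name `upgma`, the cluster labels, and the return format are assumptions of this sketch.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Return a list of merges (cluster_1, cluster_2, node_height)."""
    d = {frozenset(k): v for k, v in pair_dist.items()}
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters first
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        d_ab = d[frozenset((a, b))]
        new = f"({a},{b})"
        size_a, size_b = clusters[a], clusters[b]
        for c in clusters:
            if c in (a, b):
                continue
            # Size-weighted mean equals the arithmetic average over all leaf pairs
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        del clusters[a], clusters[b]
        clusters[new] = size_a + size_b
        # Molecular clock: the new node sits halfway between the two clusters
        merges.append((a, b, d_ab / 2.0))
    return merges


pair_dist = {
    ("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
    ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9,
}
merges = upgma(["a", "b", "c", "d"], pair_dist)
for left, right, height in merges:
    print(f"merge {left} + {right} at height {height}")
```

On this matrix b and c are joined first, then a, then d, and every leaf ends up the same distance from the root, as the molecular clock assumption requires.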

Neighbor Joining
Idea: Find a pair of taxa that are close to each other but far from other taxa
  Implicitly finds a pair of neighboring taxa
No molecular clock assumption

Neighbor Joining
NJ corrects for unequal evolutionary rates between sequences by using a conversion step
The conversion step requires calculation of “r-values” and “transformed r-values”

Neighbor Joining
The r-value for a sequence i is:
  r_i = \sum_{j \neq i} d_{ij}
The sum of the distances between sequence i and all other sequences

Neighbor Joining
The transformed r-value for a sequence i is:
  r'_i = r_i / (n - 2)
Where n is the number of taxa
Transformed r-values are used to determine the distance of a taxon to the nearest node

Neighbor Joining
The converted distance between two sequences is:
  [equation not preserved]
These converted distances are used in building the tree

Neighbor Joining
The final equation we need is for computing the distance from a new cluster to each taxon.
Assume taxa i and j were merged into a cluster u.
The distance from taxon i to cluster u is:
  [equation not preserved]
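The equation images for the converted distance and the taxon-to-cluster distance did not survive extraction. A commonly used formulation, assumed here and consistent with the numbers in the worked example that follows, is given below. The last line, the distance from any remaining taxon k to the new cluster u, is included because the reduced matrix in the example needs it.

```latex
\begin{align}
  d'_{ij} &= d_{ij} - \left( r'_i + r'_j \right)
      && \text{converted distance} \\
  d_{iu}  &= \frac{d_{ij} + r'_i - r'_j}{2}, \qquad d_{ju} = d_{ij} - d_{iu}
      && \text{branch lengths to the new node } u \\
  d_{ku}  &= \frac{d_{ik} + d_{jk} - d_{ij}}{2}
      && \text{distance from remaining taxon } k \text{ to } u
\end{align}
```

Some presentations write the converted distance as $d_{ij} - \tfrac{1}{2}(r_i + r_j)$ using untransformed r-values; the two forms coincide when n = 4, as in the example.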

Neighbor Joining Example
      A      B      C
B   0.40
C   0.35   0.45
D   0.60   0.70   0.55

Neighbor Joining Example
Initialize tree into a star shape with all taxa connected to the center
Step 1: Compute r-values and transformed r-values for all taxa
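Step 1 can be sketched as below. The pairwise values are our reading of the flattened example matrix (d(A,B) = 0.40, d(A,C) = 0.35, d(B,C) = 0.45, d(A,D) = 0.60, d(B,D) = 0.70, d(C,D) = 0.55), and dividing by n - 2 for the transformed r-value is the usual neighbor-joining convention, assumed here.

```python
names = ["A", "B", "C", "D"]
# Example matrix as read from the slide (layout reconstructed; an assumption)
pair_dist = {
    ("A", "B"): 0.40,
    ("A", "C"): 0.35, ("B", "C"): 0.45,
    ("A", "D"): 0.60, ("B", "D"): 0.70, ("C", "D"): 0.55,
}


def dist(x, y):
    """Symmetric lookup into the pairwise table."""
    return pair_dist[(x, y)] if (x, y) in pair_dist else pair_dist[(y, x)]


n = len(names)
# r-value: sum of distances from taxon i to every other taxon
r = {i: sum(dist(i, j) for j in names if j != i) for i in names}
# transformed r-value: r-value divided by (n - 2)
r_prime = {i: r[i] / (n - 2) for i in names}

for i in names:
    print(f"{i}: r = {r[i]:.2f}, r' = {r_prime[i]:.3f}")
```

With four taxa the transformed r-values are simply half the r-values.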

Neighbor Joining Example
Step 2: Compute converted distances

Neighbor Joining Example
Step 3: Fill out converted distance matrix
       A       B       C
B   -1.05
C   -1.00   -1.00
D   -1.00   -1.00   -1.05
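Steps 2 and 3 can be checked with a short script. It assumes the example distances d(A,B) = 0.40, d(A,C) = 0.35, d(B,C) = 0.45, d(A,D) = 0.60, d(B,D) = 0.70, d(C,D) = 0.55 and the usual form of the converted distance, d'(i,j) = d(i,j) - (r'(i) + r'(j)); both are assumptions where the slide images are missing.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
# Example matrix as read from the slide (an assumption about the lost layout)
pair_dist = {
    ("A", "B"): 0.40,
    ("A", "C"): 0.35, ("B", "C"): 0.45,
    ("A", "D"): 0.60, ("B", "D"): 0.70, ("C", "D"): 0.55,
}


def dist(x, y):
    """Symmetric lookup into the pairwise table."""
    return pair_dist[(x, y)] if (x, y) in pair_dist else pair_dist[(y, x)]


n = len(names)
r_prime = {i: sum(dist(i, j) for j in names if j != i) / (n - 2) for i in names}

# Converted distance: original distance minus both transformed r-values
converted = {
    (i, j): dist(i, j) - (r_prime[i] + r_prime[j])
    for i, j in combinations(names, 2)
}
for (i, j), value in converted.items():
    print(f"d'({i},{j}) = {value:.2f}")
```

The pairs A-B and C-D share the smallest converted distance, which is why the next step may start from either pair.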

Neighbor Joining Example
Step 4: Create a node by merging closest taxa
In this example, the converted distance between A and B is the same as the converted distance between C and D
We can pick either pair to start with
Let’s pick A and B and create a node called U
[Figure: star tree of A, B, C, D, with A and B joined at new node U; branch lengths marked "?"]

Neighbor Joining Example
Step 5: Compute branch lengths
Use the equation for computing the distance from a taxon to a node
[Figure: A joined to U with branch length 0.15; B joined to U with branch length 0.25]
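The two branch lengths can be reproduced with the usual taxon-to-node formula, d(i,u) = [d(i,j) + r'(i) - r'(j)] / 2, which is assumed here because the slide's equation image is missing. The input distances are our reading of the example matrix.

```python
# Distances involving A and B from the example matrix (as read from the slide)
d_ab = 0.40
n = 4

# Transformed r-values: sum of distances to all other taxa, divided by (n - 2)
r_prime_a = (0.40 + 0.35 + 0.60) / (n - 2)
r_prime_b = (0.40 + 0.45 + 0.70) / (n - 2)

# Branch from A to the new node U, then B gets the remainder of d(A,B)
d_au = (d_ab + r_prime_a - r_prime_b) / 2
d_bu = d_ab - d_au

print(f"A-U = {d_au:.2f}, B-U = {d_bu:.2f}")
```

B receives the longer branch because its distances to the other taxa are larger overall, which is how NJ accommodates unequal rates.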

Neighbor Joining Example
Step 6: Construct reduced distance matrix by computing distances from each remaining taxon to the new node U
In UPGMA, we simply calculated the average

Neighbor Joining Example
Our reduced distance matrix:
      U      C
C   0.20
D   0.45   0.55
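The entries for U can be reproduced with the usual neighbor-joining update, d(k,u) = [d(i,k) + d(j,k) - d(i,j)] / 2, assumed here since the slide does not preserve the formula. `reduced_distance` is a hypothetical helper name, and the inputs are our reading of the original example matrix.

```python
def reduced_distance(d_ik: float, d_jk: float, d_ij: float) -> float:
    """Distance from remaining taxon k to the node u that replaced taxa i and j."""
    return (d_ik + d_jk - d_ij) / 2


d_ab = 0.40
d_cu = reduced_distance(d_ik=0.35, d_jk=0.45, d_ij=d_ab)  # k = C
d_du = reduced_distance(d_ik=0.60, d_jk=0.70, d_ij=d_ab)  # k = D

print(f"C-U = {d_cu:.2f}, D-U = {d_du:.2f}")
```

Unlike the UPGMA average, this update subtracts the A-B distance, so the part of each path that lies beyond U is not counted.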

Neighbor Joining Example
From here, we go back to step 1
Continue until all taxa have been decomposed from the star tree
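The whole loop can be sketched as one function. This is a minimal illustration under the standard neighbor-joining formulas (assumed, since the slide equations are not preserved); the name `neighbor_joining`, the node labels `U1`, `U2`, and the tie-breaking by rounding are choices made for this sketch.

```python
from itertools import combinations


def neighbor_joining(names, pair_dist):
    """Return branches as (child, parent_or_partner, length) tuples."""
    d = {frozenset(k): v for k, v in pair_dist.items()}
    nodes = list(names)
    branches = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: transformed r-values
        r_prime = {
            i: sum(d[frozenset((i, j))] for j in nodes if j != i) / (n - 2)
            for i in nodes
        }
        # Steps 2-4: pick the pair with the smallest converted distance.
        # Rounding makes exact ties deterministic (first pair in order wins).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: round(d[frozenset(p)] - r_prime[p[0]] - r_prime[p[1]], 9),
        )
        counter += 1
        u = f"U{counter}"
        # Step 5: branch lengths from the merged taxa to the new node
        d_ij = d[frozenset((i, j))]
        d_iu = (d_ij + r_prime[i] - r_prime[j]) / 2
        branches.append((i, u, d_iu))
        branches.append((j, u, d_ij - d_iu))
        # Step 6: reduced distance matrix
        for k in nodes:
            if k not in (i, j):
                d[frozenset((k, u))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    branches.append((a, b, d[frozenset((a, b))]))  # final connecting branch
    return branches


pair_dist = {
    ("A", "B"): 0.40, ("A", "C"): 0.35, ("B", "C"): 0.45,
    ("A", "D"): 0.60, ("B", "D"): 0.70, ("C", "D"): 0.55,
}
branches = neighbor_joining(["A", "B", "C", "D"], pair_dist)
for child, parent, length in branches:
    print(f"{child} - {parent}: {length:.2f}")
```

For the example matrix the path lengths in the resulting tree add up to the original pairwise distances, for instance A to C is 0.15 + 0.05 + 0.15 = 0.35.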

Optimality-Based Methods
Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies
Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix
Two algorithms:
  Fitch-Margoliash
  Minimum evolution

Fitch-Margoliash
Selects best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix
Basically, a least squares method
  Dij = distance between i and j in matrix
  dij = distance between i and j in tree
Objective: Find tree that minimizes
  [equation not preserved]
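The objective is not preserved in this transcript. A form commonly given for Fitch-Margoliash, assumed here, is the weighted least-squares sum E = Σ (Dij - dij)² / Dij² over all pairs i < j. The sketch below scores one hand-written candidate set of tree path lengths against the a, b, c, d matrix from the Distance Matrices slide; `fm_error` and the candidate values are hypothetical.

```python
def fm_error(observed: dict, tree: dict) -> float:
    """Weighted least-squares deviation between matrix and tree path distances."""
    return sum(
        (observed[pair] - tree[pair]) ** 2 / observed[pair] ** 2
        for pair in observed
    )


# Dij: distances from the matrix
observed = {
    ("a", "b"): 6.0, ("a", "c"): 7.0, ("b", "c"): 3.0,
    ("a", "d"): 14.0, ("b", "d"): 10.0, ("c", "d"): 9.0,
}
# dij: path lengths in a hypothetical ultrametric candidate tree ((b,c),a),d
candidate = {
    ("a", "b"): 6.5, ("a", "c"): 6.5, ("b", "c"): 3.0,
    ("a", "d"): 11.0, ("b", "d"): 11.0, ("c", "d"): 11.0,
}

print(f"E = {fm_error(observed, candidate):.4f}")
```

An optimality-based search would compute this score for each candidate topology, with branch lengths fitted per topology, and keep the tree with the smallest value.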

Minimum Evolution
Similar to Fitch-Margoliash, but uses a different optimality criterion
Searches for a tree with the minimum total branch length
This is an indirect way of achieving the best fit of the branch lengths with the original data

Summary of Distance-Based Methods
Clustering-based methods:
  Computationally very fast and can handle large datasets that other methods cannot
  Not guaranteed to find the best tree
Optimality-based methods:
  Better overall accuracies
  Computationally slow
All distance-based methods lose all sequence information and cannot infer the most likely state at an internal node