Published by Kelley Hampton. Modified over 9 years ago.
1
Phylogenetic Analysis
Gabor T. Marth, Department of Biology, Boston College, marth@bc.edu
BI420 – Introduction to Bioinformatics
Figures from Higgs & Attwood
2
The goals of phylogenetics
To understand the evolutionary relationships among species, e.g.
- the order in which they diverged
- the time since divergence
3
The assumptions in phylogenetics
1. The organisms in any group are related to each other by descent from a common ancestor
2. The relationships between organisms are described by a bifurcating tree
3. Change in characteristics between organisms occurs over time
4
Phylogenetic “objects”
Figure labels on a phylogenetic tree: taxon, clade, node, branch
5
Constructing an evolutionary tree
Step 1. Selection of appropriate sequences
Step 2. Construction of multiple sequence alignment
Step 3. Calculation of pair-wise evolutionary distances
Step 4. Tree construction
Step 5. Tree evaluation
6
1. Sequence selection
- find sequences with an appropriate amount of divergence: there can be too little or too much (e.g. genes identical across taxa, or non-conserved genomic sequence)
- try to select orthologous sequences, so that the genes used for tree construction are likely to have preserved functions
7
2. Multiple alignment
Example: mitochondrial small subunit RNA gene
- informative sites
- alignment editing
- mechanics of multiple alignment construction are covered in earlier classes in the course
8
3. Pair-wise distance measures
How diverged two sequences are:
ACGCGTTATTACAGTTGACT
ACACGTTATGACAGTTGACT
2 differences in 20 bp: D = 2/20 = 0.1 (10% divergence)
How evolutionarily distant two sequences are:
Jukes-Cantor (JC): d = -3/4 ln(1 - 4D/3) ≈ 0.1073 (evolutionary distance)
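The arithmetic on this slide can be reproduced with a short Python sketch. The function names `p_distance` and `jukes_cantor` are illustrative, not from the course material, and the sketch assumes gap-free sequences of equal length.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites at which the two sequences differ (D)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor(p: float) -> float:
    """JC distance d = -3/4 * ln(1 - 4D/3); undefined once D reaches 0.75."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for D >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The two sequences from the slide.
D = p_distance("ACGCGTTATTACAGTTGACT", "ACACGTTATGACAGTTGACT")
d = jukes_cantor(D)
print(D, round(d, 4))  # 0.1 0.1073
```

The JC value is slightly larger than D because the correction allows for sites that may have changed more than once.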
9
Pair-wise distances Pair-wise JC distance matrix
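A pair-wise JC distance matrix could be filled in along these lines. This is a minimal sketch: `jc_matrix` and the sequence names are hypothetical, `seq3` is an invented sequence added only to give the matrix a third row, and gaps are not handled.

```python
import math
from itertools import combinations


def jc_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned, gap-free sequences."""
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    p = diffs / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_matrix(alignment: dict) -> dict:
    """Symmetric matrix as nested dicts: matrix[x][y] is the JC distance."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        dist = jc_distance(alignment[x], alignment[y])
        matrix[x][y] = dist
        matrix[y][x] = dist
    return matrix


toy_alignment = {
    "seq1": "ACGCGTTATTACAGTTGACT",  # from the pair-wise distance slide
    "seq2": "ACACGTTATGACAGTTGACT",  # from the pair-wise distance slide
    "seq3": "ACACGTTATGACAGTTGTCT",  # hypothetical, one change from seq2
}
matrix = jc_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, [round(row[other], 4) for other in toy_alignment])
```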
10
More complex substitution models
Substitutions between less similar residues indicate more divergence than substitutions between more similar residues (hydrophobic vs. hydrophilic)
Nucleotide substitution cost matrix:
    A  C  G  T
A   -  2  1  2
C   2  -  2  1
G   1  2  -  2
T   2  1  2  -
ACGCGTTATTACAGTTGACT
ACACGTTATGACAGTTGACT
A/G (1) + T/G (2): diff = 3
Amino acid substitution matrices (e.g. PAM, BLOSUM)
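The weighted difference on this slide can be sketched in Python. The cost matrix charges 1 for A/G and C/T changes and 2 for all others, which corresponds to the usual transition/transversion split. The function names are illustrative, and the sketch assumes gap-free sequences over A, C, G, T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_cost(a: str, b: str) -> int:
    """Cost from the slide's matrix: 0 if equal, 1 within a class, 2 across."""
    if a == b:
        return 0
    pair = {a, b}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return 1 if same_class else 2


def weighted_difference(seq_a: str, seq_b: str) -> int:
    """Sum of per-site substitution costs over an aligned pair."""
    return sum(substitution_cost(a, b) for a, b in zip(seq_a, seq_b))


score = weighted_difference("ACGCGTTATTACAGTTGACT", "ACACGTTATGACAGTTGACT")
print(score)  # 3: one A/G change (cost 1) plus one T/G change (cost 2)
```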
11
4. Tree construction
- goal is to group (cluster) sequences in a hierarchical fashion
- each step creates a “node” that represents the common ancestor (CA) of all the species/sequences within the group
Figure labels: CA of group containing (A,B); CA of group containing (A,B,C,D)
12
UPGMA method for phylogeny construction
UPGMA (unweighted pair-group method with arithmetic mean) is conceptually very simple.
Step #1. Cluster the two nodes with the shortest distance: e.g. if d(C,D) is lower than d(A,B), d(A,C), etc., then group C and D together. CD is now a new “node”.
Step #2. Re-calculate the distance between the new node CD and all other current nodes, e.g.: d(CD, A) = ½ * (d(C,A) + d(D,A))
Go to Step #1 until every node is clustered into a single group.
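The two steps can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the function name `upgma`, the input layout, and the toy distances are assumptions. Two choices go beyond the slide. The distance update weights each merged cluster by its number of leaves, which reduces to the slide's ½ formula when C and D are single sequences. Each new node is placed at half the merging distance, the usual UPGMA convention.

```python
from itertools import combinations


def upgma(leaf_distances: dict):
    """leaf_distances maps (a, b) leaf-name pairs to distances (all pairs).

    Returns (root, merges): the tree as nested tuples, and a list of
    (cluster, height) in the order the clusters were created.
    """
    dist = {frozenset(pair): d for pair, d in leaf_distances.items()}
    sizes = {}  # current cluster label -> number of leaves it contains
    for pair in leaf_distances:
        for leaf in pair:
            sizes[leaf] = 1
    merges = []
    while len(sizes) > 1:
        # Step 1: pick the pair of current clusters with the shortest distance.
        x, y = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        new = (x, y)
        height = dist[frozenset((x, y))] / 2  # assumed node-height convention
        # Step 2: distance from the new cluster to every other cluster,
        # averaged over leaves (size-weighted mean of the two old distances).
        for other in sizes:
            if other == x or other == y:
                continue
            dist[frozenset((new, other))] = (
                sizes[x] * dist[frozenset((x, other))]
                + sizes[y] * dist[frozenset((y, other))]
            ) / (sizes[x] + sizes[y])
        sizes[new] = sizes.pop(x) + sizes.pop(y)
        merges.append((new, height))
    (root,) = sizes
    return root, merges


# Hypothetical toy distances, chosen so that C and D are the closest pair.
toy = {
    ("A", "B"): 4, ("A", "C"): 8, ("A", "D"): 8,
    ("B", "C"): 8, ("B", "D"): 8, ("C", "D"): 2,
}
root, merges = upgma(toy)
print(root)
for cluster, height in merges:
    print(cluster, height)
```

With these toy distances, C and D are clustered first, then A and B, and the last merge joins the two pairs at the root.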
13
Example UPGMA phylogeny from a given distance matrix First cluster: Chimp + Pygmy chimp
14
Example (cont’d) After performing the complete clustering with UPGMA, we get the following rooted tree: There are many other tree-building methods (see Higgs & Attwood)
15
Branch lengths ultra-metricity additivity
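The slide names the two properties without defining them. Under the standard definitions, a distance matrix is ultrametric if, for every three taxa, the two largest of the three pair-wise distances are equal. It is additive if, for every four taxa, the two largest of the three pair-sums are equal (the four-point condition). The Python sketch below checks both on hypothetical toy matrices; the function names are illustrative.

```python
from itertools import combinations


def make_lookup(pairs: dict):
    """Return a symmetric distance function d(a, b) from an unordered-pair dict."""
    table = {frozenset(pair): value for pair, value in pairs.items()}
    return lambda a, b: table[frozenset((a, b))]


def is_ultrametric(names, d, tol=1e-9) -> bool:
    """Three-point condition: the two largest distances of each triple agree."""
    for a, b, c in combinations(names, 3):
        _, mid, top = sorted([d(a, b), d(a, c), d(b, c)])
        if abs(top - mid) > tol:
            return False
    return True


def is_additive(names, d, tol=1e-9) -> bool:
    """Four-point condition: the two largest pair-sums of each quartet agree."""
    for a, b, c, e in combinations(names, 4):
        sums = sorted([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


names = ["A", "B", "C", "D"]
# Hypothetical clock-like distances: every leaf is equally far from the root.
clock_like = make_lookup({
    ("A", "B"): 4, ("A", "C"): 8, ("A", "D"): 8,
    ("B", "C"): 8, ("B", "D"): 8, ("C", "D"): 2,
})
# Hypothetical distances from a tree ((A,B),(C,D)) with unequal branch lengths.
unequal_rates = make_lookup({
    ("A", "B"): 4, ("A", "C"): 4, ("A", "D"): 4,
    ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4,
})
print(is_ultrametric(names, clock_like), is_additive(names, clock_like))
print(is_ultrametric(names, unequal_rates), is_additive(names, unequal_rates))
```

The second matrix fits a tree exactly but is not clock-like, so it is additive without being ultrametric.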
16
Rooted vs. un-rooted trees Tree rooted with an outgroup (rodents)
17
5. Tree evaluation
Goal: to evaluate the strength of the phylogenetic signal in the data and the robustness of the tree
Bootstrapping: re-sample the original columns of the alignment with replacement, and produce a random, artificial alignment
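The column re-sampling step might look like the following Python sketch. The function name `bootstrap_alignment` and the toy alignment are hypothetical, and the sketch assumes the resampled alignment has the same number of columns as the original, which is the usual choice.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Draw columns with replacement; every sequence gets the same column picks."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (three sequences, eight columns).
toy_alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACCTACGA",
}
rng = random.Random(0)  # fixed seed so the example is repeatable
replicate = bootstrap_alignment(toy_alignment, rng)
for name, seq in replicate.items():
    print(name, seq)
```

Whole columns are drawn, not individual characters, so each resampled site keeps the same pattern across taxa as some site in the original alignment.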
18
Bootstrap support
Report: for each node, the percentage of resampled alignments that produced the same tree topology (from that node down to the leaves)
Figure labels: strong bootstrap support; weak bootstrap support
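Once a tree has been built from each resampled alignment, the per-node report can be tallied as in this Python sketch. It makes a simplifying assumption: a node counts as recovered when a replicate tree contains a clade with the same set of leaves, which is looser than matching the full topology below the node. The function name and the replicate data are hypothetical.

```python
def bootstrap_support(reference_clades, replicate_clade_sets) -> dict:
    """Percentage of replicate trees that contain each reference clade."""
    total = len(replicate_clade_sets)
    return {
        clade: 100.0 * sum(clade in clades for clades in replicate_clade_sets) / total
        for clade in reference_clades
    }


AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})

# Clades of the tree built from the original alignment (hypothetical).
reference = [AB, CD]
# Clades found in four hypothetical bootstrap replicate trees.
replicates = [
    {AB, CD},
    {AB, CD},
    {AB, CD},
    {AB, AC},
]
support = bootstrap_support(reference, replicates)
for clade, percent in support.items():
    print(sorted(clade), percent)
```

In practice hundreds of replicates or more are typically used; four are shown here only to keep the tally easy to follow.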