1
Multiple sequence alignment
Tutorial 5 Multiple sequence alignment
2
Multiple Sequence Alignment – When?
More than two sequences: DNA or protein.
Evolutionary relation: homology, phylogenetic tree.
Detect motifs.
Unaligned sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC
3
Multiple Sequence Alignment – How?
Dynamic programming: optimal alignment, but exponential in the number of sequences.
Progressive: efficient heuristic.
(Figure: four example sequences, their multiple alignment, and a tree with leaves A, D, B, C.)
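To make the cost argument concrete, here is a minimal sketch of pairwise dynamic programming (a Needleman-Wunsch score) together with a count of the table cells that a full dynamic-programming alignment would need. The function names and the scoring values (match, mismatch, gap) are assumptions chosen for illustration, not ClustalW settings.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch) alignment score of two sequences.

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def dp_table_cells(lengths):
    """Number of cells in a full DP table over sequences of the given lengths."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


# The four example sequences from the slide.
sequences = ["GTCGTAGTCGGCTCGAC", "GTCTAGCGAGCGTGAT", "GCGAAGAGGCGAGC", "GCCGTCGCGTCGTAAC"]
pair_cells = dp_table_cells([len(s) for s in sequences[:2]])
all_cells = dp_table_cells([len(s) for s in sequences])
```

The table grows by one dimension per sequence, so the cell count is the product of the sequence lengths plus one. That product is what makes exact dynamic programming impractical beyond a handful of sequences.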
4
Hierarchical Clustering
A way to represent similarities graphically. Sums up a pairwise distance matrix as a dendrogram. Not all matrices can be embedded in a tree without error.
Example sequences:
TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC
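As an illustration of the pairwise distance matrix that a dendrogram summarizes, the sketch below computes the fraction of mismatching columns for every pair of the five aligned example sequences. The labels s1 to s5 and the decision to count a gap as an ordinary character are assumptions made for this example.

```python
from itertools import combinations

# Aligned sequences from the slide; the labels s1..s5 are hypothetical.
alignment = {
    "s1": "TGTTAAC",
    "s2": "TGT-AAC",
    "s3": "TGT--AC",
    "s4": "ATGT--C",
    "s5": "ATGTGGC",
}


def mismatch_fraction(s, t):
    """Fraction of alignment columns in which two aligned sequences differ.

    Assumption: a gap character is compared like any other symbol.
    """
    if len(s) != len(t):
        raise ValueError("sequences must come from the same alignment")
    return sum(x != y for x, y in zip(s, t)) / len(s)


# One entry per unordered pair of sequences.
distances = {
    (a, b): mismatch_fraction(alignment[a], alignment[b])
    for a, b in combinations(alignment, 2)
}

# The pair a clustering method would merge first (first minimum on ties).
closest = min(distances, key=distances.get)
```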
5
ClustalW
1. Pairwise alignment: calculate the distance matrix.
2. Build the guide tree.
3. Progressive alignment using the guide tree.
Reference: “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al.
6
ClustalW: Progressive (incremental)
At each step, align two existing alignments or sequences. Gaps present in older alignments remain fixed. Uses the Neighbor Joining algorithm.
7
Neighbor Joining Algorithm
An agglomerative hierarchical clustering method. Constructs an unrooted tree.
8
Neighbor Joining (Not assuming equal divergence)
Step by step summary:
1. Calculate all pairwise distances.
2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x).
4. Calculate Dix and Djx, the distances from the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.
9
Step 1. Calculate all pairwise distances.
Di,j table:
      A    B    C    D
B    22
C    39   41
D    39   41   18
E    41   43   20   10
10
Measuring Distance
Problem: the fraction of differences between unrelated sequences approaches the fraction expected by chance, so the distance measure converges.
A correction model such as Jukes-Cantor addresses this.
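A minimal sketch of the Jukes-Cantor correction for DNA follows. It converts an observed fraction of differing sites p into an estimated number of substitutions per site, d = -3/4 ln(1 - 4p/3). The function name is an assumption. Under this model the observed fraction for unrelated sequences tends toward 0.75, which is why the formula is undefined at and above that value.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed difference fraction p.

    Valid for DNA with 0 <= p < 0.75; at 0.75 the sequences look random
    with respect to each other and the estimate diverges.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Small observed differences are barely corrected, large ones strongly.
small = jukes_cantor_distance(0.05)
large = jukes_cantor_distance(0.60)
```

For small p the corrected distance stays close to p. As p approaches 0.75 the correction grows without bound, reflecting multiple substitutions at the same site that the raw count cannot see.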
11
Measuring Distance (cont)
Euclidean Distance: Given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences. The score increases in proportion to the dissimilarity between residues.
12
Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
$M_{ij} = D_{ij} - \frac{r_i + r_j}{N - 2}$
Mij: relative distance between i and j.
Dij: distance between i and j from the distance table.
ri: distance of i from all other sequences (the sum of row i of the distance table).
N: number of leaves (= sequences) left in the tree.
Mij takes negative values. As the average distance from the common ancestor to the rest of the nodes increases, Mij has a lower value. Select the pair that produces the lowest value. Re-evaluate M at every iteration.
13
Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
(Figure: the Di,j table from Step 1, with A and B marked.)
14
Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
Mi,j table:
      A       B       C       D
B    -74
C    -47.3   -47.3
D    -44     -44     -57.3
E    -44     -44     -57.3   -64
A,B is the pair with the minimal Mi,j value. The Mi,j table is used only to choose the closest pair (lowest value), not to calculate the distances.
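The selection step can be sketched as follows, using the relative distance Mij = Dij - (ri + rj) / (N - 2), where ri is the sum of distances from i to all other nodes. The ten distances are an assumption in one respect: the flattened table lists 39 and 41 only once each, and the repeated entries below were inferred so that the result reproduces the Mi,j values shown (-74, -47.3, -44, -57.3, -64). The helper names are hypothetical.

```python
from itertools import combinations

# Pairwise distances of the worked example (repeated 39/41 entries inferred).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}
nodes = ["A", "B", "C", "D", "E"]


def dist(D, i, j):
    """Look up a symmetric distance stored under either key order."""
    return D[(i, j)] if (i, j) in D else D[(j, i)]


def relative_distances(D, nodes):
    """Mij = Dij - (ri + rj) / (N - 2) for every pair of current nodes."""
    n = len(nodes)
    r = {i: sum(dist(D, i, k) for k in nodes if k != i) for i in nodes}
    return {
        (i, j): dist(D, i, j) - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(nodes, 2)
    }


M = relative_distances(D, nodes)
best_pair = min(M, key=M.get)  # the pair to join in this iteration
```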
15
Step 3. Define a new node (x)
(Figure: the new node X joins A and B; the other nodes are C, D and E.)
16
Step 4. Calculate Dix and Djx, the distances from the chosen nodes i and j to the new node X, as well as the distance from X to all other nodes.
Now we'll calculate the distance from X to all other nodes.
New Di,j table:
      X    C    D
C    29
D    29   18
E    31   20   10
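The distance update can be sketched with the standard Neighbor Joining formulas: Dix = Dij / 2 + (ri - rj) / (2(N - 2)), Djx = Dij - Dix, and Dxk = (Dik + Djk - Dij) / 2 for every other node k. The function name and the dictionary layout are assumptions, and the repeated 39 and 41 entries in the input distances were inferred from the rest of the worked example.

```python
# Pairwise distances of the worked example (repeated 39/41 entries inferred).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}
nodes = ["A", "B", "C", "D", "E"]


def dist(D, i, j):
    """Look up a symmetric distance stored under either key order."""
    return D[(i, j)] if (i, j) in D else D[(j, i)]


def join_pair(D, nodes, i, j):
    """Branch lengths from i and j to the new node x, and distances from x."""
    n = len(nodes)
    r_i = sum(dist(D, i, k) for k in nodes if k != i)
    r_j = sum(dist(D, j, k) for k in nodes if k != j)
    d_ij = dist(D, i, j)
    d_ix = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    d_jx = d_ij - d_ix
    d_xk = {
        k: (dist(D, i, k) + dist(D, j, k) - d_ij) / 2
        for k in nodes if k not in (i, j)
    }
    return d_ix, d_jx, d_xk


d_ax, d_bx, d_xk = join_pair(D, nodes, "A", "B")
```

With these inputs the sketch gives branch lengths of 10 for A and 12 for B, and distances of 29, 29 and 31 from X to C, D and E.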
17
Step 5 - Continue until two nodes remain
New Mi,j table:
      X     C     D
C    -49
D    -44   -44
E    -44   -44   -49
(Figure: tree with nodes A, B, C, D, E, X, Y.)
18
New Di,j table:
      Y    D
D     9
E    11   10
Only 2 nodes are left. Let's calculate all the distances to Z.
(Figure: tree with nodes A, B, C, D, E, X, Y, Z.)
19
The tree
Branch lengths: A to X = 10, B to X = 12, X to Y = 20, C to Y = 9, Y to Z = 5, D to Z = 4, E to Z = 6.
And in Newick tree format: ((C,(D,E)),(A,B));
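The whole procedure can be sketched as a single loop that repeats the selection and update steps until two nodes remain. This is a minimal illustration rather than the ClustalW implementation: the function name, the internal node names N1, N2, ... and the tie-breaking rule (first minimum found) are assumptions, and the repeated 39 and 41 entries in the input distances were inferred from the rest of the worked example.

```python
from itertools import combinations


def neighbor_joining(distances, labels):
    """Return the tree as a list of (node, neighbor, branch_length) edges."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    nodes = list(labels)
    edges = []
    created = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from i to every other current node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Pick the pair with the lowest relative distance Mij.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2),
        )
        created += 1
        x = f"N{created}"  # hypothetical name for the new internal node
        d_ij = d[frozenset((i, j))]
        d_ix = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, x, d_ix))
        edges.append((j, x, d_ij - d_ix))
        # Distances from the new node to all remaining nodes.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((x, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [x]
    # Two nodes remain: connect them with a single edge.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Pairwise distances of the worked example (repeated 39/41 entries inferred).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}
edges = neighbor_joining(D, ["A", "B", "C", "D", "E"])
```

On this input the sketch yields the seven branch lengths shown in the tree (4, 5, 6, 9, 10, 12, 20), with N1, N2 and N3 playing the roles of X, Y and Z.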
20
ClustalW - Input
Input sequences, scoring matrix, gap scoring, output format, address
21
ClustalW - Output
Match strength in decreasing order: * (identical residues), : (strongly similar residues), . (weakly similar residues)
22
ClustalW - Output
23
ClustalW - Output
24
ClustalW - Output
25
ClustalW - Output
Pairwise alignment scores, building tree, building alignment, final score
26
ClustalW - Output
27
ClustalW - Output
Sequence names, sequence positions. Match strength in decreasing order: * : .
28
ClustalW - Output
29
ClustalW - Output Branch length
30
ClustalW - Output
31
ClustalW - Output