Tutorial 5: Multiple sequence alignment
Multiple Sequence Alignment – When?
More than two sequences: DNA, protein
Evolutionary relation: homology, phylogenetic tree
Detect motif

Unaligned sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

[Figure: tree with leaves A, D, B, C]
Multiple Sequence Alignment – How?
Dynamic programming: finds the optimal alignment, but its cost is exponential in the number of sequences.
Progressive alignment: an efficient heuristic.
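To make the dynamic programming cost concrete, here is a minimal sketch of the two-sequence case (Needleman-Wunsch global alignment score). The scoring values (match +1, mismatch -1, gap -1) and the function name are illustrative assumptions, not taken from the tutorial. The table has one cell per pair of prefixes, so for k sequences of length L the same idea needs on the order of L^k cells, which is why the exact method does not scale to many sequences.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of two sequences (Needleman-Wunsch).

    Scoring parameters are illustrative assumptions.
    """
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Six matches and one gap (TGTTAAC / TGT-AAC) give 6 - 1 = 5.
print(nw_score("TGTTAAC", "TGTAAC"))  # 5
```

Only the previous row is kept, so memory is linear in the length of the second sequence; recovering the alignment itself would require storing the full table or traceback pointers.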
Hierarchical Clustering
A way to represent similarities graphically.
Summarizes a pairwise distance matrix as a dendrogram.
Not all matrices can be embedded in a tree without error.

Example alignment:
TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC
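One standard way to test whether a distance matrix can be embedded in a tree without error is the four-point condition: for every four taxa, the two largest of the three pairwise sums must be equal. The sketch below is an illustration only; the function name and both four-taxon matrices are hypothetical and do not come from the tutorial.

```python
from itertools import combinations


def is_additive(dist, names, tol=1e-9):
    """Check the four-point condition on a symmetric distance table.

    `dist` maps frozenset({a, b}) to the distance between a and b.
    """
    def d(a, b):
        return dist[frozenset((a, b))]

    for a, b, c, e in combinations(names, 4):
        sums = sorted([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
        # The two largest sums must coincide for a tree metric.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical distances read off a tree with branches A:2, B:1, C:2, D:3
# and an internal edge of length 3 separating (A, B) from (C, D).
tree_like = {frozenset(k): v for k, v in
             {"AB": 3, "AC": 7, "AD": 8, "BC": 6, "BD": 7, "CD": 5}.items()}

# Same table with one distance distorted; no tree fits it exactly.
distorted = dict(tree_like)
distorted[frozenset("AD")] = 10

print(is_additive(tree_like, "ABCD"), is_additive(distorted, "ABCD"))  # True False
```

When the condition fails, any tree built from the matrix only approximates the distances, which is the sense in which the embedding has "error".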
ClustalW
Pairwise alignment: calculate the distance matrix.
Guide tree.
Progressive alignment using the guide tree.
"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", J. D. Thompson et al.
ClustalW: Progressive (incremental)
At each step, align two existing alignments or sequences.
Gaps present in older alignments remain fixed.
Uses the Neighbor Joining algorithm to build the guide tree.
Neighbor Joining Algorithm
An agglomerative hierarchical clustering method.
Constructs an unrooted tree.
Neighbor Joining (not assuming equal divergence)
Step-by-step summary:
1. Calculate all pairwise distances.
2. Pick the two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x).
4. Calculate Dix and Djx, the distances from the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.
Step 1. Calculate all pairwise distances.

      A     B     C     D
B    22
C    39    41
D    39    41    18
E    41    43    20    10
Measuring Distance
Problem: for unrelated sequences, the observed fraction of differences approaches the fraction expected by chance, so the raw distance measure converges (saturates).
Correction: Jukes-Cantor.
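As a rough illustration of the correction named on this slide, the sketch below computes the observed fraction of differing sites p and the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for DNA. The function names, the choice to skip gapped columns, and the two example sequences are assumptions made for this example. Under this model, unrelated DNA sequences approach p = 0.75, where the corrected distance grows without bound.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for DNA, in substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); the correction diverges at 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned pair differing at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site.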
Measuring Distance (cont.)
Euclidean distance: given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences.
The score increases in proportion to the dissimilarity between residues.
Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).

Mij = Dij - (ri + rj) / (N - 2)

Mij: relative distance between i and j.
Dij: distance between i and j from the distance table.
ri: distance of i from all other sequences (the sum of its distances to the remaining nodes).
N: number of leaves (= sequences) left in the tree.

Mij takes negative values. As the average distance from the common ancestor to the rest of the nodes increases, Mij has a lower value.
Select the pair that produces the lowest value.
Re-evaluate M with every iteration.

Example: rA = 22 + 39 + 39 + 41 = 141 and rB = 22 + 41 + 41 + 43 = 147, so MAB = 22 - (141 + 147) / 3 = -74. Etc.

Mij table:
        A       B       C       D
B     -74
C     -47.3   -47.3
D     -44     -44     -57.3
E     -44     -44     -57.3   -64

A,B is the pair with the minimal Mij.
The Mij table is used only to choose the closest pair (lowest value), not to calculate the distances.
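Step 2 can be sketched as follows. The relative distance used here is Mij = Dij - (ri + rj) / (N - 2), where ri is the sum of distances from i to all other nodes; the distance values are the five-taxon table of the worked example as reconstructed from the numbers on the slides (an assumption where the flattened table is ambiguous), and the function name is made up for this example.

```python
from itertools import combinations

# Five-taxon distances of the worked example (reconstructed; an assumption).
D = {frozenset(pair): value for pair, value in {
    "AB": 22, "AC": 39, "AD": 39, "AE": 41,
    "BC": 41, "BD": 41, "BE": 43,
    "CD": 18, "CE": 20, "DE": 10,
}.items()}


def m_table(dist, nodes):
    """Relative distance M_ij for every pair of currently active nodes."""
    n = len(nodes)
    # r[i]: sum of distances from i to all other active nodes.
    r = {i: sum(dist[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
    return {(i, j): dist[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
            for i, j in combinations(nodes, 2)}


M = m_table(D, list("ABCDE"))
best = min(M, key=M.get)
print(best, M[best])  # ('A', 'B') -74.0
```

Note that the pair with the smallest raw distance is D,E (10), yet A,B is chosen: the correction term rewards pairs that are far from everything else.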
Step 3. Define a new node (x).
[Figure: tree with nodes B, C, D, E and the new node X]
Step 4. Calculate Dix and Djx, the distances from the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.

Now we’ll calculate the distance from X to all other nodes.

      X     C     D
C    29
D    29    18
E    31    20    10
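The formulas for this step do not survive in the slide text, so the sketch below uses the usual Neighbor Joining update rules as an assumption: Dix = Dij/2 + (ri - rj) / (2(N - 2)), Djx = Dij - Dix, and Dxk = (Dik + Djk - Dij) / 2 for every other node k. Applied to the example distances (reconstructed from the slides), they reproduce the X row of the table above. The function name is hypothetical.

```python
# Five-taxon distances of the worked example (reconstructed; an assumption).
D = {frozenset(pair): value for pair, value in {
    "AB": 22, "AC": 39, "AD": 39, "AE": 41,
    "BC": 41, "BD": 41, "BE": 43,
    "CD": 18, "CE": 20, "DE": 10,
}.items()}
NODES = list("ABCDE")


def join_pair(dist, nodes, i, j):
    """Branch lengths from i and j to the new node x, and distances from x."""
    n = len(nodes)
    r_i = sum(dist[frozenset((i, k))] for k in nodes if k != i)
    r_j = sum(dist[frozenset((j, k))] for k in nodes if k != j)
    d_ij = dist[frozenset((i, j))]
    d_ix = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    d_jx = d_ij - d_ix
    # Distance from the new node to every node that was not joined.
    d_xk = {k: (dist[frozenset((i, k))] + dist[frozenset((j, k))] - d_ij) / 2
            for k in nodes if k not in (i, j)}
    return d_ix, d_jx, d_xk


d_ax, d_bx, d_xk = join_pair(D, NODES, "A", "B")
print(d_ax, d_bx, d_xk)  # 10.0 12.0 {'C': 29.0, 'D': 29.0, 'E': 31.0}
```

The two branch lengths differ (10 and 12) because the method does not assume equal divergence from the common ancestor.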
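Step 5. Continue until two nodes remain.

New Mij table:
      X      C      D
C    -49
D    -44    -44
E    -44    -44    -49

X,C and D,E tie at -49; X and C are joined into the new node Y.
[Figure: tree with leaves A, B, C, D, E and internal nodes X, Y]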
New Dij table:
      Y     D
D     9
E    11    10

Joining D and E into the new node Z leaves only 2 nodes (Y and Z). Let’s calculate all the distances to Z.
[Figure: tree with leaves A, B, C, D, E and internal nodes X, Y, Z]
The tree
[Figure: unrooted tree with branch lengths A-X 10, B-X 12, X-Y 20, C-Y 9, Y-Z 5, D-Z 4, E-Z 6]

And in Newick tree format:
((C,(D,E)),(A,B))
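Putting the five steps together, the following is a minimal sketch of the whole procedure that emits a Newick string with branch lengths. It assumes the usual Neighbor Joining formulas (Mij = Dij - (ri + rj)/(N - 2), Dix = Dij/2 + (ri - rj)/(2(N - 2)), Dxk = (Dik + Djk - Dij)/2), breaks ties by taking the first minimal pair, and uses the example distances as reconstructed from the slides; the function name and internal node names are hypothetical.

```python
from itertools import combinations

# Five-taxon distances of the worked example (reconstructed; an assumption).
D = {frozenset(pair): value for pair, value in {
    "AB": 22, "AC": 39, "AD": 39, "AE": 41,
    "BC": 41, "BD": 41, "BE": 43,
    "CD": 18, "CE": 20, "DE": 10,
}.items()}


def neighbor_joining(dist, leaves):
    """Return a Newick string for the Neighbor Joining tree.

    `dist` maps frozenset({a, b}) to a distance; `leaves` lists the leaf names.
    """
    dist = dict(dist)                 # work on a copy
    nodes = list(leaves)
    label = {n: n for n in nodes}     # Newick fragment of each active node
    created = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[frozenset((a, k))] for k in nodes if k != a) for a in nodes}
        # Step 2: pair with the lowest M_ij (ties: first pair found).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: dist[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2))
        # Steps 3 and 4: new node x, its branch lengths and distances.
        d_ij = dist[frozenset((i, j))]
        d_ix = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        d_jx = d_ij - d_ix
        created += 1
        x = f"_node{created}"
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((x, k))] = (
                    dist[frozenset((i, k))] + dist[frozenset((j, k))] - d_ij) / 2
        label[x] = f"({label[i]}:{d_ix:g},{label[j]}:{d_jx:g})"
        nodes = [k for k in nodes if k not in (i, j)] + [x]
    # Step 5: two nodes remain; connect them with a single edge.
    a, b = nodes
    return f"({label[a]}:{dist[frozenset((a, b))]:g},{label[b]});"


print(neighbor_joining(D, "ABCDE"))
# ((C:9,(A:10,B:12):20):5,(D:4,E:6));
```

The string is written from a different starting point than ((C,(D,E)),(A,B)), but it describes the same unrooted tree: A and B are neighbors, D and E are neighbors, and C attaches between the two pairs.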
ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Input sequences
Scoring matrix
Gap scoring
Output format
Email address
ClustalW - Output
Match strength in decreasing order: * : .
(* identical residues in the column, : conserved substitutions, . semi-conserved substitutions)
ClustalW - Output
Pairwise alignment scores
Building tree
Building alignment
Final score
ClustalW - Output
Sequence names
Sequence positions
Match strength in decreasing order: * : .
ClustalW - Output Branch length