Published by Jerome Fox. Modified over 8 years ago.
1
Multiple Alignment and Phylogenetic Trees
Csc 487/687 Computing for Bioinformatics
2
Alignment problem
Given a set of sequences, produce a multiple alignment that corresponds as well as possible to the biological relationships between the corresponding biomolecules.
3
For homologous proteins
Two residues should be aligned (on top of each other):
if they are homologous (evolved from the same residue in a common ancestor protein);
if they are structurally equivalent.
4
Automatic approach
We need a way of scoring alignments: a fitness function that quantifies the “goodness” of an alignment.
We need an algorithm for finding alignments with good scores.
Not all methods provide a scoring function for the final alignment!
5
Analysis of fitness function
One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences. For example, this can be checked when the structures of (some of) the proteins are known.
6
Scoring Alignments
In order to find an optimal alignment, we need to be able to measure how good an alignment is.
Sum of pairs (SP) method: in a column, score each pair of letters and total the scores. Pairs of gaps score 0. Then total up the scores for all columns.
7
SP Method Example
Using the BLOSUM62 matrix and a gap penalty of -8, with k(k-1)/2 pairs per column:
-IK
SIK
SSE
In column 1 we have the pairs (-,S), (-,S) and (S,S), so the column score is -8 - 8 + 4 = -12.
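The sum-of-pairs calculation for this example can be sketched in Python. This is a minimal illustration, not a full implementation: the names `pair_score`, `column_score` and `sp_score` are hypothetical, and the dictionary holds only the handful of BLOSUM62 entries this three-row alignment needs rather than the whole matrix.

```python
from itertools import combinations

GAP_PENALTY = -8  # gap penalty from the slide

# Assumption: only the BLOSUM62 entries needed for this example,
# keyed by the alphabetically sorted residue pair.
BLOSUM62_SUBSET = {
    ("S", "S"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("I", "S"): -2, ("E", "K"): 1,
}

def pair_score(x, y):
    """Score one pair of symbols; a pair of gaps scores 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return GAP_PENALTY
    return BLOSUM62_SUBSET[tuple(sorted((x, y)))]

def column_score(column):
    """Sum the scores of all k(k-1)/2 pairs in one column."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))

def sp_score(alignment):
    """Total the column scores over all columns of the alignment."""
    return sum(column_score(col) for col in zip(*alignment))

alignment = ["-IK", "SIK", "SSE"]
print([column_score(col) for col in zip(*alignment)])  # per-column scores
print(sp_score(alignment))                             # total SP score
```

With these entries, column 1 scores -12 as on the slide, columns 2 and 3 score 0 and 7, and the total is -5.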
8
Align by use of dynamic programming
Dynamic programming finds the best alignment of k sequences with a given scoring scheme.
For two sequences there are three different column types.
For three sequences there are seven different column types (x means an amino acid, - a blank):
Sequence1  x - x x - - x
Sequence2  x x - x - x -
Sequence3  x x x - x - x
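The column counts above follow from excluding the all-blank column from the 2^k possible gap patterns, which leaves 2^k - 1 types. A short Python sketch (the function name `column_types` is hypothetical) enumerates them:

```python
from itertools import product

def column_types(k):
    """All x/- patterns for one column of k sequences, except the all-blank one."""
    return ["".join(p) for p in product("x-", repeat=k) if "x" in p]

for k in (2, 3):
    types = column_types(k)
    print(k, len(types), types)
```

For k = 2 this lists three patterns and for k = 3 it lists seven, matching the counts on the slide.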
9
Use of dynamic programming
Dynamic programming finds the best alignment of k sequences under a given scoring scheme.
10
Algorithm for dynamic programming
11
Analysis
O(n^k) entries to fill (n is the sequence length, k the number of sequences).
Each entry combines O(2^k) other entries.
It costs O(k^2) to calculate each SP score.
Overall cost is O(k^2 2^k n^k), which is exponential in the number of sequences!
The problem is NP-complete.
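To get a feel for how quickly this bound grows, the expression k^2 2^k n^k can be evaluated directly. The sketch below ignores constant factors, and the sequence length n = 100 is an arbitrary value chosen for illustration; `dp_cost` is a hypothetical helper name.

```python
def dp_cost(n, k):
    """Evaluate k^2 * 2^k * n^k, the bound on the work for k sequences of length n."""
    return k**2 * 2**k * n**k

# Assumption: n = 100 is only an illustrative sequence length.
for k in range(2, 7):
    print(f"k={k}: about {dp_cost(100, k):.2e} operations")
```

Each additional sequence multiplies the estimate by more than a factor of n, which is why exact dynamic programming is practical only for a handful of sequences.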
12
General progressive alignment
Algorithm: progressive alignment of the sequences {s_1, s_2, ..., s_m}
var C: current set of alignments
begin
  C := ∅
  for i := 1 to m do
    C := C union {{s_i}}
  end                          (one alignment for each sequence)
  for i := 1 to m − 1 do
    choose two alignments A_p, A_q from C
    C := C − {A_p, A_q}
    A_r := align(A_p, A_q)
    C := C union {A_r}
  end                          (C now contains the single final alignment)
end
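The loop above can be sketched in Python under some loudly stated assumptions: the pseudocode does not say how `align` works or which two alignments to choose, so this sketch always merges the first two alignments in the list and aligns them by global dynamic programming over columns with toy match, mismatch and gap scores. All names and score values here are hypothetical.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # assumption: toy scores, not a real matrix

def pair_score(x, y):
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH

def column_score(col_a, col_b):
    """Sum of pair scores between a column of one alignment and a column of the other."""
    return sum(pair_score(x, y) for x, y in product(col_a, col_b))

def align(a, b):
    """Align two alignments (lists of equal-length rows) column against column."""
    cols_a, cols_b = list(zip(*a)), list(zip(*b))
    gap_a, gap_b = ("-",) * len(a), ("-",) * len(b)
    n, m = len(cols_a), len(cols_b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + column_score(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + column_score(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                s[i - 1][j] + column_score(cols_a[i - 1], gap_b),
                s[i][j - 1] + column_score(gap_a, cols_b[j - 1]),
            )
    # Trace back from the bottom-right cell to build the merged columns.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + column_score(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b); i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1]); j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]

def progressive_alignment(seqs):
    """Start with one alignment per sequence, then merge until one remains."""
    c = [[s] for s in seqs]
    while len(c) > 1:
        # Assumption: always pick the first two alignments in the list.
        c = [align(c[0], c[1])] + c[2:]
    return c[0]

print("\n".join(progressive_alignment(["ACGT", "ACT", "AGT"])))
```

Gaps introduced in an early merge are never removed later, so the order in which alignments are chosen affects the final result.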
13
The Clustal Algorithm
Three steps:
1. Compare all pairs of sequences to obtain a similarity matrix.
2. Based on the similarity matrix, make a guide tree relating all the sequences.
3. Perform progressive alignment, where the order of the alignments is determined by the guide tree.
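Steps 1 and 2 can be illustrated with a small Python sketch. It is not Clustal's actual procedure: as a stand-in similarity it uses the ratio from the standard library's `difflib.SequenceMatcher`, and as a stand-in tree method it uses simple average-linkage clustering. The function names are hypothetical, and the tree is returned as nested tuples of sequence indices.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity_matrix(seqs):
    """Step 1: compare all pairs of sequences (stand-in similarity measure)."""
    n = len(seqs)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = SequenceMatcher(None, seqs[i], seqs[j]).ratio()
    return sim

def guide_tree(sim):
    """Step 2: repeatedly join the two most similar clusters (average linkage)."""
    # Each cluster is (subtree, list of leaf indices).
    clusters = [(i, [i]) for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        p, q = max(
            combinations(range(len(clusters)), 2),
            key=lambda pq: avg_sim(clusters[pq[0]][1], clusters[pq[1]][1]),
        )
        joined = ((clusters[p][0], clusters[q][0]), clusters[p][1] + clusters[q][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [joined]
    return clusters[0][0]

seqs = ["AAAA", "AAAT", "GGGG", "GGGC"]
print(guide_tree(similarity_matrix(seqs)))
```

Step 3 would then walk this tree from the leaves upward, aligning the two children of each internal node before aligning the node with its sibling.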
14
[Figure: (A) steps 1 and 2, pairwise comparison and clustering/making the tree; (B) step 3, aligning according to the tree]
15
Clustal - summary
Does not use a score for the final alignment.
Each pairwise alignment is done using dynamic programming.
Heuristics are used, tailored to globular proteins.
Graphical version: ClustalX.
16
Phylogeny
The basic principle is that the origin of similarity is common ancestry. The field of phylogeny has the goal of working out the relationships among species, populations, individuals, or genes. These relationships are usually expressed as a tree.
18
Phylogeny
A statement of phylogeny among objects assumes homology and depends on classification. Phylogeny states a topology of the relationships based on classification according to similarity of one or more sets of characters, or on a model of evolutionary processes.
20
Phylogeny
It is rare for species relationships and ancestry to be directly observable. Evolutionary trees determined from genetic data are often based on inferences from the patterns of similarity, which are all that is observable among species living now.