Class 5: Multiple Sequence Alignment
Multiple sequence alignment

    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGTSSNIGS--ITVNWYQQLPG
    LRLSCSSSGFIFSS--YAMYWVRQAPG
    LSLTCTVSGTSFDD--YYSTWVRQPPG
    PEVTCVVVDVSHEDPQVKFNWYVDG--
    ATLVCLISDFYPGA--VTVAWKADS--
    AALGCLVKDYFPEP--VTVSWNSG---
    VSLTCLVKGFYPSD--IAVEWESNG--

Homologous residues are aligned together in columns.
- Homologous: in both the structural and the evolutionary sense
- Ideally, the residues in an aligned column occupy similar 3D structural positions

Multiple alignment – why?
- Identify sequences that belong to a family
  Family – a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history
- Find features that are conserved across the whole family
  Highly conserved regions, core structural elements

The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]
Scoring a multiple alignment (1)
Important features of multiple alignment:
- Some positions are more conserved than others
  Position-specific scoring
- Sequences are not independent (they are related by a phylogenetic tree)
  Ideally, specify a complete model of molecular sequence evolution

Scoring a multiple alignment (2)
Unfortunately, not enough data …
Assumption (1): Columns of the alignment are statistically independent.

Minimum entropy
Assumption (2): Symbols within columns are independent
Entropy measure

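The entropy formula itself is not reproduced on this slide. A minimal sketch, assuming the usual textbook form in which a column with residue counts c_a scores -sum_a c_a log2(p_a), with p_a estimated as c_a divided by the column total. Ignoring gap characters and using no pseudocounts are simplifying assumptions of this sketch. Lower scores mean more conserved columns.

```python
import math
from collections import Counter


def column_entropy_score(column, gap="-"):
    """Entropy score of one column: -sum_a c_a * log2(p_a).

    Assumption: gap characters are ignored and p_a is the raw
    frequency c_a / total (no pseudocounts).
    """
    counts = Counter(ch for ch in column if ch != gap)
    total = sum(counts.values())
    score = 0.0
    for c in counts.values():
        score -= c * math.log2(c / total)
    return score


def alignment_entropy_score(rows):
    """Sum of column scores, relying on column independence (Assumption 1)."""
    return sum(column_entropy_score(col) for col in zip(*rows))
```

A fully conserved column such as "AAAA" scores 0, while a column split evenly between two residues, such as "AACC", scores 4 bits.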
Sum of pairs (SP)
Columns are scored by a “sum of pairs” function, using a substitution scoring matrix
Note:

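As an illustration of SP scoring, the sketch below sums a pairwise score over every pair of rows in a column. The function `pair_score` and its match/mismatch/gap values are hypothetical stand-ins for a real substitution matrix such as BLOSUM or PAM, and scoring a gap-gap pair as 0 is the common convention assumed here.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy substitution score (hypothetical values, not a real matrix)."""
    if a == "-" and b == "-":
        return 0          # assumed convention: gap aligned to gap costs nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_column_score(column, score=pair_score):
    """Sum of pairwise scores over all pairs k < l of symbols in the column."""
    return sum(score(a, b) for a, b in combinations(column, 2))


def sp_alignment_score(rows, score=pair_score):
    """Total SP score: columns are scored independently and summed."""
    return sum(sp_column_score(col, score) for col in zip(*rows))
```

With these toy values, the column "AAA" contributes three matching pairs, and replacing one A by C turns two of them into mismatches.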
Multidimensional DP
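The recurrence for this slide is not reproduced here. A minimal sketch of the idea, assuming sum-of-pairs column scoring with toy match/mismatch values and a linear gap penalty (all hypothetical choices): each lattice cell holds the best score for aligning the corresponding prefixes, and each move either consumes a residue or inserts a gap in every sequence, excluding the all-gap move.

```python
from itertools import combinations, product


def sp_column(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column (toy scores, not a real matrix)."""
    total = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue                      # gap-gap pairs contribute 0
        if a == "-" or b == "-":
            total += gap
        else:
            total += match if a == b else mismatch
    return total


def msa_dp_score(seqs):
    """Optimal SP score of a global alignment of all sequences in seqs."""
    n = len(seqs)
    # One move per non-zero 0/1 vector: 1 = consume a residue, 0 = gap.
    moves = [d for d in product((0, 1), repeat=n) if any(d)]
    best = {(0,) * n: 0}
    # Lexicographic order guarantees every predecessor is filled first.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            continue                      # origin is already initialised
        candidates = []
        for d in moves:
            prev = tuple(c - x for c, x in zip(cell, d))
            if min(prev) < 0:
                continue                  # move would leave the lattice
            column = [s[c - 1] if x else "-"
                      for s, c, x in zip(seqs, cell, d)]
            candidates.append(best[prev] + sp_column(column))
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

The sketch returns only the optimal score; recovering the alignment would need a traceback pointer per cell. It is practical only for a handful of short sequences.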
Complexity Space: Time:
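The values for this slide are not reproduced here. The standard figures for full multidimensional DP over N sequences of length L are roughly L^N cells of storage and on the order of 2^N L^N cell updates, because each cell has up to 2^N - 1 predecessors. The helper names below are hypothetical; the sketch only counts cells and an upper bound on transitions.

```python
from itertools import product


def dp_moves(n):
    """All 2**n - 1 predecessor offsets of a cell in an n-dimensional lattice."""
    return [d for d in product((0, 1), repeat=n) if any(d)]


def dp_cost(lengths):
    """Return (cells, upper bound on transitions) for the full DP lattice."""
    cells = 1
    for length in lengths:
        cells *= length + 1               # prefixes of length 0..L per sequence
    return cells, cells * (2 ** len(lengths) - 1)
```

For example, `dp_cost([300] * 5)` shows that five sequences of length 300 already need more than 10^12 cells, which is why the exact method does not scale.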
Pairwise projections of MA
MSA (i) [Carrillo and Lipman, 1988]
MSA (ii)
MSA (iii) Algorithm sketch
Progressive alignment methods (i)
Basic idea: construct a succession of pairwise (PW) alignments
Variations:
- PW alignment order
- One growing alignment or subfamilies
- Alignment and scoring procedure

Progressive alignment methods (ii)
Most important heuristic – align the most similar pairs first.
Many algorithms build a “guide tree”:
- Leaves – sequences
- Interior nodes – alignments
- Root – complete multiple alignment

Feng-Doolittle (1987)
- Calculate all pairwise distances using alignment scores:
- Construct a guide tree using hierarchical clustering
- The highest-scoring pairwise alignment determines the sequence-to-group alignment

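The distance formula is not reproduced on this slide. A minimal sketch, assuming the form usually quoted for Feng-Doolittle: D = -ln(S_eff), with S_eff = (S_obs - S_rand) / (S_max - S_rand). Here S_obs is the observed pairwise alignment score, S_max is the average of the two self-alignment scores, and S_rand is the expected score for shuffled sequences. The function and parameter names are illustrative.

```python
import math


def feng_doolittle_distance(s_obs, s_max, s_rand):
    """Convert a pairwise alignment score into a distance D = -ln(S_eff).

    Assumed definitions: s_obs is the observed alignment score, s_max the
    mean of the two self-alignment scores, s_rand the expected score of
    aligning shuffled versions of the two sequences.
    """
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        raise ValueError("observed score is no better than random")
    return -math.log(s_eff)
```

Identical sequences (s_obs equal to s_max) get distance 0, and the distance grows as the observed score approaches the random baseline.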
Profile alignment
- Use profiles for group-to-sequence and group-to-group alignments
CLUSTALW (Thompson et al., 1994):
- Similar to Feng-Doolittle, but uses profile alignment methods
- Numerous heuristics

Iterative Refinement
- Addresses the “frozen” sub-alignment problem
- Iteratively realign sequences or groups to a profile of the rest
- Barton and Sternberg (1987):
  Align the two most similar sequences
  Align the current profile to the most similar remaining sequence
  Remove each sequence and realign it to the profile