©CMBI 2005
Sequence Alignment
In phylogeny, one wants to line up residues that descend from a common ancestor. For information transfer, one wants to line up residues that occupy similar positions in the structure. A gap represents an insertion or a deletion.

Global versus Local Alignment
[Figure: schematic contrasting a global alignment with a local alignment.]

Global Alignment
Align two sequences from "head to toe", i.e. from 5' end to 3' end for nucleotide sequences, or from N-terminus to C-terminus for proteins.
Algorithm published by: Needleman, S.B. and Wunsch, C.D. (1970) “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. J. Mol. Biol. 48:
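
Global Alignment
[Figure: alignment matrix with aacttgagc plus a gap column along the top and ctgagt down the side; in the gap column the rows c, t, g, a, g start at -6, -5, -4, -3, -2.]
We fill this matrix backwards, using a very simple scoring scheme: identity = 1, other = 0, gaps cost -1.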
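
Global Alignment
Score = Where you came from + Gap penalty + Similarity score
The gap penalty applies when the step into the cell introduces a gap; the similarity score applies when it pairs two residues. Each cell keeps the best score over its possible predecessors.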
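
The scoring rule above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function name `fill_matrix` and its parameter names are assumptions, and the defaults simply restate the slide's scheme (identity = 1, other = 0, gap = -1). The matrix is filled backwards, from the last cell towards the first, as on the slides.

```python
def fill_matrix(a, b, match=1, mismatch=0, gap=-1):
    """Fill a global-alignment score matrix backwards.

    F[i][j] holds the best score for aligning a[i:] with b[j:].
    Names and defaults are illustrative assumptions.
    """
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Border cells: one sequence is exhausted, so the rest is all gaps.
    for i in range(n - 1, -1, -1):
        F[i][m] = F[i + 1][m] + gap
    for j in range(m - 1, -1, -1):
        F[n][j] = F[n][j + 1] + gap

    # Inner cells: score = where you came from + gap penalty or similarity.
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            sim = match if a[i] == b[j] else mismatch
            F[i][j] = max(
                F[i + 1][j + 1] + sim,  # pair a[i] with b[j]
                F[i + 1][j] + gap,      # a[i] opposite a gap
                F[i][j + 1] + gap,      # b[j] opposite a gap
            )
    return F


F = fill_matrix("aacttgagc", "ctgagt")
print(F[0][0])  # best global score for the slide's two sequences
```

With the slide's sequences, the border column reproduces the -6, -5, -4, ... values from the figure, and `F[0][0]` is the score of the best global alignment.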

Global Alignment
[Figure: the matrix for aacttgagc against ctgagt being filled in cell by cell, with worked examples of individual cell scores.]

Global Alignment
[Figure: completed alignment matrix for aacttgagc against ctgagt.]
Resulting alignment:
aacttgagc
--ct-gagt

Global Alignment
[Figure: completed alignment matrix for aacttgagc against ctgagt.]
A second alignment of the same two sequences:
aacttgagc
--c-tgagt
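
The alignments on these slides are read off the filled matrix by following a path through it. The sketch below shows one way to do that under the slide's scoring scheme. The function name `global_align` and the tie-breaking order (pair first, then a gap in the second sequence, then a gap in the first) are assumptions made for illustration. Both alignments shown on the slides score the same, so a different tie-breaking order may return either of them.

```python
def global_align(a, b, match=1, mismatch=0, gap=-1):
    """Backward-filled global alignment with traceback (illustrative sketch)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        F[i][m] = F[i + 1][m] + gap
    for j in range(m - 1, -1, -1):
        F[n][j] = F[n][j + 1] + gap
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            sim = match if a[i] == b[j] else mismatch
            F[i][j] = max(F[i + 1][j + 1] + sim,
                          F[i + 1][j] + gap,
                          F[i][j + 1] + gap)

    # Traceback: start at the first cell and follow whichever move
    # explains the stored score, until both sequences are used up.
    top, bottom = [], []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m:
            sim = match if a[i] == b[j] else mismatch
            if F[i][j] == F[i + 1][j + 1] + sim:
                top.append(a[i])
                bottom.append(b[j])
                i += 1
                j += 1
                continue
        if i < n and F[i][j] == F[i + 1][j] + gap:
            top.append(a[i])       # residue of a opposite a gap
            bottom.append("-")
            i += 1
        else:
            top.append("-")        # residue of b opposite a gap
            bottom.append(b[j])
            j += 1
    return F[0][0], "".join(top), "".join(bottom)


score, top, bottom = global_align("aacttgagc", "ctgagt")
print(score)
print(top)
print(bottom)
```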

Local Alignment
Locate the region(s) with a high degree of similarity in two sequences.
Algorithm published by: Smith, T.F. and Waterman, M.S. (1981) “Identification of common molecular subsequences”. J. Mol. Biol. 147:

Local Alignment
[Figure: alignment matrix for aacttgagc against ctgagt.]
Resulting local alignment:
cttgag
ct-gag
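
A local alignment can be sketched with the same scoring scheme by never letting a cell score drop below zero and starting the traceback at the highest-scoring cell. This is a minimal sketch under stated assumptions: the function name `local_align` is hypothetical, the matrix is filled forwards (the usual textbook direction) rather than backwards, and the slide's identity = 1, other = 0, gap = -1 scheme is reused even though practical scoring schemes normally penalise mismatches. Several local alignments can share the top score, so the sketch may return a different one from the slide.

```python
def local_align(a, b, match=1, mismatch=0, gap=-1):
    """Local alignment in the style of Smith and Waterman (illustrative sketch)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment here
                          H[i - 1][j - 1] + sim,  # pair two residues
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until the score falls back to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sim = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sim:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = local_align("aacttgagc", "ctgagt")
print(score)
print(top)
print(bottom)
```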

Gap Penalty Functions
Linear: penalty rises in proportion to the length of the gap.
Affine: penalty has a gap-opening component and a separate length-dependent component.
Probabilistic: penalties may depend on the character of the residues involved.
Other functions: penalty first rises fast, but levels off at greater gap lengths.
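
The length-dependent penalty shapes listed above can be written as small functions. This is only a sketch: the function names and all numeric defaults are made-up illustrations, the affine form assumes the convention opening + extension × length (some programs charge the extension for length - 1 instead), and a logarithm is used as one possible example of a penalty whose growth slows down for long gaps. Residue-dependent (probabilistic) penalties are left out because the slides give no formula for them.

```python
import math


def linear_gap(length, per_residue=1.0):
    """Penalty proportional to gap length."""
    return per_residue * length


def affine_gap(length, opening=10.0, extension=0.5):
    """Fixed opening cost plus a cost per gap position (assumed convention)."""
    if length <= 0:
        return 0.0
    return opening + extension * length


def concave_gap(length, opening=10.0, scale=2.0):
    """Rises quickly at first, then ever more slowly (logarithmic example)."""
    if length <= 0:
        return 0.0
    return opening + scale * math.log(length)


for length in (1, 2, 5, 20):
    print(length, linear_gap(length), affine_gap(length), round(concave_gap(length), 2))
```

Under an affine penalty, one long gap costs less than several short gaps of the same total length, which is the usual reason for separating the opening cost from the length cost.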

Significance of Alignment
How significant is the alignment that we have found? Put differently: how much does the alignment score we found differ from the scores obtained by aligning random sequences to our sequence?

Calculating Significance
Repeat N times (N > 100):
    Randomise sequence A by shuffling its residues.
    Align the randomised sequence A with sequence B, and calculate the alignment score S.
Calculate the mean and standard deviation of the N random scores.
Calculate the Z-score: Z = (S_genuine - Ŝ_random) / s.d., where Ŝ_random is the mean of the random scores and s.d. is their standard deviation.
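
The shuffling procedure above can be sketched as follows. The names `global_score` and `z_score`, the default of 200 shuffles, and the fixed random seed are assumptions for illustration; the alignment score reuses the simple identity = 1, other = 0, gap = -1 scheme from the global alignment slides, and the standard deviation is the sample standard deviation.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment score, keeping only one matrix row in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * gap]
        for j, y in enumerate(b, start=1):
            sim = match if x == y else mismatch
            cur.append(max(prev[j - 1] + sim, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def z_score(seq_a, seq_b, n_shuffles=200, seed=0):
    """Z = (S_genuine - mean of shuffled scores) / s.d. of shuffled scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    genuine = global_score(seq_a, seq_b)

    residues = list(seq_a)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        random_scores.append(global_score("".join(residues), seq_b))

    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("all shuffled scores are identical; Z-score undefined")
    return (genuine - mean) / sd


print(z_score("aacttgagcctgagtacgtta", "aacttgagcctgagtacgtta"))
```

Shuffling keeps the residue composition of sequence A unchanged, so the random scores reflect what composition alone would produce.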
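
Significance of Alignment
[Figure: distribution of alignment scores for random matches, with the score of a genuine match marked. Axis: alignment score.]

Significance of Alignment
[Figure: distribution of alignment scores for random matches, with the score of a random match marked. Axis: alignment score.]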