©CMBI 2001 Alignment Most alignment programs create an alignment that represents what happened during evolution at the DNA level. To carry over information.


1 ©CMBI 2001 Alignment Most alignment programs create an alignment that represents what happened during evolution at the DNA level. To carry over information from a well studied to a newly determined sequence, we need an alignment that represents the protein structures of today.

2 ©CMBI 2005 Sequence Alignment In phylogeny one wants to line up residues that came from a common ancestor. For information transfer one wants to line up residues at similar positions in the structure. gap = insertion or deletion

3 ©CMBI 2005 Global versus Local Alignment [Figure: schematic of a global and a local alignment; panel labels: Global, Local]

4 ©CMBI 2005 Global Alignment Align two sequences from “head to toe”, i.e. from 5’ ends to 3’ ends (nucleotides) or from N-termini to C-termini (proteins). Algorithm published by: Needleman, S.B. and Wunsch, C.D. (1970) “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. J. Mol. Biol. 48:443-453.

5 ©CMBI 2005 Global Alignment
       a   a   c   t   t   g   a   g   c   -
   c   3   4   5   4   3   1  -1  -2  -4  -6
   t   1   2   3   4   4   2   0  -1  -4  -5
   g  -2  -1   0   1   2   3   1   0  -3  -4
   a  -2  -2  -2  -1   0   1   2   0  -2  -3
   g  -5  -4  -3  -2  -1   0   0   1  -1  -2
   t  -6  -5  -4  -3  -3  -3  -2  -1   0  -1
   -  -9  -8  -7  -6  -5  -4  -3  -2  -1   0
Resulting alignment:
aacttgagc
--c-tgagt
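The matrix-filling idea can be sketched in a few lines of Python. This is a minimal illustration of Needleman-Wunsch-style dynamic programming, not the exact procedure behind the slide: the scoring values (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch` are assumptions, and the matrix is filled from the start of the sequences rather than from the end, so the numbers will differ from those shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty (assumed scoring values)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("aacttgagc", "ctgagt")
print(best)
print(row1)
print(row2)
```

Several alignments can share the optimal score, so the traceback returns just one of them; which one depends on the order in which the three moves are tried.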

6 ©CMBI 2005 Local Alignment Locate region(s) with a high degree of similarity in two sequences. Algorithm published by: Smith, T.F. and Waterman, M.S. (1981) “Identification of common molecular subsequences”. J. Mol. Biol. 147:195-197.

7 ©CMBI 2005 Local Alignment
       a   a   c   t   t   g   a   g   c   -
   c   3   4   5   4   3   1   0   0   1   0
   t   1   2   3   4   4   2   1   0   0   0
   g   2   1   0   1   2   3   1   1   0   0
   a   2   2   1   0   1   1   2   0   0   0
   g   0   0   1   1   0   1   0   1   0   0
   t   0   0   0   1   1   0   0   0   0   0
   -   0   0   0   0   0   0   0   0   0   0
Resulting local alignment:
cttgag
ct-gag
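The local variant differs from the global one in two ways: no cell may drop below zero, and the traceback starts at the highest-scoring cell instead of a corner. The sketch below illustrates this under assumed scoring values (match +1, mismatch -1, gap -1); the function name `smith_waterman` is hypothetical, and the numbers it produces are not expected to match the matrix on the slide, whose scoring scheme is not stated.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty (assumed scoring values)."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = smith_waterman("aacttgagc", "ctgagt")
print(best, row1, row2)
```

With these assumed scores, more than one local alignment of the two slide sequences reaches the best score, so the returned pair of strings is only one of the equally good answers.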

8 ©CMBI 2005 Gap Penalty Functions
Linear: penalty rises monotonically with the length of the gap.
Affine: penalty has a gap-opening component and a separate length component.
Probabilistic: penalties may depend upon the character of the residues involved.
Other functions: penalty first rises fast, but levels off at greater length values.
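The shapes of these penalty functions are easy to compare in code. The sketch below is illustrative only: the function names, the default parameter values, the convention that the affine length component counts every gapped position, and the choice of a logarithm for the "levels off" case are all assumptions.

```python
import math


def linear_gap(length, per_position=1.0):
    """Penalty grows in proportion to the gap length."""
    return per_position * length


def affine_gap(length, gap_open=10.0, gap_extend=1.0):
    """One-off opening cost plus a cost per gapped position (assumed convention)."""
    return gap_open + gap_extend * length


def concave_gap(length, gap_open=10.0, scale=2.0):
    """Rises fast for short gaps and levels off for long ones (log is an assumption)."""
    return gap_open + scale * math.log(length)


for length in (1, 2, 5, 20):
    print(length, linear_gap(length), affine_gap(length), round(concave_gap(length), 2))
```

A residue-dependent (probabilistic) penalty would take the flanking or gapped residues as an extra argument; it is left out here because the slide gives no details.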

9 ©CMBI 2005 Significance of Alignment How significant is the alignment that we have found? Or put differently: how different is the alignment score that we found from the scores obtained by aligning random sequences to our sequence?

10 ©CMBI 2005 Calculating Significance
Repeat N times (N > 100):
Randomise sequence A by shuffling its residues.
Align the randomised sequence A with sequence B, and calculate the alignment score S.
Then calculate the mean and standard deviation of the random scores.
Calculate the Z-score: Z = (S_genuine - mean(S_random)) / s.d.
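The shuffling procedure can be sketched as follows. The scorer `global_score` is a stand-in with assumed values (match +1, mismatch -1, gap -1); any alignment score could be substituted. The names `z_score`, `n_shuffles` and `seed` are hypothetical, and the example sequences are simply borrowed from a later slide.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty (assumed scoring)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            sub = match if x == y else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def z_score(seq_a, seq_b, n_shuffles=200, seed=0):
    """Z = (S_genuine - mean(S_random)) / s.d., with seq_a shuffled n_shuffles times."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    genuine = global_score(seq_a, seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        residues = list(seq_a)
        rng.shuffle(residues)          # same composition, random order
        random_scores.append(global_score("".join(residues), seq_b))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("all shuffled scores are identical; Z-score undefined")
    return (genuine - mean) / sd


print(z_score("MIESAYTDSWQFEKSYVTDY", "MIESAYTDSWQFEKSYVTDY"))
print(z_score("MIESAYTDSW", "QFEKSYVTDY"))
```

Shuffling keeps the amino acid composition of sequence A intact, so the Z-score measures how much of the score comes from residue order rather than from composition alone.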

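11 ©CMBI 2005 Significance of Alignment [Figure: distribution of alignment scores; labels: Random matches, Genuine match; axis: Alignment score]

12 ©CMBI 2005 Significance of Alignment [Figure: distribution of alignment scores; labels: Random matches, Random match; axis: Alignment score]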

13 ©CMBI 2001 The amino acids Most information that enters the alignment procedure comes from the physicochemical properties of the amino acids. Example: which is the better alignment (left or right)?
CPISRTWASIFRCW    CPISRTWASIFRCW
CPISRT---LFRCW    CPISRTL---FRCW

14 ©CMBI 2001 A difficult alignment problem
AYAYAYAYSY
LGLPLPLPLP
AGAPAPAPSP
So, in an alignment of more than 2 sequences you can find more information than from just the 2 sequences you are interested in. How do we make these multi-sequence alignments?

15 ©CMBI 2001 A difficult alignment problem solved AYAYAYAYSY AGAPAPAPSP LGLPLPLPLP

16 ©CMBI 2001 Alignment order
MIESAYTDSW    -MIESAYTDSW
QFEKSYVTDY    QFEKSYVTDY-

17 ©CMBI 2001 Alignment order
MIESAYTDSW    -MIESAYTDSW
QFEKSYVTDY    QFEKSYVTDY-
QWERTYASNF    QWERTYASNF-

18 ©CMBI 2001 Conclusion First align the sequences that look very much like each other. That way you ‘build up information’ while generating the alignments that are most likely to be correct.

19 ©CMBI 2001 Alignment order In order to know which sequences look most like each other, you need to do all pairwise alignments first. This is exactly what CLUSTAL does. CLUSTAL builds a tree while doing the build-up of the multiple sequence alignment.

20 ©CMBI 2001 MSA and trees Take, for example, the three sequences:
1 ASWTFGHK
2 GTWSFANR
3 ATWAFADR
and you see immediately that 2 and 3 are close, while 1 is further away. So the tree will look roughly like: [Figure: tree with leaves 3, 2 and 1]
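The "you see immediately" step can be made explicit by counting mismatches between each pair of these equal-length sequences. Treating the number of differing positions as the distance is an assumption here, although it does give the values d(1,2)=6, d(1,3)=5 and d(2,3)=3 quoted later in the deck. The function name `mismatch_distance` is hypothetical.

```python
from itertools import combinations

SEQUENCES = {
    1: "ASWTFGHK",
    2: "GTWSFANR",
    3: "ATWAFADR",
}


def mismatch_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


distances = {
    (i, j): mismatch_distance(SEQUENCES[i], SEQUENCES[j])
    for i, j in combinations(sorted(SEQUENCES), 2)
}
closest = min(distances, key=distances.get)

print(distances)   # {(1, 2): 6, (1, 3): 5, (2, 3): 3}
print(closest)     # (2, 3)
```

Real sequences differ in length and need to be aligned pairwise before a distance can be read off, which is why all pairwise alignments have to be made first.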

21 ©CMBI 2001 Aligning sequences; start with distances [Figure: matrix of pair-wise distances between five sequences; recoverable entries: D, E, 10, 8, 7] D and E are the closest pair. Take them, and collapse the matrix by one row/column.
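The "take the closest pair and collapse the matrix" step, repeated until one cluster remains, can be sketched as below. This uses average-linkage (UPGMA-style) clustering as a stand-in; the deck does not say which clustering rule is meant, and it is not claimed to be what CLUSTAL does internally. The distance values are hypothetical, since the matrix on the slide could not be recovered, and the function name `average_linkage_tree` is an assumption.

```python
from itertools import combinations

NAMES = ["A", "B", "C", "D", "E"]

# Hypothetical pair-wise distances, chosen so that D and E are the closest pair.
PAIR_DISTANCES = {
    ("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 10, ("A", "E"): 10,
    ("B", "C"): 8, ("B", "D"): 10, ("B", "E"): 10,
    ("C", "D"): 5, ("C", "E"): 5,
    ("D", "E"): 2,
}
DIST = {frozenset(pair): d for pair, d in PAIR_DISTANCES.items()}


def average_linkage_tree(names, dist):
    """Repeatedly merge the two closest clusters; returns (tree, merge order)."""
    # Each cluster is (subtree, tuple of member names).
    clusters = [(name, (name,)) for name in names]
    merges = []

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        merges.append(merged[1])
        # Collapse: drop the two old clusters (j > i) and add the merged one.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0], merges


tree, merges = average_linkage_tree(NAMES, DIST)
print(merges[0])   # ('D', 'E')
print(tree)
```

The merge order doubles as an alignment order: the pairs joined first are the ones aligned first, and later joins align groups of already aligned sequences.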

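22 ©CMBI 2001 Aligning sequences [Figure: tree under construction; leaves: D, E, A, B]

23 ©CMBI 2001 Aligning sequences [Figure: tree under construction; leaves: D, E, C, A, B]

24 ©CMBI 2001 Aligning sequences [Figure: tree under construction; leaves: D, E, C, A, B]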

25 ©CMBI 2001 The problem is actually bigger
1 ASWTFGHK
2 GTWSFANR
3 ATWAFADR
d(i,j) is the distance between sequences i and j. d(1,2)=6; d(1,3)=5; d(2,3)=3. So a perfect representation would be: [Figure: points 1, 3 and 2 placed according to these distances] But what if a 4th sequence is added with d(1,4)=4, d(2,4)=5, d(3,4)=4? Where would that sequence sit?

26 ©CMBI 2001 So, nice tree, but what did we actually do?
1) We determined a distance measure
2) We measured all pair-wise distances
3) We reduced the dimensionality of the space of the problem
4) We used an algorithm to visualize
In a way, we projected the hyperspace in which we can perfectly describe all pair-wise distances onto a 1-dimensional line. What does this sentence mean?

27 ©CMBI 2001 Back to sequences: If we have N sequences, we can only draw their distance matrix in an N-1 dimensional space. By the time it is a tree, how many dimensions, and how much information have we lost? Perhaps we should cluster in a different way?
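The N-1 claim can be checked on the distances given earlier for sequences 1 to 4. The sketch below places point 1 at the origin, builds the Gram matrix of the vectors to the other points from the pairwise distances, and takes its rank as the number of dimensions needed. It assumes the distances can be realised as straight-line distances between points; the function name `embedding_dimension` is hypothetical.

```python
from fractions import Fraction


def embedding_dimension(d):
    """Dimensions needed to place points with pairwise distances d[i][j] exactly.

    Assumes the distances are realisable as Euclidean distances.
    """
    n = len(d)
    # Gram matrix entry: dot product of (point i - point 0) and (point j - point 0).
    g = [[Fraction(d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2, 2)
          for j in range(1, n)]
         for i in range(1, n)]
    # Rank by Gaussian elimination with exact fractions.
    rank = 0
    size = len(g)
    for col in range(size):
        pivot = next((r for r in range(rank, size) if g[r][col] != 0), None)
        if pivot is None:
            continue
        g[rank], g[pivot] = g[pivot], g[rank]
        for r in range(rank + 1, size):
            factor = g[r][col] / g[rank][col]
            g[r] = [x - factor * y for x, y in zip(g[r], g[rank])]
        rank += 1
    return rank


three = [[0, 6, 5],
         [6, 0, 3],
         [5, 3, 0]]
four = [[0, 6, 5, 4],
        [6, 0, 3, 5],
        [5, 3, 0, 4],
        [4, 5, 4, 0]]

print(embedding_dimension(three))  # 2: a triangle in a plane
print(embedding_dimension(four))   # 3: the 4th sequence needs a third dimension
```

So three sequences fit exactly on a sheet of paper, but the fourth one from the earlier question has to leave the plane, which is the information a flat tree drawing cannot hold.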

28 ©CMBI 2001 Other algorithms Multi-sequence alignment can also be done with an iterative ‘profile’ alignment. A) Make an alignment of a few, well-aligned sequences B) Align all sequences using this profile

29 ©CMBI 2001 1. What is a profile? Normally, we use a PAM-like matrix to determine the score for each possible match in an alignment. This assumes that all matches between I and E are the same. But they aren’t.

30 ©CMBI 2001 2. What is a profile?
QWERTYIPASEF
QWEKSFIPGSEY
NWERTMVPVSEM
QFEKTYLPSSEY
NFIKTLMPATEF
QYIRSLIPAGEM
NYIQSLIPSTEL
QFIRSLFPSSEI
  1   2   3
At 1, E and I are both OK. At 2, I is OK, but E surely not. At 3, E is OK, but I surely not.

31 ©CMBI 2001 3. What is a profile? The knowledge about which residue types are good at a certain position in the multiple sequence alignment can be expressed in a profile. A profile holds for each position 20 scores for the 20 residue types, and sometimes also two values for position specific gap open and gap elongation penalties.
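A very small profile can be built from the eight aligned sequences of the previous slide. In this sketch the per-position score is simply the observed residue frequency, which is an assumption made for brevity; real profiles typically use weighted or log-odds scores and may add position-specific gap penalties. The names `build_profile` and `profile_score` are hypothetical, and the column indices 2, 6 and 10 are taken to be the positions marked 1, 2 and 3.

```python
from collections import Counter

ALIGNMENT = [
    "QWERTYIPASEF",
    "QWEKSFIPGSEY",
    "NWERTMVPVSEM",
    "QFEKTYLPSSEY",
    "NFIKTLMPATEF",
    "QYIRSLIPAGEM",
    "NYIQSLIPSTEL",
    "QFIRSLFPSSEI",
]


def build_profile(alignment):
    """One dict per alignment column: residue type -> observed frequency."""
    n = len(alignment)
    return [{res: count / n for res, count in Counter(column).items()}
            for column in zip(*alignment)]


def profile_score(profile, position, residue):
    """Score of placing `residue` at `position`; unseen residues score 0."""
    return profile[position].get(residue, 0.0)


profile = build_profile(ALIGNMENT)
for position in (2, 6, 10):   # zero-based columns assumed to be marks 1, 2, 3
    print(position,
          profile_score(profile, position, "E"),
          profile_score(profile, position, "I"))
```

The same residue pair now scores differently depending on where in the alignment it occurs, which is exactly what a single PAM-like matrix cannot express.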

32 ©CMBI 2001 Conserved, variable, or in-between
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Gray = conserved; Black = variable; Green = correlated mutations
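One way to sort alignment columns into these three classes is sketched below. The rule is an illustrative assumption, not the method behind the coloured slide: a column is called conserved if it holds one residue type, correlated if it splits the sequences into the same groups as at least one other column, and variable otherwise. The function names are hypothetical.

```python
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]


def column_pattern(column):
    """Grouping of sequences by residue, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)


def classify_columns(alignment):
    """Label each column 'conserved', 'correlated' or 'variable' (assumed rule)."""
    patterns = [column_pattern(col) for col in zip(*alignment)]
    n_sequences = len(alignment)
    labels = []
    for pattern in patterns:
        n_types = len(set(pattern))
        if n_types == 1:
            labels.append("conserved")
        elif n_types < n_sequences and patterns.count(pattern) > 1:
            # Same split of the sequences as another column: mutations go together.
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


for column, label in zip(zip(*ALIGNMENT), classify_columns(ALIGNMENT)):
    print("".join(column), label)
```

Columns that share a grouping pattern each propose the same split of the sequences, so they all point to the same branching in a tree.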

33 ©CMBI 2001 Correlated mutations determine the tree shape 1 AGASDFDFGHKM 2 AGASDFDFRRRL 3 AGLPDFMNGHSI 4 AGLPDFMNRRRV

