Presentation is loading. Please wait.

Presentation is loading. Please wait.

Protein Sequence Alignment 1 July 2002- DPJ David Philip Judge.

Similar presentations


Presentation on theme: "Protein Sequence Alignment 1 July 2002- DPJ David Philip Judge."— Presentation transcript:

1 Protein Sequence Alignment 1 July 2002- DPJ David Philip Judge

2 SQRGGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL KDRGAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGL AKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV N Comparing pairs of sequences Objectives: To align two sequences in a biologically logical fashion. To generate a score representing the “quality” of the alignment. Assumption: The sequences being compared are homologous. {so differences between the sequences are exclusively due to evolutionary processes} {Sadly, the computed score has little absolute meaning}

3 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

4 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

5 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

6 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

7 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

8 P 1 0 -3 0 0 -2 -3 -2 -5 6 1 0 -6 -5 0 0 S 1 0 1 0 0 0 1 -3 0 -2 -3 1 2 1 -2 -3 0 0 0 T 1 0 0 -2 0 0 0 -2 0 -3 0 1 3 -5 -3 0 0 0 W -6 2 -4 -7 -8 -5 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 0 Y -3 -4 -2 -4 0 -5 0 -4 2 7 -5 -3 0 10 -2 -3 -4 0 V 0 -2 -2 4 2 2 0 -6 -2 4 0 B 0 2 3 -4 1 2 0 1 -2 -3 1 -2 -5 0 0 -5 -3 -2 2 2 0 Z 0 0 1 3 -5 3 3 2 -2 -3 0 -2 -5 0 0 -6 -4 -2 2 3 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A R N D C Q E G H I L K M F P S T W V B Z X Y A 2 0 0 0 0 1 -2 -4 1 1 1 -6 0 0 0 0 -3 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -4 1 0 -4 -2 2 1 0 D 0 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 0 0 -7 -2 3 3 0 -4 C -2 -4 -5 12 -5 -3 -2 -6 -5 -4 -3 0 -2 -8 -2 -4 -5 0 0 Q 0 1 1 2 4 2 3 -2 1 -5 0 -2 -5 -2 1 3 0 -4 E 0 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 1 0 -7 -2 2 3 0 -4 G 1 -3 0 1 0 5 -2 -3 -4 -2 -3 -5 1 0 -7 0 0 -5 H 2 2 1 -3 3 1 -2 6 0 0 -3 -2 1 2 0 0 I -2 -3 -2 5 2 1 0 -5 4 -2 0 L -2 -3 -4 -6 -2 -3 -4 -2 2 6 4 2 2 -3 -2 2 -3 0 K 3 1 0 -5 1 0 -2 0 -3 5 0 -5 0 0 -3 -2 1 0 0 -4 M 0 -2 -3 -5 -2 -3 -2 2 4 0 6 0 -4 2 -2 0 -5 0 F -4 -6 -4 -5 -2 1 2 -5 0 9 -3 0 7 Dayhoff PAM 250 Matrix R -2 6 0 -4 1 -3 2 -2 -3 3 0 -4 0 0 2 -2 0 0 -4

9 The PAM matrices are computed from aligned families of proteins. Original protein Aligned current proteins

10 P 1 0 -3 0 0 -2 -3 -2 -5 6 1 0 -6 -5 0 0 S 1 0 1 0 0 0 1 -3 0 -2 -3 1 2 1 -2 -3 0 0 0 T 1 0 0 -2 0 0 0 -2 0 -3 0 1 3 -5 -3 0 0 0 W -6 2 -4 -7 -8 -5 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 0 Y -3 -4 -2 -4 0 -5 0 -4 2 7 -5 -3 0 10 -2 -3 -4 0 V 0 -2 -2 4 2 2 0 -6 -2 4 0 B 0 2 3 -4 1 2 0 1 -2 -3 1 -2 -5 0 0 -5 -3 -2 2 2 0 Z 0 0 1 3 -5 3 3 2 -2 -3 0 -2 -5 0 0 -6 -4 -2 2 3 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A R N D C Q E G H I L K M F P S T W V B Z X Y A 2 0 0 0 0 1 -2 -4 1 1 1 -6 0 0 0 0 -3 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -4 1 0 -4 -2 2 1 0 D 0 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 0 0 -7 -2 3 3 0 -4 C -2 -4 -5 12 -5 -3 -2 -6 -5 -4 -3 0 -2 -8 -2 -4 -5 0 0 Q 0 1 1 2 4 2 3 -2 1 -5 0 -2 -5 -2 1 3 0 -4 E 0 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 1 0 -7 -2 2 3 0 -4 G 1 -3 0 1 0 5 -2 -3 -4 -2 -3 -5 1 0 -7 0 0 -5 H 2 2 1 -3 3 1 -2 6 0 0 -3 -2 1 2 0 0 I -2 -3 -2 5 2 1 0 -5 4 -2 0 L -2 -3 -4 -6 -2 -3 -4 -2 2 6 4 2 2 -3 -2 2 -3 0 K 3 1 0 -5 1 0 -2 0 -3 5 0 -5 0 0 -3 -2 1 0 0 -4 M 0 -2 -3 -5 -2 -3 -2 2 4 0 6 0 -4 2 -2 0 -5 0 F -4 -6 -4 -5 -2 1 2 -5 0 9 -3 0 7 Dayhoff PAM 250 Matrix R -2 6 0 -4 1 -3 2 -2 -3 3 0 -4 0 0 2 -2 0 0 -4

11 F F F F F F F F F F Y Y Y Y Y F Y Y Y Y F F Y Y Y Y F Y F Y Y Y F F Original protein Aligned current proteins Y F F Y Original protein Aligned current proteins The High F Y score implies: F -> Y substitutions are common + Y -> F substitutions are common Where F is conserved Where Y is conserved

12 P 1 0 -3 0 0 -2 -3 -2 -5 6 1 0 -6 -5 0 0 S 1 0 1 0 0 0 1 -3 0 -2 -3 1 2 1 -2 -3 0 0 0 T 1 0 0 -2 0 0 0 -2 0 -3 0 1 3 -5 -3 0 0 0 W -6 2 -4 -7 -8 -5 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 0 Y -3 -4 -2 -4 0 -5 0 -4 2 7 -5 -3 0 10 -2 -3 -4 0 V 0 -2 -2 4 2 2 0 -6 -2 4 0 B 0 2 3 -4 1 2 0 1 -2 -3 1 -2 -5 0 0 -5 -3 -2 2 2 0 Z 0 0 1 3 -5 3 3 2 -2 -3 0 -2 -5 0 0 -6 -4 -2 2 3 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A R N D C Q E G H I L K M F P S T W V B Z X Y A 2 0 0 0 0 1 -2 -4 1 1 1 -6 0 0 0 0 -3 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -4 1 0 -4 -2 2 1 0 D 0 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 0 0 -7 -2 3 3 0 -4 C -2 -4 -5 12 -5 -3 -2 -6 -5 -4 -3 0 -2 -8 -2 -4 -5 0 0 Q 0 1 1 2 4 2 3 -2 1 -5 0 -2 -5 -2 1 3 0 -4 E 0 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 1 0 -7 -2 2 3 0 -4 G 1 -3 0 1 0 5 -2 -3 -4 -2 -3 -5 1 0 -7 0 0 -5 H 2 2 1 -3 3 1 -2 6 0 0 -3 -2 1 2 0 0 I -2 -3 -2 5 2 1 0 -5 4 -2 0 L -2 -3 -4 -6 -2 -3 -4 -2 2 6 4 2 2 -3 -2 2 -3 0 K 3 1 0 -5 1 0 -2 0 -3 5 0 -5 0 0 -3 -2 1 0 0 -4 M 0 -2 -3 -5 -2 -3 -2 2 4 0 6 0 -4 2 -2 0 -5 0 F -4 -6 -4 -5 -2 1 2 -5 0 9 -3 0 7 Dayhoff PAM 250 Matrix R -2 6 0 -4 1 -3 2 -2 -3 3 0 -4 0 0 2 -2 0 0 -4

13 F F F F F F F F F F Y Y Y Y Y F Y Y Y Y F F Y Y Y Y F Y F Y Y Y F F Original protein Aligned current proteins Y F F Y Original protein Aligned current proteins The High W W score implies: Any substitutions are uncommon Where W is conserved W W W W W W W W W W W W W W W W W W W

14 P 1 0 -3 0 0 -2 -3 -2 -5 6 1 0 -6 -5 0 0 S 1 0 1 0 0 0 1 -3 0 -2 -3 1 2 1 -2 -3 0 0 0 T 1 0 0 -2 0 0 0 -2 0 -3 0 1 3 -5 -3 0 0 0 W -6 2 -4 -7 -8 -5 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 0 Y -3 -4 -2 -4 0 -5 0 -4 2 7 -5 -3 0 10 -2 -3 -4 0 V 0 -2 -2 4 2 2 0 -6 -2 4 0 B 0 2 3 -4 1 2 0 1 -2 -3 1 -2 -5 0 0 -5 -3 -2 2 2 0 Z 0 0 1 3 -5 3 3 2 -2 -3 0 -2 -5 0 0 -6 -4 -2 2 3 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 A R N D C Q E G H I L K M F P S T W V B Z X Y A 2 0 0 0 0 1 -2 -4 1 1 1 -6 0 0 0 0 -3 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -4 1 0 -4 -2 2 1 0 D 0 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 0 0 -7 -2 3 3 0 -4 C -2 -4 -5 12 -5 -3 -2 -6 -5 -4 -3 0 -2 -8 -2 -4 -5 0 0 Q 0 1 1 2 4 2 3 -2 1 -5 0 -2 -5 -2 1 3 0 -4 E 0 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 1 0 -7 -2 2 3 0 -4 G 1 -3 0 1 0 5 -2 -3 -4 -2 -3 -5 1 0 -7 0 0 -5 H 2 2 1 -3 3 1 -2 6 0 0 -3 -2 1 2 0 0 I -2 -3 -2 5 2 1 0 -5 4 -2 0 L -2 -3 -4 -6 -2 -3 -4 -2 2 6 4 2 2 -3 -2 2 -3 0 K 3 1 0 -5 1 0 -2 0 -3 5 0 -5 0 0 -3 -2 1 0 0 -4 M 0 -2 -3 -5 -2 -3 -2 2 4 0 6 0 -4 2 -2 0 -5 0 F -4 -6 -4 -5 -2 1 2 -5 0 9 -3 0 7 Dayhoff PAM 250 Matrix R -2 6 0 -4 1 -3 2 -2 -3 3 0 -4 0 0 2 -2 0 0 -4

15 A8.3 C1.7 D5.3 E6.2 F3.9 G7.2 H2.2 I5.2 K5.7 L9.0 M2.4 N4.4 P5.1 Q4.0 R5.7 S6.9 T5.8 V6.6 W1.3 Y3.2 AMINO ACIDS % A8.3 W1.3 Alanine is very commonTryptophan is relatively rare Typical Amino Acid composition { according to Argos and McCaldon}

16 PAM – Point Accepted Mutation PAM – is a measure of evolutionary distance An evolutionary distance of 1 PAM indicates the probability of a residue mutating during a distance in which 1 point mutation was accepted per 100 residues.

17 Observed %Evolutionary DifferenceDistance (PAMs) 1 1011 2023 3038 4056 5080 60112 70159 80246 Relationship between Observed difference and PAM value.

18 The BLOSUM matrices are also computed from aligned families of proteins. Original protein Aligned current proteins The BLOSUM scoring matrices

19 Regions of high conservation, stored in the BLOCKS database

20 ABEERNALEDLAGERDFWGALSTOUTWRARWATERA Insertions/Deletions => GAPS ACBEERGYALEDILAGERAFGSTOUTFAWATERM

21 Insertions/Deletions => GAPS ABEERNALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

22 Insertions/Deletions => GAPS ABEERN-ALEDLAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

23 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFGSTOUTFAWATERM

24 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG-STOUTFAWATERM

25 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG--STOUTFAWATERM

26 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFAWATERM

27 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFA-WATERM

28 Insertions/Deletions => GAPS ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAFG---STOUTFA--WATERM

29 ?????? – but adds 12 to the score ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM G W = -7 G G = +5 Insertions/Deletions => GAPS

30 ?????? – but adds 12 to the score A-BEERN-ALED-LAGERDFWGALSTOUTWRARWATERA ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM ?????? – but adds 4 to the score G W = -7 G G = +5 A C = -2 A A = +2 Insertions/Deletions => GAPSThe need for penalising GAPs.

31 LOCAL and GLOBAL implementations of pairwise alignment LOCAL Alignment

32 LOCAL and GLOBAL implementations of pairwise alignment LOCAL Alignment GCG : bestfit Staden : spin Emboss: water/matcher GLOBAL Alignment GCG : gap Staden : spin Emboss : needle/stretcher


Download ppt "Protein Sequence Alignment 1 July 2002- DPJ David Philip Judge."

Similar presentations


Ads by Google