Published by Daniela Infield. Modified over 9 years ago.
1
Protein Sequence Alignment 1 July 2002 - DPJ David Philip Judge
2
SQRGGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL
KDRGAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGL
AKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
Comparing pairs of sequences
Objectives:
To align two sequences in a biologically logical fashion.
To generate a score representing the "quality" of the alignment.
Assumption: The sequences being compared are homologous. {so differences between the sequences are exclusively due to evolutionary processes}
{Sadly, the computed score has little absolute meaning}
3
ABEERNALEDLAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFGSTOUTFAWATERM
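As a rough illustration of why these two strings need a proper alignment, the sketch below compares them position by position with no gaps. The function name `count_identities` is made up for this example and is not part of any package mentioned in the slides.

```python
def count_identities(seq_a: str, seq_b: str) -> int:
    """Count positions where both sequences carry the same letter.

    No gaps are introduced; zip() stops at the end of the shorter sequence.
    """
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


top = "ABEERNALEDLAGERDFWGALSTOUTWRARWATERA"
bottom = "ACBEERGYALEDILAGERAFGSTOUTFAWATERM"

identities = count_identities(top, bottom)
compared = min(len(top), len(bottom))
print(f"{identities} identical positions out of {compared} compared")
```

Most of the shared words (BEER, ALE, LAGER, WATER) sit at different offsets in the two strings, so a gap-free comparison misses them.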
8
Dayhoff PAM 250 Matrix
[Table: substitution scores for every pair of the residue codes A R N D C Q E G H I L K M F P S T W Y V, plus the ambiguity codes B, Z and X. Legible entries include W W = 17, C C = 12 and F Y = 7.]
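A substitution matrix is used as a symmetric lookup table: the score for aligning residue a with residue b is the same as for b with a. The sketch below holds only a handful of PAM 250 entries (W W and F Y as read from the matrix, the others as quoted on later slides). The names `PAM250_SUBSET` and `pam250_score` are invented for this example, and a real program would load the full table.

```python
# A few PAM 250 entries only; this is NOT the complete matrix.
PAM250_SUBSET = {
    ("W", "W"): 17,
    ("F", "Y"): 7,
    ("G", "G"): 5,
    ("G", "W"): -7,
    ("A", "A"): 2,
    ("A", "C"): -2,
}


def pam250_score(a: str, b: str) -> int:
    """Look up a pair score, treating (a, b) and (b, a) as the same pair."""
    if (a, b) in PAM250_SUBSET:
        return PAM250_SUBSET[(a, b)]
    if (b, a) in PAM250_SUBSET:
        return PAM250_SUBSET[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this illustrative subset")


print(pam250_score("Y", "F"))  # same entry as F Y
print(pam250_score("W", "G"))  # same entry as G W
```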
9
The PAM matrices are computed from aligned families of proteins.
[Figure labels: Original protein; Aligned current proteins]
11
The high F Y score implies:
F -> Y substitutions are common + Y -> F substitutions are common
[Figure: columns of F and Y residues in the aligned current proteins beneath the original protein; labels: Where F is conserved; Where Y is conserved]
13
The high W W score implies:
Any substitutions are uncommon
[Figure: a column of W residues in the aligned current proteins beneath the original protein; label: Where W is conserved]
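The slides do not spell out how a score is derived from the aligned families. As background, substitution scores of this kind are usually log-odds values; the form below is the standard one, with the 10 log10 scaling used for the Dayhoff matrices. The symbols q and p are notation introduced here, not taken from the slides.

```latex
s(a,b) = \operatorname{round}\!\left( 10 \,\log_{10} \frac{q_{ab}}{p_a \, p_b} \right)
```

Here q_ab is the frequency with which residues a and b are found aligned in related proteins at the chosen evolutionary distance, and p_a, p_b are their background frequencies. A pair seen more often than chance predicts scores above zero. W is rare and seldom replaced, so q_WW is large relative to p_W squared, which is consistent with the high W W score.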
15
Typical Amino Acid composition {according to Argos and McCaldon}
AMINO ACID   %
A            8.3
C            1.7
D            5.3
E            6.2
F            3.9
G            7.2
H            2.2
I            5.2
K            5.7
L            9.0
M            2.4
N            4.4
P            5.1
Q            4.0
R            5.7
S            6.9
T            5.8
V            6.6
W            1.3
Y            3.2
Alanine is very common (A 8.3). Tryptophan is relatively rare (W 1.3).
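To make the table concrete, the sketch below computes the composition of one sequence and sets it beside the typical percentages above. The helper name `composition_percent` is an assumption for this example, and the sequence is the first one shown on slide 2.

```python
from collections import Counter

# Typical percentages copied from the table above.
TYPICAL_PERCENT = {
    "A": 8.3, "C": 1.7, "D": 5.3, "E": 6.2, "F": 3.9,
    "G": 7.2, "H": 2.2, "I": 5.2, "K": 5.7, "L": 9.0,
    "M": 2.4, "N": 4.4, "P": 5.1, "Q": 4.0, "R": 5.7,
    "S": 6.9, "T": 5.8, "V": 6.6, "W": 1.3, "Y": 3.2,
}


def composition_percent(sequence: str) -> dict:
    """Return the percentage of each residue that occurs in the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


seq = "SQRGGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL"
observed = composition_percent(seq)

# Residues absent from the sequence simply do not appear in `observed`.
for residue in sorted(observed, key=observed.get, reverse=True):
    print(f"{residue}  observed {observed[residue]:5.1f}%  typical {TYPICAL_PERCENT[residue]:4.1f}%")
```

A single 60-residue sequence is a small sample, so large departures from the typical values are to be expected.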
16
PAM: Point Accepted Mutation
PAM is a measure of evolutionary distance.
An evolutionary distance of 1 PAM is the distance over which, on average, 1 point mutation was accepted per 100 residues. The PAM 1 matrix gives the probability of each residue mutating over that distance.
17
Relationship between Observed difference and PAM value.
Observed % Difference    Evolutionary Distance (PAMs)
1                        1
10                       11
20                       23
30                       38
40                       56
50                       80
60                       112
70                       159
80                       246
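The table can be used to turn an observed percentage difference into an approximate PAM distance. The sketch below does this by straight-line interpolation between neighbouring rows. Linear interpolation is an assumption made for illustration (the true curve is not linear), and `estimate_pam` is a name invented here.

```python
# (observed % difference, evolutionary distance in PAMs), copied from the table above.
OBSERVED_TO_PAM = [
    (1, 1), (10, 11), (20, 23), (30, 38), (40, 56),
    (50, 80), (60, 112), (70, 159), (80, 246),
]


def estimate_pam(observed_percent: float) -> float:
    """Interpolate linearly between the two nearest rows of the table."""
    lowest, highest = OBSERVED_TO_PAM[0][0], OBSERVED_TO_PAM[-1][0]
    if not lowest <= observed_percent <= highest:
        raise ValueError("outside the tabulated range (1% to 80%)")
    for (x0, y0), (x1, y1) in zip(OBSERVED_TO_PAM, OBSERVED_TO_PAM[1:]):
        if x0 <= observed_percent <= x1:
            fraction = (observed_percent - x0) / (x1 - x0)
            return y0 + fraction * (y1 - y0)


print(estimate_pam(45))  # halfway between the 40% and 50% rows
```

The PAM distance grows much faster than the observed difference because repeated and reverted mutations at the same site are not visible in the present-day sequences.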
18
The BLOSUM scoring matrices
The BLOSUM matrices are also computed from aligned families of proteins.
[Figure labels: Original protein; Aligned current proteins]
19
Regions of high conservation, stored in the BLOCKS database
20
Insertions/Deletions => GAPS
ABEERNALEDLAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFGSTOUTFAWATERM
21
Insertions/Deletions => GAPS
ABEERNALEDLAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFGSTOUTFAWATERM
22
Insertions/Deletions => GAPS
ABEERN-ALEDLAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFGSTOUTFAWATERM
23
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFGSTOUTFAWATERM
24
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFG-STOUTFAWATERM
25
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFG--STOUTFAWATERM
26
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFG---STOUTFAWATERM
27
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFG---STOUTFA-WATERM
28
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAFG---STOUTFA--WATERM
29
Insertions/Deletions => GAPS
ABEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM
G W = -7, G G = +5
?????? but adds 12 to the score: a G W pair (-7) becomes a G G pair (+5)
30
Insertions/Deletions => GAPS
The need for penalising GAPs.
A-BEERN-ALED-LAGERDFWGALSTOUTWRARWATERA
ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM
G W = -7, G G = +5
?????? but adds 12 to the score: a G W pair (-7) becomes a G G pair (+5)
A C = -2, A A = +2
?????? but adds 4 to the score: an A C pair (-2) becomes an A A pair (+2)
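The point of these slides is that, if gaps cost nothing, adding another gap can always buy a slightly better pair score. The sketch below scores the final alignment twice, once with free gaps and once with a gap-open and gap-extend penalty. The match/mismatch values (+1/-1) and the penalty values (2 to open, 1 to extend) are arbitrary assumptions chosen for illustration, not values from the slides or from any named program.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap_open=0, gap_extend=0):
    """Score two aligned rows of equal length; '-' marks a gap position.

    The first position of a gap run costs gap_open and each further
    position in the same run costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    previous_gap_row = None  # which row held a gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            score -= gap_extend if gap_row == previous_gap_row else gap_open
            previous_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            previous_gap_row = None
    return score


top = "A-BEERN-ALED-LAGERDFWGALSTOUTWRARWATERA"
bottom = "ACBEERGYALEDILAGERAF-G--STOUTFA--WATERM"

free = score_alignment(top, bottom)
penalised = score_alignment(top, bottom, gap_open=2, gap_extend=1)
print("gaps free:", free, " gaps penalised:", penalised)
```

With a penalty in place, each extra gap has to earn back more than it costs, which discourages the questionable single-residue shifts marked ?????? above.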
32
LOCAL and GLOBAL implementations of pairwise alignment
LOCAL Alignment
GCG: bestfit
Staden: spin
Emboss: water/matcher
GLOBAL Alignment
GCG: gap
Staden: spin
Emboss: needle/stretcher
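To show how the two approaches differ, the sketch below computes a global score (Needleman-Wunsch style, the method behind Emboss needle) and a local score (Smith-Waterman style, the method behind Emboss water) in one pass. It is a minimal teaching version under stated assumptions: scoring is +1 match, -1 mismatch and a linear gap cost of -2, where the real programs use a substitution matrix and separate gap-open and gap-extend penalties. The function name `alignment_scores` is invented here.

```python
def alignment_scores(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) using simple dynamic programming."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    global_m = [[0] * cols for _ in range(rows)]
    local_m = [[0] * cols for _ in range(rows)]

    # Global alignment must pay for leading gaps; local alignment starts at 0.
    for i in range(1, rows):
        global_m[i][0] = i * gap
    for j in range(1, cols):
        global_m[0][j] = j * gap

    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            global_m[i][j] = max(
                global_m[i - 1][j - 1] + pair,  # align the two residues
                global_m[i - 1][j] + gap,       # gap in seq_b
                global_m[i][j - 1] + gap,       # gap in seq_a
            )
            # Local scores are never allowed to fall below zero.
            local_m[i][j] = max(
                0,
                local_m[i - 1][j - 1] + pair,
                local_m[i - 1][j] + gap,
                local_m[i][j - 1] + gap,
            )
            best_local = max(best_local, local_m[i][j])

    # Global score spans both sequences end to end; local score is the best region.
    return global_m[-1][-1], best_local


global_score, local_score = alignment_scores("AAASTOUT", "STOUTBBB")
print("global:", global_score, " local:", local_score)
```

The global score is dragged down by the unrelated ends, while the local score reports only the best matching region (here the shared STOUT).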