1
Bioinformatics
“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky, 1900-1975)
“Nothing in bioinformatics makes sense except in the light of Biology”
2
Evolution
Three requirements:
- Template structure providing stability (DNA)
- Copying mechanism (meiosis)
- Mechanism providing variation (mutations; insertions and deletions; crossing-over; etc.)
3
Evolution
Ancestral sequence: ABCD
Descendants: ACCD (mutation, B → C) and ABD (deletion, C → ø)
Pairwise alignment:
ACCD      ACCD
AB─D  or  A─BD
4
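Evolution
Ancestral sequence: ABCD
Descendants: ACCD (mutation, B → C) and ABD (deletion, C → ø)
Pairwise alignment:
ACCD      ACCD
AB─D  or  A─BD
True alignment: ACCD over AB─D, because it places the gap where the C was deleted and pairs the mutated C with the original B.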
5
Example: pairwise sequence alignment needs a sense of evolution
Global dynamic programming
Sequences: MDAGSTVILCFVG and MDAASTILCGS
Evolution enters through the amino acid exchange matrix and the gap penalties (open, extension)
Search matrix, resulting alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
6
Sequence alignment: history
- 1970 Needleman-Wunsch: global pairwise alignment
- 1981 Smith-Waterman: local pairwise alignment
- 1984 Hogeweg-Hesper: progressive multiple alignment
- 1989 Lipman-Altschul-Kececioglu: simultaneous multiple alignment
- 1994 Hidden Markov Models (HMM) for multiple alignment
- 1996 Iterative strategies for progressive multiple alignment revived
- 1997 PSI-BLAST (PSSM)
7
Pairwise alignment
Combinatorial explosion
- 1 gap in 1 sequence: n+1 possibilities
- 2 gaps in 1 sequence: (n+1)n
- 3 gaps in 1 sequence: (n+1)n(n-1), etc.
Number of alignments of two sequences of length n:
$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$
2 sequences of 300 a.a.: ~$10^{179}$ alignments
2 sequences of 1000 a.a.: ~$10^{600}$ alignments!
Example alignment:
T D W V T A L K
T D W L - - I K
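The count above can be checked numerically. The following is a minimal sketch, assuming the central binomial coefficient is the intended count; the function names are illustrative and not part of the original slides.

```python
import math

def count_alignments(n: int) -> int:
    """Exact central binomial coefficient C(2n, n)."""
    return math.comb(2 * n, n)

def approx_log10(n: int) -> float:
    """log10 of the approximation 2^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)

for n in (10, 100, 1000):
    exact = count_alignments(n)
    # Order of magnitude of the exact value next to the approximation.
    print(n, len(str(exact)) - 1, round(approx_log10(n), 1))
```

For n = 1000 both the exact value and the approximation have order of magnitude 600, which is why exhaustive enumeration is not an option and dynamic programming is used instead.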
8
A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
attt---ggcggatcg-cctctacgggcc----
9
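Dynamic programming
Scoring alignments
$$S_{a,b} = \sum_{\text{aligned pairs}} s(a_i, b_j) \; + \sum_{\text{gaps}} gp(k), \qquad gp(k) = p_i + k\,p_e$$
Affine gap penalties: $k$ is the gap length; $p_i$ and $p_e$ are the penalties for gap initialisation and extension, respectively.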
10
Dynamic programming
Scoring alignments
Inputs: amino acid exchange matrix (20 × 20); affine gap penalties (open, extension)
T D W V T A L K
T D W L - - I K
Score: s(T,T) + s(D,D) + s(W,W) + s(V,L) + P_o + 2P_x + s(L,I) + s(K,K)
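The scoring of this example alignment can be sketched in a few lines. This is only an illustration under stated assumptions: the exchange values are the few PAM250 entries needed for this example, the penalties are passed as positive magnitudes and subtracted (an assumed sign convention), and the penalty values 10 and 1 are illustrative, not taken from the slides.

```python
# Subset of PAM250 exchange values, just enough for the example alignment.
PAM250_SUBSET = {
    ("T", "T"): 3, ("D", "D"): 4, ("W", "W"): 17,
    ("V", "L"): 2, ("L", "I"): 2, ("K", "K"): 5,
}

def exchange(a: str, b: str) -> int:
    """Look up a symmetric exchange value; raises KeyError if not in the subset."""
    if (a, b) in PAM250_SUBSET:
        return PAM250_SUBSET[(a, b)]
    return PAM250_SUBSET[(b, a)]

def score_alignment(top: str, bottom: str, p_open: int, p_ext: int) -> int:
    """Sum exchange values; a gap of length k costs p_open + k * p_ext."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_row = None  # row holding the currently open gap, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("column with two gaps")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != gap_row:      # a new gap starts here
                score -= p_open
                gap_row = row
            score -= p_ext          # every gap position pays the extension cost
        else:
            score += exchange(a, b)
            gap_row = None
    return score

# Illustrative penalties: open = 10, extension = 1.
print(score_alignment("TDWVTALK", "TDWL--IK", p_open=10, p_ext=1))
```

With these assumed penalties the six aligned pairs contribute 33 and the single gap of length 2 costs 12.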
11
Amino acid exchange matrices (20 × 20)
How do we get one? And how do we get associated gap penalties?
First systematic method to derive a.a. exchange matrices: Margaret Dayhoff et al. (1978), Atlas of Protein Sequence and Structure
12
PAM250 matrix: amino acid exchange matrix (log odds)
A   2
R  -2   6
N   0   0   2
D   0  -1   2   4
C  -2  -4  -4  -5  12
Q   0   1   1   2  -5   4
E   0  -1   1   3  -5   2   4
G   1  -3   0   1  -3  -1   0   5
H  -1   2   2   1  -3   3   1  -2   6
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F  -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P   1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   2
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -3   0   1   3
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4
B   0  -1   2   3  -4   1   2   0   1  -2  -3   1  -2  -5  -1   0   0  -5  -3  -2   2
Z   0   0   1   3  -5   3   3  -1   2  -2  -3   0  -2  -5   0   0  -1  -6  -4  -2   2   3
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   B   Z
Positive exchange values denote mutations that are more likely than randomly expected, while negative values denote mutations that occur less often than randomly expected (avoided mutations).
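The sign of a log-odds value follows directly from comparing an observed pair frequency with the frequency expected by chance. A minimal sketch, assuming the conventional scaling of 10 times the base-10 logarithm and using made-up frequencies (neither the function name nor the numbers come from the slides):

```python
import math

def log_odds(observed_pair_freq: float, freq_a: float, freq_b: float) -> int:
    """Rounded 10 * log10(observed / expected), with expected = freq_a * freq_b."""
    expected = freq_a * freq_b
    return round(10 * math.log10(observed_pair_freq / expected))

# Hypothetical frequencies, for illustration only.
print(log_odds(0.02, 0.1, 0.1))   # pair seen twice as often as expected: positive
print(log_odds(0.005, 0.1, 0.1))  # pair seen half as often as expected: negative
```

A pair observed exactly as often as chance predicts scores 0, which is the boundary between favoured and avoided exchanges.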
13
Pairwise sequence alignment
Global dynamic programming
Sequences: MDAGSTVILCFVG and MDAASTILCGS
Evolution enters through the amino acid exchange matrix and the gap penalties (open, extension)
Search matrix, resulting alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
14
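Global dynamic programming
$$S_{i,j} = s_{i,j} + \max \begin{cases} \max_{0<x<i-1} \{ S_{x,j-1} - P_{\mathrm{i}} - (i-x-1)\,P_{\mathrm{x}} \} \\ S_{i-1,j-1} \\ \max_{0<y<j-1} \{ S_{i-1,y} - P_{\mathrm{i}} - (j-y-1)\,P_{\mathrm{x}} \} \end{cases}$$
Here $s_{i,j}$ is the exchange value for residues $i$ and $j$, and $P_{\mathrm{i}}$ and $P_{\mathrm{x}}$ are the gap initialisation and extension penalties.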
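The recurrence can be turned into a short score-only routine. This is a sketch under assumptions, not the lecturer's implementation: virtual start and end positions are added so that end gaps are penalised too, the scoring function `identity_score` is a hypothetical stand-in for an exchange matrix, and the cubic-time loops follow the recurrence literally rather than using a faster formulation.

```python
NEG_INF = float("-inf")

def identity_score(x: str, y: str) -> int:
    """Hypothetical exchange function: +1 for identity, -1 otherwise."""
    return 1 if x == y else -1

def global_score(a, b, s, p_open, p_ext):
    """Best global alignment score; a gap of length k costs p_open + k * p_ext."""
    m, n = len(a), len(b)
    # Index 0 is a virtual start position, m+1 / n+1 a virtual end position.
    S = [[NEG_INF] * (n + 2) for _ in range(m + 2)]
    S[0][0] = 0
    for i in range(1, m + 2):
        for j in range(1, n + 2):
            # The virtual end of one sequence only pairs with the other's end.
            if (i == m + 1) != (j == n + 1):
                continue
            best = S[i - 1][j - 1]                      # no gap before (i, j)
            for x in range(0, i - 1):                   # gap of length i-x-1
                best = max(best, S[x][j - 1] - p_open - (i - x - 1) * p_ext)
            for y in range(0, j - 1):                   # gap of length j-y-1
                best = max(best, S[i - 1][y] - p_open - (j - y - 1) * p_ext)
            match = 0 if i == m + 1 else s(a[i - 1], b[j - 1])
            S[i][j] = match + best
    return S[m + 1][n + 1]

print(global_score("ACGT", "AGT", identity_score, p_open=2, p_ext=1))
```

In the usage line, the best alignment pairs A, G and T and opens one gap of length 1 opposite C, so three matches are offset by a gap cost of 3.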
15
Global dynamic programming
17
Pairwise alignment
- Global alignment: all gaps are penalised
- Semi-global alignment: N- and C-terminal gaps (end-gaps) are not penalised
MSTGAVLIY--TS-----
---GGILLFHRTSGTSNS
The leading and trailing gaps in this example are end-gaps.
18
Local dynamic programming (Smith & Waterman, 1981)
Sequences: LCFVMLAGSTVIVGTR and EDASTILCGS
Amino acid exchange matrix (must contain negative numbers); gap penalties (open, extension)
Search matrix, resulting local alignment:
AGSTVIVG
A-STILCG
19
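Local dynamic programming (Smith & Waterman, 1981)
$$S_{i,j} = \max \begin{cases} s_{i,j} + \max_{0<x<i-1} \{ S_{x,j-1} - P_{\mathrm{i}} - (i-x-1)\,P_{\mathrm{x}} \} \\ s_{i,j} + S_{i-1,j-1} \\ s_{i,j} + \max_{0<y<j-1} \{ S_{i-1,y} - P_{\mathrm{i}} - (j-y-1)\,P_{\mathrm{x}} \} \\ 0 \end{cases}$$
The extra option 0 lets an alignment restart anywhere, so only the best-scoring local region is kept.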
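The local variant differs from the global one only in the floor at 0 and in taking the highest cell anywhere in the matrix. A minimal score-only sketch, assuming a hypothetical scoring function (`simple_score`, +2 for identity and -1 otherwise) in place of a real exchange matrix:

```python
def simple_score(x: str, y: str) -> int:
    """Hypothetical exchange function with negative values for mismatches."""
    return 2 if x == y else -1

def local_score(a, b, s, p_open, p_ext):
    """Best local alignment score; a gap of length k costs p_open + k * p_ext."""
    m, n = len(a), len(b)
    # Row and column 0 stand for the empty prefix and stay at 0.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best_overall = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best_prev = S[i - 1][j - 1]
            for x in range(1, i - 1):       # 0 < x < i-1
                best_prev = max(best_prev,
                                S[x][j - 1] - p_open - (i - x - 1) * p_ext)
            for y in range(1, j - 1):       # 0 < y < j-1
                best_prev = max(best_prev,
                                S[i - 1][y] - p_open - (j - y - 1) * p_ext)
            # Floor at 0: a negative running score restarts the alignment.
            S[i][j] = max(0, s(a[i - 1], b[j - 1]) + best_prev)
            best_overall = max(best_overall, S[i][j])
    return best_overall

print(local_score("XXACGTXX", "YYACGTYY", simple_score, p_open=1, p_ext=1))
```

The floor only works when mismatches score below zero, which is why the exchange matrix for local alignment needs negative entries.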
20
Local dynamic programming
21
Dot plots
- A way of representing (visualising) sequence similarity without doing dynamic programming (DP)
- Make the same matrix as for DP, but represent sequence similarity locally by averaging over a window
See Lesk’s book pp. 167-171
22
Comparing two sequences We want to be able to choose the best alignment between two sequences. A simple method of finding similarities between two sequences is to use dot plots. The first sequence to be compared is assigned to the horizontal axis and the second is assigned to the vertical axis.
23
Dot plots can be filtered by window approaches (to calculate running averages) and by applying a threshold. They can identify insertions, deletions, and inversions.
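Window filtering with a threshold can be sketched as follows. This is a minimal illustration, assuming identity matching along the diagonal and a threshold expressed as a match count per window; the function names and default values are not from the slides.

```python
def dot_plot(seq_x: str, seq_y: str, window: int = 3, threshold: int = 2):
    """Grid of booleans: rows follow seq_y (vertical), columns seq_x (horizontal).

    A cell is set when at least `threshold` of the `window` residue pairs
    along the diagonal starting at that cell are identical.
    """
    rows = max(len(seq_y) - window + 1, 0)
    cols = max(len(seq_x) - window + 1, 0)
    grid = []
    for j in range(rows):
        row = []
        for i in range(cols):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row.append(matches >= threshold)
        grid.append(row)
    return grid

def render(grid) -> str:
    """Draw the grid with '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)

print(render(dot_plot("ABCD", "ABCD", window=2, threshold=2)))
```

Identical sequences give an unbroken main diagonal; an insertion or deletion would show up as a diagonal that shifts sideways.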