1
PAM250
2
M. Dayhoff Scoring Matrices
Point Accepted Mutation (PAM) matrices. Proteins with 85% identity were used, so the function is not significantly changed and the mutations are "accepted".
PAM unit: the measure of evolutionary distance between two amino acid sequences.
One PAM unit: $S_1$ has mutated into $S_2$ with an average of one accepted point-mutation event per 100 amino acids.
3
$p_a = N_a / N$: the probability of occurrence of amino acid $a$ over a large, sufficiently varied data set, with $\sum_a p_a = 1$.
$f_{ab}$: the number of times the mutation $a \leftrightarrow b$ was observed to occur.
$f_a = \sum_{b \ne a} f_{ab}$: the total number of mutations in which $a$ was involved.
$f = \sum_a f_a$: the total number of amino acid occurrences involved in mutations.
Two steps: 1) probability matrix, 2) scoring matrix.
4
$M$: the $20 \times 20$ probability matrix. $M_{ab}$ is the probability of amino acid $a$ changing into $b$.
$m_a = \dfrac{f_a}{f} \cdot \dfrac{1}{100\, p_a}$: the relative mutability of amino acid $a$, i.e. the probability that the given amino acid will change in the evolutionary period of interest.
Assumptions: (a) 1 in 100 amino acids is changed on average. (b) Mutations are position independent. (c) Mutations are independent of the past.
5
$M_{aa} = 1 - m_a$: the probability that $a$ remains unchanged.
$M_{ab} = \Pr(a \to b) = \Pr(a \to b \mid a \text{ changed}) \, \Pr(a \text{ changed}) = \dfrac{f_{ab}}{f_a}\, m_a$
Easy to see: $\sum_b M_{ab} = M_{aa} + \sum_{b \ne a} \dfrac{f_{ab}}{f_a}\, m_a = 1 - m_a + \dfrac{m_a}{f_a} \sum_{b \ne a} f_{ab} = 1$
$\sum_a p_a M_{aa} = 0.99$, i.e. on average 1 mutation every 100 positions.
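The construction of $M$ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `pam1_matrix` is hypothetical, and the counts and frequencies are invented toy values for a 3-letter alphabet, not real amino acid data.

```python
def pam1_matrix(counts, freqs):
    """Build the 1-PAM matrix M from mutation counts f_ab and frequencies p_a.

    counts[a][b] holds f_ab (the diagonal is ignored); freqs[a] holds p_a.
    """
    n = len(freqs)
    # f_a = sum over b != a of f_ab
    f_a = [sum(counts[a][b] for b in range(n) if b != a) for a in range(n)]
    f = sum(f_a)                                    # f = sum_a f_a
    # relative mutability m_a = (f_a / f) * 1 / (100 * p_a)
    m = [f_a[a] / (100.0 * f * freqs[a]) for a in range(n)]
    M = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                M[a][b] = 1.0 - m[a]                # M_aa = 1 - m_a
            else:
                M[a][b] = counts[a][b] / f_a[a] * m[a]   # M_ab = (f_ab / f_a) m_a
    return M

# Hypothetical toy data (assumption, for illustration only).
counts = [[0, 30, 10],
          [30, 0, 20],
          [10, 20, 0]]
freqs = [0.5, 0.3, 0.2]

M = pam1_matrix(counts, freqs)
row_sums = [sum(row) for row in M]                          # each should be 1 up to rounding
unchanged = sum(p * M[a][a] for a, p in enumerate(freqs))   # should be 0.99 up to rounding
print(row_sums, unchanged)
```

The two printed quantities correspond to the identities $\sum_b M_{ab} = 1$ and $\sum_a p_a M_{aa} = 0.99$.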
6
What is the probability that $a$ mutates into $b$ in two PAM units of evolution? The paths are $a \to c \to b$, $a \to d \to b$, ...
$\sum_c M_{ac} M_{cb} = (M^2)_{ab}$, and likewise $M^2, M^3, M^4, \ldots, M^{250}, \ldots$
As $k \to \infty$, $M^k$ converges to a matrix with identical rows: $(M^k)_{ac} \to p_c$. No matter what amino acid you start with, after a long period of evolution the resulting amino acid will be $c$ with probability $p_c$.
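A small numerical sketch of this convergence, assuming a hypothetical 2-letter alphabet with $p = (0.6, 0.4)$ and a toy 1-PAM matrix chosen so that $p$ is its stationary distribution. The helper names `mat_mul` and `mat_pow` are illustrative, not from the slides.

```python
def mat_mul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, k):
    """Compute M^k by repeated squaring."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in M]
    while k > 0:
        if k & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result

# Toy 1-PAM matrix (assumption): m_0 = 1/120, m_1 = 1/80, so p_0*m_0 + p_1*m_1 = 0.01.
p = [0.6, 0.4]
M = [[1 - 1 / 120, 1 / 120],
     [1 / 80, 1 - 1 / 80]]

M2 = mat_pow(M, 2)        # two PAM units: (M^2)_ab = sum_c M_ac * M_cb
M250 = mat_pow(M, 250)    # rows are close to p but not yet identical
M_far = mat_pow(M, 5000)  # rows are numerically indistinguishable from p
print(M250[0], M250[1])
print(M_far[0], M_far[1])
```

For this toy matrix the rows of $M^{250}$ still differ slightly, which illustrates that 250 PAM units is a long but finite evolutionary distance.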
7
PAM-k matrix
$\text{PAM-}k_{ab} = (M^k)_{ab} / p_b$: the probability that a pair $ab$ is a mutation as opposed to being a random occurrence (likelihood or odds ratio).
The ratio is symmetric: $\dfrac{M_{ab}}{p_b} = \dfrac{(f_{ab}/f_a)\, m_a}{p_b} = \dfrac{f_{ab}}{f_a} \cdot \dfrac{f_a}{100\, f\, p_a\, p_b} = \dfrac{f_{ab}}{100\, f\, p_a\, p_b} = \dfrac{M_{ba}}{p_a}$
With these odds ratios, the total alignment score is the product of the $\text{PAM-}k_{ab}$ values.
To avoid accuracy problems, use $\text{PAM-}k_{ab} = 10 \log \left( (M^k)_{ab} / p_b \right)$; the total alignment score is then the sum of the $\text{PAM-}k_{ab}$ values.
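The log-odds step can be illustrated as follows. This is a sketch under assumptions: the logarithm is taken base 10 (the slide writes only "10 log"), the function name `log_odds_scores` is hypothetical, and the 2-letter matrix is toy data.

```python
import math

def log_odds_scores(Mk, freqs):
    """Score matrix S_ab = 10 * log10(Mk_ab / p_b) (base 10 is an assumption)."""
    n = len(freqs)
    return [[10.0 * math.log10(Mk[a][b] / freqs[b]) for b in range(n)]
            for a in range(n)]

# Toy 2-letter example (assumption): p is the stationary distribution of M.
p = [0.6, 0.4]
M = [[1 - 1 / 120, 1 / 120],
     [1 / 80, 1 - 1 / 80]]

S = log_odds_scores(M, p)

# Scoring the aligned pairs (0,0), (0,1), (1,1): a sum of logs replaces a product of odds.
pairs = [(0, 0), (0, 1), (1, 1)]
total_log = sum(S[a][b] for a, b in pairs)
total_odds = math.prod(M[a][b] / p[b] for a, b in pairs)
print(S, total_log, 10.0 * math.log10(total_odds))
```

Because $M_{ab}/p_b = M_{ba}/p_a$, the resulting score matrix is symmetric, so the score of an aligned pair does not depend on which sequence is treated as the ancestor.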
8
Multiple Sequence Alignment
Multiple sequence alignment can detect similarities that pairwise sequence alignment methods cannot, and it reveals family characteristics.
Three questions: 1. Scoring. 2. Computation of the multiple sequence alignment. 3. Family representation.
9
Multiple Sequence Alignment
11
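Scoring: SP (sum of pairs)
SP: the score $\rho_i$ of column $i$ is the sum of the pairwise scores of all pairs of symbols in the column.
Example: $\rho_3(-, A, A) = \text{score}(-, A) + \text{score}(-, A) + \text{score}(A, A)$, with $\text{score}(-, -) = 0$.
SP Total Score $= \sum_i \rho_i$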
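A minimal sketch of SP scoring. The match, mismatch, and gap values are assumptions chosen for illustration, since the slides do not fix a scoring scheme; the only value taken from the slide is that a pair of gaps scores 0.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols; (-,-) scores 0 as on the slide."""
    if x == '-' and y == '-':
        return 0
    if x == '-' or y == '-':
        return gap
    return match if x == y else mismatch

def column_score(column):
    """rho for one column: sum of the scores of all pairs of symbols in it."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))

def sp_score(alignment):
    """SP total score: sum of the column scores over all columns."""
    return sum(column_score(col) for col in zip(*alignment))

alignment = ["AC--BC",
             "DCA-BC",
             "DCAABC"]
print(column_score(('-', 'A', 'A')))   # gap + gap + match under the assumed values
print(sp_score(alignment))
```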
12
Induced pairwise alignment
The induced pairwise alignment is the projection of a multiple alignment onto two of its sequences: $a(S_1, S_2)$, $a(S_2, S_3)$, $a(S_1, S_3)$.
Since $\text{score}(-, -) = 0$: SP Total Score $= \sum_{i<j} \text{score}[\, a(S_i, S_j) \,]$
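The projection can be sketched as below. The function name `induced_pairwise` is hypothetical; the code keeps two rows of a multiple alignment and drops the columns in which both rows hold a gap, which is harmless for the score because a pair of gaps scores 0.

```python
def induced_pairwise(alignment, i, j):
    """Project a multiple alignment onto rows i and j, dropping (-,-) columns."""
    kept = [(x, y) for x, y in zip(alignment[i], alignment[j])
            if not (x == '-' and y == '-')]
    row_i = ''.join(x for x, _ in kept)
    row_j = ''.join(y for _, y in kept)
    return row_i, row_j

alignment = ["AC--BC",
             "DCA-BC",
             "DCAABC"]
print(induced_pairwise(alignment, 0, 1))   # the column where both rows have '-' is dropped
print(induced_pairwise(alignment, 0, 2))
print(induced_pairwise(alignment, 1, 2))
```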
13
Dyn.Prog. Solution
14
Dynamic Programming Solution
The best multiple alignment of $r$ sequences is calculated using an $r$-dimensional hyper-cube.
The size of the hyper-cube is $O(\prod_i n_i)$.
Time complexity: $O(2^r n^r) \cdot O(\text{computation of the } \rho \text{ function})$.
The exact problem is NP-complete (metrics: sum-of-pairs or evolutionary tree).
A more efficient solution is needed.
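The $2^r$ factor comes from the number of predecessor cells each hyper-cube cell depends on: in every step each sequence either contributes a symbol or a gap, and at least one sequence must contribute a symbol. The sketch below only counts cells and moves; it is not a full aligner, and the helper names are hypothetical.

```python
from itertools import product

def predecessor_moves(r):
    """All 0/1 vectors of length r except all-zero: 1 = consume a symbol, 0 = gap."""
    return [move for move in product((0, 1), repeat=r) if any(move)]

def cell_count(lengths):
    """Number of cells in the DP table: product of (n_i + 1) over all sequences."""
    total = 1
    for n in lengths:
        total *= n + 1
    return total

moves = predecessor_moves(3)
print(len(moves))              # 2**3 - 1 predecessor cells per cell
print(cell_count([4, 5, 6]))   # table size for three sequences of lengths 4, 5, 6
```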
15
Multiple Alignment from Pairwise Alignments?
Problem: the best pairwise alignments do not necessarily lead to the best multiple alignment.
16
[Figure: sequences S1, S2, S3 composed of Pattern-A, Pattern-B, Pattern-D, and Pattern-X. The pairwise alignments S1S2, S1S3, and S2S3 are shown next to the correct solution for S1S2S3, which aligns Pattern-X and leaves some positions empty.]
17
Center Star Alignment
[Figure: star with $S_c$ at the center, connected to $S_1, S_2, S_3, \ldots, S_{k-2}, S_{k-1}, S_k$.]
(a) The scoring scheme is a distance.
(b) The scoring scheme satisfies the triangle inequality: for any characters $a, b, c$, $\text{dist}(a, c) \le \text{dist}(a, b) + \text{dist}(b, c)$ (in practice not all scoring matrices satisfy the triangle inequality).
(c) $D(S_i, S_j)$: the score of the optimal pairwise alignment.
(d) $D(M) = \sum_{i<j} a_M(S_i, S_j)$: the score of the multiple alignment $M$.
(e) $a_M(S_i, S_j)$: the pairwise alignment (and its score) induced by $M$.
18
The Center Star Algorithm
[Figure: star with $S_c$ at the center, connected to $S_1, S_2, S_3, \ldots, S_{k-2}, S_{k-1}, S_k$.]
(a) Find $S_c$ minimizing $\sum_{i \ne c} D(S_c, S_i)$.
(b) Iteratively construct the multiple alignment $M_c$:
1. $M_c = \{S_c\}$
2. Add the sequences in $S \setminus \{S_c\}$ to $M_c$ one by one so that the induced alignment $a_{M_c}(S_c, S_i)$ of every newly added sequence $S_i$ with $S_c$ is optimal. Add spaces, when needed, to all pre-aligned sequences.
Running time: $\binom{k}{2} \cdot O(n^2)$ for the pairwise alignments in step (a).
Example:
AC-BC      AC--BC      AC--BC
DCABC      DCAABC      DCA-BC
                       DCAABC
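The two steps of the Center Star Algorithm can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the slides' implementation: the distance is unit-cost edit distance (match 0, mismatch 1, gap 1), and the names `align`, `merge`, and `center_star` are hypothetical.

```python
def align(s, t, mismatch=1, gap=1):
    """Optimal global alignment under a distance; returns (D, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub, D[i - 1][j] + gap, D[i][j - 1] + gap)
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:                      # traceback from the last cell
        sub = 0 if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a.append(s[i - 1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(t[j - 1]); j -= 1
    return D[n][m], ''.join(reversed(a)), ''.join(reversed(b))

def merge(msa, ca, sa):
    """Add a sequence to msa (msa[0] is the gapped center) given the pairwise
    alignment (ca, sa) of the center with the new sequence."""
    cols, p, q, width = [], 0, 0, len(msa[0])
    while p < width or q < len(ca):
        m_gap = p < width and msa[0][p] == '-'
        a_gap = q < len(ca) and ca[q] == '-'
        if m_gap and not a_gap:                # gap already in msa: new row gets '-'
            cols.append([row[p] for row in msa] + ['-']); p += 1
        elif a_gap and not m_gap:              # new gap in center: pad all old rows
            cols.append(['-'] * len(msa) + [sa[q]]); q += 1
        else:                                  # same center symbol, or gap in both
            cols.append([row[p] for row in msa] + [sa[q]]); p += 1; q += 1
    return [''.join(col[r] for col in cols) for r in range(len(msa) + 1)]

def center_star(seqs):
    """Return (index of the center sequence, aligned rows in input order)."""
    k = len(seqs)
    dist = [[align(s, t)[0] for t in seqs] for s in seqs]
    c = min(range(k), key=lambda i: sum(dist[i]))      # step (a)
    others = [i for i in range(k) if i != c]
    msa = [seqs[c]]
    for i in others:                                   # step (b)
        _, ca, sa = align(seqs[c], seqs[i])
        msa = merge(msa, ca, sa)
    rows = [None] * k
    rows[c] = msa[0]
    for row, i in zip(msa[1:], others):
        rows[i] = row
    return c, rows

seqs = ["ACBC", "DCABC", "DCAABC"]
c, rows = center_star(seqs)
print(c, rows)
```

Calling `merge(["AC-BC", "DCABC"], "AC--BC", "DCAABC")` reproduces the merging step with center `ACBC`. Under the assumed unit-cost distance, `center_star` itself may select a different center for these three sequences, because the center is the sequence with the smallest sum of distances to the others.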
19
$D(M_c)$ is at most twice $D(M_{opt})$: $\dfrac{D(M_c)}{D(M_{opt})} \le \dfrac{2(k-1)}{k} \; (< 2)$
Proof:
(a) $a(S_i, S_j) \ge D(S_i, S_j)$ (an induced alignment is never better than the optimal alignment), and by construction $a_{M_c}(S_c, S_j) = D(S_c, S_j)$.
(b) $a_{M_c}(S_i, S_j) \le a_{M_c}(S_i, S_c) + a_{M_c}(S_c, S_j) = D(S_i, S_c) + D(S_c, S_j)$ (follows from the triangle inequality).
(c) $2 D(M_c) = \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} a_{M_c}(S_i, S_j) \le \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} \left( a_{M_c}(S_i, S_c) + a_{M_c}(S_c, S_j) \right) = 2(k-1) \sum_{j \ne c} a_{M_c}(S_c, S_j) = 2(k-1) \sum_{j \ne c} D(S_c, S_j)$
20
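(d) $k \sum_{j=1, j \ne c}^{k} D(S_c, S_j) = \sum_{i=1}^{k} \sum_{j=1, j \ne c}^{k} D(S_c, S_j) \le \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} D(S_i, S_j) \le \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} a_{M_{opt}}(S_i, S_j) = 2 D(M_{opt})$. The first inequality holds because $S_c$ minimizes the sum of distances to the other sequences.
(e) From (c), $2 D(M_c) \le 2(k-1) \sum_{j \ne c} D(S_c, S_j)$, so $\dfrac{D(M_c)}{k-1} \le \sum_{j \ne c} D(S_c, S_j)$.
From (d), $k \sum_{j \ne c} D(S_c, S_j) \le 2 D(M_{opt})$, so $\sum_{j \ne c} D(S_c, S_j) \le \dfrac{2 D(M_{opt})}{k}$.
Combining the two: $\dfrac{D(M_c)}{D(M_{opt})} \le \dfrac{2(k-1)}{k}$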