Published by Dora Gregory. Modified over 9 years ago.
1
CS 5263 Bioinformatics Lecture 6: Sequence Alignment Statistics
2
Roadmap
Last lecture review
– Affine gap penalty (more today)
– Local sequence alignment
– Statistics of substitution matrices
Statistics of alignment scores
Sequence alignment and FSA
– Affine gap penalty
– More complex models
3
Seq Alignment Algorithms
Global alignment
– Basic: Needleman-Wunsch
– Variants (LCS, overlapping, …)
– Bounded DP (pruning search space)
– Linear space (divide-and-conquer)
– Affine gap penalty
Local alignment
– Basic: Smith-Waterman
– All tricks in global alignment are applicable: bounded DP, linear space, affine gap
4
The local alignment problem
Given two strings $X = x_1 \ldots x_M$, $Y = y_1 \ldots y_N$
Find substrings x', y' whose similarity (optimal global alignment value) is maximum
e.g.
X = abcxdex, X' = cxde
Y = xxxcde, Y' = c-de
5
The Smith-Waterman algorithm
Initialization: $F(0, j) = F(i, 0) = 0$
Iteration (d > 0 is the gap penalty):
$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j) - d \\ F(i, j-1) - d \\ F(i-1, j-1) + \sigma(x_i, y_j) \end{cases}$$
6
The Smith-Waterman algorithm
Termination:
1. If we want the best local alignment: $F_{OPT} = \max_{i,j} F(i, j)$
2. If we want all local alignments scoring > t: for all i, j find F(i, j) > t, and trace back
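The initialization, iteration and termination steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `smith_waterman_score` and the scoring values (match 2, mismatch -1, gap penalty d = 1) are assumptions chosen for the example, and only the best score is returned (no traceback).

```python
def smith_waterman_score(x, y, match=2, mismatch=-1, gap=1):
    """Best local alignment score with a linear gap penalty `gap` (d > 0)."""
    rows, cols = len(x) + 1, len(y) + 1
    # F[i][j] = best score of a local alignment ending at x[i-1], y[j-1].
    # The zero-filled border implements F(0, j) = F(i, 0) = 0.
    F = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                0,                        # start a new local alignment
                F[i - 1][j] - gap,        # x_i aligned to a gap
                F[i][j - 1] - gap,        # y_j aligned to a gap
                F[i - 1][j - 1] + sigma,  # x_i aligned to y_j
            )
            best = max(best, F[i][j])     # termination: max over all cells
    return best


# The strings from the local alignment slide, with the assumed scores.
print(smith_waterman_score("abcxdex", "xxxcde"))
```

With these assumed scores the best local alignment has three matches and one gap, for example cxde against c-de.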
7
Analysis
Time:
– O(MN) for finding the best alignment
– For all alignments scoring > t, the extra time depends on the number of sub-optimal alignments traced back
Memory:
– O(MN)
– O(M+N) possible
8
The statistics of alignment
Where does $\sigma(x_i, y_j)$ come from?
Are two aligned sequences actually related?
9
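Protein substitution matrix
$\sigma(s, t)$: score to align amino acid s against t
$$\sigma(s, t) = \frac{1}{\lambda} \log \frac{q_{st}}{p_s p_t}$$
$p_s$, $p_t$: frequency of s and t in the database
$q_{st}$: the frequency with which s is aligned to t in real homologous sequences
$\log \frac{q_{st}}{p_s p_t}$: log-odds ratio
$\lambda$: scaling factor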
10
BLOSUM matrices
$p_s$, $p_t$, $q_{st}$ are estimated from trusted alignments in the BLOCKS database
Eliminate near-identical sequences
– BLOSUM-N: constructed from sequences where identity between any pair of sequences is less than N%
– BLOSUM-62: good for most purposes
Scale: BLOSUM-45 (weak homology) … BLOSUM-62 … BLOSUM-90 (strong homology)
11
DNA substitution matrix
Given the percent identity you would like to detect, plus some assumptions about base frequencies, you can calculate the substitution matrix
12
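Example
Assume $p_A = p_C = p_T = p_G = 0.25$
We want 88% identity
$q_{AA} = q_{CC} = q_{TT} = q_{GG} = 0.22$
The rest = 0.12 / 12 = 0.01
Resulting matrix (match = 5, mismatch = -7; only these entries are shown on the slide):
     A    C    G    T
A    5   -7
C         5
G              5
T                   5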
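The calculation behind this example can be sketched as follows. It assumes uniform base frequencies, as on the slide, and picks the scaling factor so that a match scores exactly 5; the function name `log_odds_scores` and that choice of scaling are illustrative assumptions, not something the slide specifies.

```python
import math


def log_odds_scores(identity, match_score=5):
    """Scaled log-odds scores for a uniform 4-letter alphabet (p = 0.25)."""
    p = 0.25
    q_match = identity / 4            # 0.88 / 4 = 0.22 per identical pair
    q_mismatch = (1 - identity) / 12  # 0.12 / 12 = 0.01 per mismatched pair
    raw_match = math.log(q_match / (p * p))
    raw_mismatch = math.log(q_mismatch / (p * p))
    # Assumed convention: choose lambda so that a match scores `match_score`.
    lam = raw_match / match_score
    return raw_match / lam, raw_mismatch / lam, lam


match, mismatch, lam = log_odds_scores(0.88)
print(round(match), round(mismatch), round(lam, 3))
```

Rounding the scaled scores to integers gives the match and mismatch entries shown in the matrix above.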
13
Arbitrary substitution matrix
Even an arbitrary substitution matrix has a meaning
Better know what you are doing
– Solve a polynomial equation to obtain the scaling factor $\lambda$
– Calculate the target frequencies $q_{st}$
– Calculate the target percent identity
14
Example
Matrix 1: match = 1, mismatch = -2
– $\lambda = 1.33$
– $q_{st}$ = 0.24 for s = t, and 0.004 for s ≠ t
– Translate: 95% identity
Matrix 2: match = 5, mismatch = -4
– $\lambda = 0.19$ ($e^{\lambda} = 1.21$)
– $q_{st}$ = 0.16 for s = t, and 0.03 for s ≠ t
– Translate: 65% identity
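One way to reproduce these numbers is to solve for the scaling factor numerically. The sketch below assumes uniform base frequencies (0.25 each) and a matrix with a single match score and a single mismatch score; it finds the positive root of $\sum_{s,t} p_s p_t e^{\lambda \sigma(s,t)} = 1$ by bisection. The names `solve_lambda` and `target_identity` are hypothetical.

```python
import math


def solve_lambda(match, mismatch, tol=1e-12):
    """Positive lambda with 0.25*exp(lam*match) + 0.75*exp(lam*mismatch) = 1."""
    p_same, p_diff = 0.25, 0.75  # chance two random bases are equal / differ

    def f(lam):
        return p_same * math.exp(lam * match) + p_diff * math.exp(lam * mismatch) - 1

    lo, hi = 1e-9, 1.0           # f(lo) < 0 when the expected score is negative
    while f(hi) <= 0:            # grow the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:         # plain bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def target_identity(match, mismatch):
    """Sum of the four target frequencies q_ss = p_s^2 * exp(lambda * match)."""
    lam = solve_lambda(match, mismatch)
    return 4 * 0.0625 * math.exp(lam * match)


for m, s in [(1, -2), (5, -4)]:
    print(m, s, round(solve_lambda(m, s), 3), round(target_identity(m, s), 3))
```

Bisection is used here only because it is short and robust; any root finder would do.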
15
Today
Significance of alignment score
Sequence alignment and FSA
16
Statistics of Alignment Scores
Q: How do we assess whether an alignment provides good evidence for homology?
– Is a score of 82 good? What about 180?
A: Determine how likely it is that such an alignment score would result from chance
17
Most of the theory applies to local alignment
For global alignment, your best bet is to do Monte-Carlo simulation
– Randomly shuffle your sequences before alignment
– What is the chance that you get a score as high as that of the real alignment?
18
Procedure to estimate the significance of a global alignment
– Given sequences X, Y
– Global alignment score = S
– Randomly shuffle sequence X (or Y) N times, obtain $X_1, X_2, \ldots, X_N$
– Align each $X_i$ with Y, let the score be $S_i$
– Plot the distribution of $S_i$, and see where the real S is located
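The procedure above can be sketched as follows. The scoring scheme (match 1, mismatch -1, linear gap score -1), the function names and the fixed random seed are assumptions for illustration; the p-value is estimated as the fraction of shuffled sequences scoring at least as high as the real alignment, which is how the later slides count it.

```python
import random


def global_score(x, y, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score with a linear gap score, using two DP rows."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            sigma = match if x[i - 1] == y[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sigma, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_p_value(x, y, n_shuffles=200, seed=0):
    """Return (real score, fraction of shuffled x scoring >= real score)."""
    rng = random.Random(seed)
    real = global_score(x, y)
    letters = list(x)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # shuffling keeps the composition of x
        if global_score("".join(letters), y) >= real:
            hits += 1
    return real, hits / n_shuffles


seq = "ACGTTGCAAGCTTAGGCTAACGTTAGCATG"
print(shuffle_p_value(seq, seq, n_shuffles=50))
```

An estimate of 0 only means that the p-value is below 1 / n_shuffles; resolving smaller values needs more shuffles.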
19
[Figure: global alignment of mouse HEXA and human HEXA] Score = 732
20
[Figure: histogram, with the real score 732 marked] Distribution of the alignment scores between mouse HEXA and 200 randomly shuffled human HEXA sequences
21
[Figure: global alignment of human HEXA and fly HEXO1] Score = -74
22
[Figure: histogram, with the real score -74 marked] Distribution of the alignment scores between fly HEXO1 and 200 randomly shuffled human HEXA sequences
23
P-value of alignment
p-value
– The probability that an alignment score at least this high can be obtained from aligning random sequences
– A small p-value means the score is unlikely to happen by chance
A p-value of 0.05 means that a score this high would be expected by chance in only 5% of random comparisons.
24
What p-value is significant?
The most common thresholds are 0.01 and 0.05.
Is a 5% chance of a false positive acceptable? It depends on the cost associated with making a mistake.
Examples of costs:
– Doing expensive wet lab validation.
– Making clinical treatment decisions.
– Misleading the scientific community.
Most sequence analysis uses more stringent thresholds because the p-values are not very accurate.
25
[Figure: the same histogram, with the real score -74 marked]
There are 88 random sequences with alignment score >= -74.
Therefore p-value = 88 / 200 = 0.44 => the alignment is not significant
26
[Figure: the same histogram, with the real score 732 marked]
There are no random sequences with alignment score >= 732.
Therefore the p-value is less than 1 / 200 = 0.005 => significant
Even though the true p-value looks much smaller than 0.005, we cannot say how much smaller unless we generate more random sequences
27
Drawbacks
Monte-Carlo may take a long time
Cannot accurately estimate the p-value if p is small
To get a p-value of $10^{-5}$, we have to align at least $10^5$ random sequences
– Unless we can fit a distribution
– Such a distribution may not be generalizable
– No theory exists for the global alignment score distribution
28
Statistics for local alignment
Theory is much more elegant
The score of an ungapped local alignment follows an extreme value distribution (Gumbel distribution)
This distribution is characterized by a heavier tail on the right.
29
[Figure: normal distribution (left) and extreme value distribution (right)]
Intuitive interpretation of the extreme value distribution
– Randomly sample 100 numbers from a normal distribution, and compute the max
– Repeat 100 times
– The max values will follow an extreme value distribution
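The sampling experiment described above is easy to try. The sketch below uses a standard normal distribution and a fixed seed, both of which are assumptions; `sample_maxima` is a hypothetical helper name.

```python
import random
import statistics


def sample_maxima(n_values=100, n_repeats=100, seed=0):
    """Max of `n_values` standard normal draws, repeated `n_repeats` times."""
    rng = random.Random(seed)
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(n_values))
        for _ in range(n_repeats)
    ]


maxima = sample_maxima()
# The maxima sit well to the right of the underlying mean of 0,
# and their mean usually exceeds their median (longer right tail).
print(round(statistics.mean(maxima), 2), round(statistics.median(maxima), 2))
```

Plotting a histogram of `maxima` should show the asymmetric shape of the extreme value distribution.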
30
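Computing a p-value
The probability of observing a score > 4 is the area under the curve to the right of 4.
For score S, with location $\mu$ and scale $1/\lambda$, this probability is calculated as
$$P(x > S) = 1 - \exp\left(-e^{-\lambda (S - \mu)}\right)$$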
31
Computing a p-value
32
Statistics for local alignment
How does this apply to sequence alignment?
Given two unrelated sequences of lengths M, N
The expected number of local alignments with score >= S can be calculated by
– $E(S) = K M N \exp(-\lambda S)$
– Known as the E-value
– $\lambda$: scaling factor as computed in the last lecture
– K: empirical parameter ~ 0.1; depends on sequence composition and substitution matrix
33
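P-value for alignment score
P-value for a local alignment score S:
$$P(S) = 1 - \exp(-E(S)) = 1 - \exp\left(-K M N e^{-\lambda S}\right) \approx E(S)$$
The approximation holds when P is small.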
34
Example
You are aligning two sequences, each with 1000 bases
m = 1, s = -1, d = -inf (ungapped alignment)
You obtain a score of 20
Is this score significant?
35
$\lambda = \ln 3 \approx 1.1$
$E(S) = K M N \exp(-\lambda S)$
$E(20) = 0.1 \times 1000 \times 1000 \times 3^{-20} \approx 3 \times 10^{-5}$
P-value ≈ $3 \times 10^{-5}$ << 0.05
The alignment is significant
36
[Figure: distribution of scores for 1000 random sequence pairs, with the observed score 20 marked]
37
Multiple-testing problem
What if you are searching a 1000-base sequence against a database of $10^6$ sequences (average length 1000 bases)?
How significant is a score of 20 now?
You are essentially comparing 1000 bases with $1000 \times 10^6 = 10^9$ bases (ignoring edge effects)
$E(20) = 0.1 \times 1000 \times 10^9 \times 3^{-20} \approx 30$
By chance we would expect to see 30 matches
P-value = $1 - e^{-30} \approx 0.9999999999$
Not significant at all
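Both worked examples (the pairwise comparison and the database search) follow from the same two formulas. The sketch below uses K = 0.1 and λ = ln 3 as on the slides; the function names `e_value` and `p_value` are illustrative.

```python
import math


def e_value(score, m, n, lam, K=0.1):
    """Expected number of chance local alignments scoring >= `score`."""
    return K * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, K=0.1):
    """P = 1 - exp(-E); expm1 keeps precision when E is tiny."""
    return -math.expm1(-e_value(score, m, n, lam, K))


lam = math.log(3)  # m = 1, s = -1, uniform base frequencies
print(e_value(20, 1000, 1000, lam), p_value(20, 1000, 1000, lam))    # one pair
print(e_value(20, 1000, 10**9, lam), p_value(20, 1000, 10**9, lam))  # database
```

The same score of 20 moves from clearly significant to meaningless once the effective search space grows by a factor of $10^6$.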
38
Statistics for gapped local alignment
Theory is not well developed
The extreme value distribution works well empirically
Need to estimate K and $\lambda$ empirically
– Given the database and substitution matrix, generate some random sequence pairs
– Do local alignment
– Fit an extreme value distribution to obtain K and $\lambda$
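A rough sketch of this estimation procedure is shown below. Several things are assumptions made for brevity: random DNA with uniform base frequencies stands in for "the database", the aligner uses a linear gap penalty, and the fit uses the method of moments for a Gumbel distribution (mean = μ + γ/λ, variance = π² / (6λ²)) rather than a maximum-likelihood fit. K is then recovered from $K M N = e^{\lambda \mu}$.

```python
import math
import random
import statistics


def local_score(x, y, match=1, mismatch=-1, gap=2):
    """Smith-Waterman score with a linear gap penalty, using two DP rows."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0]
        for j in range(1, len(y) + 1):
            sigma = match if x[i - 1] == y[j - 1] else mismatch
            cur.append(max(0, prev[j - 1] + sigma, prev[j] - gap, cur[j - 1] - gap))
            best = max(best, cur[j])
        prev = cur
    return best


def fit_gumbel(scores, m, n):
    """Method-of-moments estimates of (lambda, K) from random-pair scores."""
    euler_gamma = 0.5772156649015329
    lam = math.pi / (statistics.stdev(scores) * math.sqrt(6))
    mu = statistics.mean(scores) - euler_gamma / lam
    return lam, math.exp(lam * mu) / (m * n)


def random_dna(rng, length):
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)
m = n = 50
scores = [local_score(random_dna(rng, m), random_dna(rng, n)) for _ in range(100)]
lam, K = fit_gumbel(scores, m, n)
print(round(lam, 3), round(K, 3))
```

With only 100 short random pairs the estimates are noisy; the point is the shape of the procedure, not the particular numbers.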
39
More on sequence alignment and FSA
40
Gap penalty models
Linear model
– $\gamma(n) = n \times d$
– Needleman-Wunsch
– O(MN) time
– O(M+N) memory
General gap penalty function $\gamma(n)$
– O(N^2 M) time
– O(MN) memory
[Plots: $\gamma(n)$ against n for each model]
41
Affine gap penalty
$\gamma(n) = d + (n - 1) e$
– d: gap open
– e: gap extension
O(MN) time
O(M+N) memory
[Plot: $\gamma(n)$ against n]
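The two closed-form models can be compared directly. The values d = 5 and e = 1 below are hypothetical penalties chosen only to show how the affine model makes long gaps cheaper than the linear one.

```python
def linear_gap(n, d):
    """Linear model: every gap position costs d."""
    return n * d


def affine_gap(n, d, e):
    """Affine model: d to open the gap, e for each further position."""
    return d + (n - 1) * e if n > 0 else 0


for n in (1, 2, 4, 8):
    print(n, linear_gap(n, 5), affine_gap(n, 5, 1))
```

Both models agree for a gap of length 1 and diverge as the gap grows.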
42
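Finite State Automaton
[State diagram: three states, "x, y aligned", "gap on x" and "gap on y"; each transition is labelled input / output]
– Into "aligned": $(x_i, y_j) / \sigma(x_i, y_j)$
– From "aligned" into a gap state: $(x_i, -) / d$ or $(-, y_j) / d$
– Staying in a gap state: $(x_i, -) / e$ or $(-, y_j) / e$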
43
Finite State Automaton
[State diagram: states F, Ix, Iy; each transition is labelled input / output, e.g. $(x_i, y_j) / \sigma(x_i, y_j)$, $(x_i, -) / d$, $(x_i, -) / e$, $(-, y_j) / d$, $(-, y_j) / e$]
Mealy machine: output associated with transitions
Moore machine: output associated with states
A Mealy machine generally uses fewer states.
The two are mutually convertible.
44
Mealy machine
A Mealy machine is a 6-tuple, $(S, S_0, \Sigma, \Lambda, T, G)$, consisting of the following:
– a finite set of states (S)
– a start state (also called initial state) $S_0$, which is an element of S
– a finite set called the input alphabet ($\Sigma$)
– a finite set called the output alphabet ($\Lambda$)
– a transition function ($T : S \times \Sigma \to S$)
– an output function ($G : S \times \Sigma \to \Lambda$)
45
[State diagram: states F, Ix, Iy; transitions labelled input / output; F is the start state]
Current state   Input          Output              Next state
F               (x_i, y_j)     σ(x_i, y_j)         F
F               (-, y_j)       d                   Ix
F               (x_i, -)       d                   Iy
Ix              (-, y_j)       e                   Ix
…               …              …                   …
46
Finite State Automaton
[State diagram: states F, Ix, Iy]
Given a pair of sequences, find a path in the state diagram that reproduces the sequences using this machine such that the score is the highest
47
[State diagram: states F, Ix, Iy; F is the start state]
Symbols are generated during transitions. Three paths that generate AAC and ACT:
Path F-F-F-F:
  AAC
  |||
  ACT
Path F-Iy-F-F-Ix:
  AAC-
   ||
  -ACT
Path F-F-Iy-F-Ix:
  AAC-
  | |
  A-CT
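The correspondence between an alignment and a path through the machine can be sketched as below. The helper `fsa_path` and the scores (match 2, mismatch -2, d = -5, e = -1) are assumptions for illustration; the transition rules follow the table given earlier, where an input $(-, y_j)$ leads to Ix and $(x_i, -)$ leads to Iy, and there is no direct transition between Ix and Iy.

```python
def fsa_path(row_x, row_y, sigma, d, e):
    """Replay a gapped alignment on the F/Ix/Iy machine; return (path, score)."""
    state, path, score = "F", ["F"], 0
    for a, b in zip(row_x, row_y):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a != "-" and b != "-":
            nxt, out = "F", sigma(a, b)       # (x_i, y_j) / sigma
        else:
            nxt = "Ix" if a == "-" else "Iy"  # Ix: gap on x, Iy: gap on y
            if state == nxt:
                out = e                       # stay in the same gap state
            elif state == "F":
                out = d                       # open a gap from F
            else:
                raise ValueError("no Ix <-> Iy transition in this machine")
        state = nxt
        path.append(state)
        score += out
    return path, score


def sigma(a, b):
    return 2 if a == b else -2                # hypothetical scores


for rows in [("AAC", "ACT"), ("AAC-", "-ACT"), ("AAC-", "A-CT")]:
    print(rows, fsa_path(rows[0], rows[1], sigma, d=-5, e=-1))
```

Finding the best alignment is then the search for the highest-scoring path among all paths that generate the two sequences.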
48
[State diagram: states F, Ix, Iy; transitions into F highlighted]
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + \sigma(x_i, y_j) \\ I_x(i-1, j-1) + \sigma(x_i, y_j) \\ I_y(i-1, j-1) + \sigma(x_i, y_j) \end{cases}$$
49
[State diagram: states F, Ix, Iy; transitions into Ix highlighted]
$$I_x(i, j) = \max \begin{cases} F(i, j-1) + d \\ I_x(i, j-1) + e \end{cases}$$
50
[State diagram: states F, Ix, Iy; transitions into Iy highlighted]
$$I_y(i, j) = \max \begin{cases} F(i-1, j) + d \\ I_y(i-1, j) + e \end{cases}$$
51
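Summary of the recurrences (d and e are written as negative scores, as in the preceding slides and the exercise that follows):
$$F(i, j) = \sigma(x_i, y_j) + \max \begin{cases} F(i-1, j-1) & \text{continuing alignment} \\ I_x(i-1, j-1) & \text{closing gaps in x} \\ I_y(i-1, j-1) & \text{closing gaps in y} \end{cases}$$
$$I_x(i, j) = \max \begin{cases} F(i, j-1) + d & \text{opening a gap in x} \\ I_x(i, j-1) + e & \text{gap extension in x} \end{cases}$$
$$I_y(i, j) = \max \begin{cases} F(i-1, j) + d & \text{opening a gap in y} \\ I_y(i-1, j) + e & \text{gap extension in y} \end{cases}$$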
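The three recurrences translate almost line by line into a dynamic program. The sketch below is one possible reading: undefined cells are set to minus infinity, F(0, 0) = 0, and the default scores (m = 2, s = -2, d = -5, e = -1, with d and e as negative scores) are the ones used in the exercise on the next slide. The function name is hypothetical and only the score is returned, not the alignment.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, m=2, s=-2, d=-5, e=-1):
    """Global alignment score using the F, Ix, Iy recurrences."""
    M, N = len(x), len(y)
    F = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    Ix = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    Iy = [[NEG_INF] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0
    for i in range(M + 1):
        for j in range(N + 1):
            if i > 0 and j > 0:
                sigma = m if x[i - 1] == y[j - 1] else s
                F[i][j] = sigma + max(
                    F[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]
                )
            if j > 0:  # Ix: y_j is aligned to a gap in x
                Ix[i][j] = max(F[i][j - 1] + d, Ix[i][j - 1] + e)
            if i > 0:  # Iy: x_i is aligned to a gap in y
                Iy[i][j] = max(F[i - 1][j] + d, Iy[i - 1][j] + e)
    # The alignment may end in any of the three states.
    return max(F[M][N], Ix[M][N], Iy[M][N])


print(affine_global_score("GCAC", "GCC"))
```

Keeping only the previous row of each table would reduce memory to O(M+N) when only the score is needed.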
52
Exercise
x = GCAC
y = GCC
m = 2, s = -2, d = -5, e = -1
53
[Tables, slides 53-71: the three DP tables F, Ix (insertion on x) and Iy (insertion on y) are filled in cell by cell for x = GCAC, y = GCC]
Dependencies:
– F(i, j) is computed from F(i-1, j-1), Ix(i-1, j-1), Iy(i-1, j-1)
– Ix(i, j) is computed from F(i, j-1), Ix(i, j-1)
– Iy(i, j) is computed from F(i-1, j), Iy(i-1, j)
Initialization: F(0, 0) = 0; the border cells of the gap tables are -5, -6, -7, -8 along x and -5, -6, -7 along y; all other border cells are undefined (--)
Resulting alignment (score 2 + 2 - 5 + 2 = 1):
GCAC
|| |
GC-C
72
Exercising FSA How do you make an FSA for the Needleman-Wunsch algorithm?
73
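Exercising FSA
How do you make an FSA for the Needleman-Wunsch algorithm?
[State diagram: states F, Ix, Iy; same structure as the affine machine, but every gap transition outputs d]
– Into F: $(x_i, y_j) / \sigma(x_i, y_j)$
– Into or within Ix: $(-, y_j) / d$
– Into or within Iy: $(x_i, -) / d$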
74
Simplify
[State diagram: two states, F and I (Ix and Iy merged)]
– Into F: $(x_i, y_j) / \sigma(x_i, y_j)$
– Into or within I: $(x_i, -) / d$ or $(-, y_j) / d$
75
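Simplify more
[State diagram: a single state F with three self-transitions: $(x_i, y_j) / \sigma(x_i, y_j)$, $(-, y_j) / d$, $(x_i, -) / d$]
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + \sigma(x_i, y_j) \\ F(i-1, j) + d \\ F(i, j-1) + d \end{cases}$$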
76
A more difficult alignment problem (a gene finder indeed!)
X is a genomic sequence (DNA)
– X encodes a gene
– May contain introns
Y is an ORF from another species
– Contains only exons
We want to compare X against Y
– Conservation is on the level of amino acids
77
[Figure: gene structure. DNA with 5' UTR, exons, introns and 3' UTR is transcribed to pre-mRNA and spliced to mature mRNA; the open reading frame (ORF) runs from the start codon to the stop codon]
78
We have a predicted gene
We know the positions of the start codon and stop codon
But we don't know where the splicing sites are
– Not even the number of introns
[Figure: exons and introns between the start codon and the stop codon]
79
[Figure: mouse putative gene against human ORF; an intron is marked GT…………AG]
1. Most splicing sites start at GT and end at AG
2. But there are lots of GT and AG in the sequence
3. Aligning to an orthologous gene with a known ORF may help us determine the splicing sites
Orthologous genes: two genes evolved from the same ancestor
Coding regions are likely conserved at the amino acid level
– UUA, UUG encode the same amino acid
– So do UCA, UCU, UCG, UCC
80
The Genetic Code
[Figure: codon table indexed by first, second and third letter]
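The codon table can be built compactly in code, which makes the synonymous-codon examples easy to check. The sketch below uses the standard genetic code with bases ordered U, C, A, G (first letter varying slowest) and `*` for stop codons; the names `CODON_TABLE` and `synonymous` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order: UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous(codon1, codon2):
    """True if two codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon1] == CODON_TABLE[codon2]


print(CODON_TABLE["UUA"], CODON_TABLE["UUG"])
print({c: CODON_TABLE[c] for c in ("UCA", "UCU", "UCG", "UCC")})
```

Translating both sequences with such a table is what allows conservation to be scored at the amino acid level rather than at the nucleotide level.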
81
Easy if we know where the exons are
[Figure: mouse putative gene, remove introns → mouse putative ORF, translate, then global alignment against human ORF]
82
Or directly align triplets
[Figure: mouse putative gene, remove introns → mouse putative ORF, then global alignment of codon triplets against human ORF]
83
Codon substitution scores
64 x 64 substitution matrix (only some entries are shown):
       AAA  AAG  AAU  AAC   …   UCU  UCC
AAA     4    3
AAG     3    4
AAU               4    3         1    1
AAC               3    4         1    1
…
UCU               1    1         4    3
UCC               1    1         3    4
84
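FSA for aligning genomic DNA to ORF
Considering only exons
[State diagram: states A and B; transitions labelled input / output]
– Into A: $(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j) / \sigma(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j)$
– From A into B: $(x_{i-2} x_{i-1} x_i, -)$ or $(-, y_{j-2} y_{j-1} y_j)$ / d
– Staying in B: $(x_{i-2} x_{i-1} x_i, -)$ or $(-, y_{j-2} y_{j-1} y_j)$ / e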
85
[Figure: mouse putative gene against human ORF; one intron is marked "17 bases?"]
1. We don't know exactly where the splicing sites are
2. The length of an intron may not be a multiple of 3
– If we convert the whole sequence into triplets, this may shift the reading frame
86
Model introns
[Figure: mouse putative gene against human ORF; intron marked GT…………AG; exons of 120 nt = 40 aa and 126 nt = 42 aa]
1. Most splicing sites start at GT and end at AG
2. For simplicity, assume the length of each exon is a multiple of 3
– Not true in reality
– Only a little more work without this assumption
87
Aligning genomic DNA to ORF
– Fixed cost to have an intron
– Alignment with affine gap penalty
88
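FSA for aligning genomic DNA to ORF
Considering only exons
[State diagram: states A and B; transitions labelled input / output]
– Into A: $(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j) / \sigma(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j)$
– From A into B: $(x_{i-2} x_{i-1} x_i, -)$ or $(-, y_{j-2} y_{j-1} y_j)$ / d
– Staying in B: $(x_{i-2} x_{i-1} x_i, -)$ or $(-, y_{j-2} y_{j-1} y_j)$ / e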
89
FSA for aligning genomic DNA to ORF
[State diagram: states A, B and a new intron state C; the A and B transitions are as before]
– Into C: (-, GT) / s (start an intron)
90
FSA for aligning genomic DNA to ORF
[State diagram: states A, B, C; the A and B transitions are as before]
– Into C: (-, GT) / s (start an intron)
– Staying in C: $(-, y_j)$ / 0 (continue in intron)
91
FSA for aligning genomic DNA to ORF
[State diagram: states A, B, C; the A and B transitions are as before]
– Into C: (-, GT) / s (start an intron)
– Staying in C: $(-, y_j)$ / 0 (continue in intron)
– From C into A: (-, AG) / s (close an intron)
92
[State diagram: states A, B, C; transitions into A highlighted]
$$A(i, j) = \max \begin{cases} A(i-3, j-3) + \sigma(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j) \\ B(i-3, j-3) + \sigma(x_{i-2} x_{i-1} x_i, y_{j-2} y_{j-1} y_j) \\ C(i, j-2) + s, & \text{if } y_{j-1} y_j = \text{AG} \end{cases}$$
93
[State diagram: states A, B, C; transitions into B highlighted]
$$B(i, j) = \max \begin{cases} A(i, j-3) + d \\ A(i-3, j) + d \\ B(i, j-3) + e \\ B(i-3, j) + e \end{cases}$$
94
[State diagram: states A, B, C; transitions into C highlighted]
$$C(i, j) = \max \begin{cases} B(i, j-2) + s, & \text{if } y_{j-1} y_j = \text{GT} \\ C(i, j-1) \end{cases}$$
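The three recurrences for A, B and C can be combined into one dynamic program. The sketch below makes several assumptions that are not on the slides: the codon score is a toy function (3 for identical codons, -3 otherwise) standing in for the 64 x 64 matrix, the default values of d, e and s are hypothetical, and an intron may open from either A or B (the recurrence above lists only B). As in the recurrences, the intron transitions consume only the second sequence, so the second argument plays the role of the genomic DNA.

```python
NEG_INF = float("-inf")


def spliced_score(orf, genomic, d=-5, e=-1, s=-2):
    """Score of aligning an intron-free ORF to genomic DNA (states A, B, C)."""

    def sigma(c1, c2):  # toy codon score, an assumption
        return 3 if c1 == c2 else -3

    n, m = len(orf), len(genomic)
    A = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # codon aligned to codon
    B = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # codon-sized gap
    C = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # inside an intron
    A[0][0] = 0
    for i in range(0, n + 1, 3):  # the ORF is consumed one codon at a time
        for j in range(m + 1):
            if i >= 3 and j >= 3:
                A[i][j] = max(A[i - 3][j - 3], B[i - 3][j - 3]) + sigma(
                    orf[i - 3:i], genomic[j - 3:j]
                )
            if j >= 2 and genomic[j - 2:j] == "AG":  # close an intron
                A[i][j] = max(A[i][j], C[i][j - 2] + s)
            if j >= 3:
                B[i][j] = max(B[i][j], A[i][j - 3] + d, B[i][j - 3] + e)
            if i >= 3:
                B[i][j] = max(B[i][j], A[i - 3][j] + d, B[i - 3][j] + e)
            if j >= 1:
                C[i][j] = C[i][j - 1]                # continue in intron, cost 0
            if j >= 2 and genomic[j - 2:j] == "GT":  # start an intron
                C[i][j] = max(C[i][j], max(A[i][j - 2], B[i][j - 2]) + s)
    return max(A[n][m], B[n][m])


# Two exons (ATG, GCC) separated by a 5-base intron GTCAG.
print(spliced_score("ATGGCC", "ATGGTCAGGCC"))
```

In the usage line the intron length is not a multiple of 3, which is exactly the case that a plain triplet-by-triplet alignment cannot handle.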
95
ACGGATGCGATCAGTTGTACTACGAGCTGACGGTCCTCAGACTTGATTA
96
There is a close relationship between dynamic programming, FSA, regular expressions, and regular grammars
Using FSA, you can design more complex alignment algorithms
If you can draw the state diagram for a problem, it can be easily formulated as a DP problem
– In particular, Hidden Markov Models
– Will discuss more in a few weeks