Published by Donald Farmer. Modified over 8 years ago.
1
Identifying abnormal sequence alignments. Ricardo Restrepo, School of Mathematics, Georgia Institute of Technology
2
Sequence alignment. Given two DNA sequences, identify the alignment with the highest score. A particular case is the longest common subsequence (LCS). Is the alignment found abnormal, i.e., is the longest common subsequence too long?
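A minimal sketch of finding the highest-scoring alignment with a global (Needleman-Wunsch style) dynamic program. The scoring parameters `match`, `mismatch` and `gap` are assumptions for illustration, because the slides do not fix a scoring scheme. With match = 1 and mismatch = gap = 0, a mismatched pair scores the same as two gaps, so the best score is exactly the number of matched letters, i.e. the LCS length.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = 0) -> int:
    """Best global alignment score of a and b (hypothetical helper, illustrative scores)."""
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                          S[i - 1][j] + gap,       # a[i-1] against a gap
                          S[i][j - 1] + gap)       # b[j-1] against a gap
    return S[n][m]


# Default scores: the alignment score equals the LCS length.
print(alignment_score("RICARDO", "RICHARD"))  # 6
# A made-up DNA scheme that rewards matches and penalizes mismatches and gaps.
print(alignment_score("ACGTTGCA", "ACGTGCA", match=2, mismatch=-1, gap=-2))  # 12
```

In the second call the best alignment drops one T: seven matches (+14) and one gap (-2).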
3
Longest common subsequence. Example: the words RICARDO and RICHARD.
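For the example words, the textbook dynamic program plus a traceback recovers an explicit longest common subsequence. This is a standard sketch, not code from the talk, and `lcs` is a hypothetical helper name. Here the answer is RICARD, of length 6: H does not occur in RICARDO, RICHARD without its H is RICARD, and RICARD is also a subsequence of RICARDO.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (DP table + traceback)."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Walk back from (n, m), collecting matched letters in reverse order.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("RICARDO", "RICHARD"))  # RICARD
```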
4
Abnormal alignments. For which value of k can we guarantee that a longest common subsequence longer than k between two DNA words of length n is abnormal? This is an open and hard problem! The expected LCS length is of order C(k)n, and a value of C(k) was conjectured by Arratia and Steele.
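The linear growth of the expected LCS can be checked empirically by brute force. The sketch below assumes the two words are independent and uniformly random over the alphabet (an assumption; real DNA words may show dependence) and averages the LCS length over sampled pairs. It is a baseline Monte Carlo estimate, not the random-field method of the talk, and `mean_lcs_ratio` is a hypothetical name. Because the expected LCS is superadditive in n, the ratio at finite n stays below its limiting constant, so n = 200 gives a slight underestimate.

```python
import random


def lcs_length(a: str, b: str) -> int:
    """LCS length via the standard dynamic program, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # a match extends the diagonal; otherwise keep the better neighbor
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def mean_lcs_ratio(n: int, alphabet: str = "ACGT", trials: int = 20, seed: int = 0) -> float:
    """Monte Carlo estimate of E[LCS]/n for two independent uniform random words."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(n))
        b = "".join(rng.choice(alphabet) for _ in range(n))
        total += lcs_length(a, b)
    return total / (trials * n)


print(mean_lcs_ratio(200))        # 4-letter (DNA-like) alphabet
print(mean_lcs_ratio(200, "01"))  # binary alphabet
```

A larger alphabet makes matches rarer, so the 4-letter ratio comes out below the binary one.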
5
Functional representation
6
Integration over noise!
7
The equivalent diffusion: 1/k, (i,j), a(i,j), b(i,j). Finding the diffusion coefficients is also a hard combinatorial problem.
8
Letting the length go to infinity: a random field!
9
How do we obtain the drift coefficients? By sampling!
10
Approximate drift coefficients of the random field (words of length 200, 4-letter alphabet)
11
Some simulations. Now we can perform simulations to estimate the distribution of the longest common subsequence! (Words of length 200, 4-letter alphabet.)
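Once LCS values can be simulated, "abnormal" can be made operational as a small tail probability. This sketch uses direct simulation of uniform i.i.d. words (an assumption, and a much slower route than the random-field approximation on the slides) to build an empirical distribution and an add-one-smoothed p-value. `simulate_lcs` and `empirical_p_value` are hypothetical names, the word length 100 is chosen only to keep the run short, and the observed value 75 is made up.

```python
import random
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """LCS length via the standard one-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def simulate_lcs(n: int, alphabet: str = "ACGT", trials: int = 200, seed: int = 1) -> list:
    """LCS lengths of `trials` independent pairs of uniform random words of length n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(n))
        b = "".join(rng.choice(alphabet) for _ in range(n))
        samples.append(lcs_length(a, b))
    return samples


def empirical_p_value(observed: int, samples: list) -> float:
    """Estimated P(LCS >= observed) under the null model, with add-one smoothing."""
    exceed = sum(1 for s in samples if s >= observed)
    return (exceed + 1) / (len(samples) + 1)


samples = simulate_lcs(100)
for value, count in sorted(Counter(samples).items()):
    print(value, "#" * count)  # crude text histogram of simulated LCS lengths
print(empirical_p_value(75, samples))  # 75 is a hypothetical observed LCS
```

The add-one smoothing keeps the estimate away from exactly zero, since a finite simulation can never rule out a value it did not happen to produce.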
12
Algorithms: exhaustive search, dynamic programming, Four Russians speed-up, gradient algorithm.
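Two contrasting approaches can be compared on small inputs: exhaustive search over subsequences, which is exponential, and a bit-parallel form of the dynamic program that updates a whole DP row with a few big-integer operations per letter. The bit-parallel update (in the style of Allison-Dix and Hyyrö) is a swapped-in word-level speed-up, not the Four Russians table method itself, and the gradient algorithm is not reproduced here. Function names are hypothetical.

```python
import random
from itertools import combinations


def lcs_exhaustive(a: str, b: str) -> int:
    """Exhaustive search over subsequences of a, longest first. Exponential in len(a)."""
    def is_subsequence(s: str, t: str) -> bool:
        it = iter(t)
        return all(c in it for c in s)  # each `in` consumes t up to the match

    for k in range(len(a), 0, -1):
        for idx in combinations(range(len(a)), k):
            if is_subsequence("".join(a[i] for i in idx), b):
                return k
    return 0


def lcs_bitparallel(a: str, b: str) -> int:
    """LCS length with one big-integer row update per letter of b."""
    m = len(a)
    mask = (1 << m) - 1
    # match[c] has bit i set exactly when a[i] == c
    match = {}
    for i, c in enumerate(a):
        match[c] = match.get(c, 0) | (1 << i)
    v = mask  # all ones: no letter of a used yet
    for c in b:
        u = v & match.get(c, 0)
        v = ((v + u) | (v - u)) & mask
    # each zero bit among the m low bits accounts for one unit of LCS length
    return m - bin(v).count("1")


# Cross-check the two methods on random short DNA words.
rng = random.Random(0)
for _ in range(200):
    a = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 8)))
    b = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 8)))
    assert lcs_exhaustive(a, b) == lcs_bitparallel(a, b)
print(lcs_bitparallel("RICARDO", "RICHARD"))  # 6
```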
13
Level set k: the indexes (i,j) such that the LCS restricted to (1..i, 1..j) is greater than or equal to k. Each level can be described easily in terms of the minimal indexes of the previous levels.
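The level sets and their minimal indexes can be read directly off the usual DP table. In this sketch (1-based indexes as on the slide; helper names hypothetical), a point of level k is minimal when neither (i-1, j) nor (i, j-1) belongs to the level. Such a point is always a matching pair of letters, because its value can only have come from the diagonal.

```python
def lcs_table(a: str, b: str) -> list:
    """L[i][j] = LCS length of a[:i] and b[:j]; row and column 0 are empty prefixes."""
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L


def level_set(L: list, k: int) -> set:
    """1-based indexes (i, j) whose restricted LCS is at least k (k >= 1)."""
    return {(i, j) for i in range(1, len(L)) for j in range(1, len(L[0])) if L[i][j] >= k}


def minimal_indexes(L: list, k: int) -> list:
    """Minimal elements of level k: neither (i-1, j) nor (i, j-1) is in the level."""
    S = level_set(L, k)
    return sorted((i, j) for (i, j) in S if (i - 1, j) not in S and (i, j - 1) not in S)


a, b = "RICARDO", "RICHARD"
L = lcs_table(a, b)
for k in range(1, L[-1][-1] + 1):
    corners = minimal_indexes(L, k)
    # every minimal index of a level is a matching pair of letters
    assert all(a[i - 1] == b[j - 1] for i, j in corners)
    print(k, corners)
```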
14
Step 1
15
Step 2
16
Step 3
17
Step k
18
Last step
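The step slides suggest building each level from the minimal indexes of the previous one. The sketch below is one reading of that idea, assumed rather than taken from the talk (it is close in spirit to dominant-match methods). Step 1 keeps the minimal matching pairs. Step k+1 keeps the minimal matches that lie strictly after some minimal index of level k in both words. The last step is reached when a level comes out empty, and the number of nonempty levels is the LCS length, which the usage loop cross-checks against the plain dynamic program. All names are hypothetical.

```python
def minimal(points) -> list:
    """Minimal elements of a set of index pairs under the coordinatewise order."""
    out, best_j = [], float("inf")
    for i, j in sorted(set(points)):  # increasing i, then increasing j
        if j < best_j:                # no point seen so far is <= (i, j) coordinatewise
            out.append((i, j))
            best_j = j
    return out


def levels_by_steps(a: str, b: str) -> list:
    """Minimal indexes of levels 1, 2, ..., each built only from the previous level."""
    matches = [(i, j) for i in range(1, len(a) + 1)
               for j in range(1, len(b) + 1) if a[i - 1] == b[j - 1]]
    levels = []
    current = minimal(matches)  # step 1: minimal matching pairs
    while current:              # stop once a level comes out empty
        levels.append(current)
        # next step: matches strictly after some minimal index of this level in both words
        candidates = [(i, j) for (i, j) in matches
                      if any(p < i and q < j for (p, q) in current)]
        current = minimal(candidates)
    return levels


def lcs_length(a: str, b: str) -> int:
    """Plain one-row dynamic program, used only to cross-check the level construction."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


for w1, w2 in [("RICARDO", "RICHARD"), ("GATTACA", "TACGATA"), ("AB", "BA")]:
    levels = levels_by_steps(w1, w2)
    assert len(levels) == lcs_length(w1, w2)
    print(w1, w2, levels)
```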
19
A sample output for the binary words case
20
For the DNA case (4-letter alphabet):
21
The words exhibited dependence.