1
Regular Expression Constrained Sequence Alignment
Abdullah N. Arslan, Assistant Professor, Computer Science Department
2
Outline
Sequence alignment
Common framework
DP solution
Why constrained?
RE constrained sequence alignment
Algorithm
Concluding remarks
3
Alignment Matrix
4
Edit Graph
5
Dynamic Programming Solution
$H_{i,j}$: maximum score achieved at $(i, j)$, where $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$
The answer $H_{n,m}$ is computed in $O(nm)$ time and $O(m)$ space
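The recurrence itself did not survive in this transcript, so the following is only a minimal sketch of a two-row DP that matches the stated boundary condition ($H_{i,j} = 0$ when $i = 0$ or $j = 0$) and the $O(nm)$ time, $O(m)$ space bounds. The function name `align_score` and the scoring parameters `match`, `mismatch`, and `gap` are assumptions made for illustration, not values from the slides.

```python
def align_score(a, b, match=1, mismatch=-1, gap=1):
    """Return H[n][m] while keeping only two rows of the table (O(m) space).

    Assumed recurrence (not taken from the slides):
        H[i][j] = max(H[i-1][j-1] + s(a_i, b_j),
                      H[i-1][j] - gap,
                      H[i][j-1] - gap)
    with H[i][0] = H[0][j] = 0, as stated on the slide.
    """
    m = len(b)
    prev = [0] * (m + 1)                  # row i-1; H[0][j] = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (m + 1)              # row i; H[i][0] = 0
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # align a_i with b_j
                          prev[j] - gap,     # a_i against a gap
                          curr[j - 1] - gap) # b_j against a gap
        prev = curr
    return prev[m]
```

Because the first row and column are zero, leading gaps are free under this sketch; a different boundary condition would change the result.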
6
DP Solution: Local Alignment
$H_{i,j}$: similarity score achieved at $(i, j)$, where $H_{i,j} = 0$ whenever $i = 0$ or $j = 0$
The answer $\max_{i,j} H_{i,j}$ is computed in $O(nm)$ time and $O(m)$ space
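For local alignment the answer is the maximum entry of the table rather than $H_{n,m}$. The sketch below uses the standard Smith-Waterman style recurrence, which clamps every cell at zero; the slide's own formula is not in the transcript, so this recurrence and the names `local_align_score`, `match`, `mismatch`, and `gap` are assumptions.

```python
def local_align_score(a, b, match=1, mismatch=-1, gap=1):
    """Return max over all (i, j) of H[i][j] using two rows (O(m) space).

    Assumed recurrence (Smith-Waterman style, not taken from the slides):
        H[i][j] = max(0,
                      H[i-1][j-1] + s(a_i, b_j),
                      H[i-1][j] - gap,
                      H[i][j-1] - gap)
    """
    m = len(b)
    best = 0
    prev = [0] * (m + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])     # track the best cell seen so far
        prev = curr
    return best
```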
7
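Dynamic Programming Formulation: Affine Gap Penalties
Penalty for a gap of length $k$ is a gap-opening cost plus $(k-1)$ times a gap-extension cost
$H_{i,j} = F_{i,j} = E_{i,j} = 0$ when $i = 0$ or $j = 0$
The answer $\max_{i,j} H_{i,j}$ is computed in $O(nm)$ time and $O(m)$ space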
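A common way to handle affine gaps is a three-table recurrence in the style of Gotoh, sketched here for the local case. The symbols `alpha` (cost of the first gap position) and `beta` (cost of each further position) are hypothetical names, chosen so that a gap of length $k$ costs `alpha + (k - 1) * beta`; the exact recurrence on the slide is not in the transcript.

```python
def local_affine_score(a, b, match=2, mismatch=-1, alpha=3, beta=1):
    """Local alignment score with affine gaps, keeping two rows (O(m) space).

    Assumed recurrences (Gotoh style, not taken from the slides):
        E[i][j] = max(E[i][j-1] - beta, H[i][j-1] - alpha)   # gap in a
        F[i][j] = max(F[i-1][j] - beta, H[i-1][j] - alpha)   # gap in b
        H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), E[i][j], F[i][j])
    with H = E = F = 0 when i = 0 or j = 0.
    """
    m = len(b)
    best = 0
    h_prev = [0] * (m + 1)
    f_prev = [0] * (m + 1)
    for i in range(1, len(a) + 1):
        h_curr = [0] * (m + 1)
        f_curr = [0] * (m + 1)
        e = 0                                   # E[i][0]
        for j in range(1, m + 1):
            e = max(e - beta, h_curr[j - 1] - alpha)
            f_curr[j] = max(f_prev[j] - beta, h_prev[j] - alpha)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_curr[j] = max(0, h_prev[j - 1] + s, e, f_curr[j])
            best = max(best, h_curr[j])
        h_prev, f_prev = h_curr, f_curr
    return best
```

With these default scores, aligning `ACGTTTACG` against `ACGACG` pays one gap of length 3 (cost 5) to collect six matches (score 12), for a total of 7.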
8
The Definition of the Constrained LCS Problem
The constrained LCS (CLCS) problem: given strings $S_1$, $S_2$, and $P$, find a longest common subsequence of $S_1$ and $S_2$ that contains $P$ as a subsequence
Motivation: computing the homology of two biological sequences that have a specific part in common
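The CLCS definition can be illustrated with a three-dimensional DP that runs in $O(nmr)$ time, where $r = |P|$. This is a minimal sketch written from the problem definition, not a transcription of any of the cited algorithms; the function name `clcs_length` is hypothetical. Entry `L[i][j][k]` holds the length of the longest common subsequence of the first `i` symbols of `s1` and the first `j` symbols of `s2` that contains the first `k` symbols of `p` as a subsequence, or negative infinity if none exists.

```python
NEG = float("-inf")  # marks "no feasible common subsequence"

def clcs_length(s1, s2, p):
    """Length of the longest common subsequence of s1 and s2 that has p
    as a subsequence, or None if no such common subsequence exists."""
    n, m, r = len(s1), len(s2), len(p)
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # An empty prefix can only satisfy the empty constraint.
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if s1[i - 1] == s2[j - 1]:
                    # Use the common symbol without advancing in p ...
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    # ... or use it to cover the k-th symbol of p.
                    if k > 0 and s1[i - 1] == p[k - 1]:
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else result
```

For `ABCBDAB` and `BDCABA` the unconstrained LCS has length 4, but requiring `AA` as a subsequence forces both copies of `A` in each string to be used and leaves only `ABA`.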
9
Constrained Sequence Alignment Problems
Constrained LCS: Tsai 2003, $O(n^2 m^2 r)$ time; Chin et al. 2004 and Arslan and Egecioglu 2004, $O(nmr)$ time
Edit-distance constrained sequence alignment: Arslan and Egecioglu 2004, $O(dnmr)$
Regular-expression constrained sequence alignment: motivation from Comet and Henry 2002 (PROSITE patterns); this paper
10
PROSITE patterns as constraints
PROSITE patterns are regular expressions with no Kleene closure
PROSITE database example: [GA]-X(4)-G-K-[ST], the ATP/GTP-binding site motif A (P-loop) (PS00017)
Comet and Henry reward alignments
Regular expression constrained sequence alignment: find a maximum-score alignment that includes a given RE
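To make the link between PROSITE notation and ordinary regular expressions concrete, here is a small sketch that translates a subset of the PROSITE syntax (single residues, `x` for any residue, `[...]` alternatives, `{...}` exclusions, and `(n)` or `(n,m)` repetition counts) into a Python `re` pattern. The helper name `prosite_to_regex` is hypothetical, and anchors such as `<` and `>` are deliberately left out.

```python
import re

# One PROSITE element: a residue, [alternatives] or {exclusions},
# optionally followed by a repetition count (n) or (n,m).
_ELEMENT = re.compile(
    r"([A-Za-z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern (subset of the syntax) to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported PROSITE element: " + element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except these
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

p_loop = prosite_to_regex("[GA]-X(4)-G-K-[ST]")
```

Here `p_loop` is `[GA].{4}GK[ST]`, so `re.search(p_loop, "AAGPTSAGKSTT")` matches the substring `GPTSAGKS`. Every repetition count is finite, which is why no Kleene closure is needed.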
11
Example: For [GA]-X(4)-G-K-[ST]
12
Using Edit Graph: e.g. A(C+G)*(S+T)
13
Automata for A(C+G)*(S+T)
14
Some Details of Automata Construction
Build an NFA $N$ equivalent to the given RE $R$
Construct from $N$ a new $N \times N$ automaton
The new automaton moves on edit operations (or, equivalently, on alignment columns)
States have weights
We are interested in the weights of the final states after the alignment is complete
15
Weighted Automaton
Initial weights are $-\infty$
Weight of $(q_0, q_0)$ is initially 0
Update new maximum scores at reachable states
Weights become $-\infty$ in unreachable states
What are the maximum weights at the final states?
16
Computations on Automata
17
Complexity
Simulate the automata based on the DP solution
Each step requires examining the transition function
Maintain a list of active (reachable) states
Update state weights as alignments are formed
Automaton $M_{i,j}$ has the optimum weights
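The simulation described above can be sketched as a DP over the edit graph in which each cell $(i, j)$ stores a dictionary of active states and their best weights, playing the role of $M_{i,j}$ (a missing state stands for weight $-\infty$). This is a simplified illustration under several assumptions, not the algorithm from the talk: the NFA is given directly as an epsilon-free transition table `delta`, alignment is global with linear gap cost, and two extra bookkeeping states `pre` and `post` (hypothetical names) mark the columns before and after the block whose projections on both strings must be accepted by the NFA.

```python
def re_constrained_score(a, b, delta, start, finals, match=1, mismatch=-1, gap=1):
    """Best score of an alignment of a and b containing a block of columns
    whose a-side and b-side both spell words accepted by the NFA.
    delta maps (state, symbol) -> tuple of next states (epsilon-free NFA).
    Returns None when no alignment satisfies the constraint."""
    n, m = len(a), len(b)
    PRE, POST = "pre", "post"
    W = [[{} for _ in range(m + 1)] for _ in range(n + 1)]  # active states per cell

    def relax(cell, state, w):
        if w > cell.get(state, float("-inf")):
            cell[state] = w

    def step(state, x, y):
        # Successors when the a-side reads x and the b-side reads y (None = gap).
        if state in (PRE, POST):
            return [state]
        p, q = state
        ps = delta.get((p, x), ()) if x is not None else (p,)
        qs = delta.get((q, y), ()) if y is not None else (q,)
        return [(p2, q2) for p2 in ps for q2 in qs]

    W[0][0][PRE] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            cell = W[i][j]
            if PRE in cell:                       # the block may start here
                relax(cell, (start, start), cell[PRE])
            for state, w in list(cell.items()):   # the block may end here
                if state not in (PRE, POST) and state[0] in finals and state[1] in finals:
                    relax(cell, POST, w)
            for state, w in cell.items():         # push weights along edit-graph edges
                if i < n and j < m:
                    s = match if a[i] == b[j] else mismatch
                    for t in step(state, a[i], b[j]):
                        relax(W[i + 1][j + 1], t, w + s)
                if i < n:
                    for t in step(state, a[i], None):
                        relax(W[i + 1][j], t, w - gap)
                if j < m:
                    for t in step(state, None, b[j]):
                        relax(W[i][j + 1], t, w - gap)
    return W[n][m].get(POST)

# Epsilon-free automaton for A(C+G)*(S+T): state 0 is initial, state 2 is final.
DELTA = {(0, "A"): (1,), (1, "C"): (1,), (1, "G"): (1,), (1, "S"): (2,), (1, "T"): (2,)}
```

For example, `re_constrained_score("GGAST", "GGAT", DELTA, 0, {2})` forces the block to cover `AS` in the first string and `AT` in the second. The sketch stores the full table for readability; since each cell depends only on its three predecessors, keeping two rows would suffice when only the score is needed.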
18
Generalizations: Local Alignment & Affine gaps
19
CONCLUSION
Introduced the regular expression constrained sequence alignment problem
Presented an algorithm for the problem
Future work: generalization of the problem to multiple sequence alignment, and to multiple regular expressions as a constraint
20
Thank You