Local Alignment Vasileios Hatzivassiloglou University of Texas at Dallas.

Final Exam
Scheduled by the Office of the Registrar
Our class: 7:00 p.m. Tuesday, December 16
Grades due/available: December 24
Find the schedule for all your classes at http://www.utdallas.edu/student/registrar/finals/ (usually available many months in advance)

Global alignment complexity
For each cell, constant time (calculate new values for at most three other cells, take the max between current and previous value, if any)
Number of cells: n × m
Retrieving the alignment takes at most n+m steps
O(nm) time and space

O notation
A function f(n) is O(g(n)) if
– there are constants n_0 and c such that f(n) ≤ c·g(n) for all n ≥ n_0
n_0 allows the exclusion of a finite number of initial values of f(n)
– we are interested in the asymptotic behavior
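A small worked instance of the definition, with a function chosen only for illustration (it does not appear on the slides):

```latex
f(n) = 3n^2 + 5n \;\le\; 3n^2 + n^2 = 4n^2 \quad \text{for all } n \ge 5,
\qquad \text{so } f(n) \text{ is } O(n^2) \text{ with } c = 4,\ n_0 = 5.
```

The step 5n ≤ n² holds exactly when n ≥ 5, which is why the finitely many smaller values are excluded by n_0.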

Implementation
We need an n × m array to save intermediate alignment scores
Can build the array first, and calculate values forward as in the example
Can also work backwards through the recursive formula
– somewhat easier to program
– some additional overhead for the recursion
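The array-first approach can be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `global_align` and the scores (match = 2, mismatch = -1, indel = -1) are assumptions chosen for the example, and "-" stands for a space.

```python
def global_align(s, t, match=2, mismatch=-1, indel=-1):
    """Fill the (n+1) x (m+1) table, then trace back one optimal alignment.

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # S[1..i] aligned against the empty string
        V[i][0] = i * indel
    for j in range(1, m + 1):          # empty string aligned against T[1..j]
        V[0][j] = j * indel

    def sub(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    # Constant work per cell: O(nm) time and space overall.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + sub(i, j),
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)

    # Traceback from (n, m) to (0, 0): at most n + m steps.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and V[i][j] == V[i - 1][j] + indel:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return V[n][m], "".join(reversed(a)), "".join(reversed(b))
```

For example, `global_align("ACGT", "AGT")` aligns the three shared characters and pays one indel for the unmatched C.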

Local alignment
Sometimes we don’t want to align the two strings in their entirety, but instead find an alignment for parts of S and T
Example:
– DNA matching where we want to identify the same gene (or closely related genes) in a long stretch of DNA from two organisms

Definitions
A prefix of a string S is a contiguous sequence of characters from S starting with S[1], or the empty string
A suffix of a string S is a contiguous sequence of characters from S ending with S[|S|], or the empty string
A substring of S is any contiguous sequence of (possibly zero) characters from S (cf. subsequence defined earlier)

Adapting DP for local alignment
If we could start anywhere (not at (0,0)), we would obtain alignments of suffixes of S and T
If we could stop anywhere (not at (|S|,|T|)), we would obtain alignments of prefixes of S and T
A substring is just a suffix of a prefix

Algorithm for local alignment
As for global alignment, but
– At any cell, we can restart the alignment from there, i.e., set V(i,j)=0
– We can stop at the cell that maximizes the overall value in the table

Recursive formula
Recursion for global alignment
Recursion for local alignment
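The two recursions are not preserved in this transcript. A reconstruction in their standard textbook form, written with the σ notation used elsewhere in these slides, would be the following; the local version differs only by the extra 0 (the restart option) and by the zero boundary:

```latex
% Global alignment: V(0,0) = 0,
% V(i,0) = V(i-1,0) + \sigma(S[i],-), \quad V(0,j) = V(0,j-1) + \sigma(-,T[j])
V(i,j) = \max
\begin{cases}
  V(i-1,j-1) + \sigma(S[i],T[j]) \\
  V(i-1,j)   + \sigma(S[i],-)    \\
  V(i,j-1)   + \sigma(-,T[j])
\end{cases}

% Local alignment: V(i,0) = V(0,j) = 0
V(i,j) = \max
\begin{cases}
  0 \\
  V(i-1,j-1) + \sigma(S[i],T[j]) \\
  V(i-1,j)   + \sigma(S[i],-)    \\
  V(i,j-1)   + \sigma(-,T[j])
\end{cases}
```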

Complexity of local alignment
O(nm) time
– One extra step: Keep track of the maximum
O(nm) space
Recover alignment in O(n+m) time
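A minimal sketch of local alignment with the restart-at-zero rule and the running maximum. The name `local_align` and the scores (match = 2, mismatch = -1, indel = -1) are assumptions for illustration only.

```python
def local_align(s, t, match=2, mismatch=-1, indel=-1):
    """Best-scoring alignment between a substring of s and a substring of t.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay at 0
    best, best_i, best_j = 0, 0, 0

    def sub(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(0,                              # restart here
                          V[i - 1][j - 1] + sub(i, j),
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
            if V[i][j] > best:                            # the one extra step
                best, best_i, best_j = V[i][j], i, j

    # Trace back from the maximum cell until a cell with value 0 is reached.
    a, b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and V[i][j] > 0:
        if V[i][j] == V[i - 1][j - 1] + sub(i, j):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif V[i][j] == V[i - 1][j] + indel:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))
```

With these scores, `local_align("xxACGTxx", "yyACGTyy")` recovers the shared ACGT region and ignores the flanking characters.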

Local alignment for search
A variant of local alignment can be used for searching for a match of a short sequence S within a much longer sequence T
Example:
– S is a gene and T is a long region of DNA
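The slide does not spell out the variant. One common choice, assumed here, is to align all of S but let the match start and end anywhere in T: row 0 is initialized to 0 and the answer is the maximum of the last row. The name `search_align`, the `origin` table, and the scores are hypothetical.

```python
def search_align(s, t, match=2, mismatch=-1, indel=-1):
    """Find the region of t that best matches all of s.

    Returns (score, start, end) so that t[start:end] is the matched region.
    Assumed variant: free start and end in t, s aligned in its entirety.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # origin[i][j]: column of row 0 where the best path into (i, j) started.
    origin = [list(range(m + 1)) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel            # skipping characters of s is not free
        origin[i][0] = 0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            candidates = [
                (V[i - 1][j - 1] + sub, origin[i - 1][j - 1]),
                (V[i - 1][j] + indel, origin[i - 1][j]),
                (V[i][j - 1] + indel, origin[i][j - 1]),
            ]
            V[i][j], origin[i][j] = max(candidates, key=lambda c: c[0])

    end = max(range(m + 1), key=lambda j: V[n][j])   # stop anywhere in t
    return V[n][end], origin[n][end], end
```

Carrying the start column forward in `origin` avoids a separate traceback when only the location of the match is needed.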

Alignment with Gaps
Alignment 1:
AAC-AATTAAG-ACTAC-GTTCATGAC
A-CGA-TTA-GCAC-ACTG-T-A-GA-
Alignment 2:
AACAATTAAGACTACGTTCATGAC---
AACAATT--------GTTCATGACGCA

Defining gaps
A substring of S is any contiguous sequence of characters from S
A gap γ is a maximal, non-empty substring of the string S′ (obtained by extending S for alignment), such that γ contains only spaces
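The definition can be checked mechanically: a gap is a maximal run of spaces in the extended string. A minimal sketch, assuming spaces are written as "-" and using a hypothetical helper name `gap_lengths`:

```python
import re


def gap_lengths(aligned, space="-"):
    """Return the length of every gap (maximal run of `space`) in an aligned string."""
    return [len(run) for run in re.findall(re.escape(space) + "+", aligned)]
```

For instance, `gap_lengths("AACAATT--------GTTCATGACGCA")` finds a single gap of eight spaces, while a string with eight isolated spaces has eight gaps of length one.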

Motivation for gaps
cDNA matching
– Complementary DNA is formed from mRNA after transcription (no introns)
– We want to match it with chromosomal DNA
Mutations can cause insertion or deletion of blocks of DNA
– Probability of inserting 10 bases is not exponentially less than inserting one

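Gap penalties
We replace our overall score for the alignment with the difference of two components
– the total score σ over the columns that pair a character of S with a character of T, minus the total penalty g(q) over all gaps, where q is the length of a gap
– Note that the indel penalty is now subsumed by the gap scoring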

Gap penalty functions
Constant, g(q) = W_g
– Appropriate for cDNA matching
Linear (affine model), g(q) = W_g + qW_s
– W_g is the gap start penalty
– W_s is the gap continuation penalty
– Both are non-negative
– Special cases
Constant penalty for each gap:
– W_s = 0
No special treatment of gaps:
– W_g = 0, W_s = σ(x,-) or σ(-,y) as appropriate
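To make the penalty functions concrete, here is a minimal sketch that scores a given alignment as pair scores minus gap penalties. The helper names and the numeric values (match = 2, mismatch = -1, W_g = 4, W_s = 1) are assumptions for illustration; "-" stands for a space.

```python
import re


def constant_gap(q, w_g=4):
    """g(q) = W_g: every gap costs the same, whatever its length q."""
    return w_g


def affine_gap(q, w_g=4, w_s=1):
    """g(q) = W_g + q * W_s: start penalty plus a per-space continuation penalty."""
    return w_g + q * w_s


def score_with_gaps(s_al, t_al, g, match=2, mismatch=-1):
    """Score an alignment as (sum of pair scores) - (sum of g over all gaps)."""
    assert len(s_al) == len(t_al)
    pair_score = sum(match if a == b else mismatch
                     for a, b in zip(s_al, t_al)
                     if a != "-" and b != "-")
    gap_penalty = sum(g(len(run))
                      for row in (s_al, t_al)
                      for run in re.findall(r"-+", row))
    return pair_score - gap_penalty
```

With one gap of length 3 and five matching columns, the affine function charges 4 + 3·1 = 7 while the constant function charges 4, so the same alignment scores 3 and 6 respectively.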

Other gap penalty functions
Monotonically increasing functions that grow more slowly than q
– e.g., g(q) = W_g + W_s log q
– Each space in the same gap contributes progressively less of a penalty
– Better model from a probabilistic viewpoint
– Harder to work with algorithmically

Finding the optimal alignment
The algorithm (and complexity) depend on the function g
Cannot keep a single array V(i,j) of best values, because
– the update rule depends on what is at the end of the currently processed parts of S′ and T′

Finding the optimal alignment – Linear g(q)
Keep three separate tables tracking best solutions so far for the three possible cases (no gap, gap in S′, gap in T′)
V(i,j): Overall best alignment score
G(i,j): Best alignment score where S[i] is matched with T[j]
F(i,j): Best alignment score where S[i] is matched with a space (gap in T′)
E(i,j): Best alignment score where T[j] is matched with a space (gap in S′)

Recursive calculation
V(i,j) = max(G(i,j), F(i,j), E(i,j))
G(i,j) = V(i-1,j-1) + σ(S[i], T[j])
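The recurrences for F and E are not preserved in this transcript. In the standard affine-gap formulation, which agrees with the table definitions and the penalties W_g and W_s given in these slides, they would read as follows (a reconstruction, not a quotation of the slide):

```latex
% Extend an existing gap (pay W_s) or open a new one (pay W_g + W_s).
F(i,j) = \max \bigl\{\, F(i-1,j) - W_s,\;\; V(i-1,j) - W_g - W_s \,\bigr\}
\qquad
E(i,j) = \max \bigl\{\, E(i,j-1) - W_s,\;\; V(i,j-1) - W_g - W_s \,\bigr\}
```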

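Finite state view
(state diagram with three states G, F, E)
– Entering G consumes S[i] and T[j], with score σ(S[i],T[j])
– Entering F from another state consumes S[i], with score -W_g - W_s
– Staying in F consumes S[i], with score -W_s
– Entering E from another state consumes T[j], with score -W_g - W_s
– Staying in E consumes T[j], with score -W_s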

Complexity of alignment with gaps
Filling multiple matrices, but a constant number of them
– Three matrices are really needed
– V(i,j) can be kept as a matrix for convenience
O(nm) time
O(nm) space
Tracing the alignment takes O(n+m)
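A minimal sketch of the table-filling step for global alignment with an affine gap penalty, keeping V as a fourth table for convenience. The function name, the boundary conditions, and the numeric values (match = 2, mismatch = -1, W_g = 4, W_s = 1) are assumptions; the E and F updates use the standard affine-gap recurrences, which the transcript does not show.

```python
NEG_INF = float("-inf")


def affine_align_score(s, t, match=2, mismatch=-1, w_g=4, w_s=1):
    """Optimal global alignment score with gap penalty g(q) = w_g + q * w_s."""
    n, m = len(s), len(t)

    def table():
        return [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    V, G, E, F = table(), table(), table(), table()
    V[0][0] = 0
    for i in range(1, n + 1):      # S[1..i] against one gap of length i in T'
        F[i][0] = V[i][0] = -w_g - i * w_s
    for j in range(1, m + 1):      # T[1..j] against one gap of length j in S'
        E[0][j] = V[0][j] = -w_g - j * w_s

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            G[i][j] = V[i - 1][j - 1] + sub
            # Extend an existing gap (w_s) or open a new one (w_g + w_s).
            E[i][j] = max(E[i][j - 1] - w_s, V[i][j - 1] - w_g - w_s)
            F[i][j] = max(F[i - 1][j] - w_s, V[i - 1][j] - w_g - w_s)
            V[i][j] = max(G[i][j], E[i][j], F[i][j])
    return V[n][m]
```

Each cell still costs constant work across the four tables, so the fill remains O(nm) in time and space.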

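Algorithms for other forms of g
If g is convex in the sense used here, i.e., each additional space adds no more penalty than the previous one, then there is a general O(nm log m) algorithm
– g′′(x) ≤ 0 (standard mathematical terminology calls such a function concave)
For any general function, there is an O(n²m + nm²) algorithm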
