Published by Ethan Clarke. Modified over 9 years ago.
1
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics
2
Homology Search. Given a sequence q, does there exist a sequence d in a database D such that q and d are homologous? One could perform global pairwise alignment between q and each sequence in D, but maybe only a segment of q is highly (beyond random) similar to a segment of a database sequence. Remote homology – only a motif is conserved. Sequence/domain rearrangements – the sequences are not globally homologous, but share a domain. Local alignment (alignment of a segment of q with a segment of d) is therefore desirable.
3
Homology Search – Task. Present all sequences in D that have segments homologous to segments in q. Avoid presenting sequences in D that are not homologous. For each local alignment, calculate the statistical probability that the alignment is "random" (not caused by an evolutionary relation).
4
Definitions. Segment – contiguous subsequence (substring) of q or d. Segment pair – pair of segments, one from q and one from d (need not be of the same length). Local alignment – alignment of a segment pair.
5
Dot Plot – Visualising Similarity. For sequences q (length m) and d (length n), construct an m × n matrix. Make a dot in cell (i,j) if q_i = d_j. It is possible to filter the matrix, e.g., use a window of length K and make a dot in (i,j) only if at least C% of the characters are similar between the K-windows around (i,j).
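The construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "similar" is taken to mean identical characters, and the K-window is taken to start at (i,j) and run along the diagonal rather than being centred on the cell.

```python
def dot_plot(q, d):
    """Unfiltered dot plot: cell (i, j) is True when q[i] == d[j]."""
    return [[qc == dc for dc in d] for qc in q]


def filtered_dot_plot(q, d, k, c):
    """Filtered dot plot (assumed window placement: starting at (i, j)).

    Cell (i, j) is True only if at least c percent of the k aligned
    positions q[i + t], d[j + t] (t = 0 .. k-1) are identical.
    Cells whose window would run past the end of a sequence stay False.
    """
    m, n = len(q), len(d)
    plot = [[False] * n for _ in range(m)]
    for i in range(m - k + 1):
        for j in range(n - k + 1):
            matches = sum(q[i + t] == d[j + t] for t in range(k))
            plot[i][j] = matches * 100 >= c * k
    return plot


def render(plot):
    """Draw the matrix as text, one row per position of q."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


if __name__ == "__main__":
    print(render(filtered_dot_plot("ACGTACGT", "ACGTACGT", k=4, c=100)))
```

With k=8 and c=100 this corresponds to the "dot if 8 identical bases" rule; repeats then show up as diagonals parallel to the main diagonal.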
6
Dot Plots are Easy to Interpret. Can identify, for instance, repeats. Example: Human HPRT gene (genomic sequence), dot if 8 identical bases. http://www.ansorge-group.embl.de/geneskipper/dotplot.htm
7
Dynamic Programming for Local Alignment (Smith & Waterman 1981). Assumptions: the scoring matrix has "negative expectation"; gaps should decrease the alignment score (as before). Consequences: a subalignment with negative score coming first (prefix) or last (suffix) can be removed to improve the alignment score; gaps should not be included unless the alignments on either side score high enough to make up for the gap penalty. [Figure: an alignment with its prefix and suffix marked.]
8
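Recurrence relation. The best local alignment ending at (i,j) is the best of four cases: q_i aligned with a gap, extending the best alignment of q_1..i-1 and d_1..j; q_i aligned with d_j, extending the best alignment of q_1..i-1 and d_1..j-1; a gap aligned with d_j, extending the best alignment of q_1..i and d_1..j-1; or the empty alignment. Including the empty alignment effectively allows for removal of negatively contributing prefixes.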
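Written as a formula, the four cases take the following standard form. The symbols here are assumptions, since the slide's own equation did not survive: H(i,j) is the score of the best local alignment ending at q_i and d_j, s(q_i, d_j) is the scoring-matrix entry, and g > 0 is a linear gap penalty.

```latex
H(i,j) = \max
\begin{cases}
0 & \text{(empty alignment)} \\
H(i-1,\,j-1) + s(q_i, d_j) & \text{($q_i$ aligned with $d_j$)} \\
H(i-1,\,j) - g & \text{($q_i$ aligned with a gap)} \\
H(i,\,j-1) - g & \text{(gap aligned with $d_j$)}
\end{cases}
```

The only difference from the global alignment recurrence is the 0 term, which lets an alignment restart whenever its running score would become negative.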
9
Initialization – Removing Initial Gaps. Initial gaps, in either sequence, should be ignored, so the first row and the first column of the matrix are set to 0.
10
The Best Local Alignment. Negatively contributing suffixes of alignments should be ignored. The score of the best local alignment is the highest value in the dynamic programming matrix. The alignment is found by tracing back from the maximum value until a cell with value 0 (zero) has been reached.
11
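Calculating Best Local Alignment. Use the initialization (all zeros) to fill the first row and the first column. Use the recurrence relation to fill the rest row by row. [Figure: H matrix, marking the score of the best alignment (the maximum value), the cell with value 0 where the traceback stops, and the best alignment as the path between them.]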
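The fill order and traceback described on these slides can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name is hypothetical, and the scoring values (match +2, mismatch -1, linear gap penalty 2) are assumptions chosen so that random DNA has a negative expected score per aligned pair.

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=2):
    """Return (score, aligned_q, aligned_d) for one best local alignment.

    Scoring values are illustrative assumptions; gap is a positive
    penalty subtracted for every gap position (linear gap cost).
    """
    m, n = len(q), len(d)
    # First row and first column stay 0: initial gaps are ignored.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the rest of the matrix row by row.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(
                0,                    # empty alignment
                H[i - 1][j - 1] + s,  # q_i aligned with d_j
                H[i - 1][j] - gap,    # q_i aligned with a gap
                H[i][j - 1] - gap,    # gap aligned with d_j
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum until a cell with value 0 is reached.
    i, j = best_pos
    aligned_q, aligned_d = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if q[i - 1] == d[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_q.append(q[i - 1])
            aligned_d.append(d[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            aligned_q.append(q[i - 1])
            aligned_d.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_d.append(d[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_d))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Starting the traceback at the maximum cell drops any negatively contributing suffix, and stopping at a 0 cell drops any negatively contributing prefix. When several cells share the maximum, this sketch reports only the first one found in row order.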
12
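Time Complexity. Each cell of the matrix is filled in constant time from its three neighbours, so for sequences of lengths n and m the time is O(nm); for two sequences of length l it is O(l^2).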