Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.


1 Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics

2 Homology Search
Given a sequence q, does there exist a sequence d in a database D such that q and d are homologous?
One could perform a global pairwise alignment between q and each sequence in D, but:
- Maybe only a segment of q is highly (beyond random) similar to a segment of a database sequence
- Remote homology: only a motif is conserved
- Sequence/domain rearrangements: the sequences are not globally homologous, but share a domain
Local alignment (alignment of a segment of q with a segment of d) is therefore desirable.

3 Homology Search – Task
- Present all sequences in D that have segments homologous to segments in q
- Avoid presenting sequences in D that are not homologous
- For each local alignment, calculate the statistical probability that the alignment is "random" (not caused by an evolutionary relationship)

4 Definitions
- Segment: a contiguous subsequence (substring) of q or d
- Segment pair: a pair of segments, one from q and one from d (they need not be of the same length)
- Local alignment: an alignment of a segment pair

5 Dot Plot – Visualising Similarity
For sequences q (length m) and d (length n), construct an m x n matrix:
- Make a dot in cell (i,j) if q_i = d_j.
It is possible to filter the matrix:
- E.g., use a window of length K and make a dot in (i,j) only if at least C% of the characters are similar between the K-windows around (i,j)
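The dot-plot construction and its window filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dot_plot` and `render` are hypothetical, "similar" is taken to mean "identical", and windows that run past the end of a sequence are simply truncated. With `k=1` the filter reduces to the unfiltered rule q_i = d_j.

```python
def dot_plot(q, d, k=1, c=100.0):
    """Return an m x n matrix of booleans for sequences q and d.

    Cell (i, j) is True when at least c percent of the compared positions
    in the length-k windows centred on (i, j) are identical.
    Assumption: window positions outside either sequence are skipped.
    """
    m, n = len(q), len(d)
    half = k // 2
    plot = [[False] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            matches = 0
            compared = 0
            # Walk along the diagonal through (i, j).
            for offset in range(-half, k - half):
                a, b = i + offset, j + offset
                if 0 <= a < m and 0 <= b < n:
                    compared += 1
                    if q[a] == d[b]:
                        matches += 1
            plot[i][j] = compared > 0 and 100.0 * matches >= c * compared
    return plot


def render(plot):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in plot
    )


if __name__ == "__main__":
    print(render(dot_plot("ACGTACGT", "ACGTTACG", k=3, c=100.0)))
```

Larger values of `k` and `c` suppress isolated chance matches, so that only diagonal runs (shared segments) remain visible.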

6 Dot Plots are Easy to Interpret
Can identify, for instance, repeats.
Example:
- Human HPRT gene (genomic sequence)
- Dot if 8 identical bases
http://www.ansorge-group.embl.de/geneskipper/dotplot.htm

7 Dynamic Programming for Local Alignment (Smith & Waterman 1981)
Assumptions:
- The scoring matrix has "negative expectation"
- Gaps should decrease the alignment score (as before)
Consequences:
- A subalignment with negative score coming first (prefix) or last (suffix) can be removed to improve the alignment score
- Gaps should not be included unless the alignments on either side score high enough to make up for the gap penalty
[Figure: an alignment with its prefix and suffix marked]
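The first consequence can be illustrated on a single fixed alignment. Given the score of each alignment column, trimming negatively scoring prefixes and suffixes amounts to finding the highest-scoring contiguous run of columns. The sketch below does this with a running sum that is reset whenever it drops to zero or below; the function name `best_segment` and the example scores are hypothetical, and this is not the full Smith-Waterman algorithm, only the trimming idea it relies on.

```python
def best_segment(column_scores):
    """Return (score, start, end) of the best contiguous run of columns.

    `end` is exclusive. The empty run, with score 0, is returned when
    every non-empty run scores 0 or less.
    """
    best_score, best_start, best_end = 0, 0, 0
    running, run_start = 0, 0
    for idx, score in enumerate(column_scores):
        if running <= 0:
            # Everything before this column only hurts: drop it as a prefix.
            running, run_start = 0, idx
        running += score
        if running > best_score:
            # Columns after idx are (so far) a suffix not worth keeping.
            best_score, best_start, best_end = running, run_start, idx + 1
    return best_score, best_start, best_end


if __name__ == "__main__":
    scores = [-3, 2, 4, -1, 5, -6]  # made-up per-column scores
    print(sum(scores), best_segment(scores))
```

For the made-up scores in the usage lines, the whole alignment sums to 1, while dropping the first and last columns leaves a segment scoring 10.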

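8 Recurrence relation
H(i,j) is the maximum over four cases:
- q_i aligned with a gap, extending an alignment of q_1..i-1 and d_1..j
- q_i aligned with d_j, extending an alignment of q_1..i-1 and d_1..j-1
- d_j aligned with a gap, extending an alignment of q_1..i and d_1..j-1
- The empty alignment (score 0)
The empty-alignment case effectively allows for removal of negatively contributing prefixes.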
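Written out as a formula, the recurrence might look as follows. The symbols are assumptions, since the slide's own formula did not survive: H_{i,j} is the entry of the H matrix in row i and column j, s(a,b) is the substitution score from the scoring matrix, and g > 0 is a linear gap penalty (an affine penalty would need additional matrices).

```latex
H_{i,j} = \max
\begin{cases}
  H_{i-1,j} - g                & \text{($q_i$ aligned with a gap)} \\
  H_{i-1,j-1} + s(q_i, d_j)    & \text{($q_i$ aligned with $d_j$)} \\
  H_{i,j-1} - g                & \text{($d_j$ aligned with a gap)} \\
  0                            & \text{(empty alignment)}
\end{cases}
```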

9 Initialization – Removing Initial Gaps
Initial gaps, in either sequence, should be ignored, so the first row and the first column of the H matrix are set to 0: H(i,0) = 0 and H(0,j) = 0.

10 The Best Local Alignment
- Should ignore negatively contributing suffixes of alignments
- Score of the best local alignment: the highest value in the dynamic programming matrix
- The alignment is found by tracing back from the maximum value until a cell with value 0 (zero) has been reached

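11 Calculating Best Local Alignment
- Use the initialization to fill the first row and the first column
- Use the recurrence relation to fill the rest, row by row
[Figure: H matrix showing the score of the best alignment (the maximum cell) and the best alignment traced back from it to a cell with value 0]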
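Putting initialization, row-by-row filling, and traceback together, a minimal Smith-Waterman sketch could look like this. The scoring scheme is an assumption made for the example (match +2, mismatch -1, linear gap penalty 2) and stands in for the lecture's scoring matrix; the function name `smith_waterman` is likewise hypothetical.

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=2):
    """Return (score, aligned_q, aligned_d) for one best local alignment.

    Assumed scoring: `match`/`mismatch` per aligned pair and a linear
    gap penalty `gap` (subtracted for every gap position).
    """
    m, n = len(q), len(d)
    # Row 0 and column 0 stay 0: initial gaps are ignored.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the rest of the matrix row by row.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(
                0,                    # empty alignment
                H[i - 1][j - 1] + s,  # q_i aligned with d_j
                H[i - 1][j] - gap,    # q_i aligned with a gap
                H[i][j - 1] - gap,    # d_j aligned with a gap
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum until a cell with value 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while H[i][j] > 0:
        s = match if q[i - 1] == d[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(q[i - 1])
            bottom.append(d[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(q[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(d[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("TTACGTAA", "GGACGTCC"))
```

When several cells share the maximum value, this sketch keeps the first one met in row order, so it reports one best local alignment rather than all of them.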

12 Time Complexity
- Sequences of lengths n and m: O(nm)
- Two sequences of length l: O(l^2)

