Sequence Alignment. Assignment Read Lesk, 160-194 Problem: Given two sequences R and S of length n, how many alignments of R and S are possible? If you.


1 Sequence Alignment

2 Assignment Read Lesk, 160-194 Problem: Given two sequences R and S of length n, how many alignments of R and S are possible? If you don’t find an exact answer, how tight of a Big-O bound can you derive? (optional)

3 The ‘bio’ statement of the problem. Given two or more sequences: measure their similarity; establish a correspondence; find conserved and varied locations; infer evolutionary relationships.

4 The ‘cs’ statement Given two or more sequences: Establish an optimal residue-residue correspondence

5 Definition of alignment. An alignment is a set of correspondences between pairs of residues which preserves their order. Example:
a b c d e -
a - c d e f
Note: gaps are permitted in both sequences.

6 Definition of ‘optimal’. Requires a scoring system, which may include positive and negative values. The optimal alignment is the best (highest scoring) of all possible alignments. Question: given two sequences of length n, how many alignments are possible?

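7 Dot plots [Figure: dot plot with the sequence W H I R L I N G along one axis; the second axis and the marked cells are not recoverable from the transcript]

8 Dot plots [Figure: dot plot with the sequence W H I R L I N G along one axis, as on the previous slide]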

9 Dotplots and alignments Dotplots are visual representations of similarity Any path from upper left to lower right, using only S, E and SE moves, is an alignment
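The path view on this slide suggests one way to approach the counting question, assuming that every distinct path of S, E and SE moves is counted as a distinct alignment. The following is a minimal Python sketch under that assumption; the function name `count_paths` is illustrative and does not come from the slides.

```python
def count_paths(n, m):
    """Count paths from the upper-left to the lower-right corner of an
    (n+1) x (m+1) grid using only S (down), E (right) and SE (diagonal) moves."""
    # paths[i][j] = number of paths that reach cell (i, j).
    # Cells in the first row or first column can be reached in exactly one way.
    paths = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            paths[i][j] = (
                paths[i - 1][j]        # arrive by an S move
                + paths[i][j - 1]      # arrive by an E move
                + paths[i - 1][j - 1]  # arrive by an SE move
            )
    return paths[n][m]


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_paths(n, n))
```

For n = 1, 2 and 3 this gives 3, 13 and 63 paths, so the count grows quickly with sequence length.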

10 Edit distance. The minimal number of edit operations (insert/delete, change) needed to transform one sequence into another. Operations can be weighted: indels by length; transformations by type.
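As an illustration of the unweighted case, here is a short Python sketch of edit distance in which every insertion, deletion and change costs 1. The unit costs and the function name `edit_distance` are assumptions made for the example; the weighted variants mentioned on the slide would replace the constant 1 with a cost function.

```python
def edit_distance(r, s):
    """Minimal number of single-residue insertions, deletions and changes
    needed to turn r into s (all operations cost 1)."""
    # prev[j] = distance between the processed prefix of r and s[:j]
    prev = list(range(len(s) + 1))
    for i, a in enumerate(r, start=1):
        curr = [i]  # turning r[:i] into the empty string takes i deletions
        for j, b in enumerate(s, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from r
                curr[j - 1] + 1,         # insert b into r
                prev[j - 1] + (a != b),  # keep (cost 0) or change (cost 1)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("abcde", "acdef"))
```

For the two sequences used in the alignment example, `abcde` and `acdef`, the distance is 2: delete `b` and insert `f`.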

11 A weighted scheme. Transitions (a↔g, c↔t) are more common than transversions, so they score higher:
     a   g   c   t
a   20  10   5   5
g   10  20   5   5
c    5   5  20  10
t    5   5  10  20
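The idea behind such a scheme can be written as a small lookup function. In the Python sketch below, the values 20, 10 and 5 are the ones on the slide, read as identity, transition and transversion scores; the function name and the purine/pyrimidine sets are illustrative.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def substitution_score(x, y):
    """Score a pair of nucleotides: identity > transition > transversion."""
    if x == y:
        return 20  # identical residues
    pair = {x, y}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return 10  # transition: purine<->purine or pyrimidine<->pyrimidine
    return 5       # transversion: purine<->pyrimidine


if __name__ == "__main__":
    for x, y in [("a", "a"), ("a", "g"), ("c", "t"), ("a", "t")]:
        print(x, y, substitution_score(x, y))
```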

12 Gap penalties. For DNA alignment, CLUSTAL-W uses: +1 for a match; 0 for a mismatch; 10 for gap initiation; 0.1 for gap extension.
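A gap penalty with separate initiation and extension terms can be sketched as follows. The default values are the ones listed on the slide; the exact convention (here the extension term is charged for every gapped position, including the first) is an assumption of this sketch, and other conventions charge extension only from the second position on.

```python
def gap_penalty(length, initiation=10.0, extension=0.1):
    """Penalty for one contiguous gap of the given length.

    Assumed convention: initiation is paid once per gap,
    extension is paid for each gapped position.
    """
    if length <= 0:
        return 0.0
    return initiation + extension * length


if __name__ == "__main__":
    one_long_gap = gap_penalty(5)
    five_short_gaps = 5 * gap_penalty(1)
    print(one_long_gap, five_short_gaps)
```

With these values a single gap of length 5 is far cheaper than five separate gaps of length 1, which is the point of charging initiation and extension differently.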

13 Dynamic programming. Gives the global optimum. Takes O(nm) time for sequences of lengths n and m. Doesn’t distinguish among equal-scoring alignments.
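A compact Python sketch of global alignment by dynamic programming is shown below. It fills an (n+1) by (m+1) table, which is where the O(nm) running time comes from, and then traces back one optimal alignment. The scoring values (match +1, mismatch 0, a flat penalty of 1 per gapped position) and the function name `global_align` are assumptions chosen to keep the example short, not the scheme of any particular program.

```python
def global_align(r, s, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_r, aligned_s) for one optimal global alignment."""
    n, m = len(r), len(s)

    def pair(i, j):
        return match if r[i - 1] == s[j - 1] else mismatch

    # score[i][j] = best score for aligning r[:i] with s[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align r[i-1] with s[j-1]
                score[i - 1][j] + gap,             # gap in s
                score[i][j - 1] + gap,             # gap in r
            )

    # Trace back from the lower-right corner. Ties are broken in a fixed
    # order, so only one of the equal-scoring alignments is reported.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(r[i - 1]); bottom.append(s[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(r[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(s[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, a, b = global_align("abcde", "acdef")
    print(best)
    print(a)
    print(b)
```

The traceback makes the last point on the slide concrete: when several alignments share the best score, the one returned depends only on the order in which the three cases are tested.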

14 Variations on the question. Small sequence vs small sequence (how close are these two?). A small sequence against a very long sequence (is this gene’s relative in the database?). Closest subsequences (do these sequences share a motif?).
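The "closest subsequences" question is usually treated as local alignment. The sketch below is a minimal Smith-Waterman style scoring routine in Python, a technique not named on the slides; it differs from global alignment only in that a cell score is never allowed to fall below zero and the best cell anywhere in the table is reported. The scoring values and the function name `local_align_score` are assumptions for illustration.

```python
def local_align_score(r, s, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings of r and s."""
    best = 0
    prev = [0] * (len(s) + 1)
    for a in r:
        curr = [0]
        for j, b in enumerate(s, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # A score of 0 means "start a fresh local alignment here".
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_align_score("xxacgtyy", "zzacgtww"))
```

In the usage example the two sequences share only the motif `acgt`, and the reported score is that of aligning those four residues.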

15 Blast-style searches. Answers the ‘relative’ question. Heuristic (but statistically good, for the simplest model). Method: find local alignments; find paths close to local alignments.

16 P score Probability that alignment would arise by chance What if short vs long search gives a P-value of 10E-2? 10E-4?

17 Z-score, E-value. The Z-score is a measure of the ‘unlikelihood’ of a match, computed from a known mean and standard deviation. The E-value is the expected number of sequences that give the same Z-score or better with a random probe. E is the usual Blast statistic; E <= 0.02 is ‘good’.
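The two statistics can be made concrete with a few lines of Python. The Z-score formula follows directly from the description above. The conversion from E to P assumes that chance hits follow a Poisson distribution, which gives P = 1 - exp(-E); that model and the function names are assumptions of this sketch rather than something stated on the slides.

```python
import math


def z_score(score, mean, std_dev):
    """Number of standard deviations by which score exceeds the mean."""
    return (score - mean) / std_dev


def p_value_from_e(e_value):
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-e_value)


if __name__ == "__main__":
    print(z_score(70.0, 50.0, 10.0))
    print(p_value_from_e(0.02))
```

For small values the two numbers are nearly equal (E = 0.02 corresponds to P of about 0.0198), which is why E and P are often used interchangeably for good hits.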

18 The Blast family Blast Blastp (protein-protein) Blastx (nucleotide-protein) Tblastn (amino-nucleotide) Tblastx (n-n) Psi-blast (improved a-a)

