Genome Revolution: COMPSCI 004G 8.1 BLAST
• What is BLAST? What is it good for?
    Basic Local Alignment Search Tool
    Given a query (DNA or protein), find “matches”
    What is a match? How do we judge a good one?
• Two kinds of alignment, or matches
    Global alignment (sequence to sequence)
    Local alignment (subsequence to subsequence)
Global Alignment
• An example with words (see the O’Reilly BLAST book)
    Align ‘coelacanth’ and ‘pelican’
    Score +1 for match, -1 for mismatch, -1 for gap
        coelacanth
        p-elican--
        -pelican--
    What are the scores of these two alignments? What’s the best score?
    Needleman-Wunsch algorithm
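The two candidate alignments of 'coelacanth' and 'pelican' can be scored mechanically, one column at a time. The sketch below is only an illustration of the +1/-1/-1 rule from the slide; the function name `score_alignment` and its parameters are made up for this example.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score a finished alignment column by column.

    `top` and `bottom` are the two aligned rows, with '-' marking a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a letter paired with a gap
        elif x == y:
            total += match      # identical letters
        else:
            total += mismatch   # two different letters
    return total


for bottom in ("p-elican--", "-pelican--"):
    print(bottom, score_alignment("coelacanth", bottom))
```

Under this rule each alignment has five matches (e, l, c, a, n) and five penalties, so both score 0.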
Global Alignment [figure: scoring grid for COELACANTH against PELICAN]
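The grid for the global alignment can be filled in by a short program. What follows is a minimal sketch of Needleman-Wunsch under the slide's +1/-1/-1 scoring, assuming a single linear gap penalty; the function name and the tie-breaking order in the trace-back (diagonal first, then up, then left) are choices made for this example, not something the slides specify.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best global score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    # Each cell takes the best of: diagonal (pair two letters),
    # up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to the top-left corner.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("coelacanth", "pelican")
print(score)
print(top)
print(bottom)
```

Several different alignments can share the best score, so the rows printed depend on the tie-breaking order; the score itself does not.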
Local Alignment
• Subsequence alignment rather than global
    Advantages? Tradeoffs?
    Score +1 for match, -1 for mismatch, -1 for gap
        (co)ELACAN(th)
        (p)ELICAN
    Smith-Waterman: initialize to zero, never let a cell score drop below zero, trace back from the highest score
Local Alignment [figure: scoring grid for COELACANTH against PELICAN]
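The three Smith-Waterman rules (start at zero, never go below zero, trace back from the highest cell) can be sketched as follows. This is an illustrative version using the slide's +1/-1/-1 scoring; the function name and the tie-breaking order in the trace-back are assumptions made for this example.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best local score, aligned piece of a, aligned piece of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("coelacanth", "pelican"))
```

For these two words the best local alignment pairs 'elacan' with 'elican': five matches and one mismatch, for a score of 4.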
Analysis
• How long does this algorithm take to execute?
    How do we measure the complexity/size?
    Time vs. memory
• Do we need a different measure for gaps, matches, and mismatches?
    Just using +1 or -1 doesn’t provide domain-specific analysis
    In practice, use a scoring matrix; see the NCBI site
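One way to make the time versus memory question concrete: for sequences of lengths m and n, the grid has m × n interior cells and each is filled once, so the running time grows with m × n. If only the score is wanted (no trace-back), memory can be cut to two rows. The sketch below assumes the same +1/-1/-1 global scoring as earlier; the function name and the cell counter are made up for this illustration.

```python
def global_score_two_rows(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best global score, number of cells filled).

    Keeps only the previous and current rows, so memory grows with
    len(b) rather than len(a) * len(b). No alignment is recovered.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # diagonal
                          prev[j] + gap,     # up
                          curr[j - 1] + gap) # left
            cells += 1
        prev = curr
    return prev[-1], cells


print(global_score_two_rows("coelacanth", "pelican"))
```

For 'coelacanth' (10 letters) and 'pelican' (7 letters) the loop fills 70 cells. The trade-off is that recovering the alignment itself needs either the full grid or a more elaborate method.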
BLOSUM 62 scoring matrix
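A scoring matrix replaces the flat +1/-1 rule with a table lookup per pair of residues. The sketch below shows the idea with a handful of entries copied from the commonly distributed BLOSUM 62 table (I, L, V and W only); it is an excerpt for illustration, so check values against the NCBI copy before relying on them. The names `BLOSUM62_EXCERPT`, `pair_score`, and `ungapped_score` are hypothetical.

```python
# Excerpt only: a few BLOSUM 62 entries, keyed by unordered residue pair.
# A one-letter frozenset stands for a residue paired with itself.
BLOSUM62_EXCERPT = {
    frozenset("I"): 4,
    frozenset("L"): 4,
    frozenset("V"): 4,
    frozenset("W"): 11,
    frozenset("IL"): 2,
    frozenset("IV"): 3,
    frozenset("LV"): 1,
}


def pair_score(x, y, table=BLOSUM62_EXCERPT):
    """Look up the score for residues x and y (order does not matter)."""
    return table[frozenset((x, y))]   # KeyError if the pair is not in the excerpt


def ungapped_score(s, t, table=BLOSUM62_EXCERPT):
    """Sum the table scores over two equal-length, gap-free sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y, table) for x, y in zip(s, t))


print(ungapped_score("WIL", "WLL"))   # W/W + I/L + L/L
```

Unlike the +1/-1 rule, identical pairs do not all score the same (W with W scores 11, I with I scores 4), and swapping one similar residue for another (I for L, or I for V) still earns a positive score.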