Sequence alignment BI420 – Introduction to Bioinformatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu
Sequence alignment – Biology http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html Biologically significant sequence alignment
Sequence alignment – Biology http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html Biologically plausible sequence alignment
Sequence alignment – Biology http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html Spurious alignment Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison
Alignment types

How do we align the words CRANE and FRAME?

    CRANE
     || |
    FRAME

3 matches, 2 mismatches

How do we align words that differ in length?

    COELACANTH
      || |||
    P-ELICAN--

    COELACANTH
      || |||
    -PELICAN--

5 matches, 2 mismatches, 3 gaps

In this case, if we assign +1 point for each match and -1 for each mismatch or gap, we get 5 x 1 + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.

Examples from: BLAST. Korf, Yandell, Bedell
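The scoring rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `score_alignment` and its parameter names are assumptions, and the defaults are the slide's +1 for a match and -1 for a mismatch or gap.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two aligned strings of equal length; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a gap in either row
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # two different letters
    return score


print(score_alignment("CRANE", "FRAME"))            # 3 matches, 2 mismatches -> 1
print(score_alignment("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps -> 0
```

Both COELACANTH/PELICAN alignments shown above contain the same counts of matches, mismatches and gaps, so they receive the same score under this rule.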
Finding the “best” alignment

    COELACANTH
       | |||
    PE-LICAN--      S = -2

    COELACANTH
      ||
    P-EL-ICAN-      S = -6

    COELACANTH

    PELICAN---      S = -10

    COELACANTH
      || |||
    P-ELICAN--      S = 0
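One way to read this slide is as a ranking problem: score every candidate alignment and keep the highest. The sketch below does that for the four candidates shown, assuming +1 for a match and -1 for a mismatch or gap. The helper name `score` is hypothetical, and the third candidate is written with three trailing gaps so that both rows have ten columns.

```python
def score(top, bottom):
    # +1 when the two symbols are identical, -1 otherwise.
    # A gap ('-') opposite a letter is never identical, so it also costs -1.
    return sum(1 if a == b else -1 for a, b in zip(top, bottom))


reference = "COELACANTH"
candidates = ["PE-LICAN--", "P-EL-ICAN-", "PELICAN---", "P-ELICAN--"]

scores = {c: score(reference, c) for c in candidates}
best = max(scores, key=scores.get)

for candidate, s in scores.items():
    print(candidate, s)
print("best:", best)  # best: P-ELICAN--
```

Enumerating candidates by hand does not scale, since the number of possible alignments grows very quickly with sequence length. That is the motivation for dynamic-programming methods such as Needleman-Wunsch.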
Global alignment – Needleman-Wunsch
[Figure: dynamic-programming matrix. Surviving labels: first-row values -1 -2 -3 -4 -5 -6 -7 -8 -9 -10; row labels P, I]
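A minimal sketch of Needleman-Wunsch global alignment, assuming a linear gap penalty and the +1 / -1 / -1 scoring from the word examples. The function name and the tie-breaking order in the traceback (diagonal, then up, then left) are choices made for this illustration, not taken from the slides. With the shorter word as the first argument, row 0 of the matrix holds 0, -1, ..., -10, which agrees with the values that survive from the figure.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]

    # Borders: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill: each cell is the best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1

    return F[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


nw_score, nw_a, nw_b = needleman_wunsch("PELICAN", "COELACANTH")
print(nw_score)
print(nw_b)
print(nw_a)
```

Several alignments can tie for the optimal score, so the exact rows printed depend on the tie-breaking order. The score itself does not.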
Local alignment – Smith-Waterman
[Figure: dynamic-programming matrix. Surviving text: P 1 2 I 3 4]
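A minimal sketch of Smith-Waterman local alignment, again assuming +1 / -1 / -1 scoring with a linear gap penalty. The function name and tie-breaking order are assumptions for illustration. Compared with global alignment there are two differences: cell values are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # borders stay at 0
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,                          # start a new local alignment
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


sw_score, sw_a, sw_b = smith_waterman("COELACANTH", "PELICAN")
print(sw_score)  # 4
print(sw_a)      # ELACAN
print(sw_b)      # ELICAN
```

For these two words the local alignment drops the poorly matching ends and reports only the ELACAN / ELICAN core: five matches and one mismatch.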
Visualizing pair-wise alignments http://bioinfo.pbi.nrc.ca:8090/EMBOSS/index.html
Sequence similarity and scoring
Match-mismatch-gap penalties, e.g. Match = 1, Mismatch = -5, Gap = -10
Scoring matrices
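A scoring matrix generalizes match/mismatch penalties by giving every pair of symbols its own score. The sketch below builds the simplest possible matrix, a uniform one, from the example penalties on this slide and uses it to score an alignment. The names `uniform_matrix` and `score_with_matrix` are hypothetical. A real substitution matrix would replace the uniform values with pair-specific ones.

```python
from itertools import product


def uniform_matrix(alphabet, match=1, mismatch=-5):
    """Build a lookup table {(x, y): score} with a single match and mismatch value."""
    return {(x, y): (match if x == y else mismatch)
            for x, y in product(alphabet, repeat=2)}


def score_with_matrix(top, bottom, matrix, gap=-10):
    """Score two aligned rows: gaps cost `gap`, letter pairs are looked up in `matrix`."""
    total = 0
    for x, y in zip(top, bottom):
        total += gap if "-" in (x, y) else matrix[(x, y)]
    return total


alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
matrix = uniform_matrix(alphabet)

# 5 matches, 2 mismatches, 3 gaps: 5 x 1 + 2 x (-5) + 3 x (-10)
print(score_with_matrix("COELACANTH", "P-ELICAN--", matrix))  # -35
```

The same alignment that scores 0 under +1 / -1 / -1 scores -35 here, which shows how strongly the choice of penalties shapes which alignment counts as best.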
Multiple alignments
Anchored multiple alignment