Sequence alignment BI420 – Introduction to Bioinformatics


1 Sequence alignment BI420 – Introduction to Bioinformatics
BI420 Fall 2012 Department of Biology, Boston College

2 Biologically significant alignment
1. Find two evolutionarily related sequences (subunits of human hemoglobin) in GenBank: hba_human and hbb_human.
2. Save the sequences on the Desktop and rename them: hba_human.fasta and hbb_human.fasta.

3 Biologically significant alignment
3. Visit a web-based pair-wise alignment program:
4. Upload our two proteins:

4 Biologically significant alignment
5. Create a pair-wise alignment between the two protein sequences:

5 Biologically plausible alignment
Retrieve another sequence, leghemoglobin:
Create a pair-wise alignment with human hemoglobin A:

6 Biologically plausible alignment

7 Spurious alignment
Retrieve the sequence of a human BRCA1 gene variant, which is clearly not related to hemoglobin:
Make the pair-wise alignment:
Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison

8 How Alignment Works

9 Alignment types
How do we align the words CRANE and FRAME?
    CRANE
     || |
    FRAME
3 matches, 2 mismatches
How do we align words that are different in length?
    COELACANTH
      || |||
    P-ELICAN--
or
    COELACANTH
      || |||
    -PELICAN--
5 matches, 2 mismatches, 3 gaps
In this case, if we assign +1 point for each match and -1 for each mismatch or gap, we get 5 x (+1) + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.
Examples from: BLAST. Korf, Yandell, Bedell
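The counting on this slide can be checked with a short Python sketch. The function names `count_columns` and `simple_score` are illustrative choices, not part of the lecture; the scoring (+1 per match, -1 per mismatch or gap) is the one stated on the slide.

```python
def count_columns(top, bottom, gap="-"):
    """Return (matches, mismatches, gaps) for two already-aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1          # a column containing a gap character
        elif a == b:
            matches += 1       # identical letters
        else:
            mismatches += 1    # two different letters
    return matches, mismatches, gaps


def simple_score(top, bottom):
    """Score an alignment with +1 per match and -1 per mismatch or gap."""
    matches, mismatches, gaps = count_columns(top, bottom)
    return matches - mismatches - gaps


print(count_columns("CRANE", "FRAME"))            # (3, 2, 0)
print(count_columns("COELACANTH", "P-ELICAN--"))  # (5, 2, 3)
print(simple_score("COELACANTH", "P-ELICAN--"))   # 0
```

Both COELACANTH/PELICAN alignments on the slide have the same column counts, so they receive the same score under this scheme.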

10 Finding the “best” alignment
    COELACANTH
       | |||
    PE-LICAN--
S = -2
    COELACANTH
      ||
    P-EL-ICAN-
S = -6
    COELACANTH
    PELICAN---
S = -10
    COELACANTH
      || |||
    P-ELICAN--
S = 0

11 JACKALOPE ANTELOPE
More mismatches:
    JACKALOPE
    -ANTELOPE
More gaps:
    JACKA---LOPE
    ----ANTELOPE
Choice depends on score function
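To see how the score function drives the choice, the two JACKALOPE/ANTELOPE alignments can be scored under different penalties. This is a minimal sketch; the function `score` and the particular penalty values are assumptions chosen for illustration, not values from the lecture.

```python
def score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Sum per-column scores of two aligned strings ('-' marks a gap)."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


more_mismatches = ("JACKALOPE", "-ANTELOPE")      # 5 matches, 3 mismatches, 1 gap
more_gaps = ("JACKA---LOPE", "----ANTELOPE")      # 5 matches, 0 mismatches, 7 gaps

# Equal penalties favor the alignment with fewer gaps.
print(score(*more_mismatches), score(*more_gaps))                            # 1 -2

# A harsher (hypothetical) mismatch penalty favors the gapped alignment.
print(score(*more_mismatches, mismatch=-3), score(*more_gaps, mismatch=-3))  # -5 -2
```

Neither alignment is "correct" in isolation; which one ranks higher depends entirely on the relative cost of mismatches and gaps.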

12 Global vs. local alignment
Aligning words: SHAKE and SPEARE
1. Global alignment: aligning the two sequences along their entire length (even if it means adding many "gaps"):
    SH-AKE
    |  | |
    SPEARE
or
    SHAKE---
    |   |
    SP--EARE
-OR-
2. Local alignment: aligning only a nicely matching section between the two sequences (possibly leaving the ends unaligned):
     SHAKE
       | |
    SPEARE
SHAKE SPEARE
Example from: Higgs and Attwood

13 MATLAB example – global alignment
MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences
    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, ga] = nwalign(s1,s2)
    score = 7.3333
    ga =
    ACGA-TT
     ||| |:
    CCGACTA

14 MATLAB example – local alignment
MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences
    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, sa] = swalign(s1,s2)
    score = 10
    sa =
    CGATT
    ||| |
    CGACT

15 Score Function + gap score g = -6
Pair-wise amino-acid scores S(ai,bj) (PAM250 scoring scheme) plus gap score g. Example from: Higgs and Attwood

16 Global alignment – Needleman-Wunsch
Exact recursion scheme to calculate scores from already known scores:
    H(i,j) = best of:
        H(i-1,j-1) + S(ai,bj)   (diagonal)
        H(i-1,j) + g            (vertical)
        H(i,j-1) + g            (horizontal)
Here g is the gap score, which is negative (g = -6 in this example), so adding g penalizes a gap.
Example from: Higgs and Attwood

17 Global alignment – Needleman-Wunsch
Example: Align the two sequences SHAKE and SPEARE Example from: Higgs and Attwood

18 Global alignment – Needleman-Wunsch
Initialization (filling the top row and left column from gap scores): Example from: Higgs and Attwood
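The initialization step can be sketched in Python as follows. The gap score g = -6 comes from the score-function slide; the function name `init_global_matrix` and the choice of SHAKE for the rows and SPEARE for the columns are assumptions made for illustration.

```python
def init_global_matrix(seq_a, seq_b, gap=-6):
    """Build the score matrix and fill only the top row and left column."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        H[i][0] = i * gap   # seq_a prefix aligned entirely against gaps
    for j in range(1, cols):
        H[0][j] = j * gap   # seq_b prefix aligned entirely against gaps
    return H


H = init_global_matrix("SHAKE", "SPEARE")
print(H[0])                   # [0, -6, -12, -18, -24, -30, -36]
print([row[0] for row in H])  # [0, -6, -12, -18, -24, -30]
```

Each border cell holds the cost of aligning a prefix of one sequence against nothing but gaps, which is why the values are multiples of g.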

19 Global alignment – Needleman-Wunsch
Filling cell (1,1): Example from: Higgs and Attwood

20 Global alignment – Needleman-Wunsch
Filling the rest of the cells (i,j): Example from: Higgs and Attwood

21 Global alignment – Needleman-Wunsch
Tracing back to read out the alignment:
Best global alignment:
    S-HAKE
    SPEARE
Example from: Higgs and Attwood

22 Global alignment – Needleman-Wunsch
The Needleman-Wunsch procedure is exhaustive: through dynamic programming, every possible alignment is implicitly considered without being enumerated one by one. So it is guaranteed to find the best global alignment under the chosen score function. Example from: Higgs and Attwood
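The three steps of the procedure (initialization, filling, trace-back) can be put together in a compact Python sketch. To stay self-contained it assumes the simple +1/-1/-1 scoring from the COELACANTH/PELICAN slides instead of PAM250 with g = -6; the function name `needleman_wunsch` and its parameters are illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialization: the borders are runs of gaps.
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    # Filling: each cell is the best of diagonal, vertical, horizontal.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    # Trace-back from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return H[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
print(score)   # 0
print(top)
print(bottom)
```

The optimal score of 0 agrees with the best alignment shown earlier. Several alignments reach that score, so which one the trace-back prints depends on the order in which ties are broken. The matrix has (n+1) x (m+1) cells, so the work grows with the product of the sequence lengths rather than with the number of possible alignments.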

23 Local alignment – Smith-Waterman
The Smith-Waterman algorithm finds the optimal LOCAL alignment. It works similarly to the Needleman-Wunsch GLOBAL alignment algorithm. The recursion scheme changes in two ways:
1. If the best score for a cell is negative, we replace it by 0 (start over).
2. Gaps at the boundary are ignored: they get a score of 0.
    H(i,j) = best of:
        H(i-1,j-1) + S(ai,bj)   (diagonal)
        H(i-1,j) + g            (vertical)
        H(i,j-1) + g            (horizontal)
        0                       (start over)
As before, g is the negative gap score.
Example from: Higgs and Attwood

24 Local alignment – Smith-Waterman
Initialization Example from: Higgs and Attwood

25 Local alignment – Smith-Waterman
Initialization Example from: Higgs and Attwood

26 Local alignment – Smith-Waterman
Filling the cells Example from: Higgs and Attwood

27 Local alignment – Smith-Waterman
Trace-back: start from the cell with the highest score and follow the path back until a cell with score 0 is reached.
Best local alignment: SHAKE SPEARE
Example from: Higgs and Attwood
Exercise: Align the two sequences TTCAC and CTCAA using scores +1 for a match and -1 for either a gap or a mismatch.

28 Local alignment – Smith-Waterman
The Smith-Waterman procedure is also exhaustive: every possible local alignment is implicitly considered by the dynamic programming recursion. So it is guaranteed to find the best local alignment under the chosen score function. Example from: Higgs and Attwood
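As a worked illustration, the following Python sketch applies the local-alignment recursion to the exercise sequences TTCAC and CTCAA with the scores given there (+1 for a match, -1 for a gap or mismatch). The function name `smith_waterman` is an illustrative choice, and the tie-breaking order in the trace-back is an assumption.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # The borders stay at 0: leading gaps are not penalized.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start over
                          H[i - 1][j - 1] + s,  # diagonal
                          H[i - 1][j] + gap,    # vertical
                          H[i][j - 1] + gap)    # horizontal
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace-back from the highest-scoring cell until a 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTCAC", "CTCAA"))  # (3, 'TCA', 'TCA')
```

The only differences from the global procedure are the zero-filled borders, the extra 0 option in each cell, and the trace-back that starts at the maximum cell instead of the bottom-right corner.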

29 Example of a scoring matrix for Amino Acids
The scoring matrix describes the scores for amino acid matches/mismatches. Scores are affected by biochemical similarity of amino acids. Note: this is not an alignment matrix!

30 Similar algorithms can be used for multiple alignment
The multiple alignment of 24 hexokinase protein sequences from various species. However, real multiple alignment programs (e.g. ClustalW) are usually heuristic rather than exact.

31 Applications of Alignment

32 Alignment is used for mapping sequence reads to the genome

33 Alignment is used in similarity search
Alignment: determining how sequences have descended from a common ancestor.
Similarity search: determining which sequences are related to one another. Requires scoring of each alignment.
Figure labels: query, database

34 Alignment Exercises

35 Visualizing pair-wise alignments
Visit a web server running a dot-plotter: Upload hba_human and hbb_human, and create dot-plot:
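For intuition about what a dot-plotter draws, here is a minimal text-only sketch in Python. It marks every position where a letter of one sequence equals a letter of the other; real dot-plot servers typically add window sizes and score thresholds, which are omitted here. The function name `dot_plot` and the use of the words SHAKE and SPEARE as input are illustrative assumptions.

```python
def dot_plot(seq_a, seq_b, hit="*", miss="."):
    """Return one text row per letter of seq_a, one column per letter of seq_b."""
    return ["".join(hit if a == b else miss for b in seq_b) for a in seq_a]


print("  SPEARE")
for letter, row in zip("SHAKE", dot_plot("SHAKE", "SPEARE")):
    print(letter, row)
# Prints:
#   SPEARE
# S *.....
# H ......
# A ...*..
# K ......
# E ..*..*
```

Runs of marks along a diagonal indicate stretches that align without gaps; for two related proteins such as hba_human and hbb_human, a long diagonal would be expected to dominate the plot.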

36 MATLAB example MATLAB bioinformatics toolbox sequence analysis demo:
Aligning pairs of sequences

