Published by Άγνη Αναστασιάδης. Modified over 5 years ago.
1
Sequence alignment BI420 – Introduction to Bioinformatics
BI420 Fall 2012 Department of Biology, Boston College
2
Biologically significant alignment
1. Find two evolutionarily related sequences (subunits of human hemoglobin) in GenBank: hba_human and hbb_human.
2. Save the sequences on the Desktop and rename them: hba_human.fasta and hbb_human.fasta.
3
Biologically significant alignment
3. Visit a web-based pair-wise alignment program:
4. Upload our two proteins:
4
Biologically significant alignment
5. Create a pair-wise alignment between the two protein sequences:
5
Biologically plausible alignment
Retrieve another sequence, leghemoglobin: Leghemoglobin Create a pair-wise alignment with human hemoglobin A:
6
Biologically plausible alignment
7
Spurious alignment
Retrieve the sequence of a human BRCA1 gene variant, clearly not related to hemoglobin:
Make the pair-wise alignment:
Examples from: Biological sequence analysis. Durbin, Eddy, Krogh, Mitchison
8
How Alignment Works
9
Alignment types
How do we align the words CRANE and FRAME?

    CRANE
     || |
    FRAME

3 matches, 2 mismatches

How do we align words that are different in length?

    COELACANTH
      || |||
    P-ELICAN--

    COELACANTH
      || |||
    -PELICAN--

5 matches, 2 mismatches, 3 gaps

In this case, if we assign +1 for each match and -1 for each mismatch or gap, we get 5 x (+1) + 2 x (-1) + 3 x (-1) = 0. This is the alignment score.
Examples from: BLAST. Korf, Yandell, Bedell
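The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `alignment_score` and its parameters are assumptions made for this sketch, and it expects two strings that are already aligned, with `-` marking a gap.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Score two already-aligned strings of equal length ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # a letter paired with a gap
        elif a == b:
            score += match      # identical letters
        else:
            score += mismatch   # two different letters
    return score


print(alignment_score("CRANE", "FRAME"))            # 3 matches, 2 mismatches
print(alignment_score("COELACANTH", "P-ELICAN--"))  # 5 matches, 2 mismatches, 3 gaps
```

With the default +1/-1/-1 values, the COELACANTH/PELICAN alignment scores 0, as computed by hand above.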
10
Finding the “best” alignment
    COELACANTH
       | |||
    PE-LICAN--     S = -2

    COELACANTH
      ||
    P-EL-ICAN-     S = -6

    COELACANTH

    PELICAN---     S = -10

    COELACANTH
      || |||
    P-ELICAN--     S = 0
11
JACKALOPE ANTELOPE

    JACKALOPE          JACKA---LOPE
    -ANTELOPE          ----ANTELOPE
    More mismatches    More gaps

The choice depends on the score function.
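To see how the choice depends on the score function, the sketch below scores both JACKALOPE/ANTELOPE alignments under two scoring schemes. The second scheme (mismatch -3) is a hypothetical choice made only for illustration, and `alignment_score` is a helper name assumed for this sketch.

```python
def alignment_score(top, bottom, match, mismatch, gap):
    """Score two already-aligned strings of equal length ('-' marks a gap)."""
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


more_mismatches = ("JACKALOPE", "-ANTELOPE")
more_gaps = ("JACKA---LOPE", "----ANTELOPE")

# (match, mismatch, gap) triples; the second one is a made-up example scheme.
schemes = {
    "mismatch -1, gap -1": (1, -1, -1),
    "mismatch -3, gap -1": (1, -3, -1),
}

for name, (m, mm, g) in schemes.items():
    s1 = alignment_score(*more_mismatches, m, mm, g)
    s2 = alignment_score(*more_gaps, m, mm, g)
    winner = "more mismatches" if s1 > s2 else "more gaps"
    print(f"{name}: {s1} vs {s2} -> prefer the alignment with {winner}")
```

When mismatches cost the same as gaps, the first alignment wins. When mismatches are much more expensive, the gapped alignment wins.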
12
Global vs. local alignment
Aligning words: SHAKE and SPEARE

1. Global alignment: aligning the two sequences along their entire length (even if it means adding many "gaps"):

    SH-AKE
    |  | |
    SPEARE

    -OR-

    SHAKE---
    |   |
    SP--EARE

2. Local alignment: aligning only a nicely matching section between the two sequences (possibly leaving the ends un-aligned):

     SHAKE
       | |
    SPEARE

    SHAKE
    SPEARE

Example from: Higgs and Attwood
13
MATLAB example – global alignment
MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences

    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, ga] = nwalign(s1,s2)
    score =
        7.3333
    ga =
    ACGA-TT
     ||| |:
    CCGACTA
14
MATLAB example – local alignment
MATLAB bioinformatics toolbox sequence analysis demo: Aligning pairs of sequences

    >> s1 = 'ACGATT'
    >> s2 = 'CCGACTA'
    >> [score, sa] = swalign(s1,s2)
    score =
        10
    sa =
    CGATT
    ||| |
    CGACT
15
Score Function + gap score g = -6
Pair-wise amino-acid scores S(ai,bj) (PAM250 scoring scheme) plus gap score g. Example from: Higgs and Attwood
16
Global alignment – Needleman-Wunsch
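Exact recursion scheme to calculate scores from already known scores:

\[
H(i,j) = \max
\begin{cases}
H(i-1,j-1) + S(a_i,b_j) & \text{diagonal} \\
H(i-1,j) + g & \text{vertical} \\
H(i,j-1) + g & \text{horizontal}
\end{cases}
\]

Here g is the (negative) gap score, e.g. g = -6, so every gap lowers the score.
Example from: Higgs and Attwood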
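A minimal Python sketch of this recursion follows. It makes two assumptions that are not from the slides: a simple +1/-1 match/mismatch score stands in for the PAM250 table, and the gap value is a negative score that is added to the neighbouring cell. The name `nw_matrix` is chosen only for this sketch.

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the Needleman-Wunsch score matrix H for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # Top row and left column: a leading run of gaps.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    # Every other cell is the best of its three already-known neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,  # diagonal: align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # vertical: a[i-1] against a gap
                H[i][j - 1] + gap,    # horizontal: b[j-1] against a gap
            )
    return H


H = nw_matrix("COELACANTH", "PELICAN")
print(H[-1][-1])  # bottom-right cell holds the best global score
```

Under this simple scheme the bottom-right cell for COELACANTH and PELICAN is 0, the same score as the best hand-made alignment of these two words.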
17
Global alignment – Needleman-Wunsch
Example: Align the two sequences SHAKE and SPEARE Example from: Higgs and Attwood
18
Global alignment – Needleman-Wunsch
Initialization (filling the top row and left column from gap scores): Example from: Higgs and Attwood
19
Global alignment – Needleman-Wunsch
Filling cell (1,1): Example from: Higgs and Attwood
20
Global alignment – Needleman-Wunsch
Filling the rest of the cells (i,j): Example from: Higgs and Attwood
21
Global alignment – Needleman-Wunsch
Tracing back to read out the alignment:
Best global alignment:

    S-HAKE
    SPEARE

Example from: Higgs and Attwood
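The fill and trace-back steps can be combined in one short Python sketch. It is an illustration under stated assumptions: scores are +1/-1 with a gap score of -1 (not the PAM250 values used in the slides, so the SHAKE/SPEARE result may differ from the one shown), ties are broken in favour of the diagonal move, and the name `needleman_wunsch` is chosen for this sketch.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)

    # Trace back from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:   # came from the diagonal
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and H[i][j] == H[i - 1][j] + gap:  # came from above
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                       # came from the left
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return H[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("SHAKE", "SPEARE")
print(score)
print(top)
print(bottom)
```

Several alignments can share the best score. This sketch reports only one of them.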
22
Global alignment – Needleman-Wunsch
The Needleman-Wunsch procedure is exhaustive. Every possible alignment is implicitly considered by the algorithm, without being listed one by one, so it is guaranteed to find the best global alignment for the chosen score function. Example from: Higgs and Attwood
23
Local alignment – Smith-Waterman
The Smith-Waterman algorithm finds the optimal LOCAL alignment. It works similarly to the Needleman-Wunsch GLOBAL alignment algorithm. The recursion scheme changes in two ways:
1. If the best score for a cell is negative, we replace it by 0 (start over).
2. Gaps at the boundary are ignored: they get a score of 0.

\[
H(i,j) = \max
\begin{cases}
H(i-1,j-1) + S(a_i,b_j) & \text{diagonal} \\
H(i-1,j) + g & \text{vertical} \\
H(i,j-1) + g & \text{horizontal} \\
0 & \text{start over}
\end{cases}
\]

Here g is the (negative) gap score, e.g. g = -6.
Example from: Higgs and Attwood
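The two changes can be seen in a small Python sketch of the matrix fill. As before, it assumes a simple +1/-1 match/mismatch score and a gap score of -1 that is added to the neighbouring cell; the name `sw_matrix` is chosen for this sketch.

```python
def sw_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman matrix; return (H, best_score, best_cell)."""
    # Boundary cells stay at 0: leading gaps are not penalised.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start over instead of going negative
                H[i - 1][j - 1] + s,  # diagonal
                H[i - 1][j] + gap,    # vertical
                H[i][j - 1] + gap,    # horizontal
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return H, best, best_cell


H, best, cell = sw_matrix("SHAKE", "SPEARE")
print(best, cell)
```

The highest-scoring cell is recorded during the fill because the local trace-back starts there rather than in the bottom-right corner.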
24
Local alignment – Smith-Waterman
Initialization Example from: Higgs and Attwood
25
Local alignment – Smith-Waterman
Initialization Example from: Higgs and Attwood
26
Local alignment – Smith-Waterman
Filling the cells Example from: Higgs and Attwood
27
Local alignment – Smith-Waterman
Trace-back: start from the cell with the highest score and follow the path back until a cell with score 0 is reached.
Best local alignment:

    SHAKE
    SPEARE

Example from: Higgs and Attwood
Example: Align the two sequences TTCAC and CTCAA using scores of +1 for a match and -1 for either a gap or a mismatch.
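The TTCAC/CTCAA example can be worked through with the sketch below, which uses the stated scores (+1 match, -1 mismatch or gap). The function name `smith_waterman` and the tie-breaking order (diagonal first, first highest cell found) are assumptions of this sketch.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the highest-scoring cell until a 0 is reached.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:      # vertical
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                                   # horizontal
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTCAC", "CTCAA"))
```

For these two sequences the best local alignment is the shared stretch TCA, with a score of 3.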
28
Local alignment – Smith-Waterman
The Smith-Waterman procedure is also exhaustive. Every possible local alignment is implicitly considered by the algorithm, so it is guaranteed to find the best local alignment for the chosen score function. Example from: Higgs and Attwood
29
Example of a scoring matrix for Amino Acids
The scoring matrix describes the scores for amino acid matches/mismatches. Scores are affected by biochemical similarity of amino acids. Note: this is not an alignment matrix!
30
Similar algorithms can be used for multiple alignment
The multiple alignment of 24 hexokinase protein sequences from various species. However, real multiple alignment programs (e.g. ClustalW) are usually heuristic rather than exact.
31
Applications of Alignment
32
Alignment is used for mapping sequence reads to the genome
33
Alignment is used in similarity search
Alignment: determining how sequences have descended from a common ancestor.
Similarity search: determining which sequences are related to one another. Requires scoring of each alignment.
[Figure: a query sequence compared against a database]
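A toy version of similarity search can be sketched as follows: score the query against every database sequence with a local alignment, then rank the hits. The database entries and their names (`seq_a`, `seq_b`, `seq_c`) are made up for illustration, and the +1/-1/-1 scores are an assumption. Real search tools use heuristics rather than a full Smith-Waterman pass over every entry.

```python
def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database):
    """Score the query against each entry and return hits, best first."""
    hits = [(name, sw_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database, for illustration only.
database = {
    "seq_a": "COELACANTH",
    "seq_b": "PELICANS",
    "seq_c": "JACKALOPE",
}

for name, score in search("PELICAN", database):
    print(name, score)
```

The entry that contains the whole query comes out on top, because each alignment is scored and the scores are compared.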
34
Alignment Exercises
35
Visualizing pair-wise alignments
Visit a web server running a dot-plotter: Upload hba_human and hbb_human, and create dot-plot:
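The idea behind a dot-plot can be sketched in plain text: put one sequence along the rows and the other along the columns, and mark every cell where the two letters are identical. This is a minimal sketch with an assumed function name `dot_plot`. Web-based dot-plotters typically add window and threshold filtering, which is not shown here.

```python
def dot_plot(a, b):
    """One text row per letter of a; '*' where it equals the letter of b."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


a, b = "SHAKE", "SPEARE"
print("  " + b)
for letter, row in zip(a, dot_plot(a, b)):
    print(letter, row)
```

Related stretches of two sequences show up as diagonal runs of marks, which is why the hemoglobin pair gives a visible diagonal.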
36
MATLAB example MATLAB bioinformatics toolbox sequence analysis demo:
Aligning pairs of sequences