1
Pairwise Sequence Alignment
LESSON 3(2)
2
HOMEWORK 2: Try a pairwise alignment of human alpha and beta globin at the NCBI protein BLAST site, using each of the available matrices (PAM30, PAM70, PAM250, BLOSUM45, BLOSUM62, BLOSUM80). Which matrix gives the highest bit score?
3
Protein alignment vs. DNA alignment
Protein alignment can be more informative than DNA alignment. BUT ……
4
Percentage identity (% ID)
Without gaps:
CCATCAAGTCC
CCATGTACAGAGTCC
5/15 = 33 %
With gaps:
CCAT---CA-AGTCC
CCATGTACAGAGTCC
11/15 = 73 %
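The two percentages above can be reproduced with a short sketch. The function name `percent_identity` and the choice to divide by the number of columns of the longer row are assumptions made for illustration; other definitions of % ID use a different denominator.

```python
def percent_identity(row1: str, row2: str) -> float:
    """Percent of alignment columns that hold the same residue in both rows."""
    # Pad the shorter row with gaps so both rows cover the same columns.
    width = max(len(row1), len(row2))
    row1 = row1.ljust(width, "-")
    row2 = row2.ljust(width, "-")
    # A column counts as identical only if both rows hold the same non-gap letter.
    matches = sum(1 for a, b in zip(row1, row2) if a == b and a != "-")
    return 100.0 * matches / width


print(round(percent_identity("CCATCAAGTCC", "CCATGTACAGAGTCC")))      # ungapped rows
print(round(percent_identity("CCAT---CA-AGTCC", "CCATGTACAGAGTCC")))  # gapped rows
```

Inserting four gap characters raises the count of identical columns from 5 to 11 out of 15.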
5
CCATCAAGTCC
CCATGTACAGAGTCC

CCAT---CA-AGTCC
CCATGTACAGAGTCC
6
CCAT---CA-AGTCC
CCATGTACAGAGTCC
Dotplot
[Figure: dotplot of the two sequences]
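A dotplot places one sequence along each axis and marks every cell where the two residues are identical; aligned stretches show up as diagonal runs, and gaps as jumps between diagonals. The following is a minimal text-mode sketch; the function name `dotplot` and the `*` and `.` characters are illustrative choices, not taken from the slide.

```python
def dotplot(horizontal: str, vertical: str, mark: str = "*") -> list[str]:
    """Return one text row per residue of `vertical`; `mark` flags identical residues."""
    return [
        "".join(mark if v == h else "." for h in horizontal)
        for v in vertical
    ]


seq_top = "CCATGTACAGAGTCC"
seq_side = "CCATCAAGTCC"
print("  " + seq_top)
for residue, row in zip(seq_side, dotplot(seq_top, seq_side)):
    print(residue + " " + row)
```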
7
Scoring Matrices
CCATCAAGTCC
CCATGTACAGA
Identity matrix (e.g. match = 1 and mismatch = −1)
Substitution matrix
8
A transition (a purine becomes another purine, or a pyrimidine becomes another pyrimidine) happens frequently.
A transversion (a purine becomes a pyrimidine, or a pyrimidine becomes a purine) occurs far less frequently.
Purines: A, G. Pyrimidines: C, T.
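A nucleotide substitution matrix can reflect this difference by penalizing transversions more heavily than transitions. The sketch below classifies a substitution (covering purine-to-purine as well as pyrimidine-to-pyrimidine transitions) and scores it; the score values in `SCORES` are hypothetical and chosen only to illustrate the idea.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores for illustration: transversions cost more than transitions.
SCORES = {"identity": 1, "transition": -1, "transversion": -2}


def classify_substitution(a: str, b: str) -> str:
    """Label a pair of DNA bases as identity, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "identity"
    # Both bases in the same chemical class means a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_score(a: str, b: str) -> int:
    """Score a pair of bases using the hypothetical SCORES table."""
    return SCORES[classify_substitution(a, b)]


for pair in ("AG", "CT", "AC", "GT", "AA"):
    print(pair, classify_substitution(*pair), substitution_score(*pair))
```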
9
Codons are degenerate: changes in the third position often do not alter the amino acid that is specified.
10
DNA Alignments are appropriate
To confirm
To study polymorphism
To study non-coding regions of DNA
11
DNA Alignments for Finding regulatory elements in DNA sequences
Non-coding DNA is full of regulatory elements, which give rise to the differences between organisms.
Each gene is associated with thousands of nucleotides of non-coding DNA.
12
Best alignment
Generate all possible gapped alignments.
Find the score for each.
Select the highest-scoring alignment.
Time consuming: for 100 a.a., about 10^75 alignments.
Dynamic programming algorithm
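To see why exhaustive enumeration is impractical, the number of gapped alignments can itself be counted by a small recurrence: every alignment ends in a residue-residue column, a gap in the first sequence, or a gap in the second. The sketch below uses that counting convention, which is an assumption (alignments that differ only in the order of adjacent gaps are counted separately); `count_alignments` is an illustrative name.

```python
def count_alignments(n: int, m: int) -> int:
    """Count gapped alignments of two sequences with n and m residues."""
    # counts[i][j] = number of alignments of the first i and first j residues.
    # With one sequence empty there is exactly one alignment (all gaps).
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column pairs two residues
                + counts[i - 1][j]    # last column has a gap in the second sequence
                + counts[i][j - 1]    # last column has a gap in the first sequence
            )
    return counts[n][m]


total = count_alignments(100, 100)
print(f"about 10^{len(str(total)) - 1} alignments")
```

Under this convention the count for two 100-residue sequences is on the order of 10^75, while the recurrence that produces it needs only about 100 × 100 additions, which is the same saving that dynamic programming brings to finding the best alignment.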
13
Global Sequence Alignment: Needleman and Wunsch Algorithm
14
Match : +1 Mismatch : -1 Gap : -2

GGTT
GAT-
= -1

GG-TT
-GAT-
= -4

GGTT
G-AT
= -1

Introducing gaps greatly increases the number of different comparisons between two sequences, and in the general case it is not feasible to evaluate them all.
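The three scores on this slide follow directly from the scoring scheme (match +1, mismatch -1, gap -2). A minimal sketch that scores an already gapped alignment column by column; `score_alignment` is an illustrative name.

```python
def score_alignment(row1: str, row2: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of two equal-length aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same number of columns")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += gap       # a residue opposite a gap
        elif a == b:
            total += match     # identical residues
        else:
            total += mismatch  # different residues
    return total


for top, bottom in [("GGTT", "GAT-"), ("GG-TT", "-GAT-"), ("GGTT", "G-AT")]:
    print(top, bottom, score_alignment(top, bottom))
```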
15
Alignment by Dynamic Programming
Global alignment: Needleman & Wunsch (1970), used in major alignment software packages (e.g. the ALIGN tool in the FASTA package)
Local alignment: Smith & Waterman algorithm (1981)
16
[Figure: alignment path annotated “mismatch”, “gap”, “gap”]
17
Four possible outcomes in aligning two sequences
[1] identity (stay along a diagonal)
[2] mismatch (stay along a diagonal)
[3] gap in sequence 1 (move vertically)
[4] gap in sequence 2 (move horizontally)
18
Global Alignment by Dynamic Programming
GGTT
GAT
Match : +1 Mismatch : -1 Gap : -2
[Matrix: GGTT across the top, GAT down the side]
19
Fill in the matrix using “dynamic programming”
20
Dynamic programming: the three ways to leave a cell
→ (Rightward): insert a gap in the vertical sequence
↓ (Downward): insert a gap in the horizontal sequence
↘ (Diagonal): match or mismatch
[Matrix: the first row holds the cumulative gap penalties -2, -4, -6, -8]
21
Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
      -   G   G   T   T
  -   0  -2  -4  -6  -8
  G  -2  +1
22
Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
      -   G   G   T   T
  -   0  -2  -4  -6  -8
  G  -2  +1  -1
↓ : -4 - 2 = -6
→ : +1 - 2 = -1
↘ : -2 + 1 = -1
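The three candidate values for this cell, and the choice of the largest, can be written out as a small sketch. The function name `cell_score` and its argument names are illustrative; `diagonal`, `above`, and `left` are the already-filled neighbouring cells.

```python
def cell_score(diagonal: int, above: int, left: int, is_match: bool,
               match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return the best score for a cell and the three candidates it was chosen from."""
    candidates = {
        "diagonal": diagonal + (match if is_match else mismatch),
        "down": above + gap,   # arriving from the cell above
        "right": left + gap,   # arriving from the cell to the left
    }
    return max(candidates.values()), candidates


# The cell worked on the slide: second G of GGTT against the G of GAT.
best, candidates = cell_score(diagonal=-2, above=-4, left=1, is_match=True)
print(candidates, "best:", best)
```

Two candidates tie at -1 here, so more than one traceback pointer can be stored for the cell.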
23
Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
      -   G   G   T   T
  -   0  -2  -4  -6  -8
  G  -2  +1  -1  -3  -5
The value in the bottom-right cell of the completed matrix is the final alignment score.
24
Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
Traceback pointers are followed from the bottom-right cell back to the top-left cell to read off the alignment:
GGTT
G-AT
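Putting the fill and traceback steps together, a compact sketch of global alignment in the style of Needleman and Wunsch is shown below. It uses the slide's linear gap penalty; the function name and the tie-breaking order during traceback (diagonal first, then leftward, then upward) are assumptions, so it may return a different equally scoring alignment than the one drawn on the slide.

```python
def needleman_wunsch(horizontal: str, vertical: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2):
    """Fill the global alignment matrix and trace back one optimal alignment."""
    rows, cols = len(vertical) + 1, len(horizontal) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: cumulative gap penalties.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    # Each inner cell takes the best of the diagonal, downward, and rightward moves.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if vertical[i - 1] == horizontal[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if vertical[i - 1] == horizontal[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            top.append(horizontal[j - 1]); bottom.append(vertical[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and score[i][j] == score[i][j - 1] + gap:
            top.append(horizontal[j - 1]); bottom.append("-")  # gap in vertical sequence
            j -= 1
        else:
            top.append("-"); bottom.append(vertical[i - 1])    # gap in horizontal sequence
            i -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


matrix, top, bottom = needleman_wunsch("GGTT", "GAT")
for row in matrix:
    print(row)
print(top)
print(bottom)
print("final alignment score:", matrix[-1][-1])
```

For GGTT and GAT the bottom-right cell holds -1, the same score as the alignment GGTT / G-AT shown on the slide.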
28
Local Alignment : Smith and Waterman Algorithm
29
Fail to identify functionally important residues
30
Global vs. Local
Global alignments: comparing sequences over their entire length
Local alignments: comparing sequences with partial homology; making high-quality alignments
31
Global alignment (top) includes matches ignored by local alignment (bottom)
[Figure: global alignment, 15% identity (top); local alignment, 30% identity (bottom); sequences NP_824492 and NP_337032]
32
Domain
Parts of a sequence / particular functional site
Sequence-structure-function relation
33
Local Alignments
Only align the most similar portions of sequences
To look for small parts of the sequences that are similar to each other
Searching for functionally related sequences
Programs for database searching: FASTA, BLAST
34
Alignments by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
S1 = GCCCTAGCG
S2 = GCGCAATG

Needleman-Wunsch method (global alignment)
GCCCTAGCG
|| | |  |
GCGC-AATG

Smith-Waterman method (local alignment)
GCCCTAGCG
      |||
      GCGCAATG
35
Smith-Waterman method
Dynamic programming algorithm for performing local sequence alignment.
Traces only continue as long as the scores are positive. Whenever a score becomes negative it is set to 0.
H(i,j) = max { diagonal, horizontal, vertical, 0 (start again) }
No values in the scoring matrix can be negative: H ≥ 0.
36
Needleman-Wunsch methods (Global Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
|| | |  |
GCGC-AATG
37
Smith-Waterman methods (Local Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
      |||
      GCGCAATG
38
The highest-scoring cell does not need to be at the bottom right-hand corner; it could be anywhere in the matrix. The backtracing procedure begins at the highest-scoring point in the matrix and follows the arrows back until a 0 is reached.
GCCCTAGCG
      |||
      GCGCAATG
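The local procedure differs from the global one in two places: every cell is clamped at 0, and the traceback starts at the highest-scoring cell and stops at the first 0. A minimal sketch in the style of Smith and Waterman follows, using the slide's scoring scheme with a linear gap penalty; the function name and the tie-breaking order are assumptions.

```python
def smith_waterman(seq1: str, seq2: str, match: int = 1,
                   mismatch: int = -1, gap: int = -2):
    """Return the best local alignment score and the two aligned segments."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # The extra 0 lets an alignment start again at any cell.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Traceback from the highest-scoring cell until a 0 is reached.
    i, j = best_pos
    part1, part2 = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            part1.append(seq1[i - 1]); part2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            part1.append(seq1[i - 1]); part2.append("-")
            i -= 1
        else:
            part1.append("-"); part2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(part1)), "".join(reversed(part2))


score, part1, part2 = smith_waterman("GCCCTAGCG", "GCGCAATG")
print(score)
print(part1)
print(part2)
```

For the two example sequences the highest cell scores 3 and the traceback recovers GCG aligned with GCG: the last three residues of GCCCTAGCG against the first three of GCGCAATG.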