Pairwise Sequence Alignment


1 Pairwise Sequence Alignment
LESSON 3(2)

2 HOMEWORK2 Try a pairwise alignment of human alpha and beta globin at the NCBI protein BLAST site, using the available matrices (PAM30, PAM70, PAM250, BLOSUM45, BLOSUM62, BLOSUM80). Which gives the highest bit score?

3 Protein alignment vs. DNA alignment
Protein alignment can be more informative than DNA alignment. But ……

4 Percentage identity (% ID)
Without gaps:
CCATCAAGTCC
CCATGTACAGAGTCC
5/15 = 33 %

With gaps:
CCAT---CA-AGTCC
CCATGTACAGAGTCC
11/15 = 73 %
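The two percentages above can be checked with a few lines of Python. This is a minimal sketch: the function name `percent_identity` is hypothetical, and dividing by the longer of the two strings is an assumption chosen to match the slide's denominator of 15 (other conventions divide by the shorter sequence or by the number of aligned columns).

```python
def percent_identity(top, bottom):
    """Identical, non-gap columns divided by the length of the longer string."""
    matches = sum(a == b and a != "-" for a, b in zip(top, bottom))
    return 100.0 * matches / max(len(top), len(bottom))

# Ungapped: only the first 11 columns can be compared.
print(round(percent_identity("CCATCAAGTCC", "CCATGTACAGAGTCC")))
# Gapped: the same sequences after inserting gaps in the shorter one.
print(round(percent_identity("CCAT---CA-AGTCC", "CCATGTACAGAGTCC")))
```

The first call reproduces the 33 % and the second the 73 % shown on the slide.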

5 CCATCAAGTCC
CCATGTACAGAGTCC

CCAT---CA-AGTCC
CCATGTACAGAGTCC

6 Dotplot
CCAT---CA-AGTCC
CCATGTACAGAGTCC
[Figure: dotplot of the two sequences, with the nucleotides C, A, T, G along the axes]
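A dotplot is easy to sketch in code: put one sequence along each axis and mark every cell where the two residues are identical. The function name `dotplot` and the `*` / `.` symbols are illustrative choices, not taken from the slide.

```python
def dotplot(horizontal, vertical):
    """Return one text row per residue of `vertical`; '*' marks an identity."""
    rows = []
    for v in vertical:
        rows.append("".join("*" if v == h else "." for h in horizontal))
    return rows

horizontal, vertical = "CCATGTACAGAGTCC", "CCATCAAGTCC"
print("  " + horizontal)
for base, row in zip(vertical, dotplot(horizontal, vertical)):
    print(base, row)
```

Runs of `*` along a diagonal correspond to stretches that align without gaps; a jump from one diagonal to another corresponds to a gap.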

7 Scoring Matrices
CCATCAAGTCC
CCATGTACAGA
Identity matrix (e.g. match = 1 and mismatch = −1)
Substitution matrix
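As an illustration of an identity matrix, the sketch below builds one as a Python dictionary and scores the two ungapped sequences from this slide column by column. The helper name `identity_matrix` is hypothetical; the values +1 and −1 are the example values given above.

```python
def identity_matrix(alphabet="ACGT", match=1, mismatch=-1):
    """Score every ordered pair of letters: `match` if equal, else `mismatch`."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}

scores = identity_matrix()
top, bottom = "CCATCAAGTCC", "CCATGTACAGA"
# Sum the score of each aligned pair of nucleotides (no gaps here).
total = sum(scores[pair] for pair in zip(top, bottom))
print(total)
```

A substitution matrix has the same shape; only the values differ, so that some mismatches cost less than others.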

8 Transitions and transversions
A transition (a purine becomes another purine, A ↔ G, or a pyrimidine becomes another pyrimidine, C ↔ T) happens frequently. A transversion (a purine becomes a pyrimidine, or a pyrimidine becomes a purine) occurs far less frequently.
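The distinction can be written down as a small classifier. This is only a sketch; the function name `substitution_type` is made up for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a, b):
    """Classify a nucleotide pair as identity, transition, or transversion."""
    if a == b:
        return "identity"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, substitution_type(*pair))
```

A DNA substitution matrix can use this classification to penalise transversions more heavily than transitions.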

9 Codons are degenerate
The genetic code is degenerate: changes in the third codon position often do not alter the amino acid that is specified.
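A small excerpt of the standard genetic code makes the point concrete. The table below lists only eight codons, chosen for illustration, and the helper `third_position_outcomes` is a hypothetical name.

```python
# Excerpt of the standard genetic code (DNA codons), for illustration only.
CODONS = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
}

def third_position_outcomes(prefix):
    """Amino acids reachable by varying only the third base after `prefix`."""
    return {CODONS[prefix + base] for base in "TCAG"}

print(third_position_outcomes("GG"))  # every third base gives the same residue
print(third_position_outcomes("GA"))  # the third base distinguishes two residues
```

Two DNA sequences can therefore differ at many third positions and still encode the same protein, which is one reason protein alignment can be more informative.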

10 DNA Alignments are appropriate
To confirm; to study polymorphism; to study non-coding regions of DNA.

11 DNA Alignments for Finding regulatory elements in DNA sequences
Non-coding DNA is full of regulatory elements, which give rise to the differences between organisms. Each gene is associated with thousands of nucleotides of non-coding DNA.

12 Best alignment
Generate all possible gapped alignments. Find the score for each. Select the highest-scoring alignment. This is time consuming: for 100 a.a. there are about 10^75 alignments. The practical solution is a dynamic programming algorithm.
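The size of the search space can be estimated with a short recursion. The sketch below assumes that "all possible gapped alignments" means every path through the alignment grid (each step is a diagonal, a gap in one sequence, or a gap in the other), so alignments that differ only in the order of adjacent gaps are counted separately. Under that assumption the count for two sequences of length 100 has 76 digits, the same order of magnitude as the figure on the slide. The function name `count_alignments` is hypothetical.

```python
def count_alignments(n, m):
    """Number of grid paths (alignments) for sequences of lengths n and m."""
    # counts[i][j]: alignments of the first i residues with the first j residues.
    counts = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            counts[i][j] = (counts[i - 1][j - 1]   # align residue i with residue j
                            + counts[i - 1][j]     # gap in the second sequence
                            + counts[i][j - 1])    # gap in the first sequence
    return counts[n][m]

print(count_alignments(2, 2))
print(len(str(count_alignments(100, 100))), "digits")
```

Counting takes only 100 × 100 steps even though the number being counted is astronomically large; dynamic programming exploits the same grid structure to find the best alignment.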

13 Global Sequence Alignment: Needleman and Wunsch Algorithm

14 Aligning GGTT and GAT
Match : +1 Mismatch : -1 Gap : -2

GGTT
GAT-
= -1

GG-TT
-GAT-
= -4

GGTT
G-AT
= -1

Introducing gaps greatly increases the number of different comparisons between two sequences, and in the general case it is impractical to evaluate them all.
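The three scores can be reproduced by summing column scores. This is a minimal sketch using the slide's values (match +1, mismatch -1, gap -2); the function name `score_alignment` is illustrative.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Sum the column scores of two equal-length gapped strings."""
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # a residue aligned with a gap
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total

for top, bottom in [("GGTT", "GAT-"), ("GG-TT", "-GAT-"), ("GGTT", "G-AT")]:
    print(top, bottom, score_alignment(top, bottom))
```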

15 Alignment by Dynamic Programming
Global alignment: Needleman & Wunsch algorithm (1970), used in major alignment software packages (e.g. the ALIGN tool in the FASTA package). Local alignment: Smith & Waterman algorithm (1981).

16 [Figure: example alignment with one “mismatch” and two “gap” positions labelled]

17 Four possible outcomes in aligning two sequences
[1] identity (stay along a diagonal)
[2] mismatch (stay along a diagonal)
[3] gap in sequence 1 (move vertically!)
[4] gap in sequence 2 (move horizontally!)

18 Global Alignment by Dynamic Programming
Sequences: GGTT (along the top of the matrix) and GAT (down the side).
Match : +1 Mismatch : -1 Gap : -2

19 Fill in the matrix using “dynamic programming”

20 Dynamic programming: the three ways to leave a cell
→ (rightward): insert a gap in the vertical sequence
↓ (downward): insert a gap in the horizontal sequence
↘ (diagonal): match or mismatch
The first row of the matrix holds the cumulative gap penalties: 0 -2 -4 -6 -8.

21 Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
First row: 0 -2 -4 -6 -8. The first cell to fill, G against G, is a match and scores 0 + 1 = +1.

22 Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
Next cell (the second G of GGTT against the G of GAT), three candidate scores:
↓ : -4 - 2 = -6
→ : +1 - 2 = -1
↘ : -2 + 1 = -1
The cell takes the highest candidate, -1.
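The rule for a single cell can be sketched as follows. The function name `cell_candidates` and its argument names are hypothetical; `diag`, `up` and `left` are the three neighbouring values already in the matrix.

```python
def cell_candidates(diag, up, left, a, b, match=1, mismatch=-1, gap=-2):
    """Return the three candidate scores for one cell, keyed by move."""
    return {
        "diagonal": diag + (match if a == b else mismatch),
        "down": up + gap,     # arrive from the cell above
        "right": left + gap,  # arrive from the cell to the left
    }

# Second G of GGTT against the G of GAT: neighbours are -2 (diagonal),
# -4 (above) and +1 (left), as on the slide.
candidates = cell_candidates(diag=-2, up=-4, left=1, a="G", b="G")
print(candidates, "cell value:", max(candidates.values()))
```

Two candidates tie at -1 here; a tie means more than one traceback pointer can be stored for the cell.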

23 Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
        -    G    G    T    T
   -    0   -2   -4   -6   -8
   G   -2   +1   -1   -3   -5
Filling the remaining rows (A and T) in the same way puts the final alignment score in the bottom-right cell.
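Filling the whole matrix is a double loop over the cells. The sketch below uses the slide's scoring values and a linear gap penalty; the function name `needleman_wunsch_matrix` is illustrative, and no traceback pointers are stored here.

```python
def needleman_wunsch_matrix(horizontal, vertical, match=1, mismatch=-1, gap=-2):
    """Fill the global-alignment score matrix, row by row."""
    rows, cols = len(vertical) + 1, len(horizontal) + 1
    H = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):          # first row: leading gaps only
        H[0][j] = j * gap
    for i in range(1, rows):          # first column: leading gaps only
        H[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if vertical[i - 1] == horizontal[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # diagonal
                          H[i - 1][j] + gap,     # from above
                          H[i][j - 1] + gap)     # from the left
    return H

H = needleman_wunsch_matrix("GGTT", "GAT")
for label, row in zip("-GAT", H):
    print(label, row)
print("final alignment score:", H[-1][-1])
```

For GGTT and GAT the bottom-right cell is -1, the same score as the best hand-scored alignments of these two sequences.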

24 Global Alignment by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
Following the traceback pointers from the bottom-right cell back to the top-left cell gives the alignment:
GGTT
G-AT
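A traceback can be sketched without storing pointers, by re-checking at each cell which neighbour could have produced its value. The function name `needleman_wunsch` is illustrative, and the order in which ties are broken (diagonal first, then leftward) is an arbitrary assumption. Several alignments of GGTT and GAT share the best score of -1, so this sketch may return a different co-optimal alignment from the one shown on the slide.

```python
def needleman_wunsch(horizontal, vertical, match=1, mismatch=-1, gap=-2):
    """Global alignment: returns (score, aligned_horizontal, aligned_vertical)."""
    rows, cols = len(vertical) + 1, len(horizontal) + 1
    H = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        H[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if vertical[i - 1] == horizontal[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)

    # Traceback from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if vertical[i - 1] == horizontal[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            top.append(horizontal[j - 1]); bottom.append(vertical[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and H[i][j] == H[i][j - 1] + gap:
            top.append(horizontal[j - 1]); bottom.append("-")   # gap in vertical
            j -= 1
        else:
            top.append("-"); bottom.append(vertical[i - 1])     # gap in horizontal
            i -= 1
    return H[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))

score, top, bottom = needleman_wunsch("GGTT", "GAT")
print(score)
print(top)
print(bottom)
```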

28 Local Alignment : Smith and Waterman Algorithm

29 Fail to identify functionally important residues

30 Global vs. Local
Global alignments: comparing sequences over their entire length.
Local alignments: comparing sequences with partial homology; making high-quality alignments.

31 Global alignment (top) includes matches ignored by local alignment (bottom)
Global alignment: 15% identity. Local alignment: 30% identity. Sequences: NP_824492, NP_337032.

32 Domain
A domain is a part of a sequence, or a particular functional site.
Sequence-structure-function relation.

33 Local Alignments
A local alignment aligns only the most similar portions of the sequences: it looks for small parts of the sequences that are similar to each other. It is used when searching for functionally related sequences. Programs for database searching: FASTA, BLAST.

34 Alignments by Dynamic Programming
Match : +1 Mismatch : -1 Gap : -2
S1 = GCCCTAGCG
S2 = GCGCAATG
Needleman-Wunsch method (Global Alignment):
GCCCTAGCG
GCGC-AATG
Smith-Waterman method (Local Alignment): only the best-matching portions of S1 and S2 are aligned.

35 Smith-Waterman method
Smith-Waterman is a dynamic programming algorithm too, used for performing local sequence alignment. Traces continue only as long as the scores are positive: whenever a score would become negative it is set to 0, and the alignment starts again from there. Each cell H therefore takes the highest of four values: the diagonal, horizontal and vertical candidates, and 0. No value in the matrix of alignment scores can be negative: H ≥ 0.
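Written out as a recurrence, the rule for one cell looks like this. The notation is an assumption made for this sketch: a_i and b_j are the residues of the two sequences, s(a_i, b_j) is the match or mismatch score (+1 or -1 in this lesson) and g is the gap score (-2).

```latex
\[
H(i,0) = H(0,j) = 0, \qquad
H(i,j) = \max
\begin{cases}
H(i-1,\,j-1) + s(a_i, b_j) & \text{(diagonal)} \\
H(i,\,j-1) + g & \text{(horizontal)} \\
H(i-1,\,j) + g & \text{(vertical)} \\
0 & \text{(start again)}
\end{cases}
\]
```

Compared with the global recurrence, the only changes are the extra 0 in the maximum and the first row and column being 0 instead of cumulative gap penalties.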

36 Needleman-Wunsch method (Global Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
|| | |  |
GCGC-AATG

37 Smith-Waterman method (Local Alignment)
Match : +1, Mismatch : -1, Gap : -2
GCCCTAGCG
      |||
      GCGCAATG

38 The highest-scoring cell does not need to be at the bottom right-hand corner; it can be anywhere in the matrix. The traceback procedure begins at the highest-scoring cell in the matrix and follows the arrows back until a 0 is reached.
GCCCTAGCG
      |||
      GCGCAATG
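The local procedure can be sketched in Python as follows. The function name `smith_waterman` is illustrative, the scoring values are the ones used in this lesson, and the tie-breaking order in the traceback (diagonal first) is an arbitrary assumption.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Local alignment: returns (best_score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 resets any score that would become negative.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: start at the highest-scoring cell, stop when a 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("GCCCTAGCG", "GCGCAATG"))
```

For S1 = GCCCTAGCG and S2 = GCGCAATG the best local alignment is the three-residue match GCG / GCG with score 3, found in the last row of the matrix but not in its bottom-right corner.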

