Published by Benjamin Pearson. Modified over 6 years ago.
1
LSM3241: Bioinformatics and Biocomputing
Lecture 4: Sequence analysis methods revisited
Prof. Chen Yu Zong
Tel:
Room 07-24, level 7, SOC1, National University of Singapore
2
Sequence Analysis Methods
3
Gene and Protein Sequence Alignment as a Mathematical Problem:
Example:
Sequence a: ATTCTTGC
Sequence b: ATCCTATTCTAGC

Best alignment (one gap):
ATTCTTGC
ATCCTATTCTAGC

Bad alignment (two gaps):
AT TCTT GC
ATCCTATTCTAGC

What is a good alignment?
4
How to rate an alignment?
Match: +8 (w(x, y) = 8 if x = y)
Mismatch: -5 (w(x, y) = -5 if x ≠ y)
Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
[Figure: two aligned rows, a1 a2 a3 ... x ... over b1 b2 b3 ... y ..., scored column by column]
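The scoring scheme above can be written down directly. The following is a minimal sketch, assuming an alignment is given as two equal-length strings with `-` as the gap symbol; the function names `w` and `alignment_score` are chosen here for illustration.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # weights from the slide


def w(x: str, y: str) -> int:
    """Score a single alignment column."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of two equal-length gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))


# 8 - 5 - 5 + 8 - 5 + 8 - 3 + 8
print(alignment_score("CTTAAC-T", "CGGATCAT"))  # 14
```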
5
Pairwise Alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT
An alignment of a and b:
C---TTAACT
CGGATCA--T
Column types labeled in the figure: insertion gap, match, mismatch, deletion gap.
6
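Alignment Graph
Sequence a: CTTAACT
Sequence b: CGGATCAT
C---TTAACT
CGGATCA--T
[Figure: alignment graph with sequence b (C G G A T C A T) along the top and sequence a (C T T A A C T) down the side; the alignment is drawn as a path, with its insertion gap and deletion gap segments labeled]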
7
Graphic representation of an alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT
C---TTAACT
CGGATCA--T
[Figure: alignment graph, sequence b along the top and sequence a down the side, with the first step of the path for this alignment drawn]
8
Graphic representation of an alignment
[Figure: same graph, path extended]
9
Graphic representation of an alignment
[Figure: same graph, path extended]
10
Graphic representation of an alignment
[Figure: same graph, path extended]
11
Graphic representation of an alignment
[Figure: same graph, complete path for C---TTAACT / CGGATCA--T]
12
Pathway of an alignment
[Figure: the complete path for C---TTAACT / CGGATCA--T highlighted on the graph]
13
Graphic representation of an alignment
Sequence a: CTTAACT
Sequence b: CGGATCAT
CTTAACT-
CGGATCAT
[Figure: alignment graph with the path for this alignment]
14
Pathway of an alignment
[Figure: the path for CTTAACT- / CGGATCAT highlighted on the graph]
15
Use of graph to generate alignments
-CTTAACT
CGGATCAT
[Figure: the corresponding path on the alignment graph]
16
Use of graph to generate alignments
-C--TTAACT
CGGATC-AT-
[Figure: the corresponding path on the alignment graph]
17
Use of graph to generate alignments
CTTAACT - - CGGATCAT
[Figure: the corresponding path on the alignment graph]
18
Which pathway is better?
Sequence a: CTTAACT
Sequence b: CGGATCAT
[Figure: alignment graph with several candidate pathways]
Multiple pathways are possible, each with its own score.
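Each pathway through the alignment graph spells out one alignment. A minimal sketch of that correspondence follows. It assumes sequence b runs along the top of the graph and sequence a down the side, and it encodes a path as a string of moves: `D` (diagonal), `H` (horizontal) and `V` (vertical). The move letters and the function name are illustrative, not from the slides.

```python
def path_to_alignment(a: str, b: str, moves: str) -> tuple[str, str]:
    """Convert a path through the alignment graph into two gapped rows.

    'D' pairs the next letter of a with the next letter of b,
    'H' puts the next letter of b against a gap in a,
    'V' puts the next letter of a against a gap in b.
    """
    i = j = 0
    row_a, row_b = [], []
    for move in moves:
        if move == "D":
            row_a.append(a[i])
            row_b.append(b[j])
            i += 1
            j += 1
        elif move == "H":
            row_a.append("-")
            row_b.append(b[j])
            j += 1
        elif move == "V":
            row_a.append(a[i])
            row_b.append("-")
            i += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    if i != len(a) or j != len(b):
        raise ValueError("path must end at the bottom-right corner")
    return "".join(row_a), "".join(row_b)


print(path_to_alignment("CTTAACT", "CGGATCAT", "DHHHDDDVVD"))
# ('C---TTAACT', 'CGGATCA--T')
```

Different move strings that consume both sequences completely give different alignments of the same pair, which is why the pathways then have to be ranked by score.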
19
Alignment Score
Sequence a: CTTAACT
Sequence b: CGGATCAT
C---TTAACT
CGGATCA--T
Running score along the path: 8 (C/C match)
20
Alignment Score
Running score: 8, 8-3 = 5
21
Alignment Score
Running score: 8, 5, 5-3 = 2, 2-3 = -1
22
Alignment Score
Running score: 8, 5, 2, -1, -1+8 = 7, 7-5 = 2 (T/C mismatch), 2+8 = 10, 10-3 = 7, 7-3 = 4
Alignment score: 4+8 = 12
23
An optimal alignment -- the alignment of maximum score
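Let A = a1a2…am and B = b1b2…bn.
Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj.
With proper initializations, Si,j can be computed as follows:
\[ S_{i,j} = \max \{\, S_{i-1,j-1} + w(a_i, b_j),\; S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j) \,\} \]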
24
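Computing Si,j
[Figure: cell (i, j) of the score matrix is reached from (i-1, j-1) with w(ai, bj), from (i-1, j) with w(ai, -), and from (i, j-1) with w(-, bj); the final score is Sm,n]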
25
Initializations
Gap symbol: -3
S0,1=-3, S0,2=-6, S0,3=-9, S0,4=-12, S0,5=-15, S0,6=-18, S0,7=-21, S0,8=-24
S1,0=-3, S2,0=-6, S3,0=-9, S4,0=-12, S5,0=-15, S6,0=-18, S7,0=-21

          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3
T   -6
T   -9
A  -12
A  -15
C  -18
T  -21
26
S1,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
Option 1: S1,1 = S0,0 + w(a1, b1) = 0 + 8 = 8
Option 2: S1,1 = S0,1 + w(a1, -) = -3 - 3 = -6
Option 3: S1,1 = S1,0 + w(-, b1) = -3 - 3 = -6
Optimal: S1,1 = 8
27
S1,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
Option 1: S1,2 = S0,1 + w(a1, b2) = -3 - 5 = -8
Option 2: S1,2 = S0,2 + w(a1, -) = -6 - 3 = -9
Option 3: S1,2 = S1,1 + w(-, b2) = 8 - 3 = 5
Optimal: S1,2 = 5
28
S2,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
Option 1: S2,1 = S1,0 + w(a2, b1) = -3 - 5 = -8
Option 2: S2,1 = S1,1 + w(a2, -) = 8 - 3 = 5
Option 3: S2,1 = S2,0 + w(-, b1) = -6 - 3 = -9
Optimal: S2,1 = 5
29
S2,2 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
Option 1: S2,2 = S1,1 + w(a2, b2) = 8 - 5 = 3
Option 2: S2,2 = S1,2 + w(a2, -) = 5 - 3 = 2
Option 3: S2,2 = S2,1 + w(-, b2) = 5 - 3 = 2
Optimal: S2,2 = 3
30
S3,5 = ?

          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    ?
A  -12
A  -15
C  -18
T  -21
31
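S3,5 = max(S2,4 + w(a3, b5), S2,5 + w(a3, -), S3,4 + w(-, b5)) = max(-3 + 8, 7 - 3, -5 - 3) = 5

          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    5    2   -1    9
A  -12   -1   -3   -5    6    3    0   10    7
A  -15   -4   -6   -8    3    1   -2    8    5
C  -18   -7   -9  -11    0   -2    9    6    3
T  -21  -10  -12  -14   -3    8    6    4   14

Optimal score: S7,8 = 14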
32
Optimal alignment:
CTTAAC-T
CGGATCAT
8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
[Figure: score table with the traceback path of this alignment, ending at the optimal score 14]
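The table-filling procedure of the preceding slides, followed by a traceback, can be sketched as below. This style of global alignment by dynamic programming is commonly known as the Needleman-Wunsch algorithm; the slides do not name it. The sketch assumes the linear gap penalty used here, and on ties the traceback prefers the diagonal move, which is a choice made for this example only.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, row_a, row_b) for an optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: a against gaps only
        S[i][0] = i * gap
    for j in range(1, n + 1):          # first row: b against gaps only
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # ai aligned with bj
                          S[i - 1][j] + gap,       # ai aligned with a gap
                          S[i][j - 1] + gap)       # bj aligned with a gap
    # Trace back from S[m][n] to S[0][0] to recover one optimal alignment.
    i, j = m, n
    row_a, row_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


print(global_align("CTTAACT", "CGGATCAT"))
# (14, 'CTTAAC-T', 'CGGATCAT')
```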
33
Local vs. Global Sequence Alignment:
Example:
DNA sequence a: ATTCTTGC
DNA sequence b: ATCCTATTCTAGC

Local alignment (gaps outside the aligned local region are ignored):
ATTCTTGC
ATCCTATTCTAGC

Global alignment (all gaps are counted):
AT TCTT GC
ATCCTATTCTAGC
34
Global Alignment vs. Local Alignment
Global alignment: all sections are counted.
Local alignment: only local sections (normally separated by gaps) are counted.
35
An optimal local alignment
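Si,j: the score of an optimal local alignment ending at ai and bj.
With proper initializations, Si,j can be computed as follows:
\[ S_{i,j} = \max \{\, 0,\; S_{i-1,j-1} + w(a_i, b_j),\; S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j) \,\} \]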
36
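Initializations
Match: 8, Mismatch: -5, Gap symbol: -3
S0,j = 0 for all j and Si,0 = 0 for all i.
[Figure: empty score table with sequence b (C G G A T C A T) along the top and sequence a (C T T A A C T) down the side]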
37
S1,1 = ?
Match: 8, Mismatch: -5, Gap symbol: -3
Option 1: S1,1 = S0,0 + w(a1, b1) = 0 + 8 = 8
Option 2: S1,1 = S0,1 + w(a1, -) = 0 - 3 = -3
Option 3: S1,1 = S1,0 + w(-, b1) = 0 - 3 = -3
Option 4: S1,1 = 0
Optimal: S1,1 = 8
38
local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
[Figure: partially filled local-alignment score table; the entries shown include 8, 5, 2, 3, 13 and 11, with one cell marked "?"]
39
local alignment
A-C-T
ATCAT
8 - 3 + 8 - 3 + 8 = 18
The best score: 18
[Figure: completed local-alignment score table; the entries shown include 8, 5, 2, 3, 13, 11, 10, 7 and the best score 18]
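The local variant differs from the global one in three places: the first row and column are zero, every cell is floored at 0, and the traceback starts at the best cell and stops at the first 0. This approach is commonly known as the Smith-Waterman algorithm, a name the slides do not use. A minimal sketch with the same weights, where the tie-breaking order in the traceback is an assumption of this example:

```python
def local_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (best_score, row_a, row_b) for an optimal local alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(0,                        # start a new local alignment
                          S[i - 1][j - 1] + sub,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, bi, bj = S[i][j], i, j
    # Trace back from the best cell until the score drops to 0.
    i, j = bi, bj
    row_a, row_b = [], []
    while i > 0 and j > 0 and S[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + sub:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


print(local_align("CTTAACT", "CGGATCAT"))
# (18, 'A-C-T', 'ATCAT')
```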
40
BLAST Basic Local Alignment Search Tool
Procedure:
Divide all sequences into overlapping constituent words (size k).
Build the hash table for sequence a.
Scan sequence b for hits.
Extend hits.
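The first three steps of the procedure can be sketched as follows. This is a simplified illustration, not BLAST itself: it records exact word matches only, whereas protein BLAST also accepts similar words that score above a threshold under a substitution matrix. The function names and the choice k = 3 are assumptions of this example.

```python
from collections import defaultdict


def build_word_index(seq, k):
    """Hash table: each overlapping word of length k -> its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_hits(a, b, k):
    """Scan b word by word and report (position in a, position in b) hits."""
    index = build_word_index(a, k)
    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            hits.append((i, j))
    return hits


print(find_hits("ATTCTTGC", "ATCCTATTCTAGC", 3))
# [(0, 5), (1, 6), (2, 7)]
```

All three hits lie on the same diagonal (j - i = 5), which is the kind of cluster the extension step then tries to grow into a longer alignment.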
41
BLAST Basic Local Alignment Search Tool
Step 1: Hash table for sequence A
42
Amino acid similarity matrix PAM 120
Instead of using the simple values +8 and -5 for matches and mismatches, this statistically derived score matrix is used to rank the level of similarity between two amino acids
43
Amino acid similarity matrix PAM 250
This is a more widely used score matrix for ranking the level of similarity of two amino acids. It is derived by considering more diverse sets of data and a larger number of statistical steps.
44
Amino acid similarity matrix Blosum 45
The Blosum matrices were calculated using data from the BLOCKS database, which contains alignments of more distantly related proteins. In principle, Blosum matrices should be more realistic for comparing distantly related proteins, but may introduce error for conventional proteins.
45
BLAST Basic Local Alignment Search Tool
46
BLAST Basic Local Alignment Search Tool
Step 2: Use all of the 2-letter words in the query sequence to scan against the database sequence, and mark those with score > 8.
Note: Marked points can be on the diagonal and off-diagonal.
Example word scores: LN:LN = 9, NF:NY = 8, GW:PW = 10
47
BLAST Step 2: Scan sequence b for hits.
48
BLAST
Step 2: Scan sequence b for hits.
Step 3: Extend hits. Terminate the extension if its score fades away.
BLAST 2.0 saves the time spent in extension, and considers gapped alignments.
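One way to make "terminate if the score fades away" concrete is a drop-off rule: keep extending while the running score stays within some margin of the best score seen so far. The sketch below is an ungapped extension under that rule, assuming an exact-match seed word of length k and the match and mismatch weights used earlier. The margin `x_drop` and its default value are hypothetical, and the gapped extension of BLAST 2.0 is not covered.

```python
def extend_hit(a, b, i, j, k, match=8, mismatch=-5, x_drop=10):
    """Extend an exact word hit a[i:i+k] == b[j:j+k] without gaps.

    Returns (best_score, (start_a, start_b, length)) of the extended segment.
    """
    best = k * match                      # the seed word matches exactly
    # Extend to the right of the seed.
    run, step, best_right = best, 0, 0
    while i + k + step < len(a) and j + k + step < len(b):
        run += match if a[i + k + step] == b[j + k + step] else mismatch
        step += 1
        if run > best:
            best, best_right = run, step
        elif best - run > x_drop:         # score has faded: stop extending
            break
    # Extend to the left of the seed, starting from the best score so far.
    run, step, best_left = best, 0, 0
    while i - 1 - step >= 0 and j - 1 - step >= 0:
        run += match if a[i - 1 - step] == b[j - 1 - step] else mismatch
        step += 1
        if run > best:
            best, best_left = run, step
        elif best - run > x_drop:
            break
    return best, (i - best_left, j - best_left, k + best_left + best_right)


# Seed: a[0:3] == b[5:8] == "ATT"
print(extend_hit("ATTCTTGC", "ATCCTATTCTAGC", 0, 5, 3))
# (51, (0, 5, 8))
```

Here the extension runs through one mismatch (T against A) because the score recovers on the following matches: 7 matches and 1 mismatch give 7 × 8 - 5 = 51.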
49
Multiple sequence alignment (MSA)
The multiple sequence alignment problem is to simultaneously align more than two sequences.
Seq1: GCTC
Seq2: AC
Seq3: GATC
A multiple alignment of the three sequences:
GC-TC
A---C
G-ATC
50
Multiple sequence alignment MSA
51
How to score an MSA?
Sum-of-Pairs (SP-score): the score of an MSA is the sum of the scores of the pairwise alignments it contains.
Score(GC-TC, A---C, G-ATC) = Score(GC-TC, A---C) + Score(GC-TC, G-ATC) + Score(A---C, G-ATC)
52
How to score an MSA?
Sum-of-Pairs (SP-score)
Score(GC-TC, A---C) = 5
Score(GC-TC, G-ATC) = 18
Score(A---C, G-ATC) = 5
SP-score = 5 + 18 + 5 = 28
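The sum-of-pairs idea can be sketched in a few lines. The version below reuses the pairwise weights from the earlier slides (match 8, mismatch -5, gap -3) and scores a column where both rows have a gap as 0. Both choices are assumptions of this example; the slide does not spell out how each pair score is obtained, so pair scores computed this way need not agree with every figure on the slide.

```python
from itertools import combinations


def pair_score(row_x, row_y, match=8, mismatch=-5, gap=-3):
    """Score two rows of an MSA as a pairwise alignment."""
    total = 0
    for x, y in zip(row_x, row_y):
        if x == "-" and y == "-":
            continue                      # assumption: gap against gap scores 0
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def sp_score(rows):
    """Sum the pair scores over all unordered pairs of rows."""
    return sum(pair_score(x, y) for x, y in combinations(rows, 2))


# G/G, C/-, -/A, T/T, C/C: 8 - 3 - 3 + 8 + 8
print(pair_score("GC-TC", "G-ATC"))  # 18
```

`sp_score(["GC-TC", "A---C", "G-ATC"])` then adds up the three pair scores in the same way as the slide's formula.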
53
Position Specific Iterated BLAST
PSI-BLAST is a rather permissive alignment tool, and it can find more distantly related sequences than FASTA or BLAST. In many cases it is much more sensitive to weak but biologically relevant sequence similarities.
54
Position Specific Iterated BLAST
PSI-BLAST is used for:
Distant homology detection
Fold assignment: profile-profile comparison
Domain identification
Evolutionary analysis (e.g. tree building)
Sequence annotation / function assignment
Profile export to other programs
Sequence clustering
Structural genomics target selection
55
Position Specific Iterated BLAST
Collect all database sequence segments that have been aligned with the query sequence with an E-value below a set threshold (default 0.001, but all sequences with E < 10 are displayed for manual inclusion).
Construct a position-specific scoring matrix for the collected sequences. Rough idea:
Align all sequences to the query sequence as the template.
Assign weights to the sequences.
Construct the position-specific scoring matrix.
Iterate.
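To give a feel for what a position-specific scoring matrix holds, here is a deliberately simplified sketch. It assumes every sequence has weight 1, a uniform background frequency of 1/20, and a flat pseudocount, none of which is PSI-BLAST's actual weighting scheme. The aligned rows are taken from the PSI-BLAST slides of this lecture; the function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(rows, pseudocount=1.0):
    """One dict of log-odds scores (base 2) per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)       # assumption: uniform background
    pssm = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")   # gaps are skipped
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def profile_score(pssm, segment):
    """Score an ungapped segment of the same length against the profile."""
    return sum(pssm[pos][aa] for pos, aa in enumerate(segment))


rows = [
    "MGLLTREIF--ILQQ",
    "FGLGRT-I-T-YMTN",
    "-GLVRT-I---LGLE",
    "FGLLRT-I---YMTQ",
]
pssm = build_pssm(rows)
# Column 3 is L in every row, so L scores high there and W scores low.
print(round(pssm[2]["L"], 2), round(pssm[2]["W"], 2))
```

Unlike a single substitution matrix, the score of a residue here depends on the column: a conserved position rewards its conserved residue strongly, while a variable position is close to neutral.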
56
How does PSI-BLAST work?
Take a sequence: MGLLTREIF--ILQQ
Search for similar sequences in a full sequence database.
Sequences are multiply aligned:
MGLLTREIF--ILQQ
FGLGRT-I-T-YMTN
-GLVRT-I---LGLE
FGLLRT-I---YMTQ
Construct a profile, and represent conservation in each position numerically. [Figure: profile matrix with one row per amino acid, A, C, ..., Y]
The profile holds more information than a single sequence: use the profile to retrieve additional sequences.
New sequences in the multiple alignment:
FGLLRT-I-T-YMTN
-RLTRD-I---LGLY
FGLLRT-I---FMTS
Construct a new profile.
After several iterations of this procedure we have:
Sequence information, including links to annotation
Several sets of multiple alignments
Profiles, derived by us or by PSI-BLAST
Threshold information (alignment statistics)
57
Consensus sequence
A sequence in which each position is defined by majority vote over the corresponding column of a multiple sequence alignment. The consensus sequence can be used for database search.
Example: PEAINYGRFTPFS I KSDVW
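A majority vote per column is short to write down. In the sketch below, gaps do not take part in the vote, and a column made up only of gaps stays a gap; both are assumptions of this example rather than rules stated on the slide. The input rows are an alignment from the PSI-BLAST slides.

```python
from collections import Counter


def consensus(rows):
    """Most frequent non-gap residue in each column of an alignment."""
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


rows = [
    "MGLLTREIF--ILQQ",
    "FGLGRT-I-T-YMTN",
    "-GLVRT-I---LGLE",
    "FGLLRT-I---YMTQ",
]
print(consensus(rows))  # FGLLRTEIFT-YMTQ
```

A consensus keeps only the winner in each column, so it discards the information about how strongly each position is conserved; that is the information a profile retains.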
58
Flow chart of PSI-BLAST
Take a sequence.
Search for similar sequences in a full sequence database.
Sequences are multiply aligned:
MGLLTREIF--ILQQ
FGLGRT-I-T-YMTN
-GLVRT-I---LGLE
FGLLRT-I---YMTQ
Construct a profile, and represent conservation in each position numerically. [Figure: profile matrix with one row per amino acid, A, C, ..., Y]
The profile holds more information than a single sequence: use the profile to retrieve additional sequences.
Use the profile to search for similar sequences in a full sequence database.
New sequences in the multiple alignment:
FGLLRT-I-T-YMTN
-RLTRD-I---LGLY
FGLLRT-I---FMTS
Construct a new profile.
New iteration: search again with the new profile, and repeat.
59
PSI-BLAST NCBI PSI-BLAST tutorial :
84
Summary of Today’s lecture
Sequence alignment methods revisited:
Pair-wise alignment
Multiple sequence alignment
BLAST
PSI-BLAST
Use of PSI-BLAST to probe protein function