Pairwise Sequence Alignment Part 2
Outline
- Global alignments (continuation)
- Local versus global alignment
- BLAST algorithms
- Evaluating the significance of alignments
Global Alignment (continued)
Needleman-Wunsch Alignment
- Global alignment between sequences
  - Compare the entire sequence against another
- Create a scoring table
  - Sequence A across the top, B down the left
  - The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
  - The global alignment score is the bottom-right cell
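The table-filling rule can be written as a recurrence. This is the standard Needleman-Wunsch formulation with a linear gap score. The symbols are introduced here for illustration: S is the table, s is the substitution score and g is the (negative) gap score. As in the text above, i indexes A (columns) and j indexes B (rows).

```latex
\begin{aligned}
S(0,0) &= 0, \qquad S(i,0) = i\,g, \qquad S(0,j) = j\,g \\
S(i,j) &= \max\begin{cases}
  S(i-1,\,j-1) + s(A_i, B_j) & (A_i \text{ aligned with } B_j) \\
  S(i-1,\,j) + g & (A_i \text{ against a gap}) \\
  S(i,\,j-1) + g & (B_j \text{ against a gap})
\end{cases} \\
\text{global score} &= S(|A|,\,|B|)
\end{aligned}
```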
Worked example: filling the table
Sequence A = ACGCTG (columns 1-6), sequence B = CATGT (rows 1-5). The table values correspond to match = +2, mismatch = -1, gap = -1.

- Row 0 and column 0 align a prefix against gaps only. For example, A over - scores -1, ACGCTG over ------ scores -6, and ----- over CATGT scores -5.
- Column A1, row C1: A over C, a mismatch, scores -1.
- Column C2, row C1: AC over -C scores 1.
- Column G3, row C1: ACG over -C- scores 0.
- Column C4, row C1: ACGC over -C-- or ACGC over ---C; both score -1.
- Column G3, row A2: ACG over -CA scores 0.

Completed table:

         -    A    C    G    C    T    G
    -    0   -1   -2   -3   -4   -5   -6
    C   -1   -1    1    0   -1   -2   -3
    A   -2    1    0    0   -1   -2   -3
    T   -3    0    0   -1   -1    1    0
    G   -4   -1   -1    2    1    0    3
    T   -5   -2   -2    1    1    3    2

Tracing back from the bottom-right cell (score 2) gives three optimal global alignments:

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-
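Below is a minimal Python sketch of the table fill and traceback. It assumes a linear gap score and uses match +2, mismatch -1, gap -1, which reproduce the numbers in the worked example. The function name `needleman_wunsch` and its parameters are illustrative, not from any library. The traceback prefers diagonal moves and then moves that consume A, so it returns only one of the tied optimal alignments.

```python
from typing import List, Tuple

def needleman_wunsch(a: str, b: str, match: int = 2, mismatch: int = -1,
                     gap: int = -1) -> Tuple[List[List[int]], int, Tuple[str, str]]:
    """Fill the global-alignment table and trace back one optimal alignment.

    a runs across the top (columns i), b down the left (rows j);
    table[j][i] is the best score of a[:i] against b[:j].
    """
    n, m = len(b), len(a)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, m + 1):          # row 0: prefix of a against gaps
        table[0][i] = i * gap
    for j in range(1, n + 1):          # column 0: prefix of b against gaps
        table[j][0] = j * gap
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(table[j - 1][i - 1] + s,   # a_i aligned with b_j
                              table[j][i - 1] + gap,     # a_i against a gap
                              table[j - 1][i] + gap)     # b_j against a gap

    # Trace back from the bottom-right cell to cell (0, 0).
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if table[j][i] == table[j - 1][i - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and table[j][i] == table[j][i - 1] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return table, table[n][m], ("".join(reversed(top)), "".join(reversed(bottom)))

table, score, (top, bottom) = needleman_wunsch("ACGCTG", "CATGT")
print(score)   # 2
print(top)     # -ACGCTG
print(bottom)  # CATG-T-
```

To enumerate all tied alignments, the traceback would have to branch at every cell where more than one move reproduces the stored score.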
Global Alignment versus Local Alignment
Global alignment:
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT
Local alignment:
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT
Global vs. Local Alignment
Example: DOROTHY HODGKIN versus DOROTHYCROWFOOTHODGKIN
Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN
Local alignment:
Local Alignment
- Best score for aligning parts of the sequences
  - Never lower than the global alignment score under the same scoring scheme, and often higher
- Similar algorithm: Smith-Waterman
  - Table cells never score below zero
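The Smith-Waterman recurrence differs from the Needleman-Wunsch table in three ways:
- Row 0 and column 0 are all zero.
- 0 is added as a fourth option ("start a fresh alignment here").
- The best local score is the largest cell anywhere in the table, not the bottom-right one.

The symbols H, s and g are introduced for illustration, with g a negative linear gap score.

```latex
\begin{aligned}
H(i,0) &= H(0,j) = 0 \\
H(i,j) &= \max\begin{cases}
  0 \\
  H(i-1,\,j-1) + s(A_i, B_j) \\
  H(i-1,\,j) + g \\
  H(i,\,j-1) + g
\end{cases} \\
\text{local score} &= \max_{i,j} H(i,j)
\end{aligned}
```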
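Worked example: Smith-Waterman table
TACTAA runs across the top and TAATA down the left. The table values correspond to match = +1, mismatch = -1, gap = -2.

         -    T    A    C    T    A    A
    -    0    0    0    0    0    0    0
    T    0    1    0    0    1    0    0
    A    0    0    2    0    0    2    1
    A    0    0    1    1    0    1    3
    T    0    1    0    0    2    0    1
    A    0    0    2    0    0    3    1

The highest score, 3, occurs twice, giving two best local alignments:

TACTA
TAATA

TAA
TAA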
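Below is a matching sketch for the local case. It carries the same caveats: the names are illustrative and it uses a linear gap score. The scores are match +1, mismatch -1, gap -2 (inferred from the example table). It records every cell that reaches the maximum and traces each one back until it hits a zero cell, which is where that local alignment starts.

```python
from typing import List, Tuple

def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -2) -> Tuple[List[List[int]], int, List[Tuple[str, str]]]:
    """Local alignment: returns (table, best score, every best-scoring alignment).

    a runs across the top (columns i), b down the left (rows j).
    """
    n, m = len(b), len(a)
    table = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, ends = 0, []
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            v = max(0,                          # start a new local alignment
                    table[j - 1][i - 1] + s,    # a_i aligned with b_j
                    table[j][i - 1] + gap,      # a_i against a gap
                    table[j - 1][i] + gap)      # b_j against a gap
            table[j][i] = v
            if v > best:
                best, ends = v, [(i, j)]
            elif v == best and v > 0:
                ends.append((i, j))

    def trace(i: int, j: int) -> Tuple[str, str]:
        # Walk back until a zero cell; cells in row 0 / column 0 are always 0.
        top, bottom = [], []
        while table[j][i] > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if table[j][i] == table[j - 1][i - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
            elif table[j][i] == table[j][i - 1] + gap:
                top.append(a[i - 1]); bottom.append("-"); i -= 1
            else:
                top.append("-"); bottom.append(b[j - 1]); j -= 1
        return "".join(reversed(top)), "".join(reversed(bottom))

    return table, best, [trace(i, j) for i, j in ends]

table, best, alignments = smith_waterman("TACTAA", "TAATA")
print(best)        # 3
print(alignments)  # [('TAA', 'TAA'), ('TACTA', 'TAATA')]
```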
Problems with DP for Sequence Alignment
- The computational complexity is very high
- Given a score, how do we evaluate the significance of the alignment?
Complexity
- Complexity is determined by the size of the table
  - Aligning a sequence of length m against one of length n requires calculating m × n cells
- Time of calculation
  - Let's say we calculate 10^8 cells per second on a single-processor PC
  - Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells: 0.64 seconds
  - Aligning an 8,000 bp mRNA against a 10^7 bp chromosome requires ~10^11 cells: on the order of 1,000 seconds, about 15 minutes
Complexity for Large Databases
- Let's say a database contains 3 × 10^10 base pairs
  - Searching an mRNA against the database requires ~2.5 × 10^14 cells: 2.5 × 10^6 seconds, about 1 month!
- We need a more efficient algorithm that avoids computing most of these cells
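The estimates above are simply m × n divided by the cell rate, and the short Python sketch below reproduces them. It assumes the slides' figure of 10^8 cells per second and an 8,000 bp mRNA; `dp_cost` is a hypothetical helper name. The exact products are 6.4 × 10^7, 8 × 10^10 and 2.4 × 10^14 cells. The slides round the last two to ~10^11 and ~2.5 × 10^14. The database search then takes 2.4 × 10^6 seconds, roughly 28 days.

```python
# Back-of-the-envelope DP cost, assuming 10^8 cells per second (the slides' figure).
CELLS_PER_SECOND = 10**8

def dp_cost(m: int, n: int, rate: int = CELLS_PER_SECOND) -> tuple:
    """Return (number of table cells, seconds) for aligning lengths m and n."""
    cells = m * n
    return cells, cells / rate

MRNA = 8_000  # mRNA length used in the slides
for label, n in [("mRNA vs mRNA", MRNA),
                 ("mRNA vs chromosome", 10**7),
                 ("mRNA vs database", 3 * 10**10)]:
    cells, secs = dp_cost(MRNA, n)
    print(f"{label}: {cells:.1e} cells, {secs:,.2f} s = {secs / 86_400:,.2f} days")
```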
BLAST
Basic Local Alignment Search Tool
- A set of tools developed at NCBI (BlastN, BlastP, ...)
- BLAST benefits
  - Search speed
  - Ease of use
  - Statistical rigor
BLAST
A good alignment contains subsequences of absolute identity:
- First, identify very short (almost) exact matches.
- Next, the best short hits from the first step are extended to longer regions of similarity.
- Finally, the best hits are optimized using the Smith-Waterman algorithm.
BLAST Algorithm
(1) Split the query sequence into words of length W (default W = 11)
(2) Compare the word list to the database and identify exact matches
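Below is a simplified sketch of steps (1) and (2). It assumes a single database string, exact matches only, and hypothetical helper names (`words`, `build_index`, `seed_hits`). Real BLAST programs index and scan differently. For proteins they also accept high-scoring near-matching words. The sketch only shows the idea of a word list plus a lookup table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def words(seq: str, w: int) -> List[Tuple[str, int]]:
    """Step (1): every overlapping word of length w, with its start position."""
    return [(seq[k:k + w], k) for k in range(len(seq) - w + 1)]

def build_index(db: str, w: int) -> Dict[str, List[int]]:
    """Map each database word to the positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for word, pos in words(db, w):
        index[word].append(pos)
    return index

def seed_hits(query: str, db: str, w: int = 11) -> List[Tuple[int, int]]:
    """Step (2): (query position, database position) pairs of exact word matches."""
    index = build_index(db, w)
    return [(q_pos, d_pos)
            for word, q_pos in words(query, w)
            for d_pos in index.get(word, [])]

print(seed_hits("ACGTACGT", "TTACGTAA", w=4))  # [(0, 2), (1, 3), (3, 1), (4, 2)]
```

Building the index once and reusing it across queries is what makes the lookup cheap compared with filling a full DP table.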
(3) For each word match, extend the alignment in both directions
(4) Score the alignments using dynamic programming
(5) Evaluate the statistical significance
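Below is a sketch of step (3) only. It assumes ungapped extension with an X-drop cutoff, in the spirit of the original ungapped BLAST. Extension in each direction stops once the running score falls more than `x_drop` below the best score seen so far. The scores, the `x_drop` value and the function name `extend_hit` are illustrative assumptions. Steps (4) and (5), gapped scoring and statistics, are not covered here.

```python
from typing import Tuple

def extend_hit(query: str, db: str, q_pos: int, d_pos: int, w: int,
               match: int = 1, mismatch: int = -1,
               x_drop: int = 5) -> Tuple[int, int, int, int]:
    """Ungapped extension of a word hit in both directions.

    Returns (score, query start, query end (exclusive), database start).
    """
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    seed = sum(s(query[q_pos + k], db[d_pos + k]) for k in range(w))

    def best_extension(step: int) -> Tuple[int, int]:
        # step = +1 walks right from the end of the word, -1 walks left from its start.
        qi = q_pos + w if step > 0 else q_pos - 1
        di = d_pos + w if step > 0 else d_pos - 1
        run = best = length = best_len = 0
        while 0 <= qi < len(query) and 0 <= di < len(db):
            run += s(query[qi], db[di])
            length += 1
            if run > best:
                best, best_len = run, length
            elif best - run > x_drop:   # X-drop: give up on this direction
                break
            qi += step
            di += step
        return best, best_len

    right, right_len = best_extension(+1)
    left, left_len = best_extension(-1)
    return seed + left + right, q_pos - left_len, q_pos + w + right_len, d_pos - left_len

print(extend_hit("AAACGTACGTTT", "CCACGTACGTGG", 2, 2, 4))  # (8, 2, 10, 2)
```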
Database Searches
Using pairwise comparison, each database search normally yields two groups of scores: those of genuinely related sequences and those of unrelated (random) sequences, with some overlap between them. A good search method should separate the two score groups as completely as possible.
[Figure: score distributions of random and related sequences]
E-value
- The number of hits with a similarity score at least as high as the observed one that one can "expect" to see just by chance when searching a database of a particular size
- Higher E-value, lower similarity (the hit is more likely to have occurred by chance)
  - "Sequences with E-value of less than 0.01 are almost always found to be homologous"
- The lower bound is 0; the best hits have E-values closest to 0
Expectation Values
- Increase linearly with the length of the query sequence
- Increase linearly with the length of the database
- Decrease exponentially with the score of the alignment
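These three trends follow from the Karlin-Altschul formula used for BLAST statistics, E = K * m * n * exp(-lambda * S). In it, m is the query length, n the database length and S the raw alignment score. K and lambda are constants that depend on the scoring system and sequence composition. The sketch below is minimal: the `K` and `lam` defaults are placeholders, not values BLAST would use for any real scoring matrix, and `expected_hits` is a hypothetical name.

```python
import math

def expected_hits(score: float, query_len: int, db_len: int,
                  K: float = 0.1, lam: float = 0.3) -> float:
    """E = K * m * n * exp(-lambda * S); K and lam are placeholder constants."""
    return K * query_len * db_len * math.exp(-lam * score)

base = expected_hits(50, 300, 10**6)
print(expected_hits(50, 600, 10**6) / base)      # doubling the query: about 2x
print(expected_hits(50, 300, 2 * 10**6) / base)  # doubling the database: about 2x
print(expected_hits(60, 300, 10**6) / base)      # +10 score: times exp(-3), about 0.0498
```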