Pairwise Sequence Alignment Part 2. Outline Global alignments-continuation Local versus Global BLAST algorithms Evaluating significance of alignments.


1 Pairwise Sequence Alignment Part 2

2 Outline
Global alignments (continuation)
Local versus Global
BLAST algorithms
Evaluating significance of alignments

3 Global Alignment (continued)

4 Needleman-Wunsch Alignment
Global alignment between sequences
– Compare entire sequence against another
Create scoring table
– Sequence A across top, B down left
Cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
– Global alignment score is bottom right cell
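The table-filling rule described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name and the default scores (match +2, mismatch -1, gap -1) are assumptions, chosen because they appear consistent with the cell values that survive in the worked example on the following slides.

```python
def needleman_wunsch_table(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the global-alignment scoring table.

    a runs across the top (columns), b down the left (rows).
    table[j][i] holds the best score for aligning a[:i] with b[:j].
    The default scores are assumptions, not values stated on the slide.
    """
    cols, rows = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and first column: a prefix aligned against gaps only.
    for i in range(1, cols):
        table[0][i] = i * gap
    for j in range(1, rows):
        table[j][0] = j * gap
    # Every other cell takes the best of its three predecessors.
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = table[j - 1][i - 1] + s   # align a[i-1] with b[j-1]
            up = table[j - 1][i] + gap       # b[j-1] against a gap
            left = table[j][i - 1] + gap     # a[i-1] against a gap
            table[j][i] = max(diag, up, left)
    return table


table = needleman_wunsch_table("ACGCTG", "CATGT")
for row in table:
    print(" ".join(f"{v:3d}" for v in row))
print("global alignment score:", table[-1][-1])
```

The global alignment score is read from the bottom right cell, `table[-1][-1]`, as the slide states.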

5-15 Filling the scoring table for A = ACGCTG (across top) and B = CATGT (down left). The surviving cell values are consistent with match = +2, mismatch = -1, gap = -1; cells lost in extraction are filled in from those scores.

          A    C    G    C    T    G
     0   -1   -2   -3   -4   -5   -6
C   -1   -1    1    0   -1   -2   -3
A   -2    1    0    0   -1   -2   -3
T   -3    0    0   -1   -1    1    0
G   -4   -1   -1    2    1    0    3
T   -5   -2   -2    1    1    3    2

Prefix alignments shown while the table is filled:
First row (A against gaps only): ACGCTG / ------
First column (B against gaps only): ----- / CATGT
Row C, column A (score -1): A / C
Row C, first column C (score 1): AC / -C
Row C, first column G (score 0): ACG / -C-
Row C, second column C (score -1): ACGC / -C-- or ACGC / ---C
Row A, first column G (score 0): ACG / -CA

16-19 Traceback from the bottom right cell (score 2). Three alignments reach that score:

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-
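The traceback step on slides 16-19 can be sketched as follows. This is an illustrative implementation under assumed scores (match +2, mismatch -1, gap -1); `global_align` is a hypothetical name. It fills the table, then walks back from the bottom right cell, at each step moving to a predecessor cell that explains the current score. Ties are broken in a fixed order (diagonal, then left, then up), so it returns only one of the co-optimal alignments.

```python
def global_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Scores are assumed values, not taken from the slide text.
    """
    rows, cols = len(b) + 1, len(a) + 1
    t = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        t[0][i] = i * gap
    for j in range(1, rows):
        t[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            t[j][i] = max(t[j - 1][i - 1] + s, t[j - 1][i] + gap, t[j][i - 1] + gap)

    # Walk back from the bottom right cell to the top left cell.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if t[j][i] == t[j - 1][i - 1] + s:
                # Diagonal move: a[i-1] is aligned with b[j-1].
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and t[j][i] == t[j][i - 1] + gap:
            # Left move: a[i-1] is aligned with a gap.
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Up move: b[j-1] is aligned with a gap.
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return t[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = global_align("ACGCTG", "CATGT")
print(score)
print(top)
print(bottom)
```

Collecting every tied predecessor instead of the first one would enumerate all optimal alignments rather than a single path.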

20 Global Alignment versus Local Alignment
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT
Global Alignment
Local Alignment
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

21 Global vs. Local alignment
DOROTHY HODGKIN
Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN
Local alignment:

22 Local Alignment
Best score for aligning parts of the sequences
– Often beats the global alignment score
Similar algorithm: Smith-Waterman
– Table cells never score below zero

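23 Smith-Waterman table for A = TACTAA (across top) and B = TAATA (down left). The cell values are consistent with match = +1, mismatch = -1, gap = -2.

          T    A    C    T    A    A
     0    0    0    0    0    0    0
T    0    1    0    0    1    0    0
A    0    0    2    0    0    2    1
A    0    0    1    1    0    1    3
T    0    1    0    0    2    0    1
A    0    0    2    0    0    3    1

Best local alignments (score 3):
TACTA / TAATA
TAA / TAA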
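A minimal Smith-Waterman sketch for the example on this slide. The scores (match +1, mismatch -1, gap -2) are assumptions inferred from the table values, and `smith_waterman` is a hypothetical name. The only changes from the global algorithm are that every cell is floored at zero, the first row and column stay at zero, and the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    a runs across the top, b down the left. Scores are assumed values.
    """
    rows, cols = len(b) + 1, len(a) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Cells never score below zero.
            h[j][i] = max(0, h[j - 1][i - 1] + s, h[j - 1][i] + gap, h[j][i - 1] + gap)
            if h[j][i] > best:
                best, best_pos = h[j][i], (j, i)

    # Trace back from the best cell until a zero cell is reached.
    j, i = best_pos
    top, bottom = [], []
    while h[j][i] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[j][i] == h[j - 1][i - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif h[j][i] == h[j][i - 1] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = smith_waterman("TACTAA", "TAATA")
print(best)
print(top)
print(bottom)
```

When several cells share the best score, this sketch reports only the first one found in row order; other equally good local alignments are not listed.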

24 Problems with DP for sequence alignments
– The complexity is very high
– Given a score, how do we evaluate the significance of the alignment?

25 Complexity
Complexity is determined by size of table
– Aligning a sequence of length m against one of length n requires calculating (m × n) cells
Time of calculation
Let's say we calculate 10^8 cells per second on a one-processor PC
– Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells → 0.64 seconds
– Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells → ~1,000 secs ≈ 15 minutes

26 Complexity for large databases
Let's say a database contains 3 × 10^10 base pairs
– Searching an mRNA against the database will require ~2.5 × 10^14 cells → 2.5 × 10^6 secs ≈ 1 month!
We need an efficient algorithm to cut down on alignment time
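The arithmetic on the two complexity slides can be checked with a short script. The rate of 10^8 cells per second and the sequence sizes are the slides' own figures; the function and variable names are illustrative. The exact products are 8 × 10^10 and 2.4 × 10^14 cells, which the slides round up.

```python
CELLS_PER_SECOND = 1e8  # rate assumed on the slide


def dp_seconds(m, n, rate=CELLS_PER_SECOND):
    """Time to fill an m-by-n dynamic programming table at the given rate."""
    return m * n / rate


MRNA = 8_000            # bp, mRNA length used on the slide
CHROMOSOME = 10**7      # bp
DATABASE = 3 * 10**10   # bp

mrna_vs_mrna = dp_seconds(MRNA, MRNA)
mrna_vs_chromosome = dp_seconds(MRNA, CHROMOSOME)
mrna_vs_database = dp_seconds(MRNA, DATABASE)

print(f"mRNA vs mRNA:       {mrna_vs_mrna:.2f} s")
print(f"mRNA vs chromosome: {mrna_vs_chromosome / 60:.1f} min")
print(f"mRNA vs database:   {mrna_vs_database / 86_400:.1f} days")
```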

27 BLAST
Basic Local Alignment Search Tool
A set of tools developed at NCBI (BlastN, BlastP, ...)
BLAST benefits
– Search speed
– Ease of use
– Statistical rigor

28 BLAST
A good alignment contains subsequences of absolute identity:
– First, identify very short (almost) exact matches.
– Next, the best short hits from the first step are extended to longer regions of similarity.
– Finally, the best hits are optimized using the Smith-Waterman algorithm.

29 BLAST Algorithm
(1) Split the query sequence into words of length W (W default = 11)
(2) Compare the word list to the database and identify exact matches

30 (3) For each word match, extend the alignment in both directions
(4) Score the alignments using dynamic programming
(5) Evaluate the statistical significance
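Steps (1) to (3) can be illustrated with a toy seed-and-extend routine. This is a simplified sketch, not NCBI's implementation: the function names, the scores (match +1, mismatch -3), and the drop-off threshold `x_drop` are all assumptions made for illustration, and the extension is ungapped. Steps (4) and (5) are not covered here.

```python
from collections import defaultdict


def find_seeds(query, subject, w):
    """Steps (1)-(2): list query words of length w, find exact matches in subject."""
    index = defaultdict(list)
    for s in range(len(subject) - w + 1):
        index[subject[s:s + w]].append(s)
    return [(q, s)
            for q in range(len(query) - w + 1)
            for s in index.get(query[q:q + w], [])]


def extend_seed(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Step (3): ungapped extension of one seed in both directions.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen in that direction.
    Returns (score, query_start, subject_start, length).
    """
    def walk(qi, si, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + w, s + w, +1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    score = w * match + left_score + right_score
    return score, q - left_len, s - left_len, left_len + w + right_len


query = "CCATTGCAGTGG"
subject = "TAAATTGCAGTGGTC"
W = 4  # small word size for the toy example; the slide gives a default of 11

seeds = find_seeds(query, subject, W)
hsps = {extend_seed(query, subject, q, s, W) for q, s in seeds}
for score, q_start, s_start, length in sorted(hsps):
    print(score, query[q_start:q_start + length], subject[s_start:s_start + length])
```

Indexing the subject once and looking up each query word is what avoids filling a full dynamic programming table: only the diagonals that contain an exact word match are examined.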

31 Database Searches
Using pairwise comparison, each database search normally yields 2 groups of scores: genuinely related and unrelated sequences, with some overlap between them. A good search method should completely separate the 2 score groups.
[Figure labels: Random, Related]

32 E-value
The number of hits (with a similarity score at least as high) one can "expect" to see just by chance when searching the given string in a database of a particular size.
Higher E-value: lower similarity
– "Sequences with E-value of less than 0.01 are almost always found to be homologous"
The lower bound is normally 0 (we want to find the best)

33 Expectation Values
Increase linearly with length of query sequence
Increase linearly with length of database
Decrease exponentially with score of alignment
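These three trends match the Karlin-Altschul form E = K · m · n · e^(-λS), where m is the query length, n the database length, and S the alignment score. The sketch below only illustrates the scaling behaviour: the values of K and λ are arbitrary placeholders, since the real ones depend on the scoring system and sequence composition, and the function name is hypothetical.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=1.0):
    """E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values chosen for illustration only.
    """
    return k * query_len * db_len * math.exp(-lam * score)


base = expected_hits(score=30, query_len=8_000, db_len=10**7)
double_query = expected_hits(score=30, query_len=16_000, db_len=10**7)
double_db = expected_hits(score=30, query_len=8_000, db_len=2 * 10**7)
higher_score = expected_hits(score=31, query_len=8_000, db_len=10**7)

print(double_query / base)   # ratio when the query length doubles
print(double_db / base)      # ratio when the database size doubles
print(higher_score / base)   # ratio when the score rises by one
```

Doubling either length doubles E, while each additional score unit multiplies E by e^(-λ).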

