Basic Local Alignment Search Tool (BLAST)


1 Basic Local Alignment Search Tool (BLAST)
Katie Moreland

2 Overview
- Sequence Alignment
- Dynamic Programming
- BLAST tutorial
- Example execution of BLAST
- References

3 Sequence Alignment
In bioinformatics, a sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
Example Alignment:
G A A T T C A G T T A
G G A - T C - G A

4 Sequence Alignment Cont…
Motivations:
- Similar primary structure in proteins often implies similar form and function
- Similar short sequences can lead to motif finding (e.g., promoter regions)
- Similarities between gene regions can be used for phylogenetic classification

5 Sequence Similarity
- Alignments are not unique
- Need a way to compare alignments to find the optimal one
- The optimal alignment is the alignment that maximizes the overall score (it may not be unique)
- Three possibilities when aligning a character from each string (perfect match, mismatch, indel):
  - Align the two characters
    - Perfect match: C aligned with C
    - Mismatch: C aligned with G
  - Insertion/deletion (indel)
    - Gap in 1st string (S): - aligned with C
    - Gap in 2nd string (T): C aligned with -

6 Sequence Similarity Cont…
Simple metric:
  σ(x,x) = 1 (match)
  σ(x,y) = -1 (mismatch, x ≠ y)
  σ(x,-) = σ(-,x) = -1 (indel)
In practice it is useful to define a substitution matrix such as PAM250 to take the probabilities of certain mutations into account, e.g., the cost of a mutation to a chemically similar amino acid is less than the cost of a mutation to a dissimilar amino acid.
The cost of indels depends on the application.
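The simple metric can be applied column by column to score a finished alignment. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with "-" marking a gap; the names `sigma` and `alignment_score` and the sample alignment are illustrative and not from the slides.

```python
GAP = "-"  # assumed gap symbol

def sigma(x: str, y: str) -> int:
    """Simple metric: +1 for a match, -1 for a mismatch, -1 for an indel."""
    if x == GAP or y == GAP:
        return -1
    return 1 if x == y else -1

def alignment_score(row_s: str, row_t: str) -> int:
    """Sum sigma over the columns of an alignment given as two rows."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have equal length")
    return sum(sigma(x, y) for x, y in zip(row_s, row_t))

# Illustrative alignment: one indel, one mismatch, three matches.
print(alignment_score("-GCTC", "CGTTC"))
```

Swapping `sigma` for a lookup into a substitution matrix would change the per-column scores without changing the summation.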

7 Intro to Dynamic Programming
Used to reduce the time complexity of algorithms with certain properties.
Characteristics of Dynamic Programming:
- Overlapping subproblems (otherwise plain recursion/divide and conquer suffices)
- Optimality of subproblems: an optimal solution is built from optimal solutions to its subproblems (e.g., Shortest Path)
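Both characteristics show up when the best alignment score of two prefixes is written recursively. The sketch below is one way to illustrate this, assuming a simple match/mismatch/indel scoring; the strings, score values, and the name `best` are illustrative choices. Memoization via `functools.lru_cache` ensures each prefix pair is solved only once.

```python
from functools import lru_cache

S, T = "GCTC", "CGTTC"              # illustrative strings
MATCH, MISMATCH, INDEL = 2, -1, -1  # illustrative scores

@lru_cache(maxsize=None)
def best(i: int, j: int) -> int:
    """Best score for aligning the prefixes S[:i] and T[:j]."""
    if i == 0:
        return j * INDEL  # only gaps remain
    if j == 0:
        return i * INDEL
    pair = MATCH if S[i - 1] == T[j - 1] else MISMATCH
    # Optimal substructure: the best score extends a best score
    # of one of three smaller prefix pairs.
    return max(best(i - 1, j - 1) + pair,
               best(i - 1, j) + INDEL,
               best(i, j - 1) + INDEL)

total = best(len(S), len(T))
cached = best.cache_info().currsize  # number of distinct subproblems solved
print(total, cached)
```

Without the cache, the same prefix pairs would be recomputed many times; with it, the number of solved subproblems is bounded by (|S|+1)(|T|+1).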

8 Intro to Dynamic Programming
Two types of alignment:
- Global (Needleman-Wunsch)
  - Attempts to align every residue in the sequences
  - Most useful when sequences are similar in size and sequence
- Local (Smith-Waterman)
  - Finds an alignment for parts of the two strings
  - Most useful for dissimilar sequences that share regions of similarity or contain similar motifs

9 Needleman-Wunsch Algorithm
- Input: two strings, S and T
- Construct a matrix with |S|+1 rows and |T|+1 columns
- Label each row with a symbol from S and each column with a symbol from T, except for the first position in each, which represents an initial gap
- Beginning at the upper left corner:
  - Move diagonally to represent aligning the two characters from the strings
  - Move right to represent inserting a space in S
  - Move down to represent inserting a space in T
  - Update when newScore > oldScore (include an arrow to show which cell we came from)
- The optimal alignment score is in the bottom right corner of the matrix
- Backtrack to find the optimal alignment

10 Needleman-Wunsch Algorithm
Sequences to align:
  S: GCTC
  T: CGTTC
Simple scoring function:
  σ(x,x) = 2 (match)
  σ(x,y) = -1 (mismatch)
  σ(x,-) = σ(-,x) = -1 (indel)
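The matrix fill and backtracking steps can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `needleman_wunsch` is an assumption, and instead of storing arrows the traceback re-derives which neighbouring cell produced each score. Rows correspond to S and columns to T, as in the description above.

```python
def needleman_wunsch(s, t, match=2, mismatch=-1, indel=-1):
    """Return (score, aligned_s, aligned_t) for a global alignment."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: s aligned against gaps
        score[i][0] = i * indel
    for j in range(1, cols):          # first row: t aligned against gaps
        score[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # diagonal: align
                              score[i - 1][j] + indel,      # down: gap in t
                              score[i][j - 1] + indel)      # right: gap in s
    # Backtrack from the bottom right corner to the top left.
    i, j = len(s), len(t)
    out_s, out_t = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_s)), "".join(reversed(out_t))

result = needleman_wunsch("GCTC", "CGTTC")
print(result)
```

When several alignments tie for the best score, this traceback prefers the diagonal move, so it reports only one of them.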

11-77 Tracing Needleman-Wunsch
[Slides 11 through 77 fill in the alignment matrix for S = GCTC and T = CGTTC one cell at a time. The matrix figures did not survive extraction; the surviving cell values are -1, -2, -3, -4, -5, +1, +2, and +4.]

78 Modifications for Local Alignment
- Allow the algorithm to restart whenever it is advantageous to do so (start the algorithm from any position in S or T)
  - If newScore < 0, set the score for cell (i,j) to 0
- The optimal score is now the maximum value over all cells of the matrix (stop at any position in S or T)
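These two modifications are small changes to the global matrix fill. The sketch below assumes the same match/mismatch/indel scoring as the Needleman-Wunsch example; the function name `smith_waterman_score` is illustrative, and traceback is omitted for brevity.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, indel=-1):
    """Return (best_score, (row, col)) of the best local alignment."""
    # First row and column stay 0: an alignment may start anywhere.
    h = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(0,                       # restart here
                          h[i - 1][j - 1] + pair,
                          h[i - 1][j] + indel,
                          h[i][j - 1] + indel)
            if h[i][j] > best:                     # an alignment may end anywhere
                best, best_cell = h[i][j], (i, j)
    return best, best_cell

local = smith_waterman_score("GCTC", "CGTTC")
print(local)
```

A traceback would start at the reported cell and stop at the first cell whose score is 0.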

79 Other Modifications
- Use a gap penalty function to score one large gap differently from many gaps of size 1
- Biological motivations (e.g., mutations, cDNA matching)
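One common choice of gap penalty function is an affine one, which charges a fixed cost to open a gap plus a cost per gapped position. The sketch below assumes that convention (some formulations charge the extension cost only from the second position onward); the function name and the cost values are illustrative.

```python
def affine_gap_penalty(length: int, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Cost of a single gap of `length` positions under an affine scheme.

    Assumed convention: cost = gap_open + gap_extend * length.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length

one_long_gap = affine_gap_penalty(5)        # a single gap of 5 positions
five_short_gaps = 5 * affine_gap_penalty(1)  # five separate gaps of 1 position
print(one_long_gap, five_short_gaps)
```

Under a constant per-position penalty both cases would cost the same; the affine form makes the single long gap much cheaper, which fits cases such as cDNA matching where long contiguous gaps are expected.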

80 BLAST
Basic Local Alignment Search Tool
Features:
- Finds regions of local similarity between sequences
- Heuristic approach achieves efficiency (important when searching entire databases of sequences)
- Computes statistical significance of matches
Uses:
- Infer evolutionary/functional relationships
- Identify members of gene families

81 BLAST Algorithm
Three Stages:
1. Find hotspots: exact matches of word length W in the two sequences being considered (idea: good alignments will share regions of similarity, so find these first)
2. Extend hotspots in both directions using ungapped alignment to increase the alignment score; pass high-scoring sequences to stage 3
3. Perform gapped alignment between the two sequences using a variation of the Smith-Waterman algorithm. Only statistically significant alignments are displayed to the user.
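Stages 1 and 2 can be sketched as follows. This is a simplified illustration of the idea and not BLAST's actual implementation: it uses exact word matches as described above, a plain match/mismatch score, and a hypothetical `x_drop` cutoff that stops extending once the running score falls that far below the best seen. All function and parameter names are assumptions.

```python
from collections import defaultdict

def find_hotspots(query, subject, w):
    """Stage 1: (query_pos, subject_pos) pairs where words of length w match."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]

def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk in one direction without gaps; return (best_gain, best_length)."""
    score = best = length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break
        qi += step
        sj += step
    return best, best_len

def extend_hotspot(query, subject, qi, sj, w, match=2, mismatch=-1, x_drop=2):
    """Stage 2: return (score, query_start, subject_start, length)."""
    right, right_len = _extend(query, subject, qi + w, sj + w, 1,
                               match, mismatch, x_drop)
    left, left_len = _extend(query, subject, qi - 1, sj - 1, -1,
                             match, mismatch, x_drop)
    return (w * match + left + right, qi - left_len, sj - left_len,
            left_len + w + right_len)

hits = find_hotspots("GCTC", "CGTTC", w=2)
extended = [extend_hotspot("GCTC", "CGTTC", qi, sj, w=2) for qi, sj in hits]
print(hits, extended)
```

Only segments whose extended score clears a threshold would be handed to the gapped stage, which is what keeps the expensive dynamic programming confined to promising regions.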

82 BLAST Input
FASTA format:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
TVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWCHFPS
NWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPE
TANLWFNCHGEFFYCKMDWFLNYLNNLTVDADHNECKNTS
GTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKKTYAPPRE
GHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXXXXXXX
XXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
Accession/GI Number:
- Found using GenBank
- In FASTA example, gi number is

83 BLAST Input
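A FASTA record is a header line starting with ">" followed by one or more sequence lines. The sketch below is a minimal hand-written parser for illustration, assuming well-formed input; the name `parse_fasta` is an assumption, and only the first two sequence lines of the envelope protein example are used to keep it short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []   # drop the leading ">"
        elif header is not None:
            chunks.append(line)             # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLM
NTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNL
"""
header, sequence = parse_fasta(example)[0]
fields = header.split("|")   # pipe-separated identifier fields
print(fields)
print(len(sequence))
```

Splitting the header on "|" exposes the identifier fields, with the value after "gi" being the GI number for this style of header.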

84 BLAST Options
- Select program: blastp, blastn, etc.
- Select database(s) to search
  - nr is the default; it contains GenBank, PDB, SwissProt, and others
- Gapped/ungapped alignment
- Search within a certain organism

85 BLAST Options Cont…
- Filtering on/off
  - On by default; locates low complexity regions in a sequence and masks them before performing an alignment
  - Low complexity region: a region with highly biased amino acid composition
- E Value Threshold
  - Default = 10; the E value of a hit is the number of hits with at least that score one can expect to find by chance when searching the database
- Substitution Matrix
  - Default: BLOSUM62
  - Assigns a score to each alignment position reflecting how likely a given substitution is known to occur
  - Other matrices are supported, including PAM matrices
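The E value is usually estimated with the Karlin-Altschul formula E = K·m·n·e^(−λS), where S is the raw alignment score, m the query length, and n the database length. The sketch below assumes that formula; the values of `lam` and `k` depend on the substitution matrix and gap costs and are placeholders for illustration here, as are the query and database lengths.

```python
import math

def expected_hits(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Placeholder statistical parameters and sizes (assumptions, not from the slides).
LAM, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 10_000_000

for score in (40, 60, 80):
    e = expected_hits(score, QUERY_LEN, DB_LEN, LAM, K)
    print(score, e, "reported" if e <= 10 else "filtered out")
```

The formula shows why the threshold matters: E falls exponentially with the score and grows linearly with database size, so the same score is less significant in a larger database.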

86 BLAST Options

87 Advanced BLAST Options
  -G  Cost to open a gap [Integer], default = 11
  -E  Cost to extend a gap [Integer], default = 1
  -e  Expectation value (E) [Real], default = 10.0
  -W  Word size; default is 11 for blastn, 3 for other programs
  -v  Number of one-line descriptions (V) [Integer], default = 100
  -b  Number of alignments to show (B) [Integer]
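These flags are command-line options, so they can be assembled into an argument list programmatically. The sketch below assumes the legacy `blastall` executable, whose `-p` (program), `-d` (database), and `-i` (query file) flags are not listed above; the helper name `build_blastall_command` and the file name are hypothetical, and the command is only built, not run.

```python
def build_blastall_command(program, database, query_file, *,
                           gap_open=11, gap_extend=1, evalue=10.0,
                           word_size=None, descriptions=100):
    """Build an argument list for a legacy blastall invocation (assumed CLI)."""
    cmd = ["blastall", "-p", program, "-d", database, "-i", query_file,
           "-G", str(gap_open),      # cost to open a gap
           "-E", str(gap_extend),    # cost to extend a gap
           "-e", str(evalue),        # expectation value threshold
           "-v", str(descriptions)]  # number of one-line descriptions
    if word_size is not None:        # otherwise leave the program default
        cmd += ["-W", str(word_size)]
    return cmd

cmd = build_blastall_command("blastp", "nr", "query.fasta", word_size=3)
print(" ".join(cmd))
```

Passing the list to `subprocess.run` would execute the search on a machine where the tool and database are installed.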

88 BLAST Output
- Request ID
- Query Information
- Database Information
- Taxonomy Reports Link
- Graphical Display of alignments
- Description of significant alignments
- Pairwise alignments

89 BLAST Output Cont…

90 Taxonomy Reports
- Lineage Report
  - Hierarchical tree structure representing how many hits occurred in each group, 'focused' on the organism which yielded the strongest BLAST hit
- Organism Report
  - Groups hits by species
- Taxonomy Report
  - Summary of relationships between organisms in the BLAST hit list

91 Graphical Display of Alignments
- Displays the top 100 sequence alignments for a search by default
- Thick red bar at top represents the query sequence; numbers correspond to amino acid residues
- Hits are represented by colored bars; mouse over a bar to view the definition and score in the text box, click to go to the pairwise alignment
- Bar color represents alignment similarity score
- The color key above the query sequence gives the range of similarity scores for each color

92 Graphical Display of Alignments

93 Description of Significant Alignments
- Listed in order of decreasing significance
- Default number displayed = 100

94 Pairwise Alignments

95 BLAST Demonstration
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNKMENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIMGSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS

96 References
1. Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol 215(3):403-10, 1990.
2. BLAST Tutorials Hatzivassiloglou, V.

