1 BLAST: A heuristic algorithm
Anjali Tiwari, Pannaben Patel, Pushkala Venkataraman
3 BLAST: Basic Local Alignment Search Tool
Rapid searching of protein and nucleotide databases, seeking similar sequences.
Databases: nr, GenBank, SwissProt, PDB, PIR, PRF
nr = non-redundant database
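4 BLAST: 3-step algorithm
Compile words -> Scan DB -> Extend

Program   Query        Database     Search level
Blastp    Amino acid   Amino acid   Amino acid
Blastn    Nucleotide   Nucleotide   Nucleotide
Blastx    Nucleotide   Amino acid   Amino acid
Tblastn   Amino acid   Nucleotide   Amino acid
Tblastx   Nucleotide   Nucleotide   Amino acid

For Blastx, Tblastn and Tblastx, the nucleotide sequences are translated and compared at the amino acid level.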
5 Some definitions
Alignment: the process of lining up 2 or more sequences to assess similarity.
BLOSUM62: a 20*20 substitution matrix for amino acids.
Gap: a space introduced into an alignment to compensate for insertions/deletions in one sequence relative to another.
6 Similarity Measures
Local search algorithms
Similarity matrix: BLOSUM
Identities and conservative replacements = positive score
Unlikely replacements = negative score
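To make the positive/negative scoring concrete, here is a minimal sketch that scores an ungapped alignment column by column. The dictionary holds only the BLOSUM62 entries for five residues (Q, E, G, K, D); the names `BLOSUM62_SUBSET`, `pair_score` and `alignment_score` are hypothetical helpers chosen for this illustration, not part of any BLAST implementation.

```python
# BLOSUM62 entries for five residues only (Q, E, G, K, D); the full matrix is 20*20.
BLOSUM62_SUBSET = {
    ("Q", "Q"): 5, ("E", "E"): 5, ("G", "G"): 6, ("K", "K"): 5, ("D", "D"): 6,
    ("Q", "E"): 2, ("Q", "G"): -2, ("Q", "K"): 1, ("Q", "D"): 0,
    ("E", "G"): -2, ("E", "K"): 1, ("E", "D"): 2,
    ("G", "K"): -2, ("G", "D"): -1, ("K", "D"): -1,
}


def pair_score(a, b):
    """Look up the score for one aligned pair; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def alignment_score(seq1, seq2):
    """Sum the pair scores of an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(pair_score("Q", "Q"))             # identity: 5
print(pair_score("E", "Q"))             # conservative replacement: 2
print(pair_score("K", "G"))             # unlikely replacement: -2
print(alignment_score("QEGK", "QQGD"))  # 5 + 2 + 6 - 1 = 12
```

Identities and the conservative Q/E replacement raise the total, while the unlikely K/G replacement would lower it.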
7 General concept of the working of BLAST
Query input -> 1000's of sequences -> Calculate HSP -> Calculate MSP -> Display output
HSP: High-scoring Segment Pair
MSP: Maximal Segment Pair
8 Key Idea: BLAST1
Step 1: Compile a list of high-scoring words of length w from the query (w = 3 for proteins, 11 for nucleic acids).
Step 2: Scan the database for word hits with a score greater than the threshold T.
Step 3: Extend each word hit in both directions to find High Scoring Pairs with scores greater than S.
9 Step 1: Example
Query: QQGPHUIQEGQQGKEEDPP
Words of length 3 (w): QQG, QGP, GPH, PHU, HUI, ...
Take the first triple: QQG
Make neighborhood words (w'): QQG, QEG, GQG, ...
Find high-scoring triples: Blosum(w, w') > T, where T is the threshold parameter.
Suppose:
  Blosum(QQG, QQG) = 16
  Blosum(QQG, QEG) = 18
  Blosum(QQG, GQG) = 12
  T = 13
Choose QQG and QEG, since their Blosum values are greater than T.
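The neighborhood step can be sketched as follows. This is an illustration under stated assumptions, not the BLAST source: the alphabet is restricted to Q, E and G so the full candidate list stays small, the scores are the actual BLOSUM62 entries for those residues (so they differ from the supposed values in the example above), and a candidate is kept when its score is at least T, which is how the 1990 paper words the threshold. The function names are hypothetical.

```python
from itertools import product

# BLOSUM62 entries for the residues Q, E and G only.
BLOSUM62_SUBSET = {
    ("Q", "Q"): 5, ("E", "E"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "G"): -2, ("E", "G"): -2,
}


def pair_score(a, b):
    """Symmetric lookup of one substitution score."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(w1, w2):
    """Score two words of equal length position by position."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def neighborhood(word, alphabet, threshold):
    """Return every candidate word whose score against `word` is at least `threshold`."""
    kept = {}
    for letters in product(alphabet, repeat=len(word)):
        candidate = "".join(letters)
        score = word_score(word, candidate)
        if score >= threshold:
            kept[candidate] = score
    return kept


hits = neighborhood("QQG", alphabet="QEG", threshold=13)
print(hits)  # {'QQG': 16, 'QEG': 13, 'EQG': 13}
```

A real protein search would enumerate candidates over all 20 amino acids, giving 8000 possible triples per query word.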
10 Step 2
Suppose the database sequence is PKLMMQQGKQEGM.
Matching word pairs in the DB sequence: QQG (positions 6-8) and QEG (positions 10-12).
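A minimal sketch of the scan, assuming the accepted words from Step 1 are held in a set and the database sequence is searched with a sliding window of length w. The function name `find_word_hits` is hypothetical, and positions are 0-based in the code.

```python
def find_word_hits(words, db_seq, w=3):
    """Return (position, word) for every window of db_seq that is an accepted word."""
    hits = []
    for pos in range(len(db_seq) - w + 1):
        window = db_seq[pos:pos + w]
        if window in words:
            hits.append((pos, window))
    return hits


accepted_words = {"QQG", "QEG"}
db_hits = find_word_hits(accepted_words, "PKLMMQQGKQEGM")
print(db_hits)  # [(5, 'QQG'), (9, 'QEG')]
```

Using a set makes each window lookup a constant-time operation, so the scan is linear in the length of the database sequence.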
11 Step 3
Query:        QQGPHUIQEGQQGKEEDPP
DB sequence:       PKLMMQQGKQEGM

The word hit QQG is extended to the right one residue at a time:
Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27
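The running scores above can be reproduced with a short sketch. It follows the simplified rule used in these slides, stopping as soon as the cumulative score drops; BLAST itself tolerates a drop of up to a parameter X below the best score seen before giving up. The matrix holds only the BLOSUM62 pairs that occur in this example, and `extend_right` is a hypothetical helper name.

```python
# BLOSUM62 entries for the residue pairs that occur in this extension only.
BLOSUM62_SUBSET = {
    ("Q", "Q"): 5, ("G", "G"): 6, ("K", "K"): 5, ("E", "E"): 5,
    ("Q", "E"): 2, ("D", "G"): -1,
}


def pair_score(a, b):
    """Symmetric lookup of one substitution score."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def extend_right(query, db, q_start, d_start, w=3):
    """Extend a word hit to the right until the cumulative score decreases.

    Returns (best score, length of the best segment, list of running scores).
    """
    score = sum(pair_score(query[q_start + i], db[d_start + i]) for i in range(w))
    trace = [score]
    best, best_len = score, w
    i = w
    while q_start + i < len(query) and d_start + i < len(db):
        score += pair_score(query[q_start + i], db[d_start + i])
        trace.append(score)
        if score < best:  # simplified stopping rule from the slides
            break
        best, best_len = score, i + 1
        i += 1
    return best, best_len, trace


query = "QQGPHUIQEGQQGKEEDPP"
db = "PKLMMQQGKQEGM"
# The hit pairs QQG at query offset 10 with QQG at database offset 5 (0-based).
best, length, trace = extend_right(query, db, q_start=10, d_start=5)
print(trace)                       # [16, 21, 23, 28, 27]
print(best, query[10:10 + length], db[5:5 + length])  # 28 QQGKEE QQGKQE
```

A full implementation would extend to the left in the same way and report the segment pair only if its score exceeds S.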
12 Extension to the right stops here because the BLOSUM value is beginning to decrease.

ADVANTAGES
Faster than dynamic programming
Removes low-complexity regions
Spends less time on uninteresting searches
Statistical significance of results can be obtained, and these are very good

DISADVANTAGES
Finds and reports only local alignments
Finds too many word hits per sequence, thus reducing speed
Does not allow for gaps in the sequence

New models to combat the disadvantages: BLAST2, PSI-BLAST
13 BLAST2: Combination of 2-Hit and Gapped
2-Hit method: a 3-step method
Steps 1 and 2 are the same as in BLAST1; Step 3 is where they differ.
BLAST now looks for 2 words in a sequence instead of 1 while aligning.
The 2 words are at a distance < A and are not overlapping. Typically A = 40.
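The two-hit test can be sketched as below. Two details are assumptions added for this illustration: hits are given as (query position, database position) pairs, and the two words must lie on the same diagonal (same database position minus query position), which is how the 1997 Gapped BLAST paper states the condition. Only neighbouring hits on a diagonal are compared, which is a simplification, and `two_hit_seeds` is a hypothetical name.

```python
from collections import defaultdict


def two_hit_seeds(hits, w=3, A=40):
    """Find pairs of non-overlapping word hits on one diagonal less than A apart.

    hits: iterable of (query_pos, db_pos) word hits.
    Returns a sorted list of (diagonal, first_query_pos, second_query_pos).
    """
    by_diagonal = defaultdict(list)
    for q_pos, d_pos in hits:
        by_diagonal[d_pos - q_pos].append(q_pos)

    seeds = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            distance = second - first
            # distance >= w means the two words do not overlap
            if w <= distance < A:
                seeds.append((diagonal, first, second))
    return sorted(seeds)


word_hits = [(0, 10), (20, 30), (21, 31), (70, 80), (5, 50)]
seeds = two_hit_seeds(word_hits)
print(seeds)  # [(10, 0, 20)]
```

In this example the hits at query positions 20 and 21 overlap, the hit at 70 is too far from its neighbour, and the hit on diagonal 45 has no partner, so only one pair triggers an extension.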
14 Gapped BLAST
Gapped alignment is introduced to get an optimal alignment.
Two sequences:
Seq A = ACGTA
Seq B = ACATA
The normal (ungapped) alignment is:
ACGTA
ACATA
But if the penalty for a mismatch is larger than the penalty for the two gaps that replace it, the optimal alignment is one of the following:
AC-GTA      ACG-TA
ACA-TA      AC-ATA
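The trade-off between a mismatch and a pair of gaps can be checked with a small dynamic-programming sketch. It computes only the optimal global alignment score with a linear gap penalty, which is a simplification: Gapped BLAST itself performs a local, seeded extension with affine gap costs. The scoring values and the name `global_alignment_score` are assumptions chosen for this illustration.

```python
def global_alignment_score(a, b, match, mismatch, gap):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty)."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Mild mismatch penalty: the ungapped alignment wins (4 matches, 1 mismatch).
mild = global_alignment_score("ACGTA", "ACATA", match=1, mismatch=-1, gap=-1)
# Harsh mismatch penalty: two gaps are cheaper (4 matches, 2 gaps).
harsh = global_alignment_score("ACGTA", "ACATA", match=1, mismatch=-3, gap=-1)
print(mild, harsh)  # 3 2
```

With a mismatch penalty of -3 the ungapped alignment would score only 1, so the optimal score of 2 comes from one of the gapped alignments.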
15 Gapped BLAST: allows gaps to be introduced while aligning
Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB:    GGCAGACATGACTGACAAGGGTATCG

Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB:    GGCAGACATGA-CTGACAAGGGTATCG
The alignment contains mismatches and a gap (shown as "-").
16 PSI-BLAST: Position-Specific Iterated BLAST
Used for multiple alignments.
1. Query sequence
2. BLAST search of the DB
3. Sequences with high scores are collected
4. Multiple alignment and profile are made
5. DB is searched with the profile
6. New sequences are added and the process is iterated
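The profile step can be illustrated with a minimal sketch that turns a gap-free multiple alignment into per-column residue frequencies and scores a sequence against them with log-odds. Everything here is an assumption made for illustration: the five-letter alphabet, the uniform background frequency, the pseudocount of 1 and the function names. PSI-BLAST's actual profile construction uses sequence weighting and more elaborate pseudocounts.

```python
import math
from collections import Counter


def build_profile(aligned, alphabet, pseudocount=1.0):
    """Per-column residue frequencies from equal-length, gap-free aligned sequences."""
    total = len(aligned) + pseudocount * len(alphabet)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({res: (counts[res] + pseudocount) / total for res in alphabet})
    return profile


def profile_score(profile, seq, background):
    """Sum of log2(frequency / background) over the positions of seq."""
    return sum(math.log2(col[res] / background) for col, res in zip(profile, seq))


alphabet = "QEGKR"
aligned = ["QQGK", "QEGK", "QQGR"]  # hypothetical high-scoring sequences
profile = build_profile(aligned, alphabet)
background = 1 / len(alphabet)

print(profile[0]["Q"])  # 0.5, since Q fills the first column
print(profile_score(profile, "QQGK", background) >
      profile_score(profile, "KKKK", background))  # True
```

In an iterated search, sequences that score well against the profile would be added to the alignment and the profile rebuilt before the next pass.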
17 References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215:403-410.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research.
http://www.ncbi.nlm.nih.gov/
http://bioinf.man.ac.uk/ember/prototype/
18 References (Continued)
http://www.psc.edu/biomed/training/tutorials/sequence/db/index.html
http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.html
http://www.ime.usp.br/~durham/cursos/ibi5032/pub/doc/allignmentTutorial.pdf
http://ibivu.cs.vu.nl/teaching/masters/seq_analysis/sa_lecture3.pdf