1 BLAST: A Heuristic Algorithm
Anjali Tiwari, Pannaben Patel, Pushkala Venkataraman
3 BLAST: Basic Local Alignment Search Tool
Rapid searching of protein and nucleotide databases for similar sequences.
Databases: nr, GenBank, SwissProt, PDB, PIR, PRF (nr = non-redundant database)
4 BLAST: 3-Step Algorithm
Steps: Compile Words, Scan DB, Extend

Program   Query        Database     Search Level
Blastp    Amino acid   Amino acid   Amino acid
Blastn    Nucleotide   Nucleotide   Nucleotide
Blastx    Nucleotide   Amino acid   Amino acid
Tblastn   Amino acid   Nucleotide   Amino acid
Tblastx   Nucleotide   Nucleotide   Amino acid
5 Some Definitions
Alignment: the process of lining up 2 or more sequences to assess similarity.
BLOSUM62: a 20 x 20 substitution matrix for amino acids.
Gap: a space introduced into an alignment to compensate for insertions/deletions in one sequence relative to another.
6 Similarity Measures
Local search algorithms
Similarity matrix: BLOSUM
Identities and conservative replacements = positive scores
Unlikely replacements = negative scores
7 General Concept of Working of BLAST
Query -> Input 1000's of sequences -> Calculate HSP -> Calculate MSP -> Display output
HSP: High-scoring Segment Pair
MSP: Maximal Segment Pair
8 Key Idea: BLAST1
Step 1: Compile a list of high-scoring words of length w from the query (w = 3 for proteins, 11 for nucleic acids).
Step 2: Scan the database for word hits with a score greater than the threshold T.
Step 3: Extend each word hit in both directions to find High Scoring Pairs with scores greater than S.
9 Step 1: Example
Query: QQGPHUIQEGQQGKEEDPP
Words of length 3: w = QQG, QGP, GPH, PHU, HUI, ...
Take the first triple: QQG
Make neighborhood words: w' = QQG, QEG, GQG, ...
Find high-scoring triples: Blosum(w, w') > T, where T is the threshold parameter
Suppose, for illustration: Blosum(QQG, QEG) = 18, Blosum(QQG, GQG) = 12, Blosum(QQG, QQG) = 16, and T = 13
Choose QQG and QEG, since their Blosum values exceed T
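Step 1 can be sketched in a few lines of Python. This is a minimal sketch, not BLAST's actual implementation: the function names are hypothetical, the alphabet is restricted to Q, E and G to keep the score table small, and the table holds the corresponding BLOSUM62 entries. With these entries the word scores differ from the "Suppose" values in the example (QQG vs QEG scores 13, not 18), so the usage below assumes a threshold of T = 12.

```python
from itertools import product

# BLOSUM62 entries for the residues Q, E and G only (a deliberately tiny subset).
SCORES = {
    ("Q", "Q"): 5, ("Q", "E"): 2, ("Q", "G"): -2,
    ("E", "E"): 5, ("E", "G"): -2, ("G", "G"): 6,
}

def pair_score(a, b):
    """Look up a symmetric substitution score."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]

def word_score(w1, w2):
    """Sum of position-by-position substitution scores for two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))

def compile_words(query, w=3):
    """All overlapping words of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]

def neighborhood(word, alphabet, threshold):
    """Candidate words over `alphabet` whose score against `word` exceeds `threshold`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return [c for c in candidates if word_score(word, c) > threshold]

words = compile_words("QQGPHUIQEGQQGKEEDPP")
neighbors = neighborhood(words[0], "QEG", threshold=12)
```

Here `words` starts with QQG, QGP, GPH, PHU, HUI, and `neighbors` contains QQG, QEG and EQG. Enumerating every candidate word is only practical for short words and small alphabets; it is used here for clarity.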
10 Step 2
Suppose the database sequence is PKLMMQQGKQEGM.
Matching words in the DB sequence: QQG (PKLMM[QQG]KQEGM) and QEG (PKLMMQQGK[QEG]M)
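The scan in Step 2 amounts to sliding a window of length w over the database sequence and checking each window against the word list from Step 1. A minimal sketch, assuming the word list is held in a Python set and using a hypothetical function name:

```python
def scan_database(words, subject, w=3):
    """Return (word, position) for every window of the subject found in `words`."""
    hits = []
    for j in range(len(subject) - w + 1):
        window = subject[j:j + w]
        if window in words:
            hits.append((window, j))
    return hits

hits = scan_database({"QQG", "QEG"}, "PKLMMQQGKQEGM")
```

For the example sequence this yields QQG at position 5 and QEG at position 9 (0-based). A set lookup is used for simplicity; it is an assumption of this sketch, not a description of how BLAST indexes its words.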
11 Step 3
Query:       QQGPHUIQEGQQGKEEDPP
DB sequence: PKLMMQQGKQEGM
Extending the word hit QQG to the right, one residue at a time:
Blosum(QQG, QQG) = 16
Blosum(QQGK, QQGK) = 21
Blosum(QQGKE, QQGKQ) = 23
Blosum(QQGKEE, QQGKQE) = 28
Blosum(QQGKEED, QQGKQEG) = 27
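The rightward extension in this example can be sketched as follows. The function name and the `x_drop` parameter are assumptions of this sketch: with `x_drop=0` the extension stops at the first decrease in score, as in the example, while a larger value tolerates a temporary dip before giving up. The score table contains only the BLOSUM62 entries this example needs.

```python
# BLOSUM62 entries needed for this example only.
PAIRS = {
    ("Q", "Q"): 5, ("G", "G"): 6, ("K", "K"): 5, ("E", "E"): 5,
    ("E", "Q"): 2, ("D", "G"): -1, ("P", "M"): -2,
}

def score(a, b):
    """Look up a symmetric substitution score."""
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]

def extend_right(query, subject, q_start, s_start, word_len, x_drop=0):
    """Extend a word hit to the right without gaps.

    Returns (best_score, best_length), where best_length counts residues
    from the start of the word hit.
    """
    running = sum(score(query[q_start + i], subject[s_start + i])
                  for i in range(word_len))
    best_score, best_len = running, word_len
    i = word_len
    while q_start + i < len(query) and s_start + i < len(subject):
        running += score(query[q_start + i], subject[s_start + i])
        if running > best_score:
            best_score, best_len = running, i + 1
        elif best_score - running > x_drop:
            break  # score has fallen too far below the best seen so far
        i += 1
    return best_score, best_len

# The hit pairs the second QQG of the query (position 10) with QQG in the DB (position 5).
result = extend_right("QQGPHUIQEGQQGKEEDPP", "PKLMMQQGKQEGM", 10, 5, 3)
```

Here `result` is `(28, 6)`: the best-scoring segment is QQGKEE against QQGKQE. Extension to the left would follow the same pattern with decreasing indices.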
12 Extension to the right stops here because the cumulative BLOSUM score begins to decrease (from 28 to 27).

ADVANTAGES
Faster than dynamic programming
Removes low-complexity regions
Spends less time on uninteresting searches
Statistical significance of results can be obtained, and these statistics are very good

DISADVANTAGES
Finds and reports only local alignments
Finds too many word hits per sequence, thus reducing speed
Does not allow for gaps in the sequence

*** New models to combat disadvantages: BLAST2, PSI-BLAST ***
13 BLAST2: Combination of 2-Hit & Gapped
2-Hit Method: a 3-step method
Steps 1 and 2 are the same as in BLAST1; Step 3 is where they differ.
BLAST now looks for 2 word hits in a sequence instead of 1 before extending. The 2 words must be non-overlapping, on the same diagonal, and at a distance < A from each other. Typically A = 40.
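The two-hit condition can be sketched as a filter over word hits. This is a simplified illustration under stated assumptions: hits are given as (query position, database position) pairs, the diagonal is the difference of the two positions, only consecutive hits on a diagonal are compared, and the function name is hypothetical.

```python
from collections import defaultdict

def two_hit_seeds(hits, w=3, window=40):
    """Return (diagonal, first_q, second_q) for pairs of non-overlapping hits
    on the same diagonal that lie less than `window` apart."""
    by_diagonal = defaultdict(list)
    for q_pos, s_pos in hits:
        by_diagonal[s_pos - q_pos].append(q_pos)

    seeds = []
    for diagonal, q_positions in by_diagonal.items():
        q_positions.sort()
        for first, second in zip(q_positions, q_positions[1:]):
            distance = second - first
            # distance >= w means the two words do not overlap
            if w <= distance < window:
                seeds.append((diagonal, first, second))
    return seeds

seeds = two_hit_seeds([(0, 10), (5, 15), (6, 16), (50, 60), (3, 40)])
```

In this made-up hit list, only the hits at query positions 0 and 5 on diagonal 10 qualify: positions 5 and 6 overlap, position 50 is too far from 6, and the hit on diagonal 37 has no partner.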
14 Gapped BLAST
Gapped alignment is introduced to get an optimal alignment.
Two sequences: Seq A = ACGTA, Seq B = ACATA
Normal (ungapped) alignment:
ACGTA
ACATA
But if the mismatch penalty is larger than the combined penalty of the two gaps needed to avoid the mismatch, then the optimal alignment is one of:
AC-GTA      ACG-TA
ACA-TA      AC-ATA
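The trade-off between a mismatch and a pair of gaps can be checked with a small dynamic-programming routine. The sketch below uses plain global alignment with a linear gap penalty (Needleman-Wunsch scoring), which is a stand-in chosen for brevity and not the gapped extension procedure BLAST itself uses; the function name and default scores are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

mild = global_alignment_score("ACGTA", "ACATA")                  # mismatch -1, gap -1
harsh = global_alignment_score("ACGTA", "ACATA", mismatch=-5)    # mismatch -5, gap -1
```

With a mismatch of -1 the ungapped alignment wins (4 matches and 1 mismatch, score 3). With a mismatch of -5 the gapped alignment wins (4 matches and 2 gaps, score 2), because the ungapped alignment would only score -1.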
15 Gapped BLAST: allows gaps to be introduced while aligning
Query: ATTGTCAAAGACTTGAGCTGATGCAT
DB:    GGCAGACATGACTGACAAGGGTATCG
Alignment (the figure labels a mismatch and a gap; the gap is shown here as "-"):
ATTGTCAAAGACTTGAGCTGATGCAT
GGCAGACATGA-CTGACAAGGGTATCG
16 PSI-BLAST: Position-Specific Iterated BLAST
Used for multiple alignments.
1. Start with the query sequence
2. BLAST search of the DB
3. Sequences with high scores are collected
4. A multiple alignment and a profile are made
5. The DB is searched with the profile
6. New sequences are added and the process is iterated
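The "profile" step can be illustrated with a position-by-position residue frequency table built from a multiple alignment. This is a deliberately simplified sketch: PSI-BLAST builds a log-odds position-specific scoring matrix with sequence weighting and pseudocounts, none of which are modeled here, and the function names and the aligned sequences are made up for illustration.

```python
from collections import Counter

def build_profile(aligned):
    """Per-column residue frequencies for equal-length aligned sequences ('-' = gap)."""
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.items()} if total else {})
    return profile

def score_with_profile(profile, segment):
    """Sum of the profile frequencies of each residue in the segment."""
    return sum(column.get(residue, 0.0) for column, residue in zip(profile, segment))

profile = build_profile(["QQGK", "QEGK", "QQGR"])
typical = score_with_profile(profile, "QQGK")
unusual = score_with_profile(profile, "QEGR")
```

A segment that matches the most common residue in every column (`typical`) scores higher than one that uses the rarer residues (`unusual`), which is the sense in which a profile captures position-specific preferences that a single query sequence cannot.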
17 References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." Journal of Molecular Biology 215:
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Research.
18 References (Continued)
/db/index.html
ml
c/allignmentTutorial.pdf
cture3.pdf