Published by Francine Cole. Modified over 9 years ago.
1
Gapped BLAST and PSI BLAST
Basic Local Alignment Search Tool
~Sean Boyle
2
The BLAST Topics
Exactly What Is BLAST?
A Quick Recap of Profiles
A Few Statistics Behind the BLAST Program
The Progression to Gapped BLAST
The Advancements in PSI BLAST
3
Exactly What Is BLAST?
BLAST programs are used for searching both protein and DNA databases for sequence similarities.
BLAST programs can compare protein to protein, DNA to DNA, protein to DNA, or DNA to protein.
The DNA sequences used in the mixed comparisons are usually conceptually translated into protein before comparison.
BLAST programs use a threshold value T which can be adjusted to trade speed against sensitivity. A higher value of T will give greater speed, but also a larger probability of missing weaker similarities.
Various substitution matrices can be used, such as BLOSUM62 or PAM250.
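The effect of the threshold T can be illustrated with a small sketch. It assumes a word length of 3 and a hypothetical toy scoring function (+4 for a match, -1 for a mismatch) standing in for a real matrix such as BLOSUM62; the names `toy_score` and `neighborhood` are illustrative and not part of any BLAST program.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix: +4 match, -1 mismatch."""
    return 4 if a == b else -1


def neighborhood(word, threshold):
    """Return every word of the same length that scores >= threshold against `word`."""
    hits = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            hits.append("".join(candidate))
    return hits


# Raising T shrinks the list of words that must be looked up in the database,
# which speeds the search but drops the weaker matches.
for t in (2, 7, 12):
    print("T =", t, "neighborhood size =", len(neighborhood("ACD", t)))
```

With this toy scoring, only the exact word survives the highest threshold, while the lowest threshold also admits words with two mismatches.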
4
A Quick Recap of Profiles
A sequence profile is a position-specific scoring matrix generated from a group of aligned sequences and a basic scoring matrix.
A profile will have L rows and 22 columns, or vice versa, where L is the number of aligned positions.
At each position, the basic matrix scores are multiplied by the frequency of each amino acid there: its count in the aligned sequences divided by the total number of amino acids observed at that position.
A consensus sequence or profile is then derived and used in future comparisons.
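A minimal sketch of profile construction follows, under stated assumptions: the basic scoring matrix is replaced by a hypothetical toy function (+4 match, -1 mismatch), each profile entry is the frequency-weighted average of the basic scores, and only the 20 amino acid columns are kept. The function names are illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Hypothetical stand-in for a basic scoring matrix: +4 match, -1 mismatch."""
    return 4 if a == b else -1


def build_profile(aligned, score=toy_score):
    """Return one {amino acid: score} row per aligned position.

    Each score is the basic score averaged over the residues observed in that
    column, weighted by how often each residue occurs there.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned if seq[pos] != "-"]  # skip gaps
        counts = Counter(column)
        total = len(column)
        row = {}
        for a in AMINO_ACIDS:
            if total == 0:
                row[a] = 0.0
            else:
                row[a] = sum(n / total * score(a, b) for b, n in counts.items())
        profile.append(row)
    return profile


def consensus(profile):
    """Pick the highest-scoring amino acid at every position."""
    return "".join(max(row, key=row.get) for row in profile)


profile = build_profile(["ACDE", "ACDF", "GCDE"])
print(consensus(profile))
print(round(profile[0]["A"], 3), round(profile[0]["G"], 3))
```

In the first column, A occurs in two of three sequences and G in one, so A scores higher than G, and both score higher than residues never seen at that position.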
5
A Few Statistics Used in BLAST
First, we require the expected score for two random amino acids, $\sum_{i,j} P_i P_j S_{ij}$, to be negative.
Now we can calculate two parameters, $\lambda$ and $K$.
These two parameters allow for a normalized scoring system through the equation $S' = (\lambda S - \ln K) / \ln 2$.
$S'$ can now be plugged into the equation $E = N / 2^{S'}$, where $N$ is the size of the search space.
An E-value cutoff above 0.01 will return more loosely related similarities.
An E-value cutoff of $1 \times 10^{-5}$ or below will return more strictly related similarities.
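The formulas above can be sketched in a few lines. Two things here are assumptions not stated on the slide: λ is found as the positive root of Σ P_i P_j exp(λ S_ij) = 1 (the standard Karlin-Altschul condition), and K is simply passed in rather than computed. The toy DNA scoring system (+1 match, -1 mismatch, uniform base frequencies) and all function names are illustrative.

```python
import math


def expected_score(freqs, score):
    """Sum of P_i * P_j * S_ij over all residue pairs; must be negative."""
    return sum(freqs[a] * freqs[b] * score(a, b) for a in freqs for b in freqs)


def solve_lambda(freqs, score, upper=10.0, tol=1e-12):
    """Bisection for the positive root of sum P_i P_j exp(lambda * S_ij) = 1."""
    def f(lam):
        return sum(freqs[a] * freqs[b] * math.exp(lam * score(a, b))
                   for a in freqs for b in freqs) - 1.0

    lower = 1e-6  # f is negative just above zero when the expected score is negative
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if f(mid) < 0:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, search_space):
    """Expected number of chance hits, E = N / 2^S'."""
    return search_space / 2 ** bits


# Toy DNA system: uniform base frequencies, +1 match, -1 mismatch.
freqs = {base: 0.25 for base in "ACGT"}
score = lambda a, b: 1 if a == b else -1

lam = solve_lambda(freqs, score)
bits = bit_score(30, lam, k=0.1)  # K = 0.1 is a hypothetical value
print(expected_score(freqs, score), lam, bits, e_value(bits, 1e9))
```

Because the E-value halves for every additional bit, a modest gain in raw score can move a hit from the loosely related range to the strictly related range.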
6
The Progression to Gapped BLAST
The original BLAST program did not take gaps into account.
BLAST used to look for single word hits: short aligned words scoring at least T. Each such “hit” was then extended.
Gapped BLAST now requires two non-overlapping hits, each scoring at least T, within distance A of one another on the same diagonal. These pairs of “hits” are then extended.
Gapped BLAST allows for gap initiation and extension.
ABCDE              ABCDE
ACD--              A-CD-
(Original Blast)   (Gapped Blast)
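The two-hit requirement can be sketched as follows. This is a simplified illustration, not the BLAST implementation: hits are given as hypothetical (query position, subject position) pairs, two hits lie on the same diagonal when subject position minus query position is equal, and only consecutive hits on a diagonal are compared.

```python
def two_hit_seeds(hits, word_len, window):
    """Return (diagonal, first_query_pos, second_query_pos) for qualifying hit pairs.

    A pair qualifies when both hits lie on the same diagonal, do not overlap
    (their starts are at least `word_len` apart), and are within `window`
    (the distance A) of one another.
    """
    by_diagonal = {}
    for query_pos, subject_pos in hits:
        by_diagonal.setdefault(subject_pos - query_pos, []).append(query_pos)

    seeds = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for first, second in zip(positions, positions[1:]):
            distance = second - first
            if word_len <= distance <= window:
                seeds.append((diagonal, first, second))
    return sorted(seeds)


# Four hits on diagonal 5 and one isolated hit on diagonal 17.
hits = [(0, 5), (10, 15), (3, 20), (40, 45), (100, 105)]
print(two_hit_seeds(hits, word_len=3, window=40))
```

The isolated hit and the hit that is 60 positions from its neighbor trigger no extension, which is where the two-hit method saves time compared with extending every single hit.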
7
PSI BLAST
Position-Specific Iterated BLAST
Incorporates position-specific matrices (“profiles”)
Often much better at detecting weak similarities
Before PSI BLAST the same techniques were used, but a large degree of expertise and human intervention was required
8
Score Matrix Architecture
Profiles are very similar to a scoring matrix
–A protein or nucleotide aligns to a profile position
–A new profile is created with every iteration
Profiles created in turn i are used in turn i+1
Gap costs may be position-specific with profiles.
How position-specific protein score matrices draw their power
–Improved estimation of the probabilities with which amino acids occur at various pattern positions
–Relatively precise definition of the boundaries of important motifs
Every matrix constructed has a length exactly the same as the original query sequence
9
Multiple Alignment Construction & Sequence Weights
All database sequences whose alignment E-value is below a specific threshold are added to the multiple alignment built on the query
Any row (or column) which is >= 98% identical to a previously added alignment is kept out of the profile
–Allows for better searching on later iterations
Poor restrictions could lead to large-scale insertion of unrelated sequences into the profile
Sequences are given different weights depending on evolutionary importance
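The redundancy filter described above might look like the sketch below. It assumes rows are equal-length aligned strings with "-" for gaps and that identity is measured over positions where neither row has a gap; the helper names are hypothetical, and sequence weighting is not covered.

```python
def percent_identity(row_a, row_b):
    """Percent of identical residues over positions where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def purge_redundant(rows, cutoff=98.0):
    """Keep a row only if it is less than `cutoff` percent identical to every kept row."""
    kept = []
    for row in rows:
        if all(percent_identity(row, earlier) < cutoff for earlier in kept):
            kept.append(row)
    return kept


rows = [
    "A" * 100,             # first row, always kept
    "A" * 99 + "C",        # 99% identical to the first row, dropped
    "A" * 90 + "C" * 10,   # 90% identical, kept
]
print(len(purge_redundant(rows)))
```

Dropping near-duplicates keeps a large family of almost identical sequences from dominating the profile's column frequencies.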
10
PSI BLAST Overview
Start off with the query and an initial score matrix (BLOSUM62)
–Homologs are found using BLAST (align DB to query)
–E-value is used as the criterion for sequence insertion into the profile
A profile (p1) is constructed from the passing sequences and the score matrix
–Once again search for homologs using BLAST (align DB to profile)
–Once again use E-value as the criterion for insertion into the profile
A profile (p2) is constructed from the approved sequences and the score matrix.
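The iteration outlined above can be sketched as a toy loop. Everything here is a simplifying assumption rather than the PSI BLAST implementation: the search is an ungapped sliding-window scan instead of BLAST, a raw score cutoff (`min_score`) stands in for the E-value criterion, and a hypothetical +4/-1 scoring function stands in for BLOSUM62.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Hypothetical stand-in for BLOSUM62: +4 match, -1 mismatch."""
    return 4 if a == b else -1


def profile_from(rows):
    """One score row per query position: frequency-weighted average of toy_score."""
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append({
            a: sum(n * toy_score(a, b) for b, n in counts.items()) / len(column)
            for a in AMINO_ACIDS
        })
    return profile


def best_window(profile, subject):
    """Best ungapped score of the profile against any same-length window of subject."""
    width = len(profile)
    best_score, best_segment = float("-inf"), None
    for start in range(len(subject) - width + 1):
        segment = subject[start:start + width]
        score = sum(row[residue] for row, residue in zip(profile, segment))
        if score > best_score:
            best_score, best_segment = score, segment
    return best_score, best_segment


def iterate_search(query, database, min_score, max_rounds=5):
    """Rebuild the profile from passing segments until the accepted set stops changing."""
    rows = [query]
    for _ in range(max_rounds):
        profile = profile_from(rows)          # profile from round i ...
        new_rows = [query]
        for subject in database:              # ... is used to search in round i+1
            score, segment = best_window(profile, subject)
            if segment is not None and score >= min_score and segment not in new_rows:
                new_rows.append(segment)
        if new_rows == rows:                  # converged: nothing new was accepted
            break
        rows = new_rows
    return rows


database = ["GGACDEFGG", "TTACDQFTT", "KKACNQFKK", "WWWWWWWWW"]
print(iterate_search("ACDEF", database, min_score=12, max_rounds=1))
print(iterate_search("ACDEF", database, min_score=12))
```

In this toy data the segment ACNQF fails the cutoff against the query alone but passes once ACDQF has been folded into the profile, which is the kind of weak similarity the iteration is meant to recover.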