Lecture #7: FASTA & LFASTA BIOINF 2051 Fall 2002 2/22/2019
Dot Plot: Alpha chain vs. Beta chain of Human Hemoglobin
FASTA and LFASTA
Pearson and Lipman (1988)
- FASTA: program that calculates the initial and optimal similarity scores between two sequences
- LFASTA: program for detecting local similarities; finds multiple alignments between smaller portions of two sequences
The FASTA algorithm
Four steps:
1. Identify regions of similarity
   - Uses the ktup parameter, which specifies the number of consecutive identities required in a match
   - The 10 best diagonal regions are found based on the number of matches and the distance between matches
2. Rescore regions and identify best initial regions
   - PAM250 or another scoring matrix is used to rescore the 10 diagonal regions identified in step 1, allowing for conservative replacements and runs of identities shorter than ktup
   - For each of the best diagonal regions, identify the “initial region”, which is the best-scoring subregion
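Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the FASTA source: the function names are hypothetical, diagonals are ranked by hit count alone (the distance between matches is ignored), and `toy_score` is a made-up stand-in for PAM250.

```python
from collections import defaultdict


def ktup_diagonal_hits(query, subject, ktup=2):
    """Step 1 sketch: group exact ktup-word matches by diagonal (i - j)."""
    # Index every ktup-length word of the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the subject once; each shared word is a hit on diagonal i - j.
    hits = defaultdict(list)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits[i - j].append((i, j))
    return hits


def best_diagonals(hits, n_best=10):
    """Keep the n_best diagonals, ranked here by hit count only (a simplification)."""
    return sorted(hits, key=lambda d: (-len(hits[d]), d))[:n_best]


def toy_score(a, b):
    """Hypothetical stand-in for a PAM250 lookup: +5 identity, -4 otherwise."""
    return 5 if a == b else -4


def best_subregion(query, subject, diagonal, score):
    """Step 2 sketch: best-scoring ungapped run on one diagonal.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    i, j = max(diagonal, 0), max(-diagonal, 0)
    best = (0, i, i)
    running, start = 0, i
    while i < len(query) and j < len(subject):
        if running <= 0:
            # A non-positive prefix can never help, so restart the run here.
            running, start = 0, i
        running += score(query[i], subject[j])
        if running > best[0]:
            best = (running, start, i + 1)
        i += 1
        j += 1
    return best


query, subject = "ACDEFGHIK", "XXACDEFGYY"
hits = ktup_diagonal_hits(query, subject, ktup=2)
for d in best_diagonals(hits):
    print(d, best_subregion(query, subject, d, toy_score))
```

Because the rescoring pass walks the whole diagonal with a scoring function rather than counting words, it can extend a region across conservative replacements and pick up identity runs shorter than ktup, as described in step 2.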
The FASTA algorithm
3. Optimally join initial regions with scores > T
   - Given: locations of the initial regions, their scores, and a gap penalty
   - Calculate an optimal alignment of initial regions as a combination of compatible regions with maximal score
   - Use the resulting score to rank the library sequences
   - Selectivity degradation is limited by using only initial regions that score greater than some threshold T
4. Align the highest-scoring library sequences using a modification of global and local alignment algorithms
   - Considers all possible alignments of the query and library sequence that fall within a band centered around the highest-scoring initial region
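The joining step can be illustrated as a small dynamic program over the initial regions. This is a sketch under stated assumptions: `join_initial_regions` is a hypothetical name, two regions are treated as compatible only when one ends before the other begins in both sequences, and a single constant `join_penalty` stands in for the gap penalty.

```python
def join_initial_regions(regions, threshold, join_penalty):
    """Best total score of a chain of compatible initial regions.

    regions: list of (q_start, q_end, s_start, s_end, score), ends exclusive.
    """
    # Only regions scoring above the threshold T take part in joining.
    kept = sorted(r for r in regions if r[4] > threshold)

    best = []  # best[k] = best chain score for a chain ending with kept[k]
    for k, (qs, qe, ss, se, sc) in enumerate(kept):
        # A predecessor must end before this region starts in both sequences.
        prev = [best[m] - join_penalty
                for m in range(k)
                if kept[m][1] <= qs and kept[m][3] <= ss]
        # Starting a new chain (0) is always allowed.
        best.append(sc + max(prev, default=0))
    return max(best, default=0)


regions = [
    (0, 10, 0, 10, 50),    # A
    (12, 20, 15, 23, 40),  # B: follows A in both sequences
    (5, 15, 30, 40, 30),   # C: overlaps A and B in the query
    (30, 35, 0, 5, 15),    # D: below the threshold
]
print(join_initial_regions(regions, threshold=20, join_penalty=16))  # 74
```

In the example, A and B are joined (50 + 40 - 16 = 74), C cannot be chained with either, and D is discarded by the threshold before joining is attempted.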
LFASTA
- FASTA reports only the one highest-scoring alignment between two sequences
- LFASTA is a local sequence comparison tool that can identify multiple local alignments between two sequences
- Optimal algorithms for sensitive local sequence comparison are computationally intensive in terms of time and memory
LFASTA vs. FASTA
- LFASTA uses the same first two steps as FASTA for finding initial regions, except:
  - Instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores above some threshold
- Construction of optimized alignments
  - Instead of focusing on a single region, LFASTA computes a local alignment for each initial region
  - In addition to the band around the initial region, LFASTA considers potential sequence alignments for some distance before and after the initial region
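The difference in region selection can be shown with a short sketch. It is a simplification, not LFASTA itself: every diagonal is scanned directly instead of going through the ktup lookup, the function names are hypothetical, and `toy_score` is a made-up stand-in for a real scoring matrix.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a scoring matrix: +5 identity, -4 otherwise."""
    return 5 if a == b else -4


def diagonal_best_run(a, b, d, score):
    """Score of the best ungapped run on diagonal d = i - j."""
    i, j = max(d, 0), max(-d, 0)
    best = running = 0
    while i < len(a) and j < len(b):
        # Drop a negative prefix before extending the run.
        running = max(0, running) + score(a[i], b[j])
        best = max(best, running)
        i, j = i + 1, j + 1
    return best


def lfasta_style_regions(a, b, score, threshold):
    """Keep every diagonal whose best run scores above the threshold."""
    regions = {}
    for d in range(-(len(b) - 1), len(a)):
        s = diagonal_best_run(a, b, d, score)
        if s > threshold:
            regions[d] = s
    return regions


# Two separate local similarities: ACD on diagonal 0, EFG on diagonal -3.
print(lfasta_style_regions("ACDEFG", "ACDWWWEFG", toy_score, threshold=10))
```

A fixed top-10 cut would behave the same on this tiny input; the threshold rule matters when many diagonals carry significant local similarity, since each retained region then gets its own local alignment.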
Self-comparison of myosin heavy chain from C. elegans
- See the plot from a local similarity self-comparison of the myosin heavy chain (NBRF code MWKW) using the PAM250 matrix
- The amino-terminal half of the molecule forms a large globular head without any periodic structure
- The symmetrical parallel lines along the C-terminal half correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment
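Why a repeat shows up as parallel lines can be seen with a toy self-comparison. The sketch below counts exact word matches per diagonal offset for a synthetic string with period 7. It is not the myosin sequence and does not use PAM250, and the function name is hypothetical.

```python
from collections import Counter


def self_comparison_offsets(seq, ktup=2):
    """Count exact ktup-word matches of seq against itself, per offset i - j > 0.

    Only one triangle of the plot is counted; the other is its mirror image.
    """
    counts = Counter()
    seen = {}  # word -> earlier start positions
    for i in range(len(seq) - ktup + 1):
        word = seq[i:i + ktup]
        for j in seen.get(word, ()):
            counts[i - j] += 1
        seen.setdefault(word, []).append(i)
    return counts


# Synthetic periodic sequence: four copies of a 7-letter unit.
repeat = "ABCDEFG" * 4
for offset, n in sorted(self_comparison_offsets(repeat).items()):
    print(offset, n)
```

Matches fall only on offsets that are multiples of the period, so a dot plot of this string would show evenly spaced lines parallel to the main diagonal, the same pattern the 28-residue repeat produces in the C-terminal half of the myosin plot.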