1
Lecture #7: FASTA & LFASTA
BIOINF 2051 Fall 2002 2/22/2019
2
Dot Plot Alpha chain vs. Beta chain of Human Hemoglobin
3
FASTA and LFASTA
Pearson and Lipman (1988)
FASTA – program that calculates the initial and optimal similarity scores between two sequences
LFASTA – program for detecting local similarities; finds multiple alignments between smaller portions of two sequences
4
The FASTA algorithm
Four steps:
1. Identify regions of similarity
   Uses the ktup parameter, which specifies the number of consecutive identities required in a match
   The 10 best diagonal regions are found based on the number of matches and the distance between matches
2. Rescore regions and identify the best initial regions
   PAM250 or another scoring matrix is used to rescore the 10 diagonal regions identified in step 1, allowing for conservative replacements and runs of identities shorter than ktup
   For each of the best diagonal regions, identify the “initial region”, which is the best-scoring subregion
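The ktup lookup in step 1 can be sketched as follows. This is a minimal illustration, not the FASTA source: the function names `ktup_diagonal_hits` and `best_diagonals` are hypothetical, and diagonals are ranked by hit count alone, whereas the slide says FASTA also takes the distance between matches into account.

```python
from collections import defaultdict

def ktup_diagonal_hits(query, library, ktup=2):
    """Collect ktup-word matches, grouped by diagonal (query index - library index)."""
    # Lookup table: each ktup-length word in the query -> its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Scan the library sequence once; every shared word is a hit on diagonal i - j.
    hits = defaultdict(list)
    for j in range(len(library) - ktup + 1):
        for i in positions.get(library[j:j + ktup], ()):
            hits[i - j].append((i, j))
    return hits

def best_diagonals(hits, n=10):
    """Return the n diagonals with the most hits (simplified ranking, see lead-in)."""
    return sorted(hits, key=lambda d: (-len(hits[d]), d))[:n]

hits = ktup_diagonal_hits("ACDEFGH", "XXACDEFGH", ktup=2)
top = best_diagonals(hits, n=1)
```

A larger ktup gives fewer, more specific hits and a faster scan; a smaller ktup is more sensitive but slower.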
5
The FASTA algorithm
3. Optimally join initial regions with scores > T
   Given: the locations of the initial regions, their scores, and a gap penalty
   Calculate an optimal alignment of initial regions as a combination of compatible regions with maximal score
   Use the resulting score to rank the library sequences
   Selectivity degradation is limited by using only initial regions that score above some threshold T
4. Align the highest-scoring library sequences using a modification of global and local alignment algorithms
   Considers all possible alignments of the query and library sequence that fall within a band centered around the highest-scoring initial region
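The joining of initial regions can be sketched as a small dynamic program over compatible regions. The representation is an assumption made for illustration: each region is a hypothetical tuple `(query_start, library_start, length, score)`, two regions are treated as compatible when the second starts after the first ends in both sequences, and each join costs one fixed `join_penalty`. The actual FASTA penalty scheme may differ.

```python
def join_initial_regions(regions, threshold, join_penalty):
    """Return (best joined score, chain of regions) using only regions scoring > threshold."""
    # Sorting by query start guarantees every possible predecessor comes earlier in the list.
    kept = sorted(r for r in regions if r[3] > threshold)
    best = []  # best[k] = (score of the best chain ending at kept[k], predecessor index or None)
    for k, (q_start, l_start, _length, score) in enumerate(kept):
        top, prev = score, None
        for a in range(k):
            a_q, a_l, a_len, _ = kept[a]
            # Compatible: region a ends before region k starts in both sequences.
            if a_q + a_len <= q_start and a_l + a_len <= l_start:
                candidate = best[a][0] + score - join_penalty
                if candidate > top:
                    top, prev = candidate, a
        best.append((top, prev))
    if not best:
        return 0, []
    # Trace back from the highest-scoring chain end.
    k = max(range(len(best)), key=lambda idx: best[idx][0])
    total, chain = best[k][0], []
    while k is not None:
        chain.append(kept[k])
        k = best[k][1]
    return total, chain[::-1]

regions = [(0, 0, 10, 40), (15, 18, 10, 30), (5, 30, 10, 35), (12, 2, 5, 10)]
score, chain = join_initial_regions(regions, threshold=20, join_penalty=10)
```

In this toy input the region scoring 10 is dropped by the threshold, and the region at `(5, 30)` cannot be joined with either of the others because it overlaps them in one of the two sequences.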
6
LFASTA
FASTA – reports only the single highest-scoring alignment between two sequences
LFASTA – local sequence comparison tool that can identify multiple local alignments between two sequences
Optimal algorithms for sensitive local sequence comparison are computationally intensive in terms of time and memory
7
LFASTA vs. FASTA
LFASTA uses the same first two steps for finding initial regions as FASTA, except:
   Instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores above some threshold
Construction of optimized alignments:
   Instead of focusing on a single region, LFASTA computes a local alignment for each initial region
   Also, apart from the band around the initial region, LFASTA considers potential sequence alignments for some distance before and after the initial region
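The difference in which regions are saved can be illustrated with a short sketch. The names `best_subregion` and `select_regions` are hypothetical, and a simple match/mismatch score stands in for PAM250 so that the example stays self-contained. The rescoring step finds the best-scoring ungapped run on one diagonal, and the selection step either keeps a fixed number of regions (FASTA-style) or every region above a threshold (LFASTA-style).

```python
def best_subregion(query, library, diagonal, match=2, mismatch=-1):
    """Best-scoring ungapped run on one diagonal (diagonal = query index - library index).

    Returns (score, query_start, query_end), with query_end exclusive.
    """
    i = max(diagonal, 0)
    j = i - diagonal
    best = (0, i, i)
    run, run_start = 0, i
    while i < len(query) and j < len(library):
        if run <= 0:
            # A non-positive running score cannot help a later run; restart here.
            run, run_start = 0, i
        run += match if query[i] == library[j] else mismatch
        if run > best[0]:
            best = (run, run_start, i + 1)
        i += 1
        j += 1
    return best

def select_regions(scored, mode="lfasta", threshold=0, top_n=10):
    """FASTA-style: keep the top_n regions. LFASTA-style: keep all regions scoring > threshold."""
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    if mode == "fasta":
        return ranked[:top_n]
    return [r for r in ranked if r[0] > threshold]

region = best_subregion("XXACDEXX", "ACDE", diagonal=2)
kept = select_regions([(8, 2, 6), (3, 0, 2), (12, 5, 11)], mode="lfasta", threshold=5)
```

With a threshold, the number of saved regions depends on the sequences themselves, which is what lets LFASTA report several local alignments for one pair.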
8
Self-comparison of myosin heavy chain from C. elegans
See plot from a local similarity self-comparison of the myosin heavy chain (NBRF code MWKW) using the PAM250 matrix
The amino-terminal half of the molecule forms a large globular head without any periodic structure
The symmetrical parallel lines along the C-terminal half correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment
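Why a repeat shows up as parallel lines in a self-comparison can be seen in a toy example. The sketch below is an illustration only: it uses exact identity rather than PAM250, a made-up sequence with a period of 7 rather than the myosin sequence, and a hypothetical function name. It measures, for each offset d from the main diagonal, the fraction of positions where the sequence matches itself shifted by d.

```python
def self_diagonal_identity(seq, max_offset):
    """Fraction of positions with seq[i] == seq[i + d], for each offset d in 1..max_offset."""
    result = {}
    for d in range(1, max_offset + 1):
        pairs = len(seq) - d
        same = sum(seq[i] == seq[i + d] for i in range(pairs))
        result[d] = same / pairs
    return result

toy = "ABCDEFG" * 4  # made-up sequence with an exact period of 7
identity = self_diagonal_identity(toy, max_offset=14)
```

Offsets that are multiples of the period score high, while the others do not; in a plot these high-scoring offsets appear as lines parallel to the main diagonal, spaced one repeat length apart.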