Lecture #7: FASTA & LFASTA


1 Lecture #7: FASTA & LFASTA
BIOINF 2051 Fall 2002 2/22/2019

2 Dot Plot
Alpha chain vs. beta chain of human hemoglobin

3 FASTA and LFASTA
Pearson and Lipman (1988)
FASTA – program that calculates the initial and optimal similarity scores between two sequences
LFASTA – program for detecting local similarities; it finds multiple alignments between smaller portions of two sequences

4 The FASTA algorithm
Four steps:
1. Identify regions of similarity
   The ktup parameter specifies the number of consecutive identities required in a match
   The 10 best diagonal regions are found based on the number of matches and the distance between matches
2. Rescore regions and identify the best initial regions
   PAM250 or another scoring matrix is used to rescore the 10 diagonal regions identified in step 1, allowing for conservative replacements and runs of identities shorter than ktup
   For each of the best diagonal regions, identify the “initial region”, which is its best-scoring subregion
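The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not the FASTA implementation: the function names, the toy match/mismatch scores (standing in for PAM250), and the example sequences are all assumptions, and diagonals are ranked here by hit count only, whereas FASTA also takes the distance between matches into account.

```python
from collections import defaultdict

def ktup_diagonal_hits(query, target, ktup=2):
    """Count ktup-word matches on each diagonal (diagonal = i - j)."""
    # Lookup table: every ktup-length word in the query -> its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    # Scan the target once; each shared word adds a hit to its diagonal.
    hits = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in positions.get(target[j:j + ktup], ()):
            hits[i - j] += 1
    return dict(hits)

def toy_score(a, b):
    """Hypothetical stand-in for a PAM250 lookup: +2 match, -1 mismatch."""
    return 2 if a == b else -1

def best_subregion(query, target, diagonal, score):
    """Best-scoring ungapped run on one diagonal (maximum-sum subarray).

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    i = max(diagonal, 0)
    j = i - diagonal
    best = (0, i, i)
    run, start = 0, i
    while i < len(query) and j < len(target):
        if run <= 0:          # a non-positive prefix never helps; restart here
            run, start = 0, i
        run += score(query[i], target[j])
        if run > best[0]:
            best = (run, start, i + 1)
        i += 1
        j += 1
    return best

hits = ktup_diagonal_hits("ACGTACGT", "TTACGTAA", ktup=2)
region = best_subregion("ACGTACGT", "TTACGTAA", diagonal=2, score=toy_score)
# region == (10, 3, 8): query[3:8] == "TACGT" aligned without gaps
```

Because rescoring works residue by residue, it can extend through single identities and conservative replacements that the ktup word lookup alone would miss.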

5 The FASTA algorithm
3. Optimally join initial regions with scores > T
   Given: location of initial regions, their scores, and a gap penalty
   Calculate an optimal alignment of initial regions as a combination of compatible regions with maximal score
   Use the resulting score to rank the library sequences
   Selectivity degradation is limited by using only initial regions that score greater than some threshold T
4. Align the highest scoring library sequences using a modification of global and local alignment algorithms
   Considers all possible alignments of the query and library sequence that fall within a band centered around the highest scoring initial region
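The joining and banded-alignment steps can be illustrated roughly as follows. This is a sketch under stated assumptions, not FASTA's code: regions are represented as hypothetical (query_start, target_start, length, score) tuples, a constant joining penalty stands in for the gap penalty, two regions are treated as compatible when the second starts after the first ends in both sequences, and the band is filled with a plain Smith-Waterman recurrence using a linear gap cost.

```python
def join_regions(regions, threshold, join_penalty):
    """Best total score of a chain of compatible regions scoring above threshold."""
    kept = sorted(r for r in regions if r[3] > threshold)
    best = []  # best[k] = best chain score ending with kept[k]
    for k, (qs, ts, _, s) in enumerate(kept):
        score = s
        for m in range(k):
            pq, pt, pn, _ = kept[m]
            # Compatible: the earlier region ends before this one starts
            # in both the query and the library sequence.
            if pq + pn <= qs and pt + pn <= ts:
                score = max(score, best[m] + s - join_penalty)
        best.append(score)
    return max(best, default=0)

def banded_local_score(a, b, center, width, score, gap):
    """Local alignment score restricted to diagonals center-width..center+width."""
    H = {}      # only cells inside the band are stored
    best = 0
    for i in range(1, len(a) + 1):
        lo = max(1, i - center - width)
        hi = min(len(b), i - center + width)
        for j in range(lo, hi + 1):
            h = max(0,
                    H.get((i - 1, j - 1), 0) + score(a[i - 1], b[j - 1]),
                    H.get((i - 1, j), 0) - gap,   # gap in b
                    H.get((i, j - 1), 0) - gap)   # gap in a
            H[(i, j)] = h
            best = max(best, h)
    return best

def match_score(x, y):
    """Hypothetical scoring: +2 match, -1 mismatch."""
    return 2 if x == y else -1

regions = [(0, 0, 5, 20), (6, 8, 4, 15), (3, 2, 4, 22), (12, 13, 3, 5)]
joined = join_regions(regions, threshold=10, join_penalty=8)
banded = banded_local_score("ACGTACGT", "TTACGTAA", center=2, width=1,
                            score=match_score, gap=2)
```

Restricting the dynamic programming to a band keeps the work proportional to the band width times the sequence length rather than to the product of both lengths, which is the point of centering the band on the best initial region.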

6 LFASTA
FASTA – reports only the one highest scoring alignment between two sequences
LFASTA – local sequence comparison tool that can identify multiple local alignments between two sequences
Optimal algorithms for sensitive local sequence comparison are computationally intensive in terms of time and memory

7 LFASTA vs. FASTA
LFASTA uses the same first 2 steps for finding initial regions as FASTA, except:
   Instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores > some threshold
Construction of optimized alignments:
   Instead of focusing on a single region, LFASTA computes a local alignment for each initial region
   Also, apart from the band around the initial region, LFASTA considers potential sequence alignments for some distance before and after the initial region
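The difference in which regions are kept can be shown with a small sketch. The (diagonal, score) tuples, the function names, and the example scores below are illustrative assumptions only.

```python
def fasta_keep(regions, n_best=10):
    """FASTA-style selection: keep a fixed number of top-scoring regions."""
    return sorted(regions, key=lambda r: r[1], reverse=True)[:n_best]

def lfasta_keep(regions, threshold):
    """LFASTA-style selection: keep every region scoring above the threshold."""
    return [r for r in regions if r[1] > threshold]

# Hypothetical (diagonal, score) pairs for 12 diagonal regions.
scores = [5, 40, 12, 33, 8, 27, 3, 19, 22, 15, 31, 9]
regions = list(enumerate(scores))

top_ten = fasta_keep(regions)            # always at most 10 regions
above_20 = lfasta_keep(regions, 20)      # as many as pass the threshold
```

With a threshold, the number of regions carried forward depends on the sequences themselves, which is what lets LFASTA report every sufficiently similar local region rather than a fixed number.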

8 Self-comparison of myosin heavy chain from C. elegans
See plot from a local similarity self-comparison of the myosin heavy chain (NBRF code MWKW) using the PAM250 matrix
The amino-terminal half of the molecule forms a large globular head without any periodic structure
The symmetrical parallel lines along the C-terminal half correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment
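Why an internal repeat shows up as parallel lines can be seen in a toy self-comparison. The sketch below is an assumption-laden illustration: it counts plain identities rather than PAM250 scores, and the 7-letter repeating string is made up, not the myosin sequence.

```python
def self_diagonal_identities(seq, max_offset):
    """Count identical residues between seq and itself shifted by each offset."""
    return {
        d: sum(seq[i] == seq[i + d] for i in range(len(seq) - d))
        for d in range(1, max_offset + 1)
    }

# Hypothetical sequence with a perfect 7-residue repeat.
toy = "ABCDEFG" * 4
counts = self_diagonal_identities(toy, max_offset=14)
# Offsets that are multiples of the repeat length (7, 14) collect all the
# identities; in a dot plot these offsets appear as lines parallel to the
# main diagonal.
```

In a real protein the repeat is imperfect, so the off-diagonal lines are found by similarity scoring rather than exact identity, but the spacing between the lines still reflects the repeat length.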

