Lecture #7: FASTA & LFASTA


Lecture #7: FASTA & LFASTA BIOINF 2051 Fall 2002 2/22/2019

Dot Plot: alpha chain vs. beta chain of human hemoglobin
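A dot plot like the one on this slide can be sketched in a few lines. The following is only a minimal illustration, not the program used to produce the slide: the function names `dot_plot` and `render`, the `window` parameter, and the toy sequences are assumptions introduced here, and a real hemoglobin comparison would normally score windows with a substitution matrix rather than require exact matches.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the set of (i, j) cells where a window of seq_a starting at i
    exactly equals a window of seq_b starting at j (hypothetical helper)."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: seq_b across the top, seq_a down the side."""
    lines = ["  " + seq_b]
    for i, ch in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy sequences, not real hemoglobin chains.
    a, b = "ABCAB", "CAB"
    print(render(a, b, dot_plot(a, b, window=2)))
```

Similar regions show up as runs of dots along a diagonal, which is the signal that FASTA looks for in its first step.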

FASTA and LFASTA (Pearson and Lipman, 1988)
FASTA: a program that calculates the initial and optimal similarity scores between two sequences.
LFASTA: a program for detecting local similarities; it finds multiple alignments between smaller portions of two sequences.

The FASTA algorithm
Four steps:
Step 1: Identify regions of similarity.
  The ktup parameter specifies the number of consecutive identities required in a match.
  The 10 best diagonal regions are found based on the number of matches and the distance between matches.
Step 2: Rescore regions and identify the best initial regions.
  PAM250 or another scoring matrix is used to rescore the 10 diagonal regions identified in step 1, to allow for conservative replacements and runs of identities shorter than ktup.
  For each of the best diagonal regions, identify the “initial region”, which is the best-scoring subregion.
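The first two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the FASTA source: the function names are hypothetical, diagonals are ranked by hit count only (the real step 1 also takes the distance between matches into account), and a toy match/mismatch score stands in for PAM250.

```python
from collections import defaultdict


def ktup_diagonals(query, subject, ktup=2):
    """Step 1 sketch: collect ktup-word hits, grouped by diagonal (i - j)."""
    index = defaultdict(list)              # word -> start positions in query
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    hits = defaultdict(list)               # diagonal -> query positions of hits
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits[i - j].append(i)
    return hits


def best_diagonals(hits, top=10):
    """Keep the `top` diagonals with the most hits (assumption: count only)."""
    return sorted(hits, key=lambda d: (-len(hits[d]), d))[:top]


def best_subregion(query, subject, diag, score):
    """Step 2 sketch: best-scoring contiguous run along one diagonal.

    Returns (score, (start, end)) with positions in query coordinates;
    `score` is any pairwise scoring function, standing in for PAM250.
    """
    i, j = max(diag, 0), max(-diag, 0)
    best, best_span = 0, None
    run, run_start = 0, i
    while i < len(query) and j < len(subject):
        if run <= 0:                       # restart the run after a bad stretch
            run, run_start = 0, i
        run += score(query[i], subject[j])
        if run > best:
            best, best_span = run, (run_start, i + 1)
        i += 1
        j += 1
    return best, best_span


def toy_score(x, y):
    """Toy stand-in for a substitution matrix: +2 match, -1 mismatch."""
    return 2 if x == y else -1
```

For example, `best_diagonals(ktup_diagonals("ACGTACGT", "TTACGTAA"))` ranks the diagonals of a toy DNA pair, and `best_subregion` then rescores each one. Because the rescoring walks every residue pair on the diagonal, it can pick up conservative replacements and identity runs shorter than ktup that step 1 missed.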

The FASTA algorithm (continued)
Step 3: Optimally join initial regions with scores > T.
  Given: the locations of the initial regions, their scores, and a gap penalty.
  Calculate an optimal alignment of initial regions as a combination of compatible regions with maximal score.
  Use the resulting score to rank the library sequences.
  Selectivity degradation is limited by using only initial regions that score greater than some threshold T.
Step 4: Align the highest-scoring library sequences using a modification of global and local alignment algorithms.
  This step considers all possible alignments of the query and library sequence that fall within a band centered around the highest-scoring initial region.
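The joining step can be sketched as a small dynamic program over the initial regions. This is a minimal sketch under assumptions that are not stated on the slide: the `Region` tuple and `join_regions` function are hypothetical names, two regions are treated as compatible when the second starts after the first ends in both sequences, and each join costs a constant `join_penalty`.

```python
from collections import namedtuple

# Hypothetical record for an initial region: half-open spans in the query
# (q_start, q_end) and in the library sequence (s_start, s_end), plus a score.
Region = namedtuple("Region", "q_start q_end s_start s_end score")


def join_regions(regions, threshold, join_penalty):
    """Best total score of a chain of compatible regions scoring > threshold.

    Assumption: a constant penalty is charged for every join.
    """
    kept = sorted((r for r in regions if r.score > threshold),
                  key=lambda r: r.q_start)
    best = []                              # best[k]: best chain ending at kept[k]
    for k, r in enumerate(kept):
        score = r.score                    # chain consisting of r alone
        for m in range(k):
            p = kept[m]
            # p may precede r only if it ends before r starts in both sequences
            if p.q_end <= r.q_start and p.s_end <= r.s_start:
                score = max(score, best[m] + r.score - join_penalty)
        best.append(score)
    return max(best, default=0)


if __name__ == "__main__":
    regions = [
        Region(0, 10, 0, 10, 30),
        Region(12, 20, 15, 23, 25),
        Region(5, 15, 30, 40, 40),
        Region(25, 30, 25, 30, 15),        # dropped when threshold is 20
    ]
    print(join_regions(regions, threshold=20, join_penalty=10))
```

The resulting chain score is what would be used to rank library sequences; raising the threshold keeps weak regions from being strung together into a misleadingly high score.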

LFASTA
FASTA reports only the single highest-scoring alignment between two sequences.
LFASTA is a local sequence comparison tool that can identify multiple local alignments between two sequences.
Optimal algorithms for sensitive local sequence comparison are computationally intensive in terms of time and memory.

LFASTA vs. FASTA
LFASTA uses the same first two steps as FASTA for finding initial regions, except that instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores above some threshold.
Construction of optimized alignments:
  Instead of focusing on a single region, LFASTA computes a local alignment for each initial region.
  In addition to the band around the initial region, LFASTA considers potential sequence alignments for some distance before and after the initial region.
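The idea of computing one banded local alignment per initial region can be sketched as below. This is an illustrative simplification, not LFASTA itself: `banded_local_score` is a hypothetical name, the scoring uses a toy match/mismatch scheme with a linear gap penalty instead of PAM250, only the score is returned (no traceback), and the extension "for some distance before and after the initial region" is not modeled separately.

```python
def banded_local_score(a, b, center_diag, band, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman score using only cells within `band` diagonals
    of `center_diag`, where a cell (i, j) lies on diagonal i - j."""
    prev = {}                              # previous row: column -> score
    best = 0
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i - center_diag - band)
        hi = min(len(b), i - center_diag + band)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Cells outside the band read as 0, i.e. a fresh local start.
            score = max(0,
                        prev.get(j - 1, 0) + s,    # align a[i-1] with b[j-1]
                        prev.get(j, 0) + gap,      # gap in b
                        cur.get(j - 1, 0) + gap)   # gap in a
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTTACGT"
    # One banded alignment per saved diagonal region, in the spirit of LFASTA.
    for diag in (0, -1):
        print(diag, banded_local_score(a, b, diag, band=1))
```

Restricting the computation to a band keeps the work proportional to the band width times the sequence length, which is why running it once per saved region stays affordable compared with a full dynamic-programming matrix.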

Self-comparison of myosin heavy chain from C. elegans
See the plot from a local similarity self-comparison of the myosin heavy chain (NBRF code MWKW) using the PAM250 matrix.
The amino-terminal half of the molecule forms a large globular head without any periodic structure.
The symmetrical parallel lines along the C-terminal half correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment.