Sequence Alignment III CIS 667 February 10, 2004.

Extensions to the Basic Algorithm
• We have seen that the basic dynamic programming algorithm can be used to solve:
  • Global alignment
  • Semi-global alignment
  • Local alignment
• We can extend the algorithm to model the cost of gaps more accurately
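
A minimal Python sketch of the basic algorithm, assuming a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2 per space; these values and the name align_score are not from the slides). It returns only the optimal score; traceback is omitted. The three variants differ only in how row 0 and column 0 are initialized, the zero floor used by local alignment, and where the answer is read.

```python
def align_score(s, t, mode="global", match=1, mismatch=-1, gap=-2):
    """Best alignment score of s and t under a linear gap penalty (gap = score per space)."""
    if mode not in ("global", "semiglobal", "local"):
        raise ValueError("mode must be 'global', 'semiglobal' or 'local'")
    n, m = len(s), len(t)
    # a[i][j] = best score for aligning the prefixes s[:i] and t[:j];
    # semiglobal and local keep row 0 and column 0 at 0, so leading spaces are free
    a = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # only global alignment charges for spaces before the first character
        for i in range(1, n + 1):
            a[i][0] = i * gap
        for j in range(1, m + 1):
            a[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + p,  # s[i-1] paired with t[j-1]
                          a[i - 1][j] + gap,    # s[i-1] paired with a space
                          a[i][j - 1] + gap)    # t[j-1] paired with a space
            if mode == "local":
                a[i][j] = max(a[i][j], 0)       # a local alignment may start anywhere
                best = max(best, a[i][j])
    if mode == "global":
        return a[n][m]
    if mode == "semiglobal":
        # spaces after the last character are free: take the best value in the last row or column
        return max(max(a[n]), max(row[m] for row in a))
    return best


print(align_score("CAGT", "CCAAGGTTCAGT"))                     # -12
print(align_score("CAGT", "CCAAGGTTCAGT", mode="semiglobal"))  # 4
```

With free end gaps, CAGT simply lines up with the last four letters of CCAAGGTTCAGT, so the semi-global score is 4 while the global score pays for all 8 spaces.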

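General Gap Penalty Functions
• Gaps are caused by insertion and deletion mutations
• A single mutation can insert or delete many residues at once, so one long gap is more likely than several short gaps with the same total length
• Gap penalties should reflect that
• Let w(k) denote a gap penalty function (the cost of a gap of k spaces)
• We have been using w(k) = bk, a linear function; with it, one gap of k spaces costs exactly as much as k separate one-space gaps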

General Gap Penalty Functions
• We can modify the basic algorithm to compute a score with a general gap penalty function w (i.e., any function of the gap length)
• The modified algorithm is slower, however: O(n^3) for two sequences of length n
• The scoring scheme is no longer additive column by column: the cost of a space depends on the length of the gap it belongs to
• The space required is also larger
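
One way to realize the general algorithm, sketched in Python under stated assumptions: w is any Python function giving the positive cost of a gap of k ≥ 1 spaces, pairs score match +1 / mismatch -1, and the three tables a, b, c (names are ours, not the slides') record whether the last column is a pair, a space in s, or a space in t. Each b or c entry tries every possible gap length, which is where the extra factor of n in O(n^3) comes from.

```python
import math

NEG = -math.inf


def general_gap_align(s, t, w, match=1, mismatch=-1):
    """Global alignment score where a gap of k spaces costs w(k) (subtracted from the score)."""
    n, m = len(s), len(t)
    # a: last column pairs s[i-1] with t[j-1]
    # b: last column is a space in s (t[j-1] over a space)
    # c: last column is a space in t (s[i-1] over a space)
    a = [[NEG] * (m + 1) for _ in range(n + 1)]
    b = [[NEG] * (m + 1) for _ in range(n + 1)]
    c = [[NEG] * (m + 1) for _ in range(n + 1)]
    a[0][0] = 0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                p = match if s[i - 1] == t[j - 1] else mismatch
                a[i][j] = p + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            if j > 0:
                # a whole gap of k spaces in s, preceded by a column that is not a space in s
                b[i][j] = max(max(a[i][j - k], c[i][j - k]) - w(k) for k in range(1, j + 1))
            if i > 0:
                # a whole gap of k spaces in t, preceded by a column that is not a space in t
                c[i][j] = max(max(a[i - k][j], b[i - k][j]) - w(k) for k in range(1, i + 1))
    return max(a[n][m], b[n][m], c[n][m])


print(general_gap_align("AAAA", "AA", lambda k: 3 + k))  # -3
```

The example scores two matches minus one gap of two spaces costing w(2) = 5; splitting the spaces into two one-space gaps would cost 8.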

Affine Gap Penalty Functions
• Can we do better than O(n^3) and still have a reasonable function?
• Yes. We need w(k) ≤ k·w(1), so that one gap of k spaces costs no more than k separate one-space gaps
• An affine function works: w(k) = h + gk for k ≥ 1, with w(0) = 0 and h, g > 0
• Think of h as the cost of opening a gap and g as the cost of extending a gap
• We can develop an algorithm with time complexity O(n^2)
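
With an affine w, each gap table only needs to look one cell back: the last space either opens a new gap (cost h + g) or extends the current one (cost g). A Python sketch in the style of Gotoh's algorithm, assuming the same illustrative match +1 / mismatch -1 scores; the function name affine_align is hypothetical.

```python
import math

NEG = -math.inf


def affine_align(s, t, h, g, match=1, mismatch=-1):
    """Global alignment score with gap cost w(k) = h + g*k for k >= 1 (h, g > 0 are costs)."""
    n, m = len(s), len(t)
    a = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column pairs s[i-1] with t[j-1]
    b = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column is a space in s
    c = [[NEG] * (m + 1) for _ in range(n + 1)]  # last column is a space in t
    a[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                p = match if s[i - 1] == t[j - 1] else mismatch
                a[i][j] = p + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            if j > 0:
                # open a new gap in s (pay h + g) or extend the gap already there (pay g)
                b[i][j] = max(max(a[i][j - 1], c[i][j - 1]) - (h + g), b[i][j - 1] - g)
            if i > 0:
                c[i][j] = max(max(a[i - 1][j], b[i - 1][j]) - (h + g), c[i - 1][j] - g)
    return max(a[n][m], b[n][m], c[n][m])


print(affine_align("CAGT", "CCAAGGTTCAGT", h=2, g=1))  # -6
```

Here four matches are offset by a single gap of eight spaces costing 2 + 8 = 10. Every cell does a constant amount of work, so the running time is O(nm), i.e. O(n^2) for equal lengths.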

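Gap Penalties - Overview
• Imagine we want to align CAGT with CCAAGGTTCAGT
• Bad alignment:
C-A-G-T-----
CCAAGGTTCAGT
• Better alignment:
--------CAGT
CCAAGGTTCAGT
• Both alignments have 4 matches and 8 spaces; they differ only in how the spaces are grouped into gaps
• Gap cost with linear gap penalty (-2 per space): -16 for both alignments
• Gap cost with affine gap penalty (h = -2, g = -1): -16 for the bad alignment (4 gaps), -10 for the better one (1 gap)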
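
The gap costs above can be checked mechanically. This short Python sketch (helper names are hypothetical) splits an alignment row into maximal runs of '-' and sums w(k) over them, using the slide's sign convention in which b, h and g are negative scores. The CCAAGGTTCAGT row contains no spaces, so only the CAGT row contributes.

```python
import re


def gap_lengths(row):
    """Lengths of the maximal runs of '-' (the gaps) in one alignment row."""
    return [len(run) for run in re.findall(r"-+", row)]


def linear_gap_cost(row, b):
    # w(k) = b*k, summed over every gap
    return sum(b * k for k in gap_lengths(row))


def affine_gap_cost(row, h, g):
    # w(k) = h + g*k, summed over every gap
    return sum(h + g * k for k in gap_lengths(row))


bad, better = "C-A-G-T-----", "--------CAGT"
print(linear_gap_cost(bad, -2), linear_gap_cost(better, -2))          # -16 -16
print(affine_gap_cost(bad, -2, -1), affine_gap_cost(better, -2, -1))  # -16 -10
```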

Multiple Sequence Alignment
• When a new protein sequence is determined, an important goal is to assign possible functions to it
• First, search the DNA and protein sequence databases for similar sequences
• If more than one similar sequence is found, the next step is to build a multiple alignment of all of them

Multiple Sequence Alignment
• Multiple alignments are a key starting point for predicting:
  • Protein secondary structure
  • Residue accessibility
  • Function
• They also provide the basis for the most sensitive sequence-searching algorithms

Multiple Sequence Alignment
• A multiple sequence alignment is simply an alignment that contains more than two sequences:
MPQILLL
MLR-LL-
MK-ILLL
MPPVLIL

Multiple Sequence Alignment
• We must decide how to score a multiple alignment
• One possibility is the sum-of-pairs function:
  • Add up the pairwise scores of all pairs of characters in a column to get the score of the column; the alignment score is the sum of the column scores
  • In a multiple sequence alignment a column may contain two spaces; the score of the pair (-,-) is then usually set to 0
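
A small Python sketch of the sum-of-pairs score, applied to the four-sequence example alignment. It assumes illustrative pair scores (match +1, mismatch -1, residue against space -2) in place of a PAM or BLOSUM matrix; only the (-,-) = 0 rule comes from the slide.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Illustrative pair score; (-,-) scores 0 as on the slide."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sp_column(column):
    # sum over all unordered pairs of rows in this column
    return sum(pair_score(x, y) for x, y in combinations(column, 2))


def sp_score(rows):
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    # zip(*rows) walks the alignment column by column
    return sum(sp_column(col) for col in zip(*rows))


print(sp_score(["MPQILLL", "MLR-LL-", "MK-ILLL", "MPPVLIL"]))  # -11
```

A column of k characters contributes k(k-1)/2 pair scores, so scoring one column of four sequences takes six lookups.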

Multiple Sequence Alignment
• A straightforward dynamic programming approach to multiple sequence alignment results in an algorithm that is exponential in the number of sequences
• Heuristics can be used to reduce the complexity in most cases
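
To see where the exponential cost comes from, here is a Python sketch of the straightforward approach: a k-dimensional dynamic programming table with one cell per combination of prefixes. For k sequences of length n there are (n+1)^k cells, and each cell considers 2^k - 1 possible last columns (every non-empty subset of the sequences contributes a residue), giving O(2^k n^k) time. The pair scores (match +1, mismatch -1, space -2, (-,-) = 0) and the name msa_dp are illustrative assumptions.

```python
import math
from itertools import combinations, product


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def msa_dp(seqs):
    """Optimal sum-of-pairs score of a global multiple alignment by exhaustive k-dimensional DP."""
    k = len(seqs)
    ends = tuple(len(s) for s in seqs)
    best = {(0,) * k: 0}
    # product() yields cells in lexicographic order, so every predecessor is filled first
    for cell in product(*(range(n + 1) for n in ends)):
        if not any(cell):
            continue
        score = -math.inf
        # step[i] == 1 means sequence i puts a residue in the last column, 0 means a space
        for step in product((0, 1), repeat=k):
            if not any(step):
                continue  # a column of spaces only is never used
            prev = tuple(c - s for c, s in zip(cell, step))
            if min(prev) < 0:
                continue
            column = [seqs[i][cell[i] - 1] if step[i] else "-" for i in range(k)]
            col_score = sum(pair_score(x, y) for x, y in combinations(column, 2))
            score = max(score, best[prev] + col_score)
        best[cell] = score
    return best[ends]


print(msa_dp(["AB", "A", "A"]))  # -1
```

Adding one more sequence multiplies the table size by roughly n and doubles the work per cell, which is why practical tools rely on heuristics instead.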

Multiple Sequence Alignment
• Automatic alignment programs such as CLUSTAL W can be used to produce multiple alignments
• The PSI-BLAST program uses multiple sequence alignments to search protein sequence databases more sensitively than is possible with a single query sequence

PAM Matrices
• When comparing protein sequences, we need a more complex scoring scheme
• A mismatch between two amino acids with similar biochemical properties should score higher than one between two dissimilar amino acids
• Evolution is more likely to replace an amino acid with a similar one (e.g., same size, both hydrophobic)

PAM Matrices
• PAM stands for Point Accepted Mutations (or Percent Accepted Mutations)
• The 1-PAM matrix reflects an amount of evolution producing, on average, one accepted mutation per hundred amino acids
• The 250-PAM matrix is suitable for comparing sequences that are 250 units of evolution apart
  • Large PAM values work well for long, weakly similar sequences
  • Small PAM values work well for short, highly similar sequences
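
PAM matrices for larger distances are extrapolated from the 1-PAM mutation probability matrix: n units of evolution correspond to its n-th matrix power, which is then turned into log-odds scores against the background frequencies. The Python sketch below shows only that arithmetic on a hypothetical two-letter alphabet in which 1% of residues change per PAM unit; the probabilities are invented, and the 10·log10 scaling is one common choice assumed here.

```python
import math


def mat_mult(x, y):
    """Plain-Python product of two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_n(m1, n):
    """Mutation probability matrix after n PAM units: the n-th power of the 1-PAM matrix."""
    result = [[1.0 if i == j else 0.0 for j in range(len(m1))] for i in range(len(m1))]
    for _ in range(n):
        result = mat_mult(result, m1)
    return result


def log_odds(mn, freqs):
    """Scores round(10 * log10(M[i][j] / f_j)), where M[i][j] = P(i is replaced by j)."""
    return [[round(10 * math.log10(mn[i][j] / freqs[j])) for j in range(len(freqs))]
            for i in range(len(freqs))]


# hypothetical two-letter alphabet: 1% of residues change per PAM unit
m1 = [[0.99, 0.01], [0.01, 0.99]]
freqs = [0.5, 0.5]
print(log_odds(pam_n(m1, 1), freqs))    # [[3, -17], [-17, 3]]
print(log_odds(pam_n(m1, 250), freqs))  # [[0, 0], [0, 0]]
```

As n grows, every row of the powered matrix drifts toward the background frequencies, so the scores shrink toward 0. The toy 250-PAM rounds to all zeros only because it has two letters and a uniform change rate; that is an artifact of the toy, not a property of the real 20×20 PAM-250 matrix.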

PAM-250 Matrix

BLOSUM Matrices
• Another widely used family of matrices is BLOSUM (BLOcks SUbstitution Matrix)
• BLOSUM is often better for highly divergent sequences
• PAM is better for more closely related sequences
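
BLOSUM scores are estimated from ungapped blocks of aligned protein segments. The Python sketch below shows the core log-odds computation on a single toy block (invented, over a two-letter alphabet): count residue pairs within each column, compare observed pair frequencies q_ij with those expected from residue frequencies p_i, and scale to half-bits with 2·log2. Real BLOSUM-X matrices also cluster sequences that are at least X% identical and weight the counts; that step is omitted here, and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_from_block(block):
    """Half-bit log-odds scores from one ungapped block (rows = sequences), without clustering."""
    # count every unordered pair of residues that share a column
    counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            counts[tuple(sorted((x, y)))] += 1
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}  # observed pair frequencies
    residues = sorted(set("".join(block)))
    # p[i]: frequency of residue i among all pairs (q_ii plus half of every q_ij, i != j)
    p = {i: sum(q.get(tuple(sorted((i, j))), 0.0) * (1 if i == j else 0.5) for j in residues)
         for i in residues}
    scores = {}
    for i in residues:
        for j in residues:
            expected = p[i] * p[j] if i == j else 2 * p[i] * p[j]
            observed = q.get(tuple(sorted((i, j))), 0.0)
            # an unobserved pair has no finite estimate; real blocks are large enough to avoid this
            scores[(i, j)] = round(2 * math.log2(observed / expected)) if observed else None
    return scores


scores = blosum_from_block(["AAB", "AAB", "ABB", "ABA"])
print(scores[("A", "A")], scores[("B", "B")], scores[("A", "B")])  # 0 1 -1
```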

BLAST
• BLAST (Basic Local Alignment Search Tool) is a family of sequence similarity search tools
• It can be used to search sequence databases worldwide
• It can be run locally or through a web-based interface on a server
• Given a query sequence, BLAST returns all matches that score above a user-defined threshold

BLAST
• BLAST uses a heuristic technique:
  • Compile a list of high-scoring words, i.e., words w characters long that score well against the query (scored with a PAM matrix)
  • Search the database for matches to these words (a hash table speeds up the search); call each match a seed
  • Extend each seed in both directions until the score of the extension falls below a limit
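
The three steps can be sketched in Python as follows. This is a simplified illustration, not NCBI's implementation, and several assumptions are labeled in the code: a DNA alphabet with +5/-4 match/mismatch scores stands in for the PAM matrix; "falls below a limit" is implemented as an X-drop rule (stop once the running score is more than x_drop below the best seen so far); only ungapped extension is done; and all function names are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def simple_score(a, b):
    # hypothetical stand-in for a substitution matrix: +5 for a match, -4 for a mismatch
    return 5 if a == b else -4


def neighborhood_words(query, w, threshold, score=simple_score):
    """Step 1: every word of length w scoring >= threshold against some query word -> query offsets."""
    words = {}
    for i in range(len(query) - w + 1):
        for cand in product(ALPHABET, repeat=w):
            if sum(score(a, b) for a, b in zip(query[i:i + w], cand)) >= threshold:
                words.setdefault("".join(cand), []).append(i)
    return words


def x_drop_extend(pairs, score, x_drop):
    """Best prefix score along pairs, stopping once the running score falls x_drop below that best."""
    best = running = 0
    for a, b in pairs:
        running += score(a, b)
        if running > best:
            best = running
        elif best - running > x_drop:
            break
    return best


def blast_sketch(query, subject, w=3, threshold=6, x_drop=10, score=simple_score):
    """Best ungapped segment score found by extending seeds (0 if there are no seeds)."""
    words = neighborhood_words(query, w, threshold, score)
    best = 0
    for j in range(len(subject) - w + 1):
        # Step 2: the dict lookup plays the role of the hash table
        for i in words.get(subject[j:j + w], []):
            seed = sum(score(query[i + k], subject[j + k]) for k in range(w))
            # Step 3: extend to the right of the seed, then to the left
            right = x_drop_extend(zip(query[i + w:], subject[j + w:]), score, x_drop)
            left = x_drop_extend(zip(reversed(query[:i]), reversed(subject[:j])), score, x_drop)
            best = max(best, seed + left + right)
    return best


print(blast_sketch("ACGTACGT", "TTTACGTACGTTTT"))  # 40
```

The threshold controls the speed/sensitivity trade-off: a lower threshold admits more neighborhood words and therefore more seeds to extend.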