Vineet Bafna. How can we compute the local alignment itself?

T G C A A T C A T

Gaps appear together

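Gaps are more likely to be contiguous than scattered. The penalty for a gap of length l should therefore be affine in l: a one-time gap-opening cost plus an extension cost for each gapped position.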

What is the time taken for this? What values can l take? Can we get rid of the extra dimension?

Define D[i,j]: Score of the best alignment of s[1..i] and t[1..j], given that the final column is a deletion (s[i] is aligned to a gap). Such an alignment is an optimum alignment of s[1..i-1] and t[1..j], followed by the column s[i] over "-".

Define I[i,j]: Score of the best alignment of s[1..i] and t[1..j], given that the final column is an insertion (t[j] is aligned to a gap). Such an alignment is an optimum alignment of s[1..i] and t[1..j-1], followed by the column "-" over t[j].
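The definitions of D and I can be turned into a three-table dynamic program. The following is a minimal sketch, not the slides' own code: the function name `affine_align_score`, the score values, and the convention that a gap of length l costs `gap_open + l * gap_extend` are all assumptions chosen for illustration.

```python
NEG_INF = float("-inf")

def affine_align_score(s, t, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gaps (assumed cost: gap_open + l * gap_extend)."""
    n, m = len(s), len(t)
    # S[i][j]: best score for s[:i] vs t[:j], any final column
    # D[i][j]: best score when the final column is a deletion (s[i] over a gap)
    # I[i][j]: best score when the final column is an insertion (gap over t[j])
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for i in range(1, n + 1):          # first column: one gap of length i
        D[i][0] = gap_open + i * gap_extend
        S[i][0] = D[i][0]
    for j in range(1, m + 1):          # first row: one gap of length j
        I[0][j] = gap_open + j * gap_extend
        S[0][j] = I[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            D[i][j] = max(D[i - 1][j] + gap_extend,
                          S[i - 1][j] + gap_open + gap_extend)
            I[i][j] = max(I[i][j - 1] + gap_extend,
                          S[i][j - 1] + gap_open + gap_extend)
            sub = match if s[i - 1] == t[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, D[i][j], I[i][j])
    return S[n][m]
```

Each cell looks at a constant number of neighbours, so the running time stays O(nm) and no extra dimension for the gap length l is needed.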

How much space do we need? Is the space requirement too much?

Copyright ©2004 by the National Academy of Sciences Istrail, Sorin et al. (2004) Proc. Natl. Acad. Sci. USA 101, Fig. 1. Dot-plot representation of sample assembly comparison results

Score computation:
For i = 1 to n
  For j = 1 to m

In linear space, we can compute each row of the D.P. table. We still need to compute the optimum path from the origin (0,0) to (n,m).
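Computing the scores row by row needs only the previous row, which is what keeps the space linear. A minimal sketch, assuming a simple global-alignment scheme with linear gap cost; the function name `last_row_scores` and the score values are illustrative assumptions, not taken from the slides.

```python
def last_row_scores(s, t, match=1, mismatch=-1, gap=-1):
    """Return the last row of the score table S for s vs t, using O(len(t)) space."""
    prev = [j * gap for j in range(len(t) + 1)]    # row i = 0
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)            # column j = 0 is all gaps
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,       # align s[i] with t[j]
                          prev[j] + gap,           # s[i] against a gap
                          curr[j - 1] + gap)       # t[j] against a gap
        prev = curr                                # discard the older row
    return prev
```

The last entry of the returned row is the optimal score S[n,m]. What is lost is the traceback: the rows needed to recover the path have been discarded.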

At i = n/2, we know the scores of all the optimal paths ending at that row. Define F[j] = S[n/2, j]. One of these j is on the true path. Which one?

Let S_b[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m]. Boundary cases? S_b[n,j]? S_b[i,m]?

Let S_b[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m]. Define B[j] = S_b[n/2, j]. One of these j is on the true path. Which one?

At the optimal coordinate j, F[j] + B[j] = S[n,m]. For every other j, F[j] + B[j] is at most S[n,m]. In O(nm) time and O(m) space, we can compute one of the coordinates on the optimum path.

Align(1..n, 1..m)
– For all 1 <= j <= m, compute F[j] = S[n/2, j]
– For all 1 <= j <= m, compute B[j] = S_b[n/2, j]
– j* = argmax_j { F[j] + B[j] }
– X = Align(1..n/2, 1..j*)
– Y = Align(n/2+1..n, j*+1..m)
– Return X, j*, Y
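The Align procedure can be sketched as follows. This is one possible rendering under stated assumptions: linear gap cost, illustrative score values, and hypothetical helper names (`sub_score`, `last_row`, `align_single`, `alignment_score`). The backward scores B are obtained by running the forward computation on the reversed strings, and the base case for a single character of s is an addition the pseudocode leaves implicit.

```python
def sub_score(x, y, match, mismatch):
    return match if x == y else mismatch

def last_row(s, t, match=1, mismatch=-1, gap=-1):
    """Last row of the score table for s vs t, in O(len(t)) space."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            curr[j] = max(prev[j - 1] + sub_score(s[i - 1], t[j - 1], match, mismatch),
                          prev[j] + gap,
                          curr[j - 1] + gap)
        prev = curr
    return prev

def align_single(c, t, match, mismatch, gap):
    """Base case: align the single character c against a non-empty t."""
    k = max(range(len(t)), key=lambda p: sub_score(c, t[p], match, mismatch))
    paired = sub_score(c, t[k], match, mismatch) + (len(t) - 1) * gap
    if paired >= (len(t) + 1) * gap:
        return "-" * k + c + "-" * (len(t) - k - 1), t
    return c + "-" * len(t), "-" + t   # c unpaired: everything is gapped

def align(s, t, match=1, mismatch=-1, gap=-1):
    """Return (aligned_s, aligned_t), an optimal global alignment."""
    n, m = len(s), len(t)
    if n == 0:
        return "-" * m, t
    if m == 0:
        return s, "-" * n
    if n == 1:
        return align_single(s, t, match, mismatch, gap)
    mid = n // 2
    F = last_row(s[:mid], t, match, mismatch, gap)               # F[j] = S[mid, j]
    B = last_row(s[mid:][::-1], t[::-1], match, mismatch, gap)   # B[m - j] = S_b[mid, j]
    j_star = max(range(m + 1), key=lambda j: F[j] + B[m - j])
    x_s, x_t = align(s[:mid], t[:j_star], match, mismatch, gap)
    y_s, y_t = align(s[mid:], t[j_star:], match, mismatch, gap)
    return x_s + y_s, x_t + y_t

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score two gapped strings column by column (used only for checking)."""
    return sum(gap if "-" in (x, y) else sub_score(x, y, match, mismatch)
               for x, y in zip(a, b))
```

A quick sanity check is to confirm that `alignment_score(*align(s, t))` equals `last_row(s, t)[-1]`, the optimal score computed without any traceback.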

Time: T(nm) = c·nm + T(nm/2) = O(nm)
Space: O(m)
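The step from the recurrence to O(nm) can be spelled out. The two recursive calls cover (n/2)·j* and (n/2)·(m − j*) cells, which is nm/2 in total, so the work halves at every level of the recursion. Unrolling gives a geometric series (c is the constant from the recurrence above):

```latex
\begin{align*}
T(nm) &= c\,nm + T\!\left(\tfrac{nm}{2}\right) \\
      &\le c\,nm \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \\
      &= 2c\,nm = O(nm)
\end{align*}
```

The linear-space algorithm therefore costs about twice the time of the plain score computation.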

Name an early Bioinformatics Researcher (preferably dead) Name an early Bioinformatics Researcher who is a woman

We have seen that affine gap penalties help concentrate the gaps in small regions. What about substitution errors? Are all substitutions alike?

Scoring protein sequence alignments is a much more complex task than scoring DNA
– Not all substitutions are equal

The problem was first worked on by Pauling and collaborators.

In the 1970s, Margaret Dayhoff created the first similarity matrices.
– “One size does not fit all”
– Homologous proteins that are evolutionarily close should be scored differently from proteins that are evolutionarily distant
– Different proteins might evolve at different rates, and we need to normalize for that

DNA has structure.

So far, we considered a simple match/mismatch criterion. The nucleotides can be grouped into purines (A, G) and pyrimidines (C, T). Nucleotide substitutions within a group (transitions) are more likely than those across groups (transversions).

Transversions are more heavily penalized than transitions.
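A scoring function that distinguishes the two kinds of substitution might look like this. It is a small sketch only: the function name `substitution_score` and the numeric values are assumptions for illustration, and the slides do not give specific scores.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")

def substitution_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of nucleotides (uppercase A, C, G, T)."""
    if x == y:
        return match
    # A transition stays within one group: A<->G or C<->T.
    same_group = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_group else transversion
```

For example, `substitution_score("A", "G")` uses the transition value, while `substitution_score("A", "C")` uses the harsher transversion value.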

Suppose we are searching with a mouse protein. BLAST returns proteins ranked by score:
– The top hit is to human
– Somewhere below is Drosophila
– Which one will you trust?

(Figure labels: hum, mus, dros)