Introduction to Sequence Alignment. Why Align Sequences? Find homology within the same species Find clues to gene function Practical issues in experiments.

Slides:



Advertisements
Similar presentations
Sequence Alignment I Lecture #2
Advertisements

Computational Biology, Part 7 Similarity Functions and Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Solusi DP Menggunakan Software Pertemuan 24 : (Off Class) Mata kuliah:K0164-Pemrograman Matematika Tahun:2008.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Pairwise Sequence Alignment
. Sequence Alignment I Lecture #2 This class has been edited from Nir Friedman’s lecture which is available at Changes made by.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
Sequence Alignment Arthur W. Chou Tunghai University Fall 2005.
Sequence Alignment Tutorial #2
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
 If Score(i, j) denotes best score to aligning A[1 : i] and B[1 : j] Score(i-1, j) + galign A[i] with GAP Score(i, j-1) + galign B[j] with GAP Score(i,
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Alignment Tutorial #2
Sequence Similarity Searching Class 4 March 2010.
©CMBI 2005 Sequence Alignment In phylogeny one wants to line up residues that came from a common ancestor. For information transfer one wants to line up.
Sequence Alignment Algorithms in Computational Biology Spring 2006 Edited by Itai Sharon Most slides have been created and edited by Nir Friedman, Dan.
Sequencing and Sequence Alignment
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Sequence Alignment II CIS 667 Spring Optimal Alignments So we know how to compute the similarity between two sequences  How do we construct an.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Algorithms Dr. Nancy Warter-Perez June 19, May 20, 2003 Developing Pairwise Sequence Alignment Algorithms2 Outline Programming workshop 2 solutions.
Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002.
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey.
Class 2: Basic Sequence Alignment
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
. Sequence Alignment I Lecture #2 This class has been edited from Nir Friedman’s lecture. Changes made by Dan Geiger, then Shlomo Moran. Background Readings:
LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003.
1 Introduction to Bioinformatics 2 Introduction to Bioinformatics. LECTURE 3: SEQUENCE ALIGNMENT * Chapter 3: All in the family.
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Pairwise alignments Introduction Introduction Why do alignments? Why do alignments? Definitions Definitions Scoring alignments Scoring alignments Alignment.
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
. Sequence Alignment. Sequences Much of bioinformatics involves sequences u DNA sequences u RNA sequences u Protein sequences We can think of these sequences.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Alignment, Part I Vasileios Hatzivassiloglou University of Texas at Dallas.
Chapter 3 Computational Molecular Biology Michael Smith
Sequence Alignment Xuhua Xia
1 Sequence Alignment Input: two sequences over the same alphabet Output: an alignment of the two sequences Example: u GCGCATGGATTGAGCGA u TGCGCCATTGATGACCA.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
. Sequence Alignment. Sequences Much of bioinformatics involves sequences u DNA sequences u RNA sequences u Protein sequences We can think of these sequences.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
. Sequence Alignment Author:- Aya Osama Supervision:- Dr.Noha khalifa.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
Sequence Alignment Dilvan Moreira (based on Prof. André Carvalho presentation)
Bioinformatics Overview
Introduction to Dynamic Programming
Welcome to Introduction to Bioinformatics
Biology 162 Computational Genetics Todd Vision Fall Aug 2004
Sequence Alignment ..
Predict Protein Sequence by Fuzzy-Association Rules
Global, local, repeated and overlaping
Pairwise sequence Alignment.
#7 Still more DP, Scoring Matrices
Pairwise Sequence Alignment
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Pairwise Alignment Global & local alignment
Sequence alignment BI420 – Introduction to Bioinformatics
Basic Local Alignment Search Tool (BLAST)
Sequence Alignment Tutorial #2
Presentation transcript:

Introduction to Sequence Alignment

Why Align Sequences? Find homology within the same species Find clues to gene function Practical issues in experiments Find homology in other species Gather info for an evolutionary model Gene families

The Most Visual Way of Aligning Two Sequences

Dot Matrix Alignment CACTAGGC AGCTAGGA Gibbs & McIntyre (1970)

Dot Matrix Alignment

Has many variations Can be used to find sequence repeats Find self-complimentary subsequences of RNA to predict secondary structure Still used today

Alignment using Dynamic Programming

An Example GCGCATGGATTGAGCGA TGCGCCATTGATGACCA A possible alignment: -GCGC-ATGGATTGAGCGA TGCGCCATTGAT-GACC-A

Alignments -GCGC-ATGGATTGAGCGA TGCGCCATTGAT-GACC-A Three elements: Perfect matches Mismatches Gaps

Choosing Alignments There are many possible alignments For example, compare: -GCGC-ATGGATTGAGCGA TGCGCCATTGAT-GACC-A to GCGCATGGATTGAGCGA TGCGCC----ATTGATGACCA-- Which one is better?

Scoring Rule Example Score = (# matches) – (# mismatches) – (# gaps) x 2

Example -GCGC-ATGGATTGAGCGA TGCGCCATTGAT-GACC-A Score: (+1x13) + (-1x2) + (-2x4) = GCGCATGGATTGAGCGA TGCGCC----ATTGATGACCA-- Score: (+1x5) + (-1x6) + (-2x11) = -23

Optimal Alignment Optimal alignment is achieved at best similarity score d, thus is determined by the scoring rule

Finding the Best Alignment Score The additive form of the score allows to perform dynamic programming to find the best score efficiently Guaranteed to find the best alignment

Assume that an Optimal Score Exists d(s,t) – Optimal score for globally aligning “s” and “t”

The Idea The best alignment that ends at a given pair of bases: the best among best alignments of the sequences up to that point, plus the score for aligning the two additional bases.

Dynamic Programming Consider the best alignment score of two sequences s, t at base/residue i+1, j+1, respectively:

Dynamic Programming The best alignment must be in one of three cases: 1. Last position is ( s[i+1],t[j +1] ) 2. Last position is ( -, t[j +1] ) 3. Last position is ( s[i +1], - )

Dynamic Programming The best alignment must be in one of three cases: 1. Last position is ( s[i+1],t[j +1] ) 2. Last position is ( -, t[j +1] ) 3. Last position is ( s[i +1], - )

Dynamic Programming The best alignment must be in one of three cases: 1. Last position is ( s[i+1],t[j +1] ) 2. Last position is ( -, t[j +1] ) 3. Last position is ( s[i +1], - )

Dynamic Programming

Of course, we first need to handle the base cases in the recursion:

Dynamic Programming We fill the matrix using the recurrence rule – A G C – – A A A C –

Dynamic Programming

Conclusion: d( AAAC, AGC ) = -1

Reconstructing the Best Alignment AAAC AG-C

More than one best alignment AAAC A-GC

Complexity Space: O(mn) Time: O(mn) Filling the matrix O(mn) Backtrace O(m+n)

Needleman & Wunsch (1970) A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins J. Mol. Biol. 48:

Local Alignment We just introduced global alignment Now introduce local alignment: A local Alignment between sequence s and sequence t is an alignment with maximum similarity between a substring of s and a substring of t.

Smith and Waterman (1981) “Identification of Common Molecular Subsequences” J. Mol. Biol., 147:

Best-aligned Subsequences The best score or start over

Note: different scoring rule

Best aligned subsequences