Sequence comparison: Dynamic programming

Slides:



Advertisements
Similar presentations
Global Sequence Alignment by Dynamic Programming.
Advertisements

Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Dynamic Programming Nithya Tarek. Dynamic Programming Dynamic programming solves problems by combining the solutions to sub problems. Paradigms: Divide.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Bioinformatics Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington
Sequence comparison: Introduction and motivation Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Pairwise Sequence Alignment
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
S. Maarschalkerweerd & A. Tjhang1 Probability Theory and Basic Alignment of String Sequences Chapter
1-month Practical Course Genome Analysis (Integrative Bioinformatics & Genomics) Lecture 3: Pair-wise alignment Centre for Integrative Bioinformatics VU.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
C T C G T A GTCTGTCT Find the Best Alignment For These Two Sequences Score: Match = 1 Mismatch = 0 Gap = -1.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Sequence Alignment Oct 9, 2002 Joon Lee Genomics & Computational Biology.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Protein Sequence Comparison Patrice Koehl
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Sequence comparison: Local alignment
Solving System of Linear Equations. 1. Diagonal Form of a System of Equations 2. Elementary Row Operations 3. Elementary Row Operation 1 4. Elementary.
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.
Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.
Pairwise Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 22, 2005 ChengXiang Zhai Department of Computer Science University.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Chapter 3 Computational Molecular Biology Michael Smith
We want to calculate the score for the yellow box. The final score that we fill in the yellow box will be the SUM of two other scores, we’ll call them.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
What is Matrix Multiplication? Matrix multiplication is the process of multiplying two matrices together to get another matrix. It differs from scalar.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
An Introduction to Matrix Algebra Math 2240 Appalachian State University Dr. Ginn.
Pairwise sequence comparison
Sequence comparison: Local alignment
Sequence comparison: Traceback and local alignment
GENOME 559: Introduction to statistical and computational genomics
Sequence Alignment 11/24/2018.
Sequence comparison: Dynamic programming
Pairwise sequence Alignment.
Intro to Alignment Algorithms: Global and Local
Dynamic Programming 1/15/2019 8:22 PM Dynamic Programming.
Sequence comparison: Local alignment
Sequence comparison: Traceback
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Find the Best Alignment For These Two Sequences
Pairwise Alignment Global & local alignment
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
Bioinformatics Algorithms and Data Structures
Dynamic Programming Finds the Best Score and the Corresponding Alignment O Alignment: Start in lower right corner and work backwards:
Sequence comparison: Introduction and motivation
Presentation transcript:

Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas http://faculty.washington.edu/jht/GS559_2012/

Sequence comparison overview Problem: Find the “best” alignment between a query sequence and a target sequence. To solve this problem, we need a method for scoring alignments, and an algorithm for finding the alignment with the best score. The alignment score is calculated using a substitution matrix gap penalties. The algorithm for finding the best alignment is dynamic programming (DP).

A simple alignment problem. Problem: find the best pairwise alignment of GAATC and CATAC. Use a linear gap penalty of -4. Use the following substitution matrix: A C G T 10 -5

How many possibilities? GAATC CATAC GAAT-C C-ATAC -GAAT-C C-A-TAC GAATC- CA-TAC GAAT-C CA-TAC GA-ATC CATA-C How many different possible alignments of two sequences of length n exist?

How many possibilities? GAATC CATAC GAAT-C C-ATAC -GAAT-C C-A-TAC GAATC- CA-TAC GAAT-C CA-TAC GA-ATC CATA-C How many different alignments of two sequences of length n exist? 5 2.5x102 10 1.8x105 20 1.4x1011 30 1.2x1017 40 1.1x1023 FYI for two sequences of length m and n, possible alignments number: 2n choose n - the binomial coefficient

DP matrix 5 G A T C GA CA j 0 1 2 3 etc. i 1 2 3 4 5 10 -5 5 The value at (i,j) is the score of the best alignment of the first i characters of one sequence versus the first j characters of the other sequence. GA CA init. row and column j 0 1 2 3 etc. G A T C i 1 2 3 4 5

A C G T 10 -5 DP matrix GAA CA- G A T C 5 1 Moving horizontally in the matrix introduces a gap in the sequence along the left edge.

A C G T 10 -5 GA- CAT DP matrix G A T C 5 1 Moving vertically in the matrix introduces a gap in the sequence along the top edge.

Moving diagonally in the matrix aligns two residues C G T 10 -5 GAA CAT DP matrix G A T C 5 Moving diagonally in the matrix aligns two residues

Initialization G A T C Start at top left and move progressively A C G 10 -5 Initialization Start at top left and move progressively G A T C

A C G T 10 -5 G - Introducing a gap G A T C -4

A C G T 10 -5 - C Introducing a gap G A T C -4

Complete first row and column G T 10 -5 Complete first row and column ----- CATAC G A T C -4 -8 -12 -16 -20

Three ways to get to i=1, j=1 C G T 10 -5 Three ways to get to i=1, j=1 G- -C j 0 1 2 3 etc. G A T C -4 -8 i 1 2 3 4 5

Three ways to get to i=1, j=1 C G T 10 -5 -G C- j 0 1 2 3 etc. G A T C -4 -8 i 1 2 3 4 5

Three ways to get to i=1, j=1 C G T 10 -5 G C j 0 1 2 3 etc. G A T C -5 i 1 2 3 4 5

Accept the highest scoring of the three 10 -5 Accept the highest scoring of the three G A T C -4 -8 -12 -16 -20 -5 Then simply repeat the same rule progressively across the matrix

A C G T 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5 ?

DP matrix G A T C -4 -8 -12 -16 -20 -5 ? -G CA G- CA --G CA- -4 -9 -12 10 -5 -G CA G- CA --G CA- DP matrix -4 -9 -12 G A T C -4 -8 -12 -16 -20 -5 ? -4 -4

DP matrix G A T C -4 -8 -12 -16 -20 -5 -G CA G- CA --G CA- -4 -9 -12 10 -5 -G CA G- CA --G CA- DP matrix -4 -9 -12 G A T C -4 -8 -12 -16 -20 -5 -4 -4

A C G T 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5 ?

A C G T 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5

A C G T 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5 ?

What is the alignment associated with this entry? 10 -5 Traceback G A T C -4 -8 -12 -16 -20 -5 -9 5 1 2 -2 What is the alignment associated with this entry? (just follow the arrows back - this is called the traceback)

DP matrix G A T C -4 -8 -12 -16 -20 -5 -9 5 1 2 -2 -G-A CATA A C G T 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5 -9 5 1 2 -2 -G-A CATA

Continue and find the optimal global alignment, and its score. 10 -5 DP matrix G A T C -4 -8 -12 -16 -20 -5 -9 5 1 2 -2 ? Continue and find the optimal global alignment, and its score.

A C G T 10 -5 Best alignment starts at bottom right and follows traceback arrows to top left G A T C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 -3 -7 11 7 2 6 -2 17

One best traceback GA-ATC CATA-C G A T C -4 -8 -12 -16 -20 -5 -9 -13 10 -5 GA-ATC CATA-C One best traceback G A T C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 -3 -7 11 7 2 6 -2 17

Another best traceback G T 10 -5 GAAT-C -CATAC Another best traceback G A T C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 -3 -7 11 7 2 6 -2 17

GAAT-C -CATAC GA-ATC CATA-C G A T C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 10 -5 GAAT-C -CATAC GA-ATC CATA-C G A T C -4 -8 -12 -16 -20 -5 -9 -13 -6 5 1 -3 -7 11 7 2 6 -2 17

Multiple solutions GA-ATC CATA-C When a program returns a single sequence alignment, it may not be the only best alignment but it is guaranteed to be one of them. In our example, all of the alignments at the left have equal scores. GAAT-C CA-TAC GAAT-C C-ATAC GAAT-C -CATAC

DP in equation form Align sequence x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

DP equation graphically take the max of these three

Dynamic programming Yes, it’s a weird name. DP is closely related to recursion and to mathematical induction. We can prove that the resulting score is optimal.

What you should know Scoring a pairwise alignment requires a substitution matrix and gap penalties. Dynamic programming (DP) is an efficient algorithm for finding an optimal alignment. Entry (i,j) in the DP matrix stores the score of the best-scoring alignment up to that position. DP iteratively fills in the matrix using a simple mathematical rule. How to use DP to find an alignment.

Practice problem: find a best pairwise alignment of GAATC and AATTC 10 -5 G A T C d = -4