Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.


Sequence comparison overview. Problem: find the "best" alignment between a query sequence and a target sequence. To solve this problem, we need
– a method for scoring alignments, and
– an algorithm for finding the alignment with the best score.
The alignment score is calculated using
– a substitution matrix, and
– gap penalties.
The algorithm for finding the best alignment is dynamic programming (DP).

A simple alignment problem. Problem: find the best pairwise alignment of GAATC and CATAC. Use a linear gap penalty of -4. Use the following substitution matrix:

        A    C    G    T
   A   10   -5    0   -5
   C   -5   10   -5    0
   G    0   -5   10   -5
   T   -5    0   -5   10
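
To make the scoring scheme concrete, here is a minimal Python sketch that scores an alignment that is already given. The function names `s` and `score_alignment` are illustrative, not from the slides, and the substitution scores are an assumption (match 10, A/G or C/T pair 0, any other mismatch -5) chosen to agree with the values that appear in the worked example.

```python
GAP = -4  # linear gap penalty from the problem statement
PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def score_alignment(top, bottom):
    """Sum the column scores of two gapped strings of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        # A column containing a gap costs the gap penalty;
        # any other column is looked up in the substitution scores.
        total += GAP if "-" in (a, b) else s(a, b)
    return total

print(score_alignment("GAATC", "CATAC"))    # ungapped alignment: 5
print(score_alignment("GA-ATC", "CATA-C"))  # alignment with two gaps: 17
```

Scoring one alignment is easy; the hard part is choosing among all possible alignments, which is what the rest of the lecture addresses.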

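How many possibilities? How many different alignments of two sequences of length n exist? A few of the possible alignments of GAATC and CATAC:

   GAATC   GAATC-   GAAT-C   GAAT-C   -GAAT-C   GA-ATC
   CATAC   CA-TAC   C-ATAC   CA-TAC   C-A-TAC   CATA-C

    n    number of alignments
    5    2.5 x 10^2
   10    1.8 x 10^5
   20    1.4 x 10^11
   30    1.2 x 10^17
   40    1.1 x 10^23

FYI, for two sequences of lengths m and n, the number of possible alignments is

$$\binom{m+n}{n} = \frac{(m+n)!}{m!\,n!}$$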
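
The growth in the number of alignments can be checked numerically. The sketch below assumes that the count intended on this slide is the binomial coefficient C(m+n, n); that is an assumption, but it is consistent with the surviving figures (about 2.5 x 10^2 for n = 5 and a value of order 10^23). The function name `count_alignments` is illustrative.

```python
import math

def count_alignments(m, n):
    """Number of alignments under the assumed formula (m + n)! / (m! n!)."""
    return math.comb(m + n, n)

for n in (5, 10, 20, 30, 40):
    # Format the exact integer in scientific notation for readability.
    print(n, f"{count_alignments(n, n):.1e}")
```

Even for sequences of length 40 the count is far too large to enumerate, which is why an exhaustive search is not an option.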

DP matrix. GAATC runs along the top edge of the matrix and CATAC down the left edge; i indexes positions in one sequence and j positions in the other. The value in position (i, j) is the score of the best alignment of the first i positions of one sequence versus the first j positions of the other sequence. For example, the entry 5 is the score of the best alignment of GA with CA. The first row and column are the initialization row and column.

DP matrix. Moving horizontally in the matrix introduces a gap in the sequence along the left edge. Example, moving from the entry 5 to the entry 1 on its right:
   GAA
   CA-

DP matrix. Moving vertically in the matrix introduces a gap in the sequence along the top edge. Example, moving from the entry 5 to the entry 1 below it:
   GA-
   CAT

DP matrix. Moving diagonally in the matrix aligns two residues. Example, moving from the entry 5 to the entry 0 diagonally below it:
   GAA
   CAT
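
The correspondence between moves in the matrix and alignment columns can be sketched in a few lines of Python. The move codes "D", "H" and "V" and the function name `apply_moves` are hypothetical conventions for this example only.

```python
def apply_moves(top_seq, left_seq, moves):
    """Turn a path of moves into a gapped alignment.

    'D' (diagonal) aligns two residues, 'H' (horizontal) puts a gap in the
    left-edge sequence, 'V' (vertical) puts a gap in the top-edge sequence.
    """
    top, left = [], []
    i = j = 0  # i: next position in top_seq, j: next position in left_seq
    for move in moves:
        if move == "D":
            top.append(top_seq[i]); left.append(left_seq[j])
            i += 1; j += 1
        elif move == "H":
            top.append(top_seq[i]); left.append("-")
            i += 1
        elif move == "V":
            top.append("-"); left.append(left_seq[j])
            j += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    return "".join(top), "".join(left)

print(apply_moves("GAATC", "CATAC", "DDH"))  # ('GAA', 'CA-')
print(apply_moves("GAATC", "CATAC", "DDV"))  # ('GA-', 'CAT')
print(apply_moves("GAATC", "CATAC", "DDD"))  # ('GAA', 'CAT')
```

Every path from the top-left corner to the bottom-right corner therefore spells out one complete alignment.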

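Initialization. Start at the top left, which has score 0, and move progressively across the matrix.

Introducing a gap. Moving one step to the right of the top-left corner aligns G with a gap: 0 + (-4) = -4.
   G
   -

Introducing a gap. Moving one step down from the top-left corner aligns a gap with C: 0 + (-4) = -4.
   -
   C

Complete first row and column. Each further step adds another gap penalty:

            G    A    A    T    C
        0  -4   -8  -12  -16  -20
   C   -4
   A   -8
   T  -12
   A  -16
   C  -20

The bottom of the first column (-20) corresponds to the alignment
   -----
   CATAC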
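
As a minimal sketch of this initialization step, the following Python function builds the matrix as a list of rows and fills only the first row and column. The name `init_matrix` and the use of `None` for cells that are not yet computed are choices made for this example.

```python
def init_matrix(top_seq, left_seq, gap=-4):
    """Create the DP matrix and fill its first row and first column."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[None] * cols for _ in range(rows)]
    F[0][0] = 0
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + gap  # one more gap in the left-edge sequence
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + gap  # one more gap in the top-edge sequence
    return F

F = init_matrix("GAATC", "CATAC")
print(F[0])                    # [0, -4, -8, -12, -16, -20]
print([row[0] for row in F])   # [0, -4, -8, -12, -16, -20]
```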

Three ways to get to i=1, j=1. First way: move vertically from the -4 above, which adds a gap: -4 + (-4) = -8.
   G-
   -C

Three ways to get to i=1, j=1. Second way: move horizontally from the -4 on the left, which adds a gap: -4 + (-4) = -8.
   -G
   C-

Three ways to get to i=1, j=1. Third way: move diagonally from the 0 at the top left, which aligns G with C: 0 + (-5) = -5.
   G
   C

Accept the highest scoring of the three, here -5. Then simply repeat the same rule progressively across the matrix.

            G    A    A    T    C
        0  -4   -8  -12  -16  -20
   C   -4  -5
   A   -8
   T  -12
   A  -16
   C  -20
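
The rule for a single cell can be written out directly. This sketch computes the three candidate scores for the cell at i=1, j=1 from its three neighbors; `cell_options` is an illustrative name, and the substitution scores are again an assumption (match 10, A/G or C/T pair 0, other mismatches -5) that agrees with the -5 shown for G against C.

```python
PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def cell_options(diag, up, left, a, b, gap=-4):
    """Scores of the three ways to reach a cell from its neighbors."""
    return {
        "diagonal": diag + s(a, b),  # align residue a with residue b
        "vertical": up + gap,        # gap in the top-edge sequence
        "horizontal": left + gap,    # gap in the left-edge sequence
    }

options = cell_options(diag=0, up=-4, left=-4, a="G", b="C")
best_move = max(options, key=options.get)
print(options)                        # {'diagonal': -5, 'vertical': -8, 'horizontal': -8}
print(best_move, options[best_move])  # diagonal -5
```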

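DP matrix. What is the next entry, in row A directly below the -5?

DP matrix. The three candidates for this entry are:
– diagonally from the -4 in the first column: -4 + 0 = -4 (alignment -G over CA)
– vertically from the -5: -5 + (-4) = -9 (alignment G- over CA)
– horizontally from the -8: -8 + (-4) = -12 (alignment --G over CA-)

DP matrix. The highest of the three is -4, so the entry is -4.

DP matrix. What are the remaining entries in this column, in rows T, A and C?

DP matrix. Applying the same rule gives -8 in row T, and likewise for the rows below.

DP matrix. The column under G now reads -5, -4, -8, -12, -16 (rows C, A, T, A, C). The next column is filled in the same way.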
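
Repeating the rule for every cell fills the whole matrix. The following is a minimal sketch rather than a reference implementation: `fill_matrix` is an illustrative name, and the substitution scores are an assumption (match 10, A/G or C/T pair 0, other mismatches -5) that reproduces the entries shown in the worked example.

```python
PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def fill_matrix(top_seq, left_seq, gap=-4):
    """Fill the global-alignment DP matrix row by row."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + gap
    for r in range(1, rows):
        for c in range(1, cols):
            F[r][c] = max(
                F[r - 1][c - 1] + s(top_seq[c - 1], left_seq[r - 1]),  # diagonal
                F[r - 1][c] + gap,                                     # vertical
                F[r][c - 1] + gap,                                     # horizontal
            )
    return F

F = fill_matrix("GAATC", "CATAC")
for label, row in zip(" CATAC", F):
    print(label, row)
```

Under these assumptions the column below G comes out as -5, -4, -8, -12, -16, matching the values above, and the bottom-right entry is 17. The work is one constant-time step per cell, so the cost grows with the product of the two sequence lengths.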

DP matrix. What is the alignment associated with this entry? Just follow the arrows back; this is called the traceback.

DP matrix. The traceback from this entry gives the alignment
   -G-A
   CATA

DP matrix. Continue and find the optimal global alignment, and its score.

DP matrix. The best alignment starts at the lower right and follows the traceback arrows to the top left.

One best traceback:
   GA-ATC
   CATA-C

Another best traceback:
   GAAT-C
   -CATAC
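
A traceback can be sketched by filling the matrix and then walking back from the lower right, at each cell choosing a neighbor that explains the stored score. The name `global_align` and the tie-breaking order (diagonal, then vertical, then horizontal) are choices made for this example, and the substitution scores are an assumption (match 10, A/G or C/T pair 0, other mismatches -5).

```python
PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def global_align(top_seq, left_seq, gap=-4):
    """Return (score, gapped top sequence, gapped left sequence)."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + gap
        for c in range(1, cols):
            F[r][c] = max(F[r - 1][c - 1] + s(top_seq[c - 1], left_seq[r - 1]),
                          F[r - 1][c] + gap,
                          F[r][c - 1] + gap)
    # Traceback: start at the lower right and stop at the top left.
    r, c = rows - 1, cols - 1
    top, left = [], []
    while r > 0 or c > 0:
        if r > 0 and c > 0 and F[r][c] == F[r - 1][c - 1] + s(top_seq[c - 1], left_seq[r - 1]):
            top.append(top_seq[c - 1]); left.append(left_seq[r - 1])
            r -= 1; c -= 1
        elif r > 0 and F[r][c] == F[r - 1][c] + gap:
            top.append("-"); left.append(left_seq[r - 1])
            r -= 1
        else:
            top.append(top_seq[c - 1]); left.append("-")
            c -= 1
    # The columns were collected back to front, so reverse them.
    return F[-1][-1], "".join(reversed(top)), "".join(reversed(left))

score, top, left = global_align("GAATC", "CATAC")
print(score)
print(top)
print(left)
```

Which of the equally good alignments is returned depends only on the tie-breaking order; a different order would give a different traceback with the same score.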

Multiple solutions. When a program returns a single sequence alignment, it may not be the only best alignment, but it is guaranteed to be one of them. In our example, all of the following alignments have equal scores:

   GA-ATC   GAAT-C   GAAT-C   GAAT-C
   CATA-C   CA-TAC   C-ATAC   -CATAC
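
To see where the multiple solutions come from, the traceback can follow every tied arrow instead of just one. This is a sketch under the same assumptions as the worked example (gap penalty -4; match 10, A/G or C/T pair 0, other mismatches -5); `all_best_alignments` is an illustrative name.

```python
PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def all_best_alignments(top_seq, left_seq, gap=-4):
    """Return the best score and every alignment that achieves it."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        F[0][c] = F[0][c - 1] + gap
    for r in range(1, rows):
        F[r][0] = F[r - 1][0] + gap
        for c in range(1, cols):
            F[r][c] = max(F[r - 1][c - 1] + s(top_seq[c - 1], left_seq[r - 1]),
                          F[r - 1][c] + gap,
                          F[r][c - 1] + gap)
    results = []

    def walk(r, c, top, left):
        """Follow every arrow that explains F[r][c], building columns right to left."""
        if r == 0 and c == 0:
            results.append((top, left))
            return
        if r > 0 and c > 0 and F[r][c] == F[r - 1][c - 1] + s(top_seq[c - 1], left_seq[r - 1]):
            walk(r - 1, c - 1, top_seq[c - 1] + top, left_seq[r - 1] + left)
        if r > 0 and F[r][c] == F[r - 1][c] + gap:
            walk(r - 1, c, "-" + top, left_seq[r - 1] + left)
        if c > 0 and F[r][c] == F[r][c - 1] + gap:
            walk(r, c - 1, top_seq[c - 1] + top, "-" + left)

    walk(rows - 1, cols - 1, "", "")
    return F[-1][-1], results

best, alignments = all_best_alignments("GAATC", "CATAC")
print(best)
for top, left in alignments:
    print(top, left)
```

Under these assumptions the search finds exactly the four alignments listed above, each with score 17.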

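DP in equation form. Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty (a negative number, -4 in our example).

$$F(0,0) = 0$$

$$F(i,j) = \max \begin{cases} F(i-1,\,j-1) + s(x_i, y_j) \\ F(i-1,\,j) + d \\ F(i,\,j-1) + d \end{cases}$$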

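DP equation graphically. Each cell can be reached from three neighbors: diagonally, adding the substitution score for the two residues, or vertically or horizontally, adding the gap penalty d. Take the max of these three.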

Dynamic programming. Yes, it's a weird name. DP is closely related to recursion and to mathematical induction. We can prove that the resulting score is optimal.
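
The link to recursion can be illustrated by writing the same rule as a recursive function and caching its results, so that each (i, j) pair is computed only once. This is a sketch, not the lecture's own code: `best_score` is an illustrative name, and the substitution scores are an assumption (match 10, A/G or C/T pair 0, other mismatches -5).

```python
from functools import lru_cache

PURINES = {"A", "G"}

def s(a, b):
    """Assumed substitution score: match 10, A/G or C/T pair 0, otherwise -5."""
    if a == b:
        return 10
    return 0 if (a in PURINES) == (b in PURINES) else -5

def best_score(x, y, d=-4):
    """Best global alignment score of x and y, computed recursively."""

    @lru_cache(maxsize=None)
    def F(i, j):
        # Base cases: aligning a prefix against nothing costs one gap per residue.
        if i == 0 and j == 0:
            return 0
        if i == 0:
            return F(0, j - 1) + d
        if j == 0:
            return F(i - 1, 0) + d
        # Inductive step: the best score for (i, j) is built from smaller prefixes.
        return max(F(i - 1, j - 1) + s(x[i - 1], y[j - 1]),
                   F(i - 1, j) + d,
                   F(i, j - 1) + d)

    return F(len(x), len(y))

print(best_score("GAATC", "CATAC"))  # 17
```

The cache plays the role of the DP matrix: without it, the recursion would recompute the same prefixes many times over.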

Summary.
– Scoring a pairwise alignment requires a substitution matrix and gap penalties.
– Dynamic programming is an efficient algorithm for finding an optimal alignment.
– Entry (i, j) in the DP matrix stores the score of the best-scoring alignment up to that position.
– DP iteratively fills in the matrix using a simple mathematical rule.

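Practice problem: find a best pairwise alignment of GAATC and AATTC. Use the same substitution matrix and d = -4. Set up the DP matrix with GAATC along the top edge, AATTC down the left edge, and 0 in the top-left corner.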