Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Slides:

Advertisements

Similar presentations

TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST

Advertisements

Lecture on Numerical Analysis Dr.-Ing. Michael Dumbser

Dynamic Programming Introduction Prof. Muhammad Saeed.

MULTIPLICATION EQUATIONS 1. SOLVE FOR X 3. WHAT EVER YOU DO TO ONE SIDE YOU HAVE TO DO TO THE OTHER 2. DIVIDE BY THE NUMBER IN FRONT OF THE VARIABLE.

Polygon Scan Conversion – 11b

Copyright © Cengage Learning. All rights reserved.

Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Solve by Substitution: Isolate one variable in an equation

Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-

Bioinformatics (4) Sequence Analysis. figure NA1: Common & simple DNA2: the last 5000 generations Sequence Similarity and Homology.

Addition 1’s to 20.

25 seconds left…...

Global Sequence Alignment by Dynamic Programming.

Chapter 4 Systems of Linear Equations; Matrices

1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.

EXAMPLE 4 Multiply matrices Multiply 2 –3 –1 8. SOLUTION The matrices are both 2 2, so their product is defined. Use the following steps to find.

Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.

Bioinformatics Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington

Sequence comparison: Introduction and motivation Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Pairwise Sequence Alignment

S. Maarschalkerweerd & A. Tjhang1 Probability Theory and Basic Alignment of String Sequences Chapter

Space Efficient Alignment Algorithms and Affine Gap Penalties

1-month Practical Course Genome Analysis (Integrative Bioinformatics & Genomics) Lecture 3: Pair-wise alignment Centre for Integrative Bioinformatics VU.

C T C G T A GTCTGTCT Find the Best Alignment For These Two Sequences Score: Match = 1 Mismatch = 0 Gap = -1.

Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.

Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.

Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.

Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.

Sequence Alignment Oct 9, 2002 Joon Lee Genomics & Computational Biology.

Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.

Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.

Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.

Protein Sequence Comparison Patrice Koehl

Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.

Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Sequence comparison: Local alignment

Sequence comparison: Local alignment Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.

Developing Pairwise Sequence Alignment Algorithms

Bioiformatics I Fall Dynamic programming algorithm: pairwise comparisons.

Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.

Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.

Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.

Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.

Chapter 3 Computational Molecular Biology Michael Smith

We want to calculate the score for the yellow box. The final score that we fill in the yellow box will be the SUM of two other scores, we’ll call them.

Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-

Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.

Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.

Sequence Alignment.

V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.

Pairwise sequence comparison

Sequence comparison: Dynamic programming

Sequence comparison: Local alignment

Sequence comparison: Traceback and local alignment

GENOME 559: Introduction to statistical and computational genomics

Sequence Alignment 11/24/2018.

Sequence comparison: Dynamic programming

Pairwise sequence Alignment.

Sequence Alignment with Traceback on Reconfigurable Hardware

Sequence comparison: Local alignment

Sequence comparison: Traceback

BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment

Find the Best Alignment For These Two Sequences

Pairwise Alignment Global & local alignment

Sequence Alignment Algorithms Morten Nielsen BioSys, DTU

Dynamic Programming Finds the Best Score and the Corresponding Alignment O Alignment: Start in lower right corner and work backwards:

Sequence comparison: Introduction and motivation

Presentation transcript:

Sequence comparison: Dynamic programming Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison overview Problem: Find the “best” alignment between a query sequence and a target sequence. To solve this problem, we need –a method for scoring alignments, and –an algorithm for finding the alignment with the best score. The alignment score is calculated using –a substitution matrix –gap penalties. The algorithm for finding the best alignment is dynamic programming (DP).

A simple alignment problem. Problem: find the best pairwise alignment of GAATC and CATAC. Use a linear gap penalty of -4. Use the following substitution matrix: ACGT A C G T 0 10

How many possibilities? How many different possible alignments of two sequences of length n exist? GAATC CATAC GAATC- CA-TAC GAAT-C C-ATAC GAAT-C CA-TAC -GAAT-C C-A-TAC GA-ATC CATA-C

How many possibilities? How many different alignments of two sequences of length n exist? GAATC CATAC GAATC- CA-TAC GAAT-C C-ATAC GAAT-C CA-TAC -GAAT-C C-A-TAC GA-ATC CATA-C 5 2.5x x x x x10 23 FYI for two sequences of length m and n, possible alignments number:

DP matrix GAATC C A T A C ACGT A C G T 0 10 i j etc. 5 The value in position ( i,j ) is the score of the best alignment of the first i positions of one sequence versus the first j positions of the other sequence. GA CA init. row and column

DP matrix GAATC C A 5 1 T A C Moving horizontally in the matrix introduces a gap in the sequence along the left edge. GAA CA- ACGT A C G T 0 10

DP matrix GAATC C A 5 T 1 A C Moving vertically in the matrix introduces a gap in the sequence along the top edge. GA- CAT ACGT A C G T 0 10

DP matrix GAATC C A 5 T 0 A C Moving diagonally in the matrix aligns two residues GAA CAT ACGT A C G T 0 10

Initialization GAATC 0 C A T A C ACGT A C G T 0 10 Start at top left and move progressively

Introducing a gap GAATC 0-4 C A T A C G-G- ACGT A C G T 0 10

GAATC 0-4 C A T A C -C-C ACGT A C G T 0 10 Introducing a gap

Complete first row and column GAATC C-4 A-8 T-12 A-16 C-20 ACGT A C G T CATAC

Three ways to get to i=1, j=1 GAATC 0-4 C-8 A T A C ACGT A C G T 0 10 G- -C j etc. i

GAATC 0 C-4-8 A T A C ACGT A C G T G C- Three ways to get to i=1, j=1 j etc. i

GAATC 0 C-5 A T A C GCGC ACGT A C G T 0 10 Three ways to get to i=1, j=1 i j etc.

Accept the highest scoring of the three GAATC C-4-5 A-8 T-12 A-16 C-20 ACGT A C G T 0 10 Then simply repeat the same rule progressively across the matrix

DP matrix GAATC C-4-5 A-8? T-12 A-16 C-20 ACGT A C G T 0 10

DP matrix GAATC C-4-5 A-8 ? T-12 A-16 C G CA G- CA --G CA ACGT A C G T 0 10

DP matrix GAATC C-4-5 A-8 -4 T-12 A-16 C G CA G- CA --G CA ACGT A C G T 0 10

DP matrix GAATC C-4-5 A-8-4 T-12? A-16? C-20? ACGT A C G T 0 10

DP matrix GAATC C-4-5 A-8-4 T-12-8 A C ACGT A C G T 0 10

DP matrix GAATC C-4-5? A-8-4? T-12-8? A-16-12? C-20-16? ACGT A C G T 0 10

DP matrix GAATC C A-8-45 T A C What is the alignment associated with this entry? ACGT A C G T 0 10 (just follow the arrows back - this is called the traceback)

DP matrix GAATC C A-8-45 T A C ACGT A C G T G-A CATA

GAATC C A-8-45 T A C ? ACGT A C G T 0 10 Continue and find the optimal global alignment, and its score. DP matrix

GAATC C A T A C ACGT A C G T 0 10 Best alignment starts at lower right and follows traceback arrows to top left

GAATC C A T A C ACGT A C G T 0 10 GA-ATC CATA-C One best traceback

GAATC C A T A C ACGT A C G T 0 10 GAAT-C -CATAC Another best traceback

GAATC C A T A C ACGT A C G T 0 10 GA-ATC CATA-C GAAT-C -CATAC

Multiple solutions When a program returns a single sequence alignment, it may not be the only best alignment but it is guaranteed to be one of them. In our example, all of the alignments at the left have equal scores. GA-ATC CATA-C GAAT-C CA-TAC GAAT-C C-ATAC GAAT-C -CATAC

DP in equation form Align sequence x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

DP equation graphically take the max of these three

Dynamic programming Yes, it’s a weird name. DP is closely related to recursion and to mathematical induction. We can prove that the resulting score is optimal.

Summary Scoring a pairwise alignment requires a substitution matrix and gap penalties. Dynamic programming is an efficient algorithm for finding an optimal alignment. Entry ( i,j ) in the DP matrix stores the score of the best-scoring alignment up to that position. DP iteratively fills in the matrix using a simple mathematical rule.

ACGT A10-50 C G T 0 10 GAATC 0 A A T T C Practice problem: find a best pairwise alignment of GAATC and AATTC d = -4