Traceback and local alignment Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington.



Outline
Responses from last class
Sequence alignment
– Motivation
– Scoring alignments
Python

One-minute responses
– Thank you for doing the one-minute responses and the revisions.
– Liked that you let us solve the DP.
– Liked going to lab in second half of lecture.
– More slowly with Python programming.
– Please explain more about string handling in Python, especially sys.argv.
– sys.argv is now more clear.
– Need more practical problems using sys.
– I know how to compute the DP score but not how to get the alignment.
– I still do not get how sequence alignment works. Can we go over DP again?
– Please don’t give us a test because our essay phase has begun already.
– Please limit the number of questions in class.
– Moving too slow with previous work and too fast with current work.
– I understood about 95% of the lecture.
– Can you give us some Python documentation or some interesting web site to improve and learn?

Revision
What two things are needed to score a pairwise alignment?
– A substitution matrix and a gap penalty.
What does entry (i,j) in the DP matrix store?
– The score of the best-scoring alignment up to those positions.
What are the three valid moves when filling in the DP matrix?
– Horizontal and vertical, corresponding to gaps.
– Diagonal, corresponding to a substitution.
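As a rough illustration of the first revision point, the sketch below scores an already-gapped alignment from a substitution score and a linear gap penalty. The helper names sub_score and score_alignment are hypothetical, and the scoring scheme (2 for a match, -5 for A/G or C/T, -7 otherwise) is an assumption chosen to agree with the values that appear in the examples of this lecture.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def score_alignment(top, bottom, d=-5):
    """Sum substitution scores and gap penalties over the aligned columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += d  # a column containing a gap costs the gap penalty
        else:
            total += sub_score(a, b)
    return total


print(score_alignment("AAG-", "-AGC"))
```

The gap penalty is added rather than subtracted because d is given as a negative number (d = -5).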

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

Substitution matrix:
      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

Initialize the first row and column with multiples of the gap penalty:
        A    A    G
   0   -5  -10  -15
A  -5
G -10
C -15

Then fill in the remaining entries one at a time:
        A    A    G
   0   -5  -10  -15
A  -5    2   -3   -8
G -10   -3
C -15
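The fill step for this example can be sketched in Python as follows. This is a minimal sketch, not the course's reference solution: global_dp and sub_score are hypothetical names, rows follow the left sequence and columns the top sequence, and the scores (2 for a match, -5 for A/G or C/T, -7 otherwise) are an assumption consistent with the entries visible in the example.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def global_dp(top, left, d=-5):
    """Fill the global DP matrix; F[i][j] scores left[:i] against top[:j]."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column: only gaps, so multiples of the gap penalty.
    for j in range(1, cols):
        F[0][j] = F[0][j - 1] + d
    for i in range(1, rows):
        F[i][0] = F[i - 1][0] + d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + d,  # vertical move: gap in the top sequence
                F[i][j - 1] + d,  # horizontal move: gap in the left sequence
            )
    return F


for row in global_dp("AAG", "AGC"):
    print(row)
```

The lower right entry of the returned matrix is the score of the best global alignment.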

Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
– A horizontal move puts a gap in the left sequence.
– A vertical move puts a gap in the top sequence.
– A diagonal move uses one character from each sequence.

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

        A    A    G
   0   -5  -10  -15
A  -5    2   -3   -8
G -10   -3   -3   -1
C -15   -8   -8   -6

Tracing back from -6 in the lower right corner gives two optimal alignments:

AAG-    AAG-
-AGC    A-GC
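The traceback rules can be sketched in Python as below. The function names are hypothetical, the scoring scheme (2 for a match, -5 for A/G or C/T, -7 otherwise) is an assumption consistent with the example, and the tie-breaking order (diagonal, then vertical, then horizontal) is an arbitrary choice, so the sketch returns only one of the optimal alignments.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def global_dp(top, left, d=-5):
    """Fill the global DP matrix; F[i][j] scores left[:i] against top[:j]."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = F[0][j - 1] + d
    for i in range(1, len(left) + 1):
        F[i][0] = F[i - 1][0] + d
        for j in range(1, len(top) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


def traceback(top, left, d=-5):
    """Walk from the lower right corner to the upper left, building one alignment."""
    F = global_dp(top, left, d)
    i, j = len(left), len(top)
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])  # diagonal: one character from each
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + d:
            top_aln.append("-")  # vertical: gap in the top sequence
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])  # horizontal: gap in the left sequence
            left_aln.append("-")
            j -= 1
    # Characters were collected back to front, so reverse them.
    return "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(traceback("AAG", "AGC"))
```

A different tie-breaking order would return a different, equally good alignment.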

DP matrix
Top sequence: GAATC. Left sequence: CATAC. Different traceback paths through the same matrix give different alignments:

GA-ATC    GAAT-C    GAAT-C    GAAT-C
CATA-C    CA-TAC    C-ATAC    -CATAC

Multiple solutions
When a program returns a sequence alignment, it may not be the only best alignment. The four alignments above all have the same score.
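To see where multiple solutions come from, the following sketch follows every tied move during traceback instead of just one. It is an illustration under assumptions: the function names are hypothetical, and the scores (2 for a match, -5 for A/G or C/T, -7 otherwise, d = -5) are assumed to match the scoring used for the GAATC and CATAC example.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def global_dp(top, left, d=-5):
    """Fill the global DP matrix; F[i][j] scores left[:i] against top[:j]."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = F[0][j - 1] + d
    for i in range(1, len(left) + 1):
        F[i][0] = F[i - 1][0] + d
        for j in range(1, len(top) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


def all_alignments(top, left, d=-5):
    """Return every optimal global alignment as a (top, left) pair of strings."""
    F = global_dp(top, left, d)
    results = []

    def walk(i, j, top_aln, left_aln):
        if i == 0 and j == 0:
            results.append((top_aln, left_aln))
            return
        # Follow every move that explains the score in cell (i, j).
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]):
            walk(i - 1, j - 1, top[j - 1] + top_aln, left[i - 1] + left_aln)
        if i > 0 and F[i][j] == F[i - 1][j] + d:
            walk(i - 1, j, "-" + top_aln, left[i - 1] + left_aln)
        if j > 0 and F[i][j] == F[i][j - 1] + d:
            walk(i, j - 1, top[j - 1] + top_aln, "-" + left_aln)

    walk(len(left), len(top), "", "")
    return results


for top_aln, left_aln in all_alignments("GAATC", "CATAC"):
    print(top_aln)
    print(left_aln)
    print()
```

The number of optimal alignments can grow quickly with sequence length, which is one reason programs typically report a single alignment.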

Traceback problem #1
Top sequence: GAATC. Left sequence: CATAC.
Write down the alignment corresponding to the circled score (the entry for the prefixes GA and CA).

Solution #1
GA
CA

Traceback problem #2
Top sequence: GAATC. Left sequence: CATAC.
Write down three alignments corresponding to the circled score (the entry for GAATC and the prefix CA).

Solution #2
GAATC    GAATC    GAATC
CA---    C-A--    -CA--

Local alignment A protein may be homologous to a region within a second protein. Usually, an alignment that spans the complete length of both sequences is not required.

BLAST allows local alignments
[Figure: global alignment compared with local alignment]

Global alignment DP
Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty. Each entry F(i,j) is the largest of F(i-1,j-1) + s(x_i, y_j), F(i-1,j) + d, and F(i,j-1) + d.

Local alignment DP
Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty. Each entry F(i,j) is the largest of 0, F(i-1,j-1) + s(x_i, y_j), F(i-1,j) + d, and F(i,j-1) + d.

Local DP in equation form
F(i,j) = \max \{ 0,\; F(i-1,j-1) + s(x_i, y_j),\; F(i-1,j) + d,\; F(i,j-1) + d \}
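A minimal Python sketch of the local fill step follows. The only change from a global fill is the extra 0 in the max and the all-zero first row and column. The names local_dp and sub_score are hypothetical, and the scores (2 for a match, -5 for A/G or C/T, -7 otherwise) are an assumption consistent with the examples in this lecture.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def local_dp(top, left, d=-5):
    """Fill the local DP matrix; no entry is allowed to drop below 0."""
    rows, cols = len(left) + 1, len(top) + 1
    # The first row and column stay at 0: an alignment may start anywhere.
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


for row in local_dp("AAG", "AGC"):
    print(row)
```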

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5 and the same substitution matrix as before.

Initialize the first row and column to 0:
      A  A  G
   0  0  0  0
A  0
G  0
C  0

Fill in the remaining entries, replacing any negative value with 0:
      A  A  G
   0  0  0  0
A  0  2  2  0
G  0  0  0  4
C  0  0  0  0

Local alignment
Two differences with respect to global alignment:
– No score is negative.
– Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch.
Local alignment algorithm: Smith-Waterman.

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d = -5.
Traceback starts at the highest score in the filled matrix, 4, and stops at 0, giving the local alignment:

AG
AG

Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d = -5.

      A  A  G
   0  0  0  0
G  0  0  0  2
A  0  2  2  0
A  0  2  4  0
G  0  0  0  6
G  0  0  0  2
C  0  0  0  0

Traceback starts at the highest score, 6, and stops at 0, giving the local alignment:

AAG
AAG
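The local traceback for this example can be sketched as follows: find the highest score, then walk back until a 0 is reached. As before, the function names are hypothetical, the scores (2 for a match, -5 for A/G or C/T, -7 otherwise) are an assumption consistent with the matrix entries shown, and ties are broken in favor of the diagonal move.

```python
def sub_score(a, b):
    """Assumed nucleotide scores: 2 for a match, -5 for A/G or C/T, -7 otherwise."""
    if a == b:
        return 2
    return -5 if {a, b} in ({"A", "G"}, {"C", "T"}) else -7


def local_dp(top, left, d=-5):
    """Fill the local DP matrix; no entry is allowed to drop below 0."""
    F = [[0] * (len(top) + 1) for _ in range(len(left) + 1)]
    for i in range(1, len(left) + 1):
        for j in range(1, len(top) + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


def local_align(top, left, d=-5):
    """Return (score, top alignment, left alignment) for one best local alignment."""
    F = local_dp(top, left, d)
    # Start at the highest score; ties go to the largest row, then column index.
    best, i, j = max(
        (F[r][c], r, c) for r in range(len(F)) for c in range(len(F[0]))
    )
    top_aln, left_aln = [], []
    while F[i][j] > 0:  # stop as soon as a 0 is reached
        if F[i][j] == F[i - 1][j - 1] + sub_score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])
            left_aln.append(left[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_aln.append("-")
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])
            left_aln.append("-")
            j -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(left_aln))


print(local_align("AAG", "GAAGGC"))
```

Because the first row and column are all 0, any cell with a positive score has both a row and a column neighbor, so the loop never indexes outside the sequences.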

Summary
Local alignment finds the best match between subsequences.
Smith-Waterman local alignment algorithm:
– No score is negative.
– Trace back from the largest score in the matrix.

Sample problem #1
Given:
– Two letters
– A substitution matrix written as three columns (letter, letter, value)
A C 0
A D -2
Return:
– The value associated with the two letters
You must store the substitution matrix in memory.
You must account for the fact that the order of the letters doesn’t matter.

Solution outline
Read the substitution matrix into a dictionary.
– Each line of the file has three values.
A C 4
A D -2
– Each line is stored in a dictionary with the first two entries as a tuple key and the last entry as the value.
substitutionMatrix[(letter1, letter2)] = value
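One possible way to carry out this outline is sketched below. It is not the official course solution: read_matrix and lookup are hypothetical names, the values are assumed to be integers, and the matrix is read from any iterable of lines (an open file works, and a list of strings is used here so the example runs on its own).

```python
def read_matrix(lines):
    """Build {(letter1, letter2): value} from lines of 'letter letter value'."""
    matrix = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        letter1, letter2, value = fields
        matrix[(letter1, letter2)] = int(value)
    return matrix


def lookup(matrix, a, b):
    """Return the value for a pair of letters, whichever order it was stored in."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is missing entirely


example_lines = ["A C 4", "A D -2"]
matrix = read_matrix(example_lines)
print(lookup(matrix, "C", "A"))
```

Checking both key orders at lookup time keeps the dictionary the same size as the file. Storing each pair twice when reading would be an equally reasonable choice.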

Sample problem #2
Given:
– A file containing two sequences of equal length (one per line)
– A substitution matrix written as three columns (letter, letter, value)
Return:
– The score of the ungapped alignment between the sequences
Test using input files from class web page.
– Solutions: 1 = 69, 2 = 104, 3 = 153

Solution outline
– Store the substitution matrix in memory, as before.
– Check to be sure the sequences are the same length, and report an error if they are not.
– Use a for loop to traverse both sequences at once.
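The outline above might be sketched as follows. The function names and the toy matrix values are hypothetical and chosen only for illustration. They are not the class input files, so this example does not reproduce the listed solutions.

```python
def lookup(matrix, a, b):
    """Return the value for a pair of letters, whichever order it was stored in."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def ungapped_score(seq1, seq2, matrix):
    """Score two equal-length sequences position by position, with no gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    total = 0
    # zip walks through both sequences at once, one pair of letters per step.
    for a, b in zip(seq1, seq2):
        total += lookup(matrix, a, b)
    return total


# Toy substitution values, assumed for this example only.
toy_matrix = {("A", "A"): 4, ("A", "C"): 0, ("C", "C"): 9}
print(ungapped_score("AAC", "ACC", toy_matrix))
```

Raising ValueError is one way to report the length mismatch. A script run from the command line could instead print a message and exit.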

One-minute response
At the end of each class:
– Write for about one minute.
– Provide feedback about the class.
– Was part of the lecture unclear?
– What did you like about the class?
– Do you have unanswered questions?
I will begin the next class by responding to the one-minute responses.