Sequence comparison: Traceback and local alignment

Sequence comparison: Traceback and local alignment
Genome 559: Introduction to Statistical and Computational Genomics
Prof. William Stafford Noble
Notes from 2009: This lecture is very light on new content, especially because I had basically explained traceback in the last class. On the other hand, few students complained.

Things people liked
I was surprised how poorly I understood, until the formula made it clear.
Like the inclusion of mathematical form of DP.
Great explanation of DP. (x2)
Going through DP matrix by hand was helpful. (x3)
Going through Python programs by hand was also helpful.
Like the review at the beginning of the lecture.
Thank you for the summary slides with operations and methods. (x2)
I like the format of the class.
Examples / practice in class was really good.
In-class exercises were very do-able.
Loved everything.
Level is challenging but not overwhelming.
Already feel I’ve learned a lot.
Python part of the class is good.
Really, really like practice problems.
I like how you take time to address concerns at the start of each class.
Slides were very clear.

Pacing
The lecture was a bit fast but kept me engaged.
Great pace.
Speed was perfect.
Pacing was good for Python problem.
Everything seemed to go at a good pace.
Pace of the class was good.
Much better on timing this time.
Pace was a bit quick for Python part of class.
I found the first part on DP pretty fast moving, slightly confusing.
I thought the programming section went a bit fast.
Be sure each student understands the first problem before moving to the second.

Suggestions and problems
Can you include the solutions to the problems in the notes? Response: Solutions are in the online versions of the slides.
The alignment section is a bit unclear. Response: We will go over this again today.
Black on blue hard to read. Response: Agreed. This had to do with “upgrading” PowerPoint. I have now fixed the problem.
Text a bit small.
Mixed feelings about second part of class. Having laptops out helps us follow along, but it can be distracting, especially when I get stuck on my terrible syntax and the class moves on.
I would like more practice in class. I retain more that way.
2nd exercise could be more clear about dealing with inputs of different lengths.
Maybe more detailed about the algorithm.
Need more time to digest it after the class.

DP matrix
[Figure: completed DP matrix for the global alignment of GAATC and CATAC; the final score is 17.]

Three legal moves
A diagonal move aligns a character from the left sequence with a character from the top sequence.
A vertical move introduces a gap in the sequence along the top edge.
A horizontal move introduces a gap in the sequence along the left edge.
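As a small illustration of the three moves, the sketch below replays a list of moves from the upper left corner and builds the two aligned rows. The function name `apply_moves`, the move codes D, V and H, and the choice of GAATC as the top sequence with CATAC as the left sequence are assumptions made for this example.

```python
def apply_moves(top_seq, left_seq, moves):
    """Replay moves from the upper left corner; return the two aligned rows."""
    top_row, left_row = [], []
    i = j = 0  # i indexes left_seq (rows), j indexes top_seq (columns)
    for move in moves:
        if move == "D":
            # Diagonal: align one character from each sequence.
            top_row.append(top_seq[j])
            left_row.append(left_seq[i])
            i += 1
            j += 1
        elif move == "V":
            # Vertical: gap in the top sequence, consume a left character.
            top_row.append("-")
            left_row.append(left_seq[i])
            i += 1
        elif move == "H":
            # Horizontal: gap in the left sequence, consume a top character.
            top_row.append(top_seq[j])
            left_row.append("-")
            j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    if i != len(left_seq) or j != len(top_seq):
        raise ValueError("moves do not consume both sequences completely")
    return "".join(top_row), "".join(left_row)


print(apply_moves("GAATC", "CATAC", "DDVDHD"))
```

With the move list `DDVDHD`, the two rows come out as GA-ATC and CATA-C.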

DP matrix
GA-ATC
CATA-C
[Figure: traceback path for this alignment through the DP matrix for GAATC and CATAC; final score 17.]

DP matrix
GAAT-C
CA-TAC
[Figure: traceback path for this alignment through the same DP matrix.]

DP matrix
GAAT-C
C-ATAC
[Figure: traceback path for this alignment through the same DP matrix.]

DP matrix
GAAT-C
-CATAC
[Figure: traceback path for this alignment through the same DP matrix.]

Multiple solutions
When a program returns a sequence alignment, it may not be the only best alignment.

GA-ATC
CATA-C

GAAT-C
CA-TAC

GAAT-C
C-ATAC

GAAT-C
-CATAC

DP in equation form
Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.
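Written out with the symbols defined above, a standard form of the global alignment recurrence is the following. It assumes that d is a negative number (as with d=-5 in the examples) and is a reconstruction, not a copy of the slide's equation.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d, \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\end{aligned}
```

The three cases correspond to the diagonal move and the two gap moves.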

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: substitution matrix over A, C, G and T with entries 2, -7 and -5, next to an empty DP matrix for AAG and AGC.]

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: the DP matrix with its first row and first column filled in with -5, -10 and -15.]

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: the completed DP matrix; the filled-in scores include 2, -3, -8 and -1, with -6 in the lower right corner.]
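The matrix for this example can be reproduced with a short sketch. Only the values 2, -7 and -5 of the substitution matrix are recoverable here, so the function `s` below assumes match = 2, transition (A<->G, C<->T) = -5 and transversion = -7. The names `s` and `fill_global`, and the placement of AAG along the top with AGC down the left side, are also assumptions.

```python
PURINES = {"A", "G"}


def s(a, b):
    """Assumed substitution scores: match 2, transition -5, transversion -7."""
    if a == b:
        return 2
    same_class = (a in PURINES) == (b in PURINES)
    return -5 if same_class else -7


def fill_global(top_seq, left_seq, d=-5):
    """Fill the global alignment DP matrix; rows follow left_seq, columns top_seq."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and first column hold accumulated gap penalties.
    for j in range(1, cols):
        F[0][j] = F[0][j - 1] + d
    for i in range(1, rows):
        F[i][0] = F[i - 1][0] + d
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left_seq[i - 1], top_seq[j - 1]),  # diagonal
                F[i - 1][j] + d,  # vertical: gap in the top sequence
                F[i][j - 1] + d,  # horizontal: gap in the left sequence
            )
    return F


for row in fill_global("AAG", "AGC"):
    print(row)
```

Under these assumptions the lower right entry is -6, and the values 2, -3, -8 and -1 appear in the matrix as on the slide.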

Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: traceback arrows through the completed DP matrix, from the lower right corner (score -6) to the upper left.]
The traceback yields two optimal alignments:

AAG-
-AGC

AAG-
A-GC
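The traceback for this example can be sketched in code, following every move that reproduces a cell's score so that all co-optimal alignments are collected. This is an illustrative sketch: the substitution scores (match 2, transition -5, transversion -7) are an assumption consistent with the values 2, -7 and -5 shown on the slide, and the names `s`, `fill_global` and `all_tracebacks` are hypothetical.

```python
PURINES = {"A", "G"}


def s(a, b):
    """Assumed substitution scores: match 2, transition -5, transversion -7."""
    if a == b:
        return 2
    return -5 if (a in PURINES) == (b in PURINES) else -7


def fill_global(top_seq, left_seq, d=-5):
    """Global DP matrix; rows follow left_seq, columns follow top_seq."""
    F = [[0] * (len(top_seq) + 1) for _ in range(len(left_seq) + 1)]
    for j in range(1, len(top_seq) + 1):
        F[0][j] = j * d
    for i in range(1, len(left_seq) + 1):
        F[i][0] = i * d
        for j in range(1, len(top_seq) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(left_seq[i - 1], top_seq[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
    return F


def all_tracebacks(top_seq, left_seq, d=-5):
    """Return every optimal alignment as a (top_row, left_row) pair."""
    F = fill_global(top_seq, left_seq, d)
    results = []

    def walk(i, j, top_rev, left_rev):
        # Rows are built back to front and reversed at the upper left corner.
        if i == 0 and j == 0:
            results.append((top_rev[::-1], left_rev[::-1]))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(left_seq[i - 1], top_seq[j - 1]):
            walk(i - 1, j - 1, top_rev + top_seq[j - 1], left_rev + left_seq[i - 1])
        if i > 0 and F[i][j] == F[i - 1][j] + d:  # vertical: gap in top sequence
            walk(i - 1, j, top_rev + "-", left_rev + left_seq[i - 1])
        if j > 0 and F[i][j] == F[i][j - 1] + d:  # horizontal: gap in left sequence
            walk(i, j - 1, top_rev + top_seq[j - 1], left_rev + "-")

    walk(len(left_seq), len(top_seq), "", "")
    return results


for top_row, left_row in all_tracebacks("AAG", "AGC"):
    print(top_row)
    print(left_row)
    print()
```

A program that follows only one arrow per cell would report just one of these alignments, which is why a returned alignment may not be the only best one.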

Traceback problem #1
Write down the alignment corresponding to the circled score.
[Figure: DP matrix for GAATC and CATAC with one score circled.]

Solution #1
Write down the alignment corresponding to the circled score.
[Figure: traceback from the circled score through the DP matrix for GAATC and CATAC.]

GA
CA

Traceback problem #2
Write down three alignments corresponding to the circled score.
[Figure: DP matrix for GAATC and CATAC with one score circled.]

Solution #2
Write down three alignments corresponding to the circled score.
[Figure: three traceback paths from the circled score through the DP matrix for GAATC and CATAC.]

GAATC
CA---

GAATC
C-A--

GAATC
-CA--

Local alignment
A single-domain protein may be homologous to a region within a multi-domain protein.
Usually, an alignment that spans the complete length of both sequences is not required.

BLAST allows local alignments
[Figure: a global alignment contrasted with a local alignment.]

Global alignment DP
Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

Local alignment DP
Align sequences x and y. F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

Local DP in equation form
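With the same symbols (F, s and d, with d assumed negative), a standard form of the local alignment recurrence adds a zero to the maximum and starts the first row and column at zero. This is a reconstruction, not a copy of the slide's equation.

```latex
\begin{aligned}
F(i,0) &= 0, \qquad F(0,j) = 0, \\
F(i,j) &= \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) + d \\
F(i,\,j-1) + d
\end{cases}
\end{aligned}
```

The extra 0 case lets an alignment start fresh at any cell, so no entry of F is negative.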

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: substitution matrix over A, C, G and T with entries 2, -7 and -5, next to an empty DP matrix for AAG and AGC.]

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: the DP matrix annotated with the values 2, -5 and -5.]

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: the DP matrix with a score of 2 filled in and a question mark in another cell.]

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: the DP matrix with scores 2 and 4 filled in.]

Local alignment
Two differences with respect to global alignment:
No score is negative.
Traceback begins at the highest score in the matrix and continues until you reach 0.
Global alignment algorithm: Needleman-Wunsch.
Local alignment algorithm: Smith-Waterman.
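Both differences show up directly in code. The following is a minimal sketch of Smith-Waterman local alignment with a linear gap penalty, not a reference implementation. The substitution scores (match 2, transition -5, transversion -7) are an assumption consistent with the values 2, -7 and -5 on the slides, and the names `s` and `smith_waterman` are hypothetical. It reports one optimal local alignment, starting from the first cell that reaches the highest score.

```python
PURINES = {"A", "G"}


def s(a, b):
    """Assumed substitution scores: match 2, transition -5, transversion -7."""
    if a == b:
        return 2
    return -5 if (a in PURINES) == (b in PURINES) else -7


def smith_waterman(top_seq, left_seq, d=-5):
    """Return (best_score, top_row, left_row) for one optimal local alignment."""
    rows, cols = len(left_seq) + 1, len(top_seq) + 1
    F = [[0] * cols for _ in range(rows)]  # first row and column stay at 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                0,  # difference 1: no score is negative
                F[i - 1][j - 1] + s(left_seq[i - 1], top_seq[j - 1]),
                F[i - 1][j] + d,
                F[i][j - 1] + d,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Difference 2: start at the highest score and stop on reaching 0.
    i, j = best_pos
    top_row, left_row = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(left_seq[i - 1], top_seq[j - 1]):
            top_row.append(top_seq[j - 1])
            left_row.append(left_seq[i - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            top_row.append("-")
            left_row.append(left_seq[i - 1])
            i -= 1
        else:
            top_row.append(top_seq[j - 1])
            left_row.append("-")
            j -= 1
    return best, "".join(reversed(top_row)), "".join(reversed(left_row))


print(smith_waterman("AAG", "AGC"))
print(smith_waterman("AAG", "GAAGGC"))
```

Under these assumptions, AAG against AGC gives a best score of 4 with AG aligned to AG, and AAG against GAAGGC gives a best score of 6 with AAG aligned to AAG.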

A simple example
Find the optimal local alignment of AAG and AGC. Use a gap penalty of d=-5.
[Figure: traceback from the highest score in the DP matrix, 4.]

AG
AG

Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d=-5.
[Figure: substitution matrix over A, C, G and T with entries 2, -7 and -5, next to an empty DP matrix for AAG and GAAGGC.]

Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d=-5.
[Figure: the DP matrix with scores 2, 4 and 6 filled in.]

Local alignment
Find the optimal local alignment of AAG and GAAGGC. Use a gap penalty of d=-5.
[Figure: traceback from the highest score in the DP matrix, 6.]

AAG
AAG

Summary
Local alignment finds the best match between subsequences.
Smith-Waterman local alignment algorithm:
No score is negative.
Trace back from the largest score in the matrix.