Sequence comparison: Traceback

Presentation transcript:

Sequence comparison: Traceback
Genome 559: Introduction to Statistical and Computational Genomics
Prof. William Stafford Noble

Notes from 2009: This lecture is very light on new content, especially because I had basically explained traceback in the last class. On the other hand, few students complained.

Things people liked
Pacing was good. (x4)
Theory behind the first lecture was very clear.
I like how you started from the beginning and waited for everyone to be on the same page before proceeding.
It’s helpful that the class is interactive.
Overall, the class was very informative.
Easy to understand instructions for people who never programmed before.
I liked that there was an opportunity for us to ask questions while we worked.
I thought the lecture was clear and easy to follow.
I appreciated the introduction to alignment algorithms.
Particularly excited to be able to do some sequence analysis.

Suggestions and problems
The slide on BLOSUM62 was a bit confusing, particularly the “statistics” part (not explained, but some background would be nice).
Would have been useful to spend time talking about the terminal and what it really is.
Some of the concepts around what a directory is may not be clear to everyone.
In Windows it looks like you need to do print(pi) instead of print pi.
Response: This is a difference between Python versions, unrelated to Windows.
Getting Python to work on a PC is a bit confusing. In particular, quitting Python is different under Windows.
I was unclear on how to use Notepad to print hello world in Windows.
Only issue is making sure which editor to have for ??? (unclear word).
Didn’t totally figure out how to get to the working directory.
Not clear how to find Python in Windows.

Other questions
Q: Is Jupyter acceptable or not recommended?
A: It is acceptable, though in some cases you may be asked to write stand-alone programs.
Q: Have the DNA matrices been set, or is there something similar to BLOSUM for nucleotides?
A: There are only two values required for DNA (cost for transition versus transversion). There are default values used by BLAST, for example. I am not sure where those values come from.

DP matrix

          G    A    A    T    C
     0   -4   -8  -12  -16  -20
C   -4   -5   -9  -13  -12   -6
A   -8   -4    5    1   -3   -7
T  -12   -8    1    0   11    7
A  -16  -12    2   11    7    6
C  -20  -16   -2    7   11   17

Three legal moves
A diagonal move aligns a character from the left sequence with a character from the top sequence.
A vertical move introduces a gap in the sequence along the top edge.
A horizontal move introduces a gap in the sequence along the left edge.
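The three moves can be sketched in Python as a small function that turns a path through the matrix into an alignment. This is only an illustration: the function name `apply_moves` and the one-letter move codes (D, V, H) are hypothetical, and the moves are given in forward order, from the upper left corner to the lower right.

```python
def apply_moves(top, left, moves):
    """Build an alignment from a forward path of moves.

    'D' = diagonal, 'V' = vertical, 'H' = horizontal (hypothetical codes).
    `top` is the sequence along the top edge, `left` the one along the left edge.
    """
    i = j = 0  # i counts characters used from `left`, j from `top`
    top_aln, left_aln = [], []
    for move in moves:
        if move == "D":    # align one character from each sequence
            top_aln.append(top[j])
            left_aln.append(left[i])
            i += 1
            j += 1
        elif move == "V":  # step down a row: gap in the top sequence
            top_aln.append("-")
            left_aln.append(left[i])
            i += 1
        elif move == "H":  # step right a column: gap in the left sequence
            top_aln.append(top[j])
            left_aln.append("-")
            j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    if i != len(left) or j != len(top):
        raise ValueError("path does not end in the lower right corner")
    return "".join(top_aln), "".join(left_aln)


top_aln, left_aln = apply_moves("GAATC", "CATAC", "DDVDHD")
print(top_aln)   # GA-ATC
print(left_aln)  # CATA-C
```

Every path from the upper left corner to the lower right corner built from these three moves corresponds to exactly one alignment of the two full sequences.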

DP matrix
GA-ATC
CATA-C

DP matrix
GAAT-C
CA-TAC

DP matrix
GAAT-C
C-ATAC

DP matrix
GAAT-C
-CATAC

Multiple solutions
When a program returns a sequence alignment, it may not be the only best alignment.

GA-ATC    GAAT-C    GAAT-C    GAAT-C
CATA-C    CA-TAC    C-ATAC    -CATAC
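One way to see where the multiple solutions come from is to follow every tied move during traceback instead of picking just one. The sketch below does that for GAATC and CATAC. The scoring values are assumptions, not stated in these slides: 10 for a match, 0 for a transition, -5 for a transversion, and a gap penalty of -4, chosen because they reproduce the final score of 17. The names `assumed_score` and `all_optimal_alignments` are likewise illustrative.

```python
PURINES = {"A", "G"}


def assumed_score(a, b):
    """Assumed scores: 10 match, 0 transition, -5 transversion."""
    if a == b:
        return 10
    if (a in PURINES) == (b in PURINES):
        return 0
    return -5


def all_optimal_alignments(top, left, score, gap):
    """Return (best score, sorted list of every optimal global alignment)."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )

    def walk(i, j):
        # Yield every (top, left) alignment of the prefixes ending at cell (i, j).
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            for t, l in walk(i - 1, j - 1):   # diagonal move
                yield t + top[j - 1], l + left[i - 1]
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            for t, l in walk(i - 1, j):       # vertical move: gap in top
                yield t + "-", l + left[i - 1]
        if j > 0 and F[i][j] == F[i][j - 1] + gap:
            for t, l in walk(i, j - 1):       # horizontal move: gap in left
                yield t + top[j - 1], l + "-"

    return F[-1][-1], sorted(walk(rows - 1, cols - 1))


best, alignments = all_optimal_alignments("GAATC", "CATAC", assumed_score, gap=-4)
print(best)
for top_aln, left_aln in alignments:
    print(top_aln)
    print(left_aln)
    print()
```

Under these assumed scores the sketch finds the same four alignments listed on the slide. The number of tied paths can grow quickly with sequence length, which is one reason programs usually report a single alignment.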

DP in equation form
Align sequences x and y.
F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.
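The equation from this slide is not preserved in the transcript. A standard form of the global alignment recurrence that matches the definitions above is given here, assuming that d is a negative number (as in the d=-5 example) and is therefore added, and that i indexes x and j indexes y:

```latex
F(i,j) = \max
\begin{cases}
  F(i-1,\,j-1) + s(x_i, y_j) & \text{(diagonal move)} \\
  F(i-1,\,j) + d             & \text{(gap in one sequence)} \\
  F(i,\,j-1) + d             & \text{(gap in the other sequence)}
\end{cases}
\qquad
F(0,0) = 0, \quad F(i,0) = i\,d, \quad F(0,j) = j\,d
```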

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.

Substitution matrix:

      A    C    G    T
A     2   -7   -5   -7
C    -7    2   -7   -5
G    -5   -7    2   -7
T    -7   -5   -7    2

DP matrix after filling in the first row and column:

           A    A    G
      0   -5  -10  -15
A    -5
G   -10
C   -15

Completed DP matrix:

           A    A    G
      0   -5  -10  -15
A    -5    2   -3   -8
G   -10   -3   -3   -1
C   -15   -8   -8   -6
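The fill step for this example can be sketched in Python. This is a minimal sketch rather than the course's own code: the names `score` and `fill_matrix` are illustrative, and the rule used for mismatches (transitions score -5, transversions score -7) is an assumption read off the substitution values 2, -7 and -5 shown on the slide. AAG runs along the top and AGC down the left.

```python
PURINES = {"A", "G"}


def score(a, b):
    """Substitution score: 2 match, -5 transition, -7 transversion (assumed rule)."""
    if a == b:
        return 2
    if (a in PURINES) == (b in PURINES):
        return -5
    return -7


def fill_matrix(top, left, gap):
    """Fill the global-alignment DP matrix; rows follow `left`, columns follow `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):      # first column: only gaps so far
        F[i][0] = i * gap
    for j in range(1, cols):      # first row: only gaps so far
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),  # diagonal
                F[i - 1][j] + gap,                                 # vertical
                F[i][j - 1] + gap,                                 # horizontal
            )
    return F


for row in fill_matrix("AAG", "AGC", gap=-5):
    print(row)
```

The last value printed, in the lower right corner, is the optimal alignment score of -6.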

Traceback
Start from the lower right corner and trace back to the upper left.
Each arrow introduces one character at the end of each aligned sequence.
A horizontal move puts a gap in the left sequence.
A vertical move puts a gap in the top sequence.
A diagonal move uses one character from each sequence.
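The traceback rules above can be sketched as follows. Instead of storing arrows, this version rechecks at each cell which move could have produced the score, which is equivalent. The function names are illustrative, the transition/transversion scoring rule is an assumption based on the values 2, -7 and -5 used in the simple example, and the tie-breaking order (diagonal, then vertical, then horizontal) is an arbitrary choice, so only one of the optimal alignments is returned.

```python
PURINES = {"A", "G"}


def score(a, b):
    """Substitution score: 2 match, -5 transition, -7 transversion (assumed rule)."""
    if a == b:
        return 2
    if (a in PURINES) == (b in PURINES):
        return -5
    return -7


def fill_matrix(top, left, gap):
    """Fill the DP matrix; rows follow `left`, columns follow `top`."""
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + score(left[i - 1], top[j - 1]),
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
    return F


def traceback(top, left, gap):
    """Return one optimal alignment as (aligned top, aligned left)."""
    F = fill_matrix(top, left, gap)
    i, j = len(left), len(top)            # start in the lower right corner
    top_aln, left_aln = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(left[i - 1], top[j - 1]):
            top_aln.append(top[j - 1])    # diagonal: one character from each
            left_aln.append(left[i - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top_aln.append("-")           # vertical: gap in the top sequence
            left_aln.append(left[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])    # horizontal: gap in the left sequence
            left_aln.append("-")
            j -= 1
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(top_aln)), "".join(reversed(left_aln))


top_aln, left_aln = traceback("AAG", "AGC", gap=-5)
print(top_aln)   # AAG-
print(left_aln)  # -AGC
```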

A simple example
Find the optimal alignment of AAG and AGC. Use a gap penalty of d=-5.

Tracing back from the lower right corner (score -6) to the upper left gives two optimal alignments:

AAG-    AAG-
-AGC    A-GC
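As a quick check, each alignment can be scored column by column and compared with the value in the lower right corner of the matrix. The sketch below assumes the same scoring as the example (2 for a match, -5 for a transition, -7 for a transversion, gap penalty d=-5); the function names are illustrative.

```python
PURINES = {"A", "G"}


def score(a, b):
    """Substitution score: 2 match, -5 transition, -7 transversion (assumed rule)."""
    if a == b:
        return 2
    if (a in PURINES) == (b in PURINES):
        return -5
    return -7


def alignment_score(top_aln, left_aln, gap=-5):
    """Sum the column scores of a gapped alignment with a linear gap penalty."""
    if len(top_aln) != len(left_aln):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top_aln, left_aln):
        if a == "-" or b == "-":
            total += gap          # a column containing a gap
        else:
            total += score(a, b)  # a column pairing two characters
    return total


print(alignment_score("AAG-", "-AGC"))  # -6
print(alignment_score("AAG-", "A-GC"))  # -6
```

Both alignments have two matches and two gaps, so they score 2 + 2 - 5 - 5 = -6.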

Traceback problem #1
Write down the alignment corresponding to the circled score.
Circled score: 5, in the row of the first A of CATAC and the column of the first A of GAATC.

Solution #1
GA
CA

Traceback problem #2
Write down three alignments corresponding to the circled score.
Circled score: -7, in the row of the first A of CATAC and the column of the final C of GAATC.

Solution #2
GAATC    GAATC    GAATC
CA---    C-A--    -CA--