Sequence Alignments with Indels
Evolution produces insertions and deletions (indels)
– In addition to substitutions
Good example: MHHNALQRRTVWVNAY aligned with MHHALQRRTVWVNAY
MHHNALQRRTVWVNAY
MHHALQRRTVWVNAY-    BLOSUM score = 2 (end gap = -6)
MHHNALQRRTVWVNAY
MHH-ALQRRTVWVNAY    Score = 79 (gap = -6)
An alignment must have aligned sequences of equal length
– So we must add gaps, including at the start and the ends
Finding the best indel solution is a combinatorially difficult problem
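The two scores on this slide can be checked by summing substitution scores column by column. The sketch below assumes BLOSUM62 with a flat penalty of -6 per gap position; the small table holds only the pairs these sequences need, and the names `BLOSUM62_SUBSET`, `sub_score` and `score_alignment` are illustrative, not from the slides.

```python
# Subset of BLOSUM62 covering only the residue pairs used below.
# Assumed values: check against a full BLOSUM62 table before reuse.
BLOSUM62_SUBSET = {
    ("M", "M"): 5, ("H", "H"): 8, ("A", "A"): 4, ("L", "L"): 4,
    ("Q", "Q"): 5, ("R", "R"): 5, ("T", "T"): 5, ("V", "V"): 4,
    ("W", "W"): 11, ("N", "N"): 6, ("Y", "Y"): 7,
    ("N", "A"): -2, ("A", "L"): -1, ("L", "Q"): -2, ("Q", "R"): 1,
    ("R", "T"): -1, ("T", "V"): 0, ("V", "W"): -3, ("V", "N"): -3,
    ("A", "Y"): -2,
}


def sub_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def score_alignment(top, bottom, gap=-6):
    """Sum the column scores of two equal-length aligned strings ('-' is a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        total += gap if "-" in (a, b) else sub_score(a, b)
    return total


print(score_alignment("MHHNALQRRTVWVNAY", "MHHALQRRTVWVNAY-"))  # 2
print(score_alignment("MHHNALQRRTVWVNAY", "MHH-ALQRRTVWVNAY"))  # 79
```

Moving the single gap from the end to position 4 lets every remaining residue line up with an identical partner, which is where the jump from 2 to 79 comes from.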

Gap
So far we have ignored gaps
A gap corresponds to an insertion or a deletion of a residue
Conventional wisdom holds that the penalty for a gap must be several times greater than the penalty for a mutation, because a gap or extra residue
– Interrupts the entire polymer chain
– In DNA, shifts the reading frame (unless its length is a multiple of three)

Gap Penalties
Gaps are penalised
– Write w_x for the penalty for a gap of length x
– For example, if each gap position scores -6, then w_x = -6x
One common scheme is
– Score -12 for opening a gap
– And -2 for every subsequent position in the same gap
– i.e., w_x = -12 - 2(x-1)
Start and end gap penalties are often set to zero
– But this can cast doubt on evolutionary conclusions
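As a minimal sketch, the two penalty schemes can be written as functions of the gap length x. The function names and default values are illustrative; the numbers are the ones quoted on the slide.

```python
def linear_gap(x, per_position=-6):
    """Linear scheme: every gap position costs the same."""
    return per_position * x


def affine_gap(x, open_penalty=-12, extend_penalty=-2):
    """Affine scheme: one charge to open a gap, a smaller one per extra position."""
    if x <= 0:
        return 0
    return open_penalty + extend_penalty * (x - 1)


for x in (1, 2, 3):
    print(x, linear_gap(x), affine_gap(x))
```

With these numbers a long gap is cheaper under the affine scheme than under the linear one (for example, x = 5 costs -20 instead of -30), which reflects the idea that one long indel is a single evolutionary event.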

Dot Matrix Representations (Dotplots)
To help visualise best alignments
Plot a dot wherever a pair of residues is the same, then draw the best line
[Figure: two dotplots of MNALSQLN (horizontal) against NALMSQNH (vertical)]
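A text version of the dotplot for the two sequences on this slide can be produced with a few lines. This is only a sketch: the function name `dotplot` and the choice of `*` and `.` as symbols are assumptions for illustration.

```python
def dotplot(horizontal, vertical):
    """Return one string per vertical residue: '*' where the residues are identical."""
    return [
        "".join("*" if h == v else "." for h in horizontal)
        for v in vertical
    ]


rows = dotplot("MNALSQLN", "NALMSQNH")
print("  " + "MNALSQLN")
for residue, row in zip("NALMSQNH", rows):
    print(residue, row)
```

The diagonal runs of `*` are the candidate stretches of aligned residues; a step sideways or downwards between runs corresponds to a gap.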

Getting Alignments from Dotplot Paths
[Figure: dotplot of MNALSQLN (horizontal) against NALMSQNH (vertical) with the path marked. Callouts: one step indicates that M matches with a gap, another that L matches with a gap]
Stage 1:
– Align the middle
– Use triangles to indicate gaps
NAL-SQLN
NALMSQ-N
Stage 2:
– Sort the ends out
MNAL-SQLN-
-NALMSQ-NH

Dotplots for Real Proteins
Need a way to automatically find the best path(s)

Dynamic Programming Approach
BLAST is quick
– But not guaranteed to find the best alignment
– Gapped BLAST handles indels, but still with no guarantee
Dynamic programming:
– For global alignment, this is the Needleman-Wunsch algorithm
Can use it to draw the dotplot paths
– From that we can get the alignment
Mathematically guaranteed
– To find the best-scoring alignment
– Given a substitution scheme (scoring scheme, e.g., BLOSUM)
– And given a gap penalty

The Needleman-Wunsch algorithm
A smart way to reduce the massive number of possibilities that need to be considered while still guaranteeing that the best solution will be found (Saul Needleman and Christian Wunsch, 1970).
The basic idea is to build up the best alignment by using optimal alignments of smaller subsequences.
The Needleman-Wunsch algorithm is an example of dynamic programming, a discipline invented by Richard Bellman (an American mathematician) in 1953!

Dynamic Programming
A divide-and-conquer strategy:
– Break the problem into smaller subproblems.
– Solve the smaller problems optimally.
– Use the subproblem solutions to construct an optimal solution for the original problem.
Dynamic programming can be applied only to problems with overlapping subproblems whose optimal solutions combine into an optimal overall solution (optimal substructure).
Examples include
– Travelling salesman problem
– Finding the best chess move

Overview of Needleman-Wunsch
Four stages
1. Initialise a matrix for the sequences
2. Fill in the entries of that matrix (call these S_{i,j}), drawing arrows in the matrix at the same time
3. Use the arrows to find the best scoring path(s)
4. Interpret the paths as alignments, as before
Illustrate with: MNALQM & NALMSQA

Stage 1: Initialising the Matrix
Draw the grid
Put in increasing gap penalties
Then put in BLOSUM scores
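The initialisation step can be sketched as follows, assuming a flat penalty of -6 per gap position and an extra leading row and column for "nothing aligned yet". The function name `init_matrix` and the row/column layout are illustrative choices, and the BLOSUM scores that the slide writes into each cell are left out of this sketch.

```python
def init_matrix(horizontal, vertical, gap=-6):
    """Build the grid with increasing gap penalties along the first row and column."""
    n_rows = len(vertical) + 1
    n_cols = len(horizontal) + 1
    grid = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        grid[0][col] = gap * col   # leading gaps in the vertical sequence
    for row in range(n_rows):
        grid[row][0] = gap * row   # leading gaps in the horizontal sequence
    return grid


grid = init_matrix("MNALQM", "NALMSQA")
print(grid[0])                     # first row
print([row[0] for row in grid])    # first column
```

All interior cells start at 0 here only as placeholders; they are overwritten in Stage 2.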

Stage 2: Putting Scores and Arrows in
Put the score in
Draw the arrow

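Mathematically, we are calculating (in the standard form, with a fixed penalty w_1 for each gap position, e.g., -6):
$$S_{i,j} = \max\{\, S_{i-1,j-1} + s(a_i, b_j),\; S_{i-1,j} + w_1,\; S_{i,j-1} + w_1 \,\}$$
Where:
– S_{i,j} is the matrix entry at (i,j) [the one we want to fill in]
– S_{i-1,j-1} is above and to the left of this
– s(a_i, b_j) is the BLOSUM score for the i-th residue from the horizontal sequence and the j-th residue from the vertical sequence (i.e., just the scores we have written in brackets)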
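For a single cell, the calculation is a three-way choice: come diagonally and add the substitution score, or come from above or from the left and pay a gap penalty. The sketch below assumes a flat gap penalty of -6; the function name `fill_cell` and the arrow labels are hypothetical, chosen for illustration.

```python
def fill_cell(diag, up, left, sub, gap=-6):
    """Return the best score for one cell and the arrow(s) that achieve it.

    diag, up, left: already-filled neighbouring entries
    sub: substitution (e.g. BLOSUM) score for this cell's residue pair
    """
    candidates = [
        (diag + sub, "diag"),   # align the two residues
        (up + gap, "up"),       # gap in one sequence
        (left + gap, "left"),   # gap in the other sequence
    ]
    best = max(score for score, _ in candidates)
    arrows = [name for score, name in candidates if score == best]
    return best, arrows


print(fill_cell(diag=0, up=-6, left=-6, sub=6))
```

Returning every arrow that ties for the best score is what allows more than one best path to be recovered later.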

This diagram might help:

Fill in the next row and column

A Close up View

Continue filling in the S_{i,j} entries

Stage 3: Finding the best path
Scores S_{i,j} in the matrix
– Are the BLOSUM scores for alignments
However!
– We must take into account final gap penalties
Look down the final column and along the final row
– Find the highest scoring number
– Remembering to take off the gap penalty the correct number of times

Finding the best path

So, the best path is:

Stage 4: Generating the Alignment
Firstly, draw the dotplot

Secondly, Generate the Alignment
Using the technique previously mentioned
– This path gives us an alignment with three gaps
MNAL--QM
-NALMSQA
S = 0
You should check that you get the same score
– As on the diagram
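The four stages can be put together in one short sketch. It assumes BLOSUM62 scores (only the pairs needed for MNALQM and NALMSQA are listed) and a flat penalty of -6 per gap position, including end gaps; the names `PAIRS`, `s` and `needleman_wunsch` are illustrative. Arrows are not stored; the traceback rediscovers them by re-checking which neighbour produced each score.

```python
# Assumed subset of BLOSUM62 for the residues in MNALQM and NALMSQA.
PAIRS = {
    "AA": 4, "AL": -1, "AM": -1, "AN": -2, "AQ": -1, "AS": 1,
    "LL": 4, "LM": 2, "LN": -3, "LQ": -2, "LS": -2,
    "MM": 5, "MN": -2, "MQ": 0, "MS": -1,
    "NN": 6, "NQ": 0, "NS": 1,
    "QQ": 5, "QS": 0, "SS": 4,
}


def s(a, b):
    """Symmetric substitution score lookup."""
    return PAIRS["".join(sorted((a, b)))]


def needleman_wunsch(x, y, gap=-6):
    """Global alignment of x and y; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    # Stage 1: grid with increasing gap penalties on the edges.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    # Stage 2: fill in every entry from its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Stages 3 and 4: walk back from the final cell, emitting columns.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("MNALQM", "NALMSQA"))  # (0, 'MNAL--QM', '-NALMSQA')
```

Under these assumptions the score breaks down as 6 + 4 + 4 + 5 - 1 = 18 from the aligned pairs, minus 3 × 6 for the three gaps, which gives 0.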

Other Alignments
MNALQ-M-
-NALMSQA (score = -4)
MNALQM--
-NALMSQA (score = -5)

Smith-Waterman Alterations
To make the algorithm find the best local alignments
Adjustments only to the scoring scheme for S_{i,j}:
– The scoring scheme must include some negative scores for mismatches
– When S_{i,j} becomes negative, set it to zero, so local paths are not penalised for earlier bad routes
To find the best local alignment
– Find the highest scoring matrix position (anywhere)
– And work backwards until a zero is reached
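A minimal sketch of these alterations, under the same assumptions as the global example (BLOSUM62 values for the residues involved, flat gap penalty of -6). The names `PAIRS`, `s` and `smith_waterman` are illustrative, and when several cells tie for the highest score this version simply takes the first one it meets.

```python
# Assumed subset of BLOSUM62 for the residues in MNALQM and NALMSQA.
PAIRS = {
    "AA": 4, "AL": -1, "AM": -1, "AN": -2, "AQ": -1, "AS": 1,
    "LL": 4, "LM": 2, "LN": -3, "LQ": -2, "LS": -2,
    "MM": 5, "MN": -2, "MQ": 0, "MS": -1,
    "NN": 6, "NQ": 0, "NS": 1,
    "QQ": 5, "QS": 0, "SS": 4,
}


def s(a, b):
    """Symmetric substitution score lookup."""
    return PAIRS["".join(sorted((a, b)))]


def smith_waterman(x, y, gap=-6):
    """Local alignment of x and y; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    # First row and column stay at zero: a local path may start anywhere.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Same three choices as before, but never drop below zero.
            S[i][j] = max(0,
                          S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    # Work backwards from the highest entry until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("MNALQM", "NALMSQA"))  # (14, 'NAL', 'NAL')
```

For the example pair the best local alignment is the shared NAL stretch: extending it to reach the Q-Q match would cost two gap positions (-12) for a gain of only 5.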

Local and Global Alignments
Needleman & Wunsch: best global alignments
Smith & Waterman: best local alignments
For illustration purposes only
– Calculations done slightly differently (don't worry)