Alineamiento Matricial (Harr Plot, Matrix Plot, Dot Plot, Dot Matrix)

Slides:



Advertisements
Similar presentations
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Advertisements

Computational Biology, Part 7 Similarity Functions and Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Using Sparse Matrix Reordering Algorithms for Cluster Identification Chris Mueller Dec 9, 2004.
Graphical comparison of sequences using “Dotplots”. ACCTGCCCTGTCCAGCTTACATGCATGCTTATAGGGGCATTTTACAT ACCTGCCGATTCCATATTACGCATGCTTCTGGGTTACCGTTCAGGGCATTTTACATGTGCTG.
Measuring the degree of similarity: PAM and blosum Matrix
1 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES.
DNA sequences alignment measurement
Lecture 8 Alignment of pairs of sequence Local and global alignment
Sequence Similarity Searching Class 4 March 2010.
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Sequence Comparison Intragenic - self to self. -find internal repeating units. Intergenic -compare two different sequences. Dotplot - visual alignment.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Pairwise Alignment Global & local alignment Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis.
Computational Biology, Part 2 Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Lecture invitation AI, 14:30 (we will finish earlier this day, at 14:20) How Lab IT Accelerates Pharmaceutical Research For more information,
Sequencing a genome and Basic Sequence Alignment
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Special Topics BSC5936: An Introduction to Bioinformatics. Florida State University The Department of Biological Science Sept. 9, 2003.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Presented by Liu Qi Pairwise Sequence Alignment. Presented By Liu Qi Why align sequences? Functional predictions based on identifying homologues. Assumes:
© Wiley Publishing All Rights Reserved.
Inferring function by homology The fact that functionally important aspects of sequences are conserved across evolutionary time allows us to find, by homology.
Sequence Analysis Alignments dot-plots scoring scheme Substitution matrices Search algorithms (BLAST)
Pairwise & Multiple sequence alignments
Space-Efficient Sequence Alignment Space-Efficient Sequence Alignment Bioinformatics 202 University of California, San Diego Lecture Notes No. 7 Dr. Pavel.
Introduction to Bioinformatics Dot Plots. One of the simplest and oldest methods for sequence alignment Visualization of regions of similarity –Assign.
Assessment of sequence alignment Lecture Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.
Pairwise Alignment, Part I Constructing the Values and Directions Tables from 2 related DNA (or Protein) Sequences.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Computational Biology, Part 3 Sequence Alignment Robert F. Murphy Copyright  1996, All rights reserved.
Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology STAT115 Lab 3 PART I Homework Q8 The Dot Matrix Method.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics.
Sequencing a genome and Basic Sequence Alignment
Lecture 9 CS5661 RNA – The “REAL nucleic acid” Motivation Concepts Structural prediction –Dot-matrix –Dynamic programming Simple cost model Energy cost.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Improving the prediction of RNA secondary structure by detecting and assessing conserved stems Xiaoyong Fang, et al.
Basic terms:  Similarity - measurable quantity. Similarity- applied to proteins using concept of conservative substitutions Similarity- applied to proteins.
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Sequence Alignments with Indels Evolution produces insertions and deletions (indels) – In addition to substitutions Good example: MHHNALQRRTVWVNAY MHHALQRRTVWVNAY-
Learning Objectives for Section 4.5 Inverse of a Square Matrix
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Nucleic Acid Secondarily
Sequence comparisons April 9, 2002 Review homework Learning objectives-Review amino acids. Understand difference between identity, similarity and homology.
Step 3: Tools Database Searching
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
Last lecture summary. identity vs. similarity homology vs. similarity gap penalty affine gap penalty gap penalty high fewer gaps, if investigating related.
Day 7 Carlow Bioinformatics Aligning sequences. What is an alignment? CENTRAL concept in bioinformatics Easy if straight-forward, similar seqs –THISTHESAME.
DNA sequences alignment measurement Lecture 13. Introduction Measurement of “strength” alignment Nucleic acid and amino acid substitutions Measurement.
Last lecture summary.
1 4. Nucleic acids and proteins in one and more dimensions - second part.
 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.
Sequence Analysis Topics to be covered What is sequence analysis? Why we need it? How we do it? What are the tools available?
Linear Algebra Review.
Bioinformatics for Research
Sequence comparison: Local alignment
Dot Plots Dot Plots provide a graphic view of the amount of similarity between two sequences. The two axes represent the two sequences. In its simplest.
Special Topics BSC5936: An Introduction to Bioinformatics
Sorting "There's nothing in your head the sorting hat can't see. So try me on and I will tell you where you ought to be." -The Sorting Hat, Harry Potter.
Sequence Analysis Alan Christoffels
It is the presentation about the overview of DOT MATRIX and GAP PENALITY..
Presentation transcript:

Alineamiento Matricial (Harr Plot, Matrix Plot, Dot Plot, Dot Matrix)

Dot-matrix Alignment Mount Bioinformatics Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001)

Similarity ≠ Homology, 1) 25% similarity ≥ 100 AAs is likely homology 2) Homology is an evolutionary statement which means “descent from a common ancestor” –common 3D structure –usually common function –all or nothing, cannot say "50% homologous"

Similarity is Based on Dot Plots 1) two sequences on vertical and horizontal axes of graph 2) put dots wherever there is a match 3) diagonal line is region of identity (local alignment) 4) apply a window filter - look at a group of bases, must meet % identity to get a dot

Similitud de dos secuencias ATGCTAGCACGCGTGCGCAA AGGCAAGCGCTCGTGCGTAA %Identidad = 15/20 = 75% Son similares? Depende del punto de corte para el % de identidad escogido

Similitud de dos secuencias ATGCTAGCACGCGTGCGCAA AGGCAAGCGCTCGTGCGTAA identity = 75% Tamaño de ventana (window size) = 20 Nivel de restricción (stringency)= ? Si‘stringency’ son similares Si‘stringency’ >15 ( No son similares

Dot-matrix Alignment Dot-matrix Alignment GCTTAGGCTGA AGGCTGAACTA GCTTAGGCTGA AMM GMMMM GMMM CMM TMMM GMMM AMM AMM CMM TMMM AMM Window = 1 Stringency = 1

Dot-matrix Alignment Dot-matrix Alignment GCTTAGGCTGA AGGCTGAACTA GCTTAGGCTGA AMM GMMMM GMMM CMM TMMM GMMM AMM AMM CMM TMMM AMM Window = 1 Stringency = 1

Dot-matrix Alignment Dot-matrix Alignment GCTTAGGCTGA AGGCTGAACTA GCTTAGGCTGA AMM GMMMM GMMM CMM TMMM GMMM AMM AMM CMM TMMM AMM Window = 1 Stringency = 1

Since this is a comparison between two of the same sequences, an intrasequence comparison, the most obvious feature is the main identity diagonal. Two short perfect palindromes can also be seen as crosses directly off the main diagonal; they are “ANA” and “SIS.” Window = 1 Stringency = 1

Since this is a comparison between two of the same sequences, an intrasequence comparison, the most obvious feature is the main identity diagonal. Two short perfect palindromes can also be seen as crosses directly off the main diagonal; they are “ANA” and “SIS.” Window = 1 Stringency = 1

Dot-matrix Alignment GCTTAGGCTGA A G G C T G A A C T A Window = 3 Stringency = 3

The only remaining dots indicate the two runs of identity between the two sequences; however, any indication of the palindrome, “ANA” has been lost. This is because our filtering approach was too stringent to catch such a short element. In general you need to make your window about the same size as the element you are attempting to locate. In the case of our palindrome, “AN” and “NA”’ are the inverted repeat sequences and since our window was set to three, we will not be able to see an element only two letters long. Had we set our stringency filter to one in a window of two, then these would be visible. The Wisconsin Package’s implementation of dot matrix analysis, the paired programs Compare and DotPlot use the window/stringency method by default.

Dot plot of real data

Another phenomenon that is very easy to visualize with dot matrix analysis are duplications or direct repeats. The ‘duplication’ here is seen as a distinct column of diagonals; whenever you see either a row or column of diagonals in a dotplot, you are looking at direct repeats. Window = 1 Stringency = 1

Now consider the more complicated ‘mutation’ in the following comparison: Again, notice the diagonals. However, they have now been displaced off of the center diagonal of the plot and, in fact, in this example, show the occurrence of a ‘transposition.’ Dot matrix analysis is one of the only sensible ways to locate such transpositions in sequences. Inverted repeats still show up as perpendicular lines to the diagonals, they are just now not on the center of the plot. The ‘deletion’ of ‘PRIMER’ is shown by the lack of a corresponding diagonal. Window = 1 Stringency = 1

Reconsider the same plot. Notice the extraneous dots that neither indicate runs of identity between the two sequences nor inverted repeats. These merely contribute ‘noise’ to the plot and are due to the ‘random’ occurrence of the letters in the sequences, the composition of the sequences themselves. How can we ‘clean up’ the plots so that this noise does not detract from our interpretations? Consider the implementation of a filtered windowing approach; a dot will only be placed if some ‘stringency’ is met. What is meant by this is that if within some defined window size, and when some defined criteria is met, then and only then, will a dot be placed at the middle of that window. Then the window is shifted one position and the entire process is repeated. This very successfully rids the plot of unwanted noise. In the next plot a window of size three and a stringency of two was used to considerably improve the signal to noise ratio (remember, I am using a 1:0 identity scoring function). Filtered Windowing —

Default RNA self comparison (Phe tRNA) (window of 21 and stringency of 14) —

window size to 7 stringency value to 5 Several direct repeats are now obvious that remained obscured in the previous analysis.

22 GAGCGCCAGACT G 12, 22 || | ||||| | A 48 CTGGAGGTCTAG A 3 Base position 22 through position 33 base pairs with (think — is quite similar to the reverse- complement of) itself from base position 37 through position 48. MFold, Zuker’s RNA folding algorithm uses base pairing energies to find the family of optimal and suboptimal structures; the most stable structure found is shown to possess a stem at positions 27 to 31 with 39 to 43. However the region around position 38 is represented as a loop. The actual modeled structure as seen in PDB’s 1TRA shows ‘reality’ lies somewhere in between.

RNA comparisons of the reverse, complement of a sequence to itself can often be very informative. Here the yeast tRNA sequence is compared to its reverse, complement using the same 5 out of 7 stringency setting as previously. The stem-loop, inverted repeats of the tRNA clover-leaf molecular shape become obvious.

That same region ‘zoomed in on’ has some small direct repeats seen by comparing the sequence against itself without reversal:

But looking at the same region of the sequence against its reverse- complement shows a wealth of potential stem-loop structure in the transfer RNA:

Conclusion: Dot-matrix Alignment Strengths:  Simple  All possible matches generated  Can identify repeated sequence elements  Often provides a starting point for other alignment algorithms Weaknesses:  Noise level can be high  Cannot discriminate optimal from suboptimal alignments  Doesn’t handle gaps well