Assessment of sequence alignment Lecture 10 1. Introduction The Dot plot Matrix visualisation matching tool: – Basics of Dot plot – Examples of Dot plot.


Assessment of sequence alignment: Lecture 10

Introduction
The Dot plot Matrix visualisation matching tool:
– Basics of Dot plot
– Examples of Dot plot matching 2 sequences
– Tandem repeats: self matching
– Inverted repeats: genetic palindromes

Sequence alignment analysis
In order to measure the degree of similarity between sequences, they must first be aligned to maximise the matching score.

Example 1 (no gaps):
I am from Cork
I am not from Cork
****
(4 matches out of 18; based on length of bottom string)

Example 2 (gap inserted in the top string):
I am---- from Cork
I am not from Cork
****    **********
(14 matches out of 18; based on length of bottom string)
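The matching score in Example 2 can be checked with a short sketch. It assumes the gap character is "-", that the gapped top string is written so that both strings have the same length, and that spaces count as ordinary characters. The function name count_matches is hypothetical, chosen only for illustration.

```python
def count_matches(top: str, bottom: str, gap: str = "-") -> int:
    """Count positions where two aligned strings carry the same character.

    Both strings must already be aligned (same length); gap characters
    never count as a match.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    return sum(1 for a, b in zip(top, bottom) if a == b and a != gap)


# Example 2 from the slide: a four-character gap lines up "from Cork".
top = "I am---- from Cork"
bottom = "I am not from Cork"
score = count_matches(top, bottom)
print(f"{score} matches out of {len(bottom)}")
```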

The Dot plot
A better way of doing this is to represent the two sequences as a table or matrix, where one sequence labels the rows and the other the columns. The Dot plot Matrix is a visual way of seeing the alignment between two sequences:
– The first sequence (query sequence) represents the rows and the other sequence (subject sequence) represents the columns.
– Every cell (row/column pair) is checked for a match and, if there is one, the cell is marked.
– This will show all areas of both sequences where matches occur.
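The construction described above can be sketched in a few lines. This is a minimal illustration, not a production tool: the names dot_plot and render are hypothetical, a match is taken to mean exact character identity, and the two short sequences are made up for the example.

```python
def dot_plot(query: str, subject: str) -> list[list[bool]]:
    """Return a matrix with one row per query character and one column
    per subject character; a cell is True where the characters match."""
    return [[q == s for s in subject] for q in query]


def render(query: str, subject: str) -> str:
    """Draw the matrix as text: '*' marks a match, '.' marks a mismatch."""
    lines = ["  " + subject]
    for q, row in zip(query, dot_plot(query, subject)):
        lines.append(q + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Hypothetical sequences: the shared run "ACGT" appears as a diagonal.
print(render("GACGTA", "TACGTC"))
```

Runs of marked cells along a diagonal correspond to stretches where the two sequences agree position by position.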

Dot plot
Consider the following:
– Diagonal lines represent alignments (matches)
– Horizontal lines between aligned sequences indicate that gaps are required (where a gap indicates a deletion/insertion)
– This has four “potential” aligned sequences:
  – D->Y
  – H->N
  – R->0
  – 0->H
Longest sequences of alignments are:
– D->Y; and H->Y
Do you think, assuming this represented DNA/AA sequences, that “gaps” should be used to join these sequences?
(Adapted from Lesk 2008.)

Dot plot Matrix
This allows us to visualise areas of local alignment as opposed to global alignment. One of its main purposes is to find domains/motifs that match. This could be useful for many reasons, e.g. finding a promoter factor binding site. Can you think of any others (refer to previous lectures)?

Dot plot: the previous examples
Klug 7th ed., p. 403: there are sample DNA sequences which we will match via the BLAST program on the NCBI website.
– Go to the “exploring genomics” part of the book's website and cut and paste the sequences into the query and subject “windows”
– You must ensure that you set the search low [discussed in next lecture]
– Run the BLAST program [a tool that aligns and measures the alignment (discussed in next lecture)]

Dot plot: example 1 (refer to saved web page)

Dot plot: example 1

Dot plot: example 2

Sequence matching: example 2

Dot plot for tandem repeats
The human genome has many tandem repeats: small sequences of nucleic acids (bases) or amino acids that are repeated. They are ubiquitous in genomes and can comprise 50% of the genome (Richard 2008).
– They can be used as genealogical markers
– They can be used to determine specific regions of interest, e.g. introns
– They play a significant part in evolution (Gemayel 2010)
An example of a protein with multiple repeats is human mucin (Baxevanis 2005, p. 297).

Dot plot of tandem repeats

Tandem repeat as a sequence
Tandem repeat 1:
ABRACADABRACADABRA
ABRACADABRACADABRA
Tandem repeat 2:
ABRACADABRACADABRA
ABRACADABRACADABRA

Tandem repeat dot plot
To determine whether there are tandem repeats, the sequence is compared with itself (refer to table 1):
– The more diagonals, the more repeats
– The diagonals at the bottom left compare the start with the finish
– The main diagonal is present because both sequences are the same
– The lines are symmetrical around the main diagonal
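The self-comparison can be sketched as follows, using the ABRACADABRACADABRA string from the earlier slide. Each diagonal of the self dot plot corresponds to an offset between the two copies of the sequence, and the sketch reports the offsets whose diagonal contains a long unbroken run of matches. The function names and the min_run threshold of 4 are assumptions made for illustration; only offsets of zero or more are scanned because the plot is symmetrical around the main diagonal.

```python
def longest_run_on_diagonal(seq: str, offset: int) -> int:
    """Length of the longest unbroken run of matches on the diagonal
    that compares seq[i] with seq[i + offset]."""
    best = run = 0
    for i in range(len(seq) - offset):
        if seq[i] == seq[i + offset]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def repeat_offsets(seq: str, min_run: int = 4) -> list[int]:
    """Offsets whose diagonal holds a run of at least min_run matches.
    Offset 0 is the main diagonal (the sequence matched with itself)."""
    return [k for k in range(len(seq))
            if longest_run_on_diagonal(seq, k) >= min_run]


seq = "ABRACADABRACADABRA"
print(repeat_offsets(seq))  # main diagonal plus one diagonal per repeat shift
```

For this string the extra diagonals sit at offsets 7 and 14, which reflects the repeating unit ABRACAD of length 7.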

Genetic palindromes
A palindrome is a word that is spelt the same from right to left as from left to right. This will give an “X” shaped dot plot (try: eye; navan; never odd or even …).
Remember that left to right is (5’ to 3’) on the primary strand and right to left is (5’ to 3’) on the complementary strand. Alternatively, it means a match between a strand and its reverse complement.
There are 2 possible types of “genetic palindromes” [the difference being that the left-to-right read is on one strand while the right-to-left read is on its complementary strand]:
– Restriction enzyme sites, such as that of EcoRI:
  5’ GAATTC 3’
  3’ CTTAAG 5’
– Inverted repeats on different segments; each repeat reads the same (GTGAG) but in opposite directions. An example is the promoter region for the CAP protein in the lac operon:
  5’ GTGAGnnnCTCAC 3’
  3’ CACTCnnnGAGTG 5’
What will the dot plot for the above 2 sequences look like?
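The "match between a strand and its reverse complement" test can be sketched directly. This is a minimal illustration assuming upper-case DNA letters, with the unspecified bases "n" written as "N" and treated as their own complement; the function names are hypothetical.

```python
# Watson-Crick pairs; N (any base) is assumed to pair with N.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Read the complementary strand in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_genetic_palindrome(seq: str) -> bool:
    """True when a strand reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


for site in ("GAATTC", "GTGAGnnnCTCAC", "GTGAG"):
    print(site, is_genetic_palindrome(site))
```

Both slide examples pass the test, while the single repeat GTGAG on its own does not, since its reverse complement is CTCAC.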

Using dot plot and BLAST
Refer to fig. 9.19 of Understanding Bioinformatics, which shows how the dot plot and BLAST can be used to find longer alignments.

Exam question
Describe how to construct a dot plot matrix for the alignment of DNA/AA sequences and explain how it can be used to check for the presence of different types of repeating sequences.

References
– Baxevanis, A.D. (2005) Bioinformatics: a practical guide to the analysis of genes and proteins, chapter 11. Wiley.
– Klug, W.S. (2010) The essentials of genetics, 7th ed. Pearson Education.
– Gemayel, R. et al. (2010) Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu Rev Genet 44.
– Richard, G.F. (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 2008 Dec; 72(4).
More general dot plot information:
– Introduction to dotplot
– Inverted repeats and dotplot