© Wiley Publishing All Rights Reserved.


Comparing Two Sequences © Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives: Get the basics about dot plots. Know how to interpret the most common patterns in a dot plot. Use Dotlet. Use Lalign to extract local alignments.

Outline: Some reasons for comparing two sequences. Basic principles of dot-plot comparisons. Using Dotlet. Making local alignments with Lalign.

Why Compare Two Sequences? Database searches are useful for finding homologues. Database searches don’t provide precise comparisons. More precise tools are needed to analyze the sequences in detail, including dot plots for graphic analysis and local or global alignments for residue/residue analysis. The alignment of two sequences is called a pairwise alignment.

Using The Right Tool

Some Applications of Pairwise Alignments: Convince yourself two sequences are homologous. Identify a shared domain. Identify a duplicated region. Locate important features such as catalytic domains and disulphide bridges. Compare a gene and its product.

What Is a Dot Plot? A dot plot is a graphic representation of pairwise similarity. The simplicity of dot plots prevents artifacts. Dot plots are ideal for looking for features that may come in different orders. They reveal complex patterns. They benefit from the most sophisticated statistical-analysis tool in the universe: your brain.

Choosing Your Two Sequences: Making pairwise comparisons takes time. Use BLAST to rapidly select your sequences: more than 70% identity for DNA, more than 25% identity for proteins. If your sequences are too similar, comparing them yields no useful information.
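The identity cutoffs above can be checked by hand once two sequences are aligned. The sketch below is only an illustration: the function names `percent_identity` and `worth_comparing` are hypothetical, the input is assumed to be already aligned with `-` for gaps, and identity is counted over ungapped columns only (other definitions use a different denominator and give slightly different numbers).

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not counted (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def worth_comparing(identity, molecule):
    """Rule of thumb from the slide: >70% for DNA, >25% for proteins."""
    cutoff = {"dna": 70.0, "protein": 25.0}[molecule]
    return identity > cutoff


identity = percent_identity("ACGT-A", "ACGA-A")
print(identity, worth_comparing(identity, "dna"))
```

In practice BLAST reports the identity for each hit, so a helper like this is only needed when working from an alignment obtained elsewhere.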

Self-comparisons: Start by comparing your sequence with itself. You can discover repeated domains, motifs repeated many times (low complexity), and mirror regions (palindromes) in nucleic acids.

What Can You Analyze with a Dot Plot? Any pair of sequences: DNA, proteins, RNA, or DNA with proteins. Dotlet is an appropriate tool. To compare full genomes, install the program locally. Sequences longer than 1000 symbols are hard to analyze online.

Some Typical Dot-plot Comparisons: Divergent sequences where only a segment is homologous. Long insertions and deletions. Tandem repeats (the square shape of the pattern is characteristic of these repeats).

Using Dotlet: Dotlet is one of the handiest tools for making dot plots. Dotlet is a Java applet. Open and download the applet at www.isrec.isb-sib.ch/java/dotlet using Firefox or IE (if one doesn’t work, use the other).

Set Dotlet Parameters: Dotlet slides a window along each sequence. If the windows are more similar than the threshold, Dotlet prints a dot at their intersection. You can control the similarity threshold with the little window on the left. (Figure labels: window size, threshold.)
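The sliding-window idea described above can be sketched in a few lines. This is a minimal illustration, not Dotlet's actual implementation: it scores each pair of windows by counting identical positions, and the names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(seq_a, seq_b, window, threshold):
    """Return (i, j) window start positions whose score exceeds the threshold.

    The score here is simply the number of identical positions in the two
    windows (an assumption of this sketch).
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score > threshold:  # "more similar than the threshold"
                dots.append((i, j))
    return dots


def render(dots, width, height):
    """Draw the dots as text: columns follow seq_a, rows follow seq_b."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for i, j in dots:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


seq = "ACGTACGT"
window = 4
dots = dot_plot(seq, seq, window=window, threshold=3)
side = len(seq) - window + 1
print(render(dots, side, side))
```

Comparing the sequence with itself gives the main diagonal plus off-diagonal dots wherever the ACGT unit repeats, which is the pattern expected for a tandem repeat.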

The Dotlet Threshold: Every dot has a score given by the window comparison. When the score is below threshold 1, the dot is black; between thresholds 1 and 2, the dot is grey; above threshold 2, the dot is white. The blue curve is the distribution of scores in the sequences. The peak is the most common score, and the most common score is the less informative one. (Figure label: log curve.)
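The two-threshold colouring can be pictured as a mapping from a window score to a grey level. The sketch below assumes a linear ramp between the two thresholds and 8-bit grey values; both are assumptions made for illustration, and `dot_shade` is a hypothetical name, not part of Dotlet.

```python
def dot_shade(score, low, high):
    """Map a window score to a grey level: 0 is black, 255 is white."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return 0  # below threshold 1: black dot
    if score > high:
        return 255  # above threshold 2: white dot
    if high == low:
        return 255  # degenerate case: both thresholds coincide
    # Between the thresholds: grey, interpolated linearly (assumed).
    return int(255 * (score - low) / (high - low))


for score in (5, 12, 25):
    print(score, dot_shade(score, low=10, high=20))
```

Moving the two thresholds closer together sharpens the contrast, and moving them apart produces more intermediate grey dots.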

Getting Your Dot Plot Right: Window size and stringency control the appearance of your dot plot. Very stringent = clean dot plot, little signal. Not stringent enough = noisy dot plot, too much signal. Play with the threshold until a usable signal appears.

Which Size for the Window? Long window: clean dot plots, little sensitivity. Short window: noisy dot plots, very sensitive. The size of the window should be in the range of the elements you are looking for (conserved domains: 50 amino acids; transmembrane segments: 20 amino acids). Shorten the window to compare distantly related sequences.

Looking at Repeated Domains with Dotlet: The square shape is typical of tandem repeats. The repeats are not perfect because the sequences have diverged after their duplication.

Comparing a Gene and Its Product: Eukaryotic genes are transcribed into RNA. The RNA is then spliced to remove the intron sequences. It may be necessary to compare the gene and its product. Dotlet makes this comparative analysis easy.

Aligning Sequences: Dotlet dot plots are a good way to get an overview. Dot plots don’t provide residue/residue analysis. For this analysis you need an alignment. The most convenient tool for making precise local alignments is Lalign.
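To show what a residue/residue local alignment looks like, here is a minimal Smith-Waterman sketch. It is not Lalign's algorithm or scoring scheme: the match, mismatch and gap values are arbitrary assumptions, gaps are scored linearly, and only the single best local alignment is returned. The name `smith_waterman` is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(result)
```

Only the shared ACGTACG segment is aligned; the unrelated flanks are left out, which is the point of a local rather than a global alignment.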

Lalign and BLAST: Lalign is like a very precise BLAST. It works on only two sequences at a time. You must provide both sequences.

Lalign Output: Lalign produces an output similar to the alignment section of BLAST. The E-value indicates the significance of each alignment. A low E-value indicates a good alignment.

Going Farther: If you need to align coding DNA with a protein, try these sites: www.tcoffee.org => protogene; coot.embl.de/pal2nal. If you need to align very large sequences, try this site: www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi. If you need a precise estimate of your alignment’s statistical significance, use PRSS. The program is available at fasta.bioch.virginia.edu.
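PRSS estimates significance by comparing the real alignment score with scores obtained after shuffling one of the sequences. The sketch below illustrates that idea only. It is a simplification: it reports the empirical fraction of shuffles that score at least as well as the real pair, whereas PRSS fits a statistical distribution to the shuffled scores. The scoring values and the names `local_score` and `shuffle_p_value` are assumptions.

```python
import random


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_p_value(a, b, shuffles=200, seed=0):
    """Empirical p-value: how often a shuffled b scores as well as the real b."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    real = local_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps composition, destroys the order
        if local_score(a, "".join(letters)) >= real:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (shuffles + 1)


seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(shuffle_p_value(seq, seq, shuffles=100, seed=1))
```

A small value means shuffled sequences rarely score as well as the real pair, so the similarity is unlikely to come from composition alone.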