Sequence Similarity Searching 75321 Class 4 March 2010.

Presentation transcript:

Sequence Similarity Searching Class 4 March 2010

Why Compare Sequences?
Identify sequences found in lab experiments: what is this thing I just found?
Compare new genes to known ones
Compare genes from different species for information about evolution
Guess functions for entire genomes full of new gene sequences

Are there other sequences like this one?
1) Huge public databases: GenBank, Swiss-Prot, etc.
2) Sequence comparison is the most powerful and reliable method to determine evolutionary relationships between genes
3) Similarity searching is based on alignment
4) BLAST and FASTA provide rapid similarity searching
   a. rapid = approximate (heuristic)
   b. results can include false positives and false negatives

Similarity ≠ Homology
1) 25% similarity over ≥ 100 AAs is strong evidence for homology
2) Homology is an evolutionary statement which means "descent from a common ancestor"
   a. common 3D structure
   b. usually common function
   c. homology is all or nothing; you cannot say "50% homologous"

How to Compare Sequences?
Manually line them up and count?
An alignment program can do it for you, or just use a text editor
Dot Plot shows regions of similarity as diagonals
GATGCCATAGAGCTGTAGTCGTACCCT
<-->
CTAGAGAGC-GTAGTCAGAGTGTCTTTGAGTTCC
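The "line them up and count" step can be sketched in a few lines of Python. This is only an illustration: the function name `percent_identity` is hypothetical, the input is assumed to be two already-aligned strings of equal length with `-` marking gaps, and dividing by the full alignment length is just one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the two strings are already aligned (equal length, gaps as '-').
    Gap columns count toward the length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical aligned pair, not taken from the slides.
    print(percent_identity("GAT-ACA", "GATTACA"))
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, so a reported percentage is only meaningful alongside the convention used.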

Global vs Local similarity
1) Global similarity uses complete aligned sequences: total % matches
   GCG GAP program, Needleman & Wunsch algorithm
2) Local similarity looks for the best internal matching region between 2 sequences
   GCG BESTFIT program, Smith-Waterman algorithm, BLAST and FASTA
3) Dynamic programming: optimal computer solution, not approximate
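To make the global/local distinction concrete, here is a minimal Python sketch of the two dynamic programming recurrences. It computes scores only (no traceback), and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration; they are not the defaults of GAP, BESTFIT, or any other program named on the slide.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: all of a aligned against all of b."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap)) # gap in a
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of internal regions."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    x, y = "TTTGATTACATTT", "CCGATTACACC"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

The only differences between the two functions are the boundary values and the floor at zero: the global score pays for every unmatched end, while the local score ignores flanking regions that do not match.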

Search with Protein, not DNA Sequences
1) 4 DNA bases vs. 20 amino acids: less similarity by chance
2) can have varying degrees of similarity between different AAs: # of mutations, chemical similarity, PAM matrix
3) protein databanks are much smaller than DNA databanks
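A rough feel for point 1 comes from a back-of-the-envelope calculation. The sketch below assumes every residue is equally likely and independent of its neighbours, which real sequences are not, so the numbers only indicate the order of magnitude. The function names are hypothetical.

```python
def chance_match_probability(alphabet_size):
    """Probability that two random residues are identical,
    assuming a uniform, independent residue distribution."""
    return 1.0 / alphabet_size


def chance_run_probability(alphabet_size, length):
    """Probability that `length` consecutive positions all match by chance,
    under the same uniform-and-independent assumption."""
    return chance_match_probability(alphabet_size) ** length


if __name__ == "__main__":
    for name, size in (("DNA", 4), ("protein", 20)):
        print(name,
              chance_match_probability(size),
              chance_run_probability(size, 5))
```

Under this assumption a single position matches by chance 1 time in 4 for DNA but only 1 time in 20 for protein, and the gap widens quickly for runs of consecutive matches.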

Similarity is Based on Dot Plots
1) two sequences on vertical and horizontal axes of graph
2) put dots wherever there is a match
3) diagonal line is region of identity (local alignment)
4) apply a window filter: look at a group of bases, must meet % identity to get a dot
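The four steps above can be sketched in Python as follows. The function names `dot_plot` and `render` are hypothetical, and anchoring each window at its first position is an assumption; dot plot programs differ on where they place the dot within a window.

```python
def dot_plot(seq_x, seq_y, window=1, min_identity=1.0):
    """Return the set of (i, j) positions that get a dot.

    A dot is placed at (i, j) when the windows starting at seq_x[i] and
    seq_y[j] agree in at least `min_identity` of their positions.
    With window=1 and min_identity=1.0 this is the unfiltered dot plot.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches / window >= min_identity:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the plot as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + seq_x]
    for j, base in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        lines.append(base + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    x, y = "GATTACAGATTACA", "TTGATTACATT"  # hypothetical sequences
    print(render(x, y, dot_plot(x, y)))
    print()
    # Filtered: 4-base window, at least 75% identity within the window.
    print(render(x, y, dot_plot(x, y, window=4, min_identity=0.75)))
```

Printing both versions side by side shows why the filter helps: isolated chance matches disappear, while runs of similarity remain visible as diagonals.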

Simple Dot Plot

Dot plot filtered with 4 base window and 75% identity

Dot plot of real data

Global vs. Local Alignments