Inferring function by homology

Inferring function by homology
The fact that functionally important aspects of sequences are conserved across evolutionary time allows us to find, by homology searching, the genes in one species that are equivalent to those known to be important in other model species.
Logic: if a pair of sequences are similar in linear alignment, then we can infer that their 3-dimensional structures are similar; if the 3-D structures are similar, then there is a good chance that the functions are similar.

BASIC LOCAL ALIGNMENT SEARCH TOOLS (BLAST)
BLAST programs (there are several) compare a query sequence to all the sequences in a database in a pairwise manner.
BLAST breaks the query and database sequences into fragments known as "words", and seeks matches between them.
It attempts to align query words of length "W" to words in the database such that the alignment scores at least a threshold value, "T". These word hits are the seeds for High-Scoring Segment Pairs (HSPs).
Hits are then extended in either direction in an attempt to generate an alignment with a score exceeding another threshold, "S". The highest-scoring such alignment between two sequences is known as the Maximal-Scoring Segment Pair (MSP).
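
The word-matching step can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: it finds exact word matches only, whereas BLAST also accepts similar words that score at least T under a substitution matrix. The function names `words` and `seed_hits` and the default word length `w=3` are assumptions made for this example.

```python
from collections import defaultdict


def words(seq, w):
    """Return (position, word) for every overlapping word of length w."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def seed_hits(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a query word occurs
    exactly in the subject. Hypothetical helper: exact matches stand in
    for the "score at least T" test."""
    index = defaultdict(list)
    for j, word in words(subject, w):
        index[word].append(j)
    return [(i, j) for i, word in words(query, w) for j in index.get(word, [])]


if __name__ == "__main__":
    # Each hit would then be extended in both directions.
    print(seed_hits("GARFIELD", "THEGARDEN"))
```

Indexing the subject words first means each query word is looked up once, which is the general reason word-based seeding is fast compared with aligning every pair in full.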

2 sequence alignment
To align GARFIELDTHECAT with GARFIELDTHERAT is easy:
GARFIELDTHECAT
||||||||||| ||
GARFIELDTHERAT

Gaps
Sometimes you can get a better overall alignment if you insert gaps.
GARFIELDTHECAT
||||||||   |||
GARFIELDA--CAT
is better (scores higher) than
GARFIELDTHECAT
||||||||
GARFIELDACAT

No gap penalty
But there has to be some sort of gap penalty, otherwise you can align ANY two sequences:
G-R--E------AT
| |  |      ||
GARFIELDTHECAT

Affine gap penalty
Could set a score for each indel.
Usually use affine (open + extend).
Open -10, extend -0.05
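
As a rough sketch of how an affine penalty behaves, the function below charges the open penalty once per gap and the extend penalty once per gapped position. That convention is an assumption: some programs charge the extend penalty only from the second position onward. The name `affine_gap_score` is hypothetical; the default values are the ones quoted above.

```python
def affine_gap_score(length, gap_open=-10.0, gap_extend=-0.05):
    """Score contribution of one gap of the given length.

    Assumed convention: open is charged once, extend once per position.
    """
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, affine_gap_score(n))
```

With these values one long gap costs far less than several short ones, which reflects the idea that a single insertion or deletion event can span many residues.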

2+ similar sequences
When doing a similarity search against a database you are trying to decide which of many sequences is the CLOSEST match to your search sequence.
Which of the following alignment pairs is better?

Scoring Alignments
GARFIELDTHECAT
||||   |||||||
GARFRIEDTHECAT

GARFIELDTHECAT
GARWIELESHECAT

GARFIELDTHECAT
GAVGIELDTHEMAT
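
One way to see why this question is hard is to count identical positions in each pair. The sketch below does only that; the function name `identities` is an assumption for this example, and it deliberately ignores how chemically similar the mismatched residues are.

```python
def identities(a, b):
    """Count aligned positions holding the same residue (gaps excluded)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y and x != "-" for x, y in zip(a, b))


if __name__ == "__main__":
    query = "GARFIELDTHECAT"
    for hit in ("GARFRIEDTHECAT", "GARWIELESHECAT", "GAVGIELDTHEMAT"):
        print(hit, identities(query, hit))  # each pair has 11 identities
```

All three pairs tie on identity, so ranking them needs a scheme that scores each kind of substitution differently.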

Willie Taylor’s AA Venn Diagram

Substitution matrices
[Table: BLOSUM90 substitution matrix, with amino acids A R N D C Q E G H I L ... as row and column labels; the scores are not preserved in this transcript]

Low Complexity Masking
Some sequences are similar even if they have no recent common ancestor.
Huntington's disease is caused by poly-CAG tracts in the DNA, which result in polyglutamine (Gln, Q) tracts in the protein.
If you do a homology search with QQQQQQQQQQ you get hits to other proteins that have a lot of glutamines but have totally different functions.
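
A very crude form of masking can be sketched as follows. It replaces long runs of a single residue with X so they cannot seed matches. This is an illustration only, not the algorithm used by real masking programs, and the name `mask_runs` and the `min_run=5` cutoff are assumptions.

```python
import re


def mask_runs(seq, min_run=5, mask_char="X"):
    """Replace every run of min_run or more identical residues with mask_char."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


if __name__ == "__main__":
    huntingtin_start = (
        "MATLEKLMKA" "FESLKSFQQQ" "QQQQQQQQQQ"
        "QQQQQQQQQQ" "PPPPPPPPPP" "PQLPQPPPQA"
    )
    print(mask_runs(huntingtin_start))
```

The masked sequence keeps its length, so alignment coordinates still refer to the original protein.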

2 sequence alignment
Huntingtin: MATLEKLMKA FESLKSFQQQ QQQQQQQQQQ QQQQQQQQQQ PPPPPPPPPP PQLPQPPPQA
hits
>MM16_MOUSE MATRIX METALLOPROTEINASE-16
Score = 34.4 bits (78), Expect = 0.18
Identities = 21/65 (32%), Positives = 25/65 (38%), Gaps = 2/65 (3%)
Query: FQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQ--AQPLLPQPQPPPPPP
       F Q  +    +      Q  Q+   PP   PPP   LP  PP     P  P+    P PP
Sbjct: FYQYMETDNFKLPNDDLQGIQKIYGPPDKIPPPTRPLPTVPPHRSVPPADPRRHDRPKPP
But the hit comes from the low-complexity Q and P runs, not because the matched protein is involved in microtubule-mediated transport!

E values
An E-value is the number of hits with an equivalent or better score that would be expected to occur by chance.
It depends on the size of the query sequence and the database.
The lower the E-value, the more confidence you can have that a hit is a true homologue (a sequence related by common descent).
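
The dependence on query and database size can be made concrete with the standard relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the total database length. The sketch below applies that relation directly and ignores the edge-effect corrections that search programs apply; the name `expect_value` is an assumption for this example.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S') with no length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Doubling the database doubles E; one extra bit of score halves it.
    print(expect_value(30.0, 60, 1_000_000))
    print(expect_value(30.0, 60, 2_000_000))
    print(expect_value(31.0, 60, 1_000_000))
```

This is why the same alignment can look significant against a small database and unremarkable against a large one.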

Dotplot theory
Another way of comparing 2 sequences.
Task: align ATGATATTCTT and ATTGTTC.
[Figure: empty dotplot grid with ATGATATTCTT along the horizontal axis and ATTGTTC down the vertical axis]

Go along the first seq inserting a + wherever 2/3 bases in a moving window match. The first seq is compared to ATT (the first 3 bases in the vertical sequence).

Then go along the first seq again inserting a + wherever 2/3 bases in a moving window match. This time the first seq is compared to TTG (the next 3 in the vertical sequence).

Iterate until the end of the vertical sequence is reached.

[Figure: completed dotplot grid for ATGATATTCTT (horizontal) against ATTGTTC (vertical); the positions of the + marks are not preserved in this transcript]
The human eye is particularly good at picking up structure from the pattern of dots. You might see a hint of a duplicated region in the horizontal sequence that is not so clear from the sequence itself.
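
The moving-window procedure can be sketched in Python as below. The function name `dotplot` and the use of "." for empty cells are assumptions for this example; the window of 3 and the 2-out-of-3 rule come from the slides. Each output row corresponds to one window of the vertical sequence.

```python
def dotplot(horizontal, vertical, window=3, min_matches=2):
    """Return one string per vertical window; '+' marks windows that match."""
    rows = []
    for i in range(len(vertical) - window + 1):
        v_win = vertical[i:i + window]
        row = []
        for j in range(len(horizontal) - window + 1):
            h_win = horizontal[j:j + window]
            matches = sum(a == b for a, b in zip(v_win, h_win))
            row.append("+" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    for i, row in enumerate(dotplot("ATGATATTCTT", "ATTGTTC")):
        print("ATTGTTC"[i:i + 3], row)
```

Raising `window` or `min_matches` makes the plot more stringent and removes isolated marks, at the cost of missing weaker similarities.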